Day 1
Introduction to NEON/VFPv3
- Clarifying the resources shared by NEON and VFP
- Register bank, Q registers, D registers
- Data types
- Vector vs scalar
- Related system registers
- Alignment issues
- Enabling NEON/VFP
- Differences between NEONv7 and NEONv8
NEON instruction syntax
- Instructions producing wider / narrower results
- Instruction modifiers
- Selecting the shape
- Selecting the operand / result type
- Syntax flexibility
- Declaring initialized vectors in C language
- Using unions with vectors and arrays of vectors to simplify debugging
- Casting vectors
LOAD and STORE instructions
- Addressing modes
- Vector load / store
- Vector load / store multiple
- Element and structure load / store instructions
- Multiple single elements
- Single element to 1 lane
- Single elements to all lanes
- Optimizing the ordering of data in memory to take advantage of 2-, 3- and 4-element structures
Exercise: Managing audio samples (example)
- Processor acceleration mechanisms: store merging buffers
Exercise: Using load with de-interleaving instructions to store all right-channel samples into one vector and all left-channel samples into another vector
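As a rough illustration of what the de-interleaving exercise aims at, the sketch below is a portable scalar model of a 2-element structure load. It assumes 16-bit stereo samples stored as L0, R0, L1, R1, ...; the function name `deinterleave_stereo` is hypothetical and chosen only for this example. On a NEON target, the `vld2_s16` intrinsic from `arm_neon.h` performs the same split for four frames in one instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a 2-element structure load (VLD2.16 style).
 * Assumption: interleaved[] holds L0, R0, L1, R1, ...
 * After the call, left[] and right[] each hold one channel. */
void deinterleave_stereo(const int16_t *interleaved, size_t frames,
                         int16_t *left, int16_t *right)
{
    for (size_t i = 0; i < frames; ++i) {
        left[i]  = interleaved[2 * i];      /* even positions: left channel  */
        right[i] = interleaved[2 * i + 1];  /* odd positions: right channel  */
    }
}
```

Keeping each channel in its own vector lets later arithmetic treat all lanes identically, which is the point of ordering data to suit 2-, 3- and 4-element structures.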
Day 2
Data transfer instructions
- Move
- Swap
- Table lookup
- Vector transpose
- Vector zip / unzip
- Data transfer between NEON and integer unit
Exercise: Clarifying narrow and long instructions, building a vector from bytes selected from a pair of vectors
Logical and bitfield instructions
- Logical AND, Bit Clear, OR, XOR
- Operations with immediate values
- Bitwise insert instructions, avoiding branches
- Count leading zeros, ones, signs
- Normalizing floating point numbers when VFP is not implemented
- Scalar duplicate
- Extract
- Shift with possible rounding and saturation
- Bitfield reverse
Exercise: Transposing a matrix, shifting a large bitmap using vector instructions
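The "bitwise insert instructions, avoiding branches" item can be pictured with a scalar model of a bitwise select, the operation VBSL performs per bit. This is a minimal sketch on a single 32-bit word, not NEON code; `bit_select` and `max_u32` are hypothetical names used only here.

```c
#include <stdint.h>

/* Per-bit select: take bits of a where mask is 1, bits of b where mask is 0. */
uint32_t bit_select(uint32_t mask, uint32_t a, uint32_t b)
{
    return (mask & a) | (~mask & b);
}

/* Branch-free maximum: a comparison yields an all-ones or all-zeros mask
 * (as a vector compare does per lane), then the select picks the winner. */
uint32_t max_u32(uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(a > b); /* 0xFFFFFFFF if a > b */
    return bit_select(mask, a, b);
}
```

In vector code the same compare-then-select pattern replaces an if/else per element, since individual lanes cannot branch independently.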
Data processing instructions
- Arithmetic instructions
- Add, modulo vs saturated arithmetic
- Halving / Doubling the result
- Rounding
- Subtract
- Multiply
- Multiply accumulate / Multiply subtract
- Absolute value
- Min / Max
Exercise: Implementing a complex multiply accumulate with NEON
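The "modulo vs saturated arithmetic" distinction listed under Add can be shown on a single signed 16-bit lane. The sketch below is a scalar model only; the function names are hypothetical. A modulo add wraps around on overflow, while a saturating add clamps to the range limits.

```c
#include <stdint.h>

/* Modulo (wrapping) add on one signed 16-bit lane.
 * Note: converting an out-of-range value back to int16_t is
 * implementation-defined before C23; common compilers wrap. */
int16_t add_modulo_s16(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Saturating add on one signed 16-bit lane: clamp instead of wrapping. */
int16_t add_saturate_s16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

For audio or image data, the saturating form usually degrades more gracefully, because an overflow produces a clipped value rather than a sign flip.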
- Conversion instructions
- Converting floating-point numbers into fixed-point numbers
- Converting fixed-point numbers into floating-point numbers
Exercise: Converting fixed-point elements into single-precision floating-point values and adding the resulting elements
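To make the fixed-point conversions concrete, here is a scalar sketch of what one lane of a conversion does, assuming a signed 32-bit value with `frac_bits` fractional bits (restricted to 0..31 in this model). The function names are hypothetical. The float-to-fixed direction is written to truncate toward zero and saturate, which is an assumption about the behaviour being modelled rather than a statement about every conversion variant.

```c
#include <stdint.h>

/* Fixed-point to float: divide by 2^frac_bits (frac_bits in 0..31). */
float fixed_to_float(int32_t value, unsigned frac_bits)
{
    return (float)value / (float)(1u << frac_bits);
}

/* Float to fixed-point: scale by 2^frac_bits, saturate, truncate toward zero. */
int32_t float_to_fixed(float value, unsigned frac_bits)
{
    float scaled = value * (float)(1u << frac_bits);
    if (scaled != scaled) return 0;                    /* NaN maps to 0 here */
    if (scaled >= 2147483648.0f) return INT32_MAX;     /* clamp high */
    if (scaled <= -2147483648.0f) return INT32_MIN;    /* clamp low  */
    return (int32_t)scaled;                            /* truncates toward zero */
}
```

For example, with 8 fractional bits the integer 384 represents 384 / 256 = 1.5.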
- Advanced arithmetic instructions
- Reciprocal estimate, reciprocal square root estimate, Newton-Raphson algorithm
- Pairwise instructions
- Element comparison
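The reciprocal estimate plus Newton-Raphson item can be sketched in scalar form. The estimate instruction provides a low-precision starting value, and each refinement step multiplies it by (2 - d * x), which is the quantity the reciprocal step instruction (VRECPS) computes. The names below are hypothetical, and the starting estimate is passed in by the caller instead of being produced by hardware.

```c
/* Models the reciprocal step: returns 2 - d * x. */
float recip_step(float d, float x)
{
    return 2.0f - d * x;
}

/* Newton-Raphson refinement of an estimate of 1/d.
 * Each iteration roughly doubles the number of correct bits,
 * provided the starting estimate is reasonably close. */
float refine_reciprocal(float d, float estimate, int iterations)
{
    float x = estimate;
    for (int i = 0; i < iterations; ++i) {
        x = x * recip_step(d, x);
    }
    return x;
}
```

Starting from 0.3 as an estimate of 1/3, the relative error goes from 0.1 to 0.01 to 0.0001 over the first two steps, which shows the quadratic convergence.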
NEON coding examples
- FIR filter
- Converting the scalar algorithm into a vector algorithm
- Finding the NEON instructions to encode the vector algorithm
- Optimizing the code
- Using the performance monitor to tune the algorithm
- FFT (DFT)
- Converting the scalar algorithm into a vector algorithm, understanding how circle properties can be used to process 4 angles concurrently
- Finding the NEON instructions to encode the vector algorithm
- Optimizing the code
- Using the performance monitor to tune the algorithm
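One possible way to picture "converting the scalar algorithm into a vector algorithm" for the FIR filter is shown below in portable C. This is a sketch under assumptions, not the course's solution: the function names are hypothetical, coefficients are applied as y[i] = sum of h[k] * x[i + k], and the input is assumed to hold at least n + taps - 1 samples. The second function computes four outputs per outer iteration with four independent accumulators, which is the shape that maps onto one 4-lane multiply-accumulate per tap.

```c
#include <stddef.h>

/* Scalar reference: one output per outer iteration. */
void fir_scalar(const float *x, const float *h, size_t taps,
                float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; ++k) {
            acc += h[k] * x[i + k];
        }
        y[i] = acc;
    }
}

/* Vector-shaped version: four outputs per outer iteration.
 * The inner lane loop stands in for a single 4-lane multiply-accumulate. */
void fir_4lane(const float *x, const float *h, size_t taps,
               float *y, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        for (size_t k = 0; k < taps; ++k) {
            for (size_t lane = 0; lane < 4; ++lane) {
                acc[lane] += h[k] * x[i + k + lane];
            }
        }
        for (size_t lane = 0; lane < 4; ++lane) {
            y[i + lane] = acc[lane];
        }
    }
    /* Remaining outputs that do not fill a whole group of four. */
    for (; i < n; ++i) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; ++k) {
            acc += h[k] * x[i + k];
        }
        y[i] = acc;
    }
}
```

Comparing the two outputs on the same input is a simple correctness check before any tuning with the performance monitor.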