Day 1
Introduction to NEON/VFPv3
 Clarifying the resources shared by NEON and VFP
 Register bank, Q registers, D registers
 Data types
 Vector vs scalar
 Related system registers
 Alignment issues
 Enabling NEON/VFP
 Differences between NEONv7 and NEONv8
NEON instruction syntax
 Instructions producing wider / narrower results
 Instruction modifiers
 Selecting the shape
 Selecting the operand / result type
 Syntax flexibility
 Declaring initialized vectors in C language
 Using unions with vectors and arrays of vectors to simplify debugging
 Casting vectors
LOAD and STORE instructions
 Addressing modes
 Vector load / store
 Vector load / store multiple
 Element and structure load / store instructions
 Multiple single elements
 Single element to one lane
 Single element to all lanes
 Optimizing the ordering of data in memory to take advantage of 2-, 3- and 4-element structures
Exercise: 
Example: managing audio samples 
 Processor acceleration mechanisms: store merging buffers
Exercise: 
Using deinterleaving load instructions to store all right-channel samples into one vector and all left-channel samples into another vector
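A minimal sketch of the deinterleaving exercise above, assuming 16-bit stereo samples stored left channel first (L0 R0 L1 R1 ...). The function name `deinterleave_stereo_s16` and the requirement that the frame count be a multiple of 8 are assumptions made for illustration, not part of the course material. When NEON is not available, the scalar branch models the same data movement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: split interleaved stereo frames into two planar
 * buffers. Assumes the left sample comes first in each frame and that
 * `frames` is a multiple of 8 (one Q register of 16-bit lanes). */
void deinterleave_stereo_s16(const int16_t *interleaved,
                             int16_t *left, int16_t *right, size_t frames)
{
    assert(frames % 8 == 0);
#if defined(__ARM_NEON)
    for (size_t i = 0; i < frames; i += 8) {
        /* 2-element structure load: val[0] receives the even-indexed
         * elements, val[1] the odd-indexed ones. */
        int16x8x2_t lr = vld2q_s16(interleaved + 2 * i);
        vst1q_s16(left + i, lr.val[0]);
        vst1q_s16(right + i, lr.val[1]);
    }
#else
    /* Scalar model of the same deinterleaving for non-NEON builds. */
    for (size_t i = 0; i < frames; ++i) {
        left[i] = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
#endif
}
```

Called with 8 frames holding the values 0 to 15, the left buffer receives the even values and the right buffer the odd ones.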
Day 2
Data transfer instructions
 Move
 Swap
 Table lookup
 Vector transpose
 Vector zip / unzip
 Data transfer between NEON and integer unit
Exercise: 
Clarifying narrow and long instructions, building a vector from bytes selected from a pair of vectors 

Logical and bitfield instructions
 Logical AND, Bit Clear, OR, XOR
 Operations with immediate values
 Bitwise insert instructions, avoiding branches
 Count leading zeros, leading sign bits, set bits
 Normalizing floating-point numbers when VFP is not implemented
 Scalar duplicate
 Extract
 Shift with possible rounding and saturation
 Bitfield reverse
Exercise: 
Transposing a matrix, shifting a large bitmap using vector instructions 
Data processing instructions
 Arithmetic instructions
 Add, modulo vs saturated arithmetic
 Halving / Doubling the result
 Rounding
 Subtract
 Multiply
 Multiply accumulate / Multiply subtract
 Absolute value
 Min / Max
Exercise: 
Implementing a complex multiply accumulate with NEON 
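One possible shape for the complex multiply accumulate exercise, as a sketch only. It assumes single-precision data held in separate real and imaginary arrays (a planar layout), which is a choice made here for simplicity; interleaved data could be split first with a 2-element structure load. The name `cmla_f32` and the multiple-of-4 length requirement are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: acc += a * b for n complex numbers stored as
 * separate real/imaginary arrays. Assumes n is a multiple of 4.
 *   re += a_re*b_re - a_im*b_im
 *   im += a_re*b_im + a_im*b_re */
void cmla_f32(float *acc_re, float *acc_im,
              const float *a_re, const float *a_im,
              const float *b_re, const float *b_im, size_t n)
{
    assert(n % 4 == 0);
#if defined(__ARM_NEON)
    for (size_t i = 0; i < n; i += 4) {
        float32x4_t ar = vld1q_f32(a_re + i), ai = vld1q_f32(a_im + i);
        float32x4_t br = vld1q_f32(b_re + i), bi = vld1q_f32(b_im + i);
        float32x4_t re = vld1q_f32(acc_re + i);
        float32x4_t im = vld1q_f32(acc_im + i);
        re = vmlaq_f32(re, ar, br);   /* re += ar * br */
        re = vmlsq_f32(re, ai, bi);   /* re -= ai * bi */
        im = vmlaq_f32(im, ar, bi);   /* im += ar * bi */
        im = vmlaq_f32(im, ai, br);   /* im += ai * br */
        vst1q_f32(acc_re + i, re);
        vst1q_f32(acc_im + i, im);
    }
#else
    /* Scalar model of the same arithmetic for non-NEON builds. */
    for (size_t i = 0; i < n; ++i) {
        acc_re[i] += a_re[i] * b_re[i] - a_im[i] * b_im[i];
        acc_im[i] += a_re[i] * b_im[i] + a_im[i] * b_re[i];
    }
#endif
}
```

The four multiply accumulate / multiply subtract operations per iteration correspond directly to the instructions listed in this section.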
 Conversion instructions
 Converting floating-point numbers into fixed-point numbers
 Converting fixed-point numbers into floating-point numbers
Exercise: 
Converting fixed-point elements into single-precision floating-point values and adding the resulting elements
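A sketch of one reading of the conversion exercise above: convert signed fixed-point elements to single precision, then sum all converted elements. The Q16.16 format (`FRAC_BITS` set to 16), the name `sum_q16_as_float`, and the interpretation of "adding" as a sum over all elements are assumptions made here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

#define FRAC_BITS 16  /* assumed Q16.16 format: 16 fractional bits */

/* Hypothetical helper: convert n fixed-point values to float and return
 * their sum. Assumes n is a multiple of 4. */
float sum_q16_as_float(const int32_t *fixed, size_t n)
{
    assert(n % 4 == 0);
#if defined(__ARM_NEON)
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (size_t i = 0; i < n; i += 4) {
        int32x4_t q = vld1q_s32(fixed + i);
        /* Fixed-point to float conversion; the fraction-bit count is an
         * immediate operand of the instruction. */
        float32x4_t f = vcvtq_n_f32_s32(q, FRAC_BITS);
        acc = vaddq_f32(acc, f);
    }
    /* Reduce the four lanes: add the two halves, then a pairwise add. */
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    half = vpadd_f32(half, half);
    return vget_lane_f32(half, 0);
#else
    /* Scalar model of the same conversion and sum for non-NEON builds. */
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += (float)fixed[i] / (float)(1 << FRAC_BITS);
    return acc;
#endif
}
```

The vector path accumulates per lane and reduces at the end, so for general inputs its rounding may differ slightly from a strictly sequential scalar sum.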
 Advanced arithmetic instructions
 Reciprocal estimate, reciprocal square root estimate, Newton-Raphson algorithm
 Pairwise instructions
 Element comparison
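The reciprocal estimate and Newton-Raphson topic above can be illustrated with a short sketch. The scalar functions model the refinement step x' = x * (2 - d * x); the names `recip_step`, `refine_reciprocal` and `recip_f32x4`, and the choice of two refinement iterations in the vector version, are assumptions made for illustration.

```c
#include <assert.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar model of the reciprocal step: returns 2 - d * x. */
float recip_step(float d, float x)
{
    return 2.0f - d * x;
}

/* Hypothetical helper: refine an initial estimate of 1/d with the given
 * number of Newton-Raphson iterations. Each iteration roughly squares the
 * relative error, so a coarse estimate converges in a few steps. */
float refine_reciprocal(float d, float estimate, int iterations)
{
    assert(iterations >= 0);
    float x = estimate;
    for (int i = 0; i < iterations; ++i)
        x = x * recip_step(d, x);
    return x;
}

#if defined(__ARM_NEON)
/* Vector version: hardware estimate followed by two refinement steps. */
float32x4_t recip_f32x4(float32x4_t d)
{
    float32x4_t x = vrecpeq_f32(d);          /* initial estimate     */
    x = vmulq_f32(x, vrecpsq_f32(d, x));     /* x *= (2 - d * x)     */
    x = vmulq_f32(x, vrecpsq_f32(d, x));     /* second refinement    */
    return x;
}
#endif
```

For example, starting from the estimate 0.3 for 1/3, three iterations of `refine_reciprocal` bring the relative error from about 10% down to the limits of single precision.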
NEON coding examples
 FIR filter
 Converting the scalar algorithm into a vector algorithm
 Finding the NEON instructions to encode the vector algorithm
 Optimizing the code
 Using the performance monitor to tune the algorithm
 FFT (DFT)
 Converting the scalar algorithm into a vector algorithm, understanding how circle properties can be used to process 4 angles concurrently
 Finding the NEON instructions to encode the vector algorithm
 Optimizing the code
 Using the performance monitor to tune the algorithm
