RC1 NEON-v7 programming
This course explains how to use ARMv7 NEON SIMD instructions to speed up multimedia algorithms.
Objectives
- Knowledge of ARMv7 instruction sets.
- Theoretical course
- PDF course material (in English) supplemented by a printed version for face-to-face courses.
- Online courses are delivered using the Teams video-conferencing system.
- The trainer answers trainees' questions during the training and provides technical and pedagogical assistance.
- Practical activities
- Practical activities account for 40% to 50% of the course duration.
- Code examples, exercises and solutions
- For remote trainings:
- One Online Linux PC per trainee for the practical activities.
- The trainer has access to trainees' Online PCs for technical and pedagogical assistance.
- QEMU Emulated board or physical board connected to the online PC (depending on the course).
- Some labs may be completed between sessions and are checked by the trainer at the next session.
- For face-to-face trainings:
- One PC (Linux or Windows) for the practical activities with, if appropriate, a target board.
- One PC for two trainees when there are more than 6 trainees.
- For onsite trainings:
- An installation and test manual is provided to allow preinstallation of the needed software.
- The trainer comes with target boards if they are needed for the practical activities (and takes them back at the end of the course).
- Downloadable preconfigured virtual machine for post-course practical activities
- At the start of each session, the trainer interacts with the trainees to ensure the course fits their expectations and adjusts it if needed.
- Any embedded systems engineer or technician with the above prerequisites.
- The prerequisites indicated above are assessed before the training by the trainee's technical supervisor in their company, or by the trainees themselves in the exceptional case of an individual trainee.
- Trainee progress is assessed in two different ways, depending on the course:
- For courses lending themselves to practical exercises, the trainer checks the results of the exercises and, if necessary, helps trainees carry them out by providing additional details.
- Quizzes are offered at the end of sections that do not include practical exercises to verify that the trainees have assimilated the points presented.
- At the end of the training, each trainee receives a certificate attesting that they have successfully completed the course.
- If a problem due to a trainee's lack of prerequisites is discovered during the course, a different or additional training is offered to them, generally to reinforce their prerequisites, in agreement with their company manager if applicable.
Course Outline
- Clarifying the resources shared by NEON and VFP
- Register bank, Q registers, D registers
- Data types
- Vector vs scalar
- Related system registers
- Alignment issues
- Enabling NEON/VFP
- Differences between NEONv7 and NEONv8
- Instructions producing wider / narrower results
- Instruction modifiers
- Selecting the shape
- Selecting the operand / result type
- Syntax flexibility
- Declaring initialized vectors in C language
- Using unions with vectors and arrays of vectors to simplify debugging
- Casting vectors
- Addressing modes
- Vector load / store
- Vector load / store multiple
- Element and structure load / store instructions
- Multiple single elements
- Single element to 1 lane
- Single elements to all lanes
- Optimizing the ordering of data in memory to take advantage of 2-, 3- and 4-element structures
| Exercise: | Example: managing audio samples | |
- Processor acceleration mechanisms: store merging buffers
| Exercise: | Using load with de-interleaving instructions to store all right lane samples into a vector and left lane samples into another vector | |
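As an illustration of the structure load topic above, the following is a minimal sketch (not the course's exercise solution) of splitting interleaved stereo samples into two buffers. The function name `deinterleave_stereo` and the L0 R0 L1 R1 sample order are assumptions made for this example. When compiled for a NEON target, the `vld2q_s16` intrinsic loads 16 samples and de-interleaves them into two vectors; elsewhere, only the scalar loop runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Split interleaved stereo samples (assumed order: L0 R0 L1 R1 ...) into
   separate left and right buffers. `frames` is the number of L/R pairs. */
void deinterleave_stereo(const int16_t *in, int16_t *left, int16_t *right,
                         size_t frames)
{
    size_t i = 0;
    assert(in != NULL && left != NULL && right != NULL);
#if defined(__ARM_NEON)
    /* One 2-element structure load fills two vectors: val[0] receives the
       even-indexed samples, val[1] the odd-indexed samples. */
    for (; i + 8 <= frames; i += 8) {
        int16x8x2_t lr = vld2q_s16(in + 2 * i);
        vst1q_s16(left + i, lr.val[0]);
        vst1q_s16(right + i, lr.val[1]);
    }
#endif
    /* Scalar tail, and the complete fallback on targets without NEON. */
    for (; i < frames; i++) {
        left[i] = in[2 * i];
        right[i] = in[2 * i + 1];
    }
}
```

Keeping a scalar path alongside the vector path gives a reference against which the NEON result can be compared while debugging.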
- Move
- Swap
- Table lookup
- Vector transpose
- Vector zip / unzip
- Data transfer between NEON and integer unit
| Exercise: | Clarifying narrow and long instructions, building a vector from bytes selected from a pair of vectors | |
- Logical AND, Bit Clear, OR, XOR
- Operations with immediate values
- Bitwise insert instructions, avoiding branches
- Count Leading zeros, ones, signs
- Normalizing floating point numbers when VFP is not implemented
- Scalar duplicate
- Extract
- Shift with possible rounding and saturation
- Bitfield reverse
| Exercise: | Transposing a matrix, shifting a large bitmap using vector instructions | |
- Arithmetic instructions
- Add, modulo vs saturated arithmetic
- Halving / Doubling the result
- Rounding
- Subtract
- Multiply
- Multiply accumulate / Multiply subtract
- Absolute value
- Min / Max
| Exercise: | Implementing a complex multiply accumulate with NEON | |
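To make the "modulo vs saturated arithmetic" item above concrete, here is a small scalar model of what happens in one signed 16-bit lane. It is a sketch in plain C, not NEON code: `add_modulo_s16` mimics the wrap-around behavior of an instruction such as VADD.I16, and `add_saturate_s16` mimics the clamping behavior of VQADD.S16. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Modulo (wrap-around) addition of one signed 16-bit lane.
   The sum is formed in unsigned arithmetic, where wrap-around is well
   defined; converting back to int16_t wraps on common compilers
   (implementation-defined before C23). */
int16_t add_modulo_s16(int16_t a, int16_t b)
{
    uint16_t sum = (uint16_t)((uint16_t)a + (uint16_t)b);
    return (int16_t)sum;
}

/* Saturated addition of one signed 16-bit lane: the exact sum is computed
   in a wider type, then clamped to the int16_t range. */
int16_t add_saturate_s16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

For audio samples, the saturated form turns an overflow into clipping, whereas the modulo form flips the sign of the result.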
- Conversion instructions
- Converting Floating Point numbers into Fixed point numbers
- Converting Fixed point numbers into Floating point numbers
| Exercise: | Converting fixed-point elements into single precision floating point values and adding the resulting elements | |
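The fixed-point to floating-point conversion above can be modelled in scalar C as a division by a power of two. This is a hedged sketch, not the exercise solution: the function names and the `fbits` parameter are assumptions for illustration, with `fbits` playing the role of the fraction-bits operand of VCVT.F32.S32. The sketch limits `fbits` to 1..31 so that the scale factor fits in a 32-bit shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one signed 32-bit fixed-point value with `fbits` fractional bits
   to single precision: value = x / 2^fbits. */
float fixed_to_float(int32_t x, unsigned fbits)
{
    assert(fbits >= 1 && fbits <= 31);
    return (float)x / (float)(1u << fbits);
}

/* Convert every element of `v` and accumulate the results in a float. */
float sum_fixed_as_float(const int32_t *v, size_t n, unsigned fbits)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += fixed_to_float(v[i], fbits);
    return sum;
}
```

With 16 fractional bits, the raw value 65536 represents 1.0 and 32768 represents 0.5.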
- Advanced arithmetic instructions
- Reciprocal estimate, reciprocal square root estimate, Newton-Raphson algorithm
- Pairwise instructions
- Element comparison
- FIR filter
- Converting the scalar algorithm into a vector algorithm
- Finding the NEON instructions to encode the vector algorithm
- Optimizing the code
- Using the performance monitor to tune the algorithm
- FFT (DFT)
- Converting the scalar algorithm into a vector algorithm, understanding how circle properties can be used to process 4 angles concurrently
- Finding the NEON instructions to encode the vector algorithm
- Optimizing the code
- Using the performance monitor to tune the algorithm
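The step "converting the scalar algorithm into a vector algorithm" can be sketched for the FIR filter as follows. This is an illustrative example under stated assumptions, not the course's implementation: the function names are hypothetical, the input `x` is assumed to hold `n_out + taps - 1` samples, and the output is defined as y[n] = sum over k of h[k] * x[n + taps - 1 - k]. The vector version computes four consecutive outputs at once, one per lane, and uses the `vmlaq_n_f32` intrinsic only when compiled for a NEON target.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar reference: one output per iteration of the outer loop. */
void fir_scalar(const float *x, const float *h, float *y,
                size_t n_out, size_t taps)
{
    assert(x != NULL && h != NULL && y != NULL && taps >= 1);
    for (size_t n = 0; n < n_out; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; k++)
            acc += h[k] * x[n + taps - 1 - k];
        y[n] = acc;
    }
}

/* Vector form: four consecutive outputs share each coefficient h[k]. */
void fir_vec4(const float *x, const float *h, float *y,
              size_t n_out, size_t taps)
{
    size_t n = 0;
    assert(x != NULL && h != NULL && y != NULL && taps >= 1);
    for (; n + 4 <= n_out; n += 4) {
#if defined(__ARM_NEON)
        float32x4_t acc = vdupq_n_f32(0.0f);
        for (size_t k = 0; k < taps; k++)
            /* acc += x[n+taps-1-k .. +3] * h[k] (vector times scalar) */
            acc = vmlaq_n_f32(acc, vld1q_f32(&x[n + taps - 1 - k]), h[k]);
        vst1q_f32(&y[n], acc);
#else
        /* Portable model of the same four-lane computation. */
        float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
        for (size_t k = 0; k < taps; k++)
            for (size_t lane = 0; lane < 4; lane++)
                acc[lane] += h[k] * x[n + lane + taps - 1 - k];
        for (size_t lane = 0; lane < 4; lane++)
            y[n + lane] = acc[lane];
#endif
    }
    /* Remaining outputs when n_out is not a multiple of 4. */
    if (n < n_out)
        fir_scalar(x + n, h, y + n, n_out - n, taps);
}
```

Vectorizing across outputs rather than across taps avoids a horizontal sum at the end of each output, at the cost of unaligned loads from `x`.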
More
To book a training session or for more information, please contact us at info@ac6-training.com.
Registrations are accepted till one week before the start date for scheduled classes. For late registrations, please consult us.
You can also fill in the registration form and send it to us.
This course can be provided either remotely, in our Paris training center or worldwide on your premises.
Scheduled classes are confirmed as soon as there are two confirmed bookings. Bookings are accepted until one week before the course start.
Last update of course schedule: 23 February 2026
Booking one of our trainings is subject to our General Terms of Sales
Related Courses
AAA
ARM Cortex-A and R Architecture (v7/v8)
AAM
ARM Cortex-M Architecture (v7/v8)
RA0
Cortex-A5 implementation
RA1
Cortex-A8 implementation
RA2
Cortex-A9 implementation
RA3
Cortex-A15 implementation
RA4
Cortex-A7 implementation
RA5
Cortex-A17 implementation
RA6
CORTEX-A57 implementation, ARM Architecture V8
RA7
CORTEX-A53 implementation, ARM Architecture V8
RA8
CORTEX-A72 implementation, ARM Architecture V8
RA9
CORTEX-A73 implementation, ARM Architecture V8
RC2
NEON-v8 programming
RI0
AXI3 / AXI4 INTERCONNECT
RM0
Cortex-M0 / Cortex-M0+ implementation
RM1
Cortex-M1 implementation
RM2
Cortex-M3 implementation
RM3
Cortex-M4 / Cortex-M4F implementation
RM4
Cortex-M7 implementation
RM5
Cortex-M33 Implementation
RR0
Cortex-R4 implementation
RR1
Cortex-R5 implementation
RR2
Cortex-R7 implementation
RR3
ARM Cortex-R52/R52+ Implementation and software design