+ +
- -
eLearning et Cours en ligne
 
Calendrier  Détails
Systèmes d'Exploitation
Calendrier  Détails
Programmation
Calendrier  Détails
Processors
 
Calendrier  Détails
Communications
 
 
 
Calendrier  Détails
+ +
> >
- -

Dernières Nouvelles

Securing an Embedded Linux with System Workbench

Asymmetric Multi-Programming with System Workbench for Linux

 
ac6 >> ac6-training >> eLearning et Cours en ligne >> Langages >> OpenCL Télécharger le catalogue Télécharger la page Ecrivez nous Version imprimable

oL9 OpenCL

Parallel programming with OpenCL-1.2

formateur
Objectives
  • Learn parallel programming with OpenCL.
  • Know what (not) to expect from parallel programming.
  • Understand heavy multithreading and how it is mapped to the hardware.
  • Measure OpenCL code performance, locate and solve bottlenecks.
  • Write efficient OpenCL code.
Exercises will be run on multi-core CPUs with nVidia GPU running under Linux.
Course environment
  • One PC under Linux per trainee, with a multi-core CPU and an nVidia GPU
Pre-requisites
  • Good knowledge of the C language

First Session
Introduction to OpenCL
  • History
    • OpenCL 1.0
    • OpenCL 1.1
    • OpenCL 1.2
    • OpenCP/EP (Embedded Profile)
  • Design goals of OpenCL
    • CPUs, GPUs and GPGPUs
    • Data-parallel and Task-parallel
    • Hardware related and portable
  • Terminology
    • Host / Device
    • Memory model
    • Execution Model
The OpenCL Architecture
  • The OpenCL Architecture
    • Platform Model
    • Execution Model
    • Memory Model
    • Programming Model
  • The OpenCL Software Stack
  • Example
Exercise :  Installation and test of the OpenCL SDK
Second Session
The OpenCL Host API
  • Platform layer
    • Querying and selecting devices
    • Managing compute devices
    • Managing computing contexts and queues
    • The host objects: program, kernel, buffer, image
Exercise :  Write a platform discovery and analysis program (displaying CPUs, GPUs, versions...)
  • Runtime
    • Managing resources
    • Managing memory domains
    • Executing compute kernels
Exercise :  Write an image loader program, transferring image to/from compute devices
  • Compiler
    • The OpenCL C programming language
    • Online compilation
    • Offline compilation
The Basic OpenCL Execution Model
  • How code is executed on hardware
    • Compute kernel
    • Compute program
    • Application queues
  • OpenCL Data-parallel execution
    • N-dimensional computation domains
    • Work-items and work-groups
    • Synchronization and communication in a work-group
    • Mapping global work size to work-groups
    • Parallel execution of work-groups
Exercise :  Compile and execute a program to square an array on the platform computing nodes
Third Session
The OpenCL Programming Language
  • Restrictions from C99
  • Data types
    • Scalar
    • Vector
    • Structs and pointers
    • Type-conversion functions
    • Image types
Exercise :  Rewrite the square program to use vector operations
  • Required built-in functions
    • Work-item functions
    • Math and relational
    • Input/output
    • Geometric functions
    • Synchronization
  • Optional features
    • Atomics
    • Rounding modes
Exercise :  Write and execute an image manipulation program (Blur filter)
Fourth Session
Advanced OpenCL Execution modes
  • Profiling
Exercise :  Enhance the image manipulation program to measure kernel computation time
  • The OpenCL Memory Model
    • Global Memory
    • Local Memory
    • Private Memory
  • OpenCL Task-parallel execution
    • Optional OpenCL feature
    • Native work-items
Exercise :  Simulate the N-Body problem, displaying data using OpenGL
Efficient OpenCL
  • When (not) to use OpenCL
  • Code design guidelines
  • Explicit vectorization
Exercise :  Explore vectorisation on an image rotation kernel
  • Memory latency and access patterns
    • ALU latency
    • Using local memory
Exercise :  Enhance the Blur filter program to investigate memory optimisations
  • Synchronizing threads
  • Warps/Wavefronts, work groups, and GPU cores