The lecture notes are designed to serve as a basis for practical exercises in computer labs.
The outline of lessons:
1. Introduction to parallel programming on GPU, a brief history, CUDA
2. CUDA architecture and its integration within standard C++ project
3. Threads and kernel functions
4. CUDA memories, patterns and usage
5. Memory bank conflicts
6. Program execution control, distribution of an algorithm
7. Algorithm performance with respect to its parallelization on GPU
9. Optimization at the data level, efficient data structures
10. Optimization of programs for maximum GPU performance
11. The cuBLAS support library
12. The case study
The outline of exercises (exercises take place in computer labs):
1. The first application in CUDA
2. Data transfers to/from GPU
3. Thread hierarchy, basic thread life cycle, limits, calling kernel functions, parameters and restrictions
4. CUDA memories, patterns and usage
5. Memory bank conflicts, access optimization, suitable data structures
6. Streams, parallel calling of kernel functions, synchronization on several levels
7. The case study, experiments with several variants of the same program
8. Vectors and matrices, the case study, large data processing, parallel reduction
9. Introduction to several support libraries for linear algebra
10. The case study, image manipulation, double buffering, optimization at the level of blocks, registers, etc.
11. The case study, interesting research topics, outline of possible solutions, experiments
12. Program tuning, debugging with NVIDIA Nsight