Lectures:
1. Introduction to parallel programming. Concurrency, pseudoparallelism, parallelism. Processes and threads, including the operating-system perspective on them. SIMD and FMA instructions of modern processors.
2. Sequential vs. parallel programming. Parallel programming caveats. Typical issues of parallel programs. Synchronization and mutual exclusion. Deadlock (definition, properties, conditions, detection, elimination).
3. Classification of parallel systems. Shared memory systems and distributed memory systems. Flynn's taxonomy. Data parallel and task-parallel problems.
4. Programming shared-memory systems. Threads in Java, C#, C++11, and Python. The fork-join model and the OpenMP interface (a minimal OpenMP sketch follows the lecture list).
5. Programming distributed-memory systems. Message passing. POSIX message queues, sockets. The MPI interface and fundamental MPI operations (see the MPI sketch after the lecture list).
6. Other options for distributed memory systems: bulk synchronous parallel and partitioned global address space models.
7. Introduction to accelerator and co-processor programming. Computation offloading. GPGPU architecture (program organization, memory organization). Data parallelism on the GPU. The CUDA platform and the CUDA-C language. The OpenACC interface.
8. Parallel algorithm design. Task/channel parallel computation model. Foster's design methodology (partitioning, communication, agglomeration, mapping) and its use.
9. Parallel programming patterns: finding concurrency, algorithm structure, implementation. Typical scenarios: independent tasks, replicable, reduction, divide and conquer, pipeline.
10. Representation and analysis of parallel algorithms (parallel random-access machine, bulk synchronous parallel, the informal work-depth model, etc.). Performance and efficiency of parallel algorithms.
11. Parallel data structures: stack, queue, pool, linked list, hash table, trees, priority queues. Major differences from sequential variants.
12. Algorithms for selected problems: prefix-sum, merging, sorting (orthogonal trees, sorting networks). Major differences between sequential, naive parallel, and efficient parallel solutions (a prefix-sum sketch follows the lecture list).
13. Data structures and algorithms for parallel search: 2-3 tree, pipelining. Parallel search, insertion, and deletion.
14. Parallel algorithms and data structures for graph problems. Graph traversal, shortest-path search, connectivity, independent sets, etc.
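
To make the fork-join model of lecture 4 concrete, a minimal OpenMP sketch in C could look as follows; the array size and the computation are arbitrary placeholders rather than course material:

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N], c[N];

        /* Sequential part: initialize the inputs. */
        for (int i = 0; i < N; i++) {
            a[i] = i;
            b[i] = 2.0 * i;
        }

        /* Fork: the loop iterations are divided among a team of threads.
           Join: execution continues with a single thread after the loop. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[42] = %f\n", c[42]);
        return 0;
    }

Compiled with -fopenmp the loop runs in parallel; without the flag the pragma is ignored and the program remains a correct sequential version.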
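
Similarly, one of the fundamental MPI operations of lecture 5, a collective reduction, can be sketched as a toy program that sums the ranks of all processes; this is an illustration, not a reference solution:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size, sum;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Every process contributes its rank; MPI_Reduce combines the
           contributions with MPI_SUM and delivers the result to rank 0. */
        MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of ranks 0..%d = %d\n", size - 1, sum);

        MPI_Finalize();
        return 0;
    }

Built with mpicc and launched with mpirun -np 4, all four processes execute the same code and only rank 0 prints the result.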
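
For the prefix-sum topic of lecture 12, the gap between the obvious sequential loop and a parallel formulation can be hinted at with a two-phase block scan. The OpenMP-based decomposition below is one possible sketch, not the algorithm as presented in the lectures:

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    /* Inclusive prefix sum in two parallel phases:
       1) each thread scans its own block,
       2) block totals are accumulated and added back in parallel. */
    void prefix_sum(int *x, int n) {
        int nthreads;
        int *block_sum;

        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            #pragma omp single
            {
                nthreads = omp_get_num_threads();
                block_sum = calloc(nthreads + 1, sizeof(int));
            }
            int lo = (long long)n * tid / nthreads;
            int hi = (long long)n * (tid + 1) / nthreads;

            /* Phase 1: local scan of this thread's block. */
            for (int i = lo + 1; i < hi; i++)
                x[i] += x[i - 1];
            if (hi > lo)
                block_sum[tid + 1] = x[hi - 1];
            #pragma omp barrier

            /* Short sequential scan of the per-block totals. */
            #pragma omp single
            for (int t = 1; t <= nthreads; t++)
                block_sum[t] += block_sum[t - 1];

            /* Phase 2: shift each block by the sum of the preceding blocks. */
            for (int i = lo; i < hi; i++)
                x[i] += block_sum[tid];
        }
        free(block_sum);
    }

    int main(void) {
        int x[8] = {1, 1, 1, 1, 1, 1, 1, 1};
        prefix_sum(x, 8);
        for (int i = 0; i < 8; i++)
            printf("%d ", x[i]);    /* expected: 1 2 3 4 5 6 7 8 */
        printf("\n");
        return 0;
    }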
Seminars:
The seminars take place in computer labs and are dedicated to practical examples of the concepts introduced in the lectures. Over the course of the term, students work on the following assignments:
1. A computationally expensive task solved with and without parallelism. The traveling salesman problem: from naive parallelism (brute force) to a more advanced solution with inter-process communication (simple parallel branch-and-bound).
2. Data parallelism and big data processing. Data decomposition, data-parallel approach, vectorization. Parallel matrix operations, clustering.
3. Parallel image processing and the impact of local dependences on parallel algorithms (convolution). Parallel seam carving.
4. A parallel approach to graphs, graph algorithms, and non-linear data structures. Parallel implementation of the iterative version of the PageRank algorithm (a minimal sketch follows this list).
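
As an illustration of the fourth assignment's topic, one data-parallel formulation of an iterative PageRank step could look like the sketch below. The toy graph, the damping factor of 0.85, and the fixed iteration count are assumptions made for the example, and the handling of dangling vertices is omitted:

    #include <stdio.h>

    #define N 4        /* number of vertices in the toy graph */
    #define D 0.85     /* damping factor */

    int main(void) {
        /* adj[i][j] = 1 if there is an edge i -> j. */
        int adj[N][N] = {
            {0, 1, 1, 0},
            {0, 0, 1, 0},
            {1, 0, 0, 1},
            {0, 0, 1, 0},
        };
        int outdeg[N];
        double rank[N], next[N];

        for (int i = 0; i < N; i++) {
            rank[i] = 1.0 / N;
            outdeg[i] = 0;
            for (int j = 0; j < N; j++)
                outdeg[i] += adj[i][j];
        }

        for (int iter = 0; iter < 20; iter++) {
            /* Each vertex gathers rank from its in-neighbours; the
               iterations of the j-loop are independent, hence parallel. */
            #pragma omp parallel for
            for (int j = 0; j < N; j++) {
                double s = 0.0;
                for (int i = 0; i < N; i++)
                    if (adj[i][j])
                        s += rank[i] / outdeg[i];
                next[j] = (1.0 - D) / N + D * s;
            }
            for (int j = 0; j < N; j++)
                rank[j] = next[j];
        }

        for (int j = 0; j < N; j++)
            printf("rank[%d] = %.4f\n", j, rank[j]);
        return 0;
    }

The gather formulation (each destination vertex reads the ranks of its sources) avoids write conflicts, so no synchronization beyond the implicit barrier of the parallel loop is needed.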