Libraries for parallel data processing

Course aims

Students gain an overview of libraries and frameworks for parallel processing of large data sets and acquire basic hands-on experience with the most widely used ones. The course introduces the basic concepts of big data and its manipulation, along with the basic paradigms and programming models for processing it. Exercises use Python, a programming language in which all the well-known frameworks can be used.

Literature

• Pandas documentation: http://pandas.pydata.org/
• Spark documentation: https://spark.apache.org/docs/latest/
• Tensorflow documentation: https://www.tensorflow.org/
• Keras documentation: https://keras.io/
• HENDL, J., Big data - Věda o datech, základy a aplikace, Cosmopolis, 2021.

Advised literature

• Nathan Marz and James Warren: Big Data - Principles and best practices of scalable realtime data systems, Manning, April 2015, ISBN 9781617290343.


Language of instruction: Czech, English
Code: 9600-1020
Abbreviation: KPZD
Course title: Libraries for parallel data processing
Coordinating department: IT4Innovations
Course coordinator: Ing. Jan Martinovič, Ph.D.