Big Data Processing

Type of study: Doctoral
Language of instruction: English
Code: 9600-0004/02
Abbreviation: ZRD
Course title: Big Data Processing
Credits: 10
Coordinating department: IT4Innovations
Course coordinator: doc. Mgr. Jiří Dvorský, Ph.D.

Subject syllabus

The importance of big data for business decisions, company strategies, research into human behaviour on social networks, and targeted advertising has recently become unquestionable. Moreover, leading scientific centres (e.g. CERN) have demonstrated the need to routinely store previously unimaginable amounts of data. The key issue in big data processing is, first of all, the storage of extremely large data sets, for example document collections, data streams from sensor networks, time series (e.g. stock market share prices, transport data), graph databases representing social networks and the web, satellite images of the Earth's surface, etc. Standard relational databases have proved unsuitable for processing such enormous amounts of data; instead, massively parallel software running on hundreds or thousands of servers must be employed.

Part of the course presents the technologies that make up the current state of big data processing, such as the Hadoop Distributed File System, NoSQL databases, and the HDF5 hierarchical data format. The course also covers data structures suited to various kinds of data, their manipulation, efficient querying, the costs of I/O operations, specialised kinds of data compression, and algorithms and data structures suited to computational accelerators (CUDA, Intel Xeon Phi).

Literature

• S. Sakr, M. Gaber: Large Scale and Big Data: Processing and Management, Auerbach Publications, 2014, ISBN 978-1466581500 
• T. White: Hadoop: The Definitive Guide, Yahoo Press, 2014, ISBN 978-1449311520 
• P. J. Sadalage, M. Fowler: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, Addison-Wesley Professional, 2012, ISBN 978-0321826626

Recommended literature

• J. Jeffers, J. Reinders: Intel Xeon Phi Coprocessor High-Performance Programming, Morgan Kaufmann, 2013, ISBN 978-0124104143
• G. Barlas: Multicore and GPU Programming: An Integrated Approach, Morgan Kaufmann, 2014, ISBN 978-0124171374 
• J. Leskovec, A. Rajaraman, J. D. Ullman: Mining of Massive Datasets, Cambridge University Press, 2014, ISBN 978-1107077232 
• V. S. Agneeswaran: Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives, Pearson FT Press, 2014, ISBN 978-0133837940