Big Data Processing

Type of study	Doctoral
Language of instruction	English
Code	9600-0004/02
Abbreviation	ZRD
Course title	Big Data Processing
Credits	10
Coordinating department	IT4Innovations
Course coordinator	doc. Mgr. Jiří Dvorský, Ph.D.

Subject syllabus

Big data has become indispensable for business decisions, company strategy, research into human behaviour on social networks, and targeted advertising. Leading scientific centres (e.g. CERN) have likewise shown the need to routinely store previously unimaginable amounts of data. The first key issue in big data processing is the storage of extremely large data sets, for example document collections, stream data from sensor networks, time series (e.g. share prices on the stock market, transport data), graph databases representing social networks and the web, and satellite images of the Earth’s surface. Standard relational databases have proved unsuitable for processing such enormous amounts of data; massively parallel software running on hundreds or thousands of servers has to be employed instead.

Part of the course presents the technologies that form the current state of big data processing, such as the Hadoop Distributed File System, NoSQL databases, and the HDF5 hierarchical data format. The course also covers data structures suitable for various kinds of data and their manipulation, efficient querying, the cost of I/O operations, specific kinds of data compression, and algorithms and data structures suitable for computational accelerators (CUDA, Intel Xeon Phi).

Literature

• S. Sakr, M. Gaber: Large Scale and Big Data: Processing and Management, Auerbach Publications, 2014
• T. White: Hadoop: The Definitive Guide, Yahoo Press, 2014
• P. J. Sadalage, M. Fowler: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, Addison-Wesley Professional, 2012

Advised literature

• J. Jeffers, J. Reinders: Intel Xeon Phi Coprocessor High-Performance Programming, Morgan Kaufmann, 2013, ISBN 978-0124104143
• G. Barlas: Multicore and GPU Programming: An Integrated Approach, Morgan Kaufmann, 2014
• J. Leskovec, A. Rajaraman, J. D. Ullman: Mining of Massive Datasets, Cambridge University Press, 2014, ISBN 978-1107077232
• V. S. Agneeswaran: Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives, Pearson FT Press, 2014
