
Advanced methods for data manipulation

* Exchange students do not have to consider this information when selecting suitable courses for an exchange stay.

Course Unit Code: 9360-0193/01
Number of ECTS Credits Allocated: 3 ECTS credits
Type of Course Unit *: Choice-compulsory type B
Level of Course Unit *: Second Cycle
Year of Study *: First Year
Semester when the Course Unit is delivered: Winter, Summer Semester
Mode of Delivery: Face-to-face
Language of Instruction: Czech
Prerequisites and Co-Requisites: There are no prerequisites or co-requisites for this course unit
Name of Lecturer(s): Ing. Dominik Legut, Ph.D. (Personal ID: LEG0015)
Summary
This course prepares participants to process and manipulate large data files. It covers not only work on HPC supercomputers but also the handling of everyday data. Participants will learn how to work with files containing millions of lines or millions of columns, or files several gigabytes in size.
Learning Outcomes of the Course Unit
Students will be able to process, manipulate, and analyze large data sets ranging from gigabytes to terabytes in size.
Course Contents
This course prepares participants to process and manipulate large data files and to work with HPC supercomputers. Participants will learn to work with files containing millions of lines or columns, or files several gigabytes in size.
1. Unix (Linux) commands for data manipulation on the command line
2. Handling and editing text data in Unix: vi, nano, Midnight Commander, etc.
3. Introduction to scripting in Bash: for and while loops, etc.
4. Introduction to Awk, manipulation of data
5. How to do simple mathematics on the command line
6. Awk, data I/O formats (formatted input and output)
7. Basics of Ed and Sed, replacing strings, more complex constructions
8. Advanced methods: Introduction to Perl
9. Perl II
10. Regular expressions I
11. Regular expressions II
12. Transferring data to and from HPC systems, display forwarding, use of the scheduler and batch jobs
13. - 14. Practical sessions
Recommended or Required Reading
Required Reading:
http://becksteinlab.physics.asu.edu/pages/unix/IntroUnix/vim_basics.html for unix and vi, sed etc.
http://cs.lmu.edu/~ray/notes/bash/ for bash
https://www.tutorialspoint.com/awk/index.htm for awk
http://www.ucw.cz/~hubicka/skolicky/
http://www.abclinuxu.cz/clanky/navody/bash-i
P. Satrapa, Neokortex, 2001, ISBN 80-86330-02-8.
Recommended Reading:
http://www.well.ox.ac.uk/~johnb/comp/perl/intro.html
https://www.regularnivyrazy.info/regularni-vyrazy-zaklady.html#.Wtl3ada-kWM
Planned learning activities and teaching methods
Lectures, Tutorials, Project work
Assessment methods and criteria
Task Title | Task Type | Maximum Number of Points (Act. for Subtasks) | Minimum Number of Points for Task Passing
Credit and Examination | Credit and Examination | 100 (100) | 51
        Credit | Credit | 40 | 21
        Examination | Examination | 60 | 30