Speech Processing

* Exchange students do not have to consider this information when selecting suitable courses for an exchange stay.

Course Unit Code: 440-4224/01
Number of ECTS Credits Allocated: 4 ECTS credits
Type of Course Unit *: Optional
Level of Course Unit *: Second Cycle
Year of Study *: First Year
Semester when the Course Unit is delivered: Summer Semester
Mode of Delivery: Face-to-face
Language of Instruction: Czech
Prerequisites and Co-Requisites: The course follows on from the compulsory courses of the previous semester.
Name of Lecturer(s) (Personal ID: Name)
SKA109: Ing. Jan Skapa, Ph.D.
TOV020: Ing. Jaromír Továrek, Ph.D.
PAR0038: Ing. Pavol Partila, Ph.D.
Summary
Speech processing is an important part of information and communication technology. The goal of the course is to understand the basic tasks of speech processing: speaker identification (SI), automatic speech recognition (ASR), text-to-speech synthesis (TTS) and speech emotion recognition (SER). The acquired skills can be used to design complex systems that involve speech processing.
Learning Outcomes of the Course Unit
After completing the course, students will be able to solve problems in the field of speech processing. They will learn the basic approaches and methods of speech signal processing, such as feature extraction and processing with neural networks or hidden Markov models. They will be able to implement a simple system for speaker identification or for recognizing emotion from a speech signal.
Course Contents
Subject syllabus
1. Introduction to the subject and to speech processing, practical applications and uses.
2. Speech production, basic concepts, speech preprocessing (DC offset, pre-emphasis, segmentation, windowing).
3. Basic features - energy, zero-crossing rate (ZCR), jitter, shimmer, autocorrelation, fundamental frequency.
4. Spectrum, spectrogram, spectral analysis of vowels and consonants.
5. Cepstrum, cepstral analysis, Mel-frequency cepstral coefficients (MFCC) and other speech parameters.
6. Introduction to classification, SOM, k-NN, GMM, ANN and classifier fusion.
7. Speaker identification (SI) and possible approaches.
8. Speech emotion recognition (SER), stress recognition.
9. Automatic speech recognition (ASR) and possible approaches.
10. Text to speech (TTS), speech corpora and open-source projects.

Exercise syllabus
1. Introduction, safety, conditions for completing the subject.
2. Practical exercises – speech preprocessing – DC offset, pre-emphasis, segmentation, windowing.
3. Practical exercises – Feature extraction – energy, zero-crossing rate, fundamental frequency.
4. Practical exercises – Spectral analysis of speech signal.
5. Practical exercises – Feature extraction – MFCC, LPC.
6. Test and project assignment.
7. Design of a speaker recognition system - GMM, ANN.
8. Example of a project proposal.
9. Speech synthesis.
10. Presentation of projects.
Recommended or Required Reading
Required Reading:
MCLOUGHLIN, Ian. Speech and audio processing: a Matlab-based approach. Cambridge: Cambridge University Press, 2016. ISBN 978-1-107-08546-6.
PSUTKA, Josef. Mluvíme s počítačem česky. Praha: Academia, 2006. ISBN 80-200-1309-1.
Recommended Reading:
BAILLY, Gérard, Pascal PERRIER and Eric VATIKIOTIS-BATESON, ed. Audiovisual speech processing. Cambridge: Cambridge University Press, 2012. ISBN 978-1-107-00682-9.
OGUNFUNMI, Tokunbo, Roberto TOGNERI and Madihally NARASIMHA, ed. Speech and audio processing for coding, enhancement and recognition. New York: Springer, 2015. ISBN 978-1-4939-1455-5.
PSUTKA, Josef. Komunikace s počítačem mluvenou řečí. Praha: Academia, 1995. ISBN 80-200-0203-0.
Planned learning activities and teaching methods
Lectures, Tutorials, Experimental work in labs
Assessment methods and criteria
Task Title | Task Type | Maximum Number of Points (Act. for Subtasks) | Minimum Number of Points for Task Passing
Credit and Examination | Credit and Examination | 100 (100) | 51
        Credit | Credit | 36 | 20
        Examination | Examination | 64 | 15