
Speech Processing

Type of study Follow-up Master
Language of instruction Czech
Code 440-4224/01
Abbreviation ZŘS
Course title Speech Processing
Credits 4
Coordinating department Department of Telecommunications
Course coordinator Ing. Jaromír Továrek, Ph.D.

Course outline

Subject syllabus
1. Introduction to the course and to speech processing; practical applications and uses.
2. Speech production, basic concepts, speech preprocessing (DC offset removal, pre-emphasis, segmentation, windowing).
3. Basic features: energy, zero-crossing rate (ZCR), jitter, shimmer, autocorrelation, fundamental frequency.
4. Spectrum, spectrogram, spectral analysis of vowels and consonants.
5. Cepstrum, cepstral analysis, mel-frequency cepstral coefficients (MFCC) and other speech parameters.
6. Introduction to classification, SOM, k-NN, GMM, ANN and classifier fusion.
7. Speaker identification (SI) and possible approaches.
8. Speech emotion recognition (SER), stress recognition.
9. Automatic speech recognition (ASR) and possible approaches.
10. Text to speech (TTS), speech corpora and open-source projects.

Exercise syllabus
1. Introduction, safety, conditions for completing the course.
2. Practical exercises – speech preprocessing – DC offset removal, pre-emphasis, segmentation, windowing.
3. Practical exercises – Feature extraction – energy, zero-crossing rate, fundamental frequency.
4. Practical exercises – Spectral analysis of speech signal.
5. Practical exercises – Feature extraction – MFCC, LPC.
6. Test and assignment of projects.
7. Design of a speaker recognition system – GMM, ANN.
8. Example of a project proposal.
9. Speech synthesis.
10. Presentation of projects.
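
The preprocessing and basic feature steps named in exercises 2 and 3 can be sketched roughly as follows. This is only an illustrative outline in plain Python, not course material: the function names, the pre-emphasis coefficient of 0.97, the choice of a Hamming window, and the frame sizes are all assumptions made for the example.

```python
import math


def remove_dc_offset(signal):
    """Subtract the mean so the signal is centred on zero."""
    mean = sum(signal) / len(signal)
    return [s - mean for s in signal]


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a commonly used value, assumed here for illustration.
    The first sample is passed through unchanged.
    """
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames; an incomplete tail is dropped."""
    return [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def extract_features(signal, frame_len, hop):
    """Return one (energy, ZCR) pair per windowed frame."""
    cleaned = pre_emphasis(remove_dc_offset(signal))
    window = hamming(frame_len)
    features = []
    for frame in split_frames(cleaned, frame_len, hop):
        windowed = [s * w for s, w in zip(frame, window)]
        features.append(
            (short_time_energy(windowed), zero_crossing_rate(windowed))
        )
    return features


if __name__ == "__main__":
    # Synthetic test signal: 100 Hz sine at 8 kHz with a DC offset of 0.5.
    fs = 8000
    signal = [0.5 + math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
    # 25 ms frames (200 samples) with a 10 ms hop (80 samples).
    for energy, zcr in extract_features(signal, frame_len=200, hop=80):
        print(f"energy={energy:.3f}  zcr={zcr:.4f}")
```

With real recordings, the list of samples would come from an audio file reader rather than a synthetic sine, and the frame length and hop would be chosen to match the sampling rate.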

Required literature

MCLOUGHLIN, Ian. Speech and audio processing: a Matlab-based approach. Cambridge: Cambridge University Press, 2016. ISBN 978-1-107-08546-6.

Recommended literature

BAILLY, Gérard, Pascal PERRIER and Eric VATIKIOTIS-BATESON, eds. Audiovisual speech processing. Cambridge: Cambridge University Press, 2012. ISBN 978-1-107-00682-9.

OGUNFUNMI, Tokunbo, Roberto TOGNERI and Madihally NARASIMHA, eds. Speech and audio processing for coding, enhancement and recognition. New York: Springer, 2015. ISBN 978-1-4939-1455-5.