Course Unit Code | 440-4224/01 |
---|
Number of ECTS Credits Allocated | 4 ECTS credits |
---|
Type of Course Unit * | Optional |
---|
Level of Course Unit * | Second Cycle |
---|
Year of Study * | First Year |
---|
Semester when the Course Unit is delivered | Summer Semester |
---|
Mode of Delivery | Face-to-face |
---|
Language of Instruction | Czech |
---|
Prerequisites and Co-Requisites | The course follows on from the compulsory courses of the previous semester |
---|
Name of Lecturer(s) | Personal ID | Name |
---|
| SKA109 | Ing. Jan Skapa, Ph.D. |
| TOV020 | Ing. Jaromír Továrek, Ph.D. |
| PAR0038 | Ing. Pavol Partila, Ph.D. |
Summary |
---|
Speech processing is an important part of information and communication technology. The goal of the course is to understand the basic tasks of speech processing: speaker identification (SI), automatic speech recognition (ASR), text-to-speech synthesis (TTS) and speech emotion recognition (SER). The acquired skills can be used to design complex systems that rely on speech processing. |
Learning Outcomes of the Course Unit |
---|
After completing the course, students will be able to solve problems in the field of speech processing. They will learn the basic approaches and methods of speech signal processing, such as feature extraction and processing by neural networks or hidden Markov models. They will be able to implement a simple system for speaker identification or for recognizing emotion from a speech signal. |
Course Contents |
---|
Subject syllabus
1. Introduction to the subject and to speech processing, practical applications and uses.
2. Speech production, basic concepts, speech preprocessing (DC offset, pre-emphasis, segmentation, windowing).
3. Basic features - energy, zero-crossing rate (ZCR), jitter, shimmer, autocorrelation, fundamental frequency.
4. Spectrum, spectrogram, spectral analysis of vowels and consonants.
5. Cepstrum, cepstral analysis, Mel-frequency cepstral coefficients (MFCC) and other speech parameters.
6. Introduction to classification, SOM, k-NN, GMM, ANN and classifier fusion.
7. Speaker identification (SI) and possible approaches.
8. Speech emotion recognition (SER), stress recognition.
9. Automatic speech recognition (ASR) and possible approaches.
10. Text-to-speech (TTS), speech corpora and open-source projects.
Exercise syllabus
1. Introduction, safety, conditions for completing the subject.
2. Practical exercises – speech preprocessing – DC offset, pre-emphasis, segmentation, windowing.
3. Practical exercises – feature extraction – energy, zero-crossing rate, fundamental frequency.
4. Practical exercises – Spectral analysis of speech signal.
5. Practical exercises – feature extraction – MFCC, LPC.
6. Test and assignment of the project.
7. Design of speaker recognition system - GMM, ANN.
8. Example of project proposal.
9. Speech synthesis.
10. Presentation of projects. |
Recommended or Required Reading |
---|
Required Reading: |
---|
MCLOUGHLIN, Ian. Speech and audio processing: a Matlab-based approach. Cambridge: Cambridge University Press, 2016. ISBN 978-1-107-08546-6. |
PSUTKA, Josef. Mluvíme s počítačem česky. Praha: Academia, 2006. ISBN 80-200-1309-1. |
Recommended Reading: |
---|
BAILLY, Gérard, Pascal PERRIER and Eric VATIKIOTIS-BATESON, eds. Audiovisual speech processing. Cambridge: Cambridge University Press, 2012. ISBN 978-1-107-00682-9.
OGUNFUNMI, Tokunbo, Roberto TOGNERI and Madihally NARASIMHA, eds. Speech and audio processing for coding, enhancement and recognition. New York: Springer, 2015. ISBN 978-1-4939-1455-5. |
PSUTKA, Josef. Komunikace s počítačem mluvenou řečí. Praha: Academia, 1995. ISBN 80-200-0203-0. |
Planned learning activities and teaching methods |
---|
Lectures, Tutorials, Experimental work in labs |
Assessment methods and criteria |
---|
Task Title | Task Type | Maximum Number of Points (Act. for Subtasks) | Minimum Number of Points for Task Passing |
---|
Credit and Examination | Credit and Examination | 100 (100) | 51 |
Credit | Credit | 36 | 20 |
Examination | Examination | 64 | 15 |