Methods of Analysis of Textual Data

Annotation

The course covers the basic principles of analyzing text documents, which are treated as a typical example of weakly structured data. It presents the individual areas of text processing for documents and web pages. Topics include algorithms for pattern matching in text, the design of index systems for text data, and the handling of the natural languages in which texts are written. Various approaches to searching text data are also described, including latent semantic analysis. The course concludes with web search.

Required literature

1. Manning C. D., Raghavan P., Schütze H.: Introduction to Information Retrieval, Cambridge University Press, 2008
2. Witten I. H., Moffat A., Bell T. C.: Managing Gigabytes (2nd ed.): Compressing and Indexing Documents and Images, Morgan Kaufmann Publishers Inc., 1999, ISBN 1-55860-570-3
3. Baeza-Yates R. A., Ribeiro-Neto B.: Modern Information Retrieval, Addison-Wesley Longman Publishing Co., Inc., 1999, ISBN 020139829X
4. Feldman R., Sanger J.: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, Cambridge University Press, 2006, ISBN 978-0521836579
5. Berry M. W., Kogan J.: Text Mining: Applications and Theory, Wiley, 2010, ISBN 978-0470749821
6. Weiss S. M., Indurkhya N., Zhang T.: Fundamentals of Predictive Text Mining, Springer, 2010, ISBN 978-1849962254
7. Langville A. N., Meyer C. D.: Google's PageRank and Beyond: The Science of Search Engine Rankings, Princeton University Press, 2006
8. Korfhage R. R.: Information Storage and Retrieval, John Wiley & Sons, 1997

Recommended literature

1. Witten I. H., Gori M., Numerico T.: Web Dragons: Inside the Myths of Search Engine Technology, Morgan Kaufmann, 2006


Language of instruction Czech, English
Code 460-4074
Abbreviation MATD
Course title Methods of Analysis of Textual Data
Coordinating department Department of Computer Science
Course coordinator doc. Mgr. Jiří Dvorský, Ph.D.