A brief outline of the lectures' topics:
- Introduction to information systems. The history and evolution of text retrieval. Differences between database systems and information retrieval (IR) systems. The general model of an information retrieval system.
- Pattern matching. Single-pattern matching. The Aho-Corasick algorithm. Regular expressions and finite automata. Algorithms for approximate pattern matching.
- Suffix trees. DAWG (directed acyclic word graph). Patricia tries and similar data structures.
- Text preprocessing. Lexical analysis. Stemming. Lemmatization. Stop words.
- Index construction. Zipf's law and estimating the size of the index. Classification-based indexing. Positional indexes. Term weighting methods. TF-IDF term weights. Index compression methods. Methods for encoding natural numbers.
- Query languages. Document relevance. The degree of similarity between document-query pairs. Relevance vs. similarity. Query structure and evaluation. Boolean DIS. IR system evaluation (precision, recall, F-measure).
- Signature methods. Chained and layered signature coding. Efficient query evaluation.
- Latent semantics. Dimensionality reduction methods. Methods based on matrix decomposition. Random projection. Vector DIS. Construction and evaluation of the query vector. Other types of DIS (extended Boolean). Indexing, query structure, query evaluation.
- Web search. Analysis of hypertext documents, structural methods. PageRank and HITS. Metasearch and cooperative search. Applications of computational intelligence and soft computing in text search.
- Methods for automatic summarization: abstraction and extraction. Topic detection and evolution. Sentiment analysis, classification and clustering of documents.
- Parallel and distributed search. Decentralized and P2P search.
- Semantic and contextual search. Neural Information Retrieval.