Foundations of Information Retrieval – Institut für Data Science

Content

The module introduces traditional text IR, including Boolean retrieval, the vector space model, and tolerant retrieval. It then covers the technical basics of Web IR, starting with Web size estimation and duplicate detection, followed by link analysis and crawling. This leads on to modern search engine evaluation methods and various test collections. Finally, applications of classification and clustering in the IR domain are discussed. The theory is illustrated with examples from modern search systems such as Google, Altavista, and Clusty.

The course covers algorithms, structures, and innovative systems that are relevant in the context of the World Wide Web or that have been made possible by it. Key topics are Web search (Web crawling, text indexing, ranking mechanisms), the analysis and structure of the World Wide Web, data management (search, topologies, systems), and further current topics.

Lecturers

Teaching Assistants

Recommended Literature

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. Available online at nlp.stanford.edu/IR-book/

Participants

Computer science students (recommended from the 3rd semester onward) and ITIS students.

Lecture and Exercise dates

Lectures take place Tuesdays, 14:15 – 15:45 in room 3703-023.
Tutorial sessions take place Thursdays, 16:30 – 18:00 in room 1101-F142.

Please refer to Stud.IP for more information.

Exam

The exam will be in English, and answers can be given in English. All topics discussed in the lectures, exercises, and programming exercises are relevant.

Duration: 120 minutes.
Auxiliary material: a non-programmable calculator, dictionary.

Lecture notes

We mainly use the book "Introduction to Information Retrieval" by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, which is available online and as a PDF.

Lectures and Dates

April 12, 2022 Boolean retrieval

April 19, 2022 Document ingestion, Dictionary and tolerant retrieval

April 26, 2022 Dictionary and tolerant retrieval, Indexing, Index compression

May 3, 2022 Index compression, Scoring, Term weighting, Vector space model

May 10, 2022 Evaluation

May 19, 2022 Query expansion

May 26, 2022 Query expansion (continued), Probabilistic information retrieval

May 31, 2022 Language models for IR

June 14, 2022 Text classification and Naive Bayes

June 21, 2022 Vector space classification

June 28, 2022 Learning to rank

July 5, 2022 Flat and hierarchical clustering

July 12, 2022 Link analysis

Exercises

Exercises and their solutions are published via Stud.IP.