Foundations of Information Retrieval

This lecture introduces Web Information Retrieval, with particular emphasis on the algorithms and technologies used in modern search engines.

CONTENT

The module begins with traditional text IR, including Boolean retrieval, the vector space model, and tolerant retrieval. It then covers the technical basics of Web IR, starting with Web size estimation and duplicate detection, followed by link analysis and crawling. This leads on to modern search engine evaluation methods and various test collections. Finally, applications of classification and clustering in the IR domain are discussed. The theory is illustrated with examples from modern search systems such as Google, AltaVista, and Clusty.

The course covers algorithms, structures, and innovative systems that are relevant in the context of the World Wide Web or have been made possible by it. Key topics are Web search (Web crawling, text indexing, ranking mechanisms), the analysis and structure of the World Wide Web, data management (search, topologies, systems), and further current topics.

RECOMMENDED LITERATURE

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

PARTICIPANTS

Computer science students (recommended from the 3rd semester onward) and ITIS students.

LECTURE DATES

Lectures and tutorial sessions will take place online. Please refer to the Stud.IP course for more information.

EXAM

The exam will be in English. You can answer in English or German. All topics discussed in the lectures, exercises and programming exercises are relevant.

Duration: 90 minutes.
Permitted aids: a non-programmable calculator and a dictionary.

LECTURE NOTES

We mainly use the book "Introduction to Information Retrieval" by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, which is available online and as a PDF.

EXERCISES

Exercises and their solutions are published via Stud.IP.