Academic Year 2023-2024

WEB INFORMATION RETRIEVAL

Teachers: Stefano Mizzaro
Unit Credits: 6
Teaching Period: First Period
Course Type: Supplementary
Prerequisites. Basic knowledge of programming, algorithms and data structures, Web technologies, linear algebra, and probability.
Teaching Methods. Regular lectures and talks on specific topics. The course is neither strictly formal nor strictly practical; it is mainly conceptual.
Verification of Learning. Oral exam plus a small term project (talk, homework, etc.) on a specific topic. The course is taught in English and the exam can be taken in English as well. Alternative programs for Erasmus students are possible in principle and must be discussed with the instructor.

The grading criteria are those set by the “Corso di Studi” (degree program) and can be found at: https://www.uniud.it/it/didattica/corsi/area-scientifica/scienze-matematiche-informatiche-multimediali-fisiche/laurea/informatica/studiare/criteri.pdf (for Informatica and Artificial Intelligence & Cybersecurity) and https://www.uniud.it/it/didattica/corsi/area-scientifica/scienze-matematiche-informatiche-multimediali-fisiche/laurea-magistrale/comunicazione-multimediale-e-tecnologie-dellinformazione/studiare/criteri.pdf (for CMTI).

More Information. The course is taught in English. Teaching material (slides, etc.) will be provided through the Moodle and Teams e-learning platforms during the course.
Objectives
https://www.uniud.it/it/didattica/info-didattiche/regolamento-didattico-del-corso/LM-informatica/all-B2
Contents
Information Retrieval (IR) is a discipline of long-standing historical importance that has received even greater attention since the advent of the Web. The course presents the main conceptual issues underlying IR systems, with particular emphasis on Web search engines.

Detailed contents:

* Classical IR:

– formal IR models (Boolean, vector space, probabilistic and variants such as BM25, language models);

– structure of the inverted index (basics, compression);

– user interfaces for IR (classification, survey);

– classification (definition, naive Bayes classifiers);

– clustering (hierarchical and approximate algorithms);

– evaluation (foundations, methodologies, metrics; research topics).

* Web IR:

– Web graph (size and shape: small world and scale-free networks, bow-tie shape);

– link analysis for ranking and other applications (PageRank, HITS, variants);

– crawling (concepts and architecture);

– spam (short account);

– search engine architecture (short account).

* Case studies and specific issues.

Texts
* R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, 2nd edition, 2011

* C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. http://nlp.stanford.edu/IR-book/

* B. Croft, D. Metzler, T. Strohman, Search Engines: Information Retrieval in Practice, Addison-Wesley, 2009

* Other books and papers as detailed during lectures.