Academic Year 2019-2020

WEB INFORMATION RETRIEVAL

Teachers. Stefano Mizzaro
Unit Credits. 6
Teaching Period. Second Period
Course Type. Related/Supplementary (Affine/Integrativa)
Prerequisites. Basic knowledge of Programming, Algorithms and data structures, Web technologies, Linear algebra, Probability.
Teaching Methods. Normal lectures and talks on specific topics. The course will be neither too formal nor too practical, but mainly conceptual.
Verification of Learning. Oral exam plus a small additional term project (talk, homework, etc.) on a specific topic. The course will be taught in English and the exam can be taken in English as well. Alternative programs for Erasmus students are possible in principle and must be discussed with the instructor.
More Information. The course is taught in English.
Objectives
At the end of the course, the student will be able to:

* Knowledge and comprehension skills: know both the basic topics and the advanced research trends in the field

* Practical skills: apply basic principles to design, analyse and evaluate IR systems

* Independent judgment skills: judge the quality of different design choices

* Communication skills: describe how IR systems work

* Learning skills: learn new indexing and retrieval techniques

Contents
Information Retrieval (IR) is a discipline of great historical importance that has received even more attention since the advent of the Web. The course aims to present the main conceptual issues underlying IR systems, with particular emphasis on Web search engines.

Detailed contents:

* Classical IR:

– formal IR models (Boolean, vector space, probabilistic and variants such as BM25, language models);

– structure of the inverted index (basics, compression);

– user interfaces for IR (classification, survey);

– classification (definition, naive Bayes classifiers);

– clustering (hierarchical and approximate algorithms);

– evaluation (foundations, methodologies, metrics; research topics).

* Web IR:

– Web graph (size and shape: small world and scale-free networks, bow-tie shape);

– link analysis for ranking and other applications (PageRank, HITS, variants);

– crawling (concepts and architecture);

– spam (short account);

– search engine architecture (short account).

* Case studies and specific issues.

Texts
* R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, 2nd edition, 2011

* C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. http://nlp.stanford.edu/IR-book/

* W. B. Croft, D. Metzler, T. Strohman, Search Engines: Information Retrieval in Practice, Addison Wesley, 2009

* Other books and papers as detailed during lectures.