Academic Year 2019-2020

INFORMATION RETRIEVAL

Teachers:
Stefano Mizzaro
Total Course Credits: 6
Teaching Period: Second Period
Teaching Language: English
Prerequisites. Basic knowledge of programming, algorithms and data structures, Web technologies, linear algebra, and probability.
Teaching Methods. Regular lectures and talks on specific topics. The course will be neither too formal nor too practical, but mainly conceptual.
Verification of Learning. Oral exam plus an extra small term project (talk, homework, etc.) on a specific topic. The course will be taught in English and the exam can be in English as well. Alternative programs for Erasmus students are possible in principle and have to be discussed with the instructor.
More Information. The course is taught in English.

OBJECTIVES

At the end of the course, the student will be able to:

* Knowledge and comprehension skills: know both basic topics and advanced research trends of the field

* Practical skills: apply basic principles to design, analyse and evaluate IR systems

* Independent judgment skills: judge the quality of different design choices

* Communication skills: describe how IR systems work

* Learning skills: learn new indexing and retrieval techniques

CONTENTS

Information Retrieval (IR) is a discipline of long-standing importance that has received even greater attention since the advent of the Web. The course aims to present the main conceptual issues underlying IR systems, with particular emphasis on Web search engines.

Detailed contents:

* Classical IR:

– formal IR models (Boolean, vector space, probabilistic and variants such as BM25, language models);

– structure of the inverted index (basics, compression);

– user interfaces for IR (classification, survey);

– classification (definition, naive Bayes classifiers);

– clustering (hierarchical and approximate algorithms);

– evaluation (foundations, methodologies, metrics; research topics).

* Web IR:

– Web graph (size and shape: small world and scale-free networks, bow-tie shape);

– link analysis for ranking and other applications (PageRank, HITS, variants);

– crawling (concepts and architecture);

– spam (short account);

– search engine architecture (short account).

* Case studies and specific issues.

TEXTS

* R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, 2nd edition, 2011

* C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. http://nlp.stanford.edu/IR-book/

* B. Croft, D. Metzler, T. Strohman, Search Engines: Information Retrieval in Practice, Addison Wesley, 2009

* Other books and papers as detailed during lectures.