Academic Year 2022-2023

ADVANCED DATA SCIENCE

Teachers

Dario Fasino
Domenico Freni
Ettore Ritacco
Unit Credits
9
Teaching Period
Second Period
Course Type
Characterizing
Prerequisites. Basic elements of statistics and linear algebra.
Teaching Methods. The lessons will be both theoretical and practical. The practical part aims at acquiring languages and software tools through case studies.
Verification of Learning. The examination consists of a project and an oral examination. The project must be done individually on a topic chosen by the student. It must use the methods, languages, and software tools seen during the course (not necessarily all of them, but most) in an integrated and fluid manner. The project must be documented in a report describing the objectives, the analyses performed, and the results obtained. The oral test will focus on the student's presentation of the project and on a few targeted theory questions.
More Information. Learning resources available on the e-learning platform include handouts, lecture videos, lecture slides, and software resources. However, class attendance is strongly encouraged.
Objectives
* the student should have acquired the knowledge needed to analyze and visualize structured (tabular and network) and free-text data

* the student should have learned at least one software tool for analyzing and visualizing data, especially networks and text

* the student should be able to interpret experimental results and draw conclusions relevant to the domain of discourse.

* the student should be able to effectively communicate the results of an experimental analysis.

Contents
Much of modern economic activity could not take place without data, which makes data an essential factor of production, like machinery and people. The effective use of data, through analysis and visualization aimed at extracting information and knowledge, has the potential to transform economies, offering a new wave of productivity growth and more leisure time for people. Data can play a significant economic role that benefits not only private trade but also national economies and their citizens, particularly in health care, public administration, and the solution of global problems.

The course addresses advanced topics in data analysis and data visualization. In particular, the topics covered include:

– Network science: centrality and power, similarity, community, resilience, distances and small worlds, power laws and scale-free networks

– Text analysis: word and document frequency, sentiment analysis, n-grams and term co-occurrence, topic modeling

– Blockchain

Texts
E. Estrada, P. Knight. A First Course in Network Theory. Oxford University Press, 2015.

M. Newman. Networks: An Introduction. Oxford University Press, 2010.

D. Easley, J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.