Academic Year 2023-2024

ADVANCED DATA SCIENCE

Teachers. Massimo Franceschet
Unit Credits. 9
Teaching Period. First Period
Course Type. Characterizing
Prerequisites. Basic knowledge of statistics and linear algebra.
Teaching Methods. The course follows the Decentralised Autonomous Education model (DAE; see cubiclearn.gitbook.io/dae/the-learning-model), which is based on the principles of democratic and autonomous teaching. Students are invited to participate actively in the lessons, which are theoretical in nature but include a substantial laboratory component. The laboratory component is aimed at learning programming languages and software tools through case studies.
Verification of Learning. One third of the assessment is based on the student's active participation during the lessons, in accordance with the DAE teaching model followed in the course.

The remaining two thirds of the assessment are based on a project carried out by the student. The project must be done individually on a topic chosen by the student. It must use most, though not necessarily all, of the methods, languages and software tools seen during the course, in an integrated and fluid manner. It must be documented in a report describing the objectives, the analyses and the results obtained. The project will be discussed publicly. Half of the project's evaluation is assigned by the lecturer and half by the students (including the student being evaluated).

During the discussion of the project, the student will be asked to answer some theory questions.

See also the general criteria approved by the Consiglio di Corso di Studi: https://www.uniud.it/it/didattica/corsi/area-scientifica/scienze-matematiche-informatiche-multimediali-fisiche/laurea/informatica/studiare/criteri.pdf

More Information. None.
Objectives
* the student should have acquired the knowledge necessary to analyse and visualise structured data (tabular and network) and free-text data.

* the student should have learned at least one software tool for the analysis and visualisation of data, especially network and text data.

* the student should be able to interpret experimental results and draw conclusions relevant to the domain of discourse.

* the student should be able to communicate the results of an experimental analysis effectively.

Contents
Much of modern economic activity could not take place without data analysis (data science). The effective use of data, its analysis and visualisation to extract information and knowledge, has the potential to transform economies, offering a new wave of productivity growth and more leisure time for people.

Data can play a significant economic role, benefiting not only private business but also national economies and their citizens, particularly in healthcare, in public administration, and in addressing global problems.

In this course, we will learn how to analyse and visualise data, including big data, using the statistical computing language R and the tidyverse packages (www.tidyverse.org). The course is taught according to the Decentralised Autonomous Education teaching model (DAE; see cubiclearn.gitbook.io/dae/the-learning-model), which gives ample space to active student participation in teaching activities.

The course covers the following topics:

1. Introduction to the DAE learning model

2. Web 3.0: blockchains, wallets, tokens, smart contracts, DAOs

3. Introduction to the data analysis flow: import, normalisation, transformation, visualisation, modelling and communication

4. The science of networks: centrality and power, signed networks, similarity and heterogeneity, communities, resilience, distance and small worlds, power laws and scale-free networks, epidemics on networks

5. Text analysis: word and document frequency, sentiment analysis, n-grams and co-occurrence of terms, topic modelling

Texts
R for Data Science, Hadley Wickham and Garrett Grolemund

Networks, Mark Newman

Networks, Crowds, and Markets, David Easley and Jon Kleinberg