Academic Year 2023-2024

DATA SCIENCE PRINCIPLES AND LABORATORY

Teachers

Eddy Maddalena
Francesca Da Ros
Course Year: 1
Unit Credits: 6
Teaching Period: Second Period
Course Type: Characterizing
Prerequisites. Basics in programming and descriptive statistics.
Teaching Methods. The course is a sequence of teaching units. Each teaching unit has three components:

1. a brief explanation;

2. an exercise to be solved;

3. the solution of the exercise.

Some challenges about specific use cases will be proposed.

The hours dedicated to carrying out the exercises and challenges correspond to about 12 hours of laboratory work.

The course ends with the final exam.

Verification of Learning. At the end of the course, students must take the exam, which consists of an oral presentation of an individual project and some questions about the course contents. The project consists of a significant data science challenge performed on a dataset chosen by the student. Each student must work on the project individually, using the methods, languages and software tools discussed during the course. The final score considers the student’s knowledge and the quality of the project and its presentation. The presentation is public, and students are invited to attend their colleagues’ presentations.
Objectives
In this course you will learn how to organize, transform, analyse and visualize small and big data, as well as how to effectively communicate the outcomes of the workflow.

Knowledge and understanding: the student must have acquired the necessary knowledge to import, tidy, transform, visualize, and model data, as well as to communicate the results of the analysis. The focus is mainly on relational data, although semi-structured and unstructured data are also touched on.

Applied knowledge and understanding: the student must have learned to use R and the RStudio environment for data analysis and visualization, as well as the R Markdown language for communicating the results of the analysis.

Making judgments: the student must be able to interpret the experimental results of the analysis and draw effective conclusions relevant to the domain of discourse.

Communication skills: the student must be able to effectively communicate the results of the analysis. This includes both analyst-to-analyst communication and analyst-to-decision-maker communication.

Learning skills: the student must demonstrate the ability to choose a sufficiently rich raw data set, analyse the data to extract meaningful information, and draw and communicate conclusions.

Contents
The course introduces the fundamental concepts of data science. After an introductory part, it focuses on the six phases of the data science workflow: import, normalize, transform, visualize, model, and communicate. Each phase is covered individually through examples. The course then covers typical data science applications, such as time series, natural language processing, and geographical data.
Texts
– Course slides

– Python Data Science Handbook. Jake VanderPlas. O’Reilly.

– Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (2nd Edition). William McKinney. O’Reilly.