Unit PROBABILITY AND STATISTICS II - mod. STATISTICAL LEARNING

Course

Mathematics

Study-unit Code

A005435

Curriculum

General

Teacher

Andrea Capotorti

Teachers

Andrea Capotorti

Hours

42 hours - Andrea Capotorti

CFU

Course Regulation

Cohort 2025

Offered

2025/26

Learning activities

Caratterizzante (Core)

Area

Formazione matematica modellistico-computazionale avanzata (Advanced mathematical modeling and computational training)

Sector

MAT/06

Type of study-unit

Opzionale (Optional)

Type of learning activities

Attività formativa monodisciplinare (Single-discipline learning activity)

Language of instruction

Italian

Contents

Mathematical insights into advanced statistical methods in Statistical and Machine Learning, both in the case of supervised learning (classification and regression) and unsupervised learning (cluster analysis, dimensionality reduction).

Reference texts

- Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (available online at https://hastie.su.domains/ElemStatLearn/download.html)
- James G., Witten D., Hastie T., Tibshirani R. (2021) An Introduction to Statistical Learning with Applications in R, 2nd edition, Springer-Verlag (freely available at https://www.statlearning.com)
- Lecture slides, available on the course's UniStudium page.

Educational objectives

The course offers a mathematical study of the main methods and techniques of Statistical and Machine Learning, in both the supervised (regression and classification) and unsupervised (clustering and dimensionality reduction) settings.

The main knowledge acquired will be:
• introductory concepts and specific statistical learning models;
• evaluation of the predictive performance of regression and classification models through resampling techniques.

The main skills (i.e. the ability to apply the acquired knowledge) will be:
• autonomously applying the appropriate methods and algorithms to real regression, classification and clustering problems;
• analyzing data with the R software to estimate supervised and unsupervised models.

Prerequisites

Knowledge of the main discrete and continuous statistical models, probability distributions and their properties, Bayes' theorem, and linear regression.

Teaching methods

Lectures and laboratory sessions using the R software.

Other information

Attendance is strongly recommended. Students with Specific Learning Disorders and/or Disabilities should refer to the web page: http://www.unipg.it/disabilita-e-dsa

Learning verification modality

Ongoing assessments and a final oral exam. The laboratory activities assess the student's ability to put into practice the methodologies introduced in class. The final oral exam assesses the student's knowledge and understanding of the computational and methodological aspects covered during the course.

Extended program

The course includes a methodological study of advanced statistical methods for Data Science, covering both supervised learning (classification and regression) and unsupervised learning (cluster analysis, dimensionality reduction). These methods have been applied successfully in many fields, from finance and economics to business analytics and the social and natural sciences. Each method will be introduced through real case studies and analyzed using the R software. In detail, the following topics will be covered:
- Statistical and machine learning: introduction.
- Prediction vs. interpretability.
- Supervised vs. unsupervised learning.
- Classification vs. regression.
- Assessing the accuracy of a statistical model.
- Supervised learning: introduction.
- Extensions of the linear regression model: model selection and regularization. Polynomial regression.
- Resampling methods: cross-validation and the bootstrap.
- Classification: introduction.
- Logistic and multinomial models.
- Linear and quadratic discriminant analysis.
- Gaussian naive Bayes.
- Gaussian finite mixture models.
- K-nearest neighbors algorithm.
- Advanced methods for regression and classification.
- Generalized Additive Models.
- Artificial neural networks.
- Decision trees.
- Bagging.
- Random forests.
- Boosting.
- Unsupervised learning: introduction.
- Principal component analysis.
- Similarity and distance measures. Distance matrix.
- Hierarchical methods for cluster analysis.
- Non-hierarchical methods (k-means).
- Model-based clustering.