Unit PROBABILITY AND STATISTICS II - mod. STATISTICAL LEARNING
- Course
- Mathematics
- Study-unit Code
- A005435
- Curriculum
- General
- Teacher
- Andrea Capotorti
- Teachers
- Andrea Capotorti
- Hours
- 42 hours - Andrea Capotorti
- CFU
- 6
- Course Regulation
- Cohort 2025
- Offered
- 2025/26
- Learning activities
- Characterizing (core)
- Area
- Advanced training in mathematical modelling and computation
- Academic discipline
- MAT/06
- Type of study-unit
- Optional
- Type of learning activities
- Single-discipline learning activity
- Language of instruction
- Italian
- Contents
- Mathematical insights into advanced statistical methods in Statistical and Machine Learning, both in the case of supervised learning (classification and regression) and unsupervised learning (cluster analysis, dimensionality reduction).
- Reference texts
- Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (available online at https://hastie.su.domains/ElemStatLearn/download.html). James G., Witten D., Hastie T., Tibshirani R. (2021) An Introduction to Statistical Learning with Applications in R, 2nd edition, Springer-Verlag (freely available at https://www.statlearning.com). Lecture slides are available on the course's UniStudium page.
- Educational objectives
- The course is a mathematical study of the main methods and techniques of Statistical and Machine Learning, in both the supervised (regression and classification) and unsupervised (clustering and dimensionality reduction) settings.
  The main knowledge acquired will be:
  - introductory concepts and specific statistical learning models;
  - evaluation of the predictive capacity of regression and classification models through resampling techniques.
  The main skills (i.e. the ability to apply the acquired knowledge) will be:
  - autonomously applying the appropriate methods and algorithms to real regression, classification and clustering problems;
  - analyzing data using the R software for the estimation of supervised and unsupervised models.
- Prerequisites
- Knowledge of the main discrete and continuous statistical models, probability distributions and their properties, Bayes' theorem, and linear regression.
- Teaching methods
- Lectures and laboratory activities using the R software.
- Other information
- Attendance at classes is strongly recommended. Students with Specific Learning Disorders and/or Disabilities should refer to the web page: http://www.unipg.it/disabilita-e-dsa
- Learning verification modality
- Ongoing assessments and final oral exam. Laboratory activities are aimed at assessing the student's ability to put into practice the methodologies introduced in class. The final oral exam is intended to assess the level of knowledge and understanding achieved by the student regarding the computational and methodological aspects covered during the course.
- Extended program
- The course includes a methodological study of advanced statistical methods for Data Science, both in the case of supervised learning (classification and regression) and unsupervised learning (cluster analysis, dimensionality reduction). These methods have been successfully applied in many fields, from finance to economics, from business analytics to social and natural sciences. The methods covered will be introduced starting from real case studies and analyzed using the R software. In detail, the following topics will be covered:
  - Statistical and machine learning: introduction.
  - Forecasting vs interpretability.
  - Supervised vs unsupervised learning.
  - Classification vs regression.
  - Evaluation of the accuracy of a statistical model.
  - Supervised learning: introduction.
  - Extensions to the linear regression model: model selection and regularization. Polynomial regression.
  - Resampling methods: cross-validation and bootstrap.
  - Classification: introduction.
  - Logistic and multinomial model.
  - Linear and quadratic discriminant analysis.
  - Gaussian naive Bayes.
  - Gaussian finite mixture models.
  - K-nearest neighbor algorithm.
  - Advanced methods for regression and classification.
  - Generalized Additive Models.
  - Artificial neural networks.
  - Decision trees.
  - Bagging.
  - Random forests.
  - Boosting.
  - Unsupervised learning: introduction.
  - Principal component analysis.
  - Similarity and distance measures. Distance matrix.
  - Hierarchical methods for cluster analysis.
  - Non-hierarchical methods (k-means method).
  - Model-based clustering.