Unit PROBABILITY AND STATISTICS II - mod. STATISTICAL LEARNING

Course
Mathematics
Study-unit Code
A005435
Curriculum
Mathematics for industrial and biomedical applications
Teacher
Andrea Capotorti
Teachers
  • Andrea Capotorti
Hours
  • 42 hours - Andrea Capotorti
CFU
6
Course Regulation
Cohort 2025
Offered
2025/26
Learning activities
Related/supplementary
Area
Related or supplementary learning activities
Academic discipline
MAT/06
Type of study-unit
Required
Type of learning activities
Single-discipline learning activity
Language of instruction
Italian
Contents
In-depth mathematical treatment of advanced methods in statistical and machine learning, covering both supervised learning (classification and regression) and unsupervised learning (cluster analysis, dimensionality reduction).
Reference texts
  • Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (available online at https://hastie.su.domains/ElemStatLearn/download.html)
  • James G., Witten D., Hastie T., Tibshirani R. (2021) An Introduction to Statistical Learning with Applications in R, 2nd edition, Springer-Verlag (freely available at https://www.statlearning.com)
  • Lecture slides, available on the course's UniStudium page.
Educational objectives
The course is a mathematical study of the main methods and techniques of Statistical and Machine Learning, both in the supervised (regression and classification) and unsupervised (clustering and dimensionality reduction) fields.
The main knowledge acquired will be:
  • introductory concepts and specific statistical learning models;
  • evaluation of the predictive capacity of regression and classification models through resampling techniques.
The main skills (i.e., the ability to apply the acquired knowledge) will be:
  • autonomously applying the appropriate methods and algorithms to real regression, classification and clustering problems;
  • analyzing data using the R software for the estimation of supervised and unsupervised models.
Prerequisites
Knowledge of the main discrete and continuous statistical models, probability distributions and their properties, Bayes' theorem, and linear regression.
Teaching methods
Lectures and laboratory activities with the R software.
Other information
Attendance at classes is strongly recommended. Students with specific learning disorders and/or disabilities should refer to the web page: http://www.unipg.it/disabilita-e-dsa
Learning verification modality
Ongoing assessments and final oral exam. Laboratory activities are aimed at assessing the student's ability to put into practice the methodologies introduced in class. The final oral exam is intended to assess the level of knowledge and understanding achieved by the student regarding the computational and methodological aspects covered during the course.
Extended program
The course includes a methodological study of advanced statistical methods for Data Science, both in the case of supervised learning (classification and regression) and unsupervised learning (cluster analysis, dimensionality reduction). These methods have been successfully applied in many fields, from finance to economics, from business analytics to social and natural sciences. The methods covered will be introduced starting from real case studies and analyzed using the R software.
In detail, the following topics will be covered:
  • Statistical and machine learning: introduction.
  • Forecasting vs interpretability.
  • Supervised vs unsupervised learning.
  • Classification vs regression.
  • Evaluation of the accuracy of a statistical model.
  • Supervised learning: introduction.
  • Extensions to the linear regression model: model selection and regularization. Polynomial regression.
  • Resampling methods: cross-validation and bootstrap.
  • Classification: introduction.
  • Logistic and multinomial model.
  • Linear and quadratic discriminant analysis.
  • Gaussian naive Bayes.
  • Gaussian finite mixture models.
  • K-nearest neighbor algorithm.
  • Advanced methods for regression and classification.
  • Generalized Additive Models.
  • Artificial neural networks.
  • Decision trees.
  • Bagging.
  • Random forests.
  • Boosting.
  • Unsupervised learning: introduction.
  • Principal component analysis.
  • Similarity and distance measures. Distance matrix.
  • Hierarchical methods for cluster analysis.
  • Non-hierarchical methods (k-means method).
  • Model-based clustering.
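As an illustration of the resampling topic listed in the program (cross-validation for evaluating predictive accuracy), the following is a minimal sketch, not course material. The course labs use R; Python is used here purely for illustration. The function names `fit_line` and `k_fold_mse`, the synthetic data, and the choice of simple linear regression as the model are all assumptions made for this example.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def k_fold_mse(xs, ys, k=5, seed=0):
    """Estimate the test mean squared error by k-fold cross-validation."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Split the shuffled indices into k disjoint folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in held_out) / len(held_out)
        fold_errors.append(mse)
    return sum(fold_errors) / k


# Synthetic data (hypothetical): y = 1 + 2x plus Gaussian noise with sd 0.5.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]

cv_mse = k_fold_mse(xs, ys, k=5)
print(f"5-fold CV estimate of test MSE: {cv_mse:.3f}")
```

Because each observation is scored only by a model that never saw it during fitting, the averaged error approximates performance on new data rather than the optimistic training error.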