Unit: Statistics for Data Science with R and Python
- Course
- Finance and quantitative methods for economics
- Study-unit Code
- A003078
- Curriculum
- Data science for finance and insurance
- Teacher
- Simone Del Sarto
- Hours
- 42 hours - Simone Del Sarto
- CFU
- 6
- Course Regulation
- Cohort 2024
- Offered
- 2024/25
- Learning activities
- Characterising (core)
- Area
- Mathematics, statistics, computer science
- Academic discipline
- SECS-S/01
- Type of study-unit
- Required
- Type of learning activities
- Single-discipline learning activity
- Language of instruction
- English
- Contents
- Review of probability and statistical inference; maximum likelihood theory; simple and multiple linear regression models; the method of least squares; model diagnostics; inclusion of categorical explanatory variables and analysis of variance; introduction to generalised linear models; overview of the logistic regression model; the Poisson model for count data; numerical methods for maximum likelihood estimation of generalised linear models.
- Reference texts
- Alan Agresti, Maria Kateri (2021): Foundations of Statistics for Data Scientists (with R and Python). CRC Press, Chapman & Hall. ISBN: 9781003159834
- Educational objectives
- Students will learn how to correctly formulate statistical models for the main types of response variables, how to estimate them, and how to draw inferential conclusions from observed data. The course also illustrates basic diagnostic techniques for model selection and conveys the guiding principles of statistical modelling, which often go beyond technicalities.
- Prerequisites
- Basic knowledge of univariate and bivariate descriptive statistics, probability theory (main random variables and their probability mass/density functions, expected values, variances, etc.) and inferential statistics (point estimation, confidence intervals, hypothesis testing).
- Teaching methods
- Theoretical lectures in the classroom and practical sessions using suitable statistical software.
- Learning verification modality
- Oral examination consisting of questions on theoretical topics and the analysis and interpretation of software output obtained by fitting the models covered in the course.
- Extended program
- Review of probability and statistical inference: main random variables and their moments; properties of estimators; confidence intervals and hypothesis tests for means and proportions. Likelihood theory: definition of the likelihood function and parameter estimation by its maximisation; properties and examples for the parameters of the main distributions; introduction to bootstrap resampling methods; likelihood ratio test and Wald test. Simple linear regression model: parameter estimation by the method of least squares, estimation of standard errors, interpretation of effects, model diagnostics and goodness of fit; relationship between regression analysis and linear correlation. Multiple linear regression model: parameter estimation and standard errors, interpretation of effects; correct specification of the model's functional form, including higher-order effects and interactions. Diagnostic analysis: checking the assumptions underlying the model and remedies for possible violations. Inference on the linear model: F-tests and t-tests for global and individual (coefficient-level) significance. Inclusion of categorical explanatory variables and analysis of variance tests. Matrix formulation of linear models. Generalised linear models: the three key components and their specification for the main distributions (Normal, Binomial and Poisson); model deviance and the likelihood ratio test; model selection. Poisson model for count data. Numerical methods for estimating the parameters of a generalised linear model: the Newton-Raphson algorithm and Fisher scoring.
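As an illustration of the last topic, the sketch below uses Fisher scoring to fit a Poisson GLM with a log link. It assumes NumPy and a design matrix `X` that already includes an intercept column. The function name `poisson_fisher_scoring` is hypothetical and is not taken from the course materials. At each iteration, the score vector X'(y - mu) is multiplied by the inverse of the Fisher information X'WX, where W = diag(mu). Because the log link is canonical for the Poisson distribution, the observed and expected information coincide, so Newton-Raphson and Fisher scoring produce the same iterates here.

```python
import numpy as np


def poisson_fisher_scoring(X, y, tol=1e-10, max_iter=100):
    """Hypothetical helper: ML fit of a Poisson GLM with log link by Fisher scoring.

    Returns the coefficient estimates and their standard errors taken from
    the inverse Fisher information evaluated at the final estimate.
    """
    beta = np.zeros(X.shape[1])                # starting values
    for _ in range(max_iter):
        mu = np.exp(X @ beta)                  # inverse log link: mu = exp(eta)
        score = X.T @ (y - mu)                 # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])         # Fisher information X' W X, W = diag(mu)
        step = np.linalg.solve(info, score)    # scoring update: info^{-1} score
        beta = beta + step
        if np.max(np.abs(step)) < tol:         # stop when the update is negligible
            break
    mu = np.exp(X @ beta)
    info = X.T @ (X * mu[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))  # standard errors, as used in Wald tests
    return beta, se


if __name__ == "__main__":
    # Simulated data (illustrative only): log E[y] = 0.5 + 0.3 x
    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    X = np.column_stack([np.ones_like(x), x])
    y = rng.poisson(np.exp(0.5 + 0.3 * x))
    beta, se = poisson_fisher_scoring(X, y)
    print("estimates:", beta, "standard errors:", se)
```

In R, `glm(y ~ x, family = poisson)` fits the same model by iteratively reweighted least squares, which is Fisher scoring rewritten as a sequence of weighted least-squares fits.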