Unit STATISTICS FOR DATA SCIENCE WITH R AND PYTHON

Course
Finance and quantitative methods for economics
Study-unit Code
A003078
Curriculum
Data science for finance and insurance
Teacher
Marco Doretti
Teachers
  • Marco Doretti
Hours
  • 42 hours - Marco Doretti
CFU
6
Course Regulation
Cohort 2023
Offered
2023/24
Learning activities
Core (characterizing)
Area
Mathematics, statistics, computer science
Academic discipline
SECS-S/01
Type of study-unit
Required
Type of learning activities
Single-discipline learning activity
Language of instruction
English
Contents
Recap on probability and statistical inference; likelihood theory; simple and multiple linear regression; ordinary least squares; model diagnostics; inclusion of categorical explanatory variables and analysis of variance; introduction to generalized linear models; overview of logistic regression; Poisson model for count data; numerical methods for maximum likelihood estimation of generalized linear models.
Reference texts
Alan Agresti, Maria Kateri (2021): Foundations of Statistics for Data Scientists (with R and Python). CRC Press, Chapman & Hall. ISBN: 9781003159834

Further material provided by the instructor
Educational objectives
Students will learn the tools needed to correctly formulate the statistical models used in Data Science for the main types of outcome variables. They will learn how to estimate these models and how to draw inferential conclusions from the observed data. The course also aims to illustrate the main diagnostic techniques for model selection, as well as general principles of statistical modelling (which often go beyond technicalities).
Teaching methods
Lectures on theory, practical sessions with statistical software.
Learning verification modality
Oral examination concerning theory as well as analysis of software output of fitted models.
Extended program
  • Recap on probability and inference: main random variables and their moments; properties of estimators; confidence intervals and hypothesis tests for means, proportions, and differences of means and proportions.
  • Likelihood theory: definition of the likelihood function and parameter estimation through its maximization; properties and examples for the parameters of the main distributions; overview of resampling methods such as the bootstrap; likelihood ratio test, score test and Wald test.
  • Simple linear regression model: parameter estimation with ordinary least squares, estimation of standard errors, effect interpretation, model diagnostics and goodness of fit; relationship between regression and correlation analysis.
  • Multiple linear regression model: parameter and standard error estimation, effect interpretation.
  • Overview of causal analysis: distinction between association and causation, spurious correlation.
  • Correct model specification: higher-order effects and interactions.
  • Diagnostics: checking of assumptions and remedies for possible violations.
  • Inference on the linear model: t-tests for individual coefficients and the F-test for overall significance.
  • Introduction of categorical explanatory variables and analysis of variance testing.
  • Matrix form of linear models.
  • Generalized linear models: definition of the three key components and specification for the main distributions (Normal, Binomial and Poisson); model deviance and likelihood ratio test; model selection; Poisson model for count data.
  • Numerical methods for parameter estimation in a generalized linear model: Newton-Raphson and Fisher scoring algorithms.
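To give a flavour of the numerical methods listed last in the program, the following is a minimal Python sketch of Newton-Raphson for a Poisson regression with a log link and a single covariate. It is not taken from the course material: the function name `fit_poisson_newton`, the starting values and the small data set are assumptions made purely for illustration. With the canonical log link the observed and expected information matrices coincide, so in this case Newton-Raphson and Fisher scoring perform the same update.

```python
import math


def fit_poisson_newton(x, y, max_iter=50, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson (hypothetical helper)."""
    # Assumed starting point: the intercept-only fit, b0 = log(mean(y)), b1 = 0.
    b0 = math.log(sum(y) / len(y))
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]

        # Score vector: derivative of the Poisson log-likelihood.
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))

        # Information matrix X'WX with W = diag(mu), written out for 2 x 2.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))

        # Newton step: solve (information) * delta = score in closed form.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det

        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Made-up count data, for illustration only.
x_demo = [0, 1, 2, 3, 4]
y_demo = [1, 2, 4, 7, 12]
b0_hat, b1_hat = fit_poisson_newton(x_demo, y_demo)
print("intercept:", b0_hat, "slope:", b1_hat)
```

At convergence the score is zero, so the fitted means reproduce the observed totals of y and of x times y. In practice one would rely on a tested routine (for example `glm` in R) rather than a hand-written loop; the sketch only makes the iteration explicit.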