Unit STATISTICS FOR DATA SCIENCE WITH R AND PYTHON

Course

Finance and quantitative methods for economics

Study-unit Code

A003078

Curriculum

Data science for finance and insurance

Teacher

Simone Del Sarto

Hours

42 hours - Simone Del Sarto

CFU

Course Regulation

Cohort 2024

Offered

2024/25

Learning activities

Characterising

Area

Mathematics, statistics, computer science

Academic discipline

SECS-S/01

Type of study-unit

Required

Type of learning activities

Single-discipline learning activity

Language of instruction

English

Contents

Review of probability and statistical inference; maximum likelihood theory; simple and multiple linear regression models; method of least squares; model diagnostics; inclusion of categorical explanatory variables and analysis of variance; introduction to generalised linear models; overview of the logistic regression model; Poisson model for count data; numerical methods for maximum likelihood estimation of generalised linear models.
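As a small illustration of the method of least squares listed above, the following sketch fits a simple linear regression using the closed-form estimates. It is not taken from the course material: the function name `fit_simple_ols` and the toy data are hypothetical, and only the Python standard library is assumed.

```python
def fit_simple_ols(x, y):
    """Return least-squares estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data lying exactly on the line y = 1 + 2x
x_toy = [0.0, 1.0, 2.0, 3.0]
y_toy = [1.0, 3.0, 5.0, 7.0]
ols_b0, ols_b1 = fit_simple_ols(x_toy, y_toy)
print(ols_b0, ols_b1)  # intercept 1.0, slope 2.0
```

Because the toy points lie exactly on a line, the estimates recover its intercept and slope; with noisy data the same formulas give the line minimising the sum of squared residuals.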

Reference texts

Alan Agresti, Maria Kateri (2021): Foundations of Statistics for Data Scientists (with R and Python). CRC Press, Chapman & Hall. ISBN: 9781003159834

Educational objectives

Students will learn to correctly formulate statistical models for the main types of response variables, to estimate them, and to draw inferential conclusions from observed data. The course also aims to illustrate basic diagnostic techniques for model selection, while conveying the guiding principles of statistical modelling (which often go beyond technicalities).

Prerequisites

Basic knowledge of univariate and bivariate descriptive statistics, probability theory (main random variables and their probability mass/density functions, expected values, variances, etc.) and inferential statistics (point estimation, confidence intervals, hypothesis testing).

Teaching methods

Lectures on theory; practical sessions using suitable software.

Learning verification modality

Oral examination with questions on theory topics, plus analysis of and commentary on software output from estimating the models covered in the course.

Extended program

Review of probability and statistical inference: main random variables and their moments; properties of estimators; confidence intervals and hypothesis tests for means and proportions.

Likelihood theory: definition of the likelihood function and estimation of parameters through its maximisation; properties and examples for the parameters of the main distributions; overview of bootstrap resampling methods; likelihood ratio test and Wald test.

Simple linear regression model: parameter estimation by the least squares method, standard error estimation, interpretation of effects, model diagnostics and goodness of fit; relationship between regression analysis and linear correlation.

Multiple linear regression model: parameter estimation and standard errors, interpretation of effects; proper specification of the functional form of the model (higher-order effects and interactions); diagnostic analysis (checking the assumptions underlying the model and remedies for possible violations); inference on the linear model (F-tests and t-tests for global and local significance); introduction of categorical explanatory variables and analysis of variance tests; matrix formulation of linear models.

Generalised linear models: introduction of the three key components and specification for the major distributions (Normal, Binomial and Poisson); model deviance and likelihood ratio test; model selection; Poisson model for count data.

Numerical methods for estimating the parameters of a generalised linear model: Newton-Raphson algorithm and Fisher scoring.
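To make the last topic concrete, here is a minimal sketch of Fisher scoring for a Poisson regression with a log link and a single covariate. It is an illustration under stated assumptions, not the course's own code: the function name `poisson_fisher_scoring`, the starting values, the tolerance and the toy counts are all hypothetical. With the canonical log link, the observed and expected information coincide, so in this particular case the Fisher scoring update is the same as the Newton-Raphson update.

```python
import math


def poisson_fisher_scoring(x, y, tol=1e-10, max_iter=50):
    """Fit log(mu_i) = b0 + b1 * x_i for Poisson counts y_i by Fisher scoring."""
    # Assumed starting point: the intercept-only fit (requires sum(y) > 0)
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: derivatives of the log-likelihood w.r.t. (b0, b1)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix [[i00, i01], [i01, i11]]
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        # Solve the 2x2 system (information) * step = score explicitly
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Hypothetical toy counts that increase with the covariate
x_counts = [0.0, 1.0, 2.0, 3.0, 4.0]
y_counts = [1, 2, 4, 7, 12]
glm_b0, glm_b1 = poisson_fisher_scoring(x_counts, y_counts)
```

At convergence the score vector is (numerically) zero, which is the defining condition of the maximum likelihood estimate. In practice, statistical software solves the same problem for any number of covariates using matrix algebra, typically via iteratively reweighted least squares.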