Unit STATISTICS FOR DATA SCIENCE WITH R AND PYTHON

Course
Finance and quantitative methods for economics
Study-unit Code
A003078
Location
PERUGIA
Curriculum
Data science for finance and insurance
Teacher
Marco Doretti
Teachers
  • Marco Doretti
Hours
  • 42 hours - Marco Doretti
CFU
6
Course Regulation
Cohort 2022
Offered
2022/23
Learning activities
Characterizing (core)
Area
Mathematics, statistics and computer science
Academic discipline
SECS-S/01
Type of study-unit
Required
Type of learning activities
Single-discipline learning activity
Language of instruction
English
Contents
Recap of statistical inference; maximum likelihood theory; introduction to Bayesian inference; simple and multiple linear regression models; ordinary least squares method; model diagnostics; inclusion of categorical explanatory variables and analysis of variance; introduction to generalized linear models; basics of the logistic regression model; Poisson model for count data; numerical methods for maximum likelihood estimation of generalized linear models.
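As an illustration of the ordinary least squares method listed in the contents, the closed-form estimates for the simple linear regression model y = b0 + b1*x + error can be sketched in plain Python. This is a minimal sketch, not course material: the function name ols_simple is hypothetical, and it assumes a single numeric explanatory variable with no missing values. It returns the intercept, the slope and the coefficient of determination (R-squared).

```python
def ols_simple(x, y):
    """Closed-form OLS estimates for y = b0 + b1*x + error (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered cross-products and sum of squares of x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must not be constant")
    # OLS slope and intercept
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual and total sums of squares give R-squared
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    if tss == 0:
        raise ValueError("y must not be constant")
    r2 = 1 - rss / tss
    return b0, b1, r2


b0, b1, r2 = ols_simple([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"intercept={b0:.3f} slope={b1:.3f} R2={r2:.4f}")
```

In simple linear regression, R-squared equals the squared Pearson correlation between x and y, which is one way the relationship between regression analysis and correlation shows up in practice.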
Reference texts
Alan Agresti, Maria Kateri (2021): Foundations of Statistics for Data Scientists (with R and Python). CRC Press, Chapman & Hall. ISBN: 9781003159834

Further material provided by the instructor
Educational objectives
Students will learn tools to correctly formulate the statistical models used in Data Science for the main types of outcome variables. They will learn how to estimate these models and how to draw inferential conclusions from the observed data. The course also illustrates the main diagnostic techniques for model selection, as well as the general principles of statistical modelling, which often go beyond technicalities.
Prerequisites
Basic knowledge of Descriptive Statistics (univariate and bivariate) and of Inferential Statistics (point estimation, interval estimation, hypothesis testing).
Teaching methods
Lectures on theory, practical sessions with statistical software.
Learning verification modality
Oral examination covering the theory as well as the interpretation of software output from fitted models.
Extended program
  • Recap on point and interval estimation: properties of estimators, confidence intervals. Inference on means, proportions, differences of means and differences of proportions. Sample size determination.
  • Likelihood theory: definition of the likelihood function and parameter estimation through its maximization. Properties, with examples for the parameters of the main distributions.
  • Introduction to resampling methods (bootstrap) and Bayesian inference: prior and posterior distributions, conjugate distributions.
  • Relationship between hypothesis tests and confidence intervals: likelihood ratio test and Wald test.
  • Simple linear regression model: ordinary least squares estimates, standard errors, interpretation of effects, model diagnostics and goodness of fit. Relationship between regression analysis and correlation.
  • Multiple linear regression model: parameter estimation and standard errors, interpretation of effects.
  • Introduction to causal analysis: distinction between associational and causal effects, spurious correlation.
  • Correct specification of the functional form: higher-order effects and interactions.
  • Diagnostic analysis: assumption checking and remedies for possible misspecification.
  • Inference on linear models: F-test and t-test for global and individual significance.
  • Inclusion of categorical explanatory variables and analysis of variance.
  • Matrix formulation of linear models.
  • Generalized linear models: the three key components and their specification for the main distributions (Normal, Binomial, Poisson). Model deviance and likelihood ratio test. Model selection. Poisson model for count data.
  • Numerical methods for the estimation of generalized linear models: Newton-Raphson and Fisher scoring algorithms.
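The Newton-Raphson and Fisher scoring algorithms named in the program can be illustrated on a Poisson regression with one covariate and the log link, log(mu_i) = b0 + b1*x_i. The sketch below is plain Python and is not taken from the course: the function name poisson_fisher_scoring, the starting values and the convergence tolerance are assumptions chosen for illustration. Each iteration solves I(beta) delta = U(beta), where U = X'(y - mu) is the score and I = X'WX, with W = diag(mu), is the expected information. Because the log link is canonical for the Poisson distribution, observed and expected information coincide, so in this case Fisher scoring and Newton-Raphson produce the same iterates.

```python
import math


def poisson_fisher_scoring(x, y, tol=1e-10, max_iter=100):
    """Fit log(mu_i) = b0 + b1*x_i by Fisher scoring (hypothetical helper)."""
    if sum(y) <= 0:
        raise ValueError("need at least one positive count")
    # Assumed starting values: intercept-only MLE log(mean(y)), slope 0
    b0 = math.log(sum(y) / len(y))
    b1 = 0.0
    for it in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector U = X'(y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Expected information I = X'WX with W = diag(mu)
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system I * delta = U explicitly
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            return b0, b1, it + 1
    raise RuntimeError("Fisher scoring did not converge")


b0, b1, n_iter = poisson_fisher_scoring([0, 0, 1, 1], [2, 4, 6, 10])
print(b0, b1, n_iter)
```

With a single binary covariate the Poisson maximum likelihood fit reproduces the two group means (3 and 8 here), so the estimates should converge to log(3) for the intercept and log(8/3) for the slope, which makes this a convenient check of the iteration.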