Unit STATISTICAL METHODS IN DATA ANALYSIS

Course
Physics
Study-unit Code
GP005484
Curriculum
Astrophysics and astroparticles
Teacher
Bruna Bertucci
Teachers
  • Bruna Bertucci
Hours
  • 42 hours - Bruna Bertucci
CFU
6
Course Regulation
Cohort 2020
Offered
2020/21
Learning activities
Related/supplementary
Area
Related or supplementary learning activities
Academic discipline
FIS/07
Type of study-unit
Optional
Type of learning activities
Single-discipline learning activity
Language of instruction
Italian, or English if foreign students attend the lectures. Handouts and slides are in English.
Contents

Elements of probability theory. Monte Carlo method for the calculation of integrals and the simulation of experiments. Methods to estimate observables from series of experimental measurements: least squares, maximum likelihood, confidence intervals. Hypothesis testing: simple and composite hypotheses, goodness of fit.
Reference texts

Cowan, Statistical Data Analysis. Handouts and lecture notes will be made available during the course via the online platform Unistudium.
Educational objectives

The main goal of the lectures is to give students theoretical and practical experience in the analysis of experimental data.
Knowledge acquired:
  • The basics of probability theory and Monte Carlo simulation techniques.
  • The main statistical distributions and their properties.
  • The statistical methods available to estimate observables from experimental measurements and to evaluate their uncertainties.
Skills acquired (i.e. the ability to apply this knowledge):
  • Writing simple programs for the analysis of experimental data with different statistical techniques.
  • Writing simple programs to generate Monte Carlo simulations.
  • Correctly evaluating measurement uncertainties in the analysis of experimental data and testing how well theoretical models describe the data.
Prerequisites

To understand and apply most of the techniques described in the course, the student must be familiar with differential and integral calculus and with the series expansion of a function. These are notions that students should already have acquired in their curriculum. Practical exercises are carried out at the computer, so familiarity with the use of computers and the ability to write simple programs in C are required.
Teaching methods

Two-hour lectures and in-class exercises on specific topics.
Other information
Although not strictly mandatory, attendance at the lectures and at the exercises in the computer lab is highly recommended.
Learning verification modality

The exam consists of an independent analysis performed by the student on a specific data set assigned at the end of the course. A short report describing the procedure adopted and the results obtained must be presented and discussed, together with other topics, in a 45-60 minute interview. The report must be delivered at least one week before the interview.
Extended program

  • Elements of probability: frequentist and Bayesian approaches. Random variables, multi-dimensional random variables and their transformations. Expectation values and moments. Chebyshev theorem and Bienaymé-Chebyshev inequality. Error propagation, correlation and independence.
  • Statistical distributions: binomial, multinomial, Poisson, Gaussian, Student's t. Central limit theorem.
  • Monte Carlo method: Monte Carlo as a tool to simulate an experiment and as an integration method. Numerical integration algorithms vs. Monte Carlo algorithms. Dimensionality. Variance reduction techniques. Random number generation according to given distributions: change of variables, hit/miss. Generation of random numbers with uniform distribution: generation algorithms, Marsaglia effect and generator quality.
  • Methods to estimate observables from series of experimental measurements: problem definition. Properties of an estimator: consistency, bias, variance, efficiency.
  • Maximum likelihood: properties, application to binned and unbinned data, use to determine the parameters of distributions in physical counting experiments.
  • Least squares, Gauss-Markov theorem. Multinormal distribution. Techniques based on the distribution of orthonormal functions. Fits to histograms with and without constraints.
  • Confidence intervals. Central confidence intervals. Mean of a normal distribution. Confidence interval on the variance. Discrete variables.
  • Hypothesis testing: simple and composite hypotheses. Neyman-Pearson test. Fisher discriminant. Goodness of fit. Kolmogorov-Smirnov test. Run test.
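As an illustration of the hit/miss technique and of Monte Carlo integration listed in the program, the following is a minimal sketch, not course material. It is written in Python for brevity (the course exercises use C). The function name `hit_or_miss_integral`, its parameters and the test integrand are hypothetical choices made for this example. It assumes the integrand is non-negative and bounded above by `f_max` on the interval.

```python
import math
import random


def hit_or_miss_integral(f, a, b, f_max, n_points, seed=12345):
    """Estimate the integral of f over [a, b] with the hit/miss method.

    Assumes 0 <= f(x) <= f_max for every x in [a, b].
    Returns (estimate, standard_error).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    hits = 0
    for _ in range(n_points):
        # Throw a point uniformly in the box [a, b] x [0, f_max].
        x = rng.uniform(a, b)
        y = rng.uniform(0.0, f_max)
        if y < f(x):  # the point lies under the curve: count it as a hit
            hits += 1

    box_area = (b - a) * f_max
    p_hat = hits / n_points  # estimated fraction of the box under the curve
    estimate = box_area * p_hat
    # The number of hits is binomially distributed, so the uncertainty
    # on the estimate follows from the binomial variance.
    std_error = box_area * math.sqrt(p_hat * (1.0 - p_hat) / n_points)
    return estimate, std_error


if __name__ == "__main__":
    # Integral of sin(x) on [0, pi]; the exact value is 2.
    est, err = hit_or_miss_integral(math.sin, 0.0, math.pi, 1.0, 100_000)
    print(f"estimate = {est:.4f} +/- {err:.4f}")
```

The statistical error decreases as one over the square root of the number of points, independently of the dimensionality of the integral. This is the usual argument for preferring Monte Carlo to grid-based numerical integration in many dimensions.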