Course BIG DATA ANALYTICS

Degree programme Ingegneria informatica e robotica (Computer and Robotics Engineering)
Course code 70A00037
Curriculum Data science
Lecturer in charge Paolo Banelli
Lecturers
  • Paolo Banelli
  • Paolo Di Lorenzo (co-teaching)
Hours
  • 42 hours - Paolo Banelli
  • 30 hours (co-teaching) - Paolo Di Lorenzo
Credits (CFU) 9
Regulation Cohort 2016
Delivered in 2017/18
Delivered under other regulations
Activity Related/supplementary
Area Related or supplementary educational activities
Disciplinary sector ING-INF/03
Year 2
Period First semester
Course type Required
Activity type Single-discipline course
Language of instruction Italian
Contents
  • FUNDAMENTALS OF STATISTICAL SIGNAL PROCESSING
  • FUNDAMENTALS OF CONVEX OPTIMIZATION
  • BIG-DATA REDUCTION
  • GRAPH-BASED SIGNAL PROCESSING
  • DISTRIBUTED OPTIMIZATION, SIGNAL PROCESSING, AND LEARNING OVER NETWORKS
Reference texts
Most of the class content is inspired by chapters and sections of the following books:
  • S. Kay, Fundamentals of Statistical Signal Processing, Vol. I & II, Prentice Hall, 1993-1998
  • S. Theodoridis, Machine Learning: A Bayesian and Optimization Perspective
  • T. Hastie et al., The Elements of Statistical Learning: Data Mining, Inference, and Prediction
  • M. E. J. Newman, Networks: An Introduction
  • S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004
  • S. Boyd et al., Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Foundations and Trends in Machine Learning, 3(1):1–122, 2011
  • Lecture notes provided by the teacher
Learning objectives Understanding the basics of statistical inference and convex optimization as fundamental tools in big-data analytics. Understanding the concept of data reduction and the conditions under which statistical inference and reconstruction of the information are not excessively degraded by the reduction. Extending classical signal processing to signals defined over a graph, which is a natural representation of big data whose structure reflects their distribution over a network, their statistical similarity, or both. Understanding the methodological tools needed to distribute complex statistical inference over parallel and distributed agents (computers, etc.), as a way to scale statistical inference to big data that may be geographically or logically distributed over a network.
Prerequisites
Essential: Mathematical Analysis I and II, Linear Algebra, Probability Theory, Signal Theory, Random Processes, Digital Signal Processing.
Recommended: Machine Learning and Data Mining.
Useful: Estimation and Decision Theory.
Teaching methods The course is delivered through face-to-face lectures, with the aid of slides and PC-based simulations of some of the algorithms.
Other information This is the first year the course is taught; its content is therefore still tentative and may change during the semester, depending on the outcomes and on the time needed to develop each topic.
Assessment methods 1) Short thesis on a topic related to the course content, supported by computer-aided simulations, to be submitted one week before the oral exam.
2) Oral exam: discussion of the thesis plus, typically, two questions.

For information on support services for students with disabilities and/or specific learning disorders (DSA), visit http://www.unipg.it/disabilita-e-dsa
Extended programme

Part I: FUNDAMENTALS OF STATISTICAL SIGNAL PROCESSING (18 hours)
Minimum variance unbiased estimation; Cramér-Rao lower bound; sufficient statistics; maximum likelihood estimation; linear estimation; least squares. Bayesian estimation: MMSE estimation, linear estimation. Adaptive estimation theory: least mean squares (LMS) estimation, recursive least squares (RLS) estimation, Kalman filtering. Statistical decision theory: Neyman-Pearson criterion, minimum probability of error, Bayes risk, multiple hypothesis testing.

Part II: FUNDAMENTALS OF CONVEX OPTIMIZATION (9 hours)
Basics of convex optimization: convex sets, convex functions, convex optimization problems. Duality theory: Lagrange dual problem, Slater's constraint qualification, KKT conditions. Optimization algorithms: primal methods (steepest descent, gradient projection, Newton's method), primal-dual methods (dual ascent, alternating direction method of multipliers). Examples of applications: approximation and fitting, statistical estimation and detection, adaptive filtering, supervised and unsupervised learning from data.

Part III: BIG-DATA REDUCTION (12 hours)
Compressed sampling/sensing and reconstruction; statistical inference by sparse sensing; classification by principal component analysis.

Part IV: GRAPH-BASED SIGNAL PROCESSING (15 hours)
Signals on graphs: motivating examples; algebraic graph theory, graph features. Signal processing on graphs: Fourier transform, smoothing, sampling, and data compression on graphs.

Part V: DISTRIBUTED OPTIMIZATION, SIGNAL PROCESSING, AND LEARNING OVER NETWORKS (18 hours)
Average consensus: theory and algorithms. Distributed optimization: consensus and sharing; primal and primal-dual methods. Distributed signal processing: estimation and detection; LMS, RLS and Kalman filtering on graphs. Distributed supervised learning: regression and data classification. Distributed unsupervised learning: dictionary learning and data clustering.
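As an illustration of the kind of PC-based simulation mentioned under "Teaching methods", the following is a minimal Python sketch of the first topic of Part V (average consensus over a network). The 5-node ring graph, the Metropolis-Hastings weights, and all numerical values are illustrative assumptions, not official course material.

import numpy as np

# Average consensus on an undirected graph (Part V, illustrative sketch).
# Adjacency matrix of a hypothetical 5-node ring graph.
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)

n = A.shape[0]
deg = A.sum(axis=1)

# Metropolis-Hastings weights: symmetric and doubly stochastic, so the
# iteration x <- W x converges to the average of the initial node values
# on any connected graph.
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if A[i, j] == 1:
            W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    W[i, i] = 1.0 - W[i].sum()

rng = np.random.default_rng(0)
x = rng.normal(size=n)      # initial local measurements at each node
target = x.mean()           # value on which all nodes should agree

for _ in range(200):        # synchronous consensus iterations
    x = W @ x               # each node mixes only with its neighbours

print("true average:", target)
print("node values :", x)   # all entries are close to the true average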