Course BIG DATA ANALYTICS

Degree programme Ingegneria informatica e robotica (Computer and Robotics Engineering)
Course code 70A00037
Curriculum Data science
Lecturer in charge Paolo Banelli
Lecturers
  • Paolo Banelli
  • Paolo Di Lorenzo (co-teaching)
Hours
  • 42 hours - Paolo Banelli
  • 30 hours (co-teaching) - Paolo Di Lorenzo
Credits (CFU) 9
Regulation Cohort 2016
Delivered in 2017/18
Delivered under other regulations
Activity Related/supplementary
Area Related or supplementary educational activities
Disciplinary sector ING-INF/03
Year 2
Period First semester
Course type Required
Activity type Single-discipline course
Language of instruction Italian
Contents
  • FUNDAMENTALS OF STATISTICAL SIGNAL PROCESSING
  • FUNDAMENTALS OF CONVEX OPTIMIZATION
  • BIG-DATA REDUCTION
  • GRAPH-BASED SIGNAL PROCESSING
  • DISTRIBUTED OPTIMIZATION, SIGNAL PROCESSING, AND LEARNING OVER NETWORKS
Reference texts
Most of the class content is inspired by chapters and sections of the following books:
  • S. Kay, Fundamentals of Statistical Signal Processing, Vol. I & II, Prentice Hall, 1993-1998
  • S. Theodoridis, Machine Learning: A Bayesian and Optimization Perspective
  • T. Hastie et al., The Elements of Statistical Learning: Data Mining, Inference, and Prediction
  • M. E. J. Newman, Networks: An Introduction
  • S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004
  • S. Boyd et al., Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Foundations and Trends in Machine Learning, 3(1):1–122, 2011
  • Lecture notes provided by the teacher
Learning objectives Understanding the basics of statistical inference and convex optimization as fundamental tools in big-data analytics. Understanding the concept of data reduction and the conditions under which statistical inference and reconstruction of the information are not excessively degraded by the reduction. Extending classical signal processing to signals defined over a graph, which is a natural representation of big data whose structure reflects their distribution over a network, their statistical similarity, or both. Understanding the methodological tools needed to distribute complex statistical inference over parallel and distributed agents (computers, etc.), as a way to scale statistical inference to big data that may be geographically or logically distributed over a network.
Prerequisites
Essential: Mathematical Analysis I and II, Linear Algebra, Probability Theory, Signal Theory, Random Processes, Digital Signal Processing.
Recommended: Machine Learning and Data Mining.
Useful: Estimation and Decision Theory.
Teaching methods The course is delivered through face-to-face lectures, with the aid of slides and PC-based simulations of some of the algorithms.
Other information This is the first year the course is taught; its content is therefore still tentative and may change during the semester, depending on the outcomes and on the time needed to develop each topic.
Assessment methods 1) Short thesis on a topic related to the course content, supported by computer-aided simulations, to be submitted one week before the oral exam.
2) Oral exam: discussion of the thesis plus, typically, two questions.

For information on support services for students with disabilities and/or specific learning disorders (DSA), visit http://www.unipg.it/disabilita-e-dsa
Extended programme

Part I: FUNDAMENTALS OF STATISTICAL SIGNAL PROCESSING (18 hours)
Minimum variance unbiased estimation; Cramér-Rao lower bound; sufficient statistics; maximum likelihood estimation; linear estimation; least squares. Bayesian estimation: MMSE estimation, linear estimation. Adaptive estimation theory: least mean squares (LMS) estimation, recursive least squares (RLS) estimation, Kalman filtering. Statistical decision theory: Neyman-Pearson criterion, minimum probability of error, Bayes risk, multiple hypothesis testing.

Part II: FUNDAMENTALS OF CONVEX OPTIMIZATION (9 hours)
Basics of convex optimization: convex sets, convex functions, convex optimization problems. Duality theory: Lagrange dual problem, Slater's constraint qualification, KKT conditions. Optimization algorithms: primal methods (steepest descent, gradient projection, Newton's method), primal-dual methods (dual ascent, alternating direction method of multipliers). Examples of applications: approximation and fitting, statistical estimation and detection, adaptive filtering, supervised and unsupervised learning from data.

Part III: BIG-DATA REDUCTION (12 hours)
Compressed sampling/sensing and reconstruction; statistical inference by sparse sensing; classification by principal component analysis.

Part IV: GRAPH-BASED SIGNAL PROCESSING (15 hours)
Signals on graphs: motivating examples; algebraic graph theory, graph features. Signal processing on graphs: Fourier transform, smoothing, sampling, and data compression on graphs.

Part V: DISTRIBUTED OPTIMIZATION, SIGNAL PROCESSING, AND LEARNING OVER NETWORKS (18 hours)
Average consensus: theory and algorithms. Distributed optimization: consensus and sharing; primal and primal-dual methods. Distributed signal processing: estimation and detection; LMS, RLS and Kalman filtering on graphs. Distributed supervised learning: regression and data classification. Distributed unsupervised learning: dictionary learning and data clustering.
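As an illustration of the kind of PC-based simulation mentioned under "Teaching methods", the following is a minimal Python sketch of the first topic of Part V (average consensus over a network). The 5-node ring graph, the Metropolis-Hastings weights, and all numerical values are illustrative assumptions, not official course material.

import numpy as np

# Average consensus on an undirected graph (Part V, illustrative sketch).
# Adjacency matrix of a hypothetical 5-node ring graph.
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)

n = A.shape[0]
deg = A.sum(axis=1)

# Metropolis-Hastings weights: symmetric and doubly stochastic, so the
# iteration x <- W x converges to the average of the initial node values
# on any connected graph.
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if A[i, j] == 1:
            W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    W[i, i] = 1.0 - W[i].sum()

rng = np.random.default_rng(0)
x = rng.normal(size=n)      # initial local measurements at each node
target = x.mean()           # value on which all nodes should agree

for _ in range(200):        # synchronous consensus iterations
    x = W @ x               # each node mixes only with its neighbours

print("true average:", target)
print("node values :", x)   # all entries are close to the true average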