Unit BIG DATA MANAGEMENT

Course
Computer engineering and robotics
Study-unit Code
70A00038
Curriculum
In all curricula
Teacher
Fabrizio Montecchiani
Teachers
  • Fabrizio Montecchiani
Hours
  • 48 hours - Fabrizio Montecchiani
CFU
6
Course Regulation
Cohort 2021
Offered
2022/23
Learning activities
Characterizing
Area
Computer engineering
Academic discipline
ING-INF/05
Type of study-unit
Optional
Type of learning activities
Single-discipline learning activity
Language of instruction
Italian.
Contents
-Introduction to Big Data
-Programming models and technologies for distributed computing
-Data models and NoSQL technologies
Reference texts
The course presents methods and technologies that are not covered by a single textbook. To support the student, the topics covered during the lectures are presented in the slides provided by the teacher.

In addition, some textbooks are suggested for further information on the various topics of the course.

T. White, «Hadoop: The Definitive Guide», 3rd Edition, Yahoo Press.
R. Shaposhnik, C. Martella, D. Logothetis, «Practical Graph Analytics with Apache Giraph», Apress.
P. J. Sadalage, M. Fowler, «NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence», Addison-Wesley.
Educational objectives
The aim of the course is to provide basic notions of models and technologies for Big Data management. In particular, at the end of the course the student should have:


- knowledge of the MapReduce paradigm and of the Apache Hadoop technology;

- knowledge of the TLAV paradigm and of the Apache Giraph technology;

- ability to develop software for Apache Hadoop and Apache Giraph;

- knowledge of the basic principles of distributed databases;

- knowledge and ability to use some NoSQL technologies.
Prerequisites
Students are expected to be familiar with the design and analysis of algorithms, imperative and object-oriented programming (Java language), and relational databases.
Teaching methods
The course is divided into two main types of lessons:

Lectures (about 60% of total time): lessons held in the classroom. In each lesson new concepts are taught with the support of projected slides.

Guided laboratory exercises (about 40% of total time): lessons held in the software engineering lab. In each lesson the students design and implement new programs under the guidance of the teacher.
Other information
None.
Learning verification modality
The assessment aims to evaluate the student's theoretical knowledge and his/her ability to apply it to both theoretical and practical problems. The different types of tests are described below.

- Oral test with theoretical and practical exercises

Duration: 30 minutes

Score: 15/30

Aims: Assess the knowledge of the theoretical notions provided by the course and the ability to develop simple programs.


- Project

Presentation and discussion of a project work (software plus documentation)

Score: 15/30

Aims: Assess the practical abilities of the student with respect to the topics covered in the course.
Extended program
1. Introduction
a. Introduction to Big Data
b. Scaling up vs scaling out
c. Key ideas for Big Data management
2. Part I: Programming models and technologies for distributed computing
a. The MapReduce model
b. The Hadoop platform
c. The Think-Like-A-Vertex model
d. The Giraph platform
e. Apache Spark
3. Part II: Data models and NoSQL technologies
a. Basic principles of NoSQL technologies
b. The CAP theorem and beyond
c. Key-value stores
d. Column-family stores
e. Document databases
f. Graph databases