Unit STATISTICS AND BIOINFORMATICS

Course
Agricultural and environmental biotechnology
Study-unit Code
A005361
Curriculum
In all curricula
Teacher
Andrea Onofri
CFU
10
Course Regulation
Cohort 2025
Offered
2025/26
Type of study-unit
Required
Type of learning activities
Integrated learning activity

BIOINFORMATICS

Code A005364
CFU 5
Teacher Alessandro Magini
Teachers
  • Alessandro Magini
Hours
  • 45 hours - Alessandro Magini
Learning activities Related/supplementary
Area Related or supplementary learning activities
Academic discipline BIO/11
Type of study-unit Required
Language of instruction English
Contents
  • Introduction to bioinformatics and major biological data types.
  • Key biological databases and bioinformatics tools.
  • Prediction and analysis of replication origins in genomes.
  • Motif discovery strategies and consensus sequences.
  • Sequence alignment techniques: global, local, and multiple alignment.
  • NGS sequencing data: formats (FASTQ/BAM/VCF), QC and trimming.
  • Genome assembly: de novo and reference-guided.
  • DNA metabarcoding and shotgun metagenomics.
  • RNA-Seq and gene expression analysis.
  • Protein bioinformatics: domains, motifs, and functional annotation.
Reference texts
  1. Ankenbrand, M. J., & Hachmann, L. (2018). The Biostar Handbook: A Beginner’s Guide to Bioinformatics. 2nd edition.
  2. Zvelebil, M., & Baum, J. O. (2007). Understanding Bioinformatics. Taylor & Francis Inc.
Educational objectives
This course aims to introduce students to fundamental concepts and tools in bioinformatics, with a focus on applications relevant to biotechnology. The central goal is to turn real, often convoluted biological problems into algorithmic challenges through a combination of theoretical understanding and computer-based practice.
Students will acquire the following main competences:
  • Interpret and manage NGS data outputs (FASTQ/BAM/VCF), performing basic QC and trimming.
  • Select and use core databases/tools to retrieve, organize, and trace biological information.
  • Execute and evaluate sequence alignments (global/local/multiple) and extract biologically meaningful evidence.
  • Identify sequence motifs/consensus elements and reason about replication-origin signals.
  • Assemble small genomes (de novo / reference-guided) and judge assembly quality with standard metrics.
  • Map reads and outline essential steps for variant discovery and interpretation.
  • Outline minimal pipelines for RNA-Seq, DNA metabarcoding, and shotgun metagenomics.
Prerequisites For effective learning of the course content, adequate knowledge of biochemistry and molecular biology is required.
Teaching methods The course adopts a multimodal, student-centered approach, integrating lectures and video materials with brainstorming and guided discussions to foster engagement and active participation. A practical, laboratory-based methodology will be employed to promote cooperative learning and learning by doing. Students will apply bioinformatics concepts and tools to realistic case studies, in keeping with the discipline’s orientation.
Other information
  • Attendance is optional but strongly recommended.
  • The study materials will be provided by the instructor and made available to students on the UNISTUDIUM platform at www.unistudium.unipg.it
  • Office hours: by appointment; please arrange via email at alessandro.magini@unipg.it
Learning verification modality
  1. Written exam on the dates scheduled in the calendar: a 60-minute test with multiple-choice and open-ended questions, aimed at assessing knowledge, understanding, and the ability to solve applied problems.
  2. In-course assessment (reserved exclusively for attending students): oral presentations of the results of laboratory activities assigned by the instructor, which test presentation skills and the ability to argue answers to questions raised during the discussion. The assessment considers accuracy of content, methodological correctness, clarity of presentation, and ability to discuss.
Attending students (those who attend at least 75% of the lessons and submit all required work) may choose between the in-course assessment and the written exam. Non-attending students may take only the written exam.
For information on support services for students with disabilities and/or specific learning disorders (DSA), please visit: http://www.unipg.it/disabilita-e-dsa.
Extended program
1. Introduction
  • What is bioinformatics;
  • DNA/RNA/proteins, ORF, codons, strand and reverse complement, sequence notation and basic formats (FASTA/FASTQ).
2. Biological databases
  • INSDC (GenBank/ENA/DDBJ), RefSeq vs GenBank; UniProt/InterPro; PDB/AlphaFold.
  • Identifiers, citation of resources; Entrez/EBI Search; access to datasets and metadata (SRA/BioProject).
3. From biology to algorithm: oriC and transcription factor consensus sequences
  • Replication origin signals: DnaA-box and GC-skew;
  • Motif finding for transcription factors;
  • Case study: regulation of circadian rhythm in plants (“evening element” motifs), limits and biological validation.
4. Sequence alignment (DNA and proteins)
  • Global (Needleman–Wunsch), local (Smith–Waterman), multiple (MSA).
  • Substitution matrices (PAM/BLOSUM), identity vs similarity, “twilight zone”; BLAST: logic and E-value.
  • Critical reading of scores, gaps and results.
5. NGS sequencing data: formats and quality
  • Formats: FASTQ/BAM/VCF; paired-end, coverage, Phred.
  • FastQC, trimming, contaminant removal, metadata management.
6. Genome assembly: de novo and reference-guided
  • OLC vs de Bruijn graph; k-mer, Eulerian paths; contig vs scaffold; choice of k.
  • Assembly evaluation: N50/L50, misassemblies, completeness (BUSCO), coverage; good practices.
7. DNA metabarcoding and shotgun metagenomics
  • Marker-based.
  • Shotgun: taxonomic classification (e.g., Kraken/Bracken), host/contaminant removal, reading of profiles and cautions.
8. RNA-Seq
  • Quantification, normalization, principles of differential analysis.
  • Reading of result tables/figures and links to functional annotation.
9. Protein bioinformatics: domains and 3D structure
  • Search and annotation of domains/motifs (Pfam/InterPro, PROSITE) and functions (GO/KEGG).
  • 3D structures: consulting PDB and AlphaFold models; model quality and limits.
  • Linking structure/domains to function, variants and interactions.

EXPERIMENTAL METHODS IN AGRICULTURE

Code A005363
CFU 5
Teacher Andrea Onofri
Teachers
  • Andrea Onofri
Hours
  • 45 hours - Andrea Onofri
Learning activities Core
Area Agricultural biotechnology disciplines
Academic discipline AGR/02
Type of study-unit Required
Language of instruction English
Contents The course aims to give students the theoretical background and practical tools to design scientifically sound experiments and to analyse and present the results correctly.
Reference texts On-line e-book (linked on the UNISTUDIUM platform)
Educational objectives
Knowledge
  1. Basic aspects of experimental design.
  2. Main experimental designs and when they are applied.
  3. ANOVA: what it is and when it is applied.
  4. How to check the basic assumptions of ANOVA and what to do when they are not met.
  5. Linear regression: what it is and when it is applied.
  6. Multiple comparison procedures: what they are and when they are used.
  7. Nonlinear regression: what it is and when it is applied.
Practical skills
  1. Design an experiment.
  2. Analyse the results of an experiment using statistical software.
  3. Check the basic assumptions for linear and nonlinear models.
  4. Set up correction strategies.
  5. Build and fit basic nonlinear regression equations.
Prerequisites Only a basic knowledge of computers is required (files, directories/folders, and the basics of the operating system).
Teaching methods Lectures and practicals.
Audiovisual aids and study material:
  • PowerPoint slides for all lectures/practicals
  • Written essays for some lectures/practicals
  • Datasets
Other information Key information is available on the UNISTUDIUM platform.
Learning verification modality Final written exam with a practical evaluation based on a case study, to be analysed on a personal computer.
Extended program
The course is based on the premise that there are two main ways to obtain scientific information not already available in the literature: (1) organising scientifically sound experiments and (2) appropriately using simulation models.
LECTURES (1.5 hours each, plus 45 min. of discussion, summary and presentation of case studies)
  1 - Measuring biological phenomena; variability of experimental data. Population, sample and sampling. Estimation methods and criteria. Statistical inference. Experimental units. Replication and pseudoreplication. Independence.
  2 - Experimental design and ANOVA. Completely randomised designs (CR). Randomised complete block designs (RCB) and Latin square designs. Factorial experiments. Examples.
  3 - Split-plot, split-block and nested designs. Repeated experiments. Examples.
  4 - ANOVA on CR and RCB designs. Examples.
  5 - Problems with basic assumptions. Graphical analyses of residuals. Stabilising transformations. Examples.
  6 - Multiple comparison testing. Examples.
  7 - ANOVA on split-plot, split-block and repeated-experiment designs. Examples.
  8 - Regression and correlation. Polynomial regression. Statistical inference on regression analyses. Examples.
  9 - Nonlinear regression analyses. Biological assays. Examples.
  10 - Nonlinear regression analyses. Degradation of xenobiotics. Crop growth curves. Other functions. Goodness and lack of fit. Examples.
PRACTICALS
Students will work through selected case studies and will be guided to their solution using the most appropriate statistical software.