# Unit BIOINFORMATICS AND BIOSTATISTICS

Course
Biotechnology
Study-unit Code
GP004129
Curriculum
In all curricula
Teacher
Roberto Maria Pellegrino
Teachers
• Roberto Maria Pellegrino
Hours
• 52 hours - Roberto Maria Pellegrino
CFU
6
Course Regulation
Cohort 2021
Offered
2023/24
Learning activities
Other
Area
Computer and telematics skills
BIO/11
Type of study-unit
Required
Type of learning activities
Single-discipline learning activity
Reference texts
Manuela Helmer Citterich et al., "Fundamentals of Bioinformatics", Ed. Zanichelli.
Michael C. Whitlock, Dolph Schluter, "Statistical Analysis of Biological Data", Ed. Zanichelli.
Scientific articles from specialized journals and PowerPoint presentations will be provided in PDF format during the course.
Extended program
1) Elements of basic computer science: Computer architecture, Operating systems, Algorithms and programs, Programming languages, Introduction to the use of "R": operations with variables, vectors and matrices. Servers and web servers, Databases, the relational model, the normalization process, relational algebra and querying a relational database, Boolean operators.
2) Elements of descriptive statistics: Definitions, populations and samples, sampling types, data types and variables, frequency distribution. Representation of frequency distributions: bar charts, pie charts, frequency tables and histograms for numerical data. Median and interquartile range, boxplot representation, arithmetic mean and standard deviation, comparison of measures of location and dispersion. The normal distribution: formula of the normal distribution and its properties, the standard normal distribution, statistical tables. Central limit theorem. Sampling distribution of an estimate, measuring the uncertainty of an estimate, confidence interval, standard deviation and experimental error. Formulation, use and testing of hypotheses: null hypothesis, alternative hypothesis. P-value, Z-test, t-test, ANOVA, F-test, ROC analysis.
3) Multivariate Statistical Analysis: properties of the data matrix: filtering, transformation and scaling of data. Univariate Analysis on multiple experiments: Fold Change Analysis, T-tests, Volcano plot, One-way Analysis of Variance (ANOVA), Correlation Heatmaps, Pattern Search, Correlation Networks (DSPC). Chemometrics Analysis: Principal Component Analysis (PCA), Partial Least Squares - Discriminant Analysis (PLS-DA). Feature Identification: Significance Analysis of Microarray (SAM), Empirical Bayesian Analysis of Microarray (EBAM). Hierarchical Clustering: Dendrogram, Heatmaps. Partitional Clustering: K-means. Classification & Feature Selection: Random Forest, Support Vector Machine (SVM). Functional Analysis, Enrichment Analysis, Pathway Analysis, Network Analysis. Biomarker Analysis: univariate and multivariate ROC curve analyses. Time-series/Two-factor Analysis: ANOVA Simultaneous Component Analysis (ASCA); Multivariate Empirical Bayes Analysis of Variance (MEBA).
4) Biological and molecular evolution, molecular mechanisms underlying evolutionary processes, homologous, orthologous and paralogous genes. Notes on the theory of biological networks and their application to systems biology.
5) Alignment and comparison of biological sequences, Global alignment of sequence pairs, Dynamic programming, Substitution matrices, Local alignment of sequence pairs, Similarity database searches, BLAST: Input and output parameters, Significance of sequence alignments, Interpretation of results. Alignment of sequences to genomes, Multiple sequence alignment.
6) Proteomic and metabolomic analyses by mass spectrometry, interpretation of spectra, use of dedicated databases and web services.
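As an illustration of the descriptive measures listed in point 2 (median, interquartile range, arithmetic mean, standard deviation), the following is a minimal sketch in Python rather than R, using only the standard library. The function name `describe`, the sample values, and the choice of the "inclusive" quartile method are assumptions made for this example; other quartile conventions give slightly different results.

```python
import statistics


def describe(values):
    """Return median, interquartile range, mean and sample standard deviation."""
    # Quartiles with the "inclusive" method (an assumption; other methods exist).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "median": q2,
        "iqr": q3 - q1,                   # interquartile range
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),   # sample SD (n - 1 in the denominator)
    }


# Made-up measurements, for illustration only.
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]
summary = describe(sample)
print(summary)
```

The median and interquartile range are robust to outliers, while the mean and standard deviation are not, which is the comparison of location and dispersion measures mentioned in the program.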
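The global alignment of sequence pairs by dynamic programming named in point 5 can be sketched as follows. This is a minimal Needleman-Wunsch score computation, not necessarily the formulation used in the course; the function name and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration, standing in for a real substitution matrix.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of sequences a and b (Needleman-Wunsch)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and row: aligning a prefix against gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell takes the best of: (mis)match, gap in b, gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    # The bottom-right cell holds the score of the best end-to-end alignment.
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))
```

Local alignment (Smith-Waterman) differs mainly in clamping each cell at zero and taking the maximum over the whole matrix instead of the bottom-right cell.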