BIOSTATISTICS AND DATA SCIENCE
- Overview
- Assessment methods
- Learning objectives
- Contents
- Delivery method
- Teaching methods
- Contacts/Info
No previous knowledge is required.
At the end of the Biostatistics module, the student will sit a written examination consisting of three numerical exercises on the “Statistical methodologies” part. An open answer is required, aimed at verifying knowledge of the logic and methodological tools needed for a correct evaluation of experimental data. The time allotted for the exam is 1.5 hours, and the exam is passed with a mark of 18/30 or higher. During the exam, students may use personal computers and/or pocket calculators and consult their own notes.
At the end of the Data Science module, the student will sit a written examination consisting of two open questions on the topics presented in the course and three exercises (chosen from six proposed ones) in which the student must describe or complete simple Bash and Python 3.x commands for parsing and/or analysing texts. The time allotted for the exam is 1.5 hours, and the exam is passed with a mark of 18/30 or higher. During the exam, the teacher will provide a reference sheet listing the Bash/Python commands expected to be used, which students may consult.
The final grade will be the mean of the marks obtained in the two modules of the course.
Biostatistics Module: Modern Biology and Biotechnology cannot be practised without knowledge of their statistical and biometrical aspects. It is thus necessary to provide the student with interlinked biological and statistical knowledge. The goal of this module is to familiarise students with statistical theory and terminology, so that they understand the power and pitfalls of statistical analysis, with special emphasis on the planning of experiments and the analysis of experimental data in the Life Sciences.
Data Science Module: the main objective of this module is to provide basic knowledge of the bioinformatics methods and software packages most commonly used to catalogue, analyse and predict the characteristics of biological and biotechnological systems.
An introduction to the background of the methods will be provided, and some simple guided exercises will be carried out to give students hands-on experience of these methods in the biological and biotechnological fields ("parsing" of information from texts written in Excel format, access to biological databases, prediction of protein structure).
Biostatistics Module:
Basics of statistical analysis – 1 cfu (8hrs)
o Why use Statistics. Populations and samples. Basics of probability. Random variables.
o Frequency distributions; what is a statistical test: power and protection of a test, Type I and Type II errors.
The most common statistical tests – 1 cfu (8hrs)
o Quantitative and qualitative variables – which test?
o Some uses of the z variable.
o The χ² test. Goodness-of-fit test and comparisons between proportions.
o The General Linear Model (GLM)
o Some uses of Student’s t
Other statistical tests – 1 cfu (8hrs)
o The model of Analysis of Variance (ANOVA).
o One-Way ANOVA: the completely randomised and the randomised block designs.
o Linear regression models; parameter estimation in linear, multiple and curvilinear regression.
o Use of statistical software. Examples in R.
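As an illustration of the kind of calculation covered under the χ² goodness-of-fit test listed above, the following is a minimal sketch, not course material. It is written in Python 3.x for consistency with the Data Science module, although the syllabus indicates that the Biostatistics examples are given in R. The phenotype counts, the 3:1 expected ratio and the function name `chi_square_gof` are hypothetical, chosen only for illustration; the critical value 3.841 is the tabulated χ² value for α = 0.05 and 1 degree of freedom.

```python
def chi_square_gof(observed, expected_ratios):
    """Return the chi-square goodness-of-fit statistic and its degrees of freedom."""
    total = sum(observed)
    ratio_sum = sum(expected_ratios)
    # Expected counts under the null hypothesis, scaled to the observed total
    expected = [total * r / ratio_sum for r in expected_ratios]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# Hypothetical data: 290 dominant and 110 recessive phenotypes, tested against 3:1
observed_counts = [290, 110]
stat, dof = chi_square_gof(observed_counts, [3, 1])

CRITICAL_5PCT_1DF = 3.841  # tabulated critical value, alpha = 0.05, 1 d.f.
decision = "reject H0" if stat > CRITICAL_5PCT_1DF else "do not reject H0"
print(f"chi2 = {stat:.3f}, d.f. = {dof}: {decision}")
```

With these hypothetical counts the statistic falls below the critical value, so the 3:1 hypothesis is not rejected at the 5% level.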
Data Science Module: 24 hours (3 cfu)
o Introduction to operating systems (Windows/macOS/Linux) and the Linux Bash shell
o Basics of a statistical programming language (Python 3.x) and related software
o Biological database structures
o Machine Learning – Theory of some methods
o Data Visualization
o Some lessons will include hands-on practice with examples of the methods presented in the course (Bash/Python 3.x scripting; data visualization).
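As an illustration of the kind of text-parsing exercise the Python 3.x scripting practice might involve, the following is a minimal sketch, not taken from the course material. It assumes a small tab-separated table such as might be exported from a spreadsheet; the column names, the data values and the function name `mean_length_by_organism` are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-separated table, as might be exported from a spreadsheet
TABLE_TEXT = (
    "protein_id\torganism\tlength\n"
    "P1\tE. coli\t310\n"
    "P2\tE. coli\t290\n"
    "P3\tS. cerevisiae\t450\n"
)


def mean_length_by_organism(text):
    """Group rows by organism and return the mean protein length for each."""
    lengths = defaultdict(list)
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        lengths[row["organism"]].append(int(row["length"]))
    return {org: sum(values) / len(values) for org, values in lengths.items()}


result = mean_length_by_organism(TABLE_TEXT)
for organism, mean_len in result.items():
    print(f"{organism}: {mean_len:.1f}")
```

The standard-library `csv` module is used here so that the sketch runs without additional packages; reading from a file instead of a string would only require replacing `io.StringIO(text)` with an open file object.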
The course consists of 48 hours of classroom lectures (24 lessons).
For the Biostatistics module, traditional classes will be held. Exercises will be done under the teacher’s guidance. Course participants are invited to bring a pocket calculator for some (simple) hands-on analyses.
During each lecture of the Data Science module, a topic is presented starting from a discussion of its theoretical principles, followed by some practical examples and a discussion of its potential implications for applications. The lessons are supported by PowerPoint slides (also available on the e-learning platform).
The teachers are available by appointment to provide information on the course and its topics. Requests must be sent by e-mail to giorgio.binelli@uninsubria.it and marco.orlando@uninsubria.it from an address in the @students.uninsubria.it domain.
The teachers are also available for meetings to discuss in depth or clarify the topics covered.