SEMINAR IN MACHINE LEARNING AND BIG DATA ANALYSIS

Degree course: 
Second cycle degree in GLOBAL ENTREPRENEURSHIP ECONOMICS AND MANAGEMENT
Academic year when starting the degree: 
2021/2022
Year: 
1
Academic year in which the course will be held: 
2021/2022
Course type: 
Compulsory subjects, characteristic of the class
Seat of the course: 
Varese - Università degli Studi dell'Insubria
Language: 
English
Credits: 
6
Period: 
Second semester
Standard lecture hours: 
40
Breakdown of lecture hours: 
Lectures (40 hours)
Requirements: 

Intermediate knowledge of econometrics, statistics and linear algebra.

Final Examination: 
Oral

The final assessment is based 50% on a take-home written report (15-20 pages) in which students apply the techniques learned during the lectures to real data examples (similar to a Kaggle-type competition). The other 50% of the final mark is based on an oral presentation of the take-home report (held remotely via MS Teams), with questions and answers on the topics covered during the lectures.

No partial exams will be held for this course.

Assessment: 
Final grade

Machine learning (ML) is a branch of Artificial Intelligence (AI) that was originally developed to enable computers to emulate human cognition and learn from training examples to predict future events. Today, ML techniques include a number of advanced statistical methods for regression and classification applied in a wide variety of fields (including medical diagnostics, credit card fraud detection, face and speech recognition and analysis of the stock market) where the main goal is to directly predict the dependent variable of interest, without focusing on the underlying relationships between the explanatory variables.
The statistical methods developed in the ML literature (also known as Statistical Learning methods) have been particularly successful in “Big Data” settings, where we have either information on a large number of units, or many pieces of information on each unit (or both).
The aim of this course is to present Machine Learning techniques from an econometric perspective.
In particular, students will learn the concepts and techniques most widely used in the Machine Learning literature, such as regression and classification trees, random forests, boosting, neural networks and deep learning, together with their natural extensions to time series analysis and causal inference, complemented by many practical examples.
By the end of the course, students are expected to be able to master and implement most of these techniques on real data problems using the statistical software R.

The course is organized in 10 lectures of about 4 hours each and will cover the following topics:

1 Introduction, basic concepts and definitions. History and Foundations of Econometric and Machine Learning Models (4 hours).
2 Statistical Learning: Supervised Versus Unsupervised Learning, Regression Versus Classification Problems. Assessing Model Accuracy: Quality of Fit measures and the Bias-Variance Trade-Off. Introduction to R (4 hours).
3 Linear Regression Models: Refresh, Extensions and Potential Problems. Examples of Linear Methods for Regression in R (4 hours).
4 Classification: Linear Methods, Logistic Regression, Linear Discriminant Analysis. Comparison of Classification Methods in R (4 hours).
5 Model Validation and Selection: Cross-Validation and Bootstrap techniques. Shrinkage Methods: Ridge Regression and LASSO. Dimension Reduction Methods: PCA and PLS (4 hours).
6 Non-linear models: Polynomial Regression, Regression and Smoothing Splines, Local Kernel Estimation, Generalized Additive Models. Examples: Non-linear Modeling with R (4 hours).
7 Tree-Based Methods: Regression and Classification Trees, Bagging, Random Forests, Boosting. Lab: Decision Trees in R (4 hours).
8 Neural Networks and Deep Learning: Model selection and examples with R (4 hours).
9 Support Vector Machines and Flexible Discriminants: Comparison with Logistic Regression and Practical Examples (4 hours).
10 Extension of Machine Learning Techniques to Time Series and Causal Inference. Group “Hands-On Regression and Classification” session using real data (4 hours).

• James G., Witten D., Hastie T., Tibshirani R. (2014). An Introduction to Statistical Learning: with Applications in R. Springer (updated version 2021).
• Boehmke B., Greenwell B.M. (2020). Hands-On Machine Learning with R. 1st edition. Chapman and Hall/CRC (selected parts).
• Slides and additional readings, provided at the end of each lecture and uploaded to the e-learning platform.

Conventional

In-class lectures coupled with practical examples and group “Hands-On” sessions with real data applications.

The syllabus may be modified during the course. Please check the course page on the e-learning platform periodically for changes and communications from the instructor.

Office hours for students: normally Thursday morning, 10:30-12:30.
Department of Economics, Monte Generoso building, first floor, room 25.

For organizational reasons, students must always send an email (andrea.vezzulli@uninsubria.it) in advance to schedule a meeting with the instructor, whether within or outside the office hours indicated above.

For updates on possible changes to the office hours, please also check the instructor's webpage: https://www.uninsubria.it/hpp/andrea.vezzulli
