DATA MINING

Degree course: 
Second cycle degree in COMPUTER SCIENCE
Academic year when starting the degree: 
2022/2023
Year: 
1
Academic year in which the course will be held: 
2022/2023
Course type: 
Compulsory subjects, characteristic of the class
Language: 
English
Credits: 
6
Period: 
Second semester
Standard lectures hours: 
48
Detail of lecture hours: 
Lesson (48 hours)
Requirements: 

Basic contents of the Intelligent Systems course delivered in the first year of the master's degree course.
The course is recommended for those who have knowledge of at least one programming or scripting language.
It is advisable to get a laptop (Windows, Mac or Linux) capable of executing the Python interpreter.

Final Examination: 
Oral

The exam consists of a theoretical test and a project on one or more topics addressed in class.
The theoretical test is a set of questions delivered through the Moodle platform and assesses how well the student has learned the topics covered in class.
The project is proposed by the teacher. In the project, students must implement
- an existing model applied to a dataset on which it has never been applied, in which case they must try to surpass the state of the art on the new dataset;
- improvements, based on new ideas, to an existing model that yield similar or better results on the datasets where the original model has already been applied.
The project must be accompanied by a report that introduces the problem, analyzes the existing literature, describes the project details, and shows and comments on the results obtained and the conclusions.
The grades of the theoretical test and the project are averaged to obtain a base score, which the teacher may raise slightly if the project presents original ideas or particularly interesting results. The exam is passed if the final grade is greater than or equal to 18/30.

Assessment: 
Final grade

The term Data Mining refers to a set of techniques and tools used to explore large amounts of data, with the aim of extracting significant information and knowledge and making it available to decision-making processes.
This course aims to provide the fundamentals of the discipline and then focus on some Data Mining techniques of current applied and industrial interest, with particular attention to Deep Learning.
The course combines theoretical knowledge with the use of open-source software and the Python language.
The course participants, through the tools provided by Python, will learn to preprocess data (mainly images and text) and extract new information from the data by applying some models analyzed in the course.
In summary, the educational objectives of the teaching and the expected learning outcomes are the following:
1) Acquire the knowledge and ability to use the Python language and the libraries typical of Data Mining, in order to process datasets and execute machine learning algorithms.
2) Acquire basic knowledge of data pre-processing. At the end of the course, the student will be able to deal with the main data-related problems and to tackle a real problem independently.
3) Understand some machine learning algorithms of current applied and industrial interest, with particular focus on some successful Deep Learning models, both supervised and unsupervised, in order to extract information from data.
4) Analyze some real problems in which the models studied in the course can produce results that in many cases represent the state of the art.
In addition to the training objectives described above, the course aims to provide transversal skills, such as the critical attitude of students in evaluating the solutions obtained or proposed by third parties, the ability to independently learn new machine learning approaches for the analysis of data, the ability to analyze existing literature and the ability to use a scientific language to communicate the results obtained.

The lessons will address the following topics:
Languages, libraries, and tools for Data Mining (8 h, learning objective 1)
- IPython: intro, help, magic commands, debug.
- The Python language: introduction, data types, basic elements of the language.
- Libraries for data mining: NumPy as a data structure, Pandas for data manipulation, Matplotlib for data visualization, scikit-learn for the use of Machine Learning algorithms, PyTorch to implement the neural models of Deep Learning.
Data pre-processing (8 h, learning objective 2)
- Solutions for missing data management
- Techniques for the management of unbalanced datasets
- Normalization
- Bigrams, stemming, lemmatization
- Data augmentation
- Word embedding: One Hot Encoding, TF-IDF, Word2Vec, GloVe, BERT
Machine learning algorithms (16 h, learning objective 3)
- Machine Learning and Deep Learning
- Deep neural networks
-- Convolutional Neural Networks (CNN)
-- Recurrent Neural Networks (RNN)
-- AutoEncoders (AE)
-- Generative models (GAN)
-- Transformers (BERT)
-- Graph neural networks (GNN)
-- Low-shot learning
-- Transfer learning
-- Fine-grained learning
Application problems (16 h, learning objective 4)
- Action recognition
- Detection and localization
- Segmentation
- Face recognition
- Image retrieval
- Motion and tracking
- Document understanding
- Summarization
- Question answering.
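
As an illustration of two of the pre-processing topics listed above (missing data management and normalization), the following is a minimal sketch in plain Python. It is not taken from the course material: the function names `impute_mean` and `min_max_normalize` and the sample column `ages` are hypothetical, and mean imputation and min-max scaling are only one possible choice among the techniques the course may cover.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical column with one missing entry.
ages = [20, None, 40, 60]
filled = impute_mean(ages)
scaled = min_max_normalize(filled)
print(filled)
print(scaled)
```

In practice, libraries such as Pandas and scikit-learn provide equivalent operations; the plain-Python version is used here only to keep the sketch self-contained.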

Conventional

48 hours of lectures.
The lectures are held in the classroom and alternate theory with practical exercises.
The course uses Python, an open-source language that can be freely downloaded from the web.
IPython, an open-source, browser-based tool that allows students to create and edit documents containing code, visualizations and text, will be used during lectures.
During the course, additional packages needed for the various topics discussed in the lectures will be downloaded and installed.
In the classroom, continuous assistance is provided by the teacher.

The teacher receives students by appointment, upon request sent by e-mail to name.surname@uninsubria.it. The teacher responds only to signed e-mails sent from the students.uninsubria.it domain.
