SCRIPTING AND PROGRAMMING LABORATORY FOR DATA ANALYSIS

Degree course: 
Second cycle degree in PHYSICS
Academic year when starting the degree: 
2023/2024
Year: 
1
Academic year in which the course will be held: 
2023/2024
Course type: 
Compulsory subjects, characteristic of the class
Credits: 
6
Period: 
Second semester
Standard lecture hours: 
66
Breakdown of lecture hours: 
Lesson (66 hours)
Requirements: 

The following prerequisites are assumed:
- Intermediate-level knowledge and use of a programming language for general-purpose programming
- Ability to perform basic data analysis and visualization in any programming language
- Basic usage of a command line interface

If any of the prerequisites is missing, contact the teacher in advance to discuss a workaround.

Final Examination: 
Oral

The final mark will be based on the successful completion of the homework assigned during the course and on an oral exam. Specifically:
- Short homework assignments will be given during the semester; completing them successfully is required for admission to the exam
- At the end of the course, students will choose a data analysis project to develop individually or in small groups. They will prepare a short report and a presentation (15-30 min at most, based on slides, a Jupyter notebook or other means), which will be discussed with the teacher

Assessment: 
Final mark

The course aims to provide the students with advanced skills for the analysis of physical data using the Python programming language.
In the first part of the course, the students will learn to use the Python programming language for general programming purposes, as well as some of the most widely used Python libraries to efficiently handle multidimensional numerical data, perform basic scientific data processing, visualize data and manage tabular data. In the second part of the course, the students will learn how to apply advanced data analysis techniques, both by developing example code in pure Python and by using well-established libraries to run data analysis efficiently.

After the course, the students will be able to:
- Know the basics of the Python programming language and apply it to write scripts for a variety of data analysis tasks
- Install and manage a Python distribution on a computer, including installation of Python libraries
- Perform data manipulation operations, including import/export from data files, retrieval of data from online sources and data preprocessing
- Create presentation-quality graphical visualizations of data
- Use Jupyter notebooks for interactive and descriptive data analysis
- Use Python libraries for efficient and advanced data treatment and analysis
- Implement procedures for reproducible analysis, including file versioning, unit testing and using Python virtual environments
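As a minimal sketch of the unit-testing practice listed above (an illustration, not course material), the function `weighted_mean` and its tests below are hypothetical examples. Assuming the file is saved under a name such as `test_stats.py` (also hypothetical), the pytest runner collects and runs every function whose name starts with `test_`; the block can also be run as a plain script.

```python
import math


def weighted_mean(values, weights):
    """Return the weighted mean of `values` (hypothetical helper for illustration)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def test_weighted_mean_equal_weights():
    # With equal weights the result must reduce to the ordinary mean
    assert math.isclose(weighted_mean([1.0, 2.0, 3.0], [1, 1, 1]), 2.0)


def test_weighted_mean_length_mismatch():
    # Mismatched inputs must be rejected rather than silently truncated by zip()
    try:
        weighted_mean([1.0, 2.0], [1.0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")


if __name__ == "__main__":
    # Allows running the tests without pytest installed
    test_weighted_mean_equal_weights()
    test_weighted_mean_length_mismatch()
    print("all tests passed")
```

Keeping each test small and focused on one property makes it easy to see which assumption broke when an analysis script is modified later.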

The advanced topics covered in the second part of the course will be partly selected based on the interests of the students. Depending on which topics will be covered, the students will be able to:
- Control instrumentation and automate data acquisition
- Build web-based graphical user interfaces (GUIs)
- Develop Python code collaboratively through online services
- Create advanced and interactive data visualizations
- Use advanced features of the Python language such as decorators, iterators, generators and module creation
- Use machine learning/artificial intelligence for data analysis
- Perform digital signal processing
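For readers unfamiliar with two of the language features mentioned above, the following hypothetical example shows a generator, which produces values lazily one at a time, and a decorator, which wraps a function to add behavior without changing its code. The names `running_mean`, `count_calls` and `square` are illustrative only.

```python
import functools


def count_calls(func):
    """Decorator that counts how many times `func` has been called."""
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


def running_mean(samples):
    """Generator yielding the running mean of an iterable, one value at a time."""
    total = 0.0
    for n, x in enumerate(samples, start=1):
        total += x
        yield total / n


@count_calls
def square(x):
    return x * x


means = list(running_mean([2.0, 4.0, 6.0]))
print(means)  # [2.0, 3.0, 4.0]

squares = [square(k) for k in range(3)]
print(squares, square.calls)  # [0, 1, 4] 3
```

Because `running_mean` yields values instead of building a list, it can process data streams (for example, samples arriving from an instrument) without holding them all in memory.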

Throughout the course, the following topics will be covered:
- Introduction to Python: properties of the programming language, declaring variables, control flow, basic data types (int, float, complex, bool, string) and data type conversions, operators, some built-in functions
- Structured data types (lists, tuples, dictionaries, sets) and their methods, shallow and deep copy, defining functions (including lambda functions), variable scoping, a basic introduction to modules and introspection
- File and directory handling
- The NumPy module: ndarrays, random numbers, input/output
- The matplotlib module for data plotting
- The SciPy module: integration, optimisation, fast Fourier transform, linear algebra
- The pandas module: tabular data management
- Use of Jupyter notebooks
- Management of a Python virtual environment for reproducible results

Additional topics could be covered depending on the needs and interests of the students.
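Shallow versus deep copy, listed among the first topics, is a frequent source of bugs when working with nested data. A minimal illustration using the standard-library `copy` module, with hypothetical example data:

```python
import copy

# A nested structure: a list containing two inner lists
original = [[1, 2], [3, 4]]

shallow = copy.copy(original)   # new outer list, but the SAME inner list objects
deep = copy.deepcopy(original)  # new outer list AND new copies of the inner lists

original[0].append(99)  # mutate an inner list in place

print(shallow)  # [[1, 2, 99], [3, 4]]  shares its inner lists with original
print(deep)     # [[1, 2], [3, 4]]      fully independent of original
print(shallow is original, shallow[0] is original[0])  # False True
```

The shallow copy is cheaper, but any in-place change to a nested element is visible through both names; the deep copy avoids this at the cost of duplicating every nested object.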

The course will favor a hands-on approach, combined with different learning techniques:
- Use of Jupyter notebooks, either locally or online, to take notes and test new concepts
- Individual in-classroom exercises followed by group discussion
- Group in-classroom exercises followed by group discussion
- Homework assignments ranging from simple, quick exercises aimed at consolidating the concepts presented during lectures, to small projects aimed at developing programming and data analysis skills

All the material used during the lectures (presentations, scripts, etc.) will be made available to the students through the e-learning platform.