SCRIPTING AND PROGRAMMING LABORATORY FOR DATA ANALYSIS
Knowledge of statistics and error theory.
The students will be asked to prepare an oral presentation of ~30 minutes on one of the projects analyzed during the course or on any other personal project. They have to show their analysis approach, discuss its pros and cons, and answer the teacher's questions about the presented project and its details.
The goal of the course is to improve students' skills in data analysis in a physics context.
After an initial introduction to the Python programming language (and its most widely used scientific libraries, such as NumPy, Matplotlib, SciPy and pandas), the students will be provided with real data from different fields of physics.
The students are expected to develop their own analysis strategy, while the teacher's role is limited to providing useful hints and suggestions on how to proceed.
Firstly, they should be able to manipulate raw data and files, as provided by the experiments, in order to simplify the subsequent main analysis program.
Some scripting techniques and data handling methods will be shown, but it is up to the students to make appropriate choices with the final result in mind.
After that, they have to use their programming and analysis experience (data analysis in the laboratory courses of previous years) to perform an efficient and flexible analysis. They should focus on the results and how to effectively reach them.
Finally, they are also expected to improve their skills in data visualization, so that they can choose the proper tool to make others focus on what is important in the performed analysis.
# The Python programming language:
- Python introduction, programming language properties, declaring variables, control flow, basic data types (int, float, complex, bool, string) and data type conversions, operators, some built-in functions
- Structured data types (lists, tuples, dictionaries, sets, frozen sets) and their methods, shallow and deep copy, defining functions (including lambda), variable scoping, a basic introduction to modules and introspection
- Files and directories handling: reading and writing files, relevant modules [os, sys, shutil, glob], path handling, command line arguments.
- The NumPy module: ndarrays, random numbers, input/output
- Showing data with matplotlib (Plotting lists of data, 1D and 2D histograms, scatter plots, density plots, animations).
- The SciPy module: integration, optimisation, fast Fourier transform, linear algebra
- Introduction to the pandas library
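
As a minimal illustration of the "shallow and deep copy" topic listed above, the following sketch uses only the standard library `copy` module. The `run` dictionary and its values are hypothetical, invented for this example and not taken from the course material.

```python
import copy

# Hypothetical nested measurement record, invented for illustration.
run = {"label": "run_01", "counts": [10, 12, 9]}

alias = run                 # no copy: both names refer to the same dict
shallow = copy.copy(run)    # new dict, but the inner list is still shared
deep = copy.deepcopy(run)   # new dict and an independent inner list

run["counts"].append(99)        # mutate the nested list in place
run["label"] = "run_01_edited"  # rebind a top-level key of the original

# The alias sees both changes, the shallow copy sees only the in-place
# change to the shared list, and the deep copy sees neither.
print(alias["label"], alias["counts"])
print(shallow["label"], shallow["counts"])
print(deep["label"], deep["counts"])
```

A deep copy is usually the safer choice when a raw data structure must stay untouched while a working copy is modified, at the cost of extra memory.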
# Physics cases
- Introduction to Machine Learning with the sklearn package and training on databases
- Astroproject on stellar dynamics. How to read the files of an N-body simulation and analyze them
# Advanced features
- Object Oriented Programming: classes, objects, attributes and methods
- Advanced features: decorators, iterators, generators, module creation
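
The decorator and generator topics listed above can be sketched with the standard library alone. The names `count_calls`, `square` and `running_mean` are hypothetical helpers chosen for this illustration, not part of the course material.

```python
import functools


def count_calls(func):
    """Decorator that counts how many times the wrapped function is called."""
    @functools.wraps(func)  # preserve the name and docstring of func
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def square(x):
    """Return x squared."""
    return x * x


def running_mean(values):
    """Generator yielding the mean of the values consumed so far."""
    total = 0.0
    for n, value in enumerate(values, start=1):
        total += value
        yield total / n


squares = [square(x) for x in range(4)]
means = list(running_mean(squares))

print(squares)       # [0, 1, 4, 9]
print(square.calls)  # 4
print(means[-1])     # 3.5
```

The generator computes each mean lazily, so it could also be fed a stream of values that does not fit in memory.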
- Learning Python – Mark Lutz – O’Reilly
- Python for Data Analysis – Wes McKinney – O’Reilly
The slides/Notebooks of each lesson will be given to the students.
The course alternates two types of sessions:
- lessons for the general introductory part on programming and scripting techniques aimed at physics data analysis.
- presentation of an exercise (for the introductory part of the course) or a physics case, after which the students have time to code and to discuss their choices and any results with the teacher or the other students.
The lessons are mostly delivered by means of slides and Jupyter Notebooks, an interactive tool that runs in a browser page on all the main operating systems.
To meet and discuss results and reports, write an email to the teacher.