Welcome to the Statistical Data Analysis (PHY-328) Home Page
The Module Organiser is Dr Adrian Bevan, and the Deputy Module Organiser is Prof. Steve Lloyd.
Office Hours: Wednesdays 12:00-13:00.
Recommended books:
- Statistical data analysis for the physical sciences (Cambridge University Press) by A. Bevan. ISBN-10: 1107670349
Background Reading:
Detailed lecture notes are provided to support lecture sessions. Students are encouraged to review the lecture notes before the lectures as preparation.
The following resources may be useful for students wishing to delve deeper into the subject.
- Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences (Manchester Physics Series) by R. Barlow. ISBN-10: 0471922951
- Statistical Data Analysis (Oxford science publications) by G. Cowan. ISBN-10: 0198501552
- Statistical Methods in Experimental Physics (World Scientific) by F. James. ISBN-10: 9812705279
- BaBar Analysis School Lectures on Multivariate Analysis, SLAC National Laboratory, Stanford, California, USA (2009) by A. Bevan. Notes and examples are available online at http://pprc.qmul.ac.uk/~bevan/BAS/.
Please note that there are a number of reading assignments that will also help you develop an appreciation of the subject.
Aims and Objectives
A student passing this course should be suitably equipped to appreciate the meaning of the word measurement in a scientific context, and understand how to translate raw data into a robust measurement, or to otherwise interpret the data with reference to a given hypothesis. They will be prepared to use data analysis techniques in future research, whether in a project assignment, industry or future graduate studies. The material learned through this module could also be of benefit in a non-physics environment that uses similar techniques, such as financial modelling or industrial research.
Syllabus
This course will review basic metrics and techniques used to describe ensembles of data, such as averages, variances, standard deviations, errors and error propagation. These will be extended to treat multi-dimensional problems and circumstances where observables are correlated with one another. The Binomial, Poisson, and Gaussian distributions will be discussed, with emphasis on physical interpretation in terms of events. Concepts of probability, confidence intervals, limits, and hypothesis testing will be developed. Optimisation techniques will be introduced, including chi^2 minimisation and maximum-likelihood techniques. A number of multivariate analysers (sample discriminants) will be discussed in the context of data mining. These will include Fisher discriminants, multi-layer perceptron based artificial neural networks, decision trees and genetic algorithms.
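As an informal illustration of the first syllabus topics (averages, standard deviation, and error propagation), the following is a minimal Python sketch. It is not part of the course material: the function names `summarise` and `product_with_error`, and the sample data, are hypothetical and chosen only for this example. The propagation formula assumes that the two inputs are uncorrelated and non-zero.

```python
import math
import statistics


def summarise(values):
    """Return (mean, sample standard deviation, standard error on the mean)."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return mean, std, std / math.sqrt(n)


def product_with_error(x, sigma_x, y, sigma_y):
    """Propagate errors through f = x * y, assuming x and y are uncorrelated and non-zero."""
    f = x * y
    # Relative errors add in quadrature for a product of uncorrelated quantities.
    sigma_f = abs(f) * math.sqrt((sigma_x / x) ** 2 + (sigma_y / y) ** 2)
    return f, sigma_f


# Hypothetical repeated measurements of a length in cm (illustrative values only).
lengths = [10.1, 9.9, 10.0, 10.2, 9.8]
mean, std, sem = summarise(lengths)
print(f"mean = {mean:.3f}, std = {std:.3f}, standard error = {sem:.3f}")

# Hypothetical area from two uncorrelated side lengths with their uncertainties.
area, sigma_area = product_with_error(2.0, 0.1, 3.0, 0.3)
print(f"area = {area:.2f} +/- {sigma_area:.2f}")
```

When observables are correlated, the quadrature sum above no longer holds and a covariance term is needed, which is the extension to correlated observables mentioned in the syllabus.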
Tutorials
A number of tutorial sessions will be run during the course. These are aimed at providing time for students to work through problems in a supervised way, and to get support should they have any questions. The tutorial question sheets can be obtained using the links below. Solutions to tutorials are also provided as an additional learning resource; they are not a substitute for attending tutorials.
- R tutorial notes
- Tutorial Exercise Sheet 1 (Solutions)
- Tutorial Exercise Sheet 2 (Solutions)
- Tutorial Exercise Sheet 3 (Solutions)
- Tutorial Exercise Sheet 4 (Solutions)
- Tutorial Exercise Sheet 5 (Solutions)
- Tutorial Exercise Sheet 6 (Solutions)
- Tutorial Exercise Sheet 7 (Solutions)
- Tutorial Exercise Sheet 8 (Solutions)
- Tutorial Exercise Sheet 9 (Solutions)
These tutorials are aimed at preparing students for the homework assignments and for the end of year exam. Detailed solutions are provided as a learning aid to help students understand the process of recognising the techniques required to solve a given problem. If you have questions about how to approach a problem, or want to talk in more detail about a solution, you are encouraged to talk with the module organiser or one of the graduate student demonstrators, either during a tutorial or during office hours.
Tools and Computing
The problems encountered in this course do not require the use of computers. However, some problems lend themselves to the use of a spreadsheet or a calculator, should you choose to use one. Note: If you use software to help you solve homework problems, then a printed copy of any spreadsheet or program used should be provided along with your written solutions as evidence for the calculation.
If you are interested in using computer programs to aid statistical data analysis for independent projects and are seeking advice here, please note that some basic functionality exists in Mathematica. For students working on particle physics based projects: you may find that the ROOT data analysis toolkit is appropriate for your needs, and you should seek advice from your project supervisor. A basic comprehension of C++ is required in order to use ROOT.
Several undergraduate projects use R in order to process data. This is an open source language developed to facilitate statistical data processing, and a brief tutorial on the use of R is provided as part of the course for students who wish to learn how to use R in greater depth.
Marking Scheme
Course Assessment: 20% of the marks for this course will come from homework assignments; the remaining 80% will come from an exam. You are encouraged to work through past papers in order to test your understanding of course material. Please note that any instance of plagiarism will be dealt with in accordance with college regulations; you will find more information on this in the student handbook.
Reading Assignments
This course includes discussion sessions based on scientific articles produced over the past century. The aim of these exercises is to expose students to a condensed set of information, where they are encouraged to understand the information from the perspective of the methods and techniques used, and how conclusions are drawn in different circumstances. These discussions are useful in helping one develop a deeper understanding of the taught material, and how different aspects of statistical data analysis can come together in the real world. The following articles are used as examples in this course. They are available for download from the preprint archive or journals that College subscribes to directly. You should be able to obtain any of the following journal papers for free from a College computer, and in accordance with APS copyright regulations, hard copies will be distributed to you in class.
- Note on the nature of cosmic-ray particles: Phys. Rev. 51, 884–886 (1937).
- Measurement of the neutrino velocity with the OPERA detector in the CNGS beam: arXiv:1109.4897.
- Evidence for Oscillation of atmospheric neutrinos: Phys. Rev. Lett. 81, 1562–1567 (1998).
Homework
The reading assignments constitute an informal set of homework. Students will be required to read and discuss a number of papers in the context of data analysis techniques used.
In addition to the reading assignments there are traditional homework exercises that students will be expected to work through. These can be obtained below:
- Homeworks (and solutions) to be posted.
The homework problem assignments are issued on Fridays and are due in on Monday afternoon nine days after the issue date.
Please note that detailed solutions are provided as part of the feedback system for this course. If you do not get the correct solution for a problem, and it is not clear from the marked script how to reach it, you are expected to read through the solutions (once they are made available) to piece together your understanding. If you have any questions about the solutions once you have read them, please consult the module organiser.
Lecture Notes
The lecture notes for this course are provided in logical units based on taught material. Please select the topics of interest from the following list. If there are particular areas of the course that you would like to cover early on (for example, if you need to understand something for an independent project you are doing), please raise this with the module organiser so that they can assist you.
- Sets
- Probability
- Visualising and quantifying data
- Useful distributions
- Uncertainty and errors
- Confidence intervals
- Hypothesis testing
- Fitting
- Multivariate analysis techniques (please note that there is also a short lecture course that may be of interest as ancillary learning material)
- Appendices (appendices are available on "useful" pdfs, numerical integration, and reference tables. A glossary of terms is included.)
- Revision Lecture notes: Bayes' Theorem, Weighted Averages, Poisson limits, and constraining model parameters
Please note that the Q-Review lecture capture system will be used in order to provide you with a video record of lectures so that you can review material fully, or simply catch up on any material that you may have missed during the course. These will be accessible via QM Plus.
Past Exam Papers
Past Exam papers should be available from the library. You can also download the following example papers to work through.
Note: this course is in its third year, and all available past papers have been posted on this web page.