Institute of Astronomy


Lent Term 2014

Sergey Koposov - Statistical Techniques in Python

We will discuss basic and advanced statistical techniques and tools, and their implementation in Python using existing packages. We will cover:

  • Introduction to Python
  • Maximum likelihood fitting and error determination
  • Markov chain Monte Carlo
  • Hierarchical modeling and hypothesis testing
  • Gaussian mixture modeling
  • Classification

The lecture slides and the Python code used in the lectures are available here.

Lecture 1

  • Introduction to Python
  • Python datatypes
  • Useful python packages and tools

Lecture 2

  • Introduction of the Bayesian framework
  • Maximum likelihood as an approximation of the full Bayesian analysis
  • Measuring errors from the Maximum Likelihood fits
  • Practical implementation of ML fitting in Python
  • Different ML optimizers in Python
  • Fitting data with outliers

Lecture 3

  • Markov chain Monte Carlo methods
  • Metropolis-Hastings algorithm
  • Python packages for performing MCMC analysis
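As a rough illustration of the Metropolis-Hastings algorithm listed above, here is a minimal random-walk sketch using only the standard library. It is not taken from the lecture code: the target density (a standard normal), the step size, the burn-in length and all function names are assumptions chosen for the example.

```python
import math
import random


def log_target(x):
    """Unnormalised log-density of a standard normal (illustrative target)."""
    return -0.5 * x * x


def metropolis_hastings(log_prob, x0, n_steps, step_size, rng):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    x = x0
    logp = log_prob(x)
    chain = []
    n_accepted = 0
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step_size)
        logp_new = log_prob(x_new)
        # Symmetric proposal, so the acceptance probability is min(1, p_new / p_old).
        accept_prob = math.exp(min(0.0, logp_new - logp))
        if rng.random() < accept_prob:
            x, logp = x_new, logp_new
            n_accepted += 1
        # The current state is recorded whether or not the move was accepted.
        chain.append(x)
    return chain, n_accepted / n_steps


rng = random.Random(42)  # fixed seed so the run is reproducible
chain, acceptance = metropolis_hastings(log_target, x0=3.0, n_steps=20000,
                                        step_size=2.4, rng=rng)
samples = chain[2000:]  # discard burn-in (length chosen arbitrarily here)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"acceptance={acceptance:.2f} mean={mean:.3f} var={var:.3f}")
```

The sample mean and variance should come out close to 0 and 1 for this target. Swapping `log_target` for the log-posterior of a real problem is the only change needed to reuse the sampler.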

Lecture 4

  • Mixture models
  • Gaussian Mixtures
  • Model selection (AIC, cross-validation)
  • Classification using mixture models



Mike Irwin - AstroStats

Lecture Slides

Lecture 1

  • Some illustrative "simple" problems
  • Foundations of probability theory and statistical techniques:
    • some issues when combining variables
    • Central Limit Theorem to the rescue
    • noise, covariance, multivariate Gaussians and Rayleigh distribution
    • aliasing and Shannon sampling
  • Introduction to Bayes' theorem
    • as a classifier illustrated with DNA example
    • as a way to use prior information to estimate future odds 
  • Bayesian view of model selection

Lecture 2

  • Bayesian application to source identification likelihoods
  • Introducing Maximum Likelihood estimators
    • using optimal detection of signals and cross-correlation as one example
    • and estimating velocity dispersions in clusters/galaxies as another
  • Maximum Likelihood estimators in general
    • some numerical considerations


Comparison of cross-correlation and likelihood methods on a real CaT region spectrum, using idealised templates and model atmosphere spectra. This illustrates the practicalities, how to estimate radial velocities with realistic errors, and the effect of correlated sample noise due to, e.g., rebinning the spectra.

Given a set of radial velocities with errors, estimate the systemic velocity and line-of-sight velocity dispersion of a dwarf galaxy, and hence estimate its mass-to-light ratio. What is the impact of using priors?
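One possible way to set up the velocity dispersion problem is sketched below. It assumes each measured velocity is drawn from a Gaussian whose variance is the sum of the intrinsic dispersion squared and the individual measurement error squared, and it maximises the likelihood by a simple grid search over the dispersion. The data are synthetic, and the true values, grid limits and function names are invented for the illustration. The mass-to-light ratio step and the effect of priors are not covered.

```python
import math
import random


def profile_log_like(sigma, v, e):
    """Log-likelihood maximised over the mean, for a fixed intrinsic dispersion.

    Assumed model: v_i ~ Normal(mu, sigma^2 + e_i^2), with known errors e_i.
    """
    var = [sigma * sigma + ei * ei for ei in e]
    w = [1.0 / s2 for s2 in var]
    # For fixed sigma the ML mean is the inverse-variance weighted mean.
    mu = sum(wi * vi for wi, vi in zip(w, v)) / sum(w)
    logl = -0.5 * sum(math.log(2.0 * math.pi * s2) + (vi - mu) ** 2 / s2
                      for vi, s2 in zip(v, var))
    return logl, mu


def fit_velocity_dispersion(v, e, sigma_max=30.0, n_grid=600):
    """Grid search over sigma in (0, sigma_max]; returns (mu_hat, sigma_hat)."""
    best = None
    for k in range(1, n_grid + 1):
        sigma = sigma_max * k / n_grid
        logl, mu = profile_log_like(sigma, v, e)
        if best is None or logl > best[0]:
            best = (logl, mu, sigma)
    return best[1], best[2]


# Synthetic data: the "true" values below are invented for the demonstration.
rng = random.Random(1)
true_mu, true_sigma = 50.0, 8.0  # km/s
errors = [rng.uniform(1.0, 5.0) for _ in range(500)]
velocities = [rng.gauss(true_mu, math.sqrt(true_sigma ** 2 + ei ** 2))
              for ei in errors]

mu_hat, sigma_hat = fit_velocity_dispersion(velocities, errors)
print(f"mu = {mu_hat:.2f} km/s, sigma = {sigma_hat:.2f} km/s")
```

The grid search is used only to keep the sketch dependency-free. A numerical optimiser would do the same job, and the shape of the profile likelihood around its maximum gives a first idea of the uncertainty on the dispersion.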

Some of the "simple" problems can also be used as examples.

Lecture 3

  • Maximum Likelihood estimators and Fisher information:
    • confidence intervals and morphing the parameter space
    • predicting parameter errors and fine tuning the parameter model
    • realistic error models and k-sigma clipping
  • Example of estimators
    • optimal spectral extraction
    • low count rates and Lynden-Bell's C-statistic

Lecture 4

  • Further examples of Maximum Likelihood estimators
    • sparse samples, null results and the Press-Schechter method
    • optimal astrometry in a single pass
    • structural analysis, or curve fitting with errors on both axes
  • Classical hypothesis testing compared with the Bayesian view
  • Multivariate analysis using PCA as an example of dimensional reduction
    [if not covered elsewhere and time permits]


Given the distances and coordinates of a population of Galactic halo dwarf galaxy satellites, what is the best-fit power-law model for this distribution? What is the effect of selection bias?

Given a list of Lyman limit detections, null results, etc., for a sample of QSOs, find the best-fit simple power-law model of the form N(z) = N_0 (1+z)^gamma and estimate the error on the predicted number of Lyman limit systems at redshift z = 3.
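A hedged sketch of how the Lyman limit problem might be posed as a Poisson likelihood fit follows. It assumes that N(z) is the number of systems per unit redshift, that each QSO was searched over a known redshift interval, and that the data are a count of systems per QSO (a null result being a count of zero). The data are synthetic, and all function names and numerical values are hypothetical. Only the point estimate at z = 3 is computed; the error estimate asked for in the problem is left out.

```python
import math
import random


def path_integral(gamma, z_lo, z_hi):
    """Integral of (1+z)^gamma over [z_lo, z_hi]; assumes gamma != -1."""
    p = gamma + 1.0
    return ((1.0 + z_hi) ** p - (1.0 + z_lo) ** p) / p


def fit_power_law(paths, counts, gamma_grid):
    """Poisson ML fit of N(z) = N0 (1+z)^gamma; returns (N0_hat, gamma_hat)."""
    total = sum(counts)
    best = None
    for gamma in gamma_grid:
        integrals = [path_integral(gamma, lo, hi) for lo, hi in paths]
        n0 = total / sum(integrals)  # analytic ML value of N0 at fixed gamma
        # Poisson log-likelihood without the gamma-independent log(k!) terms.
        logl = sum(k * math.log(n0 * i) - n0 * i
                   for k, i in zip(counts, integrals))
        if best is None or logl > best[0]:
            best = (logl, n0, gamma)
    return best[1], best[2]


def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


# Synthetic survey: the "true" parameters are invented for the demonstration.
rng = random.Random(7)
true_n0, true_gamma = 0.3, 1.5
paths = []
for _ in range(2000):
    z_lo = rng.uniform(1.0, 3.5)
    paths.append((z_lo, z_lo + rng.uniform(0.2, 0.8)))
counts = [poisson_draw(true_n0 * path_integral(true_gamma, lo, hi), rng)
          for lo, hi in paths]

gamma_grid = [0.02 * k for k in range(201)]  # gamma from 0 to 4
n0_hat, gamma_hat = fit_power_law(paths, counts, gamma_grid)
n_at_3 = n0_hat * (1.0 + 3.0) ** gamma_hat  # predicted systems per unit z at z = 3
print(f"N0 = {n0_hat:.3f}, gamma = {gamma_hat:.2f}, N(z=3) = {n_at_3:.2f}")
```

Null results enter the likelihood through the `-n0 * i` term alone, which is why they still constrain the fit. N0 and gamma are strongly correlated in this kind of fit, so the prediction at z = 3 is typically better determined than either parameter separately.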

PCA example using model atmosphere spectra, and its use in noise suppression [if time permits].

Page last updated: 3 March 2014 at 22:21