Institute of Astronomy

Data Mining the ALMA archive (Oliver Shorttle)

Project Title: Data mining the ALMA archive


Dr Oliver Shorttle, Mihkel Kama, Kaisey Mandel

Importance of the area of research concerned

The ability to obtain spatially resolved chemistry and kinematics of protoplanetary disks has revolutionised our picture of a key stage in the life of stars and planets. Coupled with the thousands of exoplanetary systems now discovered, and the detailed record we have of our own solar system's early history, this lets us investigate planet formation on a range of temporal and spatial scales. The ALMA radio interferometer has been key to this transformation, providing outstanding images of protoplanetary disks. The many terabytes of data produced by ALMA, with extremely high spectral and spatial resolution and wide bandwidth, are a major asset for the astrophysical community. Critically, ALMA has the potential to detect many new molecules that are important for understanding the chemistry of disks and their planets, and important work remains to be done interrogating new and existing datasets for the presence of such molecules.

Project Summary

Most ALMA proposals are highly specific in the questions they ask and answer. This means that a large fraction of the high-quality data generated by a single project is often surplus to that project's goals. The online data archive is being populated with a rapidly increasing flow of high-quality data that have not been analysed to their full potential for the presence of new and important chemical species that can constrain physical and chemical conditions in disks. Two notable techniques that have yet to receive wide deployment are a) matched filtering, which maximises signal-to-noise by assuming prior knowledge of the velocity structure of the target (Loomis et al. 2018), and b) multi-line stacking, in particular for complex molecules, which have many individually weak spectral features. While individual observations with ALMA rarely have the right spectral coverage to contain many features of an individual new species, the ALMA archive has accumulated data at many frequency settings for many objects of significant interest to the community. Tapping into these data manually is extremely labour-intensive. In this project, the student will develop machine learning techniques to automate deep exploration of the ALMA archive. These techniques will be of general use for astrophysical datasets and thus carry significant discovery and science-yield potential.
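As a rough illustration of the two techniques named above, the sketch below applies them to a synthetic one-dimensional spectrum. It is a simplified analogue under stated assumptions, not the method of Loomis et al. (2018) and not part of the project specification: the noise is assumed white and Gaussian with known standard deviation, the line profile is assumed known, and the function names (matched_filter_snr, stack_lines), line positions, and amplitudes are all hypothetical.

```python
import numpy as np


def matched_filter_snr(spectrum, template, noise_sigma):
    """Matched-filter response in units of sigma, assuming white Gaussian noise."""
    template = np.asarray(template, dtype=float)
    # Normalise so that pure noise of std noise_sigma gives unit-variance output.
    kernel = template / (noise_sigma * np.sqrt(np.sum(template**2)))
    # Cross-correlate the data with the template at every channel offset.
    return np.correlate(spectrum, kernel, mode="same")


def stack_lines(spectrum, line_channels, half_width):
    """Average equal-width windows centred on each expected line position.

    Assumes every window lies fully inside the spectrum.
    """
    windows = [spectrum[c - half_width : c + half_width + 1] for c in line_channels]
    return np.mean(windows, axis=0)


# Synthetic example: three weak lines (hypothetical positions) buried in noise.
rng = np.random.default_rng(0)
half_width = 5
template = np.exp(-0.5 * (np.arange(-half_width, half_width + 1) / 2.0) ** 2)
line_channels = [40, 90, 150]
spectrum = rng.normal(0.0, 1.0, 200)
for c in line_channels:
    spectrum[c - half_width : c + half_width + 1] += 1.0 * template

# Averaging N independent windows lowers the noise by a factor of sqrt(N).
stacked = stack_lines(spectrum, line_channels, half_width)
stacked_sigma = 1.0 / np.sqrt(len(line_channels))
response = matched_filter_snr(stacked, template, stacked_sigma)
print(f"peak matched-filter response on the stack: {response.max():.1f} sigma")
```

Stacking first and filtering second is only one possible ordering; it is used here because it keeps both steps visible. Real interferometric data would need the noise properties and line profiles to be estimated rather than assumed.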


A background in astrophysics is desirable but not required, while experience of, or interest in, statistical methods applied to large datasets is essential.  This Ph.D. is part of the STFC "Data Science" CDT programme at the University of Cambridge. The length of the Ph.D. is four years with six months, possibly as two three-month periods, spent on a placement in industry. The placement is expected to involve data science and will almost certainly be spent with Shell.


Loomis, R. A. et al. 2018. Detecting weak spectral lines in interferometric data through matched filtering. The Astronomical Journal, 155:182.


This studentship is open to Home and EU students only.

Application Information: 

Please follow the instructions given on the IoA's admissions page. You can then apply for the PhD in Astronomy on the Graduate Admissions website.

To apply for this studentship, include the text "I wish to be considered for the Data Mining the ALMA Archive with Oliver Shorttle" in Your Statement of Interest [Course specific questions section] of the application form.

It is also possible to be considered for an STFC-funded studentship simultaneously. To be considered for both, include the text "I wish to be considered for the Data Mining the ALMA Archive with Oliver Shorttle and an STFC studentship" in Your Statement of Interest [Course specific questions section] of the application form.

Application Deadline Date: 

3 January 2019

Page last updated: 26 August 2018 at 13:44