RESEARCH

A.I. and Astrophysics


Creating A Virtual Research Assistant for the New Generation of Sky Surveys

Schmidt A.I. in Science Fellowship

Over the last decade, humanity has developed an unprecedented capability to monitor the night sky and track its changing features: from Near-Earth Asteroids to Supernovae and other stellar explosions, the number of transient events detected every night has already surpassed our ability to properly follow up on them.

Explosive transients are particularly important to categorise because the various “flavours” they come in can inform different areas of Astrophysics, from the expansion of the Universe to the origin of the elements we need to create planets like the Earth.

My Schmidt A.I. Fellowship will address this need by creating a Virtual Research Assistant (V.R.A.) that will help humans classify new events night after night. Working in partnership with Professor Stephen Smartt (Astrophysics) and Professor Stephen Roberts (Machine Learning), I aim to create a first working prototype of the V.R.A. before the new Vera Rubin Observatory opens its dome in 2024.

Fitting Supernova Light-Curves with Gaussian Processes

This is a summary of the take-home points from my recent study on Gaussian Processes applied to supernova light-curves, in which we derive physical properties of these light-curves and make inferences about their progenitors. If you want to read the full paper, click the button below.

What Are Gaussian Processes (in the context of supernovae)?

Gaussian Processes (GPs) are a powerful machine learning method that can approximate a function of unknown form with a multivariate Gaussian distribution. In the context of fitting supernova (SN) light-curves, the function underlying our data has one independent variable (time) and one dependent variable (magnitude). The GP model’s approximation of the observed magnitudes (m), given our observation dates (t), can then be written as a multivariate (N-dimensional) Gaussian distribution with mean function mu(t) and covariance matrix K.
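Written out in symbols (this is just the standard GP prior, using the notation from the text):

```latex
\mathbf{m} \sim \mathcal{N}\!\big(\mu(\mathbf{t}),\, K\big),
\qquad K_{ij} = k(t_i, t_j)
```

where k is the covariance kernel that relates any pair of observation dates (more on kernels below).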

In essence, each data point is its own Gaussian, and training the GP model amounts to learning how they all relate to each other by optimising mu(t) and K to best fit the data. The optimised model can subsequently be used to infer new magnitude values for our choice of dates. This makes GPs a powerful fitting and re-sampling method on two accounts: firstly, we do not need a physical or analytical model; secondly, the GP naturally provides uncertainties on the inferred data points.
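To make this concrete, here is a minimal sketch of fitting and re-sampling a light-curve with scikit-learn (one of the libraries listed in the Resources below). The data here are synthetic, not real SN photometry, and the kernel and hyperparameter choices are illustrative only:

```python
# Minimal GP fit/re-sample sketch (synthetic data, illustrative settings).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Fake "light-curve": magnitudes at 30 irregular observation dates.
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0, 100, 30))                      # days
mag = 18.0 + 0.001 * (t - 20.0) ** 2 + rng.normal(0, 0.05, t.size)

# Matern 3/2 kernel (nu=1.5), a common choice for SN data.
kernel = Matern(length_scale=20.0, nu=1.5)
gp = GaussianProcessRegressor(kernel=kernel, alpha=0.05**2,
                              normalize_y=True)
gp.fit(t[:, None], mag)                                   # train on the data

# Re-sample onto a regular grid; the GP gives uncertainties for free.
t_new = np.linspace(0, 100, 200)
mu, std = gp.predict(t_new[:, None], return_std=True)
```

Note how the two advantages mentioned above appear directly in the code: no physical model is specified anywhere, and `predict(..., return_std=True)` returns an uncertainty for every inferred point.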

What are the important caveats when it comes to supernova Light-curves?

When training your GP model on a light-curve, one of the most important decisions you will make is choosing a type of covariance kernel, that is, the type of function that will relate any two points in our data to each other. There are a few “basic” (as in, often-used) kernels such as the Radial Basis Function (RBF) kernel, the Matern 3/2 kernel (used a lot on SN data), periodic kernels (e.g. if you have an eclipsing binary), or the White Noise and Bias kernels, which can be useful if you want to model your noise as well as your signal. One of the key parameters optimised during training is the length scale of these kernels, which essentially reflects how quickly our model expects to see large variations (or “squiggles”) in the data.

The problem? These basic kernels work under the assumption that the length scale of the data is the same across the whole time series: they assume that significant changes will occur on the same sort of time-scale across the board. But we know that is not the case for SN light-curves. Stripped-envelope core-collapse supernovae and Type Ia SNe have an early rise to peak and initial decay that occur on much shorter time-scales than the later tail. Some Type IIP SNe show a very steady plateau, then a plunge once the hydrogen recombines, then a slow decay tail. That behaviour is at odds with the expectations of common kernels.

What are the consequences? In general, your GP model will fit the early light-curve well and then overfit the later times (speaking of non-plateau SNe here), because it expects variations on shorter time-scales than the physics provides: it will end up fitting the noise (or just wiggling for no apparent reason).

Can you use GPs to fit SN light-curves? You can. Whether or not you should is a different matter. When it comes to the basic kernels (which are easy to implement with the publicly available python GP libraries), you need to take what you are doing with a grain of salt: if you are just focusing on the early light-curve and the data are well sampled, then you’ll get a good result. But in that case I would ask why you’re even using a GP in the first place - is there not a simpler way to achieve your fitting/interpolation? As a rule of thumb, use the simplest model that works. Adding complexity (as in, choosing a more complex model) that does not add value or information is counter-productive, and it does not make your fits more believable.

Another interesting avenue would be to use or create a non-stationary kernel, one that does NOT assume that your data evolve on the same sort of time-scale across your whole time series. I have not ventured that way for time reasons, but I would encourage other astronomers to look into it if they’re interested. Maybe find a statistician in your department and bribe them with cake and coffee.

Beware the kernel choice. Even amongst the basic kernels, the choice of kernel and how you combine them (yes, you can combine them!) will make the biggest difference in how well your model fits (and overfits) your data. I won’t expand on this in this summary, but there is a lot more discussion of the topic in the paper.
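For readers unfamiliar with kernel combination: sums and products of valid kernels are themselves valid kernels, so in scikit-learn you can build composite kernels with ordinary `+` and `*`. The combination below is an illustrative sketch, not the specific setup used in the paper:

```python
# Combining kernels: an amplitude-scaled Matern 3/2 for short-time-scale
# structure, plus a long-length-scale RBF for the slow tail, plus a
# white-noise term. Illustrative choices only.
from sklearn.gaussian_process.kernels import (
    RBF, Matern, WhiteKernel, ConstantKernel)

kernel = (ConstantKernel(constant_value=1.0) * Matern(length_scale=20.0, nu=1.5)
          + RBF(length_scale=100.0)
          + WhiteKernel(noise_level=0.01))

# All four hyperparameters (amplitude, two length scales, noise level)
# are optimised jointly during training.
print(kernel)
```

The resulting composite object is passed to `GaussianProcessRegressor` exactly like a single kernel.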

Beware the implementation. One of the other key take-aways from this study is that the kernels you choose won’t necessarily lead to the same results in different python implementations of GP fitting. These libraries are extremely valuable tools, and very user friendly. As such they are black-boxy, and as astronomers we, for the most part, lack the stats background to fully understand what is going on under the hood unless we spend hours and hours reading about it (which we often don’t have time to do). So always maintain a degree of scepticism towards your trained machine learning algorithm (GP or otherwise); testing different libraries to see what results come out of each can be interesting and informative. And if you find that someone’s setup or choices don’t work for you, it could be because of the libraries you have chosen to use.

Resources

GP libraries used in this work: scikit-learn | GPy

Extra Reading:

Explore kernels and get some intuition for what they do [https://peterroelants.github.io/posts/gaussian-process-kernels/]

Extended introduction to GPs aimed at astronomers [Chapter 3.4 of McAllister (2017)]

In depth statistical view of GPs [textbook by Rasmussen & Williams (2006)]