The Competition

Welcome to our competition! Before you delve deeper into it, here is an overview:

Type of learning task:

The task is designed as a supervised/semi-supervised learning task (although we do not impose any restriction on the type of learning), and more specifically as a multi-target probabilistic regression problem.


Each training datapoint contains an atmospheric spectrum with 52 data points (stored in SpectralData.hdf5). Each data point combines an intensity measure (transit depth), its associated wavelength of light, the size of the wavelength bin (i.e. the spectral resolution) and the corresponding observed measurement uncertainty. Additionally, each example has 9 auxiliary stellar and planetary parameters. For more details on what a spectroscopic observation is and what we are modelling, please go to the Science page.

See also the Data Formats section for a detailed description of the data.
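As a rough illustration of the shapes described above, the sketch below builds one flat feature vector per example from a spectrum and its auxiliary parameters. It uses random placeholder arrays rather than the real file: the array names, the ordering of the four values per spectral point, and the idea of flattening everything into one vector are assumptions made for illustration only. The actual layout of SpectralData.hdf5 is described in the Data Formats section.

```python
import numpy as np

N_EXAMPLES = 4   # hypothetical small batch, for illustration only
N_BINS = 52      # spectral data points per spectrum (from the overview)
N_AUX = 9        # auxiliary stellar and planetary parameters (from the overview)

rng = np.random.default_rng(0)

# Placeholder data. Assumed layout of the last axis (not confirmed by the
# competition files): wavelength, bin width, transit depth, uncertainty.
spectra = rng.random((N_EXAMPLES, N_BINS, 4))
aux = rng.random((N_EXAMPLES, N_AUX))


def build_features(spectra, aux):
    """Flatten each spectrum and append its auxiliary parameters."""
    flat = spectra.reshape(spectra.shape[0], -1)   # (n, 52 * 4)
    return np.concatenate([flat, aux], axis=1)     # (n, 52 * 4 + 9)


features = build_features(spectra, aux)
print(features.shape)  # (4, 217)
```

Flattening is only one option. Models that exploit the ordering of the wavelength bins may prefer to keep the spectrum as a 2-D array.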


The goal depends on which track you decide to participate in (we welcome participation in both tracks!):

For any given test datapoint, you are asked to predict the same set of atmospheric properties from the observed spectrum (temperature and the log-abundance of five atmospheric gases: H2O, CO2, CH4, CO and NH3):

  • Light Track: Predict the 16th, 50th and 84th percentiles of each atmospheric property

  • Regular Track: Predict the conditional joint distribution (or the Bayesian posterior distribution) of the 6 atmospheric targets (same as above). We require only the traces of the distribution, not its analytical form.
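The two tracks are closely related: given a set of samples for the six targets, the three Light Track percentiles can be read off each target's marginal distribution. The sketch below assumes a trace is an array of shape (n_samples, 6). The target names, the column order, and the synthetic Gaussian samples are placeholders, and the required submission format is the one given in the Data Formats section, not this one.

```python
import numpy as np

# Hypothetical column order for the six targets (an assumption).
TARGETS = ["T", "log_H2O", "log_CO2", "log_CH4", "log_CO", "log_NH3"]


def summarize_trace(trace):
    """Reduce samples of shape (n_samples, 6) to per-target percentiles.

    Returns an array of shape (3, 6) whose rows are the 16th, 50th and
    84th percentiles of each target's marginal distribution.
    """
    return np.percentile(trace, [16, 50, 84], axis=0)


# Synthetic stand-in for a model's posterior samples for one test example.
rng = np.random.default_rng(0)
trace = rng.normal(size=(5000, len(TARGETS)))

summary = summarize_trace(trace)
for name, (lo, med, hi) in zip(TARGETS, summary.T):
    print(f"{name}: {lo:.2f} / {med:.2f} / {hi:.2f}")
```

The 16th and 84th percentiles bracket the central 68% of the samples, which for a Gaussian marginal corresponds to roughly one standard deviation either side of the median.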

See the Data Formats section of the Documentation page for a detailed description.


The models will be evaluated on a separate test set. The test set examples will be provided to you, and you are expected to upload your model’s predictions for them. The ground truth for the test set examples will be made available after the end of the competition.

See the Scoring System section of the Documentation page for a detailed description.

Restrictions on models:

There is no restriction on the models, algorithms or data preprocessing techniques, nor on the programming languages, environments or tools used for your implementation. You are also free to use data augmentation techniques, pretrained models or any prior domain knowledge not included in the provided dataset. Finally, you are free to choose your own way of splitting the training data between training and validation sets and to use as many of the provided datapoints or features as you wish – or can handle.
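Since the split between training and validation data is left to participants, one simple option is a seeded random hold-out. The sketch below is only an example of that choice: the function name, the 20% validation fraction and the example count of 1000 are all hypothetical, and other schemes (such as k-fold cross-validation) are equally valid.

```python
import numpy as np


def train_val_split(n_examples, val_fraction=0.2, seed=0):
    """Return shuffled, disjoint index arrays for training and validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_examples)
    n_val = int(round(n_examples * val_fraction))
    return order[n_val:], order[:n_val]


# Hypothetical dataset size, used only to show the call.
train_idx, val_idx = train_val_split(1000, val_fraction=0.2)
print(len(train_idx), len(val_idx))  # 800 200
```

Fixing the seed keeps the split reproducible, so validation scores stay comparable between experiments.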

Closing date:

The competition will close on the 10th of October 2022, with the winners announced two weeks later.

Why should I bother:

Besides the joy of helping advance science, making it easier to detect distant worlds and decipher their atmospheres, and solving a challenging applied data science problem, the two top-ranked solutions each win free registration to NeurIPS 2022, along with a $2000 prize for the Regular Track and a $1000 prize for the Light Track. The second prize in each track is $500. The winners will also be invited to the next Ariel Conference and invited to participate in further collaboration on the more general problem of atmospheric retrieval.