The Competition

Welcome to our competition! Before you delve deeper into it, here is an overview:

Type of learning task:

The task is designed as a supervised or semi-supervised learning task (although we do not impose any restriction on the type of learning), and more specifically as a multi-target probabilistic regression problem.


Each training example contains an atmospheric spectrum with 52 spectral data points (stored in SpectralData.hdf5). Each spectral data point combines an intensity measure (transit depth), its associated wavelength of light, the width of the wavelength bin (i.e. the spectral resolution) and the corresponding measurement uncertainty. Additionally, each example has 9 auxiliary stellar and planetary parameters. For more details on what a spectroscopic observation is and what we are modelling, please go to the Science page.

See also the Data Formats section for a detailed description of the data.
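As a rough illustration of the shapes described above, the sketch below packs one example into a single feature vector. It assumes the spectrum has already been read into a 52 x 4 array (transit depth, wavelength, bin width, uncertainty) and the auxiliary parameters into a length-9 array; the column order, the helper name `to_feature_vector` and the synthetic values are assumptions made for illustration, and the actual file layout is the one given in the Data Formats section.

```python
import numpy as np

N_WAVELENGTHS = 52    # spectral data points per example (from the overview)
N_SPECTRAL_COLS = 4   # transit depth, wavelength, bin width, uncertainty (assumed order)
N_AUX = 9             # auxiliary stellar and planetary parameters


def to_feature_vector(spectrum, aux):
    """Flatten one example into a 1-D feature vector (hypothetical helper)."""
    spectrum = np.asarray(spectrum, dtype=float)
    aux = np.asarray(aux, dtype=float)
    if spectrum.shape != (N_WAVELENGTHS, N_SPECTRAL_COLS):
        raise ValueError(f"unexpected spectrum shape {spectrum.shape}")
    if aux.shape != (N_AUX,):
        raise ValueError(f"unexpected auxiliary shape {aux.shape}")
    # Row-major flattening keeps the four values of each spectral point together.
    return np.concatenate([spectrum.ravel(), aux])


# Synthetic placeholder values, not real competition data.
rng = np.random.default_rng(0)
example_spectrum = rng.random((N_WAVELENGTHS, N_SPECTRAL_COLS))
example_aux = rng.random(N_AUX)

features = to_feature_vector(example_spectrum, example_aux)
print(features.shape)  # (217,) = 52 * 4 + 9
```

Flattening is only one possible input representation; a model that treats the spectrum as a sequence could keep the 52 x 4 array as it is.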


For each test datapoint, you are asked to predict the following 7 atmospheric properties from the observed spectrum (planet radius, temperature and the log-abundance of five atmospheric gases: H2O, CO2, CO, CH4 and NH3):

  • Task: Predict the conditional joint distribution (or the Bayesian posterior distribution) of the 7 atmospheric targets listed above. We require only the traces of the distribution, not its analytical form.

See the Data Formats section of the Documentation page for a detailed description.
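To make the idea of submitting traces rather than an analytical form concrete, here is a minimal sketch that reads "traces" as samples drawn from the predicted distribution. It assumes, purely for illustration, that a model outputs a mean vector and covariance matrix for a 7-dimensional Gaussian; the Gaussian form, the target names, the helper `gaussian_traces` and the number of samples are all assumptions, and the required file format is the one described in the Data Formats section.

```python
import numpy as np

# Illustrative labels for the 7 targets named in the overview.
TARGETS = [
    "planet_radius", "temperature",
    "log_H2O", "log_CO2", "log_CO", "log_CH4", "log_NH3",
]


def gaussian_traces(mean, cov, n_samples=1000, seed=0):
    """Draw samples (traces) from an assumed multivariate Gaussian posterior."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one joint sample of all 7 targets, so correlations
    # between targets encoded in `cov` are preserved.
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Placeholder prediction for a single test spectrum (not real values).
mean = np.zeros(len(TARGETS))
cov = 0.1 * np.eye(len(TARGETS))

traces = gaussian_traces(mean, cov, n_samples=500)
print(traces.shape)  # (500, 7)
```

Any model that can be sampled from would serve equally well here; the Gaussian is used only because it is simple to sample jointly.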


The models will be evaluated on a separate test set. The test set will be provided to you, and you are expected to upload your model’s predictions for it.

See the Scoring System section of the Documentation page for a detailed description.

Restrictions on models:

There is no restriction on the models, algorithms or data preprocessing techniques, nor on the programming languages, environments or tools used for your implementation. You are also free to use data augmentation techniques, pretrained models or any prior domain knowledge not included in the provided dataset. Finally, you are free to choose your own way of splitting the training data between training and validation sets and to use as many of the provided datapoints or features as you wish – or can handle.

Closing date:

The competition will close on the 18th of June 2023; see the timeline for the most up-to-date information.

Why should I bother:

Besides the joy of helping advance science, making it easier to detect distant worlds and decipher their atmospheres, and solving a challenging applied data science problem, the prize for the three top-ranked solutions is free registration to ECML-PKDD or the monetary equivalent. The winners will also be invited to the next Ariel Conference and to participate in further collaboration on the more general problem of atmospheric retrieval.