    Data Science Chair

    Data Weighting for Environmental Data


    When working with global, gridded datasets, we often have data of varying quality. For example, the commonly used ERA5 dataset is a reanalysis that combines model simulations with observations to estimate the optimal state, a process that introduces errors.

    At our chair, we run multiple projects that work with environmental data, such as we4bee and BigData@Geo2.0. Such gridded datasets often also have missing values, for example because the observing satellite was unavailable or the data was not recorded properly.

    In this work, you will research ways to incorporate data quality flags into model training. For samples that the data source flags as high quality (e.g., because they were not covered by clouds), the model should be tuned to perform better than on “plain” data.
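    One simple way to make a model favour flagged samples is a per-sample weighted loss. The following is a minimal sketch, assuming a boolean quality flag per grid cell and missing targets stored as NaN; the function name `quality_weighted_mse` and the parameter `high_quality_weight` are hypothetical and not taken from any specific dataset or project.

```python
import numpy as np


def quality_weighted_mse(y_true, y_pred, quality_flag, high_quality_weight=2.0):
    """Mean squared error in which samples flagged as high quality count more.

    quality_flag: boolean array, True where the data source marks the sample
    as high quality (e.g. cloud-free). high_quality_weight is a hypothetical
    hyperparameter chosen for illustration.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    flag = np.asarray(quality_flag, dtype=bool)

    # Per-sample weights: 1 for "plain" samples, higher for flagged ones.
    weights = np.where(flag, high_quality_weight, 1.0)

    # Skip missing targets (NaN), e.g. grid cells without a valid observation.
    valid = ~np.isnan(y_true)
    if not valid.any():
        raise ValueError("no valid targets")

    sq_err = (y_true[valid] - y_pred[valid]) ** 2
    w = weights[valid]
    # Divide by the weight sum so the loss stays on the scale of a plain MSE.
    return float(np.sum(w * sq_err) / np.sum(w))


y_true = np.array([1.0, 2.0, np.nan, 4.0])
y_pred = np.array([1.5, 2.0, 3.0, 3.0])
flags = np.array([True, False, True, False])
loss = quality_weighted_mse(y_true, y_pred, flags)
```

    Here the flagged cell with squared error 0.25 counts twice, giving a loss of 0.375 instead of the unweighted 0.4167. Normalising by the weight sum rather than the sample count is a design choice that keeps losses comparable across batches with different shares of flagged cells.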

    The DenseLoss approach can serve as a starting point here; further re-weighting schemes are to be explored in this work.
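    To illustrate the idea of density-based re-weighting, the following is a minimal sketch in the spirit of DenseLoss: target values that occur rarely receive larger weights than common ones. The Gaussian KDE with Silverman's bandwidth, the min-max scaling of the density, and the parameters `alpha` and `eps` are assumptions for illustration; this is not a reference implementation of DenseLoss, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kde_density(y, bandwidth=None):
    """Evaluate a 1-D Gaussian kernel density estimate of y at every point of y."""
    y = np.asarray(y, dtype=float).ravel()
    n = y.size
    if bandwidth is None:
        # Silverman's rule of thumb (an assumption; other bandwidth rules work too).
        bandwidth = 1.06 * np.std(y) * n ** (-1 / 5)
    if bandwidth <= 0:
        bandwidth = 1.0  # degenerate case: all targets identical
    diffs = (y[:, None] - y[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)


def density_weights(y, alpha=1.0, eps=1e-6):
    """Weight samples inversely to how common their target value is.

    alpha: strength of the re-weighting (alpha = 0 gives uniform weights).
    eps: lower bound so that no sample gets a zero or negative weight.
    """
    dens = gaussian_kde_density(y)
    # Min-max scale the density to [0, 1].
    span = dens.max() - dens.min()
    dens_scaled = (dens - dens.min()) / span if span > 0 else np.zeros_like(dens)
    w = np.maximum(1.0 - alpha * dens_scaled, eps)
    # Rescale so the mean weight is 1 and the loss keeps its usual magnitude.
    return w / w.mean()


y = np.array([0.0, 0.1, 0.2, 0.1, 5.0])
w = density_weights(y, alpha=0.5)
```

    The rare value 5.0 receives the largest weight, and the resulting weights can be multiplied into any per-sample loss. Because the pairwise evaluation costs O(n²) memory, large gridded datasets would need a binned estimate or `scipy.stats.gaussian_kde` fitted on a subsample.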

    Supervisor: Pascal Janetzky
