Internal
    Data Science Chair

    Data-efficient learning model on the intelligent bridge dataset

    07.06.2023

    As state-of-the-art machine learning methods in many fields rely on ever larger datasets, storing those datasets and training models on them becomes significantly more expensive. In sensor-based infrastructure monitoring, large amounts of data are collected from the installed sensor networks. The goal of the master's thesis is to propose a training set synthesis model for data-efficient learning that learns to condense the 6 TB intelligent bridge dataset into a small set of informative synthetic samples for training deep neural networks.

    The task can be formulated as a gradient matching problem: the gradients of the network weights obtained when training a deep neural network on the synthetic data should match those obtained when training on the original data. The model should further be evaluated in neural architecture search to show the advantages and disadvantages of its use under limited memory and compute.
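
    The gradient matching idea can be illustrated with a minimal sketch. It is not the thesis method: a linear regression model stands in for the deep neural network so that all gradients can be written analytically, the squared Euclidean distance is chosen for simplicity (it is not necessarily the distance used in DC), and the data are random toy values rather than bridge sensor data. All function names (`loss_gradient`, `matching_loss`, `matching_grad`, `condense`) are hypothetical.

```python
import numpy as np

def loss_gradient(X, y, w):
    """Gradient of the MSE loss 1/(2n) * ||X w - y||^2 with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def matching_loss(X_syn, y_syn, X_real, y_real, w):
    """Half squared distance between synthetic and real loss gradients at w."""
    d = loss_gradient(X_syn, y_syn, w) - loss_gradient(X_real, y_real, w)
    return 0.5 * float(d @ d)

def matching_grad(X_syn, y_syn, X_real, y_real, w):
    """Analytic gradient of matching_loss with respect to the inputs X_syn."""
    m = len(y_syn)
    r = X_syn @ w - y_syn  # residuals on the synthetic set
    d = loss_gradient(X_syn, y_syn, w) - loss_gradient(X_real, y_real, w)
    return (np.outer(r, d) + np.outer(X_syn @ d, w)) / m

def condense(X_real, y_real, n_syn=5, steps=200, lr=0.01, seed=0):
    """Learn n_syn synthetic inputs whose loss gradients mimic the real ones."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise the synthetic set from randomly chosen real samples
    # and keep the synthetic targets fixed; only the inputs are optimised.
    idx = rng.choice(len(y_real), size=n_syn, replace=False)
    X_syn, y_syn = X_real[idx].copy(), y_real[idx].copy()
    for _ in range(steps):
        # Match gradients at randomly sampled model weights.
        w = rng.normal(size=X_real.shape[1])
        X_syn -= lr * matching_grad(X_syn, y_syn, X_real, y_real, w)
    return X_syn, y_syn

# Toy "large" dataset: 200 samples, 3 features (purely illustrative values).
rng = np.random.default_rng(42)
X_real = rng.normal(size=(200, 3))
y_real = X_real @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

X_syn, y_syn = condense(X_real, y_real)
print(X_syn.shape, y_syn.shape)
```

    For a deep network the analytic gradient in `matching_grad` would have to be replaced by automatic differentiation through the network's weight gradients, which this sketch deliberately avoids.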

    Commonly used methods for data-efficient learning, such as continual learning and active learning, have two main shortcomings: (1) they rely on heuristics that do not guarantee an optimal solution for the downstream task, and (2) they depend on the presence of representative samples, which is not guaranteed either. The Dataset Distillation (DD) method goes beyond these limitations by modelling the network parameters as a function of the synthetic training data. Building on DD, the Dataset Condensation (DC) method was developed with a focus on learning to synthesize informative samples that are optimized for training neural networks on downstream tasks. DC has been shown to learn a small set of “condensed” synthetic samples such that a deep neural network trained on them not only achieves similar performance to a network trained on the large training data, but also converges to a nearby solution in the network parameter space. Nevertheless, most work so far has focused on image datasets. The thesis should therefore transfer the knowledge of DC and comparable methods to the time-series data of the intelligent bridge dataset.
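
    One common way to write the two ideas down is sketched here. The notation is an assumption and not part of the original announcement: $\mathcal{T}$ is the large training set, $\mathcal{S}$ the synthetic set, $\mathcal{L}^{\mathcal{T}}$ and $\mathcal{L}^{\mathcal{S}}$ the training losses on each, $\theta$ the network parameters, and $D$ a distance between gradients.

```latex
% DD: network parameters as a function of the synthetic data (bilevel problem)
\theta^{\mathcal{S}}(\mathcal{S}) = \arg\min_{\theta} \mathcal{L}^{\mathcal{S}}(\theta),
\qquad
\mathcal{S}^{*} = \arg\min_{\mathcal{S}} \mathcal{L}^{\mathcal{T}}\bigl(\theta^{\mathcal{S}}(\mathcal{S})\bigr)

% DC: match the gradients along the training trajectory \theta_t
\min_{\mathcal{S}} \; \mathbb{E}_{\theta_0}\!\left[ \sum_{t} D\bigl(\nabla_{\theta}\mathcal{L}^{\mathcal{S}}(\theta_t),\; \nabla_{\theta}\mathcal{L}^{\mathcal{T}}(\theta_t)\bigr) \right]
```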

    Supervisor:  Melanie Schaller
