Semi-Supervised Learning with Noisy Labels
05.06.2024
Annotated datasets can be very helpful, but if they are partially annotated incorrectly, this can lead to major problems. How can we handle such datasets? Can we detect how badly a dataset is annotated, and is there a way to use datasets whose quality is questionable?
Scope is customizable for Bachelor's and Master's theses.
In practice, datasets are often annotated partially incorrectly or not at all. Semi-supervised learning methods do not require every sample to be annotated. We would therefore like to test how semi-supervised methods behave on datasets with varying degrees of annotation error, and investigate whether it can be advantageous for the learning process to use only a certain proportion of the partially incorrect labels. Furthermore, we would like to investigate whether the results of our models with a variable proportion of labels allow conclusions to be drawn about the quality of the dataset.
Finally, the results can be tested on real gene data, which are inherently partially mislabeled.
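To make the planned experiments more concrete, the following is a minimal sketch of how the label sets for such a study could be prepared. It is one possible setup under stated assumptions, not a fixed part of the topic. It assumes symmetric label noise (a wrong label is drawn uniformly from the other classes) and uses -1 as the marker for "unlabeled". The function names `flip_labels`, `keep_fraction`, and `make_grid` are hypothetical and chosen only for illustration, as are the toy labels and the chosen noise rates and label fractions.

```python
import random

UNLABELED = -1  # assumed marker for "no annotation"


def flip_labels(labels, noise_rate, num_classes, rng):
    """Symmetric label noise: give a fixed share of samples a wrong class."""
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        # Draw uniformly from all classes except the current one.
        wrong = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(wrong)
    return noisy


def keep_fraction(labels, labeled_fraction, rng):
    """Keep only a share of the labels and mark the rest as unlabeled."""
    masked = [UNLABELED] * len(labels)
    n_keep = round(labeled_fraction * len(labels))
    for i in rng.sample(range(len(labels)), n_keep):
        masked[i] = labels[i]
    return masked


def make_grid(clean_labels, num_classes, noise_rates, labeled_fractions, seed=0):
    """Build one label vector per (noise rate, labeled fraction) pair."""
    rng = random.Random(seed)
    grid = {}
    for noise_rate in noise_rates:
        noisy = flip_labels(clean_labels, noise_rate, num_classes, rng)
        for fraction in labeled_fractions:
            grid[(noise_rate, fraction)] = keep_fraction(noisy, fraction, rng)
    return grid


# Toy labels for illustration only: 300 samples, 3 balanced classes.
clean = [i % 3 for i in range(300)]
grid = make_grid(
    clean,
    num_classes=3,
    noise_rates=[0.0, 0.2, 0.4],
    labeled_fractions=[0.1, 0.5, 1.0],
)
```

Each entry of `grid` could then be passed, together with the features, to a semi-supervised method of choice. Comparing the resulting accuracy across labeled fractions for a fixed noise rate is one way to examine whether using fewer of the noisy labels helps.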
Supervisor: Martin Rackl