    Data Science Chair

    Our paper "Occupational Fraud Detection through Agent-based Data Generation" was accepted at the MIDAS workshop at ECML-PKDD 2023

    07/17/2023

    We design a multi-agent simulation that can produce company data that also contains hidden fraud cases.

    In this work, we propose a multi-agent simulation for generating company data that also includes fraud cases. The data is then used to set important hyperparameters of machine learning approaches without the need for expensive expert labels. We will present our work at the MIDAS workshop, held in conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023).

    Abstract

    Occupational fraud is a growing concern for enterprises and is estimated to cause losses of around 5% of company revenue each year. With the increasing amount of data tracked by companies through enterprise resource planning systems, recent research has taken an interest in the automated detection of occupational fraud. However, automated detection is hindered by the unavailability of labeled fraud cases, which require known occupational frauds within company data as well as costly expert annotation. Even though anomaly detection methods can be trained on unlabeled data, selecting the ideal preprocessing techniques, the most suitable model, and the optimal hyperparameters requires labeled data for evaluation. To alleviate this issue, we propose using multi-agent simulation to generate business processes according to best practices from economics and to create labeled synthetic data that closely matches a given unlabeled real-world dataset. We extend an existing simulation with functionality for including, tracking, and automatically labeling occupational fraud cases. Using this simulation, we propose a framework that decides on important design choices for fraud detection models on enterprise resource planning data without requiring labeled real-world data. We demonstrate in multiple experiments that the framework can aid automated occupational fraud detection through data generation.
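    The core idea of the framework is to make design choices, such as a detector's hyperparameters, by scoring candidates against labeled synthetic data instead of labeled real-world data. The Python sketch below illustrates only that general selection loop under strong simplifying assumptions, none of which come from the paper: `make_synthetic_data` is a hypothetical stand-in (2-D Gaussian points with an injected cluster of outliers), not the multi-agent simulation of ERP data; the detector is a plain k-nearest-neighbour distance score; and average precision is an assumed selection metric.

```python
import math
import random


def make_synthetic_data(n_normal=200, n_fraud=10, seed=0):
    """HYPOTHETICAL stand-in for simulated company data (not the paper's simulation).

    Normal records are 2-D points around the origin; injected "fraud" records
    form a small cluster far away. Labels: 1 = fraud, 0 = normal.
    """
    rng = random.Random(seed)
    normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_normal)]
    fraud = [(rng.gauss(8, 0.5), rng.gauss(8, 0.5)) for _ in range(n_fraud)]
    return normal + fraud, [0] * n_normal + [1] * n_fraud


def knn_scores(points, k):
    """Unsupervised anomaly score: mean distance to the k nearest other points."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def average_precision(scores, labels):
    """Average precision when records are ranked by descending anomaly score."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at each position holding a fraud case
    return total / max(hits, 1)


def select_k(candidates, points, labels):
    """Choose the k whose scores best rank the labeled synthetic fraud cases."""
    results = {k: average_precision(knn_scores(points, k), labels) for k in candidates}
    best = max(results, key=results.get)
    return best, results


if __name__ == "__main__":
    points, labels = make_synthetic_data()
    best_k, results = select_k([1, 5, 15, 30], points, labels)
    for k, ap in sorted(results.items()):
        print(f"k={k:>2}  average precision={ap:.3f}")
    print("selected k:", best_k)
```

    The labels are only used to compare candidate settings; the detector itself stays unsupervised, so the chosen `k` could then be applied to unlabeled real data. Clustered fraud cases, as in this toy generator, may mask each other when `k` is small, which is one example of a choice that is hard to make without some labeled data.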
