Data Science Chair

    ERP Fraud Data

    Open ERP System Data For Occupational Fraud Detection

    Occupational fraud is defined as the abuse of one's occupation through the deliberate misuse of an employing organization's assets. Companies are estimated to lose 5% of their revenue to occupational fraud each year.

    In our research project DeepScan, we aim to develop approaches that automatically detect this type of fraud in data recorded by Enterprise Resource Planning (ERP) systems, which track large amounts of information about company operations. Because companies guard ERP system data due to privacy and trade secrecy concerns, publicly available ERP system data is an important step toward reproducible and incremental progress in this domain.

    In our work, we propose a data generation strategy that uses an existing serious game, ERPsim, to generate synthetic ERP system data free of privacy and trade secret concerns. We additionally describe different occupational fraud cases and commit them during data generation.

    Here, we provide the data generated in five different runs of the ERPsim simulation. We offer both raw data and aggregated datasets that are ready for use with fraud detection algorithms, including machine learning approaches.
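
    As a rough illustration of the kind of simple baseline such data can be used with, the sketch below scores numeric values by their distance from the median, scaled by the median absolute deviation (MAD). It is not the method from our paper. The function name `mad_scores`, the threshold of 5.0, and the example amounts are assumptions made up for this illustration and are not taken from the dataset.

```python
from statistics import median


def mad_scores(values):
    """Return a robust outlier score per value: |x - median| / MAD."""
    med = median(values)
    # Median absolute deviation: the median of distances to the median.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (or at least half of them) coincide with the median;
        # no meaningful scale exists, so nothing is flagged.
        return [0.0 for _ in values]
    return [abs(v - med) / mad for v in values]


# Hypothetical amounts (NOT from the dataset); the last one is inflated.
amounts = [100.0, 102.0, 98.0, 101.0, 99.0, 500.0]
scores = mad_scores(amounts)

# Hypothetical threshold: flag entries more than 5 MADs from the median.
suspicious = [i for i, s in enumerate(scores) if s > 5.0]
print(suspicious)
```

    A median-based score is used here instead of a mean-based z-score because a single extreme value would otherwise inflate the scale it is measured against.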

    ERP fraud detection ERPsim dataset: Download (190MB)

    The paper can be found here: Link (arXiv)