XAI Evaluation Datasets

Boolean XAI evaluation datasets

In our paper "Evaluation of post-hoc XAI approaches through synthetic tabular data", we introduce an evaluation setting based on synthetic data to investigate which explainable artificial intelligence (XAI) approaches correctly explain the decision making of deep neural networks that solve basic Boolean functions. Because we find that explaining even these datasets is no trivial task for the investigated XAI approaches, we publish the generated synthetic data as benchmark datasets.

Each dataset contains 12 Boolean features, with every possible combination of feature values included exactly once, resulting in 2^12 = 4096 data samples per dataset. The labels 'y' are generated as described in the paper, with the first columns being used for label calculation (e.g. for the XOR dataset the label is calculated as y = f1 XOR f2). The 'explanation' column contains the relevant feature columns for each data sample, according to the definition given in the paper.
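The construction described above can be illustrated with a minimal sketch. It is not the generation code used for the published datasets: the column names f1 to f12 and the helper name make_xor_dataset are assumptions made for illustration, and the 'explanation' column is left out because its definition is given only in the paper.

```python
from itertools import product

N_FEATURES = 12  # number of Boolean features, as stated in the text


def make_xor_dataset(n_features=N_FEATURES):
    """Enumerate every combination of Boolean feature values exactly once.

    Column names f1..f12 and this function name are hypothetical; the
    published files may use different names.
    """
    names = [f"f{i}" for i in range(1, n_features + 1)]
    rows = []
    for bits in product((0, 1), repeat=n_features):
        row = dict(zip(names, bits))
        # XOR dataset: the label depends only on the first two features.
        row["y"] = row["f1"] ^ row["f2"]
        rows.append(row)
    return rows


rows = make_xor_dataset()
print(len(rows))  # 2**12 = 4096 samples
```

Enumerating the full truth table rather than sampling means every combination appears exactly once, so the label is balanced for XOR: half of the 4096 rows have y = 1.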

You can download the datasets using this link.

Publication

Evaluation of post-hoc XAI approaches through synthetic tabular data. Tritscher, Julian; Ring, Markus; Schlör, Daniel; Hettinger, Lena; Hotho, Andreas. In International Symposium on Methodologies for Intelligent Systems. Springer, 2020.
Abstract:
Evaluating the explanations given by post-hoc XAI approaches on tabular data is a challenging prospect, since the subjective judgement of explanations of tabular relations is non trivial in contrast to e.g. the judgement of image heatmap explanations. In order to quantify XAI performance on categorical tabular data, where feature relationships can often be described by Boolean functions, we propose an evaluation setting through generation of synthetic datasets. To create gold standard explanations, we present a definition of feature relevance in Boolean functions. In the proposed setting we evaluate eight state-of-the-art XAI approaches and gain novel insights into XAI performance on categorical tabular data. We find that the investigated approaches often fail to faithfully explain even basic relationships within categorical data.

@inproceedings{tritscher2020evaluation,
  abstract = {Evaluating the explanations given by post-hoc XAI approaches on tabular data is a challenging prospect, since the subjective judgement of explanations of tabular relations is non trivial in contrast to e.g. the judgement of image heatmap explanations. In order to quantify XAI performance on categorical tabular data, where feature relationships can often be described by Boolean functions, we propose an evaluation setting through generation of synthetic datasets. To create gold standard explanations, we present a definition of feature relevance in Boolean functions. In the proposed setting we evaluate eight state-of-the-art XAI approaches and gain novel insights into XAI performance on categorical tabular data. We find that the investigated approaches often fail to faithfully explain even basic relationships within categorical data.},
  author = {Tritscher, Julian and Ring, Markus and Schlör, Daniel and Hettinger, Lena and Hotho, Andreas},
  booktitle = {International Symposium on Methodologies for Intelligent Systems},
  keywords = {xai},
  organization = {Springer},
  title = {Evaluation of post-hoc XAI approaches through synthetic tabular data},
  year = 2020
}
