Data Science Chair


    The Bib100 Dataset


    The Bib100 Evaluation Dataset contains 100 pairs of English words, each annotated with human relatedness judgments. It can be used to train and evaluate semantic relatedness measures.
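
Relatedness measures are commonly evaluated against human judgments by computing Spearman's rank correlation between the measure's scores and the human scores for the same pairs. The page above does not prescribe an evaluation protocol, so the following is only a minimal, dependency-free sketch of that common approach. The function names `average_ranks`, `pearson`, and `spearman` and the sample scores are hypothetical and are not part of the dataset.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(human: Sequence[float], predicted: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(human) != len(predicted) or len(human) < 2:
        raise ValueError("need two equal-length sequences of at least 2 scores")
    return pearson(average_ranks(human), average_ranks(predicted))


# Hypothetical scores for four pairs: human ratings on 0-10, a measure on 0-1.
print(spearman([9.0, 2.5, 6.0, 0.5], [0.8, 0.3, 0.5, 0.1]))  # 1.0
```

Because Spearman's rho depends only on the ranking of the pairs, the measure's scores do not need to share the 0 to 10 scale of the human judgments.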


    The 100 pairs are built from 122 distinct English words, all taken from the top 3,000 tags of the social tagging system BibSonomy.

    The relatedness scores were collected from 26 test subjects. Each subject was shown every word pair in the dataset and rated its relatedness on a scale from 0 (unrelated) to 10 (synonymous).
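
Before evaluation, the individual judgments for each pair are usually reduced to a single gold score. The page does not state how Bib100's scores are aggregated or how the file is laid out, so this sketch simply assumes that the ratings for each pair are available as a list and averages them after checking that they lie on the 0 to 10 scale. The function `gold_scores` and the placeholder pairs are hypothetical.

```python
from statistics import mean

# Rating scale described above: 0 = unrelated, 10 = synonymous.
SCALE_MIN, SCALE_MAX = 0, 10


def gold_scores(
    ratings: dict[tuple[str, str], list[float]],
) -> dict[tuple[str, str], float]:
    """Average each pair's ratings (assumed aggregation) after range checks."""
    gold = {}
    for pair, scores in ratings.items():
        if not scores:
            raise ValueError(f"no ratings for {pair}")
        for s in scores:
            if not SCALE_MIN <= s <= SCALE_MAX:
                raise ValueError(
                    f"rating {s} for {pair} is outside {SCALE_MIN}-{SCALE_MAX}"
                )
        gold[pair] = mean(scores)
    return gold


# Placeholder pairs and made-up ratings from three (not 26) raters.
example = {("tag_a", "tag_b"): [10, 8, 9], ("tag_c", "tag_d"): [0, 1, 2]}
print(gold_scores(example))
```

The mean is only one possible choice; a median would be less sensitive to individual outlier ratings.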

    All scores were collected from native English speakers via the crowdsourcing platform MicroWorkers.


    The data are available here: Bib100 dataset (4.3 kB)

    For any questions, please contact Thomas Niebler.