    Data Science Chair


    The Bib100 Dataset

    Overview

    The Bib100 Evaluation Dataset contains 100 pairs of English words along with human-assigned relatedness judgments. It can be used for training and testing of semantic relatedness measures.

    Description

    The 100 pairs are composed of 122 English words and were collected from the top 3000 tags of the social tagging system BibSonomy.

    The relatedness scores were collected from 26 test subjects. Each test subject was shown all word pairs from this dataset and had to judge the relatedness on a scale of 0 (unrelated) to 10 (synonymous).

    All scores were collected from native English speakers, using the crowdsourcing platform MicroWorkers.
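
    One common way to test a relatedness measure against judgments like these is to average the ratings for each pair and compute the Spearman rank correlation between those averages and the measure's scores. The sketch below illustrates that idea only. Averaging over raters, the use of Spearman correlation, the in-memory data layout, and all word pairs and numbers are assumptions made for illustration; none of them is taken from the Bib100 file.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical ratings on the 0-10 scale (invented, NOT from Bib100).
human_ratings = {
    ("web", "internet"): [9, 8, 10],
    ("java", "programming"): [7, 8, 6],
    ("music", "database"): [1, 0, 2],
    ("linux", "cooking"): [0, 1, 0],
}

# Hypothetical output of some relatedness measure for the same pairs.
measure_scores = {
    ("web", "internet"): 0.9,
    ("java", "programming"): 0.6,
    ("music", "database"): 0.2,
    ("linux", "cooking"): 0.1,
}

pairs = sorted(human_ratings)
gold = [mean(human_ratings[p]) for p in pairs]
predicted = [measure_scores[p] for p in pairs]
rho = spearman(gold, predicted)
print(f"Spearman correlation: {rho:.3f}")
```

    Rank correlation is used here because the human scale (0 to 10) and a measure's output range usually differ, so only the ordering of the pairs is compared.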

    Download

    The data are available at: Bib100 dataset (4.3 kB)

    For any questions, contact Thomas Niebler.