DMIR Research Group

    The Bib100 Dataset


    The Bib100 Evaluation Dataset contains 100 pairs of English words along with human-assigned relatedness judgments. It can be used for training and evaluating semantic relatedness measures.
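    This page does not say how a measure should be scored against the dataset. A common convention for relatedness benchmarks, assumed here, is Spearman's rank correlation between the human scores and a measure's scores for the same pairs. The following is a minimal, dependency-free sketch; `average_ranks`, `pearson`, and `spearman` are hypothetical helpers, not part of any Bib100 tooling.

```python
from collections.abc import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last index of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(human: Sequence[float], predicted: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two tie-aware rank vectors."""
    if len(human) != len(predicted) or len(human) < 2:
        raise ValueError("need two equally long sequences with at least two items")
    return pearson(average_ranks(human), average_ranks(predicted))
```

    With made-up scores, `spearman([8.5, 2.0, 6.1, 9.3], [0.71, 0.10, 0.55, 0.90])` is 1.0 up to floating-point rounding, because both lists order the pairs identically. Ranking rather than comparing raw values means a measure is not penalized for using a different scale than the 0 to 10 human ratings. If SciPy is available, `scipy.stats.spearmanr` computes the same tie-aware coefficient.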


    The 100 pairs are formed from 122 distinct English words, all drawn from the top 3,000 tags of the social tagging system BibSonomy.

    The relatedness scores were collected from 26 test subjects. Each subject was shown all word pairs in the dataset and rated the relatedness of each pair on a scale from 0 (unrelated) to 10 (synonymous).

    All scores were collected from native English speakers via the crowdsourcing platform MicroWorkers.
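    To compare a measure against the human judgments, the 26 individual ratings per pair are typically collapsed into a single gold score. This page does not specify how, or whether, the published file already does this; the sketch below assumes the individual ratings are available per pair and averages them, checking that each lies on the 0 to 10 scale. The `Ratings` layout and the `gold_scores` helper are hypothetical.

```python
from statistics import mean

# Hypothetical layout: each word pair maps to the list of individual ratings
# it received (the actual Bib100 file format is not described on this page).
Ratings = dict[tuple[str, str], list[float]]


def gold_scores(ratings: Ratings, lo: float = 0.0, hi: float = 10.0) -> dict[tuple[str, str], float]:
    """Average the per-subject judgments of each pair into one gold score."""
    gold = {}
    for pair, scores in ratings.items():
        if not scores:
            raise ValueError(f"no ratings for pair {pair}")
        for s in scores:
            # Judgments were given on a 0 (unrelated) to 10 (synonymous) scale.
            if not lo <= s <= hi:
                raise ValueError(f"rating {s} for {pair} outside [{lo}, {hi}]")
        gold[pair] = mean(scores)
    return gold
```

    The mean is only one choice; the median is a reasonable alternative if a few raters give outlying judgments.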


    The data are available for download: Bib100 dataset (4.3 kB)

    For questions, please contact Thomas Niebler.


    Andreas Hotho
    DMIR Research Group
    Am Hubland
    97074 Würzburg

    Tel.: +49 931 31-86731
    Fax: +49 931 31-86732


    Hubland Süd, Building M2