Dataset Cards

Key Idea: What are Dataset Cards?

Hugging Face Dataset Cards are documentation that accompanies NLP datasets hosted on Hugging Face. They alert users to potential biases in a dataset and so promote responsible use of the data for ML. Like Datasheets for Datasets, Dataset Cards document the provenance, creation, and use of ML datasets; unlike datasheets, they are displayed in the Hugging Face interface and are built into the process of uploading a dataset to the Hub.

Fun Fact! Dataset Cards were inspired by Model Cards, proposed by Mitchell and colleagues, which we will cover in the next module!
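
To make the contents of a card concrete, here is a minimal Python sketch that checks which top-level sections a card contains. It assumes the card is a Markdown file with an optional metadata block between `---` lines, and that the top-level section names follow the Hugging Face dataset card template; both are assumptions worth checking against the current template. The sample card, `split_front_matter`, and `missing_sections` are hypothetical names made up for this illustration and are not part of any Hugging Face library.

```python
import re

# Top-level sections of the dataset card template (assumed here;
# check the current template before relying on this list).
TEMPLATE_SECTIONS = [
    "Dataset Description",
    "Dataset Structure",
    "Dataset Creation",
    "Considerations for Using the Data",
    "Additional Information",
]

# Hypothetical, deliberately incomplete card used only for illustration.
SAMPLE_CARD = """\
---
language:
- en
license: cc-by-sa-4.0
---
# Dataset Card for Example Corpus

## Dataset Description
A toy corpus of English sentence pairs.

## Dataset Structure
Each record has a premise, a hypothesis, and a label.

## Considerations for Using the Data
### Discussion of Biases
Annotators were recruited from a single platform.
"""


def split_front_matter(card_text):
    """Return (metadata_block, body); metadata_block is '' if absent."""
    match = re.match(r"^---\n(.*?)\n---\n", card_text, flags=re.DOTALL)
    if match is None:
        return "", card_text
    return match.group(1), card_text[match.end():]


def missing_sections(card_text):
    """List template sections that have no '## ' heading in the card body."""
    _, body = split_front_matter(card_text)
    # Only level-2 headings count; '###' subsections are ignored.
    found = set(re.findall(r"^## (.+?)\s*$", body, flags=re.MULTILINE))
    return [name for name in TEMPLATE_SECTIONS if name not in found]


print(missing_sections(SAMPLE_CARD))
# prints ['Dataset Creation', 'Additional Information']
```

A check like this only shows that a heading exists. Whether the section actually documents provenance or discusses biases still needs a human reader.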

Explore: Dataset Card Creator

ML practitioners and dataset creators or curators can create their own dataset card with the Dataset Card Creator, a web application built with React, a JavaScript library for building user interfaces.

Explore the application and read more about dataset cards here.

Explore: SNLI Dataset Card

The Stanford Natural Language Inference (SNLI) corpus (version 1.0) is a collection of 570,000 human-written English sentence pairs, each manually labeled as entailment, contradiction, or neutral.

Explore its dataset card on Hugging Face!