Mini Section:

Data Statements for NLP

Key Idea: What Are Data Statements for NLP?

Data Statements for Natural Language Processing (NLP) are documented characterizations of a dataset. They provide the context needed to judge how well experimental NLP results generalize, how software can be appropriately deployed, and what biases might be present in systems built on that software.

For a conceptual primer on NLP, see here and here!

Read: The Concept Paper

Read the Data Statements for NLP concept paper written by Emily Bender and Batya Friedman.

Consider the similarities between data statements and other dataset disclosures you have learned about thus far (datasheets for datasets, dataset cards, dataset nutrition labels).

Cite as: Bender EM, Friedman B. Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. TACL. 2018;6:587-604. doi:10.1162/tacl_a_00041

Explore: Data Statements for NLP Reading Summary

On his website, Morgan Klaus Scheuerman offers an excellent high-level summary of the Data Statements for NLP concept paper.

Review the summary and pay particular attention to the condensed data statement schema and provided definitions.

Cite as: Scheuerman MK. Summary of Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. Morgan Klaus.
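
To make the schema more concrete, here is a minimal sketch of a data statement as a simple record. The field names follow the section headings proposed in the Bender and Friedman concept paper. The class name `DataStatement`, the helper `missing_sections`, and the example values are illustrative assumptions, not part of the paper or of any published tool.

```python
from dataclasses import dataclass, fields


@dataclass
class DataStatement:
    """Hypothetical container; one free-text field per schema section."""
    curation_rationale: str = ""     # why these texts were selected
    language_variety: str = ""       # language tag plus a prose description
    speaker_demographic: str = ""    # who produced the language
    annotator_demographic: str = ""  # who labeled the data
    speech_situation: str = ""       # time, place, modality, intended audience
    text_characteristics: str = ""   # genre, topic, structure
    recording_quality: str = ""      # relevant to audio or video data
    other: str = ""
    provenance_appendix: str = ""    # statements for any source datasets


def missing_sections(statement: DataStatement) -> list[str]:
    """Return the names of sections that are still empty."""
    return [
        f.name
        for f in fields(statement)
        if not getattr(statement, f.name).strip()
    ]


# Example with made-up values for an imaginary dataset.
draft = DataStatement(
    curation_rationale="Forum posts sampled to study informal requests.",
    language_variety="en-US; informal written English from online forums",
)
print(missing_sections(draft))
```

A completeness check like this only flags empty sections. It cannot judge whether a written section is informative, which still requires human review.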