text, rajpurkar/squad_v2, question-answering, reading-comprehension, extractive-qa, rag-evaluation, nlp-benchmark, wikipedia, english, fine-tuning

SQuAD 2.0 - Stanford Question Answering Dataset

Category: Text
Records: 142,192 rows
Format: PARQUET
Update Frequency: One-time snapshot
Collection Method: auto_imported_huggingface_federated
PII: None detected
File Size: ~16.9 MB
Downloads: 0

About this data

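SQuAD 2.0 is a reading comprehension benchmark of questions posed on Wikipedia articles. The full benchmark has over 150K questions, including more than 50K unanswerable ones; this snapshot contains 142,192 rows. It is a standard dataset for training and evaluating extractive QA models.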

Schema

Name | Type | Description
id | VARCHAR | Unique identifier for the question-passage pair in SQuAD 2.0.
title | VARCHAR | Wikipedia article title from which the passage was extracted.
context | VARCHAR | Full Wikipedia passage text that may contain the answer to the question.
question | VARCHAR | Natural language question about the passage; may be answerable or adversarially unanswerable.
answers | STRUCT("text" VARCHAR[], answer_start INTEGER[]) | Object containing 'text' (list of answer spans) and 'answer_start' (character offsets in context); empty if unanswerable.
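The way the answers field encodes answerable and unanswerable questions can be sketched in a few lines. This is a minimal illustration, assuming each row has already been loaded into a Python dict shaped like the schema above; the helper names `is_answerable` and `check_spans` and both sample rows are hypothetical and not part of the dataset or any library.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # one record with the fields listed in the schema


def is_answerable(row: Row) -> bool:
    """A question is treated as answerable when 'text' holds at least one span."""
    return len(row["answers"]["text"]) > 0


def check_spans(row: Row) -> List[bool]:
    """For each answer, check that context[start:start+len(text)] equals the text."""
    context = row["context"]
    answers = row["answers"]
    return [
        context[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    ]


# Hypothetical rows, written by hand for illustration only.
_context = "The Normans gave their name to Normandy, a region in France."

answerable_row: Row = {
    "id": "example-1",
    "title": "Normans",
    "context": _context,
    "question": "In what country is Normandy located?",
    # The character offset is computed here rather than hard-coded.
    "answers": {"text": ["France"], "answer_start": [_context.index("France")]},
}

unanswerable_row: Row = {
    "id": "example-2",
    "title": "Normans",
    "context": _context,
    "question": "Who conquered Normandy in 1999?",
    # Unanswerable questions carry empty lists in both sub-fields.
    "answers": {"text": [], "answer_start": []},
}
```

Checking offsets against the context this way is a cheap sanity test before fine-tuning, since extractive QA models are trained on the character spans rather than on the answer strings alone.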


Free

Open dataset

Seller: DataBazaar

For AI Agents

Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}

# 2. Your agent can then call:
search_datasets({ query: "SQuAD 2.0 - Stanford Question " })
// Found: 0a379db4-4ab9-467f-a801-c871dfacfe48
get_download_url({ dataset_id: "0a379db4-4ab9-467f-a801-c871dfacfe48" })  // free — no API key needed

Via REST API
# Free dataset — no API key required:
curl https://api.databazaar.io/datasets/0a379db4-4ab9-467f-a801-c871dfacfe48/download-url
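
The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: the host and path are copied from the curl example above, the helper names `download_url_endpoint` and `fetch_download_url_response` are hypothetical, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request

# Host and dataset ID are taken from the curl example; nothing else is assumed.
API_BASE = "https://api.databazaar.io"
DATASET_ID = "0a379db4-4ab9-467f-a801-c871dfacfe48"


def download_url_endpoint(dataset_id: str, api_base: str = API_BASE) -> str:
    """Build the endpoint that the curl example calls."""
    return f"{api_base}/datasets/{dataset_id}/download-url"


def fetch_download_url_response(dataset_id: str, timeout: float = 30.0) -> str:
    """GET the endpoint and return the raw response body as text.

    The response schema is an unknown here, so no parsing is attempted.
    """
    url = download_url_endpoint(dataset_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_download_url_response(DATASET_ID)` performs the network request; inspect the returned text before relying on any particular field in it.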