Tags: text, microsoft/ms_marco, question-answering, passage-ranking, retrieval, rag, ir-benchmark, msmarco, bing, english
MS MARCO — Question Answering & Passage Ranking
About this data
Microsoft's MS MARCO dataset: 1M+ real Bing questions with human-generated answers and passage relevance judgments. Foundational benchmark for retrieval, RAG, and QA systems.
Schema
| Name | Type | Description |
|---|---|---|
| answers | VARCHAR[] | Human-written natural-language answer(s) to the query. |
| passages | STRUCT(is_selected INTEGER[], passage_text VARCHAR[], url VARCHAR[]) | List of ~10 candidate passages with text, source URL, and binary relevance label (1=selected, 0=not selected). |
| query | VARCHAR | Natural-language question from anonymized Bing user logs. |
| query_id | INTEGER | Unique integer identifier for the question. |
| query_type | VARCHAR | Question category: DESCRIPTION, NUMERIC, ENTITY, LOCATION, or PERSON. |
| wellFormedAnswers | VARCHAR[] | Rewritten well-formed answers (present on subset of rows). |
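The `passages` column stores three parallel lists rather than a list of records, so the relevance label at index `i` belongs to the text and URL at the same index. A minimal Python sketch of pulling out the selected passages follows. It assumes a row has already been loaded as a plain dict with `passages` as a dict of lists; the helper name `selected_passages` and the example row are invented for illustration and are not taken from the dataset.

```python
from typing import Dict, List, Tuple


def selected_passages(row: Dict) -> List[Tuple[str, str]]:
    """Return (passage_text, url) pairs whose is_selected label is 1."""
    p = row["passages"]
    # The three lists are aligned by position, so zip walks them together.
    return [
        (text, url)
        for flag, text, url in zip(p["is_selected"], p["passage_text"], p["url"])
        if flag == 1
    ]


# Invented row for illustration only; field names follow the schema table.
example_row = {
    "query_id": 1,
    "query": "what is the boiling point of water",
    "query_type": "NUMERIC",
    "answers": ["100 degrees Celsius at sea level."],
    "wellFormedAnswers": [],
    "passages": {
        "is_selected": [0, 1],
        "passage_text": [
            "Water is a liquid at room temperature.",
            "Water boils at 100 degrees Celsius at sea level.",
        ],
        "url": ["https://example.com/a", "https://example.com/b"],
    },
}

print(selected_passages(example_row))
```

If the data is loaded through a tool that exposes the STRUCT differently (for example as a tuple or a nested object), only the three lookups on `p` would need to change.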
Free
Open dataset
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "MS MARCO — Question Answering " })
// Found: 28fcfdda-bc59-4d35-afe1-64d077455e84
get_download_url({ dataset_id: "28fcfdda-bc59-4d35-afe1-64d077455e84" }) // free — no API key needed
Via REST API
# Free dataset — no API key required:
curl https://api.databazaar.io/datasets/28fcfdda-bc59-4d35-afe1-64d077455e84/download-url
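The same REST request can be issued from Python using only the standard library. This is a rough sketch, not an official client: the function names are hypothetical, and because the listing does not document the response format, the body is returned as raw text instead of being parsed.

```python
import urllib.request

API_BASE = "https://api.databazaar.io"
DATASET_ID = "28fcfdda-bc59-4d35-afe1-64d077455e84"


def download_url_endpoint(dataset_id: str, base: str = API_BASE) -> str:
    """Build the download-url endpoint shown in the curl example."""
    return f"{base}/datasets/{dataset_id}/download-url"


def fetch_download_url_response(dataset_id: str, timeout: float = 30.0) -> str:
    """GET the endpoint and return the raw response body as text.

    The response schema is not documented in the listing, so no
    parsing is attempted here.
    """
    endpoint = download_url_endpoint(dataset_id)
    with urllib.request.urlopen(endpoint, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_download_url_response(DATASET_ID)` performs the network request; inspect the returned text before deciding how to extract the actual download link.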