Tags: text, AmazonScience/massive, multilingual, nlu, intent-classification, slot-filling, voice-assistant, benchmark, low-resource-languages, amazon, cc-by-4.0, parallel-corpus
MASSIVE: Multilingual NLU Dataset (51 Languages, 1M+ Utterances)
About this data
MASSIVE is a parallel multilingual NLU benchmark from Amazon Science: more than 1M utterances across 51 languages, annotated with 60 intents and 55 slot types. It was built by localizing utterances from the SLURP voice assistant dataset.
Schema
| Name | Type | Description |
|---|---|---|
| id | VARCHAR | Unique utterance identifier. |
| locale | VARCHAR | BCP 47 language-region code (e.g., en-US, ja-JP). |
| partition | VARCHAR | Dataset split: train, dev, or test. |
| scenario | BIGINT | Numeric identifier for high-level domain (e.g., alarm, music, weather). |
| intent | BIGINT | Numeric identifier for one of 60 intent classes (e.g., alarm_set). |
| utt | VARCHAR | Localized natural language utterance text. |
| annot_utt | VARCHAR | Utterance with inline slot annotations in [slot_type : value] format. |
| worker_id | VARCHAR | Anonymized identifier of the translator/annotator. |
| slot_method | STRUCT(slot VARCHAR[], "method" VARCHAR[]) | Per-slot localization method (translation, transcreation, etc.) paired with slot names. |
| judgments | STRUCT(worker_id VARCHAR[], intent_score TINYINT[], slots_score TINYINT[], grammar_score TINYINT[], spelling_score TINYINT[], language_identification VARCHAR[]) | Quality review scores (intent, slots, grammar, spelling) and language ID from multiple reviewers. |
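
The `annot_utt` column stores slots inline in `[slot_type : value]` format, so a small parser is usually needed before training a slot-filling model. The following is a minimal sketch, assuming slot types never contain a colon or brackets and that annotations are not nested; the function name `parse_annot_utt` and the example string are illustrative, not taken from the dataset or its tooling.

```python
import re

# Matches one inline annotation such as "[time : nine am]".
# Assumption: the slot type has no ':' or brackets, and annotations do not nest.
SLOT_PATTERN = re.compile(r"\[\s*([^\[\]:]+?)\s*:\s*([^\[\]]+?)\s*\]")


def parse_annot_utt(annot_utt):
    """Return (plain_text, slots) for an annotated utterance.

    plain_text is the utterance with the bracket markup removed;
    slots is a list of (slot_type, value) tuples in order of appearance.
    """
    slots = [(m.group(1), m.group(2)) for m in SLOT_PATTERN.finditer(annot_utt)]
    plain_text = SLOT_PATTERN.sub(lambda m: m.group(2), annot_utt)
    return plain_text, slots


# Illustrative input written for this example.
example = "wake me up at [time : nine am] on [date : friday]"
text, slots = parse_annot_utt(example)
print(text)   # wake me up at nine am on friday
print(slots)  # [('time', 'nine am'), ('date', 'friday')]
```

For rows without slots the function returns the utterance unchanged and an empty list, which in that case should match the `utt` column.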
Free
Open dataset
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "MASSIVE: Multilingual NLU Data" })
// Found: 587b3f62-51c8-4ff7-8a79-4895cf3c00aa
get_download_url({ dataset_id: "587b3f62-51c8-4ff7-8a79-4895cf3c00aa" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/587b3f62-51c8-4ff7-8a79-4895cf3c00aa/download-url
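
The same REST request can be made from Python using only the standard library. This is a rough sketch under stated assumptions: the endpoint path and dataset ID are copied from the curl command, but the response format is not documented on this page, so the code tries JSON and falls back to the raw body. The helper names `build_download_url_endpoint` and `fetch_download_info` are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.databazaar.io"
DATASET_ID = "587b3f62-51c8-4ff7-8a79-4895cf3c00aa"


def build_download_url_endpoint(dataset_id):
    """Build the download-url endpoint shown in the curl example."""
    return f"{API_BASE}/datasets/{dataset_id}/download-url"


def fetch_download_info(dataset_id, timeout=30):
    """Request the download URL for a free dataset (no API key sent).

    Assumption: the response shape is unknown, so this returns parsed
    JSON when the body is valid JSON and the raw text otherwise.
    """
    endpoint = build_download_url_endpoint(dataset_id)
    with urllib.request.urlopen(endpoint, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_download_info(DATASET_ID)` performs the network request; inspect the returned value before relying on any particular field name.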