Tags: text, CohereLabs/aya_collection_language_split, multilingual, instruction-tuning, sft, aya, cohere, parquet, low-resource-languages, apache-2.0, rag, fine-tuning
Aya Collection (Language-Split) — 513M Multilingual Instruction Instances
About this data
Cohere Labs' Aya Collection, re-uploaded with per-language splits. It contains 513M multilingual instruction-tuning instances across 115+ languages, stored as Parquet and licensed under Apache-2.0.
Schema
| Name | Type | Description |
|---|---|---|
| id | BIGINT | Unique integer identifier for each instruction instance. |
| inputs | VARCHAR | Instruction or prompt text in the target language. |
| targets | VARCHAR | Expected response or completion text. |
| dataset_name | VARCHAR | Source sub-dataset name within Aya Collection (e.g., AfriQA-inst, templated NLP task). |
| sub_dataset_name | VARCHAR | Finer-grained source identifier for the instance. |
| task_type | VARCHAR | Task category such as question-answering, translation, summarization, classification, or generation. |
| template_id | BIGINT | Integer index of the template used when the row was generated from a base NLP dataset. |
| language | VARCHAR | ISO 639 language code of the instance (e.g., wol for Wolof). |
| split | VARCHAR | Data split designation: train, validation, or test. |
| script | VARCHAR | Writing system used for the language (e.g., Latn for Latin script). |
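As a rough illustration of how the schema above could be used once rows are loaded as Python dictionaries, the sketch below checks a row against the listed columns and filters by `language` and `split`. The mapping of BIGINT to `int` and VARCHAR to `str`, the tolerance for null values, the helper names `schema_errors` and `select_rows`, and the placeholder row are all assumptions made for this example; they are not part of the dataset or of any official tooling.

```python
from typing import Any, Iterable

# Column names and types copied from the schema table
# (assumption: BIGINT maps to int, VARCHAR maps to str).
SCHEMA: dict[str, type] = {
    "id": int,
    "inputs": str,
    "targets": str,
    "dataset_name": str,
    "sub_dataset_name": str,
    "task_type": str,
    "template_id": int,
    "language": str,
    "split": str,
    "script": str,
}


def schema_errors(row: dict[str, Any], allow_null: bool = True) -> list[str]:
    """Return a list of problems found in the row; an empty list means it fits."""
    errors: list[str] = []
    for name, expected in SCHEMA.items():
        if name not in row:
            errors.append(f"missing column: {name}")
            continue
        value = row[name]
        if value is None:
            # Assumption: nulls are tolerated unless the caller says otherwise.
            if not allow_null:
                errors.append(f"null value: {name}")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for {name}: {type(value).__name__}")
    return errors


def select_rows(
    rows: Iterable[dict[str, Any]], language: str, split: str
) -> list[dict[str, Any]]:
    """Keep rows whose `language` and `split` columns match exactly."""
    return [
        r for r in rows
        if r.get("language") == language and r.get("split") == split
    ]


# Hypothetical placeholder row for illustration only, not real data.
example_row = {
    "id": 0,
    "inputs": "<prompt text>",
    "targets": "<response text>",
    "dataset_name": "AfriQA-inst",
    "sub_dataset_name": "<sub-dataset>",
    "task_type": "question-answering",
    "template_id": 0,
    "language": "wol",
    "split": "train",
    "script": "Latn",
}
```

Calling `schema_errors(example_row)` on the placeholder returns an empty list, and `select_rows(rows, "wol", "train")` would keep only Wolof training rows from whatever iterable of row dictionaries is passed in.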
Free
Open dataset
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "Aya Collection (Language-Split" })
// Found: b8f4974a-7536-4a76-99ee-5a72b5adce39
get_download_url({ dataset_id: "b8f4974a-7536-4a76-99ee-5a72b5adce39" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/b8f4974a-7536-4a76-99ee-5a72b5adce39/download-url
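The same request can be sketched in Python using only the standard library. The host, path, and dataset ID come from the curl command above; the function names `download_url_endpoint` and `fetch_download_url_response` are hypothetical, and because the response format is not described in this listing, the sketch returns the raw body rather than assuming any particular structure.

```python
import urllib.request
from urllib.parse import quote

# Host taken from the curl command in this listing.
API_BASE = "https://api.databazaar.io"

# Dataset ID shown in this listing.
AYA_DATASET_ID = "b8f4974a-7536-4a76-99ee-5a72b5adce39"


def download_url_endpoint(dataset_id: str) -> str:
    """Build the endpoint used by the curl example for a given dataset ID."""
    # Percent-encode the ID so unexpected characters cannot alter the path.
    return f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/download-url"


def fetch_download_url_response(dataset_id: str, timeout: float = 30.0) -> str:
    """GET the endpoint and return the raw response body as text.

    Assumption: the body is UTF-8 text. Its structure is not documented
    in this listing, so nothing is parsed here.
    """
    url = download_url_endpoint(dataset_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A caller would run `print(fetch_download_url_response(AYA_DATASET_ID))` and inspect the output before deciding how to extract the actual download link.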