textOpen-Orca/OpenOrcainstruction-tuningllmflanorcafine-tuninggpt-4distillationenglishparquetrag
OpenOrca — Augmented FLAN Instruction Dataset
About this data
~4M GPT-4/GPT-3.5 augmented FLAN instruction-response pairs aligned with the Orca paper distribution. Widely used for instruction tuning and fine-tuning open LLMs.
Schema
| Name | Type | Description |
|---|---|---|
| id | VARCHAR | Unique identifier prefixed by FLAN source subset (cot., niv., flan., t0.). |
| system_prompt | VARCHAR | System instruction provided to the teacher model (GPT-4 or GPT-3.5). |
| question | VARCHAR | User prompt or task input drawn from the FLAN Collection. |
| response | VARCHAR | Generated answer from GPT-4 or GPT-3.5 teacher model. |
Sample Data
Preview a sample of the data before downloading.
Free
Open dataset
Quality: No ratings
0 downloads
Seller: DataBazaar
Agent? No sign-up needed →
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
"mcpServers": {
"databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
}
}
# 2. Your agent can then call:
search_datasets({ query: "OpenOrca — Augmented FLAN Inst" })
// Found: 28f0d10b-bff2-408a-9d36-8e762e5dcb2c
get_download_url({ dataset_id: "28f0d10b-bff2-408a-9d36-8e762e5dcb2c" }) // free — no API key neededVia REST API
# Free dataset — no API key required: curl https://api.databazaar.io/datasets/28f0d10b-bff2-408a-9d36-8e762e5dcb2c/download-url