Tags: text, yahma/alpaca-cleaned, instruction-tuning, alpaca, fine-tuning, llm, supervised, english, nlp, self-instruct
Alpaca Cleaned — Instruction Fine-Tuning Dataset
About this data
A cleaned version of Stanford's Alpaca instruction-following dataset (~52K examples). The cleaning fixes hallucinated answers, merged instructions, empty outputs, and other quality issues found in the original release. Licensed under CC-BY-4.0 and ready for LLM fine-tuning.
Schema
| Name | Type | Description |
|---|---|---|
| output | VARCHAR | Target response generated by GPT-3 (text-davinci-003) and corrected during dataset cleaning. |
| input | VARCHAR | Optional context or supplementary data for the instruction (frequently empty string). |
| instruction | VARCHAR | Task or question posed to the model for instruction-following. |
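To make the three columns concrete, here is a minimal sketch of how a row might be turned into a prompt/completion pair for supervised fine-tuning. It assumes the prompt layout commonly associated with the original Stanford Alpaca project; this listing does not prescribe a template, so treat the wording as an assumption. The function name `build_example` and the sample rows are hypothetical and made up for illustration.

```python
from typing import Dict, Mapping

# Assumed prompt templates, following the layout popularised by the original
# Stanford Alpaca project. The dataset listing itself does not mandate them.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:"
)


def build_example(record: Mapping[str, str]) -> Dict[str, str]:
    """Turn one row (instruction, input, output) into a prompt/completion pair."""
    instruction = record["instruction"].strip()
    # The input column is frequently an empty string, so treat blank as absent.
    context = (record.get("input") or "").strip()
    if context:
        prompt = PROMPT_WITH_INPUT.format(instruction=instruction, input=context)
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=instruction)
    return {"prompt": prompt, "completion": record["output"].strip()}


# Hypothetical rows for illustration only (not taken from the dataset).
rows = [
    {"instruction": "Name a primary color.", "input": "", "output": "Red."},
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
]
examples = [build_example(row) for row in rows]
```

Choosing the template based on whether `input` is blank keeps an empty "Input" section out of the many rows that have no supplementary context.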
Free
Open dataset
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "Alpaca Cleaned — Instruction F" })
// Found: 01daa147-3a07-4c7e-889c-6d969a83b0d3
get_download_url({ dataset_id: "01daa147-3a07-4c7e-889c-6d969a83b0d3" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/01daa147-3a07-4c7e-889c-6d969a83b0d3/download-url
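For agents or scripts that would rather not shell out to curl, the same request can be sketched in Python with only the standard library. The endpoint path and dataset ID are taken from the curl example; the helper names `download_url_endpoint` and `fetch_download_info` are hypothetical. The listing does not document the response format, so this sketch returns the raw response body and makes no assumption about its fields.

```python
import urllib.request

# Base URL and dataset ID as they appear in the curl example.
API_BASE = "https://api.databazaar.io"
DATASET_ID = "01daa147-3a07-4c7e-889c-6d969a83b0d3"


def download_url_endpoint(dataset_id: str) -> str:
    """Build the download-url endpoint for a dataset ID."""
    return f"{API_BASE}/datasets/{dataset_id}/download-url"


def fetch_download_info(dataset_id: str, timeout: float = 30.0) -> str:
    """GET the endpoint and return the raw response body as text.

    The response schema is not documented in the listing, so nothing is
    parsed here; inspect the body before relying on any particular field.
    """
    url = download_url_endpoint(dataset_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `print(fetch_download_info(DATASET_ID))` performs the request and shows whatever the server returns, which is a reasonable first step before writing any parsing logic.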