Tags: text, open-thoughts/OpenThoughts-114k, reasoning, synthetic-data, math, code, science, sft, chain-of-thought, fine-tuning, apache-2.0, llm
OpenThoughts-114k: Synthetic Reasoning Dataset
About this data
About 114k synthetic reasoning examples spanning math, science, code, and puzzles, used to fine-tune the OpenThinker-7B and OpenThinker-32B models. Released under the Apache 2.0 license in Parquet format.
Schema
| Name | Type | Description |
|---|---|---|
| problem | VARCHAR | Problem statement requiring a proof or solution; math problems are written in LaTeX |
| deepseek_reasoning | VARCHAR | Synthetic reasoning trace generated by DeepSeek model showing step-by-step problem-solving approach |
| deepseek_solution | VARCHAR | Final solution or proof provided by DeepSeek model in LaTeX/text format |
| ground_truth_solution | VARCHAR | Verified correct solution or proof against which model output is compared |
| domain | VARCHAR | Problem category: math, science, code, or puzzle |
| source | VARCHAR | Origin of the problem (e.g., competition, textbook, curated dataset) |
| test_cases | VARCHAR | Input-output pairs or verification criteria for validating solution correctness |
| starter_code | VARCHAR | Template code for coding problems providing structure/imports for completion |
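As a minimal sketch of how these columns might be used for fine-tuning, the snippet below checks that a row carries every column from the schema and then builds a prompt/completion pair from `problem`, `deepseek_reasoning`, and `deepseek_solution`. The `<think>` wrapping, the helper names, and the sample row are assumptions made for illustration; the listing does not say which prompt format was used to train the OpenThinker models.

```python
from typing import Mapping

# Column names taken from the schema table above.
SCHEMA_COLUMNS = (
    "problem",
    "deepseek_reasoning",
    "deepseek_solution",
    "ground_truth_solution",
    "domain",
    "source",
    "test_cases",
    "starter_code",
)


def missing_columns(row: Mapping[str, object]) -> list[str]:
    """Return the schema columns that are absent from a row, in schema order."""
    return [name for name in SCHEMA_COLUMNS if name not in row]


def to_sft_example(row: Mapping[str, object]) -> dict[str, str]:
    """Turn one row into a prompt/completion pair.

    Assumption: the reasoning trace is wrapped in <think> tags ahead of the
    final solution. This layout is illustrative only.
    """
    prompt = str(row["problem"])
    starter = row.get("starter_code")
    if starter:  # only coding problems are expected to carry starter code
        prompt += "\n\nStarter code:\n" + str(starter)
    completion = (
        "<think>\n" + str(row["deepseek_reasoning"]) + "\n</think>\n\n"
        + str(row["deepseek_solution"])
    )
    return {"prompt": prompt, "completion": completion}


# Made-up row for illustration only; it is not taken from the dataset.
example_row = {
    "problem": "Compute 2 + 3.",
    "deepseek_reasoning": "Add the two integers: 2 + 3 = 5.",
    "deepseek_solution": "5",
    "ground_truth_solution": "5",
    "domain": "math",
    "source": "hypothetical",
    "test_cases": None,
    "starter_code": None,
}

sft = to_sft_example(example_row)
```

The same functions can be applied row by row after loading the Parquet file with whichever reader you prefer.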
Free
Open dataset
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "OpenThoughts-114k: Synthetic R" })
// Found: 9ae0c3ba-409a-4c83-a11e-ac650db91a2a
get_download_url({ dataset_id: "9ae0c3ba-409a-4c83-a11e-ac650db91a2a" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/9ae0c3ba-409a-4c83-a11e-ac650db91a2a/download-url
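The same request can be sketched in Python using only the standard library. The base URL, path, and dataset ID are copied from the curl command; the function names are hypothetical, and the response format is an assumption, since the listing does not document it. The sketch tries to parse the body as JSON and falls back to the raw text.

```python
import json
import urllib.request

# Base URL and dataset ID copied from the curl command above.
API_BASE = "https://api.databazaar.io"
DATASET_ID = "9ae0c3ba-409a-4c83-a11e-ac650db91a2a"


def download_url_endpoint(dataset_id: str) -> str:
    """Build the download-url endpoint for a dataset ID."""
    return f"{API_BASE}/datasets/{dataset_id}/download-url"


def fetch_download_info(dataset_id: str, timeout: float = 30.0):
    """GET the endpoint and return the decoded response body.

    Assumption: the body is JSON. If it is not, the raw text is returned.
    """
    request = urllib.request.Request(download_url_endpoint(dataset_id), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body


# Example call (performs a network request, so it is left commented out):
# info = fetch_download_info(DATASET_ID)
```

No authentication header is set, matching the note that free datasets need no API key.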