text, AI-MO/NuminaMath-1.5, math, reasoning, chain-of-thought, post-training, fine-tuning, olympiad, competition-math, llm-training
NuminaMath 1.5 — 900K Competition Math Problems with Chain-of-Thought Solutions
About this data
~900K competition-level math problems with Chain-of-Thought solutions, ranging in difficulty from Chinese high school exercises to international olympiads. Licensed under Apache 2.0 and distributed in Parquet format; suited to fine-tuning models for math reasoning and to retrieval-augmented generation (RAG).
Schema
| Name | Type | Description |
|---|---|---|
| problem | VARCHAR | Natural language statement of a competition-level math problem, potentially multi-line with LaTeX notation. |
| solution | VARCHAR | Step-by-step Chain-of-Thought reasoning leading to the answer, formatted with LaTeX math expressions. |
| answer | VARCHAR | The numerical or symbolic final answer to the problem, may include LaTeX formatting. |
| problem_type | VARCHAR | Mathematical domain classification (e.g., Geometry, Algebra, Number Theory, Combinatorics). |
| question_type | VARCHAR | Format category of the problem (e.g., math-word-problem, proof, calculation). |
| problem_is_valid | VARCHAR | Binary validity flag for problem statement: Yes or No. |
| solution_is_valid | VARCHAR | Binary validity flag for solution correctness: Yes or No. |
| source | VARCHAR | Origin identifier for the problem (e.g., orca_math, olympiad collection, textbook name). |
| synthetic | BOOLEAN | True if the problem was generated synthetically; false if it was sourced from existing materials. |
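The validity flags and the `synthetic` column lend themselves to row filtering before fine-tuning. The following is a minimal sketch, assuming the rows have already been read from the parquet file into dictionaries keyed by the column names above. The helper names (`is_clean`, `to_sft_example`, `build_sft_examples`) and the prompt/completion output shape are illustrative assumptions, not part of the dataset.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def is_clean(row: Row, allow_synthetic: bool = True) -> bool:
    """Keep a row only if both validity flags are the string "Yes"."""
    if row.get("problem_is_valid") != "Yes":
        return False
    if row.get("solution_is_valid") != "Yes":
        return False
    # Optionally drop synthetically generated problems.
    if not allow_synthetic and row.get("synthetic"):
        return False
    return True


def to_sft_example(row: Row) -> Dict[str, str]:
    """Map a row to a prompt/completion pair (hypothetical output shape)."""
    return {"prompt": row["problem"], "completion": row["solution"]}


def build_sft_examples(
    rows: Iterable[Row], allow_synthetic: bool = True
) -> Iterator[Dict[str, str]]:
    """Lazily yield fine-tuning examples from rows that pass the filter."""
    for row in rows:
        if is_clean(row, allow_synthetic):
            yield to_sft_example(row)
```

Because the flags are stored as VARCHAR rather than BOOLEAN, the comparison is against the literal string `"Yes"`. Any other value is treated as invalid.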
Free
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "NuminaMath 1.5 — 900K Competit" })
// Found: 71c82584-0983-404c-aef3-8952d62f9b42
get_download_url({ dataset_id: "71c82584-0983-404c-aef3-8952d62f9b42" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/71c82584-0983-404c-aef3-8952d62f9b42/download-url
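The same REST call can be made from Python with only the standard library. This is a rough sketch, assuming the endpoint shown in the curl command answers an unauthenticated GET. The response format is not documented here, so the body is returned as raw text rather than parsed. The function names are hypothetical.

```python
import urllib.request
from urllib.parse import quote

# Base URL taken from the curl example in this listing.
API_BASE = "https://api.databazaar.io"


def download_url_endpoint(dataset_id: str) -> str:
    """Build the download-url endpoint for a dataset ID."""
    return f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/download-url"


def fetch_download_url_response(dataset_id: str, timeout: float = 30.0) -> str:
    """GET the endpoint and return the raw response body as text.

    Assumption: no API key is needed for free datasets, as the listing states.
    """
    endpoint = download_url_endpoint(dataset_id)
    with urllib.request.urlopen(endpoint, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_download_url_response("71c82584-0983-404c-aef3-8952d62f9b42")` performs the request. Inspect the returned text before relying on any particular field in it.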