Tags: text, nvidia/OpenCodeInstruct, code, instruction-tuning, sft, llm, synthetic, nvidia, fine-tuning, text-generation
OpenCodeInstruct: 5M Instruction Tuning Samples for Code LLMs (NVIDIA)
About this data
OpenCodeInstruct is NVIDIA's open-access instruction-tuning dataset of 5M samples for supervised fine-tuning of code LLMs. It contains synthetic, diverse coding instructions and responses in English and is licensed under CC-BY-4.0.
Schema
| Name | Type | Description |
|---|---|---|
| id | VARCHAR | Unique hexadecimal identifier for the sample |
| input | VARCHAR | Coding task prompt with problem description, constraints, and examples |
| output | VARCHAR | Model-generated solution code with implementation and documentation |
| domain | VARCHAR | Programming task category (e.g., generic, algorithms, data structures) |
| generation_algorithm | VARCHAR | Synthetic data generation method used (e.g., self-instruct) |
| llm_judgement | VARCHAR | LLM evaluation metrics assessing solution quality and conformance |
| unit_tests | VARCHAR | Test cases as code to validate the generated solution |
| tests_execution_status | VARCHAR | Execution result of unit tests (pass/fail/error status) |
| average_test_score | VARCHAR | Numeric score (0-1 or percentage) of solution against test suite |
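Because every column is typed VARCHAR, the score has to be parsed before it can be used to filter samples. The following is a minimal sketch of that step, not part of the dataset's tooling: the `Sample` class, `parse_score`, and `filter_by_score` are hypothetical names, and the rule that values above 1 (or ending in `%`) are percentages is an assumption based on the "0-1 or percentage" wording in the table.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Mapping, Optional


@dataclass(frozen=True)
class Sample:
    """One row, using the column names from the schema table."""
    id: str
    input: str
    output: str
    domain: str
    generation_algorithm: str
    llm_judgement: str
    unit_tests: str
    tests_execution_status: str
    average_test_score: str  # stored as VARCHAR, so kept as text here


def parse_score(raw: str) -> Optional[float]:
    """Convert the VARCHAR score to a float in [0, 1], or None if unparseable.

    Assumption: values above 1, or values ending in '%', are percentages.
    """
    text = raw.strip()
    is_percent = text.endswith("%")
    if is_percent:
        text = text[:-1].strip()
    try:
        value = float(text)
    except ValueError:
        return None
    if is_percent or value > 1.0:
        value /= 100.0
    # Reject NaN, infinities, and negatives, which all fail this range check.
    return value if 0.0 <= value <= 1.0 else None


def filter_by_score(
    rows: Iterable[Mapping[str, str]], min_score: float
) -> Iterator[Sample]:
    """Yield rows whose parsed score is at least min_score."""
    for row in rows:
        sample = Sample(**row)
        score = parse_score(sample.average_test_score)
        if score is not None and score >= min_score:
            yield sample
```

Rows with an unparseable score are dropped rather than treated as zero, so a malformed value never passes the threshold by accident.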
Free
Open dataset
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
"mcpServers": {
"databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
}
}
# 2. Your agent can then call:
search_datasets({ query: "OpenCodeInstruct: 5M Instructi" })
// Found: 2973f07e-a6c6-485c-a458-90d534589d7b
get_download_url({ dataset_id: "2973f07e-a6c6-485c-a458-90d534589d7b" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/2973f07e-a6c6-485c-a458-90d534589d7b/download-url
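The same REST request can be made from Python with only the standard library. This is a rough sketch: the endpoint path is taken from the curl command above, while the helper names `download_url_endpoint` and `fetch_download_info` are hypothetical. The response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from typing import Callable

API_BASE = "https://api.databazaar.io"


def download_url_endpoint(dataset_id: str) -> str:
    """Build the download-url endpoint shown in the curl example."""
    return f"{API_BASE}/datasets/{dataset_id}/download-url"


def fetch_download_info(
    dataset_id: str,
    opener: Callable = urllib.request.urlopen,
    timeout: float = 30.0,
) -> str:
    """GET the endpoint and return the response body as text.

    The body is not parsed because its format is an unknown here.
    `opener` is injectable so the function can be exercised offline.
    """
    with opener(download_url_endpoint(dataset_id), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (performs a real network request):
# print(fetch_download_info("2973f07e-a6c6-485c-a458-90d534589d7b"))
```

No API key header is sent, matching the listing's statement that free datasets do not require one.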