scientific | jablonkagroup/ChemBench | chemistry, benchmark, evaluation, llm-eval, materials-science, question-answering, multiple-choice, expert-curated
ChemBench — Chemistry & Materials LLM Evaluation Benchmark
About this data
A manually curated benchmark for evaluating the chemistry and materials science capabilities of LLMs. Items are expert-generated question-answer and multiple-choice tasks. MIT licensed; intended for evaluation only.
Schema
| Name | Type | Description |
|---|---|---|
| canary | VARCHAR | Canary string warning that benchmark data must not appear in training corpora; includes a unique GUID |
| description | VARCHAR | Brief natural language summary of the evaluation item's topic or concept |
| examples | STRUCT("input" VARCHAR, "target" VARCHAR, target_scores VARCHAR)[] | Array of input-target pairs with scoring rubrics; input is the prompt/question, target is expected answer, target_scores maps answer options to correctness weights |
| in_humansubset_w_tool | BOOLEAN | Boolean flag indicating whether item was evaluated by human annotators using external tools or resources |
| in_humansubset_wo_tool | BOOLEAN | Boolean flag indicating whether item was evaluated by human annotators without external tools or resources |
| keywords | VARCHAR[] | Array of semantic tags describing task domain, difficulty level, required knowledge type, and assessment method |
| metrics | VARCHAR[] | Array of evaluation metric names applicable to this item (e.g. multiple_choice_grade, accuracy) |
| name | VARCHAR | Unique identifier or slug for the evaluation item within the benchmark |
| preferred_score | VARCHAR | Primary metric name recommended for scoring this specific item |
| uuid | VARCHAR | Universally unique identifier (v5 UUID) for the item |
| subfield | VARCHAR | Chemistry or materials science subdomain category (e.g. safety, synthesis, thermodynamics) |
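The `target_scores` field is typed as VARCHAR but described as a mapping from answer options to correctness weights. The following is a minimal sketch of how one such item might be graded, assuming the string is a JSON-encoded object (the listing does not confirm the encoding). The function name `multiple_choice_grade` and the sample item are hypothetical and not taken from the dataset.

```python
import json


def multiple_choice_grade(target_scores: str, chosen: str) -> float:
    """Return the correctness weight of the chosen option, or 0.0 if it is not listed."""
    # Assumption: target_scores is a JSON object mapping option text to a numeric weight.
    scores = json.loads(target_scores)
    return float(scores.get(chosen, 0.0))


# Hypothetical item shaped like one element of the `examples` array.
example = {
    "input": "Which of these gases is a noble gas?",
    "target": None,
    "target_scores": json.dumps({"Nitrogen": 0, "Argon": 1, "Oxygen": 0}),
}

print(multiple_choice_grade(example["target_scores"], "Argon"))     # 1.0
print(multiple_choice_grade(example["target_scores"], "Nitrogen"))  # 0.0
```

If the actual encoding differs (for example a Python-style dict literal), only the `json.loads` call would need to change.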
Free
Open dataset
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "ChemBench — Chemistry & Materi" })
// Found: ee9efdad-1794-4ecb-89d8-693621deee1d
get_download_url({ dataset_id: "ee9efdad-1794-4ecb-89d8-693621deee1d" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/ee9efdad-1794-4ecb-89d8-693621deee1d/download-url
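The same request can be made from Python with only the standard library. This is a rough sketch: the endpoint path comes from the curl command above, while the helper names `download_url_endpoint` and `fetch_download_url_response` are hypothetical. The response format is not documented on this page, so the sketch returns the raw body as text rather than assuming a structure.

```python
import urllib.request

API_BASE = "https://api.databazaar.io"  # host taken from the curl example


def download_url_endpoint(dataset_id: str) -> str:
    """Build the download-url endpoint for a dataset ID."""
    return f"{API_BASE}/datasets/{dataset_id}/download-url"


def fetch_download_url_response(dataset_id: str, timeout: float = 30.0) -> str:
    """GET the endpoint and return the raw response body as text.

    Assumption: the body is UTF-8 text; its structure is not specified here.
    """
    with urllib.request.urlopen(download_url_endpoint(dataset_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_download_url_response("ee9efdad-1794-4ecb-89d8-693621deee1d")` performs the network request; inspect the returned text before relying on any particular field in it.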