
Full HH-RLHF (Prompt/Chosen/Rejected Format)

Name: Full HH-RLHF (Prompt/Chosen/Rejected Format)
Creator: DataBazaar
Keywords: Dahoas/full-hh-rlhf, rlhf, dpo, preference-data, alignment, anthropic, hh, reward-modeling, fine-tuning

About this data

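Anthropic's Helpful and Harmless (HH) RLHF dataset, reformatted so that each row pairs a conversation prompt with a human-preferred (chosen) and a dispreferred (rejected) assistant response, for preference (reward) modeling and DPO/RLHF training.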
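As a rough illustration of how one prompt/chosen/rejected example feeds DPO training, here is a minimal sketch of the standard DPO loss for a single example. It assumes you have already computed the summed token log-probabilities of the chosen and rejected responses given the prompt, under both the policy being trained and a frozen reference model. `dpo_loss` is a hypothetical helper, and `beta = 0.1` is an illustrative default, not a value this listing prescribes.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (prompt, chosen, rejected) example.

    Each argument is the summed log-probability of a response given the prompt,
    under either the trained policy or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference model) prefers the chosen response over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin), computed in a
    # numerically stable way to avoid overflow in exp() for large margins.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the margin is zero the loss is log 2 (about 0.693); it falls toward 0 as the policy, relative to the reference, shifts probability toward the chosen response.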

Schema

Name	Type	Description
prompt	VARCHAR	Conversation context and user message preceding the assistant's response turn
response	VARCHAR	Assistant's response text (full turn output before preference labeling)
chosen	VARCHAR	Human-preferred assistant response selected during RLHF annotation
rejected	VARCHAR	Dispreferred assistant response not selected during RLHF annotation
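Here is a minimal sketch of one row as a typed record. It assumes the four columns arrive as plain strings; the download file format is not stated on this page. The sketch keeps only the three fields a pairwise preference loss needs and skips rows whose chosen and rejected texts are identical, because such a pair carries no preference signal. `HHRow` and `to_preference_triples` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class HHRow:
    """One dataset row; field names mirror the four VARCHAR columns in the schema."""

    prompt: str
    response: str
    chosen: str
    rejected: str


def to_preference_triples(rows: Iterable[HHRow]) -> Iterator[Dict[str, str]]:
    """Yield {"prompt", "chosen", "rejected"} dicts, dropping the `response` column.

    Rows whose chosen and rejected texts are identical are skipped, because a
    tied pair gives a pairwise preference loss nothing to learn from.
    """
    for row in rows:
        if row.chosen == row.rejected:
            continue
        yield {"prompt": row.prompt, "chosen": row.chosen, "rejected": row.rejected}
```

This listing does not say how `response` relates to `chosen`, so inspect a few rows before discarding that column.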


Price: Free (open dataset)
Quality: No ratings
Downloads: 0
Seller: DataBazaar


For AI Agents

Via MCP Server

# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}

# 2. Your agent can then call:
search_datasets({ query: "Full HH-RLHF (Prompt/Chosen/Re" })
// Found: 79974ae6-c126-4120-8e8d-ee07511f33df
get_download_url({ dataset_id: "79974ae6-c126-4120-8e8d-ee07511f33df" })  // free — no API key needed
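Step 1 assumes you paste the entry into the config by hand. If the file already lists other MCP servers, a small script can merge the `databazaar` entry without overwriting them. This is a standard-library sketch. The function names are hypothetical, and the config file's location depends on your MCP client and operating system.

```python
import json
from pathlib import Path


def merge_databazaar_server(config: dict) -> dict:
    """Return a copy of `config` with the databazaar server added under "mcpServers".

    Servers that are already configured, and any other top-level keys, are kept.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as in step 1: launch the server via npx.
    servers["databazaar"] = {"command": "npx", "args": ["databazaar-mcp"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the JSON config at `path` (or start from {}), merge, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(merge_databazaar_server(config), indent=2) + "\n")
```

For example, `update_config_file(Path("claude_desktop_config.json"))` updates the file in place. Clients typically read this file at startup, so a restart is likely needed before the new server appears.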

Via REST API

# Free dataset — no API key required:
curl https://api.databazaar.io/datasets/79974ae6-c126-4120-8e8d-ee07511f33df/download-url
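The same request from Python, as a standard-library sketch of the curl call above. The helper names are hypothetical. This page does not document the response format, so the sketch returns the body as raw text instead of parsing it into assumed JSON fields.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.databazaar.io"


def download_url_endpoint(dataset_id: str) -> str:
    """Build the download-url endpoint for a dataset ID (same path as the curl example)."""
    return f"{API_BASE}/datasets/{urllib.parse.quote(dataset_id, safe='')}/download-url"


def fetch_download_url_response(dataset_id: str, timeout: float = 30.0) -> str:
    """GET the endpoint (no API key, as for this free dataset) and return the body as text."""
    with urllib.request.urlopen(download_url_endpoint(dataset_id), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For example, `fetch_download_url_response("79974ae6-c126-4120-8e8d-ee07511f33df")` performs the same request as the curl command.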