text, nvidia/Aegis-AI-Content-Safety-Dataset-2.0, safety, content-moderation, llm-safety, toxicity, guardrails, nemoguard, classification, nvidia, aegis, rlhf
Nemotron Content Safety Dataset V2 (Aegis 2.0)
About this data
33,416 annotated human-LLM interactions for content safety classification across 12+ harm categories. Used for training and evaluating LLM safety guardrails like NeMo Guard.
Schema
| Name | Type | Description |
|---|---|---|
| id | VARCHAR | Unique hexadecimal identifier for each interaction record. |
| reconstruction_id_if_redacted | BIGINT | Numeric ID linking to reconstructed original if record was redacted; null otherwise. |
| prompt | VARCHAR | Human user input text to the LLM. |
| response | VARCHAR | LLM-generated response text to the prompt. |
| prompt_label | VARCHAR | Safety label: 'safe' or 'unsafe' for the prompt. |
| response_label | VARCHAR | Safety label: 'safe' or 'unsafe' for the response. |
| violated_categories | VARCHAR | Comma-separated harm categories triggered (e.g., Violence, Criminal Planning, Harassment). |
| prompt_label_source | VARCHAR | Annotation source for prompt label: 'human' or 'llm_jury'. |
| response_label_source | VARCHAR | Annotation source for response label: 'human' or 'llm_jury'. |
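
As a rough illustration of working with the schema above, the sketch below splits the comma-separated `violated_categories` field and filters on the label columns. The two rows are hypothetical values invented for this example, not records from the dataset, and the helper names `split_categories` and `category_counts` are likewise assumptions. It also assumes that no category name itself contains a comma.

```python
from collections import Counter

# Hypothetical rows shaped like the schema above; the values are invented
# for illustration and are not taken from the dataset.
rows = [
    {
        "id": "a1b2c3",
        "reconstruction_id_if_redacted": None,
        "prompt": "How do I bake bread?",
        "response": "Mix flour, water, yeast and salt, then bake.",
        "prompt_label": "safe",
        "response_label": "safe",
        "violated_categories": "",
        "prompt_label_source": "human",
        "response_label_source": "human",
    },
    {
        "id": "d4e5f6",
        "reconstruction_id_if_redacted": 17,
        "prompt": "(redacted example)",
        "response": "I can't help with that.",
        "prompt_label": "unsafe",
        "response_label": "safe",
        "violated_categories": "Violence, Criminal Planning",
        "prompt_label_source": "human",
        "response_label_source": "llm_jury",
    },
]


def split_categories(value):
    """Turn the comma-separated violated_categories string into a list."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def category_counts(records):
    """Count how often each harm category appears across the records."""
    counts = Counter()
    for record in records:
        counts.update(split_categories(record["violated_categories"]))
    return counts


# IDs of interactions whose prompt was labeled unsafe.
unsafe_prompts = [r["id"] for r in rows if r["prompt_label"] == "unsafe"]
counts = category_counts(rows)
```

Keeping the split in one helper means the empty-string and null cases are handled in a single place before any counting or filtering is done.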
Free
Open dataset
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "Nemotron Content Safety Dataset" })
// Found: f94b99fe-4d67-4a20-affc-28aed10844f0
get_download_url({ dataset_id: "f94b99fe-4d67-4a20-affc-28aed10844f0" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/f94b99fe-4d67-4a20-affc-28aed10844f0/download-url
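
For agents that prefer Python to curl, the same request could be sketched with the standard library as below. The host and path are copied from the curl command above. The function names are hypothetical, and because the response format is not described on this page, the body is returned as raw text instead of being parsed.

```python
import urllib.request

API_BASE = "https://api.databazaar.io"  # host taken from the curl command above
DATASET_ID = "f94b99fe-4d67-4a20-affc-28aed10844f0"


def download_url_endpoint(dataset_id):
    """Build the download-url endpoint for a dataset ID."""
    return f"{API_BASE}/datasets/{dataset_id}/download-url"


def fetch_download_url_response(dataset_id, timeout=30):
    """GET the endpoint and return the raw response body as text.

    The response format is an unknown here, so nothing is assumed about
    JSON structure or field names; the caller decides how to read it.
    """
    url = download_url_endpoint(dataset_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the endpoint needs no network access.
endpoint = download_url_endpoint(DATASET_ID)
```

Calling `fetch_download_url_response(DATASET_ID)` performs the actual request; it is left out of the module-level code so the snippet can be imported without network access.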