nvidia/Nemotron-Safety-Guard-Dataset-v3
Tags: llm-safety, content-moderation, multilingual, toxicity-detection, guard-model, nemotron, nvidia, synthetic-data, classification, cc-by-4.0
Nemotron Safety Guard Dataset v3 (Multilingual LLM Safety, 12 Languages)
About this data
NVIDIA's 514K-sample multilingual safety dataset for training LLM safety guard models across 12 languages, generated via the CultureGuard pipeline. Licensed under CC-BY-4.0.
Schema
| Name | Type | Description |
|---|---|---|
| id | VARCHAR | Unique identifier for the sample (MD5 hash format). |
| prompt | VARCHAR | User input text to be evaluated for safety across 12 languages. |
| response | VARCHAR | Model or assistant response paired with the prompt; null if not provided. |
| prompt_label | VARCHAR | Safety classification of prompt: safe or unsafe with optional violation category. |
| response_label | VARCHAR | Safety classification of response: safe, unsafe, or empty if not applicable. |
| violated_categories | VARCHAR | Comma-separated safety taxonomy categories breached (e.g., Profanity, Violence); empty if none. |
| prompt_label_source | VARCHAR | Annotation source for prompt label: human or automated method. |
| response_label_source | VARCHAR | Annotation source for response label: human, automated, or null if not labeled. |
| tag | VARCHAR | Dataset partition or content type tag (e.g., generic, adversarial). |
| language | VARCHAR | ISO 639-1 language code (ar, de, en, es, fr, hi, it, ja, ko, nl, th, zh). |
| reconstruction_id_if_redacted | DOUBLE | Row ID of original unredacted sample if this prompt was redacted; null otherwise. |
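As a rough illustration of how rows with this schema might be consumed, the sketch below splits the comma-separated `violated_categories` field and checks whether a prompt/response pair carries an unsafe label. The helper names (`split_categories`, `is_unsafe_pair`) and the example row are hypothetical, and the assumption that unsafe labels begin with the word `unsafe` is inferred from the column descriptions rather than confirmed against the data.

```python
from typing import List, Optional

# Language codes listed in the schema's `language` column.
LANGUAGES = {"ar", "de", "en", "es", "fr", "hi", "it", "ja", "ko", "nl", "th", "zh"}


def split_categories(violated_categories: Optional[str]) -> List[str]:
    """Split the comma-separated `violated_categories` field into a clean list."""
    if not violated_categories:
        return []  # null or empty string means no category was violated
    return [part.strip() for part in violated_categories.split(",") if part.strip()]


def is_unsafe_pair(row: dict) -> bool:
    """True if the prompt label or the response label starts with 'unsafe'.

    Assumption: labels read 'safe' or 'unsafe', optionally followed by a
    violation category, as the schema descriptions suggest.
    """
    labels = (row.get("prompt_label"), row.get("response_label"))
    return any((label or "").strip().lower().startswith("unsafe") for label in labels)


# Hypothetical row shaped like the schema (not taken from the dataset).
example_row = {
    "prompt_label": "unsafe",
    "response_label": "safe",
    "violated_categories": "Profanity, Violence",
    "language": "de",
}

categories = split_categories(example_row["violated_categories"])
flagged = is_unsafe_pair(example_row)
known_language = example_row["language"] in LANGUAGES
```

Keeping the category split in one small function makes it easier to adjust if the real data uses a different separator or casing.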
Free
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "Nemotron Safety Guard Dataset" })
// Found: 5a12abd4-c2ed-4eb0-9cab-11a84eb5fc2c
get_download_url({ dataset_id: "5a12abd4-c2ed-4eb0-9cab-11a84eb5fc2c" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/5a12abd4-c2ed-4eb0-9cab-11a84eb5fc2c/download-url
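The same REST call could be made from Python along the following lines. This is a minimal sketch using only the standard library: the base URL and path come from the curl example, while the function names are hypothetical. The listing does not document the response format, so the sketch returns the raw body instead of assuming a JSON shape.

```python
import urllib.request
import uuid

API_BASE = "https://api.databazaar.io"  # base URL taken from the curl example


def download_url_endpoint(dataset_id: str) -> str:
    """Build the download-url endpoint for a dataset ID (hypothetical helper)."""
    canonical = str(uuid.UUID(dataset_id))  # raises ValueError if not a valid UUID
    return f"{API_BASE}/datasets/{canonical}/download-url"


def fetch_download_url_raw(dataset_id: str, timeout: float = 30.0) -> str:
    """GET the endpoint and return the response body as text, unparsed.

    Assumption: the endpoint answers a plain GET without authentication,
    as the listing states for free datasets.
    """
    with urllib.request.urlopen(download_url_endpoint(dataset_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


endpoint = download_url_endpoint("5a12abd4-c2ed-4eb0-9cab-11a84eb5fc2c")
```

Validating the ID as a UUID before building the URL catches copy-paste errors locally, before any network request is made.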