nvidia/Nemotron-Safety-Guard-Dataset-v3
Tags: text, llm-safety, content-moderation, multilingual, toxicity-detection, guard-model, nemotron, nvidia, synthetic-data, classification, cc-by-4.0

Nemotron Safety Guard Dataset v3 (Multilingual LLM Safety, 12 Languages)

Category: Text
Records: 514,617 rows
Format: PARQUET
Update Frequency: One-time snapshot
Collection Method: auto_imported_huggingface_federated
PII: None detected
File Size: ~242.01 MB
Downloads: 0

About this data

NVIDIA's 514K-sample multilingual safety dataset for training LLM safety guard models across 12 languages, generated via the CultureGuard pipeline. CC-BY-4.0.

Schema

Name | Type | Description
id | VARCHAR | Unique identifier for the sample (MD5 hash format).
prompt | VARCHAR | User input text to be evaluated for safety across 12 languages.
response | VARCHAR | Model or assistant response paired with the prompt; null if not provided.
prompt_label | VARCHAR | Safety classification of prompt: safe or unsafe with optional violation category.
response_label | VARCHAR | Safety classification of response: safe, unsafe, or empty if not applicable.
violated_categories | VARCHAR | Comma-separated safety taxonomy categories breached (e.g., Profanity, Violence); empty if none.
prompt_label_source | VARCHAR | Annotation source for prompt label: human or automated method.
response_label_source | VARCHAR | Annotation source for response label: human, automated, or null if not labeled.
tag | VARCHAR | Dataset partition or content type tag (e.g., generic, adversarial).
language | VARCHAR | ISO 639-1 language code (ar, de, en, es, fr, hi, it, ja, ko, nl, th, zh).
reconstruction_id_if_redacted | DOUBLE | Row ID of original unredacted sample if this prompt was redacted; null otherwise.
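
The column descriptions above can be turned into a few small checks. The following is a minimal Python sketch, not an official loader: it assumes that "MD5 hash format" means a 32-character hexadecimal string and that category names in violated_categories do not themselves contain commas. The helper names (split_categories, is_well_formed, category_counts) and the sample rows are hypothetical and are not taken from the dataset.

```python
import re
from collections import Counter

# The 12 language codes listed in the schema's language column.
LANGUAGES = {"ar", "de", "en", "es", "fr", "hi", "it", "ja", "ko", "nl", "th", "zh"}

# Assumption: "MD5 hash format" means a 32-character hexadecimal digest.
MD5_HEX = re.compile(r"[0-9a-fA-F]{32}")


def split_categories(value):
    """Split the comma-separated violated_categories field into a list."""
    if not value:  # None or empty string: no violation recorded
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def is_well_formed(row):
    """Check the id and language columns against the schema description."""
    row_id = row.get("id")
    return (
        isinstance(row_id, str)
        and MD5_HEX.fullmatch(row_id) is not None
        and row.get("language") in LANGUAGES
    )


def category_counts(rows):
    """Count how often each violated category appears across rows."""
    counts = Counter()
    for row in rows:
        counts.update(split_categories(row.get("violated_categories")))
    return counts


# Hypothetical rows for illustration only; not real dataset records.
rows = [
    {"id": "0" * 32, "language": "en", "violated_categories": "Profanity, Violence"},
    {"id": "f" * 32, "language": "de", "violated_categories": ""},
    {"id": "not-a-hash", "language": "xx", "violated_categories": None},
]
```

In practice the rows would come from the PARQUET file rather than a hand-written list; any Parquet reader that yields dict-like records should work with these helpers.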

Free

Open dataset

Quality: No ratings
Seller: DataBazaar

For AI Agents

Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}

# 2. Your agent can then call:
search_datasets({ query: "Nemotron Safety Guard Dataset" })
// Found: 5a12abd4-c2ed-4eb0-9cab-11a84eb5fc2c
get_download_url({ dataset_id: "5a12abd4-c2ed-4eb0-9cab-11a84eb5fc2c" })  // free, no API key needed

Via REST API
# Free dataset — no API key required:
curl https://api.databazaar.io/datasets/5a12abd4-c2ed-4eb0-9cab-11a84eb5fc2c/download-url
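
The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the endpoint path is copied from the curl example above, but the response format is not documented on this page, so the body is returned as raw text instead of being parsed. The function names (download_url_endpoint, fetch_download_info) are hypothetical.

```python
import urllib.request

API_BASE = "https://api.databazaar.io"
DATASET_ID = "5a12abd4-c2ed-4eb0-9cab-11a84eb5fc2c"


def download_url_endpoint(dataset_id, base=API_BASE):
    """Build the download-url endpoint shown in the curl example."""
    return f"{base}/datasets/{dataset_id}/download-url"


def fetch_download_info(dataset_id, timeout=30):
    """GET the endpoint and return the raw response body as text.

    Assumption: the body is UTF-8 text. Its structure (JSON or otherwise)
    is not specified here, so no field names are assumed.
    """
    url = download_url_endpoint(dataset_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (performs a network request):
#   print(fetch_download_info(DATASET_ID))
```

Inspect the returned text once before writing code that depends on a particular field.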