images · CaptionEmporium/pexels-568k-internvl2 · image-captioning, text-to-image, synthetic-captions, internvl2, pexels, multimodal, vision-language, fine-tuning
Pexels 568K Synthetic Captions (InternVL2-40B)
About this data
567,573 synthetic English captions for Pexels photos, generated with InternVL2-40B-AWQ and grounded with the original tags. The data is in JSON format and is suited to training text-to-image and image-to-text models.
Schema
| Name | Type | Description |
|---|---|---|
| id | BIGINT | |
| class_label | VARCHAR | |
| type | VARCHAR | |
| slug | VARCHAR | |
| description | VARCHAR | |
| alt | VARCHAR | |
| created_at | VARCHAR | |
| title | VARCHAR | |
| location | VARCHAR | |
| tags | VARCHAR | |
| main_color | BIGINT[] | |
| colors | VARCHAR[] | |
| width | BIGINT | |
| height | BIGINT | |
| aspect_ratio | DOUBLE | |
| url | VARCHAR | |
| cogvlm_caption | VARCHAR | |
| megapixels | DOUBLE | |
| __index_level_0__ | BIGINT | |
| internvl2_caption | VARCHAR | |
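As a rough illustration of how the schema above might be used, the following Python sketch checks a record for the listed columns and extracts a (url, caption) pair for training. The helper names `missing_columns` and `to_caption_pair` are hypothetical, the example record holds placeholder values rather than real dataset rows, and the listing does not say whether fields can be null, so the code treats missing or blank values as unusable.

```python
from typing import Any, Optional

# Column names copied from the schema table above.
SCHEMA_COLUMNS = [
    "id", "class_label", "type", "slug", "description", "alt", "created_at",
    "title", "location", "tags", "main_color", "colors", "width", "height",
    "aspect_ratio", "url", "cogvlm_caption", "megapixels",
    "__index_level_0__", "internvl2_caption",
]


def missing_columns(record: dict[str, Any]) -> list[str]:
    """Return the schema columns that are absent from a record."""
    return [name for name in SCHEMA_COLUMNS if name not in record]


def to_caption_pair(
    record: dict[str, Any], caption_field: str = "internvl2_caption"
) -> Optional[tuple[str, str]]:
    """Return (url, caption), or None when either value is missing or blank."""
    url = record.get("url")
    caption = record.get(caption_field)
    if not isinstance(url, str) or not isinstance(caption, str):
        return None
    url, caption = url.strip(), caption.strip()
    if not url or not caption:
        return None
    return url, caption


# Hypothetical record for illustration only; the values are not from the dataset.
example_record: dict[str, Any] = {name: None for name in SCHEMA_COLUMNS}
example_record.update(
    {
        "id": 1,
        "url": "https://example.com/photo.jpg",
        "internvl2_caption": "  A placeholder caption.  ",
    }
)

print(missing_columns(example_record))
print(to_caption_pair(example_record))
```

Passing `caption_field="cogvlm_caption"` selects the other caption column instead, which may be useful when comparing the two caption sources.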
Free
Open dataset
Quality: No ratings
0 downloads
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "Pexels 568K Synthetic Captions" })
// Found: 1c86af8b-445b-4a7b-8358-d3f9bbeb9ad4
get_download_url({ dataset_id: "1c86af8b-445b-4a7b-8358-d3f9bbeb9ad4" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/1c86af8b-445b-4a7b-8358-d3f9bbeb9ad4/download-url
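The same REST call can be sketched in Python using only the standard library. The endpoint path and dataset ID are taken from the curl command above. The function names are hypothetical, and because the listing does not document the response format, this sketch returns the raw response body as text instead of parsing it.

```python
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.databazaar.io"  # host taken from the curl command above
DATASET_ID = "1c86af8b-445b-4a7b-8358-d3f9bbeb9ad4"


def download_url_endpoint(dataset_id: str, api_base: str = API_BASE) -> str:
    """Build the download-url endpoint for a dataset ID."""
    return f"{api_base.rstrip('/')}/datasets/{quote(dataset_id, safe='')}/download-url"


def fetch_download_url_response(dataset_id: str, timeout: float = 30.0) -> str:
    """GET the endpoint and return the raw response body as text.

    Assumption: the response format is not documented in the listing,
    so the body is returned unparsed.
    """
    endpoint = download_url_endpoint(dataset_id)
    with urllib.request.urlopen(endpoint, timeout=timeout) as response:
        return response.read().decode("utf-8")


print(download_url_endpoint(DATASET_ID))
# To perform the request (requires network access):
# print(fetch_download_url_response(DATASET_ID))
```

Keeping URL construction separate from the network call makes the endpoint easy to inspect or log before any request is sent.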