images, lmms-lab/TextCaps, image-captioning, ocr, multimodal, vision-language, benchmark, lmms-eval, evaluation, vqa
TextCaps (lmms-eval formatted)
About this data
An image captioning benchmark in which a good caption requires reading the text that appears in the image (OCR). Formatted by lmms-lab for one-click evaluation of multimodal models. Contains roughly 28K images with captions.
Schema
| Name | Type | Description |
|---|---|---|
| question_id | VARCHAR | Unique identifier for the image/question pair in the evaluation set |
| question | VARCHAR | Instruction prompt asking the model to generate a caption for the image |
| image | STRUCT(bytes BLOB, path VARCHAR) | Image data as binary blob with optional file path reference |
| image_id | VARCHAR | Unique identifier for the source image |
| image_classes | VARCHAR[] | List of semantic object class labels present in the image |
| flickr_original_url | VARCHAR | URL to original image on Flickr |
| flickr_300k_url | VARCHAR | URL to image from Flickr 300K subset |
| image_width | BIGINT | Image width in pixels |
| image_height | BIGINT | Image height in pixels |
| set_name | VARCHAR | Evaluation split designation (train/val/test) |
| image_name | VARCHAR | Filename of the image |
| image_path | VARCHAR | File path or identifier for image location |
| caption_id | BIGINT[] | List of numeric identifiers for captions associated with the image |
| caption_str | VARCHAR[] | List of reference captions describing the image including OCR text |
| reference_strs | VARCHAR[] | Multiple human-written reference captions for evaluation metrics |
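As a rough illustration of how the columns above fit together, the following Python sketch works on a single row represented as a plain dict. It assumes that `caption_id` and `caption_str` are parallel lists (the schema does not state this) and that `image` arrives as a dict with `bytes` and `path` keys. The helper names and the sample row are hypothetical.

```python
from typing import Any, Optional


def pair_captions(row: dict[str, Any]) -> list[tuple[int, str]]:
    """Zip caption ids with caption texts, assuming the two lists are parallel."""
    ids, texts = row["caption_id"], row["caption_str"]
    if len(ids) != len(texts):
        raise ValueError("caption_id and caption_str differ in length")
    return list(zip(ids, texts))


def image_payload(row: dict[str, Any]) -> tuple[bytes, Optional[str]]:
    """Return the raw image bytes and the optional path from the image struct."""
    image = row["image"]
    return image["bytes"], image.get("path")


# Hypothetical row with made-up values, shaped like the schema above.
sample_row = {
    "question_id": "q-0",
    "image": {"bytes": b"\xff\xd8\xff", "path": None},
    "caption_id": [1, 2],
    "caption_str": ["a sign that says stop", "a red stop sign"],
}

pairs = pair_captions(sample_row)
data, path = image_payload(sample_row)
print(pairs, len(data), path)
```

The length check is there because the pairing is an assumption: if the two lists turn out not to be aligned, the sketch fails loudly instead of silently mismatching captions.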
Free
Open dataset
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "TextCaps (lmms-eval formatted)" })
// Found: 992a5d4b-efc7-40b0-ac8a-26c805d510c2
get_download_url({ dataset_id: "992a5d4b-efc7-40b0-ac8a-26c805d510c2" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/992a5d4b-efc7-40b0-ac8a-26c805d510c2/download-url
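The same request can be sketched in Python using only the standard library. The base URL and path are taken from the curl command above. The function names are hypothetical, and because the response format is not documented here, the sketch returns the raw body as text rather than parsing it.

```python
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.databazaar.io"  # taken from the curl command above
DATASET_ID = "992a5d4b-efc7-40b0-ac8a-26c805d510c2"


def download_url_endpoint(dataset_id: str) -> str:
    """Build the download-url endpoint for a dataset id."""
    return f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/download-url"


def fetch_download_info(dataset_id: str, timeout: float = 10.0) -> str:
    """GET the endpoint and return the raw body (response format is an unknown here)."""
    with urlopen(download_url_endpoint(dataset_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


endpoint = download_url_endpoint(DATASET_ID)
print(endpoint)
```

Calling `fetch_download_info(DATASET_ID)` performs the actual network request. It is kept out of the module body so the sketch can be run offline.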