images · HuggingFaceM4/FineVision · vision-language · multimodal · vlm · instruction-tuning · images · vqa · parquet · huggingface · fine-tuning · rag
FineVision — 24M-Sample Vision-Language Training Corpus
About this data
Large open training set for vision-language models (VLMs): 17.3M images, 24.3M samples, 88.9M conversation turns, and 9.5B answer tokens. Distributed as Parquet across multiple subsets and ready for fine-tuning vision-language models.
Schema
| Name | Type | Description |
|---|---|---|
| images | STRUCT(bytes BLOB, path VARCHAR)[] | List of images with binary content and file path for each sample. |
| texts | STRUCT("user" VARCHAR, assistant VARCHAR)[] | List of multi-turn conversational exchanges with user query and assistant response fields. |
| source | VARCHAR | Upstream dataset identifier or task family name. |
| relevance_ratings | BIGINT[] | List of integer relevance scores (1-5 scale) from annotators. |
| relevance_min | BIGINT | Minimum relevance rating across all annotators for the sample. |
| formatting_ratings | BIGINT[] | List of integer formatting quality scores (1-5 scale) from annotators. |
| formatting_min | BIGINT | Minimum formatting rating across all annotators for the sample. |
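To make the schema concrete, here is a minimal sketch of how one row might be handled once it has been loaded into a plain Python dict. The row below is hypothetical: its values are invented for illustration and are not taken from the dataset. The helper names (`passes_quality`, `iter_turns`, `mins_are_consistent`) and the threshold defaults are likewise assumptions, not part of any published API. The consistency check assumes, per the column descriptions, that each `*_min` column equals the minimum of its ratings list.

```python
from typing import Iterator

# Hypothetical row shaped like the schema above. All values are made up
# for illustration; they do not come from the dataset.
sample_row = {
    "images": [{"bytes": b"\x89PNG...", "path": "example_0.png"}],
    "texts": [
        {"user": "What is shown in the image?", "assistant": "A bar chart."},
        {"user": "How many bars are there?", "assistant": "Four."},
    ],
    "source": "example_source",
    "relevance_ratings": [5, 4],
    "relevance_min": 4,
    "formatting_ratings": [4, 3],
    "formatting_min": 3,
}


def passes_quality(row: dict, min_relevance: int = 4, min_formatting: int = 3) -> bool:
    """Keep a sample only if its lowest ratings meet both thresholds."""
    return (
        row["relevance_min"] >= min_relevance
        and row["formatting_min"] >= min_formatting
    )


def iter_turns(row: dict) -> Iterator[tuple[str, str]]:
    """Yield (user, assistant) pairs in conversation order."""
    for turn in row["texts"]:
        yield turn["user"], turn["assistant"]


def mins_are_consistent(row: dict) -> bool:
    """Check that each *_min column equals the minimum of its ratings list."""
    return (
        row["relevance_min"] == min(row["relevance_ratings"])
        and row["formatting_min"] == min(row["formatting_ratings"])
    )
```

Filtering on the precomputed `*_min` columns avoids scanning the ratings lists for every row, which is presumably why they are provided alongside the raw lists.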
Free
Open dataset
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "FineVision — 24M-Sample Vision" })
// Found: 4837ae04-96b5-4da6-94a3-2f9d542f39e1
get_download_url({ dataset_id: "4837ae04-96b5-4da6-94a3-2f9d542f39e1" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/4837ae04-96b5-4da6-94a3-2f9d542f39e1/download-url
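For agents or scripts that prefer Python over curl, the same request can be sketched with the standard library. The endpoint path is copied from the curl example above. The function names are hypothetical, and because the response format is not documented on this page, the sketch returns the raw body instead of assuming JSON or a particular field name.

```python
import urllib.request

API_BASE = "https://api.databazaar.io"  # host taken from the curl example above


def download_url_endpoint(dataset_id: str) -> str:
    """Build the endpoint used in the curl example for a given dataset ID."""
    return f"{API_BASE}/datasets/{dataset_id}/download-url"


def fetch_download_url_response(dataset_id: str, timeout: float = 30.0) -> str:
    """GET the endpoint and return the raw response body as text.

    The response format is an unknown here, so the body is returned
    unparsed rather than assuming JSON or a specific field name.
    """
    endpoint = download_url_endpoint(dataset_id)
    with urllib.request.urlopen(endpoint, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_download_url_response("4837ae04-96b5-4da6-94a3-2f9d542f39e1")` performs a live network request; inspect the returned text before deciding how to parse it.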