imageomics/TreeOfLife-200M
Tags: images, biology, biodiversity, computer-vision, taxonomy, clip, multimodal, species, image-classification, zero-shot, imageomics
TreeOfLife-200M: Biodiversity Image Dataset (214M images, 952K taxa)
About this data
About 214 million images of organisms spanning 952,257 taxa, aggregated from GBIF, EOL, BIOSCAN-5M, and FathomNet. The data is CC0-licensed and distributed in Parquet format, suitable for computer-vision and CLIP-style training and for zero-shot taxonomic classification.
Schema
| Name | Type | Description |
|---|---|---|
| uuid | VARCHAR | Unique identifier (UUID v4) for the image record |
| source_url | VARCHAR | Direct HTTP(S) URL to the original image file |
| kingdom | VARCHAR | Taxonomic kingdom (e.g., Animalia, Plantae, Fungi) |
| phylum | VARCHAR | Taxonomic phylum classification |
| class | VARCHAR | Taxonomic class classification |
| order | VARCHAR | Taxonomic order classification |
| family | VARCHAR | Taxonomic family classification |
| genus | VARCHAR | Taxonomic genus classification |
| species | VARCHAR | Taxonomic species epithet (lowercase binomial component) |
| scientific_name | VARCHAR | Latin binomial scientific name (Genus species format) |
| common | VARCHAR | Vernacular common name in English where available |
| data_source | VARCHAR | Provider identifier: gbif, eol, bioscan5m, or fathomnet |
| publisher | VARCHAR | Organization that published or hosts the original record |
| basis_of_record | VARCHAR | Record type (HUMAN_OBSERVATION, MACHINE_OBSERVATION, PRESERVED_SPECIMEN, etc.) |
| img_type | VARCHAR | Image classification category (Citizen Science, Museum, etc.) |
| source_id | VARCHAR | Original record identifier from upstream provider |
| shard_filename | VARCHAR | TAR archive filename containing this image in compressed format |
| shard_file_path | VARCHAR | Full path to the shard TAR file in the dataset storage structure |
| base_dataset_file_path | VARCHAR | Path to the parquet file containing the original unsharded record |
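
As a rough illustration of how the taxonomic columns in the schema could feed CLIP-style training or zero-shot prompts, the sketch below joins the seven rank columns of one record into a single text string. The function name `taxonomic_caption`, the hand-written sample row, and the caption format itself are assumptions made for this example; the listing does not prescribe any particular caption format.

```python
from typing import Mapping, Optional

# Rank columns, in the order they appear in the schema table.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def taxonomic_caption(row: Mapping[str, Optional[str]]) -> str:
    """Join the non-empty rank values of one record into a single string.

    Missing or blank ranks are skipped. If a common name is present,
    it is appended in parentheses.
    """
    parts = [row[r].strip() for r in RANKS if row.get(r) and row[r].strip()]
    caption = " ".join(parts)
    common = (row.get("common") or "").strip()
    if common:
        caption = f"{caption} ({common})" if caption else common
    return caption


# Hypothetical record, written by hand to match the schema (not real data).
example_row = {
    "kingdom": "Animalia",
    "phylum": "Chordata",
    "class": "Aves",
    "order": "Passeriformes",
    "family": "Corvidae",
    "genus": "Corvus",
    "species": "corax",
    "common": "Common Raven",
    "data_source": "gbif",
}

print(taxonomic_caption(example_row))
```

Skipping blank ranks keeps the function usable on records that are only identified to a higher rank, such as genus or family.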
Free
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "TreeOfLife-200M: Biodiversity " })
// Found: 2f99f0c2-fe5c-442b-ba22-9522e7464e8f
get_download_url({ dataset_id: "2f99f0c2-fe5c-442b-ba22-9522e7464e8f" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/2f99f0c2-fe5c-442b-ba22-9522e7464e8f/download-url
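
For agents or scripts that prefer Python to curl, the same request might be sketched with the standard library as follows. The endpoint path and dataset ID are copied from the curl example; the helper names `download_url_endpoint` and `fetch_download_info`, the 30-second timeout, and the UTF-8 decoding are assumptions. The response format is not documented in this listing, so the sketch returns the raw body without parsing it.

```python
import urllib.request

API_BASE = "https://api.databazaar.io"  # host taken from the curl example
DATASET_ID = "2f99f0c2-fe5c-442b-ba22-9522e7464e8f"


def download_url_endpoint(dataset_id: str, base: str = API_BASE) -> str:
    """Build the download-url endpoint for a dataset ID."""
    return f"{base.rstrip('/')}/datasets/{dataset_id}/download-url"


def fetch_download_info(dataset_id: str, timeout: float = 30.0) -> str:
    """GET the endpoint and return the raw response body as text.

    The response schema is not documented here, so nothing is parsed;
    UTF-8 is assumed for decoding.
    """
    url = download_url_endpoint(dataset_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Print the endpoint only; no network request is made at import time.
print(download_url_endpoint(DATASET_ID))
```

Calling `fetch_download_info(DATASET_ID)` performs the actual request; inspect the returned text before relying on any particular field in it.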