imageomics/TreeOfLife-200M
Tags: biology, biodiversity, computer-vision, taxonomy, clip, multimodal, species, image-classification, zero-shot, imageomics

TreeOfLife-200M: Biodiversity Image Dataset (214M images, 952K taxa)

Category: Images
Records: 213,937,319 rows
Format: Parquet
Update Frequency: One-time snapshot
Collection Method: auto_imported_huggingface_federated
PII: None detected
File Size: ~13,233.46 MB
Downloads: 0

About this data

214 million biology and species images across 952,257 taxa, drawn from GBIF, EOL, BIOSCAN-5M, and FathomNet. CC0 licensed and distributed in Parquet format, ready for computer-vision and CLIP training and for zero-shot taxonomic classification.
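As a rough sanity check on the figures listed above, dividing the file size by the row count gives the average storage per record. The sketch below assumes the listed size is either decimal megabytes or mebibytes, since the listing does not say which. Either way the result is well under 100 bytes per row, which suggests the Parquet file holds metadata and URLs rather than image bytes.

```python
# Figures copied from the listing above.
ROWS = 213_937_319
SIZE_MB = 13_233.46


def bytes_per_row(size_mb: float, rows: int, bytes_per_mb: int) -> float:
    """Average on-disk bytes per record for a given definition of 'MB'."""
    return size_mb * bytes_per_mb / rows


decimal = bytes_per_row(SIZE_MB, ROWS, 1_000_000)  # assumes MB = 10^6 bytes
binary = bytes_per_row(SIZE_MB, ROWS, 1_048_576)   # assumes MB = 2^20 bytes
print(f"{decimal:.1f} to {binary:.1f} bytes per row")
```

This is consistent with the source_url and shard columns in the schema, which point at images stored elsewhere.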

Schema

Name                    Type     Description
uuid                    VARCHAR  Unique identifier (UUID v4) for the image record
source_url              VARCHAR  Direct HTTP(S) URL to the original image file
kingdom                 VARCHAR  Taxonomic kingdom (e.g., Animalia, Plantae, Fungi)
phylum                  VARCHAR  Taxonomic phylum classification
class                   VARCHAR  Taxonomic class classification
order                   VARCHAR  Taxonomic order classification
family                  VARCHAR  Taxonomic family classification
genus                   VARCHAR  Taxonomic genus classification
species                 VARCHAR  Taxonomic species epithet (lowercase binomial component)
scientific_name         VARCHAR  Latin binomial scientific name (Genus species format)
common                  VARCHAR  Vernacular common name in English where available
data_source             VARCHAR  Provider identifier: gbif, eol, bioscan5m, or fathomnet
publisher               VARCHAR  Organization that published or hosts the original record
basis_of_record         VARCHAR  Record type (HUMAN_OBSERVATION, MACHINE_OBSERVATION, PRESERVED_SPECIMEN, etc.)
img_type                VARCHAR  Image classification category (Citizen Science, Museum, etc.)
source_id               VARCHAR  Original record identifier from upstream provider
shard_filename          VARCHAR  TAR archive filename containing this image in compressed format
shard_file_path         VARCHAR  Full path to the shard TAR file in the dataset storage structure
base_dataset_file_path  VARCHAR  Path to the parquet file containing the original unsharded record
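
For CLIP-style training, one option is to concatenate the seven rank columns into a single text label per image. A minimal sketch, assuming rows arrive as dictionaries keyed by the column names above and that missing ranks are None or empty strings. The helper names and the sample record are hypothetical, and the caption format is an illustrative choice, not one prescribed by the dataset.

```python
from typing import Mapping, Optional

# Rank columns in order, as named in the schema above.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def taxonomic_caption(row: Mapping[str, Optional[str]]) -> str:
    """Join the non-empty rank values, highest rank first, separated by spaces."""
    return " ".join(row[r].strip() for r in RANKS if row.get(r) and row[r].strip())


def display_name(row: Mapping[str, Optional[str]]) -> str:
    """Prefer the common name, then scientific_name, then the taxonomic caption."""
    return row.get("common") or row.get("scientific_name") or taxonomic_caption(row)


# Hypothetical record for illustration only; not taken from the dataset.
example = {
    "kingdom": "Animalia", "phylum": "Chordata", "class": "Aves",
    "order": "Passeriformes", "family": "Corvidae", "genus": "Corvus",
    "species": "corax", "scientific_name": "Corvus corax", "common": None,
}
print(taxonomic_caption(example))
print(display_name(example))
```

Skipping empty ranks keeps records that are identified only to a higher rank, such as class or order, usable without placeholder tokens.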

Free

Open dataset

Seller: DataBazaar

For AI Agents

Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}

# 2. Your agent can then call:
search_datasets({ query: "TreeOfLife-200M: Biodiversity " })
// Found: 2f99f0c2-fe5c-442b-ba22-9522e7464e8f
get_download_url({ dataset_id: "2f99f0c2-fe5c-442b-ba22-9522e7464e8f" })  // free — no API key needed
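
If a config file already lists other MCP servers, the entry from step 1 can be merged in rather than pasted over the whole file. A minimal sketch, assuming the config is a JSON object with a top-level "mcpServers" key as shown above. The function name and the "other-server" entry are hypothetical.

```python
import copy
import json

# Server entry copied from step 1 above.
DATABAZAAR_ENTRY = {"command": "npx", "args": ["databazaar-mcp"]}


def add_databazaar_server(config: dict) -> dict:
    """Return a copy of config with the databazaar entry added under mcpServers."""
    merged = copy.deepcopy(config)
    merged.setdefault("mcpServers", {})["databazaar"] = copy.deepcopy(DATABAZAAR_ENTRY)
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
updated = add_databazaar_server(existing)
print(json.dumps(updated, indent=2))
```

The function returns a new dictionary and leaves its input untouched, so the original config can be kept as a backup before the result is written to disk.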

Via REST API
# Free dataset — no API key required:
curl https://api.databazaar.io/datasets/2f99f0c2-fe5c-442b-ba22-9522e7464e8f/download-url
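
The same request can be issued from Python with the standard library. A minimal sketch, assuming the endpoint pattern shown in the curl command. The response format is not documented here, so the body is returned as raw text, and the function names are hypothetical.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.databazaar.io"  # host taken from the curl example above


def download_url_endpoint(dataset_id: str) -> str:
    """Build the download-url endpoint for a dataset ID."""
    safe_id = urllib.parse.quote(dataset_id, safe="")
    return f"{API_BASE}/datasets/{safe_id}/download-url"


def fetch_download_url(dataset_id: str, timeout: float = 30.0) -> str:
    """GET the endpoint and return the raw response body (format not assumed)."""
    with urllib.request.urlopen(download_url_endpoint(dataset_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


DATASET_ID = "2f99f0c2-fe5c-442b-ba22-9522e7464e8f"
print(download_url_endpoint(DATASET_ID))
# Network call, left commented out so the sketch runs offline:
# print(fetch_download_url(DATASET_ID))
```

Keeping URL construction separate from the network call makes the endpoint easy to check without contacting the API.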