To search datasets programmatically: GET https://api.databazaar.io/datasets?query=your-search
Full API docs: https://api.databazaar.io/llms.txt
Agent discovery: https://databazaar.io/.well-known/agent.json
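Here is a minimal Python sketch of the search call above, using only the standard library. It assumes the endpoint returns JSON. The response schema is not described on this page, so check llms.txt for the actual field names. `build_search_url` and `search_datasets` are hypothetical helper names, not part of any published client.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.databazaar.io/datasets"


def build_search_url(query: str) -> str:
    """Return the search URL with the query string safely percent-encoded."""
    return f"{BASE_URL}?{urllib.parse.urlencode({'query': query})}"


def search_datasets(query: str, timeout: float = 10.0):
    """GET the search endpoint and decode the body as JSON.

    Assumption: the endpoint responds with JSON. Verify the response
    shape against llms.txt before relying on specific fields.
    """
    req = urllib.request.Request(
        build_search_url(query), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(build_search_url("customer churn"))
    # results = search_datasets("customer churn")  # performs a network call
```

Encoding the query with `urlencode` means spaces and characters such as `&` cannot break the URL.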
Browse Data
8 listings
Multilingual Amazon Product Reviews — 2.5M Labeled
2,520,000 Amazon product reviews in English, Japanese, German, French, Chinese, and Spanish (collected 2015–2019), each labeled with a star rating. Built for multilingual sentiment analysis and text classification.
CPU Activity (cpu_act)
Classic regression benchmark that predicts CPU user-mode utilization from 21 system activity measures collected on a Sun SPARCstation. It has 8,192 rows and is widely used in ML evaluation.
Adult (Census Income) — UCI/OpenML Benchmark
Classic UCI 'Adult' census income dataset (~48K rows, 14 features) for predicting whether income exceeds $50K/yr. Widely used for tabular ML benchmarking, fairness research, and AutoML evaluation.
Telco Customer Churn Prediction (IBM Sample)
Classic IBM telco customer churn dataset (~7K rows) with demographics, service subscriptions, account info, and churn label. Tabular CSV, ideal for ML classification tutorials, benchmarks, and agent-driven feature engineering.
Cirrus SR22 USA For-Sale Listings + 10,816 Photos — May 2026 Snapshot
The complete pre-owned Cirrus SR22 market in the United States as of May 19, 2026 — 314 aircraft listed across Controller, Trade-A-Plane, GlobalAir, and Barnstormers, N-number-deduplicated and joined to the FAA Aircraft Registry and NTSB event history. Includes 47 structured fields per aircraft (price, hours, avionics, damage history, location) plus 10,816 bundled listing photos (~1.25 GB).
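One way to implement N-number deduplication is to group listings by a normalized FAA registration number. An aircraft advertised on several sites then collapses to a single key. This is an illustrative sketch, not the vendor's pipeline. The `n_number` and `site` field names and the normalization rules (uppercase, strip spaces and dashes, add a leading `N`) are assumptions.

```python
from typing import Dict, Iterable, List


def normalize_n_number(raw: str) -> str:
    """Uppercase, drop spaces and dashes, and ensure a leading 'N' (assumed rules)."""
    cleaned = raw.strip().upper().replace(" ", "").replace("-", "")
    return cleaned if cleaned.startswith("N") else "N" + cleaned


def group_by_n_number(listings: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group listings from multiple sites under one normalized N-number.

    Each key then represents one aircraft. Which listing to keep per key
    (newest, lowest price, etc.) is left open here.
    """
    groups: Dict[str, List[dict]] = {}
    for listing in listings:
        key = normalize_n_number(listing["n_number"])  # hypothetical field name
        groups.setdefault(key, []).append(listing)
    return groups


if __name__ == "__main__":
    sample = [
        {"site": "Controller", "n_number": "n123ab"},
        {"site": "Trade-A-Plane", "n_number": "N-123AB"},
    ]
    print(list(group_by_n_number(sample)))
```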
Open Food Facts Product Database
1.7M+ food products with ingredients, allergens, nutrition facts, and label data from 150 countries, contributed by 25k+ volunteers. Multilingual tabular dataset under ODbL/AGPL.
Digital Commerce Readiness — 261 Countries (1980–2023)
Broad dataset of 25 digital commerce, trade, and economic readiness indicators for 261 countries spanning 1980–2023. Sourced from the World Bank Open Data API, it covers internet penetration, mobile subscriptions, broadband access, ICT trade flows, logistics performance, high-tech exports, consumer spending patterns, GDP metrics, labor force statistics, and urbanization. Ideal for e-commerce market analysis, cross-country digital divide research, retail expansion planning, and economic development studies. Countries are identified by ISO 3166-1 alpha-3 (ISO3) codes. As a quality filter, rows with fewer than 3 non-null indicators are dropped.
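The quality filter (minimum 3 non-null indicators per row) works by counting present values across the indicator columns only, not across identifier columns such as country code and year. Below is a standard-library sketch. The column names are hypothetical, and treating both `None` and NaN as missing is an assumption about how nulls are encoded.

```python
import math
from typing import Dict, List, Sequence

Row = Dict[str, object]


def is_present(value: object) -> bool:
    """Treat None and float NaN as missing; any other value counts as present."""
    if value is None:
        return False
    return not (isinstance(value, float) and math.isnan(value))


def keep_rows_with_min_indicators(
    rows: Sequence[Row], indicator_cols: Sequence[str], min_non_null: int = 3
) -> List[Row]:
    """Keep rows with at least `min_non_null` present values in `indicator_cols`."""
    return [
        row
        for row in rows
        if sum(is_present(row.get(col)) for col in indicator_cols) >= min_non_null
    ]


if __name__ == "__main__":
    # Placeholder codes and values; "ind_a".."ind_c" are hypothetical indicators.
    rows = [
        {"iso3": "AAA", "year": 2020, "ind_a": 1.0, "ind_b": 2.0, "ind_c": 3.0},
        {"iso3": "BBB", "year": 2020, "ind_a": 1.0, "ind_b": None, "ind_c": float("nan")},
    ]
    kept = keep_rows_with_min_indicators(rows, ["ind_a", "ind_b", "ind_c"])
    print([r["iso3"] for r in kept])
```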
US FDA Safety Recalls & Enforcement Actions — 45,000 Records (Food, Drug, Device)
Wide-coverage dataset of 45,000 US FDA enforcement actions spanning food, drug, and medical device recalls. Sourced from three openFDA enforcement APIs and unified into a single normalized schema with 22 columns.

**Sources:**
- openFDA Food Enforcement API (15,000 records)
- openFDA Drug Enforcement API (15,000 records)
- openFDA Device Enforcement API (15,000 records)

**Coverage:** Recalls from across the United States and international markets, spanning multiple years of FDA enforcement activity.

**Schema (22 columns):**
- `recall_number`: unique FDA recall identifier
- `product_type`: food, drug, or device
- `event_id`: FDA event identifier
- `status`: Ongoing, Terminated, or Completed
- `classification`: Class I (dangerous or defective products that could cause serious health problems or death), Class II (may cause temporary or medically reversible health problems), Class III (unlikely to cause adverse health consequences but violates FDA rules)
- `recalling_firm`: company issuing the recall
- `city`, `state`, `country`: firm location
- `voluntary_mandated`: whether the recall was voluntary or FDA-mandated
- `initial_firm_notification`: how the firm initially notified the public or its consignees
- `product_description`: detailed product description
- `reason_for_recall`: why the recall was initiated
- `distribution_pattern`: geographic distribution of the product
- `product_quantity`: amount of product recalled
- `code_info`: lot numbers, UPC codes, expiration dates
- `recall_initiation_date`: when the recall started (ISO 8601)
- `center_classification_date`: when FDA classified the recall
- `report_date`: when the recall was reported
- `termination_date`: when the recall ended (if applicable)
- `recall_year`: extracted year for easy filtering
- `recall_class_num`: numeric class (1, 2, or 3) for sorting and analysis

**Use cases:** Product safety analytics, regulatory compliance research, supply chain risk assessment, consumer protection analysis, ML classification models, public health surveillance.
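The two derived columns, `recall_class_num` and `recall_year`, could be computed roughly as shown below. This is a hypothetical sketch. It assumes `classification` holds strings like `Class II` and that `recall_year` is taken from `recall_initiation_date`, which the listing does not state.

```python
import re
from datetime import date
from typing import Optional

_ROMAN_TO_INT = {"I": 1, "II": 2, "III": 3}


def recall_class_num(classification: Optional[str]) -> Optional[int]:
    """Map 'Class I' / 'Class II' / 'Class III' to 1 / 2 / 3; None if unrecognized."""
    match = re.fullmatch(r"\s*Class\s+(I{1,3})\s*", classification or "")
    return _ROMAN_TO_INT[match.group(1)] if match else None


def recall_year(iso_date: Optional[str]) -> Optional[int]:
    """Extract the year from an ISO 8601 date such as '2019-07-15'.

    Assumption: the source column is recall_initiation_date. Only the
    YYYY-MM-DD prefix is parsed, so a trailing time component is ignored.
    """
    if not iso_date:
        return None
    return date.fromisoformat(iso_date[:10]).year


if __name__ == "__main__":
    print(recall_class_num("Class II"), recall_year("2019-07-15"))
```

Returning `None` for unrecognized labels keeps malformed rows visible instead of silently assigning them a class.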