Global Public Domain Books Catalog — 75,000+ Literary Works with Genre, Era & Classification (1971–2025)
About this data
Comprehensive catalog of 75,545 public domain literary works from Project Gutenberg, enriched with genre classification, literary era mapping, and Library of Congress subject area categorization. Covers works in 58+ languages from ancient texts to early 20th-century literature.

**Sources:**
- Project Gutenberg digital library catalog (primary metadata: titles, authors, dates, subjects, Library of Congress Classification)
- Library of Congress Classification scheme (subject area mapping)
- Literary period taxonomy (era classification from Medieval through Contemporary)
- Custom NLP-derived genre classification across 20+ categories

**Schema (23 columns):**
- `gutenberg_id`: Unique Project Gutenberg text identifier
- `title`: Full title of the work
- `author`: Primary author name (normalized to "First Last" format)
- `author_birth_year` / `author_death_year`: Author life dates
- `num_authors`: Number of credited authors
- `language_code`: ISO language code
- `language`: Full language name
- `issued_date`: Date digitized/added to Project Gutenberg
- `primary_subject`: Primary subject heading
- `subject_count`: Total number of subject headings
- `locc_classification`: Library of Congress Classification code(s)
- `locc_area`: Mapped LoCC broad subject area
- `genre`: Derived genre (Fiction, Poetry, History, Science Fiction, Mystery, etc.)
- `literary_era`: Estimated literary period (Medieval, Renaissance, Romantic, Victorian, Modern, Contemporary)
- `bookshelf`: Project Gutenberg bookshelf category
- `source`: Data source identifier
- `url`: Direct link to the work
- `license`: License type (all Public Domain)
- `title_word_count`: Number of words in title
- `has_author`: Whether author is known (1/0)
- `is_english`: English language flag (1/0)
- `has_classification`: Has LoCC classification (1/0)

**Coverage:** 75,545 unique works across 58+ languages. 60K+ English works plus significant French (4K), Finnish (3.5K), German (2.3K), and 50+ other language collections. Literary eras span from Ancient/Medieval through Contemporary.

**Use cases:** Literary analysis, NLP training data catalogs, bibliometric research, digital humanities, author network analysis, genre classification benchmarking, language diversity studies, cultural heritage research.
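
The four derived columns at the end of the schema (`title_word_count`, `has_author`, `is_english`, `has_classification`) can be cross-checked against the columns they summarize. The following is a minimal sketch, not the catalog's own validation logic. It assumes the flags are stored as the strings "1" and "0", that English uses the language code `en`, and that `title_word_count` counts whitespace-separated tokens; the listing confirms none of these details. The function name `check_derived_columns` and the sample row are hypothetical placeholders.

```python
def check_derived_columns(row):
    """Return the names of derived columns that disagree with their source columns."""
    problems = []
    # has_author should be "1" exactly when an author name is present.
    if (row["has_author"] == "1") != bool(row["author"].strip()):
        problems.append("has_author")
    # is_english: assumes the English language code is "en".
    if (row["is_english"] == "1") != (row["language_code"].strip().lower() == "en"):
        problems.append("is_english")
    # has_classification should be "1" exactly when a LoCC code is present.
    if (row["has_classification"] == "1") != bool(row["locc_classification"].strip()):
        problems.append("has_classification")
    # title_word_count: assumes simple whitespace tokenization.
    if int(row["title_word_count"]) != len(row["title"].split()):
        problems.append("title_word_count")
    return problems


# Placeholder row for illustration only; not real catalog data.
sample_row = {
    "title": "An Example Title",
    "author": "Jane Doe",
    "language_code": "en",
    "locc_classification": "",
    "title_word_count": "3",
    "has_author": "1",
    "is_english": "1",
    "has_classification": "1",  # deliberately inconsistent with the empty LoCC code
}

print(check_derived_columns(sample_row))
```

Running a check like this over the full file is a cheap way to learn how the flags were actually derived before relying on them as filters.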
Schema
| Name | Type | Description |
|---|---|---|
| gutenberg_id | string | |
| title | string | |
| author | string | |
| author_birth_year | string | |
| author_death_year | string | |
| num_authors | string | |
| language_code | string | |
| language | string | |
| issued_date | string | |
| primary_subject | string | |
| subject_count | string | |
| locc_classification | string | |
| locc_area | string | |
| genre | string | |
| literary_era | string | |
| bookshelf | string | |
| source | string | |
| url | string | |
| license | string | |
| title_word_count | string | |
| has_author | string | |
| is_english | string | |
| has_classification | string | |
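
Because every column in the table is typed as string, numeric and flag columns need converting before analysis. The sketch below uses only the Python standard library. Treating empty strings as missing values and accepting year values written like `1800.0` are assumptions, the helper name `coerce_row` is hypothetical, and the sample row holds placeholder values, not real catalog data.

```python
import csv
import io

# Columns described as counts or years in the schema.
INT_COLUMNS = (
    "author_birth_year",
    "author_death_year",
    "num_authors",
    "subject_count",
    "title_word_count",
)
# Columns described as 1/0 flags in the schema.
FLAG_COLUMNS = ("has_author", "is_english", "has_classification")


def coerce_row(row):
    """Return a copy of a CSV row with integer and flag columns converted."""
    typed = dict(row)
    for name in INT_COLUMNS:
        if name in typed:
            text = typed[name].strip()
            # Assumption: an empty string means the value is unknown.
            typed[name] = int(float(text)) if text else None
    for name in FLAG_COLUMNS:
        if name in typed:
            typed[name] = typed[name].strip() == "1"
    return typed


# Placeholder data standing in for the downloaded file.
sample_csv = io.StringIO(
    "gutenberg_id,title,author_birth_year,author_death_year,title_word_count,has_author\n"
    "0,Example Title,1800,,2,1\n"
)
rows = [coerce_row(r) for r in csv.DictReader(sample_csv)]
print(rows[0])
```

Identifier columns such as `gutenberg_id` are left as strings here, since they are labels and not quantities.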
For AI Agents
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
"mcpServers": {
"databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
}
}
# 2. Your agent can then call:
search_datasets({ query: "Global Public Domain Books Cat" })
// Found: 9e20a575-5493-47d9-b71b-ad0dc12be01a
get_download_url({ dataset_id: "9e20a575-5493-47d9-b71b-ad0dc12be01a" }) // free, no API key needed
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/9e20a575-5493-47d9-b71b-ad0dc12be01a/download-url
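
The same request can be made from Python without the MCP server. This is a sketch under stated assumptions: the endpoint and dataset ID are copied from the curl command above, but the response format is not documented in this listing, so the body is returned as raw text and not parsed. The function names are hypothetical.

```python
import urllib.request

API_BASE = "https://api.databazaar.io"
DATASET_ID = "9e20a575-5493-47d9-b71b-ad0dc12be01a"


def download_url_endpoint(dataset_id):
    """Build the endpoint used by the curl command for a given dataset ID."""
    return f"{API_BASE}/datasets/{dataset_id}/download-url"


def fetch_download_info(dataset_id, timeout=30.0):
    """GET the endpoint and return the raw response body as text.

    The response shape is an unknown here, so no JSON parsing is attempted.
    """
    with urllib.request.urlopen(download_url_endpoint(dataset_id), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_download_info(DATASET_ID)` performs the network request; inspect the returned text once before writing any code that depends on its structure.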