Tags: text, stackoverflow, programming, q-and-a, nlp, code-corpus
Stack Overflow Posts — 58M Questions & Answers (Markdown)
About this data
Every Stack Overflow post submitted before June 14, 2023 — roughly 58 million questions and answers (~35 GB) formatted as Markdown. Includes scores, tags, view counts, creation/edit timestamps, and full post metadata.
Schema
| Name | Type | Description |
|---|---|---|
| Id | BIGINT | |
| PostTypeId | BIGINT | |
| AcceptedAnswerId | BIGINT | |
| ParentId | BIGINT | |
| Score | BIGINT | |
| ViewCount | BIGINT | |
| Body | VARCHAR | |
| Title | VARCHAR | |
| ContentLicense | VARCHAR | |
| FavoriteCount | BIGINT | |
| CreationDate | VARCHAR | |
| LastActivityDate | VARCHAR | |
| LastEditDate | VARCHAR | |
| LastEditorUserId | BIGINT | |
| OwnerUserId | BIGINT | |
| Tags | VARCHAR[] | |
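Questions and answers are stored as separate rows and linked through the Id columns. Below is a minimal TypeScript sketch of rebuilding question/answer threads. It assumes the Stack Exchange data-dump convention: PostTypeId 1 marks a question and 2 an answer, an answer's ParentId holds its question's Id, and a question's AcceptedAnswerId points at one answer's Id. The `Post` and `Thread` types cover only a subset of the columns, and both they and the `buildThreads` helper are hypothetical. Their nullability is also an assumption. BIGINT values are mapped to `number`, which stays exact as long as ids remain below 2^53. Loading rows from the download is not shown, because the file format is not stated here.

```typescript
// Subset of the schema columns needed to rebuild question/answer threads.
// Nullability is an assumption: questions usually have no ParentId, answers no
// AcceptedAnswerId, Title or Tags.
interface Post {
  Id: number;
  PostTypeId: number;
  AcceptedAnswerId: number | null;
  ParentId: number | null;
  Score: number;
  Title: string | null;
  Tags: string[] | null;
}

// Assumed Stack Exchange data-dump codes; other post types are skipped.
const QUESTION_TYPE = 1;
const ANSWER_TYPE = 2;

interface Thread {
  question: Post;
  answers: Post[];       // highest Score first
  accepted: Post | null; // the answer whose Id equals question.AcceptedAnswerId
}

// Groups answers under their questions (via ParentId), optionally keeping only
// questions that carry the given tag.
function buildThreads(posts: Post[], tag?: string): Map<number, Thread> {
  const threads = new Map<number, Thread>();

  // Pass 1: index every (matching) question by Id.
  for (const p of posts) {
    if (p.PostTypeId !== QUESTION_TYPE) continue;
    if (tag !== undefined && !(p.Tags ?? []).includes(tag)) continue;
    threads.set(p.Id, { question: p, answers: [], accepted: null });
  }

  // Pass 2: attach each answer to its parent question, if that question was kept.
  for (const p of posts) {
    if (p.PostTypeId !== ANSWER_TYPE || p.ParentId === null) continue;
    const thread = threads.get(p.ParentId);
    if (thread === undefined) continue;
    thread.answers.push(p);
    if (p.Id === thread.question.AcceptedAnswerId) thread.accepted = p;
  }

  // Order answers by score, best first.
  threads.forEach((t) => t.answers.sort((a, b) => b.Score - a.Score));
  return threads;
}
```

Two passes keep the join linear in the number of rows. For the full ~58M-post dump, the same ParentId/AcceptedAnswerId join would more likely run in a database or columnar query engine than in memory.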
Free, open dataset
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "Stack Overflow Posts — 58M Que" })
// Found: 8a46aed4-0087-4ddd-80e3-988251d386b9
get_download_url({ dataset_id: "8a46aed4-0087-4ddd-80e3-988251d386b9" }) // free: no API key needed
Via REST API
# Free dataset: no API key required
curl https://api.databazaar.io/datasets/8a46aed4-0087-4ddd-80e3-988251d386b9/download-url
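The same request can be made from TypeScript. Below is a minimal sketch that relies only on what the curl line shows: a plain GET to that endpoint with no API key. The response format is not described on this page, so the sketch returns the body as raw text instead of guessing at its fields. The names `FetchLike`, `downloadUrlEndpoint` and `fetchDownloadUrl` are hypothetical. The fetch function is passed in, so any fetch-compatible implementation works, such as the global `fetch` in Node 18+ or browsers.

```typescript
// Minimal structural view of a fetch Response: just what this sketch reads.
interface HttpResponse {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}
type FetchLike = (url: string) => Promise<HttpResponse>;

const API_BASE = "https://api.databazaar.io";
const DATASET_ID = "8a46aed4-0087-4ddd-80e3-988251d386b9";

// Builds the same URL the curl command requests.
function downloadUrlEndpoint(datasetId: string, base: string = API_BASE): string {
  return `${base}/datasets/${encodeURIComponent(datasetId)}/download-url`;
}

// GETs the download-url endpoint and returns the unparsed response body.
// Rejects on any non-2xx status.
async function fetchDownloadUrl(
  fetchFn: FetchLike,
  datasetId: string = DATASET_ID,
): Promise<string> {
  const res = await fetchFn(downloadUrlEndpoint(datasetId));
  if (!res.ok) {
    throw new Error(`download-url request failed with HTTP ${res.status}`);
  }
  return res.text();
}
```

Usage with the built-in fetch: `fetchDownloadUrl(fetch).then((body) => console.log(body));`. Injecting the fetch function keeps the helper usable without network access, for example with a stub in tests.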