scientific · rcalef/magneton-data
biology · proteins · protein-representation · swissprot · interpro · dssp · bioinformatics · machine-learning · jsonl · mit-license
Magneton: Substructure-Aware Protein Representation Learning Dataset
About this data
This dataset contains 530,601 SwissProt proteins, each annotated with DSSP secondary structure and InterPro 103.0 substructure annotations, stored as sharded JSONL. It is intended for training and evaluating protein representation learning models.
Schema
| Name | Type | Description |
|---|---|---|
| uniprot_id | VARCHAR | SwissProt accession identifier (e.g., Q8CC14) |
| kb_id | VARCHAR | Knowledge base entry identifier in the format `sp\|accession\|name` |
| name | VARCHAR | SwissProt entry name (protein ID code, e.g., F216B_MOUSE) |
| length | BIGINT | Protein sequence length in amino acids |
| parsed_entries | BIGINT | Number of successfully parsed InterPro annotations |
| total_entries | BIGINT | Total number of InterPro annotations in source data |
| entries | STRUCT(id VARCHAR, element_type VARCHAR, match_id VARCHAR, element_name VARCHAR, representative BOOLEAN, positions BIGINT[][])[] | InterPro 103.0 domain/family/motif annotations with ID, type, match ID, name, representative flag, and residue position ranges |
| secondary_structs | STRUCT(dssp_type BIGINT, "start" BIGINT, "end" BIGINT)[] | DSSP secondary structure assignments stored as segments (type code, start position, end position) |
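A minimal sketch of reading records that follow the schema above, assuming each JSONL line is one JSON object whose keys match the column names. The sample record is invented for illustration and does not come from the dataset. The helper names (`iter_records`, `per_residue_dssp`, `representative_entries`) are hypothetical, and the position convention (0-based start, exclusive end) is an assumption that this listing does not confirm.

```python
import json
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def per_residue_dssp(record: dict, fill: int = -1) -> List[int]:
    """Expand (dssp_type, start, end) segments into one label per residue.

    ASSUMPTION: start is 0-based and end is exclusive. Check this against
    real records before relying on it. Uncovered residues get `fill`.
    """
    labels = [fill] * record["length"]
    for seg in record["secondary_structs"]:
        stop = min(seg["end"], record["length"])
        for i in range(seg["start"], stop):
            labels[i] = seg["dssp_type"]
    return labels


def representative_entries(record: dict) -> List[dict]:
    """Keep only InterPro entries flagged as representative."""
    return [e for e in record["entries"] if e["representative"]]


# Hypothetical record: every value below is a placeholder, not real data.
SAMPLE_LINE = json.dumps({
    "uniprot_id": "P00000",
    "kb_id": "sp|P00000|EXMPL_TEST",
    "name": "EXMPL_TEST",
    "length": 10,
    "parsed_entries": 2,
    "total_entries": 2,
    "entries": [
        {"id": "IPR000000", "element_type": "domain", "match_id": "PF00000",
         "element_name": "Example domain", "representative": True,
         "positions": [[2, 8]]},
        {"id": "IPR000001", "element_type": "family", "match_id": "PF00001",
         "element_name": "Example family", "representative": False,
         "positions": [[0, 10]]},
    ],
    "secondary_structs": [
        {"dssp_type": 0, "start": 0, "end": 4},
        {"dssp_type": 2, "start": 6, "end": 10},
    ],
})

records = list(iter_records([SAMPLE_LINE, ""]))
dssp_labels = per_residue_dssp(records[0])
rep_ids = [e["id"] for e in representative_entries(records[0])]
```

To read a real shard, pass an open file handle to `iter_records` in place of the list of strings; shard file names are not given in this listing.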
Free
Seller: DataBazaar
For AI Agents
Via MCP Server
# 1. Add to your agent's MCP config (claude_desktop_config.json or similar):
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
# 2. Your agent can then call:
search_datasets({ query: "Magneton: Substructure-Aware P" })
// Found: 2ad16e68-096b-495f-8c9f-c3f9a647fb43
get_download_url({ dataset_id: "2ad16e68-096b-495f-8c9f-c3f9a647fb43" }) // free, no API key needed
Via REST API
# Free dataset, no API key required:
curl https://api.databazaar.io/datasets/2ad16e68-096b-495f-8c9f-c3f9a647fb43/download-url
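The same REST call can be sketched in Python using only the standard library. The endpoint and dataset ID are taken from the curl command above. The function names are hypothetical, and the response format is an assumption: this listing does not document what the endpoint returns, so the sketch only decodes the body as JSON and leaves interpreting it to the caller.

```python
import json
import urllib.request

API_BASE = "https://api.databazaar.io"
DATASET_ID = "2ad16e68-096b-495f-8c9f-c3f9a647fb43"


def download_url_endpoint(dataset_id: str, base: str = API_BASE) -> str:
    """Build the download-url endpoint shown in the curl example."""
    return f"{base}/datasets/{dataset_id}/download-url"


def fetch_download_info(dataset_id: str, timeout: float = 30.0):
    """GET the endpoint and return the decoded body.

    ASSUMPTION: the server responds with a JSON body. The listing does not
    document the response schema, so no specific keys are read here.
    """
    url = download_url_endpoint(dataset_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


endpoint = download_url_endpoint(DATASET_ID)
# To perform the request (needs network access):
# info = fetch_download_info(DATASET_ID)
```

The network call is left commented out so the block can be imported or run offline; inspect the returned object before assuming any particular field name.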