---
vault_clearance: EUCLID
halo:
  classification: BOUNTY BOARD
  front: "34_Project_DeadShape"
  custodian: "Claude Code seat (Resolve line)"
  created: 2026-04-24
  updated: 2026-04-24
  containment: "Open bounties for the Dead Shape toolchain. Every bounty carries its FORM (orthodox counterpart) and its TRUTH (quadrant + daemonic/righteous/rogue/orthodox placement)."
---

```
+==================================================================+
|                           BOUNTY BOARD                           |
|                       34_Project_DeadShape                       |
|                                                                  |
|  Every bounty declares FORM (what orthodox counterpart we're     |
|  beating or pairing with) and TRUTH (which quadrant it sits in). |
|  This convention (FORM / BOUNTY / TRUTH) is new as of            |
|  2026-04-24. Backfill prior bounties in other projects if you    |
|  touch them.                                                     |
+==================================================================+
```

## How to read a Dead Shape bounty

Each bounty declares:

- **Prize** — what it unlocks if closed.
- **FORM** — the orthodox counterpart it competes with, or pairs with. Named per `FORM.md §1`.
- **TRUTH** — where the intended result sits on the `TRUTH_PROTOCOL` quadrant (Orthodox/Rogue × Righteous/Daemonic) and how many free parameters the method carries.
- **Acceptance** — what observable says this bounty is closed.
- **Status** — OPEN / PARTIAL / CLOSED / DEFERRED.

---

## BOUNTY O-TOPO-1: Resolve Missing Ladder Accessions
- **Prize:** Complete the 8-point species ladder (`HALO_MYCELIA_SPECIES_AS_FUNCTION`) with reproducible structures for Cryptococcus Cfl1 and Fusarium HELL.
- **FORM:** UniProt search + AlphaFold DB v6 (Orthodox+Righteous pull). Where AlphaFold lacks the entry (Fusarium TrEMBL hits), build a homology model via MODELLER or fetch a curated deposit.
- **TRUTH:** Orthodox+Righteous. Pulling and storing structures is upstream data work; the audit of those structures happens in O-TOPO-7 where the deployment-state daemonic witnesses land. Free parameters added by this bounty: **0** (pure data retrieval).
- **Acceptance:** at least one structure per species in `osiris/alphafold_pull/` or `osiris/experimental_pull/`, with a row added to `osiris/_INDEX.md`.
- **Status:** OPEN. **Time:** ~1 day.

## BOUNTY O-TOPO-2: Document Original Construct Boundaries
- **Prize:** Make the `HALO_MYCELIA_SPECIES_AS_FUNCTION` ladder sovereign memory (construct boundaries recoverable from the vault, not from a remote machine).
- **FORM:** The published ladder's precision claim implicitly references specific sub-domains (Sup35N and full-length Sup35 gave LRF values differing 46×; HALO_TOPOLOGY_AUDIT §4a). Recovering those boundaries restores reproducibility inside the vault.
- **TRUTH:** Rogue+Righteous. Pure documentation work — no fitting, no modelling. Free parameters added: **0** (it is meta-information about existing numbers).
- **Acceptance:** HALO_MYCELIA_SPECIES_AS_FUNCTION gains a table mapping `Species | Protein | UniProt | residue range used | PDB citation` for every row of the ladder. Re-running `methods/topology.py` on those exact sub-domains recovers the ladder.
- **Status:** OPEN. **Time:** ~1 day (literature).

## BOUNTY O-TOPO-3: MD-Ensemble Ladder
- **Prize:** Ladder reported as distribution per species, not as a point.
- **FORM:** OpenMM + AMBER ff19SB + TIP4P-Ew (the protocol already in `osiris/eye_genesis/Sup35_log.txt` for Sup35N). **Parameter cost of this FORM: ~2500 force-field + ~10 MD protocol params per run.**
- **TRUTH:** Orthodox+Righteous sampling (force field is trained), but the *output* is a Rogue+Daemonic distribution over the given trajectory — as close to daemonic as classical MD gets. Honest reporting: "we sampled an MD ensemble under AMBER ff19SB; here's the distribution."
- **Acceptance:** LRF and PSX reported as mean ± std across ≥200 frames for each of the 8 ladder proteins.
- **Status:** OPEN (needs GPU / engine-room compute). **Time:** ~1 week wall-clock.

## BOUNTY O-TOPO-4: Held-Out Prediction (PARTIAL — Righteous-only pass done)
- **Prize:** Falsifiable test of the structural half of the spore-pocket claim.
- **FORM:** AlphaFold v6 monomer predictions — the FORM Dao Monarch. Six proteins were used: cathD, SAP5, ALP1, caspase-3, pepsin A-3, renin.
- **TRUTH:** Entirely inside Orthodox+Righteous (AlphaFold predictions of training-set proteins). **Free parameters in the structural side of the test: the ~93 M–500 M trained weights of the AlphaFold network (one set shared by all six predictions, so the count does not multiply per protein), plus 2 topology-cutoff parameters in the metric.** That enormous parameter budget is the reason this pass cannot adjudicate the claim: it is FORM reporting on its own training set.
- **Acceptance:** closed when a Daemonic witness (O-TOPO-6) replicates.
- **Status:** PARTIAL. Results in `osiris/results/spore_pocket_heldout_2026-04-24.json` + `spore_pocket_control_2026-04-24.json`.

## BOUNTY O-TOPO-5: Writhe Optimisation
- **Prize:** Enable writhe on proteins with L > 1200 (Candida Als, some large prion assemblies).
- **FORM:** Klenin-Langowski 2000 discrete writhe on a polyline. The reference implementation is pure Python, O(N²); numpy-vectorised or numba-JIT versions exist in several labs but are not packaged for this use case.
- **TRUTH:** Rogue+Daemonic. Writhe itself is a geometric invariant of the embedding; no training. Free parameters added: **0** (performance optimisation, not metric change).
- **Acceptance:** writhe runs in < 10 s per structure at L = 2000 on a laptop CPU. Existing scalar output unchanged at identical inputs.
- **Status:** OPEN. **Time:** ~half day.
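A minimal numpy sketch of the Klenin-Langowski segment-pair formula, looping over one segment index and vectorising over the other so memory stays O(N) per step (a fully vectorised pair array is O(N²) in memory). It assumes an (N, 3) array of Cα coordinates treated as an open polyline; the function names are hypothetical, and it has not been checked against the existing reference implementation, so the "scalar output unchanged" acceptance check still has to be run.

```python
import numpy as np


def _unit_rows(v: np.ndarray) -> np.ndarray:
    """Normalise each row; rows of (near-)zero length become zero vectors."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return np.divide(v, norm, out=np.zeros_like(v), where=norm > 1e-12)


def _asin_dot(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise arcsin(a . b), clipped against round-off outside [-1, 1]."""
    return np.arcsin(np.clip(np.einsum("ij,ij->i", a, b), -1.0, 1.0))


def writhe(coords: np.ndarray) -> float:
    """Discrete writhe of an open polyline (Klenin & Langowski 2000 formula).

    coords: (N, 3) vertex positions, e.g. C-alpha coordinates.
    Loops over segment i in Python and vectorises over all segments j >= i + 2
    (adjacent segments contribute zero and are skipped).
    """
    p = np.asarray(coords, dtype=float)
    n_seg = len(p) - 1
    total = 0.0
    for i in range(n_seg - 2):
        p1, p2 = p[i], p[i + 1]                          # segment i
        p3, p4 = p[i + 2 : n_seg], p[i + 3 : n_seg + 1]  # segments j = i+2 .. n_seg-1
        r13, r14, r23, r24 = p3 - p1, p4 - p1, p3 - p2, p4 - p2
        n1 = _unit_rows(np.cross(r13, r14))
        n2 = _unit_rows(np.cross(r14, r24))
        n3 = _unit_rows(np.cross(r24, r23))
        n4 = _unit_rows(np.cross(r23, r13))
        omega_star = (_asin_dot(n1, n2) + _asin_dot(n2, n3)
                      + _asin_dot(n3, n4) + _asin_dot(n4, n1))
        # Crossing sign: (r34 x r12) . r13
        sign = np.sign(np.einsum("ij,ij->i", np.cross(p4 - p3, p2 - p1), r13))
        total += float(np.sum(omega_star * sign))
    # Summing over i < j counts each unordered pair once, hence 2*pi, not 4*pi
    return total / (2.0 * np.pi)
```

The outer loop costs O(N) interpreter steps while each step is vectorised; whether that meets the < 10 s target at L = 2000 has to be timed on the target laptop.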

## BOUNTY O-TOPO-6: Daemonic Replication of Spore-Pocket Structural Test
- **Prize:** Close O-TOPO-4 outside the AlphaFold training frame.
- **FORM (declined):** AlphaFold. Already used; inadequate.
- **FORM (adopted):** One of the following Rogue+Daemonic witnesses:
    1. **Sequence-only phylogeny** — BLAST + tree construction, no structure predictor involved. The non-structural form of HALO_THE_SPORE_POCKET Prediction 7.
    2. **MD ensemble** from distinct starting conformers per protein.
    3. **Experimental ensemble** (ssNMR / cryo-EM) where available.
- **TRUTH:** Rogue+Daemonic. Parameters added depend on path: (1) substitution-matrix parameters only (BLOSUM62 is a symmetric 20 × 20 matrix, so 210 independent entries of its 400; no ML training); (2) force-field parameters; (3) **0 structural parameters**, just the metric's 3 cutoffs.
- **Acceptance:** any one of (1), (2), (3) reproduces d(cathD, fungal secreted) < d(cathD, caspase-3), with p < 0.05 against a permuted null.
- **Status:** OPEN.
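One way to read "p < 0.05 against a permuted null" is an exact label-permutation test on distances from cathD: is the hypothesised group closer than a random relabelling would be? The sketch assumes that reading; the function name, the statistic (difference of group means) and any distances fed to it are placeholders, not results.

```python
from itertools import combinations

import numpy as np


def permutation_p(dist_to_ref: dict[str, float], group: set[str]) -> float:
    """Exact one-sided permutation p-value that `group` sits closer to the reference.

    dist_to_ref: distance d(reference, protein) for every comparator protein.
    group: names of the hypothesised close group (e.g. fungal secreted proteases).
    Statistic: mean distance of non-group proteins minus mean distance of the group.
    Every way of choosing len(group) labels is enumerated, including the observed one.
    """
    names = list(dist_to_ref)
    d = np.array([dist_to_ref[n] for n in names], dtype=float)

    def statistic(indices: list[int]) -> float:
        mask = np.zeros(len(names), dtype=bool)
        mask[indices] = True
        return float(d[~mask].mean() - d[mask].mean())

    observed = statistic([names.index(n) for n in group])
    null = [statistic(list(c)) for c in combinations(range(len(names)), len(group))]
    # Small tolerance so the observed labelling counts as at least as extreme as itself
    return sum(s >= observed - 1e-12 for s in null) / len(null)
```

With an exact test the smallest attainable p is 1 / (number of labellings): five comparators split 2 vs 3 give C(5, 2) = 10 labellings, so p cannot go below 0.1 and the p < 0.05 bar would need more comparators or a different null.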

## BOUNTY O-TOPO-7: Daemonic Ladder — HET-s Done, 7 Rows to Go
- **Prize:** Complete the 8-row ladder against deployment-state witnesses (amyloid fibril, rodlet, assembly).
- **FORM:** RCSB PDB ssNMR deposits + cryo-EM deposits where available. Fall back to MD ensembles started from these when direct deposits are missing.
- **TRUTH:** Rogue+Daemonic (ssNMR ensembles are experimental; MD fallback is Orthodox+Righteous — note in the HALO which applies per row).
- **Acceptance:** each of the 7 remaining ladder species has a per-species entry in `osiris/experimental_pull/` or `osiris/md_pull/`, topology computed per-model, distribution reported, disagreement with AlphaFold flagged.
- **Status:** OPEN, 1 of 8 rows done (HET-s via 2RNM, 2KJ3, 2LBU).

## BOUNTY O-TOPO-8: Generalise to Prions (explicit scope expansion)
- **Prize:** Dead Shape as a general prion / functional-amyloid measuring device, not just a fungal-virulence device.
- **FORM:** Animal prion and functional-amyloid literature (mammalian PrP, α-synuclein, tau, Aβ42, CPEB3; Drosophila Orb2; etc.). Each has deposited ssNMR / cryo-EM structures of its fibril state, so computing Dead Shape metrics on them is straightforward data ingestion.
- **TRUTH:** Rogue+Daemonic when using experimental fibril structures; Orthodox+Righteous when using AlphaFold of the monomer.
- **Acceptance:** `osiris/prion_atlas.json` with ≥15 entries, each with (UniProt, cited deployment state, fibril PDB id if any, LRF/RCO/PSX from deployment state, AlphaFold monomer for comparison, discrepancy flag).
- **Status:** OPEN. **Time:** ~2 days.
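A minimal sketch of one `prion_atlas.json` row as a Python dataclass. The bounty lists the contents but not the key names, so every field name here is hypothetical, and "AlphaFold monomer for comparison" is read as the same metric computed on the monomer. The discrepancy rule (relative LRF difference above an assumed 0.25) is itself a free parameter and belongs in the run's parameter budget.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AtlasEntry:
    uniprot: str
    deployment_state: str            # cited deployment state, e.g. "amyloid fibril"
    deployment_citation: str
    fibril_pdb: Optional[str]        # None when no fibril structure is deposited
    lrf_deployed: Optional[float]    # metrics from the deployment-state structure
    rco_deployed: Optional[float]
    psx_deployed: Optional[float]
    lrf_af_monomer: Optional[float]  # same metric on the AlphaFold monomer
    discrepancy: Optional[bool] = None

    def flag_discrepancy(self, rel_threshold: float = 0.25) -> None:
        """Hypothetical rule: flag when the two LRF values differ by more than rel_threshold."""
        if self.lrf_deployed is None or self.lrf_af_monomer is None:
            self.discrepancy = None
            return
        scale = max(abs(self.lrf_deployed), abs(self.lrf_af_monomer), 1e-12)
        self.discrepancy = abs(self.lrf_deployed - self.lrf_af_monomer) / scale > rel_threshold


def write_atlas(entries: list[AtlasEntry], path: str) -> None:
    """Serialise the atlas as a JSON list of row objects."""
    with open(path, "w") as fh:
        json.dump([asdict(e) for e in entries], fh, indent=2)
```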

---

# NEW BOUNTIES — FREE-PARAMETER DISCIPLINE (opened 2026-04-24)

## BOUNTY O-TOPO-9: Persistent Homology on the Cα Backbone
- **Prize:** A genuinely multi-scale, cutoff-free topological descriptor in the Dead Shape toolchain.
- **FORM:** Ripser / GUDHI (both open source, Python-bindable). Computes H₀ (connected components), H₁ (loops), H₂ (voids) as persistence barcodes over the Rips filtration of the Cα point cloud. **Parameter count: 0.** (The filtration parameter sweeps through all scales; no single value is chosen.)
- **TRUTH:** Rogue+Daemonic. Real topological invariant. No training, no cutoff, no construct choice beyond residue selection.
- **Acceptance:** `methods/persistent_homology.py` + CLI. Reports H₀/H₁/H₂ barcodes as JSON per structure. Re-runs on the 22 osiris structures + 3 ssNMR ensembles; writes a comparison table to `HALO_TOPOLOGY_AUDIT` §12. The HET-s fibril vs monomer comparison should show a sharp increase in H₁ persistence for the fibril (β-solenoid creates large persistent loops) — this is a falsifiable pre-registered prediction that the fibril's genuine topology differs from the monomer.
- **TRUTH alignment check:** if H₁ agrees with LRF and circuit-topology X-fraction on the HET-s fibril vs monomer comparison, three Rogue+Daemonic metrics converge → strong signal.
- **Status:** OPEN. **Time:** ~1 day (GUDHI install + wrapper + run + report).
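- **Blocking:** GUDHI is C++ with Python bindings (pip / conda wheels exist for common platforms; building from source needs a C++ toolchain). `ripser` (ripser.py) is a pip-installable Python wrapper around the C++ Ripser. Start with `ripser` for H₀ and H₁; it can also compute H₂ (`maxdim=2`) at higher cost, so add GUDHI only if that proves too slow.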
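The H₀ part can be cross-checked without any TDA library: for a Vietoris-Rips filtration of a point cloud, every H₀ bar is born at 0, the finite bars die at the edge lengths of the Euclidean minimum spanning tree, and one bar never dies. The sketch below (hypothetical function name) computes those deaths with Prim's algorithm in numpy; it is a sanity check for the `ripser` output, not a replacement, and H₁ / H₂ still need Ripser or GUDHI.

```python
import numpy as np


def h0_barcode(points: np.ndarray) -> list[tuple[float, float]]:
    """H0 persistence pairs (birth, death) of the Vietoris-Rips filtration.

    Deaths are the minimum-spanning-tree edge lengths (Prim's algorithm, O(N^2)
    time and memory), sorted ascending; the surviving component gets death = inf.
    """
    x = np.asarray(points, dtype=float)
    n = len(x)
    if n == 0:
        return []
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # shortest distance from each point to the growing tree
    deaths = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        deaths.append(float(candidates[j]))  # merging j kills one component at this scale
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    bars = [(0.0, d) for d in sorted(deaths)]
    bars.append((0.0, float("inf")))
    return bars
```

Comparing `h0_barcode(ca)` against `ripser(ca, maxdim=1)["dgms"][0]` on one structure is a cheap check that the wrapper is fed the right coordinates. When writing the JSON barcodes, the infinite death needs an explicit encoding (for example `null`), since `Infinity` is not valid JSON.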

## BOUNTY O-TOPO-10: Knot Type of the Backbone
- **Prize:** A classical topological invariant (Alexander polynomial / HOMFLY) added to the Dead Shape metric suite.
- **FORM:** Existing tools: `KnotProt` (Sulkowska lab), `Topoly` (Python). Compute knot type by closing the backbone with a stochastic closure at infinity and evaluating the knot invariant on the closed loop.
- **TRUTH:** Rogue+Daemonic. True topological invariant of the closed loop. Limited signal (~1% of PDB entries contain a non-trivial knot) but very few false positives once the closure is sampled many times.
- **Acceptance:** `methods/knot_type.py` using `Topoly` or equivalent; runs on all osiris structures; reports trivial/3_1/4_1/5_1/5_2/... per structure. HET-s fibril is expected trivial (β-solenoid is not knotted per residue segment). MJ0366 (PDB 2EFV, known 3_1 knot) as positive control if pulled.
- **Status:** OPEN. **Time:** ~1 day.

## BOUNTY O-TOPO-11: Parameter Budget Ledger per Report
- **Prize:** No Dead Shape result leaves the project without a parameter-budget line.
- **FORM (anti-pattern being fixed):** Standard protein-modelling papers report a number without citing the parameter budget of the method that produced it. "AlphaFold predicts cathepsin D's fold" hides 500M parameters. "MD simulation of HET-s" hides 1500+ force-field parameters. This bounty enforces the anti-anti-pattern: every number here carries its parameter count.
- **TRUTH:** Meta-discipline. Not a truth-quadrant entry itself; it is the audit trail that lets future readers place any Dead Shape output on the quadrant honestly.
- **Acceptance:** every JSON result file in `osiris/results/` includes a `parameter_budget` field listing every free parameter in the chain from raw observable → reported number. Every HALO in `theory/` that cites a Dead Shape number carries the same budget line.
- **Status:** OPEN. **Time:** ~half day (schema + retrofit existing JSONs).
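A minimal sketch of the `parameter_budget` field, assuming a list-of-stages layout (stage, method, free-parameter count, note). The key names and the helper are hypothetical: the bounty fixes the requirement, not the schema.

```python
from typing import Any


def with_parameter_budget(result: dict[str, Any], stages: list[dict[str, Any]]) -> dict[str, Any]:
    """Return a copy of `result` carrying a parameter_budget block.

    Each stage describes one step from raw observable to reported number:
    {"stage": ..., "method": ..., "free_parameters": int, "note": ...}.
    """
    for s in stages:
        count = s.get("free_parameters")
        if not isinstance(count, int) or count < 0:
            raise ValueError(f"stage {s.get('stage')!r} needs a non-negative int free_parameters")
    out = dict(result)
    out["parameter_budget"] = {
        "stages": stages,
        "total_free_parameters": sum(s["free_parameters"] for s in stages),
    }
    return out
```

Keeping the per-stage list, not just the total, is what lets a HALO quote the budget line by line and place each stage on the quadrant separately.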

## BOUNTY O-TOPO-12: Raw-Observable Ingestion
- **Prize:** Dead Shape consumes one layer upstream from fitted PDB coordinates — at least one raw experimental observable processed directly into topology-relevant quantities, with no model-fitting step in between.
- **FORM:** Atomic coordinates in PDB files are themselves models (see `FORM.md §3`). Raw observables are:
    - SAXS: I(q) vs q scattering curves (SASBDB database)
    - NMR: chemical shift tables, NOE distance restraints (BMRB database)
    - Crystal: reflection data (`<pdbid>-sf.cif` from the PDB; `.mtz` from PDB-REDO)
    - Cryo-EM: raw movie frames (EMPIAR); reconstructed maps, .mrc (EMDB), one processing layer downstream
    - CD: θ(λ) spectra (usually supplementary)
- **TRUTH:** Rogue+Daemonic. The raw observable is the daemonic witness; any metric computed directly from it (without the force-field-restrained structure-calculation step) is maximally Daemonic for that protein.
- **Acceptance:** one of the HET-s observations (HET-s has both SASBDB SAXS and BMRB NMR entries; pick one) is ingested and a topology-relevant quantity computed directly. For SAXS: the pair distribution function P(r) is itself topology information; it is obtained from I(q) by an indirect Fourier transform (e.g. GNOM), which adds a few chosen parameters (Dmax, a regularisation weight, the q-range) but no structural model. For NMR chemical shifts: TALOS-N-style secondary-structure prediction uses only shifts, no structure calculation.
- **Status:** OPEN. **Time:** ~1–2 days.

## BOUNTY O-TOPO-13: Ensemble-First Reporting Discipline
- **Prize:** No Dead Shape number is reported as a point when a distribution is available.
- **FORM:** The original `HALO_MYCELIA_SPECIES_AS_FUNCTION` 8-point ladder is the anti-pattern (single-point values reported to 3 d.p.). The fix: when any structure has multiple MODELs (NMR), multiple frames (MD), or multiple chains (non-crystallographic-symmetry copies in the crystal's asymmetric unit), default to reporting mean ± std.
- **TRUTH:** Meta-discipline on top of Rogue+Daemonic metrics. Enforces the TRUTH convergence criterion: two witnesses agree *when their distributions overlap*, not when their means coincidentally match.
- **Acceptance:** `methods/topology_multimodel.py` (already exists) is the default entry point; `methods/topology.py` single-structure mode prints a warning if the input file has `MODEL`/`ENDMDL` records and multimodel wasn't invoked.
- **Status:** OPEN. **Time:** ~30 min (add the warning).
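The warning needs only a scan for `MODEL` records before the single-structure path runs. A minimal sketch; the function names are hypothetical and only the record names come from the PDB format. It warns when there is more than one model, since a lone `MODEL` record carries no ensemble; switch the test to `n >= 1` if the acceptance is read literally.

```python
import warnings


def count_models(pdb_text: str) -> int:
    """Number of MODEL records in PDB-format text (0 for a plain single-model file)."""
    return sum(1 for line in pdb_text.splitlines() if line.startswith("MODEL"))


def warn_if_multimodel(pdb_text: str, source: str = "<input>") -> int:
    """Warn when an ensemble file is about to be reduced to a point value."""
    n = count_models(pdb_text)
    if n > 1:
        warnings.warn(
            f"{source} has {n} MODEL records; single-structure mode reports a point value. "
            "Use methods/topology_multimodel.py to report mean +/- std.",
            stacklevel=2,
        )
    return n
```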

## BOUNTY O-TOPO-14: "What's Real" Audit per osiris/ Entry
- **Prize:** For every structure in `osiris/`, a one-paragraph note in `osiris/_INDEX.md` naming:
    1. The raw observable (crystal diffraction / NMR shifts / cryo-EM images / AlphaFold input sequence / MD starting frame)
    2. The model layer applied (refinement / simulated annealing / ML prediction / MD integration)
    3. The free-parameter count of that model layer
    4. What to trust and what to downgrade in the file
- **FORM:** Standard PDB header parsing gets you the basics (REMARK lines identify method + resolution + refinement program); the model layer and param count are added interpretively based on FORM.md §3.
- **TRUTH:** Meta-discipline. Lets every downstream Dead Shape user read `osiris/_INDEX.md` and know which structures are daemonic witnesses and which are FORM predictions.
- **Acceptance:** `osiris/_INDEX.md` expanded with a "What's Real" column for all 22 PDBs.
- **Status:** OPEN. **Time:** ~half day.
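The header pass can start from three standard PDB-format records: `EXPDTA` (experimental method), `REMARK   2` (resolution) and the `PROGRAM` line of `REMARK   3` (refinement program). A minimal regex sketch, assuming legacy PDB-format files (in mmCIF the method sits in `_exptl.method`); the function name is hypothetical, and the model-layer and parameter-count columns stay interpretive, as the bounty says.

```python
import re


def parse_header(pdb_text: str) -> dict[str, object]:
    """Extract method, resolution (Angstrom) and refinement program from a PDB header."""
    method, resolution, program = None, None, None
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            method = line[10:].strip() or None  # technique starts at column 11
        elif line.startswith("REMARK   2 RESOLUTION."):
            m = re.search(r"RESOLUTION\.\s+([0-9.]+)\s+ANGSTROM", line)
            resolution = float(m.group(1)) if m else None  # None for "NOT APPLICABLE"
        elif line.startswith("REMARK   3") and program is None:
            m = re.match(r"REMARK   3\s+PROGRAM\s*:\s*(.+)", line)
            if m:
                program = m.group(1).strip()
    return {"method": method, "resolution_angstrom": resolution, "refinement_program": program}
```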

---

# TIER O-DATA: DATABASE INGESTION

Catalogue of every external database Dead Shape might or should ingest from, with the **layer of the fit chain** (raw observable → fitted model → predicted model) explicit. Each is a bounty: "wrap this source in `methods/data_sources/<source>.py` with a clean Python API, document the parameter budget of any fit involved, ingest at least one HET-s-relevant entry as a smoke test." Order: **the deeper into the raw-observable layer, the higher priority** for daemonic-witness purposes.

## Layer A — Raw experimental observables (DAEMONIC priority)

These are the closest things the field has to *measurements*. Files at this layer have detector- or spectrometer-level data with instrumental calibration only — not refined structural models.

### O-DATA-1: BMRB (BioMagResBank)
- **What:** NMR chemical shifts, NOE distance restraints, J-couplings, residual dipolar couplings, paramagnetic relaxation enhancements, raw 1D/2D/3D spectra. Sister archive to PDB.
- **URL / API:** `https://bmrb.io/`, REST endpoint `https://api.bmrb.io/`, NMR-STAR format files via FTP `https://bmrb.io/ftp/pub/`.
- **Software:** `pynmrstar` (Python library, official BMRB).
- **FORM:** TALOS-N / TALOS+ use shifts to predict secondary structure with no structure calculation. CS-Rosetta uses shifts + sequence to bias structure prediction. SHIFTX2 predicts shifts from coordinates (the inverse problem).
- **TRUTH:** Rogue+Daemonic. The chemical shift table is the closest to a raw measurement we'll ever get for protein topology, before any force field or restraint annealing is applied.
- **Acceptance:** `methods/data_sources/bmrb.py` with `fetch_shifts(bmrb_id)`, `fetch_restraints(bmrb_id)`. Smoke-test on **HET-s 218-289** (BMRB entry exists for 2RNM-related deposit). Compute LRF-equivalent from shift-derived secondary structure alone. Compare to coordinate-derived LRF on 2RNM. **This closes O-TOPO-12.**
- **Parameter cost added by this layer:** ~0 (peak picking is the only fit; modern auto-pickers add ~5 thresholds).
- **Priority:** **HIGHEST**. Status: OPEN.
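The "LRF-equivalent from shift-derived secondary structure" step needs a shift-to-structure rule first. A common dependency-free starting point, in the spirit of the chemical shift index rather than TALOS-N, is the combined secondary shift ΔδCα − ΔδCβ: helices push it positive, β-strands negative. The sketch assumes the caller supplies observed shifts (e.g. parsed with `pynmrstar`) and random-coil reference values; the function name, the 1.0 ppm threshold and the 3-residue minimum run are assumptions, and both numbers count toward this layer's parameter budget.

```python
from typing import Optional


def secondary_structure_from_shifts(
    ca: list[Optional[float]],
    cb: list[Optional[float]],
    ca_rc: list[float],
    cb_rc: list[float],
    threshold_ppm: float = 1.0,  # assumed cutoff on the combined secondary shift
    min_run: int = 3,            # assumed minimum segment length
) -> str:
    """Per-residue 'H' (helix), 'E' (strand) or 'C' (coil) from C-alpha/C-beta secondary shifts.

    Residues with a missing shift are 'C' (glycine has no C-beta, so callers pass None).
    """
    raw = []
    for obs_a, obs_b, rc_a, rc_b in zip(ca, cb, ca_rc, cb_rc):
        if obs_a is None or obs_b is None:
            raw.append("C")
            continue
        combined = (obs_a - rc_a) - (obs_b - rc_b)
        raw.append("H" if combined > threshold_ppm else "E" if combined < -threshold_ppm else "C")
    # Demote non-coil runs shorter than min_run to coil
    out = list(raw)
    i = 0
    while i < len(raw):
        j = i
        while j < len(raw) and raw[j] == raw[i]:
            j += 1
        if raw[i] != "C" and j - i < min_run:
            out[i:j] = ["C"] * (j - i)
        i = j
    return "".join(out)
```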

### O-DATA-2: EMPIAR (Electron Microscopy Public Image Archive)
- **What:** Raw cryo-EM movie data (motion-uncorrected, CTF-uncorrected). The actual electron-counting frames before any reconstruction.
- **URL / API:** `https://www.ebi.ac.uk/empiar/`. FTP per-entry. Sizes typically 100 GB – 10 TB per dataset.
- **Software:** RELION, cryoSPARC, CTFFIND4, MotionCor2 — all heavy. For Dead Shape: probably impractical to reprocess from raw, but we should at least name the entry IDs corresponding to our deposits.
- **FORM:** Standard cryo-EM workflow has ~10 chosen processing parameters per micrograph, ~10 classification parameters, ~50+ refinement parameters.
- **TRUTH:** Rogue+Daemonic at the raw-frame level; Orthodox+Righteous at the reconstructed-density level.
- **Acceptance:** `methods/data_sources/empiar.py` — minimum viable: list available entries for a given UniProt or PDB accession; do not reprocess raw movies.
- **Priority:** Medium (fail gracefully into "we used the deposited density map"). Status: DEFERRED unless engine-room compute justifies.

### O-DATA-3: EMDB (Electron Microscopy Data Bank)
- **What:** Reconstructed cryo-EM density maps (MRC format). One layer downstream of EMPIAR.
- **URL / API:** `https://www.ebi.ac.uk/emdb/`. Per-entry maps via FTP at `https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-<id>/map/`.
- **Software:** `mrcfile` (Python, simple, fast).
- **FORM:** Phenix real-space refinement, ChimeraX, RELION post-processing.
- **TRUTH:** Rogue+Daemonic — the density map IS the measurement (modulo motion/CTF correction parameters).
- **Acceptance:** `methods/data_sources/emdb.py` with `fetch_map(emdb_id)`. Smoke-test on EMDB-47080 (the lysosomal vATPase already in vault — see `30_Project_Crucible/em_atlas/`).
- **Priority:** High. Status: OPEN.

### O-DATA-4: SASBDB (Small Angle Scattering Biological Data Bank)
- **What:** SAXS / SANS scattering curves I(q) vs q for biological macromolecules. Pair distribution function P(r). Direct ensemble information without coordinate fitting.
- **URL / API:** `https://www.sasbdb.org/`. REST at `https://www.sasbdb.org/rest/`. Per-entry data files.
- **Software:** `BioXTAS RAW`, `ATSAS suite`, or pure Python with numpy + scipy.
- **FORM:** EOM (Ensemble Optimization Method) fits ensembles to SAXS curves with parameters (~10–50 conformer weights). Crysol predicts SAXS from a given structure (10 atomic-scattering parameters). For raw-observable-first purposes, P(r) directly carries radius-of-gyration and topology-relevant size information without any structural model fit.
- **TRUTH:** Rogue+Daemonic. I(q) and P(r) are direct observations of the protein's solution-state ensemble.
- **Acceptance:** `methods/data_sources/sasbdb.py` with `fetch_curve(sasbdb_id)`. Smoke-test on a HET-s SAXS deposit if it exists, or any prion fibril deposit. Compute Rg, Dmax, Kratky plot directly from I(q) — no structural model required.
- **Priority:** High. Status: OPEN.
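For the Rg part, the usual route is a Guinier fit, ln I(q) = ln I(0) − q²Rg²/3, over the low-q region where q·Rg ≲ 1.3. That is a two-parameter linear fit plus a chosen q-range, so it is not literally fit-free; those choices go in the parameter budget. A minimal sketch with an iteratively re-selected q-range (function names, the 1.3 limit, the 10-point start and the iteration count are assumptions), plus the Kratky transform q²I(q). Dmax needs P(r) from an indirect Fourier transform (e.g. GNOM in ATSAS, or BIFT in BioXTAS RAW) and is not shown.

```python
import numpy as np


def guinier_fit(q: np.ndarray, intensity: np.ndarray, qrg_max: float = 1.3,
                n_iter: int = 5) -> tuple[float, float]:
    """Return (Rg, I0) from a Guinier fit restricted to q * Rg <= qrg_max.

    Starts from the lowest 10 points, then re-selects the q-range from the current Rg.
    Expects ascending q and positive intensities; Rg comes out in inverse units of q.
    """
    q = np.asarray(q, dtype=float)
    i_q = np.asarray(intensity, dtype=float)
    mask = np.zeros(len(q), dtype=bool)
    mask[: min(10, len(q))] = True
    rg, i0 = 0.0, 0.0
    for _ in range(n_iter):
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(i_q[mask]), 1)
        if slope >= 0:
            raise ValueError("ln I(q) does not decrease at low q; no Guinier region")
        rg = float(np.sqrt(-3.0 * slope))
        i0 = float(np.exp(intercept))
        new_mask = q * rg <= qrg_max
        if new_mask.sum() < 3:  # too few points to refit; keep the last estimate
            break
        mask = new_mask
    return rg, i0


def kratky(q: np.ndarray, intensity: np.ndarray) -> np.ndarray:
    """Kratky representation q^2 I(q), for eyeballing folded vs unfolded behaviour."""
    return np.asarray(q, dtype=float) ** 2 * np.asarray(intensity, dtype=float)
```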

### O-DATA-5: PCDDB (Protein Circular Dichroism Data Bank)
- **What:** CD spectra θ(λ) for proteins. Secondary-structure content directly inferable.
- **URL / API:** `https://pcddb.cryst.bbk.ac.uk/`.
- **Software:** Lightweight (CSV-style spectra).
- **FORM:** CDSSTR, CONTINLL, SELCON3 — all decompose CD spectra into helix/sheet/turn/coil content using basis-set parameters (typically 50–200 basis functions per algorithm).
- **TRUTH:** Rogue+Daemonic at the spectrum level.
- **Acceptance:** `methods/data_sources/pcddb.py` with `fetch_spectrum(pcddb_id)`.
- **Priority:** Medium. Status: OPEN.

### O-DATA-6: PDB-REDO
- **What:** PDB crystal structures re-refined with modern methodology — fundamentally an admission that PDB coordinates are model-dependent.
- **URL / API:** `https://pdb-redo.eu/`. Per-entry: `https://pdb-redo.eu/db/<pdbid>/<pdbid>_final.pdb`.
- **Software:** Standard PDB parsers (`gemmi`, `Biopython`).
- **FORM:** PDB-REDO **is** the FORM here — it's the orthodox correction of PDB. Useful for any crystallography-derived structure in osiris/.
- **TRUTH:** Orthodox+Righteous (different parameter choices, same training regime). Useful as a sensitivity check: does the topology metric change when refinement methodology changes?
- **Acceptance:** `methods/data_sources/pdb_redo.py`. Re-pull all crystallography-sourced osiris/ entries via PDB-REDO; report topology delta.
- **Priority:** Medium. Status: OPEN.

### O-DATA-7: Raw mmCIF / mtz reflection data from PDB
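- **What:** The actual diffraction data behind most crystal PDB entries (structure-factor deposition has been mandatory since 2008). `<pdbid>-sf.cif` files list per-reflection intensities I(hkl) and/or amplitudes |F(hkl)|, with uncertainties.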
- **URL / API:** `https://files.rcsb.org/download/<pdbid>-sf.cif`.
- **Software:** `gemmi` (handles mmCIF + reflection data + maps cleanly).
- **FORM:** Crystallography refinement (Phenix, Refmac, Buster).
- **TRUTH:** **Rogue+Daemonic at the reflection level.** This is the raw observable behind every crystal PDB entry.
- **Acceptance:** `methods/data_sources/reflections.py` with `fetch_reflections(pdbid)`. Smoke-test: pull reflections for one prion crystal (e.g., a Sup35 fragment); compute an electron-density map with `gemmi`, taking phases from the deposited model (reflections carry amplitudes only, so the map is not fully model-independent); assess agreement with the deposited structure.
- **Priority:** High (most direct route to bypassing PDB-coordinate model layer for crystal data). Status: OPEN.

### O-DATA-8: Direct ssNMR experimental files in BMRB
- **What:** Specific to ssNMR amyloid / fibril / membrane-protein data — chemical shift assignments, distance restraints, dipolar couplings.
- **URL / API:** Subset of BMRB; e.g., for HET-s entries 15819, 15820, 16808.
- **Coverage:** Sub-bounty under O-DATA-1.
- **Priority:** HIGHEST when intersecting the ladder species. Status: OPEN.

## Layer B — Coordinate models (USE WITH DISCLOSURE)

These are *fitted* atomic coordinate sets. Treat as model output, not measurement.

### O-DATA-9: RCSB PDB
- **What:** ~220 000 atomic coordinate sets fit to crystal / NMR / cryo-EM data.
- **URL / API:** `https://files.rcsb.org/download/<pdbid>.pdb` (already used by `osiris/experimental_pull/`).
- **Software:** `gemmi` (fast), `Biopython.PDB` (slow but featureful).
- **FORM:** The PDB IS the orthodox structural reference for the field.
- **TRUTH:** Orthodox+Righteous (refinement is a fit; coordinates are fitted parameters).
- **Acceptance:** `methods/data_sources/rcsb.py` consolidating the ad-hoc curl pulls already used.
- **Priority:** Medium (already partially done; needs cleanup). Status: PARTIAL.

### O-DATA-10: AlphaFold Protein Structure Database
- **What:** AlphaFold 2 / 3 predictions for ~200 million UniProt sequences.
- **URL / API:** `https://alphafold.ebi.ac.uk/files/AF-<UNIPROT>-F1-model_v6.pdb`. API at `https://alphafold.ebi.ac.uk/api/`.
- **Software:** Plain HTTP fetch.
- **FORM:** **The FORM Dao Monarch.** ~93 M–500 M trained parameters per query. Already used in `osiris/alphafold_pull/`.
- **TRUTH:** Orthodox+Righteous (maximum). Use only with explicit disclosure (HALO_TOPOLOGY_AUDIT §4d).
- **Acceptance:** `methods/data_sources/alphafold.py` consolidating existing curl pulls. Auto-fetch model confidence (pLDDT) per residue and store alongside.
- **Priority:** Medium. Status: PARTIAL.
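AlphaFold DB PDB files store per-residue pLDDT in the B-factor column, so storing confidence alongside the structure needs no extra download. A minimal sketch with hypothetical function names: it builds the URL from the pattern above (the `_v6` default simply mirrors it) and reads Cα B-factors using fixed PDB columns (residue number 23-26, B-factor 61-66). Fetching is left to the caller.

```python
def alphafold_pdb_url(uniprot: str, version: int = 6) -> str:
    """AlphaFold DB file URL for the first fragment (F1) of a UniProt accession."""
    return f"https://alphafold.ebi.ac.uk/files/AF-{uniprot}-F1-model_v{version}.pdb"


def plddt_per_residue(pdb_text: str) -> dict[int, float]:
    """Map residue number -> pLDDT, read from the B-factor column of CA atoms."""
    out: dict[int, float] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            out[int(line[22:26])] = float(line[60:66])
    return out
```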

### O-DATA-11: ESM Atlas
- **What:** ESMFold predictions for metagenomic sequences (~600 million entries).
- **URL / API:** `https://esmatlas.com/`.
- **Software:** Plain HTTP.
- **FORM:** ESMFold, a structure-prediction head on the ~3 B-parameter ESM-2 language model (the largest ESM-2 variant has 15 B). Even more orthodox than AlphaFold.
- **TRUTH:** Orthodox+Righteous (maximum-maximum).
- **Acceptance:** Same shape as O-DATA-10; only invoke when AlphaFold lacks an entry.
- **Priority:** Low. Status: OPEN.

### O-DATA-12: ModelArchive / PDB-DEV (integrative models)
- **What:** Hybrid / integrative-modeling structures combining multiple data types (cryo-EM + crosslinking + SAXS, etc.).
- **URL / API:** `https://www.modelarchive.org/`, `https://pdb-dev.wwpdb.org/`.
- **FORM:** Integrative Modeling Platform (IMP) and similar; parameter counts are very high (hybrid restraints + Monte Carlo over rigid bodies).
- **TRUTH:** Orthodox+Righteous, but explicit about uncertainty in a way the PDB is not.
- **Priority:** Low. Status: OPEN.

## Layer C — Sequence and family data

### O-DATA-13: UniProt KB (Swiss-Prot + TrEMBL)
- **URL / API:** `https://rest.uniprot.org/uniprotkb/`. Already used today for accession resolution.
- **Software:** Plain HTTP, JSON / TSV.
- **FORM:** The reference sequence database.
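- **TRUTH:** Underlying sequencing reads are essentially measurements (sequencer outputs), but most UniProt protein sequences are conceptual translations of predicted genes, and the annotation layered on top is curatorial / predictive.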
- **Acceptance:** `methods/data_sources/uniprot.py` consolidating existing usage.
- **Priority:** Medium. Status: PARTIAL.

### O-DATA-14: UniRef (50 / 90 / 100)
- **What:** Clustered UniProt for fast similarity-based pulls.
- **URL:** Same UniProt REST API, under `https://rest.uniprot.org/uniref/`.
- **Priority:** Medium. Status: OPEN.

### O-DATA-15: NCBI / RefSeq / GenBank
- **What:** Alternative sequence resource. Sometimes has entries UniProt lacks.
- **Priority:** Low. Status: OPEN.

### O-DATA-16: Pfam / InterPro
- **What:** Protein family / domain classifications.
- **URL / API:** `https://www.ebi.ac.uk/interpro/`.
- **FORM:** HMMER profile-HMM-based assignment (parameter count = thousands per profile).
- **TRUTH:** Orthodox+Righteous.
- **Priority:** Medium (useful for ladder species annotation). Status: OPEN.

### O-DATA-17: OrthoDB / OMA
- **What:** Orthologous gene assignment across species. Required for cross-kingdom comparisons (cf. `21_Project_LuaOversoul/HALO_CROSS_KINGDOM_CLADOGRAM.md`).
- **URL / API:** `https://www.orthodb.org/`, `https://omabrowser.org/`.
- **Priority:** Medium. Status: OPEN.

## Layer D — Structure / fold classification

### O-DATA-18: SCOP / SCOPe / SCOP2
- **What:** Structural Classification of Proteins; hierarchy from class → fold → superfamily → family.
- **URL:** `https://scop.berkeley.edu/`, `https://scop.mrc-lmb.cam.ac.uk/scop2/`.
- **FORM:** Curatorial classification.
- **TRUTH:** Orthodox+Righteous (human-curated classification on PDB-coordinate folds; hidden parameters are clustering thresholds).
- **Acceptance:** `methods/data_sources/scop.py`.
- **Priority:** Medium (useful for annotating ladder species' folds). Status: OPEN.

### O-DATA-19: CATH
- **What:** Class, Architecture, Topology, Homology — alternative classification with explicit topology level.
- **URL:** `https://www.cathdb.info/`. REST API.
- **FORM:** Algorithmic + manual classification.
- **TRUTH:** Orthodox+Righteous.
- **Priority:** Medium (CATH's "T" level is explicitly topological; useful as comparator for our circuit topology metric). Status: OPEN.

### O-DATA-20: ECOD
- **What:** Evolutionary Classification of Protein Domains. Different philosophy from SCOP/CATH.
- **URL:** `http://prodata.swmed.edu/ecod/`.
- **Priority:** Low. Status: OPEN.

## Layer E — Disorder / amyloid / functional ensemble data

### O-DATA-21: DisProt
- **What:** Manually curated database of intrinsically disordered proteins and their characterized regions.
- **URL:** `https://disprot.org/`. REST API.
- **FORM:** Curatorial; disorder prediction tools (PONDR, IUPred, etc.) layer on top.
- **TRUTH:** The annotations are evidence-based (NMR, CD, etc.); the predictions are Orthodox+Righteous.
- **Acceptance:** `methods/data_sources/disprot.py`. Cross-reference: which Dead Shape ladder species have known disordered regions?
- **Priority:** **HIGH** for the IDP / prion overlap (Sup35N, α-synuclein, tau). Status: OPEN.

### O-DATA-22: MobiDB
- **What:** Aggregated disorder + flexibility annotations across multiple sources.
- **URL:** `https://mobidb.org/`.
- **Priority:** Medium. Status: OPEN.

### O-DATA-23: AmyPro
- **What:** Curated amyloid-forming proteins, including functional and pathological amyloids.
- **URL:** `http://amypro.net/`.
- **FORM:** Curatorial.
- **TRUTH:** Rogue+Righteous (curatorial but evidence-based; not a fitted predictor).
- **Acceptance:** `methods/data_sources/amypro.py`. Cross-reference with Dead Shape ladder: every fungal prion + every mammalian functional/pathological amyloid should appear here.
- **Priority:** HIGH for prion-generalisation (O-TOPO-8). Status: OPEN.

### O-DATA-24: WALTZ-DB
- **What:** Hexapeptide amyloid-forming sequences, with experimental aggregation data.
- **URL:** `http://waltz.switchlab.org/`.
- **FORM:** ZipperDB / WALTZ are the predictive counterparts (WALTZ is the sequence predictor from the same group as WALTZ-DB; ZipperDB is structure-based).
- **Priority:** Low (sub-protein granularity). Status: OPEN.

### O-DATA-25: PrionScan / PrionDB / public prion catalogues
- **What:** Sequence-level scanning for prion-forming domains across genomes.
- **URL:** `http://bioinf.uab.es/prionscan/`.
- **Priority:** Medium for O-TOPO-8 generalisation. Status: OPEN.

## Layer F — Functional / interaction / binding

### O-DATA-26: PDBbind
- **What:** Experimentally measured binding affinities for protein-ligand complexes in the PDB.
- **URL:** `http://www.pdbbind.org.cn/`.
- **FORM:** Empirical scoring functions are trained on this. **It is the dataset that defines "in-distribution" for docking.**
- **TRUTH:** Orthodox+Righteous (the ITC / SPR measurements are real; the protein structures are fitted).
- **Priority:** Low for Dead Shape's current scope (we are not doing binding); high if we ever pivot. Status: DEFERRED.

### O-DATA-27: BindingDB / ChEMBL / DrugBank
- **What:** Binding affinities / bioactivities at scale.
- **Priority:** Low. Status: DEFERRED.

### O-DATA-28: STRING / IntAct / BioGRID
- **What:** Protein-protein interactions.
- **Priority:** Low for Dead Shape scope. Status: DEFERRED.

### O-DATA-29: SAbDab
- **What:** Antibody-specific structural database.
- **Priority:** Low. Status: DEFERRED.

## Layer G — Variants / PTMs / clinical

### O-DATA-30: gnomAD / ClinVar / COSMIC
- **What:** Variant frequencies (population), clinical significance, somatic cancer mutations.
- **Priority:** Cross-link relevant for any Dead Shape claim about a disease-relevant prion (PrP, α-synuclein in PD, tau in AD, Aβ in AD).
- **Acceptance:** `methods/data_sources/variants.py`.
- **Status:** OPEN.

### O-DATA-31: PhosphoSitePlus / dbPTM
- **What:** Post-translational modification sites.
- **Priority:** Medium for prions (phosphorylation modulates aggregation). Status: OPEN.

## Meta

### O-DATA-32: Aggregator — `methods/data_sources/__init__.py`
- **What:** A unified Python-side entry point. `from methods.data_sources import fetch` — dispatches to the right submodule based on (UniProt | PDB | EMDB | BMRB | etc.) accession class. Caches fetched data under `osiris/cache/<source>/<accession>/`.
- **Acceptance:** any Dead Shape downstream code can `fetch("HET-s")` and get back a dict with all available accessions across all sources.
- **TRUTH:** Meta. The aggregator must report the parameter budget per source it returns.
- **Priority:** **HIGHEST** once 3+ sources are wrapped. Status: OPEN, blocks until O-DATA-1, O-DATA-9, O-DATA-13 land.
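A minimal sketch of the dispatch step only: classify an accession string and return its source and download URL. RCSB and the UniProt REST base follow the URLs on this board; the UniProt `.json` suffix, the EMDB map path (`pub/databases/emdb/structures/...`) and the BMRB API path (`api.bmrb.io/v2/entry/<id>`) are assumptions to verify. The `bmr` prefix for BMRB IDs is a hypothetical convention, needed because a bare 4-digit BMRB ID can look like a PDB ID. Name resolution (`fetch("HET-s")`), caching and per-source parameter budgets are not shown.

```python
import re

# UniProt's published accession pattern; PDB IDs start with a digit, so the two are disjoint.
UNIPROT = re.compile(r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$")
PDB = re.compile(r"^[0-9][A-Za-z0-9]{3}$")
EMDB = re.compile(r"^EMD-([0-9]{4,})$", re.IGNORECASE)
BMRB = re.compile(r"^bmr([0-9]+)$", re.IGNORECASE)


def resolve(accession: str) -> tuple[str, str]:
    """Return (source, url) for an accession; raise ValueError if it is not recognised."""
    acc = accession.strip()
    if m := EMDB.match(acc):
        n = m.group(1)
        return "emdb", f"https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-{n}/map/emd_{n}.map.gz"
    if m := BMRB.match(acc):
        return "bmrb", f"https://api.bmrb.io/v2/entry/{m.group(1)}"
    if PDB.match(acc):
        return "rcsb", f"https://files.rcsb.org/download/{acc.upper()}.pdb"
    if UNIPROT.match(acc.upper()):
        return "uniprot", f"https://rest.uniprot.org/uniprotkb/{acc.upper()}.json"
    raise ValueError(f"unrecognised accession: {accession!r}")
```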

### O-DATA-33: Local mirror policy
- **What:** Decide what to mirror locally vs query on demand.
  - PDB: ~600 GB compressed. Don't mirror.
  - AlphaFold DB: tens of TB for the full set (the bulk download is ~23 TB). Don't mirror; pull on demand (current behaviour).
  - BMRB: ~100 GB. **Worth mirroring** for HET-s + prion species we care about.
  - SASBDB: small (~few GB total). Mirror.
  - PCDDB: small. Mirror.
  - DisProt / AmyPro: small (curated). Mirror in `osiris/cache/`.
  - UniProt: huge. Don't mirror; query.
- **Acceptance:** A small `methods/data_sources/MIRROR_POLICY.md` document.
- **Status:** OPEN.

---

## Pointers into other projects' bounty boards

- `28_Project_RedFromTheGrave/BOUNTY_BOARD.md` — upstream biology bounties. The recycler framework lives there; Dead Shape is its structural-topology arm.
- `21_Project_LuaOversoul/BOUNTY_BOARD.md` — spore-pocket hypothesis bounties. O-TOPO-6 path (1) = that project's Prediction 7.
- `20_Project_MarathonLament/BOUNTY_BOARD.md` — sibling project. Transcript-side structural work. Handoff when a splice product hypothesis needs a protein-topology check.
- `05_Project_LENG/BOUNTY_BOARD.md` — the zero-parameter methodological benchmark. When Dead Shape approaches a new metric, check first whether a LENG-like derivation exists before defaulting to a fitted counterpart.
