---
vault_clearance: EUCLID
halo:
  classification: INTERNAL
  confidence: MEDIUM
  front: "10_Project_DiscordIntoSymphony — BOOK"
  custodian: "The Architect"
  created: 2026-03-30
  updated: 2026-04-01
  wing: CONDITIONAL
  containment: "BOOK — bibliography + Combined Codex (datasets/APIs/methods); not README / WORLDLINE / BOUNTY"
---

# DiscordIntoSymphony (GEM) — BOOK

Canonical bibliography and methods registry for **10_Project_DiscordIntoSymphony** (zero-parameter GEM / droplet-level scRNA, senescence). Convention: [`BOOK_Protocol.md`](../BOOK_Protocol.md). Orientation: [`README.md`](README.md). Open work: [`BOUNTY_BOARD.md`](BOUNTY_BOARD.md) · [`WORLDLINE.md`](WORLDLINE.md). **Orthodox scRNA vs GEM spine:** [`FORM.md`](FORM.md). **Dataset / experiment registry:** **§5 Combined Codex** (fused former `CODEX.md`). **Fusion note:** The narrative literature-gap document [`SENESCENCE_LITERATURE_GAPS.md`](SENESCENCE_LITERATURE_GAPS.md) remains the **annotated** source; this BOOK is the **deduped DOI index**.


### Local registry slice (EYE / STAFF / STARS)

| Surface | Pointers |
|---------|----------|
| **EYEs** | Runs: [`README.md`](README.md) / [`WORLDLINE.md`](WORLDLINE.md) (if present). Registry: [`EYE_PROTOCOL.md`](../EYE_PROTOCOL.md) |
| **STAFF** | Runnable tools: [`STAFF_catalogue.json`](../STAFF_catalogue.json) — filter `project_dir` for this folder. |
| **STARS** | This file; rules: [`BOOK_Protocol.md`](../BOOK_Protocol.md). |
| **Audit sheet** | [`LOGGING_AND_REGISTRY_CHECKLIST.md`](../99_Archive/root_reports/2026-04/LOGGING_AND_REGISTRY_CHECKLIST.md) |


---

## 1. Senescence — reviews and perspectives (deduped DOIs)

| ID | Venue / topic | Identifier |
|----|----------------|------------|
| GEM-B1 | Exp. Mol. Med. — hallmarks / heterogeneity (2025) | [`10.1038/s12276-025-01480-7`](https://doi.org/10.1038/s12276-025-01480-7) |
| GEM-B2 | Nat. Genet. — computational multi-omics | [`10.1038/s41588-025-02314-y`](https://doi.org/10.1038/s41588-025-02314-y) |
| GEM-B3 | npj Aging — pathways, preclinical to therapeutic | [`10.1038/s41514-024-00181-1`](https://doi.org/10.1038/s41514-024-00181-1) |
| GEM-B4 | Nat. Rev. Drug Discov. — senescence as therapeutic target | [`10.1038/s41573-024-01074-4`](https://doi.org/10.1038/s41573-024-01074-4) |
| GEM-B5 | Nat. Rev. Mol. Cell Biol. — SASP physiology / pathology | [`10.1038/s41580-024-00727-x`](https://doi.org/10.1038/s41580-024-00727-x) |
| GEM-B6 | Nat. Aging — SASP in cancer therapy | [`10.1038/s43587-025-01052-4`](https://doi.org/10.1038/s43587-025-01052-4) |
| GEM-B7 | Nat. Rev. Mol. Cell Biol. — chromatin / genome instability (2024) | [`10.1038/s41580-024-00775-3`](https://doi.org/10.1038/s41580-024-00775-3) |
| GEM-B8 | Biogerontology (2025) | [`10.1007/s10522-025-10246-7`](https://doi.org/10.1007/s10522-025-10246-7) |
| GEM-B9 | GeroScience (2025) | [`10.1007/s11357-025-01964-4`](https://doi.org/10.1007/s11357-025-01964-4) |
| GEM-B10 | Cell Death Discovery (2025) | [`10.1038/s41420-025-02655-x`](https://doi.org/10.1038/s41420-025-02655-x) |
| GEM-B11 | Proc. Jpn. Acad., Ser. B (2025) | [`10.2183/pjab.101.014`](https://doi.org/10.2183/pjab.101.014) |
| GEM-B12 | Genet. Mol. Biol. (2024) | [`10.1590/1678-4685-GMB-2023-0311`](https://doi.org/10.1590/1678-4685-GMB-2023-0311) |
| GEM-B13 | J. Clin. Invest. (2018) — mechanisms / SASP / disease | [`10.1172/JCI95148`](https://doi.org/10.1172/JCI95148) |

**PMC / Europe PMC** pointers for subsets of the above appear in [`SENESCENCE_LITERATURE_GAPS.md`](SENESCENCE_LITERATURE_GAPS.md).

---

## 2. Datasets and pipeline cross-refs (curate)

| ID | Kind | Note | Identifier |
|----|------|------|------------|
| GEM-T1 | Pipeline | Unified ingest / QC — cite method papers alongside [`08_Project_Astronomicon/BOOK.md`](../08_Project_Astronomicon/BOOK.md) | *internal* |
| GEM-D1 | Data | Curated senescence atlas / GEO rows — see **§2b**; per-study cards still live in WORLDLINE / bounty as needed | *§2b* |
| GEM-P1 | Export | NIH-facing **orthodox analysis package** (MANIFEST, PACKAGE_README, analysis_package methods) under `orthodox/out/nih_full_export_20260331T065706Z/` — see WORLDLINE Part VII for run context and EYEs used | *internal path* |
| GEM-T2 | Archive | European Nucleotide Archive (EMBL-EBI) — programmatic + browser access | [https://www.ebi.ac.uk/ena/browser/home](https://www.ebi.ac.uk/ena/browser/home) |
| GEM-T3 | Archive | BioStudies (study metadata; ArrayExpress migration) | [https://www.ebi.ac.uk/biostudies/](https://www.ebi.ac.uk/biostudies/) |
| GEM-T4 | Infrastructure | ELIXIR — European life-science data & tools coordination | [https://elixir-europe.org/](https://elixir-europe.org/) |

---

## 2b. Senescence / aging — public datasets and atlas DOIs (curated online pass, US + international)

**Note:** Complements §1–2. Prefer **DOI** + **GEO/SRA** series pages for reproducibility; many studies deposit under **BioProject**, so follow links from PubMed “Related information.” For GEO accessions, **cell type, treatment/design, and caveats** are indexed in **STARS · FROM GEO · STAR index** (GEM-GEO*), with the full narrative in **§2c**.

| ID | Kind | Note | Identifier |
|----|------|------|------------|
| GEM-D2 | Paper + tool | Cell-type senescence landscape / SenePy-style single-cell senescence mapping (*Nat Commun*, 2025) | [10.1038/s41467-025-57047-7](https://doi.org/10.1038/s41467-025-57047-7) |
| GEM-D3 | Paper + GEO | *Tabula Muris Senis* — mouse lifespan scRNA atlas (*Nature*, 2020); **series-level accession** is super-set **GSE132042** (subseries **GSE149590** scRNA, **GSE132040** bulk — see **STARS · FROM GEO**) | [10.1038/s41586-020-2496-1](https://doi.org/10.1038/s41586-020-2496-1) · [GSE132042](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132042) |
| GEM-D4 | GEO | Human **hESC-derived MSC** replicative senescence scRNA (10x 3′, NextSeq; **T0/T1/T2** × 3 reps) — curated under **STARS · FROM GEO** | [GSE200157](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE200157) |
| GEM-D5 | Portal | Human Cell Atlas (international cell atlas portal; filter tissue / age in UI) | [humancellatlas.org](https://www.humancellatlas.org/) |
| GEM-D6 | Portal | Single Cell Portal (Broad) — many tumor / stress datasets with senescence-relevant contrasts | [singlecell.broadinstitute.org](https://singlecell.broadinstitute.org/) |
| GEM-D7 | Japan | DDBJ Search / INSDC — mirror and cross-link to SRA/ENA records | [ddbj.nig.ac.jp](https://www.ddbj.nig.ac.jp/) |
| GEM-D8 | EU | ArrayExpress via BioStudies (expression metadata) | [ebi.ac.uk/biostudies](https://www.ebi.ac.uk/biostudies/) |

**Keyword discovery (GEO DataSets):** [GEO senescence + single cell (example query)](https://www.ncbi.nlm.nih.gov/gds/?term=senescence+AND+%22single+cell%22) — refine with tissue and organism.
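
The keyword query above can also be built programmatically. This is a minimal standard-library sketch: it assumes the NCBI GDS browser URL and the E-utilities `esearch` endpoint shown in the code, uses the Entrez `[Organism]` field tag for the organism filter, and the helper names are hypothetical. It only builds URLs; fetching and parsing results is left out.

```python
from urllib.parse import urlencode

# Public NCBI endpoints: interactive GEO DataSets search and E-utilities esearch.
GDS_BROWSER = "https://www.ncbi.nlm.nih.gov/gds/"
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gds_term(keywords, organism=None):
    """AND-join keywords (quoting multi-word phrases) plus an optional [Organism] filter."""
    parts = [f'"{k}"' if " " in k else k for k in keywords]
    if organism:
        parts.append(f'"{organism}"[Organism]')
    return " AND ".join(parts)


def gds_browser_url(term):
    """URL for the interactive GEO DataSets search page."""
    return GDS_BROWSER + "?" + urlencode({"term": term})


def esearch_url(term, retmax=100):
    """URL for a programmatic esearch against the gds database (XML result by default)."""
    return ESEARCH + "?" + urlencode({"db": "gds", "term": term, "retmax": retmax})


if __name__ == "__main__":
    # Rebuild the example query, then narrow it to mouse.
    print(gds_browser_url(gds_term(["senescence", "single cell"])))
    print(esearch_url(gds_term(["senescence", "single cell"], organism="Mus musculus")))
```

The first printed line is the same URL as the example link; refining by tissue is just another keyword (for example `"lung"`).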

---

## 2c. Curated breakdown — cells, treatments, special considerations

**Scope:** Every **§2b** row (GEM-D2–D8) plus every **STARS · FROM GEO** accession (GEM-GEO1–GEM-GEO3 and sub-accessions). **Source:** NCBI GEO SOFT for series **GSE132042**, **GSE200157**, **GSE235996** (parsed 2026-04-01); series-level text for **GSE149590**, **GSE132040**, **GSE109774**, **GSE193093** where SOFT has no per-sample blocks; *Nat Commun* (SenePy) abstract and figure context for **GEM-D2**. Where GEO metadata is silent, caveats are called out.

### GEM-D2 — SenePy (*Nat Commun*, 2025)

| Dimension | Curated content |
|-----------|-----------------|
| **What it is** | **Method / signature resource**, not a single RNA-seq series. SenePy builds **weighted scRNA signatures** (paper: **72 mouse**, **64 human** cell-type-specific senescence programs) and scores cells in public atlases. |
| **Cells** | **Heterogeneous:** re-uses **many** published single-cell datasets (organism-wide). Mouse aging analyses lean heavily on **Tabula Muris Senis** (see GEM-GEO1a); human panels span tissues and diseases (e.g. COVID lung, heart, spatial Visium examples in figures). |
| **Treatment / design** | **No one “treatment arm”:** contrasts are defined **per figure** (e.g. young vs old age bins, disease vs control, senolytic vs vehicle, OIS time courses). |
| **Special considerations** | **Not GEO-indexable as one GSE.** Must trace each analysis to its **underlying accession(s)** in the paper’s Data availability / methods. Signatures are the deliverable — do not assume a single cell type or batch. Cross-check senescence calls against **orthogonal assays** where the paper provides them (e.g. lineage reporters, spatial context). |

### GEM-GEO1 — GSE132042 (SuperSeries)

| Dimension | Curated content |
|-----------|-----------------|
| **What it is** | **Container SuperSeries** only: **993** sample rows in SOFT are tagged with `Series_sample_id` → **GSE132042**; they mix **Tabula Muris Senis bulk**, **original Tabula Muris** (young) Smart-seq2, **aggregated matrix** rows, and **parabiosis** exports. |
| **Cells** | **Mixed by child study** (see GEM-GEO1a–1d). Do not treat as one experiment. |
| **Treatment / design** | **None at super-series level** — parse **sub-accessions** or sample titles (`bulk` vs `Smart-seq2` vs `parabiosis-facs`). |
| **Special considerations** | **Always subset by child GSE or sample metadata.** Raw TMS scRNA is also advertised on **AWS** (`s3://czb-tabula-muris-senis/` per GSE149590 series text) — GEO may hold **matrix-level** rows rather than every FASTQ. |

### GEM-GEO1a — GSE149590 (Tabula Muris Senis — scRNA entry)

| Dimension | Curated content |
|-----------|-----------------|
| **What it is** | **GEO series** for TMS single-cell resource; SOFT contains **2** samples: `tabula-muris-senis-droplet` and `tabula-muris-senis-facs` (**processed count matrices**, not 900+ cell-level GSM rows). |
| **Cells** | **Mouse (*Mus musculus*)**, **C57BL/6**; **18** tissues / organs in series summary — effectively **organ-resident cell types** as captured by **droplet** vs **FACS** enrichment strategies (two library philosophies). |
| **Treatment / design** | **Aging as the variable:** lifespan series (paired with bulk **GSE132040**). **Not** drug-induced senescence. |
| **Special considerations** | **Droplet vs FACS** are different **cell-population filters** — do not merge without harmonization. **AWS bucket** may hold fuller raw/processed layers than minimal GEO rows. |

### GEM-GEO1b — GSE132040 (Tabula Muris Senis — bulk)

| Dimension | Curated content |
|-----------|-----------------|
| **What it is** | **Bulk RNA-seq** of **17 organs** across lifespan (*M. musculus*). |
| **Cells** | **Tissue homogenates / bulk RNA** — **no single-cell type**; reflects **mixed cell populations** per organ. |
| **Treatment / design** | **Postnatal age** (lifespan); **strain C57BL/6** (bulk subset in super-series SOFT). **Sex:** **665 male**, **268 female**, **14** GEO `Sex` missing among **947** bulk `Tabula Muris Senis (bulk RNA-seq)` samples. |
| **Special considerations** | **Bulk vs scRNA** (1a) measure different biology; watch for **batch / RNA-quality** effects across ages. GEO bulk rows use **18 tissue tags** (including **WBC**, **BAT**, **SCAT**, **GAT**, **MAT**, **Limb Muscle**, **Marrow**, **Skin**, …) while the series title says **17 organs**; treat this as an **annotation vs author wording** mismatch and reconcile against the paper. **14** samples have **age NA**; drop or impute only with author guidance. **Ages present** (months postnatal): **1, 3, 6, 9, 12, 15, 18, 21, 24, 27** (~**66–102** samples per age, fewer at **24–27 mo**). |
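
Tallies such as the sex and age counts above can be recomputed from the SuperSeries family SOFT file. A minimal sketch, not the parser used for this BOOK: it assumes the standard SOFT layout (`^SAMPLE = GSM<id>` blocks carrying `!Sample_title`, `!Sample_series_id`, and `!Sample_characteristics_ch1 = key: value` lines) and lowercases characteristic keys, so GEO `Sex` is queried as `sex`. Filtering on `!Sample_series_id` is one way to separate a child series out of **GSE132042**; the function names are hypothetical.

```python
from collections import Counter


def parse_soft_samples(lines):
    """Collect per-sample title, series membership and characteristics from SOFT lines."""
    samples, current = [], None
    for raw in lines:
        line = raw.strip()
        if line.startswith("^"):
            # Any new entity (^SERIES, ^PLATFORM, ^SAMPLE) closes the previous sample block.
            current = None
            if line.startswith("^SAMPLE"):
                current = {"gsm": line.partition("=")[2].strip(),
                           "title": "", "series": [], "char": {}}
                samples.append(current)
            continue
        if current is None or not line.startswith("!Sample_"):
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "!Sample_title":
            current["title"] = value
        elif key == "!Sample_series_id":
            current["series"].append(value)  # a sample can belong to several series
        elif key.startswith("!Sample_characteristics_ch"):
            field, sep, val = value.partition(":")
            if sep:
                current["char"][field.strip().lower()] = val.strip()
    return samples


def tally(samples, field, series=None, missing="NA"):
    """Count values of one characteristic, optionally restricted to one child series."""
    return Counter(
        s["char"].get(field, missing)
        for s in samples
        if series is None or series in s["series"]
    )
```

Samples lacking a characteristic are counted under `"NA"`, which is how "GEO `Sex` missing" rows surface; filtering by sample title (for example the bulk title string) works the same way with a predicate on `s["title"]`.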

### GEM-GEO1c — GSE109774 (original Tabula Muris)

| Dimension | Curated content |
|-----------|-----------------|
| **What it is** | **scRNA** across **20 tissues** in **~3-month** mice. |
| **Cells** | **Organ-resident cells** from named tissues; **46** samples in super-series SOFT use **`age: 3 months`** (not “postnatal” wording) and **Smart-seq2**-style plate protocols in sample extracts (e.g. **NovaSeq**, **mm10** + ERCC in exemplar GSM2967045). Some titles split CNS subtypes (**Brain_Microglia**, **Brain_Neurons**). |
| **Treatment / design** | **Young adult baseline** — **not** a senescence time course. |
| **Special considerations** | **Plate-based Smart-seq2** vs TMS **droplet/FACS** (1a) — different **sensitivity and cell capture**. Use as **young reference**, not old. |

### GEM-GEO1d — GSE193093 (heterochronic parabiosis)

| Dimension | Curated content |
|-----------|-----------------|
| **What it is** | **scRNA** from **heterochronic parabionts**; SOFT under super-series includes at least one aggregate row **`parabiosis-facs`** (**20 organs of male parabionts**). |
| **Cells** | **Male** **C57BL/6**; **20 organs** listed in series design (bladder, brain, BAT, diaphragm, GAT, heart, kidney, large intestine, limb muscle, liver, lung, marrow, MAT, pancreas, skin, spleen, SCAT, thymus, tongue, trachea). |
| **Treatment / design** | **Surgical parabiosis** (young–old shared circulation) — **systemic aging / rejuvenation exposure**, not a small-molecule treatment. |
| **Special considerations** | **Parabiosis confounds:** shared **immune** and **humoral** factors; **male-only** in the GEO aggregate sample examined. The extract protocol text mentions both **Smart-seq2** and **10x**; confirm the chemistry per sample before pooling. **Processed h5ad** is referenced in the sample data_processing text. |

### GEM-GEO2 — GSE200157 (hESC-derived MSC, replicative senescence, low glucose)

| Dimension | Curated content |
|-----------|-----------------|
| **What it is** | **10x Chromium Single Cell 3′ v2**, **Illumina NextSeq 500**; **PMID 37314668**; aligns to **GRCh38** via **Cell Ranger 3.0.2** (per SOFT). **45** GSM records for **9** biological scRNA libraries (series: **3 timepoints × 3 replicates**) + **SRA run splits**. |
| **Cells** | **Human** **hESC-derived MSCs**; GEO **`cell line: hESC line Genea022`**; source name **`hESC derived Mesenchymal Stem Cells`**. |
| **Treatment / design** | **Low-glucose (LG) culture** during progression into **replicative senescence** (author-associated interpretation; filenames encode **LG**). **Senescence stages T0 → T1 → T2** align with culture days **D22 / D49 / D86** in titles. **15** GSM titles are **`Gen22_LG_D22\|D49\|D86`** without **`MSC_`** (five instrument/batch prefixes × three days); **30** titles are **`MSC_LG_T0\|1\|2`** with **rep 2–3**; **rep 1** is not present in filenames (verify in the primary paper whether it was merged or omitted). |
| **Special considerations** | **Collapse 45 → 9 libraries** before count modeling. **LG** is a **metabolic condition** that alters MSC state relative to high-glucose norms; interpret senescence alongside **mitochondrial / glycolysis** programs. **Time and batch** are partially confounded in the naming; use author metadata, or **Harmony** if batch-correcting. |

### GEM-GEO3 — GSE235996 (MoS2 on bone marrow MSC)

| Dimension | Curated content |
|-----------|-----------------|
| **What it is** | **Bulk RNA-seq** (**6** libraries): **control** vs **MoS2 nanoflowers** at two **vacancy** formulations. |
| **Cells** | **Human bone marrow–derived MSCs** (GEO characteristics: **bone marrow derived mesenchymal stem cells**). |
| **Treatment / design** | **Nanoparticle exposure:** **MoS2 1:1** vs **1:6** (vacancy ratio per series summary) vs **untreated control**; **2** biological RNA-seq replicates per arm (sample titles: `hMSCs 1 control` … `hMSCs 1:6 MoS2 2`). |
| **Special considerations** | **In vitro** only; **acute material / dose** effects on **mitochondrial biogenesis** (paper claim) — not organismal aging. Compare to **GEM-GEO2** only with explicit **batch and species** caveats. |

### GEM-D5 — Human Cell Atlas (portal)

| Dimension | Curated content |
|-----------|-----------------|
| **What it is** | **Discovery portal** aggregating **many** projects. |
| **Cells** | **User-selected** (tissue, organ, disease, donor age). |
| **Treatment / design** | **Varies per project** (metadata columns). |
| **Special considerations** | **No single matrix** — export **per project UUID**. **Consent and PII** differ by cohort; **batch integration** is non-trivial. |

### GEM-D6 — Single Cell Portal (Broad)

| Dimension | Curated content |
|-----------|-----------------|
| **What it is** | **Curated study browser** (Broad). |
| **Cells** | **Per study** (tumor, immune, stress, etc.). |
| **Treatment / design** | **Per study** — read each study’s **metadata**. |
| **Special considerations** | **Senescence** is rarely the primary endpoint — use **gene / module** queries or re-score with **SenePy**-style tools cautiously. |

### GEM-D7 — DDBJ (INSDC mirror)

| Dimension | Curated content |
|-----------|-----------------|
| **What it is** | **Japanese INSDC node** — **metadata + links** to **SRA/DDBJ Sequence Read Archive** records. |
| **Cells** | **Accession-dependent** (same biological entity as SRA/ENA). |
| **Treatment / design** | **Follow BioSample attributes** for each run. |
| **Special considerations** | **Mirror, not alternate biology** — use for **regional latency / compliance**; harmonize IDs with **ENA/NCBI**. |

### GEM-D8 — BioStudies / ArrayExpress legacy

| Dimension | Curated content |
|-----------|-----------------|
| **What it is** | **EBI study metadata** hub (ArrayExpress migration). |
| **Cells** | **Study-specific.** |
| **Treatment / design** | **Study-specific.** |
| **Special considerations** | **ID mapping** from **GEO ↔ BioStudies** may require **manual cross-walk**; prefer **DOI + primary accession** when both exist. |

---

## 3. Bounty → start here

| Workstream | Start with |
|------------|------------|
| Orthodox cultivator | [`FORM.md`](FORM.md), then §1 |
| Field literature / SASP | §1, §2b–c, [`SENESCENCE_LITERATURE_GAPS.md`](SENESCENCE_LITERATURE_GAPS.md) |
| Wet-lab causation | [`BOUNTY_BOARD.md`](BOUNTY_BOARD.md), [`WORLDLINE.md`](WORLDLINE.md), §5 (D-registry) |
| Pipeline tooling | §2–2c, §5, Astronomicon BOOK |
| GenesisCode (APOLLYON) | [`HALO_GENESISCODE.md`](HALO_GENESISCODE.md), [`TRIAD_ANTENNA.md`](TRIAD_ANTENNA.md), [`XIST_CLINICAL_MAP.md`](XIST_CLINICAL_MAP.md) |

---

## 4. GenesisCode references (APOLLYON — handle accordingly)

### Triad Antenna

| ID | Topic | Identifier |
|----|-------|------------|
| GC-B1 | HERVH pluripotency lncRNA network | Lu et al. 2014, *Nat Struct Mol Biol* [`10.1038/nsmb.2799`](https://doi.org/10.1038/nsmb.2799) |
| GC-B2 | HERVH chromatin loops in stem cells | Zhang et al. 2019, *Nat Genet* [`10.1038/s41588-019-0479-7`](https://doi.org/10.1038/s41588-019-0479-7) |
| GC-B3 | Group II intron GNRA metal coordination | Lambowitz & Zimmerly 2011, *Annu Rev Genet* [`PMC3140690`](https://pmc.ncbi.nlm.nih.gov/articles/PMC3140690/) |
| GC-B4 | TEs as major contributors to lncRNA origin | Kapusta et al. 2013, *PLoS Genet* [`10.1371/journal.pgen.1003470`](https://doi.org/10.1371/journal.pgen.1003470) |
| GC-B5 | RIDL hypothesis (TE fragments as lncRNA domains) | Johnson & Guigó 2014, *RNA* [`PMC4114693`](https://pmc.ncbi.nlm.nih.gov/articles/PMC4114693/) |
| GC-B6 | Syncytin convergent domestication | Lavialle et al. 2013, *Phil Trans R Soc B* [`10.1098/rstb.2012.0507`](https://doi.org/10.1098/rstb.2012.0507) |

### XIST / X-inactivation / Sex-dimorphic aging

| ID | Topic | Identifier |
|----|-------|------------|
| GC-B7 | Xi escape doubles with age | Nature Aging 2025 [`10.1038/s43587-025-00856-8`](https://doi.org/10.1038/s43587-025-00856-8) |
| GC-B8 | Xi escape in aging hippocampus | Science Advances 2025 [`10.1126/sciadv.ads8169`](https://doi.org/10.1126/sciadv.ads8169) |
| GC-B9 | XIST directly modulates escape (7-day window) | Nature Cell Biology 2025 [`10.1038/s41556-025-01823-6`](https://doi.org/10.1038/s41556-025-01823-6) |
| GC-B10 | Xist deletion → 100% malignancy | Yildirim et al. 2013, *Cell* [`10.1016/j.cell.2013.01.034`](https://doi.org/10.1016/j.cell.2013.01.034) |
| GC-B11 | TLR7 biallelic → SLE in females | Souyris et al. 2018, *Sci Immunol* [`10.1126/sciimmunol.aap8855`](https://doi.org/10.1126/sciimmunol.aap8855) |
| GC-B12 | EXITS tumor suppressor genes | Dunford et al. 2017, *Nat Genet* [`10.1038/ng.3726`](https://doi.org/10.1038/ng.3726) |
| GC-B13 | Defective XCI increases with aging + cancer (40% risk) | Caceres et al. 2025, *Commun Biol* [`10.1038/s42003-025-07691-y`](https://doi.org/10.1038/s42003-025-07691-y) |
| GC-B14 | XIST loss → breast cancer via MED14 | Richart et al. 2022, *Cell* [`10.1016/j.cell.2022.04.032`](https://doi.org/10.1016/j.cell.2022.04.032) |

### LINC02154 published function

| ID | Topic | Identifier |
|----|-------|------------|
| GC-B15 | LINC02154 KD in esophageal cancer (RNA-seq) | Shimote et al. 2025, *Noncoding RNA Res* [`PMC12173678`](https://pmc.ncbi.nlm.nih.gov/articles/PMC12173678/) |
| GC-B16 | LINC02154 in OSCC (HNRNPK, LRPPRC, mitochondria) | Niinuma et al. 2024, *Cancer Sci* [`PMID:39576738`](https://pubmed.ncbi.nlm.nih.gov/39576738/) |
| GC-B17 | LINC02154 in ccRCC (cuproptosis, FDX1/DLST) | Shen & Wang 2023, *BMC Cancer* [`PMID:36797708`](https://pubmed.ncbi.nlm.nih.gov/36797708/) |
| GC-B18 | LINC02154 3.8x up in immunotherapy-resistant ccRCC | Katifelis et al. 2025, *In Vivo* [`PMID:39740865`](https://pubmed.ncbi.nlm.nih.gov/39740865/) |
| GC-B19 | LINC02154 ASO knockdown protocol (keratinocytes) | Loyer et al. 2026, *Methods Mol Biol* [`PMID:41872415`](https://pubmed.ncbi.nlm.nih.gov/41872415/) |

### Mo clock and intervention

| ID | Topic | Identifier |
|----|-------|------------|
| GC-B20 | MoS2 nanoparticles stimulate mito biogenesis | Nature Comms 2024, GSE235996 — cell type / treatment: **STARS · FROM GEO · GEM-GEO3** [`10.1038/s41467-024-52276-8`](https://doi.org/10.1038/s41467-024-52276-8) |
| GC-B21 | UHRF1 loss → non-canonical senescence + SASP | Nature Comms 2024 [`10.1038/s41467-024-47314-4`](https://doi.org/10.1038/s41467-024-47314-4) |

### HeLa and cancer cell lines

| ID | Topic | Identifier |
|----|-------|------------|
| GC-B22 | CCLE/DepMap cancer cell line expression atlas | DepMap Portal [`depmap.org/portal`](https://depmap.org/portal/) |
| GC-B23 | ENCODE HeLa-S3 multi-omic data | ENCODE Portal [`encodeproject.org`](https://www.encodeproject.org/) |
| GC-B24 | HeLa XIST characterization (RAP-seq) | Engreitz et al. 2013, *Science* [`10.1126/science.1237973`](https://doi.org/10.1126/science.1237973) |

---

## STARS — US and international anchors

**NIH / NLM hubs** pair with §2 ENA / BioStudies / ELIXIR. Use for accession-level work and literature mirrors.

### How to read STARS (context)

- **`GEM-S1`–`GEM-S5`:** **Infrastructure portals** (GEO, SRA, PMC, Europe PMC, DDBJ). They help you **find accessions and full text**; **cell type, treatment, and batch structure** are **not** described in these rows — read **GEO SOFT / BioSample** (and **§2c** for curated GEO cases).
- **`GEM-GEO*` (FROM GEO · STAR):** **Accession-level index** with short biology/design summaries; **full** cells / treatments / caveats: **§2c**.
- **`GEM-D2`–`GEM-D8` (§2b):** Mix of **papers, portals, and archives**; only **GEO-backed** rows overlap **GEM-GEO***.

| ID | What this STAR denotes | Typical use in this BOOK | Not / caveats |
|----|-------------------------|--------------------------|---------------|
| GEM-S1 | NCBI GEO | Land **GSE/GSM** and supplements | Superseries (e.g. **GSE132042**) need child-series disambiguation — see §2c. |
| GEM-S2 | NCBI SRA | Land **SRR** reads | Multiple SRR per library common; map via BioSample. |
| GEM-S3 | PubMed Central | **Open full text** where available | Not every PMID has PMC; check copyright. |
| GEM-S4 | Europe PMC | **EU mirror** + literature graph hooks | Useful for EU-funded author manuscripts. |
| GEM-S5 | DDBJ | **INSDC** Japan mirror | Same logical archive as SRA/ENA with regional entry. |

| ID | Region | Kind | Note | Identifier |
|----|--------|------|------|------------|
| GEM-S1 | US | GEO | NCBI Gene Expression Omnibus | [ncbi.nlm.nih.gov/geo](https://www.ncbi.nlm.nih.gov/geo/) |
| GEM-S2 | US | SRA | NCBI Sequence Read Archive | [ncbi.nlm.nih.gov/sra](https://www.ncbi.nlm.nih.gov/sra) |
| GEM-S3 | US | Full-text companion | PubMed Central (PMC) | [ncbi.nlm.nih.gov/pmc](https://www.ncbi.nlm.nih.gov/pmc/) |
| GEM-S4 | EU / UK | Europe mirror | Europe PMC | [europepmc.org](https://europepmc.org/) |
| GEM-S5 | Japan | INSDC partner | DDBJ | [ddbj.nig.ac.jp](https://www.ddbj.nig.ac.jp/) |

### FROM GEO · STAR index (cell type · treatment / design)

**Tier:** **GEM-GEO*** sits **above** raw portal links and **below** full narrative in **§2c** — use this table for **quick accession semantics**. **How to read STARS (context)** (above) explains **GEM-S*** vs **GEM-GEO***.

**Source index:** each row is anchored on **NCBI GEO** series / sample SOFT metadata (curated 2026-03-31). **FROM GEO · STAR** labels the index element; cross-ref §2b (GEM-D3–D4) and §4 GC-B20. **Per-accession breakdown** (cells, treatments, special considerations): **§2c**.

| STAR ID | Source | Accession | Cell / tissue · type | Treatment · conditions · design | GEO |
|---------|--------|-----------|----------------------|---------------------------------|-----|
| GEM-GEO1 | FROM GEO · STAR | [GSE132042](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132042) | **SuperSeries** (*Mus musculus*) — umbrella for Tabula Muris–related deposits | **No single cell type:** aggregates subseries below (young atlas + lifespan + parabiosis). | [link](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132042) |
| GEM-GEO1a | FROM GEO · STAR | [GSE149590](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE149590) | **18 tissues / organs** · scRNA · *M. musculus* | **Tabula Muris Senis** — single-cell transcriptomes **across lifespan**; series notes raw data also on **AWS** `s3://czb-tabula-muris-senis/` | [link](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE149590) |
| GEM-GEO1b | FROM GEO · STAR | [GSE132040](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) | **17 organs** · bulk RNA-seq · *M. musculus* | **Tabula Muris Senis bulk** — transcriptomes across lifespan (companion to scRNA). | [link](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040) |
| GEM-GEO1c | FROM GEO · STAR | [GSE109774](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE109774) | **20 tissues** · scRNA · *M. musculus* | **Original Tabula Muris** — ~**3-month** young adults (not a senescence time course). | [link](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE109774) |
| GEM-GEO1d | FROM GEO · STAR | [GSE193093](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE193093) | **20 organs** (bladder, brain, BAT, diaphragm, GAT, heart, kidney, large intestine, limb muscle, liver, lung, marrow, MAT, pancreas, skin, spleen, SCAT, thymus, tongue, trachea) · scRNA | **Heterochronic parabiosis** — experimental union / aging transfer paradigm (*M. musculus*). | [link](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE193093) |
| GEM-GEO2 | FROM GEO · STAR | [GSE200157](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE200157) | **hESC-derived mesenchymal stem cells** (GEO: `hESC derived Mesenchymal Stem Cells`; **cell line Genea022**) · *Homo sapiens* | **Replicative senescence** trajectory under low-glucose culture: **time points T0, T1, T2** × **biological replicates 1–3** (series design; rep 1 is absent from GSM titles, see §2c); **10x Genomics Chromium** Single Cell 3′ v2; **Illumina NextSeq 500**; total RNA; aligns to **GRCh38** (per sample SOFT). | [link](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE200157) |
| GEM-GEO3 | FROM GEO · STAR | [GSE235996](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE235996) | **Bone marrow–derived human MSCs** (GEO characteristics: bone marrow derived mesenchymal stem cells) | **Control** vs **MoS2 nanoparticles** — **1:1** (lower vacancy) vs **1:6** (higher vacancy) conditions, **n = 2** RNA-seq libraries per arm (series design: atomic vacancy dose on MSC fate). | [link](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE235996) |

---

## 5. Combined Codex registry (fused — former CODEX.md)

**Fusion (2026-04-01):** Per [vault README fusion rules](../README.md#what-fusion-means-for-ai-collaborators-read-this), the former standalone **[`CODEX.md`](_archive/codex_fused_into_book_2026-04-01/CODEX.md)** (*Combined Codex v3*) is merged into this **`BOOK.md`**. Point links at **section 5** and its subheadings. **Vault-wide code/SQL map:** [`../BOOK_Protocol.md`](../BOOK_Protocol.md), [`../CODE_DATABASE_INDEX.md`](../CODE_DATABASE_INDEX.md).

### Combined Codex overview

> **Single canonical index** for this project: **experiments**, **local datasets** (Human Codex D-, Q-, PW-, OX-, REF-prefixed IDs), **remote RNA APIs and clinical/international archives** (CL01–CL12), **methods/tools** (`methods/`, daemon shadow, orthodox), **pathway GMTs**, **literature pointers**, and **vault infrastructure** links.

### What This Project Is

A zero-parameter GEM-level framework for scRNA-seq that treats 10x Chromium data as physical droplets (GEMs), not "single cells." Applied to endothelial senescence + immune coculture, it revealed the mito-nuclear desynchronization cascade, the simplex error-correction theory of aging, and the TE silencing collapse on chromosome 19. Extended into retrotransposon biology and viral lineage analysis of cancer, culminating in the GBM Viral Lens (338k cells, 110 patients). Spawned [12_Project_BloodyEchoes](../12_Project_BloodyEchoes/) (chr19 UHRF1 investigation). Validated across 15+ datasets, 4.8M+ cells, 5 species.

---

### Experiments

| ID | Name | Cells | Confidence | Status | Key Finding |
|----|------|-------|------------|--------|-------------|
| EXP01 | Primary EC coculture | 905,263 | **HIGH** | Primary dataset | 9/9 desync predictions, DOF entanglement, simplex theory |
| EXP02 | Ovarian cancer + CAF | 9,304 | MEDIUM | Cross-validation | OxPhos-Senescence disconnection cross-tissue |
| EXP03 | Stressed EC (HUVEC) | 59,605 | LOW | Caveated | Time course cascade (P5-8 HUVECs, EndoMT model) |
| EXP04 | Donor-derived EC (T2D) | 11,243 | **HIGH** | Cross-validation | DOF collapse in fresh primary ECs |
| EXP05 | PBMC3k | 2,700 | HIGH | Negative control | No cascade (correct: glycolytic cells) |
| EXP06 | IFN-beta PBMC | 32,484 | HIGH | Negative control | IFN fires without desync (correct) |
| EXP07 | Aging pancreas | 2,544 | MEDIUM | Negative control | No desync in islets (correct: high turnover) |
| EXP08 | Aging PBMC (CMV) | 9,354 | MEDIUM | Negative control | No desync from CMV (correct: glycolytic) |
| EXP09 | Rotenone neurons | -- | -- | PENDING | Causation pre-test (blocked on FASTQ) |
| EXP10 | WI-38 time course | 56,803 | **HIGH** | Cross-validation | DOF precedes epigenetic collapse by 24h |
| EXP11 | Tabula Sapiens aorta | 42,650 | **HIGH** | Cross-validation | EC highest coupling (0.40) in native tissue |
| EXP12 | PD substantia nigra | 434,340 | HIGH | GCP only | Overall coupling 0.20 (needs annotation) |
| EXP13 | Plant leaf | -- | -- | Cross-species | Three-kingdom test |
| EXP14 | Rice root protoplast | 2,146,011 | HIGH | Cross-species | Four-kingdom: plant mito MORE integrated |
| **EXP15** | **GBM Core Map** | **338,564** | **HIGH** | **Complete** | **Deepest viral evasion of any cancer (BT-57)** |
| **EXP17** | **Rotenone DA neurons** | **3,087** | **HIGH** | **Complete** | **Bidirectional desync proof: mito deficit triggers same cascade** |
| EXP18 | Rotenone HepaRG | 40,000 | HIGH | Downloaded | Dose-response (0.2/0.4/0.8 µM) |

Living documents: `data/experiments/EXP*/LIVING_DOCUMENT.md`

**Bio designs (assembled index):** [`BIO_DESIGNS.md`](BIO_DESIGNS.md) — primary coculture arms (P vs S), USB wet-lab folders, EXP grid summary, orthodox stage biology, planned validation targets, peripheral pointers (Golgisoma, FORM row 18, council pad).

**USB bulk ingest (2026-03-30):** wet-lab correspondence + microscopy from removable drive **STORE N GO** now lives under **`data/usb_ingest_STORE_N_GO_2026-03-30/`** (subfolders: `correspondence/`, `Golgi_Experiment_Pictures/`, `Co-culture_Experiment_Pictures/`, `Co-Culture/`, `Jixiang/`). Vault-wide pointer and HALO note: [`../BOOK_Protocol.md`](../BOOK_Protocol.md) subsection *Wet-lab USB ingest (STORE N GO, 2026-03-30)*.

---

### Epistemological Split: Deterministic vs Probabilistic

#### DETERMINISTIC (Thread of GEMs — zero free parameters)

Everything below is reproducible by anyone with the same count matrix. No knobs.

| Layer | Computation | Free params |
|-------|------------|-------------|
| Jaccard co-occurrence | J = intersection / union | 0 |
| Complete pathway activation | ALL genes detected = ON | 0 |
| Dimensional scores | Raw UMI sum per axis | 0 |
| Operator coupling matrix K | Mean Jaccard per kingdom pair | 0 |
| det(K) error-correction | Determinant of K | 0 |
| Eigenspectrum | eigvalsh on symmetric matrix | 0 |
| Enrichment over independence | Observed / expected | 0 |
| Bridge genes (harmonic) | 2*J_A*J_B / (J_A + J_B) | 0 |
| CORUM complex detection | Binary (all subunits) | 0 |
| Mann-Whitney U tests | Rank test on raw counts | 0 |
| Heat kernel trace | tr exp(-tL) = Σ exp(-tλ_i) over Laplacian eigenvalues λ_i | 0 (t is scanned, not chosen) |
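
A minimal NumPy sketch of the co-occurrence rows above (Jaccard, enrichment over independence, harmonic bridge score). It is not the `thread_graph.py` implementation: it uses a small dense matrix, treats "detected" as count > 0, and assumes "expected" means the co-detection count predicted by independent detection, n·p_A·p_B. The function names are illustrative.

```python
import numpy as np


def cooccurrence_stats(counts: np.ndarray):
    """Jaccard and enrichment over independence for every gene pair.

    counts: cells x genes matrix of raw UMI counts (dense here for clarity).
    """
    detected = (counts > 0).astype(np.int64)   # binary detection, no threshold knob
    n_cells = detected.shape[0]
    both = detected.T @ detected                # |A ∩ B|: cells detecting both genes
    per_gene = np.diag(both)                    # |A|: cells detecting each gene
    union = per_gene[:, None] + per_gene[None, :] - both
    with np.errstate(divide="ignore", invalid="ignore"):
        jaccard = np.where(union > 0, both / union, 0.0)
        # Assumed "expected": co-detections if genes were independent, n * p_A * p_B
        expected = np.outer(per_gene, per_gene) / n_cells
        enrichment = np.where(expected > 0, both / expected, 0.0)
    return jaccard, enrichment


def harmonic_bridge(j_a: float, j_b: float) -> float:
    """2*J_A*J_B / (J_A + J_B); zero when both links are absent."""
    return 0.0 if j_a + j_b == 0 else 2 * j_a * j_b / (j_a + j_b)


counts = np.array([[3, 1, 0],
                   [2, 0, 0],
                   [0, 4, 1],
                   [1, 2, 0]])
J, E = cooccurrence_stats(counts)
bridge = harmonic_bridge(J[0, 1], J[1, 2])   # gene 1 bridging genes 0 and 2
```

In this toy matrix genes 0 and 1 share 2 of 4 detecting cells (J = 0.5), slightly below the 2.25 co-detections independence would predict, and the harmonic mean penalises gene 1 for its weaker link to gene 2.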

#### PROBABILISTIC (FORM comparison — parameters documented)

| Tool | What it adds | Parameters |
|------|-------------|------------|
| DoRothEA TF activity | Which TFs are active | Fraction threshold (50%), confidence level (A/B) |
| LIANA communication | Ligand-receptor pairs | Database choice, scoring method |
| CellTypist labels | Cell type names | ML model version, training data |
| Thread velocity | Flow direction | k=30 neighbors, attractor threshold |
| Pathway decompose | Sub-modules | Min density 0.15, seed Jaccard |
| Orthodox (FORM) | Standard pipeline comparison | 250 params in PARAMETER_REGISTRY |

**Rule:** Paper 1 uses ONLY the deterministic column. FORM runs alongside as control.

---

### Methods — GEM Pipeline (`methods/`)

| File | Description |
|------|-------------|
| `run_pipeline.py` | Unified entry point: raw 10x -> full results |
| `gem_analysis.py` | GEM program activation (binary: all genes detected = ON) |
| `dimensional_analysis.py` | 10 biology-defined gradient axes (single sparse matmul, 2s) |
| `thread_graph.py` | Core engine: co-occurrence, Jaccard, enrichment, bridges, CORUM, DoRothEA |
| `thread_atlas.py` | MSigDB / CORUM / DoRothEA pathway overlay |
| `thread_plot.py` | Visualization (6 matplotlib plots, offline) |
| `pathway_decompose.py` | Sub-module discovery within pathways |
| `pathway_math.py` | MI, odds ratios, spectral decomposition |
| `test_desync_theory.py` | Operator desync theory: 9/9 prediction tests |
| `operator_error_correction.py` | 3x3 coupling matrix, det(K), simplex analysis |
| `three_kingdoms.py` | Ribosome / mitochondria / nucleus independence test |
| `four_kingdoms.py` | Plant four-kingdom test (+ chloroplast) |
| `four_kingdoms_rice.py` | Rice-specific four-kingdom analysis |
| `shape_of_life.py` | Simplex geometry: operator vertices, volume = det(K) |
| `orthodox_benchmark.py` | GEM vs orthodox comparison scorecard |
| `run_exp10_timecourse.py` | WI-38 time course analysis |
| `run_c7_etc_assembly.py` | ETC complex-specific CORUM assembly |
| `run_communication.py` | LIANA ligand-receptor analysis |
| `sdw_analysis.py` | Spectral diffusion wavelets |
| `sdw_protein.py` | Protein-level spectral analysis |
| `endosymbiont_test.py` | Endosymbiont origin hypothesis test |
| `gpu_init.py` | CUDA DLL loading for Windows |
| `gcp_setup.py` | GCP engine room management |
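
The K, det(K), eigenspectrum and heat-kernel layers can be sketched together. This assumes K[i, j] is the mean gene-gene Jaccard between genes of kingdom i and kingdom j (self-pairs excluded on the diagonal), and that the heat kernel uses the weighted graph Laplacian L = D - K of that matrix. Both choices are assumptions for illustration; `operator_error_correction.py`, `shape_of_life.py` and `sdw_analysis.py` may define them differently.

```python
import numpy as np


def coupling_matrix(jaccard: np.ndarray, labels: np.ndarray, kingdoms) -> np.ndarray:
    """K[i, j] = mean Jaccard between genes of kingdom i and genes of kingdom j."""
    k = len(kingdoms)
    K = np.zeros((k, k))
    for i, a in enumerate(kingdoms):
        for j, b in enumerate(kingdoms):
            block = jaccard[np.ix_(labels == a, labels == b)]
            if i == j:
                # Drop each gene's trivial self-pair (J = 1) inside its own kingdom
                vals = block[~np.eye(block.shape[0], dtype=bool)]
            else:
                vals = block.ravel()
            K[i, j] = vals.mean() if vals.size else 0.0
    return K


def spectral_summary(K: np.ndarray, ts=(0.1, 1.0, 10.0)):
    det_k = np.linalg.det(K)              # det(K): the "volume" of the operator simplex
    eigvals = np.linalg.eigvalsh(K)       # K is symmetric, so eigvalsh applies
    lap = np.diag(K.sum(axis=1)) - K      # assumed weighted Laplacian L = D - K
    lap_eigs = np.linalg.eigvalsh(lap)
    # Heat kernel trace tr(exp(-tL)) = sum_i exp(-t * lambda_i), scanned over t
    heat = {t: float(np.exp(-t * lap_eigs).sum()) for t in ts}
    return det_k, eigvals, heat


labels = np.array(["ribo", "ribo", "mito", "mito", "nuc", "nuc"])
same = labels[:, None] == labels[None, :]
jaccard = np.where(same, 0.6, 0.1)        # toy values: strong within, weak between
np.fill_diagonal(jaccard, 1.0)
K = coupling_matrix(jaccard, labels, ["ribo", "mito", "nuc"])
det_k, eigvals, heat = spectral_summary(K)
```

With 0.6 within and 0.1 between kingdoms, K = 0.5·I + 0.1·11ᵀ, so its eigenvalues are 0.5, 0.5, 0.8 and det(K) is 0.2 (up to floating-point rounding). Because t is scanned rather than fixed, `heat` is returned for every t in the grid.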

---

### Methods — Daemon Shadow Engine (`V3_Daemon_Analysis_3-22-26_15-50/`)

| File | Description |
|------|-------------|
| `direct_shadow.py` | **Fast graph builder** — 51k nodes, 1.6M edges, 5 seconds |
| `probe_bottlenecks.py` | min()-semantics bottleneck gene identification |
| `zero_shadow.py` | Deterministic shadow (no magic floor, equal tie-breaking) |
| `expr_shadow.py` | Expression-weighted shadow seeding |
| `pv_shadow.py` | P vs S shadow comparison (gains/losses/shifts) |
| `origin_shadow.py` | Evolutionary origin analysis (ribosome/mito/nucleus factions) |
| `factions.py` | Nuclear factions (growth/suppress/immune/SASP/death/repair) |
| `three_states.py` | Three-state comparison: Proliferative / Senescent / Cancer |
| `retro_factions.py` | **Retrotransposon war** — TE genes + host defense across 3 states (BT-55) |
| `viral_lineage.py` | **Viral lineage of cancer** — 4 lineages, 9 cancer types (BT-56) |
| `gbm_lens.py` | **GBM viral lens** — 338k cells, 5-state comparison (BT-57) |
| `gene_expression.json` | Pre-computed gene expression dict (P1 sample) |
| `stress_test.py` | Song-based stress test (legacy, slow) |
| `gcp_shadow.py` | GCP-adapted shadow runner |

---

### Dataset registry and storage (Human Codex)

Single source of truth for **all datasets** across storage locations. **Remote-only** access (APIs, CL01–CL12) is in [Remote data access](#remote-data-access-rna-apis-clinical-archives-crosswalk) below.

#### Storage locations

| Location | Path | Size | Access |
|----------|------|------|--------|
| **Local laptop** | `C:/Users/jixia/OneDrive/.../10_Project_DiscordIntoSymphony/data/` | ~50 GB | Direct |
| **GCP desync-engine** | `~/desync/data/` | ~400 GB | SSH (key at ~/.ssh/google_compute_engine) |
| **OneDrive** | Synced with local | ~50 GB | Automatic |
| **GCP disk (stopped)** | Persistent when VM stopped | ~400 GB | Start VM first |

#### PRIMARY (our lab)

| ID | Name | Location | Cells | Genes | Format | Pipeline run? | Key finding |
|----|------|----------|-------|-------|--------|--------------|-------------|
| D01 | EC coculture P vs S | local: `gem_analysis.h5ad` | 905,263 | 38,606 | h5ad | YES (full) | 9/9 desync predictions |
| D01-cache | Co-occurrence cache | local: `cooccurrence_cache.npz` | — | 21,249 | npz | — | 15 min to build |
| D01-env | Full pipeline env | local: `pipeline_env.npz` | — | 21,249 | npz | — | J + enrichment cached |
| D01-raw | Raw 10x per sample | local: `CoCultureAnalysis.../P1_raw_feature_bc_matrix/` etc | 6 samples | — | 10x MEX | — | P1-P3, S1-S3 |
| D01-img | Microscopy | local: `Stuff/jl/` and `Stuff/jl^1/` | — | — | jpg/avi | — | 973 images, 114 videos |

#### CROSS-VALIDATION (public, downloaded)

| ID | Name | Location | Cells | Format | Pipeline run? | Key finding |
|----|------|----------|-------|--------|--------------|-------------|
| D02 | Ovarian cancer coculture (GSE224333) | local: `cancer_coculture_merged.h5ad` | 9,304 | h5ad | YES | OxPhos-Senes disconnected |
| D03 | Stressed EC HUVEC (Calandrelli) | local: `cellxgene/stressed_ec_large.h5ad` | 59,605 | h5ad | YES | Time course cascade (CAVEATED: P5-8 HUVEC) |
| D04 | Donor-derived EC T2D (Calandrelli) | local: `cellxgene/stressed_ec_small.h5ad` | 11,243 | h5ad | YES | DOF collapse in fresh primary ECs |
| D05 | PBMC3k (10x demo) | local: `Ovarian.../filtered_gene_bc_matrices/hg19/` | 2,700 | 10x MEX | YES | Clean negative control |
| D06 | IFN-beta PBMC (Kang 2018) | local: `ifnb.h5ad` | 32,484 | h5ad | YES | Clean negative (IFN without desync) |
| D07 | Aging pancreas (CellxGene) | local: `cellxgene/aging_pancreas.h5ad` | 2,544 | h5ad | YES | Mechanistic negative (high turnover cells) |
| D08 | Aging PBMC CMV (CellxGene) | local: `cellxgene/aging_pbmc_small.h5ad` | 9,354 | h5ad | YES | Mechanistic negative (glycolytic cells) |
| D09 | Tabula Sapiens aorta | local: `cellxgene/tabula_sapiens_vasculature.h5ad` | 42,650 | h5ad | YES | EC highest baseline coupling (0.40) |
| D10 | Heart atlas 59k (fibroblasts) | local: `cellxgene/heart_atlas_59k.h5ad` | 59,341 | h5ad | YES | Cardiac fibroblast coupling |
| D11 | WI-38 time course (GSE226225) | local: `experiments/EXP10.../raw/` | ~57,000 | 10x MEX x13 | YES | DOF precedes epigenetic collapse by 24h |
| D15 | GBM Core Map (GBMap, Ruiz-Moreno 2022) | local: `gbm_core_map.h5ad` | 338,564 | h5ad (CSR) | YES | Deepest viral evasion (BT-57) |

#### GCP ONLY (too big for laptop)

| ID | Name | Location | Cells | Format | Pipeline run? | Key finding |
|----|------|----------|-------|--------|--------------|-------------|
| D12 | Kamath PD substantia nigra | GCP: `~/desync/data/kamath_pd/` | 434,340 | 10x MEX | YES (GCP) | Overall coupling 0.20 (brain mixed) |
| D13 | Heart atlas 486k (full) | GCP: `~/desync/data/cellxgene/heart_atlas_486k.h5ad` | 486,134 | h5ad | YES (GCP) | ECs highest coupling in heart tissue |
| D14 | Rice root protoplasts (GSE146034) | GCP: `~/desync/data/rice_starsolo_out/` | 2,146,011 | STARsolo | YES (GCP) | Four-kingdom test: plant mito integrated |
| D14-raw | Rice FASTQs | GCP: `~/desync/data/rice_fastq/` | — | FASTQ | Aligned | 365M reads, 88% mapping |

#### PATHWAY DATABASES

| ID | Name | Location | Entries | Format |
|----|------|----------|---------|--------|
| PW01 | MSigDB Hallmark | local: `pathways/h.all.gmt` | 50 | GMT |
| PW02 | KEGG Medicus | local: `pathways/c2.kegg.gmt` | 658 | GMT |
| PW03 | Reactome | local: `pathways/c2.reactome.gmt` | 1,736 | GMT |
| PW04 | GO Biological Process | local: `pathways/c5.go.bp.gmt` | 7,608 | GMT |
| PW05 | ImmunoSigDB | local: `pathways/c7.immunesigdb.gmt` | 4,872 | GMT |
| PW06 | CORUM complexes | local: `pathways/corum.gmt` | 1,824 | GMT |
| PW07 | All protein complexes | local: `pathways/complexes_all.gmt` | 23,463 | GMT |
| PW08 | DoRothEA TF regulons (A+B) | local: `pathways/dorothea_AB.gmt` | 222 | GMT |

#### GBM CORE MAP — RESOURCE SHEET (D15)

| Property | Value |
|----------|-------|
| **Name** | Core GBmap (harmonized GBM single-cell atlas) |
| **Citation** | Ruiz-Moreno et al. (2022) bioRxiv. doi:10.1101/2022.08.27.505439 |
| **Source** | CellxGene Collection `999f2a15-3d7e-440b-96ae-2c806799c08c` |
| **Download URL** | `https://datasets.cellxgene.cziscience.com/51c6f87a-c815-4066-9268-34acf0f732b4.h5ad` |
| **File** | `data/gbm_core_map.h5ad` |
| **Size** | 8,127 MB (8.1 GB) |
| **Shape** | 338,564 cells × 27,632 genes |
| **Sparse format** | CSR, 718,129,363 nonzero entries |
| **Cell types** | 17: malignant (127,521), macrophage, microglial, mature T cell, monocyte, B cell, dendritic, mast, NK, plasma, astrocyte, oligodendrocyte, OPC, neuron, endothelial, mural, radial glial |
| **Patients** | 110 (16 datasets harmonized via scArches/scANVI) |
| **Assays** | 10x 3' v2/v3, 10x 5' v1, Smart-seq2, Drop-seq, CEL-seq2, others |
| **Key columns** | `obs['cell_type']` (categorical), `obs['annotation_level_1-4']`, `obs['author']` |
| **Gene names** | `var['feature_name']` (categorical group in h5py) |
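
Given the key columns above, a slice of cells can be read straight from the HDF5 arrays instead of through `anndata.read_h5ad()`. A minimal sketch, assuming the AnnData ≥0.8 on-disk layout (sparse matrices as `data`/`indices`/`indptr` with a `shape` attribute; categoricals as `categories` + `codes`). It is not the `gbm_lens.py` code. Whether counts should come from `X` or `raw/X`, and whether `raw/var` carries `feature_name`, should be checked on the actual file first.

```python
import numpy as np
import scipy.sparse as sp


def read_csr_rows(group, start: int, stop: int, n_genes: int) -> sp.csr_matrix:
    """Read cells [start, stop) from an on-disk CSR group, touching only those rows.

    `group` is anything indexable like an h5py Group holding 'data',
    'indices' and 'indptr' (e.g. f['X'] or f['raw/X']).
    """
    indptr = np.asarray(group["indptr"][start:stop + 1])
    lo, hi = int(indptr[0]), int(indptr[-1])
    data = np.asarray(group["data"][lo:hi])
    indices = np.asarray(group["indices"][lo:hi])
    return sp.csr_matrix((data, indices, indptr - lo), shape=(stop - start, n_genes))


def decode_categorical(group) -> list:
    """Decode an AnnData categorical stored as 'categories' + 'codes' (-1 = missing)."""
    cats = [c.decode() if isinstance(c, bytes) else str(c) for c in group["categories"][:]]
    return [cats[code] if code >= 0 else None for code in group["codes"][:]]


def open_slice(path: str, start: int, stop: int, matrix: str = "X"):
    import h5py  # lazy import so the helpers above also work on plain arrays

    var_key = "raw/var" if matrix.startswith("raw") else "var"
    with h5py.File(path, "r") as f:
        genes = decode_categorical(f[f"{var_key}/feature_name"])
        n_genes = int(f[matrix].attrs["shape"][1])   # layers such as layers/scaled are never read
        return read_csr_rows(f[matrix], start, stop, n_genes), genes
```

For example, `open_slice("data/gbm_core_map.h5ad", 0, 10_000)` would load the first 10,000 cells and the gene names, and leave the rest of the 8.1 GB file on disk.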

**Technical notes (D15):** Standard `anndata.read_h5ad()` may run out of memory on `layers/scaled`; read the CSR arrays directly with h5py, as `gbm_lens.py` does. Resume interrupted downloads with HTTP Range requests, and download with Python `requests` to avoid PowerShell bracket-path bugs.
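
A sketch of the resumable download described in the note, using `requests` with a `Range` header. It assumes the server honours byte ranges (status 206); if it answers 200 the file is rewritten from the start, and status 416 is treated as "already complete", which is an assumption rather than a guarantee.

```python
import os


def range_header(dest: str) -> dict:
    """Ask the server only for bytes beyond what is already on disk."""
    done = os.path.getsize(dest) if os.path.exists(dest) else 0
    return {"Range": f"bytes={done}-"} if done else {}


def resume_download(url: str, dest: str, chunk: int = 8 << 20) -> int:
    """Append the remaining bytes of `url` to `dest`; returns the final file size."""
    import requests  # lazy import: only the network step needs it

    with requests.get(url, headers=range_header(dest), stream=True, timeout=60) as r:
        if r.status_code == 416:          # range not satisfiable: assume file is complete
            return os.path.getsize(dest)
        r.raise_for_status()
        # 206 = server honoured the range, so append; 200 = full resend, so overwrite
        mode = "ab" if r.status_code == 206 else "wb"
        with open(dest, mode) as fh:
            for block in r.iter_content(chunk_size=chunk):
                fh.write(block)
    return os.path.getsize(dest)
```

Rerunning `resume_download("https://datasets.cellxgene.cziscience.com/51c6f87a-c815-4066-9268-34acf0f732b4.h5ad", "data/gbm_core_map.h5ad")` after an interruption should pick up where the previous attempt stopped; compare the final size against the 8,127 MB listed above.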

#### CROSS-SPECIES (eigenspectrum analysis)

| ID | Species | Location | Cells | Result |
|----|---------|----------|-------|--------|
| XS01 | Human (brain) | GCP (deleted, re-downloadable) | 434,340 | L1/L2=4.1, 80%=4 dims |
| XS02 | Tree shrew | GCP (deleted, re-downloadable) | 947 | L1/L2=10.6, 90%=2 dims |
| XS03 | Rat | GCP (deleted, re-downloadable) | 1,068 | L1/L2=14.9, 90%=2 dims |
| XS04 | Macaque | GCP (pending) | ~1,000 | Pending |
| XS05 | Rice (plant) | GCP: `~/desync/data/rice_starsolo_out/` | 2,146,011 | Four-kingdom: mito integrated |
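
The Result column reports λ1/λ2 and the number of dimensions needed to reach a cumulative share of the spectrum (80% or 90%). A hedged sketch of those two summaries for any symmetric matrix. Which matrix the eigenspectrum analysis actually uses, and whether the share is taken over all eigenvalues or only positive ones, are assumptions here.

```python
import numpy as np


def eigen_summary(M: np.ndarray, share: float = 0.9):
    """Return (lambda1 / lambda2, dims needed for `share` of the positive spectrum)."""
    lam = np.linalg.eigvalsh(M)[::-1]          # eigvalsh is ascending; flip to descending
    lam = lam[lam > 0]                         # assumption: keep positive eigenvalues only
    ratio = lam[0] / lam[1]
    cum = np.cumsum(lam) / lam.sum()
    dims = int(np.searchsorted(cum, share) + 1)  # first k with cumulative share >= threshold
    return float(ratio), dims


ratio, dims_80 = eigen_summary(np.diag([10.0, 2.0, 1.0, 1.0]), share=0.8)
_, dims_90 = eigen_summary(np.diag([10.0, 2.0, 1.0, 1.0]), share=0.9)
```

Because the XS rows mix 80% and 90% thresholds, the same matrix can give different dimension counts; here the toy spectrum needs 2 dimensions for 80% and 3 for 90%.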

#### NOT YET DOWNLOADED (identified, ready to grab)

| ID | Name | Source | Cells | Size | Why we want it |
|----|------|--------|-------|------|----------------|
| Q01 | Brain vasculature atlas (Winkler 2024) | CellxGene | 606,380 | ~3 GB | FACS-sorted fresh primary ECs from brain |
| Q02 | Kamath PD macaque | GEO GSE178265 | ~1,000 | 26 MB | Cross-species eigenspectrum |
| Q03 | SEA-AD Alzheimer brain | CellxGene | 240,000 | ~2 GB | AD pathology spectrum, neurons |
| Q04 | Human muscle aging (Lai 2024) | CNGB | 387,444 | ~2 GB | Age 15-99, skeletal muscle |
| Q05 | Diabetic kidney (Wilson/KPMP) | GEO GSE131882 | 23,980 | ~500 MB | Renal tubular under metabolic stress |
| Q06 | Liver disease spectrum | GEO GSE185477 | 117,123 | ~1 GB | NAFLD to cirrhosis, hepatocytes |

#### ORTHODOX CULTIVATOR (Docker, GCP)

| ID | Name | Location | Format | Key finding |
|----|------|----------|--------|-------------|
| OX01 | Cultivator report | GCP: `~/cultivator/output/cultivator_report.txt` | txt | 12 layers, 55.6 min, 35 params, 9/12 working |
| OX02 | Parameter table (Supp S1) | GCP: `~/cultivator/output/supplementary_S1_parameters.csv` | csv | 35 free parameters documented |
| OX03 | Docker image | GCP: `cultivator:latest` (3.33 GB) | Docker | All 12 tools pinned, Linux, no dependency hell |
| OX04 | GCS data bucket | `gs://desync-cultivator-data/stage8_curated.h5ad` | h5ad | 3.6 GB compressed, resumable upload |
| OX05 | Dockerfile | local: `orthodox/Dockerfile` | Dockerfile | Python 3.11 + R + 20 packages |
| OX06 | Cultivator script | local: `orthodox/orthodox_cultivator.py` | py | 12 layers, PARAMETER_REGISTRY |

#### GENOME REFERENCES

| ID | Species | Location | Size | Includes organellar? |
|----|---------|----------|------|---------------------|
| REF01 | Human GRCh38 | GCP: `~/desync/data/star_index/` (DELETED, rebuild 35 min) | 27 GB index | Yes (chrM) |
| REF02 | Rice IRGSP-1.0 | GCP: `~/desync/data/rice_genome/rice_star_index/` | 1 GB index | Yes (Mt + Pt) |

#### Dataset registry — how to use

1. **Finding data**: Search **§5 Combined Codex** (subsections below).
2. **Running pipeline**: Point `run_pipeline.py` at the h5ad or 10x directory.
3. **CellxGene format**: Gene names in `var['feature_name']`, not `var_names`. Use `.raw.X` for counts.
4. **GCP data**: Start `desync-engine` first (`gcloud compute instances start desync-engine --zone=us-central1-a`). SSH key at `~/.ssh/google_compute_engine`.
5. **Adding new datasets**: Add a row to the appropriate table here; update the **§5 — Combined Codex revision log** at the end of §5.
6. **Clinical / international (CL01–CL12)**: Full table under [Remote data access](#remote-data-access-rna-apis-clinical-archives-crosswalk). Do not store credentials in the vault.

#### Cost notes

- GCP `desync-engine` (n2-highmem-16): $1.10/hr while running, $0 compute while stopped. The disk persists and is billed separately (next line).
- GCP disk: 500 GB standard, ~$20/month whether VM is running or not.
- CellxGene downloads: free, unlimited.
- GEO downloads: free, unlimited.
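
The cost notes reduce to simple arithmetic: VM hours × hourly rate + the fixed disk charge. A sketch using the figures quoted above (these will drift as GCP pricing changes); 730 h is roughly one month of continuous running.

```python
def monthly_gcp_cost(vm_hours: float, vm_rate: float = 1.10, disk_month: float = 20.0) -> float:
    """VM compute is billed only while running; the persistent disk is billed every month."""
    return round(vm_hours * vm_rate + disk_month, 2)


light = monthly_gcp_cost(70)        # ~70 VM hours in a month
always_on = monthly_gcp_cost(730)   # VM left running all month
idle = monthly_gcp_cost(0)          # VM stopped all month: disk only
```

At ~70 VM hours the month comes to about $97, against about $823 if the VM is never stopped; stopping it still leaves the $20 disk charge.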

---

### Remote data access (RNA APIs, clinical archives, crosswalk)

**Scope:** Public **entry points** for RNA-related metadata, matrices, and reads; **not** every dataset. **Not** medical guidance. **Not** credentials in-repo.

#### Trust and access ladder (operational risk)

| Tier | Examples | What you typically get | Risk |
|------|----------|-------------------------|------|
| **A — Open deposition** | GEO (GSE), SRA, ENA, ArrayExpress/BioStudies, CellxGene `.h5ad` downloads | Metadata + public files; peer review is on the **paper**, deposition is the mirror | Low for access; **batch/confound** risk remains |
| **B — Aggregator / harmonized** | CellxGene Census, some atlas portals | Pre-harmonized objects; fast to use | **Version the census build**; harmonization choices are not raw lab output |
| **C — Controlled access** | dbGaP, Synapse with DAC, some GTEx paths, EGA, UK Biobank applications | Rich human data after approval | **No** unattended bulk download without credentials; DUC terms bind you |
| **D — Annotation only** | Ensembl REST, UCSC API, pathway portals | Genes, transcripts, coordinates—not expression | Join keys (Ensembl ID vs symbol) must match your matrix |

**Clinical ladder:** open deposition → harmonized atlases → **controlled** (dbGaP, EGA, Synapse restricted, UK Biobank) → **annotation-only** (Ensembl, no clinical RNA).

#### Master table — APIs and clients (systems, not individual studies)

| System | API / client | RNA modalities | Auth | Primary artifacts | Caveats (representative) |
|--------|----------------|----------------|------|-------------------|---------------------------|
| **NCBI GEO** | [E-utilities](https://www.ncbi.nlm.nih.gov/books/NBK25501/) (`esearch`, `efetch`, `esummary`) | Bulk RNA-seq, microarray, many scRNA-seq series | Open (rate limits) | GSE/GSM/SRP metadata; links to SRA | Expression matrices vary by submitter; not all series have processed counts in GEO |
| **NCBI SRA** | E-utilities + [SRA toolkit](https://github.com/ncbi/sra-tools) / `prefetch` | All RNA-seq instrument data | Open | SRA / FASTQ | You align or use author-processed matrices elsewhere |
| **NCBI Datasets** | [NCBI Datasets v2 API](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/reference/rest-api/) + CLI | Genomes, gene reports; bridges to SRA/bioproject | Open (terms apply) | JSON packages; virus/genome bundles | **Not** a substitute for GEO matrix search—use for **assembly/gene/SRA package** workflows |
| **EBI BioStudies** | [BioStudies API](https://www.ebi.ac.uk/biostudies/) | Bulk + single-cell submissions mirrored with ArrayExpress | Open | Study JSON; file links | Cross-map accession types (E-MTAB-*) with GEO where linked |
| **ENA** | [ENA Browser API](https://www.ebi.ac.uk/ena/browser/home) + FTP | Raw reads (FASTQ), CRAM | Open | FASTQ, metadata | Same biology as SRA for many deposits; choose one pipeline consistently |
| **CZI CellxGene** | [CELLxGENE Discover](https://cellxgene.cziscience.com/) (UI + stable dataset URLs); dataset download links via [api.cellxgene.cziscience.com](https://api.cellxgene.cziscience.com/) | scRNA-seq / snRNA-seq (AnnData) | Open | `.h5ad` per dataset | Check `var` for `feature_name` vs `var_names`; large files, so resume downloads |
| **CellxGene Census** | [`cellxgene_census`](https://chanzuckerberg.github.io/cellxgene-census/) Python/R | Queryable single-cell within a **released census build** | Open | Lazy slices, metadata queries | **Pin `census_version`** for reproducibility; not identical to one-off portal downloads |
| **Broad Single Cell Portal** | Portal UI + documented study exports; API surface **varies by study** ([portal](https://singlecell.broadinstitute.org/single_cell)) | scRNA-seq | Mostly open reads/metadata | Study-specific H5AD/MEX links | Prefer study landing page + citation; automation may require following HTML links |
| **Synapse** | [Python `synapseclient`](https://python-docs.synapse.org/) / REST | Multi-omics including RNA (e.g. AD cohorts) | **Synapse account + resource access** | Tables, files, provenance | **Controlled-access** resources need DAC; cache entity IDs, not passwords |
| **GTEx Portal** | [GTEx portal](https://www.gtexportal.org/) — bulk expression downloads | Bulk RNA-seq (tissue expression) | Open tier for **summary** data; **controlled** for individual-level | Expression matrices, attributes | Individual-level genotypes/expression require **dbGaP**; read portal TOU |
| **AnVIL / Terra** | [Terra API](https://terra.bio/) / AnVIL ecosystem | Pipeline outputs, workspace tables | Google account + workspace ACL | Tables, WDL outputs, linked BigQuery | Not a generic “all RNA” API—workspace-scoped; compliance with data use |
| **Ensembl REST** | [Ensembl REST](https://rest.ensembl.org/documentation/info/species) | **Annotation** (genes, transcripts, sequences) | Open (fair use / rate limits) | JSON/XML | Use for **ID mapping** and sequence fetch—not expression |
| **UCSC REST** | [UCSC REST API](https://api.genome.ucsc.edu/) | Coordinates, track metadata | Open | JSON | Annotation and liftOver—**not** expression matrices |
| **OmniPath** | [OmniPath](https://omnipathdb.org/) (Python/R/REST) | Prior knowledge (interactions, TF) | Open | Networks, gene lists | **Not RNA abundance**; use post hoc to interpret DE genes and pathways |
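
A stdlib sketch of a GEO lookup through E-utilities `esearch`, plus a client-side throttle matching NCBI's published limits (3 requests/s without a key, 10/s with one). Querying `db=gds` with the `[ACCN]` field tag is one common way to find a GSE record; the helper names are illustrative.

```python
import time
import urllib.parse
from typing import Optional

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(accession: str, db: str = "gds", api_key: Optional[str] = None) -> str:
    """Build an esearch URL that looks up a GEO accession in GEO DataSets."""
    params = {"db": db, "term": f"{accession}[ACCN]", "retmode": "json"}
    if api_key:
        params["api_key"] = api_key   # read from the environment; never commit it
    return EUTILS + "esearch.fcgi?" + urllib.parse.urlencode(params)


class Throttle:
    """Space calls to at most `per_second` requests (3/s without a key, 10/s with one)."""

    def __init__(self, per_second: float = 3.0):
        self.interval = 1.0 / per_second
        self._last = 0.0

    def wait(self) -> None:
        delay = self._last + self.interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```

Calling `throttle.wait()` before each `urllib.request.urlopen(esearch_url("GSE224333"))` keeps a loop over the D-rows within the limit; the JSON reply should list matching UIDs under `esearchresult.idlist`, which `esummary` can expand.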

#### Clinical and international data archives (CL01–CL12)

| ID | Archive / program | Region | Typical modalities | Access model | Caveat |
|----|---------------------|--------|-------------------|--------------|--------|
| CL01 | [dbGaP](https://www.ncbi.nlm.nih.gov/gap/) | US (NIH) | RNA-seq, GWAS, clinical covariates | Registered investigator + Data Use Certification | Study-by-study; no universal bulk API for all restricted files |
| CL02 | [NHLBI BioData Catalyst](https://biodatacatalyst.nhlbi.nih.gov/) | US | Federated analysis on approved dbGaP-style cohorts | eRA + workspace access | Cloud analysis; not a single downloadable “all RNA” dump |
| CL03 | [Synapse](https://www.synapse.org/) | US (global) | Tables, files (e.g. AD cohorts, multi-omics) | Account + per-resource ACL / some DAC | Use `synapseclient`; never commit tokens |
| CL04 | [GTEx Portal](https://gtexportal.org/) | US | Bulk tissue RNA; genotypes in controlled tier | Open **summary**; **individual-level** controlled | Read portal TOU; harmonize IDs with Ensembl |
| CL05 | [EGA](https://ega-archive.org/) | Europe | Controlled-access genomes + phenotypes | EGA account + Data Access Committee approval | Primary EU mirror for many clinical genomics |
| CL06 | [UK Biobank](https://www.ukbiobank.ac.uk/) | UK | Deep phenome; omics in research releases | Application + fee structure | No public API for full individual dump |
| CL07 | Genomics England / related NHS research access | UK | WGS + clinical linkage | Research agreements | Jurisdiction-specific rules |
| CL08 | [JGA](https://humandbs.biosciencedbc.jp/en/) (Human DBs) | Japan | Genotype–phenotype archives | Controlled submission/access | Analogous gate to dbGaP/EGA for JP deposits |
| CL09 | [CNGBdb](https://db.cngb.org/) / GSA | China | Genomics, some scRNA (e.g. muscle atlases) | Portal accounts; **API/docs less uniform** than NCBI | Q04 pipeline: plan manual or scripted export per portal |
| CL10 | Australian Genomics / national data commons | Australia | National genomics programs | Consortium / project access | Often federated, not one REST for all RNA |
| CL11 | [BioStudies](https://www.ebi.ac.uk/biostudies/) | Global (EBI) | Study metadata; links to ENA/ArrayExpress/EGA files | Open API for metadata | File download often ENA open or EGA-gated (see CL05) |
| CL12 | Trial registries ([ClinicalTrials.gov](https://clinicaltrials.gov/), [EU CTR](https://www.clinicaltrialsregister.eu/)) | Global | **Metadata** (arms, endpoints), not expression | Open browse | Good for cohort discovery; molecular data live in GEO/dbGaP/Synapse per trial |

#### Crosswalk — registry IDs and typical remote paths

Maps **§5’s** D-prefixed and Q-prefixed registry rows to typical API or portal paths.

| Registry ID | Name (short) | Typical remote path |
|----------|--------------|---------------------|
| D02 | Ovarian cancer coculture | GEO **GSE224333** → E-utilities / SRA if raw needed |
| D03 | Stressed EC (Calandrelli) | **CellxGene** dataset URL → HTTP `.h5ad` |
| D04 | Donor-derived EC T2D | **CellxGene** |
| D05 | PBMC3k | 10x demo / GEO references; small matrix bundled with tools |
| D06 | IFN-beta PBMC (Kang) | GEO-style accession (see paper); matrix often re-hosted |
| D07 | Aging pancreas | **CellxGene** |
| D08 | Aging PBMC CMV | **CellxGene** |
| D09 | Tabula Sapiens aorta | **CellxGene** collection |
| D10 | Heart atlas (subset) | **CellxGene** |
| D11 | WI-38 time course | **GSE226225** (GEO) → FASTQ / MEX per paper |
| D12 | Kamath PD SN | GEO **GSE178265**; processed on GCP |
| D13 | Heart atlas (full) | **CellxGene** |
| D14 | Rice root | **GSE146034** (GEO/SRA) |
| D15 | GBM Core Map | **CellxGene** collection + direct `datasets.cellxgene.cziscience.com` URL (see GBM resource sheet above) |
| Q01–Q03 | Winkler brain vasculature, macaque PD, SEA-AD | **CellxGene** / GEO as listed in registry |
| Q04 | Human muscle aging (CNGB) | **CNGBdb** / national portal—API and English docs **less standardized** than NCBI; plan manual export |
| Q05–Q06 | KPMP kidney, liver GSE | **GEO** + author matrices |

#### Remote access — caveats

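1. **Artifact type:** FASTQ requires alignment; an **author matrix** may use undocumented normalization; **CellxGene h5ad** usually keeps raw counts in `raw.X` and normalized values in `X`, and some files add `layers` (e.g. `layers/scaled` in D15). Inspect `raw`, `X`, and `layers` before choosing a matrix.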
2. **Gene identifiers:** Standardize with Ensembl REST or `mygene` and **keep the mapping table** with each analysis.
3. **Batch and confounds:** Public data combine labs and protocols. Harmonized atlases bake in model choices—cite **build ID**.
4. **Rate limits:** NCBI E-utilities allow **≤3 requests/s** without an API key; an [API key](https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/) raises the limit to 10 requests/s.
5. **Reproducibility:** Snapshot **accession + file checksum + software version** (e.g. `census_version`, `cellxgene_schema`).
6. **Controlled data:** Never commit **tokens, API keys, or Synapse secrets** to the vault.
7. **CL archives:** use when scoping **applications** or cross-walking open GEO/CellxGene to **phenotype-rich** restricted layers.
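
For caveat 5, a provenance record can be written next to each analysis. A minimal sketch; the record's keys and the helper names are illustrative assumptions, not an existing vault schema.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def sha256sum(path: str, chunk: int = 1 << 20) -> str:
    """Stream the file so multi-GB h5ad files never sit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def provenance_record(accession: str, path: str, versions: dict) -> dict:
    """Accession + file checksum + software versions for one input file."""
    return {
        "accession": accession,
        "file": path,
        "sha256": sha256sum(path),
        "software": versions,   # e.g. {"census_version": ..., "cellxgene_schema": ...}
        "python": platform.python_version(),
        "recorded_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


def write_record(record: dict, out_path: str) -> None:
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2, sort_keys=True)
```

Storing the record as JSON beside the results makes a later rerun checkable: a changed checksum means the upstream file, not the code, moved.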

#### Remote data revision log (merged)

| Date | Change |
|------|--------|
| 2026-03-30 | v1 — API master table, trust ladder, CODEX crosswalk, caveats |
| 2026-03-30 | v2 — fused clinical/international archives (CL01–CL12) |
| 2026-03-30 | v3 — merged into **§5 Combined Codex** (`BOOK.md`); maintain SF10 here |

---

### Symphony file-native index (scripts and orthodox stages)

Large **AnnData / NumPy** artifacts and which scripts touch them (quick lookup). Deprecated: 11 h5ad files in `_archive/deprecated_h5ad/` superseded by `gem_analysis.h5ad`.

| ID | Name | File | Size | Cells | Scripts that use it |
|----|------|------|------|-------|---------------------|
| D01 | EC coculture P vs S | `data/gem_analysis.h5ad` | 4.7 GB | 905,263 | `run_pipeline.py`, `thread_graph.py`, `dimensional_analysis.py`, `direct_shadow.py`, `retro_factions.py`, `viral_lineage.py` |
| D01-cache | Co-occurrence cache | `data/cooccurrence_cache.npz` | 1.7 GB | — | `thread_graph.py`, `thread_atlas.py` |
| D01-env | Full pipeline env | `data/pipeline_env.npz` | 2.9 GB | — | `thread_graph.py` (cached J + enrichment) |
| D02 | Ovarian cancer + CAF | `data/cancer_coculture_merged.h5ad` | 202 MB | 9,304 | `run_pipeline.py`, `three_states.py`, `gbm_lens.py` |
| D03 | Stressed EC HUVEC | `data/cellxgene/stressed_ec_large.h5ad` | 403 MB | 59,605 | `run_pipeline.py` |
| D04 | Donor-derived EC T2D | `data/cellxgene/stressed_ec_small.h5ad` | 117 MB | 11,243 | `run_pipeline.py` |
| D05 | PBMC3k | 10x MEX (no h5ad) | — | 2,700 | `run_pipeline.py` |
| D06 | IFN-beta PBMC | `data/ifnb.h5ad` | 2.0 GB | 32,484 | `run_pipeline.py` |
| D07 | Aging pancreas | `data/cellxgene/aging_pancreas.h5ad` | 28 MB | 2,544 | `run_pipeline.py` |
| D08 | Aging PBMC CMV | `data/cellxgene/aging_pbmc_small.h5ad` | 89 MB | 9,354 | `run_pipeline.py` |
| D09 | Tabula Sapiens aorta | `data/cellxgene/tabula_sapiens_vasculature.h5ad` | 2.1 GB | 42,650 | `run_pipeline.py` |
| D10 | Heart atlas 59k | `data/cellxgene/heart_atlas_59k.h5ad` | 273 MB | 59,341 | `run_pipeline.py` |
| D11 | WI-38 time course | 10x MEX x13 dirs | — | ~57,000 | `run_exp10_timecourse.py` |
| **D15** | **GBM Core Map** | **`data/gbm_core_map.h5ad`** | **8.1 GB** | **338,564** | **`gbm_lens.py`** (h5py direct CSR, not anndata) |
| — | Heart myocarditis | `data/cellxgene/heart_myocarditis_26k.h5ad` | ~200 MB | ~26,000 | exploratory |

#### Orthodox pipeline objects (`orthodox/objects/`)

| Stage | File | Description |
|-------|------|-------------|
| 0 | `stage0_merged.h5ad` | Raw merged CellRanger |
| 2 | `stage2_qc.h5ad` | After QC filtering |
| 3 | `stage3_doublets.h5ad` | After doublet removal |
| 4 | `stage4_pca.h5ad` | After PCA |
| 5 | `stage5_clustered.h5ad` | After Leiden clustering |
| 6 | `stage6_annotated.h5ad` | After CellTypist |
| 8 | `stage8_curated.h5ad` | Manual curation |
| — | `discord_annotated.h5ad` | Final annotated object |

---

### Orthodox Pipeline (`orthodox/`)

Standard scRNA-seq pipeline as methods control. 12 stages via `orthodox_canon.py --stage N`. 33 documented parameter decisions. 24 plots, 10 reports, 8 publication figures. Full details: `orthodox/README.md`.

---

### Literature and field references

| Doc | Role |
|-----|------|
| [SENESCENCE_LITERATURE_GAPS.md](SENESCENCE_LITERATURE_GAPS.md) | Peer-reviewed senescence themes + DOIs (field gaps for paper Discussion) |
| [DATA_x_METHODS.md](DATA_x_METHODS.md) | Methods × disease matrix (cancer, aging, AD) |
| [WORLDLINE.md](WORLDLINE.md) | Full findings log (narrative) |
| [BOUNTY_BOARD.md](BOUNTY_BOARD.md) | Open work packages (SF\* = field literature) |
| [HALO_PROTOCOL.md](HALO_PROTOCOL.md) | Living document / memory discipline |
| [WING_PROTOCOL.md](WING_PROTOCOL.md) | Production readiness gates |

---

### Cross-project links

| Project | Relationship | Key Exchange |
|---------|-------------|--------------|
| **[12_BloodyEchoes](../12_Project_BloodyEchoes/)** | Spawned from EXP01 TE silencing findings | UHRF1 -71% in senescent (our data) → 833 excess repeats in human UHRF1 locus (BloodyEchoes) → UHRF1 8.2x in GBM malignant (gbm_lens.py confirms selective control) |
| **[06_Daemon](../06_Project_Daemon/)** | V3 engine used for shadow analysis | min()-semantics constraint graph on 51k nodes. Analytic Jacobian. Songs → shadows |
| **[08_Astronomicon](../08_Project_Astronomicon/)** | Edge runtime for daemon coordination | u-os.dev mailboxes, key-first tool OS, Warp Storm security |

---

### Vault infrastructure (code, databases, non-Symphony)

Where SQL, vector DBs, and storage-backed code live **across the vault** (D1 Worker, ChromaDB, LENG LOTUS archive, etc.): **[`../CODE_DATABASE_INDEX.md`](../CODE_DATABASE_INDEX.md)**.

---

### Key links

- **[WORLDLINE.md](WORLDLINE.md)** — Complete findings log (57 parts), the narrative of discovery
- **[BOUNTY_BOARD.md](BOUNTY_BOARD.md)** — Open and solved bounties
- **[QUICK_START.md](QUICK_START.md)** — Run the pipeline in 3 commands
- **[ENGINE_ROOM.md](ENGINE_ROOM.md)** — GCP VM quick reference
- **[EYE_PROTOCOL.md](EYE_PROTOCOL.md)** — Compute engine registry
- **[DAEMON_PROTOCOL.md](DAEMON_PROTOCOL.md)** — Deterministic Algorithm Engine Model Of what's Not
- **[HALO_PROTOCOL.md](HALO_PROTOCOL.md)** — Living Document format (memory layer)

---

### Combined Codex revision log

| Date | Change |
|------|--------|
| 2026-03-30 | v3 — **Fused** `data/CODEX.md` (Human Codex registry) + `RNA_PUBLIC_API_CATALOG.md` (remote APIs + CL01–CL12) + tools/script index from `CODE_DATABASE_INDEX` Symphony section into one **Combined Codex** file (then root `CODEX.md`). |
| 2026-04-01 | Vault fusion — Combined Codex body merged into **[`BOOK.md`](BOOK.md) §5**; root `CODEX.md` archived under [`_archive/codex_fused_into_book_2026-04-01/`](_archive/codex_fused_into_book_2026-04-01/); stubs point at `BOOK.md`. |
<!-- accession_scan:generated -->

## Accession gap index (automated scan)

**Generated:** 2026-03-30. **Policy:** lists accession and DOI mentions found in this project's files outside root `BOOK.md` (excluding `_archive/`, `u_os_dev/out/public_lab/`, `.git/`, `node_modules/`) that do **not** yet appear as a substring anywhere in **this** `BOOK.md`. **Curated sections above remain authoritative**; merge rows into them when ready.
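
The scan policy can be approximated in a few lines. This is a hedged sketch, not the generator behind the `accession_scan:generated` marker: it scans only `*.md` files, recognises GSE, GSM, ArrayExpress (`E-XXXX-n`) and DOI patterns with simple regexes, and applies the same plain substring test against `BOOK.md` (so `GSE1` would count as present if `GSE12` appears).

```python
import re
from pathlib import Path

PATTERNS = {
    "GSE": re.compile(r"\bGSE\d+\b"),
    "GSM": re.compile(r"\bGSM\d+\b"),
    "ArrayExpress": re.compile(r"\bE-[A-Z]{4}-\d+\b"),
    "DOI": re.compile(r"\b10\.\d{4,9}/[^\s`)\]>\"']+"),
}
EXCLUDE_DIRS = {"_archive", ".git", "node_modules"}
EXCLUDE_PREFIXES = ("u_os_dev/out/public_lab/",)


def find_mentions(text: str) -> set:
    """All (kind, id) pairs in `text`; trailing sentence punctuation is trimmed."""
    return {(kind, m.group(0).rstrip(".,;"))
            for kind, rx in PATTERNS.items() for m in rx.finditer(text)}


def accession_gaps(project: Path, book_text: str) -> dict:
    """Map (kind, id) -> files mentioning it, for ids absent from the BOOK text."""
    gaps = {}
    for md in sorted(project.rglob("*.md")):
        rel_path = md.relative_to(project)
        rel = rel_path.as_posix()
        if rel == "BOOK.md" or rel.startswith(EXCLUDE_PREFIXES) \
                or any(part in EXCLUDE_DIRS for part in rel_path.parts):
            continue
        for kind, acc in find_mentions(md.read_text(encoding="utf-8", errors="ignore")):
            if acc not in book_text:        # plain substring test, as in the policy
                gaps.setdefault((kind, acc), []).append(rel)
    return gaps
```

Each key of the returned dict corresponds to one `Kind | ID` row below, with the file list filling the `Seen in` column.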

| Kind | ID | Seen in |
|------|-----|---------|
| ArrayExpress | `E-MTAB-9662` | `BOUNTY_BOARD.md`, `data/experiments/EXP09_rotenone_neurons/LIVING_DOCUMENT.md` |
| DOI | `10.1038/s41467-020-18957-w` | `data/experiments/EXP03_stressed_ec_huvec/EXPERIMENT.md`, `data/experiments/EXP03_stressed_ec_huvec/LIVING_DOCUMENT.md` |
| DOI | `10.18632/aging.204666` | `data/experiments/EXP10_wi38_senescence_timecourse/EXPERIMENT.md`, `data/experiments/EXP10_wi38_senescence_timecourse/LIVING_DOCUMENT.md` |
| GSE | `GSE111976` | `HALO_GENESIS_SIGNAL.md` |
| GSE | `GSE115978` | `RIGHTEOUS_ORTHODOX_MAP.md`, `WORLDLINE.md` |
| GSE | `GSE123814` | `RIGHTEOUS_ORTHODOX_MAP.md` |
| GSE | `GSE128179` | `data/experiments/EXP01_primary_ec_coculture/dataset_comparison_audit.md` |
| GSE | `GSE130727` | `BOUNTY_BOARD.md`, `HALO_GENESISSIGNAL.md`, `QUANTUM_IDENTITY_PATTERNS.md` |
| GSE | `GSE143353` | `data/OverianCancer_Fibroblast_CoCulture_3-13-26_12_47/README.md` |
| GSE | `GSE144430` | `data/OverianCancer_Fibroblast_CoCulture_3-13-26_12_47/README.md` |
| GSE | `GSE146034` | `WORLDLINE.md`, `data/experiments/EXP14_rice_root_protoplast/EXPERIMENT.md`, `genesis/README.md` |
| GSE | `GSE178265` | `BOUNTY_BOARD.md`, `WORLDLINE.md`, `data/experiments/EXP12_parkinsons_substantia_nigra/EXPERIMENT.md` |
| GSE | `GSE183852` | `BOUNTY_BOARD.md`, `WORLDLINE.md` |
| GSE | `GSE184329` | `data/OverianCancer_Fibroblast_CoCulture_3-13-26_12_47/README.md` |
| GSE | `GSE185477` | `WORLDLINE.md` |
| GSE | `GSE195507` | `WORLDLINE.md` |
| GSE | `GSE224333` | `HALO_GENESISSIGNAL.md`, `WORLDLINE.md`, `data/OverianCancer_Fibroblast_CoCulture_3-13-26_12_47/README.md`, `data/experiments/EXP01_primary_ec_coculture/dataset_comparison_audit.md`, `data/experiments/EXP02_ovarian_cancer_coculture/EXPERIMENT.md`, `data/experiments/EXP02_ovarian_cancer_coculture/LIVING_DOCUMENT.md`, `genesis/README.md` |
| GSE | `GSE226189` | `DATA_x_METHODS.md`, `HALO_GENESISSIGNAL.md`, `QUANTUM_IDENTITY_PATTERNS.md` |
| GSE | `GSE226225` | `DARK_MATTER_MODULE.md`, `HALO_GENESISSIGNAL.md`, `QUANTUM_IDENTITY_PATTERNS.md`, `RIGHTEOUS_ORTHODOX_MAP.md`, `WORLDLINE.md`, `XIST_BREAKTHROUGH.md`, `XIST_CLINICAL_MAP.md`, `data/experiments/EXP10_wi38_senescence_timecourse/EXPERIMENT.md`, `data/experiments/EXP10_wi38_senescence_timecourse/LIVING_DOCUMENT.md`, `genesis/README.md`, `genesis/ambrosia/AMBROSIA_INDEX.md` |
| GSE | `GSE235529` | `WORLDLINE.md` |
| GSE | `GSE235996` | `BOUNTY_BOARD.md`, `HALO_GENESISSIGNAL.md`, `HALO_GENESIS_SIGNAL.md`, `QUANTUM_RECEIVER_ARCHITECTURE.md`, `WORLDLINE.md` |
| GSE | `GSE239591` | `WORLDLINE.md`, `genesis/ambrosia/AMBROSIA_INDEX.md` |
| GSE | `GSE242410` | `WORLDLINE.md` |
| GSE | `GSE250041` | `HALO_GENESISSIGNAL.md`, `RIGHTEOUS_ORTHODOX_MAP.md`, `WORLDLINE.md` |
| GSE | `GSE262157` | `WORLDLINE.md` |
| GSE | `GSE265969` | `WORLDLINE.md` |
| GSE | `GSE279002` | `WORLDLINE.md` |
| GSE | `GSE292438` | `WORLDLINE.md` |
| GSE | `GSE296698` | `WORLDLINE.md` |
| GSE | `GSE297213` | `WORLDLINE.md`, `data/experiments/EXP13_plant_leaf/EXPERIMENT.md` |
| GSE | `GSE297365` | `WORLDLINE.md` |
| GSE | `GSE302792` | `BOUNTY_BOARD.md`, `DATA_x_METHODS.md`, `NAMED_CULTIVATOR_STACK.md`, `WORLDLINE.md` |
| GSE | `GSE90063` | `WORLDLINE.md` |
| GSE | `GSE98448` | `WORLDLINE.md`, `genesis/ambrosia/AMBROSIA_INDEX.md` |
| GSM | `GSM4006845` | `data/experiments/EXP01_primary_ec_coculture/dataset_comparison_audit.md` |
| SRR | `SRR11194113` | `WORLDLINE.md` |

<!-- /accession_scan -->
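The scan policy stated at the top of the gap index can be sketched as follows. This is a minimal, hypothetical illustration and not the generator that produced the table. The regexes, the `*.md`-only file filter, the treatment of `_archive`, `.git` and `node_modules` as excluded at any depth, and the helper names `scan` / `render` are all assumptions. It keeps the documented rule that an ID counts as "known" if it occurs as a plain substring of the root `BOOK.md`. Sorting on `(kind, id)` strings gives the lexicographic order seen in the table, where `GSE90063` sorts after `GSE302792`.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical accession patterns; the real scanner's regexes are not documented here.
PATTERNS = {
    "ArrayExpress": re.compile(r"\bE-[A-Z]{4}-\d+\b"),
    "DOI": re.compile(r"\b10\.\d{4,9}/[^\s`|)\]]+"),
    "GSE": re.compile(r"\bGSE\d+\b"),
    "GSM": re.compile(r"\bGSM\d+\b"),
    "SRR": re.compile(r"\bSRR\d+\b"),
}

# Assumption: these directory names are skipped at any depth; the lab path only at the root.
EXCLUDED_PARTS = {"_archive", ".git", "node_modules"}
EXCLUDED_PREFIXES = ("u_os_dev/out/public_lab/",)


def is_excluded(rel: str) -> bool:
    """True if a project-relative POSIX path falls under an excluded location."""
    return rel.startswith(EXCLUDED_PREFIXES) or any(
        part in EXCLUDED_PARTS for part in rel.split("/")
    )


def scan(root: Path) -> dict[tuple[str, str], list[str]]:
    """Map (kind, accession) -> sorted files mentioning it, for IDs absent from root BOOK.md."""
    book_text = (root / "BOOK.md").read_text(encoding="utf-8", errors="ignore")
    hits: dict[tuple[str, str], set[str]] = defaultdict(set)
    for path in root.rglob("*.md"):
        rel = path.relative_to(root).as_posix()
        if rel == "BOOK.md" or is_excluded(rel):  # only the root BOOK.md is skipped
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for kind, pattern in PATTERNS.items():
            for match in pattern.findall(text):
                acc = match.rstrip(".,;:")  # drop trailing sentence punctuation (DOIs)
                if acc not in book_text:  # documented rule: plain substring test
                    hits[(kind, acc)].add(rel)
    # Plain string sort on (kind, id), so GSE90063 lands after GSE302792.
    return {key: sorted(files) for key, files in sorted(hits.items())}


def render(rows: dict[tuple[str, str], list[str]]) -> str:
    """Render scan results as the Kind / ID / Seen in Markdown table."""
    lines = ["| Kind | ID | Seen in |", "|------|-----|---------|"]
    for (kind, acc), files in rows.items():
        seen = ", ".join(f"`{f}`" for f in files)
        lines.append(f"| {kind} | `{acc}` | {seen} |")
    return "\n".join(lines)
```

Because the known-ID test is a substring match, a short ID such as `GSE1` would be treated as already present whenever `GSE111976` appears in `BOOK.md`. That follows the stated policy, and it is also a reason to review rows by hand before merging them into the curated sections.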

---

*BOOK revision: 2026-04-01 — §5 Combined Codex fused from CODEX.md; §2c; STARS / FROM GEO.*
