---
vault_clearance: KETER
halo:
  classification: INTERNAL
  confidence: MEDIUM
  front: "20_Project_MarathonLament — BOOK"
  custodian: "Jixiang Leng"
  created: 2026-03-30
  updated: 2026-04-01
  wing: UNASSESSED
  containment: "BOOK — bibliography + methods registry; not README / WORLDLINE / BOUNTY"
---

# MarathonLament — BOOK

Canonical bibliography and methods registry for **20_Project_MarathonLament** (transcript shape / count-matrix semantics — circularization, truncation, restructuring). Convention: [`BOOK_Protocol.md`](../BOOK_Protocol.md). Orientation: [`README.md`](README.md). Open work: [`BOUNTY_BOARD.md`](BOUNTY_BOARD.md) · [`WORLDLINE.md`](WORLDLINE.md). **Orthodox count matrix vs spliceosome stack:** [`FORM.md`](FORM.md).


### Local registry slice (EYE / STAFF / STARS)

| Surface | Pointers |
|---------|----------|
| **EYEs** | Runs: [`README.md`](README.md) / [`WORLDLINE.md`](WORLDLINE.md) (if present). Registry: [`EYE_PROTOCOL.md`](../EYE_PROTOCOL.md) |
| **STAFF** | Runnable tools: [`STAFF_catalogue.json`](../STAFF_catalogue.json) — filter `project_dir` for this folder. |
| **STARS** | This file; rules: [`BOOK_Protocol.md`](../BOOK_Protocol.md). |
| **Audit sheet** | [`LOGGING_AND_REGISTRY_CHECKLIST.md`](../99_Archive/root_reports/2026-04/LOGGING_AND_REGISTRY_CHECKLIST.md) |
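
The STAFF row above says to filter the catalogue on `project_dir`. A minimal sketch of that filter follows. The schema of `STAFF_catalogue.json` is not documented in this file, so the shape assumed here (a list of objects with a `project_dir` string, optionally wrapped under a `tools` key) and both function names are hypothetical.

```python
import json
from pathlib import Path


def load_catalogue(path):
    """Read the catalogue file.

    ASSUMPTION: the file is either a bare list of entries or a dict
    holding that list under a "tools" key. The real schema may differ.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data.get("tools", []) if isinstance(data, dict) else data


def tools_for_project(catalogue, project_dir):
    """Return the entries whose `project_dir` names this folder.

    Trailing slashes are ignored so "20_Project_MarathonLament" and
    "20_Project_MarathonLament/" compare equal.
    """
    wanted = project_dir.strip("/")
    return [
        entry
        for entry in catalogue
        if str(entry.get("project_dir", "")).strip("/") == wanted
    ]
```

Typical use would be `tools_for_project(load_catalogue("../STAFF_catalogue.json"), "20_Project_MarathonLament")`, adjusted to whatever keys the catalogue actually uses.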


---

## 1. RNA biology, long-read sequencing, and isoform structure (curate)

| ID | Kind | Note | Identifier |
|----|------|------|------------|
| ML-B1 | Stub | cDNA / library prep biases, circRNA reviews, long-read error profiles — add DOIs as benchmarks cite them | *See BOUNTY_BOARD* |

---

## 2. Single-cell and matrix representations (curate)

| ID | Kind | Note | Identifier |
|----|------|------|------------|
| ML-B2 | Cross-ref | Shared tooling with Astronomicon / Symphony | [`../08_Project_Astronomicon/BOOK.md`](../08_Project_Astronomicon/BOOK.md), [`../10_Project_DiscordIntoSymphony/BOOK.md`](../10_Project_DiscordIntoSymphony/BOOK.md) |

---

## 3. Long-read / isoform / circRNA — methods and public resources (curated online pass, US + international)

**Purpose:** Stable **DOIs** and **portals** for transcript *shape* (splice structure, ends, circularization) — complements §1 stubs and STARS archives.

| ID | Region | Kind | Note | Identifier |
|----|--------|------|------|------------|
| ML-D1 | US / EU | Method | SQANTI3 — QC / structural classification for long-read transcriptomes (*Nat Methods*, 2024); used in LRGASP-style benchmarks | [10.1038/s41592-024-02229-2](https://doi.org/10.1038/s41592-024-02229-2) · [GitHub: ConesaLab/SQANTI3](https://github.com/ConesaLab/SQANTI3) |
| ML-D2 | US | Method | FLAIR2 — haplotype-aware long-read isoform analysis (*Genome Biol.*, 2024) | [10.1186/s13059-024-03301-y](https://doi.org/10.1186/s13059-024-03301-y) · [GitHub: BrooksLabUCSC/flair](https://github.com/BrooksLabUCSC/flair) |
| ML-D3 | China / international | Database | circAtlas 3.0 — curated vertebrate circRNA gateway (CNCB / NGDC) | [ngdc.cncb.ac.cn/circatlas](https://ngdc.cncb.ac.cn/circatlas) · [10.1093/nar/gkad770](https://doi.org/10.1093/nar/gkad770) |
| ML-D4 | International | Annotation | APPRIS — principal isoform selection (context for “which transcript counts as canonical”) | [appris.bioinfo.cnio.es](https://appris.bioinfo.cnio.es/) |
| ML-D5 | UK / international | Reference | Ensembl browser + API — gene / transcript models | [ensembl.org](https://www.ensembl.org/) |
| ML-D6 | US | Vendor docs | PacBio Iso-Seq / Kinnex — library and informatics overview (technical; cite alongside peer-reviewed methods) | [pacb.com](https://www.pacb.com/) |
| ML-D7 | UK | Vendor docs | Oxford Nanopore — direct RNA / cDNA sequencing guidance (technical) | [nanoporetech.com](https://nanoporetech.com/) |
| ML-D8 | International | Benchmark context | Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP); search PubMed / consortium site for frozen benchmark accession lists | [LRGASP.org](https://www.lrgasp.org/) |

---

## 4. Bounty → start here

| Workstream | Start with |
|------------|------------|
| Orthodox baseline | [`FORM.md`](FORM.md) |
| Shape / matrix semantics | [`README.md`](README.md), [`BOUNTY_BOARD.md`](BOUNTY_BOARD.md) |
| Methods / data anchors | §3, STARS |

---

## STARS — US and international anchors

**RNA / transcriptomics infrastructure** — US NIH hubs plus **GENCODE** (international consortium).

### How to read STARS (context)

**STARS** (`ML-S*`) are **broad archives and reference annotations**. **§3 `ML-D*`** rows are **methods, databases, and vendor docs** (SQANTI3, circAtlas, Ensembl, etc.) and are **finer-grained** than STARS. Neither STARS nor §3 replaces the **GEO SOFT** record for a specific study’s cell type and treatment.

| ID | What this STAR denotes | Typical use in this BOOK | Not / caveats |
|----|-------------------------|--------------------------|---------------|
| ML-S1 | NCBI SRA | Discover **raw reads** (long- and short-read) | Same caveats as Astronomicon: BioSample grouping, batch effects. |
| ML-S2 | GENCODE | **Gene / transcript** annotation reference | Version matters — pin GENCODE release per analysis. |
| ML-S3 | ENCODE Portal | **Functional genomics** assays and metadata | Cell line and treatment per experiment — read metadata. |
| ML-S4 | NCBI GEO | **Expression series** and supplements | Accession-level design lives in the SOFT record; for complex superseries, follow Symphony **§2c**-style curation. |
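
The ML-S2 caveat (pin the GENCODE release per analysis) can be enforced with a small guard. The sketch below relies only on the public GENCODE file-naming pattern (`gencode.v<release>.…`, with an `M` prefix for mouse releases). The function names and the release numbers in the usage note are illustrative, not a project convention.

```python
import re

# Matches the release tag in names such as "gencode.v44.annotation.gtf.gz"
# (human) or "gencode.vM33.annotation.gtf.gz" (mouse).
GENCODE_NAME = re.compile(r"gencode\.v(M?\d+)\.")


def gencode_release(filename):
    """Return the release tag ("44", "M33", ...) or None if the name has none."""
    match = GENCODE_NAME.search(filename)
    return match.group(1) if match else None


def check_pinned(filename, pinned):
    """Raise if the annotation file does not carry the release pinned for this analysis."""
    found = gencode_release(filename)
    if found != pinned:
        raise ValueError(
            f"expected GENCODE release {pinned!r}, found {found!r} in {filename}"
        )
    return found
```

Calling `check_pinned("gencode.v44.annotation.gtf.gz", "44")` at the start of a run makes a silent annotation swap fail loudly instead of shifting transcript models mid-project.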

| ID | Region | Kind | Note | Identifier |
|----|--------|------|------|------------|
| ML-S1 | US | Sequence reads | NCBI SRA | [ncbi.nlm.nih.gov/sra](https://www.ncbi.nlm.nih.gov/sra) |
| ML-S2 | International | Gene annotations | GENCODE | [gencodegenes.org](https://www.gencodegenes.org/) |
| ML-S3 | US | Functional genomics | ENCODE Portal | [encodeproject.org](https://www.encodeproject.org/) |
| ML-S4 | US | Expression series | NCBI GEO | [ncbi.nlm.nih.gov/geo](https://www.ncbi.nlm.nih.gov/geo/) |
<!-- accession_scan:generated -->

## Accession gap index (automated scan)

**Generated:** 2026-03-30. **Policy:** Lists identifiers mentioned in files under this project other than root `BOOK.md` (excluding `_archive/`, `u_os_dev/out/public_lab/`, `.git/`, `node_modules/`) that do **not** yet appear as a substring anywhere in **this** `BOOK.md`. **Curated sections above remain authoritative**; merge rows into them when ready.

| Kind | ID | Seen in |
|------|-----|---------|
| DOI | `10.1038/s41467-021-24975-z` | `circRNA_operator_overlap.md` |
| GSE | `GSE111976` | `FORM_TRUTH_MAP.md` |
| GSE | `GSE159487` | `FORM_TRUTH_MAP.md` |
| GSE | `GSE235996` | `FORM_TRUTH_MAP.md` |

<!-- /accession_scan -->
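
The scan policy above can be approximated in a few lines. This is a sketch under stated assumptions, not the generator that produced the table: it assumes "root `BOOK.md`" means this file, scans only `*.md` files, and uses simplified DOI / GSE patterns. `find_gaps`, `is_excluded`, and both regexes are hypothetical.

```python
import re
from pathlib import Path

# Directory prefixes taken from the policy text above.
EXCLUDED = ("_archive/", "u_os_dev/out/public_lab/", ".git/", "node_modules/")

# ASSUMPTION: simplified patterns; the real scanner may recognise more kinds.
PATTERNS = {
    "DOI": re.compile(r"10\.\d{4,9}/[^\s`\"'<>)\]|]+"),
    "GSE": re.compile(r"\bGSE\d+\b"),
}


def is_excluded(rel_path):
    """True if any path segment sequence matches an excluded directory."""
    probe = "/" + rel_path
    return any("/" + prefix in probe for prefix in EXCLUDED)


def find_gaps(project_root, book_name="BOOK.md"):
    """Return sorted (kind, identifier, file) rows missing from the BOOK.

    An identifier counts as present if it occurs as a substring anywhere
    in the BOOK text, mirroring the stated policy.
    """
    root = Path(project_root)
    book_text = (root / book_name).read_text(encoding="utf-8")
    gaps = set()
    for path in root.rglob("*.md"):
        rel = path.relative_to(root).as_posix()
        if rel == book_name or is_excluded(rel):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for kind, pattern in PATTERNS.items():
            for hit in pattern.findall(text):
                ident = hit.rstrip(".,;")  # drop sentence punctuation
                if ident not in book_text:
                    gaps.add((kind, ident, rel))
    return sorted(gaps)
```

Because presence is tested by substring, an identifier that is a prefix of one already in the BOOK is treated as covered; a stricter scanner would match whole tokens.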

---

*BOOK revision: 2026-04-01 — STARS context table; §3 long-read / circRNA online pass.*
