---
vault_clearance: KETER
halo:
  classification: INTERNAL
  front: "Front MarathonLament"
  created: 2026-03-28
---

# FORM — MarathonLament

**BOOK:** [BOOK.md](BOOK.md), which lists RNA/splice databases and method references; rows are added as sources come in.

> **F**unctional **O**rthodox **R**eality **M**odel for the spliceosome.

## The Two Paradigms

| | Orthodox (Standard scRNA-seq) | MarathonLament (Ours) |
|---|---|---|
| **What it counts** | Gene = integer (RPL5 = 47 reads) | Transcript = structure (RPL5 = 47 reads of ONE rigid isoform with entropy 0.002) |
| **What it assumes** | Same gene → same function | Same gene → DIFFERENT function depending on isoform, fold, structure |
| **What it misses** | Splice variants, retained introns, circularization, truncation, structural RNA | Everything the spliceosome decides |
| **Parameters** | Many (normalization, HVG, clustering resolution) | Zero (ViennaRNA is deterministic, entropy is arithmetic, reads/UMI is counting) |
| **Output** | Gene expression matrix | Operator-resolved splice profile: entropy + structure + function per gene per operator |

## The Murder Board

### "Splicing is handled by the count matrix"

**NO.** The count matrix counts GENES, not TRANSCRIPTS. A gene with 5 isoforms appears as one integer. Whether the cell makes isoform 1 (protein-coding) or isoform 3 (retained intron, potentially lncRNA), the count matrix says "47" either way. The spliceosome's decision is invisible.
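A minimal sketch of the collapse, assuming hypothetical isoform labels and read counts (these are illustrative numbers, not measurements): two cells with opposite isoform usage produce the same gene-level integer.

```python
# Hypothetical per-isoform read counts for one gene in two cells.
# Labels and numbers are illustrative only, not measured data.
cell_a = {"iso1_coding": 45, "iso2": 1, "iso3_retained_intron": 1}
cell_b = {"iso1_coding": 5, "iso2": 2, "iso3_retained_intron": 40}


def gene_count(isoform_counts):
    """Collapse isoform-level counts to the single integer a gene matrix stores."""
    return sum(isoform_counts.values())


def dominant_isoform(isoform_counts):
    """Return the label of the isoform carrying the most reads."""
    return max(isoform_counts, key=isoform_counts.get)


# Same gene-level count, different dominant isoform.
print(gene_count(cell_a), gene_count(cell_b))
print(dominant_isoform(cell_a), dominant_isoform(cell_b))
```

Both cells report 47 at the gene level, while the dominant isoform flips from the coding form to the retained-intron form.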

### "Alternative splicing is well-studied"

**STUDIED, NOT RESOLVED.** About 95% of multi-exon human genes are alternatively spliced. We showed that the NUMBER of isoforms doesn't differ dramatically between operators (GOLGI 27.8 vs RIBO 22.2, a ratio of only 1.25x). What differs is the USAGE: how the spliceosome distributes expression across isoforms. RIBO entropy = 0.167 (one isoform dominates at 99.7%). GOLGI entropy = 0.388 (the dominant isoform is only 81%). The spliceosome makes DIFFERENT DECISIONS for different operators. This is not in any database.
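The usage measure can be sketched as follows. This assumes "entropy" means Shannon entropy over one gene's isoform read counts, computed in bits; the log base and any averaging across genes are assumptions, since the text does not state them. The counts in the usage lines are made up to show a rigid versus a flexible gene.

```python
import math


def usage_entropy(isoform_counts, base=2.0):
    """Shannon entropy of isoform usage for one gene.

    isoform_counts: read counts per isoform (non-negative numbers).
    base: logarithm base; 2.0 gives bits (an assumption, not from the text).
    """
    total = sum(isoform_counts)
    if total <= 0:
        raise ValueError("gene has no reads")
    entropy = 0.0
    for count in isoform_counts:
        if count > 0:  # zero-count isoforms contribute nothing
            p = count / total
            entropy -= p * math.log(p, base)
    return entropy


def dominant_fraction(isoform_counts):
    """Share of reads carried by the most-used isoform."""
    return max(isoform_counts) / sum(isoform_counts)


# Illustrative counts only: one near-rigid gene, one flexible gene.
rigid = [997, 2, 1]
flexible = [81, 10, 9]
print(usage_entropy(rigid), dominant_fraction(rigid))
print(usage_entropy(flexible), dominant_fraction(flexible))
```

A gene with a single used isoform scores exactly 0, and the score rises as usage spreads out, independent of how many isoforms are annotated.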

### "You can't predict RNA structure from sequence"

**YES YOU CAN.** ViennaRNA RNAfold computes the minimum free energy (MFE) structure from experimentally derived thermodynamic parameters (the Turner energy model). Same sequence and same parameter set → same structure → forever. Zero parameters of ours: the energy parameters come from thermodynamic measurements, not from fitting to our data. The FTH1 IRE stem-loop is the validation case: RNAfold should predict it from sequence alone.
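A minimal sketch of the fold step, assuming ViennaRNA's Python bindings are importable as `RNA` and that `RNA.fold` returns a dot-bracket string plus the MFE in kcal/mol. The helper names `fold_mfe` and `base_pairs` are hypothetical, and no FTH1 sequence is included here; supply the real one from the annotation.

```python
def fold_mfe(sequence):
    """Return (dot_bracket, mfe_kcal_per_mol) for an RNA sequence.

    Requires the ViennaRNA Python bindings; the import is done lazily so
    the pure-Python helper below still works without them installed.
    """
    import RNA  # ViennaRNA Python module

    structure, mfe = RNA.fold(sequence)
    return structure, mfe


def base_pairs(dot_bracket):
    """Convert a dot-bracket string into sorted 0-based (i, j) base pairs."""
    stack = []
    pairs = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return sorted(pairs)


# A toy hairpin in dot-bracket form: 2-bp stem, 3-nt loop.
print(base_pairs("((...))"))
```

For the validation case, `fold_mfe(seq)` on the FTH1 5'UTR followed by `base_pairs(...)` would give the predicted stem positions to compare against the known IRE.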

### "Reads/UMI doesn't measure structure"

**IT DOES, INDIRECTLY.** A polymerase reads a molecule. If the molecule is folded (stem-loop, G-quadruplex), the polymerase stalls or falls off. The same molecule gets sequenced multiple times because each attempt fails partway through. High reads/UMI = the molecule is hard to sequence = it has structure. This is why MT-CO3 (mitochondrial, structured) has reads/UMI of 1.98 while RPL3 (cytoplasmic, unstructured mRNA) has 1.65.
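The reads/UMI figure itself is plain counting. The sketch below assumes each row of `molecule_info.h5` is one molecule (one barcode + UMI + gene combination) with its read count in a `count` dataset and its gene in `feature_idx`, as in the Cell Ranger layout. Loading the arrays (for example with h5py) is left out, and the function name is hypothetical.

```python
from collections import defaultdict


def reads_per_umi(feature_idx, read_counts):
    """Mean reads per molecule (UMI) for each feature index.

    feature_idx: gene index per molecule (assumed `feature_idx` dataset).
    read_counts: reads supporting each molecule (assumed `count` dataset).
    """
    if len(feature_idx) != len(read_counts):
        raise ValueError("feature_idx and read_counts differ in length")
    reads = defaultdict(int)      # total reads per gene
    molecules = defaultdict(int)  # total molecules (UMIs) per gene
    for feature, count in zip(feature_idx, read_counts):
        reads[feature] += count
        molecules[feature] += 1
    return {feature: reads[feature] / molecules[feature] for feature in reads}


# Toy input: gene 0 has molecules with 1 and 3 reads, gene 1 has one with 2.
print(reads_per_umi([0, 0, 1], [1, 3, 2]))
```

Comparing conditions then reduces to running this per sample and looking at the per-gene difference.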

### "This is just correlation, not mechanism"

**WE'RE BUILDING THE MECHANISM.** The mathematical mapping is:

```
Sequence features (deterministic)
  → ViennaRNA fold prediction (deterministic)
    → Predicted stall points (deterministic)
      → Compare to BAM coverage holes (observed)
        → Match = the fold is real
          → Condition comparison = the fold CHANGED
            → Operator grouping = the change is OPERATOR-SPECIFIC
```

Every step is deterministic or empirical. No fitting. No training. The mechanism is: sequence → thermodynamics → structure → sequencing behavior → operator coupling.
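The comparison step (predicted stall points versus BAM coverage holes) could look like the sketch below. It assumes per-base coverage has already been extracted for one transcript and that predicted stall points are 0-based positions on it. The hole cutoff `frac` and the matching tolerance `window` are illustrative choices introduced for this sketch, not values from the pipeline.

```python
from statistics import median


def coverage_holes(coverage, frac=0.5):
    """Positions whose depth falls below frac * median depth."""
    cutoff = frac * median(coverage)
    return {i for i, depth in enumerate(coverage) if depth < cutoff}


def stall_match(predicted_stalls, coverage, frac=0.5, window=2):
    """Fraction of predicted stall points lying within `window` nt of a hole."""
    if not predicted_stalls:
        return 0.0
    holes = coverage_holes(coverage, frac)
    hits = [
        pos for pos in predicted_stalls
        if any(abs(pos - hole) <= window for hole in holes)
    ]
    return len(hits) / len(predicted_stalls)


# Toy coverage with a dip at positions 3-4; one predicted stall sits on the
# dip, the other (position 8) does not.
coverage = [10, 10, 10, 2, 2, 10, 10, 10, 10, 10]
print(stall_match([3, 8], coverage))
```

Running the same function on two conditions gives the "fold CHANGED" comparison; grouping genes by operator gives the operator-specific view.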

## What MarathonLament Beats

| Claim | Orthodox answer | Our answer | Evidence |
|---|---|---|---|
| Do operators have different splice behavior? | Not asked | Yes: entropy RIBO 0.167, GOLGI 0.388 | SG-NEx nanopore (Breakthrough 14) |
| Do transcripts change structure in disease? | Not measured | Yes: all genes increase reads/UMI in senescence | molecule_info.h5 (Part 94) |
| Can you predict fold from sequence? | "Too complex" | Yes: ViennaRNA, FTH1 IRE as validation | Running (ML1) |
| Is the spliceosome operator-specific? | Not in any database | Yes: entropy + GENCODE + reads/UMI all show hierarchy | Three independent measurements |
| What makes a gene a Jaccard hub? | Not asked | 3'UTR length + ARE density + splice flexibility + structural complexity | Parts 68, 87, Breakthrough 14, Part 94 |

## Data Sources

| Source | What it provides | Status |
|---|---|---|
| SG-NEx nanopore (1.1 GB) | Ground-truth full-length isoforms, K562 | COMPLETE |
| molecule_info.h5 (6 × ~20 GB) | Per-molecule reads/UMI, 941M molecules | COMPLETE |
| GENCODE v44 (Ensembl REST) | Annotated isoforms per gene | COMPLETE |
| ViennaRNA 2.7.2 | MFE structure prediction | INSTALLED |
| BAM files (5 × 17 GB) | Coverage shapes per gene | RUNNING |
| weird_transcripts.json | Sequencing-resistant genes | COMPLETE |
| SpliceVarDB | Experimentally validated splice variants | AVAILABLE |
| VastDB | Alternative splicing quantification | AVAILABLE |
| circBase/CIRCpedia | Circular RNA annotations | AVAILABLE |
