---
vault_clearance: KETER
halo:
  classification: INTERNAL
  confidence: HIGH
  front: "Front MarathonLament"
  custodian: "Jixiang Leng"
  created: 2026-03-28
  updated: 2026-03-28
---

# Front MarathonLament

> *The lament of the marathon runner: the body changes form to endure. Transcripts fold, circularize, truncate, restructure — not because they're broken, but because they're adapting. The count matrix sees a number. MarathonLament sees the shape.*

## What This Is

**The solution to the Central Dogma.** The Central Dogma says DNA → RNA → Protein. One direction, one output. MarathonLament shows: DNA → Spliceosome → {mRNA, lncRNA, circRNA, structural RNA, regulatory RNA} → {Protein, Membrane scaffold, miRNA sponge, Chromatin organizer, Translation regulator}. The spliceosome is the ROUTER. It takes one DNA sequence and produces MULTIPLE outputs depending on the operator state. Which output it chooses is DETERMINED by the four-operator coupling tensor.

RNA is not a messenger. It is the MEDIUM. The message is which isoform the spliceosome produces. The operator coupling tensor is the context. The Central Dogma described one path through a many-branched tree and called it the whole tree. MarathonLament maps the full tree.

## The Splice Decision Function

For each gene g in operator O:

```
P(isoform_i | gene_g, operator_O, state_K) =
    splice_site_strength(i)    ← deterministic from DNA sequence (MaxEntScan)
    × operator_policy(O)       ← measured (RIBO=0.167, GOLGI=0.388 entropy)
    × coupling_modifier(K)     ← from the 4-operator tensor (disease shifts this)
```

- **Healthy:** coupling_modifier ≈ 1.0. The spliceosome follows operator policy.
- **Disease:** coupling_modifier shifts. GOLGI becomes more efficient in senescence; RIBO retained introns may fire when the 5S RNP checkpoint fails in cancer.
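
The decision function above can be sketched in a few lines. This is a minimal illustration, not the project's implementation. It assumes each factor is supplied as a non-negative weight per isoform (for example, an exponentiated MaxEntScan score), and it adds a normalization step so the outputs sum to 1; neither assumption is stated in the formula. The function name `isoform_probabilities` and all numbers are hypothetical.

```python
from typing import Sequence


def isoform_probabilities(
    site_strength: Sequence[float],
    operator_policy: Sequence[float],
    coupling_modifier: Sequence[float],
) -> list[float]:
    """Multiply the three factors per isoform, then normalize to sum to 1.

    Assumption: every factor is a non-negative per-isoform weight.
    """
    if not (len(site_strength) == len(operator_policy) == len(coupling_modifier)):
        raise ValueError("each factor needs exactly one entry per isoform")
    raw = [s * p * c for s, p, c in zip(site_strength, operator_policy, coupling_modifier)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one isoform must have a positive weight")
    return [w / total for w in raw]


# Hypothetical two-isoform gene: a strong canonical site and a weak alternative.
strength = [8.0, 2.0]
policy = [1.0, 1.0]

# Healthy: coupling_modifier is 1.0 everywhere, so the policy is followed as-is.
healthy = isoform_probabilities(strength, policy, [1.0, 1.0])

# Disease: the modifier boosts the alternative isoform.
shifted = isoform_probabilities(strength, policy, [1.0, 4.0])
```

With these made-up inputs the healthy case keeps the canonical isoform dominant, while the shifted modifier brings both isoforms to equal probability. Note that if the policy and modifier were single scalars shared by all isoforms, they would cancel under normalization, which is why the sketch takes them per isoform.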

## The Operator-Specific Spliceosome Policy

| Operator | Entropy | Splice sites | circRNA | Retained intron | Policy |
|---|---|---|---|---|---|
| RIBO | 0.167 | 3x stronger (MAG\|R) | 0 produced | 83% annotated, SUPPRESSED | RIGID — one transcript, one protein |
| MITO | 0.240 | N/A (no introns on mtDNA) | 0 possible | N/A | CONSTRAINED — energy is non-negotiable |
| GOLGI | 0.388 | Standard + weak alternatives | 2 known | 61%, actively used | FLEXIBLE — substrate adapts |
| NUCLEAR | 0.430 | Most alternative sites | 4+ known | 79%, regulatory | DIVERSE — control requires options |
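
The table does not say how the entropy column is computed. One common choice, assumed here purely for illustration, is the Shannon entropy of isoform usage, normalized by the log of the number of isoforms so that 0 means one isoform carries all reads and 1 means all isoforms are used equally. The function name `splice_entropy` and the counts are hypothetical.

```python
import math
from typing import Sequence


def splice_entropy(isoform_counts: Sequence[float], normalize: bool = True) -> float:
    """Shannon entropy (bits) of isoform usage, optionally scaled to [0, 1].

    Assumption: this definition may differ from the one behind the table.
    """
    total = sum(isoform_counts)
    if total <= 0:
        return 0.0
    probs = [c / total for c in isoform_counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    if not normalize:
        return entropy
    n_isoforms = len(isoform_counts)
    # A gene with a single annotated isoform has no splicing choice to measure.
    return entropy / math.log2(n_isoforms) if n_isoforms > 1 else 0.0


rigid = splice_entropy([10, 0, 0])   # one transcript dominates completely
flexible = splice_entropy([5, 5])    # two isoforms used equally
```

Under this reading, a RIGID operator would sit near the low end of the scale and a DIVERSE operator nearer the high end.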

## What This Replaces

- **The Central Dogma:** DNA → RNA → Protein (one path)
- **MarathonLament:** DNA → Spliceosome(operator_state) → {multiple outputs} (the full tree)

The hypothesis: when operators desynchronize, the spliceosome's decisions change AND the transcripts themselves physically restructure. The count matrix can't see this. The coupling tensor can't see this. But the BAM file, the molecule_info.h5, and the structural prediction together can.

## The MarathonLament Pipeline

```
For each gene in each operator:

  1. SEQUENCE: Fetch mRNA sequence from GENCODE (deterministic)
  2. FOLD: ViennaRNA RNAfold → minimum free energy structure (deterministic)
  3. PREDICT: Where does the fold create:
     - Stem-loops (polymerase stall points)
     - G-quadruplexes (extreme stalls)
     - Long-range base pairs (circularization candidates)
     - Exposed single-stranded regions (RBP binding sites)
  4. OBSERVE: From BAM/molecule data:
     - Coverage drops WHERE stems form?
     - Reads/UMI high WHERE G4s form?
     - Truncation points WHERE long-range pairs close?
  5. COMPARE: Predicted stall points vs observed coverage holes
     - Match = the transcript IS folded as predicted
     - Mismatch = alternative structure or protein-bound
  6. CONDITION: How does the match change between P and S?
     - Better match in S = transcript MORE folded in senescence
     - Worse match in S = transcript UNFOLDED or refolded differently
```
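
Steps 3 and 5 of the pipeline can be sketched without any external dependency. The sketch below assumes the fold from step 2 is available as a dot-bracket string (the notation RNAfold emits) and that observed coverage holes have already been reduced to transcript positions. The helper names, the `min_span` cutoff, the tolerance, and all example positions are hypothetical.

```python
from typing import Iterable


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return (i, j) base pairs (0-based) from a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return sorted(pairs)


def long_range_pairs(pairs: Iterable[tuple[int, int]], min_span: int) -> list[tuple[int, int]]:
    """Pairs whose partners are at least min_span apart (circularization candidates)."""
    return [(i, j) for i, j in pairs if j - i >= min_span]


def match_fraction(predicted: Iterable[int], observed: Iterable[int], tolerance: int = 0) -> float:
    """Fraction of predicted stall positions with an observed hole within tolerance."""
    predicted = list(predicted)
    observed = list(observed)
    if not predicted:
        return 0.0
    hits = sum(1 for p in predicted if any(abs(p - o) <= tolerance for o in observed))
    return hits / len(predicted)


# Hypothetical fold with two stem-loops.
structure = "((((....))))..((...))"
pairs = parse_dot_bracket(structure)
candidates = long_range_pairs(pairs, min_span=10)

# Treat the 5' partner of every pair as a predicted stall position.
predicted_stalls = [i for i, _ in pairs]
observed_holes = [1, 2, 15, 40]  # made-up coverage-drop positions

exact = match_fraction(predicted_stalls, observed_holes)
loose = match_fraction(predicted_stalls, observed_holes, tolerance=1)
```

For step 6, the same `match_fraction` would be computed separately for the P and S samples and the two values compared; how large a difference counts as meaningful is left open here.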

## BitMath: RNA as Binary Computation

MarathonLament's deeper layer: RNA is a binary computational substrate. See [RNA_BITMATH.md](RNA_BITMATH.md).

- **Encoding:** A=00, U=11, G=10, C=01 (XOR of complements = 11)
- **Interaction = XOR:** Two strands base-pairing produces an XOR interaction string. All-11 = perfect duplex. Non-11 = loops/bulges/mismatches.
- **Self-fold = self-XOR:** A stem-loop is the RNA XORing two regions of itself. MFE structure ≈ maximal 11-count (an approximation: the true MFE also weights stacking and loop energies, not just the pair count).
- **Spliceosome = CPU:** Splice decision is an XOR mask applied by the operator state to select which exons survive.
- **TRX file = instruction cache:** The `.trx` binary format records the output of spliceosome computation (coverage, splice ratio, entropy per gene per sample). 128 bytes per record, memory-mappable, O(1) access.
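
The encoding and XOR rules above translate directly into code. The following is a minimal sketch, not the contents of `RNA_BITMATH.md`. It assumes both strands are written 5'→3' and reverses the second one so the duplex is read antiparallel; the function names are hypothetical.

```python
# 2-bit encoding from the list above: complements XOR to 0b11.
ENCODING = {"A": 0b00, "C": 0b01, "G": 0b10, "U": 0b11}


def encode(seq: str) -> list[int]:
    """Encode an RNA sequence as 2-bit integers (T is read as U)."""
    return [ENCODING[base] for base in seq.upper().replace("T", "U")]


def xor_interaction(strand_a: str, strand_b: str) -> list[str]:
    """XOR strand_a against strand_b read in reverse (antiparallel pairing).

    Returns one 2-bit string per position; '11' marks a Watson-Crick pair.
    """
    a = encode(strand_a)
    b = encode(strand_b)[::-1]
    if len(a) != len(b):
        raise ValueError("strands must have equal length in this sketch")
    return [format(x ^ y, "02b") for x, y in zip(a, b)]


def paired_fraction(interaction: list[str]) -> float:
    """Share of positions that XOR to '11'."""
    return interaction.count("11") / len(interaction) if interaction else 0.0


# GUCC is the reverse complement of GGAC, so every position pairs.
perfect = xor_interaction("GGAC", "GUCC")

# One substitution (C -> U) introduces a G·U position.
wobble = xor_interaction("GGAC", "GUUC")
```

One consequence of this encoding is that a G·U wobble pair XORs to `01` rather than `11`, so it is counted as a mismatch here even though wobble pairs do form in real RNA. Whether to treat `01` specially is a design choice this sketch leaves open.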

**The deliverable:** A tool that takes any BAM, encodes all transcripts as binary, computes the XOR interaction catalogue, identifies structural (invariant) vs programmatic (condition-dependent) interactions, and maps them to operator states. The spliceosome solved as a programmable binary processor.

### TRX Binary Format

Built: `methods/trx_format.py` (writer/reader/CLI) + `methods/bam_to_trx.py` (BAM extraction).

```python
from methods.trx_format import TrxReader
r = TrxReader('data/coculture.trx')       # memmap, instant
bins = r.coverage_matrix('FTH1')           # (5, 20) float32, O(1)
df = r.to_dataframe()                      # pandas if needed
```

Current: 27 genes x 5 samples = 18 KB. Genome-wide (38K genes): ~30 MB. Replaces all JSON outputs.
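
To make the fixed-size record idea concrete, here is a sketch of how a 128-byte record gives O(1) lookup by offset. The field layout is an assumption invented for illustration (20 float32 coverage bins, a splice ratio, an entropy value, and padding); the real layout lives in `methods/trx_format.py` and may differ. Only the standard library `struct` module is used.

```python
import struct

# Hypothetical layout: 20 float32 bins + splice_ratio + entropy + 40 pad bytes.
RECORD = struct.Struct("<20f f f 40x")
N_BINS = 20


def pack_record(coverage_bins, splice_ratio, entropy):
    """Serialize one (gene, sample) record into exactly RECORD.size bytes."""
    if len(coverage_bins) != N_BINS:
        raise ValueError(f"expected {N_BINS} coverage bins")
    return RECORD.pack(*coverage_bins, splice_ratio, entropy)


def read_record(buffer, index):
    """Read record number `index` directly by byte offset (no scanning)."""
    values = RECORD.unpack_from(buffer, index * RECORD.size)
    return list(values[:N_BINS]), values[N_BINS], values[N_BINS + 1]


# Two made-up records stored back to back.
first = pack_record([float(i) for i in range(N_BINS)], 0.5, 0.25)
second = pack_record([0.5] * N_BINS, 0.25, 0.5)
blob = first + second

bins, splice_ratio, entropy = read_record(blob, 1)

# Record payload for the current dataset: 27 genes x 5 samples.
current_bytes = 27 * 5 * RECORD.size
```

With this layout the 27 × 5 records occupy 17,280 bytes, close to the 18 KB quoted above; any file header or gene index is not modeled in the sketch. Because `unpack_from` accepts any bytes-like buffer, the same `read_record` would work on an `mmap` object opened over a file.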

## Connection to DiscordIntoSymphony

MarathonLament extends the Genesis Eigenspectrum from 14 breakthroughs into the structural domain:

- **Breakthrough 14** showed the spliceosome treats operators differently (entropy)
- **Part 94** showed all transcripts get more structured in senescence (reads/UMI)
- **MarathonLament** predicts the SPECIFIC structures from sequence and validates against molecular data

The pipeline produces a per-gene STRUCTURAL SCORE that feeds back into the coupling tensor: genes whose structure changes during disease alter their co-occurrence (because folded RNA interacts with different partners than unfolded RNA), which changes the Jaccard matrix, which changes K[4,4].
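
The co-occurrence step in that chain can be illustrated with a plain Jaccard index. This sketch assumes co-occurrence is measured over the sets of cells in which each gene is detected, which the text does not specify, and it stops at the Jaccard matrix: how that matrix is reduced to K[4,4] is not covered. The gene-to-cell sets below are invented.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index: shared members divided by all members."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_matrix(detected: dict) -> dict:
    """Pairwise Jaccard index for every gene pair, keyed by sorted gene names."""
    genes = sorted(detected)
    return {
        (g, h): jaccard(detected[g], detected[h])
        for g, h in combinations(genes, 2)
    }


# Hypothetical detection sets: gene -> cells where it was observed.
detected = {
    "FTH1": {"cell1", "cell2", "cell3"},
    "RPL5": {"cell2", "cell3", "cell4"},
    "GOLGA2": {"cell5"},
}
matrix = jaccard_matrix(detected)
```

If a structural change alters which cells a transcript is detected in, the affected entries of this matrix move, which is the feedback path described above.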

## Key Targets

| Gene | Operator | Why it matters |
|---|---|---|
| FTH1 | BRIDGE | Highest reads/UMI shift (+0.140). IRE stem-loop in 5'UTR. |
| RPL5 | RIBO (sensor) | 5% more structured than other RPLs. 5S rRNA binding domain. |
| RPL11 | RIBO (sensor) | Tracks RPL5 structurally. MDM2-binding element. |
| GOLGA2 | GOLGI | GM130. Self-scaffolding? Transcript becoming its own structure. |
| MALAT1 | NUCLEAR (lncRNA) | Known structured lncRNA. Triple helix at 3' end. |
| ENSG00000255029 | CHR11 CORE | Hub #1. Structured lncRNA behavior in molecule data. |
| ENSG00000254526 | CHR11 CORE | Hub #2. Same shift as hub #1. |
| GBF1 | GOLGI | 126 isoforms (most of any Golgi gene). 78 non-coding. |
| All RPL/RPS | RIBO | Do structural predictions match the ultra-rigid splice entropy? |
| All GOLGI genes | GOLGI | Does structural flexibility match splice flexibility? |

## Dependencies

- ViennaRNA 2.7.2 (installed)
- GENCODE v44 transcript sequences (fetch from Ensembl REST API)
- molecule_info.h5 data (in vault)
- BAM coverage shapes (running)
- SG-NEx isoform data (complete)
