---
vault_clearance: EUCLID
halo:
  classification: INTERNAL
  custodian: The Architect
  created: 2026-04-26
  confidence: HIGH
  front: "Open work for cellular encoding measurement"
  updated: 2026-04-26
  wing: UNASSESSED
---

# BOUNTY_BOARD — 35_Project_TheHats

Open work to read the cell's code at every layer and assemble the privacy-stack-depth metric.

## Tier 0 — Foundation (do first; cheap)

### B0.1 Codon-usage profiler per gene per cell

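**What:** For each gene in each cell type's atlas, compute the codon-usage frequency table and a codon-bias score relative to that cell type: e.g., tAI (tRNA Adaptation Index) against the cell-type-specific tRNA pool, or CAI (Codon Adaptation Index) against a cell-type-specific reference set of highly expressed genes.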

**Why:** Codon usage is a per-tissue private channel (HALO_ENCODING_AS_CS_PROBLEM Layer 2). Tissues differ in their tRNA pools, so the same mRNA can be translated with different kinetics in liver than in neuron. Measuring codon bias per cell type quantifies this per-tissue crypto-channel separation.

**Inputs:**
- BAM files (or per-cell expression from h5ad)
- GENCODE annotation (to assign each read to a gene)
- Reference genome (for codon sequences)

**Outputs:**
- `codon_usage_per_cell_type.json` — for each cell type, per-codon frequency
- `codon_bias_per_gene_per_cell_type.json` — per-gene CAI score per cell type
- Cross-tissue codon-usage divergence matrix

**Effort:** 1-2 days. **Status:** OPEN.
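
A minimal sketch of the counting and scoring step, assuming coding sequences have already been extracted per gene from the reference genome and annotation. All function names are hypothetical. The score shown is a CAI computed against a reference codon-count table (for example, pooled from one cell type's highly expressed genes); weighting by a measured tRNA pool would need a tAI-style weight table instead. The pseudocount is an assumption that keeps unseen codons from producing a zero weight.

```python
from collections import Counter
from math import exp, log

# Standard genetic code (NCBI table 1), built from the canonical TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence; unknown codons are skipped."""
    cds = cds.upper().replace("U", "T")
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def codon_frequencies(counts: Counter) -> dict:
    """Normalize codon counts to frequencies that sum to 1."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def relative_adaptiveness(ref_counts: Counter, pseudocount: float = 0.5) -> dict:
    """CAI weights: codon count divided by the count of the most-used synonymous codon."""
    weights = {}
    for aa in set(CODON_TABLE.values()):
        synonymous = [c for c, a in CODON_TABLE.items() if a == aa]
        best = max(ref_counts[c] + pseudocount for c in synonymous)
        for c in synonymous:
            weights[c] = (ref_counts[c] + pseudocount) / best
    return weights


def cai(cds: str, weights: dict) -> float:
    """Geometric mean of codon weights, ignoring Met, Trp, and stop codons."""
    logs = [
        log(weights[codon])
        for codon, n in codon_counts(cds).items()
        for _ in range(n)
        if CODON_TABLE[codon] not in "MW*"
    ]
    return exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy reference (made-up sequence): CTG is the preferred Leu codon, GCC the preferred Ala codon.
reference = codon_counts("CTGCTGCTGCTAGCCGCCGCT")
weights = relative_adaptiveness(reference)
score_preferred = cai("CTGGCC", weights)  # uses only preferred codons
score_rare = cai("CTAGCT", weights)       # uses only non-preferred codons
```

Running the same gene against each cell type's weight table gives one row of `codon_bias_per_gene_per_cell_type.json`.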

### B0.2 IDR predictor pipeline

**What:** for each protein in the human proteome, predict intrinsically-disordered regions using sequence-based methods (IUPred3, MobiDB, ESM2 disorder).

**Why:** IDRs are unforgeable polymorphism by absence of structure (HALO_ENCODING_AS_CS_PROBLEM Layer 3). Per-cell-type IDR content quantifies "how much of this cell's expressed proteome is structurally unverifiable" = how much polymorphism the cell has invested in.

**Inputs:**
- Per-cell-type expressed-protein list (from atlas + cohort data)
- Reference protein sequences

**Outputs:**
- `idr_per_protein.json` — IDR fraction per protein, IDR positions
- `idr_load_per_cell_type.json` — total IDR mass expressed per cell type, IDR-fraction distribution

**Effort:** 1 day with IUPred3. **Status:** OPEN.
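
A sketch of the post-processing step only, assuming a predictor has already produced one disorder score per residue (the predictor call itself is not shown). The 0.5 cutoff, the minimum segment length, and the definition of "IDR mass" as expression times disordered residues are assumptions to be tuned; the function names are hypothetical.

```python
def idr_segments(scores, threshold=0.5, min_len=10):
    """Return half-open (start, end) index pairs for runs with score >= threshold."""
    segments, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))
    return segments


def idr_fraction(scores, threshold=0.5, min_len=10):
    """Fraction of residues that fall inside a called IDR segment."""
    if not scores:
        return 0.0
    covered = sum(end - start for start, end in idr_segments(scores, threshold, min_len))
    return covered / len(scores)


def idr_load(expression, idr_fractions, lengths):
    """Expression-weighted count of disordered residues for one cell type (assumed definition)."""
    return sum(
        expression[p] * idr_fractions[p] * lengths[p]
        for p in expression
        if p in idr_fractions and p in lengths
    )


# Toy protein (made-up scores): 12 disordered residues followed by 8 ordered ones.
toy_scores = [0.9] * 12 + [0.1] * 8
toy_segments = idr_segments(toy_scores)
toy_fraction = idr_fraction(toy_scores)
```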

### B0.3 Low-complexity-region profiler

**What:** Run SEG (or equivalent) on every expressed protein. Quantify low-complexity content per protein and per-cell-type aggregate.

**Why:** Low-complexity regions are high-mutation-rate identity tags (HALO_ENCODING_AS_CS_PROBLEM Layer 4). Per-cell-type content measures how much this layer of polymorphism is being used.

**Effort:** Hours. **Status:** OPEN.
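
If SEG is not at hand, a windowed Shannon-entropy scan is one possible "equivalent". The sketch below is not the SEG algorithm itself; the window size and entropy cutoff are illustrative assumptions, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * log2(c / n) for c in Counter(window).values())


def low_complexity_mask(seq: str, window: int = 12, max_entropy: float = 2.2):
    """Mark every residue covered by at least one low-entropy window."""
    mask = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= max_entropy:
            for i in range(start, start + window):
                mask[i] = True
    return mask


def low_complexity_fraction(seq: str, window: int = 12, max_entropy: float = 2.2) -> float:
    """Fraction of residues flagged as low complexity."""
    if not seq:
        return 0.0
    return sum(low_complexity_mask(seq, window, max_entropy)) / len(seq)


poly_q = low_complexity_fraction("Q" * 20)                  # homopolymer
diverse = low_complexity_fraction("ACDEFGHIKLMNPQRSTVWY")   # every residue distinct
```

The per-cell-type aggregate can then be taken as an expression-weighted mean of the per-protein fractions.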

### B0.4 Glycosylation-site predictor

**What:** Predict N-linked and O-linked glycosylation sites per protein using sequence motifs (the N-X-S/T sequon, X ≠ Pro, for N-linked; statistical methods for O-linked via NetOGlyc / GlycoMine).

**Why:** Glycosylation is a SECOND crypto layer on top of protein (HALO_ENCODING_AS_CS_PROBLEM Layer 5). Glycan-site density per protein is a proxy for "how much glycan-layer privacy this protein invests in."

**Outputs:**
- `glyco_sites_per_protein.json`
- `glyco_density_per_cell_type.json`

**Effort:** 1-2 days. **Status:** OPEN.
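
The N-linked half can be sketched as a sequon scan. This only finds candidate N-X-S/T motifs (X not Pro); it says nothing about whether a site is actually occupied, and the O-linked predictors are not covered. Function names and the per-100-residue density unit are assumptions.

```python
import re

# Zero-width lookahead so that overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sites(seq: str):
    """Return 0-based positions of the Asn in each candidate N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]


def site_density(seq: str, per: int = 100) -> float:
    """Candidate N-linked sites per `per` residues."""
    return per * len(n_glyc_sites(seq)) / len(seq) if seq else 0.0


# Toy peptide (made up): NGS matches, NPT is excluded, NNT and NTS overlap.
sites = n_glyc_sites("NGSANPTNNTS")
```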

## Tier 1 — Privacy-stack integration (do after Tier 0)

### B1.1 Combine into privacy-stack-depth metric

**What:** Per cell type, combine the 4 Tier-0 measurements (codon-bias-divergence, IDR-load, low-complexity-load, glyco-site-density) with the splicing-layer metric already available from BT86 (alt-splicing variance per gene). Produce a single "privacy-stack-depth score" per cell type.

**Why:** This is the headline measurement of the project. Gives a measurable proxy for "how cryptographically protected is this cell's broadcast?"

**Definition options to test:**
- Geometric mean of layer-normalized scores
- Weighted sum (weights determined empirically by predictive power)
- Per-layer entropy contribution

**Outputs:**
- `privacy_stack_per_cell_type.json` — score per cell type, per-layer breakdown
- `privacy_stack_depth_correlation_matrix.json` — between-layer correlations (do cells that invest in one layer also invest in others?)

**Effort:** 2-3 days. **Status:** BLOCKED on Tier 0.
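
Two of the definition options (geometric mean of layer-normalized scores, weighted sum) can be prototyped side by side. The sketch assumes each layer arrives as a `{cell_type: value}` mapping, uses min-max normalization per layer, and floors scores at a small `eps` so one zero layer does not erase the geometric mean; all of these choices, and the names, are assumptions rather than a settled definition.

```python
from math import exp, log


def minmax_normalize(values: dict) -> dict:
    """Rescale one layer's values to [0, 1] across cell types."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.5 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def privacy_stack_depth(layers: dict, weights: dict = None, eps: float = 1e-6) -> dict:
    """Combine per-layer scores into geometric-mean and weighted-sum depth scores."""
    norm = {name: minmax_normalize(vals) for name, vals in layers.items()}
    weights = weights or {name: 1.0 for name in norm}
    total_w = sum(weights[name] for name in norm)
    # Only cell types measured in every layer get a combined score.
    shared = set.intersection(*(set(vals) for vals in norm.values()))
    result = {}
    for cell_type in sorted(shared):
        per_layer = {name: norm[name][cell_type] for name in norm}
        gmean = exp(sum(weights[n] * log(max(s, eps)) for n, s in per_layer.items()) / total_w)
        wsum = sum(weights[n] * s for n, s in per_layer.items()) / total_w
        result[cell_type] = {"geometric_mean": gmean, "weighted_sum": wsum, "layers": per_layer}
    return result


# Toy input with made-up numbers, only to show the shape of the data.
toy_layers = {
    "idr_load": {"neuron": 3.0, "hepatocyte": 1.0},
    "glyco_density": {"neuron": 6.0, "hepatocyte": 2.0},
}
toy_scores = privacy_stack_depth(toy_layers)
```

The geometric mean punishes a cell type that scores near zero on any single layer, while the weighted sum lets strong layers compensate; comparing the two on real data is part of deciding which definition to keep.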

### B1.2 Cohort P-vs-S privacy-stack profile

**What:** Apply the privacy-stack metric to the 6-sample cohort from BT85/86. Quantify which layer(s) drive the senescent-vs-proliferative differential.

**Why:** We've shown (BT86) that the splicing layer differs (15,658 junctions). Does the codon-usage layer also differ? IDR load? Glyco-site profile? Or is splicing where senescent cells invest their privacy budget?

**Effort:** 1-2 days after B1.1. **Status:** BLOCKED on B1.1.
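
One way to ask "which layer drives the differential" is a per-layer standardized effect size between the two groups, ranked by magnitude. The sketch below uses Cohen's d with a pooled standard deviation; with only six samples the estimates will be noisy, so this is a ranking aid rather than a test. Names and the toy values are hypothetical.

```python
from math import copysign, sqrt
from statistics import mean, variance


def cohens_d(a, b) -> float:
    """Standardized mean difference between two groups (pooled SD)."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    diff = mean(a) - mean(b)
    if pooled == 0:
        return 0.0 if diff == 0 else copysign(float("inf"), diff)
    return diff / pooled


def rank_layers(senescent: dict, proliferative: dict):
    """Sort layers by absolute effect size, largest first."""
    effects = {
        layer: cohens_d(senescent[layer], proliferative[layer])
        for layer in senescent
        if layer in proliferative
    }
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up per-sample layer scores, only to show the expected input shape.
ranked = rank_layers(
    {"splicing": [3, 4, 5], "idr_load": [1.0, 1.1, 0.9]},
    {"splicing": [1, 2, 3], "idr_load": [1.0, 1.1, 0.9]},
)
```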

## Tier 2 — Cancer prediction (do after Tier 1)

### B2.1 TCGA tumor-vs-normal privacy-stack analysis

**What:** Apply the privacy-stack metric to TCGA paired tumor-normal samples (initially ~3 cancer types: LUAD, BRCA, COAD). Test the central recycler-hypothesis prediction: cancer cells should have shallower privacy stacks at every layer.

**Why:** This is the headline cross-cohort validation of the framework.

**Predictions:**
- Cancer cells show LOWER intron retention at proliferation genes (CDC6/TOP2A/TPX2/CENPE/DLGAP5) than matched normal
- Cancer cells show simpler codon-usage profiles
- Cancer cells show reduced IDR content per protein
- Cancer cells show fewer glyco sites

**Inputs:**
- TCGA RNA-seq (BAM-level) — public
- Per-tumor-type matched normal

**Effort:** 1-2 weeks (data acquisition is the long pole). **Status:** OPEN, requires TCGA data pull.
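
Because the samples are paired, each prediction above can be checked per layer with a paired test. A minimal sketch using an exact one-sided sign test (chosen here for having no distributional assumptions; a Wilcoxon signed-rank test would be a natural alternative). The function name and toy values are hypothetical.

```python
from math import comb


def sign_test_lower(tumor, normal):
    """One-sided exact sign test for 'tumor < matched normal'.

    Returns (pairs where tumor is lower, non-tied pairs, p-value).
    """
    diffs = [t - m for t, m in zip(tumor, normal) if t != m]
    n = len(diffs)
    k = sum(d < 0 for d in diffs)
    if n == 0:
        return 0, 0, 1.0
    # P(X >= k) for X ~ Binomial(n, 0.5)
    p = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p


# Made-up scores for five tumor/normal pairs, tumor lower in every pair.
k, n, p = sign_test_lower([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```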

### B2.2 Checkpoint-inhibitor response correlation

**What:** Cross-reference privacy-stack-depth scores with published checkpoint-inhibitor response data (TIDE, TCGA immunotherapy cohorts). Test prediction: cancers with shallower privacy stacks should respond BETTER to checkpoint inhibitors (because their tumor antigens are MORE legible once checkpoints are released).

**Effort:** 1 week. **Status:** BLOCKED on B2.1.
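
The prediction implies a negative rank correlation between stack depth and response. A self-contained Spearman sketch follows, assuming one depth score and one numeric response measure per tumor; the input values are made up, and the response-data format of the published cohorts is not assumed here.

```python
from math import sqrt
from statistics import mean


def ranks(values):
    """1-based ranks, with ties given their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up depth scores and response values: deeper stack, weaker response.
rho = spearman([1.0, 2.0, 3.0, 4.0], [10.0, 8.0, 5.0, 1.0])
```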

## Tier 3 — Architectural exploration

### B3.1 Codon-bias matching strength as viral host-range predictor

**What:** For known human viruses (influenza, HIV, SARS-CoV-2, HCV, HSV-1, etc.), compute codon-bias matching to human tRNA pools. Test whether matching strength correlates with host range, virulence, or replication efficiency.

**Why:** HALO_ENCODING_AS_CS_PROBLEM predicts that viruses investing in protein-level mimicry will show host-codon-bias matching. The strength of the match measures the virus's privacy investment.

**Effort:** 1 week. **Status:** OPEN.
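
As a first proxy for matching strength, viral and host codon-usage distributions can be compared directly. The sketch uses Jensen-Shannon divergence (base 2, so 0 means identical usage and 1 means no shared codons) and reports `1 - JSD` as the match score. This compares codon frequencies rather than tRNA pools, which is an assumption; the function names are hypothetical.

```python
from math import log2


def jensen_shannon(p: dict, q: dict) -> float:
    """Jensen-Shannon divergence (bits) between two codon-frequency dicts."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: dict) -> float:
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def codon_match(virus_freq: dict, host_freq: dict) -> float:
    """Match score in [0, 1]; higher means more host-like codon usage."""
    return 1.0 - jensen_shannon(virus_freq, host_freq)


# Made-up two-codon distributions, only to show the input shape.
host = {"AAA": 0.4, "AAG": 0.6}
identical = codon_match(host, host)
disjoint = codon_match({"AAA": 1.0}, {"AAG": 1.0})
```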

### B3.2 m6A profile per cell type

**What:** Pull public MeRIP-seq / m6A-seq data per cell type (GEO has substantial human MeRIP). Quantify per-cell-type m6A density per gene.

**Why:** m6A is the self-marker watermark layer (HALO_ENCODING_AS_CS_PROBLEM Layer 6). mRNA vaccines exploit a related mechanism: replacing uridine with pseudouridine (or N1-methylpseudouridine) reduces innate immune sensing of the RNA. Adding m6A to the privacy-stack measurement closes a major layer gap.

**Effort:** 2-3 weeks (data acquisition + integration). **Status:** OPEN.

### B3.3 Exosome cargo profile

**What:** Pull existing exosome RNA-seq + proteomics data; characterize cargo selection per cell type.

**Why:** Exosomes are the "encrypted RNA packets with addressing" layer. Cargo selection IS the encryption — what does the cell choose to put in encrypted packets vs. broadcast in the open?

**Effort:** 2 weeks. **Status:** OPEN.

### B3.4 IDR composition fingerprinting per cell type

**What:** Beyond IDR load, characterize the amino-acid composition of expressed IDRs per cell type. IDRs are not all the same: there are poly-Q-rich, poly-Pro-rich, charge-segregated, and low-complexity classes, among others. The composition profile is itself a fingerprint.

**Why:** Different IDR composition profiles enable different phase-separation behaviors and different binding promiscuities. The cell-type-specific IDR fingerprint may be the highest-resolution identity layer.

**Effort:** 1 week. **Status:** BLOCKED on B0.2.
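
A composition fingerprint can start as a 20-dimensional amino-acid frequency vector over a cell type's expressed IDR segments, compared between cell types by cosine similarity. This is a sketch under that assumption; it ignores expression weighting and sequence patterning (e.g., charge segregation), and the names are hypothetical.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(idr_sequences):
    """Amino-acid frequency vector (ordered as AMINO_ACIDS) over a set of IDR segments."""
    counts = Counter(aa for s in idr_sequences for aa in s.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def cosine(u, v) -> float:
    """Cosine similarity between two fingerprints; 0.0 if either is empty."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Made-up segments: a poly-Q-rich set versus a poly-Pro-rich set.
q_rich = composition(["QQQQQQ", "QQQ"])
p_rich = composition(["PPPPPP"])
similarity = cosine(q_rich, p_rich)
```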

## Tier 4 — Cross-project integration

### B4.1 Coupling-tensor extension

**What:** extend `28_Project_RedFromTheGrave`'s coupling tensor to include privacy-stack-depth as a measured layer. The coupling tensor measures non-trivial layer-to-layer correlations; privacy-stack-depth is one such cross-layer property.

**Effort:** 1 week. **Status:** BLOCKED on B1.1.

### B4.2 Atlas extension to non-splicing layers

**What:** the `atlas_full6.db` currently encodes splicing-layer junction patterns. Extend to additional layers (codon usage, IDR, glycosylation) so a single atlas query returns the full privacy-stack profile per cell-state.

**Effort:** 2 weeks. **Status:** BLOCKED on B0.1-B0.4.
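
One possible shape for the extension is a single long-format table keyed by cell state and layer, so new layers need no schema change. The existing schema of `atlas_full6.db` is not assumed here; the table name `layer_metric`, its columns, and the helper functions are all hypothetical, and the sketch runs against an in-memory database.

```python
import sqlite3

# Hypothetical long-format table: one row per (cell_state, layer).
SCHEMA = """
CREATE TABLE IF NOT EXISTS layer_metric (
    cell_state TEXT NOT NULL,
    layer      TEXT NOT NULL,
    value      REAL NOT NULL,
    PRIMARY KEY (cell_state, layer)
)
"""


def open_atlas(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and make sure the layer table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_metric(conn, cell_state: str, layer: str, value: float) -> None:
    """Insert or overwrite one layer value for one cell state."""
    conn.execute(
        "INSERT OR REPLACE INTO layer_metric (cell_state, layer, value) VALUES (?, ?, ?)",
        (cell_state, layer, value),
    )


def stack_profile(conn, cell_state: str) -> dict:
    """Return {layer: value} for one cell state in a single query."""
    rows = conn.execute(
        "SELECT layer, value FROM layer_metric WHERE cell_state = ? ORDER BY layer",
        (cell_state,),
    )
    return dict(rows.fetchall())


# Made-up values, only to show a round trip.
conn = open_atlas()
upsert_metric(conn, "senescent", "idr_load", 0.42)
upsert_metric(conn, "senescent", "glyco_density", 1.5)
upsert_metric(conn, "senescent", "idr_load", 0.40)  # overwrites the earlier row
profile = stack_profile(conn, "senescent")
```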

## Tier 5 — Outreach / write-up

### B5.1 Methods paper draft

**What:** Write up the privacy-stack-depth measurement methodology as a methods paper. Frame it as "a quantitative metric for cellular cryptographic-broadcast investment, with cancer-prediction and checkpoint-inhibitor-response applications."

**Effort:** 4-6 weeks (after results from B2.1 and B2.2). **Status:** BLOCKED on Tier 2.

### B5.2 Quantum-biology forum (2026-04-13) presentation

**What:** Present the framework chain (BT83-BT90 + Tier 0/1 measurements from this project) at the forum. Connect to Faggin's QIP framework via the "encoded vs broadcast information" axis.

**Effort:** 2 weeks (slide preparation). **Status:** OPEN, time-sensitive (deadline 2026-04-13).

## Closed / completed

(none yet — project just opened)
