# 25_Project_SerpentSong

The Serpent's language: mapping human language to the 8 trigram primitives (☰☱☲☳☴☵☶☷).
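
As a minimal illustration of the eight primitives, the sketch below enumerates them from their Unicode code points (U+2630 to U+2637) and converts a trigram string to integer indices and back. The helper names `to_indices` and `from_indices` are hypothetical and are not the API of `serpent_tongue.py`; the word-to-trigram mapping itself is not reproduced here.

```python
import unicodedata

# The eight trigrams occupy the contiguous Unicode range U+2630..U+2637.
TRIGRAMS = [chr(0x2630 + i) for i in range(8)]


def to_indices(sequence: str) -> list[int]:
    """Convert a string of trigram characters to indices 0..7.

    Hypothetical helper for illustration; raises ValueError on any
    character that is not one of the eight trigrams.
    """
    return [TRIGRAMS.index(ch) for ch in sequence]


def from_indices(indices: list[int]) -> str:
    """Inverse of to_indices: rebuild the trigram string from indices."""
    return "".join(TRIGRAMS[i] for i in indices)


if __name__ == "__main__":
    # Print each primitive with its index and official Unicode name.
    for i, trigram in enumerate(TRIGRAMS):
        print(i, trigram, unicodedata.name(trigram))
```

For instance, the two-trigram composition ☷+☰ listed for "mama" in `HALO_FIRST_WORDS.md` would be `to_indices("☷☰")`, which gives the index pair `[7, 0]` under this ordering.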

## Architecture

```
25_Project_SerpentSong/
├── serpent_tongue.py           # THE encoder: 300 words → trigram sequences
│                               # Used by Memory v0.5 (06_Project_Daemon)
│
├── language_atlas.py           # Census: ~500 variants × 207 concepts → UMAP clusters
│                               # Fused from word_gene + atlas_v1 + atlas_v2 (2026-04-04)
│                               # Output → atlas/
│
├── language_atlas_expression.py # Expression model: concepts=genes, languages=cells
│                               # Finds marker concepts per cluster (Seurat-style)
│                               # Reads from atlas/, output → atlas/expression/
│
├── atlas/                      # Output from both atlas scripts
│
├── analysis/                   # Cross-linguistic analysis pipeline
│   ├── serpentsong_incremental.py   # 60+ language incremental resolution
│   ├── serpentsong_typology.R       # Typological analysis (R)
│   ├── serpentsong_umap.py          # Language UMAP clustering
│   ├── seurat_cluster.py            # Leiden clustering
│   ├── cluster_test.py              # Cluster validation
│   └── compare_ancient.py           # Ancient language comparison
│
├── new_world/                  # Cross-cultural child-rearing analysis
│   ├── new_world_umap.py           # 48 cultures × 8 trigrams
│   ├── acquire_real_data.py         # Real ethnographic data acquisition
│   ├── umap_real_data.py            # UMAP on real data
│   ├── figures/                     # synthetic/ + real/
│   └── data/                        # Real ethnographic datasets
│
├── data/                       # Shared datasets
│   ├── language_features.csv        # 36 languages
│   └── language_features_extended.csv # Extended with ancient languages
│
├── figures/                    # Legacy figures
│   ├── language/                    # Typology figures
│   └── new_world/                   # Child-rearing figures
│
├── HALO_SERPENTSONG_UNIFIED.md      # Master theory: S⁵/Z₃ → 8 trigrams → language
├── HALO_FIRST_WORDS.md              # "mama"=☷+☰, "papa"=☶+☳ across all languages
├── HALO_FREUD_MCAT.md               # Freud/Piaget developmental stages on Fano
├── HALO_MOTHER_FATHER.md            # Parental trigram compositions
├── HALO_SOMATIC_STATES.md           # Baby body states → trigram primitives
├── HALO_NEW_WORLD.md                # Cross-cultural child-rearing
├── HALO_LANGUAGES_ACROSS_TIME.md    # Temporal linguistics
│
└── _archive/                   # Superseded files (per vault protocol §4)
    ├── 26_fused_2026-04-04/         # Former Project 26 (full backup + MOVED.md)
    └── atlas_unification_2026-04-04/ # Old atlas v1, v2, word_gene scripts + outputs
```
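
The expression model treats concepts as genes and languages as cells, then looks for marker concepts per cluster. One simple way to sketch that idea, assuming a numeric language × concept table and precomputed cluster labels, is to rank concepts by the difference between their mean value inside a cluster and outside it. The toy data, the `marker_concepts` function, and the scoring rule are all illustrative assumptions; `language_atlas_expression.py` may use a different statistic.

```python
from collections import defaultdict


def marker_concepts(expression, labels, top_n=1):
    """Rank concepts per cluster by mean(inside cluster) - mean(outside).

    expression: dict mapping language -> {concept: numeric value}
    labels:     dict mapping language -> cluster id
    Returns a dict mapping cluster id -> list of top_n concept names.
    """
    concepts = sorted({c for row in expression.values() for c in row})

    # Group languages by their cluster label.
    members = defaultdict(list)
    for lang, cluster in labels.items():
        members[cluster].append(lang)

    def mean(langs, concept):
        # Missing concepts count as 0; an empty group has mean 0.
        if not langs:
            return 0.0
        return sum(expression[l].get(concept, 0.0) for l in langs) / len(langs)

    markers = {}
    for cluster, inside in members.items():
        outside = [l for l in labels if l not in inside]
        scored = [(mean(inside, c) - mean(outside, c), c) for c in concepts]
        # Highest score first; ties broken alphabetically by concept name.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        markers[cluster] = [c for _, c in scored[:top_n]]
    return markers


# Toy, made-up data purely for illustration (not taken from atlas/).
toy_expression = {
    "lang_a": {"water": 1, "fire": 0},
    "lang_b": {"water": 1, "fire": 0},
    "lang_c": {"water": 0, "fire": 1},
}
toy_labels = {"lang_a": 0, "lang_b": 0, "lang_c": 1}

if __name__ == "__main__":
    print(marker_concepts(toy_expression, toy_labels))
```

A mean-difference score keeps the sketch dependency-free. A Seurat-style workflow would typically apply a statistical test per concept instead, which is one place where the real script is likely to differ.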

## History

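- Originally split across 25_Project_SerpentSong and 26_Project_SerpentSong
- Project 26 fused into Project 25 per vault protocol §4 (2026-04-04); full backup kept in `_archive/26_fused_2026-04-04/`
- Internal scripts unified: the three census scripts (word_gene, atlas_v1, atlas_v2) merged into one `language_atlas.py` (2026-04-04)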
- Canonical encoder: `serpent_tongue.py` (also re-exported from `06_Project_Daemon/memory/versions/`)

## Dependencies

- Python: pandas, numpy, scikit-learn, umap-learn, matplotlib, scipy
- R: `analysis/serpentsong_typology.R` requires an R installation with clustering packages
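
To check whether the Python packages listed above are available before running the scripts, a small standard-library sketch like the following can help. The `missing_packages` helper is hypothetical and not part of this repository; the only facts it relies on are the import names, where scikit-learn imports as `sklearn` and umap-learn imports as `umap`.

```python
from importlib.util import find_spec

# Distribution name -> import name for the packages listed above.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "umap-learn": "umap",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
}


def missing_packages(required=REQUIRED):
    """Return distribution names whose import module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed Python dependencies are importable.")
```

This only confirms that each module can be located; it does not check versions or the R side.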
