import pickle, gzip, pathlib
import pandas as pd

Run this notebook once after helpers/survey.ipynb (i.e., after you have gtex_data/gene_expr.pkl and gtex_data/sample_meta.csv).
It slices the full gene_expr.pkl (all GTEx genes) down to the 20 display genes and saves the result as gtex_data/figure1_expr_cache.pkl.gz (~500 KB).
That small file is committed to the repo. figure1.ipynb loads it and then runs dabest.combine() live, so the bootstraps, whorlmap, and all downstream statistics are computed fresh rather than pre-baked.
Runtime here: < 1 s. The 10–15 min bootstrap is deferred to figure1.ipynb.

RIGHT_GENES = [
    'TPH2', 'CHRNA7', 'ESR1',
    'TH', 'SLC6A3', 'DDC', 'AGRP',
    'SST', 'PENK', 'GAD1', 'GAD2', 'CRH',
    'DRD5', 'CHRM1', 'BDNF', 'CYP19A1',
    'AIF1', 'MAOA', 'FKBP5', 'GFAP',
]

# Short display label -> full GTEx tissue name
BRAIN_REGIONS = {
    'Hypothalamus': 'Brain - Hypothalamus',
    'Amygdala': 'Brain - Amygdala',
    'Hippocampus': 'Brain - Hippocampus',
    'Ant. Cing. Ctx': 'Brain - Anterior cingulate cortex (BA24)',
    'Frontal Cortex': 'Brain - Frontal Cortex (BA9)',
    'Cortex': 'Brain - Cortex',
    'Caudate': 'Brain - Caudate (basal ganglia)',
    'Putamen': 'Brain - Putamen (basal ganglia)',
    'Nucleus Accumbens': 'Brain - Nucleus accumbens (basal ganglia)',
    'Cerebellum': 'Brain - Cerebellum',
    'Cerebellar Hemi.': 'Brain - Cerebellar Hemisphere',
    'Substantia Nigra': 'Brain - Substantia nigra',
    'Spinal Cord': 'Brain - Spinal cord (cervical c-1)',
}
region_order = list(BRAIN_REGIONS.keys())
DATA_DIR = pathlib.Path('gtex_data')

gene_expr contains every GTEx gene (~56,000 entries). We only need the 20 neuroactive / glial genes shown in Figure 1.
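
The slicing step can be sketched as follows. This is a minimal illustration, not the notebook's actual cell: it assumes gene_expr is a dict keyed by gene symbol with the same {gene_name: {sample_id: value}} shape described in the cache comment. The helper name subset_expression and the toy values are hypothetical.

```python
def subset_expression(gene_expr, genes):
    """Keep only the requested genes; also report any that are absent.

    Assumes gene_expr looks like {gene_name: {sample_id: log2(tpm+1)}}.
    """
    missing = [g for g in genes if g not in gene_expr]
    subset = {g: gene_expr[g] for g in genes if g in gene_expr}
    return subset, missing


# Toy stand-in for the unpickled gene_expr (values are made up).
toy_expr = {
    'TH': {'S1': 1.5, 'S2': 0.0},
    'GFAP': {'S1': 9.2, 'S2': 8.7},
    'ACTB': {'S1': 12.1, 'S2': 11.9},
}

toy_subset, toy_missing = subset_expression(toy_expr, ['TH', 'GFAP', 'BDNF'])
print(sorted(toy_subset), toy_missing)
```

In the notebook, the inputs would be the unpickled gene_expr and RIGHT_GENES. A non-empty missing list would indicate a gene symbol that does not appear in the data and should be checked before the cache is written.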

cache = {
    'expr_subset': expr_subset,  # {gene_name: {sample_id: log2(tpm+1)}}
    'gene_names': RIGHT_GENES,
    'region_order': region_order,
}

# Write the cache as a gzip-compressed pickle.
cache_path = DATA_DIR / 'figure1_expr_cache.pkl.gz'
with gzip.open(cache_path, 'wb') as f:
    pickle.dump(cache, f, protocol=5)

size_kb = cache_path.stat().st_size / 1e3
print(f'Saved {cache_path} ({size_kb:.0f} KB); safe to commit directly.')

Cache ready. Open figure1.ipynb; it will load this file and run dabest.combine() live.
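
For reference, reading the cache back mirrors the write step. The following is a hedged sketch of the round trip using a temporary file and toy contents; the helper names save_cache and load_cache are hypothetical, and figure1.ipynb may load the file differently.

```python
import gzip
import pathlib
import pickle
import tempfile


def save_cache(cache, path):
    """Write a dict as a gzip-compressed pickle (protocol 5 needs Python 3.8+)."""
    with gzip.open(path, 'wb') as f:
        pickle.dump(cache, f, protocol=5)


def load_cache(path):
    """Read back a gzip-compressed pickle written by save_cache."""
    with gzip.open(path, 'rb') as f:
        return pickle.load(f)


# Toy cache with the same three keys as the real one (values are made up).
toy_cache = {
    'expr_subset': {'TH': {'S1': 1.5, 'S2': 0.0}},
    'gene_names': ['TH'],
    'region_order': ['Hypothalamus', 'Amygdala'],
}

with tempfile.TemporaryDirectory() as tmp:
    toy_path = pathlib.Path(tmp) / 'toy_cache.pkl.gz'
    save_cache(toy_cache, toy_path)
    restored = load_cache(toy_path)

print(restored == toy_cache)
```

Because pickle can execute arbitrary code on load, a committed cache like this should only be unpickled from a repository you trust.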