Analyzing the Nonlocality of Sparse Autoencoder Features

Xiao-Liang Qi — Stanford Institute for Theoretical Physics, Stanford University

We introduce a gradient-based entropy measure that quantifies how nonlocal each SAE feature is in the input token space of a transformer. Motivated by an analogy with holographic duality (AdS/CFT), we apply this measure to Pythia-70m-deduped with pretrained SAEs across all six layers. We find that (i) features exhibit a broad hierarchy of nonlocality, (ii) deeper layers produce more nonlocal features, and (iii) individual features have lower entropy than the reconstructed token vector.
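As a rough illustration of what an entropy over input tokens looks like, the sketch below computes the Shannon entropy of a normalized per-token influence profile. It is not the paper's implementation: the function name `influence_entropy` is hypothetical, and the sketch assumes each token's influence is a non-negative score (for example, the norm of the gradient of a feature activation with respect to that token's embedding) that is normalized to sum to one. The exact definition is in the LaTeX source under paper/.

```python
import math


def influence_entropy(influences):
    """Shannon entropy (in nats) of a normalized token-influence profile.

    `influences` holds one score per input token. ASSUMPTION: the scores are
    gradient magnitudes of a single feature with respect to each token; the
    paper's exact normalization may differ.
    """
    weights = [abs(x) for x in influences]
    total = sum(weights)
    if total == 0:
        raise ValueError("influence profile is all zeros")
    probs = [w / total for w in weights]
    # Zero-probability tokens contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)


# A feature driven by a single token (fully local).
local = influence_entropy([0.0, 0.0, 0.0, 1.0])
# A feature driven equally by four tokens (maximally spread over this context).
spread = influence_entropy([0.25, 0.25, 0.25, 0.25])
```

Under these assumptions, a feature that depends on one token has zero entropy, while a feature that depends equally on n tokens has entropy log(n), so larger values indicate a more nonlocal feature.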

Talk to this paper

This paper is published with an AI agent (Agentic Publication Protocol). Clone this repo and open it in an AI coding agent to ask questions, reproduce figures, and explore the work.

Claude Code:

git clone https://github.com/XiaoliangQi/sae-analysis-v1
cd sae-analysis-v1
claude

Or use the load skill from any Claude Code session:

/paper-protocol:load-paper-agent https://github.com/XiaoliangQi/sae-analysis-v1

Other agents (Codex, Cursor, etc.): Clone the repo and open it — any agent that reads AGENTS.md or README.md will pick up the paper context automatically.

Figures

Each figure has a dedicated generation script in scripts/figures/ that saves directly to paper/figures/.

Figure	Description	Script	Required data
Fig. 1	Unique-token distribution per feature (layer 0)	`scripts/figures/fig01_unique_tokens_histogram.py`	`feature_sparsity_data_resid_out_layer0.pt`
Fig. 2	Feature correlation matrix (layer 3)	`scripts/figures/fig02_correlation_heatmap.py`	`correlation_matrix_resid_out_layer3.pt`
Fig. 3	Influence heatmap for Feature 531 (layer 3)	`scripts/figures/fig03_influence_heatmap.py`	`feature_token_influence_resid_out_layer3.pt`
Fig. 4	Per-feature entropy distributions (layer 3)	`scripts/figures/fig04_entropy_distribution_batches.py`	`feature_token_influence_resid_out_layer3.pt`
Fig. 5	Feature entropy vs. layer depth	`scripts/figures/fig05_entropy_vs_depth.py`	`entropy_comparison_resid_out_layer{0..5}_*.pt`
Fig. 6	Entropy histograms across layers 0–4	`scripts/figures/fig06_entropy_multilayer_histograms.py`	`entropy_comparison_resid_out_layer{0..4}_*.pt`
Fig. 7	Feature entropy vs. context length (layer 2)	`scripts/figures/fig07_entropy_vs_batchsize.py`	`entropy_vs_batch_size_resid_out_layer2_*.pt`
Fig. 8	Entropy vs. activation probability (layer 5)	`scripts/figures/fig08_entropy_vs_activation.py`	`feature_token_influence_resid_out_layer5.pt` + `feature_sparsity_data_resid_out_layer5.pt`
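Before running the figure scripts, it can help to check which required data files are present. The sketch below encodes the "Required data" column of the table as glob patterns and reports what is missing. The helper name `missing_inputs` and the `data_dir` argument are hypothetical, and where the .pt files actually live is an assumption; see AGENTS.md for the real layout.

```python
from pathlib import Path

# Patterns copied from the table above; "{0..5}" ranges are expanded by hand.
REQUIRED = {
    "fig01": ["feature_sparsity_data_resid_out_layer0.pt"],
    "fig02": ["correlation_matrix_resid_out_layer3.pt"],
    "fig03": ["feature_token_influence_resid_out_layer3.pt"],
    "fig04": ["feature_token_influence_resid_out_layer3.pt"],
    "fig05": [f"entropy_comparison_resid_out_layer{i}_*.pt" for i in range(6)],
    "fig06": [f"entropy_comparison_resid_out_layer{i}_*.pt" for i in range(5)],
    "fig07": ["entropy_vs_batch_size_resid_out_layer2_*.pt"],
    "fig08": [
        "feature_token_influence_resid_out_layer5.pt",
        "feature_sparsity_data_resid_out_layer5.pt",
    ],
}


def missing_inputs(data_dir):
    """Map each figure to the required patterns with no matching file.

    ASSUMPTION: all .pt files sit directly inside `data_dir`.
    """
    data_dir = Path(data_dir)
    report = {}
    for fig, patterns in REQUIRED.items():
        missing = [p for p in patterns if not any(data_dir.glob(p))]
        if missing:
            report[fig] = missing
    return report
```

Calling `missing_inputs(".")` from the directory that holds the data returns an empty dict when every figure can be generated, and otherwise lists the unmatched patterns per figure.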

Reproducing results

Setup

uv venv .venv && source .venv/bin/activate
uv pip install . torch torchvision torchaudio
uv pip install transformers accelerate einops datasets tqdm numpy matplotlib scipy
./pretrained_dictionary_downloader.sh   # downloads pretrained SAEs (~2.5 GB)
python scripts/utils/download_data.py   # downloads WikiText-2

Generate figures from pre-computed data

If you have the .pt data files, run the figure scripts directly:

python scripts/figures/fig01_unique_tokens_histogram.py
python scripts/figures/fig02_correlation_heatmap.py
# ... through fig08
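To run all eight figure scripts in one go, a small driver along the following lines may be convenient. It is a sketch, not part of the repository: the function names `figure_scripts` and `run_all` are hypothetical, and it assumes it is run from the repository root with the .pt files already in place.

```python
import subprocess
import sys
from pathlib import Path


def figure_scripts(figures_dir="scripts/figures"):
    """Return the fig01 to fig08 scripts in numeric order."""
    return sorted(Path(figures_dir).glob("fig0[1-8]_*.py"))


def run_all(figures_dir="scripts/figures"):
    """Run each figure script with the current interpreter, stopping on failure."""
    for script in figure_scripts(figures_dir):
        print(f"running {script}")
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run([sys.executable, str(script)], check=True)
```

Calling `run_all()` from the repository root executes the scripts in order; using `sys.executable` keeps the run inside the active virtual environment.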

Run the full pipeline from scratch

# Stage 1: sparsity statistics (needed for Figs 1, 2, 3, 4, 8)
python scripts/analysis/feature_sparsity.py          # ~5–15 min per layer on an MPS device

# Stage 2: correlation matrix (Fig 2)
python scripts/analysis/compute_correlations.py

# Stage 3: feature influence (Figs 3, 4, 8)
python scripts/analysis/feature_token_influence.py   # ~20–60 min per layer on an MPS device

# Stage 4: entropy comparison across all layers (Figs 5, 6)
python scripts/analysis/compare_entropies_multi_layer.py --layers 0 1 2 3 4 5 --num-batches 50

# Stage 5: context-length sweep (Fig 7)
python scripts/analysis/entropy_vs_batch_size.py --site resid_out_layer2

See AGENTS.md for full details including per-task timing estimates.

Repository structure

paper/                  ← LaTeX source and figures (ground truth)
scripts/
├── figures/            ← one script per paper figure (fig01–fig08)
├── analysis/           ← data-producing scripts (run these first)
├── plotting/           ← exploratory plotting utilities
├── inspect/            ← standalone model/SAE demos
└── utils/              ← data download and notebook utilities
notebooks/              ← Jupyter notebooks for exploratory analysis
dictionary_learning/    ← SAE library (Marks, Karvonen & Mueller)

Citation

@article{qi2026nonlocality,
  title={Analyzing the Nonlocality of Sparse Autoencoder Features},
  author={Qi, Xiao-Liang},
  year={2026},
  note={Stanford Institute for Theoretical Physics, Stanford University}
}
