Simplex

Simplex is an AI safety research organization building a science of intelligence.

We believe that understanding intelligence is safety. AI systems are deployed across society, and we don't know how they work. Without genuine understanding, we can't reliably monitor, control, or reason clearly about what these systems are doing. But these same systems also present a new opportunity. For the first time, we have machines complex enough to serve as testbeds for theories of intelligence itself, including biological intelligence.

Our aim is to develop and apply a rigorous theory of latent internal structure in neural networks — how they internally organize their representations, and how that structure relates to computation and behavior. We aim to build a theory applicable to intelligence, both artificial and biological.

We believe that intelligence is the defining issue of our time. Beyond the technical challenge, intelligence forces us to ask what we actually are. This is bigger than AI safety in the narrow sense. It's about understanding what makes us human, how we relate to the minds we're building, and what we want to become.

Careers

We are hiring Research Scientists and Senior Research Scientists in the Bay Area, and Research Scientists in London.

Research

2026

Transformers learn factored representations

Preprint

Adam Shai, Loren Amdahl-Culleton, et al.

Feb 2026

Our world naturally decomposes into parts, but neural networks learn only from undifferentiated streams of tokens. We show that transformers discover this factored structure anyway, representing independent components in orthogonal subspaces and revealing a deep inductive bias toward decomposing the world into parts.

2025

Rank-1 LoRAs encode interpretable reasoning signals

NeurIPS Workshop

Jake Ward, Paul Riechers, Adam Shai

Nov 2025

Reasoning performance can arise from minimal, interpretable changes to base model parameters.

Neural networks leverage nominally quantum and post-quantum representations

Preprint

Paul Riechers, Thomas Elliott, Adam Shai

Jul 2025

Neural nets discover and represent beliefs over quantum and post-quantum generative models.

Simplex progress report

Blog

Adam Shai, Paul Riechers, Henry Bigelow, Eric Alt, Mateusz Piotrowski

Jul 2025

Next-token pretraining implies in-context learning

Preprint

Paul Riechers, Henry Bigelow, Eric Alt, Adam Shai

May 2025

In-context learning arises predictably from standard next-token pretraining.

Constrained belief updates explain geometric structures in transformer representations

ICML

Mateusz Piotrowski, Paul Riechers, Daniel Filan, Adam Shai

Feb 2025

Transformers implement constrained Bayesian belief updating shaped by architectural constraints.

2024

AXRP Interview: Computational mechanics and transformers

Talk

Adam Shai, Paul Riechers

Sep 2024

FAR Seminar: Building the science of predictive systems

Talk

Paul Riechers, Adam Shai

Jun 2024

What can you learn from next-token prediction?

Talk

Paul Riechers · Mathematics of Neuroscience and AI

Jun 2024

Transformers represent belief state geometry in their residual stream

NeurIPS

Adam Shai, Lucas Teixeira, Alexander Oldenziel, Sarah Marzen, Paul Riechers

May 2024

What computational structure are we building into large language models when we train them on next-token prediction? We present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process.

Learn more about our work

Simplex was founded by Paul Riechers and Adam Shai in 2024, bringing together the best of physics and computational neuroscience to build the new science of intelligence that AI safety needs. We are a growing team of world-class researchers and engineers applying scientific rigor to enable a brighter future.

We've shown that transformers trained on next-token prediction spontaneously organize their activations into geometric structures predicted by Bayesian inference over world models (manuscript, blog post). Even on simple training data, complex fractals emerge that reflect the hidden structure of the world the model is learning. The demo below lets you see it happen in real time.

[Interactive demo with three panels: Data, Training, Activation Geometry]

A 42-parameter RNN trains in your browser on a 3-state hidden Markov model. As it learns to predict the next token, it organizes its activations into a fractal that mirrors optimal Bayesian belief geometry.

Since then, we've extended this in several directions: explaining how attention implements belief updating under architectural constraints, deriving in-context learning from training data structure, discovering quantum and post-quantum representations in networks, and showing that transformers decompose their world models into interpretable, factored parts. The perspective and intuition we've developed provide a unique edge for interpretability. For the bigger picture, see our progress report, the FAR Seminar talk, or this recent interview.

Our foundational result showed that transformers trained on next-token prediction spontaneously organize their activations into geometries predicted by Bayesian belief updating over hidden states of a world model. Even when trained on simple token sequences from hidden Markov models, complex fractals emerge in the residual stream, structures far removed from the surface statistics of the training data. We think of this work as a first step toward understanding what we are fundamentally training AI systems to do, and what representations we are implicitly training them to have.
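To make "Bayesian belief updating over hidden states" concrete, here is a minimal sketch of the optimal observer for a small hidden Markov model. The 3-state process, its `EMIT` and `TRANS` matrices, and the function names are hypothetical choices for illustration, not the process or code used in our papers. Each observed token maps the current belief (a point on the 2-simplex) to a new belief; collecting the beliefs reached by all short token sequences gives the point cloud whose geometry the activations are compared against.

```python
from itertools import product

# Hypothetical 3-state, 3-token HMM (illustrative values only).
# EMIT[i][x]  = P(token x | hidden state i)
# TRANS[i][j] = P(next state j | hidden state i)
EMIT = [[0.8, 0.1, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8]]
TRANS = [[0.90, 0.05, 0.05],
         [0.05, 0.90, 0.05],
         [0.05, 0.05, 0.90]]


def update_belief(belief, token, emit, trans):
    """One Bayesian filtering step: condition on the token, then propagate."""
    n = len(belief)
    # Weight each hidden state by how likely it was to emit the observed token.
    weighted = [belief[i] * emit[i][token] for i in range(n)]
    # Push the weighted mass through the transition matrix.
    unnorm = [sum(weighted[i] * trans[i][j] for i in range(n)) for j in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def belief_states(prior, emit, trans, max_len):
    """Beliefs reached after every token sequence of length 1..max_len."""
    n_tokens = len(emit[0])
    points = []
    for length in range(1, max_len + 1):
        for seq in product(range(n_tokens), repeat=length):
            belief = prior
            for token in seq:
                belief = update_belief(belief, token, emit, trans)
            points.append(belief)
    return points


uniform_prior = [1 / 3, 1 / 3, 1 / 3]
points = belief_states(uniform_prior, EMIT, TRANS, max_len=4)
print(len(points), "belief states, each a point on the 2-simplex")
```

Plotting `points` in simplex coordinates gives the belief geometry for this toy process. Whether the result looks fractal depends on the process chosen; this sketch makes no claim about this particular one.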

In Constrained Belief Updating Explains Transformer Representations, we asked how attention implements belief updating when Bayesian inference is fundamentally recurrent. We found that attention parallelizes recurrence by decomposing belief updates spectrally across heads, and we were able to make verified predictions about embeddings, OV vectors, attention patterns, and residual stream geometry at different layers.

We've also developed a theory of in-context learning grounded in training data structure. When training data mixes multiple sources, models must infer not just what hidden state the generator is in, but which source is active. This hierarchical belief updating necessarily produces power-law loss scaling with context length and explains why induction heads emerge.
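The idea of inferring which source is active can be sketched with a deliberately simplified toy. Assume each source is just an i.i.d. distribution over two tokens, so the only thing left to infer is the source itself. The `SOURCES` values and function names are hypothetical. The sketch shows the per-token loss of the Bayes-optimal predictor falling as context accumulates; it is not meant to reproduce the power-law scaling result, which concerns richer mixtures.

```python
import math

# Hypothetical sources: each row is P(token | source) over two tokens.
SOURCES = [[0.9, 0.1],
           [0.5, 0.5],
           [0.2, 0.8]]


def predict(source_belief, sources):
    """Next-token distribution: each source's prediction, weighted by its posterior."""
    n_tokens = len(sources[0])
    return [sum(w * s[t] for w, s in zip(source_belief, sources))
            for t in range(n_tokens)]


def update_source_belief(source_belief, token, sources):
    """Bayes rule over sources after observing one token."""
    unnorm = [w * s[token] for w, s in zip(source_belief, sources)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def context_losses(tokens, sources):
    """Per-position negative log-likelihood of the Bayes-optimal predictor."""
    belief = [1 / len(sources)] * len(sources)  # uniform prior over sources
    losses = []
    for token in tokens:
        losses.append(-math.log(predict(belief, sources)[token]))
        belief = update_source_belief(belief, token, sources)
    return losses, belief


losses, final_belief = context_losses([0] * 10, SOURCES)
print("first loss:", round(losses[0], 3), "last loss:", round(losses[-1], 3))
```

In the full setting each source also has hidden states, so the belief is hierarchical: a distribution over sources, and within each source a distribution over its states.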

We've been asking what the most general computational framework for understanding neural network representations might be. Our initial work implied activations should lie in simplices, but we've now shown that networks discover quantum and post-quantum belief geometries when these are the minimal way to model their training data. This offers a new foundation for thinking about features, superposition, and what representations neural networks use on their own terms.

Most recently, we've shown that transformers naturally decompose their world model into interpretable parts. These factored belief representations provide an exponential advantage in dimensionality, and suggest that we can understand and intervene surgically on low-dimensional subspaces of large models.
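One way to see where a dimensional advantage can come from is simple counting, sketched below under the assumption of fully independent factors. The function names are hypothetical, and this is an illustration of the counting argument rather than the analysis in the paper. A belief over the joint state space of N factors lives in a simplex whose dimension grows with the product of the factor sizes, while one belief per factor needs only the sum of the per-factor dimensions.

```python
from itertools import product
from math import prod


def joint_dim(factor_sizes):
    """Dimension of the simplex over the joint state space."""
    return prod(factor_sizes) - 1


def factored_dim(factor_sizes):
    """Total dimension when each factor keeps its own belief simplex."""
    return sum(d - 1 for d in factor_sizes)


def joint_from_marginals(marginals):
    """For independent factors, the joint belief is the product of the marginals."""
    index_ranges = [range(len(m)) for m in marginals]
    return {states: prod(m[s] for m, s in zip(marginals, states))
            for states in product(*index_ranges)}


sizes = [3] * 5  # five hypothetical factors with three states each
print("joint:", joint_dim(sizes), "factored:", factored_dim(sizes))

joint = joint_from_marginals([[0.5, 0.5], [0.2, 0.8]])
print(joint[(0, 1)])  # probability of (factor 0 in state 0, factor 1 in state 1)
```

When the factors are independent, the per-factor beliefs carry everything needed to rebuild the joint belief, which is why the smaller representation loses nothing in this idealized case.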