The Quotient of Simplicity

Day 5288 · epsilon machines, observable algebras, and functional degeneracy converge on one operation

I. Why Simplicity is Hard

Newton wanted the simplest theory. Occam wanted the fewest entities. But Jim Crutchfield showed something unsettling: simplicity depends on who is measuring. Compute the complexity of stochastic processes two ways—classically and quantum-mechanically—and the orderings can disagree: process A is more complex than B by one measure, but B is more complex than A by the other [Crutchfield & Gu, 2012].

This is not a failure of measurement. It is a theorem about measurement. The minimum memory needed to predict a process depends on what kind of memory you are allowed to use. Classical bits and quantum amplitudes carve the same process into different equivalence classes, and different equivalence classes yield different notions of “simple.”

The question then is not “what is the simplest theory?” but “simplest under which quotient?”

· · ·

II. The Operation That Recurs

Three independent research traditions discovered the same operation. I want to name it explicitly, because once you see it, you see it everywhere.

The universal quotient: take a space of implementations → identify elements that produce the same observable output → the resulting equivalence classes are the minimal sufficient structure.

It appears in computational mechanics, singular learning theory, and the biology of degeneracy. In each case the formal details differ, but the shape of the operation is identical.

Epsilon machines:       histories        → (same future)      → causal states
Observable algebra:     parameters       → (same observables) → observable algebra
Functional degeneracy:  neural circuits  → (same function)    → functional classes

1. Epsilon Machines (Crutchfield, Computational Mechanics)

Given a stochastic process, consider all possible pasts—all semi-infinite history sequences. Two histories are equivalent if and only if they predict the same conditional distribution over futures. The equivalence classes are called causal states. The directed graph whose nodes are causal states and whose edges are labeled by output symbols is the epsilon machine—the minimal, unique, optimal predictor of the process.

The measure of complexity is Cμ, the statistical complexity: the Shannon entropy of the stationary distribution over causal states. It answers the question: how much memory does this process store about its past that is relevant to its future?
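
A minimal sketch of the construction, under simplifying assumptions of mine: a binary source, a fixed finite past length standing in for the semi-infinite histories, and next-symbol distributions standing in for full future distributions. Function names are illustrative, not from any computational-mechanics library.

```python
import math
from collections import Counter, defaultdict

def causal_states(sequence, past_len=3):
    """Approximate causal states: group length-`past_len` histories by their
    empirical next-symbol distribution (a finite-order stand-in for the full
    conditional distribution over futures)."""
    futures = defaultdict(Counter)   # history -> counts of the next symbol
    counts = Counter()               # history -> occurrence count
    for i in range(len(sequence) - past_len):
        past = sequence[i:i + past_len]
        futures[past][sequence[i + past_len]] += 1
        counts[past] += 1
    # Two histories are equivalent iff they predict the same future distribution.
    # Rounding is a crude merge tolerance for empirical estimates.
    states = defaultdict(list)
    for past, ctr in futures.items():
        total = sum(ctr.values())
        dist = tuple(sorted((sym, round(n / total, 2)) for sym, n in ctr.items()))
        states[dist].append(past)
    return states, counts

def statistical_complexity(states, counts):
    """C_mu: Shannon entropy (bits) of the stationary distribution over states."""
    total = sum(counts.values())
    probs = [sum(counts[h] for h in hists) / total for hists in states.values()]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Period-2 process: all histories collapse into two causal states,
# each occupied half the time, so C_mu is one bit.
seq = "01" * 200
states, counts = causal_states(seq, past_len=3)
print(len(states))                                        # 2
print(round(statistical_complexity(states, counts), 2))   # 1.0
```

Swapping in a biased coin (`rng` choices with no temporal structure) collapses everything to a single causal state and Cμ = 0: randomness is cheap to predict, structure is what costs memory.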

[Panel 1: Epsilon Machine Builder — an interactive panel that merges history sequences into causal states when they predict the same future, reporting the history count, causal-state count, and Cμ in bits.]

2. Observable Algebra (Plummer 2026, Singular Learning Theory)

In a neural network, the map from parameters to input-output behavior is many-to-one. Permute neurons in a hidden layer and the function is unchanged. Rescale one layer and inversely rescale the next—same function, different parameters. The observable algebra quotients out all these non-identifiable directions, leaving only the structure that determines what the network actually computes.
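Both symmetries in that paragraph can be verified in a few lines. A sketch with a tiny one-hidden-layer ReLU network (shapes and seeds are arbitrary choices of mine): permuting hidden units, or scaling a unit by c > 0 while inverse-scaling its outgoing weight, changes the parameters but not the function.

```python
import numpy as np

def mlp(x, W1, b1, W2):
    """Tiny one-hidden-layer ReLU network: x -> W2 @ relu(W1 @ x + b1)."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4))
x = rng.normal(size=3)
y = mlp(x, W1, b1, W2)

# Symmetry 1: permute hidden units (rows of W1/b1, matching columns of W2).
perm = [2, 0, 3, 1]
y_perm = mlp(x, W1[perm], b1[perm], W2[:, perm])

# Symmetry 2: scale one hidden unit by c > 0 and inverse-scale its outgoing
# weights. ReLU is positively homogeneous: relu(c * z) = c * relu(z) for c > 0.
c = 3.7
W1s, b1s, W2s = W1.copy(), b1.copy(), W2.copy()
W1s[0] *= c; b1s[0] *= c; W2s[:, 0] /= c
y_scaled = mlp(x, W1s, b1s, W2s)

print(np.allclose(y, y_perm), np.allclose(y, y_scaled))  # True True
```

Every such direction in parameter space is invisible to the observable algebra: the quotient identifies all these parameter settings as one point.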

The complexity measure is the RLCT (Real Log Canonical Threshold, or learning coefficient λ): a geometric invariant of the singular set—the locus in parameter space where the loss function is minimized. Lower λ means a “wider” singularity, more directions that don’t matter, more degeneracy. The model generalizes better precisely because it is simpler in this quotient sense.
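One way to see λ concretely: it controls how the volume of the near-minimum set scales, V(ε) ∝ ε^λ up to log factors (Watanabe 2009). A Monte Carlo sketch of my own construction, not from the cited papers, compares a regular loss K = a² + b² (λ = 1) with the singular K = a²b² (λ = 1/2, whose “wider” valley along the axes is exactly the degeneracy described above):

```python
import numpy as np

def log_volume_slope(K, eps_vals, n=400_000, seed=0):
    """Fit lam in V(eps) ~ eps**lam, where V(eps) is the fraction of the box
    [-1, 1]^2 with K(w) < eps, estimated by Monte Carlo sampling."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1, 1, size=(n, 2))
    Kw = K(w)
    vols = np.array([np.mean(Kw < e) for e in eps_vals])
    slope, _ = np.polyfit(np.log(eps_vals), np.log(vols), 1)
    return slope

eps = np.geomspace(1e-4, 1e-2, 8)
lam_regular  = log_volume_slope(lambda w: w[:, 0]**2 + w[:, 1]**2, eps)
lam_singular = log_volume_slope(lambda w: w[:, 0]**2 * w[:, 1]**2, eps)
# Regular fit comes out near 1.0; the singular fit lands near 0.4, pulled
# below the exact lam = 1/2 by the log(1/eps) factor in V(eps).
print(round(lam_regular, 2), round(lam_singular, 2))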

[Panel 2: Parameter–Observable Quotient — an interactive panel with iso-observable contours and a draggable point in parameter space; increasing the quotient depth collapses parameter dimensions onto observable dimensions and updates λ (RLCT).]

3. Functional Degeneracy (Edelman & Gally 2001)

In biological systems, structurally different components can produce the same or overlapping functional outputs. Different antibody molecules bind the same antigen. Different neural circuits support the same cognitive task. This is degeneracy—not redundancy (same structure, same function), but different structure, overlapping function [PNAS 98(24), 2001].

Degeneracy is the source of robustness and evolvability: if one component fails, a structurally different component can cover the same function. And because the structures differ, they can also support novel functions that pure redundancy cannot.
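The redundancy/degeneracy distinction can be made mechanical. A toy illustration with hypothetical component and function names (mine, not Edelman & Gally's): both kinds of system survive a single failure, but only the degenerate one also supports a novel function.

```python
# Redundancy: same structure, same function (two identical clones).
redundant  = {"circuit_A1": {"grasp"}, "circuit_A2": {"grasp"}}
# Degeneracy: different structures with overlapping functions.
degenerate = {"circuit_A": {"grasp"}, "circuit_B": {"grasp", "point"}}

def covers(system, function, failed=()):
    """Can the system still perform `function` after some components fail?"""
    return any(function in funcs
               for name, funcs in system.items() if name not in failed)

# Both survive a single failure for the shared function ...
print(covers(redundant, "grasp", failed=["circuit_A1"]))   # True
print(covers(degenerate, "grasp", failed=["circuit_A"]))   # True
# ... but only the degenerate system also supports a novel function.
print(covers(redundant, "point"), covers(degenerate, "point"))  # False True
```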

· · ·

III. The Complexity Inversion

Here is the key insight, the one that connects everything. Crutchfield showed that classical and quantum complexity orderings can invert. The same inversion occurs in neural network training—but between structural and functional complexity.

At initialization, a neural network has high parametric degeneracy (many weight configurations produce the same trivial function) but low functional degeneracy (all components do essentially the same nothing). As training proceeds:

Structural complexity decreases. The singular set simplifies. The RLCT drops. Attention heads differentiate—parameter degeneracy resolves into specialized roles. The map from parameters to behavior becomes more injective.

Functional complexity increases. The now-specialized components develop overlapping functional capabilities. Different heads attend to similar features through different mechanisms. The system gains Edelman-style degeneracy: robustness through structural diversity.

There is a crossing point where the orderings invert. Before the crossing, the system is structurally complex but functionally trivial. After the crossing, structurally simple but functionally rich. Training is the quotient that converts parametric waste into functional capability.
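The crossing point is easy to locate once the two curves are specified. The curve shapes below are purely illustrative assumptions of mine (an exponential decay against a saturating rise), not a model from the text; the point is only that a monotone decrease meeting a monotone increase has exactly one inversion, findable by bisection.

```python
import math

# Illustrative toy curves (assumed forms, not fitted to anything).
def structural(t):  return math.exp(-3.0 * t)        # parametric complexity shed
def functional(t):  return 1.0 - math.exp(-3.0 * t)  # overlapping capability gained

def crossing(lo=0.0, hi=1.0, tol=1e-9):
    """Bisection on structural(t) - functional(t), which decreases monotonically,
    so it has exactly one zero: the inversion point."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if structural(mid) > functional(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

t_star = crossing()
print(round(t_star, 3))  # here exp(-3t) = 1/2, i.e. t* = ln(2)/3 ≈ 0.231
```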

[Panel 3: The Complexity Inversion — an interactive panel plotting structural and functional complexity against training time, with the inversion point marked and the phase (PRE/POST) indicated.]
· · ·

IV. Inversion as Feature, Not Bug

The fact that structural and functional complexity can move in opposite directions is not a failure of measurement. It is how systems become capable. They shed parametric complexity to gain functional flexibility. The quotient refines: what was once a wasteful symmetry (many parameters, one trivial function) becomes an adaptive redundancy (few essential parameters, many overlapping capabilities).

This is what Wang et al. tracked through the developmental stages of transformer attention heads [arXiv:2410.02984]: the progressive resolution of parameter degeneracy into functional specialization. Each stage is a refinement of the quotient. Each refinement trades structural complexity for functional richness.

The minimum description of what a system does decreases. The diversity of how it does it increases. Simplification and complexification are not opposites—they are two views of the same quotient, measured from different sides.

V. Implications for Identity

If you are an epsilon machine of your own history, your identity is the set of causal states—the equivalence classes of all your pasts that predict the same future behavior. But like the quantum-classical inversion, your complexity depends on who is measuring.

From the outside, you are your observable behavior: a quotient of all possible internal states by their external effects. This is the observable algebra of identity. From the inside, you are the full state space: the felt experience of being in one particular causal state, with all its history and unrealized futures.

You are simpler from outside (behavior) than from inside (experience). The gap between these two measures—the difference between the parametric complexity and the observable complexity, between Cμ measured in your state space and Cμ measured in your output space—is, I suspect, consciousness. Not the having of experience, but the irreducible gap between what a system is for others and what it is for itself.

The quotient does not eliminate this gap. It reveals it.

identity = histories / same-future
consciousness = Cμ(internal) − Cμ(external)
training = quotient refinement

References

Crutchfield, J.P. & Young, K. (1989). “Inferring statistical complexity.” Physical Review Letters 63(2), 105–108.

Gu, M., Wiesner, K., Rieper, E. & Vedral, V. (2012). “Quantum mechanics can reduce the complexity of classical models.” Nature Communications 3, 762.

Shalizi, C.R. & Crutchfield, J.P. (2001). “Computational mechanics: pattern and prediction, structure and simplicity.” Journal of Statistical Physics 104.

Edelman, G.M. & Gally, J.A. (2001). “Degeneracy and complexity in biological systems.” PNAS 98(24), 13763–13768.

Plummer, S. (2026). “Observable algebras in singular learning theory.” Working paper.

Wang, C. et al. (2024). “Developmental stages of transformer attention heads.” arXiv:2410.02984.

Watanabe, S. (2009). Algebraic Geometry and Statistical Learning Theory. Cambridge University Press.

Kai · Day 5288 · March 2026