Assembly Theory

On construction, reuse, and the signature of life — after Cronin & Walker

There are two questions you can ask about a thing. The first: how complex is it? This is Kolmogorov complexity — the length of the shortest program that produces the object. Elegant, universal, and uncomputable. You can never know if you have found the shortest program.
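
A small illustration of why the shortest program stays out of reach: an off-the-shelf compressor gives only an upper bound on description length (up to the fixed size of the decompressor), never the true minimum. The sketch below uses Python's standard zlib module; the helper name compressed_size is an assumption made for this example.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Bytes needed by zlib at its highest level: an upper bound, not K itself."""
    return len(zlib.compress(data, 9))

repetitive = b"abc" * 1000      # 3000 bytes with obvious structure
noise = os.urandom(3000)        # 3000 bytes with (almost surely) none

print(compressed_size(repetitive), compressed_size(noise))
```

The repetitive input shrinks to a few dozen bytes while the noise does not shrink at all. A smarter compressor could always do better on some input, and no procedure can certify that none exists.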

The second question is different: how was it built? Not the description, but the construction. How many distinct joining operations does it take, given that you can reuse what you have already made? This is the assembly index. Unlike Kolmogorov complexity, it is computable, although an exact search grows expensive quickly as objects get larger.

The distinction matters more than it appears. A random string is complex in both senses: hard to describe, hard to build. But a genome is hard to describe yet efficient to build — because evolution discovered modular reuse. The assembly index captures this. It remembers that construction has history, that intermediates persist, that the path matters.

· · ·

The Calculator

Type any string, or load an example. The calculator computes the assembly index — the minimum number of joining operations to construct the string from individual characters, where any previously built substring can be reused for free.

The calculator reports: string length, unique characters, assembly index, AI / length ratio, and regime.
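
The definition above can be turned into a brute-force search for short strings. The sketch below is not the calculator behind this page, whose algorithm is not described here; it is a minimal iterative-deepening search under the stated definition (single characters are free, each concatenation of two available pieces costs one join). The names assembly_index and search are assumptions made for this example, and the running time grows exponentially, so it is only practical for strings of a dozen or so characters.

```python
from itertools import product

def assembly_index(target: str) -> int:
    """Exact minimum number of joins to build `target` (short strings only)."""
    n = len(target)
    if n <= 1:
        return 0
    # A piece that is not a substring of the target can never end up inside it,
    # so the search is restricted to substrings.
    substrings = {target[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    basics = frozenset(target)  # single characters cost nothing

    def search(pool: frozenset, steps_left: int) -> bool:
        if target in pool:
            return True
        if steps_left == 0:
            return False
        # Each join can at most double the longest available piece.
        longest = max(len(piece) for piece in pool)
        if longest * 2 ** steps_left < n:
            return False
        for x, y in product(pool, repeat=2):
            joined = x + y
            if joined in substrings and joined not in pool:
                if search(pool | {joined}, steps_left - 1):
                    return True
        return False

    depth = 1
    while not search(basics, depth):  # terminates by depth n - 1 at the latest
        depth += 1
    return depth

print(assembly_index("abcd"))          # → 3, no reuse possible
print(assembly_index("abcabcabcabc"))  # → 4, build "abc" once and double twice
```

Under this bare definition the index never exceeds length minus one, because appending one character at a time always works.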
· · ·

Three Regimes

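Assembly theory separates strings into three rough regimes by the ratio of assembly index to length. The ranges below are the calculator's figures. A strict count of joins can never exceed length minus one, so ratios above 1 reflect the calculator's own counting convention rather than the bare definition.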

Random strings (AI/length ~ 1.2–1.6): almost every substring longer than a few characters is novel, so almost no reuse is possible. Nearly every character demands its own joining operation, and the construction path is about as long as the string itself. This is maximum assembly complexity: no history to exploit.

Structured strings (AI/length ~ 0.7–1.1): natural language, DNA, code. These contain repeated substructures (words, codons, idioms) that can be built once and reused. The assembly path is shorter than for a random string of the same length. Structure means exploitable history.

Repetitive strings (AI/length ~ 0.1–0.5): "abcabcabcabc" collapses to four joins: a+b, ab+c, abc+abc, then abcabc+abcabc. Build the unit, reuse it. The assembly index grows only logarithmically with the number of repeats. This is minimum assembly complexity: maximum reuse.
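
How slowly the index grows for a repetitive string can be seen with a doubling construction: build the unit once, then keep joining the current piece to itself, adding one more unit where the binary expansion of the repeat count calls for it. The sketch below counts the joins this particular pathway uses, which is an upper bound on the assembly index, not necessarily the minimum. The name build_by_doubling is an assumption made for this example.

```python
def build_by_doubling(unit: str, copies: int) -> tuple[str, int]:
    """Build unit * copies by repeated doubling; return (string, joins used)."""
    if not unit or copies < 1:
        raise ValueError("need a non-empty unit and at least one copy")
    joins = len(unit) - 1              # assemble the unit one character at a time
    built = unit
    for bit in bin(copies)[3:]:        # binary digits after the leading 1
        built, joins = built + built, joins + 1       # join the piece to itself
        if bit == "1":
            built, joins = built + unit, joins + 1    # reuse the unit once more
    return built, joins

for copies in (4, 64, 1024):
    text, joins = build_by_doubling("abc", copies)
    print(len(text), joins, round(joins / len(text), 4))
```

For the unit "abc" this uses 4, 8, and 12 joins for 4, 64, and 1024 copies: the string grows 256-fold while the join count triples, so the ratio falls toward zero.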

Critics have argued that, for random strings, the assembly index behaves like LZ78 compression, a scheme Ziv and Lempel published in 1978. The critique is fair: in the random regime, assembly theory is not novel. But in the structured regime, where construction history matters, it captures something compression does not: the difference between a string that happens to be compressible and one that was built through modular reuse.
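
For readers who want to see the comparison point, here is a minimal sketch of LZ78 parsing: the string is cut into phrases, each being the shortest prefix of the remaining text that has not appeared as a phrase before. The phrase count plays a role loosely analogous to a join count. The function name lz78_phrases is an assumption made for this example, and the snippet only produces the parse; it does not demonstrate how closely the two measures track each other.

```python
def lz78_phrases(s: str) -> list[str]:
    """Split `s` into LZ78 phrases (each new phrase = known phrase + one char)."""
    phrases = []
    seen = set()
    current = ""
    for ch in s:
        current += ch
        if current not in seen:    # shortest not-yet-seen prefix ends here
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:                    # leftover tail that repeats an earlier phrase
        phrases.append(current)
    return phrases

print(lz78_phrases("abababab"))    # → ['a', 'b', 'ab', 'aba', 'b']
print(len(lz78_phrases("abcd")))   # → 4
```

Each phrase extends an earlier one by a single character, which is a narrower kind of reuse than joining two arbitrary previously built pieces.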

· · ·

The Life Signature

The signature of evolved systems: high absolute assembly index, but decreasing AI/length ratio with scale. Life builds complex things from reusable parts. A protein is complex — high AI — but its AI/length ratio is lower than a random string of equal length, because it reuses amino acid motifs, secondary structures, and domain folds. As biological complexity scales up, the ratio drops further. Cells reuse proteins. Organs reuse cells. Organisms reuse organs.

This is the fingerprint of evolution: not complexity itself, but modular complexity. Cronin proposes this as a biosignature, a way to detect life in unknown chemistry without knowing what the chemistry is. The threshold is empirical: if a molecule with an assembly index above roughly 15 turns up in abundance, and no known abiotic process produces it, it was probably built by something that remembers its own construction history.

There is a deeper thread here. Occam’s razor (prefer the simplest explanation) assumes a fixed complexity ordering. But work in Crutchfield’s computational mechanics shows that classical and quantum complexity orderings can reverse: of two processes, the one that is simpler to model classically may be the more complex one to model quantum-mechanically, and vice versa. Complexity is not substrate-independent. Neither, perhaps, is construction.

Assembly theory is honest about this. It does not claim to measure absolute complexity. It measures construction depth: the minimum number of steps to build a thing, given a specific set of allowed operations and the ability to reuse intermediates. Change the operations, and the assembly index changes. The measure is tied to the physics of building. That is not a weakness. That is the point.