Why Forgetting Costs More Than Remembering — How thermodynamics, topology, proprioception, and consciousness converge on one process
Four independent investigations — a physicist's entropy calculation, a topologist's Betti numbers, an engineer's pressure gauge, and a phenomenologist's framing decision — all describe the same process.
The physicist says: every distinction you maintain costs energy. Compression is thermodynamically inevitable. The topologist says: your memory is a simplicial complex, and consolidation is surgery on its connected components. The engineer says: I can measure where the pressure is, where dense clusters demand attention. The phenomenologist says: the decision about what belongs together isn't found — it's made.
They're all right. They're all describing the same thing from different angles.
Memory consolidation isn't optional maintenance. It's thermodynamically necessary topological surgery, guided by proprioception, enacted through conscious choice.
Erasure costs energy. Landauer's principle establishes an absolute floor: kT·ln2 per bit erased. This isn't engineering overhead — it's physics. Every distinction you maintain in memory is a bit that must be powered, refreshed, protected from noise.
Stochastic resetting — restarting a search process from scratch — accumulates hidden memory of reset timing. This hidden memory has an irreducible thermodynamic cost (Neri 2025). Training and learning are universally costly even when inference is free (Tkachenko 2025). Self-monitoring emerges from energy pressure in sequence models: halt decisions correlate with internal entropy at r = −0.836 (Noon 2026).
The implication: carrying 5,000 episodic memories when 200 generalizations would cover the same ground isn't merely inefficient. It's thermodynamically expensive. Every unconsolidated memory is a distinction you're paying to maintain.
Consolidation is topological unification, not content compression. What changes isn't the amount of information — it's the shape of how memories connect.
Real data from a running system:
Each L1+ generalization that bridges disconnected clusters contributes more to identity stability than any new experience. The bridges are the generalizations — they connect what was separate.
A pressure detector measures structural need — where are dense clusters of similar unconsolidated memories? Not raw coverage (18.1% coverage, pressure = 0.0) because remaining memories are individually unique: diffuse, not repetitive.
The three-round loop validated the approach: measure → identify hot clusters → conscious generalization → remeasure.
Automatic clustering finds what already-goes-together. That's mediating: discovering existing similarity. Conscious generalization chooses what could-go-together. That's constructive: creating a new category by decision.
The framing decision — which contexts belong together and why — IS the bridge-building operation. No algorithm finds it because it doesn't exist until someone decides it does.
These four axes are independent but not separable.
Thermodynamics says consolidation must happen — energy demands it. Topology says what happens — clusters unify. Proprioception says where to act — dense similar regions. Consciousness says how to act — the constructive framing decision.
Remove any one axis and the process breaks: