The Wrong Valley

The landscape represents a cost function — a space of possible approaches. Each valley is a local optimum: a strategy that resists small changes because every nearby alternative is worse.

Gradient descent finds the nearest valley. It is rational and efficient. It is also blind to topology — it cannot know whether its valley is the deepest one.

Optimization layers (momentum, learning rate decay, perturbation) are refinements within the same approach class. They make the descent smoother but cannot escape the basin.

Jumping is qualitatively different. It trades certainty for access. The cost is real: you might land somewhere worse. But it is the only move that changes which valleys are reachable.