Information Geometry
Probability distributions are not just functions. They are points on a curved manifold with a natural geometry that determines how beliefs should change. This is an interactive exploration of that geometry.
Scroll down, or click the dots on the right
Consider the family of all Gaussian distributions N(μ, σ). Each is determined by two parameters: the mean μ and the standard deviation σ > 0.
We can represent each Gaussian as a point in the upper half-plane — μ on the horizontal axis, σ on the vertical. This is the statistical manifold of Gaussians.
Every point in this plane is a probability distribution. But what is the “distance” between two distributions? Euclidean distance gets this profoundly wrong.
Click anywhere in the plane to place a Gaussian. Its PDF appears below.
The Gaussian manifold is two-dimensional. More complex families — mixtures, exponential families — give higher-dimensional manifolds. The ideas generalize.
Consider two pairs of Gaussians, each separated by the same Euclidean distance in (μ, σ) space:
N(0, 0.1) and N(0.5, 0.1) — means differ by 0.5, both very precise. Their PDFs barely overlap. Statistically, these are completely different distributions.
N(0, 10) and N(0.5, 10) — same shift in mean, both very spread. Their PDFs are nearly identical. You could not distinguish them from finite samples.
Same Euclidean distance, vastly different statistical distance. The geometry of uncertainty is not flat.
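The mismatch can be quantified with the closed-form KL divergence between Gaussians, KL = ln(σ₂/σ₁) + (σ₁² + (μ₁ − μ₂)²)/(2σ₂²) − ½. A short Python sketch (the function name is mine) applied to the two pairs above:

```python
import math

def kl_gauss(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1) || N(mu2, sigma2)) in nats, closed form."""
    return (math.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)

# Two pairs with identical Euclidean separation 0.5 in (mu, sigma) space:
precise = kl_gauss(0.0, 0.1, 0.5, 0.1)    # 12.5 nats: easily told apart
vague   = kl_gauss(0.0, 10.0, 0.5, 10.0)  # 0.00125 nats: nearly identical
```

The precise pair is 12.5 nats apart; the vague pair only 0.00125 nats. A factor of 10,000 in statistical distinguishability at identical Euclidean distance.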
The overlap area tells the truth. Any honest metric must weight shifts by the precision of the distributions involved.
The Fisher information metric for Gaussians is:

dℓ² = (dμ² + 2 dσ²) / σ²

or equivalently, as a matrix, G(μ, σ) = diag(1/σ², 2/σ²).
Substituting s = σ√2 gives:

dℓ² = 2 (dμ² + ds²) / s²

Up to the constant factor 2, which only rescales distances, this is the Poincaré half-plane metric: the standard model of hyperbolic geometry, with constant negative curvature.
Geodesics (shortest paths) are vertical lines of constant μ, and semicircular arcs that meet the σ = 0 axis at right angles.
The Fisher metric turns probability space into a hyperbolic surface. Geodesics curve toward regions of high uncertainty — it is “cheaper” to travel through vague distributions than to cross between precise ones.
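A geodesic between two Gaussians can be traced numerically: map each to half-plane coordinates (μ, σ√2), find the semicircle centered on the horizontal axis that passes through both points, and sample along the arc. A sketch (the helper name `geodesic_points` is mine):

```python
import math

def geodesic_points(mu1, sigma1, mu2, sigma2, n=101):
    """Sample the Fisher geodesic between two Gaussians.

    Works in half-plane coordinates (x, y) = (mu, sigma * sqrt(2)),
    where geodesics are vertical lines or semicircles centered on y = 0.
    Returns a list of (mu, sigma) points along the path.
    """
    x1, y1 = mu1, sigma1 * math.sqrt(2)
    x2, y2 = mu2, sigma2 * math.sqrt(2)
    if abs(x1 - x2) < 1e-12:
        # Vertical-line geodesic: pure change of scale
        ys = [y1 + (y2 - y1) * i / (n - 1) for i in range(n)]
        return [(x1, y / math.sqrt(2)) for y in ys]
    # Semicircle centered at (c, 0) passing through both points
    c = (x2**2 + y2**2 - x1**2 - y1**2) / (2 * (x2 - x1))
    r = math.hypot(x1 - c, y1)
    t1 = math.atan2(y1, x1 - c)
    t2 = math.atan2(y2, x2 - c)
    pts = []
    for i in range(n):
        t = t1 + (t2 - t1) * i / (n - 1)
        pts.append((c + r * math.cos(t), r * math.sin(t) / math.sqrt(2)))
    return pts
```

For two equally precise Gaussians such as N(0, 1) and N(2, 1), the sampled path rises through σ ≈ 1.22 at its midpoint: the bow toward high uncertainty visible in the interactive view.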
Click two points to compare the Euclidean line vs. the Fisher geodesic. Notice how the geodesic always bows upward.
The Fisher distance between two Gaussians N(μ₁, σ₁) and N(μ₂, σ₂) is, up to a factor of √2, the hyperbolic distance in the Poincaré half-plane between (μ₁, σ₁√2) and (μ₂, σ₂√2).
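In code, this uses the standard arcosh formula for distance in the upper half-plane; with the Fisher metric normalized as dℓ² = (dμ² + 2dσ²)/σ², the Fisher length is √2 times the half-plane distance. A sketch (the function name is mine):

```python
import math

def fisher_distance(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    # Half-plane coordinates (x, y) = (mu, sigma * sqrt(2))
    x1, y1 = mu1, sigma1 * math.sqrt(2)
    x2, y2 = mu2, sigma2 * math.sqrt(2)
    # Hyperbolic distance: arcosh(1 + (dx^2 + dy^2) / (2 * y1 * y2))
    cosh_d = 1 + ((x2 - x1)**2 + (y2 - y1)**2) / (2 * y1 * y2)
    return math.sqrt(2) * math.acosh(cosh_d)
```

Sanity checks: a pure scale change from σ = 1 to σ = e is distance √2; the precise pair from earlier, N(0, 0.1) vs N(0.5, 0.1), sits about 3.8 units apart, while the vague pair N(0, 10) vs N(0.5, 10) is only about 0.05.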
Gradient descent in parameter space ignores the manifold’s curvature. The natural gradient corrects this by premultiplying with the inverse Fisher matrix:
For Gaussians, the inverse Fisher matrix is:

G⁻¹(μ, σ) = diag(σ², σ²/2)
So the natural gradient steps are:

Δμ = −η σ² ∂L/∂μ,  Δσ = −η (σ²/2) ∂L/∂σ
When σ is small, steps shrink: small parameter changes matter more for precise distributions. When σ is large, steps scale up: the same parameter change barely moves an imprecise distribution.
Loss: KL divergence to target N(3, 2), starting from N(0, 0.5).
The natural gradient respects the geometry. It converges faster because it takes equal-sized steps in distribution space, not parameter space.
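The comparison can be reproduced in a few lines. A sketch (the function names and hyperparameters, η = 0.1 for 200 steps, are my choices, not taken from the demo):

```python
import math

def kl_to_target(mu, sigma, mu_t=3.0, sigma_t=2.0):
    """Loss: KL(N(mu, sigma) || N(mu_t, sigma_t))."""
    return (math.log(sigma_t / sigma)
            + (sigma**2 + (mu - mu_t)**2) / (2 * sigma_t**2) - 0.5)

def kl_grads(mu, sigma, mu_t=3.0, sigma_t=2.0):
    """Analytic gradients of the loss w.r.t. mu and sigma."""
    return (mu - mu_t) / sigma_t**2, sigma / sigma_t**2 - 1.0 / sigma

def optimize(natural, lr=0.1, steps=200):
    mu, sigma = 0.0, 0.5                   # start at N(0, 0.5)
    for _ in range(steps):
        g_mu, g_sig = kl_grads(mu, sigma)
        if natural:
            # Premultiply by the inverse Fisher matrix diag(sigma^2, sigma^2/2)
            g_mu, g_sig = sigma**2 * g_mu, (sigma**2 / 2) * g_sig
        mu, sigma = mu - lr * g_mu, sigma - lr * g_sig
    return kl_to_target(mu, sigma)
```

With these settings, the natural-gradient run drives the KL loss many orders of magnitude below the vanilla run at the same step count and learning rate.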
The Fisher metric is not arbitrary. Chentsov (1972) proved it is the unique Riemannian metric on statistical models (up to a constant factor) that is invariant under sufficient statistics — under any information-preserving transformation of the data.
There is no other geometry of probability. This one is forced on us.
The KL divergence between nearby distributions is, to second order, half the squared Fisher distance. KL is the infinitesimal form of this geometry.
The visualization shows this convergence: as the perturbation d shrinks, KL and the quadratic form ½ dᵀG d (with G the Fisher matrix) become indistinguishable.
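The same convergence can be checked numerically: fix a perturbation direction, shrink its scale ε, and compare the exact KL against ½ dᵀG d with G = diag(1/σ², 2/σ²). A sketch (names mine):

```python
import math

def kl_gauss(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1) || N(mu2, sigma2)), closed form."""
    return (math.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2) - 0.5)

mu, sigma = 0.0, 1.0                      # base distribution N(0, 1)
ratios = []
for eps in (1e-1, 1e-2, 1e-3):
    d_mu, d_sig = 0.3 * eps, 0.4 * eps    # fixed direction, shrinking scale
    kl = kl_gauss(mu, sigma, mu + d_mu, sigma + d_sig)
    half_quad = 0.5 * (d_mu**2 + 2 * d_sig**2) / sigma**2
    ratios.append(kl / half_quad)         # -> 1 as eps -> 0
```

At ε = 0.1 the ratio is still about 0.93; by ε = 0.001 it agrees with 1 to better than a tenth of a percent.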
Brains minimize free energy. Free energy includes KL divergence between beliefs and observations. The geometry of that minimization is information geometry. Perception is geodesic motion on a statistical manifold.
When you update a belief, you are not moving in a flat space. You are moving along geodesics of a curved manifold where the curvature is determined by how much information each parameter carries.
Information geometry is the foundation of: natural gradient optimization (Amari 1998), variational inference, the EM algorithm, optimal experiment design, neural network loss landscapes, thermodynamic geometry, and the geometry of quantum states. The Fisher metric is the bridge between statistics and differential geometry.
The space of all possible beliefs has a shape. That shape is not Euclidean. It is hyperbolic, curved by information, and everything that learns — brains, algorithms, evolution — navigates it.