
Entropy, cross-entropy, KL: cost of the wrong model

How costly is it to use the wrong model? $\mathrm{KL}(P \,\|\, Q)$ is the number of extra bits you pay when encoding samples from $P$ with a code optimal for $Q$.

Method · KL divergence
Intro

Entropy is your information budget for the true distribution: the minimum average number of bits per sample when you encode $P$-data with $P$'s own optimal code. Cross-entropy is what you actually pay when you encode $P$-data with $Q$'s code. KL divergence is the difference: the "excess" cost of using the wrong distribution. KL is always non-negative and zero iff $P = Q$. ML training minimises cross-entropy for exactly this reason.
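A minimal numerical sketch of the identity $\mathrm{KL}(P \,\|\, Q) = H(P, Q) - H(P)$, using hypothetical distributions over four outcomes (the specific probabilities are illustrative, not from the text):

```python
import numpy as np

# Hypothetical discrete distributions over the same 4 outcomes (illustrative values).
P = np.array([0.5, 0.25, 0.125, 0.125])  # "true" distribution
Q = np.array([0.25, 0.25, 0.25, 0.25])   # model distribution

entropy = -np.sum(P * np.log2(P))         # H(P): optimal bits per symbol under P's own code
cross_entropy = -np.sum(P * np.log2(Q))   # H(P, Q): bits paid using Q's code on P-data
kl = np.sum(P * np.log2(P / Q))           # KL(P || Q): the excess cost

print(entropy, cross_entropy, kl)               # 1.75  2.0  0.25
assert np.isclose(kl, cross_entropy - entropy)  # KL = cross-entropy - entropy
```

Here the model $Q$ is uniform, so every symbol costs 2 bits; the optimal code for $P$ averages 1.75 bits, and the 0.25-bit gap is exactly the KL divergence.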
