From Wikipedia:
Another interpretation of the KL divergence is this: suppose a number X is about to be drawn randomly from a discrete set with probability distribution p(x).
If Alice knows the true distribution p(x), while Bob believes (has a prior) that the distribution is q(x), then Bob will be more surprised than Alice, on average, upon seeing the value of X.
The KL divergence is the (objective) expected value of Bob's (subjective) surprisal minus Alice's surprisal, measured in bits if the log is taken in base 2.
In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it is expected to make him.
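As a concrete illustration of this reading, here is a minimal sketch (not from the original text; the function name kl_divergence_bits and the distributions p and q are made-up for the example). It computes Bob's expected surprisal minus Alice's expected surprisal under the true distribution p, using base-2 logarithms so the result comes out in bits, which equals the usual sum form of the KL divergence.

import math

def kl_divergence_bits(p, q):
    # D(p || q) in bits: the average, under the true distribution p,
    # of Bob's surprisal -log2 q(x) minus Alice's surprisal -log2 p(x).
    return sum(
        px * ((-math.log2(qx)) - (-math.log2(px)))
        for px, qx in zip(p, q)
        if px > 0  # outcomes with p(x) = 0 contribute nothing
    )

# Hypothetical distributions over a three-element set.
p = [0.5, 0.25, 0.25]   # true distribution, known to Alice
q = [0.25, 0.25, 0.5]   # Bob's prior belief

print(kl_divergence_bits(p, q))   # 0.25 bits of "unnecessary surprise"
print(kl_divergence_bits(p, p))   # 0.0: no extra surprise when the prior is right

For these made-up values the result is 0.5*log2(0.5/0.25) + 0.25*log2(0.25/0.25) + 0.25*log2(0.25/0.5) = 0.5 - 0.25 = 0.25 bits, and the divergence vanishes exactly when Bob's prior matches the true distribution.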