From Wikipedia:
Another interpretation of the KL divergence is this: suppose a number X is about to be drawn randomly from a discrete set with probability distribution p(x).
If Alice knows the true distribution p(x), while Bob believes (has a prior) that the distribution is q(x), then Bob will be more surprised than Alice, on average, upon seeing the value of X.
The KL divergence is the (objective) expected value of Bob's (subjective) surprisal minus Alice's surprisal, measured in bits if the log is taken in base 2.
In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it is expected to make him.
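As a concrete illustration of this reading, here is a minimal sketch (not from the original text; the function name kl_divergence_bits and the distributions p and q are made-up for the example). It computes Bob's expected surprisal minus Alice's expected surprisal under the true distribution p, using base-2 logarithms so the result comes out in bits, which equals the usual sum form of the KL divergence.

import math

def kl_divergence_bits(p, q):
    # D(p || q) in bits: the average, under the true distribution p,
    # of Bob's surprisal -log2 q(x) minus Alice's surprisal -log2 p(x).
    return sum(
        px * ((-math.log2(qx)) - (-math.log2(px)))
        for px, qx in zip(p, q)
        if px > 0  # outcomes with p(x) = 0 contribute nothing
    )

# Hypothetical distributions over a three-element set.
p = [0.5, 0.25, 0.25]   # true distribution, known to Alice
q = [0.25, 0.25, 0.5]   # Bob's prior belief

print(kl_divergence_bits(p, q))   # 0.25 bits of "unnecessary surprise"
print(kl_divergence_bits(p, p))   # 0.0: no extra surprise when the prior is right

For these made-up values the result is 0.5*log2(0.5/0.25) + 0.25*log2(0.25/0.25) + 0.25*log2(0.25/0.5) = 0.5 - 0.25 = 0.25 bits, and the divergence vanishes exactly when Bob's prior matches the true distribution.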