The "true" distribution is usually expressed as a one-hot distribution.
Suppose that the true label of an instance is B. The one-hot distribution for this instance is:
Pr(Class A)   Pr(Class B)   Pr(Class C)
0.0           1.0           0.0
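As a rough illustration, a one-hot vector like the one above can be built from a label with NumPy. The class order `["A", "B", "C"]` and the variable names are assumptions made for this sketch only.

```python
import numpy as np

# Assumed class order; the position of each class fixes its index.
classes = ["A", "B", "C"]
true_label = "B"

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(len(classes))[classes.index(true_label)]
print(one_hot)
```

The resulting vector has a 1 in the position of the true class and 0 everywhere else.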
Suppose a machine learning algorithm predicts the following probability distribution:
Pr(Class A)   Pr(Class B)   Pr(Class C)
0.228         0.619         0.153
The cross-entropy loss for this prediction is approximately 0.480:
H = -(0.0*ln(0.228) + 1.0*ln(0.619) + 0.0*ln(0.153)) ≈ 0.480
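For reference, the same calculation can be written in general form. Here p denotes the true (one-hot) distribution and q the predicted one; the index set {A, B, C} is just the three classes of this example. Because p is one-hot, every term except the one for the true class vanishes, so the loss reduces to the negative log of the probability assigned to Class B.

```latex
\begin{aligned}
H(p, q) &= -\sum_{i \in \{A, B, C\}} p_i \ln q_i \\
        &= -\left(0 \cdot \ln 0.228 + 1 \cdot \ln 0.619 + 0 \cdot \ln 0.153\right) \\
        &= -\ln 0.619 \approx 0.4797
\end{aligned}
```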
Python code
import numpy as np
p = np.array([0, 1, 0]) # True probability (one-hot)
q = np.array([0.228, 0.619, 0.153]) # Predicted probability
cross_entropy_loss = -np.sum(p * np.log(q))
print(cross_entropy_loss)
# 0.47965000629754095
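One practical caveat: if the predicted probability for some class is exactly 0, np.log returns -inf and the sum can come out as inf or nan. A minimal sketch of one common workaround, assuming a small clipping constant `eps` (the function name `cross_entropy` and the value `1e-12` are illustrative choices, not part of the original answer):

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy of q relative to p, computed along the last axis."""
    p = np.asarray(p, dtype=float)
    # Clip predictions away from 0 so that log() stays finite.
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return -np.sum(p * np.log(q), axis=-1)

# Two instances: the one from the example, and one where the model
# assigns probability 0 to the true class.
p = np.array([[0, 1, 0], [1, 0, 0]])
q = np.array([[0.228, 0.619, 0.153], [0.0, 0.5, 0.5]])

losses = cross_entropy(p, q)
print(losses)
```

The first entry matches the value computed above; the second is large but finite instead of inf.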
References
This content was originally shared by stackoverflowuser2010 on Stack Overflow.