$$ H(p, q) = - \sum_{c \in C} p(c)\log q(c) $$

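Here $p$ is the "true" distribution and $q$ is the predicted distribution over the classes $C$. For classification, $p$ is usually a one-hot distribution: probability 1 for the correct class and 0 for every other class.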

Suppose that the true label of an instance is B. The one-hot distribution for this instance is:

Pr(Class A)     Pr(Class B)     Pr(Class C)
    0.0             1.0             0.0

Suppose a machine learning algorithm predicts the following probability distribution:

Pr(Class A)     Pr(Class B)     Pr(Class C)
    0.228           0.619           0.153

The cross-entropy loss for this instance, computed with the natural logarithm, is approximately 0.480:

H = - (0.0*ln(0.228) + 1.0*ln(0.619) + 0.0*ln(0.153)) ≈ 0.480
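
Because $p$ is one-hot, every term except the one for the true class is multiplied by zero, so the sum collapses to a single term. Writing $c^*$ for the true class (a symbol introduced here for illustration, not part of the original formula):

$$ H(p, q) = -\log q(c^*) = -\ln(0.619) \approx 0.4797 $$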

Python code

import numpy as np

p = np.array([0, 1, 0])             # True probability (one-hot)
q = np.array([0.228, 0.619, 0.153]) # Predicted probability

cross_entropy_loss = -np.sum(p * np.log(q))
print(cross_entropy_loss)
# 0.47965000629754095
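
When the target is one-hot, the loss depends only on the probability assigned to the true class, so it can be computed from the class index alone. The following is a minimal sketch under that assumption; the helper name `cross_entropy_from_label` and the `eps` clipping value are hypothetical choices made for illustration. The clipping is there because `p * np.log(q)` evaluates `0 * -inf` to `nan` whenever a predicted probability is exactly zero.

```python
import numpy as np

def cross_entropy_from_label(q, true_index, eps=1e-12):
    """Cross-entropy for a one-hot target: -ln(q[true_index])."""
    q = np.asarray(q, dtype=float)
    # Clip to [eps, 1] so a predicted probability of exactly 0
    # gives a large finite loss instead of inf or nan.
    return -np.log(np.clip(q[true_index], eps, 1.0))

q = np.array([0.228, 0.619, 0.153])  # Predicted probability

loss_b = cross_entropy_from_label(q, 1)  # true class is B (index 1)
loss_a = cross_entropy_from_label(q, 0)  # what the loss would be if A were true

print(round(loss_b, 4))
# 0.4797
```

The loss grows as the probability assigned to the true class shrinks, so `loss_a` (true class predicted at 0.228) is larger than `loss_b` (true class predicted at 0.619).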

References

This content was originally shared by stackoverflowuser2010 on Stack Overflow.
