This is a brief derivation showing that maximum likelihood estimation approximately minimizes the Kullback-Leibler (KL) distance between the true data distribution and the model.

Suppose the observed data x are drawn from a probability distribution P(x). Our model of this distribution is Q(x|θ), where θ is the model parameter. The Kullback-Leibler distance between the true distribution and the model is

$$
K(\theta) = \sum_x P(x) \log \frac{P(x)}{Q(x|\theta)}. \tag{1}
$$
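As a quick numerical illustration (a minimal sketch with a made-up P and a hypothetical one-parameter model Q, not part of the original derivation), the sum in Equation (1) can be evaluated directly:

```python
import numpy as np

# Toy illustration: a true distribution P over three outcomes and a
# hypothetical one-parameter model Q(x|theta).
P = np.array([0.5, 0.3, 0.2])                     # true distribution P(x)

def Q(theta):
    """Model Q(x|theta): softmax of theta * x over outcomes x = 0, 1, 2."""
    logits = theta * np.arange(3)
    w = np.exp(logits - logits.max())
    return w / w.sum()

def K(theta):
    """KL distance K(theta) = sum_x P(x) * log( P(x) / Q(x|theta) )."""
    return np.sum(P * np.log(P / Q(theta)))

print(K(0.0))    # distance to the uniform model (theta = 0)
print(K(-0.9))   # a theta that puts more mass on the likelier outcomes
```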

Estimation of θ is obtained by minimizing K(θ):

$$
\hat{\theta} = \arg\min_{\theta} K(\theta) = \arg\max_{\theta} \sum_x P(x) \log Q(x|\theta). \tag{2}
$$

The last step follows by writing $K(\theta) = \sum_x P(x)\log P(x) - \sum_x P(x)\log Q(x|\theta)$: the $\sum_x P(x)\log P(x)$ term does not depend on θ and is dropped, and minimizing the remaining negative term is the same as maximizing $\sum_x P(x)\log Q(x|\theta)$.
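A small sketch of this step, reusing the toy P and hypothetical Q from above: on a grid of θ values, the minimizer of K(θ) and the maximizer of $\sum_x P(x)\log Q(x|\theta)$ coincide, since the two objectives differ only by a constant.

```python
import numpy as np

P = np.array([0.5, 0.3, 0.2])                     # toy true distribution

def Q(theta):
    """Hypothetical model Q(x|theta): softmax of theta * x over x = 0, 1, 2."""
    logits = theta * np.arange(3)
    w = np.exp(logits - logits.max())
    return w / w.sum()

thetas = np.linspace(-3.0, 3.0, 601)
kl    = np.array([np.sum(P * np.log(P / Q(t))) for t in thetas])  # K(theta)
cross = np.array([np.sum(P * np.log(Q(t))) for t in thetas])      # sum_x P(x) log Q(x|theta)

# Both rules pick the same theta: the objectives differ only by the
# constant sum_x P(x) log P(x).
print(thetas[np.argmin(kl)], thetas[np.argmax(cross)])
```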

Finally, using the Law of Large Numbers to approximate the expectation under P(x) by the average over the observed data, Equation (2) becomes

$$
\begin{aligned}
\hat{\theta} &= \arg\max_{\theta} \sum_x P(x) \log Q(x|\theta) \\
&\approx \arg\max_{\theta} \frac{1}{N} \sum_{i=1}^{N} \log Q(x_i|\theta) \\
&= \arg\max_{\theta} \log \prod_{i=1}^{N} Q(x_i|\theta) \\
&= \arg\max_{\theta} \prod_{i=1}^{N} Q(x_i|\theta) \\
&= \arg\max_{\theta} L(x_1, x_2, \ldots, x_N|\theta),
\end{aligned}
$$

where N is the number of observed data points and $L(x_1, x_2, \ldots, x_N|\theta) = \prod_{i=1}^{N} Q(x_i|\theta)$ is the likelihood; the constant factor 1/N and the monotone log drop out because neither changes the maximizer. This derivation shows that, by approximating the true distribution with the observed sample data, maximum likelihood estimation is equivalent to minimizing the KL distance.
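To close the loop, here is a hedged numerical check of the approximation, again with the toy P and hypothetical Q used above: as N grows, the θ that maximizes the sample-average log likelihood approaches the θ that minimizes K(θ).

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([0.5, 0.3, 0.2])                     # toy true distribution over x in {0, 1, 2}

def Q(theta):
    """Hypothetical model Q(x|theta): softmax of theta * x over x = 0, 1, 2."""
    logits = theta * np.arange(3)
    w = np.exp(logits - logits.max())
    return w / w.sum()

thetas = np.linspace(-3.0, 3.0, 601)

# theta that minimizes the KL distance K(theta) -- the target of the derivation
kl_opt = thetas[np.argmin([np.sum(P * np.log(P / Q(t))) for t in thetas])]

for N in (100, 10_000, 1_000_000):
    x = rng.choice(3, size=N, p=P)                # observed data x_1, ..., x_N ~ P
    counts = np.bincount(x, minlength=3)
    # MLE: maximize (1/N) sum_i log Q(x_i|theta) = (counts . log Q(theta)) / N
    mle = thetas[np.argmax([counts @ np.log(Q(t)) for t in thetas])]
    print(N, mle, kl_opt)                         # MLE approaches the KL minimizer as N grows
```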