Maximum Likelihood Estimation Is an Approximation to Minimizing the KL Distance
This is a brief derivation showing that maximum likelihood estimation approximates the minimization of the Kullback-Leibler (KL) distance.
Suppose the observed data x come from a probability distribution P(x). Our model of this distribution is Q(x|θ), where θ is the model parameter. The Kullback-Leibler distance from the true distribution to the model is
$$K(\theta) = \int P(x)\,\log\frac{P(x)}{Q(x \mid \theta)}\,dx. \tag{1}$$

Estimation of θ is obtained by minimizing K(θ):
$$\hat{\theta} = \arg\min_{\theta} K(\theta) = \arg\max_{\theta} \int P(x)\,\log Q(x \mid \theta)\,dx. \tag{2}$$

The last step holds because P(x) does not depend on θ, so the $\int P(x)\,\log P(x)\,dx$ term is a constant in θ and can be dropped.
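To make Equations (1) and (2) concrete, here is a minimal numerical sketch, assuming a Bernoulli(0.3) true distribution P and a Bernoulli(θ) model Q; the parameter value and the grid of θ values are arbitrary illustrative choices. It checks that the θ minimizing K(θ) coincides with the θ maximizing the expected log-likelihood, and that both recover the true parameter.

```python
import numpy as np

# Illustrative choices (not from the original text): true distribution P is
# Bernoulli(0.3); the model Q(x | theta) is Bernoulli(theta).
p_true = 0.3
P = np.array([1 - p_true, p_true])      # P(x) for x in {0, 1}

thetas = np.linspace(0.01, 0.99, 981)   # grid of candidate theta values

def kl(theta):
    """K(theta) = sum_x P(x) * log(P(x) / Q(x | theta)) for the Bernoulli model."""
    Q = np.array([1 - theta, theta])
    return np.sum(P * np.log(P / Q))

K = np.array([kl(t) for t in thetas])
print("arg min K(theta) ~", thetas[np.argmin(K)])        # ~ 0.3, the true parameter

# Equivalently, maximizing E_P[log Q(x | theta)] gives the same theta,
# because the P(x) log P(x) term in K(theta) is a constant in theta.
expected_loglik = np.array([np.sum(P * np.log([1 - t, t])) for t in thetas])
print("arg max E_P[log Q] ~", thetas[np.argmax(expected_loglik)])
```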
Finally, using the Law of Large Numbers to approximate the expectation under P(x) with a sample average over the observed data, Equation (2) becomes
$$\begin{aligned}
\hat{\theta} &= \arg\max_{\theta} \int P(x)\,\log Q(x \mid \theta)\,dx
\approx \arg\max_{\theta} \frac{1}{N}\sum_{i=1}^{N} \log Q(x_i \mid \theta) \\
&= \arg\max_{\theta} \log \prod_{i=1}^{N} Q(x_i \mid \theta)
= \arg\max_{\theta} \prod_{i=1}^{N} Q(x_i \mid \theta)
= \arg\max_{\theta} L(x_1, x_2, \ldots, x_N \mid \theta),
\end{aligned}$$

where N is the number of observed data points and $L(x_1, x_2, \ldots, x_N \mid \theta)$ is the likelihood; the factor 1/N and the logarithm can be dropped because neither changes the maximizer. This derivation shows that, by approximating the true distribution with the observed sample data, maximum likelihood estimation is approximately equivalent to minimizing the KL distance.
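The same Bernoulli setup gives a sketch of the Law of Large Numbers step, under the same assumed true parameter and an arbitrary sample size: the sample-average log-likelihood stands in for the expectation in Equation (2), so its maximizer (the maximum likelihood estimate) should land close to the exact KL minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the original text): data drawn from a
# Bernoulli(0.3) "true" distribution P; model Q(x | theta) is Bernoulli(theta).
p_true = 0.3
N = 10_000
x = rng.binomial(1, p_true, size=N)     # observed data x_1, ..., x_N

thetas = np.linspace(0.01, 0.99, 981)

# Sample-average log-likelihood (1/N) * sum_i log Q(x_i | theta), which by the
# Law of Large Numbers approximates E_P[log Q(x | theta)].
avg_loglik = np.array([np.mean(np.log(np.where(x == 1, t, 1 - t))) for t in thetas])
theta_mle = thetas[np.argmax(avg_loglik)]

# Exact KL minimizer, computable here only because the true P is known.
P = np.array([1 - p_true, p_true])
kl = np.array([np.sum(P * np.log(P / np.array([1 - t, t]))) for t in thetas])
theta_kl = thetas[np.argmin(kl)]

print("MLE:", theta_mle, " KL minimizer:", theta_kl)    # both close to 0.3
```

As N grows, the maximizer of the sample-average log-likelihood converges to the maximizer of the expected log-likelihood, which is exactly the KL minimizer from Equation (2).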