Conditional Distribution of Multivariate Gaussian Variables: A Simple Derivation
We present a straightforward derivation for calculating the conditional probability distribution of multivariate Gaussian variables.
Let’s consider a multivariate Gaussian random variable $\mathbf{y}$ with a mean of zero, denoted as:
\[\mathbf{y} \sim \mathcal{N}(0, \Sigma), \notag\]where $\Sigma$ is the covariance matrix of $\mathbf{y}$.
We partition $\mathbf{y}$ into two parts, $\mathbf{y}_1$ and $\mathbf{y}_2$:
\[\mathbf{y}=\left[ \begin{array}{c} \mathbf{y}_1\\ \mathbf{y}_2 \end{array} \right]. \notag\]Our goal is to find the distribution of $\mathbf{y}_2$ conditioned on $\mathbf{y}_1$.
Since $\Sigma$ is symmetric and positive definite, we can perform a Cholesky decomposition to express it as the product of a lower triangular matrix $L$ and its transpose:
\[\Sigma = L L^T. \notag\]We can further express $L$ in block form:
\[L = \left[ \begin{array}{cc} A & 0 \\ C & D \end{array} \right], \notag\]where $A$ and $D$ are square lower triangular matrices. The dimensions of $A$, $C$, and $D$ are $n\times n$, $m\times n$, and $m\times m$ respectively, where $n$ and $m$ are the lengths of $\mathbf{y}_1$ and $\mathbf{y}_2$.
Using this decomposition, we can represent $\mathbf{y}$ as a linear transformation of independent, normally distributed variables:
\[\mathbf{y} = L \mathbf{u}, \label{eqn:transform}\]where $\mathbf{u} \sim \mathcal{N}(0, \mathbf{I})$ and $\mathbf{I}$ denotes the identity matrix.
Expanding this equation, we have:
\[\left[ \begin{array}{c} \mathbf{y}_1\\ \mathbf{y}_2 \end{array} \right]= \left[ \begin{array}{cc} A & 0 \\ C & D \end{array} \right] \left[ \begin{array}{c} \mathbf{u}_1\\ \mathbf{u}_2 \end{array} \right],\]where the lengths of $\mathbf{u}_1$ and $\mathbf{u}_2$ are $n$ and $m$, respectively.
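To make the transform concrete, here is a minimal NumPy sketch, with an arbitrary positive-definite $\Sigma$ and illustrative sizes $n$, $m$ (none of these choices come from the derivation itself):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 3                       # lengths of y1 and y2 (arbitrary)

# Build a random symmetric positive-definite covariance matrix.
M = rng.standard_normal((n + m, n + m))
Sigma = M @ M.T + (n + m) * np.eye(n + m)

L = np.linalg.cholesky(Sigma)     # lower triangular, Sigma = L @ L.T
A = L[:n, :n]                     # n x n, lower triangular
C = L[n:, :n]                     # m x n
D = L[n:, n:]                     # m x m, lower triangular

u = rng.standard_normal(n + m)    # u ~ N(0, I)
y = L @ u                         # y ~ N(0, Sigma)
y1, y2 = y[:n], y[n:]
```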
Isolating $\mathbf{y}_2$, we find: \(\mathbf{y}_2 = C \mathbf{u}_1 + D\mathbf{u}_2.\)
When we condition on $\mathbf{y}_1$, the first block row fixes $\mathbf{u}_1$: since $A$ is invertible,
\[\mathbf{y}_1 = A \mathbf{u}_1 \quad\Rightarrow\quad \mathbf{u}_1 = A^{-1}\mathbf{y}_1.\]Since $\mathbf{u}_2 \sim \mathcal{N}(0, \mathbf{I})$ is independent of $\mathbf{u}_1$, this implies that $\mathbf{y}_2$ conditioned on $\mathbf{y}_1$ follows a multivariate Gaussian distribution:
\[\mathbf{y}_2 \mid \mathbf{y}_1 \sim \mathcal{N}(C\mathbf{u}_1, DD^T).\]To proceed, let’s calculate $C\mathbf{u}_1$ and $D D^T$.
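As an aside, this conditional form can already be used for sampling. Continuing the NumPy sketch above, we recover $\mathbf{u}_1$ with a solve and add $D\mathbf{u}_2$ for a fresh draw (variable names are our own):

```python
# Continuing the sketch: given the observed y1, recover u1 and
# draw one sample from y2 | y1 ~ N(C u1, D D^T).
u1 = np.linalg.solve(A, y1)     # u1 = A^{-1} y1
cond_mean = C @ u1              # conditional mean of y2
cond_cov = D @ D.T              # conditional covariance of y2

u2 = rng.standard_normal(m)     # fresh u2 ~ N(0, I)
y2_draw = cond_mean + D @ u2    # one conditional sample
```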
First we express the covariance matrix in block form:
\[\Sigma =\left[ \begin{array}{cc} \Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22} \end{array} \right], \notag\]where $\Sigma_{ij}=\mathrm{cov}(\mathbf{y}_i, \mathbf{y}_j)$ and $i, j=1,2$. Then we can rewrite $\Sigma$ as:
\[\Sigma = L L^T =\left[ \begin{array}{cc} A A^T & A C^T\\ C A^T & C C^T + D D^T \end{array} \right], \label{eqn:cov_matrix}\]where $L$ is the matrix given earlier. From the equation above, we can observe that:
\[\Sigma_{21} = C A^T \quad\Rightarrow\quad C = \Sigma_{21} (A^T)^{-1}. \notag\]Now, let’s calculate $C\mathbf{u}_1$:
\[C\mathbf{u}_1 = \Sigma_{21} (A^T)^{-1} A^{-1}\mathbf{y}_1 = \Sigma_{21}(A A^T)^{-1} \mathbf{y}_1 = \Sigma_{21}\Sigma_{11}^{-1} \mathbf{y}_1,\]where we have used $(A^T)^{-1} A^{-1} = (A A^T)^{-1}$ and, from the block expression of $\Sigma$ above, $\Sigma_{11} = A A^T$.
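Continuing the sketch, this identity is easy to verify numerically:

```python
# Check C u1 == Sigma21 Sigma11^{-1} y1 on the sketch's Sigma.
Sigma11 = Sigma[:n, :n]
Sigma21 = Sigma[n:, :n]
assert np.allclose(C @ u1, Sigma21 @ np.linalg.solve(Sigma11, y1))
```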
Next, we calculate $C C^T$, using $C^T = A^{-1}\Sigma_{12}$:
\[C C^T = \Sigma_{21}(A^T)^{-1} A^{-1}\Sigma_{12}=\Sigma_{21}(A A^T)^{-1}\Sigma_{12}=\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}.\]Using this result together with $\Sigma_{22} = C C^T + D D^T$, we can find $D D^T$:
\[D D^T = \Sigma_{22} - C C^T = \Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}.\]Finally, when the mean $\boldsymbol{\mu}$ is non-zero:
\[\boldsymbol{\mu}=\left[ \begin{array}{c} \boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2 \end{array} \right],\notag\]we apply the same argument to the centered variable $\mathbf{y}-\boldsymbol{\mu}$ and obtain: \[\mathbf{y}_2 \mid \mathbf{y}_1 \sim \mathcal{N}\left(\boldsymbol{\mu}_2+\Sigma_{21}\Sigma_{11}^{-1} (\mathbf{y}_1-\boldsymbol{\mu}_1),\; \Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\right).\]
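Putting everything together, here is a self-contained NumPy sketch of the final formula (the helper name conditional_gaussian and the argument layout are illustrative choices, not from the derivation):

```python
import numpy as np

def conditional_gaussian(mu, Sigma, y1):
    """Mean and covariance of y2 | y1 for y ~ N(mu, Sigma),
    where y1 holds the first n = len(y1) components of y."""
    n = len(y1)
    mu1, mu2 = mu[:n], mu[n:]
    S11, S12 = Sigma[:n, :n], Sigma[:n, n:]
    S21, S22 = Sigma[n:, :n], Sigma[n:, n:]
    K = S21 @ np.linalg.inv(S11)   # Sigma21 Sigma11^{-1}
    mean = mu2 + K @ (y1 - mu1)    # mu2 + Sigma21 Sigma11^{-1} (y1 - mu1)
    cov = S22 - K @ S12            # Schur complement of Sigma11
    return mean, cov
```

In numerical code one would typically replace the explicit inverse with a linear solve against $\Sigma_{11}$; the inverse is kept here only to mirror the formula.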