Factorization of a Multivariate Normal Distribution

PUBLISHED ON DEC 10, 2019 — CATEGORIES: proofs

The Factorization Theorem is a very useful mathematical tool and crucial to statistics, probabilistic calculus and many applications. Formally, it is closely related to the idea that $P(x, y) = P(x | y) P(y)$. One of the many nice properties of the normal distribution in this context is that it factorizes into normal distributions. The above link gives the proof for cases with unit covariance matrix. Here we will cover the general case for any covariance matrix, and provide an explicit expression for the factorization via direct proof. As a bonus, the math required for it is a very nice piece of linear algebra, so let’s get to it!

Definitions



Multivariate normal PDF

The probability density function (PDF) for the multivariate normal distribution is defined as:

$ \mathcal{N}(x; \mu, S) = \frac{1}{\sqrt{(2\pi)^D |S|}} exp(-\frac{1}{2}(x-\mu)^T S^{-1} (x-\mu)) $

For $x,\, \mu \, \in \mathbb{R}^D, \quad S \, \in \mathbb{R}^{D\times D}$

Where $S$ is the (symmetric, positive semidefinite) covariance matrix:

$ \begin{pmatrix} \sigma_{1}^2 & Cov(X_1, X_2) & \dots \\\
Cov(X_1, X_2) & \sigma_{2}^2 & \dots \\\
\vdots & \vdots & \ddots \\\
\end{pmatrix} $

And $|S|$ is the determinant of $S$.



Schur Complement

The Schur Complement establishes the relation between the partition of a matrix and the partition of its inverse, as follows:

$$ \begin{eqnarray*} S &&= \begin{pmatrix} S_{11} & S_{12} \\\
S_{21} & S_{22} \end{pmatrix}\\\
S^{-1} &&= \Delta = \begin{pmatrix} \Delta_{11} & \Delta_{12} \\\
\Delta_{21} & \Delta_{22} \end{pmatrix}\\\
&&= \begin{pmatrix} (\frac{S}{S_{22}})^{-1} & -(\frac{S}{S_{22}})^{-1} S_{12} S_{22}^{-1} \\\
-S_{22}^{-1} S_{21} (\frac{S}{S_{22}})^{-1} & (\frac{S}{S_{11}})^{-1} \end{pmatrix} \end{eqnarray*} $$

Where the complement terms are defined as:

$$ \frac{S}{S_{22}} = S_{11} - S_{12} S_{22}^{-1}\\\
\frac{S}{S_{11}} = S_{22} - S_{21} S_{11}^{-1}\\\
S_{ab}, \Delta_{ab} \; \in \mathbb{R}^{d_a \times d_b} $$

And the following implication holds:

$$ det(S) = |S| = |S_{22}| \cdot |\frac{S}{S_{22}}| $$

Also note that the Schur Complement can be applied both ways, since the inverse of a positive definite matrix is also positive definite, so $S_{22} = (\frac{\Delta}{\Delta_{11}})^{-1}$.

Theorem

With the given definitions, we already can express the goal of our proof:

Factorization of a multivariate $\mathcal{N}$ distribution

$$ \begin{eqnarray*} \mathcal{N}(x; \; \mu, S) = &&\mathcal{N}(x_2; \; \mu_2, S_{22}) \; \cdot \\\
&&\mathcal{N}(x_1, \; \mu_1 - \frac{S}{S_{22}} \Delta_{12} (x_2-\mu_2), \; \frac{S}{S_{22}}) \end{eqnarray*} $$

$$ x_1, \mu_1 \, \in \mathbb{R}^{d_1}, \quad x_2, \mu_2 \, \in \mathbb{R}^{d_2} $$

Proof

The Schur Complement, together with completing the square, allows for an elegant direct proof.

First, split the determinant in the normalization term, reparametrize the exp argument and split it into the 2 input partitions corresponding to $x_1$ and $x_2$:

$$ \begin{eqnarray*} &&\mathcal{N}(x; \mu, S) = \frac{1}{\sqrt{(2\pi)^D |S|}} exp(-\frac{1}{2}(x-\mu)^T S^{-1} (x-\mu))\\\
&&= \frac{1}{\sqrt{(2\pi)^{d_2} |S_{22}|}}\frac{1}{\sqrt{(2\pi)^{d_1} |\frac{S}{S_{22}}|}} exp(-\frac{1}{2} z^T \Delta z)\\\
z^T \Delta z &&= z_1^T \Delta_{11} z_1 + 2 z_1^T \Delta_{12} z_2 + z_2^T \Delta_{22} z_2\\\
z &&:= (x-\mu) \end{eqnarray*} $$

Now complete the square in the exponential term in order to factor out the second partition. This can be done by finding $M,m,c$ so that the following holds:

$$ \begin{eqnarray*} z^T \Delta z &&= (z_1-m)^T M (z_1-m) + c\\\
&&= z_1^T M z_1 - 2 z_1^T M m + m^T M m + c\\\
\end{eqnarray*} $$

We first find $M$, then $m$, then $c$:

$$ \begin{eqnarray*} z_1^T M z_1 \hat{=} z_1^T \Delta_{11} z_1 \iff \, &&M \hat{=} \Delta_{11} = (\frac{S}{S_{22}})^{-1}\\\
-2 z_1^T M m \hat{=} 2 z_1^T \Delta_{12} z_2 \iff \, &&\Delta_{11} m \hat{=} -\Delta_{12} z_2\\\
\iff \, &&m \hat{=} - \Delta_{11}^{-1} \Delta_{12} z_2 =\\\
&&- \frac{S}{S_{22}} \Delta_{12} z_2\\\
z_2^T \Delta_{22} z_2 \hat{=} m^T M m + c \iff \, &&c \hat{=} z_2^T \Delta_{22} z_2 - m^T \Delta_{11} m\\\
\iff \, c \hat{=} z_2^T &&(\Delta_{22} - \Delta_{21} \Delta_{11}^{-1} \Delta_{12}) z_2\\\
\iff \, &&c \hat{=} z_2^T S_{22}^{-1} z_2 \end{eqnarray*} $$

And now we can rewrite the exponential as the following equivalent expression:

$$ \begin{eqnarray*} z^T \Delta z = &&(z_1 + \frac{S}{S_{22}} \Delta_{12} z_2)^T (\frac{S}{S_{22}})^{-1} (z_1 + \frac{S}{S_{22}} \Delta_{12} z_2)\\\
&&+ z_2^T S_{22}^{-1} z_2\\\
= &&(x_1 - (\mu_1 - \frac{S}{S_{22}} \Delta_{12} (x_2-\mu_2)))^T (\frac{S}{S_{22}})^{-1} \, \cdot\\\
&&(x_1 - (\mu_1 - \frac{S}{S_{22}} \Delta_{12} (x_2-\mu_2)))\\\
&&+ (x_2-\mu_2)^T S_{22}^{-1} (x_2-\mu_2)\\\
= &&(x_1 - \tilde{\mu_1})^T (\frac{S}{S_{22}})^{-1} (x_1 - \tilde{\mu_1})\\\
&&+ (x_2-\mu_2)^T S_{22}^{-1} (x_2-\mu_2)\\\
\tilde{\mu_1} := && \mu_1 - \frac{S}{S_{22}} \Delta_{12} (x_2-\mu_2) \end{eqnarray*} $$

Which allows to finish the proof:

$$ \begin{eqnarray*} &&\mathcal{N}(x; \mu, S)\\\
= &&\frac{1}{\sqrt{(2\pi)^D |S|}} exp(-\frac{1}{2}(x-\mu)^T S^{-1} (x-\mu))\\\
= &&\frac{1}{\sqrt{(2\pi)^{d_2} |S_{22}|}}\frac{1}{\sqrt{(2\pi)^{d_1} |\frac{S}{S_{22}}|}} exp(-\frac{1}{2} z^T \Delta z)\\\
= &&\frac{1}{\sqrt{(2\pi)^{d_2} |S_{22}|}}\frac{1}{\sqrt{(2\pi)^{d_1} |\frac{S}{S_{22}}|}} \, \cdot\\\
&&exp(x_1 - \tilde{\mu_1})^T (\frac{S}{S_{22}})^{-1} (x_1 - \tilde{\mu_1}) \, \cdot\\\
&&exp(-\frac{1}{2} (x_2-\mu_2)^T S_{22}^{-1} (x_2-\mu_2))\\\
= && \mathcal{N}(x_2; \, \mu_2, S_{22}) \, \cdot \, \mathcal{N}(x_1, \; \mu_1 - \frac{S}{S_{22}} \Delta_{12} (x_2-\mu_2), \, \frac{S}{S_{22}})\\\
&&\square \end{eqnarray*} $$