$$ \hat{\boldsymbol\mu}_{\mathrm{mle}}=\frac{1}{N}\sum^N_{i=1}\mathbf x_i\triangleq \bar{\mathbf x} $$
$$ \hat{\boldsymbol\Sigma}_{\mathrm{mle}}=\frac{1}{N}\sum^N_{i=1}(\mathbf x_i-\bar{\mathbf x})(\mathbf x_i-\bar{\mathbf x})^\top=\frac{1}{N}\left(\sum^N_{i=1}\mathbf x_i \mathbf x_i^\top\right)-\bar{\mathbf x}\bar{\mathbf x}^\top $$
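To make the estimators concrete, here is a minimal NumPy sketch (the array name `X` and the helper `gaussian_mle` are assumptions for illustration) that computes $\bar{\mathbf x}$ and the $1/N$ covariance estimate:

```python
import numpy as np

def gaussian_mle(X):
    """Return the MLE mean (x-bar) and the MLE covariance (divides by N, not N-1)."""
    N = X.shape[0]
    mu_hat = X.mean(axis=0)                # (1/N) * sum_i x_i
    centered = X - mu_hat
    Sigma_hat = centered.T @ centered / N  # (1/N) * sum_i (x_i - x-bar)(x_i - x-bar)^T
    return mu_hat, Sigma_hat

# Example usage with simulated data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
mu_hat, Sigma_hat = gaussian_mle(X)
# np.cov(X, rowvar=False, bias=True) gives the same 1/N estimator.
```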
Each base distribution in the mixture is a multivariate Gaussian with mean $\mu_k$ and covariance matrix $\Sigma_k$.
Let $z_i \in \{1,\dots,K\}$ be a discrete latent state. We place a categorical prior on it, $p(z_i)=\mathrm{Cat}(\boldsymbol\pi)$. For the likelihood, we use $p(\mathbf x_i|z_i=k)=p_k(\mathbf x_i)$, where $p_k$ is the $k$'th base distribution for the observations; this can be of any type.
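A minimal sketch of this generative story via ancestral sampling, with Gaussian base distributions and hypothetical parameter names `pi`, `mus`, `Sigmas`: first draw the latent state, then draw the observation from the selected component.

```python
import numpy as np

def sample_mixture(pi, mus, Sigmas, n, rng=None):
    """pi: (K,) mixing weights; mus: (K, D) means; Sigmas: (K, D, D) covariances."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.choice(len(pi), size=n, p=pi)  # z_i ~ Cat(pi): discrete latent states
    x = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
    return x, z
```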
$$ p(\mathbf x_i|\boldsymbol\theta)=\sum^K_{k=1}\pi_k\,\mathcal N(\mathbf x_i|\boldsymbol\mu_k, \boldsymbol\Sigma_k) \quad \text{s.t. } \sum^K_{k=1}\pi_k=1 $$
This is a convex combination of the $p_k$'s: a weighted sum in which the mixing weights $\pi_k$ satisfy $0\le\pi_k\le1$ and $\sum_{k=1}^K\pi_k=1$.
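As a sketch of evaluating this density (parameter names `pi`, `mus`, `Sigmas` are assumptions for illustration), the mixture pdf is just the weighted sum of Gaussian pdfs:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, pi, mus, Sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi_k * multivariate_normal.pdf(x, mean=mu_k, cov=Sigma_k)
               for pi_k, mu_k, Sigma_k in zip(pi, mus, Sigmas))

# Example usage with K = 2 components in D = 2 dimensions
pi = np.array([0.3, 0.7])                      # mixing weights, sum to 1
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]
print(gmm_pdf(np.array([1.0, 1.0]), pi, mus, Sigmas))
```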
$$ p(\mathbf x|\mathcal D)=\frac{1}{N}\sum^N_{i=1} \mathcal N (\mathbf x| \mathbf x_i, \sigma^2\mathbf I) $$
GMM → KDE
$K \rightarrow N$, $\pi_k=1/N$, $\boldsymbol\mu_i=\mathbf x_i$, and $\boldsymbol\Sigma_i=\sigma^2\mathbf I$, where the bandwidth $\sigma$ is a hyperparameter.
$$ \hat p(x)=\frac{1}{N}\sum^N_{i=1} \kappa_h(\mathbf x - \mathbf x_i) $$
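For the Gaussian-kernel special case $\kappa_h(\mathbf u)=\mathcal N(\mathbf u|\mathbf 0, h^2\mathbf I)$, a minimal NumPy sketch (the function name `kde_pdf` is an assumption) evaluates $\hat p$ at a single query point:

```python
import numpy as np

def kde_pdf(x, X, h):
    """Estimate p(x) = (1/N) * sum_i N(x | x_i, h^2 I) at a single query point x."""
    N, D = X.shape
    sq_dists = np.sum((X - x) ** 2, axis=1)          # ||x - x_i||^2 for every data point
    norm = (2 * np.pi * h ** 2) ** (-D / 2)          # Gaussian normalising constant
    return np.mean(norm * np.exp(-0.5 * sq_dists / h ** 2))
```

For production use, `scipy.stats.gaussian_kde` provides the same estimator with automatic bandwidth selection.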