Gaussian Mixture Model

Ashutosh Kumar
Gaussian Mixture Model (GMM)

A Gaussian Mixture Model is a probabilistic clustering method that assumes data points are generated from a mixture of multiple Gaussian distributions whose parameters are unknown.

Unlike K-Means, which does hard clustering (each point belongs to exactly one cluster), GMM performs soft clustering, meaning every point has a probability of belonging to each cluster.
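The difference is easy to see in code. Below is a minimal sketch using scikit-learn (assuming it is installed; the two-blob toy data is made up for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Toy 2-D data drawn from two overlapping blobs
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=1.0, size=(100, 2)),
    rng.normal(loc=[3, 3], scale=1.0, size=(100, 2)),
])

# K-Means: hard assignment, each point gets exactly one label
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(kmeans_labels[:5])          # e.g. [0 0 1 0 0]

# GMM: soft assignment, each point gets a probability per cluster
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba(X)[:5])   # each row sums to 1, e.g. [[0.98 0.02], ...]
```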

[Figure: Gaussian mixture model showing multiple clusters represented by normal distributions with different means and variances]

Working of GMM (By Dimri Sir)

Assume we have K Gaussian clusters.
Each cluster corresponds to a Gaussian distribution with its own mean and covariance.

For a data point $x_n$, the probability that it belongs to cluster $k$ is:

$$P(z_n = k \mid x_n) = \frac{\pi_k \,\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \,\mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$$

Where:

  • $z_n = k$ → latent variable indicating that point $x_n$ belongs to cluster $k$
  • $\pi_k$ → mixing coefficient (weight) of the $k$-th Gaussian
  • $\mathcal{N}(x_n \mid \mu_k, \Sigma_k)$ → Gaussian density with mean $\mu_k$ and covariance $\Sigma_k$

The overall likelihood of observing data point $x_n$ is:

$$p(x_n) = \sum_{j=1}^{K} \pi_j \,\mathcal{N}(x_n \mid \mu_j, \Sigma_j)$$
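Both formulas map directly onto a few lines of NumPy/SciPy. A small sketch (the parameter values for the two Gaussians are hypothetical, chosen only for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters for K = 2 Gaussians (illustrative values only)
pis    = np.array([0.6, 0.4])                           # mixing coefficients pi_k
mus    = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # means mu_k
sigmas = [np.eye(2), np.eye(2)]                         # covariances Sigma_k

x_n = np.array([1.0, 1.0])                              # one data point

# Weighted densities pi_k * N(x_n | mu_k, Sigma_k), one per cluster
weighted = np.array([
    pi * multivariate_normal.pdf(x_n, mean=mu, cov=sigma)
    for pi, mu, sigma in zip(pis, mus, sigmas)
])

p_xn = weighted.sum()               # overall likelihood p(x_n)
responsibilities = weighted / p_xn  # posterior P(z_n = k | x_n)

print(p_xn, responsibilities)       # responsibilities sum to 1
```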

Expectation–Maximization (EM) Algorithm

To fit a GMM to data, we use the EM algorithm, an iterative method that optimizes the parameters (means, covariances, and mixing coefficients).

1. E-Step (Expectation)

Calculate the responsibility of each cluster for every data point:

  • How likely is point $x_n$ to belong to cluster $k$?
  • Based on current estimates of mean, covariance, and mixing coefficients

2. M-Step (Maximization)

Update the parameters:

  • Update means $\mu_k$
  • Update covariances $\Sigma_k$
  • Update mixing coefficients $\pi_k$

These updated parameters maximize the likelihood of observing the data.

The process repeats until convergence.

Working of GMM (My Own)

1. Initialization

Choose:

  • Number of clusters
  • For each cluster:
    • Mean
    • Covariance
    • Weight

These values are initialized randomly or taken from a K-Means run.
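As a concrete example of this choice, scikit-learn's GaussianMixture exposes both strategies through its init_params argument; a minimal sketch, assuming scikit-learn:

```python
from sklearn.mixture import GaussianMixture

# Random initialization of weights, means, and covariances
gmm_random = GaussianMixture(n_components=3, init_params="random", random_state=0)

# Initialization from a K-Means run (scikit-learn's default)
gmm_kmeans = GaussianMixture(n_components=3, init_params="kmeans", random_state=0)
```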

2. E-Step (Expectation Step)

Compute responsibility for each data point:

👉 Probability that a data point belongs to each cluster.

Formula (conceptually):

  • If the point is close to cluster mean → high probability
  • If far → low probability

This assigns soft memberships.

3. M-Step (Maximization Step)

Update the parameters based on responsibilities:

  • New means = weighted average of points
  • New covariances = weighted spread
  • New weights = how much responsibility each cluster has

So clusters get reshaped according to data.

4. Repeat until convergence

Keep repeating:
E-Step → M-Step → E-Step → M-Step
until the parameters stop changing.

Output

GMM gives:

  • The final clusters
  • For each point → probability of belonging to each cluster
  • Cluster shapes (elliptical, not circular like K-Means)
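A short scikit-learn sketch (with made-up two-blob data) that surfaces all three outputs:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([
    rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=150),
    rng.multivariate_normal([4, 4], [[1.0, -0.4], [-0.4, 1.0]], size=150),
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

print(gmm.predict(X)[:5])        # final (hard) cluster labels
print(gmm.predict_proba(X)[:5])  # per-point probability of each cluster
print(gmm.means_)                # cluster centers
print(gmm.covariances_)          # full covariances -> elliptical cluster shapes
```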

EM Algorithm


1. Initialization

Randomly initialize the parameters for each of the Gaussians:

  • Means
  • Covariances
  • Mixing coefficients

2. E-Step (Expectation)

Calculate the responsibility $r_{ik}$ of cluster $k$ for data point $x_i$:

$$r_{ik} = \frac{\pi_k \,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$$

3. M-Step (Maximization)

Update the parameters.

$$\pi_k^{\text{new}} = \frac{1}{N}\sum_{i=1}^{N} r_{ik}$$

$$\mu_k^{\text{new}} = \frac{\sum_{i=1}^{N} r_{ik}\, x_i}{\sum_{i=1}^{N} r_{ik}}$$

$$\Sigma_k^{\text{new}} = \frac{\sum_{i=1}^{N} r_{ik}\,(x_i - \mu_k^{\text{new}})(x_i - \mu_k^{\text{new}})^T}{\sum_{i=1}^{N} r_{ik}}$$

4. Convergence

Repeat E-step and M-step until:

  • Parameters stop changing significantly, or
  • Likelihood converges.
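Putting the four steps together, here is a compact from-scratch sketch of the EM updates above in NumPy/SciPy. The small ridge added to each covariance is my own addition for numerical stability, not part of the formulas:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    """Fit a K-component GMM to X (shape N x D) with plain EM."""
    rng = np.random.default_rng(seed)
    N, D = X.shape

    # 1. Initialization: means from random data points, identity covariances,
    #    uniform mixing coefficients
    mus    = X[rng.choice(N, size=K, replace=False)].copy()
    sigmas = np.array([np.eye(D) for _ in range(K)])
    pis    = np.full(K, 1.0 / K)

    prev_ll = -np.inf
    for _ in range(n_iter):
        # 2. E-step: r_ik = pi_k * N(x_i|mu_k, Sigma_k) / sum_j pi_j * N(x_i|mu_j, Sigma_j)
        dens = np.column_stack([
            pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=sigmas[k])
            for k in range(K)
        ])                                   # shape (N, K)
        r = dens / dens.sum(axis=1, keepdims=True)

        # 3. M-step: re-estimate pi_k, mu_k, Sigma_k from the responsibilities
        Nk  = r.sum(axis=0)                  # effective number of points per cluster
        pis = Nk / N
        mus = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            sigmas[k] = (r[:, k, None] * diff).T @ diff / Nk[k]
            sigmas[k] += 1e-6 * np.eye(D)    # small ridge for numerical stability

        # 4. Convergence: stop once the log-likelihood barely improves
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll

    return pis, mus, sigmas, r
```

Calling `pis, mus, sigmas, r = fit_gmm(X, K=2)` returns the mixing coefficients, means, covariances, and final responsibilities.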

Applications of GMM

1. Clustering

Find hidden groups in data.
Used in:

  • Marketing
  • Medicine
  • Genetics
  • Customer segmentation

2. Anomaly Detection

Identify rare or unusual patterns, e.g.,

  • Fraud detection
  • Medical error detection
  • Network intrusion detection
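A common recipe for this: fit a GMM on normal data and flag points whose log-likelihood under the model falls below a cut-off. A sketch with scikit-learn (the 1st-percentile threshold is an arbitrary choice for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))            # "normal" behaviour

gmm = GaussianMixture(n_components=3, random_state=0).fit(X_train)

X_new = np.vstack([rng.normal(size=(5, 2)), [[8.0, 8.0]]])  # last point is odd
log_lik = gmm.score_samples(X_new)             # per-sample log-likelihood

threshold = np.percentile(gmm.score_samples(X_train), 1)    # bottom 1% cut-off
print(log_lik < threshold)                     # True -> flagged as anomaly
```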

3. Image Segmentation

Divide images into meaningful regions.
Used in:

  • Medical imaging
  • Remote sensing
  • Military applications

4. Density Estimation

Model complex probability distributions for:

  • Generative modeling
  • Sampling
  • Feature understanding
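Because a fitted GMM is a full probability density, drawing new samples from it is a one-liner; a minimal sketch with scikit-learn (toy data):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

X_new, labels = gmm.sample(n_samples=100)  # draw 100 points from the learned density
print(gmm.score_samples(X_new[:5]))        # log-density of the generated points
```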

Advantages of GMM

  • Flexible cluster shapes
    Can model ellipsoidal / overlapping clusters (unlike K-Means).
  • Soft assignment
    Assigns probabilities instead of hard labels.
  • Handles missing data
    More robust to incomplete observations.
  • Interpretable parameters
    Each Gaussian has:
    • Mean
    • Covariance
    • Mixing coefficient

All of these are easy to analyze and interpret.