Gaussian mixture clustering identifies groups in your data that might overlap or have different shapes. Unlike rigid grouping, this method assigns each data point a probability of belonging to multiple groups. It is the right choice when you need to understand the ambiguity between segments rather than just hard assignments.
Beyond rigid boundaries
Most grouping methods draw a hard line around data points, forcing every observation into one distinct bucket. Gaussian mixture clustering takes a more flexible approach by assuming the data were generated from a mixture of Gaussian (normal) distributions, each with its own mean and covariance. This allows the model to capture clusters that are not perfectly circular or that sit close together.
By treating each cluster as a probability distribution, the tool can model elliptical shapes and handle cases where groups overlap in your feature space.
Understanding soft assignments
The primary output of this method is the probability of membership. For every row, you receive a breakdown of how well it fits into each cluster. This is particularly useful in business or research scenarios where a single label is often too reductive, such as identifying users who exhibit behaviors of multiple customer personas simultaneously.
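As a sketch of what soft assignments look like in practice, here is a minimal example using scikit-learn's GaussianMixture (an assumption on our part; the doc does not name the underlying library, and the data here are synthetic and purely illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 2-D data: two overlapping blobs (values purely illustrative).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=1.0, size=(100, 2)),
    rng.normal(loc=[2, 2], scale=1.0, size=(100, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Soft assignment: one probability per cluster for each row.
# Each row of `probs` sums to 1 across the clusters.
probs = gmm.predict_proba(X)
print(probs.shape)  # (200, 2)
```

A row like `[0.7, 0.3]` is exactly the "70% / 30%" ambiguity described above: the point leans toward the first cluster but is not cleanly inside it.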
Automated model selection
Deciding how many groups exist in a dataset can be subjective. Gaussian mixture clustering includes internal scoring metrics like BIC (Bayesian Information Criterion) to evaluate how well different model structures fit your data. This helps you identify the most statistically plausible number of clusters based on your input.
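The BIC-based selection described above can be sketched as a simple loop: fit one model per candidate cluster count and keep the count with the lowest score (again assuming a scikit-learn backend; lower BIC is better):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data with three well-separated blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0, 0], 0.5, (150, 2)),
    rng.normal([4, 4], 0.5, (150, 2)),
    rng.normal([0, 4], 0.5, (150, 2)),
])

# Fit candidate models and score each with BIC; lower is better.
bics = {
    k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
    for k in range(1, 7)
}
best_k = min(bics, key=bics.get)
print(best_k)  # with three clean blobs, BIC typically selects 3
```

BIC penalizes extra parameters, so it resists the temptation to keep adding clusters just because more components always fit the training data a little better.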
Flexible cluster shapes
You can control how the model defines cluster shapes through the covariance type. By default, the method uses a flexible 'full' configuration that allows each group to have its own unique size, orientation, and shape. For specific needs, you can switch to restricted versions that assume shared shapes or independent dimensions, allowing you to tailor the complexity to your specific data structure.
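In scikit-learn terms (an assumed backend), the restricted versions correspond to the `covariance_type` options, which you can see reflected in the shape of the fitted covariance parameters:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # illustrative 2-D data

# 'full'      - each cluster gets its own full covariance matrix (the default)
# 'tied'      - all clusters share one covariance matrix (shared shape)
# 'diag'      - per-cluster variances, dimensions treated as independent
# 'spherical' - one variance per cluster, i.e. circular clusters
shapes = {}
for cov in ["full", "tied", "diag", "spherical"]:
    gmm = GaussianMixture(n_components=3, covariance_type=cov,
                          random_state=0).fit(X)
    shapes[cov] = np.shape(gmm.covariances_)
print(shapes)  # full: (3, 2, 2), tied: (2, 2), diag: (3, 2), spherical: (3,)
```

Restricting the covariance trades flexibility for fewer parameters, which can help on small datasets where a full covariance per cluster would overfit.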
1 · Intent → method
An LLM picks cluster_gmm from a fixed catalog.
2 · Method → numbers
A deterministic Python engine runs the math. Same input → same output.
3 · Numbers → plain language
A second LLM translates the result into your domain’s vocabulary.
How is this different from simple grouping methods like K-means?
While K-means forces every point into one specific group, Gaussian mixture clustering calculates a probability for each group. It allows for 'soft' assignments where a point might be 70% in one group and 30% in another, and it can adapt to different cluster shapes.
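The contrast can be sketched side by side (a hedged illustration using scikit-learn's KMeans and GaussianMixture; the probe point and data are made up for the example):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0, 0], 1.0, (100, 2)),
    rng.normal([3, 0], 1.0, (100, 2)),
])

# K-means: exactly one hard label per point.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# GMM: a probability for each cluster, per point.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
probs = gmm.predict_proba(X)

# A probe point midway between the two centers gets a split assignment,
# where K-means would be forced to commit to one side.
print(gmm.predict_proba([[1.5, 0.0]]).round(2))
```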
What does 'soft probability' mean in my results?
Soft probability means the tool provides a likelihood score for each group assignment. For example, if a customer is hard to classify, you will see percentages reflecting how likely they are to fit into different segments, helping you identify ambiguous cases.
Tool input schema
Schema for cluster_gmm not exported yet (run pnpm export:registry).