Methods

Clustering

Clustering helps operations analysts, researchers, and quality engineers identify natural groupings within complex datasets. Reach for this family when you have unlabelled observations, such as patient symptoms, batch performance metrics, or user behavior logs, and need to uncover hidden patterns or distinct segments without predefined categories.

This family organizes data by finding structural similarities between samples. Instead of predicting future labels, these tools reveal how your current data naturally partitions itself. Lattice handles this through a precise three-stage pipeline: first, the LLM parses your request to select the appropriate grouping logic (like K-means or Hierarchical methods); second, the deterministic engine processes the math, performing essential tasks like automatic standardization and distance matrix calculation; finally, the LLM translates technical outputs—such as cluster centers, silhouette scores, and inertia—into plain language summaries that describe the characteristics of each group.
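The deterministic middle stage of that pipeline (standardization plus distance computation) can be pictured with a few lines of ordinary NumPy and SciPy. This is an illustrative sketch on made-up numbers, not Lattice's actual implementation:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three samples with two features on very different scales (invented values)
X = np.array([[1.0, 200.0],
              [2.0, 180.0],
              [9.0,  40.0]])

# Standardize each feature to mean 0, standard deviation 1
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Pairwise Euclidean distance matrix on the standardized data
D = squareform(pdist(Z))
print(D.round(2))
```

Standardizing first means both features contribute comparably to the distances; without it, the second column's larger numeric range would dominate the matrix.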

What this family does

The tools in this family identify groups by minimizing the distances between samples within each cluster. When you run these methods, the platform calculates the geometric center (centroid) of each group and provides summary statistics, such as the mean and standard deviation for every feature.

Beyond assigning group labels to your rows, these methods also evaluate the quality of the resulting clusters. You receive metrics like the silhouette score, which indicates how well separated your clusters are, and inertia, which tells you how compact your groupings are.
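To make those outputs concrete, here is a hedged sketch of the same kind of workflow using scikit-learn on synthetic data. The data shape and parameters are illustrative assumptions, not Lattice's API:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic segments, 50 samples each, two features
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2)),
])

scaled = StandardScaler().fit_transform(data)  # standardize before clustering
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)

print("inertia:", round(model.inertia_, 2))    # compactness (lower = tighter)
print("silhouette:", round(silhouette_score(scaled, model.labels_), 2))

# Per-cluster summary statistics reported on the original scale
for k in range(2):
    members = data[model.labels_ == k]
    print(f"cluster {k}: n={len(members)}, "
          f"mean={members.mean(axis=0).round(2)}, "
          f"std={members.std(axis=0).round(2)}")
```

The per-cluster means and standard deviations printed at the end are the kind of statistical profile the platform hands back for interpretation.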

Differentiating your approach

The primary distinction within this family lies in how you define the number of groups. K-means is a direct approach where you specify the number of clusters in advance, making it efficient for large datasets where you have a clear hypothesis about how many segments exist.

Hierarchical methods, by contrast, build a structure that shows how samples merge together. This is helpful if you aren't sure how many clusters are appropriate, as it allows you to observe the distance at which groups combine and decide where to 'cut' the structure based on the specific needs of your analysis.
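Cutting a hierarchical merge structure at a chosen distance can be sketched with SciPy. The blob positions and the cut threshold below are invented for illustration, under the assumption of Ward linkage:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Three tight blobs; we deliberately do NOT tell the algorithm how many
points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.3, size=(20, 2)),
    rng.normal(loc=[0.0, 4.0], scale=0.3, size=(20, 2)),
])

# Build the merge tree with Ward linkage
tree = linkage(points, method="ward")

# "Cut" the tree at a chosen distance; merges above this height stay separate
labels = fcluster(tree, t=5.0, criterion="distance")
print("clusters at cut distance 5.0:", len(set(labels)))  # → 3
```

Raising or lowering `t` changes how many groups survive the cut, which is exactly the judgment call the paragraph above describes.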

Common pitfalls to avoid

A frequent error is assuming that the machine will automatically assign meaningful business labels to a group. While the platform provides the statistical profile—showing you which variables define a cluster—the interpretation of whether a group represents 'high-performing' or 'anomalous' units remains a task for your domain expertise.

These methods are also sensitive to the scale of your data, and forgetting this can lead to misleading results. The platform defaults to automatic standardization, but if you opt out, features with larger numerical ranges will disproportionately influence the grouping, often masking the impact of smaller-scale variables.
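The scale effect is easy to demonstrate. In this sketch, a hypothetical revenue feature (values near 1000) and a hypothetical 0-1 ratio feature are compared by how much each contributes to the total pairwise squared distance, before and after standardization:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
revenue = rng.normal(1000.0, 50.0, n)   # large numeric range (hypothetical feature)
churn = rng.uniform(0.0, 1.0, n)        # small 0-1 range (hypothetical feature)
X = np.column_stack([revenue, churn])

def feature_share(M):
    """Fraction of total squared pairwise distance contributed by each feature."""
    diffs = M[:, None, :] - M[None, :, :]
    per_feature = (diffs ** 2).sum(axis=(0, 1))
    return per_feature / per_feature.sum()

print("raw shares:   ", feature_share(X).round(4))   # revenue dominates almost entirely
Z = (X - X.mean(axis=0)) / X.std(axis=0)             # standardize
print("scaled shares:", feature_share(Z).round(4))   # equal: [0.5 0.5]
```

On the raw data, the revenue column accounts for essentially all of the distance; after standardization, the two features contribute equally, so the ratio can actually influence the grouping.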

Frequently asked questions

How do I know if the number of clusters I chose is actually correct?
Look at the silhouette score provided in the output. A score below 0.25 suggests the group structure is weak. If you see this, consider using the hierarchical tools to visualize the relationships or adjust the number of clusters to see if the score improves.
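A common way to run that comparison is to score several candidate cluster counts and keep the best. This sketch uses scikit-learn on synthetic three-blob data; the range of candidates is an illustrative choice:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(3)
# Three well-separated blobs of 40 samples each
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(40, 2))
               for c in ([0, 0], [4, 0], [2, 4])])

# Score each candidate cluster count and keep the best
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}: silhouette={scores[k]:.2f}")

best_k = max(scores, key=scores.get)
print("best k:", best_k)  # → 3
```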
Does Lattice automatically assign names to the groups I discover?
No. The methodology produces statistics like cluster means and standard deviations, which represent the numerical characteristics of the group. You should review these summaries to determine the business significance of each cluster yourself.
What happens if my data contains missing values?
The tools automatically perform listwise deletion, dropping any row that contains missing values in the columns you selected. The final report includes a notice stating how many rows were excluded, so you have visibility into the remaining sample size.
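The listwise behavior matches what pandas does with `dropna`. This sketch uses invented column names and values to show how the excluded-row count is derived:

```python
import pandas as pd

# Hypothetical batch data with a gap in each of the selected columns
df = pd.DataFrame({
    "yield_pct": [92.1, 88.4, None, 95.0, 90.2],
    "temp_c":    [210.0, 215.0, 208.0, None, 212.0],
})

selected = ["yield_pct", "temp_c"]
clean = df.dropna(subset=selected)        # listwise deletion
dropped = len(df) - len(clean)
print(f"excluded {dropped} of {len(df)} rows")  # → excluded 2 of 5 rows
```

A row missing a value in either selected column is removed, so two distinct gaps cost two rows here.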

Methods in this family