This family organizes data by finding structural similarities between samples. Instead of predicting future labels, these tools reveal how your current data naturally partitions itself. Lattice handles this through a three-stage pipeline. First, the LLM parses your request and selects the appropriate grouping logic (such as K-means or hierarchical methods). Second, the deterministic engine performs the math, including essential tasks like automatic standardization and distance matrix calculation. Finally, the LLM translates technical outputs (cluster centers, silhouette scores, inertia) into plain-language summaries that describe the characteristics of each group.
When to choose this family
- You have a dataset of numeric observations and need to see if they naturally form 2 to 10 distinct groups.
- You want to identify subgroups or anomalies in your process logs without knowing the specific labels beforehand.
- You need to visualize the hierarchical relationship between your samples to determine if a multi-level grouping structure exists.
- You need to summarize the average characteristics of different groups to compare their performance or behavior.
What this family does
The tools in this family identify groups by minimizing the distance between samples within a cluster. When you run these methods, the platform calculates the geometric center of each group and provides summary statistics, such as the average and standard deviation for every feature.
Beyond simply assigning group labels to your rows, these methods also evaluate the quality of the formation. You receive metrics like the silhouette score, which indicates how well-separated your clusters are, and inertia, which tells you how compact your groupings are.
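As a rough sketch of the kind of per-cluster output described above, the snippet below computes each group's center, per-feature standard deviation, and inertia (sum of squared distances to the center) for already-labeled data. The data, labels, and helper name are illustrative, not Lattice's actual API.

```python
# Sketch: per-cluster summary statistics and inertia for labeled 2-D data.
# cluster_summary and the sample points are hypothetical, for illustration only.
from statistics import mean, stdev

points = [(1.0, 2.0), (1.2, 1.8), (8.0, 9.0), (8.2, 9.1)]
labels = [0, 0, 1, 1]

def cluster_summary(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    summary = {}
    for lab, members in clusters.items():
        cols = list(zip(*members))  # transpose: one tuple of values per feature
        center = tuple(mean(c) for c in cols)
        spread = tuple(stdev(c) if len(c) > 1 else 0.0 for c in cols)
        # inertia contribution: sum of squared distances from members to center
        inertia = sum(
            sum((x - m) ** 2 for x, m in zip(p, center)) for p in members
        )
        summary[lab] = {"center": center, "std": spread, "inertia": inertia}
    return summary

s = cluster_summary(points, labels)
```

A compact cluster (points close to their center) yields low inertia; the per-feature means and standard deviations are what you would inspect to profile each group.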
Differentiating your approach
The primary distinction within this family lies in how you define the number of groups. K-means is a direct approach where you specify the number of clusters in advance, making it efficient for large datasets where you have a clear hypothesis about how many segments exist.
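The K-means idea of fixing the cluster count up front can be sketched as a minimal Lloyd's-algorithm loop. This is an illustrative toy, not Lattice's implementation: the naive "first k points" initialization stands in for smarter schemes (such as k-means++ with multiple restarts) that production tools use.

```python
# Minimal k-means sketch (Lloyd's algorithm) with k chosen in advance.
# Naive initialization and tiny dataset are for illustration only.
def kmeans(points, k, iters=20):
    centers = list(points[:k])  # naive init: first k points as centers
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center (squared Euclidean)
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            groups[j].append(p)
        # recompute each center as the mean of its assigned points
        centers = [
            tuple(sum(col) / len(col) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers, groups

pts = [(0.0, 0.0), (0.5, 0.2), (9.0, 9.0), (9.5, 9.3)]
centers, groups = kmeans(pts, k=2)
```

With two well-separated blobs, the loop converges in a couple of iterations to one center per blob.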
Hierarchical methods, by contrast, build a structure that shows how samples merge together. This is helpful if you aren't sure how many clusters are appropriate, as it allows you to observe the distance at which groups combine and decide where to 'cut' the structure based on the specific needs of your analysis.
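To make the "observe where groups combine" idea concrete, here is a toy single-linkage agglomeration on 1-D values that logs the distance at which each merge happens. The data and the single-linkage choice are illustrative assumptions; the point is that a large jump in merge height marks a natural place to cut.

```python
# Sketch: single-linkage agglomeration on 1-D values, recording the
# distance at which clusters merge so you can pick a cut height.
def agglomerate(values):
    clusters = [[v] for v in sorted(values)]
    merges = []  # merge heights, in the order the merges occur
    while len(clusters) > 1:
        # find the closest pair of clusters (single linkage: min pointwise gap)
        best = None
        for i in range(len(clusters) - 1):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append(d)
    return merges

heights = agglomerate([1.0, 1.2, 5.0, 5.3, 9.0])
# Early merges happen at small distances (~0.2, ~0.3); later ones jump
# to ~3.7 and ~3.8, suggesting a cut between those two regimes.
```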
Common pitfalls to avoid
A frequent error is assuming that the machine will automatically assign meaningful business labels to a group. While the platform provides the statistical profile—showing you which variables define a cluster—the interpretation of whether a group represents 'high-performing' or 'anomalous' units remains a task for your domain expertise.
Additionally, forgetting that these methods are sensitive to the scale of your data can lead to misleading results. The platform defaults to automatic standardization, but if you opt out, features with larger numerical ranges will disproportionately influence the grouping, often masking the impact of smaller-scale variables.
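The standardization step can be sketched as a plain z-score transform. The feature names and values below are hypothetical; the point is that after standardization, a dollar-scale feature and a 0-to-1 rate occupy the same unitless range, so neither dominates the distance calculation.

```python
# Sketch: z-score standardization so a wide-range feature (revenue in
# dollars) doesn't swamp a narrow-range one (a 0-1 churn rate).
# Feature names and values are illustrative.
from statistics import mean, pstdev

def standardize(column):
    m, s = mean(column), pstdev(column)
    return [(x - m) / s for x in column]

revenue = [10_000.0, 20_000.0, 30_000.0]  # wide numeric range
churn   = [0.10, 0.20, 0.30]              # narrow numeric range
z_rev, z_churn = standardize(revenue), standardize(churn)
# Both columns now have mean 0 and unit variance, so they contribute
# comparably to any distance computed between rows.
```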
Frequently asked questions
- How do I know if the number of clusters I chose is actually correct?
- Look at the silhouette score provided in the output. A score below 0.25 suggests the group structure is weak. If you see this, consider using the hierarchical tools to visualize the relationships or adjust the number of clusters to see if the score improves.
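For intuition about what the silhouette score measures, here is a small sketch that computes it directly for 1-D data: each sample's score compares its mean distance to its own cluster (a) against its mean distance to the nearest other cluster (b). The data and labels are illustrative.

```python
# Sketch: mean silhouette score for labeled 1-D data. Each sample's
# score is (b - a) / max(a, b); values near 1 mean tight, well-separated
# clusters, while values below ~0.25 suggest weak structure.
def silhouette(points, labels):
    n = len(points)
    scores = []
    for i in range(n):
        same = [j for j in range(n) if labels[j] == labels[i] and j != i]
        # a: mean distance to the sample's own cluster
        a = sum(abs(points[i] - points[j]) for j in same) / len(same)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(abs(points[i] - points[j])
                for j in range(n) if labels[j] == lab) / labels.count(lab)
            for lab in set(labels) if lab != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n

tight = silhouette([1.0, 1.1, 9.0, 9.1], [0, 0, 1, 1])  # well separated
```

Two tight, distant groups score close to 1; overlapping groups pull the score toward 0 or below.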
- Does Lattice automatically assign names to the groups I discover?
- No. The methodology produces statistics like cluster means and standard deviations, which represent the numerical characteristics of the group. You should review these summaries to determine the business significance of each cluster yourself.
- What happens if my data contains missing values?
- The tools automatically perform a listwise removal, dropping any row that contains missing values in the columns you selected. The final report will include a notice regarding how many rows were excluded, ensuring you have visibility into the remaining sample size.
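Listwise removal amounts to the filter sketched below. The row structure and column names are hypothetical; the behavior, dropping any row missing a value in a selected column and counting the exclusions, matches the description above.

```python
# Sketch: listwise removal of rows with missing values in the selected
# columns, plus a count of dropped rows. Column names are illustrative.
rows = [
    {"cpu": 0.7, "mem": 0.4},
    {"cpu": None, "mem": 0.9},  # missing cpu -> excluded
    {"cpu": 0.2, "mem": 0.1},
]
selected = ["cpu", "mem"]

kept = [r for r in rows if all(r[c] is not None for c in selected)]
dropped = len(rows) - len(kept)
```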