Methods

Correlation & Association

This family is for operations, quality, biotech, and process engineering teams who need to understand how variables shift in tandem. Reach for these methods when you have continuous or ordered data and want to identify which factors move together before committing to deeper inferential modeling or experimental design.

The Correlation & Association family provides the statistical foundation for quantifying how two or more variables change relative to each other. On Lattice, the analysis runs in three stages. First, our LLM parses your natural language query, such as 'Does temperature impact yield?', and selects the appropriate statistical method. Second, our deterministic engine processes your data and calculates a coefficient such as Pearson’s r, Spearman’s rho, or Kendall’s tau. Finally, the LLM translates the numerical output into plain language, placing the result in context while holding to the boundary that association does not imply causation. This three-stage architecture keeps the calculation mathematically precise and the interpretation as cautious as sound decision-making requires.

When to choose this family

What this family does

These tools measure the strength and direction of joint movement between variables. Each returns a coefficient between -1 and +1: a value near +1 means the variables tend to rise and fall together, a value near -1 means one tends to rise as the other falls, and a value near 0 means there is no clear linear or monotonic pattern.
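As a minimal sketch of what such a coefficient measures, the pure-Python function below computes Pearson’s r from its textbook definition. The name `pearson_r` and the temperature and yield readings are hypothetical, chosen only for illustration; this is not Lattice’s engine code.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_variation = sum(a * b for a, b in zip(dx, dy))
    spread = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("a constant variable has no defined correlation")
    return co_variation / spread

# Hypothetical readings, invented for this example.
temperature = [60, 65, 70, 75, 80]
yield_rising = [1.0, 2.0, 3.0, 4.0, 5.0]    # rises with temperature
yield_falling = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls as temperature rises
yield_unrelated = [2.0, 1.0, 3.0, 1.0, 2.0]  # no consistent direction

r_rising = pearson_r(temperature, yield_rising)        # perfectly linear, positive
r_falling = pearson_r(temperature, yield_falling)      # perfectly linear, negative
r_unrelated = pearson_r(temperature, yield_unrelated)  # no linear pattern

print(r_rising, r_falling, r_unrelated)
```

The three series were picked so that the results sit at the two ends and the middle of the range; real process data will usually fall somewhere in between.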

Beyond simple pairs, these methods scale to multi-variable matrices. This allows you to identify hotspots of association across your entire dataset, directing your focus toward the variables that matter most for your process or operational goals.
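The idea of scanning every pair of variables for hotspots can be sketched as follows. This is an illustrative example under stated assumptions: the column names, the readings, the `strongest_pairs` helper, and the 0.8 cutoff are all hypothetical and do not describe how Lattice ranks results.

```python
from itertools import combinations
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from its textbook definition."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_variation = sum(a * b for a, b in zip(dx, dy))
    return co_variation / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def strongest_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold, strongest first."""
    hits = []
    for name_a, name_b in combinations(columns, 2):
        r = pearson_r(columns[name_a], columns[name_b])
        if abs(r) >= threshold:
            hits.append((name_a, name_b, r))
    return sorted(hits, key=lambda hit: abs(hit[2]), reverse=True)

# Hypothetical process log, invented for this example.
process = {
    "temperature": [60, 65, 70, 75, 80, 85],
    "pressure": [30, 33, 35, 38, 41, 43],
    "humidity": [40, 55, 42, 58, 41, 54],
    "yield_pct": [71, 74, 78, 80, 85, 87],
}

hotspots = strongest_pairs(process)
for name_a, name_b, r in hotspots:
    print(f"{name_a} ~ {name_b}: r = {r:.3f}")
```

With this data, only the pairs among temperature, pressure, and yield clear the cutoff; humidity does not track any of them closely, so it drops out of the list.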

Differentiation from other families

Correlation is fundamentally about description, not prediction or causation. While the Regression family models how much one variable is expected to change for a given change in another, this family only quantifies the strength and direction of association, without assuming which variable drives the other.

Unlike the T-test or ANOVA families, which compare means between distinct categories, these methods look at how continuous or ordered scales track one another. If your data involves categories (e.g., 'Group A' vs 'Group B'), Lattice will guide you toward a group-comparison method such as a T-test or ANOVA rather than a correlation coefficient.

Common mistakes to avoid

The most frequent error is assuming that a strong correlation confirms a causal link. A high coefficient might arise from a common underlying factor, such as time or ambient temperature, rather than a direct mechanism.
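One common way to probe for a shared driver is a partial correlation, which asks how strongly two variables are associated once the linear influence of a third is removed. The sketch below uses the standard first-order partial correlation formula. The variable names and numbers are hypothetical, and the data were deliberately constructed so that both series depend on the hour of the shift and on nothing else they have in common.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from its textbook definition."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_variation = sum(a * b for a, b in zip(dx, dy))
    return co_variation / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def partial_r(xs, ys, zs):
    """Correlation of x and y after removing the linear influence of z."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical shift log: both series drift upward with the hour,
# plus small fluctuations that are unrelated to each other.
hour = [0, 1, 2, 3, 4, 5, 6, 7]
line_speed = [1, 1, 3, 7, 9, 9, 11, 15]
defects = [1, 4, 5, 8, 11, 14, 19, 22]

raw = pearson_r(line_speed, defects)            # strong raw correlation
adjusted = partial_r(line_speed, defects, hour)  # essentially zero once hour is accounted for

print(f"raw r = {raw:.3f}, after controlling for hour = {adjusted:.3f}")
```

A partial correlation only adjusts for the variables you supply and only for their linear influence, so a value that stays high is still not evidence of a direct mechanism.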

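Another pitfall is ignoring non-linear relationships. If your data follows a U-shape, both Pearson’s r and the rank-based coefficients can come out near zero despite a clear underlying pattern. If it follows a monotonic curve such as exponential growth, Pearson’s r will understate the strength of the relationship, while Spearman’s rho and Kendall’s tau still capture it. Always check the visual scatter plots provided by Lattice alongside the numerical results.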
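The effect of curve shape on each coefficient can be illustrated with a small pure-Python sketch. The helper names and the data are hypothetical; Spearman’s rho is computed here as Pearson’s r on average ranks, which is its standard definition.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from its textbook definition."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_variation = sum(a * b for a, b in zip(dx, dy))
    return co_variation / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = shared_rank
        start = end + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks of the data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [-3, -2, -1, 0, 1, 2, 3]
u_shape = [v * v for v in x]        # symmetric U: falls, then rises
exponential = [2.0 ** v for v in x]  # always rising, but far from a straight line

r_u = pearson_r(x, u_shape)             # zero: the two halves cancel out
rho_u = spearman_rho(x, u_shape)        # also zero: ranks cancel the same way
r_exp = pearson_r(x, exponential)       # positive, but clearly below 1
rho_exp = spearman_rho(x, exponential)  # 1: the ordering is perfectly preserved

print(r_u, rho_u, round(r_exp, 3), rho_exp)
```

The U-shaped case is why the scatter plot matters: no single coefficient in this family will flag it.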

Frequently asked questions

Why does my result say 'correlation isn't causation'?
Correlation only shows that two variables move together. It cannot tell you if X causes Y, Y causes X, or if a third variable is causing both. Lattice shows this warning so that you do not assume a causal relationship, which typically takes a controlled experiment to establish.

Which method should I use if I have extreme outliers?
If your data contains extreme values that could skew a standard Pearson calculation, use Spearman’s rho or Kendall’s tau. These use the rank of each data point rather than its exact value, making the result more resistant to the influence of outliers.

What do I do if I have a mix of categories and continuous numbers?
Correlation methods are designed for numerical or ranked scales. If you try to run these on categorical data, Lattice will prompt you to use a different tool, such as an ANOVA or T-test, which is specifically designed to compare metrics across different groups.
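To make the outlier question above concrete, the sketch below compares the three coefficients on a series with a single extreme reading. The data and function names are hypothetical, and Kendall’s tau is computed in its simplest form (tau-a), which assumes there are no tied values.

```python
from itertools import combinations
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from its textbook definition."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_variation = sum(a * b for a, b in zip(dx, dy))
    return co_variation / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def ranks(values):
    """1-based ranks, assuming no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks of the data."""
    return pearson_r(ranks(xs), ranks(ys))

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs, no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical series: a steady upward trend with one extreme reading (40).
batch = [1, 2, 3, 4, 5, 6, 7, 8]
reading = [1, 2, 3, 40, 5, 6, 7, 8]

r = pearson_r(batch, reading)      # dragged close to zero by the single spike
rho = spearman_rho(batch, reading)  # stays fairly high
tau = kendall_tau_a(batch, reading)  # stays fairly high

print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
```

The rank-based coefficients are reduced by the spike but not dominated by it, which is what "more resistant" means in practice.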

Methods in this family