The Correlation & Association family provides the statistical foundation for quantifying how two or more variables change relative to each other. On Lattice, an analysis runs in three stages. First, our LLM parses your natural language query, such as 'Does temperature impact yield?', and selects the appropriate statistical method. Next, our deterministic engine processes your data and calculates a coefficient such as Pearson’s r, Spearman’s rho, or Kendall’s tau. Finally, the LLM translates the numerical output into plain language, adding context while stating clearly that association does not imply causation. This three-stage architecture keeps the calculation mathematically precise and the interpretation as cautious as sound decision-making requires.
When to choose this family
- You want to identify which independent variables are moving alongside a primary outcome metric.
- You have pairs of continuous or ranked data and need to check if they show a linear or monotonic trend.
- You need to quickly screen a large dataset for potential relationships using a heatmap before running formal experiments.
- You suspect a 'hidden' variable is distorting your results and want to isolate the relationship between two factors.
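One common way to isolate the relationship between two factors from a suspected hidden variable is a first-order partial correlation. The sketch below is illustrative only: this page does not state which technique Lattice applies, and the names factor_a, factor_b, and hidden, as well as the data, are made up for the example.

```python
import math

def pearson(x, y):
    """Pearson's r: co-variation scaled by the spread of each variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear influence of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data: both factors are driven by the same hidden variable,
# plus small disturbances that are unrelated to each other.
hidden = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise_a = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
noise_b = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]
factor_a = [h + e for h, e in zip(hidden, noise_a)]
factor_b = [h + e for h, e in zip(hidden, noise_b)]

raw = pearson(factor_a, factor_b)
adjusted = partial_correlation(factor_a, factor_b, hidden)
print(f"raw r = {raw:.3f}, partial r given hidden = {adjusted:.3f}")
```

In this made-up dataset the raw coefficient is high, while the partial coefficient is close to zero. That is the pattern you would expect when a shared driver, rather than a direct link, explains the association.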
What this family does
These tools measure the strength and direction of joint movement between variables. Each returns a coefficient between -1 and +1: values near +1 mean the variables tend to increase together, values near -1 mean one tends to decrease as the other increases, and values near 0 mean there is no clear linear or monotonic pattern.
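As a rough illustration of how the three coefficients named on this page are defined, here is a minimal standard-library sketch. It is not Lattice's engine. The function names and the temperature and yield figures are invented for the example, and the Kendall variant shown is the simple tau-a, which applies no correction for ties.

```python
import math

def pearson(x, y):
    """Pearson's r: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

# Hypothetical measurements for illustration only.
temperature = [18.0, 20.0, 22.0, 24.0, 26.0, 28.0]
crop_yield = [2.1, 2.6, 2.4, 3.0, 3.3, 3.2]

r = pearson(temperature, crop_yield)
rho = spearman(temperature, crop_yield)
tau = kendall_tau_a(temperature, crop_yield)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
```

All three values are positive here because yield mostly rises with temperature. They differ in size because each coefficient measures agreement in a different way: raw values, ranks, or pairwise ordering.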
Beyond simple pairs, these methods scale to multi-variable matrices. This allows you to identify hotspots of association across your entire dataset, directing your focus toward the variables that matter most for your process or operational goals.
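The idea of a multi-variable matrix can be sketched in a few lines: compute the coefficient for every pair of columns, then rank the pairs by absolute value to find the hotspots. This is an assumption-laden illustration, not Lattice's implementation. The column names and values are hypothetical, and Pearson's r stands in for whichever coefficient is selected.

```python
import math

def pearson(x, y):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the coefficient between columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical dataset for illustration only.
data = {
    "temperature": [18.0, 20.0, 22.0, 24.0, 26.0, 28.0],
    "humidity": [70.0, 66.0, 61.0, 59.0, 52.0, 50.0],
    "yield": [2.1, 2.6, 2.4, 3.0, 3.3, 3.2],
}
matrix = correlation_matrix(data)

# List each distinct pair once, strongest association first.
names = list(data)
pairs = [(a, b, matrix[a][b]) for i, a in enumerate(names) for b in names[i + 1:]]
pairs.sort(key=lambda p: abs(p[2]), reverse=True)
for a, b, value in pairs:
    print(f"{a:12s} vs {b:12s} r = {value:+.3f}")
```

Sorting by absolute value matters because a strongly negative pair is as much a hotspot as a strongly positive one.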
Differentiation from other families
Correlation is fundamentally about description, not prediction or causation. While the Regression family attempts to model the exact effect size of one variable on another, this family strictly quantifies the strength of association without making assumptions about which variable drives the other.
Unlike the T-test or ANOVA families, which compare means between distinct categories, these methods look at how continuous or ranked scales track one another. If your data involves categories (e.g., 'Group A' vs 'Group B'), Lattice will guide you toward a group-comparison test such as a T-test or ANOVA rather than a correlation coefficient.
Common mistakes to avoid
The most frequent error is assuming that a strong correlation confirms a causal link. A high coefficient might arise from a common underlying factor, such as time or ambient temperature, rather than a direct mechanism.
Another pitfall is ignoring non-linear relationships. If your data follows a symmetric U-shape, both linear and rank-based coefficients can come out near zero despite a clear underlying pattern. If it follows an exponential curve, Pearson’s r will understate a relationship that a rank-based coefficient such as Spearman’s rho captures fully. Always check the visual scatter plots provided by Lattice alongside the numerical results.
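The effect of curve shape on the coefficients can be checked with a small standard-library sketch. The data is synthetic and the helper names are chosen for this example. The simple ranking helper assumes no tied values, which holds for the exponential series used here.

```python
import math

def pearson(x, y):
    """Pearson's r: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; assumes no tied values."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [float(position[v]) for v in values]

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# U-shape: y depends entirely on x, yet the linear coefficient is zero.
x_u = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y_u = [v ** 2 for v in x_u]
r_u = pearson(x_u, y_u)

# Exponential growth: perfectly monotonic, but far from a straight line.
x_exp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y_exp = [2.0 ** v for v in x_exp]
r_exp = pearson(x_exp, y_exp)
rho_exp = spearman(x_exp, y_exp)

print(f"U-shape:     Pearson r = {r_u:.3f}")
print(f"Exponential: Pearson r = {r_exp:.3f}, Spearman rho = {rho_exp:.3f}")
```

For the U-shape, Pearson's r is 0 even though y is fully determined by x. For the exponential series, Pearson's r is about 0.85 while Spearman's rho is 1, because the ranks line up perfectly.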
Frequently asked questions
- Why does my result say that correlation isn't causation?
- Correlation only shows that two variables move together. It cannot tell you if X causes Y, Y causes X, or if a third variable is causing both. Lattice outputs this warning to prevent you from assuming a causal relationship that would require a controlled experiment to prove.
- Which method should I use if I have extreme outliers?
- If your data contains extreme values that could skew a standard Pearson calculation, use Spearman or Kendall methods. These use the 'rank' of the data rather than the exact values, making the result more resistant to the influence of outliers.
- What do I do if I have a mix of categories and continuous numbers?
- Correlation methods are designed for numerical or ranked scales. If you try to run these on categorical data, Lattice will prompt you to use a different tool, such as an ANOVA or T-test, which are specifically designed to compare metrics across different groups.
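To make the outlier answer above concrete, the sketch below compares Pearson's r with Spearman's rho on a small made-up series in which a single extreme reading runs against the overall trend. The helper names and data are illustrative assumptions, and the ranking helper assumes no tied values.

```python
import math

def pearson(x, y):
    """Pearson's r, computed on the raw values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; assumes no tied values."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [float(position[v]) for v in values]

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# y falls steadily as x rises, except for one extreme final reading.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 100.0]

r = pearson(x, y)
rho = spearman(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Here the single outlier pulls Pearson's r to about +0.53, while Spearman's rho stays negative at about -0.33. In the ranking, the extreme reading is simply the largest value, so its magnitude carries no extra weight.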