Understanding your data is the foundational first step of any analysis. Our descriptive methodology provides a clear view of your metrics through three primary lenses: central tendency, dispersion, and distribution shape. In Lattice, this process follows a deterministic architecture: first, our LLM identifies the need for a summary; second, a deterministic engine executes the calculations using standard numerical libraries (such as SciPy and pandas) to generate exact, reproducible statistics; finally, the LLM translates those metrics into plain language, highlighting key insights like skewness or potential anomalies. This structure lets you identify outliers or unusual patterns early, so you can select the correct downstream tests without wasting time on models unsuited to your data's distribution.
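To make the deterministic step concrete, here is a minimal sketch of the kind of summary the engine might compute, assuming the column arrives as a pandas Series; the function name and return structure are illustrative, not Lattice's actual API.

```python
import pandas as pd
from scipy import stats

def summarize(series: pd.Series) -> dict:
    """Hypothetical sketch of the deterministic summary step."""
    clean = series.dropna()  # missing values are excluded, not imputed
    return {
        "n": int(clean.size),
        "n_missing": int(series.isna().sum()),
        "mean": clean.mean(),
        "median": clean.median(),
        "std": clean.std(),  # sample standard deviation (ddof=1)
        "iqr": clean.quantile(0.75) - clean.quantile(0.25),
        "skewness": stats.skew(clean),
        "kurtosis": stats.kurtosis(clean),  # excess kurtosis; normal = 0
    }
```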
When to choose this family
- You want to know the baseline average and spread of a new dataset.
- You have a set of data and need to check for outliers or potential errors before modeling.
- You want to compare the distribution shape between two different batches or groups.
- You need to confirm whether your data follows a normal distribution to justify further testing.
What this family does
This family transforms raw numbers into summarized insights. It calculates the center (mean, median), the spread (standard deviation, interquartile range), and the distribution shape (skewness, kurtosis).
Beyond simple numbers, these tools provide visual summaries. Histograms reveal the underlying structure—such as whether data is skewed or multi-modal—while boxplots effectively flag extreme values that might warrant a second look.
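As an illustration, the sketch below produces both plots with matplotlib; the plotting library is an assumption for demonstration, not necessarily the tool's actual backend.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_summary(series: pd.Series) -> None:
    clean = series.dropna()
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))
    ax_hist.hist(clean, bins="auto")   # exposes skew and multi-modality
    ax_hist.set_title("Histogram")
    ax_box.boxplot(clean, vert=False)  # points beyond the whiskers warrant a look
    ax_box.set_title("Boxplot")
    plt.show()
```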
Why it differs from other approaches
Unlike inferential methods that attempt to draw conclusions about a larger population, this family is purely observational. It exists to report exactly what is present in your current dataset without making assumptions.
It acts as a critical safety gate. By running these checks first, you avoid the common trap of applying complex models such as ANOVA to data that is inherently unsuitable, for example datasets with zero variance or extreme skew.
Avoiding common pitfalls
A frequent mistake is ignoring the shape of the data. If you see a heavily skewed distribution, proceeding directly to a model that assumes normality will likely yield misleading results. Always check your skewness and kurtosis indicators first.
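For example, a pre-modeling shape check might look like the following sketch; the ±1 skewness and ±3 excess-kurtosis cutoffs are common rules of thumb, not thresholds Lattice necessarily applies.

```python
import numpy as np
from scipy import stats

def shape_warnings(values: np.ndarray) -> list[str]:
    """Hypothetical rule-of-thumb shape check before normality-based modeling."""
    warnings = []
    if abs(stats.skew(values)) > 1:
        warnings.append("heavy skew: models that assume normality may mislead")
    if abs(stats.kurtosis(values)) > 3:  # excess kurtosis; 0 for a normal
        warnings.append("heavy tails: inspect extreme values first")
    return warnings
```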
Another error is blindly deleting data points flagged as outliers in a boxplot. Descriptive tools identify these points to prompt a decision, not to automatically discard them; always investigate whether an outlier represents a genuine system extreme or a measurement error.
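The flag-then-decide workflow can be sketched with the conventional 1.5 × IQR boxplot fences; the helper name below is hypothetical.

```python
import pandas as pd

def flag_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    fences = (q1 - k * iqr, q3 + k * iqr)
    flagged = series[(series < fences[0]) | (series > fences[1])]
    return flagged  # returned for inspection, never deleted automatically
```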
Frequently asked questions
- Do I need to clean my data before running descriptive statistics?
- Our descriptive tools are designed to be permissive. They handle missing values automatically by excluding them from calculations and reporting the count. You do not need to pre-clean the data; instead, let the tool generate a summary, then use the report to decide whether specific data-quality actions are necessary.
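  For instance, pandas, one of the libraries the deterministic engine uses, skips missing values in its aggregations by default, so a summary and the excluded count can be produced without any pre-cleaning:

  ```python
  import numpy as np
  import pandas as pd

  s = pd.Series([1.0, 2.0, np.nan, 4.0])
  print(s.mean())             # 2.333... -> the NaN is excluded, not zero-filled
  print(int(s.isna().sum()))  # 1 -> the excluded count is reported back to you
  ```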
- Why does the tool sometimes tell me a test is blocked?
- A 'block' signal occurs when the underlying math cannot be computed meaningfully. For instance, if your data has zero variance, meaning every value in a column is identical, there is no 'spread' to analyze: the standard deviation is exactly zero, and shape statistics such as skewness, which divide by it, are mathematically undefined. The tool alerts you to this immediately to prevent invalid analysis.
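  A minimal sketch of such a guard, with an illustrative function name and message rather than the tool's actual wording:

  ```python
  import numpy as np

  def check_spread(values: np.ndarray) -> None:
      """Hypothetical zero-variance guard of the kind described above."""
      finite = values[~np.isnan(values)]
      if np.unique(finite).size <= 1:
          raise ValueError(
              "blocked: every value is identical (zero variance), so spread "
              "and shape statistics cannot be computed meaningfully"
          )
  ```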