Use the Kaplan-Meier survival curve when you need to understand the timing of events—such as patient recovery, equipment failure, or customer churn—over a specific period. It is designed to handle cases where some observations are incomplete, allowing you to estimate survival probabilities even when you don't know the final outcome for every subject.
Understanding Survival Estimates
The Kaplan-Meier survival curve is a statistical tool used to estimate the probability that a subject remains event-free beyond a given time. Unlike simple averages, which can be misleading when data is incomplete, this approach updates the probability of survival at each moment an event occurs, based on the number of subjects still 'at risk' of that event.
At each event time, the calculation takes the fraction of at-risk subjects who did not experience the event, then multiplies these fractions together across successive event times. The result is a step curve that drops only when events occur, which makes it an effective way to visualize performance or health metrics in clinical, industrial, or business settings.
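The product-limit calculation can be sketched in a few lines of Python. This is a minimal illustration, not Lattice's survival_km engine: the function name `kaplan_meier` and the input layout (parallel lists of follow-up times and 0/1 event flags) are assumptions. It computes S(t) as the product of (1 − d_i / n_i) over all event times t_i ≤ t, where d_i is the number of events at t_i and n_i is the number of subjects still at risk at t_i.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each subject
    events: 1 if the event was observed, 0 if the subject was censored
    (this input layout is an illustrative assumption)
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    # Number of observed events at each time
    deaths = Counter(t for t, e in zip(durations, events) if e)
    # Number of subjects leaving the risk set at each time (events + censored)
    leaving = Counter(durations)
    n_at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply in the conditional probability of surviving past t
            survival *= 1 - d / n_at_risk
            curve.append((t, survival))
        # Subjects with an event or censoring at t leave after being counted at t
        n_at_risk -= leaving[t]
    return curve
```

For durations `[2, 3, 3, 5, 6, 8]` with event flags `[1, 1, 0, 1, 0, 1]`, this returns steps at times 2, 3, 5, and 8 with survival of about 0.833, 0.667, 0.444, and 0.0. The subject censored at time 3 still counts as at risk at time 3, the usual convention when an event and a censoring share a time, and the censoring at time 6 produces no step.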
Handling Censored Data
In many studies, you cannot observe the event for every participant. A patient might move away, a machine might be taken out of service, or a study might conclude before every subject experiences the event. These incomplete observations are called 'censored' data.
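The Kaplan-Meier survival curve incorporates these censored data points by keeping them in the 'at risk' pool until the moment they drop out, so each subject contributes exactly the time during which they were observed without the event. The estimate stays unbiased as long as censoring is unrelated to the chance of the event (non-informative censoring); if, for example, the sickest patients are the ones who drop out, the curve will look too optimistic.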
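The at-risk bookkeeping for censored subjects can be made explicit with a small risk table. This sketch is illustrative only: the function name `risk_table`, its dictionary keys, and the 0/1 event coding are assumptions rather than Lattice output. Censored subjects are counted as at risk at their own time and removed only afterwards.

```python
def risk_table(durations, events):
    """Tabulate, at each distinct time, subjects at risk, events, and censorings.

    events: 1 if the event was observed, 0 if censored (assumed coding)
    """
    rows = []
    n_at_risk = len(durations)
    for t in sorted(set(durations)):
        d = sum(1 for dt, e in zip(durations, events) if dt == t and e)
        c = sum(1 for dt, e in zip(durations, events) if dt == t and not e)
        rows.append({"time": t, "at_risk": n_at_risk, "events": d, "censored": c})
        # Both event and censored subjects leave the pool after time t
        n_at_risk -= d + c
    return rows
```

With durations `[2, 3, 3, 5, 6, 8]` and event flags `[1, 1, 0, 1, 0, 1]`, the at-risk counts are 6, 5, 3, 2, 1: the subject censored at time 3 is still in the denominator at time 3, then disappears from it.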
Comparing Multiple Groups
When you have multiple groups, such as different treatment arms or product batches, you can use the log-rank test alongside your Kaplan-Meier survival curve. This test compares the number of events observed in each group with the number expected if all groups shared the same survival curve, and assesses whether the difference between the curves is statistically significant.
Lattice provides the log-rank results automatically, giving you the chi-squared value and p-value. A small p-value (commonly below 0.05) means that differences as large as the ones you see between the survival curves would be unlikely if the groups truly shared the same survival experience. It does not, by itself, tell you what causes the difference.
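A two-group log-rank statistic can be sketched with the standard library alone. This is an illustration of the textbook test, not Lattice's implementation; the function name `logrank_two_groups` and the 0/1 event coding are assumptions. Restricting to two groups keeps the statistic at 1 degree of freedom, whose upper-tail probability equals erfc(√(x / 2)), so no statistics package is needed.

```python
import math


def logrank_two_groups(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test. Returns (chi_squared, p_value) with 1 df."""
    data = [(t, e, 0) for t, e in zip(durations_a, events_a)]
    data += [(t, e, 1) for t, e in zip(durations_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Subjects still at risk (follow-up >= t) and events at t, per group
        n_a = sum(1 for dt, _, g in data if g == 0 and dt >= t)
        n_b = sum(1 for dt, _, g in data if g == 1 and dt >= t)
        d_a = sum(1 for dt, e, g in data if g == 0 and dt == t and e)
        d = sum(1 for dt, e, _ in data if dt == t and e)
        n = n_a + n_b
        # Events expected in group A if both groups shared one survival curve
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank statistic undefined: no informative events")
    chi_squared = observed_minus_expected ** 2 / variance
    # Upper tail of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi_squared / 2))
    return chi_squared, p_value
```

The nested counting is quadratic in the number of subjects, which is fine for illustration but not for large datasets. With more than two groups, the statistic generalizes to k − 1 degrees of freedom and requires a covariance matrix rather than a single variance.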
Interpreting Results
The output provides the 'median survival time,' the earliest time at which the estimated probability of survival falls to 50% or below. If the curve never drops that low, for example because too few events occurred during follow-up, you will see a notice indicating that the median was not reached.
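Reading the median off a Kaplan-Meier curve amounts to scanning its steps for the first one at or below 0.5. The sketch below assumes the curve is a list of (time, survival) pairs sorted by time; the name `median_survival` is hypothetical, and the exact tie rule Lattice applies (some packages interpolate when a step lands exactly on 0.5) is an open assumption.

```python
def median_survival(curve):
    """Return the earliest time whose survival estimate is <= 0.5.

    curve: list of (time, survival) pairs sorted by time (assumed format).
    Returns None when the curve never reaches 0.5 ("median not reached").
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

For a curve with steps `[(2, 0.833), (3, 0.667), (5, 0.444), (8, 0.0)]` the median is 5, while a curve that only falls to 0.75 returns `None`, which corresponds to the "median not reached" notice.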
You also receive 95% confidence intervals, which give a range of plausible values for each survival estimate. These intervals are essential for judging how reliable your findings are: they widen toward the end of the study timeline, where the number of subjects at risk becomes small and each event moves the estimate by a larger step.
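One common way to build these intervals is Greenwood's variance formula applied on the log scale, where Var(log S(t)) is the sum of d_i / (n_i (n_i − d_i)) over event times up to t. It is an assumption that Lattice uses this particular construction; other packages default to a log-log transform or to untransformed intervals, which give somewhat different bounds. The function name `kaplan_meier_ci` is hypothetical.

```python
import math
from collections import Counter

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def kaplan_meier_ci(durations, events, z=Z_95):
    """Kaplan-Meier estimates with log-transformed Greenwood intervals.

    Returns [(time, survival, lower, upper)] at each distinct event time.
    Bounds are NaN once the estimate reaches 0, where log S is undefined.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)
    n_at_risk = len(durations)
    survival = 1.0
    greenwood_sum = 0.0  # running sum of d / (n * (n - d))
    rows = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / n_at_risk
            if d < n_at_risk:
                greenwood_sum += d / (n_at_risk * (n_at_risk - d))
                se_log = math.sqrt(greenwood_sum)  # standard error of log S
                lower = survival * math.exp(-z * se_log)
                upper = min(1.0, survival * math.exp(z * se_log))
            else:
                lower = upper = float("nan")
            rows.append((t, survival, lower, upper))
        n_at_risk -= leaving[t]
    return rows
```

Each term d / (n (n − d)) grows as n shrinks, so the interval widens late in follow-up; with durations `[2, 3, 3, 5, 6, 8]` and flags `[1, 1, 0, 1, 0, 1]`, the interval width increases at every step until the estimate reaches 0.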
1 · Intent → method
An LLM picks survival_km from a fixed catalog.
2 · Method → numbers
Deterministic Python engine runs the math. Same input → same output.
3 · Numbers → plain language
A second LLM translates the result into your domain’s vocabulary.
What does it mean if my Kaplan-Meier survival curve stays flat?
A flat segment indicates that no events (such as failures or deaths) occurred during that time interval, because the curve only steps down when an event happens. If your data contains zero events, the Kaplan-Meier survival curve stays at 100% for the whole period and cannot provide a useful estimate, so check that your event column correctly identifies which observations reached the endpoint.
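A quick pre-flight check on the event column can catch the zero-event case before fitting. This sketch assumes events are coded as 1 (event observed) and 0 (censored); that coding and the name `validate_event_column` are assumptions, since this page does not spell out the survival_km input schema.

```python
def validate_event_column(events):
    """Hypothetical check: events must be coded 0/1 with at least one 1.

    Returns the number of observed events.
    """
    values = set(events)
    if not values <= {0, 1}:
        raise ValueError(f"event column must be coded 0/1, found {values!r}")
    if 1 not in values:
        raise ValueError("no events observed; the survival curve would stay at 1.0")
    return sum(events)
```

A common cause of an all-flat curve is an event column coded with the opposite convention (1 meaning censored) or with labels such as "yes"/"no"; the first check rejects the latter outright.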
How does this method handle subjects who left the study early?
This method treats subjects who leave early as 'censored.' Instead of discarding them or assuming they never experienced the event, the Kaplan-Meier survival curve uses their data up to the point they dropped out. This avoids the bias that either shortcut would introduce, provided their reason for leaving is unrelated to their risk of the event.