The random forest classifier is a versatile tool for predicting categories in your data. Use it when you need to understand which variables drive outcomes without making strict assumptions about the shape of your data. It is ideal for identifying patterns in complex, non-linear relationships where multiple variables interact.
Predictive Modeling with Random Forest
The random forest classifier works by constructing a large number of individual decision trees during training. Each tree votes on the predicted category, and the majority vote becomes the final prediction. By aggregating these votes, the model captures complex data trends and interactions that simpler models might miss. It is particularly effective for tabular data where variables have non-linear effects on the target category.
To ensure reliable results, this tool uses 200 trees by default. It also employs a deterministic seed to make sure that the same input data produces the exact same result every time you run the analysis, providing stability for your reporting.
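The defaults described above can be sketched with scikit-learn. This is a minimal illustration, not the platform's actual engine code; the specific seed value (42) and the toy dataset are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data standing in for user-uploaded tabular data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# 200 trees and a fixed seed, mirroring the defaults described above.
clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X, y)

# Because the seed is fixed, retraining on the same input reproduces
# the exact same predictions every run.
preds_a = clf.predict(X)
preds_b = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y).predict(X)
assert (preds_a == preds_b).all()
```

Fixing `random_state` is what makes "same input → same output" hold: without it, the bootstrap sampling and feature subsetting inside each tree would vary between runs.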
Understanding Your Results
When the analysis is complete, the platform provides performance metrics such as accuracy and log loss. Beyond these numbers, you receive a detailed breakdown of feature importance. This helps you move from simply predicting outcomes to understanding which factors are actually driving those outcomes.
By utilizing SHAP TreeExplainer, the model provides specific, direction-aware insights. For instance, it can clarify not just that a variable is important, but how its values correspond to increases or decreases in the probability of a specific category occurring.
Built-in Quality Checks
To prevent misleading conclusions, the random forest classifier includes automated post-check flags. If the model shows signs of overfitting—where it performs significantly better on training data than on test data—or if your target categories are highly imbalanced, the platform will alert you.
These warnings help you identify when the results should be interpreted with caution. For example, if a category represents more than 90% of your data, the system warns that accuracy might be a deceptive metric and suggests focusing on precision, recall, or F1 scores instead.
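The two checks above can be sketched in a few lines. The 0.15 overfitting threshold is an illustrative assumption, not the platform's actual cutoff; the 90% imbalance threshold comes from the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)

flags = []

# Overfitting flag: large gap between training and test accuracy.
# The 0.15 threshold is illustrative only.
gap = clf.score(X_tr, y_tr) - clf.score(X_te, y_te)
if gap > 0.15:
    flags.append(f"possible overfitting: train/test accuracy gap {gap:.2f}")

# Imbalance flag: one category covers more than 90% of the rows.
majority_share = np.bincount(y).max() / len(y)
if majority_share > 0.90:
    flags.append("imbalanced target: prefer precision, recall, or F1 over accuracy")
```

On a 90/10 split, a model that always predicts the majority class already scores 90% accuracy, which is why the imbalance flag steers you toward precision, recall, or F1.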
1 · Intent → method
An LLM picks ml_random_forest from a fixed catalog.
2 · Method → numbers
Deterministic Python engine runs the math. Same input → same output.
3 · Numbers → plain language
A second LLM translates the result into your domain’s vocabulary.
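The three stages above can be sketched as a tiny pipeline. All function and catalog names here are hypothetical stand-ins, not the platform's real API, and the LLM stages are hard-coded for illustration.

```python
# Hypothetical stand-in for the fixed method catalog.
CATALOG = {"ml_random_forest": lambda data: {"accuracy": 0.91}}

def pick_method(intent: str) -> str:
    # Stage 1: an LLM would map free-text intent to a catalog key;
    # hard-coded here for illustration.
    return "ml_random_forest"

def run_engine(method: str, data) -> dict:
    # Stage 2: deterministic engine -- same input, same output.
    return CATALOG[method](data)

def narrate(result: dict) -> str:
    # Stage 3: a second LLM would phrase this in domain vocabulary.
    return f"The model classified correctly {result['accuracy']:.0%} of the time."

summary = narrate(run_engine(pick_method("predict customer churn"), data=None))
```

The key design point is the middle stage: because the numbers come from a deterministic engine rather than a language model, the narrative layer can rephrase results without ever changing them.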
How does this tool decide which features are important?
The random forest classifier uses three different methods to rank features: MDI (how much each feature reduces impurity across the trees), Permutation (how much test performance drops when a feature's values are shuffled), and SHAP. We prioritize SHAP scores as they provide the most accurate, direction-aware attribution for your specific data.
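The first two rankings can be sketched directly with scikit-learn (SHAP requires the separate `shap` package). The dataset and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)

# MDI: impurity reduction summed over all trees. Fast, but computed on
# training data and can favor high-cardinality features.
mdi = clf.feature_importances_

# Permutation: drop in held-out accuracy when one feature's values are
# shuffled, averaged over several repeats.
perm = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
```

MDI comes for free after training, while permutation importance measures actual impact on test results, which is why the two rankings can disagree.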
What happens if my data is too small?
The platform automatically checks your sample size. If you have fewer than 50 training rows, the tool issues a low-data warning, as the random forest classifier may become unstable. In such cases, we often recommend using classical statistical methods instead.
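The check itself is simple; a minimal sketch follows, where the function name and the warning wording are hypothetical and only the 50-row threshold comes from the text.

```python
from typing import Optional

MIN_TRAIN_ROWS = 50  # threshold stated in the documentation above

def low_data_warning(n_rows: int) -> Optional[str]:
    """Return a warning string when the training set is too small, else None."""
    if n_rows < MIN_TRAIN_ROWS:
        return (f"Only {n_rows} training rows: random forest estimates may be "
                "unstable; consider a classical statistical method instead.")
    return None
```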
Tool input schema
Schema for ml_random_forest not exported yet (run pnpm export:registry).