Ensemble Learning
Once you have a classification or prediction model, how can you make it better?
Hyperparameter tuning is one option, but it can only go so far.
Why not combine multiple models to get better performance?
If we have \(B\) learners \(\hat{f}_1, \ldots, \hat{f}_B\), each with variance \(\sigma^2\) and pairwise correlation \(\rho\), the variance of their average is:
\[\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^B \hat{f}_b(x)\right) = \rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2\]
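
The formula has two parts: a floor \(\rho\sigma^2\) that averaging cannot remove, and a term \(\frac{1-\rho}{B}\sigma^2\) that shrinks as \(B\) grows. A minimal numerical check is sketched below, assuming jointly Gaussian learner outputs with equal variance and equal pairwise correlation. The helper name `ensemble_variance` and the values of `B`, `rho`, and `sigma2` are illustrative assumptions, not taken from the text.

```python
import numpy as np


def ensemble_variance(rho, sigma2, B):
    """Variance of the average of B learners with common variance sigma2
    and common pairwise correlation rho (the formula above)."""
    return rho * sigma2 + (1.0 - rho) / B * sigma2


# Illustrative values (assumptions, not from the text).
B, rho, sigma2 = 10, 0.3, 4.0

# Equicorrelated covariance matrix: sigma2 on the diagonal,
# rho * sigma2 everywhere else.
cov = sigma2 * (rho * np.ones((B, B)) + (1.0 - rho) * np.eye(B))

# Simulate many "prediction vectors" and average across the B learners.
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(np.zeros(B), cov, size=200_000)
empirical = float(draws.mean(axis=1).var())
theoretical = ensemble_variance(rho, sigma2, B)

print(f"theoretical: {theoretical:.3f}, empirical: {empirical:.3f}")
```

With \(\rho = 1\) the function returns \(\sigma^2\) for any \(B\) (averaging identical learners does not help), and with \(\rho = 0\) it returns \(\sigma^2 / B\).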
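Note: Each bootstrap sample excludes about 36.8% of observations, because an observation is missed by all \(n\) draws with probability \((1 - 1/n)^n \approx e^{-1}\). These out-of-bag (OOB) observations can be used to estimate test error without a validation set.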
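
The sketch below checks the 36.8% figure by simulation and then reads an OOB error estimate from a scikit-learn random forest. It assumes scikit-learn is available; the synthetic dataset, sample size, and forest settings are illustrative choices, not part of the original material.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

n = 1000

# Probability that a given observation is never drawn in n draws
# with replacement; approaches exp(-1) as n grows.
limit = (1 - 1 / n) ** n

# Simulate bootstrap samples and measure the fraction left out of each.
rng = np.random.default_rng(0)
oob_fractions = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)  # one bootstrap sample of indices
    oob_fractions.append(1 - np.unique(idx).size / n)
mean_oob = float(np.mean(oob_fractions))

# OOB error from a random forest: each observation is scored only by
# the trees whose bootstrap sample did not contain it.
X, y = make_classification(
    n_samples=n, n_features=20, n_informative=8, random_state=0
)
forest = RandomForestClassifier(
    n_estimators=200, oob_score=True, random_state=0
).fit(X, y)
oob_error = 1 - forest.oob_score_

print(f"(1 - 1/n)^n = {limit:.4f}, simulated OOB fraction = {mean_oob:.4f}")
print(f"OOB error estimate = {oob_error:.3f}")
```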
Two main approaches:
The learning rate (shrinkage) helps prevent overfitting by controlling how much each tree can correct the errors of the previous ones.
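
To make the role of the learning rate concrete, here is a minimal sketch of boosting for squared-error regression, assuming scikit-learn's `DecisionTreeRegressor` as the base learner. The function names `fit_boosted_trees` and `predict_boosted_trees` and the toy data are hypothetical. This is a simplified illustration and not the algorithm used by XGBoost, which adds second-order information and explicit regularization.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_trees=100, eta=0.1, max_depth=2):
    """Fit shallow trees sequentially, each to the current residuals."""
    init = float(np.mean(y))          # start from a constant prediction
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_trees):
        residual = y - pred           # errors left by the trees so far
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        # eta scales how much of the residual this tree may correct.
        pred = pred + eta * tree.predict(X)
        trees.append(tree)
    return init, trees


def predict_boosted_trees(init, trees, X, eta):
    """Sum the shrunken tree predictions on top of the constant."""
    pred = np.full(X.shape[0], init)
    for tree in trees:
        pred = pred + eta * tree.predict(X)
    return pred


# Toy regression problem (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

init, trees = fit_boosted_trees(X, y, n_trees=100, eta=0.1)
baseline_mse = float(np.mean((y - init) ** 2))
fitted = predict_boosted_trees(init, trees, X, eta=0.1)
boosted_mse = float(np.mean((y - fitted) ** 2))

print(f"constant-model MSE: {baseline_mse:.3f}, boosted MSE: {boosted_mse:.3f}")
```

A smaller `eta` means each tree removes a smaller share of the residual, so more trees are needed to reach the same training fit.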
| Parameter | Effect |
|---|---|
| eta (learning rate) | Lower = more trees, less overfitting |
| max_depth | Deeper = more complex, more overfitting |
| subsample | Fraction of data per tree (reduces overfitting) |
| colsample_bytree | Fraction of features per tree |
| min_child_weight | Minimum sum of instance weight in leaf |
| lambda (reg_lambda) | L2 regularization on leaf weights |
| alpha (reg_alpha) | L1 regularization on leaf weights |
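
As a rough illustration of how the parameters in the table fit together, the sketch below collects them in a dictionary for XGBoost's native Python API. All numeric values, the binary classification objective, the number of boosting rounds, and the helper name `train_booster` are assumptions chosen for illustration; reasonable settings depend on the dataset.

```python
# Illustrative values only; none of these numbers come from the table.
xgb_params = {
    "objective": "binary:logistic",  # assumed task: binary classification
    "eta": 0.05,              # learning rate: lower needs more trees
    "max_depth": 4,           # depth of each tree
    "subsample": 0.8,         # fraction of rows sampled per tree
    "colsample_bytree": 0.8,  # fraction of features sampled per tree
    "min_child_weight": 5,    # minimum sum of instance weight in a leaf
    "lambda": 1.0,            # L2 regularization on leaf weights
    "alpha": 0.0,             # L1 regularization on leaf weights
}


def train_booster(X_train, y_train, X_val, y_val, params=xgb_params):
    """Hypothetical helper: train with early stopping on a validation set."""
    # Imported here so the parameter dictionary can be inspected
    # even where xgboost is not installed.
    import xgboost as xgb

    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)
    return xgb.train(
        params,
        dtrain,
        num_boost_round=1000,          # upper bound on the number of trees
        evals=[(dval, "validation")],
        early_stopping_rounds=50,      # stop once validation loss stalls
        verbose_eval=False,
    )
```

Pairing a low `eta` with early stopping lets the validation set decide how many trees to keep, rather than fixing that number in advance.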
Each model family has different inductive biases → different errors → a well-chosen meta-learner can exploit complementary strengths
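
One way to sketch this idea is with scikit-learn's `StackingClassifier`, which trains a meta-learner on cross-validated predictions of the base models. The choice of base learners, the logistic regression meta-learner, and the synthetic dataset below are assumptions for illustration, not a recommended configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (an assumption for illustration).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base learners from different model families, so that their
# errors are less likely to coincide.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=7)),
    ("logit", LogisticRegression(max_iter=1000)),
]

# The meta-learner is fit on out-of-fold predictions (cv=5), which
# keeps it from seeing predictions made on a model's own training rows.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
stack_accuracy = stack.score(X_test, y_test)

print(f"stacked test accuracy: {stack_accuracy:.3f}")
```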
Pros:

- Fast baseline performance
- Less manual tuning required
- Good for initial exploration

Cons:

- Can be a “black box”
- Risk of data leakage in validation
- May miss domain-specific insights
- Often overkill for simple problems
| Method | Key Idea | When to Use |
|---|---|---|
| Bagging | Average bootstrap trees | Reduce variance of unstable models |
| Random Forests | Bagging + feature subsampling | Default for tabular data |
| Boosting | Sequential error correction | When you need high accuracy |
| XGBoost | Regularized boosting | Industry standard for competitions |
| Stacking | Combine different model families | When you have diverse models |
| AutoML | Automated search | Fast baseline, exploration |