Hyperparameter Search: From Exhaustive to Intelligent

What you'll learn:

  • Why hyperparameter tuning matters for model performance
  • How grid search works and when it fails
  • Why random search often beats grid search
  • When to use halving search for expensive models
  • How to choose the right strategy for your problem

Prerequisites: Why Pipelines Matter, basic sklearn familiarity.

The problem: finding good hyperparameters

Most machine learning models have hyperparameters—settings that control learning but aren't learned from data. A decision tree's max_depth, a regularized model's penalty strength C, a neural network's learning rate.

These parameters matter enormously. The wrong regularization strength can cause underfitting (too strong) or overfitting (too weak). Default values are reasonable starting points, but rarely optimal for your specific data.

Hyperparameter search systematically explores the parameter space to find better configurations. sklab's search() method wraps various search strategies with consistent logging.


Grid search

Grid search is the simplest approach: specify a set of values for each parameter, and try every combination.

How grid search works

Given these parameter values:

  • C: [0.1, 1.0, 10.0]
  • gamma: [0.01, 0.1]

Grid search evaluates all 6 combinations:

(C=0.1, gamma=0.01), (C=0.1, gamma=0.1),
(C=1.0, gamma=0.01), (C=1.0, gamma=0.1),
(C=10.0, gamma=0.01), (C=10.0, gamma=0.1)

Complexity: O(∏ᵢ |Vᵢ|) evaluations, where Vᵢ is the set of values for parameter i (and each evaluation is itself k model fits under k-fold cross-validation).
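
You can inspect this enumeration directly with sklearn's ParameterGrid (a quick sketch, independent of sklab):

from sklearn.model_selection import ParameterGrid

# Expand the example grid into its Cartesian product of settings
grid = ParameterGrid({"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]})

for params in grid:
    print(params)  # e.g. {'C': 0.1, 'gamma': 0.01}

print(len(grid))  # 6 combinations in total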

Grid search with sklab

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklab.experiment import Experiment
from sklab.search import GridSearchConfig

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=200)),
])

experiment = Experiment(
    pipeline=pipeline,
    scoring="accuracy",
    name="iris-grid",
)

# Grid search over regularization strength
result = experiment.search(
    GridSearchConfig(
        param_grid={"model__C": [0.01, 0.1, 1.0, 10.0, 100.0]},
        refit=True,
    ),
    X, y,
    cv=5,
    run_name="grid-search",
)

print(f"Best params: {result.best_params}")
print(f"Best score: {result.best_score:.4f}")

When grid search works

  • Small parameter spaces: 2-3 parameters with a few values each
  • Need exact reproducibility: Grid search is deterministic
  • All parameters matter equally: Grid allocates equal attention to each

When grid search fails: the curse of dimensionality

The curse of dimensionality kills grid search in high dimensions:

Parameters    Values each    Total combinations
2             10             100
3             10             1,000
5             10             100,000
10            10             10,000,000,000

With expensive model training, exhaustive search becomes intractable.


Random search

Random search samples parameters independently from specified distributions.

How random search works

Instead of a grid, you define distributions:

  • C: log-uniform(0.01, 100)
  • gamma: log-uniform(0.001, 1)

Each trial draws random values. After n trials, you keep the best.
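
To make the trial loop concrete, here is a small sketch using sklearn's ParameterSampler, which draws one parameter setting per trial from the given distributions:

from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

distributions = {
    "C": loguniform(0.01, 100),
    "gamma": loguniform(0.001, 1),
}

# Draw 5 independent trials from the distributions
for params in ParameterSampler(distributions, n_iter=5, random_state=42):
    print(params)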

The key insight: why random beats grid

In 2012, Bergstra and Bengio published a surprising result: random search often finds better hyperparameters than grid search with the same budget.

Why? In most problems, only a few parameters actually matter. If 2 of 10 parameters drive performance, grid search wastes most of its budget on irrelevant dimensions. Random search samples the important dimensions densely regardless of how many unimportant dimensions exist.

Consider a 2D search when only one dimension matters:

Grid (9 points):          Random (9 points):
x x x                     x   x    x
x x x                       x   x
x x x                     x    x  x x

Grid tests 3 unique        Random tests 9 unique
values on the important    values on the important
dimension.                 dimension.

The grid wastes 6 evaluations testing the same 3 values repeatedly. Random search explores 9 unique values on the dimension that matters.

Random search with sklab

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from scipy.stats import loguniform

from sklab.experiment import Experiment

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=200)),
])

# Define distributions instead of fixed values
param_distributions = {
    "model__C": loguniform(0.01, 100),  # log-uniform from 0.01 to 100
}

searcher = RandomizedSearchCV(
    pipeline,
    param_distributions=param_distributions,
    n_iter=20,  # number of random samples
    scoring="accuracy",
    cv=5,
    random_state=42,
    refit=True,
)

experiment = Experiment(
    pipeline=pipeline,
    scoring="accuracy",
    name="iris-random",
)

result = experiment.search(searcher, X, y, run_name="random-search")
print(f"Best params: {result.best_params}")
print(f"Best score: {result.best_score:.4f}")

When random search works

  • Medium-to-high dimensional spaces: When you can't afford exhaustive search
  • Cheap evaluations: When you can afford many trials
  • Low effective dimensionality: When only a few parameters matter (common)

Halving search

Halving search is a budget-aware strategy that quickly discards unpromising candidates.

How halving search works

The idea: don't give every configuration a full evaluation. Start with many candidates using small budgets (few samples, few iterations), then progressively increase the budget while keeping only the best performers.

  1. Start with n candidates, evaluate each with budget b
  2. Keep the top 1/factor candidates
  3. Multiply budget by factor
  4. Repeat until one candidate remains

This is related to the successive halving algorithm, which inspired Hyperband (used in Optuna's pruning).
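
A minimal sketch of the loop described above (illustrative only; sklearn's HalvingRandomSearchCV adds resource capping, candidate over-provisioning, and cross-validation on top of this core idea):

def successive_halving(candidates, evaluate, budget=10, factor=2):
    """Toy successive halving. `candidates` is a list of parameter dicts;
    `evaluate(params, budget)` returns a score (higher is better)."""
    while len(candidates) > 1:
        # Score every surviving candidate at the current budget
        scored = sorted(candidates, key=lambda p: evaluate(p, budget), reverse=True)
        # Keep the top 1/factor, then give the survivors `factor` times the budget
        candidates = scored[: max(1, len(scored) // factor)]
        budget *= factor
    return candidates[0]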

Halving search with sklab

from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from scipy.stats import loguniform

from sklab.experiment import Experiment

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=200)),
])

param_distributions = {
    "model__C": loguniform(0.01, 100),
}

searcher = HalvingRandomSearchCV(
    pipeline,
    param_distributions=param_distributions,
    n_candidates=16,  # start with 16 candidates
    factor=2,         # halve candidates each round
    scoring="accuracy",
    cv=3,
    random_state=42,
    refit=True,
)

experiment = Experiment(
    pipeline=pipeline,
    scoring="accuracy",
    name="iris-halving",
)

result = experiment.search(searcher, X, y, run_name="halving-search")
print(f"Best params: {result.best_params}")
print(f"Best score: {result.best_score:.4f}")

When halving search works

  • Large candidate pools: When you have many configurations to try
  • Scalable budget: When you can meaningfully vary training budget (more data, more iterations, more trees)
  • Early differentiation: When bad configurations show poor performance early

Comparison: same problem, three strategies

Let's compare all three strategies on the same dataset to see how they behave.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, HalvingRandomSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from scipy.stats import randint

from sklab.experiment import Experiment

X, y = load_breast_cancer(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(random_state=42)),
])

experiment = Experiment(
    pipeline=pipeline,
    scoring="accuracy",
    name="search-comparison",
)

# Define the search space
param_grid = {
    "model__n_estimators": [10, 50, 100],
    "model__max_depth": [3, 5, 10, None],
    "model__min_samples_split": [2, 5, 10],
}

param_distributions = {
    "model__n_estimators": randint(10, 150),
    "model__max_depth": [3, 5, 10, 20, None],
    "model__min_samples_split": randint(2, 20),
}

# Grid search: 3 × 4 × 3 = 36 combinations
grid_searcher = GridSearchCV(
    pipeline,
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,
    refit=True,
)
grid_result = experiment.search(grid_searcher, X, y, run_name="grid")
print(f"Grid: {grid_result.best_score:.4f} (36 combinations)")

# Random search: 20 random samples
random_searcher = RandomizedSearchCV(
    pipeline,
    param_distributions=param_distributions,
    n_iter=20,
    scoring="accuracy",
    cv=5,
    random_state=42,
    refit=True,
)
random_result = experiment.search(random_searcher, X, y, run_name="random")
print(f"Random: {random_result.best_score:.4f} (20 samples)")

# Halving: starts with many, progressively eliminates
halving_searcher = HalvingRandomSearchCV(
    pipeline,
    param_distributions=param_distributions,
    n_candidates=32,
    factor=2,
    scoring="accuracy",
    cv=3,
    random_state=42,
    refit=True,
)
halving_result = experiment.search(halving_searcher, X, y, run_name="halving")
print(f"Halving: {halving_result.best_score:.4f} (32 initial candidates)")

Bring your own searcher

sklab doesn't lock you into specific searchers. Any object that conforms to the Searcher protocol (structural typing, no inheritance required) and exposes fit(X, y), best_params_, best_score_, and best_estimator_ works.
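
Sketched as a typing.Protocol, that surface might look roughly like this (a hypothetical rendering for illustration; check sklab's own definition for the authoritative version):

from typing import Any, Protocol

class SearcherProtocol(Protocol):
    """Hypothetical sketch of the structural interface described above."""
    best_params_: dict[str, Any]
    best_score_: float
    best_estimator_: Any

    def fit(self, X, y=None): ...

The example below implements this interface with a simple random sampler.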

from dataclasses import dataclass
from typing import Callable
import random

from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklab.experiment import Experiment

@dataclass
class SimpleRandomSearch:
    """A minimal random searcher for demonstration."""
    estimator: Pipeline
    param_sampler: Callable[[], dict]  # function that returns random params
    n_iter: int = 10
    cv: int = 5
    scoring: str = "accuracy"

    best_params_: dict | None = None
    best_score_: float | None = None
    best_estimator_: Pipeline | None = None

    def fit(self, X, y=None):
        best_score = float("-inf")
        best_params = None

        for _ in range(self.n_iter):
            params = self.param_sampler()
            estimator = clone(self.estimator).set_params(**params)
            score = cross_val_score(
                estimator, X, y,
                scoring=self.scoring,
                cv=self.cv,
            ).mean()

            if score > best_score:
                best_score = score
                best_params = params

        self.best_params_ = best_params
        self.best_score_ = float(best_score)
        self.best_estimator_ = (
            clone(self.estimator)
            .set_params(**best_params)
            .fit(X, y)
        )
        return self

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=200)),
])

def sample_params():
    return {"model__C": 10 ** random.uniform(-2, 2)}

searcher = SimpleRandomSearch(
    estimator=pipeline,
    param_sampler=sample_params,
    n_iter=15,
)

experiment = Experiment(
    pipeline=pipeline,
    scoring="accuracy",
    name="custom-searcher",
)

result = experiment.search(searcher, X, y, run_name="custom")
print(f"Best params: {result.best_params}")
print(f"Best score: {result.best_score:.4f}")

Accessing the underlying searcher

The SearchResult returned by experiment.search() exposes the underlying sklearn searcher via the .raw attribute. This gives you access to detailed cross-validation results, timing information, and other sklearn-specific data.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklab.experiment import Experiment
from sklab.search import GridSearchConfig

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=200)),
])

experiment = Experiment(
    pipeline=pipeline,
    scoring="accuracy",
    name="iris-raw-access",
)

result = experiment.search(
    GridSearchConfig(
        param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]},
        refit=True,
    ),
    X, y,
    cv=5,
)

# Access the underlying GridSearchCV via .raw
searcher = result.raw
print(f"Number of candidates: {len(searcher.cv_results_['params'])}")
print(f"Best index: {searcher.best_index_}")

Inspecting CV results

The cv_results_ attribute contains detailed information about every parameter combination tested:

# View results for each candidate
for i, params in enumerate(searcher.cv_results_["params"]):
    mean_score = searcher.cv_results_["mean_test_score"][i]
    std_score = searcher.cv_results_["std_test_score"][i]
    print(f"{params}: {mean_score:.4f} (+/- {std_score:.4f})")

For more convenient analysis, convert to a DataFrame:

import polars as pl

cv_df = pl.DataFrame({
    "C": [p["model__C"] for p in searcher.cv_results_["params"]],
    "mean_score": searcher.cv_results_["mean_test_score"],
    "std_score": searcher.cv_results_["std_test_score"],
    "mean_fit_time": searcher.cv_results_["mean_fit_time"],
})
print(cv_df)

This works the same way for RandomizedSearchCV and HalvingRandomSearchCV.


Decision guide: which strategy to use

Situation                                  Recommended strategy
Small grid (< 100 combinations)            Grid search
Need exact reproducibility                 Grid search
Medium space, cheap evaluations            Random search
High-dimensional space                     Random search
Large candidate pool, scalable budget      Halving search
Expensive evaluations                      Optuna (see Optuna Search)
Complex search logic                       Custom searcher

Best practices

  1. Start with random search. Unless your space is tiny, random search is a safe default that works well across many problems.

  2. Use log-uniform for scale parameters. Parameters like learning rate, regularization strength, and kernel width often span orders of magnitude. Log-uniform sampling explores this space more evenly (see the sketch after this list).

  3. Set a budget, not a grid size. Decide how many evaluations you can afford, then choose a strategy that uses that budget well.

  4. Log everything. sklab logs all search results automatically. Review them to understand which parameters matter.

  5. Don't over-tune. Hyperparameter optimization has diminishing returns. If your model is fundamentally wrong, no amount of tuning will save it.
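
To see why log-uniform matters for point 2, compare where 1,000 samples of C land on the range [0.01, 100] (a minimal sketch using scipy):

from scipy.stats import loguniform, uniform

lin = uniform(0.01, 100 - 0.01).rvs(1000, random_state=0)  # uniform on [0.01, 100]
log = loguniform(0.01, 100).rvs(1000, random_state=0)      # log-uniform on [0.01, 100]

# Fraction of samples in the lower two decades, [0.01, 1):
print(f"uniform:     {(lin < 1.0).mean():.3f}")  # ~0.01 — almost never explores small C
print(f"log-uniform: {(log < 1.0).mean():.3f}")  # ~0.50 — covers each decade evenly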

Notes

  • Config classes default to the experiment scoring when scoring is not set.
  • Config classes use cv from Experiment.search() unless you set cv on the config.
  • When you pass a searcher instance directly, Experiment.search() will call its fit() method and log best_params_, best_score_, and best_estimator_ if the searcher exposes them.
  • Use step__param names to target pipeline steps (for example, model__C).

Further reading