Logger Adapters: Tracking Your Experiments

What you'll learn:

  • Why experiment tracking matters for reproducibility
  • How sklab's logging works with MLflow and W&B
  • When to use each logging backend
  • How to build custom loggers for other backends

Prerequisites: The Experiment Class, basic understanding of ML workflows.

The problem: experiments are easy to lose

You run 50 experiments over two weeks. Some use different hyperparameters, some use different preprocessing, some use different data splits. At the end, you know one worked well—but which one? What were its settings?

Manual tracking (spreadsheets, notes, file names) breaks down at scale. You forget to update the sheet. You overwrite a file. You can't remember if "model_v3" was before or after you changed the learning rate.

Experiment tracking solves this by automatically logging:

  • Parameters: Every hyperparameter and setting
  • Metrics: Training and validation scores
  • Artifacts: Models, plots, predictions
  • Metadata: Timestamps, run names, tags

sklab integrates with logging backends through adapters—pluggable components that translate experiment events into backend-specific API calls.


How sklab logging works

Every Experiment method (fit, evaluate, cross_validate, search) logs automatically when you provide a logger:

experiment.fit(X, y)
    └── with logger.start_run() as run:
            ├── run.log_params(pipeline params)
            ├── run.log_metrics(training metrics)
            ├── run.log_model(fitted pipeline) [if enabled]
            └── (cleanup on context exit)

Without a logger (the default), nothing is logged. With a logger, everything is captured consistently across all operations.
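
If you prefer code to diagrams, the pattern is roughly the following. This is an illustrative sketch of the control flow, not sklab's source: the function name and the "fit_seconds" metric are invented for the example, and logger can be any object that satisfies the protocol described later on this page.

import time

from sklearn.pipeline import Pipeline


def fit_with_logging(pipeline: Pipeline, X, y, logger, run_name=None):
    # Mirrors the diagram above: open a run, log params and metrics, and let
    # the context manager handle cleanup when the block exits.
    with logger.start_run(name=run_name) as run:
        run.log_params(pipeline.get_params())
        start = time.perf_counter()
        pipeline.fit(X, y)
        run.log_metrics({"fit_seconds": time.perf_counter() - start})
    return pipeline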


Default: No-op logger

If you don't specify a logger, sklab falls back to a no-op that silently ignores every logging call. This is useful for development and testing, when you don't need tracking.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklab.experiment import Experiment

X, y = load_iris(return_X_y=True)

# No logger specified = no-op logging
experiment = Experiment(
    pipeline=Pipeline([
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=200)),
    ]),
    scoring="accuracy",
    name="no-logging",
)

experiment.fit(X, y, run_name="noop-fit")
eval_result = experiment.evaluate(X, y, run_name="noop-eval")
print(eval_result.metrics)

Weights & Biases adapter

W&B provides cloud-based experiment tracking with rich visualization. The adapter logs everything to your W&B project.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklab.experiment import Experiment
from sklab.logging import WandbLogger

X, y = load_iris(return_X_y=True)

experiment = Experiment(
    pipeline=Pipeline([
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=200)),
    ]),
    scoring="accuracy",
    logger=WandbLogger(project="sklab-demo"),
    name="wandb-demo",
)

experiment.fit(X, y, run_name="wandb-fit")
eval_result = experiment.evaluate(X, y, run_name="wandb-eval")
print(eval_result.metrics)

Concept: W&B Projects

W&B organizes runs into projects. Each run tracks one experiment execution. The project dashboard shows all runs with their parameters and metrics, making comparison easy.

Why it matters: You can filter, sort, and compare runs across days or weeks of experimentation. The web UI handles visualization so you don't have to build custom dashboards.
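
For reference, this is roughly what the adapter automates with the raw wandb API. It's a minimal sketch, not WandbLogger's actual implementation, and the metric value is a placeholder.

import wandb  # requires `pip install wandb` and `wandb login` (or WANDB_API_KEY)

# One run inside the "sklab-demo" project; parameters go in config, metrics via log().
run = wandb.init(
    project="sklab-demo",
    name="wandb-fit",
    config={"model": "LogisticRegression", "max_iter": 200},
)
wandb.log({"accuracy": 0.97})  # placeholder value; appears on the project dashboard
run.finish()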


MLflow adapter

MLflow provides open-source experiment tracking with local or remote storage. It's a good fit for teams that want control over their tracking infrastructure.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklab.experiment import Experiment
from sklab.logging import MLflowLogger

X, y = load_iris(return_X_y=True)

experiment = Experiment(
    pipeline=Pipeline([
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=200)),
    ]),
    scoring="accuracy",
    logger=MLflowLogger(experiment_name="sklab-demo"),
    name="mlflow-demo",
)

experiment.fit(X, y, run_name="mlflow-fit")
eval_result = experiment.evaluate(X, y, run_name="mlflow-eval")
print(eval_result.metrics)

Concept: MLflow Tracking Server

MLflow can store runs locally (default) or on a remote tracking server. Local storage is simple but team collaboration requires a server.

Why it matters: For personal projects, local MLflow "just works." For teams, deploy a tracking server to share experiments.
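
The equivalent with the raw MLflow API looks roughly like this (a sketch of what MLflowLogger automates, not its actual implementation). Switching from local to remote storage is just a matter of pointing the tracking URI at your server.

import mlflow  # pip install mlflow

# Local ./mlruns storage by default; uncomment to log to a remote tracking server.
# mlflow.set_tracking_uri("http://my-tracking-server:5000")  # hypothetical URL
mlflow.set_experiment("sklab-demo")

with mlflow.start_run(run_name="mlflow-fit"):
    mlflow.log_params({"model": "LogisticRegression", "max_iter": 200})
    mlflow.log_metric("accuracy", 0.97)  # placeholder value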


Decision guide: which logger to use

Situation                                          Recommendation
-------------------------------------------------  ---------------------------
Quick experiments, no tracking needed              No logger (default)
Personal projects, cloud convenience               W&B
Team projects, need control over infrastructure    MLflow
Already using a specific platform                  Use that platform's adapter
Need something custom                              Build a custom logger
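
A pattern that follows naturally from this table is picking the backend at runtime, for example from an environment variable, so the experiment code itself never changes. This is a sketch: the SKLAB_TRACKER variable is an invented convention, and returning None assumes it behaves the same as omitting the logger argument.

import os

from sklab.logging import MLflowLogger, WandbLogger


def pick_logger():
    # SKLAB_TRACKER is an invented convention for this example, not a sklab feature.
    backend = os.environ.get("SKLAB_TRACKER", "none")
    if backend == "wandb":
        return WandbLogger(project="sklab-demo")
    if backend == "mlflow":
        return MLflowLogger(experiment_name="sklab-demo")
    return None  # assumed to fall back to the default no-op behavior

Construct the experiment with logger=pick_logger() and flip the environment variable to switch backends.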

W&B vs. MLflow

Feature        W&B                        MLflow
-------------  -------------------------  -------------------------------
Hosting        Cloud (SaaS)               Self-hosted or local
Setup          Sign up, done              Install, run server (for teams)
Cost           Free tier, paid for teams  Free, open source
UI             Rich, polished             Functional, simpler
Collaboration  Built-in                   Requires tracking server

Custom logger: build your own

Loggers are simple to build. Implement the protocol and you can log to any backend—databases, cloud storage, custom dashboards.

from collections.abc import Iterator, Mapping
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any


@dataclass
class PrintLogger:
    """A logger that prints everything to stdout."""

    @contextmanager
    def start_run(
        self,
        name: str | None = None,
        config: Mapping[str, Any] | None = None,
        tags: Mapping[str, str] | None = None,
        nested: bool = False,
    ) -> Iterator["PrintLogger"]:
        # Log run-level config and tags up front, then yield self so callers
        # can keep logging inside the `with` block.
        print("start_run", name)
        if config:
            self.log_params(config)
        if tags:
            self.set_tags(tags)
        try:
            yield self
        finally:
            # Always runs on context exit, even if the experiment raises.
            print("end_run")

    def log_params(self, params: Mapping[str, Any]) -> None:
        print("params", params)

    def log_metrics(self, metrics: Mapping[str, float], step: int | None = None) -> None:
        print("metrics", metrics)

    def set_tags(self, tags: Mapping[str, str]) -> None:
        print("tags", tags)

    def log_artifact(self, path: str, name: str | None = None) -> None:
        print("artifact", path, name)

    def log_model(self, model: Any, name: str | None = None) -> None:
        print("model", name)

Concept: The Logger Protocol

sklab uses structural typing (protocols) rather than inheritance. A logger needs start_run() as a context manager that yields an object with log_params(), log_metrics(), etc. The simplest approach is start_run() yielding self.

Why it matters: You don't need to inherit from a base class. Just implement the methods and it works.
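
If you want static type checking for custom loggers, the interface can be written down as a typing.Protocol along these lines. This is a sketch inferred from PrintLogger above; the name LoggerLike and the exact signatures are assumptions, not sklab's own definitions.

from collections.abc import Mapping
from contextlib import AbstractContextManager
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class LoggerLike(Protocol):
    """Structural interface a logger must satisfy (illustrative name)."""

    def start_run(
        self,
        name: str | None = None,
        config: Mapping[str, Any] | None = None,
        tags: Mapping[str, str] | None = None,
        nested: bool = False,
    ) -> AbstractContextManager["LoggerLike"]: ...

    def log_params(self, params: Mapping[str, Any]) -> None: ...

    def log_metrics(self, metrics: Mapping[str, float], step: int | None = None) -> None: ...

    def set_tags(self, tags: Mapping[str, str]) -> None: ...

    def log_artifact(self, path: str, name: str | None = None) -> None: ...

    def log_model(self, model: Any, name: str | None = None) -> None: ...

Because PrintLogger's start_run is decorated with @contextmanager, it already returns a context manager, so isinstance(PrintLogger(), LoggerLike) passes the structural check.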

Using the custom logger

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklab.experiment import Experiment

X, y = load_iris(return_X_y=True)

experiment = Experiment(
    pipeline=Pipeline([
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=200)),
    ]),
    scoring="accuracy",
    logger=PrintLogger(),  # Uses our custom logger
    name="custom-logger",
)

result = experiment.fit(X, y, run_name="custom-fit")

What gets logged

Every sklab operation logs specific data:

Method            Logged Data
----------------  ----------------------------------------------
fit()             Pipeline parameters, fit timing
evaluate()        Metrics, predictions (optional)
cross_validate()  Per-fold metrics, mean/std metrics
search()          All trial parameters, best params, best score

The exact data depends on your configuration—some loggers support model artifacts, others only capture metrics.


Best practices

  1. Start with no logging. Get your experiment working first. Add logging when you need to compare runs.

  2. Use consistent naming. Run names should describe the experiment: "ridge-alpha-0.1" not "test3". A short example follows this list.

  3. Add tags for filtering. Tags like "baseline", "production-candidate", or "debugging" make it easier to find runs later.

  4. Log early, log often. Once you have logging set up, use it for all experiments—even quick tests. You never know which one will be important.

  5. Don't log secrets. Hyperparameters are fine. API keys and credentials are not.
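
To make practices 2 and 3 concrete, descriptive run names in the calls you have already seen look like this. The names are purely illustrative, and X_val / y_val are assumed to be a held-out split you created elsewhere.

# Compare with opaque names like "test3" or "model_v3".
experiment.fit(X, y, run_name="logreg-scaled-maxiter200-baseline")
experiment.evaluate(X_val, y_val, run_name="logreg-scaled-maxiter200-holdout")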

Next steps