Logger Adapters: Tracking Your Experiments

What you'll learn:

  • Why experiment tracking matters for reproducibility
  • How sklab's logging works with MLflow and W&B
  • When to use each logging backend
  • How to build custom loggers for other backends

Prerequisites: The Experiment Class, basic understanding of ML workflows.

The problem: experiments are easy to lose

You run 50 experiments over two weeks. Some use different hyperparameters, some use different preprocessing, some use different data splits. At the end, you know one worked well—but which one? What were its settings?

Manual tracking (spreadsheets, notes, file names) breaks down at scale. You forget to update the sheet. You overwrite a file. You can't remember if "model_v3" was before or after you changed the learning rate.

Experiment tracking solves this by automatically logging:

  • Parameters: Every hyperparameter and setting
  • Metrics: Training and validation scores
  • Artifacts: Models, plots, predictions
  • Metadata: Timestamps, run names, tags

sklab integrates with logging backends through adapters—pluggable components that translate experiment events into backend-specific API calls.


How sklab logging works

Every Experiment method (fit, evaluate, cross_validate, search) logs automatically when you provide a logger:

experiment.fit(X, y)
    └── with logger.start_run() as run:
            ├── run.log_params(pipeline params)
            ├── run.log_metrics(training metrics)
            ├── run.log_model(fitted pipeline) [if enabled]
            └── (cleanup on context exit)

Without a logger (the default), nothing is logged. With a logger, everything is captured consistently across all operations.
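
If you prefer code to diagrams, the pattern is roughly the following. This is an illustrative sketch of the control flow, not sklab's source: the function name and the "fit_seconds" metric are invented for the example, and logger can be any object that satisfies the protocol described later on this page.

import time

from sklearn.pipeline import Pipeline


def fit_with_logging(pipeline: Pipeline, X, y, logger, run_name=None):
    # Mirrors the diagram above: open a run, log params and metrics, and let
    # the context manager handle cleanup when the block exits.
    with logger.start_run(name=run_name) as run:
        run.log_params(pipeline.get_params())
        start = time.perf_counter()
        pipeline.fit(X, y)
        run.log_metrics({"fit_seconds": time.perf_counter() - start})
    return pipeline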


Default: No-op logger

If you don't specify a logger, sklab falls back to a no-op that silently ignores every logging call. This is useful for development and testing, when you don't need tracking.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklab.experiment import Experiment

X, y = load_iris(return_X_y=True)

# No logger specified = no-op logging
experiment = Experiment(
    pipeline=Pipeline([
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=200)),
    ]),
    scoring="accuracy",
    name="no-logging",
)

experiment.fit(X, y, run_name="noop-fit")
eval_result = experiment.evaluate(X, y, run_name="noop-eval")
print(eval_result.metrics)

Weights & Biases adapter

W&B provides cloud-based experiment tracking with rich visualization. The adapter logs everything to your W&B project.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklab.experiment import Experiment
from sklab.logging import WandbLogger

X, y = load_iris(return_X_y=True)

experiment = Experiment(
    pipeline=Pipeline([
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=200)),
    ]),
    scoring="accuracy",
    logger=WandbLogger(project="sklab-demo"),
    name="wandb-demo",
)

experiment.fit(X, y, run_name="wandb-fit")
eval_result = experiment.evaluate(X, y, run_name="wandb-eval")
print(eval_result.metrics)

Concept: W&B Projects

W&B organizes runs into projects. Each run tracks one experiment execution. The project dashboard shows all runs with their parameters and metrics, making comparison easy.

Why it matters: You can filter, sort, and compare runs across days or weeks of experimentation. The web UI handles visualization so you don't have to build custom dashboards.
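
For reference, this is roughly what the adapter automates with the raw wandb API. It's a minimal sketch, not WandbLogger's actual implementation, and the metric value is a placeholder.

import wandb  # requires `pip install wandb` and `wandb login` (or WANDB_API_KEY)

# One run inside the "sklab-demo" project; parameters go in config, metrics via log().
run = wandb.init(
    project="sklab-demo",
    name="wandb-fit",
    config={"model": "LogisticRegression", "max_iter": 200},
)
wandb.log({"accuracy": 0.97})  # placeholder value; appears on the project dashboard
run.finish()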


MLflow adapter

MLflow provides open-source experiment tracking with local or remote storage. It's a good fit for teams that want control over their tracking infrastructure.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklab.experiment import Experiment
from sklab.logging import MLflowLogger

X, y = load_iris(return_X_y=True)

experiment = Experiment(
    pipeline=Pipeline([
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=200)),
    ]),
    scoring="accuracy",
    logger=MLflowLogger(experiment_name="sklab-demo"),
    name="mlflow-demo",
)

experiment.fit(X, y, run_name="mlflow-fit")
eval_result = experiment.evaluate(X, y, run_name="mlflow-eval")
print(eval_result.metrics)

Concept: MLflow Tracking Server

MLflow can store runs locally (default) or on a remote tracking server. Local storage is simple but team collaboration requires a server.

Why it matters: For personal projects, local MLflow "just works." For teams, deploy a tracking server to share experiments.
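
The equivalent with the raw MLflow API looks roughly like this (a sketch of what MLflowLogger automates, not its actual implementation). Switching from local to remote storage is just a matter of pointing the tracking URI at your server.

import mlflow  # pip install mlflow

# Local ./mlruns storage by default; uncomment to log to a remote tracking server.
# mlflow.set_tracking_uri("http://my-tracking-server:5000")  # hypothetical URL
mlflow.set_experiment("sklab-demo")

with mlflow.start_run(run_name="mlflow-fit"):
    mlflow.log_params({"model": "LogisticRegression", "max_iter": 200})
    mlflow.log_metric("accuracy", 0.97)  # placeholder value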


Decision guide: which logger to use

Situation                                          Recommendation
-------------------------------------------------  ---------------------------
Quick experiments, no tracking needed              No logger (default)
Personal projects, cloud convenience               W&B
Team projects, need control over infrastructure    MLflow
Already using a specific platform                  Use that platform's adapter
Need something custom                              Build a custom logger
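
A pattern that follows naturally from this table is picking the backend at runtime, for example from an environment variable, so the experiment code itself never changes. This is a sketch: the SKLAB_TRACKER variable is an invented convention, and returning None assumes it behaves the same as omitting the logger argument.

import os

from sklab.logging import MLflowLogger, WandbLogger


def pick_logger():
    # SKLAB_TRACKER is an invented convention for this example, not a sklab feature.
    backend = os.environ.get("SKLAB_TRACKER", "none")
    if backend == "wandb":
        return WandbLogger(project="sklab-demo")
    if backend == "mlflow":
        return MLflowLogger(experiment_name="sklab-demo")
    return None  # assumed to fall back to the default no-op behavior

Construct the experiment with logger=pick_logger() and flip the environment variable to switch backends.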

W&B vs. MLflow

Feature        W&B                        MLflow
-------------  -------------------------  -------------------------------
Hosting        Cloud (SaaS)               Self-hosted or local
Setup          Sign up, done              Install, run server (for teams)
Cost           Free tier, paid for teams  Free, open source
UI             Rich, polished             Functional, simpler
Collaboration  Built-in                   Requires tracking server

Custom logger: build your own

Loggers are simple to build. Implement the protocol and you can log to any backend—databases, cloud storage, custom dashboards.

from collections.abc import Iterator, Mapping
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any


@dataclass
class PrintLogger:
    """A logger that prints everything to stdout."""

    @contextmanager
    def start_run(
        self,
        name: str | None = None,
        config: Mapping[str, Any] | None = None,
        tags: Mapping[str, str] | None = None,
        nested: bool = False,
    ) -> Iterator["PrintLogger"]:
        # Log run-level config and tags up front, then yield self so callers
        # can keep logging inside the `with` block.
        print("start_run", name)
        if config:
            self.log_params(config)
        if tags:
            self.set_tags(tags)
        try:
            yield self
        finally:
            # Always runs on context exit, even if the experiment raises.
            print("end_run")

    def log_params(self, params: Mapping[str, Any]) -> None:
        print("params", params)

    def log_metrics(self, metrics: Mapping[str, float], step: int | None = None) -> None:
        print("metrics", metrics)

    def set_tags(self, tags: Mapping[str, str]) -> None:
        print("tags", tags)

    def log_artifact(self, path: str, name: str | None = None) -> None:
        print("artifact", path, name)

    def log_model(self, model: Any, name: str | None = None) -> None:
        print("model", name)

Concept: The Logger Protocol

sklab uses structural typing (protocols) rather than inheritance. A logger needs start_run() as a context manager that yields an object with log_params(), log_metrics(), etc. The simplest approach is start_run() yielding self.

Why it matters: You don't need to inherit from a base class. Just implement the methods and it works.
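
If you want static type checking for custom loggers, the interface can be written down as a typing.Protocol along these lines. This is a sketch inferred from PrintLogger above; the name LoggerLike and the exact signatures are assumptions, not sklab's own definitions.

from collections.abc import Mapping
from contextlib import AbstractContextManager
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class LoggerLike(Protocol):
    """Structural interface a logger must satisfy (illustrative name)."""

    def start_run(
        self,
        name: str | None = None,
        config: Mapping[str, Any] | None = None,
        tags: Mapping[str, str] | None = None,
        nested: bool = False,
    ) -> AbstractContextManager["LoggerLike"]: ...

    def log_params(self, params: Mapping[str, Any]) -> None: ...

    def log_metrics(self, metrics: Mapping[str, float], step: int | None = None) -> None: ...

    def set_tags(self, tags: Mapping[str, str]) -> None: ...

    def log_artifact(self, path: str, name: str | None = None) -> None: ...

    def log_model(self, model: Any, name: str | None = None) -> None: ...

Because PrintLogger's start_run is decorated with @contextmanager, it already returns a context manager, so isinstance(PrintLogger(), LoggerLike) passes the structural check.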

Using the custom logger

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklab.experiment import Experiment

X, y = load_iris(return_X_y=True)

experiment = Experiment(
    pipeline=Pipeline([
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=200)),
    ]),
    scoring="accuracy",
    logger=PrintLogger(),  # Uses our custom logger
    name="custom-logger",
)

result = experiment.fit(X, y, run_name="custom-fit")

What gets logged

Every sklab operation logs specific data:

Method            Logged Data
----------------  ----------------------------------------------
fit()             Pipeline parameters, fit timing
evaluate()        Metrics, predictions (optional)
cross_validate()  Per-fold metrics, mean/std metrics
search()          All trial parameters, best params, best score

The exact data depends on your configuration—some loggers support model artifacts, others only capture metrics.


Best practices

  1. Start with no logging. Get your experiment working first. Add logging when you need to compare runs.

  2. Use consistent naming. Run names should describe the experiment: "ridge-alpha-0.1" not "test3". A short example follows this list.

  3. Add tags for filtering. Tags like "baseline", "production-candidate", or "debugging" make it easier to find runs later.

  4. Log early, log often. Once you have logging set up, use it for all experiments—even quick tests. You never know which one will be important.

  5. Don't log secrets. Hyperparameters are fine. API keys and credentials are not.
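
To make practices 2 and 3 concrete, descriptive run names in the calls you have already seen look like this. The names are purely illustrative, and X_val / y_val are assumed to be a held-out split you created elsewhere.

# Compare with opaque names like "test3" or "model_v3".
experiment.fit(X, y, run_name="logreg-scaled-maxiter200-baseline")
experiment.evaluate(X_val, y_val, run_name="logreg-scaled-maxiter200-holdout")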

Next steps