EmpiricML


EmpiricML is an open-source Python framework designed to bring the rigor of empirical science to the Machine Learning development process. Are you tired of scattered Jupyter Notebooks and untracked experiments? EmpiricML provides a structured "Laboratory" environment to help you move from messy scripts to reproducible science.

The Philosophy: ML as an Empirical Science

The core idea behind EmpiricML is that building a machine learning model is an iterative, scientific process. You form a hypothesis (e.g., "Adding these specific features will decrease the error"), and you must test it in a controlled environment. EmpiricML provides that environment through the Lab class. It encapsulates everything needed for rigorous ML experimentation:

  • Train and test data management
  • Cross-validation strategies
  • Evaluation metrics
  • Standardized criteria for comparing models
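To make the idea concrete, here is a conceptual sketch of such a "laboratory" in plain Python. This is not EmpiricML's actual API — the class and field names below are hypothetical — it only illustrates what the Lab bundles together:

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical illustration of the "Lab" idea; EmpiricML's real API may differ.
@dataclass
class Laboratory:
    X_train: list
    y_train: list
    X_test: list
    y_test: list
    n_cv_folds: int = 5                        # cross-validation strategy
    metrics: dict[str, Callable] = field(default_factory=dict)  # name -> scorer
    higher_is_better: dict[str, bool] = field(default_factory=dict)

    def evaluate(self, predict: Callable) -> dict[str, float]:
        """Score a model's predictions on the held-out test set with every metric."""
        preds = [predict(x) for x in self.X_test]
        return {name: m(self.y_test, preds) for name, m in self.metrics.items()}

# Usage: a trivial "model" (y = 2x) scored with mean absolute error.
mae = lambda y, p: sum(abs(a - b) for a, b in zip(y, p)) / len(y)
lab = Laboratory([1, 2], [2, 4], [3, 4], [6, 8],
                 metrics={"mae": mae}, higher_is_better={"mae": False})
scores = lab.evaluate(lambda x: 2 * x)
```

The point is that data splits, the CV strategy, metrics, and comparison directions live in one object, so every experiment runs under identical conditions.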

Key Features

Experiment Tracking

Keep a detailed ledger of every run. EmpiricML automatically stores:

  • Metric performance and overfitting percentages
  • Training and inference latency
  • Generated predictions for downstream analysis
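EmpiricML's exact definition of the overfitting percentage isn't shown above, but a common way to express it — used here purely as an illustrative assumption — is the relative gap between training and validation scores:

```python
def overfit_pct(train_score: float, val_score: float) -> float:
    """Relative train/validation gap as a percentage of the training score.
    This formula is an illustrative assumption, not EmpiricML's definition."""
    return (train_score - val_score) / train_score * 100.0

# A model scoring 0.95 on train but 0.76 on validation overfits by ~20%.
gap = overfit_pct(0.95, 0.76)
```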

Polars-Native Pipelines

Performance is at the heart of EmpiricML. Unlike scikit-learn pipelines, which are NumPy-based, EmpiricML transformations operate on Polars LazyFrames. This allows for lightning-fast, memory-efficient data handling, even with large datasets.

Automated Workflows

Stop writing boilerplate code for standard tasks. EmpiricML automates:

  • Hyperparameter Optimization (HPO)
  • Feature Importance calculation
  • Automated Feature Selection
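EmpiricML's HPO internals aren't documented here, but the core loop behind any HPO routine can be sketched in a few lines of plain Python — random search over a toy objective (all names below are illustrative):

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Try random configurations from `space` and keep the lowest score."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy objective with a known minimum at lr=0.1, depth=3.
obj = lambda c: (c["lr"] - 0.1) ** 2 + (c["depth"] - 3) ** 2
cfg, score = random_search(obj, {"lr": (0.0, 1.0), "depth": (1.0, 8.0)}, n_trials=200)
```

In practice a library would use smarter samplers (Bayesian optimization, pruning), but the contract is the same: a search space in, a best configuration out.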

Rigorous Model Comparison

Compare experiments with statistical confidence. Define comparison criteria in your Lab class based on:

  • Performance Thresholds: Does Model B outperform Model A by a significant margin?
  • Statistical Tests: Use built-in tests to ensure your improvements aren't just noise.

EmpiricML can automatically update and store your "Best Model" based on these predefined rules.
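The built-in tests aren't specified above; a standard approach is a paired comparison of per-fold CV scores. A minimal sketch (the function name and t threshold are illustrative, not EmpiricML's defaults):

```python
import statistics

def is_significantly_better(scores_a, scores_b, t_threshold=2.776):
    """Paired t-statistic over per-fold scores (higher = better).
    2.776 is the two-sided 95% critical value for 4 degrees of freedom
    (5 folds); a real test would look this up from the fold count."""
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / len(diffs) ** 0.5
    return mean / se > t_threshold

# Model B beats Model A consistently across 5 CV folds.
a = [0.80, 0.82, 0.79, 0.81, 0.80]
b = [0.85, 0.86, 0.84, 0.86, 0.85]
better = is_significantly_better(a, b)
```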

Fast ML Baselines

Go from zero to a leaderboard in seconds. With just a few lines of code, you can evaluate up to 10 baseline models (including LightGBM, XGBoost, Random Forest, MLP, and more) to establish a performance floor for your project.
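Conceptually, a baseline leaderboard is just "fit each model, score it, sort" — something like the plain-Python sketch below (EmpiricML wraps this in one call; the helper and model names here are made up for illustration):

```python
def leaderboard(models, score_fn):
    """models: name -> predict callable; returns (name, score) pairs, best first."""
    rows = [(name, score_fn(predict)) for name, predict in models.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Toy scorer: accuracy of predicting y = (x > 0) on a tiny fixed set.
X, y = [-2, -1, 1, 2], [0, 0, 1, 1]
acc = lambda predict: sum(predict(x) == t for x, t in zip(X, y)) / len(y)

board = leaderboard({
    "always_one": lambda x: 1,
    "sign_rule": lambda x: int(x > 0),
}, acc)
```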

Multi-Metric Evaluation

Evaluate models on multiple metrics simultaneously. Define a list of metrics and the Lab will track each one independently, requiring improvement on every metric before a model is considered better. Supports per-metric minimize/maximize configuration and multi-metric HPO.
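The "improve on all metrics" rule can be sketched in a few lines of plain Python (the helper name is illustrative, not EmpiricML's API):

```python
def improves_on_all(current, candidate, higher_is_better):
    """Candidate wins only if it improves every tracked metric,
    respecting each metric's minimize/maximize direction."""
    return all(
        (candidate[m] > current[m]) if higher_is_better[m]
        else (candidate[m] < current[m])
        for m in current
    )

directions = {"accuracy": True, "log_loss": False}
best = {"accuracy": 0.91, "log_loss": 0.30}
cand = {"accuracy": 0.93, "log_loss": 0.27}   # better on both -> accepted
worse = {"accuracy": 0.94, "log_loss": 0.35}  # worse log_loss -> rejected
```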

Early Stopping

Abort unpromising experiments early to save compute resources.
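The classic mechanism here is a patience counter: stop once the validation score fails to improve for a fixed number of rounds. A generic sketch (not EmpiricML's exact implementation):

```python
def train_with_early_stopping(val_scores, patience=3):
    """Walk a stream of per-round validation scores (higher = better)
    and stop after `patience` consecutive rounds without improvement."""
    best, rounds_without_improvement = float("-inf"), 0
    for round_idx, score in enumerate(val_scores):
        if score > best:
            best, rounds_without_improvement = score, 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                return round_idx, best   # stopped early
    return len(val_scores) - 1, best     # ran to completion

# Improvement stalls after round 3, so training stops at round 6.
stopped_at, best = train_with_early_stopping([0.70, 0.75, 0.78, 0.80, 0.79, 0.80, 0.79])
```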

Checkpointing

Save and restore your Lab state to pause and resume work seamlessly.
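Under the hood, checkpointing typically serializes the full state to disk and reads it back later. A generic sketch with Python's pickle (not necessarily EmpiricML's on-disk format; the state dict here is a stand-in):

```python
import pickle
import tempfile
from pathlib import Path

# A stand-in for the Lab's internal state: experiments run so far, best model, etc.
state = {"experiments": [{"name": "baseline", "mae": 0.21}], "best": "baseline"}

with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "lab_checkpoint.pkl"
    ckpt.write_bytes(pickle.dumps(state))       # save: pause here ...
    restored = pickle.loads(ckpt.read_bytes())  # ... restore later, resume work
```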