Ivan Svetunkov edited this page Jun 29, 2026 · 10 revisions

greybox — Toolbox for Model Building and Forecasting

Overview

greybox is a toolbox for regression model building, selection, and forecasting evaluation. It provides the Augmented Linear Model (ALM) — a flexible regression framework supporting 26 distributions and 7 loss functions — along with stepwise selection, model combination, rolling origin cross-validation, and a comprehensive suite of forecast accuracy measures.
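The choice of loss function changes what a regression estimates. The conceptual sketch below is plain Python, not greybox code. It fits a constant-only model by grid search. Squared errors land on the mean and absolute errors on the median. Minimising squared errors is equivalent to maximising a Normal likelihood, and minimising absolute errors to maximising a Laplace likelihood. The function names are hypothetical.

```python
# Conceptual sketch (not greybox code): fit a constant-only model y = mu + e
# by minimising two different losses with a crude grid search.

def mse_loss(mu, y):
    # Squared errors: minimising them matches maximising a Normal likelihood
    return sum((v - mu) ** 2 for v in y) / len(y)


def mae_loss(mu, y):
    # Absolute errors: minimising them matches maximising a Laplace likelihood
    return sum(abs(v - mu) for v in y) / len(y)


def fit_constant(y, loss, step=0.01):
    # Evaluate every candidate value between min(y) and max(y) on a fixed grid
    lo, hi = min(y), max(y)
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda mu: loss(mu, y))


y = [1.0, 2.0, 3.0, 4.0, 20.0]
print(round(fit_constant(y, mse_loss), 2))  # the mean, 6.0
print(round(fit_constant(y, mae_loss), 2))  # the median, 3.0
```

The outlier at 20 pulls the squared-error estimate away from the bulk of the data, while the absolute-error estimate stays at the median.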

The package exists in two implementations:

  • R package (CRAN) — the mature original, actively maintained
  • Python package (PyPI) — a port covering core functionality with a scikit-learn-compatible API

Navigation aids

  • Glossary — overloaded terms (distribution values vs d*() functions, loss vs ic, the different meanings of lags, R $ vs Python .).
  • Roadmap — what is R-only or not yet ported to Python.
  • R-Python-differences — numerical-parity status and intentional convention/interface differences.
  • Installation — installation instructions.
  • Resources — academic references and DOIs.
  • llms.txt — machine-readable index of every wiki page.

Installation

See the Installation page for details.

R, from CRAN (recommended)

The easiest way to install greybox in R is from CRAN:

install.packages("greybox")

Python, from PyPI

The easiest way to install greybox in Python is from PyPI:

pip install greybox

Quick Start

R

install.packages("greybox")
library(greybox)

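# mydata: a data frame with the response y and the regressors x1 and x2
# Fit a model
model <- alm(y ~ x1 + x2, data=mydata, distribution="dnorm")
summary(model)

# Stepwise selection (the response is taken from the first column of mydata)
best <- stepwise(mydata, ic="AICc", distribution="dnorm")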

# Forecast evaluation
ro_result <- ro(y, h=5, origins=10,
                call="predict(alm(y~1, data=data), h=5)")

Python

pip install greybox

from greybox import ALM, formula, stepwise
from greybox.measures import measures

# data: a pandas DataFrame with the response y and the regressors x1 and x2
# Fit a model
y, X = formula("y ~ x1 + x2", data)
model = ALM(distribution="dnorm")
model.fit(X, y)
print(model.summary())

# Stepwise selection
best = stepwise(data, ic="AICc", distribution="dnorm")

# Forecast evaluation: actual holds the holdout values, forecast the point
# forecasts, insample the in-sample data (used by the scaled measures)
result = measures(actual, forecast, insample)
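Rolling origin evaluation, provided by ro() in R and rolling_origin() in Python, moves the forecast origin forward step by step. At each origin it produces forecasts from the data available so far and scores them against the next h values. The sketch below illustrates that loop in plain Python with a naive forecast. rolling_origin_sketch and naive_forecast are hypothetical helpers. The fixed holdout of h observations at every origin is an assumption made for the sketch, not a statement about the package's defaults.

```python
# Illustrative sketch of rolling origin evaluation (not the greybox implementation).
# Assumes a constant holdout of h observations at every origin.

def naive_forecast(insample, h):
    # Repeat the last observed value h times
    return [insample[-1]] * h


def rolling_origin_sketch(y, h, origins, forecaster):
    n = len(y)
    # The last origin leaves exactly h observations for the holdout
    first_origin = n - h - origins + 1
    results = []
    for origin in range(first_origin, first_origin + origins):
        insample = y[:origin]            # data available at this origin
        holdout = y[origin:origin + h]   # the next h actual values
        forecast = forecaster(insample, h)
        mae = sum(abs(a - f) for a, f in zip(holdout, forecast)) / h
        results.append((origin, mae))
    return results


y = [float(t) for t in range(1, 21)]  # a linear trend 1, 2, ..., 20
for origin, mae in rolling_origin_sketch(y, h=5, origins=3, forecaster=naive_forecast):
    print(origin, mae)  # each origin gives an MAE of 3.0 on this trend
```

On a linear trend the naive forecast misses by 1, 2, ..., 5 steps at every origin, so the error is the same everywhere. On real data, the spread of errors across origins shows how stable a method is.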

Documentation

| Page | Description |
|---|---|
| EDA | Exploratory data analysis: Seasonality/Trend/Irregular decomposition via stick() |
| ALM | Augmented Linear Model: core estimator, 26 distributions, 7 loss functions |
| stepwise | Forward stepwise variable selection |
| CALM | Combination of ALM (model averaging) |
| distributions | Distribution families: d/p/q/r functions |
| measures | Forecast accuracy metrics (point, interval, quantile, half-moment) |
| manipulations | Variable transformations: xreg functions |
| association | Measures of association: correlation, partial correlation, determination |
| rolling_origin | Rolling origin cross-validation for time series |
| diagnostics | Outlier detection and model diagnostics |
| Smoothers | Non-parametric smoothers: LOWESS and SuperSmoother |
| AID | Automatic Identification of Demand: six demand categories, stockout detection |
| RMCB | Regression for Multiple Comparison with the Best: Nemenyi/MCB test for comparing forecasting methods |

The Python version also bundles mtcars, the standard dataset from R's datasets package, as a pandas DataFrame. Import it with:

from greybox import mtcars

R vs Python Implementation Comparison

Exploratory Data Analysis

| Feature | R Function | Python Function | Status |
|---|---|---|---|
| STI decomposition (Seasonality/Trend/Irregular) | stick() | stick() | Implemented |

Model Fitting

| Feature | R Function | Python Function | Status |
|---|---|---|---|
| Augmented Linear Model | alm() | ALM().fit() | Implemented |
| Scale Model | sm() | | R only |
| Bootstrap coefficients | coefbootstrap() | | R only |

Model Selection

| Feature | R Function | Python Function | Status |
|---|---|---|---|
| Stepwise selection | stepwise() | stepwise() | Implemented |
| Model combination | calm() | CALM() | Implemented |
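Model combination averages several candidate regressions instead of picking a single one. A common way to weight the candidates uses information criterion weights. Each model's weight is proportional to exp(-0.5 * delta), where delta is its IC minus the smallest IC. The sketch below shows that calculation in plain Python. Whether calm()/CALM() uses exactly this weighting is an assumption here, and ic_weights and combine are hypothetical helpers.

```python
import math

# Sketch of information-criterion (Akaike-type) weights for model averaging.
# Assumes the IC values (e.g. AICc) of the candidate models are already known.

def ic_weights(ic_values):
    best = min(ic_values)
    # exp(-0.5 * delta) is the relative likelihood of each model vs the best one
    rel = [math.exp(-0.5 * (ic - best)) for ic in ic_values]
    total = sum(rel)
    return [r / total for r in rel]


def combine(forecasts, weights):
    # Weighted average of the point forecasts of several models
    return sum(w * f for w, f in zip(weights, forecasts))


aicc = [100.0, 102.0, 110.0]
w = ic_weights(aicc)
print([round(v, 3) for v in w])
print(round(combine([10.0, 12.0, 20.0], w), 3))
```

Subtracting the best IC before exponentiating changes nothing mathematically, but it avoids overflow and underflow when IC values are large.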

Prediction

| Feature | R Function | Python Function | Status |
|---|---|---|---|
| Predict / Forecast | predict() / forecast() | ALM.predict() | Implemented |

Forecast Evaluation

| Feature | R Function | Python Function | Status |
|---|---|---|---|
| Rolling origin | ro() | rolling_origin() | Implemented |
| RMCB test | rmcb() | rmcb() | Implemented |
| Point measures (16) | ME(), MAE(), MSE(), etc. | me(), mae(), mse(), etc. | Implemented |
| Interval measures | MIS(), sMIS() | mis(), smis() | Implemented |
| Half-moment measures | hm(), ham(), asymmetry(), etc. | hm(), ham(), asymmetry(), etc. | Implemented |
| Pinball loss | pinball() | pinball() | Implemented |
| All measures in one call | measures() | measures() | Implemented |
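To make the measure families above concrete, the sketch below implements a few of them from their standard textbook definitions in plain Python. It covers point errors, the pinball (quantile) loss, and the mean interval score. It is not the package's code: greybox's own functions may scale results differently, for example by summing rather than averaging the pinball loss. The sign convention for the mean error (actual minus forecast) is also an assumption.

```python
# Textbook sketches of a few forecast accuracy measures (not greybox code).

def me(actual, forecast):
    # Mean Error: positive when forecasts are too low on average
    return sum(a - f for a, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def pinball(actual, quantile_forecast, level):
    # Quantile loss for the given level, averaged over the horizon
    total = 0.0
    for a, q in zip(actual, quantile_forecast):
        if a >= q:
            total += level * (a - q)
        else:
            total += (1 - level) * (q - a)
    return total / len(actual)


def mis(actual, lower, upper, level=0.95):
    # Mean Interval Score: interval width plus penalties for values outside it
    alpha = 1 - level
    total = 0.0
    for a, lo, up in zip(actual, lower, upper):
        total += up - lo
        if a < lo:
            total += 2 / alpha * (lo - a)
        if a > up:
            total += 2 / alpha * (a - up)
    return total / len(actual)


actual = [10.0, 12.0, 14.0]
forecast = [11.0, 11.0, 11.0]
print(me(actual, forecast), mae(actual, forecast), mse(actual, forecast))
```

At level 0.5 the pinball loss equals half the MAE, which is why the median is the optimal forecast for that quantile.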

Measures of Association

| Feature | R Function | Python Function | Status |
|---|---|---|---|
| Association | association() | association() | Implemented |
| Cramer's V | cramer() | | R only |
| Partial correlation | pcor() | pcor() | Implemented |
| Multiple correlation | mcor() | mcor() | Implemented |
| Determination | determination() | determination() | Implemented |

Distributions

| Feature | R Function | Python Function | Status |
|---|---|---|---|
| 26 distributions | d/p/q/r functions | d/p/q/r functions | Implemented |
| Three-parameter lognormal | dtplnorm() etc. | | R only |
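The d/p/q/r convention gives every distribution four functions: density (d), cumulative distribution function (p), quantile function (q), and random generation (r). The sketch below writes them out for the Laplace distribution from the textbook formulas, with location mu and scale. The names mirror the R convention. The actual greybox signatures in either language may differ, so treat these as hypothetical stand-ins.

```python
import math
import random

# Sketch of the d/p/q/r convention for the Laplace distribution (not greybox code).

def dlaplace(x, mu=0.0, scale=1.0):
    # density
    return math.exp(-abs(x - mu) / scale) / (2 * scale)


def plaplace(q, mu=0.0, scale=1.0):
    # cumulative distribution function P(X <= q)
    if q < mu:
        return 0.5 * math.exp((q - mu) / scale)
    return 1 - 0.5 * math.exp(-(q - mu) / scale)


def qlaplace(p, mu=0.0, scale=1.0):
    # quantile function, the inverse of plaplace, for 0 < p < 1
    if p < 0.5:
        return mu + scale * math.log(2 * p)
    return mu - scale * math.log(2 * (1 - p))


def rlaplace(n, mu=0.0, scale=1.0, rng=None):
    # random generation by inverse transform sampling
    rng = rng or random.Random()
    draws = []
    while len(draws) < n:
        u = rng.random()  # uniform on [0, 1)
        if u > 0.0:       # log(0) is undefined, so skip the rare exact zero
            draws.append(qlaplace(u, mu, scale))
    return draws


print(dlaplace(0.0), plaplace(0.0), qlaplace(0.5))
```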

Feature Engineering

| Feature | R Function | Python Function | Status |
|---|---|---|---|
| Variable expansion | xregExpander() | xreg_expander() | Implemented |
| Transformations | xregTransformer() | xreg_transformer() | Implemented |
| Cross-products | xregMultiplier() | xreg_multiplier() | Implemented |
| Temporal dummies | temporalDummy() | temporal_dummy() | Implemented |
| Outlier dummies | outlierdummy() | outlier_dummy() | Implemented |
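Variable expansion adds lagged and lead copies of a regressor as new columns. The plain-Python sketch below illustrates the idea only. It assumes that negative numbers denote lags and positive numbers denote leads, and it leaves the values that fall off either end as None. The package functions may fill those gaps differently. expand_lags_leads and its column names are hypothetical.

```python
# Sketch of lag/lead expansion of one variable (not greybox code).
# Assumption: negative numbers are lags (past values), positive are leads.

def expand_lags_leads(x, lags):
    n = len(x)
    columns = {"x": list(x)}
    for k in lags:
        name = f"x_lag{-k}" if k < 0 else f"x_lead{k}"
        # Shift by k positions; values outside the series are unknown (None)
        columns[name] = [x[t + k] if 0 <= t + k < n else None for t in range(n)]
    return columns


cols = expand_lags_leads([1, 2, 3, 4], lags=[-1, 1])
for name, values in cols.items():
    print(name, values)
```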

Smoothers

| Feature | R Function | Python Function | Status |
|---|---|---|---|
| LOWESS | lowess() (from stats) | lowess() | Implemented |
| SuperSmoother | supsmu() (from stats) | supsmu() | Implemented |
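LOWESS smooths a scatterplot by fitting a small weighted straight line around every point. Only the nearest fraction of the data is used, with tricube weights that fall to zero at the edge of the window. The plain-Python sketch below implements that core step. Unlike lowess() in R's stats package, it omits the robustness iterations that downweight outliers, so it is a simplified illustration. lowess_simple is a hypothetical name.

```python
import math

# Simplified LOWESS sketch: local linear fits with tricube weights,
# without the robustness iterations of stats::lowess().

def lowess_simple(x, y, frac=2 / 3):
    n = len(x)
    k = max(2, math.ceil(frac * n))  # number of neighbours in each local fit
    fitted = []
    for i in range(n):
        dist = [abs(xj - x[i]) for xj in x]
        h = sorted(dist)[k - 1] or 1e-12  # window half-width: k-th smallest distance
        # Tricube weights: 1 at the point itself, 0 at and beyond the window edge
        w = [(1 - min(d / h, 1.0) ** 3) ** 3 for d in dist]
        sw = sum(w)
        xm = sum(wj * xj for wj, xj in zip(w, x)) / sw
        ym = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - xm) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - xm) * (yj - ym) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (x[i] - xm))  # local line evaluated at x[i]
    return fitted


xs = [float(i) for i in range(10)]
ys = [2 * v + 1 for v in xs]
print([round(v, 6) for v in lowess_simple(xs, ys)])  # reproduces the line 2x + 1
```

Because each local fit is a weighted least-squares line, data that already lie on a straight line are returned unchanged. Curvature in the data is what the smoother actually adapts to.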

Information Criteria

| Feature | R Function | Python Function | Status |
|---|---|---|---|
| AIC / AICc / BIC / BICc | AIC(), AICc(), BIC(), BICc() | ALM.aic, ALM.aicc, ALM.bic, ALM.bicc | Implemented (as properties) |
| Point IC | pointLik(), pAIC(), pAICc(), pBIC() | point_lik(), point_lik_cumulative() | Partial |
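All four criteria above trade the log-likelihood against the number of estimated parameters k. The "c" versions add a small-sample correction that depends on the sample size n. The plain-Python sketch below writes out the formulas as they are usually stated. That greybox computes AICc and BICc in exactly this form is an assumption. The function names are hypothetical.

```python
import math

# Usual formulas for AIC, AICc, BIC and BICc from the log-likelihood,
# the number of estimated parameters k and the sample size n.

def aic(loglik, k):
    return 2 * k - 2 * loglik


def aicc(loglik, k, n):
    # AIC plus a penalty that vanishes as n grows relative to k
    return aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik


def bicc(loglik, k, n):
    # small-sample corrected BIC
    return -2 * loglik + k * math.log(n) * n / (n - k - 1)


print(aic(-10.0, 2), aicc(-10.0, 2, 10), bic(-10.0, 2, 10), bicc(-10.0, 2, 10))
```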

Visualization

| Feature | R Function | Python Function | Status |
|---|---|---|---|
| Graph maker | graphmaker() | | R only |
| Spread plot | spread() | | R only |
| Table plot | tableplot() | | R only |

Demand Analysis

| Feature | R Function | Python Function | Status |
|---|---|---|---|
| Demand identification (single series) | aid() | aid() | Implemented |
| Demand identification (many series) | aidCat() | aid_cat() | Implemented |

Utilities

| Feature | R Function | Python Function | Status |
|---|---|---|---|
| DST detection | detectdst() | | R only |
| Leap year detection | detectleap() | | R only |
| Polynomial products | polyprod() | | R only |
| DSR bootstrap | dsrboot() | | R only |

Naming Conventions

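R uses camelCase, while Python uses snake_case. Model estimators become classes (alm() becomes ALM), and a few functions are renamed outright (ro() becomes rolling_origin()):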

| R | Python |
|---|---|
| alm() | ALM().fit() |
| calm() | CALM() |
| xregExpander() | xreg_expander() |
| xregTransformer() | xreg_transformer() |
| xregMultiplier() | xreg_multiplier() |
| temporalDummy() | temporal_dummy() |
| outlierdummy() | outlier_dummy() |
| pointLik() | point_lik() |
| ro() | rolling_origin() |
| aidCat() | aid_cat() |
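Most rows of the naming table follow a mechanical rule: put an underscore before each capital letter, then lower-case the name. The sketch below encodes that rule. The names that were renamed rather than re-cased appear as explicit exceptions copied from the table. r_to_python is a hypothetical helper for reading the table, not part of either package.

```python
import re

# Sketch of the R -> Python naming rule from the table (hypothetical helper).

EXCEPTIONS = {
    "alm": "ALM",                     # estimator becomes a class
    "calm": "CALM",                   # estimator becomes a class
    "outlierdummy": "outlier_dummy",  # no capital letter to split on in R
    "ro": "rolling_origin",           # renamed outright
}


def r_to_python(name):
    if name in EXCEPTIONS:
        return EXCEPTIONS[name]
    # Insert "_" before every capital letter (except at the start), then lower-case
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


for r_name in ["xregExpander", "temporalDummy", "pointLik", "aidCat", "ro"]:
    print(r_name, "->", r_to_python(r_name))
```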
