measures

Measures — Forecast Accuracy Metrics

Function signatures

R

measures(holdout, forecast, actual, digits=NULL, benchmark=c("naive","mean"))

Python

def measures(
    holdout: np.ndarray,
    forecast: np.ndarray,
    actual: np.ndarray,
    digits: int | None = None,
    benchmark: Literal["naive", "mean"] = "naive",
) -> dict: ...

Overview

The forecast accuracy metrics are organized across three modules:

greybox.point_measures — Point forecast measures (ME, MAE, MSE, RMSE, MPE, MAPE, MASE, etc.) and convenience function (measures())
greybox.quantile_measures — Quantile scoring (pinball()) and interval scoring (mis(), smis(), rmis())
greybox.hm — Half-moment measures (hm(), ham(), asymmetry(), extremity(), cextremity()) and Mean Root Error (mre())

Point Forecast Measures

Scale-dependent measures

Measure	Formula	R Function	Python Function	Description
ME	`mean(y - f)`	`ME(actual, forecast)`	`me(actual, forecast)`	Mean Error — measures bias
MAE	`mean(\|y - f\|)`	`MAE(actual, forecast)`	`mae(actual, forecast)`	Mean Absolute Error
MSE	`mean((y - f)^2)`	`MSE(actual, forecast)`	`mse(actual, forecast)`	Mean Squared Error — penalizes large errors
RMSE	`sqrt(MSE)`	`RMSE(actual, forecast)`	`rmse(actual, forecast)`	Root MSE — same units as data
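
The formulas in this table reduce to a few NumPy reductions over the error vector e = y - f. The sketch below is illustrative only and is not the greybox implementation; the helper name scale_dependent_sketch is hypothetical.

```python
import numpy as np

def scale_dependent_sketch(actual, forecast):
    """Compute ME, MAE, MSE and RMSE from the table's formulas (hypothetical helper)."""
    error = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    mse = np.mean(error ** 2)
    return {
        "ME": np.mean(error),            # signed, so it reveals bias
        "MAE": np.mean(np.abs(error)),   # magnitude only
        "MSE": mse,                      # squares weight large errors more
        "RMSE": np.sqrt(mse),            # back in the units of the data
    }

actual = np.array([1, 2, 3, 4, 5])
forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1])
result = scale_dependent_sketch(actual, forecast)
print({name: round(float(value), 4) for name, value in result.items()})
# ME = -0.06, MAE = 0.1, MSE = 0.014, RMSE = 0.1183
```

The negative ME shows that this forecast sits above the actuals on average, which MAE and MSE alone cannot reveal.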

Percentage measures

Measure	Formula	R Function	Python Function	Description
MPE	`mean((y - f) / y) * 100`	`MPE(actual, forecast)`	`mpe(actual, forecast)`	Mean Percentage Error — percentage bias
MAPE	`mean(\|y - f\| / \|y\|) * 100`	`MAPE(actual, forecast)`	`mape(actual, forecast)`	Mean Absolute Percentage Error — undefined when y=0
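
A short NumPy sketch of both percentage formulas, including what happens when an actual value is zero. The function names mpe_sketch and mape_sketch are hypothetical and only mirror the formulas above; how the greybox functions handle zeros is not shown here.

```python
import numpy as np

def mpe_sketch(actual, forecast):
    """Mean Percentage Error following the table's formula (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean((actual - forecast) / actual) * 100

def mape_sketch(actual, forecast):
    """Mean Absolute Percentage Error following the table's formula (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(np.abs(actual - forecast) / np.abs(actual)) * 100

actual = np.array([100.0, 200.0, 400.0])
forecast = np.array([110.0, 190.0, 400.0])
mpe_value = mpe_sketch(actual, forecast)    # about -1.6667
mape_value = mape_sketch(actual, forecast)  # 5.0

# A single zero in the actuals makes the plain formula blow up
with np.errstate(divide="ignore", invalid="ignore"):
    mape_with_zero = mape_sketch(np.array([0.0, 200.0]), np.array([10.0, 190.0]))

print(mpe_value, mape_value, mape_with_zero)
```

With a zero actual the result is infinite, which is why percentage measures are a poor fit for intermittent demand or any series that touches zero.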

Scaled measures

Measure	Formula	R Function	Python Function	Description
MASE	`MAE / mean(\|diff(y)\|)`	`MASE(actual, forecast, scale)`	`mase(actual, forecast, scale)`	Mean Absolute Scaled Error (Hyndman & Koehler, 2006)
RMSSE	`sqrt(MSE / mean(diff(y)^2))`	`RMSSE(actual, forecast, scale)`	`rmsse(actual, forecast, scale)`	Root Mean Squared Scaled Error (M5 Competition)
SAME	`\|ME\| / mean(\|diff(y)\|)`	—	`same(actual, forecast, scale)`	Scaled Absolute Mean Error — scaled bias
sMSE	`MSE / scale^2`	`sMSE(actual, forecast, scale)`	`smse(actual, forecast, scale)`	Scaled MSE (Petropoulos & Kourentzes, 2015)
sCE	`sum(y - f) / scale`	`sCE(actual, forecast, scale)`	`sce(actual, forecast, scale)`	Scaled Cumulative Error
sPIS	`sum(cumsum(f - y)) / scale`	`sPIS(actual, forecast, scale)`	`spis(actual, forecast, scale)`	Scaled Periods-In-Stock (Wallstrom & Segerstedt, 2010)
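
The scaled measures divide a holdout error by a scale. The sketch below assumes the scale is taken from the first differences of a training series, as the MASE and RMSSE formulas suggest; the variable names and data are illustrative and no greybox functions are called.

```python
import numpy as np

train = np.array([10.0, 12.0, 11.0, 13.0, 15.0])   # in-sample data (assumed source of the scale)
holdout = np.array([16.0, 14.0, 17.0])             # out-of-sample actuals
forecast = np.array([15.0, 15.0, 15.0])

# Scales from the in-sample one-step naive errors
scale_abs = np.mean(np.abs(np.diff(train)))   # 1.75
scale_sq = np.mean(np.diff(train) ** 2)       # 3.25

error = holdout - forecast                    # [1, -1, 2]
mase_value = np.mean(np.abs(error)) / scale_abs
rmsse_value = np.sqrt(np.mean(error ** 2) / scale_sq)
same_value = np.abs(np.mean(error)) / scale_abs

print(f"MASE:  {mase_value:.4f}")   # 0.7619
print(f"RMSSE: {rmsse_value:.4f}")  # 0.7845
print(f"SAME:  {same_value:.4f}")   # 0.3810
```

Because the scale comes from the training data, values below 1 indicate that the forecast errors are smaller than the typical in-sample naive step.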

Relative measures (require benchmark forecast)

Measure	Formula	R Function	Python Function	Description
rMAE	`MAE / MAE_bench`	`rMAE(actual, forecast, bench)`	`rmae(actual, forecast, benchmark)`	Relative MAE (Davydenko & Fildes, 2013)
rRMSE	`RMSE / RMSE_bench`	`rRMSE(actual, forecast, bench)`	`rrmse(actual, forecast, benchmark)`	Relative RMSE
rAME	`\|ME\| / \|ME_bench\|`	`rAME(actual, forecast, bench)`	`rame(actual, forecast, benchmark)`	Relative Absolute Mean Error
GMRAE	`exp(mean(log(\|e\| / \|e_bench\|)))`	`GMRAE(actual, forecast, bench)`	`gmrae(actual, forecast, benchmark)`	Geometric Mean Relative Absolute Error
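
To make the ratios concrete, here is a NumPy-only sketch that builds a naive benchmark from the last training observation and evaluates three of the formulas above. The data and variable names are illustrative assumptions, not greybox code.

```python
import numpy as np

train = np.array([5.0, 7.0, 8.0, 9.0])
actual = np.array([10.0, 12.0, 14.0, 16.0])      # holdout actuals
forecast = np.array([11.0, 12.5, 13.0, 15.0])
benchmark = np.full(actual.shape, train[-1])     # naive: repeat the last training value

abs_err = np.abs(actual - forecast)
abs_err_bench = np.abs(actual - benchmark)

rmae_value = abs_err.mean() / abs_err_bench.mean()
rrmse_value = np.sqrt(np.mean(abs_err ** 2)) / np.sqrt(np.mean(abs_err_bench ** 2))
# GMRAE takes logs of the ratios, so it is undefined if any error is exactly zero
gmrae_value = np.exp(np.mean(np.log(abs_err / abs_err_bench)))

print(f"rMAE:  {rmae_value:.4f}")   # 0.2188
print(f"rRMSE: {rrmse_value:.4f}")
print(f"GMRAE: {gmrae_value:.4f}")
```

All three values fall below 1 here, meaning the forecast beats the naive benchmark on this holdout.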

Quantile Measures

`pinball()` — Pinball cost function

The pinball function measures the quality of quantile or expectile forecasts.

import numpy as np
from greybox.quantile_measures import pinball

holdout = np.array([1, 2, 3, 4, 5])
forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1])

pinball(holdout, forecast, level=0.5)    # Median pinball
pinball(holdout, forecast, level=0.975)  # Upper quantile
pinball(holdout, forecast, level=0.025)  # Lower quantile

Parameters:

Parameter	Type	Default	Description
`holdout`	`np.ndarray`	—	Actual values
`forecast`	`np.ndarray`	—	Forecasted quantile/expectile values
`level`	`float`	—	Quantile level (e.g., 0.5 for median, 0.975 for upper)
`loss`	`int`	`1`	1 = L1 (quantile loss), 2 = L2 (expectile loss)
`na_rm`	`bool`	`True`	Remove NA values

Formulas:

For quantiles (loss=1):

pinball = (1 - level) * sum(|e| * I(e <= 0)) + level * sum(|e| * I(e > 0))
where e = holdout - forecast

For expectiles (loss=2):

pinball = (1 - level) * sum(e^2 * I(e <= 0)) + level * sum(e^2 * I(e > 0))
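
Both formulas can be written as one NumPy function, since the only difference is the power applied to the error. This is a sketch of the formulas as stated above, not the library source; the name pinball_sketch is hypothetical.

```python
import numpy as np

def pinball_sketch(holdout, forecast, level, loss=1):
    """Pinball cost per the formulas above: loss=1 for quantiles, loss=2 for expectiles."""
    e = np.asarray(holdout, dtype=float) - np.asarray(forecast, dtype=float)
    magnitude = np.abs(e) ** loss        # |e| for loss=1, e^2 for loss=2
    at_or_below = e <= 0                 # actual is at or below the forecast
    return (1 - level) * magnitude[at_or_below].sum() + level * magnitude[~at_or_below].sum()

holdout = np.array([1, 2, 3, 4, 5])
forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1])

median_cost = pinball_sketch(holdout, forecast, level=0.5)    # 0.25
upper_cost = pinball_sketch(holdout, forecast, level=0.975)   # 0.1075
print(median_cost, upper_cost)
```

At level 0.975, errors where the actual exceeds the forecast carry weight 0.975 while the others carry 0.025, so a good upper quantile forecast should rarely be exceeded.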

Interval Forecast Measures

from greybox.quantile_measures import mis, smis, rmis

Measure	R Function	Python Function	Description
MIS	`MIS(actual, lower, upper, level)`	`mis(actual, lower, upper, level)`	Mean Interval Score (Gneiting & Raftery, 2007)
sMIS	`sMIS(actual, lower, upper, scale, level)`	`smis(actual, lower, upper, scale, level)`	Scaled MIS
rMIS	—	`rmis(actual, lower, upper, bench_lower, bench_upper, level)`	Relative MIS

The MIS rewards narrow intervals and penalizes when actuals fall outside:

MIS = mean(upper - lower + (2/alpha) * (lower - y) * I(y < lower) + (2/alpha) * (y - upper) * I(y > upper))
where alpha = 1 - level
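
The MIS formula can be checked by hand with a small NumPy sketch. The function name mis_sketch and the data are illustrative assumptions; the code follows the formula above rather than the greybox source.

```python
import numpy as np

def mis_sketch(actual, lower, upper, level=0.95):
    """Mean Interval Score per the formula above (hypothetical helper)."""
    actual, lower, upper = (np.asarray(a, dtype=float) for a in (actual, lower, upper))
    alpha = 1 - level
    width = upper - lower
    below = (2 / alpha) * (lower - actual) * (actual < lower)   # penalty for misses below
    above = (2 / alpha) * (actual - upper) * (actual > upper)   # penalty for misses above
    return np.mean(width + below + above)

actual = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
upper = np.array([1.5, 2.5, 3.5, 4.5, 5.5])

covered = mis_sketch(actual, np.array([0.5, 1.5, 2.5, 3.5, 4.5]), upper)  # 1.0
# Shift the third interval so the actual (3.0) falls 0.5 below its lower bound
missed = mis_sketch(actual, np.array([0.5, 1.5, 3.5, 3.5, 4.5]),
                    np.array([1.5, 2.5, 4.5, 4.5, 5.5]))                  # 5.0
print(covered, missed)
```

When every actual is covered, the score equals the mean interval width. A single miss of 0.5 at the 95% level adds (2 / 0.05) * 0.5 = 20 to that observation, which lifts the mean from 1 to 5.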

Half-Moment Measures

The half-moment measures (from greybox.hm) characterize distribution asymmetry and extremity using square root transformations. They are based on the concept of the Half Central Moment (Svetunkov, Kourentzes & Svetunkov, 2023).

from greybox.hm import hm, ham, asymmetry, extremity, cextremity, mre

Function	Signature	Returns	Description
`hm`	`hm(x, center=None)`	`complex`	Half Moment — `mean(sqrt(x - C))` where C defaults to `mean(x)`
`ham`	`ham(x, center=None)`	`float`	Half Absolute Moment — `mean(sqrt(\|x - C\|))`
`asymmetry`	`asymmetry(x, center=None)`	`float`	Asymmetry coefficient — range [-1, 1], 0 = symmetric
`extremity`	`extremity(x, center=None)`	`float`	Extremity coefficient — measures tail heaviness
`cextremity`	`cextremity(x, center=None)`	`complex`	Complex Extremity — captures both magnitude and phase
`mre`	`mre(actual, forecast)`	`float`	Mean Root Error (Kourentzes, 2014) — `Re(mean(sqrt(y - f)))`

Notes:

hm() returns a complex number because sqrt(x - C) is complex when x < C.
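asymmetry() is computed as 1 - Arg(hm(x)) / (pi/4). By this formula the value is 1 when all observations lie above the center (hm is purely real, Arg = 0), 0 when the data are symmetric around the center (Arg = pi/4), and -1 when all observations lie below the center (hm is purely imaginary, Arg = pi/2).
For every function that takes center, it defaults to mean(x) when not provided.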
mre() is the real part of hm() applied to forecast errors.
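
The definitions above can be reproduced with complex square roots in NumPy. The sketch below follows the formulas stated in the table and notes; the *_sketch names are hypothetical, and treating mre as a half moment with center 0 is an assumption drawn from its formula Re(mean(sqrt(y - f))).

```python
import numpy as np

def hm_sketch(x, center=None):
    """Half moment: mean of complex square roots of deviations from the center."""
    x = np.asarray(x, dtype=float)
    c = x.mean() if center is None else center
    return np.mean(np.sqrt((x - c).astype(complex)))

def ham_sketch(x, center=None):
    """Half absolute moment: mean of square roots of absolute deviations."""
    x = np.asarray(x, dtype=float)
    c = x.mean() if center is None else center
    return np.mean(np.sqrt(np.abs(x - c)))

def asymmetry_sketch(x, center=None):
    """1 - Arg(hm) / (pi/4), as given in the notes."""
    return 1 - np.angle(hm_sketch(x, center)) / (np.pi / 4)

def mre_sketch(actual, forecast):
    """Real part of the half moment of the raw errors (center assumed to be 0)."""
    errors = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return hm_sketch(errors, center=0.0).real

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(asymmetry_sketch(x))               # close to 0: symmetric around the mean
print(asymmetry_sketch(x, center=0.0))   # all values above the center: Arg = 0
print(asymmetry_sketch(x, center=10.0))  # all values below the center: Arg = pi/2
print(mre_sketch([4.0, 9.0], [0.0, 0.0]))  # mean of sqrt(4) and sqrt(9) = 2.5
```

Deviations below the center contribute only to the imaginary part of hm and deviations above it only to the real part, which is why the argument of hm carries the asymmetry information.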

Convenience Functions

`measures()` — Comprehensive evaluation with training data

# Python
from greybox.point_measures import measures

# holdout = test actuals, forecast = predictions, actual = training data
result = measures(holdout, forecast, actual_train, digits=4, benchmark="naive")
# Returns dict with: ME, MAE, MSE, MPE, MAPE, sCE, sMAE, sMSE, MASE,
#                     RMSSE, SAME, rMAE, rRMSE, rAME, asymmetry, sPIS

# R
measures(holdout, forecast, actual_train, digits=4)

Examples

Individual Measures

# R
actual <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
forecast <- c(1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.1, 8.0, 9.2, 10.1)

MAE(actual, forecast)     # 0.09
MSE(actual, forecast)     # 0.013
MAPE(actual, forecast)    # percentage error

# Python
import numpy as np
from greybox.point_measures import mae, mse, rmse, mape, mase

actual = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.1, 8.0, 9.2, 10.1])

print(f"MAE:  {mae(actual, forecast):.4f}")
print(f"MSE:  {mse(actual, forecast):.4f}")
print(f"RMSE: {rmse(actual, forecast):.4f}")
print(f"MAPE: {mape(actual, forecast):.2f}%")
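# mase() takes a scale argument (see the Scaled measures table). For illustration
# it is computed from this series; normally it comes from the training data.
scale = np.mean(np.abs(np.diff(actual)))
print(f"MASE: {mase(actual, forecast, scale):.4f}")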

Relative Measures with Benchmark

# Python
from greybox.point_measures import rmae, rrmse

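# Compare against a flat benchmark that repeats a single value.
# For illustration this reuses actual[-1]; a true naive forecast would
# repeat the last observation of the training data instead.
benchmark = np.full_like(forecast, actual[-1])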

print(f"rMAE:  {rmae(actual, forecast, benchmark):.4f}")
print(f"rRMSE: {rrmse(actual, forecast, benchmark):.4f}")
# Values < 1 mean the forecast is better than the benchmark

Interval Score

# Python
from greybox.quantile_measures import mis

actual = np.array([1, 2, 3, 4, 5])
lower = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
upper = np.array([1.5, 2.5, 3.5, 4.5, 5.5])

score = mis(actual, lower, upper, level=0.95)
print(f"MIS: {score:.4f}")

Pinball Loss

# Python
from greybox.quantile_measures import pinball

holdout = np.array([1, 2, 3, 4, 5])
forecast_median = np.array([1.1, 2.0, 3.2, 3.9, 5.1])
forecast_upper = np.array([1.5, 2.5, 3.5, 4.5, 5.5])

# Pinball at median
print(f"Pinball (median): {pinball(holdout, forecast_median, level=0.5):.4f}")
# Pinball at 97.5th percentile
print(f"Pinball (upper):  {pinball(holdout, forecast_upper, level=0.975):.4f}")

Half-Moment Analysis

# Python
import numpy as np
from greybox.hm import hm, ham, asymmetry, extremity, mre

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

print(f"Half Moment:  {hm(x)}")
print(f"HAM:          {ham(x):.4f}")
print(f"Asymmetry:    {asymmetry(x):.4f}")
print(f"Extremity:    {extremity(x):.4f}")

# MRE for forecast evaluation
actual = np.array([1, 2, 3, 4, 5])
forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1])
print(f"MRE:          {mre(actual, forecast):.4f}")

Implementation Status

Measure	R	Python
ME, MAE, MSE, RMSE	Yes	Yes (`greybox.point_measures`)
MPE, MAPE	Yes	Yes (`greybox.point_measures`)
MASE, RMSSE, SAME	Yes	Yes (`greybox.point_measures`)
rMAE, rRMSE, rAME	Yes	Yes (`greybox.point_measures`)
GMRAE	Yes	Yes (`greybox.point_measures`)
sMSE, sPIS, sCE	Yes	Yes (`greybox.point_measures`)
MIS, sMIS, rMIS	Yes	Yes (via `greybox.quantile_measures`)
pinball	Yes	Yes (via `greybox.quantile_measures`)
asymmetry	Yes	Yes (via `greybox.hm`)
hm, ham	Yes	Yes (via `greybox.hm`)
extremity, cextremity	Yes	Yes (via `greybox.hm`)
MRE	Yes	Yes (via `greybox.hm`)
`measures()`	Yes	Yes

References

Hyndman, R.J. and Koehler, A.B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22, pp.679-688.
Davydenko, A. and Fildes, R. (2013). Measuring Forecasting Accuracy: The Case Of Judgmental Adjustments To Sku-Level Demand Forecasts. International Journal of Forecasting, 29(3), pp.510-522.
Petropoulos, F. and Kourentzes, N. (2015). Forecast combinations for intermittent demand. Journal of the Operational Research Society, 66, pp.914-924.
Wallstrom, P. and Segerstedt, A. (2010). Evaluation of forecasting error measurements and techniques for intermittent demand. International Journal of Production Economics, 128, pp.625-636.
Gneiting, T. and Raftery, A.E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), pp.359-378.
Kourentzes, N. (2014). The Bias Coefficient: a new metric for forecast bias.
Svetunkov, I., Kourentzes, N. and Svetunkov, S. (2023). Half Central Moment for Data Analysis. Working Paper of Department of Management Science, Lancaster University, 2023:3, pp.1-21.
Svetunkov, I. (2017). Naughty APEs and the quest for the holy grail. https://openforecast.org/2017/07/29/naughty-apes-and-the-quest-for-the-holy-grail/
