
measures

Ivan Svetunkov edited this page Jun 24, 2026 · 4 revisions

Measures — Forecast Accuracy Metrics

Function signatures

R

measures(holdout, forecast, actual, digits=NULL, benchmark=c("naive","mean"))

Python

def measures(
    holdout: np.ndarray,
    forecast: np.ndarray,
    actual: np.ndarray,
    digits: int | None = None,
    benchmark: Literal["naive", "mean"] = "naive",
) -> dict: ...

Overview

In Python, the forecast accuracy metrics are organized across three modules:

  • greybox.point_measures: point forecast measures (ME, MAE, MSE, RMSE, MPE, MAPE, MASE, etc.) and the convenience function measures()
  • greybox.quantile_measures: quantile scoring (pinball()) and interval scoring (mis(), smis(), rmis())
  • greybox.hm: half-moment measures (hm(), ham(), asymmetry(), extremity(), cextremity()) and the Mean Root Error (mre())

Point Forecast Measures

Scale-dependent measures

In the formulas below, y denotes the actual values and f the forecasts.

| Measure | Formula | R Function | Python Function | Description |
| --- | --- | --- | --- | --- |
| ME | mean(y - f) | ME(actual, forecast) | me(actual, forecast) | Mean Error; measures bias |
| MAE | mean(abs(y - f)) | MAE(actual, forecast) | mae(actual, forecast) | Mean Absolute Error |
| MSE | mean((y - f)^2) | MSE(actual, forecast) | mse(actual, forecast) | Mean Squared Error; penalizes large errors |
| RMSE | sqrt(MSE) | RMSE(actual, forecast) | rmse(actual, forecast) | Root Mean Squared Error; same units as the data |
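The four formulas can be checked by hand with a few lines of plain Python. This is a minimal sketch of the formulas only, not the greybox implementation; the helper name point_errors is made up for this illustration.

```python
import math

def point_errors(actual, forecast):
    """ME, MAE, MSE and RMSE computed directly from the table formulas."""
    errors = [y - f for y, f in zip(actual, forecast)]  # e = y - f
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }

actual = [1, 2, 3, 4, 5]
forecast = [1.1, 2.0, 3.2, 3.9, 5.1]
result = point_errors(actual, forecast)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
# ME: -0.0600, MAE: 0.1000, MSE: 0.0140, RMSE: 0.1183
```

The negative ME shows that these forecasts sit, on average, slightly above the actuals, which MAE and MSE alone would not reveal.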

Percentage measures

| Measure | Formula | R Function | Python Function | Description |
| --- | --- | --- | --- | --- |
| MPE | mean((y - f) / y) * 100 | MPE(actual, forecast) | mpe(actual, forecast) | Mean Percentage Error; percentage bias |
| MAPE | mean(abs(y - f) / abs(y)) * 100 | MAPE(actual, forecast) | mape(actual, forecast) | Mean Absolute Percentage Error; undefined when y = 0 |
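Because both measures divide by the actual value, a single zero in the holdout makes them undefined. The sketch below follows the table formulas in plain Python and raises an explicit error in that case. The function names and the choice to raise ValueError are assumptions of this illustration, and the greybox functions may behave differently.

```python
def _check_nonzero(actual):
    # Percentage errors divide by y, so y = 0 has no defined value.
    if any(y == 0 for y in actual):
        raise ValueError("percentage errors are undefined when an actual value is 0")

def mpe_sketch(actual, forecast):
    _check_nonzero(actual)
    return 100 * sum((y - f) / y for y, f in zip(actual, forecast)) / len(actual)

def mape_sketch(actual, forecast):
    _check_nonzero(actual)
    return 100 * sum(abs(y - f) / abs(y) for y, f in zip(actual, forecast)) / len(actual)

actual = [100, 200, 400]
forecast = [110, 190, 400]
mpe_value = mpe_sketch(actual, forecast)    # about -1.67: forecasts slightly too high
mape_value = mape_sketch(actual, forecast)  # about 5.00
print(f"MPE: {mpe_value:.2f}%  MAPE: {mape_value:.2f}%")

try:
    mape_sketch([0, 1], [1, 1])
    zero_rejected = False
except ValueError:
    zero_rejected = True
```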

Scaled measures

In the formulas below, y_train denotes the training (in-sample) series.

| Measure | Formula | R Function | Python Function | Description |
| --- | --- | --- | --- | --- |
| MASE | MAE / mean(abs(diff(y_train))) | MASE(actual, forecast, scale) | mase(actual, forecast, scale) | Mean Absolute Scaled Error (Hyndman & Koehler, 2006) |
| RMSSE | sqrt(MSE / mean(diff(y_train)^2)) | RMSSE(actual, forecast, scale) | rmsse(actual, forecast, scale) | Root Mean Squared Scaled Error (M5 Competition) |
| SAME | abs(ME) / mean(abs(diff(y_train))) | | same(actual, forecast, scale) | Scaled Absolute Mean Error; scaled bias |
| sMSE | MSE / scale^2 | sMSE(actual, forecast, scale) | smse(actual, forecast, scale) | Scaled MSE (Petropoulos & Kourentzes, 2015) |
| sCE | sum(y - f) / scale | sCE(actual, forecast, scale) | sce(actual, forecast, scale) | Scaled Cumulative Error |
| sPIS | sum(cumsum(f - y)) / scale | sPIS(actual, forecast, scale) | spis(actual, forecast, scale) | Scaled Periods-In-Stock (Wallstrom & Segerstedt, 2010) |
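The scaled measures all need a scale computed from the training data before the holdout is evaluated. The sketch below, in plain Python with made-up numbers, uses the mean absolute first difference of the training sample as the scale and applies it to MASE, sCE and sPIS. Reusing one scale for all three is an assumption made only to keep the example short, and the function names are hypothetical.

```python
from itertools import accumulate

def naive_scale(train):
    """Mean absolute first difference of the training sample."""
    diffs = [b - a for a, b in zip(train, train[1:])]
    return sum(abs(d) for d in diffs) / len(diffs)

def mase_sketch(holdout, forecast, scale):
    mae = sum(abs(y - f) for y, f in zip(holdout, forecast)) / len(holdout)
    return mae / scale

def sce_sketch(holdout, forecast, scale):
    # Cumulative error over the holdout, divided by the scale.
    return sum(y - f for y, f in zip(holdout, forecast)) / scale

def spis_sketch(holdout, forecast, scale):
    # Running sum of (f - y), then summed again, as in the table formula.
    cumulative = accumulate(f - y for y, f in zip(holdout, forecast))
    return sum(cumulative) / scale

train = [10, 12, 11, 13, 14]   # illustrative training sample
holdout = [15, 16, 18]
forecast = [14, 17, 17]

scale = naive_scale(train)                       # (2 + 1 + 2 + 1) / 4 = 1.5
mase_value = mase_sketch(holdout, forecast, scale)
sce_value = sce_sketch(holdout, forecast, scale)
spis_value = spis_sketch(holdout, forecast, scale)
print(f"scale={scale}, MASE={mase_value:.4f}, sCE={sce_value:.4f}, sPIS={spis_value:.4f}")
```

A MASE below 1 means the forecast's holdout MAE is smaller than the average in-sample one-step naive error.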

Relative measures (require benchmark forecast)

| Measure | Formula | R Function | Python Function | Description |
| --- | --- | --- | --- | --- |
| rMAE | MAE / MAE_bench | rMAE(actual, forecast, bench) | rmae(actual, forecast, benchmark) | Relative MAE (Davydenko & Fildes, 2013) |
| rRMSE | RMSE / RMSE_bench | rRMSE(actual, forecast, bench) | rrmse(actual, forecast, benchmark) | Relative RMSE |
| rAME | abs(ME) / abs(ME_bench) | rAME(actual, forecast, bench) | rame(actual, forecast, benchmark) | Relative Absolute Mean Error |
| GMRAE | exp(mean(log(abs(e) / abs(e_bench)))) | GMRAE(actual, forecast, bench) | gmrae(actual, forecast, benchmark) | Geometric Mean Relative Absolute Error |
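rMAE compares average errors, while GMRAE averages the per-observation error ratios on the log scale. The plain-Python sketch below evaluates both formulas on made-up numbers; the function names are hypothetical and this is not the greybox code. Note that GMRAE takes log(abs(e) / abs(e_bench)), so it is undefined whenever either error is exactly zero.

```python
import math

def mae(actual, forecast):
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)

def rmae_sketch(actual, forecast, benchmark):
    return mae(actual, forecast) / mae(actual, benchmark)

def gmrae_sketch(actual, forecast, benchmark):
    # Geometric mean of |e| / |e_bench|, computed through logs.
    log_ratios = [
        math.log(abs(y - f) / abs(y - b))
        for y, f, b in zip(actual, forecast, benchmark)
    ]
    return math.exp(sum(log_ratios) / len(log_ratios))

actual = [10, 12, 14, 16]
forecast = [11, 11, 15, 18]    # absolute errors: 1, 1, 1, 2
benchmark = [8, 10, 10, 8]     # absolute errors: 2, 2, 4, 8

rmae_value = rmae_sketch(actual, forecast, benchmark)    # 1.25 / 4 = 0.3125
gmrae_value = gmrae_sketch(actual, forecast, benchmark)  # (1/64) ** 0.25, about 0.3536
print(f"rMAE: {rmae_value:.4f}  GMRAE: {gmrae_value:.4f}")
```

Both values are below 1, so on this toy data the forecast beats the benchmark under either measure.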

Quantile Measures

pinball() — Pinball cost function

The pinball function measures the quality of quantile or expectile forecasts.

import numpy as np
from greybox.quantile_measures import pinball

holdout = np.array([1, 2, 3, 4, 5])
forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1])

pinball(holdout, forecast, level=0.5)    # Median pinball
pinball(holdout, forecast, level=0.975)  # Upper quantile
pinball(holdout, forecast, level=0.025)  # Lower quantile

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| holdout | np.ndarray | | Actual values |
| forecast | np.ndarray | | Forecasted quantile/expectile values |
| level | float | | Quantile level (e.g., 0.5 for median, 0.975 for upper) |
| loss | int | 1 | 1 = L1 (quantile loss), 2 = L2 (expectile loss) |
| na_rm | bool | True | Remove NA values |

Formulas:

For quantiles (loss=1):

pinball = (1 - level) * sum(|e| * I(e <= 0)) + level * sum(|e| * I(e > 0))
where e = holdout - forecast

For expectiles (loss=2):

pinball = (1 - level) * sum(e^2 * I(e <= 0)) + level * sum(e^2 * I(e > 0))
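Both formulas split the errors by sign and weight the two groups by level and 1 - level. A minimal plain-Python sketch of that logic follows. The name pinball_sketch is made up for this illustration and it is not the greybox implementation; it also skips the NA handling.

```python
def pinball_sketch(holdout, forecast, level, loss=1):
    """Pinball cost from the formulas above, with e = holdout - forecast."""
    below = 0.0  # cost of errors with e <= 0 (actual at or below the forecast)
    above = 0.0  # cost of errors with e > 0 (actual above the forecast)
    for y, f in zip(holdout, forecast):
        e = y - f
        cost = abs(e) if loss == 1 else e * e
        if e <= 0:
            below += cost
        else:
            above += cost
    return (1 - level) * below + level * above

holdout = [1, 2, 3, 4, 5]
forecast = [1.5, 2.5, 2.5, 3.5, 5.5]   # errors: -0.5, -0.5, 0.5, 0.5, -0.5

median_cost = pinball_sketch(holdout, forecast, level=0.5)             # 0.5 * 1.5 + 0.5 * 1.0 = 1.25
upper_cost = pinball_sketch(holdout, forecast, level=0.9)              # 0.1 * 1.5 + 0.9 * 1.0 = 1.05
expectile_cost = pinball_sketch(holdout, forecast, level=0.5, loss=2)  # 0.5 * 0.75 + 0.5 * 0.5 = 0.625
print(median_cost, round(upper_cost, 4), expectile_cost)
```

At a high level such as 0.9, the actuals that land above the forecast dominate the cost, which is what pushes an optimal forecast towards the upper quantile.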

Interval Forecast Measures

from greybox.quantile_measures import mis, smis, rmis

| Measure | R Function | Python Function | Description |
| --- | --- | --- | --- |
| MIS | MIS(actual, lower, upper, level) | mis(actual, lower, upper, level) | Mean Interval Score (Gneiting & Raftery, 2007) |
| sMIS | sMIS(actual, lower, upper, scale, level) | smis(actual, lower, upper, scale, level) | Scaled MIS |
| rMIS | | rmis(actual, lower, upper, bench_lower, bench_upper, level) | Relative MIS |

The MIS rewards narrow intervals and penalizes when actuals fall outside:

MIS = mean(upper - lower + (2/alpha) * (lower - y) * I(y < lower) + (2/alpha) * (y - upper) * I(y > upper))
where alpha = 1 - level
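The formula can be traced step by step in plain Python: every observation contributes the interval width, and an observation outside the interval adds its distance to the nearest bound multiplied by 2/alpha. The sketch below uses made-up numbers and a hypothetical function name, and is not the greybox implementation.

```python
def mis_sketch(actual, lower, upper, level):
    """Mean Interval Score from the formula above, with alpha = 1 - level."""
    alpha = 1 - level
    total = 0.0
    for y, lo, up in zip(actual, lower, upper):
        score = up - lo                       # width: narrow intervals score lower
        if y < lo:
            score += (2 / alpha) * (lo - y)   # actual fell below the interval
        elif y > up:
            score += (2 / alpha) * (y - up)   # actual fell above the interval
        total += score
    return total / len(actual)

actual = [1, 2, 3, 4]
lower = [0.5, 1.5, 3.5, 3.5]   # the third interval misses the actual value 3
upper = [1.5, 2.5, 4.5, 4.5]

score = mis_sketch(actual, lower, upper, level=0.95)
# Widths are 1 each; the miss adds (2 / 0.05) * 0.5 = 20, so MIS = (4 + 20) / 4 = 6
print(f"MIS: {score:.4f}")
```

A single miss of 0.5 units costs twenty times the interval width here, which shows how strongly a 95% interval is penalized for failing to cover.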

Half-Moment Measures

The half-moment measures (from greybox.hm) characterize distribution asymmetry and extremity using square root transformations. They are based on the concept of the Half Central Moment (Svetunkov, Kourentzes & Svetunkov, 2023).

from greybox.hm import hm, ham, asymmetry, extremity, cextremity, mre

| Function | Signature | Returns | Description |
| --- | --- | --- | --- |
| hm | hm(x, center=None) | complex | Half Moment: mean(sqrt(x - C)), where C defaults to mean(x) |
| ham | ham(x, center=None) | float | Half Absolute Moment: mean(sqrt(abs(x - C))) |
| asymmetry | asymmetry(x, center=None) | float | Asymmetry coefficient in the range [-1, 1]; 0 = symmetric |
| extremity | extremity(x, center=None) | float | Extremity coefficient; measures tail heaviness |
| cextremity | cextremity(x, center=None) | complex | Complex Extremity; captures both magnitude and phase |
| mre | mre(actual, forecast) | float | Mean Root Error (Kourentzes, 2014): Re(mean(sqrt(y - f))) |

Notes:

  • hm() returns a complex number because sqrt(x - C) is complex when x < C.
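  • asymmetry() is computed as 1 - Arg(hm(x)) / (pi/4). Values: 1 = all values above the center, 0 = symmetric, -1 = all values below the center, because sqrt(x - C) is real (Arg = 0) above the center and purely imaginary (Arg = pi/2) below it.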
  • For all functions, center defaults to mean(x) when not provided.
  • mre() is the real part of the half moment of the forecast errors y - f taken around zero, that is, Re(mean(sqrt(y - f))).
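The formulas in these notes can be evaluated with Python's standard cmath module, which returns a purely imaginary root for negative inputs. The sketch below is an illustration of the formulas only: the *_sketch names are made up, and the greybox functions may differ in details such as NA handling.

```python
import cmath
import math

def hm_sketch(x, center=None):
    """Half moment: mean(sqrt(x - C)), with C defaulting to mean(x)."""
    if center is None:
        center = sum(x) / len(x)
    return sum(cmath.sqrt(v - center) for v in x) / len(x)

def asymmetry_sketch(x, center=None):
    """1 - Arg(hm(x)) / (pi/4), as given in the notes."""
    return 1 - cmath.phase(hm_sketch(x, center)) / (math.pi / 4)

def mre_sketch(actual, forecast):
    """Re(mean(sqrt(y - f))): half moment of the errors around zero."""
    errors = [y - f for y, f in zip(actual, forecast)]
    return hm_sketch(errors, center=0).real

symmetric = asymmetry_sketch([1, 2, 3, 4, 5])          # about 0: balanced around the mean
all_above = asymmetry_sketch([1, 2, 3], center=0)      # 1: every root is real
all_below = asymmetry_sketch([1, 2, 3], center=10)     # -1: every root is imaginary
mre_value = mre_sketch([4, 9], [0, 13])                # errors 4 and -4 give mean(2, 2j) = 1 + 1j

print(round(symmetric, 4), all_above, all_below, mre_value)
```

Values on one side of the center contribute only to the real part and values on the other side only to the imaginary part, so the argument of the half moment reflects the balance between the two sides.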

Convenience Functions

measures() — Comprehensive evaluation with training data

# Python
from greybox.point_measures import measures

# holdout = test actuals, forecast = predictions, actual = training data
result = measures(holdout, forecast, actual_train, digits=4, benchmark="naive")
# Returns dict with: ME, MAE, MSE, MPE, MAPE, sCE, sMAE, sMSE, MASE,
#                     RMSSE, SAME, rMAE, rRMSE, rAME, asymmetry, sPIS

# R
measures(holdout, forecast, actual_train, digits=4)

Examples

Individual Measures

# R
actual <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
forecast <- c(1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.1, 8.0, 9.2, 10.1)

MAE(actual, forecast)     # 0.09
MSE(actual, forecast)     # 0.013
MAPE(actual, forecast)    # percentage error

# Python
import numpy as np
from greybox.point_measures import mae, mse, rmse, mape, mase

actual = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.1, 8.0, 9.2, 10.1])

print(f"MAE:  {mae(actual, forecast):.4f}")
print(f"MSE:  {mse(actual, forecast):.4f}")
print(f"RMSE: {rmse(actual, forecast):.4f}")
print(f"MAPE: {mape(actual, forecast):.2f}%")
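# mase() takes a scale (see the scaled measures table), normally the mean
# absolute first difference of the training sample. The holdout series is used
# here only because this example defines no training data.
scale = np.mean(np.abs(np.diff(actual)))
print(f"MASE: {mase(actual, forecast, scale):.4f}")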

Relative Measures with Benchmark

# Python
from greybox.point_measures import rmae, rrmse

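# Constant benchmark for illustration (reuses np, actual and forecast from above).
# Note that actual[-1] is the last holdout value; a true naive forecast would
# repeat the last observation of the training sample instead.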
benchmark = np.full_like(forecast, actual[-1])

print(f"rMAE:  {rmae(actual, forecast, benchmark):.4f}")
print(f"rRMSE: {rrmse(actual, forecast, benchmark):.4f}")
# Values < 1 mean the forecast is better than the benchmark

Interval Score

# Python
import numpy as np
from greybox.quantile_measures import mis

actual = np.array([1, 2, 3, 4, 5])
lower = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
upper = np.array([1.5, 2.5, 3.5, 4.5, 5.5])

score = mis(actual, lower, upper, level=0.95)
print(f"MIS: {score:.4f}")

Pinball Loss

# Python
import numpy as np
from greybox.quantile_measures import pinball

holdout = np.array([1, 2, 3, 4, 5])
forecast_median = np.array([1.1, 2.0, 3.2, 3.9, 5.1])
forecast_upper = np.array([1.5, 2.5, 3.5, 4.5, 5.5])

# Pinball at median
print(f"Pinball (median): {pinball(holdout, forecast_median, level=0.5):.4f}")
# Pinball at 97.5th percentile
print(f"Pinball (upper):  {pinball(holdout, forecast_upper, level=0.975):.4f}")

Half-Moment Analysis

# Python
import numpy as np
from greybox.hm import hm, ham, asymmetry, extremity, mre

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

print(f"Half Moment:  {hm(x)}")
print(f"HAM:          {ham(x):.4f}")
print(f"Asymmetry:    {asymmetry(x):.4f}")
print(f"Extremity:    {extremity(x):.4f}")

# MRE for forecast evaluation
actual = np.array([1, 2, 3, 4, 5])
forecast = np.array([1.1, 2.0, 3.2, 3.9, 5.1])
print(f"MRE:          {mre(actual, forecast):.4f}")

Implementation Status

| Measure | R | Python |
| --- | --- | --- |
| ME, MAE, MSE, RMSE | Yes | Yes (greybox.point_measures) |
| MPE, MAPE | Yes | Yes (greybox.point_measures) |
| MASE, RMSSE, SAME | Yes | Yes (greybox.point_measures) |
| rMAE, rRMSE, rAME | Yes | Yes (greybox.point_measures) |
| GMRAE | Yes | Yes (greybox.point_measures) |
| sMSE, sPIS, sCE | Yes | Yes (greybox.point_measures) |
| MIS, sMIS, rMIS | Yes | Yes (greybox.quantile_measures) |
| pinball | Yes | Yes (greybox.quantile_measures) |
| asymmetry | Yes | Yes (greybox.hm) |
| hm, ham | Yes | Yes (greybox.hm) |
| extremity, cextremity | Yes | Yes (greybox.hm) |
| MRE | Yes | Yes (greybox.hm) |
| measures() | Yes | Yes |

References

  • Hyndman, R.J. and Koehler, A.B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22, pp.679-688.
  • Davydenko, A. and Fildes, R. (2013). Measuring forecasting accuracy: The case of judgmental adjustments to SKU-level demand forecasts. International Journal of Forecasting, 29(3), pp.510-522.
  • Petropoulos, F. and Kourentzes, N. (2015). Forecast combinations for intermittent demand. Journal of the Operational Research Society, 66, pp.914-924.
  • Wallstrom, P. and Segerstedt, A. (2010). Evaluation of forecasting error measurements and techniques for intermittent demand. International Journal of Production Economics, 128, pp.625-636.
  • Gneiting, T. and Raftery, A.E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), pp.359-378.
  • Kourentzes, N. (2014). The Bias Coefficient: a new metric for forecast bias.
  • Svetunkov, I., Kourentzes, N. and Svetunkov, S. (2023). Half Central Moment for Data Analysis. Working Paper of Department of Management Science, Lancaster University, 2023:3, pp.1-21.
  • Svetunkov, I. (2017). Naughty APEs and the quest for the holy grail. https://openforecast.org/2017/07/29/naughty-apes-and-the-quest-for-the-holy-grail/
