diagnostics

Diagnostics — Outlier Detection

Function signatures

R

outlierdummy(object, level=0.999, type=c("rstandard","rstudent"), ...)

Python

def outlier_dummy(
    model,
    level: float = 0.999,
    type: Literal["rstandard", "rstudent"] = "rstandard",
) -> OutlierResult: ...

Overview

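The diagnostics module provides functions for detecting outliers in fitted regression models. `outlierdummy()` in R (`outlier_dummy()` in Python) flags observations whose standardised or studentised residuals fall outside the critical bounds implied by the model's distribution at the chosen confidence level, and returns one dummy variable per flagged observation. Adding these dummies as regressors lets you re-estimate the model with the outlier effects absorbed.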

Import

# R — function is in the greybox namespace
library(greybox)

# Python
from greybox.diagnostics import outlier_dummy, OutlierResult

Algorithm

1. Extract the residuals from the fitted model.
2. Compute standardised (`rstandard`) or studentised (`rstudent`) residuals using the leverage (hat values).
3. Determine the critical bounds from the model's distribution family and the specified confidence level.
4. Flag observations whose standardised or studentised residuals fall outside the bounds.
5. Return a matrix of dummy variables (one column per outlier).
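
The steps above can be sketched in plain Python for the simplest case. This is a minimal illustration, not the greybox implementation: it assumes a simple linear regression with one regressor and normally distributed errors, and the function name `outlier_dummy_sketch` and its return layout are hypothetical.

```python
from statistics import NormalDist

def outlier_dummy_sketch(x, y, level=0.999):
    """Illustrative only: y = a + b*x with normal errors (hypothetical helper)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean

    # Step 1: residuals of the fitted model
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Step 2: standardise the residuals using leverage (hat values)
    hat = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    sigma = (sum(e ** 2 for e in residuals) / (n - 2)) ** 0.5
    errors = [e / (sigma * (1 - h) ** 0.5) for e, h in zip(residuals, hat)]

    # Step 3: two-sided critical bounds from the normal distribution
    alpha = 1 - level
    lower = NormalDist().inv_cdf(alpha / 2)
    upper = NormalDist().inv_cdf(1 - alpha / 2)

    # Step 4: flag observations outside the bounds
    ids = [i for i, e in enumerate(errors) if e < lower or e > upper]

    # Step 5: one dummy column per flagged observation (None if nothing is flagged)
    dummies = [[1 if row == i else 0 for i in ids] for row in range(n)] if ids else None
    return ids, (lower, upper), dummies, errors

# Deterministic toy data: an exact line with one injected outlier
x = [float(i) for i in range(20)]
y = [2.0 * xi for xi in x]
y[10] = 100.0

ids, bounds, dummies, errors = outlier_dummy_sketch(x, y, level=0.999)
print("Flagged indices:", ids)
print("Critical bounds:", bounds)
```

For other distribution families the bounds in step 3 would come from that family's quantile function rather than the normal one.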

`outlier_dummy` — Detect Outliers and Create Dummy Variables

Parameters

| Parameter | R name | Python name | Type (R / Python) | Default | Description |
|---|---|---|---|---|---|
| model/object | `object` | `model` | `alm` / `ALM` | none (required) | Fitted `alm` model |
| level | `level` | `level` | `numeric` / `float` | `0.999` | Confidence level for outlier detection |
| type | `type` | `type` | `character` / `str` | `"rstandard"` | Residual type: `"rstandard"` or `"rstudent"` |

Return Value

| Field | R | Python | Type (R / Python) | Description |
|---|---|---|---|---|
| outliers | `$outliers` | `.outliers` | `matrix` / `np.ndarray` or `None` | Matrix of dummy variables, one column per outlier (`NULL` in R or `None` in Python if no outliers are found) |
| statistic | `$statistic` | `.statistic` | `numeric` / `np.ndarray` | Critical values used for detection |
| id | `$id` | `.id` | `vector` / `np.ndarray` | Indices of outlier observations |
| level | `$level` | `.level` | `numeric` / `float` | Confidence level used |
| type | `$type` | `.type` | `character` / `str` | Residual type used |
| errors | `$errors` | `.errors` | `vector` / `np.ndarray` | Standardised/studentised residuals |
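
Because `outliers` can be empty, code that consumes the result should check it before use. The sketch below uses a hypothetical stand-in class, `OutlierResultSketch`, that only mirrors the documented field names; it is not the library's `OutlierResult`, it uses plain lists instead of arrays, and the numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

@dataclass
class OutlierResultSketch:
    """Hypothetical stand-in mirroring the documented fields (not the library class)."""
    outliers: Optional[List[List[int]]]  # dummy matrix, or None if nothing was flagged
    statistic: Sequence[float]           # critical values used for detection
    id: List[int]                        # indices of flagged observations
    level: float                         # confidence level used
    type: str                            # residual type used
    errors: List[float]                  # standardised/studentised residuals

def summarise(result: OutlierResultSketch) -> str:
    # Check for None first: `outliers` is only a matrix when something was flagged.
    if result.outliers is None:
        return f"No outliers at level {result.level} ({result.type})"
    n_cols = len(result.outliers[0])
    return f"{n_cols} dummy column(s) for observation(s) {result.id}"

# Illustrative values only
clean = OutlierResultSketch(None, [-3.29, 3.29], [], 0.999, "rstandard", [0.1, -0.4, 0.3])
flagged = OutlierResultSketch([[0], [1], [0]], [-3.29, 3.29], [1], 0.999, "rstandard", [0.1, 5.2, 0.3])
print(summarise(clean))
print(summarise(flagged))
```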

Examples

Basic Outlier Detection

# R
library(greybox)

set.seed(42)  # make the example reproducible
x <- rnorm(100)
y <- 2 * x + rnorm(100)
y[50] <- 100  # inject outlier

model <- alm(y ~ x, distribution="dnorm")
result <- outlierdummy(model, level=0.999)
print(result$id)       # Which observations are outliers
print(result$outliers)  # Dummy variable matrix

# Python
import numpy as np
from greybox.alm import ALM
from greybox.formula import formula
from greybox.diagnostics import outlier_dummy

np.random.seed(42)
x = np.random.randn(100)
y = 2 * x + np.random.randn(100)
y[50] = 100  # inject outlier

data = {"y": y, "x": x}
y_vec, X = formula("y ~ x", data)

model = ALM(distribution="dnorm")
model.fit(X, y_vec)

result = outlier_dummy(model, level=0.999)
print(f"Outlier indices: {result.id}")
print(f"Critical bounds: {result.statistic}")

Re-estimating with Outlier Dummies

# R — add outlier dummies to the model
if (!is.null(result$outliers)) {
  # Combine the original variables and the dummy columns in one data frame
  data2 <- data.frame(y=y, x=x, result$outliers)
  model2 <- alm(y ~ ., data=data2, distribution="dnorm")
  print(paste("Original scale:", model$scale))
  print(paste("With dummies:", model2$scale))
}

# Python — add outlier dummies to the model
if result.outliers is not None:
    X_with_dummies = np.column_stack([X, result.outliers])
    model2 = ALM(distribution="dnorm")
    model2.fit(X_with_dummies, y_vec)
    print(f"Original scale: {model.scale:.4f}")
    print(f"With dummies:   {model2.scale:.4f}")

Using Studentised Residuals

# R — rstudent is more sensitive to single outliers
result_student <- outlierdummy(model, level=0.999, type="rstudent")
print(result_student$id)  # Outliers

# Python — rstudent is more sensitive to single outliers
result_student = outlier_dummy(model, level=0.999, type="rstudent")
print(f"Outliers (rstudent): {result_student.id}")
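
To see why studentised residuals react more strongly to a single outlier, the following sketch uses the classical ordinary least squares identity that converts a standardised residual `r` into an externally studentised one, `t = r * sqrt((n - p - 1) / (n - p - r^2))`, where `n` is the sample size and `p` the number of estimated coefficients. This assumes a normal linear model; the helper name is hypothetical and greybox may compute `rstudent` differently for other distributions.

```python
import math

def rstudent_from_rstandard(r: float, n: int, p: int) -> float:
    """Convert a standardised residual to an externally studentised one (OLS identity)."""
    # Valid only while r**2 < n - p, which always holds for standardised OLS residuals.
    return r * math.sqrt((n - p - 1) / (n - p - r ** 2))

# Small residuals barely change; large ones are amplified, because the
# observation no longer inflates the scale estimate used to judge it.
for r in (1.0, 2.0, 4.0):
    print(f"rstandard={r:.1f} -> rstudent={rstudent_from_rstandard(r, n=20, p=2):.3f}")
```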

R vs Python Function Names

| R Function | Python Function |
|---|---|
| `outlierdummy()` | `outlier_dummy()` |

References

Cook, R.D. and Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall.
Svetunkov, I. (2023). Statistics for Business Analytics. https://openforecast.org/sba/
