diagnostics
Ivan Svetunkov edited this page Jun 24, 2026 · 3 revisions
```r
outlierdummy(object, level=0.999, type=c("rstandard","rstudent"), ...)
```

```python
def outlier_dummy(
    model,
    level: float = 0.999,
    type: Literal["rstandard", "rstudent"] = "rstandard",
) -> OutlierResult: ...
```

The diagnostics module provides functions for detecting outliers in fitted regression models. The `outlier_dummy()` function identifies observations that lie outside the expected distribution bounds and creates dummy variables that can be used to re-estimate the model with outlier effects removed.
```r
# R: the function is in the greybox namespace
library(greybox)
```

```python
# Python
from greybox.diagnostics import outlier_dummy, OutlierResult
```

- Extract residuals from the fitted model
- Compute standardised (`rstandard`) or studentised (`rstudent`) residuals using leverage (hat values)
- Determine critical bounds based on the model's distribution family and the specified confidence level
- Flag observations whose standardised residuals fall outside the bounds
- Return a matrix of dummy variables (one column per outlier)
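The steps above can be sketched in plain Python for the simplest case: a one-regressor linear model with normally distributed errors. This is a minimal illustration, not the greybox implementation. The function name `outlier_sketch`, the `kind` argument, and the textbook formulas for standardised and studentised residuals (as in Cook and Weisberg) are assumptions made for the example; the library may scale residuals differently and uses distribution-specific quantiles for other families.

```python
import random
from statistics import NormalDist, fmean


def outlier_sketch(x, y, level=0.999, kind="rstandard"):
    """Flag outliers in a simple regression y = a + b*x (illustrative only)."""
    n, p = len(y), 2  # p counts the intercept and the slope
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    # Step 1: residuals of the fitted model
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Step 2: leverage (hat values) and scaled residuals
    hat = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    rss = sum(e ** 2 for e in resid)
    errors = []
    for e, h in zip(resid, hat):
        if kind == "rstandard":
            sigma2 = rss / (n - p)
        else:  # "rstudent": variance estimated without observation i
            sigma2 = (rss - e ** 2 / (1 - h)) / (n - p - 1)
        errors.append(e / (sigma2 * (1 - h)) ** 0.5)

    # Step 3: two-sided bounds from the standard normal distribution
    lower = NormalDist().inv_cdf((1 - level) / 2)
    upper = NormalDist().inv_cdf((1 + level) / 2)

    # Step 4: flag observations outside the bounds
    ids = [i for i, e in enumerate(errors) if e < lower or e > upper]

    # Step 5: one dummy column per outlier (rows are observations)
    dummies = [[1 if i == j else 0 for j in ids] for i in range(n)]
    return ids, (lower, upper), errors, dummies


random.seed(42)
x = [random.gauss(0, 1) for _ in range(100)]
y = [2 * xi + random.gauss(0, 1) for xi in x]
y[50] = 100  # inject an outlier

ids, bounds, errors, dummies = outlier_sketch(x, y)
ids_student, _, errors_student, _ = outlier_sketch(x, y, kind="rstudent")
print(ids, bounds)
print(errors[50], errors_student[50])
```

In this sketch the studentised residual of the injected point is much larger in magnitude than its standardised residual, because the outlier no longer inflates the variance estimate it is scaled by.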
| Parameter | R | Python | Type | Default | Description |
|---|---|---|---|---|---|
| model/object | `object` | `model` | alm / ALM | (required) | Fitted alm model |
| level | `level` | `level` | numeric / float | 0.999 | Confidence level for outlier detection |
| type | `type` | `type` | character / str | "rstandard" | Residual type: "rstandard" or "rstudent" |
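To give a feel for the `level` parameter, the following sketch converts a confidence level into two-sided critical bounds, assuming a standard normal distribution. The helper name `normal_bounds` is hypothetical, and other distribution families would need their own quantile functions.

```python
from statistics import NormalDist


def normal_bounds(level):
    """Two-sided standard normal bounds that cover `level` of the mass."""
    std_normal = NormalDist()
    lower = std_normal.inv_cdf((1 - level) / 2)
    upper = std_normal.inv_cdf((1 + level) / 2)
    return lower, upper


for level in (0.95, 0.99, 0.999):
    lower, upper = normal_bounds(level)
    print(f"level={level}: [{lower:.3f}, {upper:.3f}]")
```

The bounds come out at roughly ±1.960, ±2.576 and ±3.291. With the default of 0.999, a correctly specified normal model would flag about one observation in a thousand by chance, which is why a high level suits outlier detection.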
| Field | R | Python | Type | Description |
|---|---|---|---|---|
| outliers | `$outliers` | `.outliers` | matrix / np.ndarray or None | Matrix of dummy variables (or NULL if no outliers) |
| statistic | `$statistic` | `.statistic` | numeric / np.ndarray | Critical values used for detection |
| id | `$id` | `.id` | vector / np.ndarray | Indices of outlier observations |
| level | `$level` | `.level` | numeric / float | Confidence level used |
| type | `$type` | `.type` | character / str | Residual type used |
| errors | `$errors` | `.errors` | vector / np.ndarray | Standardised/studentised residuals |
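The relationship between the fields can be illustrated with a small stand-in container. `OutlierResultSketch` and `build_result` are hypothetical names for this example and are not the library's `OutlierResult`. The sketch uses plain lists instead of NumPy arrays and assumes that `statistic` holds a lower and an upper bound.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OutlierResultSketch:
    """Hypothetical stand-in mirroring the fields in the table above."""
    outliers: Optional[List[List[int]]]  # n x k dummy matrix, None if no outliers
    statistic: Tuple[float, float]       # lower and upper critical values
    id: List[int]                        # indices of flagged observations
    level: float
    type: str
    errors: List[float]                  # standardised/studentised residuals


def build_result(errors, statistic, level=0.999, type="rstandard"):
    """Derive `id` and `outliers` from the residuals and the bounds."""
    lower, upper = statistic
    ids = [i for i, e in enumerate(errors) if e < lower or e > upper]
    # Column j of the dummy matrix has a single 1, in row ids[j]
    dummies = [[int(i == j) for j in ids] for i in range(len(errors))] if ids else None
    return OutlierResultSketch(dummies, statistic, ids, level, type, errors)


result = build_result([0.3, -4.1, 1.2, 5.0], statistic=(-3.29, 3.29))
print(result.id)        # [1, 3]
print(result.outliers)  # [[0, 0], [1, 0], [0, 0], [0, 1]]

clean = build_result([0.3, -1.1], statistic=(-3.29, 3.29))
print(clean.outliers)   # None
```

Checking `outliers` for `None` (or `NULL` in R) before using it avoids stacking an empty matrix onto the design matrix when nothing is flagged.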
```r
# R
library(greybox)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
y[50] <- 100  # inject outlier
model <- alm(y ~ x, distribution="dnorm")
result <- outlierdummy(model, level=0.999)
print(result$id)        # Which observations are outliers
print(result$outliers)  # Dummy variable matrix
```

```python
# Python
import numpy as np
from greybox.alm import ALM
from greybox.formula import formula
from greybox.diagnostics import outlier_dummy

np.random.seed(42)
x = np.random.randn(100)
y = 2 * x + np.random.randn(100)
y[50] = 100  # inject outlier
data = {"y": y, "x": x}
y_vec, X = formula("y ~ x", data)
model = ALM(distribution="dnorm")
model.fit(X, y_vec)
result = outlier_dummy(model, level=0.999)
print(f"Outlier indices: {result.id}")
print(f"Critical bounds: {result.statistic}")
```

```r
# R: add outlier dummies to the model
if (!is.null(result$outliers)) {
  # Collect the response, the regressor and the dummies in one data frame
  data_dummies <- data.frame(y=y, x=x, result$outliers)
  model2 <- alm(y ~ ., data=data_dummies, distribution="dnorm")
  print(paste("Original scale:", model$scale))
  print(paste("With dummies:", model2$scale))
}
```

```python
# Python: add outlier dummies to the model
if result.outliers is not None:
    X_with_dummies = np.column_stack([X, result.outliers])
    model2 = ALM(distribution="dnorm")
    model2.fit(X_with_dummies, y_vec)
    print(f"Original scale: {model.scale:.4f}")
    print(f"With dummies: {model2.scale:.4f}")
```

```r
# R: rstudent is more sensitive to single outliers
result_student <- outlierdummy(model, level=0.999, type="rstudent")
print(result_student$id)  # Outliers
```

```python
# Python: rstudent is more sensitive to single outliers
result_student = outlier_dummy(model, level=0.999, type="rstudent")
print(f"Outliers (rstudent): {result_student.id}")
```

| R Function | Python Function |
|---|---|
| `outlierdummy()` | `outlier_dummy()` |
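The drop in scale after adding outlier dummies can be reproduced without the library. In ordinary least squares, giving one observation its own dummy variable yields the same remaining coefficients and the same residual sum of squares as deleting that observation. The sketch below relies on that equivalence and simply refits a simple regression without the outlier. The helper `fit_line` is a hypothetical name, and the residual standard error stands in for the model's scale, which greybox may estimate differently.

```python
import random
from statistics import fmean


def fit_line(x, y):
    """OLS for y = a + b*x; returns (intercept, slope, residual std. error)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, (rss / (len(y) - 2)) ** 0.5


random.seed(42)
x = [random.gauss(0, 1) for _ in range(100)]
y = [2 * xi + random.gauss(0, 1) for xi in x]
y[50] = 100  # inject an outlier

_, slope_all, scale_all = fit_line(x, y)

# Dropping observation 50 mimics giving it its own dummy variable
x_kept = x[:50] + x[51:]
y_kept = y[:50] + y[51:]
_, slope_kept, scale_kept = fit_line(x_kept, y_kept)

print(f"slope: {slope_all:.3f} -> {slope_kept:.3f}")
print(f"scale: {scale_all:.3f} -> {scale_kept:.3f}")
```

The scale falls from a value dominated by the single extreme point to something close to the true error standard deviation of 1, and the slope estimate moves back towards 2.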
- Cook, R.D. and Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall.
- Svetunkov, I. (2023). Statistics for Business Analytics. https://openforecast.org/sba/