This document describes the Welch's t-test functions available in the quantpolars package.
The package provides two main functions for performing Welch's t-tests on Polars DataFrames:
- one_t: Test if a sample mean differs from a hypothesized value
- two_t: Compare means between two samples (two modes: two-columns mode and grouping mode)
Both functions support:
- Directional and two-sided tests
- Group-by functionality for stratified analysis
- LazyFrame input
- Automatic handling of null values
```python
one_t(
    df: Union[pl.DataFrame, pl.LazyFrame],
    column: str,
    mu: float = 0.0,
    alternative: Literal["two-sided", "greater", "less"] = "two-sided",
    group_by: Optional[Union[str, list[str]]] = None,
) -> pl.DataFrame
```

Parameters:
- df: Polars DataFrame or LazyFrame containing the data
- column: Name of the column to test
- mu: Hypothesized population mean (default: 0.0)
- alternative: Direction of the test
  - "two-sided": Test if mean ≠ mu
  - "greater": Test if mean > mu
  - "less": Test if mean < mu
- group_by: Optional column name(s) to group by before testing
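The alternative parameter only selects which tail of the t distribution the p-value is taken from. As a minimal, dependency-free sketch (not the package's implementation, which this document does not show), the p-value can be computed from a t-statistic and its degrees of freedom through the regularized incomplete beta function, evaluated here with the Numerical Recipes continued fraction (modified Lentz method). The names t_p_value, _betainc and _betacf are hypothetical.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300  # guards against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log1p(-x)
    )
    front = math.exp(log_front)
    # Evaluate the fraction directly or via I_x(a, b) = 1 - I_{1-x}(b, a),
    # whichever side converges quickly
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_p_value(t: float, df: float, alternative: str = "two-sided") -> float:
    """P-value of a t-statistic with df degrees of freedom."""
    # For Student's t: P(T > |t|) = 0.5 * I_{df/(df+t^2)}(df/2, 1/2)
    tail = 0.5 * _betainc(df / 2.0, 0.5, df / (df + t * t))
    if alternative == "two-sided":
        return 2.0 * tail
    upper = tail if t >= 0 else 1.0 - tail  # P(T > t)
    if alternative == "greater":
        return upper
    if alternative == "less":
        return 1.0 - upper
    raise ValueError(f"unknown alternative: {alternative!r}")
```

In practice, scipy.stats.t.sf(t, df) returns the same upper-tail probability and is the better choice when SciPy is installed; the hand-rolled version only makes the tail selection explicit.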
Returns a Polars DataFrame with columns:
- n: Sample size
- mean: Sample mean
- std: Sample standard deviation
- t_statistic: One-sample t-statistic
- df: Degrees of freedom (n - 1)
- p_value: P-value for the test
- alternative: Direction of the test
- significant_at_0.05: Significance indicator stored as a float (1.0 if significant at the 0.05 level, 0.0 otherwise)
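To check the t_statistic and df columns by hand, the one-sample computation can be sketched in plain Python. This is a minimal sketch assuming the standard one-sample t-test (sample standard deviation with an n - 1 denominator, df = n - 1) and mirroring the documented null handling and 2-observation minimum; one_sample_t is a hypothetical helper, not part of quantpolars.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu=0.0):
    """Hypothetical helper: one-sample t-statistic for a list that may contain None."""
    x = [v for v in values if v is not None]  # drop nulls, as the documentation describes
    n = len(x)
    if n < 2:
        raise ValueError("need at least 2 non-null observations")
    m = mean(x)
    s = stdev(x)  # sample standard deviation (n - 1 denominator)
    t = (m - mu) / (s / math.sqrt(n))
    return {"n": n, "mean": m, "std": s, "t_statistic": t, "df": n - 1}
```

For the data [1.2, 2.3, 1.8, 2.1, 1.9] with mu=2.0 it returns t ≈ -0.753 with df = 4.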
Examples:

```python
import polars as pl
from quantpolars import one_t

df = pl.DataFrame({"values": [1.2, 2.3, 1.8, 2.1, 1.9]})

# Test if mean differs from 2.0
result = one_t(df, column="values", mu=2.0)
print(result)
```

```python
# Test if mean is greater than 1.5
result = one_t(
    df,
    column="values",
    mu=1.5,
    alternative="greater",
)
```

```python
df = pl.DataFrame({
    "score": [75, 82, 78, 90, 88, 92],
    "class": ["A", "A", "A", "B", "B", "B"],
})

# Test each class separately
result = one_t(
    df,
    column="score",
    mu=80.0,
    group_by="class",
)
```

```python
two_t(
    df: Union[pl.DataFrame, pl.LazyFrame],
    column1: str,
    column2: Optional[str] = None,
    group_column: Optional[str] = None,
    alternative: Literal["two-sided", "greater", "less"] = "two-sided",
    group_by: Optional[Union[str, list[str]]] = None,
) -> pl.DataFrame
```

Parameters:
- df: Polars DataFrame or LazyFrame containing the data
- column1: Name of first column (or value column in grouping mode)
- column2: Name of second column (required in two-columns mode)
- group_column: Column defining groups (required in grouping mode, must have exactly 2 unique values)
- alternative: Direction of the test
  - "two-sided": Test if mean1 ≠ mean2
  - "greater": Test if mean1 > mean2
  - "less": Test if mean1 < mean2
- group_by: Optional column name(s) to group by before testing
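Grouping mode needs exactly two distinct labels in group_column. A minimal pure-Python sketch of that split follows; split_by_group is a hypothetical helper, and ordering the labels alphabetically to decide which one becomes group1 is an assumption of this sketch, not documented quantpolars behavior. Because the sign of t_statistic and the meaning of "greater" and "less" depend on which label is group1, check the group1 and group2 columns of the result before interpreting a directional test.

```python
def split_by_group(values, labels):
    """Hypothetical helper: split a value column into two samples by a 2-level label column."""
    # Assumption: labels are sorted to decide which one is group1
    levels = sorted({label for label in labels if label is not None})
    if len(levels) != 2:
        raise ValueError(f"group column must have exactly 2 unique values, found {len(levels)}")
    samples = {level: [] for level in levels}
    for value, label in zip(values, labels):
        if label is not None and value is not None:  # skip nulls in either column
            samples[label].append(value)
    g1, g2 = levels
    return (g1, samples[g1]), (g2, samples[g2])
```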
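Two-columns mode: compare values in two different columns (e.g., before/after measurements). The two columns are treated as independent samples; if each row holds a before/after pair for the same unit, a paired test is usually more appropriate, which amounts to running one_t with mu=0.0 on an after - before difference column.

```python
import polars as pl
from quantpolars import two_t

df = pl.DataFrame({
    "before": [100, 105, 98, 102],
    "after": [105, 110, 102, 108],
})

result = two_t(
    df,
    column1="after",
    column2="before",
    alternative="greater",
)
```

Grouping mode: compare two groups defined by a grouping column (e.g., A/B testing).

```python
df = pl.DataFrame({
    "conversion": [0.12, 0.15, 0.11, 0.16, 0.14, 0.17],
    "variant": ["Control", "Treatment", "Control", "Treatment", "Control", "Treatment"],
})

result = two_t(
    df,
    column1="conversion",
    group_column="variant",
)
```

Returns a Polars DataFrame with columns:
- group1, group2: Group labels (grouping mode only)
- n1, n2: Sample sizes
- mean1, mean2: Sample means
- std1, std2: Sample standard deviations
- t_statistic: Welch's t-statistic
- df: Welch-Satterthwaite degrees of freedom
- p_value: P-value for the test
- alternative: Direction of the test
- significant_at_0.05: Significance indicator stored as a float (1.0 if significant at the 0.05 level, 0.0 otherwise)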
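The t_statistic and df columns can be reproduced by hand as a sanity check. This is a minimal pure-Python sketch, assuming sample variances with an n - 1 denominator and the Welch-Satterthwaite approximation for df; welch_t is a hypothetical helper, not part of quantpolars.

```python
import math
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Hypothetical helper: Welch's t-statistic and Welch-Satterthwaite df."""
    x1 = [v for v in sample1 if v is not None]  # drop nulls
    x2 = [v for v in sample2 if v is not None]
    n1, n2 = len(x1), len(x2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least 2 non-null observations")
    # Squared standard error of each mean: sample variance / n
    se1_sq = variance(x1) / n1
    se2_sq = variance(x2) / n2
    t = (mean(x1) - mean(x2)) / math.sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t, df
```

With the before/after data from the two-columns example, welch_t(after, before) returns t ≈ 2.174 and df ≈ 5.855.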
Examples:

```python
import numpy as np
import polars as pl
from quantpolars import two_t

# Simulate A/B test data: 500 observations per variant
rng = np.random.default_rng(42)
df = pl.DataFrame({
    "revenue": np.concatenate([
        rng.normal(50, 10, 500),  # Control
        rng.normal(55, 10, 500),  # Treatment
    ]),
    "variant": ["Control"] * 500 + ["Treatment"] * 500,
})

result = two_t(
    df,
    column1="revenue",
    group_column="variant",
    alternative="two-sided",
)
print(result)
```

```python
# Test across multiple segments (data elided)
df = pl.DataFrame({
    "revenue": [...],
    "variant": [...],
    "segment": ["Mobile", "Desktop", ...],
})

result = two_t(
    df,
    column1="revenue",
    group_column="variant",
    group_by="segment",
)
```

Test whether manufacturing measurements meet specifications:

```python
df = pl.DataFrame({
    "measurement_mm": [100.5, 100.2, 100.8, 99.9, 100.3],
    "machine": ["A", "A", "A", "A", "A"],
})

result = one_t(
    df,
    column="measurement_mm",
    mu=100.0,
    group_by="machine",
)
```

Compare treatment effectiveness:

```python
# clinical_data: a DataFrame with recovery_time and treatment_group columns
result = two_t(
    clinical_data,
    column1="recovery_time",
    group_column="treatment_group",
    alternative="less",  # New treatment should reduce recovery time (mean1 < mean2)
)
```

Evaluate campaign performance:

```python
# campaign_data: a DataFrame with click_rate, campaign_version, region and device_type columns
result = two_t(
    campaign_data,
    column1="click_rate",
    group_column="campaign_version",
    group_by=["region", "device_type"],
)
```

Notes:
- Welch's t-Test: Does not assume equal variances between groups (unlike Student's t-test)
- Degrees of Freedom: Computed with the Welch-Satterthwaite equation, which adjusts the degrees of freedom for unequal variances and sample sizes
- Null Handling: Automatically excludes null values from calculations
- Minimum Sample Size: Requires at least 2 non-null observations per group
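For reference, these are the standard textbook forms of Welch's statistic and the Welch-Satterthwaite degrees of freedom, written with symbols matching the two_t output columns ($\bar{x}_1, \bar{x}_2$ for mean1 and mean2, $s_1, s_2$ for std1 and std2, $n_1, n_2$ for n1 and n2):

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
$$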
Performance tips:
- Use a LazyFrame for large datasets
- Use group_by to perform multiple tests in a single pass
- The functions are optimized for Polars' columnar operations
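The single-pass idea behind group_by can be illustrated without Polars. The sketch below is a hypothetical pure-Python helper, not quantpolars code: it walks the rows once, keeps a running count, mean and sum of squared deviations per group with Welford's online algorithm, and derives a one-sample t-statistic per group at the end. Reporting NaN for groups with fewer than 2 non-null values or zero variance is an assumption of this sketch.

```python
import math
from collections import defaultdict


def grouped_one_sample_t(rows, key, column, mu=0.0):
    """Hypothetical helper: per-group one-sample t-statistics from one pass over rows."""
    # Running state per group: [count, mean, sum of squared deviations (M2)]
    state = defaultdict(lambda: [0, 0.0, 0.0])
    for row in rows:
        x = row[column]
        if x is None:
            continue  # skip nulls
        s = state[row[key]]
        s[0] += 1
        delta = x - s[1]
        s[1] += delta / s[0]
        s[2] += delta * (x - s[1])  # Welford's update of M2
    results = {}
    for group, (n, m, m2) in state.items():
        if n < 2 or m2 == 0:
            results[group] = math.nan  # assumption: undefined t reported as NaN
            continue
        std = math.sqrt(m2 / (n - 1))
        results[group] = (m - mu) / (std / math.sqrt(n))
    return results
```

On the class/score data from the one_t examples with mu=80.0, class B (scores 90, 88, 92) gives t = 5√3 ≈ 8.660.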
A comprehensive demo script is included:

```bash
python demo_ttest.py
```

This demonstrates all major features with realistic examples.