Test Cases and Validation

🧪 Test Cases & Statistical Validation

StegX maintains a rigorous continuous integration test suite and formal statistical benchmarks to validate both functional correctness and steganographic invisibility. This page documents the testing methodology, the mathematical definitions behind each statistical test, and the concrete results.

1. Automated Test Suite

1.1 Test Categories

The pytest suite is organized into four directories:

Directory	Scope	What It Validates
`tests/unit/`	Individual functions	KDF output correctness, GF(2⁸) arithmetic, header pack/unpack round-trips, compression codec identity, SecureBuffer zeroization
`tests/integration/`	End-to-end pipeline	Full encode → decode cycle across PNG, BMP, TIFF, WebP; all embedding methods; all compression codecs
`tests/security/`	Adversarial scenarios	HMAC corruption detection, wrong-password rejection, brute-force timing validation, panic destruction completeness
`tests/system/`	CLI interface	Argument parsing, exit codes, stdin/stdout piping, shell completion validation

1.2 Running the Suite

```bash
pip install -r requirements/dev.txt
python -m pytest tests/ -v --tb=short
```

1.3 Key Test Vectors

AEAD Forgery Detection: The test suite deliberately flips individual bits of the AEAD authentication tag and verifies that decrypt_data() raises AuthenticationFailure instead of returning corrupted plaintext. This exercises the ciphertext-integrity property on which resistance to adaptive chosen-ciphertext attacks (CCA2) depends.
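The bit-flip check can be sketched with the standard library alone. This is a hedged illustration, not StegX's code: it uses an HMAC-SHA256 tag as a stand-in for the AEAD tag that `decrypt_data()` verifies, and `seal`, `open_sealed`, `every_tag_bit_flip_is_rejected` and the local `AuthenticationFailure` class are hypothetical names.

```python
import hashlib
import hmac
import secrets


class AuthenticationFailure(Exception):
    """Hypothetical stand-in for the suite's authentication error."""


def seal(key: bytes, ciphertext: bytes) -> bytes:
    # Append a 32-byte HMAC-SHA256 tag (stand-in for an AEAD tag).
    return ciphertext + hmac.new(key, ciphertext, hashlib.sha256).digest()


def open_sealed(key: bytes, blob: bytes) -> bytes:
    ciphertext, tag = blob[:-32], blob[-32:]
    expected = hmac.new(key, ciphertext, hashlib.sha256).digest()
    # Constant-time comparison; never hand back data whose tag fails.
    if not hmac.compare_digest(tag, expected):
        raise AuthenticationFailure("tag mismatch")
    return ciphertext


def every_tag_bit_flip_is_rejected(key: bytes, blob: bytes) -> bool:
    tag_start = len(blob) - 32
    for bit in range(32 * 8):
        corrupted = bytearray(blob)
        corrupted[tag_start + bit // 8] ^= 1 << (bit % 8)  # flip exactly one tag bit
        try:
            open_sealed(key, bytes(corrupted))
        except AuthenticationFailure:
            continue
        return False  # a corrupted tag was accepted
    return True
```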

Deterministic Payload Recovery: Payloads of varying sizes (1 byte, 1 KB, 100 KB, 1 MB, 50 MB) are encoded with each combination of:

Embedding method: LSB Matching, LSB Replacement, Matrix Hamming
Compression codec: zstd, brotli, lzma, zlib, bz2, none
Cost map: Laplacian, HILL, disabled
Cipher mode: single (AES-GCM), dual (AES-GCM + ChaCha20)

After decoding, the output is verified byte-for-byte against the original using SHA-256 digest comparison.
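The combinations above give 3 × 6 × 3 × 2 = 108 configurations per payload size. The sketch below enumerates them and shows the SHA-256 comparison; the identifier strings and helper names are assumptions, and only the compression stage is exercised (with the stdlib codecs, since zstd and brotli need third-party packages). The real encode/decode pipeline is not shown.

```python
import bz2
import hashlib
import itertools
import lzma
import zlib

# Hypothetical identifiers mirroring the lists above.
METHODS = ["lsb_matching", "lsb_replacement", "matrix_hamming"]
CODECS = ["zstd", "brotli", "lzma", "zlib", "bz2", "none"]
COST_MAPS = ["laplacian", "hill", "disabled"]
CIPHERS = ["single", "dual"]
SIZES = [1, 1024, 100 * 1024, 1024 * 1024, 50 * 1024 * 1024]

# Stdlib codecs only; "none" is an identity round trip.
STDLIB_CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
    "none": (bytes, bytes),
}


def combinations():
    return list(itertools.product(METHODS, CODECS, COST_MAPS, CIPHERS))


def digest_matches(original: bytes, recovered: bytes) -> bool:
    return hashlib.sha256(original).digest() == hashlib.sha256(recovered).digest()


def codec_roundtrip_ok(name: str, payload: bytes) -> bool:
    compress, decompress = STDLIB_CODECS[name]
    return digest_matches(payload, decompress(compress(payload)))
```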

Argon2id Timing Validation: The test measures the wall-clock time of derive_master_key() with the default parameters and asserts that it exceeds a minimum threshold (e.g., 50ms), confirming that the memory-hard computation is actually being performed and not short-circuited.
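A timing-floor assertion of this kind might look like the sketch below. The standard library has no Argon2id, so PBKDF2-HMAC-SHA256 stands in for `derive_master_key()`; `kdf_stand_in`, `elapsed_seconds` and `assert_not_short_circuited` are hypothetical names.

```python
import hashlib
import time


def kdf_stand_in(password: bytes, salt: bytes, iterations: int) -> bytes:
    # PBKDF2 stands in for Argon2id, which the standard library lacks.
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)


def elapsed_seconds(fn, *args) -> float:
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def assert_not_short_circuited(fn, args, min_seconds: float) -> float:
    # Fail if the derivation returns faster than the expected work allows.
    elapsed = elapsed_seconds(fn, *args)
    assert elapsed >= min_seconds, f"KDF finished in {elapsed:.4f}s"
    return elapsed
```

The floor should sit well below the time observed on the slowest CI machine, so the check catches a skipped computation rather than a fast runner.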

Shamir Round-Trip: Random payloads are split into $N$ shares with threshold $K$. The test verifies:

Any $K$ shares reconstruct the original payload exactly.
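Any $K-1$ shares fail to reconstruct the original payload (a necessary consequence of Shamir's information-theoretic guarantee that fewer than $K$ shares reveal nothing about the secret).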
Shares with inconsistent thresholds or duplicate x-coordinates are rejected.
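The round-trip properties can be illustrated with a minimal byte-wise Shamir scheme over GF(2⁸). This is a sketch under assumptions, not StegX's implementation: it uses the AES reduction polynomial, x-coordinates 1..N, and hypothetical `split`/`combine` helpers, and it does not model threshold metadata, so only duplicate or zero x-coordinates are rejected.

```python
import secrets


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Inverse via a^254, since every nonzero element satisfies a^255 = 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def split(secret: bytes, n: int, k: int) -> list[tuple[int, bytes]]:
    if not 1 <= k <= n <= 255:
        raise ValueError("need 1 <= k <= n <= 255")
    shares = [(x, bytearray()) for x in range(1, n + 1)]
    for byte in secret:
        # Random polynomial of degree k-1 whose constant term is the secret byte.
        coeffs = [byte] + [secrets.randbelow(256) for _ in range(k - 1)]
        for x, ys in shares:
            y = 0
            for c in reversed(coeffs):  # Horner's rule
                y = gf_mul(y, x) ^ c
            ys.append(y)
    return [(x, bytes(ys)) for x, ys in shares]


def combine(shares: list[tuple[int, bytes]]) -> bytes:
    xs = [x for x, _ in shares]
    if len(set(xs)) != len(xs) or 0 in xs:
        raise ValueError("duplicate or zero x-coordinate")
    if len({len(ys) for _, ys in shares}) != 1:
        raise ValueError("shares have inconsistent lengths")
    # Lagrange basis values at x = 0; subtraction in GF(2^8) is XOR.
    weights = []
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = gf_mul(num, xj)
                den = gf_mul(den, xi ^ xj)
        weights.append(gf_mul(num, gf_inv(den)))
    out = bytearray()
    for idx in range(len(shares[0][1])):
        acc = 0
        for w, (_, ys) in zip(weights, shares):
            acc ^= gf_mul(w, ys[idx])
        out.append(acc)
    return bytes(out)
```

With fewer than $K$ shares, `combine` still returns bytes, just not the secret; a real test therefore compares against the original rather than expecting an exception.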

Panic Destruction: A stego image is created, then destroy_real_region_in_place() is called. The test verifies:

The real region's LSBs have been overwritten with random data.
The decoy region remains intact and extractable (in decoy mode).
The original stego file has been atomically replaced.
The shred command was invoked on the original (Linux only).
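Two of the checked properties, LSB-only overwriting of the real region and atomic replacement of the file, can be sketched with the standard library. The internals of `destroy_real_region_in_place()` are not documented here, so this assumes pixel samples in a flat `bytearray`, uses hypothetical helper names, and omits the `shred` step.

```python
import os
import secrets
import tempfile


def randomize_lsbs(samples: bytearray, positions) -> None:
    """Overwrite only the least significant bit at each position with a random bit."""
    for pos in positions:
        samples[pos] = (samples[pos] & 0xFE) | secrets.randbits(1)


def atomic_replace(path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, fsync, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace is atomic on POSIX when source and target share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```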

SecureBuffer Zeroization: A SecureBuffer is created with known key material. After .close(), the test reads the underlying bytearray and confirms every byte is zero.
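A minimal buffer with this behaviour might look like the following; the class shown is a hypothetical sketch, not StegX's `SecureBuffer`. The key point is overwriting the existing `bytearray` in place, because rebinding the attribute to a new object would leave the old bytes in memory.

```python
class SecureBuffer:
    """Hypothetical minimal key container that zeroes its bytearray on close."""

    def __init__(self, data: bytes) -> None:
        self._buf = bytearray(data)
        self.closed = False

    def view(self) -> memoryview:
        if self.closed:
            raise ValueError("buffer is closed")
        return memoryview(self._buf)

    def close(self) -> None:
        # Overwrite in place so the same memory that held the key is cleared.
        for i in range(len(self._buf)):
            self._buf[i] = 0
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc) -> None:
        self.close()
```

In Python, zeroization is best-effort: the immutable `bytes` passed to the constructor, and any copies made from `view()`, are not cleared.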

2. Formal Statistical Steganalysis

2.1 Chi-Square ($\chi^2$) Analysis

Purpose: Detect Pairs of Values (PoV) artifacts caused by LSB substitution.

Background: In a natural image, pixel values $2k$ and $2k+1$ (e.g., 120 and 121) occur with unequal frequencies. Classical LSB substitution overwrites the LSB with message bits that, once encrypted, are 0 or 1 with equal probability regardless of the original value, so each pair is pushed toward equal frequency. This equalization is the PoV anomaly.

Mathematical Definition:

$$\chi^2 = \sum_{i=0}^{127} \frac{(n_{2i} - n_{2i+1})^2}{n_{2i} + n_{2i+1}}$$

where $n_v$ is the observed frequency of pixel value $v$ in a single color channel.

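Because each term measures the imbalance within one pair, $\chi^2$ is comparatively large for a natural image, whose pairs have unequal frequencies, and falls toward zero under LSB substitution, which equalizes the pairs. Westfeld and Pfitzmann's attack therefore flags embedding when $\chi^2$ is small relative to its degrees of freedom. The benchmark below compares each score with the clean-image control.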
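The direction of the effect can be checked with a short sketch of the statistic above. The cover here is synthetic (a hypothetical histogram skewed toward even values, not a natural image), and `lsb_replace_all` models full-capacity LSB substitution with uniformly random message bits.

```python
import random
from collections import Counter


def pov_chi_square(samples) -> float:
    """Sum over the 128 pairs (2i, 2i+1) of (n_2i - n_2i+1)^2 / (n_2i + n_2i+1)."""
    counts = Counter(samples)
    total = 0.0
    for i in range(128):
        even, odd = counts[2 * i], counts[2 * i + 1]
        if even + odd:  # skip empty pairs
            total += (even - odd) ** 2 / (even + odd)
    return total


def lsb_replace_all(samples, rng: random.Random):
    """Classical LSB substitution at full capacity with uniformly random message bits."""
    return [(v & 0xFE) | rng.getrandbits(1) for v in samples]
```

On such a cover the statistic is in the tens of thousands, while after full LSB replacement it drops to roughly the number of pairs, because each pair's two counts become nearly equal.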

StegX Benchmark Results:

Tool	Embedding Method	$\chi^2$ Score	Detection
Steghide	Sequential LSB	119,531.0	Detected
StegX (standard)	Non-Linear LSB Matching	4,209.3	Borderline
StegX (`--extreme`)	Matrix Hamming	1,187.78	Undetected
Clean image (control)	None	1,024.6	Baseline

Analysis: StegX with Matrix Embedding produces a $\chi^2$ value within the natural variance of unmodified images. This is because:

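Matrix Embedding with a Hamming code of block length $n = 2^p - 1$ hides $p$ message bits in $n$ cover LSBs by changing at most one of them. A block needs a change with probability $\frac{n}{n+1}$ (87.5% for $n = 7$), so embedding 3 bits costs 0.875 changes on average instead of the 1.5 that plain LSB embedding needs.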
LSB Matching uses ±1 perturbation rather than forced replacement, avoiding the PoV artifact entirely.
Adaptive cost-map filtering restricts embedding to high-texture regions where pixel value distributions are already noisy.
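The matrix-embedding step can be sketched as F5-style syndrome coding with the (7, 3) binary Hamming code: the message is read as the syndrome of 7 LSBs, and at most one LSB is flipped to set it. Function names are hypothetical, and the sketch omits how the flip is realized in the pixel (StegX describes ±1 LSB matching) and any cost-map ordering.

```python
def syndrome(lsbs) -> int:
    """XOR of the 1-based indices of all set LSBs: the Hamming(7) syndrome, 0..7."""
    s = 0
    for index, bit in enumerate(lsbs, start=1):
        if bit:
            s ^= index
    return s


def embed_block(lsbs, message: int):
    """Embed a 3-bit message into 7 cover LSBs, flipping at most one of them."""
    assert len(lsbs) == 7 and 0 <= message < 8
    out = list(lsbs)
    d = syndrome(lsbs) ^ message
    if d:
        out[d - 1] ^= 1  # flipping bit d changes the syndrome by exactly d
    return out


def extract_block(lsbs) -> int:
    return syndrome(lsbs)
```

Enumerating all 128 covers and 8 messages gives exactly 7/8 = 0.875 changes per block, or about 0.29 per message bit.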

2.2 Shannon Entropy ($H$)

Purpose: Detect regions of suspiciously uniform randomness that indicate encrypted data.

Mathematical Definition:

$$H(X) = -\sum_{i=0}^{255} P(x_i) \log_2 P(x_i)$$

where $P(x_i)$ is the probability of pixel value $x_i$ occurring in the analyzed region.

Properties:

A completely uniform distribution (all 256 values equally likely) yields $H_{\max} = \log_2(256) = 8.0$ bits/byte.
Natural images exhibit $H \in [6.5, 7.8]$ depending on texture complexity.
Encrypted ciphertext exhibits $H \approx 7.99$ — near-perfect randomness.

The Attack Vector: Forensic tools scan the image in sliding windows. If a localized region of a flat sky (expected $H \approx 4.0$) suddenly shows $H \approx 7.99$, that anomaly is strong evidence of embedded encrypted data.
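A windowed entropy scan of this kind can be sketched as follows. The image is assumed to be a list of rows of 8-bit values, and the windows are non-overlapping for simplicity; real forensic tools may use overlapping windows and other window sizes. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values) -> float:
    """H = -sum p log2 p over observed sample values (bits per sample)."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def high_entropy_windows(image, size: int, threshold: float):
    """Yield (row, col, H) for non-overlapping size x size windows with H above threshold."""
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            window = [image[y][x] for y in range(r, r + size) for x in range(c, c + size)]
            h = shannon_entropy(window)
            if h > threshold:
                yield r, c, h
```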

How StegX Defeats This:

StegX's Laplacian/HILL cost maps exclude flat regions entirely. Data is embedded only in high-texture areas where the natural entropy is already $H \geq 7.0$. The injection of $H \approx 7.99$ data into a region with $H \approx 7.5$ is statistically indistinguishable from natural sensor noise.

Furthermore, Matrix Embedding modifies so few bits ($R_m \approx 0.29$ changes per message bit) that the overall entropy shift is negligible:

$$\Delta H \leq R_m \cdot \frac{1}{C} \approx \frac{0.29}{3.7 \times 10^6} \approx 7.8 \times 10^{-8} \text{ bits/pixel}$$

This is orders of magnitude below the measurement precision of any steganalysis tool.
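The region selection behind the Laplacian cost map can be approximated by thresholding a texture measure. This is a simplified sketch, not StegX's cost function: it assumes a 4-neighbour Laplacian kernel, an arbitrary `min_response` threshold, and a grayscale image as a list of rows; HILL is not shown.

```python
def laplacian_magnitude(image, y: int, x: int) -> int:
    """|4-neighbour Laplacian| at an interior pixel: a simple texture measure."""
    return abs(
        4 * image[y][x]
        - image[y - 1][x] - image[y + 1][x] - image[y][x - 1] - image[y][x + 1]
    )


def embeddable_positions(image, min_response: int):
    """Interior pixels whose Laplacian response reaches min_response (flat areas excluded)."""
    height, width = len(image), len(image[0])
    return [
        (y, x)
        for y in range(1, height - 1)
        for x in range(1, width - 1)
        if laplacian_magnitude(image, y, x) >= min_response
    ]
```

Because the decoder must recompute the same positions from the stego image, a practical map has to be derived from information that embedding leaves unchanged.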

2.3 Structural Similarity Index (SSIM)

Purpose: Quantify visual degradation between the original cover image and the stego image, accounting for human visual perception.

Mathematical Definition:

$$\text{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where:

$\mu_x, \mu_y$ are the mean pixel intensities of the original and stego image patches
$\sigma_x^2, \sigma_y^2$ are the variances
$\sigma_{xy}$ is the covariance
$C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$ are stabilization constants ($L = 255$ for 8-bit images, $K_1 = 0.01$, $K_2 = 0.03$)

Interpretation:

$\text{SSIM} = 1.0$: Identical images
$\text{SSIM} \geq 0.99$: Visually indistinguishable
$\text{SSIM} < 0.95$: Noticeable artifacts
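The formula can be checked with a single-window sketch over flattened samples. Standard SSIM evaluates this expression over local windows (commonly 11×11, Gaussian-weighted) and averages the results, so this global variant is a simplification for illustration only; `global_ssim` is a hypothetical name.

```python
def global_ssim(x, y, L: int = 255, k1: float = 0.01, k2: float = 0.03) -> float:
    """SSIM evaluated once over whole images (no sliding window)."""
    n = len(x)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n  # population variance
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))
```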

StegX Benchmark:

Configuration	Payload Size	SSIM
LSB Matching, Laplacian	10 KB in 1920×1080	0.999987
LSB Matching, Laplacian	100 KB in 1920×1080	0.99994
Matrix Hamming, HILL	10 KB in 1920×1080	0.999998
Matrix Hamming, HILL	100 KB in 1920×1080	0.99997

All configurations maintain $\text{SSIM} > 0.9999$, well above the 0.99 threshold for visual indistinguishability.

2.4 Peak Signal-to-Noise Ratio (PSNR)

Purpose: Complementary metric to SSIM, measuring the ratio of maximum possible signal power to noise power.

Mathematical Definition:

$$\text{PSNR} = 10 \cdot \log_{10}\left(\frac{255^2}{\text{MSE}}\right) \text{ dB}$$

where MSE is the Mean Squared Error between original and stego images:

$$\text{MSE} = \frac{1}{W \cdot H \cdot C} \sum_{x,y,c} (I_{\text{original}}(x,y,c) - I_{\text{stego}}(x,y,c))^2$$

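Since LSB modifications change each affected sample by at most ±1, the MSE is bounded by $R_m$, here the fraction of cover samples that are modified (a per-sample rate, unlike the per-message-bit rate used in Section 2.2):

$$\text{MSE}_{\max} = R_m \cdot 1^2 = R_m$$

For Hamming(7,3) Matrix Embedding at full capacity, each 7-sample block carries 3 bits with 7/8 expected changes, so $R_m \approx 0.125$ changes per sample: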

$$\text{PSNR}_{\min} = 10 \cdot \log_{10}\left(\frac{65{,}025}{0.125}\right) \approx 57.2 \text{ dB}$$

PSNR values above 50 dB are considered imperceptible. StegX consistently exceeds this threshold.
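The bound can be reproduced numerically: changing 1 sample in every 8 by ±1 gives $\text{MSE} = 0.125$. The helper names below are hypothetical and operate on flattened sample lists.

```python
import math


def mse(original, stego) -> float:
    return sum((a - b) ** 2 for a, b in zip(original, stego)) / len(original)


def psnr_from_mse(error: float, peak: int = 255) -> float:
    # Identical images have zero error and, by convention, infinite PSNR.
    return math.inf if error == 0 else 10 * math.log10(peak ** 2 / error)
```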

3. Resistance to Automated Steganalysis Tools

Tool	Technique	Result Against StegX
stegseek	Brute-force + Steghide format detection	Fails completely (StegX uses a different container format and Argon2id KDF)
zsteg	LSB analysis, PoV detection, entropy scanning	No patterns found (Non-Linear embedding + adaptive filtering)
binwalk	Signature scanning, entropy analysis	Clean output (encrypted data has no recognizable signatures)
exiftool	Metadata inspection	Metadata clean (StegX strips all EXIF/PNG metadata on save)

4. Comparative Brute-Force Resistance

4.1 Methodology

A stego image is created with a known 8-character password. A brute-force attack is simulated by calling derive_master_key() in a loop with random passwords and measuring throughput.

4.2 Results

Tool	KDF	Iterations/Memory	Passwords/sec (single core)	Worst-case exhaustive search, 8-char alphanumeric ($62^8 \approx 2.18 \times 10^{14}$ candidates)
Steghide	MD5	1	20,000,000+	≤ ~126 days
OpenStego	PBKDF2-SHA256	1,000	500,000+	≤ ~13.8 years
StegX (PBKDF2 mode)	PBKDF2-SHA256	600,000	~830	~8,300 years
StegX (default)	Argon2id	t=3, m=64MB	~9	~$7.7 \times 10^{5}$ years

Argon2id slows each guess by orders of magnitude relative to the other KDFs in the table, making exhaustive search computationally infeasible against passwords with reasonable entropy; short or dictionary-derived passwords remain guessable, only more slowly.
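The crack-time figures follow from dividing the keyspace by the guessing rate. This sketch assumes a 62-symbol alphabet (a-z, A-Z, 0-9), a single core, and worst-case exhaustive search; the average case is half of that, and `exhaustive_search_years` is a hypothetical helper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate password, in years."""
    return alphabet_size ** length / guesses_per_second / SECONDS_PER_YEAR
```

Attackers who parallelize across cores or GPUs divide these times by the number of guesses they can run concurrently.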

5. Advanced Steganalysis Resistance (Statistical + ML + CNN)

5.1 Methodology

A comprehensive steganalysis resistance evaluation was conducted using 10 independent methods spanning three categories: classical statistical attacks, image-quality and distribution-distance measures, and machine-learning/deep-learning classifiers.

Test Environment:

Component	Local (CPU)	Cloud (GPU)
Hardware	Intel CPU, 16GB RAM	Google Colab, Tesla T4 16GB VRAM
Dataset	30 image pairs per mode	500 image pairs
Payload	512 bytes random binary	256 bytes random binary
Image size	512×512 RGB PNG	256×256 RGB PNG
Script	`tests/steganalysis/run_full_steganalysis.py`	`tests/steganalysis/colab_cnn_steganalysis.py`

Embedding modes tested:

Standard: LSB matching (±1) with PRNG-shuffled pixel positions
Adaptive: Laplacian cost-map filtered embedding (high-edge regions only)
Matrix: F5-style Hamming(7,3) matrix embedding (0.29 modifications per message bit)
Adaptive + Matrix: Combined mode (strongest configuration)
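The Standard mode's two ingredients, ±1 LSB matching and a seeded position shuffle, can be sketched as below. All names are hypothetical, and `random.Random` is not cryptographically secure; the integer seed here only stands in for whatever key-derived randomness StegX uses.

```python
import random


def lsb_match(value: int, bit: int, rng: random.Random) -> int:
    """Return a sample whose LSB equals bit, changed by at most +/-1."""
    if (value & 1) == bit:
        return value  # already carries the bit: no change
    if value == 0:
        return 1  # cannot go below 0
    if value == 255:
        return 254  # cannot go above 255
    return value + rng.choice((-1, 1))  # random direction avoids the PoV artifact


def shuffled_positions(count: int, seed: int) -> list[int]:
    positions = list(range(count))
    random.Random(seed).shuffle(positions)
    return positions


def embed(samples, bits, seed: int, rng: random.Random):
    out = list(samples)
    for pos, bit in zip(shuffled_positions(len(out), seed), bits):
        out[pos] = lsb_match(out[pos], bit, rng)
    return out


def extract(samples, n_bits: int, seed: int):
    return [samples[pos] & 1 for pos in shuffled_positions(len(samples), seed)[:n_bits]]
```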

Data integrity controls:

All cover images are procedurally generated with controlled randomness (seeded PRNG) for reproducibility
Cover/stego pairs from the same source image are never split across train/test partitions (GroupKFold for ML, image-level indexing for CNN), eliminating data leakage
Balanced dataset: 50% cover, 50% stego in every experiment
Test set contains only unseen images — no image appears in both training and evaluation

5.2 Classical Steganalysis Attacks

Four classical statistical attacks were applied independently to each cover and stego image. Detection significance was assessed via two-sided Mann-Whitney U tests comparing the distributions of cover statistics against stego statistics. A p-value above 0.05 indicates no statistically significant difference (undetected).
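A two-sided Mann-Whitney U test would typically be run with `scipy.stats.mannwhitneyu(..., alternative="two-sided")`. The stdlib sketch below shows the computation; it uses the normal approximation with average ranks for ties but omits the tie and continuity corrections, so its p-values differ slightly from SciPy's for small samples.

```python
import math


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p) using the normal approximation."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie run
        i = j + 1
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_a - mean) / sd
    return u_a, math.erfc(abs(z) / math.sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))
```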

Chi-Square (χ²) Analysis: Tests whether the frequencies within each Pair of Values (PoV) have been equalized, the signature of LSB replacement. LSB matching (±1) does not overwrite LSBs directly and so avoids this artifact.

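RS Analysis: Classifies pixel groups as Regular, Singular, or Unusable after applying a flipping mask $M$ and its negative $-M$. In a clean image the R and S counts under $M$ and $-M$ are nearly equal; LSB replacement pulls them apart, and the size of the gap estimates the embedding rate.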

Sample Pair Analysis (SPA): Estimates embedding rate by counting close pixel pairs (|p₁ − p₂| ≤ 1) and comparing observed ratios to theoretical baselines.

Shannon Entropy Deviation: Computes per-channel Shannon entropy H = −Σ pᵢ log₂(pᵢ) and measures the absolute difference between cover and stego.

Test	Metric	Standard	Adaptive	Matrix	Adaptive+Matrix
Chi-Square	p-value	1.000	1.000	1.000	1.000
	Verdict	UNDETECTED	UNDETECTED	UNDETECTED	UNDETECTED
RS Analysis	p-value	0.959	0.751	0.994	0.784
	Verdict	UNDETECTED	UNDETECTED	UNDETECTED	UNDETECTED
Sample Pair	p-value	0.888	0.871	0.906	0.929
	Verdict	UNDETECTED	UNDETECTED	UNDETECTED	UNDETECTED
Entropy	Δ	0.000264	0.000249	0.000166	0.000150
	Verdict	UNDETECTED	UNDETECTED	UNDETECTED	UNDETECTED

All p-values far exceed the 0.05 significance threshold, so none of the four tests separated cover from stego statistics in this sample. The chi-square p-value of 1.000 is consistent with LSB matching not producing the PoV equalization that classical detectors exploit.

5.3 Statistical Indistinguishability (Image Quality)

Three metrics were computed between each cover/stego pair to quantify pixel-level distortion: two image-quality measures (PSNR, SSIM) and one information-theoretic distance (KL divergence).

PSNR (Peak Signal-to-Noise Ratio): Measures pixel-level fidelity. PSNR > 50 dB is considered imperceptible to human vision.

SSIM (Structural Similarity Index): Measures structural degradation. SSIM > 0.999 indicates no perceivable structural change.

KL Divergence: Measures the statistical distance between the pixel-value probability distributions of the cover and stego images. KL < 0.001 indicates the distributions are effectively identical.
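A histogram-based KL divergence can be sketched as follows. The page does not state the logarithm base or how empty bins are handled, so the base-2 logarithm and the `eps` smoothing here are assumptions; `histogram_kl` is a hypothetical name.

```python
import math
from collections import Counter


def histogram_kl(cover, stego, bins: int = 256, eps: float = 1e-10) -> float:
    """KL(P_cover || P_stego) in bits over 8-bit sample histograms, with eps smoothing."""
    pc, ps = Counter(cover), Counter(stego)
    n_c, n_s = len(cover), len(stego)
    total = 0.0
    for v in range(bins):
        p = pc[v] / n_c + eps  # eps keeps log2 finite for empty bins
        q = ps[v] / n_s + eps
        total += p * math.log2(p / q)
    return total
```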

Metric	Standard	Adaptive	Matrix	Adaptive+Matrix	Threshold
PSNR	72.28 dB	72.30 dB	74.20 dB	74.23 dB	> 50 dB
SSIM	0.999998	0.999998	0.999999	0.999999	> 0.999
KL Divergence	7.04×10⁻⁶	8.30×10⁻⁶	4.09×10⁻⁶	4.52×10⁻⁶	< 0.001
Verdict	IMPERCEPTIBLE	IMPERCEPTIBLE	IMPERCEPTIBLE	IMPERCEPTIBLE	—

All metrics clear their thresholds by wide margins: PSNR by more than 22 dB and KL divergence by more than two orders of magnitude. The two Matrix Embedding modes achieve the highest PSNR (above 74 dB) and the lowest KL divergence because they change fewer pixels per message bit.

5.4 ML Classifier Resistance (SRM Features + GroupKFold)

Feature extraction: SRM-like (Spatial Rich Model) features were extracted from each image, including first-order and second-order residual histograms (horizontal and vertical), Laplacian residual histograms, LSB statistics, and PoV ratios — yielding a high-dimensional feature vector per image.

Classifiers:

Random Forest: 200 trees, max depth 10
Gradient Boosting: 100 trees, max depth 5

Cross-validation: 5-Fold GroupKFold, where the group key is the source image index. This guarantees that a cover image and its corresponding stego image are always in the same fold, preventing the classifier from memorizing image-specific textures rather than steganographic artifacts.
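The leakage guarantee can be illustrated with a stdlib sketch of group-aware folds. This is not scikit-learn's `GroupKFold`, which assigns groups so that fold sizes stay balanced; here groups are dealt round-robin, which gives the same guarantee that a cover and its stego image never land on opposite sides of a split.

```python
from collections import defaultdict


def group_kfold(groups, n_splits: int):
    """Yield (train_idx, test_idx) with every group confined to a single test fold."""
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    ordered = sorted(members)
    if len(ordered) < n_splits:
        raise ValueError("need at least as many groups as folds")
    for fold in range(n_splits):
        test_groups = set(ordered[fold::n_splits])  # round-robin group assignment
        test = [i for g in test_groups for i in members[g]]
        train = [i for i, g in enumerate(groups) if g not in test_groups]
        yield sorted(train), sorted(test)
```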

Classifier	Standard	Adaptive	Matrix	Adaptive+Matrix
Random Forest	50.0% ± 5.3%	55.0% ± 14.5%	50.0% ± 5.3%	48.3% ± 6.2%
Gradient Boosting	48.3% ± 9.7%	51.7% ± 3.3%	50.0% ± 5.3%	55.0% ± 4.1%
Verdict	UNDETECTED	UNDETECTED	UNDETECTED	UNDETECTED

All accuracies cluster around 50% (the random-guessing baseline), indicating that the classifiers found no exploitable signal in the SRM-style features for these datasets.

5.5 CNN Deep Learning Resistance (SRNet on GPU)

Model: SRNet (Deep Residual Network for Steganalysis), a purpose-built CNN architecture for spatial-domain image steganalysis. 11,511,234 trainable parameters.

Dataset: 500 cover images + 500 stego images (Adaptive + Matrix Embedding mode). Split at the image level: 350 pairs for training, 75 pairs for validation, 75 pairs for testing. Zero overlap between splits was programmatically verified.

Training configuration:

Parameter	Value
GPU	NVIDIA Tesla T4 (16GB VRAM)
Optimizer	Adam (lr = 2×10⁻⁴, weight decay = 10⁻⁵)
LR Schedule	StepLR (step=25, γ=0.5)
Epochs	60
Batch size	16
Loss	CrossEntropyLoss
Train augmentation	RandomCrop(256), RandomHorizontalFlip, RandomVerticalFlip
Test transform	CenterCrop(256)

Results:

Metric	Value	Interpretation
Test Accuracy	50.0%	Equivalent to random guessing
AUC-ROC	0.4981	No discriminative power (0.5 = random)
Best Val Accuracy	51.3%	No meaningful improvement over 60 epochs
Final Train Loss	0.693	= ln(2), the loss of a classifier that predicts 0.5 for every sample (chance level)
Val Accuracy Trend	Flat at 50%	Model failed to learn any steganographic signal
Data Leakage	0 pairs	Image-level split verified

Validation accuracy stayed at about 50% (peaking at 51.3%) across all 60 training epochs, and the training loss converged to ln(2) ≈ 0.693, the cross-entropy of a binary classifier that outputs 0.5 for every sample. An SRNet trained specifically on StegX output (Adaptive + Matrix mode) therefore learned no features that distinguished stego images from covers in this experiment.
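Why ln 2 marks chance level rather than a minimum can be checked directly: a predictor that outputs 0.5 for every sample has mean cross-entropy ln 2 whatever the labels, while confident correct predictions go well below it. The helper below computes the per-sample binary cross-entropy in nats, the quantity a two-class `CrossEntropyLoss` with mean reduction computes from the softmax probability; its name is hypothetical.

```python
import math


def binary_cross_entropy(probs, labels) -> float:
    """Mean of -[y ln p + (1 - y) ln(1 - p)] in nats."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p) for p, y in zip(probs, labels)
    ) / len(labels)
```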

5.6 Reproducing the Tests

Local statistical + ML tests (CPU, ~20 minutes):

```bash
pip install scikit-learn scipy numpy pillow
cd StegX
python tests/steganalysis/run_full_steganalysis.py --num-images 30 --modes standard adaptive matrix adaptive_matrix
```

CNN deep learning test (Colab GPU, ~2-3 hours):

Open Google Colab
Upload tests/steganalysis/colab_cnn_steganalysis.py
Select Runtime → Change runtime type → T4 GPU
Run All

5.7 Summary

Category	Methods	Verdict
Classical Steganalysis	Chi-Square, RS Analysis, Sample Pair Analysis, Entropy Deviation	All UNDETECTED
Image Quality	PSNR, SSIM, KL Divergence	All IMPERCEPTIBLE / INDISTINGUISHABLE
Machine Learning	Random Forest + Gradient Boosting (SRM features, GroupKFold)	All ~50% accuracy (random)
Deep Learning	SRNet CNN (11.5M params, 60 epochs, T4 GPU)	50.0% accuracy, AUC 0.498

All 10 methods returned UNDETECTED or IMPERCEPTIBLE verdicts for every embedding mode they were run against (the SRNet experiment covered the Adaptive + Matrix mode only), indicating that StegX v2.0 resisted the classical, machine-learning, and deep-learning steganalysis techniques evaluated here.

📖 StegX v2.0 Wiki
