
[RFC]: Develop a Project Test Runner for stdlib #220

@Kartikeya-guthub

Description


[RFC]: Develop a Project Test Runner for stdlib


Your Background

Full name
Kartikeya Sharma

University status
Yes

University name
ABES Engineering College

University program
B.Tech in Computer Science Engineering

Expected graduation
2027

Short biography
I am a Computer Science undergraduate focused on backend systems and developer tooling. My primary stack is JavaScript and Node.js, with experience building systems that emphasize correctness, failure handling, and deterministic behavior.

I am particularly interested in infrastructure-level problems — testing systems, execution semantics, and reliability under edge cases. I have contributed to open-source projects and worked through review cycles while adapting to project conventions.

Timezone
India Standard Time (IST, UTC+5:30)

Contact details
email: kartikeya9917@gmail.com, github: https://github.com/Kartikeya-guthub


Programming Experience

Platform
Windows

Editor
VSCode — because of its rich extension ecosystem, integrated terminal, and excellent JavaScript/Node.js debugging support.

Programming experience
I have experience building backend systems and developer tooling in Node.js, with a focus on correctness, deterministic execution, and failure handling.

My work involves designing event-driven systems with explicit control over execution flow, implementing idempotent processing for at-least-once delivery guarantees, and ensuring consistency under retries and partial failures. I have handled concurrency control, state transitions, and ordering constraints in distributed systems.

I am familiar with managing asynchronous execution in Node.js, including event loop behavior, timers, and callback-based workflows. I have worked with systems that require precise lifecycle control and predictable outcomes despite asynchronous operations.

I have also developed CLI-based tooling and modular codebases with clear separation between execution, processing, and reporting layers. This includes designing structured pipelines, validating inputs, and producing deterministic outputs suitable for automation.

Across my work, I focus on:

  • deterministic execution under asynchronous conditions
  • lifecycle management and completion guarantees
  • failure modeling, retries, and idempotency
  • structured system design with explicit execution flow

These capabilities directly align with building a minimal, deterministic test runner with strict lifecycle control and reliable execution semantics.

JavaScript experience
I primarily use JavaScript for backend and tooling work. My favorite feature is its event-driven, non-blocking I/O model, which makes it well-suited to building test runners and execution queues. My least favorite is implicit type coercion, especially in equality comparisons, which can produce subtle, hard-to-debug bugs.

Node.js experience
I use Node.js for building CLI tools, backend systems, and scripting. I am familiar with the module system, the event loop, streams, process lifecycle, and child process management — all of which are directly relevant to building a test runner.

C/Fortran experience
Limited experience. I have read and understood C-level code within stdlib (such as native add-ons and benchmark files) while navigating the codebase, but my primary language is JavaScript/Node.js.

Interest in stdlib
I am interested in stdlib due to its focus on correctness, minimalism, and well-structured utilities. While exploring the codebase and contributing, I found the testing and benchmarking infrastructure particularly interesting — especially @stdlib/bench/harness. Building a minimal, internally owned test runner that replaces external dependencies aligns directly with stdlib's design philosophy and my interest in systems-level infrastructure.

Version control
Yes

Contributions to stdlib

  • PR — Fix JavaScript lint errors in benchmark (ztest2) — Merged
    chore: fix JavaScript lint errors (issue #11193) stdlib#11199
    This contribution helped me understand repository structure, linting conventions, and the contribution review workflow. It also gave me direct exposure to how stdlib test and benchmark files are structured — which informs my design decisions for this proposal.

stdlib showcase

  • Event Metrics Analyzer
    https://github.com/Kartikeya-guthub/event-metrics-analyzer-stdlib
    A system that simulates backend request latency and failures and computes statistical insights using stdlib modules. Demonstrates real-world usage of stdlib in a system-oriented context, integrating multiple stdlib utilities for statistical computation and data analysis.

Project Description

Goals

Project Abstract:

Design and implement a minimal, stdlib-owned test runner to replace tape — preserving existing test behavior while enabling incremental, automated migration with no additional runtime dependencies.

The proposed system exposes two entry points:

  • @stdlib/test/compat — a compatibility layer for incremental migration from tape
  • @stdlib/test/runner — the final minimal test runner API

This separation enables a smooth transition without requiring immediate refactoring of existing test files.

The runner will provide TAP-compatible output along with structured reporting suitable for CI integration.

Project Size: Large (350 hours)


Understanding of stdlib Test Usage

Before designing the API, I analyzed stdlib test files to understand what is actually used. The findings directly inform the assertion surface.

Assertions observed in practice:

  Assertion    Frequency
  -----------  -----------
  t.equal      Very common
  t.deepEqual  Common
  t.ok         Common
  t.throws     Occasional
  t.end        Every test

Key observations:

  • The assertion surface is intentionally small. Based on auditing representative stdlib packages, most tests rely on only 3–4 assertion types.
  • t.strictEqual is redundant given how t.equal behaves in stdlib usage — it will not be included.
  • Messages are often repetitive ("returns expected value"). Making msg optional with a sensible default reduces boilerplate without changing semantics.
  • String construction in assertions often uses manual concatenation — this should be normalized to @stdlib/string/format for consistency.

Proposed Assertion Surface

Supported:

t.ok( value, msg )
t.equal( actual, expected, msg )
t.deepEqual( actual, expected, msg )
t.throws( fn, msg )
t.end()

Explicitly excluded:

  • t.strictEqual — redundant, not aligned with stdlib usage patterns
  • t.plan — adds unnecessary coupling between test intent and count
  • Nested tests — not present in stdlib test files
  • Hooks (beforeEach / afterEach) — out of scope for minimal runner
  • Parallel execution — determinism is a hard requirement
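The supported surface above could be sketched as a non-throwing assertion recorder. The following is a minimal illustration, not a proposed implementation; names such as `Assert` and `_record` are hypothetical, and `deepEqual` is omitted for brevity:

```javascript
// Illustrative non-throwing assertion layer: each assertion records a
// { ok, msg } result instead of throwing, and `msg` is optional with
// an internally generated default.
function Assert() {
    this.results = []; // recorded { ok, msg } entries
    this.count = 0;    // running assertion counter
}

Assert.prototype._record = function ( ok, msg ) {
    this.count += 1;
    this.results.push({
        'ok': ok,
        'msg': msg || ( 'assertion ' + this.count + ' passed' )
    });
};

Assert.prototype.ok = function ( value, msg ) {
    this._record( !!value, msg );
};

Assert.prototype.equal = function ( actual, expected, msg ) {
    this._record( actual === expected, msg );
};

Assert.prototype.throws = function ( fn, msg ) {
    var threw = false;
    try {
        fn();
    } catch ( err ) {
        threw = true;
    }
    this._record( threw, msg );
};

// deepEqual omitted for brevity in this sketch...

var t = new Assert();
t.equal( 2 + 3, 5 );       // no message: default is applied
t.ok( true, 'is truthy' );
t.throws( function () { throw new Error( 'boom' ); }, 'throws as expected' );
```

Because assertions only record results, a failing assertion never aborts the test callback; the reporter decides the final pass/fail status.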

Developer Experience Improvements

  1. Optional Default Messages — Most stdlib tests pass "returns expected value" as the message. The runner will make msg optional and supply a default internally:
// Current (tape)
t.equal( actual, expected, 'returns expected value' );

// With runner — message is optional
t.equal( actual, expected );
// internally defaults to: 'assertion <N> passed'
  2. String Formatting via @stdlib/string/format — Assertion error messages currently use manual concatenation. The runner's internal error formatting will use @stdlib/string/format for consistency:
format( 'expected %s, got %s', expected, actual )
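To make the message style concrete, here is a minimal stand-in for the formatting call (handling only `%s`); the real runner would use @stdlib/string/format itself, so this `format` function is purely illustrative:

```javascript
// Hypothetical minimal `%s`-only substitute for @stdlib/string/format,
// shown only to illustrate the message style the runner would emit.
function format( str ) {
    var args = Array.prototype.slice.call( arguments, 1 );
    var i = 0;
    return str.replace( /%s/g, function () {
        return String( args[ i ] );
        // eslint-disable-next-line no-plusplus
    }.bind( null ) ).replace( /%s/g, '' ) || str; // placeholder; see below
}

// Simpler, correct version used for the demo:
function fmt( str ) {
    var args = Array.prototype.slice.call( arguments, 1 );
    var i = 0;
    return str.replace( /%s/g, function () {
        var out = String( args[ i ] );
        i += 1;
        return out;
    });
}

console.log( fmt( 'expected %s, got %s', 5, 7 ) );
// → expected 5, got 7
```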

Before / After Example

BEFORE — using tape:

var tape = require( 'tape' );

tape( 'returns expected value', function test( t ) {
    var actual = add( 2, 3 );
    t.equal( actual, 5, 'returns expected value' );
    t.end();
});

AFTER — using stdlib runner:

var test = require( '@stdlib/test/runner' );

test( 'returns expected value', function test( t ) {
    var actual = add( 2, 3 );
    t.equal( actual, 5 );
    t.end();
});

Migration impact:

  • Replace require( 'tape' ) with require( '@stdlib/test/runner' )
  • Message argument on assertions becomes optional (can be removed or left as-is)
  • No structural changes required to test logic or lifecycle

Technical Design

Execution Model — The runner maintains an internal test queue. Execution is strictly sequential and deterministic by design.

┌──────────────┐
│  test files  │  ← test() calls register entries
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  test queue  │  ← FIFO, push on register
└──────┬───────┘
       │
       ▼
┌──────────────────────────────────┐
│  executor                        │
│  - dequeue one test              │
│  - invoke callback               │
│  - await t.end() or timeout      │
│  - collect pass/fail results     │
│  - move to next                  │
└──────┬───────────────────────────┘
       │
       ▼
┌──────────────┐
│  reporter    │  ← TAP-compatible output
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  exit(code)  │  ← 0 if all pass, 1 if any fail
└──────────────┘

Lifecycle Per Test:

  1. Register — test( name, fn ) pushes to queue
  2. Start — executor dequeues and calls fn( t )
  3. Assert — assertions record pass/fail, do not throw by default
  4. End — t.end() marks test complete; executor moves to next
  5. Timeout — if t.end() is never called within threshold, test is marked failed
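The lifecycle above can be sketched as a sequential executor over a FIFO queue. This is an illustrative skeleton only (names such as `queue`, `run`, and the 1000 ms threshold are assumptions, not the proposed implementation), and assertions are omitted so that only the queue/end/timeout mechanics show:

```javascript
// Illustrative sequential executor: tests are dequeued one at a time,
// and the queue advances only when `t.end()` fires or a timeout trips.
var queue = [];
var results = [];

function test( name, fn ) {
    queue.push({ 'name': name, 'fn': fn });
}

function run( done ) {
    next();
    function next() {
        var entry = queue.shift();
        if ( !entry ) {
            return done( results );
        }
        var finished = false;
        var timer = setTimeout( onTimeout, 1000 ); // assumed threshold
        entry.fn({
            'end': onEnd
        });
        function onEnd() {
            if ( finished ) { return; }
            finished = true;
            clearTimeout( timer );
            results.push({ 'name': entry.name, 'ok': true });
            next();
        }
        function onTimeout() {
            if ( finished ) { return; }
            finished = true;
            results.push({ 'name': entry.name, 'ok': false });
            next();
        }
    }
}

test( 'sync test', function ( t ) {
    t.end();
});
test( 'async test', function ( t ) {
    setTimeout( function () { t.end(); }, 10 );
});
run( function ( out ) {
    console.log( out.length + ' tests completed' );
});
```

The `finished` guard makes `end()` idempotent with respect to the timeout, so a late `t.end()` after a timeout cannot advance the queue twice.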

Async Handling — The runner wraps each test callback in a controlled executor. Async tests work naturally since execution does not advance until t.end() is called:

test( 'async test', function test( t ) {
    setTimeout( function() {
        t.equal( result, expected );
        t.end(); // runner only proceeds after this
    }, 100 );
});

Runner Components:

  1. Core Engine (runner/lib/runner.js) — Maintains test queue, enforces sequential execution, manages test lifecycle
  2. Assertion Layer (runner/lib/assert.js) — Minimal API surface, non-throwing by default, optional message with internal default
  3. Reporter (runner/lib/reporter.js) — TAP-compatible output, CI-friendly exit codes, deterministic line ordering
  4. CLI (runner/bin/cli.js) — runner test.js or runner 'test/**/*.js'
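As an illustration of the reporter component, a TAP-style emitter over collected results might look like the following sketch (the `reportTAP` name and result shape are assumptions for this example):

```javascript
// Illustrative TAP-style reporter: deterministic line ordering and a
// CI-friendly exit code derived from the failure count.
function reportTAP( results ) {
    var lines = [ 'TAP version 13' ];
    var failed = 0;
    lines.push( '1..' + results.length );
    results.forEach( function ( r, i ) {
        if ( !r.ok ) {
            failed += 1;
        }
        lines.push( ( ( r.ok ) ? 'ok' : 'not ok' ) + ' ' + ( i + 1 ) + ' ' + r.msg );
    });
    lines.push( '# pass ' + ( results.length - failed ) );
    lines.push( '# fail ' + failed );
    return {
        'output': lines.join( '\n' ),
        'code': ( failed ) ? 1 : 0 // exit code: 0 if all pass, 1 otherwise
    };
}

var out = reportTAP([
    { 'ok': true, 'msg': 'returns expected value' },
    { 'ok': false, 'msg': 'handles edge case' }
]);
console.log( out.output );
// in the real CLI: process.exit( out.code )
```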

Migration Strategy

Step 1 — Compatibility Shim: A thin shim maps the tape API to the runner API, allowing files to run under the new runner without modification.
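The shim idea can be sketched as a wrapper that patches tape-only methods onto the runner's smaller `t` object. Everything here is hypothetical (`shim`, `stubRunner`); the stub merely stands in for @stdlib/test/runner so the mapping is demonstrable:

```javascript
// Illustrative compatibility shim: maps the tape surface onto the
// runner's smaller assertion API before invoking the user's test.
function shim( runnerTest ) {
    return function tapeCompat( name, fn ) {
        runnerTest( name, function ( t ) {
            t.strictEqual = t.equal; // identical semantics in stdlib usage
            t.plan = function () {}; // accepted but ignored (no plan counting)
            fn( t );
        });
    };
}

// Demo with a stub runner standing in for @stdlib/test/runner:
var calls = [];
function stubRunner( name, fn ) {
    fn({
        'equal': function ( a, b ) { calls.push( a === b ); },
        'end': function () { calls.push( 'end' ); }
    });
}

var tape = shim( stubRunner );
tape( 'compat test', function ( t ) {
    t.plan( 2 );           // ignored by the shim
    t.strictEqual( 1, 1 ); // routed to t.equal
    t.end();
});
```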

Step 2 — Codemod Automation: AST-based codemod script automates transformations:

  • require( 'tape' ) → require( '@stdlib/test/runner' )
  • t.strictEqual( a, b, m ) → t.equal( a, b, m )
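For illustration only, the two transforms reduce to something like the string rewrite below; the actual codemod would operate on an AST (for example via a parser such as acorn) rather than on raw strings, which this sketch does not attempt:

```javascript
// Simplified string-based illustration of the two codemod transforms.
// A real codemod would parse to an AST to avoid rewriting matches
// inside strings or comments.
function transform( src ) {
    return src
        .replace( /require\(\s*'tape'\s*\)/g, 'require( \'@stdlib/test/runner\' )' )
        .replace( /\bt\.strictEqual\(/g, 't.equal(' );
}

var before = 'var tape = require( \'tape\' );\n' +
    't.strictEqual( actual, expected, msg );';
console.log( transform( before ) );
```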

Step 3 — Dual-Run Validation: Each migrated package is validated by running its test suite under both tape and the new runner and comparing pass/fail results, assertion counts, and output format.
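A summary-level comparison for dual-run validation might look like the sketch below; the function names are hypothetical, and the real validation would also diff per-test assertion counts, which this example does not:

```javascript
// Illustrative dual-run check: count ok / not ok lines in each TAP
// stream and require the totals to match.
function summary( tap ) {
    var pass = 0;
    var fail = 0;
    tap.split( '\n' ).forEach( function ( line ) {
        if ( /^ok /.test( line ) ) { pass += 1; }
        if ( /^not ok /.test( line ) ) { fail += 1; }
    });
    return { 'pass': pass, 'fail': fail };
}

function matches( tapeOut, runnerOut ) {
    var a = summary( tapeOut );
    var b = summary( runnerOut );
    return ( a.pass === b.pass && a.fail === b.fail );
}

// Line ordering may legitimately differ; pass/fail totals must not:
var tapeOut = 'TAP version 13\nok 1 returns expected value\n1..1\n# pass 1';
var runnerOut = 'TAP version 13\n1..1\nok 1 returns expected value\n# pass 1\n# fail 0';
console.log( matches( tapeOut, runnerOut ) );
// → true
```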

Step 4 — Incremental Package Migration: Low-risk packages first, high-complexity packages last.

Step 5 — Tape Removal: Once full parity is confirmed, tape is removed from package.json dependencies.

Risk Mitigation:

  Risk                                         Mitigation
  -------------------------------------------  -----------------------------------------------------
  Behavioral mismatch between tape and runner  Dual-run validation before each package migration
  Migration scale (many test files)            Codemod automation reduces manual effort
  CI breakage                                  TAP-compatible output preserved; exit codes unchanged
  Async edge cases                             Timeout handling catches missing t.end() calls

Why this project?

While exploring stdlib and working on contributions, I observed that reliance on tape and external scripts introduces friction when enforcing consistent testing patterns across the repository.

Building an in-house runner aligns with stdlib's design philosophy and improves long-term maintainability. The project is interesting because it combines systems design, testing infrastructure, and real-world migration at scale — all areas I actively work in.


Qualifications

  • Strong JavaScript and Node.js background with focus on backend systems and developer tooling
  • Hands-on experience with stdlib's codebase through contributions and exploration
  • Familiarity with test runner internals: execution queues, assertion semantics, lifecycle management, TAP output format
  • Experience with AST-based code transformation tooling (relevant for codemod automation)
  • Direct exposure to stdlib test and benchmark file structure through my open PR

Prior art

  Runner                 Style                     Notes
  ---------------------  ------------------------  --------------------------------------
  tape                   Minimal, stream-based     Current dependency; being replaced
  jest                   Integrated, feature-rich  Too heavy for stdlib's philosophy
  mocha                  Flexible, plugin-based    External dependencies, not minimal
  @stdlib/bench/harness  Internal, minimal         Direct inspiration for design approach

The stdlib runner follows tape's minimal philosophy while being internally owned, convention-aligned, and free of external runtime dependencies — similar in spirit to @stdlib/bench/harness.


Commitment

  • Availability: 30–40 hours/week throughout the GSoC period
  • No conflicting commitments during the program
  • Active communication on Zulip and GitHub
  • Available before the coding period for design review and early feedback

Schedule

Assuming a 12-week schedule:

  • Community Bonding Period: Deep-dive into stdlib test file patterns across packages. Finalize runner API design with mentor feedback. Set up repository scaffold and project structure. Read relevant documentation and review @stdlib/bench/harness implementation.

  • Week 1: Analyze test patterns across stdlib packages. Finalize runner API design and project structure. Set up repository scaffold.

  • Week 2: Implement core execution engine — test queue, sequential executor, lifecycle management (t.end() tracking, timeout handling).

  • Week 3: Implement assertion layer (ok, equal, deepEqual, throws, end). Implement compatibility shim. Write unit tests for the runner itself.

  • Week 4: Implement CLI and test discovery. Validate runner against representative packages. Compare output with tape execution.

  • Week 5: Stabilize TAP output format and exit semantics. Begin dual-run validation across selected test suites.

  • Week 6 (midterm): Midterm deliverables — working test runner with complete execution lifecycle, full assertion surface implemented, compatibility shim for drop-in tape replacement, CLI-based execution working, verified behavioral parity on sample packages.

  • Week 7: Expand compatibility coverage. Resolve behavioral differences found during dual-run. Begin migration of low-risk packages.

  • Week 8: Continue package-by-package migration. Integrate runner into repository scripts and CI. Validate in CI environments.

  • Week 9: Continue migration. Handle edge cases and async test patterns discovered during migration.

  • Week 10: Complete migration of remaining packages. Remove tape from standard unit testing workflows.

  • Week 11: Code freeze — focus on completing tests and documentation. Finalize migration notes and codemod tooling guide.

  • Week 12: Final cleanup, documentation review, and buffer for any remaining issues.

  • Final Week: Submit project. Write final report. Ensure all PRs are merged or in review.


Related issues


Checklist

  • I have read and understood the Code of Conduct.
  • I have read and understood the application materials found in this repository.
  • I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
  • I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
  • I have read and understood the stdlib showcase requirement which is necessary for my application to be considered for acceptance.
  • The issue name begins with [RFC]: and succinctly describes my proposal.
  • I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.

Metadata


Labels

  • 2026: 2026 GSoC proposal
  • received feedback: A proposal which has received feedback
  • rfc: Project proposal
