Contributing to datalint

Thanks for your interest! datalint is built so that adding a check is small, isolated, and testable. Most contributions are a new rule.

Getting started

git clone https://github.com/didrod205/datalint.git
cd datalint
npm install
npm test            # vitest
npm run typecheck   # tsc --noEmit
npm run build       # tsup -> dist/
node dist/cli.js scan examples/messy.csv

Project layout

src/
  parse/            # delimiter detection + RFC4180 CSV/TSV parser -> Dataset
  infer.ts          # per-cell type inference + numeric/date helpers
  profile.ts        # per-column profiles (type, stats, distribution)
  rules/            # one module group per rule family (the interesting part)
  score.ts          # issues -> score & grade
  report/           # json | markdown | console renderers
  config.ts, types.ts, index.ts, loader.ts, cli.ts
tests/              # vitest specs
examples/           # clean.csv, messy.csv, config + sample reports

Adding a rule

  1. Add the rule id to RuleId and RULE_LABELS in src/types.ts.

  2. Write the rule in the relevant src/rules/*.ts (or a new file):

    import type { Issue } from "../types.js";
    import { makeIssue, type Rule, type RuleContext } from "./context.js";
    
    export const myRule: Rule = {
      id: "my-rule",
      severity: "warning",
      run(ctx: RuleContext): Issue[] {
        const issues: Issue[] = [];
        for (const p of ctx.profiles) {
          // inspect ctx.dataset.rows / p (profile) and push makeIssue(...)
        }
        return issues;
      },
    };

  3. Register it in src/rules/index.ts (registration order determines the order in the report).

  4. Add a test in tests/ with a tiny CSV proving it fires (and doesn't fire on clean input).
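
As an illustration of steps 2 and 4 together, here is a minimal, self-contained sketch of a rule plus the two checks a test would make (it fires on messy input and stays silent on clean input). The Issue, ColumnProfile, RuleContext, and Rule shapes below are simplified stand-ins invented for this example, not datalint's real types from src/types.ts and src/rules/context.ts. The "trailing-whitespace" rule id, the makeIssue signature, and the contextFrom helper are likewise hypothetical. In the repo, the same checks would live in tests/ as a vitest spec and import from src/.

```typescript
// Hypothetical stand-ins for datalint's real types (assumptions for this sketch only).
type Severity = "error" | "warning" | "info";

interface Issue {
  ruleId: string;
  severity: Severity;
  column: string;
  message: string;
  fix: string;
}

interface ColumnProfile {
  name: string;
  index: number;
}

interface RuleContext {
  dataset: { header: string[]; rows: string[][] };
  profiles: ColumnProfile[];
}

interface Rule {
  id: string;
  severity: Severity;
  run(ctx: RuleContext): Issue[];
}

// Hypothetical helper; the real makeIssue in src/rules/context.ts may take different arguments.
function makeIssue(rule: Rule, column: string, message: string, fix: string): Issue {
  return { ruleId: rule.id, severity: rule.severity, column, message, fix };
}

// Example rule: flag each column that contains cells with leading or trailing whitespace.
const trailingWhitespace: Rule = {
  id: "trailing-whitespace",
  severity: "warning",
  run(ctx: RuleContext): Issue[] {
    const issues: Issue[] = [];
    for (const p of ctx.profiles) {
      // Count the rows whose cell in this column changes when trimmed.
      const affected = ctx.dataset.rows.filter((row) => {
        const cell = row[p.index] ?? "";
        return cell !== cell.trim();
      }).length;
      if (affected > 0) {
        issues.push(
          makeIssue(
            trailingWhitespace,
            p.name,
            `${affected} cell(s) have leading or trailing whitespace`,
            "Trim whitespace from the affected cells before export.",
          ),
        );
      }
    }
    return issues;
  },
};

// Build a context from a tiny in-memory table (a real test would parse a small CSV instead).
function contextFrom(header: string[], rows: string[][]): RuleContext {
  return {
    dataset: { header, rows },
    profiles: header.map((name, index) => ({ name, index })),
  };
}

const messy = contextFrom(["id", "name"], [["1", "Ada "], ["2", " Bob"], ["3", "Cy"]]);
const clean = contextFrom(["id", "name"], [["1", "Ada"], ["2", "Bob"]]);

console.log(trailingWhitespace.run(messy).length); // 1 (one issue, for the "name" column)
console.log(trailingWhitespace.run(clean).length); // 0
```

The sketch reports one issue per affected column, not one per cell, which keeps the report short and gives the fix message a single place to live.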

Principles

  • Deterministic. No randomness, no network, no time-dependent output. The same file must always produce the same report.
  • Nothing leaves the machine. Never add telemetry or uploads.
  • Dependency-light. Only cac and picocolors at runtime; the CSV parser is intentionally hand-rolled.
  • Actionable. Every reported problem should carry a concrete fix message.
  • Conservative. Prefer a missed edge case over a false positive; document heuristics in code comments and the README "FAQ".
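
One way to guard the "Deterministic" principle in a test is to run the same operation several times and compare the serialized results. The sketch below is only an illustration: isDeterministic is a hypothetical helper, not part of datalint, and the two sample functions stand in for a real scan. The unstable one uses a counter to mimic the kind of drift a timestamp or random value would introduce.

```typescript
// Hypothetical helper: returns true if repeated runs serialize to identical JSON.
function isDeterministic(run: () => unknown, times = 3): boolean {
  const first = JSON.stringify(run());
  for (let i = 1; i < times; i++) {
    if (JSON.stringify(run()) !== first) return false;
  }
  return true;
}

// Stand-in for a report that depends only on its input.
const stable = () => ({ score: 87, issues: ["a", "b"] });

// Stand-in for a report that leaks per-run state (like a timestamp would).
let runCount = 0;
const unstable = () => ({ score: 87, generatedAt: runCount++ });

console.log(isDeterministic(stable)); // true
console.log(isDeterministic(unstable)); // false
```

Comparing serialized output also catches ordering differences, since two reports with the same issues in a different order serialize differently.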

Checklist before opening a PR

  • npm run typecheck passes
  • npm test passes (tests added for new behavior)
  • New rules have stable ids and a fix message
  • CHANGELOG.md updated for user-facing changes
  • examples/sample-report.* regenerated if output changed

This project follows the Contributor Covenant.