YauheniPo/ci-tests-manager

Test Filter Service

Service for managing disabled E2E tests in a CI/CD pipeline. CI calls POST /resolve before running E2E tests and receives a list of tests to skip.

Run

docker compose up --build                        # first start / rebuild
docker compose down && docker compose up --build # restart

docker compose down preserves MongoDB data because it lives in a named volume. Adding the -v flag (docker compose down -v) removes the volume together with its data.

Service: http://localhost:3000 | Swagger UI: http://localhost:3000/docs

Local development (code on the host, Mongo in Docker)

npm run docker:mongo:dev   # single docker-compose.yml: only the mongo-dev service (local-mongo profile)
npm run dev                # or: npm run dev:local - first Mongo, then dev server

Manually: docker compose up -d mongo-dev. A regular docker compose up does not touch the mongo-dev service (it is behind the local-mongo profile). Do not run docker compose --profile local-mongo up -d without a service list - it will start both the full stack (mongo1...app) and mongo-dev.

By default, src/config.ts uses mongodb://localhost:27017, so a separate .env file is not required. Example variables are in .env.example (set them in your shell or through your IDE mechanism).

To stop only the dev Mongo: npm run docker:mongo:dev:down (data in the mongo_dev_data volume is preserved; to fully reset the volume: docker volume rm ...).


Testing

npm test                  # unit tests (Jest + supertest, MongoDB is mocked)
npm run test:coverage     # unit + coverage (threshold: 100%)
npm run test:integration  # integration tests (mongodb-memory-server, Docker not required)
npm run test:all          # unit + integration

Build script (npm run build): format:check -> test:coverage -> swagger -> tsc. The Docker build fails if the code is not formatted or coverage is below 100%.


API

GET /health

200: { "status": "ok", "mongo": "connected" } | 503: { "status": "degraded", "mongo": "unavailable" }

POST /filters - disable a test

{ "testId": "TP000001", "targetBranch": "MASTER", "reason": "Flaky, ticket JIRA-1234", "author": "john.doe" }

201 - created. 409 - duplicate testId + targetBranch.

GET /filters?targetBranch=MASTER

Returns all filters for a branch. The optional reason parameter (for example &reason=JIRA-1234) is matched as a regular expression.

GET /filters/:id | DELETE /filters/:id

Get / delete a filter by ObjectId. DELETE returns 204.

PATCH /filters/:id - update reason / author

{ "reason": "Updated: fix deployed, still flaky", "author": "jane.doe" }

testId and targetBranch cannot be changed - that would be a different filter.

POST /resolve - for CI/CD

Request:
{ "targetBranch": "MASTER" }

Response:
{ "targetBranch": "MASTER", "disabledTests": ["TP000001", "TP000007"], "resolvedAt": "..." }

Branch logic: MASTER -> filters for MASTER + PROD. PROD -> only PROD.

On MongoDB error: 200, "disabledTests": [], "fallback": true (fail-open).
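
The branch rule and the fail-open behaviour can be sketched together as follows. This is a minimal illustration, not the service's actual code: the names branchesToQuery, resolveFailOpen, and loadTestIds are hypothetical, and including resolvedAt in the fallback response is an assumption.

```typescript
// Hypothetical names; only the JSON field names and the branch rule come from the README.
type TargetBranch = "MASTER" | "PROD";

interface ResolveResponse {
  targetBranch: TargetBranch;
  disabledTests: string[];
  resolvedAt: string;
  fallback?: true;
}

// MASTER also inherits PROD filters; PROD only sees its own.
function branchesToQuery(target: TargetBranch): TargetBranch[] {
  return target === "MASTER" ? ["MASTER", "PROD"] : ["PROD"];
}

// Wraps any storage loader so that a failure degrades to "skip nothing".
async function resolveFailOpen(
  target: TargetBranch,
  loadTestIds: (branches: TargetBranch[]) => Promise<string[]>,
): Promise<ResolveResponse> {
  const resolvedAt = new Date().toISOString();
  try {
    const disabledTests = await loadTestIds(branchesToQuery(target));
    return { targetBranch: target, disabledTests, resolvedAt };
  } catch {
    // Assumption: the fallback body keeps the same shape plus fallback: true.
    return { targetBranch: target, disabledTests: [], resolvedAt, fallback: true };
  }
}
```

Passing the loader in as a parameter keeps the rule testable without a database: a loader that rejects should produce an empty disabledTests list with fallback set.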

Error format

{ "error": "Filter not found" }
{ "error": "Validation failed", "details": [{ "field": "testId", "message": "Required" }] }
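
A client can model both error shapes with one type. The sketch below is an illustration only: the ErrorBody and FieldError names and the describeError helper are hypothetical, and it assumes details is simply absent on non-validation errors, as in the first example above.

```typescript
// Shapes taken from the two examples above; the type names are hypothetical.
interface FieldError {
  field: string;
  message: string;
}

interface ErrorBody {
  error: string;
  details?: FieldError[];
}

// Turns an error body into a single log-friendly line.
function describeError(body: ErrorBody): string {
  if (body.details === undefined || body.details.length === 0) {
    return body.error;
  }
  const fields = body.details.map((d) => `${d.field}: ${d.message}`).join("; ");
  return `${body.error} (${fields})`;
}
```

For the validation example above this yields "Validation failed (testId: Required)", which is convenient to print in CI logs.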

Architectural decisions

1. CI <-> service contract

/resolve returns an array of testId values. CI builds the runner CLI arguments on its own.

The alternative - returning a ready-made grep pattern - would couple the service to Mocha and break the contract if the runner changes.

2. Integration with Mocha Runner

CI converts the array into --grep "TP000001|TP000007" --invert. An empty disabledTests means all tests run without extra flags.

# Requires bash (arrays). -w appends the HTTP status on its own line after the body.
RESPONSE=$(curl -s -w "\n%{http_code}" -X POST http://test-filter-service/resolve \
  -H "Content-Type: application/json" -d "{\"targetBranch\": \"$TARGET_BRANCH\"}")
HTTP_STATUS=$(echo "$RESPONSE" | tail -n1)
BODY=$(echo "$RESPONSE" | sed '$d')   # everything except the status line (portable, unlike head -n -1)

# An array keeps the pattern as a single argument; a plain string would pass literal quotes to Mocha.
GREP_ARGS=()
if [ "$HTTP_STATUS" != "200" ]; then
  echo "WARNING: Service unavailable. Running all tests (fail-open)."
else
  DISABLED=$(echo "$BODY" | jq -r '.disabledTests | join("|")')
  if [ -n "$DISABLED" ]; then
    GREP_ARGS=(--grep "$DISABLED" --invert)
  fi
fi

npx mocha 'tests/**/*.spec.js' "${GREP_ARGS[@]}"

Why --grep instead of .mocharc: filters are dynamic, so they cannot be fixed in config.

Limitations: IDs in the TP###### format contain no regex metacharacters, so they are safe to join into a pattern. With more than 100 filters the grep string becomes long. This can be solved by using the Mocha programmatic API with a .mocharc.js file generated by CI from the service response.
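
If CI is driven from Node rather than a shell, the same conversion can be sketched in TypeScript. The function names below are hypothetical. Escaping is added as a defensive assumption in case an ID ever deviates from the TP###### format; --grep and --invert are the Mocha flags already used above.

```typescript
// Escape regex metacharacters so an unexpected ID cannot change the pattern.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns an argv fragment for Mocha; an empty list means "run everything".
function buildMochaGrepArgs(disabledTests: string[]): string[] {
  if (disabledTests.length === 0) {
    return [];
  }
  const pattern = disabledTests.map(escapeRegExp).join("|");
  return ["--grep", pattern, "--invert"];
}
```

Because the result is an argument array, it can be appended to the Mocha command line without any shell quoting.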

3. Architectural assumptions

  • The service does not control the runner directly - it returns a list of testId values, and CI applies that list to Mocha or another runner.
  • PROD is a subset of MASTER - therefore MASTER uses filters for MASTER + PROD, while PROD uses only PROD.
  • Fail-open is preferred over fail-closed - if MongoDB is unavailable, CI should continue and run all tests instead of stopping the pipeline.
  • testId is a stable test identifier - tests are assumed to have unique and persistent IDs in the TP###### format so they can be reliably disabled.
  • Filters must survive service restarts - therefore in-memory storage is not considered the main solution.
  • Graceful shutdown - when stopping, the service waits for current requests to finish and closes the MongoDB connection. Timeout: 10 seconds, then forced exit.
  • testId and targetBranch cannot be changed - PATCH only updates reason and author. To change the test or branch, delete the filter and create a new one.
  • /resolve returns a stable response - the disabledTests list is deduplicated and sorted so identical requests produce identical results.
  • Physical deletion - DELETE removes the document permanently, with no soft delete and no history.
  • MongoDB is the only dependency - no Redis, queues, or other services.
  • Indexes are created on startup - ensureIndexes() runs on every start; if indexes already exist, MongoDB skips creation.
  • Swagger is optional - if swagger-output.json is missing, /docs is simply disabled and the service continues to run.
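
The stable-response assumption can be illustrated with a short sketch. The FilterDoc shape and the toDisabledTests name are hypothetical. Deduplication matters because a MASTER request merges MASTER and PROD filters, so the same testId can appear twice.

```typescript
// Hypothetical document shape; only testId and targetBranch are taken from the README.
interface FilterDoc {
  testId: string;
  targetBranch: string;
}

// Deduplicate, then sort by code unit order so the result does not depend on
// insertion order in the database or on the machine's locale.
function toDisabledTests(filters: FilterDoc[]): string[] {
  const unique = new Set(filters.map((f) => f.testId));
  return Array.from(unique).sort();
}
```

The default Array.prototype.sort comparison is used on purpose: it compares UTF-16 code units, so every instance returns the same order for the same data.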

4. Validation via Zod

All input data is described by Zod schemas in src/types.ts. A single schema produces TypeScript types, request validation, and Swagger examples. If a Swagger example does not match the schema, the build fails.

5. Limitations and risks

  • Single point of failure - if the service is unavailable, CI runs all tests (fail-open).
  • No cache - every /resolve hits MongoDB. If the database is unavailable, the service returns an empty list and fallback: true.
  • No filter expiration - a forgotten filter will disable a test indefinitely.
  • No audit log - when a filter is deleted, information about who created it, why, and when it was deleted is lost.
  • No rate limiting.

6. Data storage

MongoDB (required by the task). Data is stored in the mongo_data named volume. In-memory storage (Map) is not suitable because filters must survive restarts.

Two indexes:

{ testId: 1, targetBranch: 1 }  // unique - one test per branch
{ targetBranch: 1 }             // for /resolve: find({ targetBranch: { $in: [...] } })
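
A startup routine that creates these two indexes might look like the sketch below. It is not the repository's ensureIndexes(). The IndexableCollection interface is a hypothetical stand-in, modelled on the createIndex(keys, options) method of the MongoDB Node.js driver, so the example stays self-contained.

```typescript
// Hypothetical stand-in for the driver's collection type.
interface IndexableCollection {
  createIndex(
    keys: Record<string, 1 | -1>,
    options?: { unique?: boolean },
  ): Promise<string>;
}

// Requests both indexes; resolves with the index names reported by the collection.
async function ensureIndexes(filters: IndexableCollection): Promise<string[]> {
  return Promise.all([
    // Unique compound index: at most one filter per test per branch.
    filters.createIndex({ testId: 1, targetBranch: 1 }, { unique: true }),
    // Supports the $in lookup on targetBranch used by /resolve.
    filters.createIndex({ targetBranch: 1 }),
  ]);
}
```

With the unique index in place, MongoDB rejects a second insert of the same testId + targetBranch pair with a duplicate key error (code 11000), which is presumably what the service maps to the 409 response.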

7. Scaling

  • Caching /resolve - Redis, so MongoDB is not queried on every CI request. Refresh the cache every 30-60 seconds.
  • Multiple instances - the service does not keep state in memory, so multiple copies can run behind a load balancer.
  • Audit log - a separate filter_events collection with history: who created, updated, or deleted a filter.
  • Authorization - API keys for CI, JWT for filter management.
  • Notifications - Slack/webhook notifications when a filter is created or deleted.
  • Regexp-based filtering - a pattern field for disabling groups of tests by pattern.
  • Jira integration - a ticketId field and automatic filter removal when the ticket is closed.
  • API versioning - /v1/resolve, /v1/filters, so the contract can evolve without breaking old clients.
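
The caching idea can be sketched without Redis. The class below is a hypothetical in-process TTL cache, swapped in purely for illustration; the clock is injectable so expiry can be tested. A Redis-backed version would share entries between instances, which this sketch does not.

```typescript
// In-process TTL cache; a stand-in for the Redis cache proposed above.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value while it is fresh; otherwise calls load() and stores the result.
  getOrLoad(key: string, load: () => V): V {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value;
    }
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

With a TTL of 30 seconds, a newly created filter can take up to 30 seconds to reach CI. If V is a Promise, a rejected load stays cached until it expires, so a real implementation would need to evict failed entries.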

8. Production improvements

  1. Structured logging - add a request ID to link service logs with a specific CI run.
  2. Metrics - a /metrics endpoint (Prometheus): request counters, response time, number of active filters.
  3. CI pipeline - separate stages: lint -> tests -> build -> deploy. Right now everything runs only inside docker build.
  4. Alerting on fallback - when MongoDB is unavailable and /resolve returns an empty list, send a Slack alert so the team learns about the issue before flaky tests start breaking the pipeline.
  5. ReDoS protection - the reason parameter in GET /filters is passed to MongoDB as regex. A complex pattern can hang the query. Fix: limit pattern length, validate safety (safe-regex2), add a query timeout (maxTimeMS).
  6. Bulk operations - POST /filters/bulk and DELETE /filters/bulk to create/delete multiple filters in one request.
  7. Write locking - so /resolve does not read filters while they are being modified. This guarantees that all CI jobs receive either the old state or the new state in full, without mixing them.
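
For item 5, one conservative option is sketched below. It enforces a length limit and then escapes the input, so the reason parameter is matched as a literal substring instead of a user-supplied regex. That is a stricter swap-in than the safe-regex2 check mentioned above, and both the 100-character limit and the function name are assumptions.

```typescript
// Hypothetical limit; tune to the longest reason text worth searching for.
const MAX_REASON_PATTERN_LENGTH = 100;

// Rejects oversized input and escapes regex metacharacters, so the returned
// string is safe to use as a pattern that matches the input literally.
function toSafeReasonPattern(input: string): string {
  if (input.length > MAX_REASON_PATTERN_LENGTH) {
    throw new RangeError(
      `reason pattern is longer than ${MAX_REASON_PATTERN_LENGTH} characters`,
    );
  }
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
```

A typical value such as JIRA-1234 passes through unchanged, while a pathological pattern like (a+)+$ is reduced to a harmless literal. The maxTimeMS query timeout remains useful as a second line of defence.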
