A thin web app that lets anyone try the three Prompt Engineering Toolkit evaluators without pasting API keys. Deployed on Vercel.
This is the companion app for Doberjohn/prompt-engineering-toolkit. The toolkit contains the framework, evaluators, calibration sets, and skills. This repo contains only the web app that showcases three of them.
| Path | Purpose |
|---|---|
index.html |
Landing page with three evaluator cards |
prompt-evaluator.html |
PPEP prompt evaluator page |
issue-evaluator.html |
GitHub implementation plan issue evaluator page |
uiux-url-evaluator.html |
UI/UX URL mode evaluator page |
shared/styles.css |
Dark theme shared across all pages |
shared/client.js |
Fetches /api/evaluate, renders markdown results |
api/evaluate.js |
Vercel serverless function. Proxies to Anthropic, rate-limits per IP |
scripts/sync-prompts.js |
Build-time script: fetches canonical prompts from the toolkit repo on GitHub |
system-prompts/ |
Populated by the sync script at build time. Gitignored |
The three HTML pages are static and ship with no secrets. When the user clicks Evaluate, the browser calls POST /api/evaluate on the same origin. The serverless function reads the appropriate system prompt from system-prompts/, checks the caller's IP against an Upstash Redis rate limit (10 evaluations per day per IP by default), forwards the request to Anthropic using a server-held API key, and returns the model's response as plain text. The user never sees or supplies a key.
The system prompts are fetched from the toolkit repo on GitHub at build time. The toolkit's markdown files remain the single source of truth.
All set in the Vercel project dashboard. None are exposed to the browser.
| Variable | Required | Default | Purpose |
|---|---|---|---|
ANTHROPIC_API_KEY |
yes | — | Anthropic API key. Keep secret. |
UPSTASH_REDIS_REST_URL |
yes | — | From Upstash dashboard. Without this, rate limiting is disabled |
UPSTASH_REDIS_REST_TOKEN |
yes | — | From Upstash dashboard |
MODEL |
no | claude-opus-4-7 |
Anthropic model ID. Change without redeploying code |
RATE_LIMIT_PER_DAY |
no | 10 |
Per-IP daily evaluation cap |
TOOLKIT_REPO |
no | Doberjohn/prompt-engineering-toolkit |
Source repo for the canonical prompts. Useful when testing against a fork |
TOOLKIT_REF |
no | main |
Branch or tag to fetch prompts from |
- In Vercel project dashboard, update
ANTHROPIC_API_KEYto the new value - Redeploy (or trigger a redeploy via "Redeploy latest")
- Revoke the old key in Anthropic console
Total time: under 60 seconds.
git clone https://github.com/<your-username>/prompt-engineering-toolkit-companion
cd prompt-engineering-toolkit-companion
npm install
# Populate system-prompts/ by fetching from GitHub
npm run sync-prompts
# Create .env.local with the required vars (see above)
# Then start the local Vercel dev server
npm run devThe app runs at http://localhost:3000 with hot reload for static files. Changes to api/evaluate.js require restarting the dev server.
Connect this repository to a Vercel project. In the project's settings:
- Build command:
npm run build(runssync-promptsautomatically) - Output directory: leave as default (Vercel auto-detects)
- Install command:
npm install
Set the environment variables listed above. Deploy.
The system-prompts/ folder is not committed. It is regenerated from the toolkit's GitHub raw URLs every time Vercel builds. The fetch targets the main branch of the toolkit repo by default (configurable via TOOLKIT_REPO and TOOLKIT_REF env vars).
This means: when you update prompts/issue-evaluator.md in the toolkit and push to main, the companion app will pick up the change on its next build — not automatically.
To propagate toolkit changes to the live app, you need to trigger a rebuild. Three options:
Option 1: Manual redeploy (simplest, no setup)
After pushing to the toolkit's main branch, open the Vercel dashboard for this app and click "Redeploy latest". Takes ~30 seconds.
Option 2: Vercel deploy hook (recommended)
In the Vercel project settings, create a deploy hook URL. Then add a GitHub Action to the toolkit repo (in .github/workflows/notify-companion.yml):
name: Trigger companion app rebuild
on:
push:
branches: [main]
paths:
- 'prompts/**'
jobs:
trigger:
runs-on: ubuntu-latest
steps:
- name: Ping Vercel deploy hook
run: curl -X POST ${{ secrets.COMPANION_DEPLOY_HOOK }}Store the deploy hook URL in the toolkit repo's GitHub Actions secrets as COMPANION_DEPLOY_HOOK. Now every push that touches prompts/** triggers a companion app rebuild automatically.
Option 3: Scheduled rebuild A daily cron job that rebuilds regardless. Wasteful but zero coupling between repos. Not recommended unless you want full independence between the repos.
The sync script also strips human-facing preambles from the UI/UX mode file (the instructional framing above the first --- divider), so those tokens aren't wasted on every API call.
Documented here so future maintainers (including future me) know what was deferred and why.
markedfrom CDN. If jsdelivr has an outage, markdown rendering breaks. Self-hostingmarked.esm.jswould fix this. Deferred; acceptable for a demo.- Cold-start latency. First request after ~10 minutes of inactivity has an extra 1-2 second penalty. Inherent to serverless. Fixing requires either Vercel Pro's keep-warm or migrating to an edge runtime, which loses Node
fsaccess (and therefore our system-prompt loading approach). Deferred. - No retry on upstream 429. If Anthropic rate-limits the proxy, the user gets a one-line error with no retry. Exponential backoff would be a small, valuable improvement.
- Rate limit is per-IP, not per-session. Users behind shared NAT share a quota. Acceptable for a demo but worth noting.
MAX_TOKENSis hardcoded to 8000. Sufficient for all three evaluators today. If the issue or UI/UX rubrics grow, this becomes a ceiling to revisit.- No request logging or analytics. Intentional (privacy), but makes debugging production issues harder. If abuse becomes a real concern, structured logs to Vercel's console (no user content, just outcomes) would help without compromising privacy.
- Build depends on GitHub raw availability. If
raw.githubusercontent.comis down at build time, deployment fails. Historically this is reliable (enterprise-grade uptime), but it is a new external dependency compared to the same-repo version. Mitigation: the build script aborts loudly rather than silently deploying stale content. - Manual redeploy coupling. Prompt changes in the toolkit don't automatically propagate without the deploy-hook setup. Acceptable tradeoff for the independence gain, but worth knowing.
With defaults (Opus 4.7, 10 evaluations per IP per day, ~$0.15-0.30 per evaluation), a single IP maxing out their daily quota costs $1.50-$3.00. A realistic busy day with 50 distinct visitors who each do 2-3 evaluations costs roughly $15-$45. Adjust MODEL and RATE_LIMIT_PER_DAY environment variables if that's outside your comfort zone.
MIT.