A web app for SEO agencies to diagnose traffic drops and spikes by connecting Google Search Console and GA4 data.
Built for The SEO Community.
- Sign in with Google to grant read-only access to your GSC and GA4 properties
- Pull data for any property — pages, queries, and query-page mappings, day-by-day
- Detect anomalies using statistical baselines (2 standard deviations from your site's own historical volatility)
- Tag branded queries automatically using domain name matching with fuzzy/Levenshtein distance
- Cluster queries semantically using e5-small embeddings + Mini-Batch K-Means
- Cluster pages by shared query overlap using Leiden community detection
- Visualize time series with Chart.js, including Google algorithm update date overlays
- Export results as a zip of CSV files (summary, pages, queries, query-page mappings)
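The branded-query tagging listed above can be illustrated with a small, stdlib-only Python sketch. It is not the project's implementation (that lives in `src/lib/branded.ts`); the helper names `brand_from_domain` and `is_branded`, the 1-to-3-word n-gram window, and the `max_distance=1` threshold are all assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def brand_from_domain(domain: str) -> str:
    """Hypothetical helper: 'www.acme-tools.com' -> 'acmetools'."""
    host = domain.lower().removeprefix("www.")
    label = host.split(".")[0]
    return "".join(ch for ch in label if ch.isalnum())


def is_branded(query: str, domain: str, max_distance: int = 1) -> bool:
    """True if any 1-3 word n-gram of the query is within max_distance of the brand."""
    brand = brand_from_domain(domain)
    if not brand:
        return False
    words = ["".join(ch for ch in w if ch.isalnum()) for w in query.lower().split()]
    words = [w for w in words if w]
    for n in range(1, 4):
        for i in range(len(words) - n + 1):
            # Join without spaces so "acme tools" can match "acmetools".
            candidate = "".join(words[i:i + n])
            if levenshtein(candidate, brand) <= max_distance:
                return True
    return False


print(is_branded("acme tools login", "www.acme-tools.com"))
print(is_branded("best running shoes", "acme-tools.com"))
```

A fixed distance threshold is too permissive for very short brand names, so a real implementation would likely scale it with the brand's length.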
Tokens are ephemeral — discarded after the data pull. No credentials are stored.
| Layer | Technology |
|---|---|
| Framework | Next.js 15 (App Router) |
| Charts | Chart.js (react-chartjs-2) + annotation + zoom plugins |
| Database | DuckDB (per-session file, via @duckdb/node-api) |
| Auth | next-auth (Google OAuth2) |
| Google APIs | googleapis (GSC, GA4) |
| Embeddings | e5-small (Python, sentence-transformers) |
| Query Clustering | Mini-Batch K-Means (scikit-learn) |
| Page Clustering | Leiden community detection (igraph + leidenalg) |
| Export | CSV zip (archiver) |
- Node.js 20+
- Python 3.12+ with uv
- A Google Cloud project with OAuth 2.0 credentials
git clone https://github.com/YOUR_USERNAME/traffic-diagnosis.git
cd traffic-diagnosis
npm install
cd python && uv sync && cd ..
- Go to Google Cloud Console → Credentials
- Create an OAuth 2.0 Client ID (Web application)
- Add authorized redirect URI: http://localhost:3000/api/auth/callback/google
- Enable these APIs in your project:
  - Search Console API
  - Google Analytics Data API
  - Google Analytics Admin API
cp .env.local.example .env.local
Edit .env.local with your credentials:
GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-client-secret
NEXTAUTH_SECRET=<run: openssl rand -base64 32>
NEXTAUTH_URL=http://localhost:3000
npm run dev
Open http://localhost:3000, sign in with Google, select a property, and start a sync.
docker compose up --build
Requires .env.local in the project root.
- Auth — Google OAuth grants read-only access to GSC + GA4. Token stays in server memory, never persisted.
- Sync — Background worker pulls data day-by-day at <16 QPS. Three separate pulls (page-only, query-only, query+page) to maximize data completeness. GA4 session data pulled if a matching property is found. Progress shown via polling.
- Analysis — Anomaly detection runs in DuckDB SQL (weekly aggregation, WoW % change, stddev baseline). Branded queries tagged via n-gram extraction + Levenshtein distance from domain name.
- Clustering — Python sidecar embeds queries with e5-small, clusters with K-Means. Pages clustered by shared query overlap using cosine similarity + Leiden community detection.
- Dashboard — Chart.js time series with algorithm update annotations, KPI cards, anomaly tables, sortable data tables. Drill-down to page and query detail.
- Export — Download a zip of 4 CSVs: summary, pages, queries, query-page mappings.
- Cleanup — Session DuckDB files are deleted after 24 hours.
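The analysis step above runs in DuckDB SQL; the same arithmetic can be sketched in plain Python to show what "WoW % change against a stddev baseline" means. This is a minimal sketch under assumptions: the function names are hypothetical, the baseline is the mean and sample standard deviation of all week-over-week changes in the series, and the threshold of 2 matches the "2 standard deviations" mentioned in the feature list.

```python
from statistics import mean, stdev


def wow_changes(weekly_clicks: list[float]) -> list[float]:
    """Week-over-week % change; the result has one fewer entry than the input."""
    changes = []
    for prev, cur in zip(weekly_clicks, weekly_clicks[1:]):
        # A zero previous week has no defined % change; treat it as flat.
        changes.append((cur - prev) / prev * 100 if prev else 0.0)
    return changes


def detect_anomalies(weekly_clicks: list[float], threshold: float = 2.0):
    """Return (week_index, pct_change, kind) for changes >= threshold stddevs from the mean."""
    changes = wow_changes(weekly_clicks)
    if len(changes) < 3:
        return []  # too little history to estimate volatility
    mu = mean(changes)
    sigma = stdev(changes)
    if sigma == 0:
        return []  # perfectly flat series, nothing stands out
    anomalies = []
    # Change i describes the move from week i-1 to week i (0-based weeks).
    for week, change in enumerate(changes, start=1):
        z = (change - mu) / sigma
        if abs(z) >= threshold:
            anomalies.append((week, round(change, 1), "spike" if z > 0 else "drop"))
    return anomalies


clicks = [1000, 1020, 990, 1010, 1005, 995, 1015, 1000, 600, 610]
print(detect_anomalies(clicks))
```

Because the anomalous week is itself part of the baseline here, a large outlier inflates the standard deviation; computing the baseline over a trailing window that excludes the week under test is a common alternative.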
src/
├── app/
│ ├── page.tsx # Home — sign in + property selector
│ ├── property/[id]/
│ │ ├── page.tsx # Dashboard
│ │ ├── pages/page.tsx # Page drill-down
│ │ └── queries/page.tsx # Query drill-down
│ └── api/
│ ├── auth/[...nextauth]/ # OAuth
│ ├── properties/ # List GSC/GA4 properties
│ ├── sync/start|status/ # Background data sync
│ ├── analysis/ # Sitewide, pages, queries, anomalies, branded
│ ├── clusters/ # Cluster summaries
│ ├── export/ # CSV zip download
│ └── health/ # Health check
├── components/
│ ├── charts/ # Chart.js wrappers
│ └── dashboard/ # Metrics, tables, export button
├── lib/
│ ├── db.ts # DuckDB per-session manager
│ ├── gsc.ts # GSC API client
│ ├── ga4.ts # GA4 API client
│ ├── sync-worker.ts # Background sync orchestration
│ ├── anomaly.ts # Anomaly detection (DuckDB SQL)
│ ├── branded.ts # Branded query detection
│ ├── python.ts # Python sidecar caller
│ └── auth.ts # next-auth config
python/
├── embed.py # e5-small embeddings
├── cluster_queries.py # Mini-Batch K-Means
└── cluster_pages.py # Leiden community detection
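The page-clustering step that `cluster_pages.py` covers is described as cosine similarity over shared queries followed by Leiden community detection. The sketch below shows only the first half: turning per-page query weights into weighted edges that a community-detection library could consume. The function names, the use of clicks as weights, and the `min_similarity` cutoff are assumptions for illustration, and the Leiden step itself is not shown.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse query->weight vectors."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[q] * b[q] for q in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_edges(page_queries: dict[str, dict[str, float]],
                     min_similarity: float = 0.3):
    """Return (page_a, page_b, similarity) for every pair at or above the cutoff."""
    edges = []
    for p1, p2 in combinations(sorted(page_queries), 2):
        sim = cosine_similarity(page_queries[p1], page_queries[p2])
        if sim >= min_similarity:
            edges.append((p1, p2, round(sim, 3)))
    return edges


# Hypothetical data: clicks per query for each page.
pages = {
    "/shoes": {"running shoes": 10, "trail shoes": 5},
    "/trail": {"trail shoes": 8, "hiking boots": 6},
    "/blog": {"seo tips": 4},
}
print(similarity_edges(pages))
```

Comparing every pair of pages is quadratic in the number of pages, so a larger site would need an inverted index from query to pages to limit comparisons to pages that share at least one query.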
Open source — license TBD.