
RobinAllenson/traffic-diagnosis


Traffic Diagnosis

A web app for SEO agencies to diagnose traffic drops and spikes by connecting Google Search Console and GA4 data.

Built for The SEO Community.

What it does

  • Sign in with Google to grant read-only access to your GSC and GA4 properties
  • Pull data for any property — pages, queries, and query-page mappings, day-by-day
  • Detect anomalies against statistical baselines: a change is flagged when it falls more than 2 standard deviations from the mean of your site's own historical week-over-week changes
  • Tag branded queries automatically by fuzzy-matching them against the domain name (Levenshtein distance)
  • Cluster queries semantically using e5-small embeddings + Mini-Batch K-Means
  • Cluster pages by shared query overlap using Leiden community detection
  • Visualize time series with Chart.js, with overlays marking Google algorithm update dates
  • Export results as a zip of CSV files (summary, pages, queries, query-page mappings)
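Branded-query tagging is described only at a high level: n-grams from each query are compared with the domain name by Levenshtein distance. A minimal sketch of that idea, assuming the brand is the first label of the property's hostname and allowing roughly one edit per five characters. `brandFromDomain`, `wordNGrams`, `isBranded`, and the 0.2 ratio are hypothetical names and values, not taken from `branded.ts`:

```typescript
// Hypothetical sketch of branded-query tagging: word n-grams + Levenshtein distance.
// The brand-extraction rules and the 20% distance ratio are assumptions.

// Classic dynamic-programming edit distance (insert, delete, substitute all cost 1).
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// "sc-domain:acme-shoes.com" or "https://www.acme-shoes.com/" -> "acmeshoes"
function brandFromDomain(property: string): string {
  const host = property
    .replace(/^sc-domain:/, "")
    .replace(/^https?:\/\//, "")
    .replace(/^www\./, "")
    .split("/")[0];
  return host.split(".")[0].toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Word n-grams (1..maxN words), joined without spaces so "acme shoes" can match "acmeshoes".
function wordNGrams(query: string, maxN = 3): string[] {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  const grams: string[] = [];
  for (let n = 1; n <= maxN; n++) {
    for (let i = 0; i + n <= words.length; i++) {
      grams.push(words.slice(i, i + n).join("").replace(/[^a-z0-9]/g, ""));
    }
  }
  return grams;
}

// Branded if any n-gram is within ~20% of the brand's length in edits (at least 1 edit allowed).
function isBranded(query: string, brand: string, maxRatio = 0.2): boolean {
  const maxDist = Math.max(1, Math.floor(brand.length * maxRatio));
  return wordNGrams(query).some((g) => levenshtein(g, brand) <= maxDist);
}
```

Under these assumptions, `isBranded("acme shoes sale", brandFromDomain("sc-domain:acmeshoes.com"))` returns `true`, while `isBranded("running shoes", "acmeshoes")` returns `false`. Joining n-grams without spaces is what lets a two-word query match a one-word domain; very short brands would need a stricter rule, because a one-edit allowance matches many unrelated words.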

Tokens are ephemeral — discarded after the data pull. No credentials are stored.

Tech Stack

Layer              Technology
Framework          Next.js 15 (App Router)
Charts             Chart.js (react-chartjs-2) + annotation + zoom plugins
Database           DuckDB (per-session file, via @duckdb/node-api)
Auth               next-auth (Google OAuth2)
Google APIs        googleapis (GSC, GA4)
Embeddings         e5-small (Python, sentence-transformers)
Query Clustering   Mini-Batch K-Means (scikit-learn)
Page Clustering    Leiden community detection (igraph + leidenalg)
Export             CSV zip (archiver)

Prerequisites

  • Node.js 20+
  • Python 3.12+ with uv
  • A Google Cloud project with OAuth 2.0 credentials

Setup

1. Clone and install

git clone https://github.com/YOUR_USERNAME/traffic-diagnosis.git
cd traffic-diagnosis
npm install
cd python && uv sync && cd ..

2. Create Google OAuth credentials

  1. Go to Google Cloud Console → Credentials
  2. Create an OAuth 2.0 Client ID (Web application)
  3. Add authorized redirect URI: http://localhost:3000/api/auth/callback/google
  4. Enable these APIs in your project:
    • Search Console API
    • Google Analytics Data API
    • Google Analytics Admin API

3. Configure environment

cp .env.local.example .env.local

Edit .env.local with your credentials:

GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-client-secret
NEXTAUTH_SECRET=<run: openssl rand -base64 32>
NEXTAUTH_URL=http://localhost:3000
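All four variables are needed. A small startup check could catch the most common setup mistakes: an empty value, the unreplaced `<run: ...>` placeholder, or a `NEXTAUTH_URL` without an http(s) scheme. `checkEnv` is a hypothetical helper for illustration, not part of the repo:

```typescript
// Hypothetical startup check for the .env.local values listed above.
const REQUIRED_ENV = [
  "GOOGLE_CLIENT_ID",
  "GOOGLE_CLIENT_SECRET",
  "NEXTAUTH_SECRET",
  "NEXTAUTH_URL",
] as const;

// Returns a list of human-readable problems; an empty list means the config looks usable.
function checkEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_ENV) {
    if (!env[name]?.trim()) problems.push(`${name} is not set`);
  }

  // The example file's placeholder must be replaced with real random output.
  const secret = env["NEXTAUTH_SECRET"];
  if (secret && secret.startsWith("<run:")) {
    problems.push("NEXTAUTH_SECRET still holds the placeholder; run: openssl rand -base64 32");
  }

  // NEXTAUTH_URL must be an absolute http(s) URL such as http://localhost:3000.
  const url = env["NEXTAUTH_URL"];
  if (url) {
    let protocol = "";
    try {
      protocol = new URL(url).protocol;
    } catch {
      // unparseable URL: protocol stays empty and is reported below
    }
    if (protocol !== "http:" && protocol !== "https:") {
      problems.push(`NEXTAUTH_URL must be an absolute http(s) URL, got "${url}"`);
    }
  }
  return problems;
}
```

Calling `checkEnv(process.env)` once at server start and failing fast on a non-empty result surfaces configuration errors before the first OAuth redirect does.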

4. Run

npm run dev

Open http://localhost:3000, sign in with Google, select a property, and start a sync.

Docker

docker compose up --build

Requires .env.local in the project root.

How it works

  1. Auth: Google OAuth grants read-only access to GSC and GA4. The access token stays in server memory and is never persisted.
  2. Sync: A background worker pulls data one day at a time, staying below 16 queries per second. Three separate pulls (page only, query only, and query + page) maximize data completeness, because Search Console omits anonymized queries from query-level rows. GA4 session data is also pulled when a matching GA4 property is found. Progress is reported to the browser via polling.
  3. Analysis: Anomaly detection runs in DuckDB SQL: daily data is aggregated by week, the week-over-week (WoW) % change is computed, and each change is compared with a standard-deviation baseline. Branded queries are tagged by extracting n-grams from each query and measuring their Levenshtein distance to the domain name.
  4. Clustering: A Python sidecar embeds queries with e5-small and clusters them with Mini-Batch K-Means. Pages are clustered by shared query overlap: cosine similarity between pages feeds Leiden community detection.
  5. Dashboard: Chart.js time series with algorithm update annotations, KPI cards, anomaly tables, and sortable data tables, with drill-down into page and query detail.
  6. Export: Download a zip of four CSVs: summary, pages, queries, and query-page mappings.
  7. Cleanup: Session DuckDB files are deleted after 24 hours.
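The repo runs anomaly detection as DuckDB SQL. The same three stages (weekly totals, WoW % change, standard-deviation baseline) are sketched here in TypeScript so the arithmetic is easy to follow. Assumptions, not confirmed by the source: weeks start on Monday (UTC), each week is scored against the mean and sample standard deviation of all earlier WoW changes, at least four earlier changes are required, and clicks stand in for whichever metric is analysed. `detectWeeklyAnomalies` is a hypothetical name:

```typescript
// Hypothetical sketch: weekly aggregation -> WoW % change -> stddev baseline.
// Trailing baseline, minHistory = 4, and Monday-start weeks are assumptions.

interface DailyRow { date: string; clicks: number } // date as "YYYY-MM-DD"

interface WeeklyAnomaly {
  weekStart: string;
  clicks: number;
  wowPct: number;
  zScore: number;
  direction: "spike" | "drop";
}

// Monday of the week containing `date`, as "YYYY-MM-DD" (UTC).
function weekStart(date: string): string {
  const d = new Date(`${date}T00:00:00Z`);
  d.setUTCDate(d.getUTCDate() - ((d.getUTCDay() + 6) % 7)); // subtract days since Monday
  return d.toISOString().slice(0, 10);
}

const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
const sampleStd = (xs: number[]) => {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1));
};

function detectWeeklyAnomalies(rows: DailyRow[], threshold = 2, minHistory = 4): WeeklyAnomaly[] {
  // 1. Aggregate daily clicks into weekly totals, sorted by week.
  const totals = new Map<string, number>();
  for (const r of rows) {
    const wk = weekStart(r.date);
    totals.set(wk, (totals.get(wk) ?? 0) + r.clicks);
  }
  const weeks = [...totals.entries()].sort(([a], [b]) => a.localeCompare(b));

  // 2. Week-over-week % change (skipped when the previous week had no clicks).
  const changes: { weekStart: string; clicks: number; wowPct: number }[] = [];
  for (let i = 1; i < weeks.length; i++) {
    const prev = weeks[i - 1][1];
    const [wk, cur] = weeks[i];
    if (prev > 0) changes.push({ weekStart: wk, clicks: cur, wowPct: ((cur - prev) / prev) * 100 });
  }

  // 3. Flag changes more than `threshold` standard deviations from the mean of earlier changes.
  const anomalies: WeeklyAnomaly[] = [];
  for (let i = Math.max(minHistory, 2); i < changes.length; i++) {
    const history = changes.slice(0, i).map((c) => c.wowPct);
    const sd = sampleStd(history);
    if (sd === 0) continue; // perfectly flat history: no meaningful baseline
    const z = (changes[i].wowPct - mean(history)) / sd;
    if (Math.abs(z) > threshold) {
      anomalies.push({ ...changes[i], zScore: z, direction: z > 0 ? "spike" : "drop" });
    }
  }
  return anomalies;
}
```

Partial weeks at either end of the date range produce artificial WoW swings, so in practice those would likely be trimmed before scoring.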
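The sync step pulls one day at a time below 16 QPS, with three pulls per day. A sketch of that loop, assuming requests are made sequentially and using a 15 QPS default to stay under the limit. `QpsPacer`, `syncRange`, and the `fetchRows` callback are hypothetical stand-ins, not the API of `sync-worker.ts`; `fetchRows` is where the real Search Console call would go:

```typescript
// Hypothetical sketch of a paced, day-by-day sync loop.

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Waits until 1000 / maxQps ms have passed since the previous call *started*.
// Meant for sequential use (one request in flight at a time), as in syncRange.
class QpsPacer {
  private lastStart = -Infinity;
  private readonly intervalMs: number;

  constructor(maxQps: number) {
    this.intervalMs = 1000 / maxQps;
  }

  async run<T>(fn: () => Promise<T>): Promise<T> {
    const wait = this.lastStart + this.intervalMs - Date.now();
    if (wait > 0) await sleep(wait);
    this.lastStart = Date.now();
    return fn();
  }
}

// Inclusive list of "YYYY-MM-DD" days between start and end (UTC).
function eachDay(start: string, end: string): string[] {
  const days: string[] = [];
  const last = Date.parse(`${end}T00:00:00Z`);
  for (let t = Date.parse(`${start}T00:00:00Z`); t <= last; t += 86_400_000) {
    days.push(new Date(t).toISOString().slice(0, 10));
  }
  return days;
}

// Three pulls per day: page only, query only, query + page.
const PULLS: string[][] = [["page"], ["query"], ["query", "page"]];

async function syncRange<R>(
  start: string,
  end: string,
  fetchRows: (day: string, dimensions: string[]) => Promise<R[]>,
  maxQps = 15,
  onProgress?: (done: number, total: number) => void,
): Promise<R[]> {
  const pacer = new QpsPacer(maxQps);
  const days = eachDay(start, end);
  const total = days.length * PULLS.length;
  const out: R[] = [];
  let done = 0;
  for (const day of days) {
    for (const dims of PULLS) {
      const rows = await pacer.run(() => fetchRows(day, dims));
      for (const row of rows) out.push(row);
      onProgress?.(++done, total); // e.g. stored where the status endpoint can poll it
    }
  }
  return out;
}
```

Keeping one request in flight keeps the pacing logic trivial; a concurrent version would need to reserve time slots before awaiting, and a single delayed timer could then let two requests start closer together than the interval.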

Project Structure

src/
├── app/
│   ├── page.tsx                    # Home — sign in + property selector
│   ├── property/[id]/
│   │   ├── page.tsx                # Dashboard
│   │   ├── pages/page.tsx          # Page drill-down
│   │   └── queries/page.tsx        # Query drill-down
│   └── api/
│       ├── auth/[...nextauth]/     # OAuth
│       ├── properties/             # List GSC/GA4 properties
│       ├── sync/start|status/      # Background data sync
│       ├── analysis/               # Sitewide, pages, queries, anomalies, branded
│       ├── clusters/               # Cluster summaries
│       ├── export/                 # CSV zip download
│       └── health/                 # Health check
├── components/
│   ├── charts/                     # Chart.js wrappers
│   └── dashboard/                  # Metrics, tables, export button
├── lib/
│   ├── db.ts                       # DuckDB per-session manager
│   ├── gsc.ts                      # GSC API client
│   ├── ga4.ts                      # GA4 API client
│   ├── sync-worker.ts              # Background sync orchestration
│   ├── anomaly.ts                  # Anomaly detection (DuckDB SQL)
│   ├── branded.ts                  # Branded query detection
│   ├── python.ts                   # Python sidecar caller
│   └── auth.ts                     # next-auth config
python/
├── embed.py                        # e5-small embeddings
├── cluster_queries.py              # Mini-Batch K-Means
└── cluster_pages.py                # Leiden community detection
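`cluster_queries.py` runs scikit-learn's Mini-Batch K-Means on e5-small embeddings. To show what the mini-batch update does, here is a from-scratch TypeScript sketch of the same algorithm: each center moves toward its assigned batch points with a per-center step size of 1/count. It is not the scikit-learn implementation. Initialisation here is a simple farthest-point rule rather than k-means++, and `miniBatchKMeans`, its defaults, and the seeded PRNG are assumptions:

```typescript
// Hypothetical sketch of Mini-Batch K-Means with a per-center learning rate of 1/count.

// Small seeded PRNG (mulberry32) so runs are reproducible.
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const sqDist = (a: number[], b: number[]) => a.reduce((s, x, d) => s + (x - b[d]) ** 2, 0);

// Index of the center closest to x (squared Euclidean distance).
function nearest(centers: number[][], x: number[]): number {
  let best = 0;
  for (let c = 1; c < centers.length; c++) {
    if (sqDist(x, centers[c]) < sqDist(x, centers[best])) best = c;
  }
  return best;
}

function miniBatchKMeans(points: number[][], k: number, batchSize = 64, iterations = 100, seed = 42) {
  if (points.length < k) throw new Error("need at least k points");
  const rand = mulberry32(seed);
  const pick = () => points[Math.floor(rand() * points.length)];

  // Init: one random point, then repeatedly the point farthest from its nearest center.
  const centers: number[][] = [pick().slice()];
  while (centers.length < k) {
    let far = 0;
    let farDist = -1;
    points.forEach((p, i) => {
      const d = Math.min(...centers.map((c) => sqDist(p, c)));
      if (d > farDist) {
        farDist = d;
        far = i;
      }
    });
    centers.push(points[far].slice());
  }

  const counts = new Array<number>(k).fill(0);
  for (let it = 0; it < iterations; it++) {
    // Sample a mini-batch and assign every point to its nearest center first...
    const batch = Array.from({ length: Math.min(batchSize, points.length) }, pick);
    const assigned = batch.map((x) => nearest(centers, x));
    // ...then move each center toward its points with step size 1 / (points seen so far).
    batch.forEach((x, b) => {
      const c = assigned[b];
      counts[c] += 1;
      const eta = 1 / counts[c];
      for (let d = 0; d < x.length; d++) centers[c][d] = (1 - eta) * centers[c][d] + eta * x[d];
    });
  }
  return { centers, labels: points.map((p) => nearest(centers, p)) };
}
```

Because the step size shrinks as a center absorbs more points, each center converges toward a running mean of its cluster while only ever touching a small batch per iteration, which is what makes the mini-batch variant cheap on large query sets.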
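`cluster_pages.py` clusters pages by shared query overlap using cosine similarity and Leiden. The graph-building step could look like the sketch below: each page is a vector of clicks per query, an inverted index limits comparisons to pages that share at least one query, and pairs above a similarity cut-off become weighted edges. The click weighting, the 0.3 cut-off, and `pageSimilarityEdges` are assumptions, and the Leiden step itself (igraph + leidenalg on the Python side) is not reproduced:

```typescript
// Hypothetical sketch: build a weighted page-page graph from shared queries.

interface QueryPageRow { page: string; query: string; clicks: number }
interface Edge { source: string; target: string; weight: number }

function pageSimilarityEdges(rows: QueryPageRow[], minSimilarity = 0.3): Edge[] {
  // Each page becomes a sparse vector: query -> total clicks.
  const vectors = new Map<string, Map<string, number>>();
  // Inverted index query -> pages, so only pages sharing a query are ever compared.
  const pagesByQuery = new Map<string, Set<string>>();
  for (const { page, query, clicks } of rows) {
    const vec = vectors.get(page) ?? new Map<string, number>();
    vec.set(query, (vec.get(query) ?? 0) + clicks);
    vectors.set(page, vec);
    const pages = pagesByQuery.get(query) ?? new Set<string>();
    pages.add(page);
    pagesByQuery.set(query, pages);
  }

  // Euclidean norm of each page vector.
  const norms = new Map<string, number>();
  for (const [page, vec] of vectors) {
    let sq = 0;
    for (const w of vec.values()) sq += w * w;
    norms.set(page, Math.sqrt(sq));
  }

  // Dot products, accumulated only over the queries two pages have in common.
  const dots = new Map<string, number>();
  for (const [query, pageSet] of pagesByQuery) {
    const pages = [...pageSet].sort();
    for (let i = 0; i < pages.length; i++) {
      for (let j = i + 1; j < pages.length; j++) {
        const key = `${pages[i]}\u0000${pages[j]}`;
        const prod = vectors.get(pages[i])!.get(query)! * vectors.get(pages[j])!.get(query)!;
        dots.set(key, (dots.get(key) ?? 0) + prod);
      }
    }
  }

  // Cosine similarity = dot / (|a| * |b|); keep pairs at or above the cut-off as edges.
  const edges: Edge[] = [];
  for (const [key, dot] of dots) {
    const [source, target] = key.split("\u0000");
    const denom = norms.get(source)! * norms.get(target)!;
    if (denom > 0 && dot / denom >= minSimilarity) edges.push({ source, target, weight: dot / denom });
  }
  return edges;
}
```

Queries shared by many pages (often branded ones) make the inner loop quadratic in the number of pages per query, so a real pipeline might cap or skip such queries before building the graph.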

License

Open source — license TBD.

About

Self-hosted traffic diagnosis tool for SEO agencies — connects GSC + GA4 to detect and investigate traffic anomalies
