
scrappey-resolverr-rs πŸš€πŸ¦€

A high-performance, Rust-based, FlareSolverr-compatible API for bypassing anti-bot challenges (Cloudflare, DDoS-Guard, etc.) using a headful Chrome browser inside a virtual display (via the transparent library and xvfb-run), with Scrappey fallback and a built-in authenticated HTTP proxy bridge.


Overview πŸ“–

scrappey-resolverr-rs is a modern, Docker-ready replacement for FlareSolverr, written in Rust for speed and reliability. It exposes a FlareSolverr-compatible HTTP API, orchestrates a headful Chrome browser running inside a virtual display (using the transparent library and xvfb-run) to solve anti-bot challenges, and can fall back to the Scrappey API for advanced bypassing. It also includes a local HTTP-to-authenticated-HTTP proxy bridge, making it easy to use authenticated proxies with browser automation.


How it Works βš™οΈ

  1. API Requests: The server exposes endpoints compatible with FlareSolverr (/v1, /health, /).

  2. Challenge Handling:

    • Receives a request to fetch a URL.
    • Launches a headful Chrome session (not headless) via chromedriver, running inside a virtual display using the transparent library and xvfb-run.
    • Navigates to the target URL, handling cookies and user-agent spoofing.
    • Detects and solves anti-bot challenges (Cloudflare, DDoS-Guard) automatically.
    • If browser-based solving fails, falls back to the Scrappey API.
  3. Proxy Bridge:

    • Runs a local HTTP proxy on port 8080 that forwards requests to an upstream authenticated HTTP proxy (as configured in Docker).
    • Chrome is configured to use this bridge, enabling authenticated proxy support (a minimal sketch of this idea follows the list).
  4. Persistence:

    • Session data (cookies and user-agent) is persisted to disk in the data directory (default: /data/sessions/) for session continuity.
    • Failure screenshots are automatically captured when challenges fail (saved to /data/screenshots/).
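
As referenced in step 3, here is a minimal Rust sketch of the bridging idea: accept a plain local proxy connection from Chrome, splice a Proxy-Authorization header into the request head, and shuttle bytes both ways. This is a conceptual illustration, not the project's src/fwd_proxy.rs; the upstream address and credentials are placeholders, and a real bridge would also need to handle CONNECT tunnels and request heads that span multiple reads.

// Conceptual sketch of the local proxy bridge (placeholder upstream/credentials).
use base64::Engine;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::{TcpListener, TcpStream};

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080").await?;
    // The real service would build these from PROXY_HOST/PORT/USERNAME/PASSWORD.
    let upstream = "upstream-proxy.example.com:3128";
    let auth = base64::engine::general_purpose::STANDARD.encode("user:pass");

    loop {
        let (mut client, _) = listener.accept().await?;
        let auth = auth.clone();
        tokio::spawn(async move {
            let mut server = TcpStream::connect(upstream).await?;

            // Read the request head from Chrome and inject the auth header
            // right after the request line (simplified: assumes one read).
            let mut buf = vec![0u8; 8192];
            let n = client.read(&mut buf).await?;
            let head = String::from_utf8_lossy(&buf[..n]).replacen(
                "\r\n",
                &format!("\r\nProxy-Authorization: Basic {auth}\r\n"),
                1,
            );
            server.write_all(head.as_bytes()).await?;

            // Everything after the first read is forwarded unchanged.
            tokio::io::copy_bidirectional(&mut client, &mut server).await?;
            Ok::<_, std::io::Error>(())
        });
    }
}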

Architecture πŸ—οΈ

  • Rust (tokio, axum): High-performance async server and proxy.
  • Headful Chrome (chromedriver + transparent + xvfb-run): Real browser automation for challenge solving, running in a virtual display (not headless).
  • Scrappey API: Fallback for advanced anti-bot bypass.
  • HTTP Proxy Bridge: Local proxy for authenticated upstream proxies.
  • Dockerized: All dependencies (Chrome, chromedriver, proxy) managed via Docker Compose.

Key Components:

  • src/main.rs β€” Entrypoint, server, and process management.
  • src/config.rs β€” Configuration management and environment variable loading.
  • src/flaresolverr.rs β€” FlareSolverr-compatible API handlers.
  • src/browser.rs β€” Browser automation and challenge logic.
  • src/challenge.rs β€” Challenge detection and solving.
  • src/fwd_proxy.rs β€” HTTP proxy bridge implementation.
  • src/scrappey.rs β€” Scrappey API client.

Installation 🐳

Prerequisites πŸ“¦

  • Docker and Docker Compose (all other dependencies, including Chrome and chromedriver, run inside the container)

Quick Start 🚦

Option 1: Use Prebuilt Docker Image (Recommended)

You can use the prebuilt image from GitHub Container Registry without building locally:

  1. Configure required environment variables: Edit docker-compose.yml and set:

    • SCRAPPEY_API_KEY (get from Scrappey)
    • PROXY_HOST, PROXY_PORT (your HTTP proxy details)
    • PROXY_USERNAME, PROXY_PASSWORD (optional, for authenticated proxies)

    Note: Many environment variables have sensible defaults and are commented out in the docker-compose.yml file. Uncomment and modify them only if you need to change the defaults.

  2. Update your docker-compose.yml: In the scrappey-resolverr service section, set the image to:

    image: ghcr.io/bananikxenos/scrappey-resolverr-rs:release

    Remove or comment out any build: lines for this service.

  3. Start the services:

    docker-compose up -d

    This will:

    • Start an authenticated Squid proxy (proxy service)
    • Pull and run the prebuilt scrappey-resolverr-rs image (scrappey-resolverr service)
    • Launch Chrome and chromedriver inside the container
  4. The API will be available at: http://localhost:8191 🎯


Option 2: Build Locally

  1. Clone the repository:

    git clone <this-repo-url>
    cd scrappey-resolverr-rs
  2. Configure required environment variables: Edit docker-compose.yml and set:

    • SCRAPPEY_API_KEY (get from Scrappey)
    • PROXY_HOST, PROXY_PORT (your HTTP proxy details)
    • PROXY_USERNAME, PROXY_PASSWORD (optional, for authenticated proxies)

    Note: Many environment variables have sensible defaults and are commented out in the docker-compose.yml file. Uncomment and modify them only if you need to change the defaults.

  3. Start the services:

    docker-compose up --build

    This will:

    • Start an authenticated Squid proxy (proxy service)
    • Build and run scrappey-resolverr-rs (scrappey-resolverr service)
    • Launch Chrome and chromedriver inside the container
  4. The API will be available at: http://localhost:8191 🎯


Usage Examples πŸ§‘β€πŸ’»

Health Check ❀️

curl http://localhost:8191/health

Solve a Challenge (GET request) πŸ›‘οΈ

curl -X POST http://localhost:8191/v1 \
  -H 'Content-Type: application/json' \
  -d '{
    "cmd": "request.get",
    "url": "https://protected-site.com/",
    "maxTimeout": 60000
  }'

Example Response πŸ“¦

{
  "status": "ok",
  "message": "Challenge solved!",
  "solution": {
    "url": "https://protected-site.com/",
    "status": 200,
    "headers": {},
    "response": "<html>...</html>",
    "cookies": [
      {
        "name": "...",
        "value": "...",
        "domain": "...",
        "path": "/",
        "expires": 1712345678,
        "httpOnly": false,
        "secure": true,
        "sameSite": "Lax"
      }
    ],
    "userAgent": "Mozilla/5.0 ..."
  }
}
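
For programmatic use, the same call can be made from Rust. The sketch below assumes reqwest (with its json feature), tokio, and serde, and deserializes only a subset of the response fields shown above (serde ignores the rest by default):

// Minimal client for the /v1 endpoint (assumes reqwest's "json" feature).
use serde::Deserialize;
use serde_json::json;

#[derive(Deserialize)]
struct SolveResponse {
    status: String,
    message: String,
    solution: Solution,
}

#[derive(Deserialize)]
struct Solution {
    url: String,
    status: u16,
    response: String,
    #[serde(rename = "userAgent")]
    user_agent: String,
}

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let body = json!({
        "cmd": "request.get",
        "url": "https://protected-site.com/",
        "maxTimeout": 60000
    });

    let resp: SolveResponse = reqwest::Client::new()
        .post("http://localhost:8191/v1")
        .json(&body)
        .send()
        .await?
        .json()
        .await?;

    println!("{} / {}", resp.status, resp.message);
    println!(
        "HTTP {} from {}: {} bytes (UA: {})",
        resp.solution.status,
        resp.solution.url,
        resp.solution.response.len(),
        resp.solution.user_agent,
    );
    Ok(())
}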

Configuration πŸ”§

Environment Variables

  • SCRAPPEY_API_KEY - Your Scrappey API key (required)
  • PROXY_HOST - HTTP proxy hostname (required)
  • PROXY_PORT - HTTP proxy port (required)
  • PROXY_USERNAME - HTTP proxy username (optional)
  • PROXY_PASSWORD - HTTP proxy password (optional)
  • DATA_PATH - Directory path for storing session data (default: /data). Sessions are stored in a sessions subdirectory.
  • CAPTURE_FAILURE_SCREENSHOTS - Enable/disable failure screenshots (default: true)
  • SCREENSHOT_DIR - Directory for failure screenshots (default: /data/screenshots)
  • MAX_FAILURE_SCREENSHOTS - Maximum number of failure screenshots to keep (default: 10)
  • HOST - Server bind address (default: 0.0.0.0)
  • PORT - Server port (default: 8191)
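
To illustrate how these variables map into the service's configuration, here is a hedged sketch of environment loading with the defaults listed above. The struct and function names are assumptions; the real logic lives in src/config.rs and may differ.

// Illustrative env loading with the documented defaults (names are assumed).
use std::env;

#[derive(Debug)]
struct Config {
    scrappey_api_key: String,
    proxy_host: String,
    proxy_port: u16,
    proxy_username: Option<String>,
    proxy_password: Option<String>,
    data_path: String,
    capture_failure_screenshots: bool,
    screenshot_dir: String,
    max_failure_screenshots: usize,
    host: String,
    port: u16,
}

fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

fn load_config() -> Config {
    Config {
        // Required values fail fast with a clear message.
        scrappey_api_key: env::var("SCRAPPEY_API_KEY").expect("SCRAPPEY_API_KEY is required"),
        proxy_host: env::var("PROXY_HOST").expect("PROXY_HOST is required"),
        proxy_port: env::var("PROXY_PORT")
            .expect("PROXY_PORT is required")
            .parse()
            .expect("PROXY_PORT must be a number"),
        proxy_username: env::var("PROXY_USERNAME").ok(),
        proxy_password: env::var("PROXY_PASSWORD").ok(),
        // Optional values fall back to the documented defaults.
        data_path: env_or("DATA_PATH", "/data"),
        capture_failure_screenshots: env_or("CAPTURE_FAILURE_SCREENSHOTS", "true") == "true",
        screenshot_dir: env_or("SCREENSHOT_DIR", "/data/screenshots"),
        max_failure_screenshots: env_or("MAX_FAILURE_SCREENSHOTS", "10").parse().unwrap_or(10),
        host: env_or("HOST", "0.0.0.0"),
        port: env_or("PORT", "8191").parse().unwrap_or(8191),
    }
}

fn main() {
    println!("{:#?}", load_config());
}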

Failure Screenshots πŸ“Έ

When challenge resolution fails, the system automatically captures screenshots for debugging purposes. These are saved with timestamps and domain names:

  • Location: /data/screenshots/ (configurable via SCREENSHOT_DIR)
  • Format: failure_{domain}_{timestamp}.png or ddos_guard_failure_{domain}_{timestamp}.png
  • Control: Set CAPTURE_FAILURE_SCREENSHOTS=false to disable
  • Cleanup: Old screenshots are automatically removed once the limit is exceeded (configurable via MAX_FAILURE_SCREENSHOTS); a sketch of this pruning follows below

Example screenshot filename: failure_example.com_20240315_143022.png
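
The cleanup described above amounts to keeping only the newest N files. A minimal Rust sketch of that policy, assuming modification-time ordering (the function name and exact policy are assumptions, not the project's actual code):

// Keep only the newest `max_keep` PNG files in the screenshot directory.
use std::fs;
use std::io;
use std::path::Path;

fn prune_screenshots(dir: &Path, max_keep: usize) -> io::Result<()> {
    // Collect (modified-time, path) pairs for all PNG files in the directory.
    let mut shots: Vec<_> = fs::read_dir(dir)?
        .filter_map(|entry| entry.ok())
        .filter(|e| e.path().extension().map_or(false, |ext| ext == "png"))
        .filter_map(|e| {
            let modified = e.metadata().ok()?.modified().ok()?;
            Some((modified, e.path()))
        })
        .collect();

    // Newest first; everything past the limit gets deleted.
    shots.sort_by(|a, b| b.0.cmp(&a.0));
    for (_, path) in shots.into_iter().skip(max_keep) {
        fs::remove_file(path)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    prune_screenshots(Path::new("/data/screenshots"), 10)
}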

Notes

  • Persistence: Session data (cookies and user-agent) is saved in /data/sessions/ (mounted as a Docker volume), with one file per session; a plausible file shape is sketched after these notes.
  • Proxy: Chrome always connects to the local proxy bridge (127.0.0.1:8080), which forwards to your configured authenticated proxy.
  • Fallback: If browser-based solving fails, Scrappey API is used (requires a valid API key and balance).
  • Screenshots: Failure screenshots are automatically captured for debugging when challenges cannot be solved.
  • Sessions: FlareSolverr-style session commands are not implemented; each request is handled statelessly.
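
Since the on-disk session format is not documented here, the following is only a plausible shape, assuming one JSON file per domain with the cookie fields from the example response above; the project's actual format may differ.

// Assumed session-file shape; the real format may differ.
use serde::{Deserialize, Serialize};
use std::path::Path;

#[derive(Serialize, Deserialize)]
struct Cookie {
    name: String,
    value: String,
    domain: String,
    path: String,
    expires: Option<i64>,
    #[serde(rename = "httpOnly")]
    http_only: bool,
    secure: bool,
    #[serde(rename = "sameSite")]
    same_site: Option<String>,
}

#[derive(Serialize, Deserialize)]
struct Session {
    user_agent: String,
    cookies: Vec<Cookie>,
}

fn save_session(dir: &Path, domain: &str, session: &Session) -> std::io::Result<()> {
    // Hypothetical naming: one file per domain, e.g. example.com.json.
    let json = serde_json::to_string_pretty(session).map_err(std::io::Error::other)?;
    std::fs::write(dir.join(format!("{domain}.json")), json)
}

fn main() {
    let session = Session {
        user_agent: "Mozilla/5.0 ...".to_string(),
        cookies: vec![],
    };
    save_session(Path::new("/data/sessions"), "example.com", &session).expect("write failed");
}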

Prowlarr Configuration πŸ¦πŸ”§

To use scrappey-resolverr-rs with Prowlarr:

1. Go to Settings β†’ Indexers βš™οΈ

2. Add Two Proxies 🧩

FlareSolverr Proxy: 🌩️

  • Host: Locally connectable IP of your scrappey-resolverr instance (e.g., the LAN IP or Docker network IP accessible from your Prowlarr host)
  • Port: 8191 (default FlareSolverr port)
  • Tags: a tag like scrappey

HTTP Proxy: 🌐

  • Host: Your publicly exposed proxy address (the proxy must be reachable from the public internet, since Scrappey connects through it externally so that its traffic exits from your IP)
  • Port: Your PROXY_PORT
  • Username: Your PROXY_USERNAME
  • Password: Your PROXY_PASSWORD
  • Tags: a tag like proxy

3. For Each Indexer That Needs Cloudflare Bypass πŸ›‘οΈ

  • Edit the indexer settings
  • Add both tags you created to the "Tags" field

This ensures that requests for those indexers are routed through both the FlareSolverr-compatible API and your authenticated HTTP proxy.

Why is the HTTP proxy required? πŸ€”

The HTTP proxy is essential for maintaining IP persistence. This means that cookies and sessions remain valid across requests, as all browser and API traffic is routed through the same outgoing IP. πŸͺπŸ”’ As a result, cookies and user-agents do not need to be refreshed on every call, which dramatically reduces the number of Scrappey API calls required. This leads to more stable scraping sessions and significant savings on Scrappey usage. πŸ’Έβœ¨


Sources & Tools πŸ› οΈ

This project was built using and inspired by the following sources and tools:

Core Inspiration πŸ’‘

  • FlareSolverr - The original Python-based anti-bot challenge solver that this project aims to replace with a faster Rust implementation
  • scrappey_proxy by AnthonyRAFFY - Reference implementation and inspiration for Scrappey API integration

Development Tools πŸ”§

  • GitHub Copilot - AI-powered code completion and assistance
  • Zed Editor - Modern, high-performance code editor used for development

License πŸ“„

This project is licensed under the MIT License. See the LICENSE file for details.
