LLMCrawl JavaScript SDK

The official JavaScript SDK for LLMCrawl, providing a simple and powerful way to scrape websites, crawl multiple pages, and extract structured data using AI.

Installation

npm install @llmcrawl/llmcrawl-js

Note: Version 1.0.0 introduces breaking changes. If you're upgrading from 0.x, please use named imports: import { LLMCrawl } from '@llmcrawl/llmcrawl-js'

Quick Start

import { LLMCrawl } from "@llmcrawl/llmcrawl-js";

const client = new LLMCrawl({
  apiKey: "your-api-key-here",
});

// Scrape a single page
const result = await client.scrape("https://example.com");
console.log(result.data?.markdown);

Features

🌐 Single Page Scraping - Extract content from individual web pages
🕷️ Website Crawling - Crawl entire websites with customizable options
🗺️ Site Mapping - Get all URLs from a website
🤖 AI-Powered Extraction - Extract structured data using custom schemas
📷 Screenshot Capture - Take screenshots of web pages
⚙️ Flexible Configuration - Extensive customization options

API Reference

Constructor

import { LLMCrawl } from "@llmcrawl/llmcrawl-js";

const client = new LLMCrawl({
  apiKey: "your-api-key",
  baseUrl: "https://api.llmcrawl.dev", // Optional custom base URL
});

Scraping

`scrape(url, options?)`

Scrape a single webpage and extract its content.

const result = await client.scrape("https://example.com", {
  formats: ["markdown", "html", "links"],
  headers: {
    "User-Agent": "Mozilla/5.0...",
  },
  waitFor: 3000, // Wait 3 seconds for page to load
  extract: {
    schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        price: { type: "number" },
        description: { type: "string" },
      },
      required: ["title", "price"],
    },
  },
});

if (result.success) {
  console.log("Markdown:", result.data.markdown);
  console.log("Extracted data:", result.data.extract);
  console.log("Links:", result.data.links);
}

Options:

formats: Array of formats to return ('markdown', 'html', 'rawHtml', 'links', 'screenshot', 'screenshot@fullPage')
headers: Custom HTTP headers
includeTags: HTML tags to include in output
excludeTags: HTML tags to exclude from output
timeout: Request timeout in milliseconds (1000-90000)
waitFor: Delay before capturing content (0-60000ms)
extract: AI extraction configuration
webhookUrls: URLs to send results to
metadata: Additional metadata to include

Crawling

`crawl(url, options?)`

Start a crawl job to scrape multiple pages from a website.

const crawlResult = await client.crawl("https://example.com", {
  limit: 100,
  maxDepth: 3,
  includePaths: ["/blog/*", "/docs/*"],
  excludePaths: ["/admin/*"],
  scrapeOptions: {
    formats: ["markdown"],
    extract: {
      schema: {
        type: "object",
        properties: {
          title: { type: "string" },
          content: { type: "string" },
        },
      },
    },
  },
});

if (crawlResult.success) {
  console.log("Crawl started with ID:", crawlResult.id);
}

Options:

limit: Maximum number of pages to crawl
maxDepth: Maximum crawl depth
includePaths: Array of path patterns to include
excludePaths: Array of path patterns to exclude
allowBackwardLinks: Allow crawling backward links
allowExternalLinks: Allow crawling external domains
ignoreSitemap: Ignore robots.txt and sitemap.xml
scrapeOptions: Scraping options for each page
webhookUrls: URLs for webhook notifications
webhookMetadata: Additional webhook metadata

`getCrawlStatus(jobId)`

Check the status of a crawl job.

const status = await client.getCrawlStatus("crawl-job-id");

if (status.success) {
  console.log("Status:", status.status);
  console.log("Progress:", `${status.completed}/${status.total}`);
  console.log("Data:", status.data); // Array of scraped pages
}

`cancelCrawl(jobId)`

Cancel a running crawl job.

const result = await client.cancelCrawl("crawl-job-id");

if (result.success) {
  console.log("Crawl cancelled:", result.message);
}

Site Mapping

`map(url, options?)`

Get all URLs from a website without scraping their content.

const mapResult = await client.map("https://example.com", {
  limit: 1000,
  includeSubdomains: true,
  search: "documentation",
});

if (mapResult.success) {
  console.log("Found URLs:", mapResult.links);
}

Options:

limit: Maximum number of links to return (1-5000)
includeSubdomains: Include subdomain URLs
search: Filter links by search query
ignoreSitemap: Ignore robots.txt and sitemap.xml
includePaths: Path patterns to include
excludePaths: Path patterns to exclude

AI-Powered Data Extraction

LLMCrawl supports advanced AI-powered data extraction using custom schemas:

const result = await client.scrape("https://example-store.com/product/123", {
  formats: ["markdown"],
  extract: {
    mode: "llm",
    schema: {
      type: "object",
      properties: {
        productName: { type: "string" },
        price: { type: "number" },
        description: { type: "string" },
        inStock: { type: "boolean" },
        reviews: {
          type: "array",
          items: {
            type: "object",
            properties: {
              rating: { type: "number" },
              comment: { type: "string" },
              author: { type: "string" },
            },
          },
        },
      },
      required: ["productName", "price"],
    },
    systemPrompt: "Extract product information from this e-commerce page.",
    prompt: "Focus on getting accurate pricing and availability information.",
  },
});

if (result.success && result.data.extract) {
  const productData = JSON.parse(result.data.extract);
  console.log("Product:", productData.productName);
  console.log("Price:", productData.price);
  console.log("In Stock:", productData.inStock);
}

Error Handling

All methods return a response object with a success field:

const result = await client.scrape("https://example.com");

if (result.success) {
  // Handle successful response
  console.log(result.data);
} else {
  // Handle error
  console.error("Error:", result.error);
  console.error("Details:", result.details);
}

Type Definitions

The SDK includes comprehensive TypeScript types:

import {
  ScrapeResponse,
  CrawlResponse,
  CrawlStatusResponse,
  Document,
  ExtractOptions,
  ScrapeOptions,
  CrawlerOptions,
} from "@llmcrawl/llmcrawl-js";

Examples

E-commerce Product Scraping

const product = await client.scrape("https://store.example.com/product/123", {
  formats: ["markdown"],
  extract: {
    schema: {
      type: "object",
      properties: {
        name: { type: "string" },
        price: { type: "number" },
        rating: { type: "number" },
        availability: { type: "string" },
      },
    },
  },
});

News Article Extraction

const article = await client.scrape("https://news.example.com/article/123", {
  formats: ["markdown", "html"],
  extract: {
    schema: {
      type: "object",
      properties: {
        headline: { type: "string" },
        author: { type: "string" },
        publishDate: { type: "string" },
        content: { type: "string" },
        tags: { type: "array", items: { type: "string" } },
      },
    },
  },
});

Documentation Crawling

// Start crawling documentation
const crawl = await client.crawl("https://docs.example.com", {
  limit: 500,
  includePaths: ["/docs/*", "/api/*"],
  excludePaths: ["/docs/internal/*"],
  scrapeOptions: {
    formats: ["markdown"],
    extract: {
      schema: {
        type: "object",
        properties: {
          title: { type: "string" },
          section: { type: "string" },
          content: { type: "string" },
        },
      },
    },
  },
});

// Poll for completion
if (crawl.success) {
  let status = await client.getCrawlStatus(crawl.id);
  while (status.success && status.status === "scraping") {
    await new Promise((resolve) => setTimeout(resolve, 5000));
    status = await client.getCrawlStatus(crawl.id);
  }

  if (status.success && status.status === "completed") {
    console.log("Crawl completed!");
    console.log("Pages scraped:", status.data.length);
  }
}

Support

📧 Email: contact@llmcrawl.dev
📚 Documentation: https://docs.llmcrawl.dev
🐛 Issues: https://github.com/LLMCrawl/llmcrawl-js/issues

License

MIT License - see LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.github/workflows		.github/workflows
src		src
.gitignore		.gitignore
README.md		README.md
bun.lock		bun.lock
examples.ts		examples.ts
package.json		package.json
tsconfig.json		tsconfig.json
tsdown.config.ts		tsdown.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLMCrawl JavaScript SDK

Installation

Quick Start

Features

API Reference

Constructor

Scraping

`scrape(url, options?)`

Crawling

`crawl(url, options?)`

`getCrawlStatus(jobId)`

`cancelCrawl(jobId)`

Site Mapping

`map(url, options?)`

AI-Powered Data Extraction

Error Handling

Type Definitions

Examples

E-commerce Product Scraping

News Article Extraction

Documentation Crawling

Support

License

About

Uh oh!

Releases 4

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

LLMCrawl JavaScript SDK

Installation

Quick Start

Features

API Reference

Constructor

Scraping

scrape(url, options?)

Crawling

crawl(url, options?)

getCrawlStatus(jobId)

cancelCrawl(jobId)

Site Mapping

map(url, options?)

AI-Powered Data Extraction

Error Handling

Type Definitions

Examples

E-commerce Product Scraping

News Article Extraction

Documentation Crawling

Support

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`scrape(url, options?)`

`crawl(url, options?)`

`getCrawlStatus(jobId)`

`cancelCrawl(jobId)`

`map(url, options?)`

Packages