# LLMCrawl JavaScript SDK

The official JavaScript SDK for LLMCrawl, providing a simple and powerful way to scrape websites, crawl multiple pages, and extract structured data using AI.
## Installation

```bash
npm install @llmcrawl/llmcrawl-js
```

**Note:** Version 1.0.0 introduces breaking changes. If you're upgrading from 0.x, please use named imports:

```js
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";
```
## Quick Start

```js
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";

const client = new LLMCrawl({
  apiKey: "your-api-key-here",
});

// Scrape a single page
const result = await client.scrape("https://example.com");
console.log(result.data?.markdown);
```

## Features

- **Single Page Scraping** - Extract content from individual web pages
- **Website Crawling** - Crawl entire websites with customizable options
- **Site Mapping** - Get all URLs from a website
- **AI-Powered Extraction** - Extract structured data using custom schemas
- **Screenshot Capture** - Take screenshots of web pages
- **Flexible Configuration** - Extensive customization options
## Usage

### Initialize the Client

```js
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";

const client = new LLMCrawl({
  apiKey: "your-api-key",
  baseUrl: "https://api.llmcrawl.dev", // Optional custom base URL
});
```

### Scrape a Single Page

Scrape a single webpage and extract its content.
```js
const result = await client.scrape("https://example.com", {
  formats: ["markdown", "html", "links"],
  headers: {
    "User-Agent": "Mozilla/5.0...",
  },
  waitFor: 3000, // Wait 3 seconds for the page to load
  extract: {
    schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        price: { type: "number" },
        description: { type: "string" },
      },
      required: ["title", "price"],
    },
  },
});

if (result.success) {
  console.log("Markdown:", result.data.markdown);
  console.log("Extracted data:", result.data.extract);
  console.log("Links:", result.data.links);
}
```

**Options:**
- `formats`: Array of formats to return (`'markdown'`, `'html'`, `'rawHtml'`, `'links'`, `'screenshot'`, `'screenshot@fullPage'`)
- `headers`: Custom HTTP headers
- `includeTags`: HTML tags to include in the output
- `excludeTags`: HTML tags to exclude from the output
- `timeout`: Request timeout in milliseconds (1000-90000)
- `waitFor`: Delay before capturing content (0-60000 ms)
- `extract`: AI extraction configuration
- `webhookUrls`: URLs to send results to
- `metadata`: Additional metadata to include
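The `timeout` and `waitFor` options only accept the ranges listed above. As an illustrative convenience (this helper is our own, not part of the SDK), values can be clamped into the documented ranges before being passed to `scrape`:

```typescript
// Illustrative helper (not part of the SDK): clamp timing options into
// the documented ranges before passing them to client.scrape().
interface ScrapeTiming {
  timeout?: number; // accepted range: 1000-90000 ms
  waitFor?: number; // accepted range: 0-60000 ms
}

const clamp = (value: number, min: number, max: number): number =>
  Math.min(Math.max(value, min), max);

function clampScrapeTiming(opts: ScrapeTiming): ScrapeTiming {
  const out: ScrapeTiming = {};
  if (opts.timeout !== undefined) out.timeout = clamp(opts.timeout, 1000, 90000);
  if (opts.waitFor !== undefined) out.waitFor = clamp(opts.waitFor, 0, 60000);
  return out;
}
```

For example, `clampScrapeTiming({ timeout: 200000, waitFor: 3000 })` returns `{ timeout: 90000, waitFor: 3000 }`.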
### Crawl a Website

Start a crawl job to scrape multiple pages from a website.
```js
const crawlResult = await client.crawl("https://example.com", {
  limit: 100,
  maxDepth: 3,
  includePaths: ["/blog/*", "/docs/*"],
  excludePaths: ["/admin/*"],
  scrapeOptions: {
    formats: ["markdown"],
    extract: {
      schema: {
        type: "object",
        properties: {
          title: { type: "string" },
          content: { type: "string" },
        },
      },
    },
  },
});

if (crawlResult.success) {
  console.log("Crawl started with ID:", crawlResult.id);
}
```

**Options:**
- `limit`: Maximum number of pages to crawl
- `maxDepth`: Maximum crawl depth
- `includePaths`: Array of path patterns to include
- `excludePaths`: Array of path patterns to exclude
- `allowBackwardLinks`: Allow crawling backward links
- `allowExternalLinks`: Allow crawling external domains
- `ignoreSitemap`: Ignore robots.txt and sitemap.xml
- `scrapeOptions`: Scraping options applied to each page
- `webhookUrls`: URLs for webhook notifications
- `webhookMetadata`: Additional webhook metadata
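Patterns such as `/blog/*` are glob-style path filters. To illustrate how include and exclude patterns interact (this sketch is ours and not the SDK's actual matcher), a trailing `*` can be read as a prefix wildcard, with exclude patterns taking precedence:

```typescript
// Illustrative sketch of glob-style path filtering (not the SDK's
// actual matcher): a trailing "*" matches any suffix of the path.
function matchesPattern(path: string, pattern: string): boolean {
  return pattern.endsWith("*")
    ? path.startsWith(pattern.slice(0, -1))
    : path === pattern;
}

// A path is crawled when it matches some include pattern (or no include
// patterns are given) and matches no exclude pattern.
function shouldCrawl(
  path: string,
  includePaths: string[],
  excludePaths: string[]
): boolean {
  const included =
    includePaths.length === 0 ||
    includePaths.some((p) => matchesPattern(path, p));
  const excluded = excludePaths.some((p) => matchesPattern(path, p));
  return included && !excluded;
}
```

With the options from the example above, `/blog/post-1` would be crawled while `/admin/users` would not.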
### Check Crawl Status

Check the status of a crawl job.

```js
const status = await client.getCrawlStatus("crawl-job-id");

if (status.success) {
  console.log("Status:", status.status);
  console.log("Progress:", `${status.completed}/${status.total}`);
  console.log("Data:", status.data); // Array of scraped pages
}
```

### Cancel a Crawl

Cancel a running crawl job.
```js
const result = await client.cancelCrawl("crawl-job-id");

if (result.success) {
  console.log("Crawl cancelled:", result.message);
}
```

### Map a Website

Get all URLs from a website without scraping their content.
```js
const mapResult = await client.map("https://example.com", {
  limit: 1000,
  includeSubdomains: true,
  search: "documentation",
});

if (mapResult.success) {
  console.log("Found URLs:", mapResult.links);
}
```

**Options:**
- `limit`: Maximum number of links to return (1-5000)
- `includeSubdomains`: Include subdomain URLs
- `search`: Filter links by search query
- `ignoreSitemap`: Ignore robots.txt and sitemap.xml
- `includePaths`: Path patterns to include
- `excludePaths`: Path patterns to exclude
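To illustrate what the `search` option does, the same effect can be approximated client-side by filtering the returned `links` array. This helper is our own sketch, not an SDK function, and assumes a simple case-insensitive substring match:

```typescript
// Illustrative client-side approximation of the `search` option (not an
// SDK function): keep only links whose URL contains the query.
function filterLinks(links: string[], query: string): string[] {
  const q = query.toLowerCase();
  return links.filter((link) => link.toLowerCase().includes(q));
}
```

For example, filtering `["https://example.com/documentation/intro", "https://example.com/pricing"]` with the query `"documentation"` keeps only the first URL.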
## AI-Powered Data Extraction

LLMCrawl supports advanced AI-powered data extraction using custom schemas:
```js
const result = await client.scrape("https://example-store.com/product/123", {
  formats: ["markdown"],
  extract: {
    mode: "llm",
    schema: {
      type: "object",
      properties: {
        productName: { type: "string" },
        price: { type: "number" },
        description: { type: "string" },
        inStock: { type: "boolean" },
        reviews: {
          type: "array",
          items: {
            type: "object",
            properties: {
              rating: { type: "number" },
              comment: { type: "string" },
              author: { type: "string" },
            },
          },
        },
      },
      required: ["productName", "price"],
    },
    systemPrompt: "Extract product information from this e-commerce page.",
    prompt: "Focus on getting accurate pricing and availability information.",
  },
});

if (result.success && result.data.extract) {
  const productData = JSON.parse(result.data.extract);
  console.log("Product:", productData.productName);
  console.log("Price:", productData.price);
  console.log("In Stock:", productData.inStock);
}
```

## Error Handling

All methods return a response object with a `success` field:
```js
const result = await client.scrape("https://example.com");

if (result.success) {
  // Handle successful response
  console.log(result.data);
} else {
  // Handle error
  console.error("Error:", result.error);
  console.error("Details:", result.details);
}
```

## TypeScript Support

The SDK includes comprehensive TypeScript types:
```ts
import {
  ScrapeResponse,
  CrawlResponse,
  CrawlStatusResponse,
  Document,
  ExtractOptions,
  ScrapeOptions,
  CrawlerOptions,
} from "@llmcrawl/llmcrawl-js";
```

## Examples

### Product Data Extraction

```js
const product = await client.scrape("https://store.example.com/product/123", {
  formats: ["markdown"],
  extract: {
    schema: {
      type: "object",
      properties: {
        name: { type: "string" },
        price: { type: "number" },
        rating: { type: "number" },
        availability: { type: "string" },
      },
    },
  },
});
```

### Article Content Extraction

```js
const article = await client.scrape("https://news.example.com/article/123", {
  formats: ["markdown", "html"],
  extract: {
    schema: {
      type: "object",
      properties: {
        headline: { type: "string" },
        author: { type: "string" },
        publishDate: { type: "string" },
        content: { type: "string" },
        tags: { type: "array", items: { type: "string" } },
      },
    },
  },
});
```

### Crawl a Documentation Site

```js
// Start crawling documentation
const crawl = await client.crawl("https://docs.example.com", {
  limit: 500,
  includePaths: ["/docs/*", "/api/*"],
  excludePaths: ["/docs/internal/*"],
  scrapeOptions: {
    formats: ["markdown"],
    extract: {
      schema: {
        type: "object",
        properties: {
          title: { type: "string" },
          section: { type: "string" },
          content: { type: "string" },
        },
      },
    },
  },
});

// Poll for completion
if (crawl.success) {
  let status = await client.getCrawlStatus(crawl.id);
  while (status.success && status.status === "scraping") {
    await new Promise((resolve) => setTimeout(resolve, 5000));
    status = await client.getCrawlStatus(crawl.id);
  }
  if (status.success && status.status === "completed") {
    console.log("Crawl completed!");
    console.log("Pages scraped:", status.data.length);
  }
}
```

## Support

- Email: contact@llmcrawl.dev
- Documentation: https://docs.llmcrawl.dev
- Issues: https://github.com/LLMCrawl/llmcrawl-js/issues
## License

MIT License - see the LICENSE file for details.