# webcrawler

A command-line web crawler built in Node.js that recursively traverses hyperlinks and collects data from web pages.

Built as a hands-on project for learning how HTTP and the web work under the hood. The crawler recursively follows links starting from a given URL, respects `robots.txt` rules, and handles concurrent page fetching using async patterns.
## Features

- 🔗 Recursive link traversal with a configurable depth limit
- ⚡ Concurrent page fetching via `p-limit` for controlled parallelism
- 🔁 Visited-URL tracking to prevent duplicate requests and infinite loops
- 🤖 `robots.txt` compliance via `robots-parser`
- 🧪 Test suite powered by Jest
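The core ideas above (depth limit, visited set, bounded concurrency) can be sketched in a few lines. This is a simplified illustration, not the project's actual code: `fetchPage` and `extractLinks` are hypothetical stand-ins for the logic in `src/`, and the inline limiter mimics what `p-limit` provides so the sketch stays dependency-free.

```javascript
// Minimal concurrency limiter, standing in for p-limit:
// at most `max` wrapped promises run at once.
function createLimiter(max) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const { fn, resolve, reject } = queue.shift();
    fn().then(resolve, reject).finally(() => {
      active--;
      next();
    });
  };
  return (fn) =>
    new Promise((resolve, reject) => {
      queue.push({ fn, resolve, reject });
      next();
    });
}

// Recursive crawl with a depth limit and a visited set.
// fetchPage(url) and extractLinks(html, url) are hypothetical helpers.
async function crawl(startUrl, maxDepth, fetchPage, extractLinks) {
  const visited = new Set();      // prevents duplicate requests and loops
  const limit = createLimiter(5); // at most 5 requests in flight

  async function visit(url, depth) {
    if (depth > maxDepth || visited.has(url)) return;
    visited.add(url); // mark before fetching so concurrent branches skip it
    const html = await limit(() => fetchPage(url));
    const links = extractLinks(html, url);
    await Promise.all(links.map((link) => visit(link, depth + 1)));
  }

  await visit(startUrl, 0);
  return visited;
}
```

Marking a URL as visited *before* the fetch resolves is what keeps concurrent branches from requesting the same page twice.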
## Project Structure

```
webcrawler/
├── src/           # Core crawler logic
├── tests/         # Jest test suites
├── main.js        # Entry point
├── package.json
└── .nvmrc         # Node version: 18.7.0
```
## Prerequisites

- Node.js `18.7.0` (use nvm for version management)

## Getting Started

```bash
git clone https://github.com/Lazzar19/webcrawler.git
cd webcrawler
nvm use      # automatically picks up .nvmrc
npm install
```

## Usage

```bash
node main.js <url>
```

Run the test suite with:

```bash
npm test
```

## Dependencies

| Package | Purpose |
|---|---|
| `jsdom` | HTML parsing and link extraction |
| `p-limit` | Concurrency limiter for async requests |
| `robots-parser` | Parses and enforces `robots.txt` rules |
| `jest` | Testing framework (dev dependency) |
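To illustrate what the first and third dependencies do, here is a dependency-free approximation of both jobs. These functions are illustrative stand-ins, not the project's actual API: `jsdom` parses HTML far more robustly than a regex, and `robots-parser` also handles `Allow` rules, wildcards, and per-agent groups.

```javascript
// Pull <a href> values out of HTML and resolve them against the page URL
// using the built-in WHATWG URL class (a rough stand-in for jsdom).
function extractLinks(html, baseUrl) {
  const links = new Set();
  const hrefPattern = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      links.add(new URL(match[1], baseUrl).href);
    } catch {
      // Skip malformed hrefs instead of aborting the crawl.
    }
  }
  return [...links];
}

// Check a path against the Disallow rules of the "User-agent: *" group
// (a rough stand-in for robots-parser's isAllowed check).
function isAllowedByRobots(robotsTxt, path) {
  let applies = false;
  const disallowed = [];
  for (const rawLine of robotsTxt.split('\n')) {
    const line = rawLine.split('#')[0].trim();
    if (/^user-agent:/i.test(line)) {
      applies = line.slice('user-agent:'.length).trim() === '*';
    } else if (applies && /^disallow:/i.test(line)) {
      const rule = line.slice('disallow:'.length).trim();
      if (rule) disallowed.push(rule);
    }
  }
  return !disallowed.some((rule) => path.startsWith(rule));
}
```

Resolving each `href` against the page's own URL is what turns relative links like `/about` into absolute URLs the crawler can queue.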