Advanced Web Search Engine

Overview

A multi-component search engine that crawls the web, indexes the pages it finds, and serves ranked search results. It is built with scalability and performance in mind. Pages are indexed in real time, and results are ranked using PageRank and TF-IDF.

Searching.mp4

Architecture

The system consists of three main components:

Crawling API (.NET)
- Multi-threaded web crawler
- Content extraction
- Link discovery
- Real-time processing
See detailed architecture: Crawling Plan.png
See detailed description of crawler API: Crawling API readme
Indexing API (Node.js)
- Inverted indexing
- PageRank calculation
- TF-IDF scoring
- MongoDB integration
See detailed architecture: Indexing Plan.png
See detailed description of indexing API: Indexing API readme
Serving API (Python/Flask)
- Query processing
- Results serving
- Web interface
- Text analysis
See detailed description of serving API and front end: Serving readme

For a complete system overview, see: General Plan.png

Features

Web Crawler

- Concurrent webpage processing
- Intelligent URL management
- Content extraction
- Link graph building
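The crawler itself is a .NET service. The Python sketch below only illustrates the ideas listed above: concurrent fetching, URL normalization and deduplication, link extraction, and building a link graph. It assumes a caller-supplied `fetch(url)` function that returns HTML or `None`. The names `crawl`, `extract_links` and `normalize` are hypothetical and are not the Crawling API's actual interface.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def normalize(base: str, href: str) -> Optional[str]:
    """Resolve href against the page URL, drop any #fragment, keep only http(s)."""
    absolute, _fragment = urldefrag(urljoin(base, href))
    if urlparse(absolute).scheme not in ("http", "https"):
        return None
    return absolute


def extract_links(url: str, html: str) -> list[str]:
    """Return the unique, normalized outgoing links of a page, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    links: list[str] = []
    for href in parser.hrefs:
        link = normalize(url, href)
        if link is not None and link not in links:
            links.append(link)
    return links


def crawl(
    seeds: list[str],
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 100,
    workers: int = 8,
) -> dict[str, list[str]]:
    """Breadth-first crawl from the seeds; returns the link graph {url: [outlinks]}."""
    graph: dict[str, list[str]] = {}
    frontier = list(dict.fromkeys(seeds))  # URLs waiting to be fetched, in BFS order
    seen = set(frontier)                   # every URL ever queued, so none is fetched twice
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(graph) < max_pages:
            batch = frontier[: max_pages - len(graph)]
            frontier = frontier[len(batch):]
            # Fetch the whole batch concurrently; results come back in batch order.
            for url, html in pool.map(lambda u: (u, fetch(u)), batch):
                links = extract_links(url, html) if html else []
                graph[url] = links
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return graph
```

Injecting `fetch` keeps the sketch testable offline. For example, `crawl(["http://example.com/"], pages.get)` runs over a dict of canned pages. The returned adjacency list is the kind of input a link-analysis step such as PageRank consumes.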

Indexing System

- Real-time document processing
- Advanced ranking algorithms (PageRank, TF-IDF)
- Efficient data structures (inverted index)
- Scalable storage (MongoDB)
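The Indexing API is written in Node.js and stores its index in MongoDB. As a language-neutral illustration, here is a minimal in-memory Python sketch of an inverted index scored with TF-IDF. The tokenizer, the length-normalized term frequency, and the unsmoothed `log(N / df)` IDF are assumptions and may differ from the weighting the project actually uses. `build_index` and `search` are hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (a deliberately simple analyzer)."""
    return TOKEN.findall(text.lower())


def build_index(docs: dict[str, str]):
    """Build postings {term: {doc_id: term_count}} and doc lengths {doc_id: token_count}."""
    postings: dict[str, dict[str, int]] = defaultdict(dict)
    lengths: dict[str, int] = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        lengths[doc_id] = sum(counts.values())
        for term, count in counts.items():
            postings[term][doc_id] = count
    return dict(postings), lengths


def search(postings, lengths, query: str) -> list[tuple[str, float]]:
    """Rank documents by the sum over query terms of tf(term, doc) * idf(term)."""
    n_docs = len(lengths)
    scores: dict[str, float] = defaultdict(float)
    for term in tokenize(query):
        matching = postings.get(term, {})
        if not matching:
            continue
        idf = math.log(n_docs / len(matching))  # rarer terms weigh more
        for doc_id, count in matching.items():
            scores[doc_id] += (count / lengths[doc_id]) * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A query touches only the posting lists of its own terms rather than scanning every document. That is the main reason an inverted index is the standard structure for this job.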

The following video shows the database structure and the indexing database being updated in real time:

DatabaseAndRealtimeIndexingShowcase.mp4
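The Indexing API also lists PageRank calculation among its tasks. PageRank can be sketched with power iteration over the crawled link graph. This is a generic textbook version under stated assumptions: a damping factor of 0.85, rank from pages without outlinks spread uniformly, and a stop once the L1 change falls below a tolerance. The project's own parameters are not documented in this README, and `pagerank` is a hypothetical name.

```python
def pagerank(
    graph: dict[str, list[str]],
    damping: float = 0.85,
    iterations: int = 50,
    tol: float = 1e-10,
) -> dict[str, float]:
    """Power-iteration PageRank over {page: [outlinks]}; scores sum to 1."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)  # pages that are linked to but were never crawled
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling pages (no outlinks) is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for page, targets in graph.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

Redistributing dangling rank keeps the scores a probability distribution. Without this step, rank would leak out of the graph at every page with no outlinks.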

Search Interface

- Clean, responsive design
- Real-time results
- Advanced ranking display
- Rich snippets
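The README does not say how rich snippets are generated. One common approach, sketched hypothetically below, picks the fixed-size window of page text that contains the most query terms and bolds the matches. `make_snippet` and its `width` parameter are illustrative names. The text is HTML-escaped because the result is assumed to be inserted into the web interface.

```python
import html
import string


def _norm(word: str) -> str:
    """Compare words case-insensitively, ignoring surrounding punctuation."""
    return word.strip(string.punctuation).lower()


def make_snippet(text: str, query: str, width: int = 12) -> str:
    """Return the `width`-word window with the most query-term hits, matches in <b>."""
    terms = {_norm(t) for t in query.split()} - {""}
    words = text.split()
    if not words:
        return ""
    # Slide a fixed-size window and keep the first one with the most hits.
    best_start, best_hits = 0, -1
    for start in range(max(1, len(words) - width + 1)):
        hits = sum(_norm(w) in terms for w in words[start:start + width])
        if hits > best_hits:
            best_start, best_hits = start, hits
    window = words[best_start:best_start + width]
    # Escape each word for HTML, then wrap matching words in <b> tags.
    rendered = [
        f"<b>{html.escape(w)}</b>" if _norm(w) in terms else html.escape(w)
        for w in window
    ]
    prefix = "... " if best_start > 0 else ""
    suffix = " ..." if best_start + width < len(words) else ""
    return prefix + " ".join(rendered) + suffix
```

For example, `make_snippet(page_text, "inverted index", width=4)` on a sentence about the indexer returns something like `... builds an <b>inverted</b> <b>index</b> ...`.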

Technical Stack

Crawler: .NET 6.0, C#
Indexer: Node.js, MongoDB
Server: Python, Flask
Frontend: HTML5, CSS3, JavaScript
