A multi-component search engine implementing modern information retrieval techniques, web crawling, and search algorithms. It is built with scalability and performance in mind and features real-time indexing and ranking based on PageRank and TF-IDF.
Demo video: Searching.mp4
The system consists of three main components:
- Crawling API (.NET)
  - Multi-threaded web crawler
  - Content extraction
  - Link discovery
  - Real-time processing
  - See detailed architecture: Crawling Plan.png
  - See detailed description of the crawler API: Crawling API readme
- Indexing API (Node.js)
  - Inverted indexing
  - PageRank calculation
  - TF-IDF scoring
  - MongoDB integration
  - See detailed architecture: Indexing Plan.png
  - See detailed description of the indexing API: Indexing API readme
- Serving API (Python/Flask)
  - Query processing
  - Results serving
  - Web interface
  - Text analysis
  - See detailed description of the serving API and front end: Serving readme
For a complete system overview, see: General Plan.png
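As a rough illustration of the link discovery and URL management handled by the Crawling API, here is a minimal sketch. The actual crawler is written in .NET; this example uses Python with only the standard library, and `LinkExtractor`, `normalize_url`, `extract_links`, and `Frontier` are hypothetical names that do not come from the project. It is single-threaded, so a multi-threaded crawler would also need a lock around the frontier.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def normalize_url(href, base_url):
    """Resolve href against base_url and drop the #fragment.

    Returns None for non-HTTP(S) links such as mailto: or javascript:.
    """
    absolute, _fragment = urldefrag(urljoin(base_url, href))
    if urlsplit(absolute).scheme not in ("http", "https"):
        return None
    return absolute


def extract_links(html, base_url):
    """Return the unique, normalized HTTP(S) links found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        url = normalize_url(href, base_url)
        if url is not None and url not in links:
            links.append(url)
    return links


class Frontier:
    """FIFO queue of URLs to visit that never hands out the same URL twice."""

    def __init__(self, seeds):
        self._queue = deque()
        self._seen = set()
        for url in seeds:
            self.add(url)

    def add(self, url):
        # Only enqueue URLs that have never been seen before.
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)

    def pop(self):
        # Returns None when there is nothing left to crawl.
        return self._queue.popleft() if self._queue else None
```

A crawl loop would pop a URL from the frontier, fetch the page, pass the HTML to `extract_links`, and add each returned link back to the frontier.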
Key features:
- Concurrent webpage processing
- Intelligent URL management
- Content extraction
- Link graph building
- Real-time document processing
- Advanced ranking algorithms
- Efficient data structures
- Scalable storage solution
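To make the link graph and ranking features more concrete, here is a minimal PageRank sketch using power iteration. It is written in Python for illustration only (the indexer itself runs on Node.js), and the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are assumptions rather than values taken from the project.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping every page to its rank; ranks sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its rank evenly among its outgoing links.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

For example, `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]})` ranks `c` highest because three pages link to it, and `d` lowest because nothing links to it.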
A video showing the database structure and the indexing database being updated in real time:
DatabaseAndRealtimeIndexingShowcase.mp4
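The idea behind real-time inverted indexing with TF-IDF scoring can be sketched as follows. This is an in-memory Python illustration, not the project's Node.js and MongoDB implementation: the class `InvertedIndex`, its method names, the tokenizer, and the exact TF-IDF variant (term frequency normalized by document length, with `log(N / df) + 1` as the inverse document frequency) are all assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term count}
        self.doc_lengths = {}              # doc_id -> number of tokens

    def add_document(self, doc_id, text):
        """Index a document; re-adding an existing doc_id replaces it."""
        if doc_id in self.doc_lengths:
            self.remove_document(doc_id)
        tokens = tokenize(text)
        self.doc_lengths[doc_id] = len(tokens)
        for term, count in Counter(tokens).items():
            self.postings[term][doc_id] = count

    def remove_document(self, doc_id):
        """Drop a document and any posting lists that become empty."""
        self.doc_lengths.pop(doc_id, None)
        for term in list(self.postings):
            self.postings[term].pop(doc_id, None)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, query):
        """Return (doc_id, score) pairs sorted by descending TF-IDF score."""
        scores = defaultdict(float)
        total_docs = len(self.doc_lengths)
        for term in tokenize(query):
            docs = self.postings.get(term)
            if not docs:
                continue
            # The +1 keeps terms that occur in every document from scoring zero.
            idf = math.log(total_docs / len(docs)) + 1.0
            for doc_id, count in docs.items():
                tf = count / self.doc_lengths[doc_id]
                scores[doc_id] += tf * idf
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because `add_document` updates the posting lists immediately, a newly crawled or re-crawled page becomes searchable as soon as it is added, which is the behavior the video demonstrates at the database level.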
Web interface features:
- Clean, responsive design
- Real-time results
- Advanced ranking display
- Rich snippets
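Assuming "rich snippets" here means a short excerpt of each result with the query terms highlighted, the snippet step could look roughly like the sketch below. The function name `make_snippet`, the `width` parameter, and the use of `<b>` tags are hypothetical choices, not taken from the Serving API.

```python
import html
import re


def make_snippet(text, query, width=80):
    """Return an HTML-safe excerpt of about `width` characters around the
    first query-term match, with matched terms wrapped in <b> tags."""
    terms = [re.escape(t) for t in re.findall(r"\w+", query.lower())]
    if not terms:
        return html.escape(text[:width])
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return html.escape(text[:width])

    # Center the excerpt window on the first match.
    start = max(0, match.start() - width // 2)
    end = min(len(text), start + width)
    excerpt = text[start:end]

    # Escape the plain text pieces and wrap each match in <b>...</b>.
    parts = []
    last = 0
    for m in pattern.finditer(excerpt):
        parts.append(html.escape(excerpt[last:m.start()]))
        parts.append("<b>" + html.escape(m.group(0)) + "</b>")
        last = m.end()
    parts.append(html.escape(excerpt[last:]))

    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + "".join(parts) + suffix
```

Escaping the page text before inserting the highlight tags keeps markup from crawled pages from being rendered in the results page.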
Technology stack:
- Crawler: .NET 6.0, C#
- Indexer: Node.js, MongoDB
- Server: Python, Flask
- Frontend: HTML5, CSS3, JavaScript
