Spiders and Scrapers in Prolog (8/16/2016)
This project evolved out of my attempts to automate the data collection for various projects
I am working on. After surveying numerous other languages, Prolog looked like an excellent choice for
writing a flexible codebase that could be reused for many different purposes. I did not realize how
simple many web programming projects become when you use the excellent tools provided by the
SWI-Prolog community. This tutorial will show you how to take advantage of them.
This practical Prolog project is broken up into three parts. Part 1 will guide you through the process
of writing a web scraper -- a tool used to extract data from an HTML document. Data extraction is a
key element in virtually any web programming project; Prolog makes this very easy by providing a database-like
interface to any page. Part 2 will put the web scraper built in Part 1 to good use by building a site
crawler that verifies outbound links and reports which ones are broken. Part 3 will build upon the crawler
from Part 2 and incorporate advanced Prolog features such as threads, session management, and error handling.
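As a rough preview of what "database-like interface" means in practice, here is a minimal sketch that
collects every link target on a page. It is not the code developed in Part 1: the predicate name
page_links/2 is made up for illustration, and the sketch assumes a stock SWI-Prolog install with the
http, sgml, and xpath libraries available.

```prolog
:- use_module(library(http/http_open)).   % http_open/3
:- use_module(library(sgml)).             % load_html/3
:- use_module(library(xpath)).            % xpath/3

%% page_links(+URL, -Links)
%  Hypothetical helper (not part of the tutorial code): fetch URL,
%  parse it into a DOM term, and collect the href of every <a> element.
page_links(URL, Links) :-
    setup_call_cleanup(
        http_open(URL, In, []),            % open the page as a stream
        load_html(stream(In), DOM, []),    % parse the HTML into a DOM
        close(In)),                        % always close the stream
    % xpath/3 enumerates matches on backtracking, like a database query
    findall(Href, xpath(DOM, //a(@href), Href), Links).
```

A query such as ?- page_links('http://www.swi-prolog.org/', Links). should bind Links to the list of href
values found on that page. Depending on the SWI-Prolog version, https URLs may also need
library(http/http_ssl_plugin) to be loaded.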