Python web scraper for various Indonesian news sites. Built with Python 3 and BeautifulSoup (bs4). Please use these tools wisely and for educational purposes only; I am not responsible for what you do with them.
The scraper collects each article's title, URL, and body text. It may optionally also collect the thumbnail, author(s), date of publication, categories, etc. It comes with guides and code documentation (as much as reasonably possible). The output is a pandas DataFrame.
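As a rough illustration of that output shape, here is a minimal sketch that parses one page with bs4 and puts the title, URL, and body text into a pandas DataFrame. It is not the code from the notebooks: the sample HTML, the `parse_article` helper, the CSS selectors (`h1.title`, `div.body p`), and the example URL are all assumptions made up for this example. Real sites use different markup, so each one needs its own selectors.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML standing in for a downloaded article page.
# Real news sites use different tags and class names.
SAMPLE_HTML = """
<html><body>
  <article>
    <h1 class="title">Contoh Judul Berita</h1>
    <div class="body"><p>Paragraf pertama.</p><p>Paragraf kedua.</p></div>
  </article>
</body></html>
"""


def parse_article(html: str, url: str) -> dict:
    """Extract title and body text from one article page (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.select_one("h1.title")   # assumed selector
    paragraphs = soup.select("div.body p")    # assumed selector
    return {
        "title": title_tag.get_text(strip=True) if title_tag else None,
        "url": url,
        # Join paragraphs with newlines so the body stays readable.
        "text": "\n".join(p.get_text(strip=True) for p in paragraphs),
    }


# One dict per article; the DataFrame gets one row per article.
records = [parse_article(SAMPLE_HTML, "https://example.com/news/1")]
df = pd.DataFrame(records, columns=["title", "url", "text"])
print(df)
```

Optional fields such as thumbnail, author(s), or publication date could be added as extra keys in the returned dict, which would then show up as extra columns.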
I plan to expand the code to more sites, listed in the planned sites section below. However, I do not plan to maintain sites that are considered 'done'. You are welcome to submit suggestions or requests, or to fork this repo. The code should work under Python 3.10.
Each folder in 'Code/' is associated with one site and contains at least one Jupyter notebook (.ipynb). You may run this notebook in Colab or locally on your computer. All required libraries are listed in requirements.txt.
- kompas.com/
You can contact me at mouse3mic3@gmail.com.