mouse3mic3/indonesia-news-scraping

Indonesia News Site Scraper

A Python web scraper for various Indonesian news sites. Built with Python 3 and bs4 (Beautiful Soup). Please use these tools wisely and for educational purposes only; I am not responsible for what you do with them.

Purposes

Collect each article's title, URL, and body text. Optionally, the scraper may also collect the thumbnail, author(s), publication date, categories, etc. It comes with guides and code documentation (as much as reasonably possible). The output should be a pandas DataFrame.
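As a rough illustration of the intended output, here is a minimal sketch that parses a small, made-up HTML snippet with bs4 and returns the title, URL, and body text as a pandas DataFrame. The markup, the `data-url` attribute, the CSS class names, and the `parse_articles` helper are all assumptions for illustration only; each real site needs its own selectors, and this is not the code from the notebooks.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical article markup; real news sites use different tags and classes.
SAMPLE_HTML = """
<html><body>
  <article data-url="https://example.com/news/1">
    <h1 class="title">First headline</h1>
    <div class="body"><p>Paragraph one.</p><p>Paragraph two.</p></div>
  </article>
  <article data-url="https://example.com/news/2">
    <h1 class="title">Second headline</h1>
    <div class="body"><p>Only paragraph.</p></div>
  </article>
</body></html>
"""


def parse_articles(html: str) -> pd.DataFrame:
    """Extract title, URL, and body text from each <article> element."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article"):
        title_tag = article.find("h1", class_="title")
        body_tag = article.find("div", class_="body")
        # Join all paragraphs of the body into a single text field.
        paragraphs = body_tag.find_all("p") if body_tag else []
        rows.append({
            "title": title_tag.get_text(strip=True) if title_tag else None,
            "url": article.get("data-url"),
            "text": " ".join(p.get_text(strip=True) for p in paragraphs),
        })
    return pd.DataFrame(rows, columns=["title", "url", "text"])


df = parse_articles(SAMPLE_HTML)
print(df)
```

Optional fields such as the thumbnail, author(s), or publication date could be added as extra columns in the same way, once the matching selectors for a given site are known.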

Maintainability

I plan to expand the code to cover more sites, listed in the Planned sites section below. However, I do not plan to maintain sites that are considered 'done'. You are welcome to submit suggestions or requests, or to fork this repo. The code should work under Python 3.10.

General How-to

Each folder in 'Code/' is associated with one site and contains at least one IPYNB notebook. You may run this notebook in Colab or locally on your computer. All required libraries are listed in requirements.txt.

Done sites:

Planned sites

  • kompas.com/

Contact

You can contact me at mouse3mic3@gmail.com.
