Full ETL pipeline to scrape, transform, and load country GDP data from Wikipedia.
- Python
- pandas, numpy
- BeautifulSoup (web scraping)
- SQLite
- Extract: Scrape nominal GDP table from Wikipedia archive.
- Transform: Clean GDP (millions to billions USD, round).
- Load: Output to Countries_by_GDP.csv and World_Economies.db.
- Logging: Timestamps in etl_log.txt.
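The Transform and Load steps above can be sketched roughly as follows. This is a minimal illustration, not the code in etl_code.py: the function names are hypothetical, the two-decimal rounding is inferred from the sample output, the column names are taken from the sample table, and the standard-library csv and sqlite3 modules stand in for the pandas calls the project lists in its stack. The input figures are illustrative values in millions of USD.

```python
import csv
import sqlite3

# File and table names come from the README; everything else is an assumption.
CSV_PATH = "Countries_by_GDP.csv"
DB_PATH = "World_Economies.db"
TABLE = "Countries_by_GDP"


def transform(rows):
    """Convert (country, GDP in millions USD) to billions, rounded to 2 decimals."""
    return [(country, round(gdp_millions / 1000, 2)) for country, gdp_millions in rows]


def load_to_csv(rows, path=CSV_PATH):
    """Write the transformed rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Country", "GDP_USD_billions"])
        writer.writerows(rows)


def load_to_db(rows, conn, table=TABLE):
    """Replace the table contents with the transformed rows."""
    # The table name is a trusted constant here, never user input.
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (Country TEXT, GDP_USD_billions REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    conn.commit()


# Illustrative input in millions of USD (not scraped data).
SAMPLE_ROWS = [("United States", 26854599), ("China", 19373586)]
```

A full run would then call `load_to_csv(transform(SAMPLE_ROWS))` and `load_to_db(transform(SAMPLE_ROWS), sqlite3.connect(DB_PATH))`; dropping and recreating the table keeps repeated runs from appending duplicate rows.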
cd data-engineering-project/
python etl_code.py
- Countries_by_GDP.csv: ~200 countries, GDP in billions USD.
- World_Economies.db: SQLite table Countries_by_GDP.
- Sample query: Top GDPs >1T (US, China, ... in etl_code.py).
| Country | GDP_USD_billions |
|---|---|
| United States | 26854.6 |
| China | 19373.59 |
| Japan | 4409.74 |
| Germany | 4308.85 |
| India | 3736.88 |
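A query for economies above 1 trillion USD might look like the sketch below. It is an assumption about the shape of the query, not a copy of the one in etl_code.py. Because the column is in billions, the threshold is written as 1000. The helper names are hypothetical, and an in-memory database holding the five rows from the table above stands in for World_Economies.db.

```python
import sqlite3


def top_economies(conn, threshold_billions=1000):
    """Return (Country, GDP_USD_billions) rows above the threshold, largest first."""
    query = (
        "SELECT Country, GDP_USD_billions FROM Countries_by_GDP "
        "WHERE GDP_USD_billions > ? ORDER BY GDP_USD_billions DESC"
    )
    return conn.execute(query, (threshold_billions,)).fetchall()


def demo_connection():
    """In-memory stand-in for World_Economies.db with the README's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Countries_by_GDP (Country TEXT, GDP_USD_billions REAL)")
    conn.executemany(
        "INSERT INTO Countries_by_GDP VALUES (?, ?)",
        [
            ("United States", 26854.6),
            ("China", 19373.59),
            ("Japan", 4409.74),
            ("Germany", 4308.85),
            ("India", 3736.88),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    for country, gdp in top_economies(demo_connection()):
        print(country, gdp)
```

Against the real database, `sqlite3.connect("World_Economies.db")` would replace `demo_connection()`.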
Last run: Check etl_log.txt.
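For illustration, timestamped logging and a "last run" lookup could be sketched as below. The `timestamp : message` line format and both function names are assumptions; only the etl_log.txt file name comes from this README.

```python
from datetime import datetime

LOG_FILE = "etl_log.txt"  # file name from the README


def log_progress(message, path=LOG_FILE):
    """Append one 'timestamp : message' line to the log file."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{timestamp} : {message}\n")


def last_entry(path=LOG_FILE):
    """Return the most recent non-empty log line, or None if there is none."""
    try:
        with open(path, encoding="utf-8") as f:
            lines = [line.rstrip("\n") for line in f if line.strip()]
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None
```

Calling `last_entry()` after a run would return the final line written by the pipeline, which is one way to check when it last completed.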