This Jupyter Notebook contains a data analysis project focused on second-hand hatchback vehicles from the WeBuyCars car dealership in South Africa. The project utilizes web scraping techniques to gather vehicle data and then performs an initial assessment and analysis using Python libraries like pandas, seaborn, and matplotlib.
The primary objectives of this notebook are:
- Data Gathering: Scrape second-hand hatchback vehicle data from the WeBuyCars website.
- Data Assessment: Evaluate the gathered data for missing values and data types.
- Data Cleaning and Analysis: Clean the data and perform an analysis to provide insights into the vehicle market.
- Reporting: Generate a one-page PDF analysis report (this part is planned but not yet implemented in the provided code).
The data is scraped from the WeBuyCars website's API. The notebook specifically targets hatchback vehicles.
The following Python libraries are used in this project:
- `requests`
- `pandas`
- `seaborn`
- `matplotlib.pyplot`
- `os`
- `fpdf` (commented out, but planned for the PDF report)
- The dataset contains 1920 entries and 92 columns.
- The data includes details such as `make`, `model`, `mileage`, `price`, `condition`, and various other technical specifications.
- There are a significant number of null values in columns related to auctions, as only a small subset of the vehicles were on auction at the time of data collection.
- Some columns that contain numerical data, such as `NoGears`, are currently stored as strings.
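The assessment points above can be reproduced with a few pandas calls. The following is a minimal sketch on a toy DataFrame rather than the notebook's actual code: the rows are invented, and the auction column name `AuctionEndDate` is a hypothetical stand-in for whichever auction-related columns the real dataset contains.

```python
import pandas as pd

# Toy stand-in for car_trader. The values and the "AuctionEndDate"
# column name are made up for illustration only.
car_trader = pd.DataFrame(
    {
        "make": ["Volkswagen", "Ford", "Toyota"],
        "price": [189900, 154900, 212900],
        "NoGears": ["5", "6", None],
        "AuctionEndDate": [None, None, "2024-01-31"],
    }
)

# Shape and dtypes give a first overview of the dataset.
print(car_trader.shape)
print(car_trader.dtypes)

# Count nulls per column, most affected columns first.
null_counts = car_trader.isna().sum().sort_values(ascending=False)
print(null_counts)

# Convert numeric strings to numbers; values that cannot be parsed become NaN.
car_trader["NoGears"] = pd.to_numeric(car_trader["NoGears"], errors="coerce")
print(car_trader["NoGears"].dtype)
```

Using `errors="coerce"` is one possible choice here: it keeps the conversion from failing on unexpected values, at the cost of silently turning them into nulls, so it is worth re-checking the null counts afterwards.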
- Ensure you have the required Python libraries installed.
- Run the notebook cells in sequence.
- The code will perform a POST request to the WeBuyCars API to gather data.
- The gathered data is stored in a pandas DataFrame called `car_trader`.
- You can uncomment the line `#car_trader.to_csv('weBuyCars.csv')` to save the DataFrame to a CSV file.
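The gathering steps above might look roughly like the sketch below. This README does not state the real API endpoint, the request body, or the response layout, so `API_URL`, `PAYLOAD`, and the `"data"` response key are all placeholders and assumptions; `fetch_vehicles` is a hypothetical helper, not a function from the notebook.

```python
import pandas as pd
import requests

# Placeholder endpoint and request body: the actual WeBuyCars API URL and
# payload are not given in this README, so these values are assumptions.
API_URL = "https://example.com/api/search"
PAYLOAD = {"bodyType": "Hatchback", "size": 100}


def fetch_vehicles(url=API_URL, payload=PAYLOAD, post=requests.post):
    """POST the search payload and return the results as a DataFrame.

    The `post` parameter defaults to requests.post and can be swapped
    for a stub when testing without network access.
    """
    response = post(url, json=payload, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    # Assumption: the vehicle records sit under a top-level "data" key.
    records = response.json()["data"]
    # Flatten nested JSON records into columns.
    return pd.json_normalize(records)


# Usage (makes a real network call, so it is left commented out here):
# car_trader = fetch_vehicles()
# car_trader.to_csv('weBuyCars.csv')
```

Passing the HTTP function in as a parameter is only a convenience for testing; the notebook itself may simply call `requests.post` directly.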
Note: The notebook mentions a plan to "Assess and clean the data. Analyse the data and provide a 1 page PDF analysis report". This part of the project is a "To Do" and is not fully implemented in the provided code.
