Sports AI is a machine learning-based project designed to predict sports game outcomes using historical data, player statistics, and external factors such as odds and injuries. This project integrates web scraping, data processing, and deep learning models to provide insightful predictions for sports analytics and betting strategies.
- Predictive AI Model: Uses TensorFlow and PyTorch-based models to generate game outcome predictions.
- Automated Data Collection: Scrapes sports data from various sources using Selenium and nba_api.
- Data Processing: Cleans, structures, and prepares datasets for training models.
- Customizable Configurations: Uses an INI file for setup and modification.
- Detailed Export Format: Outputs individual data and predictions in Excel format.
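The INI-driven setup can be illustrated with Python's standard configparser. This is a minimal sketch, not the project's code. It assumes that globals/config.ini stores latest_gamedate under a section named [settings] in YYYY-MM-DD form (both are assumptions), and the helpers read_latest_gamedate and dates_to_fetch are hypothetical names. It shows how a stored date can determine which game days still need to be fetched.

```python
import configparser
from datetime import date, datetime, timedelta

# Assumption: the section name and date format are guesses; check globals/config.ini.
SECTION = "settings"
DATE_FORMAT = "%Y-%m-%d"


def read_latest_gamedate(ini_text: str, section: str = SECTION) -> date:
    """Parse latest_gamedate from INI-formatted text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get(section, "latest_gamedate")
    return datetime.strptime(raw.strip(), DATE_FORMAT).date()


def dates_to_fetch(latest: date, today: date) -> list:
    """Every calendar day from latest through today, inclusive (empty if latest > today)."""
    span = (today - latest).days
    return [latest + timedelta(days=offset) for offset in range(span + 1)]


if __name__ == "__main__":
    # For the real file, pass Path("globals/config.ini").read_text() instead.
    sample = "[settings]\nlatest_gamedate = 2025-01-28\n"
    start = read_latest_gamedate(sample)
    for day in dates_to_fetch(start, date(2025, 1, 30)):
        print(day.isoformat())
```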
Ensure you have the following installed:
- Python 3.9+
- Virtual environment setup (recommended)
- Firefox or Google Chrome (for Selenium web scraping)
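Before you install anything, a small standard-library check can confirm these prerequisites. The helper check_prerequisites and the browser executable names it searches for are hypothetical. Browsers installed through an OS installer, which is common on Windows and macOS, are often not on PATH, so a False result for a browser is not conclusive.

```python
import shutil
import sys

# Assumed executable names; they vary by OS and install method.
BROWSER_EXECUTABLES = {
    "firefox": ("firefox",),
    "chrome": ("google-chrome", "chrome", "chromium"),
}


def check_prerequisites(min_python=(3, 9)) -> dict:
    """Report Python version, virtual-environment status, and browsers found on PATH."""
    return {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # Inside a venv, sys.prefix points at the venv while sys.base_prefix does not.
        "in_venv": sys.prefix != sys.base_prefix,
        "browsers": {
            name: any(shutil.which(exe) is not None for exe in exes)
            for name, exes in BROWSER_EXECUTABLES.items()
        },
    }


if __name__ == "__main__":
    report = check_prerequisites()
    print(f"Python {sys.version.split()[0]} OK: {report['python_ok']}")
    print(f"Virtual environment active: {report['in_venv']}")
    for browser, found in report["browsers"].items():
        print(f"{browser} on PATH: {found}")
```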
- Clone the Repository:
    git clone https://github.com/xjasz/sports-ai.git
    cd sports-ai
- Create a Virtual Environment:
    python -m venv venv
    source venv/bin/activate  # On Windows use: venv\Scripts\activate
- Install Dependencies:
    pip install -r requirements.txt
- Configure Settings:
  - Unzip data/load/load_data.zip after cloning the project. It contains all game and player information up to late January 2025.
  - Edit prediction_season in globals/run_settings.py and set it to the current season. It is currently set to the 2024 season. When the processes run, they automatically attempt to generate predictions for the current day.
  - If you want to use a database, set use_database in globals/run_settings.py to True, then run globals/create_sql_tables.py to generate the initial tables.
  - Set current_season_only in globals/run_settings.py to True after everything has run once. It is set to False by default so that all of the initial data is generated on the first run.
  - An automated process uses Selenium to retrieve extra information from FanDuel. If you use the database, either run active/odds_service.py separately as a service each day, or run it once before the 3rd process, to fetch the latest Over/Under odds for Points, Assists, and Rebounds from FanDuel. It is currently set to use Firefox but can easily be changed to another browser, such as Chrome.
  - If use_database in globals/run_settings.py is set to True, change the ENVIRONMENT VARIABLES in globals/global_settings.py to point to your database.
  - To speed up model training with an NVIDIA GPU, install PyTorch from https://pytorch.org/ and select the matching CUDA version.
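Two of the settings above can be sanity-checked in code. This sketch makes two assumptions. First, it assumes a season label is the year the season starts, so the 2024 season is 2024-25, which is consistent with data running to January 2025. Second, it assumes seasons start in October. The database variable names in DB_ENV_VARS are placeholders, not the names used in globals/global_settings.py, and both helper functions are hypothetical.

```python
import os
from collections.abc import Mapping
from datetime import date

# Placeholder names; the real variables live in globals/global_settings.py.
DB_ENV_VARS = ("DB_HOST", "DB_PORT", "DB_USER", "DB_PASSWORD", "DB_NAME")


def season_for(day: date, start_month: int = 10) -> int:
    """Season label for a date, assuming seasons are named by their starting year."""
    return day.year if day.month >= start_month else day.year - 1


def missing_db_settings(use_database: bool, env: Mapping = os.environ) -> list:
    """Names of required database variables that are unset or empty."""
    if not use_database:
        return []
    return [name for name in DB_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    print("Suggested prediction_season:", season_for(date.today()))
    missing = missing_db_settings(use_database=True)
    if missing:
        print("Set these before enabling use_database:", ", ".join(missing))
```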
- Final Settings:
  - The 1st process is generate_data.py. It generates every game up to the current day. Note: it reads latest_gamedate in globals/config.ini and fetches every game from that date up to the current day. You don't need to edit latest_gamedate unless you want to fetch games from further back in time.
  - The 2nd process is generate_ai.py. It uses historical information from over 20 years of games and dynamically varies hyperparameters, feature selections, and other settings to generate models. By default it runs a loop of 100 iterations with randomized settings.
  - The 3rd process is generate_view.py. It uses the best models generated in the 2nd process to build Excel files with the current prediction results for Points, Rebounds, and Assists. These results are located in data/ai/top/ and are named MERGE_PTS.xlsx, MERGE_AST.xlsx, and MERGE_REB.xlsx.
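The 2nd process's loop over randomized settings is a form of random search. The sketch below illustrates that technique only. The search space, feature names, and evaluate callback are hypothetical stand-ins, and the project's actual TensorFlow/PyTorch training code is not shown.

```python
import random

# Hypothetical search space; not the project's actual hyperparameters or features.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "hidden_units": [32, 64, 128, 256],
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "features": [
        ("pts_avg_5", "min_avg_5"),
        ("pts_avg_10", "min_avg_10", "opp_def_rating"),
        ("pts_avg_20", "min_avg_20", "rest_days", "home_away"),
    ],
}


def sample_config(rng: random.Random) -> dict:
    """Draw one value independently for every setting in the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def random_search(evaluate, iterations: int = 100, seed=None):
    """Try `iterations` random configurations and keep the best-scoring one.

    `evaluate` stands in for training a model and returning a validation
    score where higher is better (for example, over/under hit rate).
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(iterations):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    # Toy objective for demonstration only: prefers lr near 1e-3 and low dropout.
    def toy_score(cfg):
        return -abs(cfg["learning_rate"] - 1e-3) - cfg["dropout"]

    best, score = random_search(toy_score, iterations=100, seed=42)
    print(best, score)
```

Random search is a reasonable default when many settings interact and only a fixed budget of training runs (here 100) is affordable. It is simpler than grid search and does not need every combination to be tried.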
To collect the latest sports data:
    python generate_data.py
To generate the AI prediction models:
    python generate_ai.py
To run multiple prediction models and merge the results:
    python generate_view.py
After running the AI model, you should receive output files such as:
- MERGE_PTS.xlsx: detailed logs of recent games and outcomes, with the predictions.
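If you prefer a single command, you can chain the three stages with the standard subprocess module. This is a hypothetical wrapper, not part of the repository. It runs each script with the current interpreter and stops at the first stage that exits with an error.

```python
import subprocess
import sys

# Stage order from this README: collect data, train models, build the Excel view.
PIPELINE = ("generate_data.py", "generate_ai.py", "generate_view.py")


def run_pipeline(scripts=PIPELINE, runner=subprocess.run) -> list:
    """Run each script with the current Python interpreter, in order.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so later stages never run on top of a failed one.
    """
    completed = []
    for script in scripts:
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed


if __name__ == "__main__":
    try:
        done = run_pipeline()
        print("Finished:", ", ".join(done))
    except subprocess.CalledProcessError as err:
        print(f"Stage failed with exit code {err.returncode}: {err.cmd}")
        sys.exit(err.returncode)
```

If use_database is enabled, run active/odds_service.py before the 3rd stage, as described in the settings above. Keep in mind that generate_ai.py runs 100 iterations by default.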
Below is an example of an Excel file generated by the model, showing predictions before a game:
The highlighted rows represent predictions (PRED) where the AI correctly identified whether a player would go over or under the given betting value (BET_VAL).
Columns include player stats, historical averages, betting odds, and additional contextual game data.
This provides additional insight to help inform your decisions.
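To measure accuracy from an exported sheet, you can grade each row's call against the final stat. This sketch makes two assumptions: PRED holds the strings OVER or UNDER, and a column with the actual stat (called ACTUAL here, a hypothetical name) becomes available once the game is played. Rows where the actual value equals BET_VAL are treated as pushes and excluded. The functions grade and hit_rate are hypothetical.

```python
def grade(pred: str, actual: float, bet_val: float):
    """True if the OVER/UNDER call was right, False if wrong, None for a push."""
    call = pred.strip().upper()
    if call not in ("OVER", "UNDER"):
        raise ValueError(f"unexpected PRED value: {pred!r}")
    if actual == bet_val:
        return None  # push: neither side wins
    went_over = actual > bet_val
    return went_over if call == "OVER" else not went_over


def hit_rate(rows, actual_key: str = "ACTUAL"):
    """Share of decided rows where PRED was correct; None if nothing is decided."""
    results = [grade(row["PRED"], row[actual_key], row["BET_VAL"]) for row in rows]
    decided = [r for r in results if r is not None]
    return sum(decided) / len(decided) if decided else None


if __name__ == "__main__":
    # Illustrative rows, not real results.
    sample = [
        {"PRED": "OVER", "BET_VAL": 20.5, "ACTUAL": 25},
        {"PRED": "UNDER", "BET_VAL": 7.5, "ACTUAL": 9},
        {"PRED": "UNDER", "BET_VAL": 5, "ACTUAL": 5},
        {"PRED": "OVER", "BET_VAL": 10.5, "ACTUAL": 12},
    ]
    print(f"Hit rate: {hit_rate(sample):.1%}")
```

You can load rows in this shape with pandas, for example pandas.read_excel("data/ai/top/MERGE_PTS.xlsx").to_dict("records"). Reading .xlsx files this way requires the openpyxl package.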
- Fork the repository.
- Create a feature branch (git checkout -b feature-name).
- Commit changes (git commit -m "Add feature").
- Push to your branch (git push origin feature-name).
- Submit a pull request.
Each new day automatically removes the previous day's information. Store each day's information separately if you want to track performance over a long period of time.
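One way to keep history is to copy the day's merged workbooks into a dated folder before the next run. The function archive_results and the archive folder layout are hypothetical. The source directory data/ai/top/ and the MERGE_*.xlsx names come from this README.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_results(source_dir, archive_root, day=None, pattern="MERGE_*.xlsx") -> list:
    """Copy the day's merged workbooks into archive_root/YYYY-MM-DD/ and return the copies."""
    day = day or date.today()
    target = Path(archive_root) / day.isoformat()
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(source_dir).glob(pattern)):
        dest = target / src.name
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied


if __name__ == "__main__":
    # data/archive is a hypothetical destination; pick a folder the daily run does not clear.
    for path in archive_results("data/ai/top", "data/archive"):
        print("Archived", path)
```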
This currently works well with the NBA. Over long periods it has consistently predicted correctly more than 50% of the time, and I've seen it reach up to 70% on some days when backups and players who are out are filtered out.
I plan to extend this in the future to work with MLB and NFL. I may also add a read-only user for my database in case it's difficult to spin up your own.
Let me know what you think and what improvements could be made. Any suggestions help. Thanks again!
This project is licensed under the MIT License.
For questions, reach out via GitHub Issues or email at codalata@gmail.com.
