This repo uses a config-first data analysis pipeline for fake data generation, Docker export, and picklist payload builds.
- Install dependencies:
git clone https://github.com/4201VitruvianBots/ScoutingApp2026.git
cd ScoutingApp2026
npm install
npm run build --workspace database
- Create a Python environment for data-analysis:
cd data-analysis
python -m venv venv
.\venv\Scripts\Activate.ps1
(Windows PowerShell; on macOS/Linux run source venv/bin/activate instead)
pip install -r requirements.txt
cd ..
- Start server:
npm run start
Server URL: http://localhost:8080
Core pipeline scripts:
- data-analysis/generate_fake_data.py
- data-analysis/01_extract_source.py
- data-analysis/02_clean_normalize.py
- data-analysis/03_feature_engineering.py
- data-analysis/04_team_aggregation.py
- data-analysis/05_picklist_scores.py
- data-analysis/06_export_app_payloads.py
- data-analysis/run_analysis_full.py
Compatibility shims (kept for old commands):
- data-analysis/run_fake_local_full.py
- data-analysis/run_docker_full.py
Pipeline config lives in:
- app_settings/settings.json
- app_settings/match_schedule.json
- app_settings/teams_list.txt
Example app_settings/settings.json (abridged; mongo.* settings not shown):
{
"paths": {
"raw_runs_root": "data-analysis/raw_runs",
"analysis_runs_root": "data-analysis/analysis_runs",
"raw_run_folder": "test_1",
"analysis_run_folder": "test_1",
"raw_run_base_name": "test_1",
"analysis_run_base_name": "test_1"
},
"fake_data": {
"destination": "local_csv",
"match_source_mode": "schedule",
"random_match_count": 72,
"include_pit": true,
"scouter_count": 12
},
"analysis": {
"raw_source_mode": "existing_raw",
"timeline_bin_sec": 1
}
}
- Fake local CSV destination is controlled by:
  - fake_data.destination = "local_csv"
  - paths.raw_runs_root (root output location)
- Fake Docker destination is controlled by:
  - fake_data.destination = "docker_db"
  - mongo.* settings (connection and collections)
- Match source:
  - fake_data.match_source_mode = "schedule" uses match_schedule.json
  - fake_data.match_source_mode = "random_from_teams" uses teams_list.txt and random_match_count
- Other controls:
  - fake_data.include_pit
  - fake_data.scouter_count
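The fake_data controls above can be read back as a small plan. This is a minimal sketch, not the code in generate_fake_data.py: describe_fake_data and load_settings are hypothetical helpers, and treating only the two documented destination and match-source values as valid is an assumption.

```python
import json
from pathlib import Path

# Hypothetical helper; not part of the repo. Only documented values are accepted.
VALID_DESTINATIONS = {"local_csv", "docker_db"}
MATCH_SOURCE_FILES = {
    "schedule": "app_settings/match_schedule.json",
    "random_from_teams": "app_settings/teams_list.txt",
}


def describe_fake_data(settings: dict) -> dict:
    """Summarize what the fake_data section of settings.json selects."""
    fake = settings["fake_data"]
    destination = fake["destination"]
    if destination not in VALID_DESTINATIONS:
        raise ValueError(f"unknown fake_data.destination: {destination!r}")
    mode = fake["match_source_mode"]
    if mode not in MATCH_SOURCE_FILES:
        raise ValueError(f"unknown fake_data.match_source_mode: {mode!r}")
    plan = {
        "destination": destination,
        "match_source_file": MATCH_SOURCE_FILES[mode],
        "include_pit": bool(fake["include_pit"]),
        "scouter_count": int(fake["scouter_count"]),
    }
    # random_match_count only matters when matches come from the team list.
    if mode == "random_from_teams":
        plan["random_match_count"] = int(fake["random_match_count"])
    # Local CSV output lands under paths.raw_runs_root.
    if destination == "local_csv":
        plan["output_root"] = settings["paths"]["raw_runs_root"]
    return plan


def load_settings(path: str = "app_settings/settings.json") -> dict:
    """Read the pipeline settings file as a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Run from the repository root, describe_fake_data(load_settings()) reports the destination and source file the current settings select before any data is generated.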
Selection precedence for all scripts (highest first):
- CLI override
- Settings-selected folder (paths.*_run_folder, then paths.*_run_base_name)
Settings-selected keys:
- Raw run default: paths.raw_run_folder
- Analysis run default: paths.analysis_run_folder
Raw and analysis run folders use configured names directly (no timestamp prefix).
Folder names come from:
- Raw: paths.raw_run_base_name
- Analysis: paths.analysis_run_base_name
If paths.raw_run_folder / paths.analysis_run_folder are set, those names are used directly.
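The precedence rules can be expressed as one lookup. This is a minimal sketch under stated assumptions: resolve_run_folder is a hypothetical helper, and joining a bare name onto the runs root (while using an absolute path as-is) is assumed behavior rather than the repo's confirmed implementation.

```python
from pathlib import Path
from typing import Optional


def resolve_run_folder(root: str, paths: dict, kind: str,
                       cli_override: Optional[str] = None) -> Path:
    """Pick a run folder for kind "raw" or "analysis" (hypothetical helper).

    Order: CLI override, then paths.<kind>_run_folder, then
    paths.<kind>_run_base_name. Absolute paths are returned unchanged;
    bare names are resolved under the given runs root (assumed).
    """
    candidates = (
        cli_override,
        paths.get(f"{kind}_run_folder"),
        paths.get(f"{kind}_run_base_name"),
    )
    for name in candidates:
        if name:  # skip None and empty strings
            candidate = Path(name)
            return candidate if candidate.is_absolute() else Path(root) / candidate
    raise ValueError(f"no {kind} run folder configured")
```

With the example settings, resolve_run_folder("data-analysis/raw_runs", settings["paths"], "raw") yields data-analysis/raw_runs/test_1, and passing "comp2_raw" as cli_override takes priority over both settings keys.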
Fake data, local CSV destination:
Set:
fake_data.destination = "local_csv"
Run:
python data-analysis/generate_fake_data.py
Result:
- Creates a new raw run folder under paths.raw_runs_root
- Writes 01_match_raw.csv, 01_pit_raw.csv, and source snapshots
- Updates latest_run.json in paths.raw_runs_root
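After a local CSV run, a quick check can confirm the documented files exist. This is a minimal sketch: check_raw_run is a hypothetical helper, it assumes include_pit is true (so 01_pit_raw.csv is expected), and it does not check source snapshots because their file names are not documented here.

```python
from pathlib import Path
from typing import List

# Files the local_csv fake-data run is documented to write.
RUN_FILES = ("01_match_raw.csv", "01_pit_raw.csv")
ROOT_FILES = ("latest_run.json",)


def check_raw_run(raw_runs_root: str, run_name: str) -> List[Path]:
    """Return expected files (relative to raw_runs_root) that do not exist."""
    root = Path(raw_runs_root)
    expected = [Path(run_name) / name for name in RUN_FILES]
    expected += [Path(name) for name in ROOT_FILES]
    return [rel for rel in expected if not (root / rel).is_file()]
```

An empty list from check_raw_run("data-analysis/raw_runs", "test_1") means the run folder and latest_run.json are in place.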
Fake data, Docker DB destination:
Set:
fake_data.destination = "docker_db"
Run:
python data-analysis/generate_fake_data.py
Result:
- Seeds the configured Mongo collections
- Writes a Docker seed report in paths.raw_runs_root
Stage 01, extract source data from Mongo:
Run:
python data-analysis/01_extract_source.py
Optional raw run base name override:
python data-analysis/01_extract_source.py --run-base-name comp2_raw
Stage 02, clean and normalize:
Set paths.raw_run_folder to a raw run folder name (or an absolute path), then run:
python data-analysis/02_clean_normalize.py
Or override directly:
python data-analysis/02_clean_normalize.py --raw-run test_1
Stages 03-06:
Set paths.analysis_run_folder to the target analysis run, then run:
python data-analysis/03_feature_engineering.py
python data-analysis/04_team_aggregation.py
python data-analysis/05_picklist_scores.py
python data-analysis/06_export_app_payloads.py
Or pass --analysis-run to each command to override the settings value.
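Running stages 03-06 by hand against one analysis run can be scripted. This is a minimal sketch, assuming it is run from the repository root with the virtual environment active; stage_commands and run_stages are hypothetical helpers that only wrap the documented commands and the --analysis-run flag.

```python
import subprocess
import sys
from typing import List, Optional

# Stage scripts that share one analysis run folder.
ANALYSIS_STAGES = (
    "data-analysis/03_feature_engineering.py",
    "data-analysis/04_team_aggregation.py",
    "data-analysis/05_picklist_scores.py",
    "data-analysis/06_export_app_payloads.py",
)


def stage_commands(analysis_run: Optional[str] = None) -> List[List[str]]:
    """Build one command per stage; without analysis_run, settings.json decides."""
    extra = ["--analysis-run", analysis_run] if analysis_run else []
    return [[sys.executable, script, *extra] for script in ANALYSIS_STAGES]


def run_stages(analysis_run: Optional[str] = None) -> None:
    # check=True raises CalledProcessError at the first stage that exits non-zero,
    # so later stages never run on incomplete inputs.
    for cmd in stage_commands(analysis_run):
        subprocess.run(cmd, check=True)
```

For example, run_stages("test_1") runs all four stages with --analysis-run test_1 and stops at the first failure.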
Full pipeline run:
python data-analysis/run_analysis_full.py
Behavior:
- Reads analysis.raw_source_mode
- If the mode is docker_export, runs stage 01 first
- Runs stage 02
- Pins stages 03-06 to the same analysis run folder created by stage 02
Overrides:
python data-analysis/run_analysis_full.py --raw-source-mode existing_raw --raw-run test_1
python data-analysis/run_analysis_full.py --raw-source-mode docker_export --raw-run-base-name event_a_raw
Stage inputs and outputs:
Stage 01 (01_extract_source.py):
- Input: Mongo collections
- Output (raw run folder): 01_match_raw.csv, 01_pit_raw.csv, 01_raw_snapshot.json
Stage 02 (02_clean_normalize.py):
- Input: raw run folder (01_* files)
- Output (analysis run folder): 02_match_clean.csv, 02_pit_clean.csv, 02_validation_report.csv, 02_stage_summary.json
Stage 03 (03_feature_engineering.py):
- Input: 02_* files
- Output: 03_match_features.csv, 03_timeseries_long.csv, 03_auto_path_points.csv
Stage 04 (04_team_aggregation.py):
- Input: 03_match_features.csv, 02_pit_clean.csv
- Output: 04_team_aggregates.csv, 04_defense_events.csv
Stage 05 (05_picklist_scores.py):
- Input: 04_team_aggregates.csv
- Output: 05_picklist_scores.csv, 05_metric_contributions.csv
Stage 06 (06_export_app_payloads.py):
- Input: stage 03-05 outputs
- Output: 06_picklist_payload.json, 06_team_profiles.json
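The input lists for stages 03-06 can double as a pre-flight check on an analysis run folder. This is a minimal sketch: missing_inputs and STAGE_INPUTS are hypothetical, stage 02 is left out because it reads the raw run folder, and "stage 03-05 outputs" is approximated by the 03_*, 04_*, and 05_* filename patterns.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Analysis-folder inputs per stage, as glob patterns taken from the list above.
STAGE_INPUTS: Dict[str, Tuple[str, ...]] = {
    "03": ("02_*",),
    "04": ("03_match_features.csv", "02_pit_clean.csv"),
    "05": ("04_team_aggregates.csv",),
    "06": ("03_*", "04_*", "05_*"),
}


def missing_inputs(analysis_dir: str, stage: str) -> List[str]:
    """Return the input patterns for a stage that match no file in analysis_dir."""
    folder = Path(analysis_dir)
    # any() stops at the first matching file; a pattern with no match is missing.
    return [pattern for pattern in STAGE_INPUTS[stage] if not any(folder.glob(pattern))]
```

An empty result from missing_inputs("data-analysis/analysis_runs/test_1", "04") suggests stage 04 has what it needs; a non-empty one usually means an earlier stage has not run for that folder.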
The server endpoint /data/retrieve/analyzed resolves the configured analysis run folder from app_settings/settings.json.
npm script shortcuts:
npm run analysis:generate-fake
npm run analysis:extract-source
npm run analysis:02
npm run analysis:03
npm run analysis:04
npm run analysis:05
npm run analysis:06
npm run analysis:full
npm run analysis:full:fake-local
npm run analysis:full:docker
Notes:
- run_fake_local_full.py and run_docker_full.py are retained as compatibility shims.
- The canonical full-run entrypoint is run_analysis_full.py.
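The full-run behavior listed for run_analysis_full.py (optional stage 01, then 02, then 03-06 on the same analysis run) can be sketched as a stage plan. This is a hypothetical illustration, not the script's code: plan_full_run is an invented name, and rejecting any raw_source_mode other than the two documented values is an assumption.

```python
from typing import List

# The two raw_source_mode values documented in this README.
VALID_RAW_SOURCE_MODES = ("existing_raw", "docker_export")


def plan_full_run(raw_source_mode: str) -> List[str]:
    """Return stage scripts in the order the documented full run executes them."""
    if raw_source_mode not in VALID_RAW_SOURCE_MODES:
        raise ValueError(f"unknown analysis.raw_source_mode: {raw_source_mode!r}")
    stages = []
    if raw_source_mode == "docker_export":
        # Fresh raw data is extracted from Mongo before cleaning.
        stages.append("01_extract_source.py")
    stages.append("02_clean_normalize.py")
    # Stages 03-06 reuse the analysis run folder that stage 02 creates.
    stages += [
        "03_feature_engineering.py",
        "04_team_aggregation.py",
        "05_picklist_scores.py",
        "06_export_app_payloads.py",
    ]
    return stages
```

With existing_raw the plan starts at stage 02; with docker_export it adds stage 01 in front, matching the two override examples.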