Team 4201 Scouting System (2026)

This repo uses a config-first data analysis pipeline for fake data generation, Docker export, and picklist payload builds.

Quick Start

Clone the repository and install dependencies:

git clone https://github.com/4201VitruvianBots/ScoutingApp2026.git
cd ScoutingApp2026
npm install
npm run build --workspace database

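Create a Python environment for data-analysis (the activation command shown is for Windows PowerShell; on macOS or Linux, use source venv/bin/activate instead):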

cd data-analysis
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
cd ..

Start server:

npm run start

Server URL: http://localhost:8080

Data Pipeline Overview

Core pipeline scripts:

data-analysis/generate_fake_data.py
data-analysis/01_extract_source.py
data-analysis/02_clean_normalize.py
data-analysis/03_feature_engineering.py
data-analysis/04_team_aggregation.py
data-analysis/05_picklist_scores.py
data-analysis/06_export_app_payloads.py
data-analysis/run_analysis_full.py

Compatibility shims (kept for old commands):

data-analysis/run_fake_local_full.py
data-analysis/run_docker_full.py

Config Files

Pipeline config lives in:

app_settings/settings.json
app_settings/match_schedule.json
app_settings/teams_list.txt

`settings.json` keys

{
  "paths": {
    "raw_runs_root": "data-analysis/raw_runs",
    "analysis_runs_root": "data-analysis/analysis_runs",
    "raw_run_folder": "test_1",
    "analysis_run_folder": "test_1",
    "raw_run_base_name": "test_1",
    "analysis_run_base_name": "test_1"
  },
  "fake_data": {
    "destination": "local_csv",
    "match_source_mode": "schedule",
    "random_match_count": 72,
    "include_pit": true,
    "scouter_count": 12
  },
  "analysis": {
    "raw_source_mode": "existing_raw",
    "timeline_bin_sec": 1
  }
}
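
As a rough illustration of how these keys could be read, here is a minimal sketch. The helper names `load_settings` and `get_setting` are hypothetical and are not taken from the pipeline scripts; only the key names and values come from the sample above.

```python
import json

# Hypothetical helpers for illustration; the pipeline's real loader may differ.
def load_settings(path="app_settings/settings.json"):
    """Read the settings file and return it as a dict."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def get_setting(settings, dotted_key, default=None):
    """Look up a dotted key such as 'fake_data.destination'."""
    node = settings
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# Inline sample mirroring part of the settings shown above,
# so the sketch runs without the real file on disk.
sample = {
    "paths": {"raw_runs_root": "data-analysis/raw_runs", "raw_run_folder": "test_1"},
    "fake_data": {"destination": "local_csv", "random_match_count": 72},
    "analysis": {"raw_source_mode": "existing_raw", "timeline_bin_sec": 1},
}

print(get_setting(sample, "fake_data.destination"))  # local_csv
print(get_setting(sample, "mongo.uri", "not set"))    # not set
```

Returning a default for missing keys keeps optional sections (such as mongo.*) from raising when they are absent.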

How to choose fake data location

Fake local CSV destination is controlled by:
- fake_data.destination = "local_csv"
- paths.raw_runs_root (root output location)

Fake Docker destination is controlled by:
- fake_data.destination = "docker_db"
- mongo.* settings for connection/collections
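
The destination switch described above might be sketched as follows. The function name `resolve_fake_destination` is hypothetical, and the shape of the mongo section is an assumption, since its keys are not listed in this README.

```python
# Hypothetical sketch; the generator's actual dispatch logic is not shown here.
def resolve_fake_destination(settings):
    """Return (destination, target) based on fake_data.destination."""
    destination = settings.get("fake_data", {}).get("destination")
    if destination == "local_csv":
        # Local CSV output is rooted at paths.raw_runs_root.
        return destination, settings.get("paths", {}).get("raw_runs_root")
    if destination == "docker_db":
        # Assumption: mongo.* lives in a top-level "mongo" section.
        return destination, settings.get("mongo", {})
    raise ValueError(f"unsupported fake_data.destination: {destination!r}")


example = {
    "paths": {"raw_runs_root": "data-analysis/raw_runs"},
    "fake_data": {"destination": "local_csv"},
}
print(resolve_fake_destination(example))  # ('local_csv', 'data-analysis/raw_runs')
```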

How to customize fake generation

Match source:
- fake_data.match_source_mode = "schedule" uses match_schedule.json
- fake_data.match_source_mode = "random_from_teams" uses teams_list.txt + random_match_count

Other controls:
- fake_data.include_pit
- fake_data.scouter_count
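
To make the two match source modes concrete, here is a small sketch. It assumes six teams per match and treats a schedule as a plain list of team lists; both are assumptions, since the actual formats of match_schedule.json and teams_list.txt are not described here. The function name `build_matches` is hypothetical.

```python
import random

TEAMS_PER_MATCH = 6  # assumption: two alliances of three teams


def build_matches(mode, schedule=None, teams=None, random_match_count=0, seed=None):
    """Return a list of matches, each a list of team identifiers."""
    if mode == "schedule":
        if schedule is None:
            raise ValueError("schedule mode needs a schedule")
        return [list(match) for match in schedule]
    if mode == "random_from_teams":
        if teams is None or len(teams) < TEAMS_PER_MATCH:
            raise ValueError("random_from_teams needs at least six teams")
        rng = random.Random(seed)  # seeded so runs can be reproduced
        return [rng.sample(teams, TEAMS_PER_MATCH) for _ in range(random_match_count)]
    raise ValueError(f"unknown match_source_mode: {mode!r}")


# Placeholder team identifiers for illustration only
teams = ["A", "B", "C", "D", "E", "F", "G", "H"]
matches = build_matches("random_from_teams", teams=teams, random_match_count=3, seed=1)
print(len(matches))  # 3
```

Sampling without replacement within a match guarantees that no team appears twice in the same match.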

How run selection works (config-first)

Selection precedence for all scripts:

1. CLI override
2. Settings-selected folder (*_run_folder, then *_run_base_name)

Settings-selected keys:

- Raw run default: paths.raw_run_folder
- Analysis run default: paths.analysis_run_folder
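
The precedence above can be sketched as a small resolver. The function name `resolve_run_folder` is hypothetical; only the key names and their order come from this section.

```python
# Hypothetical sketch of the selection order; not the scripts' actual code.
def resolve_run_folder(settings, kind, cli_override=None):
    """Pick a run folder for kind 'raw' or 'analysis'.

    Order: CLI override, then paths.<kind>_run_folder,
    then paths.<kind>_run_base_name.
    """
    if cli_override:
        return cli_override
    paths = settings.get("paths", {})
    for key in (f"{kind}_run_folder", f"{kind}_run_base_name"):
        value = paths.get(key)
        if value:
            return value
    raise ValueError(f"no {kind} run configured")


settings = {"paths": {"raw_run_folder": "", "raw_run_base_name": "test_1"}}
print(resolve_run_folder(settings, "raw"))                             # test_1
print(resolve_run_folder(settings, "raw", cli_override="comp2_raw"))   # comp2_raw
```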

Run Folder Naming

Raw and analysis run folders use configured names directly (no timestamp prefix).

If paths.raw_run_folder / paths.analysis_run_folder are set, those names are used directly. Otherwise, folder names come from:

- Raw: paths.raw_run_base_name
- Analysis: paths.analysis_run_base_name

Typical Workflows

1) Generate fake raw data to local raw folder

Set:

fake_data.destination = "local_csv"

Run:

python data-analysis/generate_fake_data.py

Result:

- Creates a new raw run folder under paths.raw_runs_root
- Writes 01_match_raw.csv, 01_pit_raw.csv, and source snapshots
- Updates raw_runs_root/latest_run.json
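
A minimal sketch of what this result could look like on disk. The helper name `create_raw_run` and the `run_folder` field written to latest_run.json are assumptions; the real file's schema is not documented here, and the CSVs are created empty as placeholders.

```python
import json
import tempfile
from pathlib import Path


# Hypothetical helper; the generator's real output logic may differ.
def create_raw_run(raw_runs_root, run_name):
    """Create a raw run folder with placeholder CSVs and update the pointer file."""
    root = Path(raw_runs_root)
    run_dir = root / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    # Empty placeholders standing in for the generated CSV files
    for name in ("01_match_raw.csv", "01_pit_raw.csv"):
        (run_dir / name).touch()
    # Assumed pointer schema: a single "run_folder" field
    pointer = {"run_folder": run_name}
    (root / "latest_run.json").write_text(json.dumps(pointer, indent=2), encoding="utf-8")
    return run_dir


# Demonstrate in a throwaway directory instead of the real raw_runs_root
with tempfile.TemporaryDirectory() as tmp:
    run_dir = create_raw_run(tmp, "test_1")
    print(sorted(p.name for p in run_dir.iterdir()))
```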

2) Generate fake data into Docker DB

Set:

fake_data.destination = "docker_db"

Run:

python data-analysis/generate_fake_data.py

Result:

- Seeds the configured Mongo collections
- Writes a Docker seed report in paths.raw_runs_root

3) Export raw source from Docker into raw run folder

Run:

python data-analysis/01_extract_source.py

Optional raw run base name override:

python data-analysis/01_extract_source.py --run-base-name comp2_raw

4) Analyze an existing raw dataset

Set paths.raw_run_folder to a raw run folder name (or an absolute path), then run stage 02:

python data-analysis/02_clean_normalize.py

Or override directly:

python data-analysis/02_clean_normalize.py --raw-run test_1

5) Run analysis stages individually (`03`-`06`)

Set paths.analysis_run_folder to the target analysis run, then run:

python data-analysis/03_feature_engineering.py
python data-analysis/04_team_aggregation.py
python data-analysis/05_picklist_scores.py
python data-analysis/06_export_app_payloads.py

Or pass --analysis-run per command.

6) Run full pipeline with one command

python data-analysis/run_analysis_full.py

Behavior:

- Reads analysis.raw_source_mode
- If it is docker_export, runs stage 01 first
- Runs stage 02
- Pins stages 03-06 to the same analysis run folder created by stage 02
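
The behavior above could be sketched as a planner that builds the command list without executing anything. The function name `plan_full_run` is hypothetical, and how the real runner learns the analysis run folder created by stage 02 is not shown here, so the sketch takes it as a parameter. The --raw-run and --analysis-run flags are the ones mentioned in the workflows above.

```python
import sys

LATER_STAGES = [
    "03_feature_engineering.py",
    "04_team_aggregation.py",
    "05_picklist_scores.py",
    "06_export_app_payloads.py",
]


# Hypothetical planner; run_analysis_full.py may be organized differently.
def plan_full_run(raw_source_mode, analysis_run, raw_run=None):
    """Return the ordered list of commands for a full pipeline run."""
    commands = []
    if raw_source_mode == "docker_export":
        # Stage 01 only runs when raw data must be exported from Docker first.
        commands.append([sys.executable, "data-analysis/01_extract_source.py"])
    stage_02 = [sys.executable, "data-analysis/02_clean_normalize.py"]
    if raw_run:
        stage_02 += ["--raw-run", raw_run]
    commands.append(stage_02)
    # Stages 03-06 are all pinned to the same analysis run folder.
    for script in LATER_STAGES:
        commands.append(
            [sys.executable, f"data-analysis/{script}", "--analysis-run", analysis_run]
        )
    return commands


for cmd in plan_full_run("existing_raw", "test_1", raw_run="test_1"):
    print(" ".join(cmd[1:]))
```

Each command could then be passed to subprocess.run with check=True so that a failing stage stops the run.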

Overrides:

python data-analysis/run_analysis_full.py --raw-source-mode existing_raw --raw-run test_1
python data-analysis/run_analysis_full.py --raw-source-mode docker_export --raw-run-base-name event_a_raw

Stage Inputs and Outputs

Stage `01_extract_source.py`

Input: Mongo collections
Output raw folder:
- 01_match_raw.csv
- 01_pit_raw.csv
- 01_raw_snapshot.json

Stage `02_clean_normalize.py`

Input: raw folder (01_*)
Output analysis folder:
- 02_match_clean.csv
- 02_pit_clean.csv
- 02_validation_report.csv
- 02_stage_summary.json

Stage `03_feature_engineering.py`

Input: 02_*
Output:
- 03_match_features.csv
- 03_timeseries_long.csv
- 03_auto_path_points.csv

Stage `04_team_aggregation.py`

Input: 03_match_features.csv, 02_pit_clean.csv
Output:
- 04_team_aggregates.csv
- 04_defense_events.csv

Stage `05_picklist_scores.py`

Input: 04_team_aggregates.csv
Output:
- 05_picklist_scores.csv
- 05_metric_contributions.csv

Stage `06_export_app_payloads.py`

Input: stage 03-05 outputs
Output:
- 06_picklist_payload.json
- 06_team_profiles.json
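
The output lists above lend themselves to a quick completeness check. The sketch below records each stage's output file names as listed and reports which ones are missing from a set of file names; the helper name `missing_outputs` is hypothetical.

```python
# Output file names copied from the stage listings above.
STAGE_OUTPUTS = {
    "01": ["01_match_raw.csv", "01_pit_raw.csv", "01_raw_snapshot.json"],
    "02": ["02_match_clean.csv", "02_pit_clean.csv",
           "02_validation_report.csv", "02_stage_summary.json"],
    "03": ["03_match_features.csv", "03_timeseries_long.csv", "03_auto_path_points.csv"],
    "04": ["04_team_aggregates.csv", "04_defense_events.csv"],
    "05": ["05_picklist_scores.csv", "05_metric_contributions.csv"],
    "06": ["06_picklist_payload.json", "06_team_profiles.json"],
}


# Hypothetical helper for illustration; not part of the pipeline scripts.
def missing_outputs(stage, present_names):
    """Return the expected outputs of a stage that are not in present_names."""
    present = set(present_names)
    return [name for name in STAGE_OUTPUTS[stage] if name not in present]


# Example: a folder where stage 04 has only written one of its files
print(missing_outputs("04", ["04_team_aggregates.csv"]))  # ['04_defense_events.csv']
```

In practice, present_names could come from listing the run folder, for example with os.listdir.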

The server endpoint /data/retrieve/analyzed resolves the configured analysis run folder from app_settings/settings.json.

NPM Shortcuts

npm run analysis:generate-fake
npm run analysis:extract-source
npm run analysis:02
npm run analysis:03
npm run analysis:04
npm run analysis:05
npm run analysis:06
npm run analysis:full
npm run analysis:full:fake-local
npm run analysis:full:docker

Notes on Deprecated Wrappers

run_fake_local_full.py and run_docker_full.py are retained as compatibility shims.
Canonical full-run entrypoint is run_analysis_full.py.
