henryzhongsc/general_ml_project_template

ML Project Template

This repo hosts a general-purpose ML project template designed for running experiments with different methods and datasets. The codebase revolves around three configs — pipeline_config, eval_config, and management_config — where the first defines a method, the second defines an evaluation scheme, and the third defines output folder structure settings.

This repo is designed with three key principles in mind:

  • Trackability and reproducibility: It is not uncommon to feel disconnected from the settings, code implementation, and results of an experiment run. This repo provides meticulous logging, saves carbon copies of its input configs, and even copies the key code pieces directly into the output folder alongside the results, so that you always know what was run.
  • Just-right modularity: On components where self-containment is often appreciated (e.g., methods), we tolerate slight code duplication for readability; on components where strict fairness is needed (e.g., evaluations), we strive for high modularity.
  • Rigorous folder structures as guardrails: We define a clear separation between methods, evaluations, etc. This is particularly helpful if you a) have multiple contributors and/or b) are a fan of AI-assisted coding, where the AI service is often intelligent enough to realize your intended goal but might not implement it the way you envisioned. This structure provides a quick sanity check on whether the AI is on track. Similarly, this README.md might also be worth adding to AI instruction docs like CLAUDE.md or be made into a Claude Skill.

Project Structure

.
├── configs/
│   ├── global_setting.py           # Global settings (SEED, timezone, etc.)
│   ├── eval_config/                # Evaluation configs
│   │   └── <dataset>/
│   │       └── <config>.json       # e.g., default.json
│   ├── pipeline_config/            # Pipeline configs (one per model)
│   │   └── <model>.json
│   └── management_config/          # Output folder structure settings
│       └── default.json
│
├── pipeline/                       # Method implementations
│   ├── pipeline_utils.py           # Arg parsing, config loading, shared utilities
│   └── <method>/                   # e.g., vanilla_huggingface
│       ├── main.py                 # Entry point
│       ├── inference.py            # Model loading, batch generation
│       └── <dataset>/              # Dataset-specific eval logic
│           └── eval.py
│
├── eval/                           # Evaluation utilities
│   ├── eval_utils.py               # Generic metrics (exact_match, etc.)
│   └── <dataset>/                  # Dataset-specific utils
│       ├── main.py                 # Prepare inputs
│       └── <dataset>_utils.py      # Format prompts, extract answers
│
├── data/                           # Raw datasets
│   └── <dataset>/
│       └── raw_data.json
│
├── scripts/                        # Bash scripts to run experiments
│   ├── <method>/                   # Full experiment scripts
│   │   └── <model>/
│   │       └── <dataset>.sh
│   └── quick_start/                # Quick verification scripts
│       └── <method>/
│           └── <model>/
│               └── <dataset>.sh
│
├── utils/                          # Shared utilities
│   ├── config_utils.py             # Result registration, output saving
│   ├── general_utils.py            # Seed locking, misc helpers
│   └── logger_utils.py             # Logger setup
│
├── requirements/                   # Python dependencies
│   └── basic_requirements.txt
│
├── example_output/                 # Example experiment outputs (for reference)
│   └── <dataset>/<method>/<model>/
│       ├── input_configs/          # Carbon copies of input configs
│   ├── raw_results/            # Fine-grained results
│       ├── backup/                 # Code snapshot at experiment time
│       ├── exp.log                 # Real-time log file
│       └── output.json             # Main results with processed_results
│
└── output/                         # Experiment outputs (generated, gitignored)
    └── <dataset>/<method>/<model>/
        └── ...                     # Same structure as example_output

Environment Setup

pip install -r requirements/basic_requirements.txt

Optionally, set up configs/global_setting.py with customized information like timezone, external storage dirs, access tokens, etc. You may consider running git update-index --skip-worktree configs/global_setting.py so that Git will stop tracking this file, preventing your locally stored tokens from accidentally being synced upstream.


Experiment Reproduction

We supply all scripts in the scripts folder, with a folder structure that clearly indicates which script is for which experiment. E.g., should one want to run the vanilla_huggingface method on the dummy_pokemon_qa dataset (a dummy, 10-question dataset I made out of this Quiiiz post) with the Qwen3-0.6B model, one may do so via:

bash scripts/quick_start/vanilla_huggingface/Qwen3-0.6B/dummy_pokemon_qa.sh <output_folder_root_dir>

Scripts under scripts/quick_start are designed to conclude quickly. They serve as proxies to confirm the code and environment are (likely) running fine before committing to bigger runs. For full experiments, add scripts following a similar structure, e.g., scripts/<method>/<model>/<dataset>.sh.


CLI Arguments

The pipeline entry points (pipeline/<method>/main.py) accept the following arguments:

| Argument | Required | Default | Description |
|---|---|---|---|
| --exp_desc | No | experiment | Experiment description for logging |
| --pipeline_config_dir | Yes | | Path to pipeline config JSON |
| --eval_config_dir | Yes | | Path to eval config JSON |
| --management_config_dir | Yes | | Path to management config JSON |
| --output_folder_dir | No | default_output_dir from global_setting.py | Path to output folder |
| --overwrite | No | allowed | Overwrite behavior if output folder exists |
| --job_post_via | No | terminal | Job submission method |
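As a minimal sketch, the CLI surface above could be declared with argparse roughly as follows (the function name and defaults handling here are illustrative assumptions; the actual parsing lives in pipeline/pipeline_utils.py and may differ):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of the arguments documented above.
    parser = argparse.ArgumentParser(description="Run a pipeline experiment.")
    parser.add_argument("--exp_desc", default="experiment",
                        help="Experiment description for logging.")
    parser.add_argument("--pipeline_config_dir", required=True,
                        help="Path to pipeline config JSON.")
    parser.add_argument("--eval_config_dir", required=True,
                        help="Path to eval config JSON.")
    parser.add_argument("--management_config_dir", required=True,
                        help="Path to management config JSON.")
    parser.add_argument("--output_folder_dir", default=None,
                        help="Falls back to default_output_dir from global_setting.py.")
    parser.add_argument("--overwrite", choices=["allowed", "disabled"],
                        default="allowed",
                        help="Overwrite behavior if output folder exists.")
    parser.add_argument("--job_post_via", choices=["terminal", "slurm_sbatch"],
                        default="terminal",
                        help="Job submission method.")
    return parser
```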

--overwrite

Controls behavior when the output folder already exists with contents:

| Value | Behavior |
|---|---|
| allowed (default) | Logs a warning and proceeds, overwriting existing files |
| disabled | Fails with FileExistsError if <output_folder_dir> already has files |

If <output_folder_dir> exists but is empty, the pipeline proceeds regardless of this setting.
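The check described above can be sketched as a small helper (the function name is a hypothetical stand-in, not the actual repo code):

```python
import logging
from pathlib import Path

def check_output_folder(output_folder_dir: str, overwrite: str = "allowed") -> None:
    # Hypothetical sketch of the --overwrite behavior described above.
    folder = Path(output_folder_dir)
    if not folder.exists() or not any(folder.iterdir()):
        return  # missing or empty folder: proceed regardless of --overwrite
    if overwrite == "disabled":
        raise FileExistsError(f"{output_folder_dir} already has files.")
    # overwrite == "allowed": warn and proceed, existing files may be clobbered
    logging.warning("Output folder %s is non-empty; files may be overwritten.",
                    output_folder_dir)
```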

--job_post_via

Indicates how the job was submitted, affecting metadata logging:

| Value | Behavior |
|---|---|
| terminal (default) | Standard terminal execution |
| slurm_sbatch | Registers SLURM job info (job ID, node, etc.) in config |
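When a job is submitted via sbatch, SLURM exposes its metadata through standard environment variables (SLURM_JOB_ID and the like); a sketch of how such info might be collected for registration (which variables the pipeline actually records is an assumption here):

```python
import os

def collect_slurm_info() -> dict:
    # SLURM_JOB_ID, SLURM_JOB_NODELIST, etc. are standard environment
    # variables set by sbatch; record whichever ones are present.
    keys = ["SLURM_JOB_ID", "SLURM_JOB_NODELIST", "SLURM_JOB_PARTITION"]
    return {key: os.environ[key] for key in keys if key in os.environ}
```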

Result Digestion

Once an experiment is running — and assuming you are using configs/management_config/default.json as the management_config — you may monitor real-time printouts in the terminal as well as in the exp.log file under <output_folder_dir>. Once the experiment concludes, the final results can be found in the same <output_folder_dir> folder.

Under <output_folder_dir>, one may expect the following components:

  • input_configs/ folder: Contains input_pipeline_config.json, input_eval_config.json, and input_management_config.json. These are carbon copies of the configs supplied to the script, saved here for easy replication since these configs essentially define an experiment.
  • backup/ folder: Contains all files defined under the backup_scope key of the passed management_config. This directly connects the experiment with the code implementation it ran on. The backup logic: only files/directories listed in inclusion_list are backed up. When backing up a directory, its contents are filtered by exclusion_list: any path matching an exclusion pattern (as a prefix like configs/ or a path component like __pycache__/) is skipped. The one exception: a path listed exactly in inclusion_list is always included, even if it matches an exclusion pattern.
  • output.json: This file merges the above input configs with management information (e.g., start/end time of a job, arguments passed, etc.). Most importantly, it reports the main metrics under the key processed_results. We purposely store the main results alongside the configs so that there is no mistake in connecting the results with their experiment setting.
  • raw_results/raw_results.json: This file registers the fine-grained results of the concluded experiment, including the individual score of each output and all newly generated tokens for each input, for monitoring/debugging purposes. The raw_results/ folder is also the place to dump additional materials for storage/inspection purposes, though not necessarily something you'd check for every experiment.
  • exp.log: This is a carbon copy of the real-time printouts to the terminal, updated in real time during the experiment run.
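The backup filtering rules can be sketched as a predicate (a hypothetical re-implementation for illustration; the actual logic lives in the management code and may differ):

```python
from pathlib import PurePosixPath

def should_back_up(path: str, inclusion_list: list[str],
                   exclusion_list: list[str]) -> bool:
    # Hypothetical sketch of the backup_scope rules described above.
    if path in inclusion_list:
        return True  # exact inclusion always wins over exclusions
    parts = PurePosixPath(path).parts
    for pattern in exclusion_list:
        pat = pattern.rstrip("/")
        # match as prefix (configs/) or as a path component (__pycache__/)
        if path.startswith(pattern) or pat in parts:
            return False
    # otherwise, back up the path iff it falls under an included directory
    return any(path == inc or path.startswith(inc.rstrip("/") + "/")
               for inc in inclusion_list)
```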

Codebase Design and Contribution

Our codebase mainly revolves around three configs — pipeline_config, eval_config, and management_config — where the first defines a method, the second defines an evaluation scheme, and the third defines output folder structure settings. Their code implementations can be respectively found under the pipeline and eval folders. We keep the eval implementation "singleton" for each dataset to ensure datasets are fairly tested among different pipelines. Yet, for pipeline implementations, we intentionally repeat some code components: we want to keep each pipeline/method folder relatively self-contained for better readability, and everything loosely coupled to support intervention requests coming from different angles.

Generally speaking:

  • Modules that can be reused without hurting necessary self-containment are abstracted into a _utils.py file, like pipeline/pipeline_utils.py, eval/eval_utils.py, etc.
  • Every task should be executed by running the main.py file under the corresponding pipeline/<method>/ folder.

(We provide an exemplary experiment output under the example_output folder for your reference.)

Adding a new dataset for evaluation

  1. Add raw data to dataset_dir under configs/global_setting.py.
  2. Create a new configs/eval_config/<dataset>/default.json (or a similarly descriptive name). Define moving factors (like prompt templates) under the prompts key rather than hardcoding them in code.
  3. Create eval/<dataset>/ with main.py (prepare inputs) and <dataset>_utils.py (format prompts using templates from config, extract answers, etc.).
  4. Create pipeline/<method>/<dataset>/eval.py for each method you'd care to evaluate this dataset against.
  5. Register the dataset in pipeline/<method>/main.py.
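As a sketch of step 3, a hypothetical eval/<dataset>/<dataset>_utils.py might look like the following (the function names and the prompts config shape are illustrative assumptions, not the repo's actual API):

```python
def format_prompt(question: str, eval_config: dict) -> str:
    # Fill the prompt template defined under the eval config's `prompts`
    # key, rather than hardcoding the template in code.
    template = eval_config["prompts"]["qa_template"]  # e.g. "Q: {question}\nA:"
    return template.format(question=question)

def extract_answer(generation: str) -> str:
    # Keep only the first non-empty line of the model's continuation
    # as the predicted answer.
    for line in generation.splitlines():
        if line.strip():
            return line.strip()
    return ""
```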

Adding a new method as part of a pipeline

  1. Create a pipeline/<method>/ folder with main.py (entry point). Note this /<method>/ folder can be nested at different depths, e.g., we may have pipeline/quantization/kv_cache/kivi/. This allows for different levels of _utils.py modularization.
  2. Optionally, create inference.py (model loading, generation) under pipeline/<method>/ if it is an inference-only task. For a training job, a train.py is recommended to host the main training loop, etc.
  3. Add <method>/<dataset>/eval.py files for each dataset you'd care to train/evaluate against.
  4. Create configs/pipeline_config/<method>/<model>/<dataset>.json for each task configuration.
  5. Add scripts to scripts/, typically following the <method>/<model>/<dataset>.sh structure.
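Steps 1 and 3 together imply that each method's main.py dispatches to per-dataset eval modules; a hypothetical skeleton (the registry dict and function names are assumptions for illustration, not the actual registration mechanism):

```python
def run(dataset: str, pipeline_config: dict, eval_config: dict) -> dict:
    # Dispatch to the per-dataset eval logic registered for this method,
    # i.e. pipeline/<method>/<dataset>/eval.py. The registry below is an
    # illustrative stand-in with a placeholder eval.
    registry = {
        "dummy_pokemon_qa": lambda p, e: {"exact_match": 1.0},
    }
    if dataset not in registry:
        raise ValueError(f"Dataset {dataset} is not registered for this method.")
    return registry[dataset](pipeline_config, eval_config)
```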

About

A minimalistic coding template for reproducible, modular, and AI-pilled ML research project development.
