Skip to content

Codebase review: prioritized recommendations across architecture, testing, CI, deps, CLI, docs & performance #2516

Description

@justinchuby

Microsoft Olive — Read-Only Codebase Review

Review date: 2026-06-12
Review scope: /home/justinchu/dev/Olive (olive-ai 0.11.0.dev0)
Review method: Eight-dimension parallel read-only static analysis (Architecture / Code Quality / Testing / Documentation / CLI / Dependencies / CI-CD / Performance). This report de-duplicates and synthesizes findings from all eight analyses into a unified priority ranking.
Hard constraint: Strictly read-only. No files in the repository were modified during this review.

This is a read-only static-analysis review. No files in the repository were modified during this review.


Executive Summary

Olive is a well-organized, clearly abstracted multi-backend model optimization toolkit. The core abstractions — Pass / Engine / System / Evaluator / Search — have well-defined inheritance contracts, a consistent Pydantic v2 configuration system, and a CLI entry point strongly synchronized with the code through argparse and auto-generated docs. However, the eight-dimension review uncovered several systemic issues concentrated in three areas: weak quality safeguards, layer-coupling leakage, and unreliable user-visible behavior.

Overall health: Above-average design paired with below-average quality guardrails. Without adding CI test gates, fixing the or True permanent skip, and tightening dependency upper bounds, a single upgrade of numpy / torch / transformers could cause production-grade regressions that go undetected.

Top 5 High-Priority Issues (ranked by risk):

  1. Test suite barely wired into CI: Of 184 test files, only 1 (test/cli/test_cli_test_model_smoke.py) runs in GitHub Actions. All core modules — passes / engine / search / evaluator / model / systems — have zero CI coverage. Any regression will only be discovered through user reports. (findings-tests item 1, findings-ci §2.1)
  2. Critical dependencies have no upper bounds + ruff target contradicts python_requires: requirements.txt lists numpy / torch / transformers / onnx / onnxscript without upper bounds, yet test files already document known breaking changes at onnx<1.21.0, torch<2.11.0, transformers<5.4.0, onnxscript<0.6.1; meanwhile pyproject.toml ruff target-version="py39" contradicts setup.py python_requires=">=3.10", silently invalidating 3.10+-specific lint rules. (findings-deps H1/H2/M1/M2, L3)
  3. Permanently skipped tests coexist with production code marked "incorrect": test_slicegpt.py:16 uses or True to force-skip and hide failures; graph_surgeries.py:170,184 RenameInputs / RenameOutputs carry a "This is incorrect" TODO yet remain registered in the surgeon table and are user-accessible; the auto-opt deprecation warning is never shown because the default --log_level=3 (ERROR) is higher than WARNING. All three share the same pattern: users and developers cannot see "known-bad" state. (findings-tests item 3, findings-quality item 8, findings-cli §10.1)
  4. Critical failures in olive/cache.py are silently swallowed: cache_model / load_model / cache_run and other critical paths use except Exception: logger.exception(...) and then continue executing rather than re-raising, creating a "cache miss" illusion and hard-to-diagnose downstream errors; cache.py also creates a reverse dependency by importing olive.model.ModelConfig and olive.passes.onnx.common.resave_model, breaking the layering. (findings-quality item 2, findings-arch item 2)
  5. systems → workflows and passes → evaluator reverse dependencies: DockerSystem directly imports olive.workflows.run.config.RunConfig; workflow_runner.py calls from olive.workflows import run; two ONNX passes (inc_quantization.py, session_params_tuning.py) construct OliveEvaluatorConfig directly inside _run_for_config to invoke the evaluator. Both reverse-dependency patterns prevent systems / passes from being tested or evolved independently. (findings-arch items 1 and 3)

All high / medium / low priority recommendations are listed below, grouped by dimension. Related issues (e.g., multiple missing auto-opt deprecation notices, conflicting onnxscript bounds, CLI snake/kebab mixing) are merged into single items retaining all file:line references.


High-Priority Recommendations

Inclusion criteria: correctness issues / security / user-blocking / data loss / CI gaps / severe developer-experience barriers.

Architecture

[Quick Fix] Disable or fix Surgeons marked "incorrect" in graph_surgeries.py

  • Location:
    • olive/passes/onnx/graph_surgeries.py:170-195 (RenameInputs / RenameOutputs class definitions, carrying # TODO(anyone): This is incorrect, remove or fix)
    • olive/passes/onnx/graph_surgeries.py:3273-3278 (registered in the surgeon table, user-accessible via configuration)
  • Current State: Both Surgeon subclasses have inline comments admitting they are "incorrect" yet remain exposed to users as valid surgeons.
  • Impact: If a user specifies surgeon: "RenameInputs", a known-broken code path executes without error, silently producing an incorrect ONNX model.
  • Recommended Action: Immediately raise NotImplementedError("RenameInputs is known to be incorrect; see TODO at line 170") inside _run_for_config, or remove the entries from the surgeon registry; restore once fixed.

[Refactor] Eliminate the reverse dependency from olive.systems to olive.workflows

  • Location: olive/systems/docker/docker_system.py:22 (from olive.workflows.run.config import RunConfig), olive/systems/docker/workflow_runner.py:8 (from olive.workflows import run as olive_run)
  • Current State: The systems layer directly imports the workflows layer, forming a systems ↔ workflows circular dependency.
  • Impact: systems cannot be tested or released independently of workflows; any workflows refactor breaks systems.
  • Recommended Action: Define a DockerRunSpec (containing only runtime-agnostic fields like host/target/accelerator/resources) inside systems/; let workflows convert RunConfig to DockerRunSpec before passing it in; move workflow_runner.py from systems/docker/ to workflows/ or cli/.

[Refactor] Eliminate the reverse dependency from olive/passes to olive/evaluator

  • Location:
    • olive/passes/onnx/inc_quantization.py:19-21
    • olive/passes/onnx/session_params_tuning.py:17-19
    • Both passes construct OliveEvaluatorConfig directly inside _run_for_config
  • Current State: The pass layer (which should only perform model transformations) directly invokes evaluation logic, violating the "passes are pure transformations" contract and is inconsistent with other pass interfaces.
  • Impact: Both passes cannot be reused or unit-tested without evaluator configuration; there is no unified mechanism for adding evaluation-feedback-driven passes.
  • Recommended Action: Add an evaluation_callback: Optional[Callable] parameter to the Pass base class, injected by the Engine at dispatch time; passes obtain evaluation results via the callback and are prohibited from directly import olive.evaluator.

[Refactor] Eliminate olive/cache.py reverse dependencies on domain layers

  • Location: olive/cache.py:22 (from olive.model.config.model_config import ModelConfig), olive/cache.py:483 (lazy import from olive.passes.onnx.common import resave_model), olive/model/handler/mixin/mlflow.py:10 (from olive.cache import OliveCache)
  • Current State: cache.py should be infrastructure, yet it depends on ModelConfig and a concrete pass; inversely, MLFlowTransformersMixin depends on OliveCache.from_cache_env() to read the cache directory from environment variables.
  • Impact: Cache cannot be tested in isolation; MLFlow handler has an implicit runtime environment variable dependency, producing obscure initialization failure messages.
  • Recommended Action: Replace the resave_model call with a protocol/hook injected by the caller (engine); have cache_model accept a serialized dict instead of the ModelConfig type; move MLFlow model-merge logic into Engine.initialize() or a pre-processing step in LocalSystem.run_pass, or switch to an explicit cache_dir: Path parameter.

Code Quality

[Quick Fix] Multiple critical failures silently swallowed in olive/cache.py

  • Location: olive/cache.py:188, 202, 229, 242, 266, 279, 679, 684, 701, 774
  • Current State: cache_model / load_model / cache_run / load_run_from_model_id and other critical paths use except Exception: logger.exception(...) and then continue executing without re-raising.
  • Impact: Cache write failures are invisible to users; downstream code continues running on the assumption the cache succeeded, producing hard-to-diagnose secondary errors.
  • Recommended Action: Change cache write operations to logger.warning and expose write status in return values / callers; true exceptions in read operations (not FileNotFoundError) must propagate upward; document in an EXCEPTIONS_TO_RAISE whitelist.

[Quick Fix] Bare exception catch in _run_search loses critical context

  • Location: olive/engine/engine.py:437-446
  • Current State: In search mode, any exception from _run_passes is caught by a global except Exception and only logged via logger.warning(...) without exc_info=True.
  • Impact: Bugs during search (memory errors, user script exceptions) are silenced; users only see "No output model produced" instead of the real error.
  • Recommended Action: Use logger.warning(..., exc_info=True); explicitly document the EXCEPTIONS_TO_RAISE list (ensure OlivePassError and similar always propagate through).

[Refactor] ~90% of the 4144-line hadamard_utils.py is hand-written matrix literals

  • Location: olive/passes/pytorch/hadamard_utils.py (4144 lines; get_had172() 1207 lines, get_had156() 939 lines, get_had140() 843 lines, get_had108() 435 lines — all +1, -1, ... literal data)
  • Current State: The largest file in the repository; its content is static data incorrectly embedded as code.
  • Impact: Slows IDE/lint time; misleads reviewers; makes matrix content impossible to diff; adds cold-start overhead.
  • Recommended Action: Serialize the matrices as .npy or .safetensors, ship them as package data, and load via np.load(importlib.resources.files(...)). Over 90% of the source file content can be removed.

Testing

[Quick Fix] or True in test_slicegpt.py permanently skips tests and hides real failures

  • Location: test/passes/pytorch/test_slicegpt.py:16-17
    @pytest.mark.skipif(
        (sys.version_info < (3, 10) and not torch.cuda.is_available()) or True,
        reason="requires python3.10 or higher",
    )
  • Current State: or True makes the condition always true; the comment # TODO(team): Failed in pipeline (linux gpu). Need to investigate. shows this is a workaround to hide failures.
  • Impact: Zero effective test coverage for the SliceGPT pass; the pattern may be copied into other files.
  • Recommended Action: Immediately investigate and fix the root cause; if GPU is required, change to @pytest.mark.skipif(not torch.cuda.is_available(), ...); if network is required, replace with a local small model; remove or True after fixing.

[Refactor] Add a test-unit.yml workflow covering core modules

  • Location: .github/workflows/test-model-fast.yml:39 only runs pytest test/cli/test_cli_test_model_smoke.py; the remaining 183 test files are not executed in any CI.
  • Current State: passes / engine / search / model / evaluator / systems / hardware — all core directories have zero CI protection.
  • Impact: Any regression can only be discovered through manual testing or production exposure.
  • Recommended Action: Add .github/workflows/test-unit.yml covering at least the hardware/network-dependency-free tests in test/engine/, test/search/, test/model/, test/evaluator/, test/hardware/, test/passes/onnx/; use workflow_dispatch or a separate matrix for GPU/QNN/OpenVINO suites.

[Quick Fix] Three locations perform direct HF downloads with no skip guard

  • Location:
    • test/passes/onnx/test_extract_adapters.py:37 (AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM"), module scope)
    • test/passes/onnx/test_nvmo_graph_surgery.py:34 (AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B"), ~700 MB)
    • test/cli/test_cli.py:652 (HF download with no skip guard)
  • Current State: When the network is unavailable, tests emit a connection error rather than skipping; large model downloads significantly slow CI.
  • Recommended Action: Register a @pytest.mark.network marker (see CLI/CI items) and uniformly guard with skipif(not _network_available()); prefer local synthetic models (see test_cli_test_model_smoke.py::_save_local_tiny_llama).

Documentation

[Quick Fix] docs/source/conf.py version is inconsistent with the actual package version

  • Location: docs/source/conf.py:22 (version = "0.12.1") vs olive/version.py:1 (__version__ = "0.11.0.dev0")
  • Current State: Hard-coded version number is one minor version ahead of the actual package version.
  • Impact: The version switcher shows the wrong version; documentation does not match release artifacts; users and contributors reference incorrect APIs.
  • Recommended Action: Switch to dynamic reading: from olive.version import __version__ as version, and set release = version accordingly.

[Quick Fix] docs/architecture.md Pass base class API has changed but documentation has not been updated

  • Location: docs/architecture.md:47-48 (example uses @abstractmethod def default_config) vs olive/passes/olive_pass.py:318 (actual: _default_config, private method with different signature)
  • Impact: New contributors following the documentation to implement a custom Pass will get an "abstract method not implemented" error.
  • Recommended Action: Update the example to _default_config(cls, accelerator_spec); also update docs/architecture.md:79-87 — the outdated list of only 9 passes should be replaced with a link to reference/pass.rst; fix the Unicode em-dash at docs/architecture.md:363 (python –m olive.workflows.runolive run --config).

[Quick Fix] Mark auto-opt as deprecated at all entry points and fix the default log level

  • Location:
    • olive/cli/auto_opt.py:184 emits deprecation warning via logger.warning(...)
    • olive/cli/base.py:394 default --log_level=3 (ERROR) is above WARNING (2), so the warning is never shown by default
    • docs/source/how-to/cli/cli-auto-opt.md:1 does not note the deprecation
    • docs/source/reference/cli.rst:91-92 does not note the deprecation
    • notebooks/olive_quickstart.ipynb:42 still installs olive-ai[auto-opt]
  • Current State: The command is marked deprecated, but users see no indication of this at any of three entry points (CLI runtime, documentation, notebook).
  • Impact: Users continue learning and integrating a deprecated command; widespread breakage when it is eventually removed.
  • Recommended Action: ① Change logger.warning to logger.error or print print("DEPRECATED: use 'olive optimize'", file=sys.stderr) at subcommand registration; ② add > ⚠️ Deprecated: use \olive optimize`at the top ofcli-auto-opt.mdandcli.rst; ③ insert a deprecation notice at the top of --helpoutput; ④ update the quickstart notebook to useolive optimize`.

CLI / UX

[Quick Fix] --num-splits and other parameters globally mix kebab-case and snake_case

  • Location:
    • olive/cli/auto_opt.py:143-170 (kebab: --num-splits, --cost-model, etc.)
    • olive/cli/optimize.py:100-184 (snake: --num_split, --dim_param, etc.; synonym parameters named differently from auto_opt.py)
    • olive/cli/run.py:25-41 (mixed in same file: --run-config, --package-config kebab; --list_required_packages snake)
    • olive/cli/base.py:649-667 (add_shared_cache_options uses --account_name/--container_name) vs olive/cli/shared_cache.py:37-44 (--account/--container)
  • Current State: The same concept uses different separators across different commands and even within the same file; Azure credential parameter names differ between two implementations — an observable bug.
  • Impact: Users switching between commands must consult help every time; scripts are hard to reuse; Azure credential confusion is a real, observable bug.
  • Recommended Action: Standardize on snake_case (consistent with the majority of existing CLI); use argparse add_argument(..., dest=...) to retain kebab-case aliases for backward compatibility, printing a DeprecationWarning during the transition period. Standardize Azure credentials as --account_name / --container_name.

[Quick Fix] type=bool in capture-onnx-graph does not work as expected

  • Location: olive/cli/capture_onnx.py:152-163 (--exclude_embeds, --exclude_lm_head, --enable_cuda_graph, all type=bool, default=False)
  • Current State: argparse's type=bool evaluates any non-empty string as True (bool("false") == True), so --exclude_embeds false still yields True.
  • Impact: Users believe they are disabling a feature when in fact it is being enabled — silently incorrect behavior.
  • Recommended Action: Change to action="store_true", or define an explicit def str2bool(v): ... that accepts true/false/0/1.

[Quick Fix] CLI ValueError thrown directly causes users to see tracebacks

  • Location: olive/cli/base.py:217-218, 375, olive/cli/auto_opt.py:248-249, 300, olive/cli/optimize.py:260-290, olive/cli/capture_onnx.py:237-241, olive/cli/run_pass.py:134
  • Current State: Argument validation errors raise ValueError, triggering a full Python traceback and exit(1); this is mixed with argparse's parser.error() (exit 2).
  • Impact: Poor user experience; scripts cannot distinguish argument errors from runtime errors via exit code.
  • Recommended Action: Standardize argument validation to use parser.error("..."); retain ValueError only for genuine runtime errors.

[Quick Fix] Mixed print() and logger.* in CLI; multiple debug prints pollute stdout

  • Location: 40+ print calls throughout olive/cli/, most notably:
    • olive/cli/quantize.py:145 print(f"pass list: {pass_list}")
    • olive/cli/quantize.py:184 print(f"selected pass configs: {passes_dict}")
    • olive/cli/quantize.py:129-132 uses print for warnings
    • olive/cli/run_pass.py:198-200 uses print for errors then calls exit(0)
    • olive/cli/base.py:107-125 mixes print and logger in the same function
  • Current State: Debug-level output cannot be controlled via --log_level; error output goes to stdout instead of stderr; hard to parse in pipelines.
  • Recommended Action: Standardize: use print for user-facing success/progress/result paths; use logger.* (going to stderr) for diagnostics/warnings/errors; remove obvious debug prints; the error path in run_pass.py:198-200 should call sys.exit(1).

Dependencies / Packaging

[Quick Fix] Add upper bounds to critical dependencies in requirements.txt

  • Location: requirements.txt:1-14 and test/requirements-test.txt:18,24,40, test/requirements-test-gpu.txt:4, test/requirements-test-openvino.txt:6,7
  • Current State: Production requirements.txt lists numpy, torch, transformers, onnx, onnxscript without upper bounds; tests already document known breaking changes at onnx<1.21.0 (symlink issue), torch<2.11.0, transformers<5.4.0, onnxscript<0.6.1.
  • Impact: Any upstream major release may cause a fresh pip install olive-ai to fail immediately; poor reproducibility in user environments.
  • Recommended Action: Sync the test-documented upper bounds into production requirements.txt (e.g., numpy<2, onnx<1.21.0, torch<2.11.0, transformers<5.4.0, onnxscript>=0.5.3,<0.6.1); explicitly raise bounds via dependabot/manual PR in each subsequent release.

[Quick Fix] Fix the inconsistency between ruff target-version and python_requires

  • Location: pyproject.toml:84 (target-version = "py39") vs setup.py:77 (python_requires=">=3.10"); and comments at pyproject.toml:135, 176
  • Current State: ruff still checks against py39, missing lint rules specific to 3.10+ syntax (e.g., stricter B905 patterns, PEP 604 union types).
  • Impact: Lint false negatives; certain anti-patterns may slip through.
  • Recommended Action: Change to target-version = "py310"; remove legacy comments advising to "avoid" 3.10 constructs and re-audit the ignore list.

[Quick Fix] Remove or update the uninstallable tf extra (tensorflow==1.15.0)

  • Location: extra_dependencies.tf in olive/olive_config.json, olive/model/handler/tensorflow.py:31 (load_model directly raises NotImplementedError), olive/cli/configure_qualcomm_sdk.py:21, olive/platform_sdk/qualcomm/configure/configure.py:84
  • Current State: TF 1.15.0 does not support Python 3.10+; it cannot be installed under the current python_requires; TensorFlowModelHandler itself is not implemented.
  • Impact: pip install olive-ai[tf] will always fail, signaling to users that "this extra is dead."
  • Recommended Action: Remove the tf extra, or update it to modern TF (only if actually needed); synchronously delete TensorFlowModelHandler or mark it deprecated.

CI/CD

[Quick Fix] All GitHub Actions references use unpinned SHAs (supply-chain risk)

  • Location: 10+ mutable tags across 4 workflows: lint.yml:19, 21, 32, 73, codeql.yml:51, 66, 80, 85, docs.yml:30, test-model-fast.yml:21
  • Current State: All actions use mutable tags (@v3 / @v4 / @v5), including two third-party actions (reviewdog/action-misspell@v1, advanced-security/dismiss-alerts@v1).
  • Impact: If a tag is hijacked or an account is compromised, arbitrary code can be executed on the runner — a standard GitHub Actions supply-chain attack vector.
  • Recommended Action: Pin all actions to SHA (e.g., uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2); add .github/dependabot.yml to enable automatic updates for github-actions and pip.

[Refactor] Enable pip caching and dependency security scanning

  • Location: lint.yml:49-53, test-model-fast.yml:23-27, docs.yml:32-36 (all missing cache: pip); no pip-audit/safety in .github/workflows/
  • Current State: Every CI run re-downloads ML dependencies (5–15 minutes); no dependency vulnerability scanning; CodeQL only scans source code.
  • Recommended Action: Add cache: "pip" and cache-dependency-path: ... to all setup-python@v5 steps; add pip-audit -r requirements.txt to the scheduled trigger in codeql.yml.

Performance

[Refactor] Passes / search points are strictly serial — no multi-core utilization in search mode

  • Location: olive/engine/engine.py:620-631 (_run_passes), olive/engine/engine.py:409-451 (_run_search), olive/search/search_strategy.py:210-257 (synchronous generator)
  • Current State: All passes within a single workflow are executed serially; in search mode each search point is evaluated completely serially, waiting for record_feedback_signal before generating the next.
  • Impact: Workflows with 5+ passes × hundreds of search points have end-to-end time that is purely additive; multi-core CPUs sit idle.
  • Recommended Action: In search mode, introduce concurrent.futures.ProcessPoolExecutor for batch evaluation of independent search points at the same level; introduce an async feedback mechanism so the sampler can submit batches simultaneously.

[Refactor] PythonEnvironmentSystem spawns a new subprocess for every pass / evaluation

  • Location: olive/systems/python_environment/python_environment_system.py:61-93
  • Current State: Every run_pass / evaluate_model creates a temporary directory, writes JSON, calls subprocess.run to start a new Python interpreter, reads JSON, and deletes the directory.
  • Impact: In search mode with hundreds of iterations, each incurring 200ms–1s process startup overhead; cumulative JSON I/O is significant.
  • Recommended Action: Implement a persistent worker process (e.g., using multiprocessing.Process + Queue or ZeroMQ) that communicates via stdin/stdout/IPC and reuses the interpreter across iterations.

Medium-Priority Recommendations

Architecture / Code Quality

  • [Quick Fix] cache.py:216 pass_name: int type annotation is incorrect: The call site (engine.py:739) actually passes a str; change to pass_name: str.
  • [Quick Fix] Engine.__init__ negation-style parameter naming is inconsistent: engine.py:59,82 has no_artifacts: bool=False immediately assigned to internal skip_saving_artifacts; change to positive naming save_artifacts: bool=True or align the parameter name with the attribute name.
  • [Refactor] olive/evaluator/olive_evaluator.py is a 2059-line god module: Contains 10 classes (OliveEvaluator, per-backend evaluators, OliveEvaluatorConfig); split by backend into base.py, onnx.py, pytorch.py, openvino.py, qnn.py, lm.py, mteb.py, with __init__.py maintaining compatible exports.
  • [Refactor] olive/engine/engine.py is 844 lines with 4 TODOs: Mixes scheduling, caching, search expansion, evaluation, and packaging responsibilities; extract Scheduler / EvaluationRunner following the existing FootprintNode/WorkflowOutput approach.
  • [Refactor] olive/common/utils.py is a 721-line / 38-function grab-bag: Mixes hashing, subprocess, torch tensor, file I/O, and HF integration; split by responsibility into hashing.py/io_utils.py/torch_utils.py/dict_utils.py; move hf_repo_exists to olive/common/hf/.
  • [Refactor] olive/cli/base.py is 768 lines mixing ABC with utility functions: Extract add_input_model_options, add_hf_test_model_config, get_input_model_config, and similar into olive/cli/utils.py/olive/cli/model_options.py; keep only BaseOliveCLICommand in base.py.
  • [Quick Fix] olive/passes/onnx/common.py is depended on across backends: olive/passes/openvino/quantization.py:17 and olive/passes/quark_quantizer/quark_quantization.py:21 import model_proto_to_file / model_proto_to_olive_model / get_external_data_config; move backend-agnostic get_external_data_config to passes/common/; keep ONNX-specific utilities in passes/onnx/common.py.
  • [Quick Fix] PyTorch backend pass names lack a consistent prefix: olive/passes/pytorch/ contains Gptq, Rtn, LoRA, AutoAWQQuantizer, GptqQuantizer with inconsistent naming; uniformly add a Pytorch prefix (matching the ONNX backend convention) or add a lint check in Pass.__init_subclass__.
  • [Refactor] olive.search depends on MetricResult: olive/search/search_results.py:12, olive/search/search_strategy.py:23, olive/search/samplers/search_sampler.py:14 (though TYPE_CHECKING); define ScoreMap = dict[str, float] or a Protocol; have Engine convert MetricResult before passing it in.
  • [Quick Fix] olive.platform_sdk has a reverse dependency on olive.cli: olive/platform_sdk/qualcomm/configure/__main__.py:9 calls from olive.cli.launcher import legacy_call; inline the needed logic or define an independent run_configure() in platform_sdk.
  • [Quick Fix] Adapter handling code is duplicated in 3 places: olive/passes/pytorch/autoawq.py:131-138, autogptq.py:119-131, gptqmodel.py:119-128 are nearly identical; extract detach_adapter(model, backend_name) -> tuple[ModelHandler, Optional[str]] into olive/passes/pytorch/common.py.
  • [Refactor] HfModelHandler has a 5-level MRO plus dead composite_parent code: olive/model/handler/hf.py:29, olive/model/handler/pytorch.py:25 already carry # pylint: disable=too-many-ancestors; olive/model/handler/base.py:45 composite_parent has no consumers anywhere in the codebase. Change DummyInputsMixin and similar to composition (handler.dummy_input_provider: DummyInputProvider); delete composite_parent.

Testing

  • [Refactor] test_python_environment_system.py creates a real venv and runs pip install inside a fixture: test/systems/python_environment/test_python_environment_system.py:34-52; mock run_subprocess or move to test/systems/integration/ and add @pytest.mark.integration.
  • [Refactor] test_extract_adapters.py module-scope fixture runs an end-to-end pipeline: test/passes/onnx/test_extract_adapters.py:27-65 includes HF download + PEFT + ONNX export; replace with a local synthetic model, or mark as @pytest.mark.integration.
  • [Quick Fix] pyproject.toml registers only 1 marker and does not enable strict mode: Currently only openvino; add gpu, cuda, network, qnn, qairt, integration, and enable --strict-markers.
  • [Refactor] olive/telemetry/ and olive/platform_sdk/qualcomm/ have zero test coverage: Telemetry covers a user-privacy-critical path; add test/telemetry/test_telemetry.py with at minimum disable/enable toggle coverage and "no send when disabled"; remove the empty test/snpe/ shell or add a README, and add mock-SDK unit tests in test/passes/qairt/ or a new test/platform_sdk/.
  • [Refactor] test/conftest.py:15-30 autouse session fixture slows startup: Imports transformers and datasets, modifies transformers.utils.hub.TRANSFORMERS_CACHE globally; change to non-autouse explicit dependency, or configure via the TRANSFORMERS_CACHE environment variable in CI.
  • [Refactor] requirements-test.txt mixes in platform-specific heavy dependencies: cupy-cuda12x, nncf>=2.19.0, openvino>=2025.4.1, neural-compressor<2.4 in the base file cause install failures on CPU/AMD/Apple Silicon; split into a minimal base set plus requirements-test-{gpu,openvino,...}.txt.

Documentation

  • [Quick Fix] cli-auto-opt.md and reference/cli.rst:91-92 do not note the deprecation: (Already merged with the CLI high-priority item.)
  • [Quick Fix] how-to/index.md:8 mentions onnx-graph-capture but no corresponding how-to page exists: Add cli-capture-onnx-graph.md or remove the mention from the index.
  • [Quick Fix] NEWS.md was last updated in Feb 2025: README contains Aug/Oct 2025 entries; backfill v0.8.0, v0.9.0, v0.9.2 release entries for Mar–Oct 2025.
  • [Refactor] Improve docstring coverage across core modules: engine.py:49 (Engine.__init__), engine.py:241 (run_accelerator), pass_config.py:98 (BasePassConfig), onnx.py:45 (OnnxModelHandler.__init__) all lack docstrings; prioritize public abstract methods.
  • [Quick Fix] CONTRIBUTING.md / AGENTS.md lack docs build instructions: Add cd docs && pip install -r requirements.txt && make html.
  • [Quick Fix] docs.yml:41-42 inlines onnxruntime/torch without pinned versions: Move to docs/requirements.txt with pinned versions.

CLI / UX

  • [Quick Fix] --log_level accepts integer 0–4 with no choices=: olive/cli/base.py:390-398; change to string-based ["debug","info","warning","error","critical"] or at minimum add choices=range(5); also introduce -v/-q Boolean shortcut flags (consistent with git/pip/uv).
  • [Quick Fix] run command lacks --dry_run: olive/cli/run.py does not call add_save_config_file_options; inconsistent experience compared to optimize and other commands.
  • [Quick Fix] quantize --algorithm help text is semantically incorrect: olive/cli/quantize.py:65-66 says "List of", but the parameter accepts only a single value; change to "The quantization algorithm to use."
  • [Quick Fix] tune-session-params help references non-existent parameters: olive/cli/session_params_tuning.py:30-36 mentions --model / --hf_model_name, but the actual parameter is --model_name_or_path.
  • [Quick Fix] --disable_force_evaluate_other_eps is a double negative: olive/cli/session_params_tuning.py:78-84; change to --evaluate_all_eps (positive) with inverted default.
  • [Refactor] olive --help has no grouping: olive/cli/launcher.py:37-59 lists 19 subcommands flat, with auto-opt registered before optimize; group by "core workflows / model conversion / adapter management / utilities", with deprecated commands listed last.
  • [Quick Fix] @action decorator placement is inconsistent: olive/cli/run_pass.py:22-23 decorates the class, while other commands decorate the run() method (e.g., finetune.py:82); standardize to run().
  • [Quick Fix] diffusion_lora.py bypasses add_input_model_options: olive/cli/diffusion_lora.py:34-47 defines custom -m/--model_name_or_path, missing --trust_remote_code and --test; use the shared helper.
  • [Quick Fix] --memory help does not mention supported string formats: olive/cli/base.py:704-709 says "in bytes", but type=AcceleratorSpec.str_to_int_memory accepts "4GB"; add an example.

Dependencies / Packaging

  • [Quick Fix] setup.py:84 package_data references non-existent package olive.auto_optimizer: Dead entry; remove or confirm the package exists.
  • [Quick Fix] onnxscript upper and lower bounds conflict: Production requirements.txt:4 >=0.5.3 with no upper bound; tests test/requirements-test.txt:24 <0.6.1; unify by tightening.
  • [Quick Fix] openvino extras and test versions are misaligned: olive_config.json specifies openvino>=2025.4.1, nncf>=2.19.0; test/requirements-test-openvino.txt:6,7 specifies openvino>=2025.3.0, nncf>=2.18.0; align them.
  • [Quick Fix] Missing aggregate extras (all / dev): Makes one-command CI and developer installs inconvenient.
  • [Refactor] finetune/bnb/lora extras overlap: finetune ≈ bnb + lora + optimum is not expressed using olive-ai[bnb,lora] syntax; high maintenance cost to keep in sync.

CI/CD

  • [Quick Fix] docs.yml is missing a permissions: declaration: Add explicit permissions: contents: read; also add permissions: {} at the top level of each workflow as a default, elevating at the job level as needed.
  • [Quick Fix] optional-lint job has excess security-events: write: .github/workflows/lint.yml:16-17; that job does not upload SARIF; remove under the principle of least privilege.
  • [Quick Fix] lint.yml/test-model-fast.yml/docs.yml do not trigger on rel-*: Release candidate branches are not running tests or building docs; explicitly add rel-*.
  • [Refactor] Introduce a platform/Python version matrix: Currently all workflows run on ubuntu-latest + Python 3.12; add windows-latest and python-version: ["3.10","3.12"] (at minimum in lint and smoke tests).
  • [Refactor] Introduce pip-audit and mypy/pyright static type checking: Add pip-audit -r requirements.txt to the scheduled trigger in codeql.yml; incrementally enable mypy --ignore-missing-imports olive/ in lint.
  • [Quick Fix] pre-commit and CI lint check sets are inconsistent: PYLINT, TOML-SORT, EDITORCONFIG-CHECKER, NOQA only in CI; check-yaml, absolufy-imports only in pre-commit; unify by having pre-commit invoke lintrunner.

Performance

  • [Refactor] engine.py:306, 398 unnecessary model_dump() + validate_config() round-trip in search hot path: Use pass_config.model_copy() or pass by reference directly.
  • [Refactor] footprint.py:91-102, 230-254 every record() call triggers a full _resolve_metrics() — O(n²): Change to lazy evaluation (call only before create_pareto_frontier) or incremental computation.
  • [Refactor] footprint.py:270-303 Pareto frontier uses a double loop — O(n²): Noticeable at larger search spaces; replace with a divide-and-conquer Pareto algorithm or sorting-based approach.
  • [Refactor] DockerSystem.run_workflow destroys the container on every run: olive/systems/docker/docker_system.py:108-142; support container reuse.
  • [Quick Fix] local.py:46 ModelConfig.from_json(output_model.to_json()) is an unnecessary round-trip: Return output_model.to_model_config() directly.
  • [Refactor] onnx.load(self.model_path) loads large models fully into memory: olive/model/handler/onnx.py:110-111; in large LLM scenarios every pass does a full load; prefer load_external_data=False in lightweight inspection paths.
  • [Refactor] Multiple onnx.load calls on the same file inside quantization passes: olive/passes/onnx/quantization.py:265, 477, 510, 540; pass ModelProto within the pass instead of repeatedly re-reading from disk.
  • [Quick Fix] cache.py:185, 227, 263 JSON indent=4 increases I/O: Use indent=None in production mode.
  • [Quick Fix] cache.py:290 cache_id is only 8 hex characters: 32-bit space has collision risk in shared caches; extend to 16 characters or add a pass prefix.

Low-Priority Recommendations

Code Quality / Style

  • pyproject.toml globally disables all complexity rules in pylint: too-many-arguments / branches / locals / nested-blocks / return-statements / statements, broad-exception-caught, unused-argument are all globally disabled; exempt per module at most, not globally, to catch newly introduced violations. The same applies to ruff ignore: D100-D107 fully off, RUF013 off due to a "Python 3.10" comment (contradicting target=py39), PLW0603/B028 marked "temp" but now permanent.
  • olive/passes/onnx/conversion.py:41 file-level # pylint: disable=W0212: Scope is too broad; change to inline suppression.
  • olive/common/utils.py:76 internal variable md5_hash in hash_string but the algorithm is SHA-256: Misleading; rename to sha256_hash or hasher (same issue in hash_io_stream/hash_file/hash_dict).
  • 100+ untracked TODO/FIXME items: olive/engine/engine.py:560, olive/passes/pytorch/lora.py:360 (functional bug num_items_in_batch), olive/passes/onnx/graph_surgeries.py:170,184 ("incorrect"), three similar adapter issues in olive/passes/pytorch/*; enable ruff TD003 to require each TODO to link to an issue.
  • olive/engine/engine.py:462-485 _dump_run_history temporarily mutates global verbosity: Anti-pattern (thread-unsafe); control via module logger level or a parameter instead.
  • Return type annotation coverage across passes/ is ~33%: Annotate public and abstract methods with ->.
  • Multiple _run_for_config implementations exceed 200 lines: diffusers/lora.py:231 _train_sd (356), _train_flux (316), onnx/mnb_to_qdq.py:89 (271), openvino/optimum_intel.py:349 (246), onnx/split.py:42 (211), onnx/model_builder.py:210 (208), pytorch/selective_mixed_precision.py:394 compute_module_stats (207); extract _prepare_*/_post_process_* private methods.
  • olive/data/config.py:24, 44 DataConfig has 4 fixed component fields: Adding a new data component type requires changing the class; follow pass_config.py and change to components: dict[str, DataComponentConfig].

Testing

  • Engine / Workflow test names do not follow AGENTS.md convention: test/engine/test_engine.py:46,76,93,112, test/workflows/test_workflow_run.py use test_register, test_run, etc., lacking "expected behavior"; rename to test_<func>_<expected>[_when_<cond>].
  • Key subdirectories lack conftest files: test/engine/, test/search/, test/model/, test/systems/; convert factory functions in test/utils.py to conftest fixtures.
  • test_aimet_quantization.py uses skipif(CUDA_AVAILABLE) to exclude GPU environments: test/passes/onnx/test_aimet_quantization.py:144,197,223,269; use a TEST_DEVICE environment variable or a dedicated marker instead.
  • test/assets/ contains only user_script.py: No shared ONNX files or golden outputs; commit common dummy ONNX files as static assets to reduce runtime generation.
  • olive/exception.py / olive/logging.py have no tests: Add test/test_exception.py / test/test_logging.py.

Documentation

  • Minor README / getting-started issues: README.md:98 "the the"; README.md:71-79 instructions mention Llama-3.2-1B-Instruct but the actual command uses Qwen2.5-0.5B-Instruct; docs/source/getting-started/getting-started.md:55 "60secs" is a significant underestimate.
  • notebooks/README.md:3,6 typos: "ocmmand" → "command", "hot wo" → "how to".
  • docs/architecture.md is not in the Sphinx toctree: Users cannot find it from the documentation site; move into docs/source/ or add to the toctree.
  • docs/source/conf.py:65 suppress_warnings = ["myst.xref_missing"]: Suppresses missing internal cross-reference warnings, potentially masking broken links; at least keep a version with this enabled in CI.
  • docs/source/index.md:30 features/index links to .rst while others use .md: Style inconsistency.
  • No Python API example notebooks: All existing notebooks are CLI-based.

CLI / Exit Codes

  • Exit codes are not standardized: Argument errors use raise ValueError (exit 1) and parser.error() (exit 2) inconsistently; scripts cannot distinguish them.
  • run_pass.py:183-200 _list_passes has no --json format: Inconvenient for tool integration.
  • Internal single-letter pass keys leak into --save_config_file output: capture_onnx.py:340-347 ("c"/"m"/"b"/"f"), generate_adapter.py:89 ("e"), finetune.py:159 ("f"); change to readable key names.
  • --modality choices=["text"] has only one option: olive/cli/optimize.py:155-163; hide or mark as internal.
  • session_params_tuning.py:149 prompts to "set log_level to 1" but 1 is INFO; debugging requires DEBUG (0).

Dependencies / Packaging

  • pyproject.toml lacks [build-system] and [project]: Not PEP 517/518 compliant; inconvenient for dependabot/pip inspect.
  • triton is a hard dependency in bnb/finetune extras: Linux + CUDA only; will fail in non-GPU environments.
  • hf-xet is a core dependency: It is a performance acceleration package, not a functional dependency; consider moving to an optional extra.
  • MANIFEST.in does not explicitly include olive data files: sdist may miss data files (wheel is fine via package_data).
  • setup.py:47-48 is missing macOS classifier: Olive is pure Python and can run on macOS.
  • setup.py:40 "3 - Alpha": Consider advancing to Beta.
  • test/requirements-test.txt:14 neural-compressor<2.4 has no explanatory comment.

CI/CD / Performance / Other

  • .github/workflows/test-model-fast.yml does not upload failure artifacts: Logs are destroyed when the runner exits; add actions/upload-artifact@v4 with if: failure().
  • lint.yml:72 SARIF upload has continue-on-error: true: Silent failure may silently drop security scan results.
  • docs.yml:43 make linkcheck may block PRs due to unavailable external links: Consider running only on schedule trigger or adding --ignore patterns.
  • quantization.py:306-337 deepcopy on a module-level config dict: Minor overhead; can be changed to a shallow copy.
  • cache.py:795-806 evaluation cache is keyed only on model_id, without an evaluator configuration hash: If a user changes the evaluation dataset but the pass configuration is unchanged, the cache does not invalidate; introduce an evaluator hash.
  • search/search_results.py:36 uses None-padded sparse list: Change to dict[int, tuple].
  • optuna_sampler.py:73-78 while True loop skips duplicate points: May produce many wasted trials when the search space is nearly exhausted.
  • OnnxQuantization._initialize() depends on destructor to clean up temporary directory: quantization.py:289; exception paths may leak; change to a contextmanager or try/finally.
  • olive/passes/openvino/optimum_intel.py:275 silently except ImportError: pass: Users see an AttributeError instead of a "missing dependency" message.
  • telemetry_extensions.py:57, 126 @action uses inspect.stack(): Currently only decorates Engine.run() so risk is limited; future expansion will require session-level caching.

Appendix: Per-Dimension Investigation Summaries

Architecture (findings-arch.md)

Olive has clear inheritance contracts and a Pydantic v2 configuration foundation across the Pass / Engine / System / Evaluator / Search / Model layers — a reasonable overall structure for a mid-to-large project. The main architectural debt concentrates in reverse dependencies (systems → workflows, cache → model/passes, passes → evaluator, platform_sdk → cli, model/handler/mixin → cache) and god modules (olive_evaluator.py at 2059 lines, engine.py at 844 lines, cli/base.py at 768 lines, common/utils.py at 721 lines). The HfModelHandler 5-level MRO and dead composite_parent code need cleanup. Full 14 findings in findings-arch.md.

Code Quality (findings-quality.md)

Key metrics: largest single function is 1207 lines (get_had172), 68 bare except Exception catches, 100+ TODO/FIXME items, pylint globally disabling 37 complexity rules. The most critical issues are critical failures silently swallowed in cache.py, missing exc_info in _run_search exception handling, and the matrix data embedded as code in hadamard_utils.py. The CLI layer mixes print/logger, and the "incorrect" surgeons in graph_surgeries.py remain production-accessible. Full 17 findings in findings-quality.md.

Testing (findings-tests.md)

Of 184 test files, only 1 runs in CI; test_slicegpt.py has an or True permanent skip; test_python_environment_system.py and test_extract_adapters.py are end-to-end integration tests disguised as unit tests; 3 locations perform HF downloads with no skip guard; telemetry / platform_sdk have zero tests; pyproject.toml registers only the openvino marker. Engine/Workflow test method names do not follow AGENTS.md convention. Full 15 findings in findings-tests.md.

Documentation (findings-docs.md)

README + Sphinx main structure is good (CLI docs strongly synchronized with code via sphinxarg.ext), but there are 3 P0 issues: conf.py version 0.12.1 vs actual 0.11.0.dev0; architecture.md Pass base class API has changed (default_config_default_config) without being updated; auto-opt deprecated status is not communicated at any of three entry points (runtime warning filtered by default log_level, cli-auto-opt.md, reference/cli.rst). Docstring coverage is 22–47%. Full list in findings-docs.md.

CLI (findings-cli.md)

The biggest issue is inconsistent parameter naming style (kebab-case and snake_case mixed across different commands and even within the same file, with the Azure credential parameter names --account_name vs --account being a real observable bug); type=bool in capture-onnx-graph does not work as expected; multiple raise ValueError calls expose raw tracebacks; debug print calls in quantize.py pollute stdout; --log_level uses integers without a --verbose/--quiet shortcut; olive --help lists 19 subcommands flat with no grouping. Full 10 sections in findings-cli.md.

Dependencies / Packaging (findings-deps.md)

H1: ruff target-version="py39" contradicts python_requires=">=3.10"; H2: numpy/torch/transformers/onnx have no production upper bounds but tests document known breaking changes; H3: tf extra tensorflow==1.15.0 cannot be installed on Python 3.10+. Medium-priority issues: onnxscript bounds conflict, openvino extras/test version mismatch, dead olive.auto_optimizer entry in setup.py package_data, missing aggregate all extra. Full 11 sections in findings-deps.md.

CI/CD (findings-ci.md)

4 workflows (lint/codeql/docs/test-model-fast). Core issues: unit tests are almost entirely absent from CI (183/184 not running); all GitHub Actions references use unpinned mutable tags (supply-chain risk); no Dependabot; no platform/Python version matrix (all ubuntu+3.12); no pip caching; no static type checking or dependency security scanning; pre-commit and CI lint check sets are inconsistent. Full 12 sections in findings-ci.md.

Performance (findings-perf.md)

13 performance anti-patterns, with 3 high-severity: search/pass execution is fully serial with no parallel evaluation, PythonEnvironmentSystem spawns a new subprocess per iteration, ONNX models are loaded fully into memory with no streaming. Medium-severity issues include Pydantic hot-path round-trips, O(n²) Pareto frontier, and repeated onnx.load calls inside quantization passes. Adding new Passes / Systems / Samplers is low-cost; the extensibility design is sound. Telemetry concurrency design is correct. Full list in findings-perf.md.


Report generated: 2026-06-12 · Strictly read-only review. No modifications were made to the Olive repository. All artifacts are stored in the session files/ directory.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions