Microsoft Olive — Read-Only Codebase Review
Review date: 2026-06-12
Review scope: /home/justinchu/dev/Olive (olive-ai 0.11.0.dev0)
Review method: Eight-dimension parallel read-only static analysis (Architecture / Code Quality / Testing / Documentation / CLI / Dependencies / CI-CD / Performance). This report de-duplicates and synthesizes findings from all eight analyses into a unified priority ranking.
Hard constraint: Strictly read-only. No files in the repository were modified during this review.
This is a read-only static-analysis review. No files in the repository were modified during this review.
Executive Summary
Olive is a well-organized, clearly abstracted multi-backend model optimization toolkit. The core abstractions — Pass / Engine / System / Evaluator / Search — have well-defined inheritance contracts, a consistent Pydantic v2 configuration system, and a CLI entry point strongly synchronized with the code through argparse and auto-generated docs. However, the eight-dimension review uncovered several systemic issues concentrated in three areas: weak quality safeguards, layer-coupling leakage, and unreliable user-visible behavior.
Overall health: Above-average design paired with below-average quality guardrails. Without adding CI test gates, fixing the or True permanent skip, and tightening dependency upper bounds, a single upgrade of numpy / torch / transformers could cause production-grade regressions that go undetected.
Top 5 High-Priority Issues (ranked by risk):
- Test suite barely wired into CI: Of 184 test files, only 1 (
test/cli/test_cli_test_model_smoke.py) runs in GitHub Actions. All core modules — passes / engine / search / evaluator / model / systems — have zero CI coverage. Any regression will only be discovered through user reports. (findings-tests item 1, findings-ci §2.1)
- Critical dependencies have no upper bounds + ruff target contradicts python_requires:
requirements.txt lists numpy / torch / transformers / onnx / onnxscript without upper bounds, yet test files already document known breaking changes at onnx<1.21.0, torch<2.11.0, transformers<5.4.0, onnxscript<0.6.1; meanwhile pyproject.toml ruff target-version="py39" contradicts setup.py python_requires=">=3.10", silently invalidating 3.10+-specific lint rules. (findings-deps H1/H2/M1/M2, L3)
- Permanently skipped tests coexist with production code marked "incorrect":
test_slicegpt.py:16 uses or True to force-skip and hide failures; graph_surgeries.py:170,184 RenameInputs / RenameOutputs carry a "This is incorrect" TODO yet remain registered in the surgeon table and are user-accessible; the auto-opt deprecation warning is never shown because the default --log_level=3 (ERROR) is higher than WARNING. All three share the same pattern: users and developers cannot see "known-bad" state. (findings-tests item 3, findings-quality item 8, findings-cli §10.1)
- Critical failures in
olive/cache.py are silently swallowed: cache_model / load_model / cache_run and other critical paths use except Exception: logger.exception(...) and then continue executing rather than re-raising, creating a "cache miss" illusion and hard-to-diagnose downstream errors; cache.py also creates a reverse dependency by importing olive.model.ModelConfig and olive.passes.onnx.common.resave_model, breaking the layering. (findings-quality item 2, findings-arch item 2)
systems → workflows and passes → evaluator reverse dependencies: DockerSystem directly imports olive.workflows.run.config.RunConfig; workflow_runner.py calls from olive.workflows import run; two ONNX passes (inc_quantization.py, session_params_tuning.py) construct OliveEvaluatorConfig directly inside _run_for_config to invoke the evaluator. Both reverse-dependency patterns prevent systems / passes from being tested or evolved independently. (findings-arch items 1 and 3)
All high / medium / low priority recommendations are listed below, grouped by dimension. Related issues (e.g., multiple missing auto-opt deprecation notices, conflicting onnxscript bounds, CLI snake/kebab mixing) are merged into single items retaining all file:line references.
High-Priority Recommendations
Inclusion criteria: correctness issues / security / user-blocking / data loss / CI gaps / severe developer-experience barriers.
Architecture
[Quick Fix] Disable or fix Surgeons marked "incorrect" in graph_surgeries.py
- Location:
olive/passes/onnx/graph_surgeries.py:170-195 (RenameInputs / RenameOutputs class definitions, carrying # TODO(anyone): This is incorrect, remove or fix)
olive/passes/onnx/graph_surgeries.py:3273-3278 (registered in the surgeon table, user-accessible via configuration)
- Current State: Both Surgeon subclasses have inline comments admitting they are "incorrect" yet remain exposed to users as valid surgeons.
- Impact: If a user specifies
surgeon: "RenameInputs", a known-broken code path executes without error, silently producing an incorrect ONNX model.
- Recommended Action: Immediately raise
NotImplementedError("RenameInputs is known to be incorrect; see TODO at line 170") inside _run_for_config, or remove the entries from the surgeon registry; restore once fixed.
[Refactor] Eliminate the reverse dependency from olive.systems to olive.workflows
- Location:
olive/systems/docker/docker_system.py:22 (from olive.workflows.run.config import RunConfig), olive/systems/docker/workflow_runner.py:8 (from olive.workflows import run as olive_run)
- Current State: The systems layer directly imports the workflows layer, forming a systems ↔ workflows circular dependency.
- Impact:
systems cannot be tested or released independently of workflows; any workflows refactor breaks systems.
- Recommended Action: Define a
DockerRunSpec (containing only runtime-agnostic fields like host/target/accelerator/resources) inside systems/; let workflows convert RunConfig to DockerRunSpec before passing it in; move workflow_runner.py from systems/docker/ to workflows/ or cli/.
[Refactor] Eliminate the reverse dependency from olive/passes to olive/evaluator
- Location:
olive/passes/onnx/inc_quantization.py:19-21
olive/passes/onnx/session_params_tuning.py:17-19
- Both passes construct
OliveEvaluatorConfig directly inside _run_for_config
- Current State: The pass layer (which should only perform model transformations) directly invokes evaluation logic, violating the "passes are pure transformations" contract and is inconsistent with other pass interfaces.
- Impact: Both passes cannot be reused or unit-tested without evaluator configuration; there is no unified mechanism for adding evaluation-feedback-driven passes.
- Recommended Action: Add an
evaluation_callback: Optional[Callable] parameter to the Pass base class, injected by the Engine at dispatch time; passes obtain evaluation results via the callback and are prohibited from directly import olive.evaluator.
[Refactor] Eliminate olive/cache.py reverse dependencies on domain layers
- Location:
olive/cache.py:22 (from olive.model.config.model_config import ModelConfig), olive/cache.py:483 (lazy import from olive.passes.onnx.common import resave_model), olive/model/handler/mixin/mlflow.py:10 (from olive.cache import OliveCache)
- Current State:
cache.py should be infrastructure, yet it depends on ModelConfig and a concrete pass; inversely, MLFlowTransformersMixin depends on OliveCache.from_cache_env() to read the cache directory from environment variables.
- Impact: Cache cannot be tested in isolation; MLFlow handler has an implicit runtime environment variable dependency, producing obscure initialization failure messages.
- Recommended Action: Replace the
resave_model call with a protocol/hook injected by the caller (engine); have cache_model accept a serialized dict instead of the ModelConfig type; move MLFlow model-merge logic into Engine.initialize() or a pre-processing step in LocalSystem.run_pass, or switch to an explicit cache_dir: Path parameter.
Code Quality
[Quick Fix] Multiple critical failures silently swallowed in olive/cache.py
- Location:
olive/cache.py:188, 202, 229, 242, 266, 279, 679, 684, 701, 774
- Current State:
cache_model / load_model / cache_run / load_run_from_model_id and other critical paths use except Exception: logger.exception(...) and then continue executing without re-raising.
- Impact: Cache write failures are invisible to users; downstream code continues running on the assumption the cache succeeded, producing hard-to-diagnose secondary errors.
- Recommended Action: Change cache write operations to
logger.warning and expose write status in return values / callers; true exceptions in read operations (not FileNotFoundError) must propagate upward; document in an EXCEPTIONS_TO_RAISE whitelist.
[Quick Fix] Bare exception catch in _run_search loses critical context
- Location:
olive/engine/engine.py:437-446
- Current State: In search mode, any exception from
_run_passes is caught by a global except Exception and only logged via logger.warning(...) without exc_info=True.
- Impact: Bugs during search (memory errors, user script exceptions) are silenced; users only see "No output model produced" instead of the real error.
- Recommended Action: Use
logger.warning(..., exc_info=True); explicitly document the EXCEPTIONS_TO_RAISE list (ensure OlivePassError and similar always propagate through).
[Refactor] ~90% of the 4144-line hadamard_utils.py is hand-written matrix literals
- Location:
olive/passes/pytorch/hadamard_utils.py (4144 lines; get_had172() 1207 lines, get_had156() 939 lines, get_had140() 843 lines, get_had108() 435 lines — all +1, -1, ... literal data)
- Current State: The largest file in the repository; its content is static data incorrectly embedded as code.
- Impact: Slows IDE/lint time; misleads reviewers; makes matrix content impossible to diff; adds cold-start overhead.
- Recommended Action: Serialize the matrices as
.npy or .safetensors, ship them as package data, and load via np.load(importlib.resources.files(...)). Over 90% of the source file content can be removed.
Testing
[Quick Fix] or True in test_slicegpt.py permanently skips tests and hides real failures
- Location:
test/passes/pytorch/test_slicegpt.py:16-17
@pytest.mark.skipif(
(sys.version_info < (3, 10) and not torch.cuda.is_available()) or True,
reason="requires python3.10 or higher",
)
- Current State:
or True makes the condition always true; the comment # TODO(team): Failed in pipeline (linux gpu). Need to investigate. shows this is a workaround to hide failures.
- Impact: Zero effective test coverage for the
SliceGPT pass; the pattern may be copied into other files.
- Recommended Action: Immediately investigate and fix the root cause; if GPU is required, change to
@pytest.mark.skipif(not torch.cuda.is_available(), ...); if network is required, replace with a local small model; remove or True after fixing.
[Refactor] Add a test-unit.yml workflow covering core modules
- Location:
.github/workflows/test-model-fast.yml:39 only runs pytest test/cli/test_cli_test_model_smoke.py; the remaining 183 test files are not executed in any CI.
- Current State: passes / engine / search / model / evaluator / systems / hardware — all core directories have zero CI protection.
- Impact: Any regression can only be discovered through manual testing or production exposure.
- Recommended Action: Add
.github/workflows/test-unit.yml covering at least the hardware/network-dependency-free tests in test/engine/, test/search/, test/model/, test/evaluator/, test/hardware/, test/passes/onnx/; use workflow_dispatch or a separate matrix for GPU/QNN/OpenVINO suites.
[Quick Fix] Three locations perform direct HF downloads with no skip guard
- Location:
test/passes/onnx/test_extract_adapters.py:37 (AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM"), module scope)
test/passes/onnx/test_nvmo_graph_surgery.py:34 (AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B"), ~700 MB)
test/cli/test_cli.py:652 (HF download with no skip guard)
- Current State: When the network is unavailable, tests emit a connection error rather than skipping; large model downloads significantly slow CI.
- Recommended Action: Register a
@pytest.mark.network marker (see CLI/CI items) and uniformly guard with skipif(not _network_available()); prefer local synthetic models (see test_cli_test_model_smoke.py::_save_local_tiny_llama).
Documentation
[Quick Fix] docs/source/conf.py version is inconsistent with the actual package version
- Location:
docs/source/conf.py:22 (version = "0.12.1") vs olive/version.py:1 (__version__ = "0.11.0.dev0")
- Current State: Hard-coded version number is one minor version ahead of the actual package version.
- Impact: The version switcher shows the wrong version; documentation does not match release artifacts; users and contributors reference incorrect APIs.
- Recommended Action: Switch to dynamic reading:
from olive.version import __version__ as version, and set release = version accordingly.
[Quick Fix] docs/architecture.md Pass base class API has changed but documentation has not been updated
- Location:
docs/architecture.md:47-48 (example uses @abstractmethod def default_config) vs olive/passes/olive_pass.py:318 (actual: _default_config, private method with different signature)
- Impact: New contributors following the documentation to implement a custom Pass will get an "abstract method not implemented" error.
- Recommended Action: Update the example to
_default_config(cls, accelerator_spec); also update docs/architecture.md:79-87 — the outdated list of only 9 passes should be replaced with a link to reference/pass.rst; fix the Unicode em-dash at docs/architecture.md:363 (python –m olive.workflows.run → olive run --config).
[Quick Fix] Mark auto-opt as deprecated at all entry points and fix the default log level
- Location:
olive/cli/auto_opt.py:184 emits deprecation warning via logger.warning(...)
olive/cli/base.py:394 default --log_level=3 (ERROR) is above WARNING (2), so the warning is never shown by default
docs/source/how-to/cli/cli-auto-opt.md:1 does not note the deprecation
docs/source/reference/cli.rst:91-92 does not note the deprecation
notebooks/olive_quickstart.ipynb:42 still installs olive-ai[auto-opt]
- Current State: The command is marked deprecated, but users see no indication of this at any of three entry points (CLI runtime, documentation, notebook).
- Impact: Users continue learning and integrating a deprecated command; widespread breakage when it is eventually removed.
- Recommended Action: ① Change
logger.warning to logger.error or print print("DEPRECATED: use 'olive optimize'", file=sys.stderr) at subcommand registration; ② add > ⚠️ Deprecated: use \olive optimize`at the top ofcli-auto-opt.mdandcli.rst; ③ insert a deprecation notice at the top of --helpoutput; ④ update the quickstart notebook to useolive optimize`.
CLI / UX
[Quick Fix] --num-splits and other parameters globally mix kebab-case and snake_case
- Location:
olive/cli/auto_opt.py:143-170 (kebab: --num-splits, --cost-model, etc.)
olive/cli/optimize.py:100-184 (snake: --num_split, --dim_param, etc.; synonym parameters named differently from auto_opt.py)
olive/cli/run.py:25-41 (mixed in same file: --run-config, --package-config kebab; --list_required_packages snake)
olive/cli/base.py:649-667 (add_shared_cache_options uses --account_name/--container_name) vs olive/cli/shared_cache.py:37-44 (--account/--container)
- Current State: The same concept uses different separators across different commands and even within the same file; Azure credential parameter names differ between two implementations — an observable bug.
- Impact: Users switching between commands must consult help every time; scripts are hard to reuse; Azure credential confusion is a real, observable bug.
- Recommended Action: Standardize on snake_case (consistent with the majority of existing CLI); use argparse
add_argument(..., dest=...) to retain kebab-case aliases for backward compatibility, printing a DeprecationWarning during the transition period. Standardize Azure credentials as --account_name / --container_name.
[Quick Fix] type=bool in capture-onnx-graph does not work as expected
- Location:
olive/cli/capture_onnx.py:152-163 (--exclude_embeds, --exclude_lm_head, --enable_cuda_graph, all type=bool, default=False)
- Current State: argparse's
type=bool evaluates any non-empty string as True (bool("false") == True), so --exclude_embeds false still yields True.
- Impact: Users believe they are disabling a feature when in fact it is being enabled — silently incorrect behavior.
- Recommended Action: Change to
action="store_true", or define an explicit def str2bool(v): ... that accepts true/false/0/1.
[Quick Fix] CLI ValueError thrown directly causes users to see tracebacks
- Location:
olive/cli/base.py:217-218, 375, olive/cli/auto_opt.py:248-249, 300, olive/cli/optimize.py:260-290, olive/cli/capture_onnx.py:237-241, olive/cli/run_pass.py:134
- Current State: Argument validation errors raise
ValueError, triggering a full Python traceback and exit(1); this is mixed with argparse's parser.error() (exit 2).
- Impact: Poor user experience; scripts cannot distinguish argument errors from runtime errors via exit code.
- Recommended Action: Standardize argument validation to use
parser.error("..."); retain ValueError only for genuine runtime errors.
[Quick Fix] Mixed print() and logger.* in CLI; multiple debug prints pollute stdout
- Location: 40+
print calls throughout olive/cli/, most notably:
olive/cli/quantize.py:145 print(f"pass list: {pass_list}")
olive/cli/quantize.py:184 print(f"selected pass configs: {passes_dict}")
olive/cli/quantize.py:129-132 uses print for warnings
olive/cli/run_pass.py:198-200 uses print for errors then calls exit(0)
olive/cli/base.py:107-125 mixes print and logger in the same function
- Current State: Debug-level output cannot be controlled via
--log_level; error output goes to stdout instead of stderr; hard to parse in pipelines.
- Recommended Action: Standardize: use
print for user-facing success/progress/result paths; use logger.* (going to stderr) for diagnostics/warnings/errors; remove obvious debug prints; the error path in run_pass.py:198-200 should call sys.exit(1).
Dependencies / Packaging
[Quick Fix] Add upper bounds to critical dependencies in requirements.txt
- Location:
requirements.txt:1-14 and test/requirements-test.txt:18,24,40, test/requirements-test-gpu.txt:4, test/requirements-test-openvino.txt:6,7
- Current State: Production
requirements.txt lists numpy, torch, transformers, onnx, onnxscript without upper bounds; tests already document known breaking changes at onnx<1.21.0 (symlink issue), torch<2.11.0, transformers<5.4.0, onnxscript<0.6.1.
- Impact: Any upstream major release may cause a fresh
pip install olive-ai to fail immediately; poor reproducibility in user environments.
- Recommended Action: Sync the test-documented upper bounds into production
requirements.txt (e.g., numpy<2, onnx<1.21.0, torch<2.11.0, transformers<5.4.0, onnxscript>=0.5.3,<0.6.1); explicitly raise bounds via dependabot/manual PR in each subsequent release.
[Quick Fix] Fix the inconsistency between ruff target-version and python_requires
- Location:
pyproject.toml:84 (target-version = "py39") vs setup.py:77 (python_requires=">=3.10"); and comments at pyproject.toml:135, 176
- Current State: ruff still checks against py39, missing lint rules specific to 3.10+ syntax (e.g., stricter
B905 patterns, PEP 604 union types).
- Impact: Lint false negatives; certain anti-patterns may slip through.
- Recommended Action: Change to
target-version = "py310"; remove legacy comments advising to "avoid" 3.10 constructs and re-audit the ignore list.
[Quick Fix] Remove or update the uninstallable tf extra (tensorflow==1.15.0)
- Location:
extra_dependencies.tf in olive/olive_config.json, olive/model/handler/tensorflow.py:31 (load_model directly raises NotImplementedError), olive/cli/configure_qualcomm_sdk.py:21, olive/platform_sdk/qualcomm/configure/configure.py:84
- Current State: TF 1.15.0 does not support Python 3.10+; it cannot be installed under the current
python_requires; TensorFlowModelHandler itself is not implemented.
- Impact:
pip install olive-ai[tf] will always fail, signaling to users that "this extra is dead."
- Recommended Action: Remove the
tf extra, or update it to modern TF (only if actually needed); synchronously delete TensorFlowModelHandler or mark it deprecated.
CI/CD
[Quick Fix] All GitHub Actions references use unpinned SHAs (supply-chain risk)
- Location: 10+ mutable tags across 4 workflows:
lint.yml:19, 21, 32, 73, codeql.yml:51, 66, 80, 85, docs.yml:30, test-model-fast.yml:21
- Current State: All actions use mutable tags (
@v3 / @v4 / @v5), including two third-party actions (reviewdog/action-misspell@v1, advanced-security/dismiss-alerts@v1).
- Impact: If a tag is hijacked or an account is compromised, arbitrary code can be executed on the runner — a standard GitHub Actions supply-chain attack vector.
- Recommended Action: Pin all actions to SHA (e.g.,
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2); add .github/dependabot.yml to enable automatic updates for github-actions and pip.
[Refactor] Enable pip caching and dependency security scanning
- Location:
lint.yml:49-53, test-model-fast.yml:23-27, docs.yml:32-36 (all missing cache: pip); no pip-audit/safety in .github/workflows/
- Current State: Every CI run re-downloads ML dependencies (5–15 minutes); no dependency vulnerability scanning; CodeQL only scans source code.
- Recommended Action: Add
cache: "pip" and cache-dependency-path: ... to all setup-python@v5 steps; add pip-audit -r requirements.txt to the scheduled trigger in codeql.yml.
Performance
[Refactor] Passes / search points are strictly serial — no multi-core utilization in search mode
- Location:
olive/engine/engine.py:620-631 (_run_passes), olive/engine/engine.py:409-451 (_run_search), olive/search/search_strategy.py:210-257 (synchronous generator)
- Current State: All passes within a single workflow are executed serially; in search mode each search point is evaluated completely serially, waiting for
record_feedback_signal before generating the next.
- Impact: Workflows with 5+ passes × hundreds of search points have end-to-end time that is purely additive; multi-core CPUs sit idle.
- Recommended Action: In search mode, introduce
concurrent.futures.ProcessPoolExecutor for batch evaluation of independent search points at the same level; introduce an async feedback mechanism so the sampler can submit batches simultaneously.
[Refactor] PythonEnvironmentSystem spawns a new subprocess for every pass / evaluation
- Location:
olive/systems/python_environment/python_environment_system.py:61-93
- Current State: Every
run_pass / evaluate_model creates a temporary directory, writes JSON, calls subprocess.run to start a new Python interpreter, reads JSON, and deletes the directory.
- Impact: In search mode with hundreds of iterations, each incurring 200ms–1s process startup overhead; cumulative JSON I/O is significant.
- Recommended Action: Implement a persistent worker process (e.g., using
multiprocessing.Process + Queue or ZeroMQ) that communicates via stdin/stdout/IPC and reuses the interpreter across iterations.
Medium-Priority Recommendations
Architecture / Code Quality
- [Quick Fix]
cache.py:216 pass_name: int type annotation is incorrect: The call site (engine.py:739) actually passes a str; change to pass_name: str.
- [Quick Fix]
Engine.__init__ negation-style parameter naming is inconsistent: engine.py:59,82 has no_artifacts: bool=False immediately assigned to internal skip_saving_artifacts; change to positive naming save_artifacts: bool=True or align the parameter name with the attribute name.
- [Refactor]
olive/evaluator/olive_evaluator.py is a 2059-line god module: Contains 10 classes (OliveEvaluator, per-backend evaluators, OliveEvaluatorConfig); split by backend into base.py, onnx.py, pytorch.py, openvino.py, qnn.py, lm.py, mteb.py, with __init__.py maintaining compatible exports.
- [Refactor]
olive/engine/engine.py is 844 lines with 4 TODOs: Mixes scheduling, caching, search expansion, evaluation, and packaging responsibilities; extract Scheduler / EvaluationRunner following the existing FootprintNode/WorkflowOutput approach.
- [Refactor]
olive/common/utils.py is a 721-line / 38-function grab-bag: Mixes hashing, subprocess, torch tensor, file I/O, and HF integration; split by responsibility into hashing.py/io_utils.py/torch_utils.py/dict_utils.py; move hf_repo_exists to olive/common/hf/.
- [Refactor]
olive/cli/base.py is 768 lines mixing ABC with utility functions: Extract add_input_model_options, add_hf_test_model_config, get_input_model_config, and similar into olive/cli/utils.py/olive/cli/model_options.py; keep only BaseOliveCLICommand in base.py.
- [Quick Fix]
olive/passes/onnx/common.py is depended on across backends: olive/passes/openvino/quantization.py:17 and olive/passes/quark_quantizer/quark_quantization.py:21 import model_proto_to_file / model_proto_to_olive_model / get_external_data_config; move backend-agnostic get_external_data_config to passes/common/; keep ONNX-specific utilities in passes/onnx/common.py.
- [Quick Fix] PyTorch backend pass names lack a consistent prefix:
olive/passes/pytorch/ contains Gptq, Rtn, LoRA, AutoAWQQuantizer, GptqQuantizer with inconsistent naming; uniformly add a Pytorch prefix (matching the ONNX backend convention) or add a lint check in Pass.__init_subclass__.
- [Refactor]
olive.search depends on MetricResult: olive/search/search_results.py:12, olive/search/search_strategy.py:23, olive/search/samplers/search_sampler.py:14 (though TYPE_CHECKING); define ScoreMap = dict[str, float] or a Protocol; have Engine convert MetricResult before passing it in.
- [Quick Fix]
olive.platform_sdk has a reverse dependency on olive.cli: olive/platform_sdk/qualcomm/configure/__main__.py:9 calls from olive.cli.launcher import legacy_call; inline the needed logic or define an independent run_configure() in platform_sdk.
- [Quick Fix] Adapter handling code is duplicated in 3 places:
olive/passes/pytorch/autoawq.py:131-138, autogptq.py:119-131, gptqmodel.py:119-128 are nearly identical; extract detach_adapter(model, backend_name) -> tuple[ModelHandler, Optional[str]] into olive/passes/pytorch/common.py.
- [Refactor]
HfModelHandler has a 5-level MRO plus dead composite_parent code: olive/model/handler/hf.py:29, olive/model/handler/pytorch.py:25 already carry # pylint: disable=too-many-ancestors; olive/model/handler/base.py:45 composite_parent has no consumers anywhere in the codebase. Change DummyInputsMixin and similar to composition (handler.dummy_input_provider: DummyInputProvider); delete composite_parent.
Testing
- [Refactor]
test_python_environment_system.py creates a real venv and runs pip install inside a fixture: test/systems/python_environment/test_python_environment_system.py:34-52; mock run_subprocess or move to test/systems/integration/ and add @pytest.mark.integration.
- [Refactor]
test_extract_adapters.py module-scope fixture runs an end-to-end pipeline: test/passes/onnx/test_extract_adapters.py:27-65 includes HF download + PEFT + ONNX export; replace with a local synthetic model, or mark as @pytest.mark.integration.
- [Quick Fix]
pyproject.toml registers only 1 marker and does not enable strict mode: Currently only openvino; add gpu, cuda, network, qnn, qairt, integration, and enable --strict-markers.
- [Refactor]
olive/telemetry/ and olive/platform_sdk/qualcomm/ have zero test coverage: Telemetry covers a user-privacy-critical path; add test/telemetry/test_telemetry.py with at minimum disable/enable toggle coverage and "no send when disabled"; remove the empty test/snpe/ shell or add a README, and add mock-SDK unit tests in test/passes/qairt/ or a new test/platform_sdk/.
- [Refactor]
test/conftest.py:15-30 autouse session fixture slows startup: Imports transformers and datasets, modifies transformers.utils.hub.TRANSFORMERS_CACHE globally; change to non-autouse explicit dependency, or configure via the TRANSFORMERS_CACHE environment variable in CI.
- [Refactor]
requirements-test.txt mixes in platform-specific heavy dependencies: cupy-cuda12x, nncf>=2.19.0, openvino>=2025.4.1, neural-compressor<2.4 in the base file cause install failures on CPU/AMD/Apple Silicon; split into a minimal base set plus requirements-test-{gpu,openvino,...}.txt.
Documentation
- [Quick Fix]
cli-auto-opt.md and reference/cli.rst:91-92 do not note the deprecation: (Already merged with the CLI high-priority item.)
- [Quick Fix]
how-to/index.md:8 mentions onnx-graph-capture but no corresponding how-to page exists: Add cli-capture-onnx-graph.md or remove the mention from the index.
- [Quick Fix]
NEWS.md was last updated in Feb 2025: README contains Aug/Oct 2025 entries; backfill v0.8.0, v0.9.0, v0.9.2 release entries for Mar–Oct 2025.
- [Refactor] Improve docstring coverage across core modules:
engine.py:49 (Engine.__init__), engine.py:241 (run_accelerator), pass_config.py:98 (BasePassConfig), onnx.py:45 (OnnxModelHandler.__init__) all lack docstrings; prioritize public abstract methods.
- [Quick Fix] CONTRIBUTING.md / AGENTS.md lack docs build instructions: Add
cd docs && pip install -r requirements.txt && make html.
- [Quick Fix]
docs.yml:41-42 inlines onnxruntime/torch without pinned versions: Move to docs/requirements.txt with pinned versions.
CLI / UX
- [Quick Fix]
--log_level accepts integer 0–4 with no choices=: olive/cli/base.py:390-398; change to string-based ["debug","info","warning","error","critical"] or at minimum add choices=range(5); also introduce -v/-q Boolean shortcut flags (consistent with git/pip/uv).
- [Quick Fix]
run command lacks --dry_run: olive/cli/run.py does not call add_save_config_file_options; inconsistent experience compared to optimize and other commands.
- [Quick Fix]
quantize --algorithm help text is semantically incorrect: olive/cli/quantize.py:65-66 says "List of", but the parameter accepts only a single value; change to "The quantization algorithm to use."
- [Quick Fix]
tune-session-params help references non-existent parameters: olive/cli/session_params_tuning.py:30-36 mentions --model / --hf_model_name, but the actual parameter is --model_name_or_path.
- [Quick Fix]
--disable_force_evaluate_other_eps is a double negative: olive/cli/session_params_tuning.py:78-84; change to --evaluate_all_eps (positive) with inverted default.
- [Refactor]
olive --help has no grouping: olive/cli/launcher.py:37-59 lists 19 subcommands flat, with auto-opt registered before optimize; group by "core workflows / model conversion / adapter management / utilities", with deprecated commands listed last.
- [Quick Fix]
@action decorator placement is inconsistent: olive/cli/run_pass.py:22-23 decorates the class, while other commands decorate the run() method (e.g., finetune.py:82); standardize to run().
- [Quick Fix]
diffusion_lora.py bypasses add_input_model_options: olive/cli/diffusion_lora.py:34-47 defines custom -m/--model_name_or_path, missing --trust_remote_code and --test; use the shared helper.
- [Quick Fix]
--memory help does not mention supported string formats: olive/cli/base.py:704-709 says "in bytes", but type=AcceleratorSpec.str_to_int_memory accepts "4GB"; add an example.
Dependencies / Packaging
- [Quick Fix]
setup.py:84 package_data references non-existent package olive.auto_optimizer: Dead entry; remove or confirm the package exists.
- [Quick Fix]
onnxscript upper and lower bounds conflict: Production requirements.txt:4 >=0.5.3 with no upper bound; tests test/requirements-test.txt:24 <0.6.1; unify by tightening.
- [Quick Fix]
openvino extras and test versions are misaligned: olive_config.json specifies openvino>=2025.4.1, nncf>=2.19.0; test/requirements-test-openvino.txt:6,7 specifies openvino>=2025.3.0, nncf>=2.18.0; align them.
- [Quick Fix] Missing aggregate extras (
all / dev): Makes one-command CI and developer installs inconvenient.
- [Refactor]
finetune/bnb/lora extras overlap: finetune ≈ bnb + lora + optimum is not expressed using olive-ai[bnb,lora] syntax; high maintenance cost to keep in sync.
CI/CD
- [Quick Fix]
docs.yml is missing a permissions: declaration: Add explicit permissions: contents: read; also add permissions: {} at the top level of each workflow as a default, elevating at the job level as needed.
- [Quick Fix]
optional-lint job has excess security-events: write: .github/workflows/lint.yml:16-17; that job does not upload SARIF; remove under the principle of least privilege.
- [Quick Fix]
lint.yml/test-model-fast.yml/docs.yml do not trigger on rel-*: Release candidate branches are not running tests or building docs; explicitly add rel-*.
- [Refactor] Introduce a platform/Python version matrix: Currently all workflows run on
ubuntu-latest + Python 3.12; add windows-latest and python-version: ["3.10","3.12"] (at minimum in lint and smoke tests).
- [Refactor] Introduce
pip-audit and mypy/pyright static type checking: Add pip-audit -r requirements.txt to the scheduled trigger in codeql.yml; incrementally enable mypy --ignore-missing-imports olive/ in lint.
- [Quick Fix] pre-commit and CI lint check sets are inconsistent:
PYLINT, TOML-SORT, EDITORCONFIG-CHECKER, NOQA only in CI; check-yaml, absolufy-imports only in pre-commit; unify by having pre-commit invoke lintrunner.
Performance
- [Refactor]
engine.py:306, 398 unnecessary model_dump() + validate_config() round-trip in search hot path: Use pass_config.model_copy() or pass by reference directly.
- [Refactor]
footprint.py:91-102, 230-254 every record() call triggers a full _resolve_metrics() — O(n²): Change to lazy evaluation (call only before create_pareto_frontier) or incremental computation.
- [Refactor]
footprint.py:270-303 Pareto frontier uses a double loop — O(n²): Noticeable at larger search spaces; replace with a divide-and-conquer Pareto algorithm or sorting-based approach.
- [Refactor]
DockerSystem.run_workflow destroys the container on every run: olive/systems/docker/docker_system.py:108-142; support container reuse.
- [Quick Fix]
local.py:46 ModelConfig.from_json(output_model.to_json()) is an unnecessary round-trip: Return output_model.to_model_config() directly.
- [Refactor]
onnx.load(self.model_path) loads large models fully into memory: olive/model/handler/onnx.py:110-111; in large LLM scenarios every pass does a full load; prefer load_external_data=False in lightweight inspection paths.
- [Refactor] Multiple
onnx.load calls on the same file inside quantization passes: olive/passes/onnx/quantization.py:265, 477, 510, 540; pass ModelProto within the pass instead of repeatedly re-reading from disk.
- [Quick Fix]
cache.py:185, 227, 263 JSON indent=4 increases I/O: Use indent=None in production mode.
- [Quick Fix]
cache.py:290 cache_id is only 8 hex characters: 32-bit space has collision risk in shared caches; extend to 16 characters or add a pass prefix.
Low-Priority Recommendations
Code Quality / Style
pyproject.toml globally disables all complexity rules in pylint: too-many-arguments / branches / locals / nested-blocks / return-statements / statements, broad-exception-caught, unused-argument are all globally disabled; exempt per module at most, not globally, to catch newly introduced violations. The same applies to ruff ignore: D100-D107 fully off, RUF013 off due to a "Python 3.10" comment (contradicting target=py39), PLW0603/B028 marked "temp" but now permanent.
olive/passes/onnx/conversion.py:41 file-level # pylint: disable=W0212: Scope is too broad; change to inline suppression.
olive/common/utils.py:76 internal variable md5_hash in hash_string but the algorithm is SHA-256: Misleading; rename to sha256_hash or hasher (same issue in hash_io_stream/hash_file/hash_dict).
- 100+ untracked TODO/FIXME items:
olive/engine/engine.py:560, olive/passes/pytorch/lora.py:360 (functional bug num_items_in_batch), olive/passes/onnx/graph_surgeries.py:170,184 ("incorrect"), three similar adapter issues in olive/passes/pytorch/*; enable ruff TD003 to require each TODO to link to an issue.
olive/engine/engine.py:462-485 _dump_run_history temporarily mutates global verbosity: Anti-pattern (thread-unsafe); control via module logger level or a parameter instead.
- Return type annotation coverage across passes/ is ~33%: Annotate public and abstract methods with
->.
- Multiple
_run_for_config implementations exceed 200 lines: diffusers/lora.py:231 _train_sd (356), _train_flux (316), onnx/mnb_to_qdq.py:89 (271), openvino/optimum_intel.py:349 (246), onnx/split.py:42 (211), onnx/model_builder.py:210 (208), pytorch/selective_mixed_precision.py:394 compute_module_stats (207); extract _prepare_*/_post_process_* private methods.
olive/data/config.py:24, 44 DataConfig has 4 fixed component fields: Adding a new data component type requires changing the class; follow pass_config.py and change to components: dict[str, DataComponentConfig].
Testing
- Engine / Workflow test names do not follow AGENTS.md convention:
test/engine/test_engine.py:46,76,93,112, test/workflows/test_workflow_run.py use test_register, test_run, etc., lacking "expected behavior"; rename to test_<func>_<expected>[_when_<cond>].
- Key subdirectories lack conftest files:
test/engine/, test/search/, test/model/, test/systems/; convert factory functions in test/utils.py to conftest fixtures.
test_aimet_quantization.py uses skipif(CUDA_AVAILABLE) to exclude GPU environments: test/passes/onnx/test_aimet_quantization.py:144,197,223,269; use a TEST_DEVICE environment variable or a dedicated marker instead.
test/assets/ contains only user_script.py: No shared ONNX files or golden outputs; commit common dummy ONNX files as static assets to reduce runtime generation.
olive/exception.py / olive/logging.py have no tests: Add test/test_exception.py / test/test_logging.py.
Documentation
- Minor README / getting-started issues:
README.md:98 "the the"; README.md:71-79 instructions mention Llama-3.2-1B-Instruct but the actual command uses Qwen2.5-0.5B-Instruct; docs/source/getting-started/getting-started.md:55 "60secs" is a significant underestimate.
notebooks/README.md:3,6 typos: "ocmmand" → "command", "hot wo" → "how to".
docs/architecture.md is not in the Sphinx toctree: Users cannot find it from the documentation site; move into docs/source/ or add to the toctree.
docs/source/conf.py:65 suppress_warnings = ["myst.xref_missing"]: Suppresses missing internal cross-reference warnings, potentially masking broken links; at least keep a version with this enabled in CI.
docs/source/index.md:30 features/index links to .rst while others use .md: Style inconsistency.
- No Python API example notebooks: All existing notebooks are CLI-based.
CLI / Exit Codes
- Exit codes are not standardized: Argument errors use
raise ValueError (exit 1) and parser.error() (exit 2) inconsistently; scripts cannot distinguish them.
run_pass.py:183-200 _list_passes has no --json format: Inconvenient for tool integration.
- Internal single-letter pass keys leak into
--save_config_file output: capture_onnx.py:340-347 ("c"/"m"/"b"/"f"), generate_adapter.py:89 ("e"), finetune.py:159 ("f"); change to readable key names.
--modality choices=["text"] has only one option: olive/cli/optimize.py:155-163; hide or mark as internal.
session_params_tuning.py:149 prompts to "set log_level to 1" but 1 is INFO; debugging requires DEBUG (0).
Dependencies / Packaging
pyproject.toml lacks [build-system] and [project]: Not PEP 517/518 compliant; inconvenient for dependabot/pip inspect.
triton is a hard dependency in bnb/finetune extras: Linux + CUDA only; will fail in non-GPU environments.
hf-xet is a core dependency: It is a performance acceleration package, not a functional dependency; consider moving to an optional extra.
MANIFEST.in does not explicitly include olive data files: sdist may miss data files (wheel is fine via package_data).
setup.py:47-48 is missing macOS classifier: Olive is pure Python and can run on macOS.
setup.py:40 "3 - Alpha": Consider advancing to Beta.
test/requirements-test.txt:14 neural-compressor<2.4 has no explanatory comment.
CI/CD / Performance / Other
.github/workflows/test-model-fast.yml does not upload failure artifacts: Logs are destroyed when the runner exits; add actions/upload-artifact@v4 with if: failure().
lint.yml:72 SARIF upload has continue-on-error: true: Silent failure may silently drop security scan results.
docs.yml:43 make linkcheck may block PRs due to unavailable external links: Consider running only on schedule trigger or adding --ignore patterns.
quantization.py:306-337 deepcopy on a module-level config dict: Minor overhead; can be changed to a shallow copy.
cache.py:795-806 evaluation cache is keyed only on model_id, without an evaluator configuration hash: If a user changes the evaluation dataset but the pass configuration is unchanged, the cache does not invalidate; introduce an evaluator hash.
search/search_results.py:36 uses None-padded sparse list: Change to dict[int, tuple].
optuna_sampler.py:73-78 while True loop skips duplicate points: May produce many wasted trials when the search space is nearly exhausted.
OnnxQuantization._initialize() depends on destructor to clean up temporary directory: quantization.py:289; exception paths may leak; change to a contextmanager or try/finally.
olive/passes/openvino/optimum_intel.py:275 silently except ImportError: pass: Users see an AttributeError instead of a "missing dependency" message.
telemetry_extensions.py:57, 126 @action uses inspect.stack(): Currently only decorates Engine.run() so risk is limited; future expansion will require session-level caching.
Appendix: Per-Dimension Investigation Summaries
Architecture (findings-arch.md)
Olive has clear inheritance contracts and a Pydantic v2 configuration foundation across the Pass / Engine / System / Evaluator / Search / Model layers — a reasonable overall structure for a mid-to-large project. The main architectural debt concentrates in reverse dependencies (systems → workflows, cache → model/passes, passes → evaluator, platform_sdk → cli, model/handler/mixin → cache) and god modules (olive_evaluator.py at 2059 lines, engine.py at 844 lines, cli/base.py at 768 lines, common/utils.py at 721 lines). The HfModelHandler 5-level MRO and dead composite_parent code need cleanup. Full 14 findings in findings-arch.md.
Code Quality (findings-quality.md)
Key metrics: largest single function is 1207 lines (get_had172), 68 bare except Exception catches, 100+ TODO/FIXME items, pylint globally disabling 37 complexity rules. The most critical issues are critical failures silently swallowed in cache.py, missing exc_info in _run_search exception handling, and the matrix data embedded as code in hadamard_utils.py. The CLI layer mixes print/logger, and the "incorrect" surgeons in graph_surgeries.py remain production-accessible. Full 17 findings in findings-quality.md.
Testing (findings-tests.md)
Of 184 test files, only 1 runs in CI; test_slicegpt.py has an or True permanent skip; test_python_environment_system.py and test_extract_adapters.py are end-to-end integration tests disguised as unit tests; 3 locations perform HF downloads with no skip guard; telemetry / platform_sdk have zero tests; pyproject.toml registers only the openvino marker. Engine/Workflow test method names do not follow AGENTS.md convention. Full 15 findings in findings-tests.md.
Documentation (findings-docs.md)
README + Sphinx main structure is good (CLI docs strongly synchronized with code via sphinxarg.ext), but there are 3 P0 issues: conf.py version 0.12.1 vs actual 0.11.0.dev0; architecture.md Pass base class API has changed (default_config → _default_config) without being updated; auto-opt deprecated status is not communicated at any of three entry points (runtime warning filtered by default log_level, cli-auto-opt.md, reference/cli.rst). Docstring coverage is 22–47%. Full list in findings-docs.md.
CLI (findings-cli.md)
The biggest issue is inconsistent parameter naming style (kebab-case and snake_case mixed across different commands and even within the same file, with the Azure credential parameter names --account_name vs --account being a real observable bug); type=bool in capture-onnx-graph does not work as expected; multiple raise ValueError calls expose raw tracebacks; debug print calls in quantize.py pollute stdout; --log_level uses integers without a --verbose/--quiet shortcut; olive --help lists 19 subcommands flat with no grouping. Full 10 sections in findings-cli.md.
Dependencies / Packaging (findings-deps.md)
H1: ruff target-version="py39" contradicts python_requires=">=3.10"; H2: numpy/torch/transformers/onnx have no production upper bounds but tests document known breaking changes; H3: tf extra tensorflow==1.15.0 cannot be installed on Python 3.10+. Medium-priority issues: onnxscript bounds conflict, openvino extras/test version mismatch, dead olive.auto_optimizer entry in setup.py package_data, missing aggregate all extra. Full 11 sections in findings-deps.md.
CI/CD (findings-ci.md)
4 workflows (lint/codeql/docs/test-model-fast). Core issues: unit tests are almost entirely absent from CI (183/184 not running); all GitHub Actions references use unpinned mutable tags (supply-chain risk); no Dependabot; no platform/Python version matrix (all ubuntu+3.12); no pip caching; no static type checking or dependency security scanning; pre-commit and CI lint check sets are inconsistent. Full 12 sections in findings-ci.md.
Performance (findings-perf.md)
13 performance anti-patterns, with 3 high-severity: search/pass execution is fully serial with no parallel evaluation, PythonEnvironmentSystem spawns a new subprocess per iteration, ONNX models are loaded fully into memory with no streaming. Medium-severity issues include Pydantic hot-path round-trips, O(n²) Pareto frontier, and repeated onnx.load calls inside quantization passes. Adding new Passes / Systems / Samplers is low-cost; the extensibility design is sound. Telemetry concurrency design is correct. Full list in findings-perf.md.
Report generated: 2026-06-12 · Strictly read-only review. No modifications were made to the Olive repository. All artifacts are stored in the session files/ directory.
Microsoft Olive — Read-Only Codebase Review
This is a read-only static-analysis review. No files in the repository were modified during this review.
Executive Summary
Olive is a well-organized, clearly abstracted multi-backend model optimization toolkit. The core abstractions — Pass / Engine / System / Evaluator / Search — have well-defined inheritance contracts, a consistent Pydantic v2 configuration system, and a CLI entry point strongly synchronized with the code through argparse and auto-generated docs. However, the eight-dimension review uncovered several systemic issues concentrated in three areas: weak quality safeguards, layer-coupling leakage, and unreliable user-visible behavior.
Overall health: Above-average design paired with below-average quality guardrails. Without adding CI test gates, fixing the
or Truepermanent skip, and tightening dependency upper bounds, a single upgrade of numpy / torch / transformers could cause production-grade regressions that go undetected.Top 5 High-Priority Issues (ranked by risk):
test/cli/test_cli_test_model_smoke.py) runs in GitHub Actions. All core modules — passes / engine / search / evaluator / model / systems — have zero CI coverage. Any regression will only be discovered through user reports. (findings-tests item 1, findings-ci §2.1)requirements.txtlistsnumpy / torch / transformers / onnx / onnxscriptwithout upper bounds, yet test files already document known breaking changes atonnx<1.21.0,torch<2.11.0,transformers<5.4.0,onnxscript<0.6.1; meanwhilepyproject.tomlrufftarget-version="py39"contradictssetup.pypython_requires=">=3.10", silently invalidating 3.10+-specific lint rules. (findings-deps H1/H2/M1/M2, L3)test_slicegpt.py:16usesor Trueto force-skip and hide failures;graph_surgeries.py:170,184RenameInputs/RenameOutputscarry a "This is incorrect" TODO yet remain registered in the surgeon table and are user-accessible; theauto-optdeprecation warning is never shown because the default--log_level=3(ERROR) is higher than WARNING. All three share the same pattern: users and developers cannot see "known-bad" state. (findings-tests item 3, findings-quality item 8, findings-cli §10.1)olive/cache.pyare silently swallowed:cache_model/load_model/cache_runand other critical paths useexcept Exception: logger.exception(...)and then continue executing rather than re-raising, creating a "cache miss" illusion and hard-to-diagnose downstream errors;cache.pyalso creates a reverse dependency by importingolive.model.ModelConfigandolive.passes.onnx.common.resave_model, breaking the layering. (findings-quality item 2, findings-arch item 2)systems → workflowsandpasses → evaluatorreverse dependencies:DockerSystemdirectly importsolive.workflows.run.config.RunConfig;workflow_runner.pycallsfrom olive.workflows import run; two ONNX passes (inc_quantization.py,session_params_tuning.py) constructOliveEvaluatorConfigdirectly inside_run_for_configto invoke the evaluator. Both reverse-dependency patterns prevent systems / passes from being tested or evolved independently. (findings-arch items 1 and 3)All high / medium / low priority recommendations are listed below, grouped by dimension. Related issues (e.g., multiple missing
auto-optdeprecation notices, conflictingonnxscriptbounds, CLI snake/kebab mixing) are merged into single items retaining allfile:linereferences.High-Priority Recommendations
Architecture
[Quick Fix] Disable or fix Surgeons marked "incorrect" in
graph_surgeries.pyolive/passes/onnx/graph_surgeries.py:170-195(RenameInputs/RenameOutputsclass definitions, carrying# TODO(anyone): This is incorrect, remove or fix)olive/passes/onnx/graph_surgeries.py:3273-3278(registered in the surgeon table, user-accessible via configuration)surgeon: "RenameInputs", a known-broken code path executes without error, silently producing an incorrect ONNX model.NotImplementedError("RenameInputs is known to be incorrect; see TODO at line 170")inside_run_for_config, or remove the entries from the surgeon registry; restore once fixed.[Refactor] Eliminate the reverse dependency from
olive.systemstoolive.workflowsolive/systems/docker/docker_system.py:22(from olive.workflows.run.config import RunConfig),olive/systems/docker/workflow_runner.py:8(from olive.workflows import run as olive_run)systemscannot be tested or released independently ofworkflows; any workflows refactor breaks systems.DockerRunSpec(containing only runtime-agnostic fields like host/target/accelerator/resources) insidesystems/; letworkflowsconvertRunConfigtoDockerRunSpecbefore passing it in; moveworkflow_runner.pyfromsystems/docker/toworkflows/orcli/.[Refactor] Eliminate the reverse dependency from
olive/passestoolive/evaluatorolive/passes/onnx/inc_quantization.py:19-21olive/passes/onnx/session_params_tuning.py:17-19OliveEvaluatorConfigdirectly inside_run_for_configevaluation_callback: Optional[Callable]parameter to thePassbase class, injected by theEngineat dispatch time; passes obtain evaluation results via the callback and are prohibited from directlyimport olive.evaluator.[Refactor] Eliminate
olive/cache.pyreverse dependencies on domain layersolive/cache.py:22(from olive.model.config.model_config import ModelConfig),olive/cache.py:483(lazy importfrom olive.passes.onnx.common import resave_model),olive/model/handler/mixin/mlflow.py:10(from olive.cache import OliveCache)cache.pyshould be infrastructure, yet it depends onModelConfigand a concrete pass; inversely,MLFlowTransformersMixindepends onOliveCache.from_cache_env()to read the cache directory from environment variables.resave_modelcall with a protocol/hook injected by the caller (engine); havecache_modelaccept a serialized dict instead of theModelConfigtype; move MLFlow model-merge logic intoEngine.initialize()or a pre-processing step inLocalSystem.run_pass, or switch to an explicitcache_dir: Pathparameter.Code Quality
[Quick Fix] Multiple critical failures silently swallowed in
olive/cache.pyolive/cache.py:188, 202, 229, 242, 266, 279, 679, 684, 701, 774cache_model/load_model/cache_run/load_run_from_model_idand other critical paths useexcept Exception: logger.exception(...)and then continue executing without re-raising.logger.warningand expose write status in return values / callers; true exceptions in read operations (notFileNotFoundError) must propagate upward; document in anEXCEPTIONS_TO_RAISEwhitelist.[Quick Fix] Bare exception catch in
_run_searchloses critical contextolive/engine/engine.py:437-446_run_passesis caught by a globalexcept Exceptionand only logged vialogger.warning(...)withoutexc_info=True.logger.warning(..., exc_info=True); explicitly document theEXCEPTIONS_TO_RAISElist (ensureOlivePassErrorand similar always propagate through).[Refactor] ~90% of the 4144-line
hadamard_utils.pyis hand-written matrix literalsolive/passes/pytorch/hadamard_utils.py(4144 lines;get_had172()1207 lines,get_had156()939 lines,get_had140()843 lines,get_had108()435 lines — all+1, -1, ...literal data).npyor.safetensors, ship them as package data, and load vianp.load(importlib.resources.files(...)). Over 90% of the source file content can be removed.Testing
[Quick Fix]
or Trueintest_slicegpt.pypermanently skips tests and hides real failurestest/passes/pytorch/test_slicegpt.py:16-17or Truemakes the condition always true; the comment# TODO(team): Failed in pipeline (linux gpu). Need to investigate.shows this is a workaround to hide failures.SliceGPTpass; the pattern may be copied into other files.@pytest.mark.skipif(not torch.cuda.is_available(), ...); if network is required, replace with a local small model; removeor Trueafter fixing.[Refactor] Add a
test-unit.ymlworkflow covering core modules.github/workflows/test-model-fast.yml:39only runspytest test/cli/test_cli_test_model_smoke.py; the remaining 183 test files are not executed in any CI..github/workflows/test-unit.ymlcovering at least the hardware/network-dependency-free tests intest/engine/,test/search/,test/model/,test/evaluator/,test/hardware/,test/passes/onnx/; useworkflow_dispatchor a separate matrix for GPU/QNN/OpenVINO suites.[Quick Fix] Three locations perform direct HF downloads with no skip guard
test/passes/onnx/test_extract_adapters.py:37(AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM"), module scope)test/passes/onnx/test_nvmo_graph_surgery.py:34(AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B"), ~700 MB)test/cli/test_cli.py:652(HF download with no skip guard)@pytest.mark.networkmarker (see CLI/CI items) and uniformly guard withskipif(not _network_available()); prefer local synthetic models (seetest_cli_test_model_smoke.py::_save_local_tiny_llama).Documentation
[Quick Fix]
docs/source/conf.pyversion is inconsistent with the actual package versiondocs/source/conf.py:22(version = "0.12.1") vsolive/version.py:1(__version__ = "0.11.0.dev0")from olive.version import __version__ as version, and setrelease = versionaccordingly.[Quick Fix]
docs/architecture.mdPass base class API has changed but documentation has not been updateddocs/architecture.md:47-48(example uses@abstractmethod def default_config) vsolive/passes/olive_pass.py:318(actual:_default_config, private method with different signature)_default_config(cls, accelerator_spec); also updatedocs/architecture.md:79-87— the outdated list of only 9 passes should be replaced with a link toreference/pass.rst; fix the Unicode em-dash atdocs/architecture.md:363(python –m olive.workflows.run→olive run --config).[Quick Fix] Mark
auto-optas deprecated at all entry points and fix the default log levelolive/cli/auto_opt.py:184emits deprecation warning vialogger.warning(...)olive/cli/base.py:394default--log_level=3(ERROR) is above WARNING (2), so the warning is never shown by defaultdocs/source/how-to/cli/cli-auto-opt.md:1does not note the deprecationdocs/source/reference/cli.rst:91-92does not note the deprecationnotebooks/olive_quickstart.ipynb:42still installsolive-ai[auto-opt]logger.warningtologger.erroror printprint("DEPRECATED: use 'olive optimize'", file=sys.stderr)at subcommand registration; ② add> ⚠️ Deprecated: use \olive optimize`at the top ofcli-auto-opt.mdandcli.rst; ③ insert a deprecation notice at the top of--helpoutput; ④ update the quickstart notebook to useolive optimize`.CLI / UX
[Quick Fix]
--num-splitsand other parameters globally mix kebab-case and snake_caseolive/cli/auto_opt.py:143-170(kebab:--num-splits,--cost-model, etc.)olive/cli/optimize.py:100-184(snake:--num_split,--dim_param, etc.; synonym parameters named differently fromauto_opt.py)olive/cli/run.py:25-41(mixed in same file:--run-config,--package-configkebab;--list_required_packagessnake)olive/cli/base.py:649-667(add_shared_cache_optionsuses--account_name/--container_name) vsolive/cli/shared_cache.py:37-44(--account/--container)add_argument(..., dest=...)to retain kebab-case aliases for backward compatibility, printing aDeprecationWarningduring the transition period. Standardize Azure credentials as--account_name/--container_name.[Quick Fix]
type=boolincapture-onnx-graphdoes not work as expectedolive/cli/capture_onnx.py:152-163(--exclude_embeds,--exclude_lm_head,--enable_cuda_graph, alltype=bool, default=False)type=boolevaluates any non-empty string asTrue(bool("false") == True), so--exclude_embeds falsestill yieldsTrue.action="store_true", or define an explicitdef str2bool(v): ...that acceptstrue/false/0/1.[Quick Fix] CLI
ValueErrorthrown directly causes users to see tracebacksolive/cli/base.py:217-218, 375,olive/cli/auto_opt.py:248-249, 300,olive/cli/optimize.py:260-290,olive/cli/capture_onnx.py:237-241,olive/cli/run_pass.py:134ValueError, triggering a full Python traceback and exit(1); this is mixed with argparse'sparser.error()(exit 2).parser.error("..."); retainValueErroronly for genuine runtime errors.[Quick Fix] Mixed
print()andlogger.*in CLI; multiple debug prints pollute stdoutprintcalls throughoutolive/cli/, most notably:olive/cli/quantize.py:145print(f"pass list: {pass_list}")olive/cli/quantize.py:184print(f"selected pass configs: {passes_dict}")olive/cli/quantize.py:129-132usesprintfor warningsolive/cli/run_pass.py:198-200usesprintfor errors then calls exit(0)olive/cli/base.py:107-125mixesprintandloggerin the same function--log_level; error output goes to stdout instead of stderr; hard to parse in pipelines.printfor user-facing success/progress/result paths; uselogger.*(going to stderr) for diagnostics/warnings/errors; remove obvious debug prints; the error path inrun_pass.py:198-200should callsys.exit(1).Dependencies / Packaging
[Quick Fix] Add upper bounds to critical dependencies in
requirements.txtrequirements.txt:1-14andtest/requirements-test.txt:18,24,40,test/requirements-test-gpu.txt:4,test/requirements-test-openvino.txt:6,7requirements.txtlistsnumpy,torch,transformers,onnx,onnxscriptwithout upper bounds; tests already document known breaking changes atonnx<1.21.0(symlink issue),torch<2.11.0,transformers<5.4.0,onnxscript<0.6.1.pip install olive-aito fail immediately; poor reproducibility in user environments.requirements.txt(e.g.,numpy<2,onnx<1.21.0,torch<2.11.0,transformers<5.4.0,onnxscript>=0.5.3,<0.6.1); explicitly raise bounds via dependabot/manual PR in each subsequent release.[Quick Fix] Fix the inconsistency between
ruff target-versionandpython_requirespyproject.toml:84(target-version = "py39") vssetup.py:77(python_requires=">=3.10"); and comments atpyproject.toml:135, 176B905patterns, PEP 604 union types).target-version = "py310"; remove legacy comments advising to "avoid" 3.10 constructs and re-audit theignorelist.[Quick Fix] Remove or update the uninstallable
tfextra (tensorflow==1.15.0)extra_dependencies.tfinolive/olive_config.json,olive/model/handler/tensorflow.py:31(load_modeldirectly raisesNotImplementedError),olive/cli/configure_qualcomm_sdk.py:21,olive/platform_sdk/qualcomm/configure/configure.py:84python_requires;TensorFlowModelHandleritself is not implemented.pip install olive-ai[tf]will always fail, signaling to users that "this extra is dead."tfextra, or update it to modern TF (only if actually needed); synchronously deleteTensorFlowModelHandleror mark it deprecated.CI/CD
[Quick Fix] All GitHub Actions references use unpinned SHAs (supply-chain risk)
lint.yml:19, 21, 32, 73,codeql.yml:51, 66, 80, 85,docs.yml:30,test-model-fast.yml:21@v3/@v4/@v5), including two third-party actions (reviewdog/action-misspell@v1,advanced-security/dismiss-alerts@v1).uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2); add.github/dependabot.ymlto enable automatic updates for github-actions and pip.[Refactor] Enable pip caching and dependency security scanning
lint.yml:49-53,test-model-fast.yml:23-27,docs.yml:32-36(all missingcache: pip); nopip-audit/safetyin.github/workflows/cache: "pip"andcache-dependency-path: ...to allsetup-python@v5steps; addpip-audit -r requirements.txtto the scheduled trigger incodeql.yml.Performance
[Refactor] Passes / search points are strictly serial — no multi-core utilization in search mode
olive/engine/engine.py:620-631(_run_passes),olive/engine/engine.py:409-451(_run_search),olive/search/search_strategy.py:210-257(synchronous generator)record_feedback_signalbefore generating the next.concurrent.futures.ProcessPoolExecutorfor batch evaluation of independent search points at the same level; introduce an async feedback mechanism so the sampler can submit batches simultaneously.[Refactor]
PythonEnvironmentSystemspawns a new subprocess for every pass / evaluationolive/systems/python_environment/python_environment_system.py:61-93run_pass/evaluate_modelcreates a temporary directory, writes JSON, callssubprocess.runto start a new Python interpreter, reads JSON, and deletes the directory.multiprocessing.Process+ Queue or ZeroMQ) that communicates via stdin/stdout/IPC and reuses the interpreter across iterations.Medium-Priority Recommendations
Architecture / Code Quality
cache.py:216pass_name: inttype annotation is incorrect: The call site (engine.py:739) actually passes astr; change topass_name: str.Engine.__init__negation-style parameter naming is inconsistent:engine.py:59,82hasno_artifacts: bool=Falseimmediately assigned to internalskip_saving_artifacts; change to positive namingsave_artifacts: bool=Trueor align the parameter name with the attribute name.olive/evaluator/olive_evaluator.pyis a 2059-line god module: Contains 10 classes (OliveEvaluator, per-backend evaluators,OliveEvaluatorConfig); split by backend intobase.py,onnx.py,pytorch.py,openvino.py,qnn.py,lm.py,mteb.py, with__init__.pymaintaining compatible exports.olive/engine/engine.pyis 844 lines with 4 TODOs: Mixes scheduling, caching, search expansion, evaluation, and packaging responsibilities; extractScheduler/EvaluationRunnerfollowing the existingFootprintNode/WorkflowOutputapproach.olive/common/utils.pyis a 721-line / 38-function grab-bag: Mixes hashing, subprocess, torch tensor, file I/O, and HF integration; split by responsibility intohashing.py/io_utils.py/torch_utils.py/dict_utils.py; movehf_repo_existstoolive/common/hf/.olive/cli/base.pyis 768 lines mixing ABC with utility functions: Extractadd_input_model_options,add_hf_test_model_config,get_input_model_config, and similar intoolive/cli/utils.py/olive/cli/model_options.py; keep onlyBaseOliveCLICommandinbase.py.olive/passes/onnx/common.pyis depended on across backends:olive/passes/openvino/quantization.py:17andolive/passes/quark_quantizer/quark_quantization.py:21importmodel_proto_to_file/model_proto_to_olive_model/get_external_data_config; move backend-agnosticget_external_data_configtopasses/common/; keep ONNX-specific utilities inpasses/onnx/common.py.olive/passes/pytorch/containsGptq,Rtn,LoRA,AutoAWQQuantizer,GptqQuantizerwith inconsistent naming; uniformly add aPytorchprefix (matching the ONNX backend convention) or add a lint check inPass.__init_subclass__.olive.searchdepends onMetricResult:olive/search/search_results.py:12,olive/search/search_strategy.py:23,olive/search/samplers/search_sampler.py:14(thoughTYPE_CHECKING); defineScoreMap = dict[str, float]or a Protocol; haveEngineconvertMetricResultbefore passing it in.olive.platform_sdkhas a reverse dependency onolive.cli:olive/platform_sdk/qualcomm/configure/__main__.py:9callsfrom olive.cli.launcher import legacy_call; inline the needed logic or define an independentrun_configure()inplatform_sdk.olive/passes/pytorch/autoawq.py:131-138,autogptq.py:119-131,gptqmodel.py:119-128are nearly identical; extractdetach_adapter(model, backend_name) -> tuple[ModelHandler, Optional[str]]intoolive/passes/pytorch/common.py.HfModelHandlerhas a 5-level MRO plus deadcomposite_parentcode:olive/model/handler/hf.py:29,olive/model/handler/pytorch.py:25already carry# pylint: disable=too-many-ancestors;olive/model/handler/base.py:45composite_parenthas no consumers anywhere in the codebase. ChangeDummyInputsMixinand similar to composition (handler.dummy_input_provider: DummyInputProvider); deletecomposite_parent.Testing
test_python_environment_system.pycreates a real venv and runspip installinside a fixture:test/systems/python_environment/test_python_environment_system.py:34-52; mockrun_subprocessor move totest/systems/integration/and add@pytest.mark.integration.test_extract_adapters.pymodule-scope fixture runs an end-to-end pipeline:test/passes/onnx/test_extract_adapters.py:27-65includes HF download + PEFT + ONNX export; replace with a local synthetic model, or mark as@pytest.mark.integration.pyproject.tomlregisters only 1 marker and does not enable strict mode: Currently onlyopenvino; addgpu,cuda,network,qnn,qairt,integration, and enable--strict-markers.olive/telemetry/andolive/platform_sdk/qualcomm/have zero test coverage: Telemetry covers a user-privacy-critical path; addtest/telemetry/test_telemetry.pywith at minimum disable/enable toggle coverage and "no send when disabled"; remove the emptytest/snpe/shell or add a README, and add mock-SDK unit tests intest/passes/qairt/or a newtest/platform_sdk/.test/conftest.py:15-30autouse session fixture slows startup: Importstransformersanddatasets, modifiestransformers.utils.hub.TRANSFORMERS_CACHEglobally; change to non-autouse explicit dependency, or configure via theTRANSFORMERS_CACHEenvironment variable in CI.requirements-test.txtmixes in platform-specific heavy dependencies:cupy-cuda12x,nncf>=2.19.0,openvino>=2025.4.1,neural-compressor<2.4in the base file cause install failures on CPU/AMD/Apple Silicon; split into a minimal base set plusrequirements-test-{gpu,openvino,...}.txt.Documentation
cli-auto-opt.mdandreference/cli.rst:91-92do not note the deprecation: (Already merged with the CLI high-priority item.)how-to/index.md:8mentionsonnx-graph-capturebut no corresponding how-to page exists: Addcli-capture-onnx-graph.mdor remove the mention from the index.NEWS.mdwas last updated in Feb 2025: README contains Aug/Oct 2025 entries; backfill v0.8.0, v0.9.0, v0.9.2 release entries for Mar–Oct 2025.engine.py:49(Engine.__init__),engine.py:241(run_accelerator),pass_config.py:98(BasePassConfig),onnx.py:45(OnnxModelHandler.__init__) all lack docstrings; prioritize public abstract methods.cd docs && pip install -r requirements.txt && make html.docs.yml:41-42inlinesonnxruntime/torchwithout pinned versions: Move todocs/requirements.txtwith pinned versions.CLI / UX
--log_levelaccepts integer 0–4 with nochoices=:olive/cli/base.py:390-398; change to string-based["debug","info","warning","error","critical"]or at minimum addchoices=range(5); also introduce-v/-qBoolean shortcut flags (consistent with git/pip/uv).runcommand lacks--dry_run:olive/cli/run.pydoes not calladd_save_config_file_options; inconsistent experience compared tooptimizeand other commands.quantize --algorithmhelp text is semantically incorrect:olive/cli/quantize.py:65-66says "List of", but the parameter accepts only a single value; change to "The quantization algorithm to use."tune-session-paramshelp references non-existent parameters:olive/cli/session_params_tuning.py:30-36mentions--model/--hf_model_name, but the actual parameter is--model_name_or_path.--disable_force_evaluate_other_epsis a double negative:olive/cli/session_params_tuning.py:78-84; change to--evaluate_all_eps(positive) with inverted default.olive --helphas no grouping:olive/cli/launcher.py:37-59lists 19 subcommands flat, withauto-optregistered beforeoptimize; group by "core workflows / model conversion / adapter management / utilities", with deprecated commands listed last.@actiondecorator placement is inconsistent:olive/cli/run_pass.py:22-23decorates the class, while other commands decorate therun()method (e.g.,finetune.py:82); standardize torun().diffusion_lora.pybypassesadd_input_model_options:olive/cli/diffusion_lora.py:34-47defines custom-m/--model_name_or_path, missing--trust_remote_codeand--test; use the shared helper.--memoryhelp does not mention supported string formats:olive/cli/base.py:704-709says "in bytes", buttype=AcceleratorSpec.str_to_int_memoryaccepts"4GB"; add an example.Dependencies / Packaging
setup.py:84package_datareferences non-existent packageolive.auto_optimizer: Dead entry; remove or confirm the package exists.onnxscriptupper and lower bounds conflict: Productionrequirements.txt:4>=0.5.3with no upper bound; teststest/requirements-test.txt:24<0.6.1; unify by tightening.openvinoextras and test versions are misaligned:olive_config.jsonspecifiesopenvino>=2025.4.1,nncf>=2.19.0;test/requirements-test-openvino.txt:6,7specifiesopenvino>=2025.3.0,nncf>=2.18.0; align them.all/dev): Makes one-command CI and developer installs inconvenient.finetune/bnb/loraextras overlap:finetune ≈ bnb + lora + optimumis not expressed usingolive-ai[bnb,lora]syntax; high maintenance cost to keep in sync.CI/CD
docs.ymlis missing apermissions:declaration: Add explicitpermissions: contents: read; also addpermissions: {}at the top level of each workflow as a default, elevating at the job level as needed.optional-lintjob has excesssecurity-events: write:.github/workflows/lint.yml:16-17; that job does not upload SARIF; remove under the principle of least privilege.lint.yml/test-model-fast.yml/docs.ymldo not trigger onrel-*: Release candidate branches are not running tests or building docs; explicitly addrel-*.ubuntu-latest+ Python 3.12; addwindows-latestandpython-version: ["3.10","3.12"](at minimum in lint and smoke tests).pip-auditand mypy/pyright static type checking: Addpip-audit -r requirements.txtto the scheduled trigger incodeql.yml; incrementally enablemypy --ignore-missing-imports olive/in lint.PYLINT,TOML-SORT,EDITORCONFIG-CHECKER,NOQAonly in CI;check-yaml,absolufy-importsonly in pre-commit; unify by having pre-commit invokelintrunner.Performance
engine.py:306, 398unnecessarymodel_dump()+validate_config()round-trip in search hot path: Usepass_config.model_copy()or pass by reference directly.footprint.py:91-102, 230-254everyrecord()call triggers a full_resolve_metrics()— O(n²): Change to lazy evaluation (call only beforecreate_pareto_frontier) or incremental computation.footprint.py:270-303Pareto frontier uses a double loop — O(n²): Noticeable at larger search spaces; replace with a divide-and-conquer Pareto algorithm or sorting-based approach.DockerSystem.run_workflowdestroys the container on every run:olive/systems/docker/docker_system.py:108-142; support container reuse.local.py:46ModelConfig.from_json(output_model.to_json())is an unnecessary round-trip: Returnoutput_model.to_model_config()directly.onnx.load(self.model_path)loads large models fully into memory:olive/model/handler/onnx.py:110-111; in large LLM scenarios every pass does a full load; preferload_external_data=Falsein lightweight inspection paths.onnx.loadcalls on the same file inside quantization passes:olive/passes/onnx/quantization.py:265, 477, 510, 540; passModelProtowithin the pass instead of repeatedly re-reading from disk.cache.py:185, 227, 263JSONindent=4increases I/O: Useindent=Nonein production mode.cache.py:290cache_id is only 8 hex characters: 32-bit space has collision risk in shared caches; extend to 16 characters or add a pass prefix.Low-Priority Recommendations
Code Quality / Style
pyproject.tomlglobally disables all complexity rules in pylint:too-many-arguments / branches / locals / nested-blocks / return-statements / statements,broad-exception-caught,unused-argumentare all globally disabled; exempt per module at most, not globally, to catch newly introduced violations. The same applies to ruffignore:D100-D107fully off,RUF013off due to a "Python 3.10" comment (contradicting target=py39),PLW0603/B028marked "temp" but now permanent.olive/passes/onnx/conversion.py:41file-level# pylint: disable=W0212: Scope is too broad; change to inline suppression.olive/common/utils.py:76internal variablemd5_hashinhash_stringbut the algorithm is SHA-256: Misleading; rename tosha256_hashorhasher(same issue inhash_io_stream/hash_file/hash_dict).olive/engine/engine.py:560,olive/passes/pytorch/lora.py:360(functional bugnum_items_in_batch),olive/passes/onnx/graph_surgeries.py:170,184("incorrect"), three similar adapter issues inolive/passes/pytorch/*; enable ruffTD003to require each TODO to link to an issue.olive/engine/engine.py:462-485_dump_run_historytemporarily mutates global verbosity: Anti-pattern (thread-unsafe); control via module logger level or a parameter instead.->._run_for_configimplementations exceed 200 lines:diffusers/lora.py:231 _train_sd(356),_train_flux(316),onnx/mnb_to_qdq.py:89(271),openvino/optimum_intel.py:349(246),onnx/split.py:42(211),onnx/model_builder.py:210(208),pytorch/selective_mixed_precision.py:394 compute_module_stats(207); extract_prepare_*/_post_process_*private methods.olive/data/config.py:24, 44DataConfighas 4 fixed component fields: Adding a new data component type requires changing the class; followpass_config.pyand change tocomponents: dict[str, DataComponentConfig].Testing
test/engine/test_engine.py:46,76,93,112,test/workflows/test_workflow_run.pyusetest_register,test_run, etc., lacking "expected behavior"; rename totest_<func>_<expected>[_when_<cond>].test/engine/,test/search/,test/model/,test/systems/; convert factory functions intest/utils.pyto conftest fixtures.test_aimet_quantization.pyusesskipif(CUDA_AVAILABLE)to exclude GPU environments:test/passes/onnx/test_aimet_quantization.py:144,197,223,269; use aTEST_DEVICEenvironment variable or a dedicated marker instead.test/assets/contains onlyuser_script.py: No shared ONNX files or golden outputs; commit common dummy ONNX files as static assets to reduce runtime generation.olive/exception.py/olive/logging.pyhave no tests: Addtest/test_exception.py/test/test_logging.py.Documentation
README.md:98"the the";README.md:71-79instructions mention Llama-3.2-1B-Instruct but the actual command uses Qwen2.5-0.5B-Instruct;docs/source/getting-started/getting-started.md:55"60secs" is a significant underestimate.notebooks/README.md:3,6typos: "ocmmand" → "command", "hot wo" → "how to".docs/architecture.mdis not in the Sphinx toctree: Users cannot find it from the documentation site; move intodocs/source/or add to the toctree.docs/source/conf.py:65suppress_warnings = ["myst.xref_missing"]: Suppresses missing internal cross-reference warnings, potentially masking broken links; at least keep a version with this enabled in CI.docs/source/index.md:30features/indexlinks to.rstwhile others use.md: Style inconsistency.CLI / Exit Codes
raise ValueError(exit 1) andparser.error()(exit 2) inconsistently; scripts cannot distinguish them.run_pass.py:183-200_list_passeshas no--jsonformat: Inconvenient for tool integration.--save_config_fileoutput:capture_onnx.py:340-347("c"/"m"/"b"/"f"),generate_adapter.py:89("e"),finetune.py:159("f"); change to readable key names.--modality choices=["text"]has only one option:olive/cli/optimize.py:155-163; hide or mark as internal.session_params_tuning.py:149prompts to "set log_level to 1" but 1 is INFO; debugging requires DEBUG (0).Dependencies / Packaging
pyproject.tomllacks[build-system]and[project]: Not PEP 517/518 compliant; inconvenient for dependabot/pip inspect.tritonis a hard dependency inbnb/finetuneextras: Linux + CUDA only; will fail in non-GPU environments.hf-xetis a core dependency: It is a performance acceleration package, not a functional dependency; consider moving to an optional extra.MANIFEST.indoes not explicitly include olive data files: sdist may miss data files (wheel is fine viapackage_data).setup.py:47-48is missing macOS classifier: Olive is pure Python and can run on macOS.setup.py:40"3 - Alpha": Consider advancing to Beta.test/requirements-test.txt:14neural-compressor<2.4has no explanatory comment.CI/CD / Performance / Other
.github/workflows/test-model-fast.ymldoes not upload failure artifacts: Logs are destroyed when the runner exits; addactions/upload-artifact@v4withif: failure().lint.yml:72SARIF upload hascontinue-on-error: true: Silent failure may silently drop security scan results.docs.yml:43make linkcheckmay block PRs due to unavailable external links: Consider running only on schedule trigger or adding--ignorepatterns.quantization.py:306-337deepcopyon a module-level config dict: Minor overhead; can be changed to a shallow copy.cache.py:795-806evaluation cache is keyed only on model_id, without an evaluator configuration hash: If a user changes the evaluation dataset but the pass configuration is unchanged, the cache does not invalidate; introduce an evaluator hash.search/search_results.py:36usesNone-padded sparse list: Change todict[int, tuple].optuna_sampler.py:73-78while Trueloop skips duplicate points: May produce many wasted trials when the search space is nearly exhausted.OnnxQuantization._initialize()depends on destructor to clean up temporary directory:quantization.py:289; exception paths may leak; change to a contextmanager or try/finally.olive/passes/openvino/optimum_intel.py:275silentlyexcept ImportError: pass: Users see anAttributeErrorinstead of a "missing dependency" message.telemetry_extensions.py:57, 126@actionusesinspect.stack(): Currently only decoratesEngine.run()so risk is limited; future expansion will require session-level caching.Appendix: Per-Dimension Investigation Summaries
Architecture (findings-arch.md)
Olive has clear inheritance contracts and a Pydantic v2 configuration foundation across the Pass / Engine / System / Evaluator / Search / Model layers — a reasonable overall structure for a mid-to-large project. The main architectural debt concentrates in reverse dependencies (
systems → workflows,cache → model/passes,passes → evaluator,platform_sdk → cli,model/handler/mixin → cache) and god modules (olive_evaluator.pyat 2059 lines,engine.pyat 844 lines,cli/base.pyat 768 lines,common/utils.pyat 721 lines). TheHfModelHandler5-level MRO and deadcomposite_parentcode need cleanup. Full 14 findings infindings-arch.md.Code Quality (findings-quality.md)
Key metrics: largest single function is 1207 lines (
get_had172), 68 bareexcept Exceptioncatches, 100+ TODO/FIXME items, pylint globally disabling 37 complexity rules. The most critical issues are critical failures silently swallowed incache.py, missingexc_infoin_run_searchexception handling, and the matrix data embedded as code inhadamard_utils.py. The CLI layer mixesprint/logger, and the "incorrect" surgeons ingraph_surgeries.pyremain production-accessible. Full 17 findings infindings-quality.md.Testing (findings-tests.md)
Of 184 test files, only 1 runs in CI;
test_slicegpt.pyhas anor Truepermanent skip;test_python_environment_system.pyandtest_extract_adapters.pyare end-to-end integration tests disguised as unit tests; 3 locations perform HF downloads with no skip guard;telemetry/platform_sdkhave zero tests;pyproject.tomlregisters only theopenvinomarker. Engine/Workflow test method names do not follow AGENTS.md convention. Full 15 findings infindings-tests.md.Documentation (findings-docs.md)
README + Sphinx main structure is good (CLI docs strongly synchronized with code via
sphinxarg.ext), but there are 3 P0 issues:conf.pyversion 0.12.1 vs actual 0.11.0.dev0;architecture.mdPass base class API has changed (default_config→_default_config) without being updated;auto-optdeprecated status is not communicated at any of three entry points (runtime warning filtered by default log_level,cli-auto-opt.md,reference/cli.rst). Docstring coverage is 22–47%. Full list infindings-docs.md.CLI (findings-cli.md)
The biggest issue is inconsistent parameter naming style (kebab-case and snake_case mixed across different commands and even within the same file, with the Azure credential parameter names
--account_namevs--accountbeing a real observable bug);type=boolincapture-onnx-graphdoes not work as expected; multipleraise ValueErrorcalls expose raw tracebacks; debugprintcalls inquantize.pypollute stdout;--log_leveluses integers without a--verbose/--quietshortcut;olive --helplists 19 subcommands flat with no grouping. Full 10 sections infindings-cli.md.Dependencies / Packaging (findings-deps.md)
H1: ruff
target-version="py39"contradictspython_requires=">=3.10"; H2:numpy/torch/transformers/onnxhave no production upper bounds but tests document known breaking changes; H3:tfextratensorflow==1.15.0cannot be installed on Python 3.10+. Medium-priority issues:onnxscriptbounds conflict,openvinoextras/test version mismatch, deadolive.auto_optimizerentry insetup.pypackage_data, missing aggregateallextra. Full 11 sections infindings-deps.md.CI/CD (findings-ci.md)
4 workflows (lint/codeql/docs/test-model-fast). Core issues: unit tests are almost entirely absent from CI (183/184 not running); all GitHub Actions references use unpinned mutable tags (supply-chain risk); no Dependabot; no platform/Python version matrix (all ubuntu+3.12); no pip caching; no static type checking or dependency security scanning; pre-commit and CI lint check sets are inconsistent. Full 12 sections in
findings-ci.md.Performance (findings-perf.md)
13 performance anti-patterns, with 3 high-severity: search/pass execution is fully serial with no parallel evaluation,
PythonEnvironmentSystemspawns a new subprocess per iteration, ONNX models are loaded fully into memory with no streaming. Medium-severity issues include Pydantic hot-path round-trips, O(n²) Pareto frontier, and repeatedonnx.loadcalls inside quantization passes. Adding new Passes / Systems / Samplers is low-cost; the extensibility design is sound. Telemetry concurrency design is correct. Full list infindings-perf.md.Report generated: 2026-06-12 · Strictly read-only review. No modifications were made to the Olive repository. All artifacts are stored in the session files/ directory.