fix: avoid settings validation during ingestion service import #413

Open
DivyamTalwar wants to merge 1 commit into morphik-org:main from DivyamTalwar:fix/ingestion-settings-import

@DivyamTalwar

Summary

core.services.ingestion_service resolved project settings at module import time. Because the default local config uses the Postgres provider, importing IngestionService during pytest collection could require POSTGRES_URI before any ColPali rendering test ran.

This PR removes that import-time settings validation from the service module while preserving early settings validation for real service instances. Three things change: IngestionService now stores a resolved settings reference during construction; app startup and ingestion workers pass their already-resolved settings object; and the module-level settings symbol remains available through lazy attribute access.

This is intentionally scoped to the ingestion service import path. Test collection still has an existing LiteLLM model-pricing HTTP fetch from other imports; this PR does not attempt to make the whole collection path offline.

Changes

  • Replace the eager module-level settings = get_settings() binding in core/services/ingestion_service.py with a lazy compatibility proxy.
  • Store a resolved settings reference on each IngestionService instance during construction.
  • Pass the existing resolved settings object from core/services_init.py and core/workers/ingestion_worker.py.
  • Read COLPALI_PDF_DPI once per PDF/office conversion path so a single document uses a stable DPI value.
  • Update the ColPali rendering unit tests to inject minimal test settings directly into IngestionService.
  • Add focused regression tests for import-time settings resolution, constructor settings injection, constructor fallback resolution, and the lazy module-level settings proxy.
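To illustrate the "read `COLPALI_PDF_DPI` once per conversion" change, here is a rough sketch. The `render_pages` helper and the `DriftingSettings` class are hypothetical and only exist to make the effect visible; the real conversion code is not shown in this description.

```python
from typing import List


class DriftingSettings:
    """Hypothetical settings object whose DPI changes after the first read."""

    def __init__(self) -> None:
        self._reads = 0

    @property
    def COLPALI_PDF_DPI(self) -> int:
        self._reads += 1
        return 150 if self._reads == 1 else 300  # assumed values


def render_pages(page_count: int, settings) -> List[int]:
    """Return the DPI used for each page of a single document."""
    dpi = settings.COLPALI_PDF_DPI  # read once, before the page loop
    used = []
    for _ in range(page_count):
        used.append(dpi)  # every page of this document shares one value
    return used


dpis = render_pages(3, DriftingSettings())
```

Reading the value inside the loop instead would let pages of the same document render at different resolutions if the setting changed mid-conversion.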

Why this matters

Full pytest collection is a basic contributor workflow. These rendering unit tests should be collectible under the default local config without requiring POSTGRES_URI.

Constructor resolution keeps configuration validation before ingestion/update side effects in normal app and worker paths. The lazy module-level proxy keeps attribute access through the old ingestion_service.settings seam available without forcing configuration validation during import.

Production paths now rely on injected or resolved settings during IngestionService construction. The module-level proxy remains a compatibility and test override seam, not the preferred runtime configuration source.

Tests

Commands run:

  • uv run pre-commit run isort --files core/services/ingestion_service.py core/services_init.py core/workers/ingestion_worker.py core/tests/unit/test_ingestion_colpali_rendering.py core/tests/unit/test_ingestion_service_settings_import.py - passed
  • uv run pre-commit run black --files core/services/ingestion_service.py core/services_init.py core/workers/ingestion_worker.py core/tests/unit/test_ingestion_colpali_rendering.py core/tests/unit/test_ingestion_service_settings_import.py - passed
  • uv run pre-commit run ruff --files core/services/ingestion_service.py core/services_init.py core/workers/ingestion_worker.py core/tests/unit/test_ingestion_colpali_rendering.py core/tests/unit/test_ingestion_service_settings_import.py - passed
  • env -u POSTGRES_URI uv run pytest -q core/tests/unit/test_ingestion_service_settings_import.py - 8 tests passed
  • env -u POSTGRES_URI uv run pytest --collect-only -q core/tests/unit/test_ingestion_service_settings_import.py core/tests/unit/test_ingestion_colpali_rendering.py - passed, 11 tests collected
  • env -u POSTGRES_URI uv run pytest -q core/tests/unit/test_ingestion_service_settings_import.py core/tests/unit/test_ingestion_colpali_rendering.py - 11 tests passed
  • env -u POSTGRES_URI uv run pytest -q core/tests/unit/test_ingestion_service_settings_import.py core/tests/unit/test_ingestion_colpali_rendering.py core/tests/unit/test_ingestion_service_metadata_update.py - 17 tests passed
  • env -u POSTGRES_URI uv run pytest --collect-only -q - passed, 253 tests collected

Not run:

  • Full test execution was not run because the full suite includes integration/live-service tests that require external services.

Risk

Risk level: medium

This changes where IngestionService stores settings: instance methods now read self.settings instead of a module-global settings object. Startup and worker construction pass the same resolved settings object they already use, and direct construction still resolves settings during __init__ unless tests explicitly inject a settings object. Reverting the service constructor change and the two construction call sites restores the previous behavior.

Issue

No existing issue. This was discovered during repository analysis.

Notes for maintainers

This PR intentionally addresses only the ingestion_service import-time settings dependency needed by the ColPali rendering tests. Other import-time settings patterns, such as core/services/document_service.py and route/module-level settings singletons, are left out of scope and are not tracked by this PR.

Collection still logs a LiteLLM model-pricing HTTP fetch from existing imports; this PR does not attempt to make the whole collection path offline.

Importing IngestionService during pytest collection previously resolved settings immediately and could require POSTGRES_URI before tests ran. Move settings validation to service construction, pass the already-resolved startup/worker settings into production instances, keep module-level settings attribute access lazy, preserve direct proxy overrides through a validated settings reconstruction, and add regression coverage for the import and constructor contracts.

Constraint: Default local config selects Postgres and validates POSTGRES_URI when get_settings() runs.
Rejected: Test-only os.environ defaulting | mutates process-wide configuration during import.
Rejected: Requiring test callers to replace the whole module settings object | breaks existing direct-attribute monkeypatch seams.
Rejected: Returning the mutable settings proxy from construction | can bypass full settings validation when partial overrides exist.
Rejected: model_copy(update=...) for override merges | Pydantic does not validate update values.
Confidence: high
Scope-risk: moderate
Directive: Keep broader import-time settings cleanup out of this PR unless covered by targeted tests.
Tested: uv run pre-commit run isort --files core/services/ingestion_service.py core/services_init.py core/workers/ingestion_worker.py core/tests/unit/test_ingestion_colpali_rendering.py core/tests/unit/test_ingestion_service_settings_import.py; uv run pre-commit run black --files core/services/ingestion_service.py core/services_init.py core/workers/ingestion_worker.py core/tests/unit/test_ingestion_colpali_rendering.py core/tests/unit/test_ingestion_service_settings_import.py; uv run pre-commit run ruff --files core/services/ingestion_service.py core/services_init.py core/workers/ingestion_worker.py core/tests/unit/test_ingestion_colpali_rendering.py core/tests/unit/test_ingestion_service_settings_import.py; env -u POSTGRES_URI uv run pytest -q core/tests/unit/test_ingestion_service_settings_import.py; env -u POSTGRES_URI uv run pytest --collect-only -q core/tests/unit/test_ingestion_service_settings_import.py core/tests/unit/test_ingestion_colpali_rendering.py; env -u POSTGRES_URI uv run pytest -q core/tests/unit/test_ingestion_service_settings_import.py core/tests/unit/test_ingestion_colpali_rendering.py; env -u POSTGRES_URI uv run pytest -q core/tests/unit/test_ingestion_service_settings_import.py core/tests/unit/test_ingestion_colpali_rendering.py core/tests/unit/test_ingestion_service_metadata_update.py; env -u POSTGRES_URI uv run pytest --collect-only -q
Not-tested: Full test execution; includes integration/live-service tests that require external services.
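
The two rejected override strategies above come down to whether overrides pass through validation. The sketch below shows the difference with a hand-written class so it stays dependency-free; it is a stand-in and not Pydantic itself. `MiniSettings`, `copy_with_update`, and `rebuild_with_overrides` are hypothetical names.

```python
import copy
from typing import Any, Dict


class MiniSettings:
    """Hand-written stand-in for a validated settings model (not Pydantic)."""

    def __init__(self, dpi: int) -> None:
        if not isinstance(dpi, int) or dpi <= 0:
            raise ValueError("dpi must be a positive integer")
        self.dpi = dpi

    def to_dict(self) -> Dict[str, Any]:
        return {"dpi": self.dpi}


def copy_with_update(base: MiniSettings, update: Dict[str, Any]) -> MiniSettings:
    """Mimic an unvalidated copy: fields are overwritten directly."""
    clone = copy.copy(base)
    clone.__dict__.update(update)  # the constructor never sees these values
    return clone


def rebuild_with_overrides(base: MiniSettings, overrides: Dict[str, Any]) -> MiniSettings:
    """Re-run the constructor so overrides go through validation."""
    return MiniSettings(**{**base.to_dict(), **overrides})


base = MiniSettings(dpi=150)
unchecked = copy_with_update(base, {"dpi": "not-a-number"})  # silently accepted
```

Calling `rebuild_with_overrides(base, {"dpi": "not-a-number"})` raises `ValueError`, which is the behavior the "validated settings reconstruction" wording in the commit message suggests.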
@CLAassistant

CLAassistant commented Jun 29, 2026

CLA assistant check
All committers have signed the CLA.

@DivyamTalwar DivyamTalwar marked this pull request as ready for review June 29, 2026 20:43