
chore: persist Docker data and runtime state #12

Open
ouyangmland wants to merge 4 commits into henrylin99:master from ouyangmland:chore/docker-persistent-data-config


ouyangmland commented Jun 7, 2026

Summary

This PR improves the default Docker deployment configuration so the project can be run directly with persistent application state, Parquet market data, Redis/Celery services, and minute-level data sync dependencies on a host machine or NAS/router environment.

Changes:

  • Mount host ./data to container /app/data for downloaded Parquet data packages.
  • Mount host ./instance to container /app/instance so the SQLite database survives container rebuilds/restarts.
  • Mount host ./logs to container /app/logs so logs are persisted outside the container.
  • Use the Compose service name redis for Redis/Celery settings instead of localhost, which does not point to the Redis container from inside web/worker containers.
  • Persist Redis data with a named volume.
  • Add restart: unless-stopped for the app services and Redis.
  • Update .env.example to match the Docker Compose paths and network settings.
  • Install requirements.txt in the Docker image instead of requirements_minimal.txt, because the Compose setup starts Redis/Celery-backed services and the minimal dependency set does not include required packages such as redis and celery.
  • Add pytdx to requirements.txt, because the Tongdaxin minute data sync imports pytdx.hq.TdxHq_API.

Why

The README tells users to download the Parquet data package and use the Docker deployment, but the previous Compose file did not mount the host data directory into the container. In practice, users who unzip the data package on the host still get startup warnings that required assets are missing because the container cannot see the files.

The previous .env.example also used localhost for Redis. In Docker Compose, localhost inside the web or worker container refers to that same container, not the Redis service. Using redis matches the Compose service name and allows the containers to connect reliably.

Persisting data, instance, logs, and Redis storage also prevents data loss when containers are rebuilt or upgraded.

Additionally, the minute-level Tongdaxin sync path currently imports pytdx, but pytdx was not listed in requirements.txt. Without it, syncing data such as 5min bars fails with No module named 'pytdx'.
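
The missing-dependency failure described above can be detected before a sync is attempted. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of the project.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(["pytdx", "redis", "celery"])` inside the container should return an empty list once the image is built from the full requirements.txt.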

Validation

Validated on an iStoreOS Docker host with a downloaded data.zip package:

  1. Extracted the archive so files such as stock_basic.parquet, stock_trade_calendar.parquet, stock_business.parquet, daily_history/daily, and daily_basic/daily exist under host ./data.
  2. Ran Docker Compose with the updated mounts and environment.
  3. Verified all containers started:
    • web
    • worker
    • redis
  4. Verified the app responded with HTTP/1.1 200 OK on port 5000.
  5. Verified startup logs changed from missing critical Parquet assets to:
    • 关键资产检查: 通过 (critical asset check: passed)
    • ✅ 大宽表: 宽表已是最新 (wide table: already up to date)
  6. Verified pytdx can be imported inside the Docker container.
  7. Verified syncing 5min data no longer fails with No module named 'pytdx'.
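
Step 1 above can be approximated with a short check that the extracted files are visible under the mounted data directory. This is only a sketch: the asset names are the ones listed in the validation steps, the application's real startup check may cover more, and the names `REQUIRED_ASSETS` and `missing_assets` are hypothetical.

```python
from pathlib import Path

# Asset names taken from the validation steps above; the real check may differ.
REQUIRED_ASSETS = (
    "stock_basic.parquet",
    "stock_trade_calendar.parquet",
    "stock_business.parquet",
    "daily_history/daily",
    "daily_basic/daily",
)


def missing_assets(data_dir, required=REQUIRED_ASSETS):
    """Return the required relative paths that do not exist under data_dir."""
    root = Path(data_dir)
    return [rel for rel in required if not (root / rel).exists()]
```

Running `missing_assets("/app/data")` inside the web container and `missing_assets("./data")` on the host should give the same result when the volume mount is in place.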

@henrylin99 (Owner)

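PR #12 Code Review: Docker persistence configuration

Thanks for submitting this PR! Persistent Docker configuration is a much-needed improvement for real deployments. Below are the issues found during review, ordered by severity: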


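🔴 Bug: LLM_BASE_URL was removed

File: .env.example

LLM_BASE_URL=https://api.deepseek.com was removed, but LLM_PROVIDER=openai and LLM_MODEL=deepseek-v4-flash are still present. In config.py, the default base_url for the openai provider is https://api.openai.com/v1, so after a user copies .env.example to .env, LLM requests go to the OpenAI endpoint with a DeepSeek model name, resulting in a 401 authentication error or a model-not-found error.

Suggestion: Restore LLM_BASE_URL=https://api.deepseek.com, or switch the default to LLM_PROVIDER=ollama.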
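
The mismatch can be illustrated with a small sketch. It assumes only what is stated above, namely that the openai provider falls back to https://api.openai.com/v1 when LLM_BASE_URL is unset. The function name and the lookup table are hypothetical and are not copied from config.py.

```python
# Hypothetical defaults table; only the openai entry is stated in this review.
DEFAULT_BASE_URLS = {"openai": "https://api.openai.com/v1"}


def resolve_llm_endpoint(env):
    """Return (base_url, model), falling back to the provider default URL."""
    provider = env.get("LLM_PROVIDER", "openai")
    base_url = env.get("LLM_BASE_URL") or DEFAULT_BASE_URLS.get(provider)
    return base_url, env.get("LLM_MODEL")
```

With `LLM_BASE_URL` absent, a DeepSeek model name ends up paired with the OpenAI endpoint, which is the failure mode described in this item.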


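🔴 Bug: environment: silently overrides env_file: .env

File: docker-compose.yml (web and worker services)

In Docker Compose, environment: takes precedence over env_file. Both services currently set env_file: .env and an environment: block, and keys such as REDIS_HOST, CELERY_BROKER_URL, DATA_DIR, and SQLITE_DATABASE_PATH overlap. When a user edits these values in .env, environment: silently overrides them with no error message.

Suggestion: Pick one. Either use only env_file: .env (recommended, single source of truth), or drop env_file and use only environment:. If both are kept, add a comment in the file explaining the override relationship.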
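
The precedence rule can be modeled in a few lines. This sketch does not call Docker Compose; it only mimics the merge order described above, and both function names are hypothetical.

```python
def effective_environment(env_file_vars, environment_vars):
    """Merge the two sources the way Compose does: environment: wins over env_file."""
    merged = dict(env_file_vars)
    merged.update(environment_vars)
    return merged


def shadowed_keys(env_file_vars, environment_vars):
    """Keys set in .env whose value is replaced by a different value in environment:."""
    return sorted(
        key
        for key in env_file_vars
        if key in environment_vars and env_file_vars[key] != environment_vars[key]
    )
```

A non-empty result from `shadowed_keys` marks exactly the settings a user could edit in .env without any visible effect.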


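🟠 Bug: .env.example only works in a Docker environment

File: .env.example

The modified .env.example contains container-specific paths:

  • SQLITE_DATABASE_PATH=/app/instance/stock_cursor.sqlite3
  • REDIS_HOST=redis
  • CELERY_BROKER_URL=redis://redis:6379/0

When a local-development user copies .env.example to .env, SQLite fails because /app/instance/ does not exist, and Redis is unavailable because DNS resolution of the hostname redis fails.

Suggestion: Consider splitting this into .env.example (generic/local) and .env.docker.example (Docker-specific), or at least add a prominent comment at the top of the file stating that it is a Docker configuration.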


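🟠 Issue: celery_app.py does not use CELERY_BROKER_URL

File: app/celery_app.py, line 31

make_celery() rebuilds the broker URL from REDIS_HOST/PORT/DB and completely ignores the CELERY_BROKER_URL environment variable. However, the environment: block in docker-compose.yml sets CELERY_BROKER_URL, which misleads operators into thinking that changing this variable changes the Celery connection target.

Suggestion: Remove CELERY_BROKER_URL and CELERY_RESULT_BACKEND from the Compose file and keep only REDIS_HOST/PORT/DB, or change celery_app.py so that it prefers CELERY_BROKER_URL.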
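
The second option could look roughly like the sketch below. It is not the project's make_celery(); the function name is hypothetical, and the variable names REDIS_PORT and REDIS_DB and their defaults are assumptions expanded from the REDIS_HOST/PORT/DB shorthand above.

```python
def resolve_broker_url(env):
    """Prefer CELERY_BROKER_URL; otherwise build a Redis URL from REDIS_* parts."""
    explicit = env.get("CELERY_BROKER_URL")
    if explicit:
        return explicit
    # Assumed variable names and defaults; adjust to whatever celery_app.py reads.
    host = env.get("REDIS_HOST", "localhost")
    port = env.get("REDIS_PORT", "6379")
    db = env.get("REDIS_DB", "0")
    return f"redis://{host}:{port}/{db}"
```

In the application this would be called with `os.environ`, so that an explicit broker URL takes effect while existing REDIS_HOST-only setups keep working.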


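🟡 Inconsistent configuration: DATA_JOB_EXECUTION_MODE=inline conflicts with the Celery worker

File: .env.example

DATA_JOB_EXECUTION_MODE=inline makes data download jobs run synchronously in the web process, yet the Compose file also starts a dedicated Celery worker container. The worker container will sit idle.

Suggestion: Change the default in .env.example to DATA_JOB_EXECUTION_MODE=celery, or add a comment noting that Docker deployments should switch to celery.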


🟡 Cleanup: unused COPY in the Dockerfile

File: Dockerfile, line 13

COPY requirements.txt requirements_minimal.txt ./ still copies requirements_minimal.txt, but only requirements.txt is installed afterwards. This is leftover dead code.

Suggestion: Change it to COPY requirements.txt ./


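🟡 Cleanup: duplicated web/worker configuration in docker-compose.yml

File: docker-compose.yml

The environment: block (5 variables) and the volumes: block (3 mounts) of web and worker are identical. A YAML anchor (&app-common / *app-common) can deduplicate them and reduce the risk of updating one service but not the other during maintenance.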


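🟡 Portability: the full requirements.txt may fail to build on ARM

File: Dockerfile, line 14

Switching from requirements_minimal.txt to requirements.txt pulls in packages that require compilation, such as xgboost, lightgbm, cvxpy, scipy, and lxml. On ARM architectures (for example an ARM NAS), the build may fail because system dependencies such as libxml2-dev are missing.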


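🟡 Portability: removing the version field may affect older docker-compose

File: docker-compose.yml

version: "3.9" was removed. Compose V2 can ignore this field, but the PR's target environment, iStoreOS, may use docker-compose V1 (≤1.29), where a missing version field may cause parsing to fail.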


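Summary

The first two issues (the LLM_BASE_URL removal and environment overriding env_file) should be fixed before merging. The rest are improvement suggestions that can be handled in later iterations.

Thanks for submitting this PR. Persistent Docker configuration greatly improves the real-world deployment experience! 🙏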

ouyangmland closed this Jun 7, 2026
ouyangmland reopened this Jun 7, 2026
ouyangmland force-pushed the chore/docker-persistent-data-config branch from 2091422 to 4430c30 June 7, 2026 22:31
@ouyangmland (Author)

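I have made further fixes based on the review feedback and pushed them:

  • The Dockerfile now drops the unused COPY, as suggested:
    -COPY requirements.txt requirements_minimal.txt ./
    -RUN pip install --no-cache-dir -r requirements_minimal.txt
    +COPY requirements.txt ./
    +RUN pip install --no-cache-dir -r requirements.txt
  • .env.example keeps the local/generic defaults and retains LLM_BASE_URL=https://api.deepseek.com.
  • Added .env.docker.example, to be copied to .env for Docker deployments. It contains the /app container paths, the redis service name, the Celery broker/backend, DATA_DIR=/app/data, and DATA_JOB_EXECUTION_MODE=celery.
  • docker-compose.yml drops the environment: override and keeps only env_file: .env as the single configuration source, uses a YAML anchor to deduplicate the configuration shared by web/worker, and restores version: "3.9".
  • app/celery_app.py now uses CELERY_BROKER_URL / CELERY_RESULT_BACKEND, so these env settings are no longer ignored.
  • requirements.txt keeps pytdx>=1.72, fixing the missing dependency for Tongdaxin minute data sync.

I have applied and verified this in the iStoreOS Docker environment: docker compose config passes, web/worker/redis run normally, the page returns HTTP/1.1 200 OK, the Celery broker/backend is redis://redis:6379/0, the startup logs show 关键资产检查: 通过 (critical asset check: passed) and 数据任务模式: celery (data job mode: celery), and minute data sync no longer reports No module named 'pytdx'.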
