Environment variable setup issue ("Environment variable 'USER' not found") in Docker #742

@insop

🐛 Describe the bug

When running the following command in a Docker environment, we encounter an OmegaConf interpolation error because the USER environment variable is not set:

Command

uv run python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml

Error

  ...
  File "/workspace/torchforge/.venv/lib/python3.12/site-packages/omegaconf/omegaconf.py", line 458, in resolver_wrapper
    ret = resolver(*args, **kwargs)
  File "/workspace/torchforge/.venv/lib/python3.12/site-packages/omegaconf/resolvers/oc/__init__.py", line 38, in env
    raise KeyError(f"Environment variable '{key}' not found")
  omegaconf.errors.InterpolationResolutionError: KeyError raised while resolving interpolation: "Environment variable 'USER' not found"
  full_key: metric_logging.wandb.group
  object_type: dict

Proposed fix

Giving the USER interpolation a default value prevents the failure in environments where USER is unset (e.g., many containers). This change worked for me, and I can open a PR to apply it to the other config files if desired.

diff --git a/apps/grpo/qwen3_1_7b.yaml b/apps/grpo/qwen3_1_7b.yaml
index f7b06eb8..10e66148 100644
--- a/apps/grpo/qwen3_1_7b.yaml
+++ b/apps/grpo/qwen3_1_7b.yaml
@@ -18,7 +18,7 @@ rollout_threads: 1   # Recommended to set equal to generator.num_replicas
 metric_logging:
   wandb:
     project: grpo-training
-    group: grpo_exp_${oc.env:USER}
+    group: grpo_exp_${oc.env:USER,default_user}
     logging_mode: global_reduce # global_reduce, per_rank_reduce, per_rank_no_reduce
   console:
     logging_mode: global_reduce

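For applying the same change to other config files, a rough sketch of a helper that finds `oc.env` interpolations with no default and patches the `USER` one. The function names and the sample text are hypothetical, and the pattern only covers the exact `${oc.env:NAME}` spelling with no extra whitespace.

```python
import re

# Matches ${oc.env:NAME} with no ",default" part; NAME is captured.
NO_DEFAULT = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)\}")


def find_env_without_default(text: str) -> list[tuple[int, str]]:
    """Return (line number, variable name) for each oc.env lookup lacking a default."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in NO_DEFAULT.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def add_user_default(text: str, default: str = "default_user") -> str:
    """Rewrite ${oc.env:USER} to ${oc.env:USER,<default>}; other lookups are untouched."""
    return text.replace("${oc.env:USER}", "${oc.env:USER," + default + "}")


# Hypothetical config text; the second line already has a default and is left alone.
sample = "group: grpo_exp_${oc.env:USER}\nother: ${oc.env:HOME,/root}\n"
hits = find_env_without_default(sample)
patched_text = add_user_default(sample)
print(hits)
print(patched_text)
```

In practice this could be run over each YAML file under `apps/` (for example via `pathlib.Path.rglob`), reviewing the reported hits before writing anything back.
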
Environment

git log -1
commit cd9e295c49b2a1a6e07eea2d77fa295613729638 (HEAD -> main, origin/main, origin/HEAD)
Author: Jiyue Wang <JenniferWang@users.noreply.github.com>
Date:   Wed Jan 28 16:40:10 2026 -0500

    [vllm] Upgrade vllm version to v0.13.0 (#737)

# Check core components
python -c "import torch, forge, monarch, vllm; print('All imports successful')"

# Check specific versions
python -c "
import torch
import forge
import vllm

print(f'PyTorch: {torch.__version__}')
print(f'TorchForge: {forge.__version__}')
print(f'vLLM: {vllm.__version__}')
print(f'CUDA: {torch.version.cuda}')
"
All imports successful
PyTorch: 2.9.0+cu128
TorchForge: 
vLLM: 0.13.0
CUDA: 12.8
