image build error #863

Description

@aaadipop

Hello,

I am running an experiment that benchmarks several environments, which led me to the mlcommons training repo.
For the recommendation_v2 benchmark, I cloned the repo and ran docker build, which failed with this error: can't find Rust compiler
For the small LLM pretraining benchmark, I cloned the repo, built the docker.h200 image, ran source config_H200_1x8x1_8b.sh && python pretrain_llama31.py, and got:

Traceback (most recent call last):
  File "/workspace/code/pretrain_llama31.py", line 23, in <module>
    from nemo.collections import llm
  File "/workspace/NeMo/nemo/collections/llm/__init__.py", line 20, in <module>
    from nemo.collections.llm import peft
  File "/workspace/NeMo/nemo/collections/llm/peft/__init__.py", line 15, in <module>
    from nemo.collections.llm.peft.api import gpt_lora, merge_lora
  File "/workspace/NeMo/nemo/collections/llm/peft/api.py", line 20, in <module>
    from megatron.core import dist_checkpointing
  File "/workspace/Megatron-LM/megatron/core/__init__.py", line 5, in <module>
    from megatron.core.distributed import DistributedDataParallel
  File "/workspace/Megatron-LM/megatron/core/distributed/__init__.py", line 7, in <module>
    from .finalize_model_grads import finalize_model_grads
  File "/workspace/Megatron-LM/megatron/core/distributed/finalize_model_grads.py", line 16, in <module>
    from ..transformer.moe.moe_utils import get_updated_expert_bias
  File "/workspace/Megatron-LM/megatron/core/transformer/moe/moe_utils.py", line 12, in <module>
    from megatron.core.extensions.transformer_engine import (
  File "/workspace/Megatron-LM/megatron/core/extensions/transformer_engine.py", line 95, in <module>
    class TELinear(te.pytorch.Linear):
                   ^^^^^^^^^^
AttributeError: module 'transformer_engine' has no attribute 'pytorch'
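
Python only exposes a submodule as an attribute of its package after that submodule has been imported, so this AttributeError may mean either that transformer_engine.pytorch is not present in the image or that importing it fails. The following is a minimal diagnostic sketch to tell those cases apart inside the container; `probe_submodule` is a hypothetical helper written for this report, not part of NeMo, Megatron-LM, or Transformer Engine.

```python
import importlib
import importlib.util


def probe_submodule(package: str, submodule: str) -> str:
    """Report whether `package.submodule` can be imported explicitly.

    Hypothetical helper for diagnosis only; it does not fix anything.
    """
    # Step 1: is the top-level package discoverable at all?
    if importlib.util.find_spec(package) is None:
        return "package-missing"

    # Step 2: importing the package itself can fail (e.g. a broken build).
    try:
        importlib.import_module(package)
    except Exception as exc:
        return f"package-import-failed: {exc}"

    # Step 3: try the submodule explicitly, since attribute access
    # (package.submodule) does not trigger an import on its own.
    try:
        importlib.import_module(f"{package}.{submodule}")
    except Exception as exc:
        return f"submodule-import-failed: {exc}"

    return "ok"


if __name__ == "__main__":
    # The package and submodule names are taken from the traceback above.
    print(probe_submodule("transformer_engine", "pytorch"))
```

If this prints a "submodule-import-failed" message, the attached exception text is probably more informative than the AttributeError in the traceback.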

I have also tried to reproduce these environments locally and in Apptainer containers, without success.
*There are other errors with Apptainer and with venvs, but I list only the simplest Docker path here.

Thanks,
