
How do we pass a model as an input for benchmarking? #38

@sunpterodactyl

Description


How do we pass a model into benchmarking?

Hi, I'm currently trying to use this model for a local benchmarking analysis, but I can't seem to initialise the model object o. I patched together pieces of code, namely:

o = Inference("llama3-1B-chat", device="cuda:0", logging=True) #llama3-1B/3B/8B-chat, gpt-oss-20B, qwen3-next-80B
o.ini_model(models_dir="./models/", force_download=False)

But when I try to load my model I get the error below. I'm not sure why this happens, so I was wondering whether there is a way to just run the model on a server, or without giving it a text input?

File "/home/sunpter/oLLM/ollm_env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4971, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/sunpter/oLLM/ollm/src/ollm/llama.py", line 201, in __init__
    super().__init__(config)
  File "/home/sunpter/oLLM/ollm_env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 420, in __init__
    self.model = LlamaModel(config)
  File "/home/sunpter/oLLM/ollm/src/ollm/llama.py", line 110, in __init__
    self.layers[-1]._unload_layer_weights()
  File "/home/sunpter/oLLM/ollm/src/ollm/voxtral.py", line 32, in _unload_layer_weights
    for attr_path in loader.manifest[base]:
AttributeError: 'NoneType' object has no attribute 'manifest'
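The traceback suggests that inside ollm/src/ollm/voxtral.py a weight-loader object was still None when _unload_layer_weights ran during model construction. A minimal illustration of that failure pattern, using hypothetical names (loader, Layer) rather than oLLM's actual internals:

```python
# Minimal illustration of the failure pattern in the traceback above.
# `loader` and `Layer` are hypothetical stand-ins, not oLLM's real code:
# a module-level loader that is still None when a method dereferences it.

loader = None  # would presumably be initialised before model construction

class Layer:
    def unload_weights(self):
        # Dereferencing the uninitialised global raises the same error:
        return loader.manifest

try:
    Layer().unload_weights()
except AttributeError as e:
    err = str(e)

print(err)  # 'NoneType' object has no attribute 'manifest'
```

If that reading is right, the failure happens while the model is being built (the loader was never attached), before any benchmarking code runs.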

Here is my full code for reference (if needed)

from optimum_benchmark import Benchmark, BenchmarkConfig, TorchrunConfig, InferenceConfig, PyTorchConfig
from optimum_benchmark.logging_utils import setup_logging
from ollm import Inference, AutoInference
from ollm.gds_loader import GDSWeights, MoEWeightsLoader
from ollm.utils import Stats
from ollm.kvcache import KVCache
from ollm import gpt_oss, gds_loader, llama, qwen3_next, gemma3, voxtral
import torch, os, time, requests



setup_logging(level="INFO")

if __name__ == "__main__":

    o = Inference("llama3-1B-chat", device="cuda:0", logging=True) #llama3-1B/3B/8B-chat, gpt-oss-20B, qwen3-next-80B
    o.ini_model(models_dir="./models/", force_download=False)
    
    print(f"Launching the config")
    launcher_config = TorchrunConfig(nproc_per_node=2)
    print(f"Initialising Torch")
    scenario_config = InferenceConfig(latency=True, memory=True)
    backend_config = PyTorchConfig(model=o, device="cuda", device_ids="0", no_weights=True)
    print(f"Backend Config Successful")
    benchmark_config = BenchmarkConfig(
        name="ollm_llama",
        scenario=scenario_config,
        launcher=launcher_config,
        backend=backend_config,
    )
    print(f"Benchmark Config Successful")
    print(f"Generating Benchmark Report...")
    benchmark_report = Benchmark.launch(benchmark_config)
    # push artifacts to the hub
    benchmark_config.save_json("ollm_llama/benchmark_config.json")
    benchmark_report.save_json("ollm_llama/benchmark_report.json")
    print(f"Merging Benchmark Reports...")
    # merge them into a single artifact and push to the hub
    benchmark = Benchmark(config=benchmark_config, report=benchmark_report)
    benchmark.save_json("ollm_llama/benchmark.json")
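For a rough local latency number while the loader issue is being sorted out, the measurement can also be sketched with only the standard library, skipping optimum-benchmark entirely. Both benchmark and run_inference below are hypothetical helpers; run_inference is a placeholder for whatever model call eventually works:

```python
import time
import statistics

def benchmark(fn, warmup=2, iters=10):
    """Time a zero-argument callable: a few warmup calls first, then
    return (mean, stdev) latency in milliseconds over `iters` runs."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)

def run_inference():
    # Placeholder workload standing in for the real model call
    # (e.g. a generate step), once the model actually loads.
    sum(i * i for i in range(10_000))

mean_ms, stdev_ms = benchmark(run_inference)
print(f"latency: {mean_ms:.3f} ms (stdev {stdev_ms:.3f} ms)")
```

This only measures wall-clock latency on whatever device the callable uses; it does not replace optimum-benchmark's memory tracking, but it gives a sanity-check number with zero extra dependencies.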
