How do we pass a model into benchmarking?
Hi, I'm currently trying to use this model for a local benchmarking analysis, but I can't seem to initialise the model `o`. I patched together pieces of code, namely:
```python
o = Inference("llama3-1B-chat", device="cuda:0", logging=True)  # llama3-1B/3B/8B-chat, gpt-oss-20B, qwen3-next-80B
o.ini_model(models_dir="./models/", force_download=False)
```
But when I try to load my model I get the error below. I'm not sure what causes it, so I was also wondering whether there is a way to run the model on a server, or without giving it a text input?
```
  File "/home/sunpter/oLLM/ollm_env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4971, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/sunpter/oLLM/ollm/src/ollm/llama.py", line 201, in __init__
    super().__init__(config)
  File "/home/sunpter/oLLM/ollm_env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 420, in __init__
    self.model = LlamaModel(config)
  File "/home/sunpter/oLLM/ollm/src/ollm/llama.py", line 110, in __init__
    self.layers[-1]._unload_layer_weights()
  File "/home/sunpter/oLLM/ollm/src/ollm/voxtral.py", line 32, in _unload_layer_weights
    for attr_path in loader.manifest[base]:
AttributeError: 'NoneType' object has no attribute 'manifest'
```
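If I'm reading the traceback right, `_unload_layer_weights` reads a `loader` that is still `None` at the time the layers are built (i.e. during `Inference(...)` / `from_pretrained`, before `ini_model()` has set a weights loader up). A minimal sketch of that failure mode, with hypothetical names standing in for the real oLLM internals:

```python
# Hypothetical minimal reproduction of the failure mode the traceback suggests.
# `loader` stands in for oLLM's module-level weights loader, which seems to
# still be None when layer construction runs.
loader = None  # would normally be a GDSWeights/MoEWeightsLoader instance


class Layer:
    def _unload_layer_weights(self, base="model.layers.0"):
        # loader is still None here, so loader.manifest raises AttributeError
        for attr_path in loader.manifest[base]:
            pass


try:
    Layer()._unload_layer_weights()
except AttributeError as e:
    print(e)  # → 'NoneType' object has no attribute 'manifest'
```

Which matches the error message I'm seeing, so my guess is the model object is being constructed in a context where the loader was never initialised.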
Here is my full code for reference (if needed):
```python
from optimum_benchmark import Benchmark, BenchmarkConfig, TorchrunConfig, InferenceConfig, PyTorchConfig
from optimum_benchmark.logging_utils import setup_logging
from ollm import Inference, AutoInference
from ollm.gds_loader import GDSWeights, MoEWeightsLoader
from ollm.utils import Stats
from ollm.kvcache import KVCache
from ollm import gpt_oss, gds_loader, llama, qwen3_next, gemma3, voxtral
import torch, os, time, requests

setup_logging(level="INFO")

if __name__ == "__main__":
    o = Inference("llama3-1B-chat", device="cuda:0", logging=True)  # llama3-1B/3B/8B-chat, gpt-oss-20B, qwen3-next-80B
    o.ini_model(models_dir="./models/", force_download=False)

    print("Launching the config")
    launcher_config = TorchrunConfig(nproc_per_node=2)
    print("Initialising Torch")
    scenario_config = InferenceConfig(latency=True, memory=True)
    backend_config = PyTorchConfig(model=o, device="cuda", device_ids="0", no_weights=True)
    print("Backend Config Successful")

    benchmark_config = BenchmarkConfig(
        name="ollm_llama",
        scenario=scenario_config,
        launcher=launcher_config,
        backend=backend_config,
    )
    print("Benchmark Config Successful")

    print("Generating Benchmark Report...")
    benchmark_report = Benchmark.launch(benchmark_config)

    # push artifacts to the hub
    benchmark_config.save_json("ollm_llama/benchmark_config.json")
    benchmark_report.save_json("ollm_llama/benchmark_report.json")

    print("Merging Benchmark Reports...")
    # merge them into a single artifact and push to the hub
    benchmark = Benchmark(config=benchmark_config, report=benchmark_report)
    benchmark.save_json("ollm_llama/benchmark.json")
```
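One thing I noticed while reading the optimum-benchmark examples: they seem to pass a Hugging Face model id string to `PyTorchConfig`, not an already-constructed model object like my `o`, and let the backend load the weights itself. Is that why this fails? Something like the following sketch is what I mean (the model id and launcher choice here are just placeholders, not verified against the current API):

```python
from optimum_benchmark import Benchmark, BenchmarkConfig, InferenceConfig, PyTorchConfig, ProcessConfig

# Sketch (assumption): give the backend a model id string and let it load the
# model itself, instead of handing it a live oLLM Inference object.
launcher_config = ProcessConfig()  # single process; TorchrunConfig(nproc_per_node=2) for multi-GPU
scenario_config = InferenceConfig(latency=True, memory=True)
backend_config = PyTorchConfig(
    model="meta-llama/Llama-3.2-1B-Instruct",  # placeholder model id
    device="cuda",
    device_ids="0",
)

benchmark_config = BenchmarkConfig(
    name="ollm_llama",
    scenario=scenario_config,
    launcher=launcher_config,
    backend=backend_config,
)
benchmark_report = Benchmark.launch(benchmark_config)  # requires a CUDA environment
```

But if that's the expected shape, I don't see how to make optimum-benchmark use oLLM's custom loading path at all, which is the heart of my question.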