I got the following error when I run vllm_variable_size or naive_hf_variable_size.
1536 4 1000
~/AIML/llm-continuous-batching-benchmarks-winston ~/AIML/llm-continuous-batching-benchmarks-winston/benchmark_configs
Traceback (most recent call last):
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 597, in <module>
    main()
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 539, in main
    prompts, prompt_lens = gen_random_prompts_return_lens(
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 479, in gen_random_prompts_return_lens
    assert len(
AssertionError: Expected prompt to contain exactly 512 tokens, got len(encoded)=350
My environment:
Machine: g5.4xlarge
Model: meta-llama/Llama-2-7b-chat-hf
Any idea what causes this error?
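For context on what the assertion is checking: the benchmark samples random token IDs, decodes them to text, and expects re-encoding that text to yield exactly the requested number of tokens. With BPE/SentencePiece tokenizers like Llama's, decode-then-encode is not length-preserving — adjacent pieces can merge into a single vocabulary entry, so the re-encoded prompt comes back shorter (512 requested vs. 350 here). A toy sketch of the phenomenon, using a hypothetical three-entry vocabulary rather than the real Llama tokenizer:

```python
# Toy greedy longest-match tokenizer illustrating why decode-then-encode
# is not length-preserving for BPE-style vocabularies: the merged piece
# "ab" swallows what were two separate tokens.
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TOK = {v: k for k, v in VOCAB.items()}

def decode(ids):
    """Join token pieces back into a plain string."""
    return "".join(ID_TO_TOK[i] for i in ids)

def encode(text):
    """Greedy longest-match encoding, mimicking BPE merge behavior."""
    ids, i = [], 0
    while i < len(text):
        for length in (2, 1):  # prefer the longer piece, like a BPE merge
            piece = text[i:i + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += length
                break
    return ids

random_ids = [0, 1]              # two sampled tokens: "a", "b"
reencoded = encode(decode(random_ids))
print(len(random_ids), len(reencoded))  # 2 1 -- the round trip lost a token
```

If the benchmark's prompt generator relies on an exact round trip, this merging behavior would explain the `AssertionError` regardless of machine type.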