run on g5.4xlarge #5

@colorzhang

Description

I get the following error when I run vllm_variable_size or naive_hf_variable_size:

```
1536 4 1000
~/AIML/llm-continuous-batching-benchmarks-winston ~/AIML/llm-continuous-batching-benchmarks-winston/benchmark_configs
Traceback (most recent call last):
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 597, in <module>
    main()
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 539, in main
    prompts, prompt_lens = gen_random_prompts_return_lens(
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 479, in gen_random_prompts_return_lens
    assert len(
AssertionError: Expected prompt to contain exactly 512 tokens, got len(encoded)=350
```

My environment:
Machine: g5.4xlarge
Model: meta-llama/Llama-2-7b-chat-hf

Any idea what causes this error?
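For reference, the assertion message suggests the benchmark samples random token ids, decodes them to text, and then re-encodes that text expecting the original length back. Below is a minimal round-trip check under that assumption; the model name is the one from my environment above, and everything else (the id range, the target length) is illustrative rather than taken from benchmark_throughput.py:

```python
# Hypothetical round-trip check: with SentencePiece-based tokenizers
# like Llama's, encode(decode(ids)) is not guaranteed to return a
# sequence of the same length for random token ids.
import random

from transformers import AutoTokenizer

# Requires access to the gated meta-llama repo on Hugging Face.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

target_len = 512
# Sample random ids, skipping the low reserved/special-token ids
# (the exact lower bound here is an illustrative choice).
ids = random.choices(range(3, tokenizer.vocab_size), k=target_len)

text = tokenizer.decode(ids)
reencoded = tokenizer.encode(text)

# Typically prints two different lengths, i.e. the same kind of
# mismatch the AssertionError reports (512 expected vs. 350 found).
print(len(ids), len(reencoded))
```

If the two lengths differ here, the failure would be in prompt generation rather than in vLLM or the HF runner.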
