run on g5.4xlarge #5

@colorzhang

Description

I get the following error when I run vllm_variable_size or naive_hf_variable_size:

```
1536 4 1000
~/AIML/llm-continuous-batching-benchmarks-winston ~/AIML/llm-continuous-batching-benchmarks-winston/benchmark_configs
Traceback (most recent call last):
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 597, in <module>
    main()
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 539, in main
    prompts, prompt_lens = gen_random_prompts_return_lens(
  File "/home/ubuntu/AIML/llm-continuous-batching-benchmarks-winston/./benchmark_throughput.py", line 479, in gen_random_prompts_return_lens
    assert len(
AssertionError: Expected prompt to contain exactly 512 tokens, got len(encoded)=350
```

My environment:
Machine: g5.4xlarge
Model: meta-llama/Llama-2-7b-chat-hf

Any idea what causes this error?
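For reference, the assertion message suggests the benchmark samples random token ids, decodes them to text, and then re-encodes that text expecting the original length back. Below is a minimal round-trip check under that assumption; the model name is the one from my environment above, and everything else (the id range, the target length) is illustrative rather than taken from benchmark_throughput.py:

```python
# Hypothetical round-trip check: with SentencePiece-based tokenizers
# like Llama's, encode(decode(ids)) is not guaranteed to return a
# sequence of the same length for random token ids.
import random

from transformers import AutoTokenizer

# Requires access to the gated meta-llama repo on Hugging Face.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

target_len = 512
# Sample random ids, skipping the low reserved/special-token ids
# (the exact lower bound here is an illustrative choice).
ids = random.choices(range(3, tokenizer.vocab_size), k=target_len)

text = tokenizer.decode(ids)
reencoded = tokenizer.encode(text)

# Typically prints two different lengths, i.e. the same kind of
# mismatch the AssertionError reports (512 expected vs. 350 found).
print(len(ids), len(reencoded))
```

If the two lengths differ here, the failure would be in prompt generation rather than in vLLM or the HF runner.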
