Quantized much slower than llama.cpp with same model and settings...

The quantized example was built with: cargo build --example quantized -r --features metal

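I'm unsure how many layers candle accelerates on the GPU and how many threads it uses, and the sampling stages clearly differ between the two runs (the llama.cpp run below used temp = 0.800, top_k = 40 and min_p = 0.050 with -n 128, whereas the candle run used --temperature 0.1 and --sample-len 2048).

Even so, I would expect the speed to be roughly on par. Is that a reasonable assumption?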


CANDLE
./quantized --model mistral-7b-instruct-v0.2.Q4_K_S.gguf --which 7b-mistral-instruct-v0.2 --prompt "Blueberries cost more than strawberries. Blueberries cost less than raspberries. Raspberries cost more than strawberries and blueberries. If the first two statements are true, the third statement is?" --sample-len 2048 --temperature 0.1 --seed 1337 --top-p 0.950 --repeat-penalty 1.100 --repeat-last-n 64

--> 31.83 t/s
-------------

avx: false, neon: true, simd128: false, f16c: false
temp: 0.10 repeat-penalty: 1.10 repeat-last-n: 64
loaded 291 tensors (4.14GB) in 0.09s

-------------

LLAMA.CPP
./main -p "Blueberries cost more than strawberries. Blueberries cost less than raspberries. Raspberries cost more than strawberries and blueberries. If the first two statements are true, the third statement is?" -m mistral-7b-instruct-v0.2.Q4_K_S.gguf  -n 128 -ngl 33 --threads 8 --seed 1337

--> 51.30 t/s
-------------

sampling: 
	repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000

sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature 
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 1

Quantized much slower than llama.cpp with same model and settings... #1939