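The quantized example was compiled using --> cargo build --example quantized -r --features metal
I'm unsure how many layers are accelerated or how many threads are used, and the sampling stages clearly differ between the two runs (llama.cpp reports temp = 0.800, top_k = 40, min_p = 0.050 and was run with -n 128; candle was run with --temperature 0.1 and --sample-len 2048).
Still, I presume the speed should be on par?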
CANDLE
./quantized --model mistral-7b-instruct-v0.2.Q4_K_S.gguf --which 7b-mistral-instruct-v0.2 --prompt "Blueberries cost more than strawberries. Blueberries cost less than raspberries. Raspberries cost more than strawberries and blueberries. If the first two statements are true, the third statement is?" --sample-len 2048 --temperature 0.1 --seed 1337 --top-p 0.950 --repeat-penalty 1.100 --repeat-last-n 64
--> 31.83 t/s
avx: false, neon: true, simd128: false, f16c: false / temp: 0.10 repeat-penalty: 1.10 repeat-last-n: 64 / loaded 291 tensors (4.14GB) in 0.09s
LLAMA.CPP
./main -p "Blueberries cost more than strawberries. Blueberries cost less than raspberries. Raspberries cost more than strawberries and blueberries. If the first two statements are true, the third statement is?" -m mistral-7b-instruct-v0.2.Q4_K_S.gguf -n 128 -ngl 33 --threads 8 --seed 1337
--> 51.30 t/s
sampling:
repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 1
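
To make the "different sample stages" point concrete, here is a rough Python sketch of the chain that the llama.cpp log prints (Penalties -> top_k -> top_p -> min_p -> temperature). It is a simplified illustration, not the code of either project: CFG, tfs_z and typical_p are left out because the log shows them unused or at their neutral value of 1.000, and every function name is made up for this sketch. The defaults are the values from the llama.cpp log above; the candle command, by contrast, only passes temperature, top-p and the repeat penalty.

```python
import math
import random


def softmax(logits):
    """Turn a {token_id: logit} dict into a {token_id: probability} dict."""
    m = max(logits.values())
    exps = {t: math.exp(l - m) for t, l in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def repeat_penalty(logits, recent, penalty):
    """Make recently generated tokens less likely (assumed convention:
    positive logits are divided by the penalty, others multiplied)."""
    out = dict(logits)
    for t in set(recent):
        if t in out:
            out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out


def top_k(logits, k):
    """Keep only the k highest-scoring candidates."""
    kept = sorted(logits, key=logits.get, reverse=True)[:k]
    return {t: logits[t] for t in kept}


def top_p(logits, p):
    """Keep the smallest set of top candidates whose probability mass reaches p."""
    probs = softmax(logits)
    kept, cum = [], 0.0
    for t in sorted(probs, key=probs.get, reverse=True):
        kept.append(t)
        cum += probs[t]
        if cum >= p:
            break
    return {t: logits[t] for t in kept}


def min_p(logits, ratio):
    """Drop candidates whose probability is below ratio * the top probability."""
    probs = softmax(logits)
    cutoff = ratio * max(probs.values())
    return {t: l for t, l in logits.items() if probs[t] >= cutoff}


def sample(logits, recent, rng, *, penalty=1.1, last_n=64,
           k=40, p=0.95, mp=0.05, temp=0.8):
    """Run the stages in the order printed in the log, then draw one token."""
    c = repeat_penalty(logits, recent[-last_n:], penalty)
    c = top_k(c, k)
    c = top_p(c, p)
    c = min_p(c, mp)
    probs = softmax({t: l / temp for t, l in c.items()})
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens])[0]


if __name__ == "__main__":
    toy_logits = {0: 2.0, 1: 1.0, 2: -1.0, 3: 0.5}  # made-up 4-token vocabulary
    print(sample(toy_logits, recent=[0, 2], rng=random.Random(1337)))
```

None of these stages should cost much next to the forward pass of a 7B model, so the sampler mismatch probably affects which text is generated far more than the t/s figure. For a like-for-like timing it would still make sense to align the generation length and the sampler settings in both commands.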