Name	Name	Last commit message	Last commit date
parent directory ..
README.md	README.md
minicpm-o4_5_llamacpp.md	minicpm-o4_5_llamacpp.md
minicpm-o4_5_llamacpp_zh.md	minicpm-o4_5_llamacpp_zh.md
minicpm-v4_5_llamacpp.md	minicpm-v4_5_llamacpp.md
minicpm-v4_5_llamacpp_zh.md	minicpm-v4_5_llamacpp_zh.md
minicpm-v4_6_llamacpp.md	minicpm-v4_6_llamacpp.md
minicpm-v4_6_llamacpp_zh.md	minicpm-v4_6_llamacpp_zh.md
minicpm-v4_llamacpp.md	minicpm-v4_llamacpp.md
minicpm-v4_llamacpp_zh.md	minicpm-v4_llamacpp_zh.md
minicpm4_1_llamacpp.md	minicpm4_1_llamacpp.md
minicpm4_1_llamacpp_zh.md	minicpm4_1_llamacpp_zh.md
minicpm4_llamacpp.md	minicpm4_llamacpp.md
minicpm4_llamacpp_zh.md	minicpm4_llamacpp_zh.md
minicpm5_llamacpp.md	minicpm5_llamacpp.md
minicpm5_llamacpp_zh.md	minicpm5_llamacpp_zh.md

Name

Last commit message

Last commit date

minicpm-o4_5_llamacpp.md

minicpm-o4_5_llamacpp_zh.md

minicpm-v4_5_llamacpp.md

minicpm-v4_5_llamacpp_zh.md

minicpm-v4_6_llamacpp.md

minicpm-v4_6_llamacpp_zh.md

minicpm-v4_llamacpp.md

minicpm-v4_llamacpp_zh.md

minicpm4_1_llamacpp.md

minicpm4_1_llamacpp_zh.md

minicpm4_llamacpp.md

minicpm4_llamacpp_zh.md

minicpm5_llamacpp.md

minicpm5_llamacpp_zh.md

llama.cpp

llama.cpp as a C++ library

Before starting, let's first discuss what is llama.cpp and what you should expect, and why we say "use" llama.cpp, with "use" in quotes. llama.cpp is essentially a different ecosystem with a different design philosophy that targets light-weight footprint, minimal external dependency, multi-platform, and extensive, flexible hardware support:

Plain C/C++ implementation without external dependencies
Support a wide variety of hardware:
- AVX, AVX2 and AVX512 support for x86_64 CPU
- Apple Silicon via Metal and Accelerate (CPU and GPU)
- NVIDIA GPU (via CUDA), AMD GPU (via hipBLAS), Intel GPU (via SYCL), Ascend NPU (via CANN), and Moore Threads GPU (via MUSA)
- Vulkan backend for GPU
Various quantization schemes for faster inference and reduced memory footprint
CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity

It's like the Python frameworks torch+transformers or torch+vllm but in C++. However, this difference is crucial:

Python is an interpreted language: The code you write is executed line-by-line on-the-fly by an interpreter. You can run the example code snippet or script with an interpreter or a natively interactive interpreter shell. In addition, Python is learner friendly, and even if you don't know much before, you can tweak the source code here and there.
C++ is a compiled language: The source code you write needs to be compiled beforehand, and it is translated to machine code and an executable program by a compiler. The overhead from the language side is minimal. You do have source code for example programs showcasing how to use the library. But it is not very easy to modify the source code if you are not verse in C++ or C.

To use llama.cpp means that you use the llama.cpp library in your own program, like writing the source code of Ollama, GPT4ALL, llamafile etc. But that's not what this guide is intended or could do. Instead, here we introduce how to use the llama-cli example program, in the hope that you know that llama.cpp does support MiniCPM-V and how the ecosystem of llama.cpp generally works.

In this guide, we will show how to "use" llama.cpp to run models on your local machine, in particular, the llama-cli and the llama-server example program, which comes with the library.

The main steps

Get the programs
Get the MiniCPM-V models in GGUF¹ format
Run the program with the model

Versioned Deployment Guides

Version	English	中文
MiniCPM-V 4.6 (latest)	Guide	指南
MiniCPM-V 4.5	Guide	指南
MiniCPM-V 4.0	Guide	指南
MiniCPM-o 4.5	Guide	指南

GGUF (GPT-Generated Unified Format) is a file format designed for efficiently storing and loading language models for inference. ↩

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

README.md

llama.cpp

llama.cpp as a C++ library

The main steps

Versioned Deployment Guides

Uh oh!

FilesExpand file tree

llama.cpp

Directory actions

More options

Directory actions

More options

Latest commit

History

llama.cpp

Folders and files

parent directory

README.md

llama.cpp

llama.cpp as a C++ library

The main steps

Versioned Deployment Guides

Footnotes