5 changes: 3 additions & 2 deletions README.md
@@ -12,7 +12,7 @@ This component is part of the [bpftime](https://github.com/eunomia-bpf/bpftime)
- Compiles eBPF ELF files into AOTed native code ELF object files, which can be linked like C-compiled objects or loaded into llvmbpf.
- Loads and executes AOT-compiled ELF object files within the eBPF runtime.
- Supports eBPF helpers and maps lddw functions.
- Supports PTX generation for CUDA and run eBPF program on GPU.
- Supports PTX generation and run eBPF program on GPU.

This library is optimized for performance, flexibility, and minimal dependencies. It does not include maps implement, helpers, verifiers, or loaders for eBPF applications, making it suitable as a lightweight, high-performance library.

@@ -26,6 +26,7 @@ For a comprehensive userspace eBPF runtime that includes support for maps, helpe
- [load eBPF bytecode from ELF file](#load-ebpf-bytecode-from-elf-file)
- [Maps and data relocation support](#maps-and-data-relocation-support)
- [Build into standalone binary for deployment](#build-into-standalone-binary-for-deployment)
- [PTX generation for CUDA on GPU](#ptx-generation-for-cuda-on-gpu)
- [optimizaion](#optimizaion)
- [inline the maps and helper function](#inline-the-maps-and-helper-function)
- [Use original LLVM IR from C code](#use-original-llvm-ir-from-c-code)
@@ -380,7 +381,7 @@ clang -g main.c xdp-counter.ll -o standalone

And you can run the `standalone` eBPF program directly.

## PTX generation for CUDA
## PTX generation for CUDA on GPU

llvmbpf can generate PTX code for CUDA and run eBPF program on GPU. See [example/ptx](example/ptx) for an example.

21 changes: 21 additions & 0 deletions example/ptx/CMakeLists.txt
@@ -0,0 +1,21 @@
add_executable(ptx_test ptx_test.cpp)

set(LLVMBPF_CUDA_PATH "" CACHE PATH "Path where CUDA is installed, e.g. /usr/local/cuda-12.6")
message(STATUS "LLVMBPF_CUDA_PATH=${LLVMBPF_CUDA_PATH}")
if(NOT LLVMBPF_CUDA_PATH)
  message(FATAL_ERROR "Please set LLVMBPF_CUDA_PATH if you want to build ptx_test, otherwise set LLVMBPF_ENABLE_PTX to NO")
endif()
set(LLVM_LINK_COMPONENTS
  ${LLVM_TARGETS_TO_BUILD}
)
target_link_directories(ptx_test PRIVATE ${LLVMBPF_CUDA_PATH}/lib64 ${LLVMBPF_CUDA_PATH}/lib64/stubs)
target_link_libraries(ptx_test
  PRIVATE
  ${LLVM_LIBS}
  llvmbpf_vm
  cuda
  libnvptxcompiler_static.a
  spdlog::spdlog
)
add_dependencies(ptx_test llvmbpf_vm spdlog::spdlog)
target_include_directories(ptx_test PRIVATE ${LLVM_INCLUDE_DIRS} include src ${LLVMBPF_CUDA_PATH}/include ${SPDLOG_INCLUDES})
280 changes: 280 additions & 0 deletions example/ptx/README.md
@@ -0,0 +1,280 @@
# PTX program

This is a simple demo program that shows how to call llvmbpf to generate PTX and then use the CUDA Driver API to execute the compiled code.

Set `LLVMBPF_CUDA_PATH` to the directory where CUDA is installed, for example `/usr/local/cuda-12.6`.

This example demonstrates how to use llvmbpf to generate NVIDIA PTX code from eBPF programs and execute it on CUDA-capable GPUs. It showcases the complete workflow:

1. Initializing LLVM components
2. Defining an eBPF program
3. Compiling the eBPF program to PTX using llvmbpf
4. Compiling the PTX to CUDA binary using NVIDIA's PTX compiler
5. Loading and executing the binary using CUDA Driver API
6. Handling host-device communication for helper functions

## Program Overview

### eBPF to PTX Compilation Flow

The program defines a simple eBPF program that looks up a value in a BPF map, compiles it to PTX, and then runs it on the GPU. It demonstrates that BPF programs can execute on NVIDIA GPUs while their helper function calls are forwarded to the host.

### Host-Device Communication

The program implements a communication mechanism between the host (CPU) and device (GPU) to handle BPF helper functions. When the BPF program calls a helper function like `map_lookup_elem`, the GPU code signals the host, which processes the request and returns the result.
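The handshake can be made concrete with a CPU-only sketch in which two threads stand in for the GPU and the host. It assumes a simple two-flag protocol modeled on the `flag1`/`flag2` fields of `CommSharedMem` described in this README. The exact protocol in `ptx_test.cpp` may differ. The `Mailbox` type and the "add one" response are hypothetical. On a real GPU, the device side would also need system-scope ordering (for example `__threadfence_system()`), which `std::atomic` only approximates here.

```cpp
#include <atomic>
#include <thread>

// Hypothetical mailbox modeled on CommSharedMem's flag1/flag2 fields. In the
// real example it lives in pinned host memory that the GPU writes through.
struct Mailbox {
    std::atomic<int> flag1{ 0 }; // "device" -> host: a request is pending
    std::atomic<int> flag2{ 0 }; // host -> "device": the response is ready
    long request = 0;            // hypothetical request payload
    long response = 0;           // hypothetical response payload
};

// Host side: wait for one request, handle it, then signal completion.
static void host_serve_one(Mailbox &mb)
{
    while (mb.flag1.load(std::memory_order_acquire) != 1)
        std::this_thread::yield();
    mb.response = mb.request + 1; // stand-in for real helper work (e.g. a map lookup)
    mb.flag1.store(0, std::memory_order_relaxed);
    mb.flag2.store(1, std::memory_order_release); // publishes `response`
}

// "Device" side: publish a request, then spin until the host answers.
static long device_call_helper(Mailbox &mb, long request)
{
    mb.request = request;
    mb.flag1.store(1, std::memory_order_release); // publishes `request`
    while (mb.flag2.load(std::memory_order_acquire) != 1)
        std::this_thread::yield();
    mb.flag2.store(0, std::memory_order_relaxed); // reset for the next call
    return mb.response;
}

// Runs one full round trip with a thread playing each role.
long simulate_round_trip(long request)
{
    Mailbox mb;
    std::thread host([&mb] { host_serve_one(mb); });
    long result = device_call_helper(mb, request);
    host.join();
    return result;
}
```

For example, `simulate_round_trip(41)` returns 42. The "device" thread publishes the request, and the host thread answers it. Acquire/release ordering on the two flags guarantees that each side sees the payload the other wrote before raising its flag.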

### Execution Model

1. The eBPF program is compiled to PTX
2. The PTX is wrapped with trampoline code to handle helper function calls
3. NVIDIA's PTX compiler converts the PTX to a CUDA binary
4. The binary is loaded and executed on the GPU
5. A host thread handles helper function requests from the GPU

## Key Components

1. **eBPF Program**: Defined as an array of `ebpf_inst` structures
2. **LLVM JIT Context**: Used to compile eBPF to PTX
3. **PTX Compiler Interface**: Uses NVIDIA's PTX compiler to generate executable code
4. **Shared Memory Structure**: Enables communication between host and device
5. **Helper Function Handlers**: Process requests from the GPU on the host

## Usage

Build and run the example:

```sh
# set the CUDA path, for example, /usr/local/cuda-12.6
cmake -B build -DCMAKE_BUILD_TYPE=Release -DLLVMBPF_ENABLE_PTX=1 -DLLVMBPF_CUDA_PATH=/usr/local/cuda-12.6
cmake --build build --target all -j
```

Run the PTX example:

```sh
build/example/ptx/ptx_test
```

## Code Explanation

Here's a detailed explanation of how the `ptx_test.cpp` code works:

### 1. Initialization and Includes

The code starts by including the headers it needs for LLVM, CUDA, and eBPF functionality:

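```cpp
// LLVM headers used for JIT compilation and target initialization
#include <llvm/MC/TargetRegistry.h>
#include <llvm/Support/Error.h>
#include <llvm/Support/TargetSelect.h> // llvm::InitializeAllTargets() and friends

// llvmbpf headers (from the llvmbpf source tree)
#include <llvm_jit_context.hpp> // llvm_bpf_jit_context, used for generate_ptx()
#include <ebpf_inst.h>          // struct ebpf_inst and the EBPF_OP_* opcodes

// CUDA headers
#include <cuda.h>          // CUDA Driver API
#include <nvPTXCompiler.h> // NVIDIA PTX Compiler API
#include <cuda_runtime.h>  // CUDA Runtime API

// Standard library headers used by the host-side code
#include <atomic>
#include <chrono>
#include <csignal>
#include <cstring>
#include <memory>
#include <string>
#include <thread>
#include <vector>
```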

### 2. eBPF Program Definition

The code defines a test eBPF program as an array of `ebpf_inst` structures. This program:
1. Takes a memory pointer in `r1`
2. Saves it to `r6`
3. Prepares arguments for a map lookup
4. Calls the `map_lookup_elem` helper function
5. Stores the result to the provided memory
6. Returns 0

```cpp
static const struct ebpf_inst test_prog[] = {
    // r6 = r1 (save input pointer)
    { EBPF_OP_MOV64_REG, 6, 1, 0, 0 },

    // Prepare key for map lookup (4 integers: 111,0,0,0)
    { EBPF_OP_MOV64_IMM, 1, 0, 0, 111 },
    { EBPF_OP_STXW, 10, 1, -16, 0 },
    // ... more instructions to prepare arguments ...

    // Call map_lookup_elem (helper function 1)
    { EBPF_OP_CALL, 0, 0, 0, 1 },

    // Store result to output memory
    { EBPF_OP_LDXW, 1, 0, 0, 0 },
    { EBPF_OP_STXW, 6, 1, 0, 0 },

    // Return 0
    { EBPF_OP_MOV64_IMM, 0, 0, 0, 0 },
    { EBPF_OP_EXIT, 0, 0, 0, 0 }
};
```
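The listing above elides the argument setup, so it cannot be run as-is. Below is a sketch of a tiny CPU interpreter for only the opcodes involved. It is useful for checking what such a program computes before sending it to the GPU. It rests on three assumptions:

- The opcode values are the standard eBPF encodings, and `ebpf_inst.h` is assumed to match them.
- The elided setup is replaced with hypothetical `r1 = 0` (map id) and `r2 = r10 - 16` (key pointer) instructions.
- `fake_map_lookup` is a stand-in for the real helper, not llvmbpf code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Standard eBPF opcode encodings; ebpf_inst.h is assumed to use the same values.
enum : uint8_t {
    OP_ADD64_IMM = 0x07,
    OP_LDXW = 0x61,
    OP_STXW = 0x63,
    OP_CALL = 0x85,
    OP_EXIT = 0x95,
    OP_MOV64_IMM = 0xb7,
    OP_MOV64_REG = 0xbf,
};

// Usual 8-byte instruction layout: opcode, dst:4, src:4, offset, imm.
struct inst {
    uint8_t opcode;
    uint8_t dst : 4;
    uint8_t src : 4;
    int16_t offset;
    int32_t imm;
};

using helper_fn = uint64_t (*)(uint64_t, uint64_t, uint64_t, uint64_t, uint64_t);

// Interprets only the opcodes used by test_prog; helper 1 is the only call target.
uint64_t run(const inst *prog, size_t n, void *ctx, helper_fn helper1)
{
    uint64_t r[11] = {};
    alignas(8) uint8_t stack[512];
    r[1] = (uint64_t)(uintptr_t)ctx;                      // r1 = input pointer
    r[10] = (uint64_t)(uintptr_t)(stack + sizeof(stack)); // r10 = frame pointer
    for (size_t pc = 0; pc < n; pc++) {
        const inst &i = prog[pc];
        assert(i.dst <= 10 && i.src <= 10);
        switch (i.opcode) {
        case OP_MOV64_REG: r[i.dst] = r[i.src]; break;
        case OP_MOV64_IMM: r[i.dst] = (uint64_t)(int64_t)i.imm; break;
        case OP_ADD64_IMM: r[i.dst] += (uint64_t)(int64_t)i.imm; break;
        case OP_STXW: { // *(u32 *)(dst + off) = src
            uint32_t v = (uint32_t)r[i.src];
            std::memcpy((uint8_t *)(uintptr_t)r[i.dst] + i.offset, &v, sizeof(v));
            break;
        }
        case OP_LDXW: { // dst = *(u32 *)(src + off)
            uint32_t v;
            std::memcpy(&v, (uint8_t *)(uintptr_t)r[i.src] + i.offset, sizeof(v));
            r[i.dst] = v;
            break;
        }
        case OP_CALL:
            assert(i.imm == 1 && "only helper 1 is wired up");
            r[0] = helper1(r[1], r[2], r[3], r[4], r[5]);
            break;
        case OP_EXIT: return r[0];
        default: assert(false && "unsupported opcode"); return UINT64_MAX;
        }
    }
    return r[0];
}

// Hypothetical stand-in for map_lookup_elem: "finds" key * 2 and returns its address.
static uint32_t g_value;
static uint64_t fake_map_lookup(uint64_t /*map*/, uint64_t key_ptr, uint64_t, uint64_t, uint64_t)
{
    uint32_t key;
    std::memcpy(&key, (const void *)(uintptr_t)key_ptr, sizeof(key));
    g_value = key * 2;
    return (uint64_t)(uintptr_t)&g_value;
}

// Same shape as test_prog; lines marked "hypothetical" replace the elided setup.
static const inst demo_prog[] = {
    { OP_MOV64_REG, 6, 1, 0, 0 },   // r6 = r1
    { OP_MOV64_IMM, 1, 0, 0, 111 }, // r1 = 111
    { OP_STXW, 10, 1, -16, 0 },     // key word at r10 - 16
    { OP_MOV64_IMM, 1, 0, 0, 0 },   // hypothetical: r1 = map id 0
    { OP_MOV64_REG, 2, 10, 0, 0 },  // hypothetical: r2 = r10
    { OP_ADD64_IMM, 2, 0, 0, -16 }, // hypothetical: r2 = &key
    { OP_CALL, 0, 0, 0, 1 },        // r0 = map_lookup(r1, r2)
    { OP_LDXW, 1, 0, 0, 0 },        // r1 = *(u32 *)r0
    { OP_STXW, 6, 1, 0, 0 },        // *(u32 *)r6 = r1
    { OP_MOV64_IMM, 0, 0, 0, 0 },   // r0 = 0
    { OP_EXIT, 0, 0, 0, 0 },
};
```

Running `demo_prog` with a pointer to a zeroed `uint32_t` returns 0 and leaves 222 in the output, because the stand-in helper doubles the key 111.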

### 3. PTX Compilation Function

The `compile()` function uses NVIDIA's PTX Compiler API to compile the PTX source into a CUDA binary (an ELF image that the driver can load):

```cpp
// NVPTXCOMPILER_SAFE_CALL is an error-checking macro (not shown) that checks
// each nvPTXCompiler* call returns NVPTXCOMPILE_SUCCESS.
static std::vector<char> compile(const std::string &ptx)
{
    nvPTXCompilerHandle compiler = NULL;
    // Create a compiler instance for the PTX source
    NVPTXCOMPILER_SAFE_CALL(nvPTXCompilerCreate(&compiler,
                                                (size_t)ptx.size(),
                                                ptx.c_str()));

    // Compilation options: the target GPU architecture should match the one
    // passed to generate_ptx() in main()
    const char *compile_options[] = { "--gpu-name=sm_60", "--verbose" };

    // Compile PTX to a binary
    NVPTXCOMPILER_SAFE_CALL(nvPTXCompilerCompile(compiler, 2, compile_options));

    // Retrieve the compiled binary
    size_t elfSize;
    NVPTXCOMPILER_SAFE_CALL(
        nvPTXCompilerGetCompiledProgramSize(compiler, &elfSize));
    std::vector<char> elf_binary(elfSize, 0);
    NVPTXCOMPILER_SAFE_CALL(nvPTXCompilerGetCompiledProgram(
        compiler, (void *)elf_binary.data()));

    // Clean up
    NVPTXCOMPILER_SAFE_CALL(nvPTXCompilerDestroy(&compiler));
    return elf_binary;
}
```
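Here `--gpu-name=sm_60` is hard-coded, while `main()` separately passes `"sm_60"` to `generate_ptx()`. The two should name the same architecture. One way to keep them in sync is to build the options from a single string, as in this sketch. `make_compile_options` and `as_c_strings` are hypothetical helpers, not part of llvmbpf or the CUDA API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: build options for nvPTXCompilerCompile from the same
// architecture string that is passed to generate_ptx().
std::vector<std::string> make_compile_options(const std::string &arch, bool verbose)
{
    assert(arch.rfind("sm_", 0) == 0 && "expected an arch such as sm_60");
    std::vector<std::string> opts{ "--gpu-name=" + arch };
    if (verbose)
        opts.push_back("--verbose");
    return opts;
}

// nvPTXCompilerCompile takes `const char *const *`, so expose c_str() views.
// The returned pointers are only valid while `opts` is alive and unmodified.
std::vector<const char *> as_c_strings(const std::vector<std::string> &opts)
{
    std::vector<const char *> out;
    out.reserve(opts.size());
    for (const auto &o : opts)
        out.push_back(o.c_str());
    return out;
}
```

Inside `compile()`, the result would be passed as `nvPTXCompilerCompile(compiler, (int)argv.size(), argv.data())`, where `argv = as_c_strings(opts)`. `opts` must stay alive until the call returns.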

### 4. Host-Device Communication

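Host and device communicate through a `CommSharedMem` structure. It lives in page-locked host memory that is mapped into the GPU's address space (see `cuMemHostRegister` below), not in CUDA `__shared__` memory: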

```cpp
struct CommSharedMem {
    int flag1;               // Device -> Host signal
    int flag2;               // Host -> Device signal
    int occupy_flag;         // Lock mechanism
    int request_id;          // Identifies the request type
    long map_id;             // Map identifier
    HelperCallRequest req;   // Request data
    HelperCallResponse resp; // Response data
    uint64_t time_sum[8];    // Timing information
};
```
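The host writes bytes that code running on the GPU reads, and vice versa, so both sides must agree on the layout of `CommSharedMem`. A cheap guard is a set of compile-time checks on the field offsets. The sketch makes three assumptions:

- `HelperCallRequest` and `HelperCallResponse` are hypothetical placeholders, since their real definitions are not shown here.
- `long` is 64-bit, as on Linux x86-64.
- The device side accesses the fields at these same offsets.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical payload types; the real definitions live in ptx_test.cpp.
struct HelperCallRequest { uint64_t args[5]; };
struct HelperCallResponse { uint64_t value; };

struct CommSharedMem {
    int flag1;               // Device -> Host signal
    int flag2;               // Host -> Device signal
    int occupy_flag;         // Lock mechanism
    int request_id;          // Identifies the request type
    long map_id;             // Map identifier
    HelperCallRequest req;   // Request data
    HelperCallResponse resp; // Response data
    uint64_t time_sum[8];    // Timing information
};

// Compile-time guards: reordering or retyping a field on the host breaks the
// build instead of making the GPU silently read the wrong bytes.
static_assert(std::is_standard_layout<CommSharedMem>::value, "offsetof needs standard layout");
static_assert(sizeof(long) == 8, "assumes LP64, e.g. Linux x86-64");
static_assert(offsetof(CommSharedMem, flag1) == 0, "flag1 moved");
static_assert(offsetof(CommSharedMem, flag2) == 4, "flag2 moved");
static_assert(offsetof(CommSharedMem, occupy_flag) == 8, "occupy_flag moved");
static_assert(offsetof(CommSharedMem, request_id) == 12, "request_id moved");
static_assert(offsetof(CommSharedMem, map_id) == 16, "map_id moved");
```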

### 5. CUDA Kernel Execution

The `elfLoadAndKernelLaunch()` function:
1. Initializes CUDA
2. Loads the compiled binary
3. Sets up shared memory for host-device communication
4. Launches the kernel
5. Starts a host thread to handle helper function calls
6. Waits for kernel completion

```cpp
static int elfLoadAndKernelLaunch(void *elf, size_t elfSize)
{
    CUdevice cuDevice;
    CUcontext context;
    CUmodule module;
    CUfunction kernel;

    // Initialize CUDA and create a context on device 0
    CUDA_SAFE_CALL(cuInit(0));
    CUDA_SAFE_CALL(cuDeviceGet(&cuDevice, 0));
    CUDA_SAFE_CALL(cuCtxCreate(&context, 0, cuDevice));

    // Load compiled binary
    CUDA_SAFE_CALL(cuModuleLoadDataEx(&module, elf, 0, 0, 0));

    // Set up communication channel
    auto comm = std::make_unique<CommSharedMem>();
    memset(comm.get(), 0, sizeof(CommSharedMem));

    // Pin and map the host memory so the GPU can access it directly
    CUDA_SAFE_CALL(cuMemHostRegister(comm.get(),
                                     sizeof(CommSharedMem),
                                     CU_MEMHOSTREGISTER_DEVICEMAP));

    // Set up map info and shared memory for the kernel,
    // including the kernel argument array `args`
    // ...

    // Get the kernel entry point and launch it (the launch is asynchronous)
    CUDA_SAFE_CALL(cuModuleGetFunction(&kernel, module, "bpf_main"));
    CUDA_SAFE_CALL(cuLaunchKernel(kernel, 1, 1, 1, // grid dim
                                  1, 1, 1,         // block dim
                                  0, nullptr,      // shared mem and stream
                                  args, 0));       // kernel params, extra

    // Start a host thread to serve helper calls while the kernel runs
    std::thread hostThread([&]() {
        while (!should_exit.load()) {
            if (comm->flag1 == 1) {
                // Process helper function request
                // ...
                comm->flag2 = 1; // Signal completion
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    });

    // Wait for kernel completion, then tell the host thread to exit
    // before joining it (the loop above only stops when should_exit is set)
    CUDA_SAFE_CALL(cuCtxSynchronize());
    should_exit.store(true);
    hostThread.join();

    // Clean up
    // ...
    return 0;
}
```
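The polling loop only exits when `should_exit` becomes true, so the code that waits for the kernel has to set it before `join()`. A small RAII wrapper makes that ordering hard to get wrong. `HostPoller` is a hypothetical helper, not part of the example. In the real program, `stop()` would be called after `cuCtxSynchronize()`, and `poll` would check `flag1` and serve one request.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical RAII wrapper for the host-side polling thread. It calls `poll`
// roughly every millisecond until stop() (or the destructor) sets the exit
// flag, then joins, so join() never waits on a flag that nobody sets.
class HostPoller {
public:
    explicit HostPoller(std::function<void()> poll)
        : poll_(std::move(poll)), thread_([this] { loop(); })
    {
    }
    HostPoller(const HostPoller &) = delete;
    HostPoller &operator=(const HostPoller &) = delete;
    ~HostPoller() { stop(); }

    // In the real program: call after cuCtxSynchronize().
    void stop()
    {
        should_exit_.store(true);
        if (thread_.joinable())
            thread_.join();
    }

private:
    void loop()
    {
        while (!should_exit_.load()) {
            poll_(); // e.g. check flag1 and serve one helper request
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::function<void()> poll_;
    std::atomic<bool> should_exit_{ false };
    std::thread thread_; // declared last: starts after the members above exist
};
```

Declaring `thread_` last matters. Members are initialized in declaration order, so the thread starts only after `poll_` and `should_exit_` are constructed.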

### 6. Main Function

The `main()` function ties everything together:
1. Sets up signal handler
2. Initializes LLVM components
3. Creates a llvmbpf VM
4. Registers helper functions
5. Loads the eBPF program
6. Generates PTX code
7. Compiles PTX to CUDA binary
8. Executes the binary on the GPU

```cpp
int main()
{
    signal(SIGINT, signal_handler);

    // Initialize LLVM components
    llvm::InitializeAllTargetInfos();
    llvm::InitializeAllTargets();
    llvm::InitializeAllTargetMCs();
    llvm::InitializeAllAsmPrinters();
    llvm::InitializeAllAsmParsers();

    // Verify NVPTX target is available
    // ...

    // Set up llvmbpf VM
    llvmbpf_vm vm;
    vm.register_external_function(1, "map_lookup", (void *)test_func);
    vm.register_external_function(2, "map_update", (void *)test_func);
    vm.register_external_function(3, "map_delete", (void *)test_func);

    // Load eBPF program
    vm.load_code((void *)test_prog, sizeof(test_prog));

    // Generate PTX
    llvm_bpf_jit_context ctx(vm);
    auto result = *ctx.generate_ptx(false, "bpf_main", "sm_60");

    // Wrap PTX with trampoline code for helper functions
    result = wrap_ptx_with_trampoline(patch_helper_names_and_header(
        patch_main_from_func_to_entry(result)));

    // Compile PTX to CUDA binary
    auto bin = compile(result);

    // Execute on GPU
    elfLoadAndKernelLaunch(bin.data(), bin.size());

    return 0;
}
```

This code demonstrates how to run eBPF programs on NVIDIA GPUs by compiling them to PTX, with support for calling back to the host for helper functions.