Merged
6 changes: 5 additions & 1 deletion infra/experimental/agent-skills/copy_to_global.sh
@@ -46,9 +46,13 @@ fi
# Copy each skill from the local "skills" directory to the global skills directory
# Make sure we work from this script's base folder and copy each of the skills
# fuzzing-memory-unsafe-expert
# fuzzing-go-expert
# fuzzing-rust-expert
# fuzzing-jvm-expert
# fuzzing-python-expert
# oss-fuzz-engineer
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
for skill in "fuzzing-memory-unsafe-expert" "oss-fuzz-engineer"; do
for skill in "fuzzing-memory-unsafe-expert" "fuzzing-go-expert" "fuzzing-rust-expert" "fuzzing-jvm-expert" "fuzzing-python-expert" "oss-fuzz-engineer"; do
abs_skill="$SCRIPT_DIR/$skill"
# Copy over the skill and replace any existing skill with the same name in the global skills directory
if [ -d "$GLOBAL_SKILLS_DIR/$skill" ]; then
140 changes: 140 additions & 0 deletions infra/experimental/agent-skills/fuzzing-go-expert/SKILL.md
@@ -0,0 +1,140 @@
---
name: fuzzing-go-expert
description:
  Use this skill to fuzz open source Go software projects.
---

# Fuzzing Go expert

This skill provides the agent with the knowledge and tools to write, build, and
validate fuzz targets for Go projects integrated into OSS-Fuzz. Go fuzzing uses
the native Go fuzzing framework introduced in Go 1.18, which OSS-Fuzz drives
via libFuzzer under the hood using `compile_native_go_fuzzer`.

## Fundamental Concepts

### OSS-Fuzz base image

Go projects must use the Go base builder image:

```dockerfile
FROM gcr.io/oss-fuzz-base/base-builder-go
```

Set `language: go` in `project.yaml`.

### Harness structure

Go fuzz targets are standard Go test functions with the prefix `Fuzz`, placed
in `_test.go` files (or plain `.go` files that import the testing package):

```go
package mypkg

import (
	"testing"
	_ "github.com/AdamKorcz/go-118-fuzz-build/testing" // required for OSS-Fuzz native fuzzing
)

func FuzzMyTarget(f *testing.F) {
	// Seed corpus: add representative valid inputs so the fuzzer starts
	// from a meaningful state rather than empty bytes.
	f.Add([]byte("example input"))
	f.Add([]byte("another seed"))

	f.Fuzz(func(t *testing.T, data []byte) {
		// Call into the target. Ignore expected errors; let unexpected
		// panics surface as findings.
		_, _ = ParseSomething(data)
	})
}
```

The inner `f.Fuzz` callback signature can use typed parameters instead of
`[]byte` when the target expects structured input:

```go
f.Fuzz(func(t *testing.T, s string, n int, b bool) {
	_ = ProcessRecord(s, n, b)
})
```
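As a rough sketch of how seeds interact with typed parameters under the standard Go toolchain: the values passed to `f.Add` need to match the callback's parameter types (after `*testing.T`) in the same order. `ProcessRecord` below is a hypothetical stand-in for the real target, and `package main` is only a placeholder for the package under test.

```go
package main // placeholder; use the name of the package under test

import (
	"errors"
	"strings"
	"testing"
)

// ProcessRecord is a hypothetical stand-in for the API being fuzzed.
// It rejects negative counts and, in strict mode, strings containing NUL.
func ProcessRecord(s string, n int, strict bool) error {
	if n < 0 {
		return errors.New("negative count")
	}
	if strict && strings.ContainsRune(s, 0) {
		return errors.New("NUL byte in strict mode")
	}
	return nil
}

func FuzzProcessRecord(f *testing.F) {
	// Seed values are (string, int, bool), matching the callback below.
	f.Add("key=value", 1, true)
	f.Add("", 0, false)

	f.Fuzz(func(t *testing.T, s string, n int, strict bool) {
		// Returned errors are expected rejections; only panics are findings.
		_ = ProcessRecord(s, n, strict)
	})
}
```

Saved in a `_test.go` file, a harness of this shape can be exercised locally with `go test -fuzz=FuzzProcessRecord -fuzztime=10s` before wiring it into an OSS-Fuzz build.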

### Building in OSS-Fuzz

Use the `compile_native_go_fuzzer` helper in `build.sh`. It takes the package
import path, the function name, and the output binary name:

```bash
# build.sh
cp $SRC/fuzz_test.go ./ # copy harness into the module if needed
printf "package mypkg\nimport _ \"github.com/AdamKorcz/go-118-fuzz-build/testing\"\n" \
  > register.go # required registration shim
go mod tidy
compile_native_go_fuzzer github.com/owner/repo/pkg FuzzMyTarget fuzz_my_target
```

For projects with multiple packages or multiple fuzz targets, repeat the call
once per target:

```bash
compile_native_go_fuzzer github.com/owner/repo/pkg1 FuzzFoo fuzz_foo
compile_native_go_fuzzer github.com/owner/repo/pkg2 FuzzBar fuzz_bar
```

### Seed corpus and dictionaries

- Seed corpus entries go in `$OUT/<fuzzer_name>_seed_corpus/` as individual
files, or as a zip at `$OUT/<fuzzer_name>_seed_corpus.zip`.
- Dictionaries go in `$OUT/<fuzzer_name>.dict` as plaintext token files.
- Alternatively, add seeds directly via `f.Add(...)` in the harness — these
are compiled in and used as the initial corpus.

## Characteristics of good Go fuzzing harnesses

1. **Targets attack surface**: focus on parsers, decoders, protocol handlers,
serialisation/deserialisation, and any API that accepts untrusted bytes or
strings.
2. **Handles expected errors gracefully**: wrap calls in error checks and ignore
expected error returns. Only genuine panics and unexpected behaviour are
findings.
3. **Uses typed fuzz parameters** when the target is not purely byte-oriented —
Go's fuzzer can mutate `string`, `int`, `bool`, `float64`, etc. directly.
4. **Avoids non-determinism**: do not use random sources, time, goroutines, or
global state that persists between calls.
5. **Keeps the callback fast**: expensive setup (e.g. parsing config, opening
files) belongs outside `f.Fuzz(...)`, not inside the inner function.
6. **Provides meaningful seeds**: `f.Add(...)` entries should be valid
representative inputs so coverage grows from the start.
7. **Does not get stuck**: avoid code paths that busy-loop or block on I/O
inside the fuzz function.
8. **Includes the registration shim**: the `import _
"github.com/AdamKorcz/go-118-fuzz-build/testing"` blank import is required
for OSS-Fuzz to hook into native Go fuzzing, so never omit it.

## What Go fuzzing finds

Go is memory-safe, so the focus shifts from memory-corruption bugs to:

- **Panics**: index out of range, nil pointer dereference, type assertion
failures, stack overflows — any unrecovered `panic` is a crash.
- **Logic bugs**: incorrect parsing, silent data corruption, wrong output for
valid input.
- **Infinite loops / hangs**: code that never returns on certain inputs
(detected by OSS-Fuzz's timeout).
- **Incorrect error handling**: code that should return an error but panics
instead, or vice versa.

## Operational guidelines

- Always validate with:
```bash
python3 infra/helper.py build_fuzzers <project>
python3 infra/helper.py check_build <project>
python3 infra/helper.py run_fuzzer <project> <fuzzer_name> -- -max_total_time=30
```
- An instant crash almost always means the harness itself is wrong (e.g.
missing error handling, bad seed, wrong package path).
- Run `go vet ./...` and `go build ./...` inside the module before wrapping in
an OSS-Fuzz build to catch compile errors early.
- When iterating locally, clone the upstream repo and switch the Dockerfile from
`RUN git clone` to `COPY` to avoid network round-trips.
- Document why each entry point was chosen and what class of bugs it may find.
199 changes: 199 additions & 0 deletions infra/experimental/agent-skills/fuzzing-jvm-expert/SKILL.md
@@ -0,0 +1,199 @@
---
name: fuzzing-jvm-expert
description:
  Use this skill to fuzz open source JVM projects (Java, Kotlin, Scala, etc.)
  using Jazzer.
---

# Fuzzing JVM expert

This skill provides the agent with the knowledge and tools to write, build, and
validate fuzz targets for JVM-based projects (Java, Kotlin, Scala, Groovy)
integrated into OSS-Fuzz. JVM fuzzing uses
[Jazzer](https://github.com/CodeIntelligenceTesting/jazzer), which wraps
libFuzzer and instruments JVM bytecode for coverage guidance.

## Fundamental Concepts

### OSS-Fuzz base image

JVM projects must use the JVM base builder image:

```dockerfile
FROM gcr.io/oss-fuzz-base/base-builder-jvm
```

Set `language: jvm` in `project.yaml`.

### Harness structure — raw bytes

The simplest Jazzer harness receives raw bytes via `fuzzerTestOneInput`:

```java
public class MyTargetFuzzer {
    public static void fuzzerTestOneInput(byte[] data) {
        try {
            MyLibrary.parse(data);
        } catch (ExpectedExceptionType e) {
            // Ignore expected exceptions; they are not bugs.
        }
    }
}
```

### Harness structure — typed input via FuzzedDataProvider

`FuzzedDataProvider` splits the raw byte stream into typed values, which is
essential for targets that require structured input:

```java
import com.code_intelligence.jazzer.api.FuzzedDataProvider;
import java.io.IOException;

public class MyTargetFuzzer {
    public static void fuzzerTestOneInput(FuzzedDataProvider data) {
        String header = data.consumeString(64);
        int version = data.consumeInt();
        byte[] payload = data.consumeRemainingAsBytes();

        try {
            MyLibrary.process(header, version, payload);
        } catch (IllegalArgumentException | IOException e) {
            // Expected; not a finding.
        }
    }
}
```

Useful `FuzzedDataProvider` methods:

| Method | Description |
|---|---|
| `consumeBytes(n)` | `byte[]` of up to n bytes (shorter if the input runs out) |
| `consumeRemainingAsBytes()` | all remaining bytes |
| `consumeString(maxLen)` | arbitrary String |
| `consumeAsciiString(maxLen)` | ASCII-only String |
| `consumeInt()` / `consumeInt(min, max)` | int |
| `consumeLong()` | long |
| `consumeBoolean()` | boolean |
| `consumeDouble()` | double |
| `pickValue(collection)` | element of the collection, selected by the fuzzer input |

### One-time setup with `fuzzerInitialize`

If initialisation is expensive (loading config, creating DB connections, etc.),
put it in an optional static method that Jazzer calls once before fuzzing:

```java
public class MyTargetFuzzer {
    private static MyClient client;

    public static void fuzzerInitialize() {
        client = new MyClient(/* static config */);
    }

    public static void fuzzerTestOneInput(FuzzedDataProvider data) {
        client.process(data.consumeRemainingAsBytes());
    }
}
```

### Building in OSS-Fuzz

The `build.sh` pattern for Maven projects:

```bash
# Build the project JARs.
$MVN package -DskipTests -Dmaven.javadoc.skip=true

# Collect JARs needed at runtime. Each JAR listed here must already have been
# copied into $OUT, because both classpaths below resolve against it.
ALL_JARS="mylib-1.0.jar"
BUILD_CLASSPATH=$(echo $ALL_JARS | xargs printf -- "$OUT/%s:"):$JAZZER_API_PATH
RUNTIME_CLASSPATH=$(echo $ALL_JARS | xargs printf -- "\$this_dir/%s:"):\$this_dir

for fuzzer in $(find $SRC -maxdepth 1 -name '*Fuzzer.java'); do
  fuzzer_basename=$(basename -s .java "$fuzzer")
  javac -cp $BUILD_CLASSPATH "$fuzzer"
  cp $SRC/$fuzzer_basename.class $OUT/

  # Wrapper script that launches jazzer_driver with the right arguments.
  echo "#!/bin/bash
this_dir=\$(dirname \"\$0\")
if [[ \"\$@\" =~ (^| )-runs=[0-9]+($| ) ]]; then
  mem_settings='-Xmx1900m:-Xss900k'
else
  mem_settings='-Xmx2048m:-Xss1024k'
fi
LD_LIBRARY_PATH=\"$JVM_LD_LIBRARY_PATH\":\$this_dir \\
\$this_dir/jazzer_driver --agent_path=\$this_dir/jazzer_agent_deploy.jar \\
--instrumentation_includes=com.example.** \\
--cp=$RUNTIME_CLASSPATH \\
--target_class=$fuzzer_basename \\
--jvm_args=\"\$mem_settings\" \\
\$@" > $OUT/$fuzzer_basename
  chmod u+x $OUT/$fuzzer_basename
done
```

For Gradle projects replace `$MVN package` with the appropriate Gradle command
and adjust JAR paths accordingly.

### Seed corpus and dictionaries

- Zip seed files to `$OUT/<fuzzer_name>_seed_corpus.zip`.
- Place dictionaries at `$OUT/<fuzzer_name>.dict`.

## Characteristics of good JVM fuzzing harnesses

1. **Targets attack surface**: parsers, deserializers (JSON, XML, Protobuf,
custom binary formats), network protocol handlers, template engines, and
any API that accepts untrusted bytes or strings.
2. **Catches expected exceptions**: wrap calls in `try/catch` for all
documented exception types. Only unexpected exceptions and crashes are
findings.
3. **Uses `FuzzedDataProvider`** for structured input rather than passing raw
bytes to methods that expect well-formed data.
4. **Initialises heavy state in `fuzzerInitialize`**: client connections,
parsers with complex configuration, and loaded schemas should be set up once.
5. **Avoids non-determinism**: no `Math.random()`, no `System.currentTimeMillis()`
in the fuzzing path, no thread spawning.
6. **Sets `--instrumentation_includes`** to the package prefix of the target
library in the wrapper script, so that Jazzer collects coverage from the
library code and can use it to guide fuzzing.
7. **Configures JVM memory appropriately**: use the `mem_settings` pattern
shown above to avoid OOM kills during runs vs. crash reproduction.
8. **Avoids false positives**: `OutOfMemoryError`, `StackOverflowError`, and
`NullPointerException` thrown on invalid input are often expected behaviour
rather than bugs. Decide per project which of them count as genuine findings.

## What JVM fuzzing finds

- **Unexpected exceptions**: `NullPointerException`, `ArrayIndexOutOfBoundsException`,
`ClassCastException`, `NumberFormatException` on paths that should not throw.
- **Assertion errors and contract violations**: internal consistency checks
that fail on adversarial input.
- **Hangs / infinite loops**: detected by OSS-Fuzz's timeout.
- **Security bugs**: deserialization gadgets, path traversal via crafted
filenames, SSRF via crafted URLs — depends on the library.
- **Logic bugs**: incorrect output for valid-ish input.

Jazzer can also detect:
- SQL injection (via JDBC hooks)
- Path traversal (via file API hooks)
- Command injection (via `Runtime.exec` hooks)

## Operational guidelines

- Always validate with:
```bash
python3 infra/helper.py build_fuzzers <project>
python3 infra/helper.py check_build <project>
python3 infra/helper.py run_fuzzer <project> <fuzzer_name> -- -max_total_time=30
```
- An instant crash usually means a missing JAR on the classpath or an
uncaught expected exception — check `check_build` output carefully.
- Build the project outside the fuzzing harness first (`mvn package` or
`gradle build`) to ensure the project itself compiles cleanly.
- When iterating locally, clone the upstream repo and switch the Dockerfile from
`RUN git clone` to `COPY` to avoid network round-trips.
- Document why each entry point was chosen and what class of bugs it may find.