PHP bindings for Vectorscan, a high-performance multiple regex matching library (portable fork of Intel Hyperscan).
- PHP 8.0+
- Vectorscan (or Hyperscan) library and headers installed
  - On Debian/Ubuntu: `apt install libvectorscan-dev` or `apt install libhyperscan-dev`
  - On Fedora: `dnf install vectorscan-devel` or `dnf install hyperscan-devel`
  - Or build from source: https://github.com/VectorCamp/vectorscan

phpize
./configure --with-vectorscan
make
make install

Then add to your php.ini:

extension=vectorscan

use Vectorscan\Database;
// Compile patterns
$db = Database::compile(['foo.*bar', 'baz\d+']);
// Scan data
$matches = $db->scan("hello fooXbar world baz42");
// $matches = [
// ['id' => 0, 'from' => 0, 'to' => 14],
// ['id' => 1, 'from' => 0, 'to' => 25],
// ]

$db = Database::compile(['error', 'warning', 'critical']);
$matches = $db->scan($logData, function(array $match): bool {
    echo "Pattern {$match['id']} matched at offset {$match['to']}\n";
    // Return false to stop scanning early
    return true;
});

use Vectorscan\Database;
use Vectorscan\Stream;
// Compile in streaming mode
$db = Database::compile(
    ['password\s*=\s*\S+'],
    [VECTORSCAN_FLAG_CASELESS | VECTORSCAN_FLAG_DOTALL],
    [1],
    VECTORSCAN_MODE_STREAM
);
// Open a stream
$stream = Stream::open($db, function(array $match) {
    echo "Found sensitive data at offset {$match['to']}\n";
    return true;
});
// Feed data in chunks
$stream->scan("first chunk of data pass");
$stream->scan("word = secret123 more data");
$stream->close();

$db = Database::compile($patterns);
// Save to cache
$serialized = $db->serialize();
file_put_contents('/tmp/patterns.db', $serialized);
// Load from cache (much faster than recompiling)
$cached = file_get_contents('/tmp/patterns.db');
$db = Database::unserialize($cached);

// One-shot scan (compiles + scans in one call)
$matches = vectorscan_scan(['foo', 'bar'], "foobar");
// Library info
echo vectorscan_version(); // e.g. "5.4.11"
var_dump(vectorscan_valid_platform()); // bool

Compile modes:

- `VECTORSCAN_MODE_BLOCK`: Block mode (default), scan a single buffer
- `VECTORSCAN_MODE_STREAM`: Streaming mode, scan data across multiple chunks
- `VECTORSCAN_MODE_VECTORED`: Vectored mode, scan multiple non-contiguous blocks

Pattern flags:

- `VECTORSCAN_FLAG_CASELESS`: Case-insensitive matching
- `VECTORSCAN_FLAG_DOTALL`: `.` matches any character including newline
- `VECTORSCAN_FLAG_MULTILINE`: `^` and `$` match at line boundaries
- `VECTORSCAN_FLAG_SINGLEMATCH`: Report only one match per pattern
- `VECTORSCAN_FLAG_ALLOWEMPTY`: Allow patterns that can match empty strings
- `VECTORSCAN_FLAG_UTF8`: Enable UTF-8 mode
- `VECTORSCAN_FLAG_UCP`: Enable Unicode character properties
- `VECTORSCAN_FLAG_SOM_LEFTMOST`: Report start of match (leftmost)

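Flags combine with the bitwise OR operator, and the streaming example passes them to `Database::compile()` as an array next to the patterns. The sketch below assumes that array is positional (one entry per pattern), which is not stated explicitly here. `uniform_flags()` is a hypothetical helper, not part of the extension, and the module name passed to `extension_loaded()` is likewise an assumption based on the `extension=vectorscan` line.

```php
<?php
// Hypothetical helper (not part of the extension): build a flags array that
// applies the same bitmask to every pattern. Assumes flags are positional.
function uniform_flags(array $patterns, int $mask): array
{
    return array_fill(0, count($patterns), $mask);
}

// Only touch the extension's constants and classes when it is loaded
// (assumes the module registers itself under the name "vectorscan").
if (extension_loaded('vectorscan')) {
    $patterns = ['error', 'warning', 'critical'];
    $mask = VECTORSCAN_FLAG_CASELESS | VECTORSCAN_FLAG_SINGLEMATCH;

    $db = \Vectorscan\Database::compile(
        $patterns,
        uniform_flags($patterns, $mask),
        [100, 200, 300], // caller-chosen ids, assumed to come back as $match['id']
        VECTORSCAN_MODE_BLOCK
    );

    foreach ($db->scan("ERROR: disk full\nWarning: retrying") as $match) {
        echo "id {$match['id']} ended at offset {$match['to']}\n";
    }
}
```

Passing an explicit flags entry per pattern keeps the call readable when some patterns need different options: replace the helper call with a hand-written array in that case.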
| Method | Description |
|---|---|
| `static compile(array $patterns, ?array $flags, ?array $ids, int $mode): Database` | Compile patterns |
| `scan(string $data, ?callable $callback): array` | Scan data, return match details |
| `scanBool(string $data): bool` | Fast boolean scan (no match details) |
| `serialize(): string` | Serialize database |
| `static unserialize(string $data): Database` | Deserialize database |
| `size(): int` | Get database size in bytes |
| `info(): string` | Get database info string |

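The `serialize()` and `unserialize()` methods listed above lend themselves to a compile-once cache. The following is a minimal sketch, not the extension's own mechanism: `cache_path()` and `load_or_compile()` are hypothetical helpers, and the cache file naming is an arbitrary choice. The library version is folded into the cache key because, in upstream Hyperscan, a serialized database is only accepted by a compatible library version and platform; whether this binding reports that as an exception is an assumption, so the code catches `\Throwable` and recompiles.

```php
<?php
use Vectorscan\Database;

// Hypothetical helper: derive a cache file name from the pattern list and the
// library version, so changed patterns or an upgraded library miss the cache.
function cache_path(array $patterns, string $dir): string
{
    $version = function_exists('vectorscan_version') ? vectorscan_version() : 'unknown';
    $key = hash('sha256', serialize([$version, array_values($patterns)]));
    return rtrim($dir, '/') . "/vectorscan-{$key}.db";
}

// Hypothetical helper: reuse a serialized database when possible, otherwise
// compile and store it. Requires the vectorscan extension at call time.
function load_or_compile(array $patterns, string $dir): Database
{
    $file = cache_path($patterns, $dir);
    $blob = is_readable($file) ? file_get_contents($file) : false;
    if ($blob !== false) {
        try {
            return Database::unserialize($blob);
        } catch (\Throwable $e) {
            // Unreadable or incompatible cache entry: fall through and recompile.
        }
    }
    $db = Database::compile($patterns);
    file_put_contents($file, $db->serialize(), LOCK_EX);
    return $db;
}
```

A caller would then write something like `$db = load_or_compile(['error', 'warning'], sys_get_temp_dir());` and use `$db->scanBool($line)` when only a yes/no answer is needed.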

| Method | Description |
|---|---|
| `__construct(Database $database)` | Allocate scratch space |
| `size(): int` | Get scratch size in bytes |

| Method | Description |
|---|---|
| `static open(Database $database, callable $callback): Stream` | Open a stream |
| `scan(string $data): void` | Feed data to stream |
| `close(): void` | Close stream (flushes remaining matches) |
| `reset(): void` | Reset stream to initial state |

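As an illustration of the stream methods above, the sketch below feeds a file to a stream in fixed-size chunks and collects the matches. `read_chunks()` and `scan_file()` are hypothetical helpers, not part of the extension, and the 8192-byte chunk size is arbitrary. The database passed in is assumed to have been compiled with `VECTORSCAN_MODE_STREAM`.

```php
<?php
use Vectorscan\Database;
use Vectorscan\Stream;

// Hypothetical helper: yield a file's contents in chunks of at most $chunkSize bytes.
function read_chunks(string $path, int $chunkSize = 8192): \Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new \RuntimeException("Cannot open $path");
    }
    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                break;
            }
            if ($chunk !== '') {
                yield $chunk;
            }
        }
    } finally {
        fclose($handle);
    }
}

// Hypothetical helper: stream a file through a database compiled in streaming
// mode and return every match reported to the callback.
function scan_file(Database $db, string $path): array
{
    $matches = [];
    $stream = Stream::open($db, function (array $match) use (&$matches): bool {
        $matches[] = $match;
        return true; // keep scanning
    });
    foreach (read_chunks($path) as $chunk) {
        $stream->scan($chunk);
    }
    $stream->close(); // flushes remaining matches
    return $matches;
}
```

Because the stream keeps state between `scan()` calls, a pattern that straddles a chunk boundary should still match, as in the "pass" / "word" example earlier in this document.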

The extension throws an exception on compilation errors, scan failures, or invalid operations.
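One way to handle these failures, for example when patterns come from user input, is to wrap the call and inspect the outcome. This is a sketch under assumptions: `attempt()` is a hypothetical helper, and because the exception class is not named in this text the code catches `\Throwable` rather than a specific type.

```php
<?php
// Hypothetical wrapper (not part of the extension): run an operation and turn
// any thrown error into a result array instead of letting it propagate.
function attempt(callable $operation): array
{
    try {
        return ['ok' => true, 'value' => $operation(), 'error' => null];
    } catch (\Throwable $e) {
        // The extension's exception class is not named here, so catch broadly.
        return [
            'ok' => false,
            'value' => null,
            'error' => get_class($e) . ': ' . $e->getMessage(),
        ];
    }
}

// Assumes the module registers itself under the name "vectorscan".
if (extension_loaded('vectorscan')) {
    // An unbalanced parenthesis is expected to be rejected at compile time.
    $result = attempt(fn () => \Vectorscan\Database::compile(['(unclosed']));
    echo $result['ok'] ? "compiled\n" : "rejected: {$result['error']}\n";
}
```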

| Function | Description |
|---|---|
| `vectorscan_version(): string` | Get library version |
| `vectorscan_valid_platform(): bool` | Check platform support |
| `vectorscan_scan(array $patterns, string $data, ?array $flags, ?callable $callback): array` | One-shot compile + scan |

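The match arrays returned by `vectorscan_scan()` (and by `Database::scan()`) can be post-processed with ordinary PHP. The sketch below groups end offsets by pattern id, relying only on the `id` and `to` keys shown in the usage examples. `end_offsets_by_pattern()` is a hypothetical helper, not part of the extension.

```php
<?php
// Hypothetical helper: map each pattern id to the list of end offsets at
// which it matched, in the order the matches were reported.
function end_offsets_by_pattern(array $matches): array
{
    $grouped = [];
    foreach ($matches as $match) {
        $grouped[$match['id']][] = $match['to'];
    }
    ksort($grouped);
    return $grouped;
}

// Only call into the extension when its functions are available.
if (function_exists('vectorscan_scan')) {
    $matches = vectorscan_scan(['foo', 'bar'], 'foobar foo');
    print_r(end_offsets_by_pattern($matches));
}
```

For repeated scans with the same patterns, compiling a `Database` once is likely the better fit, since the one-shot function compiles on every call.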
PHP License 3.01