
catchem88/php-vectorscan


php-vectorscan

PHP bindings for Vectorscan, a high-performance multiple regex matching library (portable fork of Intel Hyperscan).

Requirements

  • PHP 8.0+
  • Vectorscan (or Hyperscan) library and headers installed

Installation (from source)

phpize
./configure --with-vectorscan
make
make install

Then add to your php.ini:

extension=vectorscan

Quick Start

Block Mode (single buffer scan)

use Vectorscan\Database;

// Compile patterns
$db = Database::compile(['foo.*bar', 'baz\d+']);

// Scan data
$matches = $db->scan("hello fooXbar world baz42");
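// With default flags every end offset is reported and 'from' is 0
// (compile with VECTORSCAN_FLAG_SOM_LEFTMOST to get real start offsets):
// $matches = [
//   ['id' => 0, 'from' => 0, 'to' => 13],  // "fooXbar"
//   ['id' => 1, 'from' => 0, 'to' => 24],  // "baz4"
//   ['id' => 1, 'from' => 0, 'to' => 25],  // "baz42"
// ]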
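Each match carries byte offsets, so recovering the matched text is a `substr()` call. The helper below is a hypothetical sketch, not part of the extension. It assumes the database was compiled with VECTORSCAN_FLAG_SOM_LEFTMOST and that the binding then puts the start offset in 'from'; without that flag 'from' is 0 and the helper returns everything up to 'to'. The match arrays in the demo are written by hand to show the shape.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return the bytes covered by one match array of the
 * documented shape ['id' => int, 'from' => int, 'to' => int].
 * Offsets are byte offsets, which is what substr() expects.
 * Assumes SOM_LEFTMOST was set at compile time so 'from' is the real start.
 */
function matchedText(string $data, array $match): string
{
    return substr($data, $match['from'], $match['to'] - $match['from']);
}

$data = "hello fooXbar world baz42";

// Hand-written matches, as they might look with SOM_LEFTMOST enabled.
$matches = [
    ['id' => 0, 'from' => 6, 'to' => 13],
    ['id' => 1, 'from' => 20, 'to' => 24],
    ['id' => 1, 'from' => 20, 'to' => 25],
];

foreach ($matches as $match) {
    echo $match['id'], ': ', matchedText($data, $match), "\n";
}
// 0: fooXbar
// 1: baz4
// 1: baz42
```

Because PHP strings are byte strings, this stays correct for UTF-8 input as long as the offsets come straight from the scanner.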

With Callback

$db = Database::compile(['error', 'warning', 'critical']);

$matches = $db->scan($logData, function(array $match): bool {
    echo "Pattern {$match['id']} matched at offset {$match['to']}\n";
    // Return false to stop scanning early
    return true;
});
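A common use of the early-stop return value is "give me the first N hits and stop". The factory below is a hypothetical sketch: it builds a callback that records each match and returns false once the limit is reached, relying only on the README's rule that returning false stops the scan. Whether the binding treats an early stop as an error is not stated in this README, so check that before relying on it.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build a scan callback that records matches and asks
 * the scanner to stop once $limit matches have been collected.
 * $collected is bound by reference so the caller can read it afterwards.
 */
function limitedCollector(int $limit, array &$collected): callable
{
    return function (array $match) use ($limit, &$collected): bool {
        $collected[] = $match;
        // true = keep scanning, false = stop
        return count($collected) < $limit;
    };
}

// Simulate the scanner invoking the callback (no extension needed here).
$found = [];
$callback = limitedCollector(2, $found);
var_dump($callback(['id' => 0, 'from' => 0, 'to' => 5]));  // bool(true)
var_dump($callback(['id' => 2, 'from' => 0, 'to' => 40])); // bool(false)
echo count($found), "\n"; // 2
```

With the extension loaded this would be passed as `$db->scan($logData, limitedCollector(100, $firstHits))`, after which `$firstHits` holds at most 100 matches.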

Streaming Mode

use Vectorscan\Database;
use Vectorscan\Stream;

// Compile in streaming mode
$db = Database::compile(
    ['password\s*=\s*\S+'],
    [VECTORSCAN_FLAG_CASELESS | VECTORSCAN_FLAG_DOTALL],
    [1],
    VECTORSCAN_MODE_STREAM
);

// Open a stream
$stream = Stream::open($db, function(array $match) {
    echo "Found sensitive data at offset {$match['to']}\n";
    return true;
});

// Feed data in chunks
$stream->scan("first chunk of data pass");
$stream->scan("word = secret123 more data");
$stream->close();
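Streaming mode is most useful when the input never sits in memory at once, such as a large file or a socket. The hypothetical helper below reads any PHP stream resource in fixed-size chunks and hands each chunk to a sink; with the extension loaded the sink would be `[$stream, 'scan']`. In Hyperscan's streaming API, reported offsets count from the start of the stream, not the current chunk, so matches that span chunk boundaries (like "pass" + "word" above) still get meaningful offsets.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: read an open stream resource in fixed-size chunks
 * and pass each chunk to $sink. Returns the total number of bytes read.
 */
function feedInChunks($handle, int $chunkSize, callable $sink): int
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('chunk size must be at least 1 byte');
    }
    $total = 0;
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing left
        }
        $sink($chunk);
        $total += strlen($chunk);
    }
    return $total;
}

// Demo without the extension: an in-memory stream and a sink that records chunks.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, 'password = secret');
rewind($handle);

$chunks = [];
$total = feedInChunks($handle, 5, function (string $chunk) use (&$chunks): void {
    $chunks[] = $chunk;
});
fclose($handle);
```

A possible real call is `feedInChunks(fopen($path, 'rb'), 65536, [$stream, 'scan'])` followed by `$stream->close()` so matches at end of data are flushed.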

Database Serialization

$db = Database::compile($patterns);

// Save to cache
$serialized = $db->serialize();
file_put_contents('/tmp/patterns.db', $serialized);

// Load from cache (much faster than recompiling)
$cached = file_get_contents('/tmp/patterns.db');
$db = Database::unserialize($cached);
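A fixed cache path like /tmp/patterns.db silently goes stale when the patterns change, and Hyperscan-family libraries reject databases serialized by a different library version or for an incompatible platform. One way to handle the first two cases is to derive the file name from the compile inputs plus `vectorscan_version()`. The helper below is a hypothetical sketch of that idea; the commented usage assumes the extension is loaded.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build a cache file path for a serialized database.
 * The hash covers the compile inputs plus the library version string, so a
 * pattern change or library upgrade produces a new file instead of
 * reusing a stale (and possibly unloadable) one.
 */
function databaseCachePath(
    string $dir,
    array $patterns,
    ?array $flags,
    ?array $ids,
    int $mode,
    string $libraryVersion
): string {
    $key = hash('sha256', serialize([$patterns, $flags, $ids, $mode, $libraryVersion]));
    return rtrim($dir, '/') . '/vectorscan-' . $key . '.db';
}

// Possible use with the extension loaded (not executed here):
//   $path = databaseCachePath(sys_get_temp_dir(), $patterns, null, null,
//                             VECTORSCAN_MODE_BLOCK, vectorscan_version());
//   if (is_file($path)) {
//       $db = Database::unserialize(file_get_contents($path));
//   } else {
//       $db = Database::compile($patterns);
//       file_put_contents($path, $db->serialize());
//   }
```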

Procedural API

// One-shot scan (compiles + scans in one call)
$matches = vectorscan_scan(['foo', 'bar'], "foobar");

// Library info
echo vectorscan_version(); // e.g. "5.4.11"
var_dump(vectorscan_valid_platform()); // bool
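Because `vectorscan_scan()` compiles its patterns on every call, calling it in a loop with the same patterns repeats that work. A minimal alternative, sketched below as a hypothetical helper, is to memoize compilation per pattern list within the process and scan with the cached database. It keys only on the pattern list; a version that also varies flags or mode would need those in the key too.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: wrap a compile function so each distinct pattern
 * list is compiled once per process and the result reused afterwards.
 */
function memoizeCompile(callable $compile): callable
{
    $cache = []; // pattern-list hash => compiled result
    return function (array $patterns) use ($compile, &$cache) {
        $key = hash('sha256', serialize($patterns));
        if (!array_key_exists($key, $cache)) {
            $cache[$key] = $compile($patterns);
        }
        return $cache[$key];
    };
}

// Possible use with the extension loaded (not executed here):
//   $getDb = memoizeCompile(fn(array $p) => Vectorscan\Database::compile($p));
//   $matches = $getDb(['foo', 'bar'])->scan("foobar");
```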

API Reference

Constants

Mode Flags

  • VECTORSCAN_MODE_BLOCK — Block mode (default), scan a single buffer
  • VECTORSCAN_MODE_STREAM — Streaming mode, scan data across multiple chunks
  • VECTORSCAN_MODE_VECTORED — Vectored mode, scan multiple non-contiguous blocks

Pattern Flags

  • VECTORSCAN_FLAG_CASELESS — Case-insensitive matching
  • VECTORSCAN_FLAG_DOTALL — . matches any character, including newline
  • VECTORSCAN_FLAG_MULTILINE — ^ and $ match at line boundaries
  • VECTORSCAN_FLAG_SINGLEMATCH — Report only one match per pattern
  • VECTORSCAN_FLAG_ALLOWEMPTY — Allow patterns that can match empty strings
  • VECTORSCAN_FLAG_UTF8 — Enable UTF-8 mode
  • VECTORSCAN_FLAG_UCP — Enable Unicode character properties
  • VECTORSCAN_FLAG_SOM_LEFTMOST — Report the leftmost start-of-match offset in 'from' (otherwise 'from' is 0)

Classes

Vectorscan\Database

  • static compile(array $patterns, ?array $flags, ?array $ids, int $mode): Database
    Compile patterns. $flags, $ids and $mode are optional; $mode defaults to VECTORSCAN_MODE_BLOCK.
  • scan(string $data, ?callable $callback): array
    Scan data and return match details.
  • scanBool(string $data): bool
    Fast boolean scan (no match details).
  • serialize(): string
    Serialize the database.
  • static unserialize(string $data): Database
    Deserialize a database.
  • size(): int
    Get the database size in bytes.
  • info(): string
    Get the database info string.
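Match callbacks only see a numeric 'id', so it helps to keep a readable name per pattern. The hypothetical helper below turns a name => [pattern, flags] map into parallel $patterns, $flags and $ids arrays plus an id => name lookup. It assumes compile() pairs $patterns[$i] with $flags[$i] and $ids[$i], as the streaming example (one pattern, one flag, one id) suggests; that pairing is not spelled out elsewhere in this README.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: convert a name => [pattern, flags] map into the
 * parallel arrays Database::compile() takes, plus an id => name lookup
 * for translating $match['id'] back into a readable label.
 */
function buildCompileArgs(array $spec): array
{
    $patterns = $flags = $ids = $names = [];
    $id = 0;
    foreach ($spec as $name => [$pattern, $patternFlags]) {
        $patterns[] = $pattern;
        $flags[] = $patternFlags;
        $ids[] = $id;
        $names[$id] = $name;
        $id++;
    }
    return [$patterns, $flags, $ids, $names];
}

// Possible use with the extension loaded (not executed here):
//   [$patterns, $flags, $ids, $names] = buildCompileArgs([
//       'password' => ['password\s*=\s*\S+', VECTORSCAN_FLAG_CASELESS],
//       'api_key'  => ['api[_-]?key\s*[:=]', VECTORSCAN_FLAG_CASELESS],
//   ]);
//   $db = Vectorscan\Database::compile($patterns, $flags, $ids);
//   foreach ($db->scan($data) as $match) {
//       echo $names[$match['id']], "\n";
//   }
```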

Vectorscan\Scratch

  • __construct(Database $database)
    Allocate scratch space.
  • size(): int
    Get the scratch size in bytes.

Vectorscan\Stream

  • static open(Database $database, callable $callback): Stream
    Open a stream.
  • scan(string $data): void
    Feed data to the stream.
  • close(): void
    Close the stream (flushes remaining matches).
  • reset(): void
    Reset the stream to its initial state.

Vectorscan\Exception

Thrown on compilation errors, scan failures, or invalid operations.

Functions

  • vectorscan_version(): string
    Get the library version.
  • vectorscan_valid_platform(): bool
    Check platform support.
  • vectorscan_scan(array $patterns, string $data, ?array $flags, ?callable $callback): array
    One-shot compile + scan (the patterns are compiled on every call).

License

PHP License 3.01
