Releases · wevote-project/crystal-text-splitter

20 Jan 03:56

antarr

v0.2.1

3a476e5

v0.2.1 - Performance Optimizations

Performance Improvements

Three major optimizations for production RAG systems:

1. Overlap Calculation (97-99% memory reduction)

Backward char scanning instead of allocating full word arrays
Constant memory usage regardless of document size
O(limit) vs O(n) space complexity
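
The backward-scan idea can be sketched roughly as follows. This is an illustrative sketch only, not the library's actual implementation: `trailing_words` and `ASCII_WHITESPACE` are hypothetical names, and the sketch assumes words are separated by ASCII whitespace. It walks the bytes from the end of the text and stops once enough words have been passed, so no word array is ever built.

```crystal
# Hypothetical helper (not part of the library's API): returns the last
# `limit` words of `text` by scanning bytes backward from the end.
# ASCII whitespace bytes never occur inside a multi-byte UTF-8 sequence,
# so scanning raw bytes is safe for UTF-8 input.
ASCII_WHITESPACE = {0x20_u8, 0x09_u8, 0x0A_u8, 0x0D_u8}

def trailing_words(text : String, limit : Int32) : String
  return "" if limit <= 0

  bytes = text.to_slice
  finish = bytes.size

  # Ignore trailing whitespace.
  while finish > 0 && ASCII_WHITESPACE.includes?(bytes[finish - 1])
    finish -= 1
  end

  start = finish
  words = 0
  while start > 0
    # Walk back over one word.
    while start > 0 && !ASCII_WHITESPACE.includes?(bytes[start - 1])
      start -= 1
    end
    words += 1
    break if words == limit

    # Walk back over the whitespace before that word.
    gap = start
    while gap > 0 && ASCII_WHITESPACE.includes?(bytes[gap - 1])
      gap -= 1
    end
    break if gap == 0 # only leading whitespace is left
    start = gap
  end

  text.byte_slice(start, finish - start)
end

puts trailing_words("the quick brown fox", 2) # prints "brown fox"
```

Apart from the returned substring, the sketch keeps only a few integer indices, whatever the document size.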

2. String Allocation (31% memory reduction, 1.2x speedup)

Eliminated unnecessary intermediate variables in hot loops
Direct String::Builder append in character mode
Cleaner, more maintainable code
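
As a rough illustration of appending directly to a String::Builder in character mode, the sketch below builds fixed-size chunks without creating intermediate strings per character. `char_chunks` is a hypothetical name and the logic is deliberately simplified (no overlap, no sentence handling); it is not the library's code.

```crystal
# Hypothetical sketch: split `text` into chunks of `chunk_size` characters,
# appending each character straight into a String::Builder.
def char_chunks(text : String, chunk_size : Int32) : Array(String)
  raise ArgumentError.new("chunk_size must be positive") if chunk_size <= 0

  chunks = [] of String
  builder = String::Builder.new
  count = 0

  text.each_char do |char|
    builder << char # direct append, no temporary String per character
    count += 1
    if count == chunk_size
      chunks << builder.to_s
      builder = String::Builder.new # a builder cannot be reused after to_s
      count = 0
    end
  end

  chunks << builder.to_s if count > 0
  chunks
end

p char_chunks("abcdefg", 3) # prints ["abc", "def", "g"]
```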

3. True Lazy Iterator (4-5x faster for early termination)

Fixed design flaw where iterator loaded all chunks upfront
State machine approach (no Fibers!)
65-67% memory reduction vs eager evaluation
Zero overhead for full iteration
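
A minimal sketch of the state-machine approach, assuming a simple fixed-size character splitter: the iterator stores only its current position and produces one chunk per call to `next`, so a caller that stops early never pays for the remaining chunks. `ChunkIterator` is a hypothetical class for illustration, not the library's iterator.

```crystal
# Hypothetical lazy iterator: the only state is the current position.
class ChunkIterator
  include Iterator(String)

  def initialize(@text : String, @chunk_size : Int32)
    raise ArgumentError.new("chunk_size must be positive") if @chunk_size <= 0
    @pos = 0
  end

  # Produces the next chunk on demand, or signals the end of iteration.
  def next
    return stop if @pos >= @text.size

    chunk = @text[@pos, @chunk_size]
    @pos += @chunk_size
    chunk
  end
end

# Only the first two chunks are ever built here.
p ChunkIterator.new("abcdefghij", 4).first(2).to_a # prints ["abcd", "efgh"]
```

Because no Fiber or channel is involved, iterating to the end does the same work as a plain loop.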

Real-World Impact

Processing 100K word document:

First chunk: 7.52ms → 1.78ms (4.2x faster)
Memory: 5,197 MB → 1,781 MB (65% reduction)

Processing 1,000 docs (first 5 chunks each):

Time: 3.6s → 1.0s (3.6x faster)
Memory: 65% less

Benchmarks Included

Comprehensive benchmarks, with detailed analysis, are included in the benchmarks/ directory.

Breaking Changes

None - All optimizations are backward compatible!

24 Nov 06:38

antarr

v0.2.0

3190f2a

v0.2.0 - Iterator API & Performance

🚀 New Features

Iterator API

Added memory-efficient iterator API for processing large documents:

# Block syntax - most efficient (no array allocation)
splitter.each_chunk(text) { |chunk| process(chunk) }

# Iterator with lazy evaluation
splitter.each_chunk(text).first(10).each { |chunk| process(chunk) }

# Traditional array (backward compatible)
chunks = splitter.split_text(text)

⚡ Performance Improvements

25% faster: Process 1MB in 6.79ms (was 9ms)
57% less memory: Only 17.9MB/op (was 41.9MB/op)
Uses String::Builder for efficient string construction
Optimized with Crystal's native split method

Benchmark Results (1MB text)

Throughput: 147 ops/sec
Latency: 6.79ms
Memory: 17.9MB per operation
Chunks: 1,249

📚 Documentation

Added iterator API examples and usage patterns
Updated README with performance benchmarks
Added examples/iterator_usage.cr with practical examples
Updated feature comparison table

🔄 Backward Compatibility

All existing code continues to work. The split_text() method returns an Array as before.

💡 Benefits

Memory efficiency: Process huge documents without loading all chunks into memory
Streaming capable: Lazy evaluation with iterator chains
Production ready: All 25 tests passing
Type safe: Full Crystal type safety

Full Changelog: v0.1.1...v0.2.0

24 Nov 04:02

antarr

v0.1.1

b948922

v0.1.1: CI Fixes

Crystal Text Splitter v0.1.1

Bug fix release that ensures all CI checks pass.

What's Fixed

✅ CI Build Job Removed: Removed unnecessary build job that was failing (libraries don't need executable targets)
✅ Code Formatting: Ran crystal tool format on all files to pass formatting checks
✅ Ameba Linting: Removed leftover scaffold file that contained TODO comments
✅ All CI Checks Pass: Ubuntu/macOS with latest/nightly Crystal all passing

Installation

dependencies:
  text-splitter:
    github: wevote-project/crystal-text-splitter
    version: ~> 0.1.1

Upgrading from v0.1.0

No breaking changes - drop-in replacement. Simply update your shard.yml to point to ~> 0.1.1.

Full Changelog: v0.1.0...v0.1.1

24 Nov 03:33

antarr

v0.1.0

640f1ee

v0.1.0: Initial Release

Crystal Text Splitter v0.1.0

Intelligent text chunking for RAG (Retrieval-Augmented Generation) and LLM applications.

Features

Two splitting modes: Character-based and word-based chunking
Configurable overlap: Preserve context between chunks
Sentence boundary respect: Keeps sentences intact when possible
Production-tested: Extracted from the bills-rag-system project
Comprehensive test coverage: 22 test cases covering all functionality
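
To illustrate what configurable overlap means in character mode, here is a simplified sketch in which each chunk starts `chunk_size - chunk_overlap` characters after the previous one. `overlapping_chunks` is a hypothetical helper, and unlike the library it makes no attempt to respect sentence boundaries.

```crystal
# Hypothetical sketch of character chunks with overlap (no sentence handling).
def overlapping_chunks(text : String, chunk_size : Int32, chunk_overlap : Int32) : Array(String)
  if chunk_size <= 0 || chunk_overlap < 0 || chunk_overlap >= chunk_size
    raise ArgumentError.new("need 0 <= chunk_overlap < chunk_size")
  end

  step = chunk_size - chunk_overlap # distance between chunk starts
  size = text.size
  chunks = [] of String
  pos = 0

  while pos < size
    chunks << text[pos, chunk_size]
    break if pos + chunk_size >= size # this chunk already reached the end
    pos += step
  end

  chunks
end

p overlapping_chunks("abcdefghij", 4, 2) # prints ["abcd", "cdef", "efgh", "ghij"]
```

Each chunk repeats the last two characters of the previous one, which is how context carries across chunk boundaries.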

Installation

Add this to your application's shard.yml:

dependencies:
  text-splitter:
    github: wevote-project/crystal-text-splitter
    version: ~> 0.1.0

Quick Start

require "text-splitter"

# Character-based splitting
splitter = Text::Splitter.new(
  chunk_size: 1000,
  chunk_overlap: 200
)

chunks = splitter.split_text("Your long document...")

# Word-based splitting
word_splitter = Text::Splitter.new(
  chunk_size: 280,
  chunk_overlap: 50,
  mode: Text::Splitter::ChunkMode::Words
)

What's New

Initial release with character and word splitting modes
Full test coverage
Complete documentation with examples
CI/CD pipeline for Ubuntu and macOS

Perfect for building semantic search, RAG pipelines, and LLM applications in Crystal!
