
Feature: Dalli v4.2.0 - Performance & Observability #1061

Merged
petergoldstein merged 14 commits into main from feature/v4.2.0-performance
Jan 26, 2026

petergoldstein commented Jan 25, 2026

Owner

Summary

This PR implements the v4.2.0 roadmap features for Dalli, focusing on improving performance and adding tracing.

v4.2.0 Features (Performance & Observability)

  • Buffered I/O improvements - Set socket.sync = false with explicit flush, reducing syscalls for pipelined operations
  • OpenTelemetry tracing - Automatic distributed tracing when OTel SDK is present, with zero overhead when disabled
  • get_multi optimizations - Use Set instead of Array for O(1) lookups, use select! instead of delete_if

Files Changed

  • New files: instrumentation.rb, pipelined_setter.rb, pipelined_deleter.rb, protocol_deprecations.rb, 5_0_ROADMAP.md
  • Modified: client.rb, connection_manager.rb, meta.rb, binary.rb, pipelined_getter.rb
  • Documentation: README.md, CHANGELOG.md
  • Tests: New integration tests and instrumentation unit tests

Test plan

  • All 631 tests pass
  • Rubocop passes with no offenses
  • Tested against memcached with meta protocol
  • Tested deprecation warnings for binary protocol
  • OpenTelemetry instrumentation tested with mock tracer

🤖 Generated with Claude Code

danmayer left a comment

Collaborator

Great to see this all getting pulled in. We had meant to help with this but got pulled into some other things. Once all of this is in, we can likely go back to using the Dalli mainline at 4.2.0.

Comment thread .rubocop.yml

Metrics/MethodLength:
-  Max: 15
+  Max: 20

Collaborator

nice, yeah some of these restrictions were getting annoying

Comment thread lib/dalli/client.rb
#
# Example:
# client.set_multi({ 'key1' => 'value1', 'key2' => 'value2' }, 300)
def set_multi(hash, ttl = nil, req_options = nil)

Collaborator

all these multi commands have been significantly faster in production

# that will be removed in Dalli 5.0.
##
module ProtocolDeprecations
BINARY_PROTOCOL_DEPRECATION_MESSAGE = <<~MSG.chomp

Collaborator

nice

petergoldstein and others added 9 commits January 25, 2026 22:11
Buffered I/O:
- Set socket.sync = false for buffered writes
- Add explicit flush before reading responses
- Reduces syscalls for pipelined operations
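
A minimal sketch of the buffered-write pattern described above, using a local socket pair in place of a memcached connection. The helper name `write_pipelined` is hypothetical and is not taken from Dalli's code; Ruby sockets default to `sync = true`, so each write would otherwise go straight to the kernel.

```ruby
require 'socket'

# Hypothetical helper, not Dalli's actual code: queue several requests in
# Ruby's userspace write buffer, then push them out with one explicit flush.
def write_pipelined(socket, requests)
  socket.sync = false                       # buffer writes in userspace
  requests.each { |req| socket.write(req) } # no syscall per request while buffered
  socket.flush                              # flush before reading any response
end

client, server = UNIXSocket.pair
requests = ["mg key1 v\r\n", "mg key2 v\r\n", "mn\r\n"]
write_pipelined(client, requests)

# The peer sees all three requests once the flush has happened.
received = server.read(requests.join.bytesize)
puts received.inspect
```

Forgetting the flush is the main hazard with this pattern: the reader would block waiting for a response to a request that never left the process.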

OpenTelemetry Tracing:
- Add Dalli::Instrumentation module for distributed tracing
- Auto-detects OpenTelemetry SDK presence
- Zero overhead when OTel not loaded
- Traces perform, get_multi, set_multi, delete_multi,
  get_with_metadata, and fetch_with_lock operations
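
One way the auto-detection above could look, sketched under assumptions: the module name `TracingSketch`, its methods, and the tracer name are hypothetical and are not Dalli's `Dalli::Instrumentation` API. The only OpenTelemetry calls used are `OpenTelemetry.tracer_provider`, `tracer`, and `in_span`.

```ruby
# Hypothetical stand-in for the kind of check described above; names here are
# illustrative only.
module TracingSketch
  # True only when the application has already loaded OpenTelemetry.
  def self.enabled?
    !!(defined?(::OpenTelemetry) && ::OpenTelemetry.respond_to?(:tracer_provider))
  end

  # Yields directly when tracing is unavailable, so the untraced path adds
  # only this check on top of the wrapped block.
  def self.trace(name, attributes = {})
    return yield unless enabled?

    tracer = ::OpenTelemetry.tracer_provider.tracer('tracing-sketch')
    tracer.in_span(name, attributes: attributes, kind: :client) { |_span| yield }
  end
end

result = TracingSketch.trace('get', 'db.system' => 'memcached') { :cached_value }
puts result.inspect
```

The block's return value passes through unchanged in both branches, which keeps call sites identical whether or not tracing is active.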

get_multi Optimizations:
- Use Set instead of Array for deleted server tracking
- Use select! instead of delete_if for connected check
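
The two collection changes above can be illustrated with a small sketch. The `Server` struct and the server names are hypothetical stand-ins, not Dalli's connection objects.

```ruby
require 'set'

# Hypothetical stand-in for a server connection; only `connected?` matters here.
Server = Struct.new(:name, :connected) do
  def connected?
    connected
  end
end

servers = [
  Server.new('cache1:11211', true),
  Server.new('cache2:11211', false),
  Server.new('cache3:11211', true)
]

# Set#include? is a hash lookup, while Array#include? scans every element.
deleted = Set.new
deleted << 'cache3:11211'

# select! states the condition positively and filters the array in place;
# the delete_if form needs the negated condition: delete_if { |s| !s.connected? }.
servers.select!(&:connected?)
servers.reject! { |server| deleted.include?(server.name) }

remaining = servers.map(&:name)
puts remaining.inspect
```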

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add CHANGELOG entry for v4.2.0 with buffered I/O, OpenTelemetry,
  and get_multi optimization features
- Add OpenTelemetry tracing to README features list

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add error recording: exceptions are now recorded on spans with error status
- Add comprehensive documentation to README explaining OTel integration
- Add tests with mock OpenTelemetry to verify span creation and error recording
- Update CHANGELOG to document error recording feature
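
A rough sketch of the error-recording behavior, written against a mock span in the spirit of the tests mentioned above. `MockSpan` and `with_error_recording` are hypothetical names; with the real OpenTelemetry API the status would be an `OpenTelemetry::Trace::Status.error(...)` object rather than the symbol used here.

```ruby
# Minimal mock span that models only the two calls used below. This is test
# scaffolding for the sketch, not the OpenTelemetry SDK.
class MockSpan
  attr_reader :exceptions
  attr_accessor :status

  def initialize
    @exceptions = []
    @status = :unset
  end

  def record_exception(error)
    @exceptions << error
  end
end

# Hypothetical wrapper: record the failure on the span, mark it as an error,
# then re-raise so the caller still sees the original exception.
def with_error_recording(span)
  yield
rescue StandardError => e
  span.record_exception(e)
  span.status = :error
  raise
end

span = MockSpan.new
begin
  with_error_recording(span) { raise IOError, 'connection reset' }
rescue IOError
  # The exception still reaches the caller after being recorded.
end
puts span.status.inspect
```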

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

OpenTelemetry improvements:
- Add server.address attribute to single-key operation spans
- Add db.memcached.key_count to multi-key operation spans
- Add db.memcached.hit_count and miss_count to get_multi spans
- Add trace_with_result method for post-execution span updates
- Expand README documentation with full attribute list
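
As an illustration of the post-execution attributes listed above, the sketch below derives them from the requested keys and the returned hash. The helper name is hypothetical, and the full attribute name `db.memcached.miss_count` is assumed from the `hit_count` naming.

```ruby
# Hypothetical helper: compute multi-key span attributes once the result of a
# get_multi-style call is known.
def get_multi_span_attributes(requested_keys, found)
  hits = requested_keys.count { |key| found.key?(key) }
  {
    'db.memcached.key_count' => requested_keys.size,
    'db.memcached.hit_count' => hits,
    'db.memcached.miss_count' => requested_keys.size - hits # assumed name
  }
end

attributes = get_multi_span_attributes(%w[a b c], { 'a' => 1, 'c' => 3 })
puts attributes.inspect
```

Because the counts depend on the result, they can only be set after the operation runs, which is presumably why a separate `trace_with_result` method was added.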

Raw mode optimization:
- Skip bitflags (f flag) in meta protocol get requests when in raw mode
- Saves 2 bytes per request and skips unnecessary parsing
- Applies to client-level raw mode and per-request raw option

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Tests added:
- skip_flags option tests in RequestFormatter (raw mode optimization)
- Thundering herd flag tests (N, R for vivify/recache)
- Metadata flag tests (h, l, u for hit status/last access/skip LRU)
- Stale flag test for meta_delete (I flag)
- raw_mode? method tests in Protocol::Base

Roadmap updates:
- Mark Phase 2 (v4.2.0) as complete
- Document what was implemented vs planned
- Add "Future Performance Work" section for Shopify PR #45 optimizations
  that were not included (allocation reduction, single-server fast path)
- Update Shopify PR status table with current Dalli status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Document the specific span attributes added by different operation types:
- Single-key operations: db.operation, server.address
- Multi-key operations: db.memcached.key_count, hit_count, miss_count
- Bulk write operations: db.operation, db.memcached.key_count

Add detailed @param, @yield, @return, and @example annotations to all
public methods. Document error handling behavior and zero-overhead
guarantee when OpenTelemetry is not loaded.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add comprehensive research notes for each GitHub issue:

- #1034: Likely fixed by PR #1025's IO#timeout= usage on Ruby 3.0+.
  Document fallback path fix from @lnussbaum if needed.

- #776/#941: Document root cause (send/receive buffer deadlock) and
  proposed interleaved read/write solution from @petergoldstein.

- #1022: Document the raw+nil behavior matrix and competing proposals.
  Blocked on Rails team input about MemCacheStore expectations.

- #1019: Easy fix - add configurable separator with character validation.

- #805: Likely resolved by OTel support + Rails 8.0 error handling.

- #1039: Deprioritize due to insufficient info and no other reports.

Reprioritize issues based on impact and actionability.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

#1034 (timeval packing):
- Clarify that IO#timeout= requires Ruby 3.2+, not 3.0
- Ruby 3.1 users on non-x86_64 architectures are affected
- Add concrete implementation using @lnussbaum's detection approach
- Document options: fix fallback (v4.3), bump to Ruby 3.2 (v5.0)

#776/#941 (get_multi deadlock):
- Add detailed implementation plan leveraging v4.2.0 buffered I/O
- Design interleaved read/write for large key batches (>10k keys)
- Include code examples for PipelinedGetter modifications
- Define testing strategy for the fix
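
The interleaving idea can be sketched roughly as follows. This is not the planned PipelinedGetter change; the method names, the batch size, and the echo thread standing in for a server are all assumptions made for illustration.

```ruby
require 'socket'

# Append whatever is already readable to `buffer` without blocking.
def drain_available(socket, buffer)
  loop do
    chunk = socket.read_nonblock(16_384, exception: false)
    break if chunk == :wait_readable || chunk.nil?

    buffer << chunk
  end
end

# Hypothetical sketch: write requests in slices and drain pending responses
# between slices, so neither side's socket buffer is left to fill up.
def interleaved_pipeline(socket, requests, batch_size: 100)
  responses = +''
  requests.each_slice(batch_size) do |batch|
    batch.each { |req| socket.write(req) }
    socket.flush
    drain_available(socket, responses)
  end
  responses
end

client, server = UNIXSocket.pair
echo = Thread.new do # stand-in server that echoes every byte back
  loop { server.write(server.readpartial(4096)) }
rescue EOFError
  server.close
end

requests = Array.new(1_000) { |i| "mg key#{i} v\r\n" }
received = interleaved_pipeline(client, requests)
expected = requests.join.bytesize
received << client.readpartial(16_384) while received.bytesize < expected
client.close
echo.join
puts received.bytesize
```

The final blocking loop collects whatever responses had not arrived by the time the last batch was written.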

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

TruffleRuby (#988):
- Strategy: non-blocking CI jobs with continue-on-error
- Document incompatibilities and report to TruffleRuby team
- @nirvdrum/@eregon as points of contact

Benchmarks:
- Document that benchmarks already run on every push/PR
- Note potential improvements: enable set_multi test, store artifacts,
  regression detection, multi-Ruby benchmarks
- Mark as low priority since current setup is functional

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
petergoldstein force-pushed the feature/v4.2.0-performance branch from dc1662c to b6532d0 on January 26, 2026 03:11
petergoldstein and others added 5 commits January 25, 2026 23:19
Replace the manual `readfull` loop with Ruby's built-in `IO#read` method.
Modern Ruby (3.1+) handles non-blocking reads internally with proper
timeout support via `IO#timeout=`, making the manual loop unnecessary.

This provides ~7% performance improvement on read operations based on
benchmarks from PR #1026.

The `readfull` method is preserved in socket.rb for backwards compatibility
but is no longer used internally.

Based on work by @grcooper in PR #1026.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add deprecation warning to readfull method, which is no longer used
internally after switching to IO#read. The method will be removed
in Dalli 5.0.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The IO#read method doesn't handle timeouts properly on:
- Ruby 3.1 (lacks IO#timeout= which was added in Ruby 3.2)
- JRuby (different IO implementation)

The original readfull method uses read_nonblock with manual timeout
handling via wait_readable, which works across all supported Ruby
versions and implementations.
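
For readers unfamiliar with the pattern, a read loop of the kind described above might look like the sketch below. It is not a copy of Dalli's `readfull`; the method name `read_fully` and the choice of `Timeout::Error` are assumptions.

```ruby
require 'io/wait'
require 'timeout'

# Hypothetical sketch: read exactly `count` bytes, waiting up to `timeout`
# seconds each time the descriptor has nothing to offer.
def read_fully(io, count, timeout)
  buffer = +''
  while buffer.bytesize < count
    chunk = io.read_nonblock(count - buffer.bytesize, exception: false)
    case chunk
    when :wait_readable
      # wait_readable returns nil when the timeout elapses with no data.
      raise Timeout::Error, "IO timeout after #{timeout}s" unless io.wait_readable(timeout)
    when nil
      raise EOFError, 'connection closed before all bytes arrived'
    else
      buffer << chunk
    end
  end
  buffer
end

reader, writer = IO.pipe
writer.write('VA 5')
writer.write("\r\nhello")
data = read_fully(reader, 11, 1.0)
puts data.inspect
```

Because the timeout is handled by `wait_readable` rather than by the IO object itself, this approach does not depend on `IO#timeout=`.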

This reverts commits 98b0309 and f5165ef.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Document the IO#read performance optimization for v5.0, explaining why
it was deferred:
- IO#timeout= required for proper timeout handling (Ruby 3.2+)
- Ruby 3.1 hangs without IO#timeout= support
- JRuby has different IO implementation, needs readfull fallback

v5.0's minimum Ruby 3.2 requirement enables this optimization.
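
Until the minimum version is raised, the constraints listed above amount to a capability check. The sketch below shows one possible form; the method name and the returned symbols are hypothetical and do not come from Dalli.

```ruby
# Hypothetical capability check: probe for IO#timeout= instead of comparing
# version strings, and keep JRuby on the manual loop as the notes suggest.
def read_strategy(io)
  return :readfull_loop if RUBY_ENGINE == 'jruby'

  io.respond_to?(:timeout=) ? :io_read_with_timeout : :readfull_loop
end

reader, writer = IO.pipe
strategy = read_strategy(reader)
puts strategy
writer.close
reader.close
```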

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@petergoldstein

Owner Author

I'm planning to cut this gem version tonight. I tried the readfull->read change, but rolled it back because of Ruby 3.1 incompatibility. Deferring to 5.0. Continuing to update the roadmap doc based on open issue research.

petergoldstein merged commit 62f694f into main on Jan 26, 2026
49 of 50 checks passed
petergoldstein deleted the feature/v4.2.0-performance branch on February 24, 2026 13:24

2 participants