Speed up Reader.parse and Writer.generate_lines #6

Draft
lautis wants to merge 2 commits into master from optimise

lautis (Owner) commented Apr 26, 2026

Goal: cut allocations and copies in the hot paths so the Rust extension
runs as fast as possible without resorting to unsafe shortcuts that
break GC safety.

Writer changes

  • Take RArray/RString directly instead of having magnus deserialize
    rows into Vec<Vec<String>> on every call (eliminates ~10K transient
    String + 1K Vec allocations for a 1000-row generate_lines).
  • Use csv::Writer::write_byte_record so each row hits the csv crate's
    bulk-copy fast path.
  • For generate_lines, write straight into a pre-sized Ruby string via
    rb_str_cat (RStringWriter), which saves the final Vec<u8> -> RString
    copy.
  • For generate_line, keep the Vec<u8> + enc_str_new path so MRI can use
    an embedded RString for the typically-small output.
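
To illustrate the generate_lines idea without pulling in Ruby, here is a minimal sketch of an io::Write sink that appends into a pre-sized buffer. BufferWriter, write_rows and this generate_lines are hypothetical stand-ins, not the PR's code: the real RStringWriter presumably appends to a Ruby string via rb_str_cat, and quoting is left to the csv crate, which this sketch skips entirely.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for RStringWriter: appends every write into a
/// caller-owned buffer. In the extension the buffer would be a Ruby string.
struct BufferWriter<'a> {
    out: &'a mut Vec<u8>,
}

impl Write for BufferWriter<'_> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Append in place; no intermediate buffer is built and copied later.
        self.out.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Write rows as comma-separated lines. Assumption: no field needs quoting.
fn write_rows<W: Write>(w: &mut W, rows: &[Vec<&[u8]>]) -> io::Result<()> {
    for row in rows {
        for (i, field) in row.iter().enumerate() {
            if i > 0 {
                w.write_all(b",")?;
            }
            w.write_all(field)?;
        }
        w.write_all(b"\n")?;
    }
    Ok(())
}

/// Pre-size the output from the field lengths, then stream rows into it.
fn generate_lines(rows: &[Vec<&[u8]>]) -> Vec<u8> {
    // One extra byte per field covers its trailing comma or newline.
    let estimate: usize = rows
        .iter()
        .map(|row| row.iter().map(|f| f.len() + 1).sum::<usize>())
        .sum();
    let mut out = Vec::with_capacity(estimate);
    write_rows(&mut BufferWriter { out: &mut out }, rows)
        .expect("writing to an in-memory buffer should not fail");
    out
}
```

Because the sink is generic over io::Write, the same row-writing code could target either a Vec<u8> (the generate_line path) or a string-backed writer (the generate_lines path).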

Reader changes

  • parse_csv accepts RString and reads bytes via as_slice, so there is
    no String::from_utf8 round-trip on the input.
  • Drive csv-core directly so we bypass the csv crate's internal
    BufReader copy of the in-memory input.
  • Preserve the input string's encoding on every produced field instead
    of always returning BINARY, so non-ASCII inputs round-trip correctly.
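
As a rough illustration of parsing straight from a borrowed byte slice, the sketch below splits records and fields without ever converting the input to a String, so it also accepts bytes that are not valid UTF-8. parse_unquoted is a hypothetical, deliberately simplified stand-in: it ignores quoting, escaping and \r\n line endings, all of which the PR leaves to csv-core.

```rust
/// Split CSV bytes into records of fields, working on the raw slice.
/// Assumptions: fields are unquoted and records end with b'\n'.
fn parse_unquoted(input: &[u8]) -> Vec<Vec<&[u8]>> {
    input
        .split(|&b| b == b'\n')
        // Drop the empty slice after a trailing newline (and blank lines).
        .filter(|line| !line.is_empty())
        .map(|line| line.split(|&b| b == b',').collect())
        .collect()
}
```

No UTF-8 validation happens anywhere in this path; the caller decides which encoding to attach to each field afterwards.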

Build profile

  • Add a release profile with fat LTO and codegen-units = 1.
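
For reference, a release profile matching that description would look roughly like this in Cargo.toml. The PR text does not show the actual file, so any other keys it may set are unknown.

```toml
[profile.release]
lto = "fat"        # whole-program link-time optimisation across all crates
codegen-units = 1  # one codegen unit: slower builds, more inlining opportunities
```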

Benchmark harness

  • Use a more representative dataset (mix of plain hex, quoted/escaped
    fields, and empty values) and add coverage for generate_line and the
    streaming Reader.each path so the new code paths are measured.

Results on Ruby 4.0.3 + YJIT, 1000-row dataset:

  generate_lines  1.36k -> 6.64k i/s  (~4.9x)
  generate_line   ?     -> 1.63M i/s
  parse           1.80k -> 2.10k i/s  (~1.17x; mostly Ruby alloc cost)

lautis added 2 commits April 26, 2026 21:46
osv gem does not build cleanly, so it should be skipped