Add each_line enumerator to IO class#10

Open
ianfixes wants to merge 3 commits into
lautis:masterfrom
ianfixes:2020-01-25_io_each_line

Conversation

@ianfixes

@ianfixes ianfixes commented Jan 25, 2020


This enables the following:

require 'piperator'
require 'pathname'

# Open a set of files as if they were one big file, as the "cat" command does.
# Works anywhere you'd use File.open("one.txt").each_line; only one line is held in memory at a time.
# Chaining enumerators with :+ needs Ruby 2.6 or later (Enumerator#+).
Piperator.infinite_io(Pathname.glob("*.txt").map(&:each_line).reduce(:+)) do |io|
  io.each_line { |line| puts line }
end

# Ruby 2.3 version
Piperator.infinite_io(Pathname.glob("*.txt").map(&:each_line).lazy.flat_map(&:lazy)) do |io|
  io.each_line { |line| puts line }
end

@ianfixes ianfixes force-pushed the 2020-01-25_io_each_line branch from fbe0b4e to 59eb99b Compare January 25, 2020 05:40
Comment thread lib/piperator/io.rb Outdated
@lautis

lautis commented Jan 29, 2020

Owner

In the Ruby IO object, each_line takes the line separator and limit as arguments. Would that be possible to include here?

https://ruby-doc.org/core-2.5.1/IO.html#method-i-each_line

@ianfixes

Author

I ran into trouble with this, and the only way around it seems to be to use the .eager method that was added in Ruby 2.7 to turn a lazy enumerator into a "regular" one.

My goal here is to be able to take a lazy enumerator (like you'd use for an infinite sequence) and use it to create an infinitely long IO stream. In other words, to drive a stream of bytes from a generator function.

The problem I'm running into here is that either I implement it as a lazy enumerator (which causes each_line to return no data), or as a "regular" enumerator (which causes an out of memory error -- it tries to read the entire infinite stream before continuing).

But yes, if I can find a way around that, I'd add those other arguments.

Do you have any ideas for how to accomplish this? The last thing I tried was to create a child class of StringIO and write data to it from Enumerator#next every time its buffer became empty. It didn't seem to work.

@ianfixes

Author

I found a solution to this while developing an unrelated project: use IO.pipe to handle all the buffering. That solves both the lazy/eager enumerator problem and the buffering. Contributing it here in case it is of interest.
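
For readers following along, here is a minimal sketch of the IO.pipe idea, not the code from this PR. It assumes a hypothetical helper named enum_to_io (not part of Piperator) and a background thread that feeds the write end of the pipe; the pipe's OS buffer then limits how far the producer can run ahead of the consumer, so an infinite lazy enumerator is never materialized.

```ruby
# Hypothetical helper (NOT Piperator's API): expose any enumerable of strings,
# including an infinite lazy one, as a read-only core IO object.
def enum_to_io(chunks)
  reader, writer = IO.pipe
  producer = Thread.new do
    begin
      # Blocks whenever the pipe buffer is full, so the producer can only
      # run a bounded distance ahead of the consumer.
      chunks.each { |chunk| writer.write(chunk) }
    rescue Errno::EPIPE, IOError
      # The consumer closed the read end early; stop producing.
    ensure
      writer.close unless writer.closed?
    end
  end
  yield reader
ensure
  reader.close if reader && !reader.closed?
  producer.join if producer
end

# An infinite generator, consumed a few lines at a time.
numbers = (1..Float::INFINITY).lazy.map { |n| "line #{n}\n" }
first_three = enum_to_io(numbers) { |io| io.each_line.first(3) }
```

Closing the read end is what stops the producer here: its next write raises Errno::EPIPE, which the thread treats as a normal shutdown.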

@ianfixes ianfixes force-pushed the 2020-01-25_io_each_line branch 3 times, most recently from a51c4ae to 1be2c16 Compare June 29, 2021 17:19
@ianfixes ianfixes force-pushed the 2020-01-25_io_each_line branch from 1be2c16 to 466417e Compare June 29, 2021 17:20
@ianfixes

Author

@lautis

> In the Ruby IO object, each_line takes the line separator and limit as arguments. Would that be possible to include here?

I worked around your suggestion by yielding a literal core IO object from the function.
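
As an illustration of why a core IO object covers the request: the read end of an IO.pipe is an ordinary IO, so each_line accepts the separator and limit arguments with no extra code. The sketch below is an assumption-laden example, not code from this PR; pipe_reader_for is a hypothetical helper, and it writes only small strings that fit in the pipe buffer, so no producer thread is needed.

```ruby
# Hypothetical helper for illustration: returns the read end of a pipe
# that already contains `data` (small enough to fit in the pipe buffer).
def pipe_reader_for(data)
  reader, writer = IO.pipe
  writer.write(data)
  writer.close
  reader
end

# Custom separator: split on commas instead of newlines.
by_separator = pipe_reader_for("alpha,beta,gamma").each_line(",").to_a

# Limit: each "line" is at most 4 bytes long.
by_limit = pipe_reader_for("alphabet").each_line(4).to_a

# Both: stop at the separator or after 3 bytes, whichever comes first.
both = pipe_reader_for("alpha,beta").each_line(",", 3).to_a
```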


3 participants