Conversation
pdvrieze
left a comment
There was a problem hiding this comment.
As more formats support kotlinx-io they all need some sort of text or binary input/output abstraction. It makes sense that this is more or less "internal", but such abstractions are not format specific at all. As such it makes more sense to put them into the core module (with a serialization internal api flag).
At some level it might even make sense to have these interfaces as public, but that may be better for a different discussion.
| @CborFriendModuleApi | ||
| public interface Input { | ||
| public val availableBytes: Int | ||
| /** Returns a -1 if no bytes are available. Otherwise returns a value between 0 and 255 (inclusive). */ | ||
| public fun read(): Int | ||
| public fun read(b: ByteArray, offset: Int, length: Int): Int | ||
| public fun skip(length: Int) | ||
| } | ||
|
|
||
| @CborFriendModuleApi | ||
| public interface Output { | ||
| public fun write(buffer: ByteArray, offset: Int = 0, count: Int = buffer.size) | ||
| public fun write(byteValue: Byte) | ||
| } |
There was a problem hiding this comment.
These types appear to be format independent and relevant to all binary formats. It would be worthwhile to consider whether these interfaces should be put in a common context.
There was a problem hiding this comment.
Should I handle that as part of this PR? My original intent with this API was to be as non-invasive as possible. If so, I'm happy to make the change but I'm not sure where it should live. formats/io?
There was a problem hiding this comment.
I would suggest leaving these base Input/Output interface in the core module, and change the parameter of BinaryFormat and StringFormat to use them.
There was a problem hiding this comment.
I think this is a separate discussion. We probably we want string and binary outputs in core, so we can then plug in whatever we want. Providing shims for ByteArrayOutput/input and Sink/Source (also for string-based IO) can then also move to core and core-io.
Just my 2 cents
| public val availableBytes: Int | ||
| /** Returns a -1 if no bytes are available. Otherwise returns a value between 0 and 255 (inclusive). */ |
There was a problem hiding this comment.
It is not clear what this means. What is the meaning of availableBytes==0 and how do you deal with (pending) network reads where the bytes are not available yet (and stream size is not known).
There was a problem hiding this comment.
Sorry, I didn't do the best job on the docs for Input/Output, I'll go back and update them 😅.
This one comment was me attempting to describe the pre-existing behavior of read in ByteArrayInput, since it took me a minute to figure it out. I had originally thought that this should return a Byte (the 0-255 value returned on the happy path), but there's a failure path if there's no more data available where it returns -1 (which forces the Int typing).
Re. network streams, I don't think this is even handled by kotlinx-io yet? It seems like the APIs presented are all synchronous (until Kotlin/kotlinx-io#163 has an implementation, at least).
There was a problem hiding this comment.
In most cases (including Java IO streams) returning 0 means that nothing was returned, but another call may return something again (although most API's that support blocking will block instead of returning 0). A negative (or -1) return value tends to mean end of file/stream.
|
RE: Input/Output general interfaces When kotlinx-io is 1.0, that will no longer be needed. Its |
sandwwraith
left a comment
There was a problem hiding this comment.
In short, the current Input/ByteArrayInput API does not fit the task because it was created to mimic the Java stream API, and kotlinx-io has different ideas in mind. Try to form the API based on what the actual CborDecoder needs.
Also, it may interfere with other work in CBOR (#3036), so maybe it's worth coordinating work in advance, or waiting for the PR instead. cc @JesusMcCloud @fzhinkin
|
|
||
| internal class IoStreamInput(private val source: Source): Input { | ||
| override val availableBytes: Int | ||
| get() = source.peek().readByteArray().size |
There was a problem hiding this comment.
Using peek and reading whole input to the memory is a huge potential performance problem. exhausted() should be used instead. Yes, it returns Boolean, so calls of availableBytes should also be adapted.
| get() = source.peek().readByteArray().size | ||
|
|
||
| override fun read(): Int = | ||
| try { |
There was a problem hiding this comment.
Using try-catch on the hot path is also a performance problem. read() seems to be called only from methods reading byte/int/long/etc. These methods should be adapted and use corresponding Sink methods instead (readByte()/readInt()/etc). It will also (probably) automatically give us IOException without the need to use availableBytes — see above.
| } | ||
|
|
||
| override fun read(b: ByteArray, offset: Int, length: Int): Int = | ||
| source.readAtMostTo(b, startIndex = offset, endIndex = offset + length) |
There was a problem hiding this comment.
Semantic is different: readAtMostTo can read fewer bytes than length in order not to block, while Input.read expects all the bytes. In this case, a reading loop should be implemented. @fzhinkin I do not remember why we don't have readExactTo in Source?
We have Source.readTo(sink: ByteArray, startIndex: Int = 0, endIndex: Int = sink.size). Use it instead. Return value of read(b, offset, len) is not needed anyway.
| import kotlinx.serialization.cbor.* | ||
| import kotlin.test.* | ||
|
|
||
| class IoTests { |
There was a problem hiding this comment.
Preferably, the whole test suite should be adapted to work with IO streams; see how it is done in JsonTestBase with JsonTestingMode.
There was a problem hiding this comment.
Also, a pair of basic benchmarks (see CborBaseline.kt) would be nice.
|
|
||
|
|
||
| //value classes are only inlined on the JVM, so we use a typealias and extensions instead | ||
| private typealias Stack = MutableList<CborWriter.Data> |
There was a problem hiding this comment.
This change is not related and I prefer to leave the typealias for easier reading.
| @CborFriendModuleApi | ||
| public interface Input { | ||
| public val availableBytes: Int | ||
| /** Returns a -1 if no bytes are available. Otherwise returns a value between 0 and 255 (inclusive). */ | ||
| public fun read(): Int | ||
| public fun read(b: ByteArray, offset: Int, length: Int): Int | ||
| public fun skip(length: Int) | ||
| } | ||
|
|
||
| @CborFriendModuleApi | ||
| public interface Output { | ||
| public fun write(buffer: ByteArray, offset: Int = 0, count: Int = buffer.size) | ||
| public fun write(byteValue: Byte) | ||
| } |
There was a problem hiding this comment.
I think this is a separate discussion. We probably we want string and binary outputs in core, so we can then plug in whatever we want. Providing shims for ByteArrayOutput/input and Sink/Source (also for string-based IO) can then also move to core and core-io.
Just my 2 cents
| structureStack.peek().bytes | ||
| private val structureStack = mutableListOf<Data>() | ||
| override fun getDestination(): Output = | ||
| structureStack.lastOrNull()?.bytes ?: output |
There was a problem hiding this comment.
This will interfere with #3036, but it's minimal. I'd of course still prefer my humongous PR to be finalized at at some point without adding even more layers to what is already challenging to review ;-)
Looking to follow in the footsteps of #2707 with CBOR support for kotlinx-io.
(Sorry for the noise re #3148, accidentally pointed it at the wrong branch and couldn't find the UI to change it.)