Skip to content

Add CBOR kotlinx-io support#3149

Open
cedrickcooke wants to merge 1 commit intoKotlin:devfrom
cedrickcooke:cbor-kotlinx-io
Open

Add CBOR kotlinx-io support#3149
cedrickcooke wants to merge 1 commit intoKotlin:devfrom
cedrickcooke:cbor-kotlinx-io

Conversation

@cedrickcooke
Copy link
Copy Markdown

Looking to follow in the footsteps of #2707 with CBOR support for kotlinx-io.

(Sorry for the noise re #3148, accidentally pointed it at the wrong branch and couldn't find the UI to change it.)

Copy link
Copy Markdown
Contributor

@pdvrieze pdvrieze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As more formats support kotlinx-io they all need some sort of text or binary input/output abstraction. It makes sense that this is more or less "internal", but such abstractions are not format specific at all. As such it makes more sense to put them into the core module (with a serialization internal api flag).

At some level it might even make sense to have these interfaces as public, but that may be better for a different discussion.

Comment on lines +7 to +20
@CborFriendModuleApi
public interface Input {
public val availableBytes: Int
/** Returns a -1 if no bytes are available. Otherwise returns a value between 0 and 255 (inclusive). */
public fun read(): Int
public fun read(b: ByteArray, offset: Int, length: Int): Int
public fun skip(length: Int)
}

@CborFriendModuleApi
public interface Output {
public fun write(buffer: ByteArray, offset: Int = 0, count: Int = buffer.size)
public fun write(byteValue: Byte)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These types appear to be format independent and relevant to all binary formats. It would be worthwhile to consider whether these interfaces should be put in a common context.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I handle that as part of this PR? My original intent with this API was to be as non-invasive as possible. If so, I'm happy to make the change but I'm not sure where it should live. formats/io?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest leaving these base Input/Output interface in the core module, and change the parameter of BinaryFormat and StringFormat to use them.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a separate discussion. We probably we want string and binary outputs in core, so we can then plug in whatever we want. Providing shims for ByteArrayOutput/input and Sink/Source (also for string-based IO) can then also move to core and core-io.

Just my 2 cents

Comment on lines +9 to +10
public val availableBytes: Int
/** Returns a -1 if no bytes are available. Otherwise returns a value between 0 and 255 (inclusive). */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear what this means. What is the meaning of availableBytes==0 and how do you deal with (pending) network reads where the bytes are not available yet (and stream size is not known).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't do the best job on the docs for Input/Output, I'll go back and update them 😅.

This one comment was me attempting to describe the pre-existing behavior of read in ByteArrayInput, since it took me a minute to figure it out. I had originally thought that this should return a Byte (the 0-255 value returned on the happy path), but there's a failure path if there's no more data available where it returns -1 (which forces the Int typing).

Re. network streams, I don't think this is even handled by kotlinx-io yet? It seems like the APIs presented are all synchronous (until Kotlin/kotlinx-io#163 has an implementation, at least).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most cases (including Java IO streams) returning 0 means that nothing was returned, but another call may return something again (although most API's that support blocking will block instead of returning 0). A negative (or -1) return value tends to mean end of file/stream.

@sandwwraith
Copy link
Copy Markdown
Member

RE: Input/Output general interfaces

When kotlinx-io is 1.0, that will no longer be needed. Its Sink and Source can be directly added to a BinaryFormat or BinaryIoFormat. Implementing encoding to/from ByteArray should be trivial via sinks, so it would be enough for the implementation to write/read Sink/Source.

Copy link
Copy Markdown
Member

@sandwwraith sandwwraith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In short, the current Input/ByteArrayInput API does not fit the task because it was created to mimic the Java stream API, and kotlinx-io has different ideas in mind. Try to form the API based on what the actual CborDecoder needs.

Also, it may interfere with other work in CBOR (#3036), so maybe it's worth coordinating work in advance, or waiting for the PR instead. cc @JesusMcCloud @fzhinkin


internal class IoStreamInput(private val source: Source): Input {
override val availableBytes: Int
get() = source.peek().readByteArray().size
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using peek and reading whole input to the memory is a huge potential performance problem. exhausted() should be used instead. Yes, it returns Boolean, so calls of availableBytes should also be adapted.

get() = source.peek().readByteArray().size

override fun read(): Int =
try {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using try-catch on the hot path is also a performance problem. read() seems to be called only from methods reading byte/int/long/etc. These methods should be adapted and use corresponding Sink methods instead (readByte()/readInt()/etc). It will also (probably) automatically give us IOException without the need to use availableBytes — see above.

}

override fun read(b: ByteArray, offset: Int, length: Int): Int =
source.readAtMostTo(b, startIndex = offset, endIndex = offset + length)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantic is different: readAtMostTo can read fewer bytes than length in order not to block, while Input.read expects all the bytes. In this case, a reading loop should be implemented. @fzhinkin I do not remember why we don't have readExactTo in Source?

We have Source.readTo(sink: ByteArray, startIndex: Int = 0, endIndex: Int = sink.size). Use it instead. Return value of read(b, offset, len) is not needed anyway.

import kotlinx.serialization.cbor.*
import kotlin.test.*

class IoTests {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferably, the whole test suite should be adapted to work with IO streams; see how it is done in JsonTestBase with JsonTestingMode.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a pair of basic benchmarks (see CborBaseline.kt) would be nice.



//value classes are only inlined on the JVM, so we use a typealias and extensions instead
private typealias Stack = MutableList<CborWriter.Data>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not related and I prefer to leave the typealias for easier reading.

Comment on lines +7 to +20
@CborFriendModuleApi
public interface Input {
public val availableBytes: Int
/** Returns a -1 if no bytes are available. Otherwise returns a value between 0 and 255 (inclusive). */
public fun read(): Int
public fun read(b: ByteArray, offset: Int, length: Int): Int
public fun skip(length: Int)
}

@CborFriendModuleApi
public interface Output {
public fun write(buffer: ByteArray, offset: Int = 0, count: Int = buffer.size)
public fun write(byteValue: Byte)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a separate discussion. We probably we want string and binary outputs in core, so we can then plug in whatever we want. Providing shims for ByteArrayOutput/input and Sink/Source (also for string-based IO) can then also move to core and core-io.

Just my 2 cents

structureStack.peek().bytes
private val structureStack = mutableListOf<Data>()
override fun getDestination(): Output =
structureStack.lastOrNull()?.bytes ?: output
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will interfere with #3036, but it's minimal. I'd of course still prefer my humongous PR to be finalized at at some point without adding even more layers to what is already challenging to review ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants