messages delivered after WriteMessages returns a context timeout error #1427

Description

@ionutboangiu

I'm calling WriteMessages with a context timeout. When the broker is unreachable the context expires and I get the error back, which is fine. The problem is those messages still get delivered later when the broker comes back. Every single one of them.
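To make the behavior concrete, here is a toy model of what I think is happening. It is not kafka-go's code: `toyWriter`, `brokerRecovers`, and `runDemo` are hypothetical names, and the only assumption is that a message is queued before the call waits on the context.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// toyWriter is a stand-in for an asynchronous batching writer. It only models
// "enqueue first, deliver later" and is NOT how kafka-go is implemented.
type toyWriter struct {
	mu        sync.Mutex
	queue     []string      // accepted but not yet delivered
	delivered []string      // what the "broker" eventually received
	brokerUp  chan struct{} // closed once the broker is reachable
}

func newToyWriter() *toyWriter {
	return &toyWriter{brokerUp: make(chan struct{})}
}

// WriteMessages queues msg, then waits for the broker or for ctx to expire.
// On expiry it returns the context error but leaves msg in the queue.
func (w *toyWriter) WriteMessages(ctx context.Context, msg string) error {
	w.mu.Lock()
	w.queue = append(w.queue, msg)
	w.mu.Unlock()

	select {
	case <-w.brokerUp:
		w.flush()
		return nil
	case <-ctx.Done():
		return ctx.Err() // msg is still queued at this point
	}
}

// flush hands every queued message to the "broker".
func (w *toyWriter) flush() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.delivered = append(w.delivered, w.queue...)
	w.queue = nil
}

// brokerRecovers simulates the broker coming back after the outage.
func (w *toyWriter) brokerRecovers() {
	close(w.brokerUp)
	w.flush()
}

// runDemo writes one message during an outage, then lets the broker recover.
func runDemo() (timedOut bool, delivered []string) {
	w := newToyWriter()
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()

	err := w.WriteMessages(ctx, "event-1")
	w.brokerRecovers()
	return errors.Is(err, context.DeadlineExceeded), w.delivered
}

func main() {
	timedOut, delivered := runDemo()
	fmt.Println("caller saw timeout:", timedOut)
	fmt.Println("delivered after recovery:", delivered)
}
```

The caller gets the deadline error, yet "event-1" still reaches the broker once it recovers, which is the situation I'm describing.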

My service has a generic export interface with multiple backends (HTTP, AMQP, Kafka, SQL, etc). They all work the same way: export returns an error, save to disk, retry later. This works for every other backend because an error means the message wasn't delivered. With Writer, an error doesn't mean that, so I end up with duplicates for everything during an outage.
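A stripped-down sketch of that export pattern, to show where the duplicates come from. The names (`Exporter`, `spool`, `leakyBackend`) are made up for illustration and the spool is kept in memory instead of on disk; `leakyBackend` models a backend that returns an error but still delivers the message.

```go
package main

import (
	"errors"
	"fmt"
)

// Exporter is a hypothetical version of the generic export interface.
type Exporter interface {
	Export(msg string) error
}

// spool stands in for the on-disk retry store (in memory for this sketch).
type spool struct{ pending []string }

func (s *spool) Save(msg string) { s.pending = append(s.pending, msg) }

// exportOrSpool treats any error as "not delivered" and saves msg for retry.
func exportOrSpool(e Exporter, s *spool, msg string) {
	if err := e.Export(msg); err != nil {
		s.Save(msg)
	}
}

// retrySpool re-exports everything saved so far, keeping what fails again.
func retrySpool(e Exporter, s *spool) {
	pending := s.pending
	s.pending = nil
	for _, msg := range pending {
		exportOrSpool(e, s, msg)
	}
}

// leakyBackend records every message it is given, even when it reports an
// error, which models "error returned, message delivered later anyway".
type leakyBackend struct {
	healthy  bool
	received []string
}

func (b *leakyBackend) Export(msg string) error {
	b.received = append(b.received, msg)
	if !b.healthy {
		return errors.New("context deadline exceeded")
	}
	return nil
}

// runOutage exports one message during an outage, then retries after recovery
// and reports how many copies the backend ended up with.
func runOutage() int {
	b := &leakyBackend{}
	s := &spool{}
	exportOrSpool(b, s, "event-1") // fails, so it is spooled
	b.healthy = true
	retrySpool(b, s) // delivered again from the spool
	return len(b.received)
}

func main() {
	fmt.Println("copies delivered:", runOutage()) // prints "copies delivered: 2"
}
```

With a backend where an error really means "not delivered", the same flow ends with exactly one copy.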

I saw the comment on WriteMessages about messages possibly still being written after the method returns, and the TODO in writeBatch about cancelling batches when nobody's waiting for them anymore. So it seems like this is known but there's no way around it from the caller side.

There's also a memory issue. Since Kafka is just one of several optional export paths, I don't want a broker outage to OOM the whole process. But the Writer's internal queue seems to keep growing while the broker is unreachable, so under enough load the process eventually runs out of memory.

The only thing I tried was calling Close() to start over with a new Writer, but from what I can tell it flushes pending messages rather than discarding them. It blocks while the broker is down and delivers everything once it comes back.

Is there a recommended way to deal with this? Would a Close that discards pending batches rather than flushing them make sense?
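To make the question concrete, this is roughly the semantics I have in mind, sketched as a standalone type. Nothing here is existing kafka-go API: `pendingQueue`, `Enqueue`, and `CloseDiscard` are hypothetical, and the sketch ignores concurrency entirely.

```go
package main

import (
	"errors"
	"fmt"
)

var errQueueFull = errors.New("queue full")

// pendingQueue is a hypothetical bounded buffer of undelivered messages.
// It is not safe for concurrent use; locking is left out to keep it short.
type pendingQueue struct {
	limit   int
	pending []string
}

// Enqueue rejects new messages once the limit is reached, so memory stays
// bounded during an outage and the caller can spool the message elsewhere.
func (q *pendingQueue) Enqueue(msg string) error {
	if len(q.pending) >= q.limit {
		return errQueueFull
	}
	q.pending = append(q.pending, msg)
	return nil
}

// CloseDiscard drops everything still pending and hands it back to the
// caller instead of blocking until it can be flushed to the broker.
func (q *pendingQueue) CloseDiscard() []string {
	dropped := q.pending
	q.pending = nil
	return dropped
}

// runDiscardDemo fills a queue of size 2 with three messages, then discards.
func runDiscardDemo() (rejected bool, dropped []string) {
	q := &pendingQueue{limit: 2}
	_ = q.Enqueue("event-1")
	_ = q.Enqueue("event-2")
	err := q.Enqueue("event-3") // over the limit
	return errors.Is(err, errQueueFull), q.CloseDiscard()
}

func main() {
	rejected, dropped := runDiscardDemo()
	fmt.Println("third message rejected:", rejected)
	fmt.Println("discarded on close:", dropped)
}
```

With something like this, an error from the writer would again mean "not delivered", and the messages returned on close could go straight into my existing save-to-disk path.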
