simplify batching on flush to cortex sink #1022
mcowgill-stripe wants to merge 7 commits into stripe:master from
Conversation
```go
}

doIfNotDone := func(fn func() error) error {
batching:
```
I'm sure we can avoid using a label. Should a method be introduced?
Or a `done` bool for the for loop.
```go
end := i + batchSize
if end > len(metrics) {
	end = len(metrics)
}
```
Suggested change:
```diff
-end := i + batchSize
-if end > len(metrics) {
-	end = len(metrics)
-}
+end := min(i+batchSize, len(metrics)) // built-in min (Go 1.21+); math.Min only accepts float64
```
```go
droppedMetrics += len(metrics[i:])
break batching
```
Previous behavior made this drop observable with `s.logger.Error(err)`.
```go
batch = []samplers.InterMetric{}
err := s.writeMetrics(ctx, batch)
if err != nil {
	allErrs = multierror.Append(allErrs, err)
```
If I'm reading this right...
Previously, if we had a single failure, we stopped processing (the method returned). Now it will continue through the remaining batches?
This sounds like a fix, but it might cause an unintended build-up of failures when the remote keeps failing.
I think, if this was intended, a test should be added that failed before and passes now. That'll also help show @arnavdugar-stripe the functional change for validation.
```go
"github.com/golang/protobuf/proto"
"github.com/golang/snappy"
"github.com/hashicorp/go-multierror"
```
No concern from me, but if anyone is worried about increasing the dependency surface area, alternatives are listed here: https://stackoverflow.com/questions/33470649/combine-multiple-error-strings
```go
if end > len(metrics) {
	end = len(metrics)
}
batch := metrics[i:end]
```
Previously, this made a new array & slice for `batch`.
Now it reuses a slice of `metrics`.
I had to read https://go.dev/blog/slices-intro, but it looks like:
- this will be more performant
- it's possible that changes (including `append()`) inside anything `batch` is passed to may write into the backing array of `metrics`

I think this fixes it (alternatively we can read the code for where it's passed, but then we have to hope it never gets changed)?
Suggested change:
```diff
-batch := metrics[i:end]
+batch := metrics[i:end:end] // full slice expression: cap == len, so append can't write into metrics
```
```go
select {
case <-ctx.Done():
	return errors.New("context finished before completing metrics flush")
	droppedMetrics += len(metrics[i:])
```
Probably not important, but since we are looking at performance improvements anyway...
I think `metrics[i:]` constructs a new slice header just to take its length. We could avoid that with plain arithmetic:
Suggested change:
```diff
-droppedMetrics += len(metrics[i:])
+droppedMetrics += len(metrics) - i
```
```go
	batchSize = len(metrics)
}
```

```go
doIfNotDone := func(fn func() error) error {
```
👏 hooray for not needing a closure now!
Summary
This change uses a simple goroutine + channel to implement the batching logic. The break requires a label, but I did avoid using `goto`.
Motivation
Simplifying the code enables the use of a channel with `select` without a function closure.
Test plan
Updated the existing tests; there is no functional change. However, one test does require an update because the async behavior changes subtly.
Rollout/monitoring/revert plan
This change can be reverted and should not change the behavior after deploy.