I'm using Perfetto for the first time, so apologies if I'm misunderstanding the intended usage, but I've been digging into this for two days and can't work out why my traces are getting corrupted. I'm using the C++ SDK and doing nothing beyond the default `TRACE_EVENT[_BEGIN/END]` data-source instrumentation. So far I haven't seen a problem running single-threaded, but when I run tasks in parallel on a large enough workload, I almost always end up with dropped packets that render the result useless.
Here is the relevant data from my investigation, as far as I understand it:
- I see various mixtures of the following data-loss warnings: `traced_buf_trace_writer_packet_loss`, `traced_buf_sequence_packet_loss`, and `traced_buf_incremental_sequences_dropped`. Sometimes just one, sometimes two or all three. Whenever any such loss events occur, the timeline shows up malformed, with some events marked as incomplete.
- Sometimes the timeline length is as expected, with the incomplete slices simply extending to the end and lasting far longer than they should. Other times the incomplete slices are cut short and the trace timeline is much shorter than it should be.
- Often I see `traced_flushes_succeeded` being one less than `traced_flushes_requested`, but this appears to have no relation to whether or not the results are malformed.
- When the trace is malformed, `traced_buf_bytes_read` is slightly smaller than `traced_buf_bytes_written`.
- To give an idea of my data volume: the traces run around 2-3 minutes and generate trace files of roughly half a gigabyte.
- Slice nesting depth reaches around 100.
- I ran `traceconv text`, as suggested in another issue, and found some packets like the following:
```
packet {
  previous_packet_dropped: true
  timestamp: 85501
  sequence_flags: 2
  track_event {
    type: TYPE_SLICE_END
  }
  trusted_uid: 0
  trusted_packet_sequence_id: 3
  trusted_pid: 0
}
```
To give a bit more info on my setup: I trigger a single tracing session, which
- calls `perfetto::Tracing::NewTrace(perfetto::kInProcessBackend)`
- calls `Setup()` (passing a file) and then `StartBlocking()`
- the main thread then spawns N workers (`std::jthread`), all of which, along with the main thread, run the same code, which can emit events
- the main thread joins all the worker threads, which are then destroyed
- only then does the main thread call `FlushBlocking()` (unsure if needed) before `StopBlocking()` and closing the file
Results are unpredictable, but so far I've seen packet loss for N >= 1, but not for N == 0.
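In code, the lifecycle above looks roughly like this (a simplified sketch, not my exact code; the category name `"app"`, `do_work`, and the worker count are placeholders):

```cpp
#include <perfetto.h>

#include <fcntl.h>
#include <unistd.h>
#include <thread>
#include <vector>

// Track-event categories used by the TRACE_EVENT macros (simplified).
PERFETTO_DEFINE_CATEGORIES(perfetto::Category("app"));
PERFETTO_TRACK_EVENT_STATIC_STORAGE();

void do_work() {  // stand-in for the real workload
  TRACE_EVENT("app", "Work");
  // ... emits nested TRACE_EVENT_BEGIN/END pairs, ~100 levels deep ...
}

int main() {
  perfetto::TracingInitArgs args;
  args.backends = perfetto::kInProcessBackend;
  perfetto::Tracing::Initialize(args);
  perfetto::TrackEvent::Register();

  perfetto::TraceConfig cfg;
  cfg.add_buffers()->set_size_kb(1024 * 1024);  // 1 GB central buffer
  cfg.add_data_sources()->mutable_config()->set_name("track_event");

  int fd = open("trace.perfetto-trace", O_RDWR | O_CREAT | O_TRUNC, 0600);
  auto session = perfetto::Tracing::NewTrace(perfetto::kInProcessBackend);
  session->Setup(cfg, fd);
  session->StartBlocking();

  const int N = 8;  // loss appears for any N >= 1
  std::vector<std::jthread> workers;
  for (int i = 0; i < N; ++i) workers.emplace_back(do_work);
  do_work();        // main thread runs the same code
  workers.clear();  // jthread destructors join all workers

  session->FlushBlocking();  // unsure if needed
  session->StopBlocking();
  close(fd);
}
```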
I've experimented with various config values based on suggestions in the docs and elsewhere:
- I've bumped the buffer size (`add_buffers()->set_size_kb()`) to 1 GB, which according to the stats is more than is written over the whole trace, so there is no overflow happening there.
- Tried increasing `shmem_size_hint_kb` and `shmem_page_size_hint_kb` to various degrees.
- Experimented with `file_write_period_ms` and `flush_period_ms`.
- Also tried `BufferExhaustedPolicy::kStall`, as noted somewhere.
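For reference, the resulting config in textproto form looks something like this (the specific values are just ones I tried, not recommendations; the shmem hints go via `TracingInitArgs` in the SDK rather than the trace config):

```textproto
# Sketch of the TraceConfig used for the runs above
buffers {
  size_kb: 1048576  # 1 GB, more than the ~360 MB actually written
}
data_sources {
  config {
    name: "track_event"
  }
}
file_write_period_ms: 2500
flush_period_ms: 5000
```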
Am I right in thinking that under these conditions, there's really no reason I should be seeing packet loss like this? Any suggestions as to what the cause could be, or ways to debug, would be much appreciated.
Stats from one malformed trace:
```
clock_sync_cache_miss | 77 | info (analysis)
guess_trace_type_duration_ns | 110163100 | info (analysis)
packet_skipped_seq_needs_incremental_state_invalid | 9715362 | info (analysis)
parse_trace_duration_ns | 7912450300 | info (analysis)
traced_buf_buffer_size[0] | 1073741824 | info (trace)
traced_buf_bytes_read[0] | 363528192 | info (trace)
traced_buf_bytes_written[0] | 363560960 | info (trace)
traced_buf_chunks_read[0] | 11094 | info (trace)
traced_buf_chunks_rewritten[0] | 35 | info (trace)
traced_buf_chunks_written[0] | 11095 | info (trace)
traced_buf_patches_succeeded[0] | 354 | info (trace)
traced_buf_readaheads_failed[0] | 25 | info (trace)
traced_buf_readaheads_succeeded[0] | 234 | info (trace)
traced_buf_sequence_packet_loss[0] | 14 | data_loss (analysis)
traced_chunks_discarded | 19 | info (trace)
traced_data_sources_registered | 1 | info (trace)
traced_data_sources_seen | 20 | info (trace)
traced_flushes_requested | 15 | info (trace)
traced_flushes_succeeded | 15 | info (trace)
traced_producers_connected | 1 | info (trace)
traced_producers_seen | 1 | info (trace)
traced_total_buffers | 1 | info (trace)
traced_tracing_sessions | 1 | info (trace)
```