Skip to content

Packets dropped with multithreaded TrackEvent, no buffer overwrite #5142

@kamrann

Description

@kamrann

I'm using perfetto for the first time so apologies if I'm just misunderstanding intended usage, but I've been digging into things for two days and can't understand why my traces are getting corrupted. I'm using the C++ SDK and doing nothing other than the default TRACE_EVENT[_BEGIN/END] datasource instrumentation. So far, I haven't seen a problem running single threaded, but when I run tasks in parallel on a large enough workload, I almost always end up with dropped packets which render the result useless.

Here is the relevant data, so far as I understand, from my investigation:

  1. I see various mixtures of the following data loss warnings: traced_buf_trace_writer_packet_loss, traced_buf_sequence_packet_loss and traced_buf_incremental_sequences_dropped. Maybe just one, maybe two or all three. Yet whenever there are any such loss events, the timeline shows up malformed, with some events being marked as incomplete.
  2. Sometimes the timeline length seems as it should be, with the incomplete slices just extending to the end and being far longer than they should be. Other times the incomplete slices just cut short and the trace timeline is much shorter than it should be.
  3. Often I see traced_flushes_succeeded being one less than traced_flushes_requested, but this appears to have no relation to whether or not the results are malformed.
  4. When it is malformed, traced_buf_bytes_read is slightly smaller than traced_buf_bytes_written.
  5. To give an idea of my data volume, the traces I've been doing have been around 2-3 minutes, generating trace files of around half a gigabyte.
  6. Slice nesting depth reaches around 100.
  7. I ran traceconv text as suggested in another issue, and found some packets like the following:
packet {
  previous_packet_dropped: true
  timestamp: 85501
  sequence_flags: 2
  track_event {
    type: TYPE_SLICE_END
  }
  trusted_uid: 0
  trusted_packet_sequence_id: 3
  trusted_pid: 0
}

To give a bit more info on my setup. I trigger a single tracing session, which

  • calls perfetto::Tracing::NewTrace(perfetto::kInProcessBackend)
  • calls Setup() (passing a file) and then StartBlocking
  • the main thread then spawns N workers (std::jthread), all of which along with the main thread run the same code which can emit events
  • the main thread will join with all the worker threads, which are then destroyed
  • only then will the main thread call FlushBlocking (unsure if needed) before StopBlocking and then closing the file

Results are unpredictable, but so far I've seen packet loss for N >= 1, but not for N == 0.

I've experimented with various config values based on suggestions in the docs and elsewhere:

  • I've bumped up the buffer size (add_buffers()->set_size_kb()) to 1GB, which from the stats is more than is being written throughout the whole trace, so there is no overflow happening there.
  • Tried increasing shmem_size_hint_kb and shmem_page_size_hint_kb to various degrees.
  • Experimented with file_write_period and flush_period.
  • Also tried to use BufferExhaustedPolicy::kStall as noted somewhere.

Am I right in thinking that under these conditions, there's really no reason I should be seeing packet loss like this? Any suggestions as to what the cause could be, or ways to debug, would be much appreciated.

Stats from one malformed trace:

clock_sync_cache_miss | 77 | info (analysis)
guess_trace_type_duration_ns | 110163100 | info (analysis)
packet_skipped_seq_needs_incremental_state_invalid | 9715362 | info (analysis)
parse_trace_duration_ns | 7912450300 | info (analysis)
traced_buf_buffer_size[0] | 1073741824 | info (trace)
traced_buf_bytes_read[0] | 363528192 | info (trace)
traced_buf_bytes_written[0] | 363560960 | info (trace)
traced_buf_chunks_read[0] | 11094 | info (trace)
traced_buf_chunks_rewritten[0] | 35 | info (trace)
traced_buf_chunks_written[0] | 11095 | info (trace)
traced_buf_patches_succeeded[0] | 354 | info (trace)
traced_buf_readaheads_failed[0] | 25 | info (trace)
traced_buf_readaheads_succeeded[0] | 234 | info (trace)
traced_buf_sequence_packet_loss[0] | 14 | data_loss (analysis)
traced_chunks_discarded | 19 | info (trace)
traced_data_sources_registered | 1 | info (trace)
traced_data_sources_seen | 20 | info (trace)
traced_flushes_requested | 15 | info (trace)
traced_flushes_succeeded | 15 | info (trace)
traced_producers_connected | 1 | info (trace)
traced_producers_seen | 1 | info (trace)
traced_total_buffers | 1 | info (trace)
traced_tracing_sessions | 1 | info (trace)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions