38 changes: 35 additions & 3 deletions docs/data-sources/gpu.md
@@ -176,9 +176,41 @@ naturally. See
for details on both modes.

Counter names and IDs are advertised by the GPU producer via `GpuCounterSpec` in
-the data source descriptor. Counters are organized into groups (SYSTEM,
-VERTICES, FRAGMENTS, PRIMITIVES, MEMORY, COMPUTE, RAY_TRACING) and include
-measurement units and descriptions.
+the data source descriptor, which includes measurement units and descriptions.

### Counter groups

The Perfetto UI uses counter groups to organize counter tracks. Counters can
be assigned to the built-in groups (SYSTEM, VERTICES, FRAGMENTS, PRIMITIVES,
MEMORY, COMPUTE, RAY_TRACING) via `GpuCounterSpec.groups`. Producers can also
define custom counter groups using the `GpuCounterGroupSpec` message in
`GpuCounterDescriptor`:

```
message GpuCounterGroupSpec {
optional uint32 group_id = 1;
optional string name = 2;
optional string description = 3;
repeated uint32 counter_ids = 4;
}
```

Custom groups can also be used to provide display names and descriptions for
the fixed `GpuCounterGroup` enum values (SYSTEM, VERTICES, etc.). To do this,
set `group_id` to the enum value and provide a `name` and/or `description`.
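
As a rough sketch of that lookup, the Python below indexes group specs by `group_id` and resolves a display name for an enum-based group. It assumes `MEMORY = 5` in `GpuCounterGroup`. The dictionaries stand in for decoded protos, and `build_group_metadata` and `display_name` are hypothetical helpers, not Perfetto APIs.

```python
from typing import NamedTuple, Optional

# Assumption: numeric value of GpuCounterGroup.MEMORY.
MEMORY = 5


class GroupMetadata(NamedTuple):
    name: Optional[str]
    description: Optional[str]


def build_group_metadata(counter_groups):
    """Index group specs by group_id, skipping specs that lack one."""
    metadata = {}
    for group in counter_groups:
        group_id = group.get("group_id")
        if group_id is None:
            continue
        # Sketch-level choice: keep the first spec seen for a given id.
        metadata.setdefault(
            group_id, GroupMetadata(group.get("name"), group.get("description"))
        )
    return metadata


def display_name(group_id, metadata):
    """Return the producer-supplied name, or None if none was given."""
    meta = metadata.get(group_id)
    return meta.name if meta else None


counter_groups = [
    {"group_id": MEMORY, "name": "Memory", "description": "Memory counters"},
    {"name": "ignored: no group_id"},
]
metadata = build_group_metadata(counter_groups)
print(display_name(MEMORY, metadata))  # name supplied for the enum value
print(display_name(6, metadata))       # no spec for this id, so None
```

A group id without a matching spec resolves to `None` here, which corresponds to a NULL name for enum-based groups that the producer did not describe.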

A counter's group membership is the union of groups assigned via
`GpuCounterSpec.groups` (the fixed enum) and `GpuCounterGroupSpec.counter_ids`
(custom groups).
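
The union rule can be sketched as follows. This is an illustration only: the dictionaries stand in for decoded `GpuCounterSpec` and `GpuCounterGroupSpec` messages, `group_membership` is a hypothetical helper, and the enum values `MEMORY = 5` and `COMPUTE = 6` are assumptions.

```python
from collections import defaultdict

# Assumptions: numeric values of the fixed GpuCounterGroup enum.
MEMORY = 5
COMPUTE = 6


def group_membership(specs, counter_groups):
    """Map each counter_id to the set of group ids it belongs to."""
    membership = defaultdict(set)
    # Groups assigned through the fixed enum on each counter spec.
    for spec in specs:
        membership[spec["counter_id"]].update(spec.get("groups", ()))
    # Groups assigned through custom group specs.
    for group in counter_groups:
        if "group_id" not in group:
            continue
        for counter_id in group.get("counter_ids", ()):
            # Sketch-level choice: ignore ids that no spec declares.
            if counter_id in membership:
                membership[counter_id].add(group["group_id"])
    return dict(membership)


specs = [
    {"counter_id": 1, "groups": [COMPUTE]},
    {"counter_id": 3, "groups": [MEMORY]},
]
counter_groups = [{"group_id": 100, "name": "L2 Cache", "counter_ids": [3, 99]}]

result = group_membership(specs, counter_groups)
print(sorted(result[1]))
print(sorted(result[3]))
```

Counter 3 ends up in both the enum group and the custom group, while the undeclared counter id 99 is dropped.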

For example, with custom groups "Compute Core" and "L2 Cache":

```
GPU > Counters > Compute Core > Counter A
GPU > Counters > Compute Core > Counter B
GPU > Counters > L2 Cache > Counter C
```
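
To make the mapping from groups to this hierarchy concrete, here is a small sketch that turns (group name, counter name) pairs into the paths shown above. The `track_paths` helper is hypothetical and the `GPU > Counters` root is taken from this example only. How the UI actually builds its track tree is not specified here.

```python
def track_paths(rows, root=("GPU", "Counters")):
    """Build 'A > B > C' display paths from (group_name, counter_name) pairs."""
    return [" > ".join((*root, group, counter)) for group, counter in sorted(rows)]


rows = [
    ("L2 Cache", "Counter C"),
    ("Compute Core", "Counter B"),
    ("Compute Core", "Counter A"),
]
for path in track_paths(rows):
    print(path)
```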


### Multi-GPU

18 changes: 18 additions & 0 deletions protos/perfetto/common/gpu_counter_descriptor.proto
@@ -70,6 +70,24 @@ message GpuCounterDescriptor {
}
repeated GpuCounterBlock blocks = 2;

// Allow producer to define custom counter groups. Unlike the fixed
// GpuCounterGroup enum (which provides broad categories), these groups
// let the producer define hardware-specific groupings that the UI uses
// to organize counter tracks. Can also be used to provide display names
// and descriptions for the fixed GpuCounterGroup enum values by setting
// group_id to the enum value.
message GpuCounterGroupSpec {
// required. Unique ID for this group within the descriptor.
optional uint32 group_id = 1;
// optional. Display name for the group.
optional string name = 2;
// optional. Description of the group.
optional string description = 3;
// Counters that belong directly to this group (by counter_id).
repeated uint32 counter_ids = 4;
}
repeated GpuCounterGroupSpec counter_groups = 6;

// optional. Minimum sampling period supported by the producer in
// nanoseconds.
optional uint64 min_sampling_period_ns = 3;
18 changes: 18 additions & 0 deletions protos/perfetto/config/perfetto_config.proto
@@ -83,6 +83,24 @@ message GpuCounterDescriptor {
}
repeated GpuCounterBlock blocks = 2;

// Allow producer to define custom counter groups. Unlike the fixed
// GpuCounterGroup enum (which provides broad categories), these groups
// let the producer define hardware-specific groupings that the UI uses
// to organize counter tracks. Can also be used to provide display names
// and descriptions for the fixed GpuCounterGroup enum values by setting
// group_id to the enum value.
message GpuCounterGroupSpec {
// required. Unique ID for this group within the descriptor.
optional uint32 group_id = 1;
// optional. Display name for the group.
optional string name = 2;
// optional. Description of the group.
optional string description = 3;
// Counters that belong directly to this group (by counter_id).
repeated uint32 counter_ids = 4;
}
repeated GpuCounterGroupSpec counter_groups = 6;

// optional. Minimum sampling period supported by the producer in
// nanoseconds.
optional uint64 min_sampling_period_ns = 3;
18 changes: 18 additions & 0 deletions protos/perfetto/trace/perfetto_trace.proto
@@ -83,6 +83,24 @@ message GpuCounterDescriptor {
}
repeated GpuCounterBlock blocks = 2;

// Allow producer to define custom counter groups. Unlike the fixed
// GpuCounterGroup enum (which provides broad categories), these groups
// let the producer define hardware-specific groupings that the UI uses
// to organize counter tracks. Can also be used to provide display names
// and descriptions for the fixed GpuCounterGroup enum values by setting
// group_id to the enum value.
message GpuCounterGroupSpec {
// required. Unique ID for this group within the descriptor.
optional uint32 group_id = 1;
// optional. Display name for the group.
optional string name = 2;
// optional. Description of the group.
optional string description = 3;
// Counters that belong directly to this group (by counter_id).
repeated uint32 counter_ids = 4;
}
repeated GpuCounterGroupSpec counter_groups = 6;

// optional. Minimum sampling period supported by the producer in
// nanoseconds.
optional uint64 min_sampling_period_ns = 3;
85 changes: 82 additions & 3 deletions src/trace_processor/importers/proto/gpu_event_parser.cc
@@ -305,14 +305,40 @@ TrackId GpuEventParser::InternGpuCounterTrack(
tracks::DynamicUnit(unit_id));
}

GpuEventParser::GroupMetadataMap GpuEventParser::BuildGroupMetadata(
const GpuCounterDescriptor::Decoder& desc) {
GroupMetadataMap metadata;
for (auto group_it = desc.counter_groups(); group_it; ++group_it) {
GpuCounterDescriptor::GpuCounterGroupSpec::Decoder group(*group_it);
if (!group.has_group_id()) {
continue;
}
auto group_id = static_cast<int32_t>(group.group_id());
auto name_id = group.has_name()
? context_->storage->InternString(group.name())
: kNullStringId;
auto desc_id = group.has_description()
? context_->storage->InternString(group.description())
: kNullStringId;
metadata.Insert(group_id, GroupMetadata{name_id, desc_id});
}
return metadata;
}

void GpuEventParser::InsertCounterGroups(
TrackId track_id,
-const GpuCounterDescriptor::GpuCounterSpec::Decoder& spec) {
+const GpuCounterDescriptor::GpuCounterSpec::Decoder& spec,
+const GroupMetadataMap& group_metadata) {
if (spec.has_groups()) {
for (auto group = spec.groups(); group; ++group) {
tables::GpuCounterGroupTable::Row row;
row.group_id = *group;
row.track_id = track_id;
auto* meta = group_metadata.Find(*group);
if (meta) {
row.name = meta->name;
row.description = meta->description;
}
context_->storage->mutable_gpu_counter_group_table()->Insert(row);
}
} else {
@@ -323,6 +349,38 @@ void GpuEventParser::InsertCounterGroups(
}
}

void GpuEventParser::InsertCustomCounterGroups(
const GpuCounterDescriptor::Decoder& desc,
const base::FlatHashMap<uint32_t, TrackId>& counter_id_to_track) {
for (auto group_it = desc.counter_groups(); group_it; ++group_it) {
GpuCounterDescriptor::GpuCounterGroupSpec::Decoder group(*group_it);
if (!group.has_group_id()) {
continue;
}
auto group_id = static_cast<int32_t>(group.group_id());
auto name_id = group.has_name()
? context_->storage->InternString(group.name())
: kNullStringId;
auto desc_id = group.has_description()
? context_->storage->InternString(group.description())
: kNullStringId;

for (auto cid_it = group.counter_ids(); cid_it; ++cid_it) {
uint32_t counter_id = *cid_it;
auto* track_id_ptr = counter_id_to_track.Find(counter_id);
if (!track_id_ptr) {
continue;
}
tables::GpuCounterGroupTable::Row row;
row.group_id = group_id;
row.track_id = *track_id_ptr;
row.name = name_id;
row.description = desc_id;
context_->storage->mutable_gpu_counter_group_table()->Insert(row);
}
}
}

void GpuEventParser::PushGpuCounterValue(
int64_t ts,
double value,
@@ -359,6 +417,7 @@ void GpuEventParser::ParseGpuCounterEvent(

GpuCounterDescriptor::Decoder desc(interned->counter_descriptor());
auto gpu_id = interned->gpu_id();
auto group_metadata = BuildGroupMetadata(desc);

for (auto it = event.counters(); it; ++it) {
GpuCounterEvent::GpuCounter::Decoder counter(*it);
@@ -385,7 +444,7 @@ void GpuEventParser::ParseGpuCounterEvent(
auto [last_it, inserted] =
gpu_counter_last_id_.Insert(track_id, std::nullopt);
if (inserted) {
-InsertCounterGroups(track_id, spec);
+InsertCounterGroups(track_id, spec, group_metadata);
}

double counter_val = counter.has_int_value()
@@ -399,12 +458,30 @@ void GpuEventParser::ParseGpuCounterEvent(
context_->storage->IncrementStats(stats::gpu_counters_invalid_spec);
}
}

// Insert custom counter groups once per interned descriptor.
auto iid = event.counter_descriptor_iid();
if (!gpu_custom_groups_inserted_.Find(iid)) {
gpu_custom_groups_inserted_.Insert(iid, true);
base::FlatHashMap<uint32_t, TrackId> counter_id_to_track;
for (auto spec_it = desc.specs(); spec_it; ++spec_it) {
GpuCounterDescriptor::GpuCounterSpec::Decoder spec(*spec_it);
if (!spec.has_counter_id() || !spec.has_name()) {
continue;
}
auto track_id = InternGpuCounterTrack(gpu_id, spec);
counter_id_to_track.Insert(spec.counter_id(), track_id);
}
InsertCustomCounterGroups(desc, counter_id_to_track);
}
return;
}

// Legacy inline counter_descriptor path.
if (event.has_counter_descriptor()) {
GpuCounterDescriptor::Decoder descriptor(event.counter_descriptor());
auto group_metadata = BuildGroupMetadata(descriptor);
base::FlatHashMap<uint32_t, TrackId> counter_id_to_track;
for (auto it = descriptor.specs(); it; ++it) {
GpuCounterDescriptor::GpuCounterSpec::Decoder spec(*it);
if (!spec.has_counter_id()) {
@@ -434,9 +511,11 @@ void GpuEventParser::ParseGpuCounterEvent(

auto gpu_id = event.gpu_id();
auto track_id = InternGpuCounterTrack(gpu_id, spec);
-InsertCounterGroups(track_id, spec);
+InsertCounterGroups(track_id, spec, group_metadata);
gpu_counter_state_.Insert(counter_id, GpuCounterState{track_id, {}});
counter_id_to_track.Insert(counter_id, track_id);
}
InsertCustomCounterGroups(descriptor, counter_id_to_track);
}

for (auto it = event.counters(); it; ++it) {
18 changes: 16 additions & 2 deletions src/trace_processor/importers/proto/gpu_event_parser.h
@@ -103,10 +103,20 @@ class GpuEventParser {
int32_t gpu_id,
const protos::pbzero::GpuCounterDescriptor::GpuCounterSpec::Decoder&
spec);
struct GroupMetadata {
StringId name;
StringId description;
};
using GroupMetadataMap = base::FlatHashMap<int32_t, GroupMetadata>;
GroupMetadataMap BuildGroupMetadata(
const protos::pbzero::GpuCounterDescriptor::Decoder& desc);
void InsertCounterGroups(
TrackId track_id,
-const protos::pbzero::GpuCounterDescriptor::GpuCounterSpec::Decoder&
-spec);
+const protos::pbzero::GpuCounterDescriptor::GpuCounterSpec::Decoder& spec,
+const GroupMetadataMap& group_metadata);
void InsertCustomCounterGroups(
const protos::pbzero::GpuCounterDescriptor::Decoder& desc,
const base::FlatHashMap<uint32_t, TrackId>& counter_id_to_track);
void PushGpuCounterValue(int64_t ts,
double value,
TrackId track_id,
@@ -144,6 +154,10 @@ class GpuEventParser {
base::FlatHashMap<TrackId, std::optional<tables::CounterTable::Id>>
gpu_counter_last_id_;

// Tracks which interned counter descriptors have had their custom groups
// inserted, to avoid duplicates. Key: counter_descriptor_iid.
base::FlatHashMap<uint64_t, bool> gpu_custom_groups_inserted_;

// For GpuRenderStageEvent
struct HwQueueInfo {
StringId name;
@@ -629,7 +629,11 @@ CREATE PERFETTO VIEW gpu_counter_group (
-- Group id.
group_id LONG,
-- Track id.
-track_id JOINID(track.id)
+track_id JOINID(track.id),
-- Group name. NULL for legacy enum-based groups.
name STRING,
-- Group description. NULL for legacy enum-based groups.
description STRING
) AS
SELECT
*
15 changes: 12 additions & 3 deletions src/trace_processor/tables/profiler_tables.py
@@ -1229,13 +1229,22 @@
columns=[
C('group_id', CppInt32()),
C('track_id', CppTableId(TRACK_TABLE)),
C('name', CppOptional(CppString())),
C('description', CppOptional(CppString())),
],
tabledoc=TableDoc(
-doc='''''',
+doc='''Maps GPU counter tracks to groups.''',
group='Misc',
columns={
-'group_id': '''''',
-'track_id': ''''''
+'group_id':
+'''Group identifier (enum value for legacy groups, custom
+ID for producer-defined groups).''',
+'track_id':
+'''Track table reference for the counter.''',
+'name':
+'''Group name. NULL for legacy enum-based groups.''',
+'description':
+'''Group description. NULL for legacy enum-based groups.''',
}))

# TODO(lalitm): delete this once we have proper tree functions.
@@ -0,0 +1,64 @@
packet {
trusted_packet_sequence_id: 1
timestamp: 0
gpu_counter_event {
counter_descriptor {
specs {
counter_id: 1
name: "Counter A"
description: "First counter"
groups: COMPUTE
}
specs {
counter_id: 2
name: "Counter B"
description: "Second counter"
groups: COMPUTE
}
specs {
counter_id: 3
name: "Counter C"
description: "Third counter"
groups: MEMORY
}
specs {
counter_id: 4
name: "Counter D"
description: "Fourth counter"
}
counter_groups {
group_id: 5
name: "Memory"
description: "Memory counters"
counter_ids: 4
}
counter_groups {
group_id: 6
name: "Compute Core"
description: "Compute core counters"
}
counter_groups {
group_id: 100
name: "L2 Cache"
description: "L2 cache counters"
counter_ids: 3
}
}
counters {
counter_id: 1
int_value: 100
}
counters {
counter_id: 2
int_value: 200
}
counters {
counter_id: 3
int_value: 300
}
counters {
counter_id: 4
int_value: 400
}
}
}