Adds GraphQL Semantic Convention from Otel Workgroup #3515
PascalSenn wants to merge 2 commits into open-telemetry:main
Conversation
|
This PR contains changes to area(s) that do not have an active SIG/project and will be auto-closed:
Such changes may be rejected or put on hold until a new SIG/project is established. Please refer to the Semantic Convention Areas |
|
@trask Can you let me know the process to get this reviewed? |
|
re-opening for discussion! @PascalSenn also check out open-telemetry/opentelemetry-specification#4906 there has been quite a bit of progress since we discussed last year about the option of hosting semantic conventions outside of this repository. I think the work is pretty bleeding edge still but may be ready for external usage soon. In the meantime, let's review and discuss the GraphQL semantic conventions here. |
docs/graphql/graphql-metrics.md
| - [Metric: `graphql.server.request.parse.duration`](#metric-graphqlserverrequestparseduration) | ||
| - [Metric: `graphql.server.request.validate.duration`](#metric-graphqlserverrequestvalidateduration) | ||
| - [Metric: `graphql.server.request.execute.duration`](#metric-graphqlserverrequestexecuteduration) | ||
| - [Metric: `graphql.server.request.plan.duration`](#metric-graphqlserverrequestplanduration) |
Has consolidating these into a single `graphql.server.processing.duration` metric, with a `graphql.processing.type` attribute to distinguish the phases, been explored?
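As a rough illustration of the consolidation asked about here, the sketch below maps the four per-phase metric names from the diff onto one metric name plus a phase attribute. The helper `consolidate`, the mapping table, and the attribute values (`parse`, `validate`, `execute`, `plan`) are hypothetical and not part of the PR; only the two names `graphql.server.processing.duration` and `graphql.processing.type` come from the comment.

```python
# Hypothetical sketch: fold the four per-phase duration metrics into one
# metric name plus a phase attribute, as floated in the review comment.
# The attribute values on the right are assumed, not defined by the PR.
PHASE_METRICS = {
    "graphql.server.request.parse.duration": "parse",
    "graphql.server.request.validate.duration": "validate",
    "graphql.server.request.execute.duration": "execute",
    "graphql.server.request.plan.duration": "plan",
}

CONSOLIDATED_NAME = "graphql.server.processing.duration"
PHASE_ATTRIBUTE = "graphql.processing.type"


def consolidate(metric_name: str) -> tuple[str, dict[str, str]]:
    """Return the consolidated metric name and the attribute naming the phase."""
    phase = PHASE_METRICS[metric_name]  # raises KeyError for non-phase metrics
    return CONSOLIDATED_NAME, {PHASE_ATTRIBUTE: phase}
```

With this shape, a dashboard would filter one metric by attribute instead of querying four separate metrics; whether that trade-off is preferable is exactly the open question in this thread.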
|
|
||
| - [GraphQL server](#graphql-server) | ||
| - [Metric: `graphql.server.request.duration`](#metric-graphqlserverrequestduration) | ||
| - [Metric: `graphql.server.active_requests`](#metric-graphqlserveractive_requests) |
What about `graphql.server.request.active`?
@thompson-tomo I modeled this to mirror the metric `http.server.active_requests`. Is this the wrong convention?
That metric has not been stabilised, and when it is, it is highly likely to be renamed.
The reason to change is to ensure that the namespace identifies what is being described; renaming it would place it alongside `graphql.server.request.duration`. Alternatively, you could make this metric `graphql.server.request.status` with an attribute to indicate the active state.
| examples: ["Person", "Query", "Mutation"] | ||
| note: > | ||
| This is the GraphQL type name that contains the field definition. | ||
| - id: graphql.selection.field.coordinate |
Suggested change:
-  - id: graphql.selection.field.coordinate
+  - id: graphql.field.coordinate
| note: > | ||
| This is always the actual field name as defined in the schema, not an | ||
| alias. | ||
| - id: graphql.selection.field.parent_type |
Suggested change:
-  - id: graphql.selection.field.parent_type
+  - id: graphql.field.parent_type
| The path represents the location of the field being resolved within | ||
| the result structure. Therefore, if a field is aliased, the path will | ||
| use the alias name instead of the actual field name. | ||
| - id: graphql.selection.field.name |
Suggested change:
-  - id: graphql.selection.field.name
+  - id: graphql.field.name
| - id: graphql.error.path | ||
| type: string | ||
| stability: development | ||
| brief: > | ||
| The path of the response field which experienced the error. | ||
| examples: ["user.friends[0].name", "findBookById"] | ||
| note: > | ||
| If an error can be associated to a particular field in the GraphQL | ||
| result, it must contain an entry with the key path that details the | ||
| path of the response field which experienced the error. | ||
|
|
||
| This allows clients to identify whether a null result is intentional | ||
| or caused by a runtime error. | ||
|
|
||
| The path starts from the root of the response. Field names are | ||
| separated by dots and list indices are represented using bracket | ||
| notation. If the error happens in an aliased field, the path should | ||
| use the aliased name, since it represents a path in the response, not | ||
| in the request. |
Could we just use `graphql.field.path`?
Correct, and it ties into the reuse I mentioned elsewhere.
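For reference, the path format described in the `graphql.error.path` note (dot-separated field names, bracketed list indices) can be sketched as below. The function name `format_error_path` is hypothetical; the input shape follows the GraphQL response format, where an error's `path` is a list of string field names and integer list indices.

```python
from typing import Sequence, Union


def format_error_path(path: Sequence[Union[str, int]]) -> str:
    """Render a GraphQL error `path` list as dotted names with bracketed indices."""
    rendered = ""
    for segment in path:
        if isinstance(segment, int):
            # List indices use bracket notation and attach to the previous segment.
            rendered += f"[{segment}]"
        elif rendered:
            # Field names after the first are separated by dots.
            rendered += f".{segment}"
        else:
            rendered += segment
    return rendered
```

Because the input is the response path, an aliased field appears under its alias, which matches the behaviour the note asks for.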
| be used as a metric > dimension. It is intended for span events and | ||
| log records only. | ||
|
|
||
| - id: graphql.error.locations |
Suggested change:
-  - id: graphql.error.locations
+  - id: graphql.document.locations
| error extension code is a recommended way to categorize errors for | ||
| easier filtering and monitoring. | ||
|
|
||
| - id: graphql.error.schema_coordinate |
Suggested change:
-  - id: graphql.error.schema_coordinate
+  - id: graphql.field.schema_coordinate
Hmm, yes, we could generalize this.
I guess it would then be defined in the wrong group, right?
| - id: registry.graphql.source | ||
| type: attribute_group | ||
| stability: development | ||
| display_name: GraphQL Source Attributes | ||
| brief: > | ||
| This document defines attributes for GraphQL source systems in distributed | ||
| GraphQL architectures. | ||
| attributes: | ||
| - id: graphql.source.name | ||
| brief: "The name of the source system." | ||
| type: string | ||
| stability: development | ||
| examples: ["accounts", "products", "reviews"] | ||
| note: > | ||
| The human-readable name of the downstream source that a distributed | ||
| GraphQL gateway dispatches to. For example, this could be a subgraph | ||
| name in a federated system or a stitched schema name. | ||
| - id: graphql.source.operation.name | ||
| brief: "The name of the GraphQL operation to be executed on the source." | ||
| type: string | ||
| stability: development | ||
| examples: ["GetUser", "FetchProducts", "ResolveReviews"] | ||
| note: > | ||
| The operation name of the query or mutation that the gateway sends to | ||
| the source system. | ||
| - id: graphql.source.operation.type | ||
| brief: "The type of GraphQL operation to be executed on the source." | ||
| stability: development | ||
| type: | ||
| members: | ||
| - id: query | ||
| value: "query" | ||
| brief: "GraphQL query operation" | ||
| stability: development | ||
| - id: mutation | ||
| value: "mutation" | ||
| brief: "GraphQL mutation operation" | ||
| stability: development | ||
| - id: subscription | ||
| value: "subscription" | ||
| brief: "GraphQL subscription operation" | ||
| stability: development | ||
| - id: _OTHER | ||
| value: "_OTHER" | ||
| brief: | ||
| "A fallback for operation types not covered by specific values | ||
| in this enum." | ||
| stability: development | ||
| examples: ["query", "mutation", "subscription"] | ||
| note: > | ||
| The type of operation that the gateway sends to the source system. | ||
| This enum matches `graphql.operation.type` for consistency. | ||
| - id: graphql.source.operation.hash | ||
| brief: "A hash of the GraphQL operation to be executed on the source." | ||
| type: string | ||
| stability: development | ||
| examples: ["sha256:abc123", "md5:def456"] | ||
| note: > | ||
| A hash of the operation document that the gateway sends to the source | ||
| system. Useful for identifying operations without transmitting the | ||
| full document. | ||
|
|
||
| The hash algorithm used SHOULD be specified as part of the value | ||
| (e.g., "sha256:..."), consistent with `graphql.document.hash`. |
I got confused by the usage of the source sub-namespace. Perhaps we could use graphql.subgraph.* but
I understand. The reason we chose `source` is https://github.com/graphql/composite-schemas-spec. In that specification, a subgraph is a Source Schema.
I understand. How about `graphql.source_schema.*` to avoid confusion with the `source` namespace?
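Whatever the namespace ends up being called, the hash attribute in this hunk says the algorithm SHOULD be part of the value (e.g. "sha256:..."). A minimal sketch of producing such a value follows. The function name `operation_hash` is hypothetical, and hashing the raw UTF-8 document text without any normalization is an assumption; the PR excerpt does not say how the document is canonicalized before hashing.

```python
import hashlib


def operation_hash(document: str, algorithm: str = "sha256") -> str:
    """Return '<algorithm>:<hex digest>' for a GraphQL operation document."""
    # Assumption: the raw document text is hashed as UTF-8 bytes, unnormalized.
    digest = hashlib.new(algorithm, document.encode("utf-8")).hexdigest()
    return f"{algorithm}:{digest}"
```

Prefixing the algorithm keeps values from different gateways comparable only when they agree on the algorithm, which is the point of making it visible in the value.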
model/graphql/registry.yaml
| - id: graphql.source.operation.type | ||
| brief: "The type of GraphQL operation to be executed on the source." | ||
| stability: development | ||
| type: | ||
| members: | ||
| - id: query | ||
| value: "query" | ||
| brief: "GraphQL query operation" | ||
| stability: development | ||
| - id: mutation | ||
| value: "mutation" | ||
| brief: "GraphQL mutation operation" | ||
| stability: development | ||
| - id: subscription | ||
| value: "subscription" | ||
| brief: "GraphQL subscription operation" | ||
| stability: development | ||
| - id: _OTHER | ||
| value: "_OTHER" | ||
| brief: | ||
| "A fallback for operation types not covered by specific values | ||
| in this enum." | ||
| stability: development | ||
| examples: ["query", "mutation", "subscription"] | ||
| note: > | ||
| The type of operation that the gateway sends to the source system. | ||
| This enum matches `graphql.operation.type` for consistency. |
Is there any scenario where this would differ from `graphql.operation.type`? If not, I would leave it out.
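To make the enum in the hunk above concrete, here is a small sketch of how an instrumentation might pick the attribute value, falling back to `_OTHER` for anything outside the three well-known members. The function name `operation_type_value` is hypothetical, and trimming and lowercasing the input is an assumption rather than something the PR specifies.

```python
# Well-known members taken from the enum in the diff.
WELL_KNOWN_OPERATION_TYPES = ("query", "mutation", "subscription")


def operation_type_value(raw: str) -> str:
    """Map a raw operation type to a well-known value or the `_OTHER` fallback."""
    # Assumption: inputs are normalized by trimming and lowercasing.
    value = raw.strip().lower()
    return value if value in WELL_KNOWN_OPERATION_TYPES else "_OTHER"
```

The same mapping would apply unchanged to `graphql.operation.type`, which is part of why the reviewer asks whether a separate source-level attribute is needed.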
| ### Metric: `graphql.server.response.error_count` | ||
|
|
||
| <!-- semconv metric.graphql.server.response.error_count --> | ||
| <!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. --> | ||
| <!-- see templates/registry/markdown/snippet.md.j2 --> | ||
| <!-- prettier-ignore-start --> | ||
|
|
||
| | Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations | | ||
| | -------- | --------------- | ----------- | -------------- | --------- | ------ | | ||
| | `graphql.server.response.error_count` | Histogram | `{error}` | Number of errors in a GraphQL response. [1] |  | | | ||
|
|
||
| **[1]:** This metric records the number of errors included in the GraphQL | ||
| response `errors` array. A value of 0 indicates a successful | ||
| response with no errors. | ||
|
|
||
| This is a histogram (not a counter) because it records the error | ||
| count per response, enabling analysis of error distribution across | ||
| requests. | ||
|
|
||
| Histogram bucket boundaries for error counts. | ||
|
|
||
| **Attributes:** | ||
|
|
||
| | Key | Stability | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Value Type | Description | Example Values | | ||
| | --- | --- | --- | --- | --- | --- | | ||
| | [`graphql.operation.type`](/docs/registry/attributes/graphql.md) |  | `Conditionally Required` If available. | string | The type of the operation being executed. | `query`; `mutation`; `subscription` | | ||
| | [`graphql.operation.name`](/docs/registry/attributes/graphql.md) |  | `Opt-In` | string | The name of the operation being executed. [1] | `FindBookById`; `GetUserProfile` | | ||
|
|
||
| **[1] `graphql.operation.name`:** This represents the operation name as specified in the GraphQL operation document. When the operation name is not provided, this attribute SHOULD be omitted. | ||
|
|
||
| --- | ||
|
|
||
| `graphql.operation.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. | ||
|
|
||
| | Value | Description | Stability | | ||
| | --- | --- | --- | | ||
| | `_OTHER` | A fallback for operation types not covered by specific values in this enum. |  | | ||
| | `mutation` | GraphQL mutation operation |  | | ||
| | `query` | GraphQL query operation |  | | ||
| | `subscription` | GraphQL subscription operation |  | | ||
|
|
||
| <!-- prettier-ignore-end --> | ||
| <!-- END AUTOGENERATED TEXT --> | ||
| <!-- endsemconv --> |
Would this make more sense as a client metric, i.e. `graphql.client.request.errors`, given that it describes the number of errors returned to the client in the response?
For server-side errors, if a response has multiple errors, would there be one span per error? In that case we could use `error.type` on the span.
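For context on what the metric in this hunk measures, the sketch below computes the per-response value that would be recorded: the length of the response's `errors` array, with 0 for a response that has none. The function name `response_error_count` is hypothetical, and the response is assumed to be an already-parsed JSON object; how the value is then recorded on a histogram is left out because the PR excerpt does not define bucket boundaries.

```python
from typing import Any, Mapping


def response_error_count(response: Mapping[str, Any]) -> int:
    """Count entries in the GraphQL response `errors` array (0 if absent or empty)."""
    errors = response.get("errors")
    return len(errors) if errors else 0
```

Recording this value once per response is what makes the instrument a distribution rather than a running total, which is the rationale the note gives for choosing a histogram.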
Important
This is a working draft and is NOT intended to be merged yet. Final approval is still pending from both the OpenTelemetry Semantic Conventions maintainers and the GraphQL OpenTelemetry Working Group. This PR has been opened early to get feedback from the OTel side and be the base for discussion while the specification is being finalized.
Changes
This PR adds comprehensive GraphQL semantic conventions, developed by the GraphQL OpenTelemetry Working Group, to the OpenTelemetry Semantic Conventions repository.
Origin
This specification was developed in the graphql/graphql-wg otel-wg subcommittee over multiple working group sessions. Key contributors and participants include members from the GraphQL and OpenTelemetry communities working together to define instrumentation standards for GraphQL.
The specification intends to supersede the existing minimal GraphQL semantic conventions (previously only a single server span with 3 attributes) with a comprehensive convention covering the full GraphQL request lifecycle.
- `graphql.operation.name` is opt-in on metrics; `graphql.document.hash` and `graphql.document.id` are preferred identifiers
- `graphql.error.*` and `exception.*` attributes
- metrics, distinct from request-level tracking
People involved
This work was developed by the GraphQL OpenTelemetry Working Group, a subcommittee of the GraphQL Working Group. Participants include representatives from across the GraphQL ecosystem working on instrumentation standards.
Important
The workgroup meets every third week of the month on Thursday. Check out the calendar.
If you want to join, add yourself to the agenda over at the GraphQL OpenTelemetry Working Group.
Merge requirement checklist
[chore]