138 changes: 138 additions & 0 deletions docs/jvm-metrics-guide.md
@@ -0,0 +1,138 @@
## 7. JVM Runtime Metrics

The OpenTelemetry Java Agent automatically collects JVM runtime metrics via JMX. These metrics are enabled by default and provide insight into JVM health — memory, threads, CPU, GC, and class loading.

### Stable Metrics (Enabled by Default)

These metrics follow the [OpenTelemetry JVM semantic conventions](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/runtime/jvm-metrics.md) and are collected without any additional configuration:

| OTel Metric Name | Prometheus Name | Description |
|---|---|---|
| `jvm.memory.used` | `jvm_memory_used_bytes` | Measure of memory used |
| `jvm.memory.committed` | `jvm_memory_committed_bytes` | Measure of memory committed |
| `jvm.memory.limit` | `jvm_memory_limit_bytes` | Measure of max memory available |
| `jvm.memory.used_after_last_gc` | `jvm_memory_used_after_last_gc_bytes` | Memory used after last GC |
| `jvm.thread.count` | `jvm_thread_count` | Number of executing platform threads |
| `jvm.class.loaded` | `jvm_class_loaded_total` | Total number of classes loaded |
| `jvm.class.unloaded` | `jvm_class_unloaded_total` | Total number of classes unloaded |
| `jvm.class.count` | `jvm_class_count` | Current number of loaded classes |
| `jvm.cpu.time` | `jvm_cpu_time_seconds_total` | CPU time used by the JVM process |
| `jvm.cpu.count` | `jvm_cpu_count` | Number of available processors |
| `jvm.gc.duration` | `jvm_gc_duration_seconds` | Duration of GC pauses |

> **Note:** The Prometheus exporter automatically converts dots (`.`) in OTel metric names to underscores (`_`), appends unit suffixes such as `_bytes` and `_seconds`, and adds `_total` to counters.
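
The naming rule in the note can be illustrated with a small sketch. This is a simplified approximation written for this guide, not the exporter's actual code: `PrometheusNaming`, `toPrometheusName`, and the three-entry unit map are hypothetical and cover only the units that appear in the tables here.

```java
import java.util.Map;

final class PrometheusNaming {

  // Assumed subset of unit mappings, just enough for the metrics in this guide.
  private static final Map<String, String> UNIT_SUFFIX =
      Map.of("By", "bytes", "s", "seconds", "1", "ratio");

  /** Approximates the Prometheus name for an OTel metric name, unit, and counter flag. */
  static String toPrometheusName(String otelName, String unit, boolean monotonicCounter) {
    StringBuilder name = new StringBuilder(otelName.replace('.', '_'));
    // Annotation-style units such as "{thread}" have no mapping and add no suffix.
    String suffix = UNIT_SUFFIX.get(unit);
    if (suffix != null && !name.toString().endsWith("_" + suffix)) {
      name.append('_').append(suffix);
    }
    if (monotonicCounter) {
      name.append("_total");
    }
    return name.toString();
  }

  public static void main(String[] args) {
    System.out.println(toPrometheusName("jvm.memory.used", "By", false));
    System.out.println(toPrometheusName("jvm.cpu.time", "s", true));
    System.out.println(toPrometheusName("jvm.thread.count", "{thread}", false));
  }
}
```

With these inputs the sketch reproduces the `jvm_memory_used_bytes`, `jvm_cpu_time_seconds_total`, and `jvm_thread_count` names from the table above.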

### Experimental Metrics

Additional JVM metrics are available but must be explicitly enabled. These are considered experimental and may change in future releases.

#### How to Enable

```bash
# Command line
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.instrumentation.runtime-telemetry.emit-experimental-telemetry=true \
  -jar my-application.jar

# Environment variable
export OTEL_INSTRUMENTATION_RUNTIME_TELEMETRY_EMIT_EXPERIMENTAL_TELEMETRY=true
```
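
The system property and the environment variable above are two spellings of the same setting: the environment variable form is the property name upper-cased, with dots and dashes replaced by underscores. A minimal sketch of that mapping follows; `OtelEnvNames` and `toEnvVar` are hypothetical helper names used only for illustration.

```java
import java.util.Locale;

final class OtelEnvNames {

  /** Converts a system property name to its environment variable spelling. */
  static String toEnvVar(String systemProperty) {
    return systemProperty.toUpperCase(Locale.ROOT).replace('.', '_').replace('-', '_');
  }

  public static void main(String[] args) {
    System.out.println(
        toEnvVar("otel.instrumentation.runtime-telemetry.emit-experimental-telemetry"));
  }
}
```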

#### Experimental Metrics List

| OTel Metric Name | Prometheus Name | Description |
|---|---|---|
| `jvm.memory.init` | `jvm_memory_init_bytes` | Initial memory pool size |
| `jvm.buffer.memory.used` | `jvm_buffer_memory_used_bytes` | Memory used by buffers |
| `jvm.buffer.memory.limit` | `jvm_buffer_memory_limit_bytes` | Total memory capacity of buffers |
| `jvm.buffer.count` | `jvm_buffer_count` | Number of buffers in the pool |
| `jvm.system.cpu.load_1m` | `jvm_system_cpu_load_1m` | System CPU load average (1 min) |
| `jvm.system.cpu.utilization` | `jvm_system_cpu_utilization_ratio` | System CPU utilization |
| `jvm.file_descriptor.count` | `jvm_file_descriptor_count` | Number of open file descriptors |
| `jvm.thread.deadlock.count` | `jvm_thread_deadlock_count` | Threads in deadlock (monitors + ownable synchronizers) |
| `jvm.thread.monitor_deadlock.count` | `jvm_thread_monitor_deadlock_count` | Threads in deadlock (monitors only) |

### Deadlock Detection Metrics

The two deadlock metrics (`jvm.thread.deadlock.count` and `jvm.thread.monitor_deadlock.count`) use `ThreadMXBean.findDeadlockedThreads()` and `ThreadMXBean.findMonitorDeadlockedThreads()` — JMX **operations** that cannot be expressed via standard JMX YAML rules (which only support reading MBean attributes).

| Metric | What It Detects |
|---|---|
| `jvm.thread.deadlock.count` | Deadlocks involving **both** `synchronized` blocks and `java.util.concurrent` locks (e.g., `ReentrantLock`) |
| `jvm.thread.monitor_deadlock.count` | Deadlocks involving **only** `synchronized` blocks (object monitor locks) |

These are equivalent to Prometheus client_java's `jvm_threads_deadlocked` and `jvm_threads_deadlocked_monitor` gauges.
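
To see the difference between the two metrics, the sketch below deliberately deadlocks two daemon threads on a pair of `ReentrantLock`s and then queries `ThreadMXBean` directly. It uses only the JDK; `DeadlockProbe` and its methods are hypothetical names, and the behavior described assumes a JVM that supports monitoring of ownable synchronizers (as HotSpot does).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

final class DeadlockProbe {

  /** Mirrors the metric callbacks: a null result means "no deadlock". */
  static long count(long[] threadIds) {
    return threadIds == null ? 0 : threadIds.length;
  }

  /** Starts two daemon threads that deadlock on a pair of ReentrantLocks. */
  static void startLockDeadlock() {
    ReentrantLock a = new ReentrantLock();
    ReentrantLock b = new ReentrantLock();
    CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
    startDaemon(a, b, bothHoldFirstLock);
    startDaemon(b, a, bothHoldFirstLock);
  }

  private static void startDaemon(
      ReentrantLock first, ReentrantLock second, CountDownLatch latch) {
    Thread thread =
        new Thread(
            () -> {
              first.lock();
              latch.countDown();
              try {
                latch.await(); // wait until the other thread holds its first lock
              } catch (InterruptedException e) {
                return;
              }
              second.lock(); // blocks forever: the other thread owns this lock
            });
    thread.setDaemon(true); // daemon threads do not keep the JVM alive
    thread.start();
  }

  /** Polls until the JVM reports deadlocked threads or the timeout elapses. */
  static long awaitDeadlockCount(ThreadMXBean bean, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      long deadlocked = count(bean.findDeadlockedThreads());
      if (deadlocked > 0) {
        return deadlocked;
      }
      Thread.sleep(20);
    }
    return 0;
  }

  public static void main(String[] args) throws InterruptedException {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    startLockDeadlock();
    System.out.println("deadlock.count = " + awaitDeadlockCount(bean, 5000));
    System.out.println(
        "monitor_deadlock.count = " + count(bean.findMonitorDeadlockedThreads()));
  }
}
```

Because the deadlock involves `java.util.concurrent` locks rather than `synchronized` blocks, only the first count should become non-zero; the monitor-only count stays at 0. This is why alerting on `jvm.thread.deadlock.count` covers more cases than alerting on `jvm.thread.monitor_deadlock.count` alone.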

#### Verifying Deadlock Metrics

If you have the Prometheus exporter enabled, you can verify the metrics:

```bash
# Check for deadlock metrics
curl -s http://localhost:9464/metrics | grep jvm_thread_deadlock

# Expected output (0 means no deadlocks — which is healthy):
# HELP jvm_thread_deadlock_count Number of platform threads that are in deadlock...
# TYPE jvm_thread_deadlock_count gauge
# jvm_thread_deadlock_count{...} 0.0
# HELP jvm_thread_monitor_deadlock_count Number of platform threads that are in deadlock...
# TYPE jvm_thread_monitor_deadlock_count gauge
# jvm_thread_monitor_deadlock_count{...} 0.0
```

#### Alerting on Deadlocks

These metrics are particularly useful for alerting. A value greater than 0 indicates a deadlock in the JVM:

```yaml
# Example Prometheus alert rule
groups:
  - name: jvm_deadlock_alerts
    rules:
      - alert: JvmThreadDeadlock
        expr: jvm_thread_deadlock_count > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "JVM thread deadlock detected on {{ $labels.instance }}"
          description: "{{ $value }} threads are in deadlock. Immediate investigation required."
```

### Complete Example with All JVM Metrics Enabled

```bash
java -javaagent:/path/to/opentelemetry-javaagent.jar \
  -Dotel.service.name=my-java-service \
  -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
  -Dotel.exporter.otlp.protocol=grpc \
  -Dotel.metrics.exporter=otlp \
  -Dotel.logs.exporter=otlp \
  -Dotel.traces.exporter=otlp \
  -Dotel.instrumentation.runtime-telemetry.emit-experimental-telemetry=true \
  -jar my-application.jar
```

For JBoss/WildFly, add the following to `standalone.conf`:

```bash
# =========================================
# OpenTelemetry Java Agent Configuration
# with experimental JVM metrics enabled
# =========================================

OTEL_AGENT_PATH="/opt/jboss/opentelemetry-javaagent.jar"

export OTEL_SERVICE_NAME="jboss-application"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
export OTEL_METRICS_EXPORTER="otlp"
export OTEL_LOGS_EXPORTER="otlp"
export OTEL_TRACES_EXPORTER="otlp"

# Enable experimental JVM metrics (buffer pools, CPU utilization,
# file descriptors, deadlock detection)
export OTEL_INSTRUMENTATION_RUNTIME_TELEMETRY_EMIT_EXPERIMENTAL_TELEMETRY=true

JAVA_OPTS="$JAVA_OPTS -javaagent:$OTEL_AGENT_PATH"
```
4 changes: 4 additions & 0 deletions instrumentation/jmx-metrics/library/jvm.md
@@ -12,6 +12,9 @@ Those metrics are defined in the [JVM runtime metrics semantic conventions](http
| [jvm.memory.init](https://opentelemetry.io/docs/specs/semconv/runtime/jvm-metrics/#metric-jvmmemoryinit) | experimental | UpDownCounter | By | jvm.memory.pool.name, jvm.memory.type | Initial memory requested |
| [jvm.memory.used_after_last_gc](https://opentelemetry.io/docs/specs/semconv/runtime/jvm-metrics/#metric-jvmmemoryused_after_last_gc) | stable | UpDownCounter | By | jvm.memory.pool.name, jvm.memory.type | Memory used after latest GC |
| [jvm.thread.count](https://opentelemetry.io/docs/specs/semconv/runtime/jvm-metrics/#metric-jvmthreadcount) | stable | UpDownCounter | {thread} | [^1] | Threads count |
| jvm.thread.daemon.count | experimental | UpDownCounter | {thread} | | Daemon threads count |
| jvm.thread.peak.count | experimental | UpDownCounter | {thread} | | Peak thread count since JVM start |
| jvm.thread.started.count | experimental | Counter | {thread} | | Total threads started since JVM start |
| [jvm.class.loaded](https://opentelemetry.io/docs/specs/semconv/runtime/jvm-metrics/#metric-jvmclassloaded) | stable | Counter | {class} | | Classes loaded since JVM start |
| [jvm.class.unloaded](https://opentelemetry.io/docs/specs/semconv/runtime/jvm-metrics/#metric-jvmclassunloaded) | stable | Counter | {class} | | Classes unloaded since JVM start |
| [jvm.class.count](https://opentelemetry.io/docs/specs/semconv/runtime/jvm-metrics/#metric-jvmclasscount) | stable | UpDownCounter | {class} | | Classes currently loaded count |
@@ -32,3 +35,4 @@ Using the [runtime-telemetry](../../runtime-telemetry) modules with instrumentat
[^1]: `jvm.thread.daemon` and `jvm.thread.state` attributes are not supported.

- [jvm.gc.duration](https://opentelemetry.io/docs/specs/semconv/runtime/jvm-metrics/#metric-jvmgcduration) metric is not supported as it is only exposed through JMX notifications which are not supported with YAML.
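- Thread deadlock detection metrics (`jvm.thread.deadlock.count` and `jvm.thread.monitor_deadlock.count`, known as `jvm_threads_deadlocked` and `jvm_threads_deadlocked_monitor` in Prometheus client_java) are not supported as they require invoking JMX operations rather than reading attributes.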
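
The attribute/operation distinction behind these limitations can be sketched with the plain JMX API. `ThreadingMBeanAccess` is a hypothetical class written for illustration: the first method reads an MBean attribute, which is what a YAML mapping rule does, while the second invokes an MBean operation, for which the YAML rules have no equivalent.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class ThreadingMBeanAccess {

  static final String THREADING = "java.lang:type=Threading";

  /** Reads the ThreadCount attribute, as a YAML attribute mapping would. */
  static int readThreadCount(MBeanServer server) throws Exception {
    return (Integer) server.getAttribute(new ObjectName(THREADING), "ThreadCount");
  }

  /** Invokes the no-argument findDeadlockedThreads operation; null means no deadlock. */
  static long[] invokeFindDeadlockedThreads(MBeanServer server) throws Exception {
    return (long[])
        server.invoke(
            new ObjectName(THREADING), "findDeadlockedThreads", new Object[0], new String[0]);
  }

  public static void main(String[] args) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    System.out.println("ThreadCount attribute: " + readThreadCount(server));
    long[] deadlocked = invokeFindDeadlockedThreads(server);
    System.out.println(
        "findDeadlockedThreads operation: " + (deadlocked == null ? 0 : deadlocked.length));
  }
}
```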
@@ -42,6 +42,24 @@ rules:
        type: updowncounter
        unit: "{thread}"
        desc: Number of executing platform threads.
      # jvm.thread.daemon.count (experimental) - matches JMX Exporter's jvm_threads_daemon
      DaemonThreadCount:
        metric: daemon.count
        type: updowncounter
        unit: "{thread}"
        desc: Number of daemon threads currently running.
Comment on lines +46 to +50
Contributor


This is already covered by the `jvm.thread.count` metric with the [jvm.thread.daemon](https://opentelemetry.io/docs/specs/semconv/registry/attributes/jvm/) boolean attribute.

Unfortunately, the daemon status can't be accessed through the JMX interface; however, it is provided when the metrics are captured with the runtime-telemetry module.

      # jvm.thread.peak.count (experimental) - matches JMX Exporter's jvm_threads_peak
      PeakThreadCount:
        metric: peak.count
        type: updowncounter
        unit: "{thread}"
        desc: Peak thread count since JVM start or last peak reset.
      # jvm.thread.started.count (experimental) - matches JMX Exporter's jvm_threads_started_total
      TotalStartedThreadCount:
        metric: started.count
        type: counter
        unit: "{thread}"
        desc: Total number of threads started since JVM start.

  - bean: java.lang:type=ClassLoading
    prefix: jvm.class.
@@ -113,6 +113,30 @@ void testJvmMetrics(String image) {
.hasUnit("{thread}")
.isUpDownCounter()
.hasDataPointsWithoutAttributes())
.add(
"jvm.thread.daemon.count",
metric ->
metric
.hasDescription("Number of daemon threads currently running.")
.hasUnit("{thread}")
.isUpDownCounter()
.hasDataPointsWithoutAttributes())
.add(
"jvm.thread.peak.count",
metric ->
metric
.hasDescription("Peak thread count since JVM start or last peak reset.")
.hasUnit("{thread}")
.isUpDownCounter()
.hasDataPointsWithoutAttributes())
.add(
"jvm.thread.started.count",
metric ->
metric
.hasDescription("Total number of threads started since JVM start.")
.hasUnit("{thread}")
.isCounter()
.hasDataPointsWithoutAttributes())
.add(
"jvm.class.loaded",
metric ->
@@ -0,0 +1,68 @@
/*
 * Copyright The OpenTelemetry Authors
 * SPDX-License-Identifier: Apache-2.0
 */

package io.opentelemetry.instrumentation.runtimemetrics.java8.internal;

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.metrics.Meter;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/**
 * Registers measurements that generate experimental metrics about JVM threads.
 *
 * <p>This class is internal and is hence not for public use. Its APIs are unstable and can change
 * at any time.
 *
 * @deprecated Use {@link io.opentelemetry.instrumentation.runtimemetrics.java8.RuntimeMetrics}
 *     instead, and configure metric views to select specific metrics.
 */
@Deprecated
public final class ExperimentalThreads {

  /** Register observers for java runtime experimental thread metrics. */
  public static List<AutoCloseable> registerObservers(OpenTelemetry openTelemetry) {
    return registerObservers(openTelemetry, ManagementFactory.getThreadMXBean());
  }

  // Visible for testing
  static List<AutoCloseable> registerObservers(
      OpenTelemetry openTelemetry, ThreadMXBean threadBean) {
    Meter meter = JmxRuntimeMetricsUtil.getMeter(openTelemetry);
    List<AutoCloseable> observables = new ArrayList<>();

    observables.add(
        meter
            .upDownCounterBuilder("jvm.thread.deadlock.count")
            .setDescription(
                "Number of platform threads that are in deadlock waiting to acquire object monitors or ownable synchronizers.")
            .setUnit("{thread}")
            .buildWithCallback(
                measurement ->
                    measurement.record(
                        nullSafeArrayLength(threadBean.findDeadlockedThreads()))));

    observables.add(
        meter
            .upDownCounterBuilder("jvm.thread.monitor_deadlock.count")
            .setDescription(
                "Number of platform threads that are in deadlock waiting to acquire object monitors.")
            .setUnit("{thread}")
            .buildWithCallback(
                measurement ->
                    measurement.record(
                        nullSafeArrayLength(threadBean.findMonitorDeadlockedThreads()))));

    return observables;
  }

  private static long nullSafeArrayLength(long[] array) {
    return array == null ? 0 : array.length;
  }

  private ExperimentalThreads() {}
}
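
One caveat from the `ThreadMXBean` Javadoc: `findDeadlockedThreads()` throws `UnsupportedOperationException` on a JVM that does not support monitoring of ownable synchronizer usage. The sketch below is not part of this change; `GuardedDeadlockCount` is a hypothetical name showing one possible guard that falls back to monitor-only detection in that case.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class GuardedDeadlockCount {

  /** Counts deadlocked threads, using monitor-only detection where needed. */
  static long deadlockedThreads(ThreadMXBean bean) {
    long[] ids =
        bean.isSynchronizerUsageSupported()
            ? bean.findDeadlockedThreads()
            : bean.findMonitorDeadlockedThreads();
    // Both methods return null when no deadlock is found.
    return ids == null ? 0 : ids.length;
  }

  public static void main(String[] args) {
    System.out.println(deadlockedThreads(ManagementFactory.getThreadMXBean()));
  }
}
```

Whether such a fallback is desirable is a design choice: it keeps the callback from throwing, at the cost of reporting a narrower class of deadlocks under the same metric name.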
@@ -29,6 +29,7 @@ public static List<AutoCloseable> buildObservables(
      observables.addAll(ExperimentalCpu.registerObservers(openTelemetry));
      observables.addAll(ExperimentalMemoryPools.registerObservers(openTelemetry));
      observables.addAll(ExperimentalFileDescriptor.registerObservers(openTelemetry));
      observables.addAll(ExperimentalThreads.registerObservers(openTelemetry));
    }
    return observables;
  }
@@ -0,0 +1,99 @@
/*
 * Copyright The OpenTelemetry Authors
 * SPDX-License-Identifier: Apache-2.0
 */

package io.opentelemetry.instrumentation.runtimemetrics.java8.internal;

import static io.opentelemetry.instrumentation.runtimemetrics.java8.ScopeUtil.EXPECTED_SCOPE;
import static io.opentelemetry.sdk.testing.assertj.OpenTelemetryAssertions.assertThat;
import static org.mockito.Mockito.when;

import io.opentelemetry.instrumentation.testing.junit.InstrumentationExtension;
import io.opentelemetry.instrumentation.testing.junit.LibraryInstrumentationExtension;
import java.lang.management.ThreadMXBean;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.junit.jupiter.api.extension.RegisterExtension;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;

@SuppressWarnings("deprecation") // until ExperimentalThreads is renamed
@ExtendWith(MockitoExtension.class)
class ExperimentalThreadsTest {

  @RegisterExtension
  static final InstrumentationExtension testing = LibraryInstrumentationExtension.create();

  @Mock private ThreadMXBean threadBean;

  @Test
  void registerObservers_DeadlockedThreads() {
    when(threadBean.findDeadlockedThreads()).thenReturn(new long[] {1, 2, 3});
    when(threadBean.findMonitorDeadlockedThreads()).thenReturn(new long[] {4, 5});

    ExperimentalThreads.registerObservers(testing.getOpenTelemetry(), threadBean);

    testing.waitAndAssertMetrics(
        "io.opentelemetry.runtime-telemetry-java8",
        "jvm.thread.deadlock.count",
        metrics ->
            metrics.anySatisfy(
                metricData ->
                    assertThat(metricData)
                        .hasInstrumentationScope(EXPECTED_SCOPE)
                        .hasDescription(
                            "Number of platform threads that are in deadlock waiting to acquire object monitors or ownable synchronizers.")
                        .hasUnit("{thread}")
                        .hasLongSumSatisfying(
                            sum ->
                                sum.isNotMonotonic()
                                    .hasPointsSatisfying(point -> point.hasValue(3)))));

    testing.waitAndAssertMetrics(
        "io.opentelemetry.runtime-telemetry-java8",
        "jvm.thread.monitor_deadlock.count",
        metrics ->
            metrics.anySatisfy(
                metricData ->
                    assertThat(metricData)
                        .hasInstrumentationScope(EXPECTED_SCOPE)
                        .hasDescription(
                            "Number of platform threads that are in deadlock waiting to acquire object monitors.")
                        .hasUnit("{thread}")
                        .hasLongSumSatisfying(
                            sum ->
                                sum.isNotMonotonic()
                                    .hasPointsSatisfying(point -> point.hasValue(2)))));
  }

  @Test
  void registerObservers_NoDeadlockedThreads() {
    when(threadBean.findDeadlockedThreads()).thenReturn(null);
    when(threadBean.findMonitorDeadlockedThreads()).thenReturn(null);

    ExperimentalThreads.registerObservers(testing.getOpenTelemetry(), threadBean);

    testing.waitAndAssertMetrics(
        "io.opentelemetry.runtime-telemetry-java8",
        "jvm.thread.deadlock.count",
        metrics ->
            metrics.anySatisfy(
                metricData ->
                    assertThat(metricData)
                        .hasLongSumSatisfying(
                            sum -> sum.hasPointsSatisfying(point -> point.hasValue(0)))));

    testing.waitAndAssertMetrics(
        "io.opentelemetry.runtime-telemetry-java8",
        "jvm.thread.monitor_deadlock.count",
        metrics ->
            metrics.anySatisfy(
                metricData ->
                    assertThat(metricData)
                        .hasLongSumSatisfying(
                            sum -> sum.hasPointsSatisfying(point -> point.hasValue(0)))));
  }
}