
Show child process cpu usage in dtop#1880

Open
aclauer wants to merge 9 commits into dev from andrew/feat/dtop-subprocess-cpu-usage

Conversation

@aclauer
Collaborator

@aclauer aclauer commented Apr 18, 2026

Problem

dtop only shows CPU usage for the Python workers spawned by DimOS; any native modules spawned by those workers don't show up in the CPU statistics. This PR also adds a --log flag to log dtop statistics, and a dtop-plot tool to generate plots of CPU usage from those logs.

Closes DIM-XXX

Solution

Read the PIDs of any spawned child processes and include their CPU usage in a drop-down under the main worker.

dtop

Breaking Changes

None

How to Test

dimos --dtop --replay --replay-db=go2_bigoffice run unitree-go2

and

dtop

When dimos spawns the viewer, it will show up as a subprocess of the rerun bridge worker.

Contributor License Agreement

  • I have read and approved the CLA.

@aclauer aclauer changed the title Initial subprocess display Show child process cpu usage in dtop Apr 18, 2026
Comment on lines +130 to +142
try:
    proc = _get_process(pid)
    for child in proc.children(recursive=False):
        child_proc = _get_process(child.pid)
        try:
            name = child_proc.name()
            cpu = child_proc.cpu_percent(interval=None)
            result.append(ChildProcessStats(pid=child.pid, name=name, cpu_percent=cpu))
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass
except (psutil.NoSuchProcess, psutil.AccessDenied):
    pass
return result
Contributor

Presumably _get_process raises (psutil.NoSuchProcess, psutil.AccessDenied). If so, wrap just that call in a try/except; nesting the try/excepts is confusing.
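One way to apply this suggestion (a hedged sketch; psutil, _get_process, and ChildProcessStats are stubbed with plain Python here, since only the shape of the control flow matters):

```python
class NoSuchProcess(Exception):
    """Stand-in for psutil.NoSuchProcess in this sketch."""

# fake process table: pid -> (name, cpu_percent, child_pids); pid 3 has exited
_PROCS = {1: ("worker", 0.0, [2, 3]), 2: ("viewer", 12.5, [])}

def _get_process(pid):
    if pid not in _PROCS:
        raise NoSuchProcess(pid)
    return _PROCS[pid]

def collect_children_stats(pid):
    """Flattened version: each call that can raise gets its own narrow guard."""
    result = []
    try:
        _name, _cpu, child_pids = _get_process(pid)
    except NoSuchProcess:
        return result  # parent vanished; nothing to report
    for child_pid in child_pids:
        try:
            name, cpu, _ = _get_process(child_pid)
        except NoSuchProcess:
            continue  # child exited between listing and inspection
        result.append({"pid": child_pid, "name": name, "cpu_percent": cpu})
    return result
```

With the fake table above, collect_children_stats(1) reports only the still-running viewer and silently skips the exited child, without any nested try blocks.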

Comment thread dimos/utils/cli/dtop.py Outdated
parser.add_argument(
    "--log",
    nargs="?",
    const=f"dtop_{time.strftime('%Y%m%d_%H%M%S')}.jsonl",
Contributor

So it's automatically git-ignored.

Suggested change
const=f"dtop_{time.strftime('%Y%m%d_%H%M%S')}.jsonl",
const=f"dtop_{time.strftime('%Y%m%d_%H%M%S')}.ignore.jsonl",

@aclauer aclauer marked this pull request as ready for review April 24, 2026 21:13
@aclauer aclauer requested a review from paul-nechifor April 24, 2026 21:13
@greptile-apps
Contributor

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR extends dtop to aggregate CPU usage from direct child processes of each DimOS worker, surfacing them as collapsible rows in the TUI. It also adds a --log flag to write JSONL stats and a new dtop-plot CLI tool to visualise the logged data.

Confidence Score: 5/5

Safe to merge; all findings are P2 quality-of-life issues with no correctness impact on the happy path.

No P0 or P1 issues found. The stale _child_cpu_history entry and first-sample cpu_percent=0.0 are cosmetic glitches; the double proc.children() syscall is a minor efficiency concern. The more serious dtop_plot.py edge cases (KeyError on coordinator, NaN workers iteration) were flagged in a previous review cycle.

dimos/utils/cli/dtop_plot.py — KeyError and NaN-iteration risks from previous review are still open.

Important Files Changed

  • dimos/core/resource_monitor/stats.py: Adds ChildProcessStats dataclass and collect_children_stats; extends WorkerStats with a children field. First-call cpu_percent=0.0 for new children and the double proc.children() syscall are minor efficiency issues.
  • dimos/core/resource_monitor/monitor.py: Wires collect_children_stats into the per-worker stats loop and aggregates child CPU into the parent total before constructing WorkerStats. Logic is straightforward and correct.
  • dimos/utils/cli/dtop.py: Adds the --log flag, JSONL logging, child-process rows in the TUI, and _child_cpu_history, and refactors CPU metric rendering. _child_cpu_history entries for dead PIDs are never pruned, risking stale sparkline data on PID reuse.
  • dimos/utils/cli/dtop_plot.py: New dtop-plot CLI tool for plotting JSONL log files. Has KeyError and NaN-iteration risks on the workers column (flagged in previous threads); otherwise straightforward pandas + matplotlib usage.
  • pyproject.toml: Registers dtop-plot as a new console script entry point.

Sequence Diagram

sequenceDiagram
    participant M as StatsMonitor
    participant S as stats.py
    participant TUI as ResourceSpyApp
    participant Log as JSONL log

    loop every poll interval
        M->>S: collect_process_stats(worker_pid)
        S-->>M: ProcessStats
        M->>S: collect_children_stats(worker_pid)
        S-->>M: list[ChildProcessStats]
        M->>M: aggregate child cpu_percent into parent total
        M->>M: build WorkerStats(children=[...])
        M->>TUI: LCM message {coordinator, workers}
        TUI->>Log: json.dumps(msg) if --log
        TUI->>TUI: _render_panels() ↳ _make_lines() for worker ↳ _make_child_line() per child
    end

Reviews (4): Last reviewed commit: "Fix mypy"

Comment thread dimos/utils/cli/dtop.py
self._latest = msg
self._last_msg_time = time.monotonic()
if self._log_file:
    self._log_file.write(json.dumps({"ts": time.time(), **msg}) + "\n")
Contributor

P2 Log file not flushed between writes

Each _on_msg call writes a line to _log_file but never calls flush(). Because Python's file I/O is buffered by default, lines written near a crash or SIGKILL will silently stay in the OS/Python buffer and never reach disk. Adding a flush() after the write ensures each message is durable.

Suggested change
self._log_file.write(json.dumps({"ts": time.time(), **msg}) + "\n")
self._log_file.write(json.dumps({"ts": time.time(), **msg}) + "\n")
self._log_file.flush()
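An alternative to calling flush() after every write (a sketch, not part of the diff) is to open the log line-buffered, so each newline-terminated record is flushed to the OS as it is written:

```python
import json
import os
import tempfile
import time

# hypothetical demo path; the real dtop log path comes from --log
log_path = os.path.join(tempfile.gettempdir(), "dtop_demo.ignore.jsonl")

# buffering=1 enables line buffering for text-mode files: every "\n"-terminated
# write reaches the OS immediately, without an explicit flush() per message
log_file = open(log_path, "a", buffering=1)
msg = {"coordinator": {"cpu_percent": 3.2}, "workers": []}
log_file.write(json.dumps({"ts": time.time(), **msg}) + "\n")
log_file.close()
```

Once in the OS buffer, the data survives a process crash or SIGKILL; note that neither approach forces it to physical disk (that would need os.fsync).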

Comment thread dimos/utils/cli/dtop.py
) -> None:
    super().__init__()
    self._topic_name = topic_name
    self._log_file = open(log_path, "a") if log_path else None
Contributor

P2 File handle leak if __init__ raises after open()

_log_file is opened before autoconf, PickleLCM(), and subscribe(). If any of those subsequent calls throw, on_unmount is never called and the file handle is leaked. A try/except (or opening the file later, e.g. in on_mount) would prevent this.
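A sketch of the try/except guard (ResourceSpyApp is reduced here to a hypothetical minimal shell; _setup stands in for the autoconf / PickleLCM / subscribe calls that can raise):

```python
class ResourceSpyApp:
    """Minimal stand-in: open the log, then run setup that may raise."""

    last_log_file = None  # exposed only so this demo can inspect the handle

    def __init__(self, log_path=None, fail=False):
        self._log_file = open(log_path, "a") if log_path else None
        ResourceSpyApp.last_log_file = self._log_file
        try:
            self._setup(fail)  # stands in for autoconf / PickleLCM / subscribe
        except Exception:
            if self._log_file:
                self._log_file.close()  # don't leak the handle on failure
            raise

    def _setup(self, fail):
        if fail:
            raise RuntimeError("simulated LCM setup failure")
```

Constructing with a failing setup still raises, but the file handle is closed rather than leaked; deferring the open() to on_mount would achieve the same thing.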

Comment thread dimos/utils/cli/dtop.py
parts.append(Rule(title=title, style=border_style))
parts.extend(self._make_lines(d, stale, ranges, self._cpu_history[role]))
for child in d.get("children", []):
    pid = child.get("pid", 0)
Contributor

P2 pid variable shadowed by inner loop

The outer for tuple unpacks pid as a string (worker pid for display), but this inner assignment overwrites it with an integer child pid. The outer pid is rebound at the start of each outer iteration so there's no runtime bug, but the shadowing is confusing and could easily introduce a bug if code is added between the inner loop and the next outer iteration.

Suggested change
pid = child.get("pid", 0)
child_pid = child.get("pid", 0)
if child_pid not in self._child_cpu_history:
    self._child_cpu_history[child_pid] = deque(maxlen=_SPARK_WIDTH * 2)
if not stale:
    self._child_cpu_history[child_pid].append(child.get("cpu_percent", 0.0))
parts.append(self._make_child_line(child, stale, self._child_cpu_history[child_pid]))
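Relatedly, the review summary notes that _child_cpu_history entries for dead PIDs are never pruned. A minimal sketch of a pruning pass (hypothetical helper, not in the diff) that could run once per render with the set of PIDs seen in the latest message:

```python
from collections import deque

def prune_child_history(history: dict, live_pids: set) -> None:
    """Drop sparkline deques whose child PID no longer appears in the
    latest message, so a reused PID starts with a fresh history."""
    for pid in list(history):  # list() lets us delete while iterating
        if pid not in live_pids:
            del history[pid]
```

For example, pruning {101, 202} down to live PID {101} removes the stale deque for 202 before any recycled PID 202 can inherit its old sparkline.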

rows = []
for _, msg in raw.iterrows():
    ts = msg["ts"]
    rows.append({"ts": ts, "role": _COORDINATOR, **msg[_COORDINATOR]})
Contributor

P2 KeyError on malformed log lines

msg[_COORDINATOR] raises KeyError if any line in the JSONL file is missing the "coordinator" key (e.g., a truncated line written during an unclean shutdown). Wrapping the row processing in a try/except KeyError and skipping bad rows would make the tool more robust.

Suggested change
rows.append({"ts": ts, "role": _COORDINATOR, **msg[_COORDINATOR]})
try:
    ts = msg["ts"]
    rows.append({"ts": ts, "role": _COORDINATOR, **msg[_COORDINATOR]})
except (KeyError, TypeError):
    continue

for _, msg in raw.iterrows():
    ts = msg["ts"]
    rows.append({"ts": ts, "role": _COORDINATOR, **msg[_COORDINATOR]})
    for w in msg.get("workers", []):
Contributor

P1 msg.get("workers", []) returns NaN, not [], on pandas null rows

pd.read_json(path, lines=True) creates a "workers" column for the whole DataFrame. If any log line is missing the "workers" key (e.g. a coordinator-only message from an older build, or a partially-written line), pandas fills that row with NaN. A pandas Series get(key, default) only falls back to default when the key is absent from the index — not when the value is NaN. So msg.get("workers", []) returns NaN for those rows, and for w in NaN raises TypeError: 'float' object is not iterable.

Use msg.get("workers") or [] to handle the NaN case:

        for w in msg.get("workers") or []:
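The pitfall is easy to reproduce without pandas. One caveat with the `or []` idiom worth noting: float("nan") is truthy in Python, so if the cell value really is a bare float NaN, `nan or []` still yields NaN; an explicit isinstance check covers missing keys, None, and NaN alike (sketch, using a plain dict as the row):

```python
import math

nan = float("nan")                  # what pandas uses for missing object cells
row = {"ts": 1.0, "workers": nan}   # simulated DataFrame row

# .get(key, default) only falls back when the key is absent, not when the
# stored value is NaN, so the default list never kicks in here
value = row.get("workers", [])
assert isinstance(value, float) and math.isnan(value)

# NaN is truthy, so even `nan or []` would pass the NaN through unchanged
assert bool(nan) is True

# an explicit type check is unambiguous for missing, None, and NaN alike
workers = row.get("workers")
if not isinstance(workers, list):
    workers = []
```

After the guard, iterating `for w in workers` is safe regardless of how pandas represented the missing value.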
