Conversation
| try: | ||
| proc = _get_process(pid) | ||
| for child in proc.children(recursive=False): | ||
| child_proc = _get_process(child.pid) | ||
| try: | ||
| name = child_proc.name() | ||
| cpu = child_proc.cpu_percent(interval=None) | ||
| result.append(ChildProcessStats(pid=child.pid, name=name, cpu_percent=cpu)) | ||
| except (psutil.NoSuchProcess, psutil.AccessDenied): | ||
| pass | ||
| except (psutil.NoSuchProcess, psutil.AccessDenied): | ||
| pass | ||
| return result |
There was a problem hiding this comment.
Presumably _get_process raises (psutil.NoSuchProcess, psutil.AccessDenied). Then surround just that function call with try-except. Nesting the try-excepts is confusing.
| parser.add_argument( | ||
| "--log", | ||
| nargs="?", | ||
| const=f"dtop_{time.strftime('%Y%m%d_%H%M%S')}.jsonl", |
There was a problem hiding this comment.
So it's automatically git-ignored.
| const=f"dtop_{time.strftime('%Y%m%d_%H%M%S')}.jsonl", | |
| const=f"dtop_{time.strftime('%Y%m%d_%H%M%S')}.ignore.jsonl", |
Greptile SummaryThis PR extends dtop to show CPU usage of child processes spawned by each worker (e.g. native modules), adds a
Confidence Score: 4/5Safe to merge with the NaN guard in dtop_plot._load fixed; the TUI changes are solid. One P1 finding (TypeError crash in dtop-plot on edge-case log files) caps the score at 4. The core TUI and monitoring changes are well-structured with no critical bugs. dimos/utils/cli/dtop_plot.py — Important Files Changed
Sequence DiagramsequenceDiagram
participant SM as StatsMonitor
participant S as stats.py
participant LCM as LCM Bus
participant DT as dtop (ResourceSpyApp)
participant DP as dtop-plot
SM->>S: collect_process_stats(worker_pid)
S-->>SM: ProcessStats (worker cpu, mem…)
SM->>S: collect_children_stats(worker_pid)
S-->>SM: list[ChildProcessStats]
SM->>SM: aggregate child cpu_percent into WorkerStats
SM->>LCM: publish WorkerStats (incl. children[])
LCM-->>DT: _on_msg(msg)
DT->>DT: update _child_cpu_history[pid]
DT->>DT: _make_child_line() per child
DT->>DT: write JSONL line to log file (if --log)
DP->>DP: pd.read_json(log)
DP->>DP: _load() → DataFrame + labels
DP->>DP: _plot() → save PNG
Reviews (2): Last reviewed commit: "Merge branch 'dev' into andrew/feat/dtop..." | Re-trigger Greptile |
| self._latest = msg | ||
| self._last_msg_time = time.monotonic() | ||
| if self._log_file: | ||
| self._log_file.write(json.dumps({"ts": time.time(), **msg}) + "\n") |
There was a problem hiding this comment.
Log file not flushed between writes
Each _on_msg call writes a line to _log_file but never calls flush(). Because Python's file I/O is buffered by default, lines written near a crash or SIGKILL will silently stay in the OS/Python buffer and never reach disk. Adding a flush() after the write ensures each message is durable.
| self._log_file.write(json.dumps({"ts": time.time(), **msg}) + "\n") | |
| self._log_file.write(json.dumps({"ts": time.time(), **msg}) + "\n") | |
| self._log_file.flush() |
| ) -> None: | ||
| super().__init__() | ||
| self._topic_name = topic_name | ||
| self._log_file = open(log_path, "a") if log_path else None |
There was a problem hiding this comment.
| parts.append(Rule(title=title, style=border_style)) | ||
| parts.extend(self._make_lines(d, stale, ranges, self._cpu_history[role])) | ||
| for child in d.get("children", []): | ||
| pid = child.get("pid", 0) |
There was a problem hiding this comment.
pid variable shadowed by inner loop
The outer for tuple unpacks pid as a string (worker pid for display), but this inner assignment overwrites it with an integer child pid. The outer pid is rebound at the start of each outer iteration so there's no runtime bug, but the shadowing is confusing and could easily introduce a bug if code is added between the inner loop and the next outer iteration.
| pid = child.get("pid", 0) | |
| child_pid = child.get("pid", 0) | |
| if child_pid not in self._child_cpu_history: | |
| self._child_cpu_history[child_pid] = deque(maxlen=_SPARK_WIDTH * 2) | |
| if not stale: | |
| self._child_cpu_history[child_pid].append(child.get("cpu_percent", 0.0)) | |
| parts.append(self._make_child_line(child, stale, self._child_cpu_history[child_pid])) |
| rows = [] | ||
| for _, msg in raw.iterrows(): | ||
| ts = msg["ts"] | ||
| rows.append({"ts": ts, "role": _COORDINATOR, **msg[_COORDINATOR]}) |
There was a problem hiding this comment.
KeyError on malformed log lines
msg[_COORDINATOR] raises KeyError if any line in the JSONL file is missing the "coordinator" key (e.g., a truncated line written during an unclean shutdown). Wrapping the row processing in a try/except KeyError and skipping bad rows would make the tool more robust.
| rows.append({"ts": ts, "role": _COORDINATOR, **msg[_COORDINATOR]}) | |
| try: | |
| ts = msg["ts"] | |
| rows.append({"ts": ts, "role": _COORDINATOR, **msg[_COORDINATOR]}) | |
| except (KeyError, TypeError): | |
| continue |
| for _, msg in raw.iterrows(): | ||
| ts = msg["ts"] | ||
| rows.append({"ts": ts, "role": _COORDINATOR, **msg[_COORDINATOR]}) | ||
| for w in msg.get("workers", []): |
There was a problem hiding this comment.
msg.get("workers", []) returns NaN, not [], on pandas null rows
pd.read_json(path, lines=True) creates a "workers" column for the whole DataFrame. If any log line is missing the "workers" key (e.g. a coordinator-only message from an older build, or a partially-written line), pandas fills that row with NaN. A pandas Series get(key, default) only falls back to default when the key is absent from the index — not when the value is NaN. So msg.get("workers", []) returns NaN for those rows, and for w in NaN raises TypeError: 'float' object is not iterable.
Use msg.get("workers") or [] to handle the NaN case:
for w in msg.get("workers") or []:
Problem
dtop only shows cpu usage for Python workers spawned by DimOS. Any native modules spawned by that worker don't show up in the cpu statistics. Also add --log flag to log dtop statistics and dtop-plot to generate plots of cpu usage.
Closes DIM-XXX
Solution
Read the pids of any processes spawned and include their cpu usage in a drop down of the main worker.
Breaking Changes
None
How to Test
dimos --dtop --replay --replay-db=go2_bigoffice run unitree-go2
and
dtop
When dimos spawns the viewer, it will show up as a subprocess of the rerun bridge worker.
Contributor License Agreement