[cueweb/docs] Host and Allocation Management: Hosts monitor page - Initial version#2391
Conversation
📝 WalkthroughWalkthroughAdds a Hosts monitor page: Host type extensions, KB-format/idling helpers with tests, hostColumns and SimpleDataTable hosts mode, a HostsPage client that polls getHosts() every 30s with error handling, and user-guide documentation. ChangesHost Monitoring Page Feature
sequenceDiagram
participant User
participant HostsPage as HostsPage Component
participant API as getHosts()
participant State as hosts/error State
participant Table as SimpleDataTable with hostColumns
User->>HostsPage: Visit /hosts (mount)
HostsPage->>HostsPage: Show skeleton (initial)
HostsPage->>API: Initial getHosts()
API-->>State: Return hosts or error
State-->>Table: Populate rows (if hosts)
Table-->>User: Display table
Note over HostsPage: Every 30s
HostsPage->>API: Poll getHosts()
alt Poll succeeds
API-->>State: Update hosts
State-->>Table: Refresh table
else Poll fails & no cached hosts
State-->>User: Show inline error + Retry button
User->>HostsPage: Click Retry
HostsPage->>API: Re-fetch
else Poll fails & cached hosts exist
Table-->>User: Keep previous rows
end
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (6)
cueweb/app/hosts/page.tsx (1)
32-68: ⚡ Quick win
loadandpollduplicate the same fetch/error logic.Both functions call
getHosts(), then mirror the same success (setHosts/setError(null)) and failure (setError/setHosts(prev => prev ?? [])) handling. The only difference is thecancelledguard. Consider unifying into the singleloadcallback and reusing it inside the effect to keep the two paths from drifting.♻️ Proposed consolidation
- const load = React.useCallback(async () => { - try { - const data = await getHosts(); - setHosts(data); - setError(null); - } catch (err) { - // getHosts already toasts via handleError; keep any previously loaded - // rows on a failed poll and surface an inline message only when we have - // nothing to show. - setError(err instanceof Error ? err.message : String(err)); - setHosts((prev) => prev ?? []); - } - }, []); - - React.useEffect(() => { - let cancelled = false; - const poll = async () => { - try { - const data = await getHosts(); - if (!cancelled) { - setHosts(data); - setError(null); - } - } catch (err) { - if (!cancelled) { - setError(err instanceof Error ? err.message : String(err)); - setHosts((prev) => prev ?? []); - } - } - }; - poll(); - const interval = setInterval(poll, REFRESH_MS); - return () => { - cancelled = true; - clearInterval(interval); - }; - }, []); + const load = React.useCallback(async (isCancelled?: () => boolean) => { + try { + const data = await getHosts(); + if (isCancelled?.()) return; + setHosts(data); + setError(null); + } catch (err) { + if (isCancelled?.()) return; + // getHosts already toasts via handleError; keep any previously loaded + // rows on a failed poll and surface an inline message only when we have + // nothing to show. + setError(err instanceof Error ? err.message : String(err)); + setHosts((prev) => prev ?? []); + } + }, []); + + React.useEffect(() => { + let cancelled = false; + const isCancelled = () => cancelled; + load(isCancelled); + const interval = setInterval(() => load(isCancelled), REFRESH_MS); + return () => { + cancelled = true; + clearInterval(interval); + }; + }, [load]);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cueweb/app/hosts/page.tsx` around lines 32 - 68, Unify duplicated fetch/error logic by making the existing load callback the single source of truth for fetching hosts (it already calls getHosts and handles setHosts/setError), then reuse load inside the React.useEffect instead of the separate poll function; update load to accept or check a mounted/cancelled guard (e.g., a local let cancelled or a ref like isMountedRef) so effect can call load safely and prevent state updates after unmount, remove the poll function and its duplicated try/catch, and keep the same failure behavior (setError(err message) and setHosts(prev => prev ?? [])) while preserving the interval using REFRESH_MS and clearing it on cleanup.cueweb/app/hosts/host_format_utils.ts (2)
39-42: 💤 Low valueConsider documenting the idle > total case.
When
idle > total(invalid data), the function returns a ratio greater than 1.0, which would sort at the top. This behavior is reasonable for surfacing data quality issues, but a brief comment documenting the choice could help future maintainers.📝 Optional documentation
// idle / total guarded against divide-by-zero; used as a numeric sort key. +// Note: Does not clamp idle > total (invalid data); ratios > 1.0 will sort at the top. export function idleRatio(idle: number, total: number): number { if (!total) return 0; return idle / total; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cueweb/app/hosts/host_format_utils.ts` around lines 39 - 42, The idleRatio function can return values >1 when idle > total, which is intentional to surface data-quality issues; add a concise comment above the idleRatio(idle: number, total: number): number function that states this behavior explicitly (e.g., that a ratio >1 indicates invalid input and is surfaced intentionally rather than being clamped), so future maintainers understand the choice and sorting implications.
28-36: 💤 Low valueConsider guarding against negative values.
Currently, negative KB strings (e.g.,
"-1000") would pass through toconvertMemoryToStringand display as"-1000K". While negative memory indicates a data quality issue from the backend, you might prefer to return"-"to signal invalid data more clearly.🛡️ Optional guard for negative values
export function kbStringToHuman(kb: string): string { const n = Number(kb); // Number("") === 0 (finite), so guard the empty string explicitly to show // "-" (no data) rather than "0K". - if (kb === "" || !Number.isFinite(n)) { + if (kb === "" || !Number.isFinite(n) || n < 0) { return "-"; } return convertMemoryToString(n, "host"); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cueweb/app/hosts/host_format_utils.ts` around lines 28 - 36, The function kbStringToHuman currently converts negative KB strings (e.g., "-1000") into a human string via convertMemoryToString; add a guard in kbStringToHuman to treat negative numeric values as invalid and return "-" instead of passing them to convertMemoryToString. Specifically, after parsing kb to Number (n) in kbStringToHuman, check if n is negative (n < 0) and return "-" when so; otherwise continue to call convertMemoryToString as before.cueweb/app/__tests__/hosts/host_format_utils.test.ts (2)
15-24: 💤 Low valueConsider adding a test for negative values.
The current tests cover numeric, empty, and non-numeric inputs, but not negative values. Adding a test would document the expected behavior (currently displays as negative KB, e.g.,
"-1000K").🧪 Optional test for negative values
describe("kbStringToHuman", () => { it("formats KB strings to a human unit", () => { expect(kbStringToHuman("6815744")).toBe("6.5G"); expect(kbStringToHuman("512")).toBe("512K"); }); it("renders a dash for non-numeric input", () => { expect(kbStringToHuman("abc")).toBe("-"); expect(kbStringToHuman("")).toBe("-"); }); + it("displays negative values as negative KB", () => { + expect(kbStringToHuman("-1000")).toBe("-1000K"); + }); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cueweb/app/__tests__/hosts/host_format_utils.test.ts` around lines 15 - 24, Add a test case to cover negative numeric input for kbStringToHuman: in the "kbStringToHuman" describe block add an it() that calls kbStringToHuman with a negative value string (e.g., "-1024" or "-1000000") and asserts the expected formatted output (e.g., "-1K" or "-1000K" depending on current behavior) so the test suite documents and prevents regressions for negative values.
26-33: 💤 Low valueConsider adding a test for idle > total.
Adding a test for the case where
idle > total(invalid data) would document the expected behavior (currently returns a ratio > 1.0).🧪 Optional test for idle > total
describe("idleRatio", () => { it("returns idle / total as a 0..1 ratio", () => { expect(idleRatio(4, 8)).toBe(0.5); }); it("returns 0 when total is 0 to avoid divide-by-zero", () => { expect(idleRatio(0, 0)).toBe(0); }); + it("returns ratio > 1.0 when idle > total (invalid data)", () => { + expect(idleRatio(10, 8)).toBe(1.25); + }); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cueweb/app/__tests__/hosts/host_format_utils.test.ts` around lines 26 - 33, Add a unit test to document the behavior when idle > total by adding an "idle > total" case in the describe("idleRatio") group: call idleRatio with idle greater than total (e.g., idleRatio(8, 4)) and assert the returned value equals idle/total (2.0) so the test documents the current behavior of the idleRatio function rather than silently allowing an unexpected >1 result.cueweb/app/hosts/columns.tsx (1)
65-86: ⚡ Quick winConsider sorting consistency between cores and memory columns.
The cores column sorts by idle ratio (
idleRatio(idleCores, cores)) so hosts with proportionally similar free capacity sort together regardless of size (as explained by the comment on line 68). However, the memory column sorts by absolute idle bytes (kbStringToNumber(idleMemory)), meaning a host with 100 GB idle and 10% free would sort higher than a host with 50 GB idle and 50% free.This creates inconsistent sorting behavior: cores favor proportional availability while memory favors absolute availability.
🔄 Option 1: Make memory sort by ratio for consistency
{ id: "memory", header: sortableHeader("Memory (Idle/Total)"), - // Sort by idle bytes (numeric), not the formatted string. - accessorFn: (h) => kbStringToNumber(h.idleMemory), + // Sort by idle ratio so "most free" sorts together regardless of host size. + accessorFn: (h) => idleRatio(kbStringToNumber(h.idleMemory), kbStringToNumber(h.totalMemory)), cell: ({ row }) => (Option 2: If absolute memory sorting is intentional (e.g., for job placement where absolute capacity matters more than proportional), add a comment explaining the rationale similar to the cores column.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cueweb/app/hosts/columns.tsx` around lines 65 - 86, The memory column currently sorts by absolute idle bytes (accessorFn using kbStringToNumber(h.idleMemory)) which is inconsistent with the cores column that sorts by proportion (idleRatio). Change the memory column's accessorFn to use the same ratio approach: call idleRatio(kbStringToNumber(h.idleMemory), kbStringToNumber(h.totalMemory)) and update the comment to explain "Sort by idle ratio so hosts with similar proportional free memory sort together"; keep the cell rendering using kbStringToHuman(row.original.idleMemory) / kbStringToHuman(row.original.totalMemory).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@cueweb/app/__tests__/hosts/host_format_utils.test.ts`:
- Around line 15-24: Add a test case to cover negative numeric input for
kbStringToHuman: in the "kbStringToHuman" describe block add an it() that calls
kbStringToHuman with a negative value string (e.g., "-1024" or "-1000000") and
asserts the expected formatted output (e.g., "-1K" or "-1000K" depending on
current behavior) so the test suite documents and prevents regressions for
negative values.
- Around line 26-33: Add a unit test to document the behavior when idle > total
by adding an "idle > total" case in the describe("idleRatio") group: call
idleRatio with idle greater than total (e.g., idleRatio(8, 4)) and assert the
returned value equals idle/total (2.0) so the test documents the current
behavior of the idleRatio function rather than silently allowing an unexpected
>1 result.
In `@cueweb/app/hosts/columns.tsx`:
- Around line 65-86: The memory column currently sorts by absolute idle bytes
(accessorFn using kbStringToNumber(h.idleMemory)) which is inconsistent with the
cores column that sorts by proportion (idleRatio). Change the memory column's
accessorFn to use the same ratio approach: call
idleRatio(kbStringToNumber(h.idleMemory), kbStringToNumber(h.totalMemory)) and
update the comment to explain "Sort by idle ratio so hosts with similar
proportional free memory sort together"; keep the cell rendering using
kbStringToHuman(row.original.idleMemory) /
kbStringToHuman(row.original.totalMemory).
In `@cueweb/app/hosts/host_format_utils.ts`:
- Around line 39-42: The idleRatio function can return values >1 when idle >
total, which is intentional to surface data-quality issues; add a concise
comment above the idleRatio(idle: number, total: number): number function that
states this behavior explicitly (e.g., that a ratio >1 indicates invalid input
and is surfaced intentionally rather than being clamped), so future maintainers
understand the choice and sorting implications.
- Around line 28-36: The function kbStringToHuman currently converts negative KB
strings (e.g., "-1000") into a human string via convertMemoryToString; add a
guard in kbStringToHuman to treat negative numeric values as invalid and return
"-" instead of passing them to convertMemoryToString. Specifically, after
parsing kb to Number (n) in kbStringToHuman, check if n is negative (n < 0) and
return "-" when so; otherwise continue to call convertMemoryToString as before.
In `@cueweb/app/hosts/page.tsx`:
- Around line 32-68: Unify duplicated fetch/error logic by making the existing
load callback the single source of truth for fetching hosts (it already calls
getHosts and handles setHosts/setError), then reuse load inside the
React.useEffect instead of the separate poll function; update load to accept or
check a mounted/cancelled guard (e.g., a local let cancelled or a ref like
isMountedRef) so effect can call load safely and prevent state updates after
unmount, remove the poll function and its duplicated try/catch, and keep the
same failure behavior (setError(err message) and setHosts(prev => prev ?? []))
while preserving the interval using REFRESH_MS and clearing it on cleanup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 10cfef15-6bce-471c-a7b5-e6dab9d80d0c
📒 Files selected for processing (7)
cueweb/app/__tests__/hosts/host_format_utils.test.tscueweb/app/hosts/columns.tsxcueweb/app/hosts/host_format_utils.tscueweb/app/hosts/page.tsxcueweb/app/utils/get_utils.tscueweb/components/ui/simple-data-table.tsxdocs/_docs/user-guides/cueweb-user-guide.md
…ademySoftwareFoundation#2292) Add an isHostsTable flag (mirroring isFramesTable / isFramesLogTable) so the read-only hosts page can reuse SimpleDataTable: host-specific filter placeholder and empty-state copy, and no row context menu. Defaults to false, so existing frames/layers callers are unchanged.
548a7f9 to
4dd3540
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@ttpss930141011 Great job! I am reviewing your code. Soon we will merge it on the master or request more changes (if necessary). |
- the getHosts collapsed failures into [], so it never rejected - The /hosts inline error + Retry banner was dead; outages showed "No hosts registered" - Throw when accessGetApi returns non-array to distinguish failure from empty registry - Also fixes the same latent dead error-handling in the dashboard host widgets
- Add Monitor Hosts screenshots (CueCommander menu + page, light/dark) to the user guide - Expand the user-guide Monitor Hosts section: Columns menu, host filter; fix Memory sort note (idle ratio, not idle bytes) - Mark Monitor Hosts (initial version) as implemented in the CueCommander menu/route lists and cross-link the section - Add Monitor Hosts feature entry to the CueWeb overview (other-guides) - Add a Monitor Hosts reference subsection plus the GetHosts endpoint and /api/host/gethosts proxy route
|
Thanks for your contributions to the OpenCue project, @ttpss930141011 Excellent job! PR merged to the master 🎉 |
94bc190
into
AcademySoftwareFoundation:master




Related Issues
This is the foundation for the other Host & Allocation Management issues that need a host listing surface to build on: #2293 (lock/unlock), #2294 (reboot), #2295 (host detail page), #2296 (host tag editor).
Summarize your change.
Adds a
/hostsMonitor Hosts page to CueWeb: the web equivalent of CueGUI's CueCommander Monitor Hosts plugin (cuegui/cuegui/plugins/MonitorHostsPlugin.py,cuegui/cuegui/HostMonitorTree.py). CueWeb previously had no hosts page; the sidebarCueCommander > Monitor Hostslink 404'd. This provides the basic sortable, auto-refreshing table the issue asks for.app/hosts/page.tsx: loads hosts via the existinggetHosts()proxy (/api/host/gethosts) and auto-refreshes every 30s, with loading (skeletons), empty, and error (inline message + Retry) states. Read-only.app/hosts/columns.tsx) with columns: Name, State, Locked, NIMBY, Cores (Idle/Total), Memory (Idle/Total), Free /mcp. State/Locked reuse the existingStatusbadge. Numeric columns sort by their underlying value, not the formatted string.Hosttype widened (app/utils/get_utils.ts) to include the fields the table needs. Memory/mcpvalues arrive from the gateway as KB-in-string, so there are small parse/format helpers (app/hosts/host_format_utils.ts) with Jest unit tests.getHosts()(app/utils/get_utils.ts) now throws on a failed request instead of collapsing failures into[], so the page can tell a real Cuebot/gateway outage from an empty registry and actually render the inline error + Retry (the dashboard host widgets get the same fix for free).SimpleDataTable(shared by jobs/layers/frames) gains anisHostsTableflag, mirroring the existingisFramesTable/isFramesLogTableflags: host-specific filter placeholder and empty-state copy, and no row context menu. It defaults tofalse, so existing callers are unchanged, verified by the existing test suite still passing.other-guides/cueweb.md), and a reference subsection plus theGetHostsendpoint //api/host/gethostsproxy-route entries (reference/cueweb.md).Idle/Total vs used/total: the issue text says "used/total", but the backend and CueGUI's Monitor Hosts both expose idle vs total, so the columns show and are labeled "(Idle/Total)" to stay faithful to the data source.
Scope: read-only by design. Host actions (lock/unlock, tag edit, reboot, NIMBY toggle) and server-side filtering belong to the sibling issues (#2293–#2296) and are intentionally out of scope here.
Testing: added Jest unit tests for the formatting/sort helpers; full CueWeb suite passes (48/48). Manually verified against the local sandbox, the page lists the rqd hosts, sorting and the substring filter work, the table refreshes every 30s, and the existing jobs/layers/frames tables are unaffected by the
SimpleDataTablechange.LLM usage disclosure
Hai Shun: @ttpss930141011
Co-authored-by: Hai Shun o927416847@gmail.com
Co-authored-by: Ramon Figueiredo rfigueiredo@imageworks.com