Add new CommDiagnostics and GpuDiagnostics managers #28576
Signed-off-by: Jade Abraham <jade.abraham@hpe.com>
In general, I am in favor of this. But when we talked about similar concepts in the past, one worry was that comm diagnostics are mainly tracked globally by the runtime. That makes nested comm diagnostics very difficult to maintain, whereas this context-manager-based approach makes it look like I can nest two comm-diagnostic regions. On a quick look, this PR doesn't seem to do anything to protect against that. I don't think it is a showstopper, but it is a worry nonetheless.
Using a single, global context manager object that you explicitly enter ( |
This PR doesn't do anything to protect against that, but there are already protections in place for nested comm diagnostics, see
I find this attractive and may implement it.
Adds a new verbose communication manager to CommDiagnostics and two new GPU communication managers to GpuDiagnostics. Also adjusts `printCommDiagnosticsTable` to work with a non-global object.

The motivation for this PR was a recent demo I gave where I found myself writing code like this
This code was repeated over and over in my demo. What I really wanted was something like this
This PR makes the above mostly possible.
The one piece that is missing is `commDiagnosticsManager`, which currently can't be implemented in the CommDiagnostics module due to internal module issues. Since CommDiagnostics is used in internal modules (and especially during module initialization), this causes issues with interfaces (which attempt to aggressively resolve code before it's ready). I have implemented this manager in a test as future work.

This PR is somewhat related to several other issues about the design of CommDiagnostics (and by extension, GpuDiagnostics). See #16958, #16955, and #16956.
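
For illustration only, here is a minimal Chapel sketch of the pattern described above. It is not the demo code or the manager from this PR. The first half uses the existing CommDiagnostics start/stop/print/reset calls directly. The second half wraps them in a hypothetical `demoCommRegion` record (a name invented here) so the region can be written with `manage`. Halting on error in `exitContext` is a simplification.

```chapel
use CommDiagnostics;

var x = 0;

// Boilerplate form: every measured region repeats these four calls.
startCommDiagnostics();
on Locales[numLocales-1] do x = 1;   // remote write on multi-locale runs
stopCommDiagnostics();
printCommDiagnosticsTable();
resetCommDiagnostics();

// Hypothetical manager for illustration; NOT the one added by this PR.
record demoCommRegion : contextManager {
  // Runs on entry to the manage block: clear and start the global counters.
  proc enterContext() {
    resetCommDiagnostics();
    startCommDiagnostics();
  }

  // Runs on exit from the manage block: stop counting and print the table.
  proc exitContext(in err: owned Error?) {
    stopCommDiagnostics();
    printCommDiagnosticsTable();
    // Simplification: halt instead of propagating the error.
    if err then halt(err!.message());
  }
}

// Managed form: the same region without the repeated calls.
manage new demoCommRegion() {
  on Locales[numLocales-1] do x = 2;
}
```

Because this sketch resets and reads the runtime's global counters, nesting two such regions would not give meaningful per-region numbers, which is the concern raised in the conversation above.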