Skip to content

Add new CommDiagnostics and GpuDiagnostics managers#28576

Open
jabraham17 wants to merge 5 commits into
chapel-lang:mainfrom
jabraham17:add-comm-manager
Open

Add new CommDiagnostics and GpuDiagnostics managers#28576
jabraham17 wants to merge 5 commits into
chapel-lang:mainfrom
jabraham17:add-comm-manager

Conversation

@jabraham17

Copy link
Copy Markdown
Member

Adds a new verbose communication manager to CommDiagnostics and two new gpu communication managers to GpuDiagnostics. Also, adjusts printCommDiagnosticsTable to work with a non-global object

The motivation for this PR was a recent demo I gave where I found myself writing code like this

resetGpuDiagnostics();
startGpuDiagnostics();
resetCommDiagnostics();
startCommDiagnostics();
var nextArr = myStencil(myArr, dataDom);
stopCommDiagnostics();
stopGpuDiagnostics();
printCommDiagnosticsTable();
writeln(getGpuDiagnostics());

This code was repeated over and over in my demo. What I really wanted was something like this

var nextArr: myArr.type;
manage new gpuDiagnosticsManager(autoPrint=true) do
  manage new commDiagnosticsManager(autoPrint=true) do
    nextArr = myStencil(myArr, dataDom);

This PR makes the above mostly possible.

The one piece that is missing is commDiagnosticsManager, which currently can't be implemented in the CommDiagnostics module due to internal module issues. Since CommDiagnostics is used in internal modules (and especially during module initialization), this causes issues with interfaces (which attempt to aggressively resolve code before its ready). I have implemented this manager in a test as future work.

This PR is somewhat related to several other issues about the design of CommDiagnostics (and by extension, GpuDiagnostics). See #16958, #16955, and #16956

  • test the new GPU tests with CHPL_GPU=cpu
  • paratest
  • paratest with gasnet
  • check docs build

Signed-off-by: Jade Abraham <jade.abraham@hpe.com>
Signed-off-by: Jade Abraham <jade.abraham@hpe.com>
Signed-off-by: Jade Abraham <jade.abraham@hpe.com>
Signed-off-by: Jade Abraham <jade.abraham@hpe.com>
Signed-off-by: Jade Abraham <jade.abraham@hpe.com>
@e-kayrakli

Copy link
Copy Markdown
Contributor

In general, I am in favor of this. But when we talked about similar concepts in the past, one worry was that comm diagnostics are mainly tracked by the runtime globally. That makes nested comm diagnostics very difficult to maintain, whereas this context manager based approach makes it look like I can nest two comm-diagnostic regions.

On a quick look, this PR doesn't seem to do anything to protect against that. I don't think it is a showstopper, but it is a worry nonetheless.

@DanilaFe

Copy link
Copy Markdown
Contributor

Using a single, global context manager object that you explicitly enter (manage globalCommDiagnostics) can make it more explicitly that you're using a global functionality (as opposed to creating a new instance of the manager every time, which suggests independence).

@jabraham17

Copy link
Copy Markdown
Member Author

On a quick look, this PR doesn't seem to do anything to protect against that. I don't think it is a showstopper, but it is a worry nonetheless.

This PR doesn't do anything to protect that, but there are already protections in place for nested comm diagnostics, see test/library/standard/CommDiagnostics/commDiagnosticsWarnings.chpl.

Using a single, global context manager object that you explicitly enter (manage globalCommDiagnostics) can make it more explicitly that you're using a global functionality (as opposed to creating a new instance of the manager every time, which suggests independence).

I find this attractive and may implement that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants