Intel18 + openmp + MOM6 global_ALE_z crashes on theta, lscsky50, theia #1

@nikizadehgfdl

An Intel18 + OpenMP executable (single-thread test) crashes or hangs on MOM6 test cases on all three machines: theta (KNL), lscsky50 (Skylake), and theia.

Here's the crash output for the global_ALE_z test case on theta and lscsky50:

 EKEmin=  1.000000000000000E+016 ResMin=   236869.453598697
 src=   1332071.81173317      ldamping=  8.991153093102879E-082
 gamma-b=  0.832273068599009      gamma-t=  0.901219478800562
 drag_visc=  2.083867476924661E-004 Ubg2=  0.000000000000000E+000
Something has gone very wrong
[NID 02598] 2018-04-20 14:33:31 Apid 4349292: initiated application termination

Or, for another test (benchmark):

_pmiu_daemon(SIGCHLD): [NID 00471] [c2-0c1s5n3] [Fri Apr 20 16:31:03 2018] PE RANK 19 exit signal Bus error
[NID 00471] 2018-04-20 16:31:03 Apid 4349434: initiated application termination
[NID 00471] 2018-04-20 16:31:04 Apid 4349434: Error detected during page fault processing.  Process terminated via bus error.

On the Skylake box (lscsky50):

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 182461 RUNNING AT lscsky50-d
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
   Intel(R) MPI Library troubleshooting guide:
      https://software.intel.com/node/561764
===================================================================================

No such issues with Intel17.

No such issues with a non-OpenMP executable built with Intel18.
