From: Pang, Yui Tik (andrewpang_at_gatech.edu)
Date: Sat Oct 19 2019 - 11:10:07 CDT
Dear all,
I get an error from charmrun with both the precompiled NAMD 2.13 ibverbs and verbs versions. The error persists even with a self-compiled charm-6.8.2/verbs-linux-x86_64-ifort-iccstatic build. The error is pasted below:
[0] wc[0] status 9 wc[i].opcode 0
mlx5: login-hive1.pace.gatech.edu: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000001 00000000 00000000 00000000
00000000 00008a12 0a001e80 0036b1d2
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Work completion error in sendCq
[0] Stack Traceback:
[0:0] [0x6176e3]
[0:1] [0x617736]
[0:2] [0x613a78]
[0:3] [0x61383e]
[0:4] [0x61c881]
[0:5] [0x61ead9]
[0:6] [0x61315f]
[0:7] [0x617857]
[0:8] [0x625f28]
[0:9] [0x626d93]
[0:10] [0x621671]
[0:11] [0x621ac9]
[0:12] [0x6219a0]
[0:13] [0x6174b6]
[0:14] [0x617337]
[0:15] [0x4e2a6b]
[0:16] __libc_start_main+0xf5 [0x7ffff6d753d5]
[0:17] [0x408ba9]
Our cluster uses MLX InfiniBand and RHEL 7, if that information helps. Any help would be appreciated!
Thank you!
Best,
Andrew Pang
This archive was generated by hypermail 2.1.6 : Tue Dec 31 2019 - 23:20:59 CST