Multicore-CUDA NAMD output hangs

From: Mitchell Gleed (aliigleed16_at_gmail.com)
Date: Wed Feb 18 2015 - 15:12:21 CST

Dear NAMD users,

I've been working with the supercomputing facility at my university to get
NAMD 2.10 compiled for use on new GPU nodes. We have successfully compiled
MPI and multicore versions but have had issues with the multicore builds.
I've found similar threads in the mailing list, but the most similar cases
have no responses and have been on the list a long time.

When running NAMD with the multicore builds, the jobs frequently hang
indefinitely with no output or warning message. Hanging occurs anywhere
between the first and last step of the apoa1 benchmark, and occurs on
different GPU nodes (same hardware). Running "top" on the jobs in the "hang
state" shows the CPUs are still being utilized. We've found the probability
of hanging increases with the number of CPUs requested, but hangs still
occur with a low CPU:GPU ratio.

Initial benchmarking of the multicore CUDA build shows that using all 24 CPU
cores with all 4 GPUs results in the fastest timing for the apoa1
benchmark (<0.08 days/ns) and performance decreases when fewer cores or
fewer GPUs are assigned, but these whole-node benchmark jobs hang more
often than the rest.

The MPI CUDA builds follow a similar trend, according to our initial
benchmarks, but none of the MPI jobs have had any hanging issues. The MPI
build, although reasonably fast, is significantly slower than the multicore
builds (~0.14 days/ns on apoa1 with 24 cores and 4 GPUs).
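
To put the two timings on a common footing, the short Python sketch below converts days/ns to ns/day and computes the ratio. It uses only the two figures quoted above; since the multicore figure is given as "<0.08" and the MPI figure as "~0.14", the results are approximate, and the helper name ns_per_day is just an illustrative choice.

```python
def ns_per_day(days_per_ns):
    """Convert a NAMD-style benchmark time (days/ns) into throughput (ns/day)."""
    return 1.0 / days_per_ns


# Figures quoted in the post; both are approximate.
MULTICORE_DAYS_PER_NS = 0.08  # reported as "<0.08", so throughput is a lower bound
MPI_DAYS_PER_NS = 0.14        # reported as "~0.14"

multicore_rate = ns_per_day(MULTICORE_DAYS_PER_NS)
mpi_rate = ns_per_day(MPI_DAYS_PER_NS)
speedup = MPI_DAYS_PER_NS / MULTICORE_DAYS_PER_NS

print(f"multicore: at least {multicore_rate:.1f} ns/day")
print(f"MPI:       about {mpi_rate:.1f} ns/day")
print(f"multicore is at least about {speedup:.2f}x faster")
```

On these numbers the multicore build delivers roughly 12.5 ns/day against about 7.1 ns/day for MPI, a factor of about 1.75.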

We have found the multicore issue persists regardless of whether we use
Intel compilers, GNU compilers, or even the precompiled binary from the NAMD
website. We would love to get a working multicore build due to the
performance increase over the MPI build on a single node.

We typically use the following NAMD command-line arguments, but have tried
+isomalloc_sync and +ignoresharing as well:

    namd2 +p$cpus +idlepoll +devices $CUDA_VISIBLE_DEVICES apoa1.namd
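
Because the hangs produce no output or warning, one way to catch them automatically is a small watchdog that kills the job when its log stops growing. The Python sketch below is a minimal illustration, not part of NAMD: the function names, the stall threshold, and the polling interval are all assumptions, and it only builds the same namd2 arguments quoted above.

```python
import subprocess
import sys
import threading
import time


def run_with_stall_watchdog(cmd, stall_seconds, poll_seconds=0.5, echo=False):
    """Run cmd and kill it if it prints nothing for stall_seconds.

    Returns (status, lines), where status is "finished" or "stalled".
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = []
    last_output = [time.monotonic()]  # one-element list so the thread can update it

    def pump():
        # Collect output line by line and record when each line arrived.
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
            last_output[0] = time.monotonic()
            if echo:
                sys.stdout.write(line)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    status = "finished"
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > stall_seconds:
            proc.kill()  # no output for too long: treat the job as hung
            status = "stalled"
            break
        time.sleep(poll_seconds)
    proc.wait()
    reader.join(timeout=5)
    return status, lines


def namd_command(cpus, devices, config="apoa1.namd"):
    # Same arguments as the command line quoted in the post.
    return ["namd2", f"+p{cpus}", "+idlepoll", "+devices", devices, config]
```

A call such as run_with_stall_watchdog(namd_command(24, "0,1,2,3"), stall_seconds=300) would then report "stalled" for a hung run. The 300-second threshold is a guess and would need to exceed the longest normal gap between NAMD output lines.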

Here are the specs of the GPU nodes:
2 x 12-core Intel Haswell (2.3 GHz), 24 cores per node
64 GB 2133 MHz DDR4
2 x NVIDIA K80 (4 GPUs)

Let me know if you have any ideas about how to resolve the issue. Thank
you!

Mitch Gleed
