NAMD error: Linux 3.11.10-21-desktop kernel, AMD Opteron 6272

From: Thomas C. Bishop (bishop_at_latech.edu)
Date: Wed Nov 05 2014 - 09:48:15 CST

The following may be related to the recent colvars post, though that is not very likely.
Has anyone seen similar problems, or is anyone willing to run a test on a similar hardware/kernel configuration?

I recently demonstrated that my Supermicro machine (H8DG6 motherboard) with AMD Opteron(TM) 6272 processors
and the Linux 3.11.10-21 x86_64 kernel (openSUSE 13.1) has a memory problem that crashes shared-memory runs with NAMD 2.9/2.10.

The same kernel/OS/simulation/NAMD versions work fine on Intel-based machines.
The same simulation/OS/NAMD versions also work fine on the Supermicro/AMD machine with the desktop-3.11.6-4.1.x86_64 kernel.

It seems something changed between desktop-3.11.6-4.1.x86_64 and 3.11.10-21 x86_64 that breaks my shared-memory NAMD runs
and that may be specific to the AMD Opteron 6272 or the Supermicro H8DG6. Charmrun works in all cases.
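
For anyone willing to compare configurations, a minimal Python sketch along these lines could record the kernel release and CPU model on each test machine. It is only an illustration: the function name describe_host is hypothetical, and reading /proc/cpuinfo is assumed to work only on Linux (the CPU model falls back to "unknown" elsewhere).

```python
import platform


def describe_host():
    """Return the kernel and CPU details that distinguish the test machines."""
    u = platform.uname()

    # Assumption: on Linux the CPU model string is listed in /proc/cpuinfo.
    cpu_model = "unknown"
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.lower().startswith("model name"):
                    cpu_model = line.split(":", 1)[1].strip()
                    break
    except OSError:
        pass  # not Linux, or /proc is unavailable

    return {
        "system": u.system,
        "kernel_release": u.release,
        "machine": u.machine,
        "cpu_model": cpu_model,
    }


if __name__ == "__main__":
    for key, value in describe_host().items():
        print(f"{key}: {value}")
```

Running it on both a working and a failing machine gives a small record to attach to a report, so the kernel and CPU differences can be compared side by side.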

Thanks
Tom

On 11/04/2014 10:44 PM, Leili Zhang wrote:
Dear all:

I recently compiled NAMD-2.10b1 for Linux-x86_64-MPI. Normal MD simulations ran perfectly fine on 16-128 CPU cores. However, when I tried to start metadynamics simulations, I got the following error messages:

...
colvars: Collective variables biases initialized, 1 in total.
colvars: ----------------------------------------------------------------------
colvars: Collective variables module initialized.
colvars: ----------------------------------------------------------------------
Info: Startup phase 10 took 0.015816 s, 381.293 MB of memory in use
Info: Startup phase 11 took 0.000250816 s, 381.293 MB of memory in use
Info: useSync: 1 useProxySync: 0
Info: Startup phase 12 took 0.000249147 s, 381.293 MB of memory in use
Info: Finished startup at 2.20578 s, 381.293 MB of memory in use

TCL: Running for 10000000 steps
colvars:   Error: NAMD does not have yet a way to communicate atom velocities to the colvars.
colvars:   If this error message is unclear, try recompiling with -DCOLVARS_DEBUG.
FATAL ERROR: Error in the collective variables module: exiting.
: Success
[0] Stack Traceback:
  [0:0] _Z8NAMD_errPKc+0xde  [0x61345e]
  [0:1] _ZN16colvarproxy_namd11fatal_errorERKSs+0x52  [0xa67232]
  [0:2] _ZN12colvarmodule4atom13read_velocityEv+0x2c  [0xa631ac]
  [0:3] _ZN12colvarmodule10atom_group15read_velocitiesEv+0x1fc  [0xa2d04c]
  [0:4] _ZN6colvar4calcEv+0x11a  [0x9ee14a]
  [0:5] _ZN12colvarmodule4calcEv+0x55  [0x9bff85]
  [0:6] _ZN16colvarproxy_namd9calculateEv+0x5e2  [0xa64be2]
  [0:7] _ZN12GlobalMaster11processDataEPiS0_P6VectorS2_S2_PdS3_S0_S0_S2_S0_S0_S2_+0x6e  [0x9979be]
  [0:8] _ZN18GlobalMasterServer11callClientsEv+0xcfc  [0x99a48c]
  [0:9] _ZN18GlobalMasterServer8recvDataEP20ComputeGlobalDataMsg+0x67c  [0x998eac]
  [0:10] _Z15_processHandlerPvP11CkCoreState+0x705  [0xcb5da5]
  [0:11] CsdScheduler+0x47d  [0xe15bdd]
  [0:12] _ZN9ScriptTcl7Tcl_runEPvP10Tcl_InterpiPPc+0x2c5  [0xba44d5]
  [0:13] TclInvokeStringCommand+0x88  [0xe712a8]
  [0:14]   [0xe73ec7]
  [0:15]   [0xe752e2]
  [0:16] Tcl_EvalEx+0x16  [0xe75b06]
  [0:17] Tcl_FSEvalFileEx+0x151  [0xed7cb1]
  [0:18] Tcl_EvalFile+0x2e  [0xed7e6e]
  [0:19] _ZN9ScriptTcl4loadEPc+0xf  [0xba126f]
  [0:20] main+0x3e7  [0x617ac7]
  [0:21] __libc_start_main+0xfd  [0x300081ecdd]
  [0:22]   [0x57ccf9]

The input files also worked fine with NAMD-2.9 on, say, the Gordon cluster or Stampede. Unfortunately, I have not been able to compile NAMD-2.9 on our current cluster after several tries, so I cannot say...

Thanks in advance for any advice!

Leili
