Re: NAMD error Linux 3.11.10-21-desktop kernel AMD Opteron 6272

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Thu Nov 06 2014 - 01:43:36 CST

Hey Tom,

 

I have a similar machine, a Supermicro 4-way AMD Opteron 6380. I think the problem you have is _not_ related to NAMD but to the Linux kernel not correctly detecting the special AMD NUMA architecture of the Magny-Cours processors. Do you have the latest BIOS installed on that system?
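One way to see what the kernel actually detected is to read the NUMA layout it exports under sysfs. The following is only a minimal sketch, assuming a Linux system that exposes `/sys/devices/system/node`; the function names `parse_cpulist` and `numa_topology` are made up for this illustration.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as '0-3,8-11' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_topology(root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]} as reported by the running kernel."""
    topology = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        node_id = int(re.search(r"node(\d+)$", path).group(1))
        with open(os.path.join(path, "cpulist")) as fh:
            topology[node_id] = parse_cpulist(fh.read())
    return topology


if __name__ == "__main__":
    for node, cpus in sorted(numa_topology().items()):
        print(f"node {node}: {len(cpus)} CPUs -> {cpus}")
```

Running this under both the working and the failing kernel and comparing the output (or comparing it with `numactl --hardware`) should show whether the node count or the CPU-to-node mapping differs between the two.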

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Thomas C. Bishop
Sent: Wednesday, 5 November 2014 16:48
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: NAMD error Linux 3.11.10-21-desktop kernel AMD Opteron 6272

 

The following may be related to the recent colvars post, but that is not very likely.
Has anyone seen similar problems, or is anyone willing to run a test on a similar hardware/kernel configuration?

I recently demonstrated that my Supermicro (H8DG6 motherboard) with AMD Opteron(TM) Processor 6272
and the Linux 3.11.10-21 x86_64 kernel (openSUSE 13.1) has a memory problem that crashes a shared-memory run with NAMD 2.9/2.10.

The same kernel/OS/simulation/NAMD versions work fine on Intel-based machines.
The same simulation/OS/NAMD versions with the desktop-3.11.6-4.1.x86_64 kernel work fine on the Supermicro/AMD machine.

It seems something changed between desktop-3.11.6-4.1.x86_64 and 3.11.10-21 x86_64, possibly specific to the
AMD Opteron 6272 or the Supermicro H8DG6, that breaks my shared-memory NAMD runs. Charmrun works in all cases.
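For anyone willing to test, a quick way to see where a machine's kernel falls relative to the two releases above is sketched below. It is only an illustration: `kernel_version` and `classify` are hypothetical helper names, the parsing assumes a `major.minor.patch[-build]` release string, and only the two exact releases named above were actually tested.

```python
import platform
import re


def kernel_version(release):
    """Turn '3.11.10-21-desktop' into (3, 11, 10, 21); other suffixes are ignored."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(g) for g in match.groups() if g is not None)


# The two data points reported in this message.
LAST_GOOD = kernel_version("desktop-3.11.6-4.1.x86_64")
FIRST_BAD = kernel_version("3.11.10-21")


def classify(release):
    """Place a release relative to the two reported kernels."""
    version = kernel_version(release)
    if version <= LAST_GOOD:
        return "at or before the release reported working"
    if version >= FIRST_BAD:
        return "at or after the release reported failing"
    return "between the two reported releases (untested)"


if __name__ == "__main__":
    running = platform.release()
    print(running, "->", classify(running))
```

A result in the "between" range would be the most useful to report back, since it would narrow down where the behaviour changed.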

Thanks
Tom

On 11/04/2014 10:44 PM, Leili Zhang wrote:

Dear all:

I recently compiled NAMD-2.10b1 for Linux-x86_64-MPI. Normal MD simulations ran perfectly fine on 16-128 CPU cores. However, when I tried to start metadynamics simulations, I got the following error messages:

...
colvars: Collective variables biases initialized, 1 in total.
colvars: ----------------------------------------------------------------------
colvars: Collective variables module initialized.
colvars: ----------------------------------------------------------------------
Info: Startup phase 10 took 0.015816 s, 381.293 MB of memory in use
Info: Startup phase 11 took 0.000250816 s, 381.293 MB of memory in use
Info: useSync: 1 useProxySync: 0
Info: Startup phase 12 took 0.000249147 s, 381.293 MB of memory in use
Info: Finished startup at 2.20578 s, 381.293 MB of memory in use

TCL: Running for 10000000 steps
colvars: Error: NAMD does not have yet a way to communicate atom velocities to the colvars.
colvars: If this error message is unclear, try recompiling with -DCOLVARS_DEBUG.
FATAL ERROR: Error in the collective variables module: exiting.
: Success
[0] Stack Traceback:
  [0:0] _Z8NAMD_errPKc+0xde [0x61345e]
  [0:1] _ZN16colvarproxy_namd11fatal_errorERKSs+0x52 [0xa67232]
  [0:2] _ZN12colvarmodule4atom13read_velocityEv+0x2c [0xa631ac]
  [0:3] _ZN12colvarmodule10atom_group15read_velocitiesEv+0x1fc [0xa2d04c]
  [0:4] _ZN6colvar4calcEv+0x11a [0x9ee14a]
  [0:5] _ZN12colvarmodule4calcEv+0x55 [0x9bff85]
  [0:6] _ZN16colvarproxy_namd9calculateEv+0x5e2 [0xa64be2]
  [0:7] _ZN12GlobalMaster11processDataEPiS0_P6VectorS2_S2_PdS3_S0_S0_S2_S0_S0_S2_+0x6e [0x9979be]
  [0:8] _ZN18GlobalMasterServer11callClientsEv+0xcfc [0x99a48c]
  [0:9] _ZN18GlobalMasterServer8recvDataEP20ComputeGlobalDataMsg+0x67c [0x998eac]
  [0:10] _Z15_processHandlerPvP11CkCoreState+0x705 [0xcb5da5]
  [0:11] CsdScheduler+0x47d [0xe15bdd]
  [0:12] _ZN9ScriptTcl7Tcl_runEPvP10Tcl_InterpiPPc+0x2c5 [0xba44d5]
  [0:13] TclInvokeStringCommand+0x88 [0xe712a8]
  [0:14] [0xe73ec7]
  [0:15] [0xe752e2]
  [0:16] Tcl_EvalEx+0x16 [0xe75b06]
  [0:17] Tcl_FSEvalFileEx+0x151 [0xed7cb1]
  [0:18] Tcl_EvalFile+0x2e [0xed7e6e]
  [0:19] _ZN9ScriptTcl4loadEPc+0xf [0xba126f]
  [0:20] main+0x3e7 [0x617ac7]
  [0:21] __libc_start_main+0xfd [0x300081ecdd]
  [0:22] [0x57ccf9]
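The traceback above lists mangled C++ symbol names, which are hard to read. As a rough aid, the sketch below pulls the frames out of such a log and, if the binutils tool `c++filt` happens to be installed, demangles them; frame 2, for instance, is `colvarmodule::atom::read_velocity()`. The helper names `parse_frame` and `demangle` are hypothetical, and the regular expression only assumes the line layout seen in this log.

```python
import re
import shutil
import subprocess

# Matches lines such as "  [0:2] _ZN...Ev+0x2c [0xa631ac]" and "  [0:14] [0xe73ec7]".
FRAME_RE = re.compile(
    r"^\s*\[(\d+):(\d+)\]\s+"
    r"(?:(\S+?)\+0x([0-9a-fA-F]+)\s+)?"
    r"\[0x([0-9a-fA-F]+)\]\s*$"
)


def parse_frame(line):
    """Return (pe, frame, symbol, offset, address), or None if the line is not a frame."""
    match = FRAME_RE.match(line)
    if match is None:
        return None
    pe, frame, symbol, offset, address = match.groups()
    return (
        int(pe),
        int(frame),
        symbol,  # None when the frame has no symbol
        int(offset, 16) if offset is not None else None,
        int(address, 16),
    )


def demangle(symbols):
    """Demangle names with c++filt when available; otherwise return them unchanged."""
    if not symbols or shutil.which("c++filt") is None:
        return list(symbols)
    result = subprocess.run(
        ["c++filt"], input="\n".join(symbols),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    import sys
    frames = [f for f in map(parse_frame, sys.stdin) if f is not None]
    names = demangle([f[2] for f in frames if f[2] is not None])
    for name in names:
        print(name)
```

Saving the script under any name and piping the NAMD log into it on standard input prints one readable function name per symbolized frame.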

The input files also worked fine with NAMD-2.9 on, say, the Gordon cluster or Stampede. Unfortunately, I have not managed to compile NAMD-2.9 on our current cluster after several tries, so I cannot say..

 

Thanks in advance for any advice!

Leili

 


This archive was generated by hypermail 2.1.6 : Thu Dec 31 2015 - 23:21:20 CST