RE: BUG: NAMD-2.10 CUDA REMD segfault with TCL exec randomly

From: Norman Geist
Date: Wed Mar 04 2015 - 04:40:07 CST

Here's the backtrace I got. It seems that libcuda isn't thread-safe. The
fact that TCL's exec uses fork() is a real issue in many places.

This could perhaps be circumvented by adding a command to NAMD's TCL
interface for calling external tools without the need for exec and fork().
Since NAMD throws a FATAL on TCL errors anyway, the forking exec doesn't
change a thing.


Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00002ae8a2893602 in ?? () from /lib64/
(gdb) bt
#0 0x00002ae8a2893602 in ?? () from /lib64/
#1 0x00002ae8a7c7ef81 in ?? () from /usr/lib64/
#2 0x00002ae8a75f2b5a in ?? () from /usr/lib64/
#3 0x00002ae8a7c7f588 in ?? () from /usr/lib64/
#4 0x00002ae8a17ab1f3 in start_thread () from /lib64/
#5 0x00002ae8a29041ad in clone () from /lib64/


Norman Geist.


From: [] On Behalf Of Norman Geist
Sent: Wednesday, March 04, 2015 8:48 AM
Subject: namd-l: BUG: NAMD-2.10 CUDA REMD segfault with TCL exec randomly




I want to report a strange problem: segmentation faults occur after a random
time (number of steps) when calling "exec" from a NAMD job script. So far I
have only observed this for CUDA+REMD runs; the same system runs fine with
the CPU version. I have already run into two cases where calling "exec"
caused a segfault after some time. The first was a call to "date" to measure
the time per run for REMD. The other was a call to VMD to do some
measurements during a REMD run.
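
For the "date" case at least, the fork can probably be avoided altogether,
since Tcl has a built-in clock command that runs inside the interpreter. A
minimal sketch, assuming the goal is only to time each run segment; the proc
name "timed_run" and the sleeping placeholder body are made up for
illustration and are not part of NAMD or of the original script:

```tcl
# Sketch: time a block of work with Tcl's built-in clock command, so that
# no child process is forked (unlike [exec date]).
# "timed_run" is a hypothetical helper name, not a NAMD command.
proc timed_run {label script} {
    set t0 [clock seconds]
    # Evaluate the script in the caller's scope so its variables stay visible.
    uplevel 1 $script
    set elapsed [expr {[clock seconds] - $t0}]
    set stamp [clock format [clock seconds] -format {%Y-%m-%d %H:%M:%S}]
    puts "$label: $elapsed s (finished $stamp)"
    return $elapsed
}

# Placeholder for one run segment; in a NAMD job script the body would be
# something like "run $steps" instead of the one-second sleep used here.
timed_run "segment 0" {
    after 1000
}
```

This does not help with the VMD case, which still needs an external process.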


If there's interest in solving this, I can supply the problematic code and


Norman Geist
