Unpredictable crash at runtime

From: pellegrini (pellegrini_at_ill.fr)
Date: Fri Mar 18 2011 - 03:14:22 CDT

Dear NAMD users,

I have run into a problem when running a simulation with NAMD 2.7. The
simulation is an NPT equilibration MD run of a single protein in a water
box.

My problem is that the simulation crashes from time to time, at an
unpredictable (hence unreproducible) time step.

You will find attached the input file and the error message I get.

I run NAMD 2.7 on 4 nodes with 8 cores each, the nodes being connected
by InfiniBand. I would like to mention that I had the same problem in
the past on an Ethernet-connected network.

Here are the library dependencies of my namd2 executable:

BSA/300K> ldd /serhom/cs/model/NAMD/NAMD_2.7_Linux-x86_64-ibverbs/namd2
        linux-vdso.so.1 => (0x00007fffdbfff000)
        libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00002b7dcedd3000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00002b7dcefe1000)
        libm.so.6 => /lib64/libm.so.6 (0x00002b7dcf1e5000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002b7dcf43c000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b7dcf748000)
        libc.so.6 => /lib64/libc.so.6 (0x00002b7dcf960000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b7dcfcba000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b7dcebb4000)

And those of my charmrun executable:

BSA/300K> ldd /serhom/cs/model/NAMD/NAMD_2.7_Linux-x86_64-ibverbs/charmrun
        linux-vdso.so.1 => (0x00007ffffa9fe000)
        libc.so.6 => /lib64/libc.so.6 (0x00002b3ab045c000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b3ab023d000)

I know that this kind of behaviour has already been discussed on the
mailing list, but the threads usually ended without a solution.

Would you have any idea?

Thanks a lot,

Eric

-- 
Eric Pellegrini
Calcul Scientifique
Institut Laue-Langevin
Grenoble, France

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:56:46 CST