NAMD 2.12 multicore CUDA on Ubuntu 16.04 with NVIDIA Tesla M10 produces segmentation fault

From: Wilson,Michael (michawilson_at_uchc.edu)
Date: Tue Jul 18 2017 - 13:57:16 CDT

Hello,

I am on the development team for NMRbox (http://nmrbox.org). NMRbox is a cloud-based virtual machine loaded with NMR software, and we would like to include NAMD with multicore and CUDA support on our platform. I have been struggling to find a compatible combination of GPU driver and CUDA version. I now have versions that let the GPU be used for graphics, but running the CUDA build of NAMD produces a segmentation fault with no meaningful error message. I hope someone on the list can recognize the problem and point me toward a fix.

Installation specifications:

Physical Hardware:
Dell PowerEdge R730, Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
VMWare ESXi 6.0 Hypervisor with 72 logical processors
NVIDIA Tesla M10 GPU

Virtual Hardware:
10 CPUs
20 GB RAM
NVIDIA Tesla M10 GPU (passthrough mode)

OS:
Ubuntu 16.04

GPU Driver:
NVIDIA-Linux-x86_64-367.92.run

CUDA Toolkit (repository package):
cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
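For a report like this it can help to record the versions actually in use, not only the installer file names, since the kernel driver that is loaded may differ from the package that was installed. The following is a minimal sketch of a POSIX shell function (the name `report_gpu_stack` is hypothetical) that assumes the standard `lspci`, `nvidia-smi` and `nvcc` tools and simply reports any that are missing.

```shell
#!/bin/sh
# Hypothetical helper: gather GPU, driver and toolkit versions in one place.
# Every probe is optional; a missing tool is reported instead of aborting.
report_gpu_stack() {
    echo "== PCI devices (is the passthrough GPU visible?) =="
    if command -v lspci >/dev/null 2>&1; then
        lspci | grep -i nvidia || echo "no NVIDIA device listed"
    else
        echo "lspci not installed"
    fi

    echo "== Loaded kernel driver =="
    cat /proc/driver/nvidia/version 2>/dev/null || echo "NVIDIA kernel module not loaded"

    echo "== nvidia-smi =="
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Name and driver version of each GPU the driver can see.
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
    else
        echo "nvidia-smi not found"
    fi

    echo "== CUDA toolkit (nvcc) =="
    if command -v nvcc >/dev/null 2>&1; then
        # Keep only the line that states the toolkit release.
        nvcc --version | grep -i release
    else
        echo "nvcc not found on PATH"
    fi
}
```

Running `report_gpu_stack` inside the VM and pasting its output gives readers the loaded driver version directly, which matters more than the name of the `.run` or `.deb` file used to install it.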

Environment variables:

mwilson_at_dailymike:~/namd-tutorial-files/1-2-sphere$ echo $PATH
/home/nmrbox/mwilson/NAMD_2.12_Linux-x86_64-multicore-CUDA:/usr/local/cuda-8.0/bin:/usr/software/bin:/usr/software/nmr-scripts:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/software/cns_solve_1.3/intel-x86_64bit-linux/bin:/usr/software/cns_solve_1.3/intel-x86_64bit-linux/utils:/usr/software/mddnmr/binCentOS64:/usr/software/mddnmr/com:/usr/software/nmrpipe/nmrbin.linux212_64:/usr/software/nmrpipe/com:/usr/software/nmrpipe/dynamo/tcl:/usr/software/redcraft/scripts:/usr/software/rosetta/main/source/bin:/usr/software/shifts-5.1/bin:/home/nmrbox/mwilson/bin

mwilson_at_dailymike:~/namd-tutorial-files/1-2-sphere$ echo $LD_LIBRARY_PATH
/home/nmrbox/mwilson/NAMD_2.12_Linux-x86_64-multicore-CUDA:/home/nmrbox/mwilson/NAMD_2.12_Linux-x86_64-multicore-CUDA/lib:.:/usr/local/cuda-8.0/lib64:/usr/software/modeller-9.17/lib/x86_64-intel8:/usr/software/nmrpipe/nmrbin.linux212_64/lib
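The `LD_LIBRARY_PATH` above contains a relative entry (`.`) as well as both the NAMD directory and `/usr/local/cuda-8.0/lib64`, so which CUDA runtime library `namd2` ends up loading can depend on the directory it is launched from. Below is a hedged sketch for checking this, assuming a POSIX shell and the standard `ldd` tool; both function names are hypothetical helpers, not part of NAMD.

```shell
#!/bin/sh
# Hypothetical helper: print each entry of a colon-separated search path
# (PATH, LD_LIBRARY_PATH, ...) on its own line and flag non-absolute entries.
# A relative entry such as "." is resolved against the current directory.
list_path_entries() {
    # $1: the colon-separated value to inspect
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
        case "$entry" in
            /*) printf 'ok        %s\n' "$entry" ;;
            '') printf 'EMPTY     (empty entry)\n' ;;
            *)  printf 'RELATIVE  %s\n' "$entry" ;;
        esac
    done
}

# Hypothetical helper: show which CUDA-related shared libraries the dynamic
# loader resolves for a binary, plus any it cannot find at all.
show_namd_libs() {
    # $1: full path to the namd2 binary
    if [ ! -x "$1" ]; then
        echo "not an executable file: $1" >&2
        return 1
    fi
    ldd "$1" | grep -i -e cuda -e 'not found'
}
```

Usage: `list_path_entries "$LD_LIBRARY_PATH"` flags the `.` entry, and `show_namd_libs "$(command -v namd2)"` shows the resolved path of each CUDA library, which can be compared with the "Built with CUDA version" line in the log.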

Before this I was getting various CUDA errors, which I was able to fix. Lacking a config file of my own, I have been using one from the NAMD tutorial, but this is an installation question, not a tutorial question. The same config file runs fine with the non-CUDA build of NAMD. This is the output when I try to run the CUDA build:

mwilson_at_dailymike:~/namd-tutorial-files/1-2-sphere$ namd2 +isomalloc_sync +idlepoll +p4 +devices 0 ubq_ws_eq.conf
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 4 threads
Charm++> Using recursive bisection (scheme 3) for topology aware partitions
Converse/Charm++ Commit ID: v6.7.1-0-gbdf6a1b-namd-charm-6.7.1-build-2016-Nov-07-136676
Warning> Randomization of stack pointer is turned on in kernel.
Charm++> synchronizing isomalloc memory region...
[0] consolidated Isomalloc memory region: 0x440000000 - 0x7fa000000000 (133807104 megs)
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (10-way SMP).
Charm++> cpu topology info is gathered in 0.007 seconds.
Info: Built with CUDA version 6050
Pe 1 physical rank 1 will use CUDA device of pe 2
Pe 0 physical rank 0 will use CUDA device of pe 2
Info: NAMD 2.12 for Linux-x86_64-multicore-CUDA
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60701 for multicore-linux64-iccstatic
Info: Built Wed Dec 21 11:34:15 CST 2016 by jim on harare.ks.uiuc.edu
Info: 1 NAMD 2.12 Linux-x86_64-multicore-CUDA 4 dailymike mwilson
Info: Running on 4 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.325034 s
Pe 3 physical rank 3 will use CUDA device of pe 2
Pe 2 physical rank 2 binding to CUDA device 0 on dailymike: 'Tesla M10' Mem: 8127MB Rev: 5.0
CkLoopLib is used in SMP with a simple dynamic scheduling (converse-level notification) but not using node-level queue
Info: 21.6836 MB of memory in use based on /proc/self/stat
Info: Configuration file is ubq_ws_eq.conf
Info: Working in the current directory /home/nmrbox/mwilson/namd-tutorial-files/1-2-sphere
TCL: Suspending until startup complete.
Info: SIMULATION PARAMETERS:
Info: TIMESTEP 2
Info: NUMBER OF STEPS 0
Info: STEPS PER CYCLE 10
Info: LOAD BALANCER Centralized
Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
Info: LDB PERIOD 2000 steps
Info: FIRST LDB TIMESTEP 50
Info: LAST LDB TIMESTEP -1
Info: LDB BACKGROUND SCALING 1
Info: HOM BACKGROUND SCALING 1
Info: MIN ATOMS PER PATCH 40
Info: INITIAL TEMPERATURE 310
Info: CENTER OF MASS MOVING INITIALLY? NO
Info: DIELECTRIC 1
Info: EXCLUDE SCALED ONE-FOUR
Info: 1-4 ELECTROSTATICS SCALED BY 1
Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
Info: DCD FILENAME ubq_ws_eq.dcd
Info: DCD FREQUENCY 250
Info: DCD FIRST STEP 250
Info: NO EXTENDED SYSTEM TRAJECTORY OUTPUT
Info: NO VELOCITY DCD OUTPUT
Info: NO FORCE DCD OUTPUT
Info: OUTPUT FILENAME ubq_ws_eq
Info: BINARY OUTPUT FILES WILL BE USED
Info: RESTART FILENAME ubq_ws_eq.restart
Info: RESTART FREQUENCY 500
Info: BINARY RESTART FILES WILL BE USED
Info: SWITCHING ACTIVE
Info: SWITCHING ON 10
Info: SWITCHING OFF 12
Info: PAIRLIST DISTANCE 14
Info: PAIRLIST SHRINK RATE 0.01
Info: PAIRLIST GROW RATE 0.01
Info: PAIRLIST TRIGGER 0.3
Info: PAIRLISTS PER CYCLE 2
Info: PAIRLISTS ENABLED
Info: MARGIN 0
Info: HYDROGEN GROUP CUTOFF 2.5
Info: PATCH DIMENSION 16.5
Info: ENERGY OUTPUT STEPS 100
Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
Info: TIMING OUTPUT STEPS 1000
Info: SPHERICAL BOUNDARY CONDITIONS ACTIVE
Info: RADIUS #1 26
Info: FORCE CONSTANT #1 10
Info: EXPONENT #1 2
Info: SPHERE BOUNDARY CENTER(30.3082, 28.805, 15.354)
Info: LANGEVIN DYNAMICS ACTIVE
Info: LANGEVIN TEMPERATURE 310
Info: LANGEVIN USING BBK INTEGRATOR
Info: LANGEVIN DAMPING COEFFICIENT IS 1 INVERSE PS
Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
Info: MULTILEVEL SUMMATION METHOD (MSM) FOR ELECTROSTATICS ACTIVE
Info: MSM WITH C1 CUBIC INTERPOLATION AND C2 TAYLOR SPLITTING
Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 2
Info: USING VERLET I (r-RESPA) MTS SCHEME.
Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
Info: RIGID BONDS TO HYDROGEN : ALL
Info: ERROR TOLERANCE : 1e-08
Info: MAX ITERATIONS : 100
Info: RIGID WATER USING SETTLE ALGORITHM
Info: RANDOM NUMBER SEED 1500400344
Info: USE HYDROGEN BONDS? NO
Info: COORDINATE PDB ../common/ubq_ws.pdb
Info: STRUCTURE FILE ../common/ubq_ws.psf
Info: PARAMETER file: CHARMM format!
Info: PARAMETERS ../common/par_all27_prot_lipid.inp
Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
Info: SUMMARY OF PARAMETERS:
Info: 180 BONDS
Info: 447 ANGLES
Info: 566 DIHEDRAL
Info: 46 IMPROPER
Info: 6 CROSSTERM
Info: 119 VDW
Info: 0 VDW_PAIRS
Info: 0 NBTHOLE_PAIRS
Info: TIME FOR READING PSF FILE: 0.0478668
Info: Reading pdb file ../common/ubq_ws.pdb
Info: TIME FOR READING PDB FILE: 0.00912404
Info:
Info: ****************************
Info: STRUCTURE SUMMARY:
Info: 6682 ATOMS
Info: 4871 BONDS
Info: 4074 ANGLES
Info: 3293 DIHEDRALS
Info: 204 IMPROPERS
Info: 74 CROSSTERMS
Info: 0 EXCLUSIONS
Info: 6080 RIGID BONDS
Info: 13966 DEGREES OF FREEDOM
Info: 2419 HYDROGEN GROUPS
Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
Info: 2419 MIGRATION GROUPS
Info: 4 ATOMS IN LARGEST MIGRATION GROUP
Info: TOTAL MASS = 41298.8 amu
Info: TOTAL CHARGE = 1.00955e-06 e
Info: *****************************
Info:
Info: Entering startup at 0.663526 s, 157.812 MB of memory in use
Info: Startup phase 0 took 6.19888e-05 s, 157.812 MB of memory in use
Info: ADDED 12209 IMPLICIT EXCLUSIONS
Info: Startup phase 1 took 0.00247598 s, 158.645 MB of memory in use
Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
Info: NONBONDED TABLE SIZE: 769 POINTS
Info: INCONSISTENCY IN FAST TABLE ENERGY VS FORCE: 0.000325096 AT 11.9556
Info: INCONSISTENCY IN SCOR TABLE ENERGY VS FORCE: 3.42196e-09 AT 0.0732877
Info: ABSOLUTE IMPRECISION IN VDWA TABLE ENERGY: 4.59334e-32 AT 11.9974
Info: RELATIVE IMPRECISION IN VDWA TABLE ENERGY: 7.4108e-17 AT 11.9974
Info: INCONSISTENCY IN VDWA TABLE ENERGY VS FORCE: 0.0040507 AT 0.251946
Info: ABSOLUTE IMPRECISION IN VDWB TABLE ENERGY: 1.53481e-26 AT 11.9974
Info: RELATIVE IMPRECISION IN VDWB TABLE ENERGY: 7.96691e-18 AT 11.9974
Info: INCONSISTENCY IN VDWB TABLE ENERGY VS FORCE: 0.00150189 AT 0.251946
Info: Startup phase 2 took 0.000729084 s, 159.418 MB of memory in use
Info: Startup phase 3 took 4.1008e-05 s, 159.418 MB of memory in use
Info: Startup phase 4 took 4.88758e-05 s, 159.418 MB of memory in use
Info: Startup phase 5 took 3.09944e-05 s, 159.418 MB of memory in use
Info: PATCH GRID IS 4 BY 4 BY 4
Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
Info: REMOVING COM VELOCITY 0.0105965 0.0210536 -0.0289361
Info: LARGEST PATCH (26) HAS 434 ATOMS
Info: TORUS A SIZE 4 USING 0
Info: TORUS B SIZE 1 USING 0
Info: TORUS C SIZE 1 USING 0
Info: TORUS MINIMAL MESH SIZE IS 1 BY 1 BY 1
Info: Placed 100% of base nodes on same physical node as patch
Info: Startup phase 6 took 0.00190306 s, 160.766 MB of memory in use
Info: Startup phase 7 took 4.88758e-05 s, 160.766 MB of memory in use
Info: Startup phase 8 took 0.000332117 s, 161.078 MB of memory in use
LDB: Central LB being created...
Info: Startup phase 9 took 0.000120878 s, 161.078 MB of memory in use
Info: CREATING 1012 COMPUTE OBJECTS
Info: Updated CUDA force table with 4096 elements.
Info: Updated CUDA LJ table with 119 x 119 elements.
Info: Found 223 unique exclusion lists needing 632 bytes
Info: useSync: 0 useProxySync: 0
Info: Startup phase 10 took 0.00586009 s, 163.273 MB of memory in use
Info: Startup phase 11 took 5.4121e-05 s, 163.273 MB of memory in use
Info: Startup phase 12 took 0.000556946 s, 163.781 MB of memory in use
Info: Finished startup at 0.67579 s, 163.781 MB of memory in use

TCL: Minimizing for 100 steps
Segmentation fault (core dumped)
mwilson_at_dailymike:~/namd-tutorial-files/1-2-sphere$

It is not clear to me why it dumped core, and I'm not sure where to look for more information. None of my searches of the NAMD-L archive and the NamdWiki turned up anything that explains the problem. Apologies if I overlooked something; everyone at NMRbox would greatly appreciate any advice that gets NAMD running on NMRbox with multicore and CUDA support.
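One common way to get more than a bare "Segmentation fault" is to run the same command under `gdb` in batch mode and print a backtrace of every thread when it crashes. This is a minimal sketch assuming `gdb` is installed; the function name `namd_backtrace` is hypothetical, and the arguments are simply the ones used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: run a program under gdb and dump all thread backtraces.
namd_backtrace() {
    # $1: full path to namd2; remaining arguments are passed through unchanged.
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found; install it first (for example: sudo apt-get install gdb)" >&2
        return 127
    fi
    if [ ! -x "$1" ]; then
        echo "not an executable file: $1" >&2
        return 1
    fi
    # -batch makes gdb exit after running the -ex commands in order:
    # start the program, then print a backtrace of every thread.
    gdb -batch -ex run -ex 'thread apply all bt' --args "$@"
}
```

Usage: `namd_backtrace "$(command -v namd2)" +isomalloc_sync +idlepoll +p4 +devices 0 ubq_ws_eq.conf 2>&1 | tee namd_gdb.log`. Release binaries may carry few debugging symbols, so the trace can be coarse, but frames are still labeled with the shared library they fall in, which may show whether the crash is inside a CUDA library or in `namd2` itself. To inspect the core file instead, `cat /proc/sys/kernel/core_pattern` shows where the kernel sends it; on Ubuntu this is often a pipe to apport rather than a file in the working directory.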

Thank you,

Michael Wilson

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2018 - 23:20:26 CST