Re: NAMD 2.12 multicore CUDA on Ubuntu 16.04 with NVIDIA Tesla M10 produces segmentation fault

From: Vermaas, Joshua (Joshua.Vermaas_at_nrel.gov)
Date: Tue Jul 18 2017 - 14:10:20 CDT

Hi Michael,

One thing I would try is installing a newer version from CVS. I have a
similar configuration (NVIDIA GPU + Ubuntu), and multilevel summation
(MSM) wasn't working for me on the GPU with 2.12, but it did work with a
CVS build from the end of March, after the developers were alerted
to the problem. One quick way to confirm this is your problem would be to
edit the configuration to turn off MSM and use either PME or cutoff
electrostatics instead. If that doesn't crash, a CVS build would be the way to go.

-Josh

On 07/18/2017 01:03 PM, Wilson,Michael wrote:
> Hello,
>
> I am on the development team for NMRbox (http://nmrbox.org), a cloud-based virtual machine loaded with NMR software. We would like to include NAMD with multicore and CUDA support on our platform, but I have been struggling to find the right combination of GPU and CUDA drivers. I now have compatible versions that allow graphical use of the GPU, but running NAMD with CUDA generates a segmentation fault with no meaningful error message. I am hopeful that someone on the list will recognize the problem and set me on the correct path to resolving it.
>
> Installation specifications:
>
> Physical Hardware:
> Dell PowerEdge R730, Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
> VMWare ESXi 6.0 Hypervisor with 72 logical processors
> NVIDIA Tesla M10 GPU
>
> Virtual Hardware:
> 10 CPUs
> 20 GB RAM
> NVIDIA Tesla M10 GPU (passthrough mode)
>
> OS:
> Ubuntu 16.04
>
> GPU Driver:
> NVIDIA-Linux-x86_64-367.92.run
>
> CUDA Driver:
> cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
>
> Environment variables:
>
> mwilson_at_dailymike:~/namd-tutorial-files/1-2-sphere$ echo $PATH
> /home/nmrbox/mwilson/NAMD_2.12_Linux-x86_64-multicore-CUDA:/usr/local/cuda-8.0/bin:/usr/software/bin:/usr/software/nmr-scripts:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/software/cns_solve_1.3/intel-x86_64bit-linux/bin:/usr/software/cns_solve_1.3/intel-x86_64bit-linux/utils:/usr/software/mddnmr/binCentOS64:/usr/software/mddnmr/com:/usr/software/nmrpipe/nmrbin.linux212_64:/usr/software/nmrpipe/com:/usr/software/nmrpipe/dynamo/tcl:/usr/software/redcraft/scripts:/usr/software/rosetta/main/source/bin:/usr/software/shifts-5.1/bin:/home/nmrbox/mwilson/bin
>
> mwilson_at_dailymike:~/namd-tutorial-files/1-2-sphere$ echo $LD_LIBRARY_PATH
> /home/nmrbox/mwilson/NAMD_2.12_Linux-x86_64-multicore-CUDA:/home/nmrbox/mwilson/NAMD_2.12_Linux-x86_64-multicore-CUDA/lib:.:/usr/local/cuda-8.0/lib64:/usr/software/modeller-9.17/lib/x86_64-intel8:/usr/software/nmrpipe/nmrbin.linux212_64/lib
>
>
> Prior to this point I had been getting various CUDA errors, but I was able to fix those. Lacking a config file of my own, I have been using one from the NAMD tutorial, but this is an installation question, not a tutorial question. The config file works fine with the non-CUDA build of NAMD. The following is the output when I attempt to run:
>
> mwilson_at_dailymike:~/namd-tutorial-files/1-2-sphere$ namd2 +isomalloc_sync +idlepoll +p4 +devices 0 ubq_ws_eq.conf
> Charm++: standalone mode (not using charmrun)
> Charm++> Running in Multicore mode: 4 threads
> Charm++> Using recursive bisection (scheme 3) for topology aware partitions
> Converse/Charm++ Commit ID: v6.7.1-0-gbdf6a1b-namd-charm-6.7.1-build-2016-Nov-07-136676
> Warning> Randomization of stack pointer is turned on in kernel.
> Charm++> synchronizing isomalloc memory region...
> [0] consolidated Isomalloc memory region: 0x440000000 - 0x7fa000000000 (133807104 megs)
> CharmLB> Load balancer assumes all CPUs are same.
> Charm++> Running on 1 unique compute nodes (10-way SMP).
> Charm++> cpu topology info is gathered in 0.007 seconds.
> Info: Built with CUDA version 6050
> Pe 1 physical rank 1 will use CUDA device of pe 2
> Pe 0 physical rank 0 will use CUDA device of pe 2
> Info: NAMD 2.12 for Linux-x86_64-multicore-CUDA
> Info:
> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> Info: for updates, documentation, and support information.
> Info:
> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
> Info: in all publications reporting results obtained with NAMD.
> Info:
> Info: Based on Charm++/Converse 60701 for multicore-linux64-iccstatic
> Info: Built Wed Dec 21 11:34:15 CST 2016 by jim on harare.ks.uiuc.edu
> Info: 1 NAMD 2.12 Linux-x86_64-multicore-CUDA 4 dailymike mwilson
> Info: Running on 4 processors, 1 nodes, 1 physical nodes.
> Info: CPU topology information available.
> Info: Charm++/Converse parallel runtime startup completed at 0.325034 s
> Pe 3 physical rank 3 will use CUDA device of pe 2
> Pe 2 physical rank 2 binding to CUDA device 0 on dailymike: 'Tesla M10' Mem: 8127MB Rev: 5.0
> CkLoopLib is used in SMP with a simple dynamic scheduling (converse-level notification) but not using node-level queue
> Info: 21.6836 MB of memory in use based on /proc/self/stat
> Info: Configuration file is ubq_ws_eq.conf
> Info: Working in the current directory /home/nmrbox/mwilson/namd-tutorial-files/1-2-sphere
> TCL: Suspending until startup complete.
> Info: SIMULATION PARAMETERS:
> Info: TIMESTEP 2
> Info: NUMBER OF STEPS 0
> Info: STEPS PER CYCLE 10
> Info: LOAD BALANCER Centralized
> Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
> Info: LDB PERIOD 2000 steps
> Info: FIRST LDB TIMESTEP 50
> Info: LAST LDB TIMESTEP -1
> Info: LDB BACKGROUND SCALING 1
> Info: HOM BACKGROUND SCALING 1
> Info: MIN ATOMS PER PATCH 40
> Info: INITIAL TEMPERATURE 310
> Info: CENTER OF MASS MOVING INITIALLY? NO
> Info: DIELECTRIC 1
> Info: EXCLUDE SCALED ONE-FOUR
> Info: 1-4 ELECTROSTATICS SCALED BY 1
> Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
> Info: DCD FILENAME ubq_ws_eq.dcd
> Info: DCD FREQUENCY 250
> Info: DCD FIRST STEP 250
> Info: NO EXTENDED SYSTEM TRAJECTORY OUTPUT
> Info: NO VELOCITY DCD OUTPUT
> Info: NO FORCE DCD OUTPUT
> Info: OUTPUT FILENAME ubq_ws_eq
> Info: BINARY OUTPUT FILES WILL BE USED
> Info: RESTART FILENAME ubq_ws_eq.restart
> Info: RESTART FREQUENCY 500
> Info: BINARY RESTART FILES WILL BE USED
> Info: SWITCHING ACTIVE
> Info: SWITCHING ON 10
> Info: SWITCHING OFF 12
> Info: PAIRLIST DISTANCE 14
> Info: PAIRLIST SHRINK RATE 0.01
> Info: PAIRLIST GROW RATE 0.01
> Info: PAIRLIST TRIGGER 0.3
> Info: PAIRLISTS PER CYCLE 2
> Info: PAIRLISTS ENABLED
> Info: MARGIN 0
> Info: HYDROGEN GROUP CUTOFF 2.5
> Info: PATCH DIMENSION 16.5
> Info: ENERGY OUTPUT STEPS 100
> Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
> Info: TIMING OUTPUT STEPS 1000
> Info: SPHERICAL BOUNDARY CONDITIONS ACTIVE
> Info: RADIUS #1 26
> Info: FORCE CONSTANT #1 10
> Info: EXPONENT #1 2
> Info: SPHERE BOUNDARY CENTER(30.3082, 28.805, 15.354)
> Info: LANGEVIN DYNAMICS ACTIVE
> Info: LANGEVIN TEMPERATURE 310
> Info: LANGEVIN USING BBK INTEGRATOR
> Info: LANGEVIN DAMPING COEFFICIENT IS 1 INVERSE PS
> Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
> Info: MULTILEVEL SUMMATION METHOD (MSM) FOR ELECTROSTATICS ACTIVE
> Info: MSM WITH C1 CUBIC INTERPOLATION AND C2 TAYLOR SPLITTING
> Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 2
> Info: USING VERLET I (r-RESPA) MTS SCHEME.
> Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
> Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
> Info: RIGID BONDS TO HYDROGEN : ALL
> Info: ERROR TOLERANCE : 1e-08
> Info: MAX ITERATIONS : 100
> Info: RIGID WATER USING SETTLE ALGORITHM
> Info: RANDOM NUMBER SEED 1500400344
> Info: USE HYDROGEN BONDS? NO
> Info: COORDINATE PDB ../common/ubq_ws.pdb
> Info: STRUCTURE FILE ../common/ubq_ws.psf
> Info: PARAMETER file: CHARMM format!
> Info: PARAMETERS ../common/par_all27_prot_lipid.inp
> Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
> Info: SUMMARY OF PARAMETERS:
> Info: 180 BONDS
> Info: 447 ANGLES
> Info: 566 DIHEDRAL
> Info: 46 IMPROPER
> Info: 6 CROSSTERM
> Info: 119 VDW
> Info: 0 VDW_PAIRS
> Info: 0 NBTHOLE_PAIRS
> Info: TIME FOR READING PSF FILE: 0.0478668
> Info: Reading pdb file ../common/ubq_ws.pdb
> Info: TIME FOR READING PDB FILE: 0.00912404
> Info:
> Info: ****************************
> Info: STRUCTURE SUMMARY:
> Info: 6682 ATOMS
> Info: 4871 BONDS
> Info: 4074 ANGLES
> Info: 3293 DIHEDRALS
> Info: 204 IMPROPERS
> Info: 74 CROSSTERMS
> Info: 0 EXCLUSIONS
> Info: 6080 RIGID BONDS
> Info: 13966 DEGREES OF FREEDOM
> Info: 2419 HYDROGEN GROUPS
> Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
> Info: 2419 MIGRATION GROUPS
> Info: 4 ATOMS IN LARGEST MIGRATION GROUP
> Info: TOTAL MASS = 41298.8 amu
> Info: TOTAL CHARGE = 1.00955e-06 e
> Info: *****************************
> Info:
> Info: Entering startup at 0.663526 s, 157.812 MB of memory in use
> Info: Startup phase 0 took 6.19888e-05 s, 157.812 MB of memory in use
> Info: ADDED 12209 IMPLICIT EXCLUSIONS
> Info: Startup phase 1 took 0.00247598 s, 158.645 MB of memory in use
> Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
> Info: NONBONDED TABLE SIZE: 769 POINTS
> Info: INCONSISTENCY IN FAST TABLE ENERGY VS FORCE: 0.000325096 AT 11.9556
> Info: INCONSISTENCY IN SCOR TABLE ENERGY VS FORCE: 3.42196e-09 AT 0.0732877
> Info: ABSOLUTE IMPRECISION IN VDWA TABLE ENERGY: 4.59334e-32 AT 11.9974
> Info: RELATIVE IMPRECISION IN VDWA TABLE ENERGY: 7.4108e-17 AT 11.9974
> Info: INCONSISTENCY IN VDWA TABLE ENERGY VS FORCE: 0.0040507 AT 0.251946
> Info: ABSOLUTE IMPRECISION IN VDWB TABLE ENERGY: 1.53481e-26 AT 11.9974
> Info: RELATIVE IMPRECISION IN VDWB TABLE ENERGY: 7.96691e-18 AT 11.9974
> Info: INCONSISTENCY IN VDWB TABLE ENERGY VS FORCE: 0.00150189 AT 0.251946
> Info: Startup phase 2 took 0.000729084 s, 159.418 MB of memory in use
> Info: Startup phase 3 took 4.1008e-05 s, 159.418 MB of memory in use
> Info: Startup phase 4 took 4.88758e-05 s, 159.418 MB of memory in use
> Info: Startup phase 5 took 3.09944e-05 s, 159.418 MB of memory in use
> Info: PATCH GRID IS 4 BY 4 BY 4
> Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
> Info: REMOVING COM VELOCITY 0.0105965 0.0210536 -0.0289361
> Info: LARGEST PATCH (26) HAS 434 ATOMS
> Info: TORUS A SIZE 4 USING 0
> Info: TORUS B SIZE 1 USING 0
> Info: TORUS C SIZE 1 USING 0
> Info: TORUS MINIMAL MESH SIZE IS 1 BY 1 BY 1
> Info: Placed 100% of base nodes on same physical node as patch
> Info: Startup phase 6 took 0.00190306 s, 160.766 MB of memory in use
> Info: Startup phase 7 took 4.88758e-05 s, 160.766 MB of memory in use
> Info: Startup phase 8 took 0.000332117 s, 161.078 MB of memory in use
> LDB: Central LB being created...
> Info: Startup phase 9 took 0.000120878 s, 161.078 MB of memory in use
> Info: CREATING 1012 COMPUTE OBJECTS
> Info: Updated CUDA force table with 4096 elements.
> Info: Updated CUDA LJ table with 119 x 119 elements.
> Info: Found 223 unique exclusion lists needing 632 bytes
> Info: useSync: 0 useProxySync: 0
> Info: Startup phase 10 took 0.00586009 s, 163.273 MB of memory in use
> Info: Startup phase 11 took 5.4121e-05 s, 163.273 MB of memory in use
> Info: Startup phase 12 took 0.000556946 s, 163.781 MB of memory in use
> Info: Finished startup at 0.67579 s, 163.781 MB of memory in use
>
> TCL: Minimizing for 100 steps
> Segmentation fault (core dumped)
> mwilson_at_dailymike:~/namd-tutorial-files/1-2-sphere$
>
>
> It is not clear to me why it dumped core, and I'm not certain where to look for more information. None of my searches in the NAMD-L archive or the NamdWiki has turned up anything that might explain the problem. Apologies if I overlooked something, but everyone at NMRbox would greatly appreciate advice that leads to a working NAMD with multicore and CUDA support on NMRbox.
>
> Thank you,
>
> Michael Wilson
>

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2018 - 23:20:26 CST