Re: NAMD 2.12 multicore CUDA on Ubuntu 16.04 with NVIDIA Tesla M10 produces segmentation fault

From: Wilson,Michael (michawilson_at_uchc.edu)
Date: Tue Jul 18 2017 - 15:18:28 CDT

That worked! Thank you!

> On Jul 18, 2017, at 3:10 PM, Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov> wrote:
>
> Hi Michael,
>
> One thing I would try is to install a newer version from CVS. I have a
> similar configuration (NVIDIA GPU+Ubuntu), and multilevel summation
> (MSM) wasn't working for me on the GPU with 2.12, but was working based
> on a CVS build from the end of March, when the developers were alerted
> to the problem. One quick way to confirm this would be to edit the
> configuration to turn off MSM and use either PME or cutoff
> electrostatics. If that doesn't crash, CVS versions would be the way to go.
>
> -Josh
>
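The check suggested above (turn MSM off, rerun, see whether the crash goes away) could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the config enables MSM with a line of the form `MSM on`, that no PME line is present so plain cutoff electrostatics remain, and that NAMD tolerates the remaining MSM-related lines once MSM is off. The helper names `disable_msm` and `log_uses_msm` and the output file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names and file names below are illustrative assumptions.

# Read a NAMD config on stdin and write a copy on stdout with "MSM on"
# switched to "MSM off". Other lines (e.g. MSMGridSpacing) are left untouched.
disable_msm() {
    sed -E 's/^([[:space:]]*)[Mm][Ss][Mm]([[:space:]]+)[Oo][Nn]([[:space:]]|;|$)/\1MSM\2off\3/'
}

# Read a NAMD log on stdin; succeed if it reports MSM as active, using the
# exact message that appears in the startup output quoted in this thread.
log_uses_msm() {
    grep -qF 'MULTILEVEL SUMMATION METHOD (MSM) FOR ELECTROSTATICS ACTIVE'
}

# Possible usage (left commented out so the helpers can be sourced safely):
#   disable_msm < ubq_ws_eq.conf > ubq_ws_eq_nomsm.conf
#   namd2 +p4 +devices 0 ubq_ws_eq_nomsm.conf > nomsm.log
#   log_uses_msm < nomsm.log && echo "MSM is still active"
```

If the run with MSM disabled completes minimization without a segmentation fault, that points at the MSM code path on the GPU rather than at the driver or CUDA installation.
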
> On 07/18/2017 01:03 PM, Wilson,Michael wrote:
>> Hello,
>>
>> I am working on the development team for NMRbox (http://nmrbox.org). NMRbox is a cloud-based virtual machine loaded with NMR software. We would like to include NAMD with multicore and CUDA support on our platform. I have been struggling to get the right combination of GPU and CUDA drivers. I now have compatible versions that allow the graphical use of the GPU, but running NAMD with CUDA is generating a segmentation fault without meaningful error messages. I am hopeful that someone on the list will be able to help recognize the problem and set me on the correct path to resolving the issue.
>>
>> Installation specifications:
>>
>> Physical Hardware:
>> Dell PowerEdge R730, Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
>> VMWare ESXi 6.0 Hypervisor with 72 logical processors
>> NVIDIA Tesla M10 GPU
>>
>> Virtual Hardware:
>> 10 CPUs
>> 20 GB RAM
>> NVIDIA Tesla M10 GPU (passthrough mode)
>>
>> OS:
>> Ubuntu 16.04
>>
>> GPU Driver:
>> NVIDIA-Linux-x86_64-367.92.run
>>
>> CUDA Driver:
>> cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
>>
>> Environment variables:
>>
>> mwilson@dailymike:~/namd-tutorial-files/1-2-sphere$ echo $PATH
>> /home/nmrbox/mwilson/NAMD_2.12_Linux-x86_64-multicore-CUDA:/usr/local/cuda-8.0/bin:/usr/software/bin:/usr/software/nmr-scripts:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/software/cns_solve_1.3/intel-x86_64bit-linux/bin:/usr/software/cns_solve_1.3/intel-x86_64bit-linux/utils:/usr/software/mddnmr/binCentOS64:/usr/software/mddnmr/com:/usr/software/nmrpipe/nmrbin.linux212_64:/usr/software/nmrpipe/com:/usr/software/nmrpipe/dynamo/tcl:/usr/software/redcraft/scripts:/usr/software/rosetta/main/source/bin:/usr/software/shifts-5.1/bin:/home/nmrbox/mwilson/bin
>>
>> mwilson@dailymike:~/namd-tutorial-files/1-2-sphere$ echo $LD_LIBRARY_PATH
>> /home/nmrbox/mwilson/NAMD_2.12_Linux-x86_64-multicore-CUDA:/home/nmrbox/mwilson/NAMD_2.12_Linux-x86_64-multicore-CUDA/lib:.:/usr/local/cuda-8.0/lib64:/usr/software/modeller-9.17/lib/x86_64-intel8:/usr/software/nmrpipe/nmrbin.linux212_64/lib
>>
>>
>> Prior to this point I had been getting various CUDA errors, but I was able to fix those. For want of a config file, I have been using one that I found in the tutorial, but this is an installation question, not a tutorial question. The config file works fine with the non-CUDA version of NAMD. The following is the output when I attempt to run:
>>
>> mwilson@dailymike:~/namd-tutorial-files/1-2-sphere$ namd2 +isomalloc_sync +idlepoll +p4 +devices 0 ubq_ws_eq.conf
>> Charm++: standalone mode (not using charmrun)
>> Charm++> Running in Multicore mode: 4 threads
>> Charm++> Using recursive bisection (scheme 3) for topology aware partitions
>> Converse/Charm++ Commit ID: v6.7.1-0-gbdf6a1b-namd-charm-6.7.1-build-2016-Nov-07-136676
>> Warning> Randomization of stack pointer is turned on in kernel.
>> Charm++> synchronizing isomalloc memory region...
>> [0] consolidated Isomalloc memory region: 0x440000000 - 0x7fa000000000 (133807104 megs)
>> CharmLB> Load balancer assumes all CPUs are same.
>> Charm++> Running on 1 unique compute nodes (10-way SMP).
>> Charm++> cpu topology info is gathered in 0.007 seconds.
>> Info: Built with CUDA version 6050
>> Pe 1 physical rank 1 will use CUDA device of pe 2
>> Pe 0 physical rank 0 will use CUDA device of pe 2
>> Info: NAMD 2.12 for Linux-x86_64-multicore-CUDA
>> Info:
>> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
>> Info: for updates, documentation, and support information.
>> Info:
>> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
>> Info: in all publications reporting results obtained with NAMD.
>> Info:
>> Info: Based on Charm++/Converse 60701 for multicore-linux64-iccstatic
>> Info: Built Wed Dec 21 11:34:15 CST 2016 by jim on harare.ks.uiuc.edu
>> Info: 1 NAMD 2.12 Linux-x86_64-multicore-CUDA 4 dailymike mwilson
>> Info: Running on 4 processors, 1 nodes, 1 physical nodes.
>> Info: CPU topology information available.
>> Info: Charm++/Converse parallel runtime startup completed at 0.325034 s
>> Pe 3 physical rank 3 will use CUDA device of pe 2
>> Pe 2 physical rank 2 binding to CUDA device 0 on dailymike: 'Tesla M10' Mem: 8127MB Rev: 5.0
>> CkLoopLib is used in SMP with a simple dynamic scheduling (converse-level notification) but not using node-level queue
>> Info: 21.6836 MB of memory in use based on /proc/self/stat
>> Info: Configuration file is ubq_ws_eq.conf
>> Info: Working in the current directory /home/nmrbox/mwilson/namd-tutorial-files/1-2-sphere
>> TCL: Suspending until startup complete.
>> Info: SIMULATION PARAMETERS:
>> Info: TIMESTEP 2
>> Info: NUMBER OF STEPS 0
>> Info: STEPS PER CYCLE 10
>> Info: LOAD BALANCER Centralized
>> Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
>> Info: LDB PERIOD 2000 steps
>> Info: FIRST LDB TIMESTEP 50
>> Info: LAST LDB TIMESTEP -1
>> Info: LDB BACKGROUND SCALING 1
>> Info: HOM BACKGROUND SCALING 1
>> Info: MIN ATOMS PER PATCH 40
>> Info: INITIAL TEMPERATURE 310
>> Info: CENTER OF MASS MOVING INITIALLY? NO
>> Info: DIELECTRIC 1
>> Info: EXCLUDE SCALED ONE-FOUR
>> Info: 1-4 ELECTROSTATICS SCALED BY 1
>> Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
>> Info: DCD FILENAME ubq_ws_eq.dcd
>> Info: DCD FREQUENCY 250
>> Info: DCD FIRST STEP 250
>> Info: NO EXTENDED SYSTEM TRAJECTORY OUTPUT
>> Info: NO VELOCITY DCD OUTPUT
>> Info: NO FORCE DCD OUTPUT
>> Info: OUTPUT FILENAME ubq_ws_eq
>> Info: BINARY OUTPUT FILES WILL BE USED
>> Info: RESTART FILENAME ubq_ws_eq.restart
>> Info: RESTART FREQUENCY 500
>> Info: BINARY RESTART FILES WILL BE USED
>> Info: SWITCHING ACTIVE
>> Info: SWITCHING ON 10
>> Info: SWITCHING OFF 12
>> Info: PAIRLIST DISTANCE 14
>> Info: PAIRLIST SHRINK RATE 0.01
>> Info: PAIRLIST GROW RATE 0.01
>> Info: PAIRLIST TRIGGER 0.3
>> Info: PAIRLISTS PER CYCLE 2
>> Info: PAIRLISTS ENABLED
>> Info: MARGIN 0
>> Info: HYDROGEN GROUP CUTOFF 2.5
>> Info: PATCH DIMENSION 16.5
>> Info: ENERGY OUTPUT STEPS 100
>> Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
>> Info: TIMING OUTPUT STEPS 1000
>> Info: SPHERICAL BOUNDARY CONDITIONS ACTIVE
>> Info: RADIUS #1 26
>> Info: FORCE CONSTANT #1 10
>> Info: EXPONENT #1 2
>> Info: SPHERE BOUNDARY CENTER(30.3082, 28.805, 15.354)
>> Info: LANGEVIN DYNAMICS ACTIVE
>> Info: LANGEVIN TEMPERATURE 310
>> Info: LANGEVIN USING BBK INTEGRATOR
>> Info: LANGEVIN DAMPING COEFFICIENT IS 1 INVERSE PS
>> Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
>> Info: MULTILEVEL SUMMATION METHOD (MSM) FOR ELECTROSTATICS ACTIVE
>> Info: MSM WITH C1 CUBIC INTERPOLATION AND C2 TAYLOR SPLITTING
>> Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 2
>> Info: USING VERLET I (r-RESPA) MTS SCHEME.
>> Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
>> Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
>> Info: RIGID BONDS TO HYDROGEN : ALL
>> Info: ERROR TOLERANCE : 1e-08
>> Info: MAX ITERATIONS : 100
>> Info: RIGID WATER USING SETTLE ALGORITHM
>> Info: RANDOM NUMBER SEED 1500400344
>> Info: USE HYDROGEN BONDS? NO
>> Info: COORDINATE PDB ../common/ubq_ws.pdb
>> Info: STRUCTURE FILE ../common/ubq_ws.psf
>> Info: PARAMETER file: CHARMM format!
>> Info: PARAMETERS ../common/par_all27_prot_lipid.inp
>> Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
>> Info: SUMMARY OF PARAMETERS:
>> Info: 180 BONDS
>> Info: 447 ANGLES
>> Info: 566 DIHEDRAL
>> Info: 46 IMPROPER
>> Info: 6 CROSSTERM
>> Info: 119 VDW
>> Info: 0 VDW_PAIRS
>> Info: 0 NBTHOLE_PAIRS
>> Info: TIME FOR READING PSF FILE: 0.0478668
>> Info: Reading pdb file ../common/ubq_ws.pdb
>> Info: TIME FOR READING PDB FILE: 0.00912404
>> Info:
>> Info: ****************************
>> Info: STRUCTURE SUMMARY:
>> Info: 6682 ATOMS
>> Info: 4871 BONDS
>> Info: 4074 ANGLES
>> Info: 3293 DIHEDRALS
>> Info: 204 IMPROPERS
>> Info: 74 CROSSTERMS
>> Info: 0 EXCLUSIONS
>> Info: 6080 RIGID BONDS
>> Info: 13966 DEGREES OF FREEDOM
>> Info: 2419 HYDROGEN GROUPS
>> Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
>> Info: 2419 MIGRATION GROUPS
>> Info: 4 ATOMS IN LARGEST MIGRATION GROUP
>> Info: TOTAL MASS = 41298.8 amu
>> Info: TOTAL CHARGE = 1.00955e-06 e
>> Info: *****************************
>> Info:
>> Info: Entering startup at 0.663526 s, 157.812 MB of memory in use
>> Info: Startup phase 0 took 6.19888e-05 s, 157.812 MB of memory in use
>> Info: ADDED 12209 IMPLICIT EXCLUSIONS
>> Info: Startup phase 1 took 0.00247598 s, 158.645 MB of memory in use
>> Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
>> Info: NONBONDED TABLE SIZE: 769 POINTS
>> Info: INCONSISTENCY IN FAST TABLE ENERGY VS FORCE: 0.000325096 AT 11.9556
>> Info: INCONSISTENCY IN SCOR TABLE ENERGY VS FORCE: 3.42196e-09 AT 0.0732877
>> Info: ABSOLUTE IMPRECISION IN VDWA TABLE ENERGY: 4.59334e-32 AT 11.9974
>> Info: RELATIVE IMPRECISION IN VDWA TABLE ENERGY: 7.4108e-17 AT 11.9974
>> Info: INCONSISTENCY IN VDWA TABLE ENERGY VS FORCE: 0.0040507 AT 0.251946
>> Info: ABSOLUTE IMPRECISION IN VDWB TABLE ENERGY: 1.53481e-26 AT 11.9974
>> Info: RELATIVE IMPRECISION IN VDWB TABLE ENERGY: 7.96691e-18 AT 11.9974
>> Info: INCONSISTENCY IN VDWB TABLE ENERGY VS FORCE: 0.00150189 AT 0.251946
>> Info: Startup phase 2 took 0.000729084 s, 159.418 MB of memory in use
>> Info: Startup phase 3 took 4.1008e-05 s, 159.418 MB of memory in use
>> Info: Startup phase 4 took 4.88758e-05 s, 159.418 MB of memory in use
>> Info: Startup phase 5 took 3.09944e-05 s, 159.418 MB of memory in use
>> Info: PATCH GRID IS 4 BY 4 BY 4
>> Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
>> Info: REMOVING COM VELOCITY 0.0105965 0.0210536 -0.0289361
>> Info: LARGEST PATCH (26) HAS 434 ATOMS
>> Info: TORUS A SIZE 4 USING 0
>> Info: TORUS B SIZE 1 USING 0
>> Info: TORUS C SIZE 1 USING 0
>> Info: TORUS MINIMAL MESH SIZE IS 1 BY 1 BY 1
>> Info: Placed 100% of base nodes on same physical node as patch
>> Info: Startup phase 6 took 0.00190306 s, 160.766 MB of memory in use
>> Info: Startup phase 7 took 4.88758e-05 s, 160.766 MB of memory in use
>> Info: Startup phase 8 took 0.000332117 s, 161.078 MB of memory in use
>> LDB: Central LB being created...
>> Info: Startup phase 9 took 0.000120878 s, 161.078 MB of memory in use
>> Info: CREATING 1012 COMPUTE OBJECTS
>> Info: Updated CUDA force table with 4096 elements.
>> Info: Updated CUDA LJ table with 119 x 119 elements.
>> Info: Found 223 unique exclusion lists needing 632 bytes
>> Info: useSync: 0 useProxySync: 0
>> Info: Startup phase 10 took 0.00586009 s, 163.273 MB of memory in use
>> Info: Startup phase 11 took 5.4121e-05 s, 163.273 MB of memory in use
>> Info: Startup phase 12 took 0.000556946 s, 163.781 MB of memory in use
>> Info: Finished startup at 0.67579 s, 163.781 MB of memory in use
>>
>> TCL: Minimizing for 100 steps
>> Segmentation fault (core dumped)
>> mwilson@dailymike:~/namd-tutorial-files/1-2-sphere$
>>
>>
>> It is not clear to me why it dumped core, and I’m not certain where to look to find more information. All of my searches in the NAMD-L archive and NamdWiki have not shown me anything that might explain my problem. Apologies if I somehow overlooked anything, but everyone at NMRbox will greatly appreciate advice that leads to a resolution that allows NAMD to function on NMRbox with multicore and CUDA support.
>>
>> Thank you,
>>
>> Michael Wilson
>>
>
>

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2018 - 23:20:26 CST