Re: Segmentation fault for the tutorial

From: Vermaas, Joshua (Joshua.Vermaas_at_nrel.gov)
Date: Tue Mar 06 2018 - 14:27:43 CST

Hi Mahmood,

I've built it with CUDA myself before, so I know it works for some
combination of compiler and runtime options. When the CPU build works
but the GPU build doesn't, the usual first thing to check is the output
of nvidia-smi on the machine: make sure that the driver version matches
the runtime version, and that both are recent enough for the CUDA
toolkit you built against.
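These checks can be scripted. Below is a minimal bash sketch, assuming nvidia-smi and nvcc are on the PATH. The function names are hypothetical. decode_cuda_version assumes that the number NAMD prints in a line such as "Info: Built with CUDA version 9000" uses CUDA's usual major*1000 + minor*10 encoding.

```shell
#!/usr/bin/env bash
# Sketch: collect the version numbers that have to agree before a CUDA
# build of NAMD can run. Function names are hypothetical helpers.

# Driver version reported by nvidia-smi, e.g. "384.111" (first GPU only).
driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Extract "9.0" from nvcc output such as
# "Cuda compilation tools, release 9.0, V9.0.176" (reads stdin).
parse_nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Toolkit version of the nvcc found on the PATH.
toolkit_version() {
    nvcc --version | parse_nvcc_release
}

# Assumed encoding: major*1000 + minor*10, so 9000 -> 9.0 and 9010 -> 9.1.
decode_cuda_version() {
    local v=$1
    echo "$(( v / 1000 )).$(( (v % 1000) / 10 ))"
}

# Find "Built with CUDA version NNNN" in a NAMD log file and decode it.
namd_built_cuda() {
    local log=$1 raw
    raw=$(sed -n 's/.*Built with CUDA version \([0-9][0-9]*\).*/\1/p' "$log" | head -n 1)
    [ -n "$raw" ] && decode_cuda_version "$raw"
}
```

For example, saving the NAMD output to a file (run.log is a hypothetical name) and running namd_built_cuda run.log next to toolkit_version and driver_version puts all three numbers side by side.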

This is what the header looks like for me:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                 Driver Version: 384.111                  |
|-------------------------------+----------------------+----------------------+

This confirms that the driver (right) matches the installed runtime
(left). For me, this driver version should be enough to run CUDA 9.0,
since NVIDIA's release notes list 384.81 as the minimum Linux driver
version for that toolkit; newer toolkits such as 9.1 require newer
drivers.
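The header check described above can be sketched in bash as follows. This assumes GNU sort (for -V) and a header line in the format shown. The function names are hypothetical. The minimum driver version is deliberately a parameter, so take it from NVIDIA's release notes for the toolkit you built against.

```shell
#!/usr/bin/env bash
# Sketch: check that the nvidia-smi version matches the driver version
# and that the driver is at least a given minimum.

# Succeeds if dotted version $1 >= $2 (uses GNU sort -V ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage: check_smi_header "<header line>" <min_driver>
check_smi_header() {
    local line=$1 min=$2 smi drv
    smi=$(printf '%s\n' "$line" | sed -n 's/.*NVIDIA-SMI \([0-9.]*\).*/\1/p')
    drv=$(printf '%s\n' "$line" | sed -n 's/.*Driver Version: \([0-9.]*\).*/\1/p')
    if [ -z "$smi" ] || [ -z "$drv" ]; then
        echo "could not parse header" >&2
        return 2
    fi
    # Left and right fields of the header should agree.
    if [ "$smi" != "$drv" ]; then
        echo "mismatch: nvidia-smi $smi vs driver $drv" >&2
        return 1
    fi
    # The driver must be at least as new as the toolkit requires.
    if ! version_ge "$drv" "$min"; then
        echo "driver $drv is older than required $min" >&2
        return 1
    fi
    echo "ok: driver $drv >= $min"
}
```

For example, check_smi_header "$(nvidia-smi | grep 'NVIDIA-SMI')" "$MIN_DRIVER", where MIN_DRIVER is set from the release notes. The function prints an "ok" line on success and returns non-zero on a mismatch or a driver that is too old.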

-Josh

On 03/06/2018 01:03 PM, Mahmood Naderan wrote:
> Update:
> In the previous post, I built NAMD with CUDA using the following command:
>
> ./config Linux-x86_64-g++ --charm-arch multicore-linux64 \
>     --with-cuda --cuda-prefix /usr/local/cuda-9.0 \
>     --with-tcl --tcl-prefix /home/mahmood/namd-2.12/NAMD_2.12_Source/tcl
>
> That build failed, as I said. Then I dropped the CUDA options and ran
>
> ./config Linux-x86_64-g++ --charm-arch multicore-linux64 \
>     --with-tcl --tcl-prefix /home/mahmood/namd-2.12/NAMD_2.12_Source/tcl
>
> Now the following command
>
> namd2 ubq_ws_eq.conf
>
> works without any error. So it seems there is a problem with the
> CUDA option. Any thoughts?
> Regards,
> Mahmood
>
> On Tue, Mar 6, 2018 at 10:36 PM, Mahmood Naderan <mahmood.nt_at_gmail.com> wrote:
>> Hi,
>> I followed the tutorial up to [1]. I also downloaded the files and
>> built the necessary files with VMD according to that tutorial. namd2 is
>> also installed with CUDA and tcl-threaded, and the installation seems
>> fine. However, the following command fails.
>>
>> How can I debug this further?
>>
>>
>> mahmood_at_orca:1-1-build$ namd2 ubq_ws_eq.conf
>> Charm++: standalone mode (not using charmrun)
>> Charm++> Running in Multicore mode: 1 threads
>> Charm++> Using recursive bisection (scheme 3) for topology aware partitions
>> Converse/Charm++ Commit ID: v6.7.1-0-gbdf6a1b-namd-charm-6.7.1-build-2016-Nov-07-136676
>> Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'.
>> CharmLB> Load balancer assumes all CPUs are same.
>> Charm++> Running on 1 unique compute nodes (16-way SMP).
>> Charm++> cpu topology info is gathered in 0.000 seconds.
>> Info: Built with CUDA version 9000
>> Did not find +devices i,j,k,... argument, using all
>> Pe 0 physical rank 0 binding to CUDA device 0 on orca: 'Quadro M2000' Mem: 4035MB Rev: 5.2
>> Info: NAMD 2.12 for Linux-x86_64-multicore-CUDA
>> Info:
>> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
>> Info: for updates, documentation, and support information.
>> Info:
>> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
>> Info: in all publications reporting results obtained with NAMD.
>> Info:
>> Info: Based on Charm++/Converse 60701 for multicore-linux64
>> Info: Built Tue Mar 6 22:26:21 +0330 2018 by mahmood on orca
>> Info: 1 NAMD 2.12 Linux-x86_64-multicore-CUDA 1 orca mahmood
>> Info: Running on 1 processors, 1 nodes, 1 physical nodes.
>> Info: CPU topology information available.
>> Info: Charm++/Converse parallel runtime startup completed at 0.400351 s
>> CkLoopLib is used in SMP with a simple dynamic scheduling (converse-level notification) but not using node-level queue
>> Info: 356.035 MB of memory in use based on /proc/self/stat
>> Info: Configuration file is ubq_ws_eq.conf
>> Info: Working in the current directory /home/mahmood/namd-2.12/bench/1-1-build
>> TCL: Suspending until startup complete.
>> Info: SIMULATION PARAMETERS:
>> Info: TIMESTEP 2
>> Info: NUMBER OF STEPS 0
>> Info: STEPS PER CYCLE 10
>> Info: LOAD BALANCER Centralized
>> Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
>> Info: LDB PERIOD 2000 steps
>> Info: FIRST LDB TIMESTEP 50
>> Info: LAST LDB TIMESTEP -1
>> Info: LDB BACKGROUND SCALING 1
>> Info: HOM BACKGROUND SCALING 1
>> Info: MIN ATOMS PER PATCH 40
>> Info: INITIAL TEMPERATURE 310
>> Info: CENTER OF MASS MOVING INITIALLY? NO
>> Info: DIELECTRIC 1
>> Info: EXCLUDE SCALED ONE-FOUR
>> Info: 1-4 ELECTROSTATICS SCALED BY 1
>> Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
>> Info: DCD FILENAME ubq_ws_eq.dcd
>> Info: DCD FREQUENCY 250
>> Info: DCD FIRST STEP 250
>> Info: NO EXTENDED SYSTEM TRAJECTORY OUTPUT
>> Info: NO VELOCITY DCD OUTPUT
>> Info: NO FORCE DCD OUTPUT
>> Info: OUTPUT FILENAME ubq_ws_eq
>> Info: BINARY OUTPUT FILES WILL BE USED
>> Info: RESTART FILENAME ubq_ws_eq.restart
>> Info: RESTART FREQUENCY 500
>> Info: BINARY RESTART FILES WILL BE USED
>> Info: SWITCHING ACTIVE
>> Info: SWITCHING ON 10
>> Info: SWITCHING OFF 12
>> Info: PAIRLIST DISTANCE 14
>> Info: PAIRLIST SHRINK RATE 0.01
>> Info: PAIRLIST GROW RATE 0.01
>> Info: PAIRLIST TRIGGER 0.3
>> Info: PAIRLISTS PER CYCLE 2
>> Info: PAIRLISTS ENABLED
>> Info: MARGIN 0
>> Info: HYDROGEN GROUP CUTOFF 2.5
>> Info: PATCH DIMENSION 16.5
>> Info: ENERGY OUTPUT STEPS 100
>> Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
>> Info: TIMING OUTPUT STEPS 1000
>> Info: SPHERICAL BOUNDARY CONDITIONS ACTIVE
>> Info: RADIUS #1 26
>> Info: FORCE CONSTANT #1 10
>> Info: EXPONENT #1 2
>> Info: SPHERE BOUNDARY CENTER(30.3082, 28.805, 15.354)
>> Info: LANGEVIN DYNAMICS ACTIVE
>> Info: LANGEVIN TEMPERATURE 310
>> Info: LANGEVIN USING BBK INTEGRATOR
>> Info: LANGEVIN DAMPING COEFFICIENT IS 1 INVERSE PS
>> Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
>> Info: MULTILEVEL SUMMATION METHOD (MSM) FOR ELECTROSTATICS ACTIVE
>> Info: MSM WITH C1 CUBIC INTERPOLATION AND C2 TAYLOR SPLITTING
>> Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 2
>> Info: USING VERLET I (r-RESPA) MTS SCHEME.
>> Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
>> Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
>> Info: RIGID BONDS TO HYDROGEN : ALL
>> Info: ERROR TOLERANCE : 1e-08
>> Info: MAX ITERATIONS : 100
>> Info: RIGID WATER USING SETTLE ALGORITHM
>> Info: RANDOM NUMBER SEED 1520362840
>> Info: USE HYDROGEN BONDS? NO
>> Info: COORDINATE PDB ./ubq_ws.pdb
>> Info: STRUCTURE FILE ./ubq_ws.psf
>> Info: PARAMETER file: CHARMM format!
>> Info: PARAMETERS ./par_all27_prot_lipid.inp
>> Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
>> Info: SUMMARY OF PARAMETERS:
>> Info: 180 BONDS
>> Info: 447 ANGLES
>> Info: 566 DIHEDRAL
>> Info: 46 IMPROPER
>> Info: 6 CROSSTERM
>> Info: 119 VDW
>> Info: 0 VDW_PAIRS
>> Info: 0 NBTHOLE_PAIRS
>> Info: TIME FOR READING PSF FILE: 0.0326149
>> Info: Reading pdb file ./ubq_ws.pdb
>> Info: TIME FOR READING PDB FILE: 0.00761604
>> Info:
>> Info: ****************************
>> Info: STRUCTURE SUMMARY:
>> Info: 6682 ATOMS
>> Info: 4871 BONDS
>> Info: 4074 ANGLES
>> Info: 3293 DIHEDRALS
>> Info: 204 IMPROPERS
>> Info: 74 CROSSTERMS
>> Info: 0 EXCLUSIONS
>> Info: 6080 RIGID BONDS
>> Info: 13966 DEGREES OF FREEDOM
>> Info: 2419 HYDROGEN GROUPS
>> Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
>> Info: 2419 MIGRATION GROUPS
>> Info: 4 ATOMS IN LARGEST MIGRATION GROUP
>> Info: TOTAL MASS = 41298.8 amu
>> Info: TOTAL CHARGE = 1.00955e-06 e
>> Info: *****************************
>> Info:
>> Info: Entering startup at 0.450153 s, 360.785 MB of memory in use
>> Info: Startup phase 0 took 3.60012e-05 s, 360.785 MB of memory in use
>> Info: ADDED 12209 IMPLICIT EXCLUSIONS
>> Info: Startup phase 1 took 0.00181603 s, 362.074 MB of memory in use
>> Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
>> Info: NONBONDED TABLE SIZE: 769 POINTS
>> Info: INCONSISTENCY IN FAST TABLE ENERGY VS FORCE: 0.000325096 AT 11.9556
>> Info: INCONSISTENCY IN SCOR TABLE ENERGY VS FORCE: 3.42196e-09 AT 0.0732877
>> Info: INCONSISTENCY IN VDWA TABLE ENERGY VS FORCE: 0.0040507 AT 0.251946
>> Info: INCONSISTENCY IN VDWB TABLE ENERGY VS FORCE: 0.00150189 AT 0.251946
>> Info: Startup phase 2 took 0.000404119 s, 363.164 MB of memory in use
>> Info: Startup phase 3 took 1.69277e-05 s, 363.164 MB of memory in use
>> Info: Startup phase 4 took 1.78814e-05 s, 363.164 MB of memory in use
>> Info: Startup phase 5 took 1.3113e-05 s, 363.164 MB of memory in use
>> Info: PATCH GRID IS 4 BY 4 BY 4
>> Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
>> Info: REMOVING COM VELOCITY 0.0833122 0.116815 -0.0236598
>> Info: LARGEST PATCH (26) HAS 434 ATOMS
>> Info: TORUS A SIZE 1 USING 0
>> Info: TORUS B SIZE 1 USING 0
>> Info: TORUS C SIZE 1 USING 0
>> Info: TORUS MINIMAL MESH SIZE IS 1 BY 1 BY 1
>> Info: Placed 100% of base nodes on same physical node as patch
>> Info: Startup phase 6 took 0.00118899 s, 364.195 MB of memory in use
>> Info: Startup phase 7 took 2.19345e-05 s, 364.195 MB of memory in use
>> Segmentation fault (core dumped)
>> mahmood_at_orca:1-1-build$ ls
>> 1UBQ.pdb       download                  ubqp.pdb    ubq_ws_eq.conf
>> coor           par_all27_prot_lipid.inp  ubq.psf     ubq_ws.pdb
>> del_water.log  top_all27_prot_lipid.inp  ubq_wb.log  ubq_ws.psf
>> del_water.pdb  ubq.pdb                   ubq_wb.pdb  wat_sphere.tcl
>> del_water.psf  ubq.pgn                   ubq_wb.psf
>>
>> [1] http://www.ks.uiuc.edu/Training/Tutorials/namd/namd-tutorial-html/node8.html
>>
>>
>> Regards,
>> Mahmood
>>
>

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2018 - 23:20:54 CST