Re: Re: Re: Reply: namd-l: compilation of namd

From: Bjoern Olausson (namdlist_at_googlemail.com)
Date: Thu May 12 2011 - 11:13:21 CDT

On Thursday 12 May 2011 15:34:14 Axel Kohlmeyer wrote:
> > Thanks for the information. I'll test if namd performs better with -O2
> > and -no-vec than with my current flags and leave it that way.
>
> i would do one more iteration with and without -ansi-alias.
> this flag is not enabled by default (strangely, considering
> all the other quite aggressive and less promising flags
> that are enabled by default), but would give the compiler
> more freedom to optimize memory accesses, which is
> almost always helpful (if the code is written according to
> ANSI standard, that is).
>
Here are my results so far for Linux-x86_64-ibverbs-net-linux-x86_64-ibverbs-icc.
It does not look good for "-ansi-alias".

NAMD: -O2 -xSSSE3 -no-vec -ansi-alias
CHARM: -O2 -xSSSE3 -no-vec -ansi-alias -DCMK_OPTIMIZE=1
Benchmark time: 96 CPUs 0.0306417 s/step 0.177325 days/ns 195.176 MB memory
Benchmark time: 96 CPUs 0.0303346 s/step 0.175547 days/ns 195.176 MB memory
Benchmark time: 96 CPUs 0.0308472 s/step 0.178514 days/ns 195.176 MB memory
Benchmark time: 96 CPUs 0.0249945 s/step 0.144644 days/ns 203.523 MB memory
Benchmark time: 96 CPUs 0.0250449 s/step 0.144936 days/ns 203.523 MB memory
Benchmark time: 96 CPUs 0.0252106 s/step 0.145895 days/ns 203.523 MB memory
WallClock: 297.828461 CPUTime: 297.014862 Memory: 204.312500 MB

NAMD: -O2 -xSSSE3 -no-vec
CHARM: -O2 -xSSSE3 -no-vec -DCMK_OPTIMIZE=1
Benchmark time: 96 CPUs 0.0310732 s/step 0.179822 days/ns 195.543 MB memory
Benchmark time: 96 CPUs 0.0307267 s/step 0.177817 days/ns 195.543 MB memory
Benchmark time: 96 CPUs 0.0302715 s/step 0.175183 days/ns 195.543 MB memory
Benchmark time: 96 CPUs 0.0253474 s/step 0.146686 days/ns 201.781 MB memory
Benchmark time: 96 CPUs 0.025222 s/step 0.145961 days/ns 201.781 MB memory
Benchmark time: 96 CPUs 0.0254034 s/step 0.147011 days/ns 201.781 MB memory
WallClock: 267.736938 CPUTime: 266.905426 Memory: 203.777344 MB

NAMD: -O3 -xSSSE3
CHARM: -O3 -xSSSE3 -DCMK_OPTIMIZE=1
Benchmark time: 96 CPUs 0.0313929 s/step 0.181672 days/ns 196.387 MB memory
Benchmark time: 96 CPUs 0.0315626 s/step 0.182654 days/ns 196.387 MB memory
Benchmark time: 96 CPUs 0.0310617 s/step 0.179755 days/ns 196.387 MB memory
Benchmark time: 96 CPUs 0.0254195 s/step 0.147103 days/ns 202.664 MB memory
Benchmark time: 96 CPUs 0.0255028 s/step 0.147585 days/ns 202.664 MB memory
Benchmark time: 96 CPUs 0.0253829 s/step 0.146892 days/ns 202.664 MB memory
WallClock: 270.806641 CPUTime: 269.985962 Memory: 205.171875 MB
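
A minimal sketch for putting the three runs side by side. It parses the "Benchmark time" lines quoted above and averages the s/step values. The names `LOGS`, `step_times` and `tail_mean` are made up for this illustration, and averaging only the last three lines of each run is an assumption (that the later samples are the more settled ones), not something stated in the post.

```python
import re
from statistics import mean

# "Benchmark time" lines copied from the three runs above, keyed by NAMD flags.
LOGS = {
    "-O2 -xSSSE3 -no-vec -ansi-alias": """
Benchmark time: 96 CPUs 0.0306417 s/step 0.177325 days/ns 195.176 MB memory
Benchmark time: 96 CPUs 0.0303346 s/step 0.175547 days/ns 195.176 MB memory
Benchmark time: 96 CPUs 0.0308472 s/step 0.178514 days/ns 195.176 MB memory
Benchmark time: 96 CPUs 0.0249945 s/step 0.144644 days/ns 203.523 MB memory
Benchmark time: 96 CPUs 0.0250449 s/step 0.144936 days/ns 203.523 MB memory
Benchmark time: 96 CPUs 0.0252106 s/step 0.145895 days/ns 203.523 MB memory
""",
    "-O2 -xSSSE3 -no-vec": """
Benchmark time: 96 CPUs 0.0310732 s/step 0.179822 days/ns 195.543 MB memory
Benchmark time: 96 CPUs 0.0307267 s/step 0.177817 days/ns 195.543 MB memory
Benchmark time: 96 CPUs 0.0302715 s/step 0.175183 days/ns 195.543 MB memory
Benchmark time: 96 CPUs 0.0253474 s/step 0.146686 days/ns 201.781 MB memory
Benchmark time: 96 CPUs 0.025222 s/step 0.145961 days/ns 201.781 MB memory
Benchmark time: 96 CPUs 0.0254034 s/step 0.147011 days/ns 201.781 MB memory
""",
    "-O3 -xSSSE3": """
Benchmark time: 96 CPUs 0.0313929 s/step 0.181672 days/ns 196.387 MB memory
Benchmark time: 96 CPUs 0.0315626 s/step 0.182654 days/ns 196.387 MB memory
Benchmark time: 96 CPUs 0.0310617 s/step 0.179755 days/ns 196.387 MB memory
Benchmark time: 96 CPUs 0.0254195 s/step 0.147103 days/ns 202.664 MB memory
Benchmark time: 96 CPUs 0.0255028 s/step 0.147585 days/ns 202.664 MB memory
Benchmark time: 96 CPUs 0.0253829 s/step 0.146892 days/ns 202.664 MB memory
""",
}

# Captures the s/step field of one benchmark line.
BENCH = re.compile(r"Benchmark time: \d+ CPUs ([\d.]+) s/step")


def step_times(log_text):
    """Return every s/step value found in a chunk of NAMD output."""
    return [float(value) for value in BENCH.findall(log_text)]


def tail_mean(log_text, tail=3):
    """Mean s/step over the last `tail` benchmark lines (assumed window)."""
    return mean(step_times(log_text)[-tail:])


if __name__ == "__main__":
    for flags, text in LOGS.items():
        print(f"{flags:35s} {tail_mean(text):.7f} s/step")
```

The per-step means and the WallClock totals are different measurements (WallClock covers the whole run, including startup), so both are worth reading before drawing a conclusion about a flag.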

System details:

Charmrun> scalable start enabled.
Charmrun> IBVERBS version of charmrun
Charm++> scheduler running in netpoll mode.
Charm++> cpu affinity enabled.
Charm++> Running on 8 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.069 seconds.
Info: NAMD 2.8b2 for Linux-x86_64-ibverbs-net-linux-x86_64-ibverbs-icc
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60305 for net-linux-x86_64-ibverbs-icc
Info: Built Thu May 12 17:01:27 CEST 2011 by blub on mrbird
Info: 1 NAMD 2.8b2 Linux-x86_64-ibverbs-net-linux-x86_64-ibverbs-icc 96 node020 blub
Info: Running on 96 processors, 96 nodes, 8 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.115451 s
Info: 131.324 MB of memory in use based on /proc/self/stat
Info: Configuration file is firstrun_NVT.conf
Info: Working in the current directory /home/blub/work/bdvm/namd/monolayer_no_bdvm/run000
TCL: Suspending until startup complete.
Warning: The following variables were set in the
Warning: configuration file but will be ignored:
Warning: LangevinPistonTarget (LangevinPiston)
Warning: LangevinPistonPeriod (LangevinPiston)
Warning: LangevinPistonDecay (LangevinPiston)
Warning: LangevinPistonTemp (LangevinPiston)
Warning: consref (constraints)
Warning: conskfile (constraints)
Warning: conskcol (constraints)
Warning: constraintScaling (constraints)
Info: SIMULATION PARAMETERS:
Info: TIMESTEP 2
Info: NUMBER OF STEPS 0
Info: STEPS PER CYCLE 10
Info: PERIODIC CELL BASIS 1 95.338 0 0
Info: PERIODIC CELL BASIS 2 0 95.338 0
Info: PERIODIC CELL BASIS 3 0 0 163.097
Info: PERIODIC CELL CENTER 0 0 0
Info: WRAPPING ALL CLUSTERS AROUND PERIODIC BOUNDARIES ON OUTPUT.
Info: LOAD BALANCER Centralized
Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
Info: LDB PERIOD 2000 steps
Info: FIRST LDB TIMESTEP 50
Info: LAST LDB TIMESTEP -1
Info: LDB BACKGROUND SCALING 1
Info: HOM BACKGROUND SCALING 1
Info: PME BACKGROUND SCALING 1
Info: REMOVING PATCHES FROM PROCESSOR 0
Info: MIN ATOMS PER PATCH 40
Info: INITIAL TEMPERATURE 303
Info: CENTER OF MASS MOVING INITIALLY? NO
Info: DIELECTRIC 1
Info: EXCLUDE SCALED ONE-FOUR
Info: 1-4 ELECTROSTATICS SCALED BY 1
Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
Info: DCD FILENAME mono-wo-bdvm.dcd
Info: DCD FREQUENCY 500
Info: DCD FIRST STEP 500
Info: DCD FILE WILL CONTAIN UNIT CELL DATA
Info: XST FILENAME mono-wo-bdvm.xst
Info: XST FREQUENCY 500
Info: NO VELOCITY DCD OUTPUT
Info: NO FORCE DCD OUTPUT
Info: OUTPUT FILENAME mono-wo-bdvm
Info: BINARY OUTPUT FILES WILL BE USED
Info: RESTART FILENAME mono-wo-bdvm.restart
Info: RESTART FREQUENCY 500
Info: BINARY RESTART FILES WILL BE USED
Info: SWITCHING ACTIVE
Info: SWITCHING ON 10
Info: SWITCHING OFF 12
Info: PAIRLIST DISTANCE 16
Info: PAIRLIST SHRINK RATE 0.01
Info: PAIRLIST GROW RATE 0.01
Info: PAIRLIST TRIGGER 0.3
Info: PAIRLISTS PER CYCLE 2
Info: PAIRLISTS ENABLED
Info: MARGIN 0
Info: HYDROGEN GROUP CUTOFF 2.5
Info: PATCH DIMENSION 18.5
Info: ENERGY OUTPUT STEPS 100
Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
Info: TIMING OUTPUT STEPS 1000
Info: PRESSURE OUTPUT STEPS 100
Info: LANGEVIN DYNAMICS ACTIVE
Info: LANGEVIN TEMPERATURE 303
Info: LANGEVIN DAMPING COEFFICIENT IS 5 INVERSE PS
Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
Info: PARTICLE MESH EWALD (PME) ACTIVE
Info: PME TOLERANCE 1e-06
Info: PME EWALD COEFFICIENT 0.257952
Info: PME INTERPOLATION ORDER 4
Info: PME GRID DIMENSIONS 96 96 168
Info: PME MAXIMUM GRID SPACING 1
Info: Attempting to read FFTW data from FFTW_NAMD_2.8b2_Linux-x86_64-ibverbs-net-linux-x86_64-ibverbs-icc.txt
Info: Optimizing 6 FFT steps. 1... 2... 3... 4... 5... 6... Done.
Info: Writing FFTW data to FFTW_NAMD_2.8b2_Linux-x86_64-ibverbs-net-linux-x86_64-ibverbs-icc.txt
Info: FULL ELECTROSTATIC EVALUATION FREQUENCY 1
Info: USING VERLET I (r-RESPA) MTS SCHEME.
Info: C1 SPLITTING OF LONG RANGE ELECTROSTATICS
Info: PLACING ATOMS IN PATCHES BY HYDROGEN GROUPS
Info: RIGID BONDS TO HYDROGEN : ALL
Info: ERROR TOLERANCE : 1e-08
Info: MAX ITERATIONS : 100
Info: RIGID WATER USING SETTLE ALGORITHM
Info: RANDOM NUMBER SEED 1305213005
Info: USE HYDROGEN BONDS? NO
Info: COORDINATE PDB ../mono30_run400.pdb
Info: STRUCTURE FILE ../mono30_run400.xplor.psf
Info: PARAMETER file: CHARMM format!
Info: PARAMETERS ../par_all27_prot_lipid_na.prm
Info: PARAMETERS ../toppar_all27_lipid_cholesterol.str
Info: USING ARITHMETIC MEAN TO COMBINE L-J SIGMA PARAMETERS
Info: SKIPPING rtf SECTION IN STREAM FILE
Info: SUMMARY OF PARAMETERS:
Info: 304 BONDS
Info: 785 ANGLES
Info: 1293 DIHEDRAL
Info: 76 IMPROPER
Info: 6 CROSSTERM
Info: 189 VDW
Info: 0 VDW_PAIRS
Info: 0 NBTHOLE_PAIRS
Warning: Ignored 13089 bonds with zero force constants.
Warning: Will get H-H distance in rigid H2O from H-O-H angle.
Info: TIME FOR READING PSF FILE: 3.24134
Info: TIME FOR READING PDB FILE: 0.203022
Info:
Info: ****************************
Info: STRUCTURE SUMMARY:
Info: 79687 ATOMS
Info: 66512 BONDS
Info: 91349 ANGLES
Info: 112316 DIHEDRALS
Info: 602 IMPROPERS
Info: 0 CROSSTERMS
Info: 0 EXCLUSIONS
Info: 63777 RIGID BONDS
Info: 175284 DEGREES OF FREEDOM
Info: 28999 HYDROGEN GROUPS
Info: 4 ATOMS IN LARGEST HYDROGEN GROUP
Info: 28999 MIGRATION GROUPS
Info: 4 ATOMS IN LARGEST MIGRATION GROUP
Info: TOTAL MASS = 468603 amu
Info: TOTAL CHARGE = 3.23579e-05 e
Info: MASS DENSITY = 0.524912 g/cm^3
Info: ATOM DENSITY = 0.0537538 atoms/A^3
Info: *****************************
Info:
Info: Entering startup at 33.8566 s, 155.715 MB of memory in use
Info: Startup phase 0 took 0.00103211 s, 155.715 MB of memory in use
Info: Startup phase 1 took 0.420947 s, 194.199 MB of memory in use
Info: Startup phase 2 took 0.000926018 s, 194.199 MB of memory in use
Info: Startup phase 3 took 0.000293016 s, 194.199 MB of memory in use
Info: PATCH GRID IS 5 (PERIODIC) BY 5 (PERIODIC) BY 8 (PERIODIC)
Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
Info: REMOVING COM VELOCITY 0.0132388 0.0277316 -0.0288584
Info: LARGEST PATCH (56) HAS 831 ATOMS
Info: Startup phase 4 took 0.0381131 s, 195.176 MB of memory in use
Info: PME using 48 and 48 processors for FFT and reciprocal sum.
Info: PME GRID LOCATIONS: 1 3 5 7 9 11 13 15 17 19 ...
Info: PME TRANS LOCATIONS: 0 2 4 6 8 10 12 14 16 18 ...
Info: Optimizing 4 FFT steps. 1... 2... 3... 4... Done.
Info: Startup phase 5 took 0.00849795 s, 195.176 MB of memory in use
Info: Startup phase 6 took 0.00717092 s, 195.176 MB of memory in use
LDB: Central LB being created...
Info: Startup phase 7 took 1.20994 s, 195.176 MB of memory in use
Info: CREATING 4370 COMPUTE OBJECTS
Info: useSync: 0 useProxySync: 0
Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
Info: NONBONDED TABLE SIZE: 769 POINTS
Info: Startup phase 8 took 0.0048089 s, 195.176 MB of memory in use
Info: Startup phase 9 took 0.000138998 s, 195.176 MB of memory in use
Info: Finished startup at 35.5485 s, 195.176 MB of memory in use
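
A small arithmetic check on the patch numbers in the startup log. It assumes that the patch dimension is the sum of pairlist distance, hydrogen group cutoff and margin, and that for an orthogonal cell with a 1-away grid the patch count per axis is the cell length divided by the patch dimension, rounded down. Both are my reading of the log, not something NAMD prints, and the function names are hypothetical.

```python
import math


def patch_dimension(pairlist_dist, hgroup_cutoff, margin):
    """Assumed rule: patch dimension = pairlist distance + H-group cutoff + margin."""
    return pairlist_dist + hgroup_cutoff + margin


def patch_grid(cell_lengths, patch_dim):
    """Assumed rule: whole patches that fit along each orthogonal cell axis."""
    return tuple(math.floor(length / patch_dim) for length in cell_lengths)


# Values taken from the log: PAIRLIST DISTANCE 16, HYDROGEN GROUP CUTOFF 2.5,
# MARGIN 0, and the three PERIODIC CELL BASIS lengths.
dim = patch_dimension(16.0, 2.5, 0.0)
grid = patch_grid((95.338, 95.338, 163.097), dim)
print(dim, grid)
```

Under those assumptions the result agrees with the "PATCH DIMENSION 18.5" and "PATCH GRID IS 5 ... BY 5 ... BY 8" lines in the log.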

TCL: Minimizing for 1000 steps
[...]
WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 1000
WRITING COORDINATES TO DCD FILE AT STEP 1000
WRITING COORDINATES TO RESTART FILE AT STEP 1000
FINISHED WRITING RESTART COORDINATES
The last position output (seq=1000) takes 0.025 seconds, 203.523 MB of memory in use
WRITING VELOCITIES TO RESTART FILE AT STEP 1000
FINISHED WRITING RESTART VELOCITIES
The last velocity output (seq=1000) takes 0.019 seconds, 203.523 MB of memory in use
REINITIALIZING VELOCITIES AT STEP 1000 TO 303 KELVIN.
TCL: Running for 9000 steps
[...]
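
For reference, the days/ns column in the "Benchmark time" lines follows from the s/step column and the 2 fs TIMESTEP shown in the log. A short sketch of that conversion (the function name `days_per_ns` is only for illustration):

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000


def days_per_ns(seconds_per_step, timestep_fs=2.0):
    """Convert wall-clock seconds per MD step into days per simulated nanosecond."""
    steps_per_ns = FS_PER_NS / timestep_fs  # 500000 steps at 2 fs
    return seconds_per_step * steps_per_ns / SECONDS_PER_DAY


# First and fourth benchmark lines of the -ansi-alias run.
print(f"{days_per_ns(0.0306417):.6f}")
print(f"{days_per_ns(0.0249945):.6f}")
```

The two values can be compared with the 0.177325 and 0.144644 days/ns that NAMD reports for the same lines.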

Cheers,
Bjoern

-- 
Bjoern Olausson
Martin-Luther-Universität Halle-Wittenberg
Fachbereich Biochemie/Biotechnologie
Kurt-Mothes-Str. 3
06120 Halle/Saale
Phone: +49-345-55-24942
