Re: Compiling issue: selfcompiled NAMD2.14 multicore version factor ~2x slower - SOLVED

From: René Hafner TUK (hamburge_at_physik.uni-kl.de)
Date: Sat Aug 07 2021 - 17:44:01 CDT

Next message: Ruturaj warake: "MD simulation"
Previous message: Giacomo Fiorin: "Re: Compiling issue: selfcompiled NAMD2.14 multicore version factor ~2x slower"
In reply to: Giacomo Fiorin: "Re: Compiling issue: selfcompiled NAMD2.14 multicore version factor ~2x slower"

Hi Giacomo,

Yes, I noticed that icc was used for the precompiled one, but I did not
think that the choice between gcc and icc had such a big influence.

But indeed it does, very much!

FYI:

I am now using the intel/2019 compiler and comparing performance on the
following hardware:

* 12 cores (of a Xeon SP 6126, the best available on the cluster) + 1x V100 GPU

Now I even get a slightly higher speed than the
precompiled one (compared without colvars)! :)
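
A sketch of what the Intel build could look like, obtained by swapping the compiler-related names in the gcc recipe quoted further down. Assumptions: the module names `intel/2019` and `nvidia/10.1` are site-specific, the `iccstatic` Charm++ option is inferred from the `multicore-linux-x86_64-iccstatic` string that the precompiled binary reports, and `Linux-x86_64-icc` is taken to be the matching arch name in the NAMD 2.14 source tree.

```shell
# Sketch only; module names are site-specific assumptions.
module purge
module load intel/2019

# In the Charm++ source directory: "iccstatic" instead of "gcc"
./build charm++ multicore-linux-x86_64 iccstatic -j16 --with-production

module load nvidia/10.1

# In the NAMD source directory: icc arch and the icc-built Charm++
./config Linux-x86_64-icc --charm-arch multicore-linux-x86_64-iccstatic \
    --with-tcl --with-python --with-fftw --with-cuda
cd Linux-x86_64-icc
make -j 12
```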

With Colvars I now get:

* 12 cores + V100: ~350 ns/day

* 6 cores + V100: ~250 ns/day
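
To put these numbers in perspective, the ratios can be computed directly. A minimal sketch; `ratio` is just a helper name introduced here, and the inputs are the ns/day values reported in this thread:

```shell
# ratio A B: print A/B rounded to two decimals
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 350 250   # 12 cores vs 6 cores, each with one V100, Colvars on
ratio 300 162   # precompiled (icc) vs self-compiled (gcc), plain MD
```

Doubling the cores from 6 to 12 gives about 1.4x the speed, i.e. roughly 70% parallel efficiency, while the precompiled binary was about 1.85x faster than the gcc build in the original comparison.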

And with the Intel compiler, using CUDA 11.3 or CUDA 10.1 still makes no difference.
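
Giacomo's advice quoted below (fewer CPU threads per GPU for a small system) can be checked with a short scan over thread counts. A sketch that only prints the commands instead of running them; `NAMD_BIN`, `CONF`, and the `bench_p*.log` file names are placeholders, while `+p` (number of threads) and `+devices` (GPU selection) are the usual NAMD 2.x command-line options:

```shell
# Hypothetical placeholders: adjust to your installation and input file.
NAMD_BIN=${NAMD_BIN:-./namd2}
CONF=${CONF:-membrane.namd}

# print_scan: emit one benchmark command per CPU thread count (dry run).
print_scan() {
  for p in "$@"; do
    printf '%s +p%s +devices 0 %s > bench_p%s.log\n' \
      "$NAMD_BIN" "$p" "$CONF" "$p"
  done
}

print_scan 2 4 6 8 12
```

Running the printed lines and comparing the `Info: Benchmark time:` entries in each log should then show which thread count is fastest for the system at hand.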

Best

René

On 8/7/2021 10:40 PM, Giacomo Fiorin wrote:
> Hello René, if Colvars is not active in both runs (with pre-compiled
> and self-compiled) it is very unlikely that changes in its source
> files can impact performance. You could probably confirm this by
> using your own build of an /unmodified/ 2.14 source tree, which I
> would expect to behave the same way.
>
> One thing to note is that most pre-compiled builds of NAMD are built
> with the Intel compiler, see e.g. the following when you launch the
> "Linux-x86_64-multicore-CUDA" build:
>
> Info: Based on Charm++/Converse 61002 for *multicore-linux-x86_64-iccstatic*
> Info: Built Mon Aug 24 10:10:58 CDT 2020 by jim on belfast.ks.uiuc.edu
>
> Jim Phillips, or one of the other core maintainers at UIUC may be able
> to comment further about what you could do on your end to reproduce
> the optimizations of the pre-compiled build on your own build.
>
> In general, I would also look into changing the number of CPU threads
> associated in 2.x NAMD. You have a fairly good performance to begin
> with, consistent with such a small system. The CPU-GPU communication
> step is one of the main factors limiting simulation speed, and this is
> definitely affected by how many CPU threads communicate with the GPU.
>
> For such a small system, fewer CPU threads per GPU would be more
> appropriate. (Note that this is valid for 2.x, NAMD 3.0 is entirely
> different).
>
> Giacomo
>
> On Sat, Aug 7, 2021 at 3:18 PM René Hafner TUK
> <hamburge_at_physik.uni-kl.de> wrote:
>
>    Dear NAMD maintainers,
>
>
> I tried implementing a new colvar (which was successful) but
> wondered about a possible speed reduction caused by it.
>
> So I compared plain MD simulations (in the end without Colvars) run
> with my self-compiled version against the precompiled binary
> from the website.
>
> The only changes in the code are in the Colvars module files,
> which are not active in the following comparison.
>
>
> I obtain the following speeds for a single simulation with a
> standard cutoff etc. (membrane + water, 7k atoms):
>
>     Precompiled: 300 ns/day (4 fs timestep, HMR)
>
>     Self-compiled: 162 ns/day (4 fs timestep, HMR)
>
>
> This does not depend on the CUDA version, as the result is the same with
> both CUDA 11.3 and CUDA 10.1 (the latter version was
> used for the precompiled binary).
>
>
> Any help is appreciated.
>
> Kind regards
>
> René
>
> I compiled it with the following settings:
>
> """
>
> # building Charm++
> module purge
> module load gcc/8.4
> ./build charm++ multicore-linux-x86_64 gcc -j16 --with-production
>
> ###
> module purge
> module load gcc/8.4
> module load nvidia/10.1
> ./config Linux-x86_64-g++ --charm-arch multicore-linux-x86_64-gcc \
>     --with-tcl --with-python --with-fftw --with-cuda \
>     --arch-suffix SelfCompiledNAMD214MultiCoreCUDA_cuda101_gcc_with_newcolvar_def
> cd Linux-x86_64-g++
> # append the line CXXOPTS=-lstdc++ -std=c++11 to the Make.config
> ## if no CXXOPTS like --with-debug are defined then it will not work
> echo "CXXOPTS=-lstdc++ -std=c++11" >> Make.config
>
> echo "showing Make.config"
> cat Make.config
> # then run the build and keep a log
> make -j 12 | tee make_log_SelfCompiledNAMD214MultiCoreCUDA_cuda101_gcc_with_newcolvar_def.txt
>
> """
>
>
> --
> Dipl.-Phys. René Hafner
> TU Kaiserslautern
> Germany
>

-- 
Dipl.-Phys. René Hafner
TU Kaiserslautern
Germany

