Re: Slow performance over multi-core processor and CUDA build

From: Josh Vermaas (joshua.vermaas_at_gmail.com)
Date: Tue Jul 28 2020 - 20:32:26 CDT

My understanding is that this is currently not possible with NAMD 3.0,
since I was told that CUDASOAintegrate can't be set after minimization or
dynamics have started. What I've been doing is to keep the initial run to a
few thousand steps at most, and then switch to the fast integrator after a
restart. Please keep NAMD-L on the cc line; I typically don't answer
individually posed questions.
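
A minimal sketch of that two-stage split, offered as a hedged example rather
than a tested input (the file names, step counts, and output prefixes below
are assumptions, and the usual structure/parameter/PME lines are omitted):

# stage1_min.namd -- minimization plus a short initial run, fast integrator off
CUDASOAintegrate    off
outputName          stage1
minimize            1000
run                 2000

# stage2_dyn.namd -- restart from the stage1 output with the fast integrator on
CUDASOAintegrate    on
bincoordinates      stage1.coor
binvelocities       stage1.vel
extendedSystem      stage1.xsc
firsttimestep       2000
outputName          stage2
run                 500000

Stage 2 would then be launched with the namd3 binary, as in the command
further down the thread.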

-Josh

On Tue, Jul 28, 2020 at 8:30 PM Roshan Shrestha <roshanpra_at_gmail.com> wrote:

> Josh,
> Thanks a lot. I am using NAMD 3.0 now. Since the *CUDASOAintegrate*
> parameter has to be set to off for the minimization run, is there any way
> I can write a loop in my NAMD configuration file so that it sets this
> parameter off for minimization and then on for the dynamics run? I
> normally do both minimization and dynamics with a single configuration
> file, where the dynamics run starts immediately after the minimization
> run. Thanks.
>
> With best regards
>
> On Wed, Jul 29, 2020 at 1:14 AM Josh Vermaas <joshua.vermaas_at_gmail.com>
> wrote:
>
>> Hi Roshan,
>>
>> +idlepoll isn't required for CUDA builds since 2.11 or 2.12. My usual
>> command line for a single node build with GPUs looks something like this:
>>
>> /path/to/namd/binary/namd2 +p8 input.namd | tee output.log
>>
>> The processor count matters, since in the namd 2.X branch, the integrator
>> is on the CPU. If you decide to use the NAMD 3.0 alpha, which moves the
>> integrator to the GPU, the equivalent line would be:
>>
>> /path/to/namd/binary/namd3 +p1 input.namd | tee output.log
>>
>> And you'd want to add the CUDASOAintegrate flag in your .namd file. The
>> NVIDIA developer blog has a recent post that details the changes you'd
>> need to make:
>> https://developer.nvidia.com/blog/delivering-up-to-9x-throughput-with-namd-v3-and-a100-gpu/
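>>
>> A hedged sketch of that addition (only the new line is shown; the rest of
>> a standard .namd input is assumed unchanged, and this applies to the NAMD
>> 3.0 alpha only):
>>
>> # GPU-resident integration; as noted above, it must stay off while minimizing
>> CUDASOAintegrate on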
>>
>> -Josh
>>
>> On Tue, Jul 28, 2020 at 10:38 AM Roshan Shrestha <roshanpra_at_gmail.com>
>> wrote:
>>
>>> Prof. Giacomo,
>>> So, if I use the newest nightly build of NAMD with NVIDIA CUDA
>>> acceleration, do I need to specify anything in my command arguments,
>>> such as the number of processors with *+p8* and +idlepoll, or will the
>>> usual *namd2 file.conf | tee output.log* work? Which command should I
>>> use to get access to all the CUDA cores and CPU cores? With GROMACS I
>>> had to build the source myself to enable CUDA, whereas NAMD seems to
>>> automate things, so I am unable to work out how to maximize its
>>> performance. For now, my system is pretty simple, with 50K+ atoms, and
>>> the simulation parameters are pretty standard for a normal equilibration
>>> and production run. Thanks.
>>>
>>> With best regards
>>>
>>>
>>> On Tue, Jul 28, 2020 at 6:52 PM Giacomo Fiorin
>>> <giacomo.fiorin_at_gmail.com> wrote:
>>>
>>> > Not sure why hyperthreading is mentioned, which is not supported by the
>>> > processor in question:
>>> >
>>> > https://ark.intel.com/content/www/us/en/ark/products/186604/intel-core-i7-9700k-processor-12m-cache-up-to-4-90-ghz.html
>>> >
>>> > Roshan, what are the system size and simulation parameters? It is
>>> > possible that the system is not suitable for a CPU-GPU hybrid scheme
>>> > (possibly made worse by using too many CPU cores). The Gromacs
>>> > benchmark (which was probably run in single precision and on the CPU)
>>> > seems to suggest a rather small system. Have you tried running a
>>> > non-GPU build? Or the GPU-optimized 3.0 alpha build?
>>> >
>>> > For typical biological systems (of the order of 100,000 atoms) running
>>> > over CPUs, Gromacs would be faster on a few nodes but would scale less
>>> > well over multiple nodes than NAMD. The tipping point depends on the
>>> > system and, to a lesser extent, on the hardware makeup. I suggest you
>>> > benchmark your system thoroughly with both codes, and then decide.
>>> >
>>> > Giacomo
>>> >
>>> > On Tue, Jul 28, 2020 at 8:37 AM Norman Geist <
>>> > norman.geist_at_uni-greifswald.de> wrote:
>>> >
>>> >> I’d say don’t use hyperthreading in HPC in general, nothing special
>>> >> about GPUs. You can assign your tasks/threads to physical cores only,
>>> >> e.g
>>
>>
>
> --
> *Roshan Shrestha*
> M.Sc (Physics)
> Central Department of Physics, Tribhuvan University
> Kathmandu, Nepal
>
