Re: problems running GPU-accelerated namd

From: Renfro, Michael (Renfro_at_tntech.edu)
Date: Sun Dec 09 2018 - 10:07:21 CST

Are you running through a cluster scheduler, or logging directly into the GPU server? Even if the server has multiple GPUs, a scheduler may present the GPUs reserved for your job as devices 0, 1, 2, … regardless of their physical IDs in the server. If that is the case here, the single device the error reports would be device 0, not 5.
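To illustrate the renumbering, here is a minimal sketch. It assumes the scheduler restricts GPUs through the CUDA_VISIBLE_DEVICES environment variable, which is one common mechanism but not something confirmed for this server. The function names are made up for illustration and are not part of NAMD.

```python
import os


def visible_devices(env_value):
    """Return the GPU IDs listed in a CUDA_VISIBLE_DEVICES string.

    The position in the returned list is the logical index an application
    sees. Returns None when the variable is unset (no restriction applied
    through this variable).
    """
    if env_value is None:
        return None
    return [tok.strip() for tok in env_value.split(",") if tok.strip()]


def check_device_index(requested, env_value):
    """Describe whether logical device index `requested` is usable."""
    devices = visible_devices(env_value)
    if devices is None:
        return "CUDA_VISIBLE_DEVICES is unset: no renumbering through this variable"
    if not devices:
        return "CUDA_VISIBLE_DEVICES is empty: no devices are visible"
    if requested >= len(devices):
        return (f"index {requested} is out of range: only {len(devices)} "
                f"device(s) visible, use 0..{len(devices) - 1}")
    return f"logical index {requested} is physical GPU {devices[requested]}"


if __name__ == "__main__":
    # Check the index from the failing command (+devices 5) against
    # whatever the current environment exposes.
    print(check_device_index(5, os.environ.get("CUDA_VISIBLE_DEVICES")))
```

If the variable turns out to hold a single ID, the only valid argument to +devices would be 0, whatever the physical ID of that GPU is.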

You can specify whatever number of CPUs you want, but once GPU performance becomes the bottleneck, adding more CPUs brings little to no benefit. For a 3M atom benchmark model I’ve run on our particular cluster, that number was 4 E5-2680v4 CPUs per K80 GPU.
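One way to find that number for your own system is to time a short run at several CPU counts and pick the smallest count that is close to the fastest. The sketch below assumes you have collected such timings yourself. The function name and the 5% tolerance are arbitrary choices for illustration, not anything NAMD provides.

```python
def saturation_point(timings, tolerance=0.05):
    """Return the smallest CPU count whose time is within `tolerance`
    (as a fraction) of the fastest measured run.

    timings: dict mapping CPU count -> measured time per step (any unit,
    as long as it is the same for every entry).
    """
    if not timings:
        raise ValueError("timings must not be empty")
    best = min(timings.values())
    # Walk from the fewest CPUs upward and stop at the first count that
    # is already close enough to the best time.
    for cpus in sorted(timings):
        if timings[cpus] <= best * (1.0 + tolerance):
            return cpus
```

Pass it a dictionary built from your own log files, for example the time per step reported for runs with +p set to 1, 2, 4 and 8.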

> On Dec 9, 2018, at 2:02 AM, 李耀 <liyao17_at_mails.tsinghua.edu.cn> wrote:
>
> Dear NAMD users,
>
> I'm running namd2 on a GPU server with NAMD_Git-2017-12-25_Linux-x86_64-multicore-CUDA, and this is the command:
> ./namd2 +devices 5 ~/4ntw/string_eq_5ns/f1t10/win_f1/win5_f1.conf > ~/4ntw/string_eq_5ns/f1t10/win_f1/win5_f1.log
>
> It produced the following error:
> Charm++ fatal error:
> FATAL ERROR: Pe 0 unable to bind to CUDA device 5 on omnisky because only 1 devices are present
>
> Aborted (core dumped)
>
> Then I declared 8 CPUs:
> ./namd2 +p 8 +devices 5 ~/4ntw/string_eq_5ns/f1t10/win_f1/win5_f1.conf > ~/4ntw/string_eq_5ns/f1t10/win_f1/win5_f1.log
>
> The resulting message was:
> Charm++ fatal error:
> FATAL ERROR: Pe 4 unable to bind to CUDA device 5 on omnisky because only 1 devices are present
>
> Aborted (core dumped)
>
> I have a few questions:
> 1) Do I need to declare the number of CPUs to use (+p num) in this version of NAMD?
> 2) How can I make all of the calculation (configuration and energy calculation) run on the GPU only, rather than on both the GPU and the CPU?
> 3) How can those errors be explained? There seems to be no problem with the GPU devices.
>
> Thank you for reading my mail.
>
> Best,
> Yao Li
>
