Re: Query for coarse grain system

From: Abhishek TYAGI (atyagiaa_at_connect.ust.hk)
Date: Mon Apr 09 2018 - 08:42:35 CDT

Dear Giacomo,

I have tried the nightly build of NAMD as well as NAMD_2.12; both result in the following error:

$ charmrun +ignoresharing +idlepoll +isomalloc_sync +p4 namd2 +devices 0,1,2,3 system-nvt-02.conf > system-nvt-02.log &
[1] 22476
[keatyagiaa_at_login-0 cg-test]$ ------------- Processor 2 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 2 (login-0.local device 2)

------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 0 (login-0.local device 0)

------------- Processor 1 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 1 (login-0.local device 1)

------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 3 (login-0.local device 3)

Charm++ fatal error:
FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 2 (login-0.local device 2)

The GPU we are using:

nvidia-smi
Mon Apr 9 21:40:46 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2090         On   | 0000:09:00.0     Off |                  Off |
| N/A   N/A   P12    29W /  N/A |      0MiB /  6066MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2090         On   | 0000:0A:00.0     Off |                  Off |
| N/A   N/A   P12    29W /  N/A |      0MiB /  6066MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M2090         On   | 0000:0D:00.0     Off |                  Off |
| N/A   N/A   P12    29W /  N/A |      0MiB /  6066MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M2090         On   | 0000:0E:00.0     Off |                  Off |
| N/A   N/A   P12    29W /  N/A |      0MiB /  6066MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

It seems that NAMD_2.12 and the nightly build do not support the GPU we are using, as mentioned in the following thread:
https://www-s.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2016-2017/2089.html

However, if you know of any way to use a recent version of NAMD on our cluster, I would be glad to hear it.
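
For reference, the Tesla M2090 is a Fermi-generation card with compute capability 2.0, which is below the 3.0 minimum named in the error message. The following is only a rough sketch of how one might flag such cards from `nvidia-smi -L` style output before launching a job. The lookup table, the function names, and the sample UUIDs are assumptions made for illustration; they are not part of NAMD or of any NVIDIA tool.

```python
import re

# Assumption: a small hand-maintained table of compute capabilities.
# Extend it for whatever cards are installed on your cluster.
KNOWN_COMPUTE_CAPABILITY = {
    "Tesla M2090": (2, 0),  # Fermi
    "Tesla K20m": (3, 5),   # Kepler
    "Tesla K80": (3, 7),    # Kepler
}

# Minimum taken from the NAMD error message quoted above.
MIN_REQUIRED = (3, 0)


def parse_gpu_names(listing):
    """Extract (index, name) pairs from `nvidia-smi -L` style text."""
    pattern = re.compile(r"^GPU (\d+): (.+?) \(UUID: .*\)\s*$")
    gpus = []
    for line in listing.splitlines():
        match = pattern.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2)))
    return gpus


def unsupported_gpus(listing, minimum=MIN_REQUIRED):
    """Return (index, name, capability) for known cards below `minimum`.

    Cards missing from the table are skipped, not reported.
    """
    bad = []
    for index, name in parse_gpu_names(listing):
        capability = KNOWN_COMPUTE_CAPABILITY.get(name)
        if capability is not None and capability < minimum:
            bad.append((index, name, capability))
    return bad


if __name__ == "__main__":
    # Hypothetical listing; the UUIDs are placeholders.
    sample = (
        "GPU 0: Tesla M2090 (UUID: GPU-aaaa)\n"
        "GPU 1: Tesla K80 (UUID: GPU-bbbb)\n"
    )
    for index, name, capability in unsupported_gpus(sample):
        print(f"device {index}: {name} has compute capability "
              f"{capability[0]}.{capability[1]}")
```

On a real node the listing would come from running `nvidia-smi -L` and passing its text to `unsupported_gpus`.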

Thanks again for your time

Regards

Abhi

Abhishek Tyagi,PhD

Chemical and Biological Engineering

Hong Kong University of Science and Technology

Clear Water Bay, Hong Kong

________________________________
From: Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
Sent: Thursday, April 5, 2018 11:27:16 PM
To: Abhishek TYAGI
Cc: NAMD list
Subject: Re: namd-l: Query for coarse grain system

CUDA capability was significantly improved in 2.12, and refined in the later versions (nightly). Try upgrading and see if it solves your problem. For sure, the speedup would be worth the upgrade by itself.

Giacomo

On Thu, Apr 5, 2018 at 11:23 AM, Abhishek TYAGI <atyagiaa_at_connect.ust.hk> wrote:

Dear Giacomo,

Thanks for the reply.

I am using NAMD version 2.11. I have looked at the trajectories and they seem normal to me. On the CUDA GPU system the warning still appears, but the MD runs similarly to the MD on my PC.

Thanks

Abhi

Abhishek Tyagi,PhD

Chemical and Biological Engineering

Hong Kong University of Science and Technology

Clear Water Bay, Hong Kong

________________________________
From: Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
Sent: Thursday, April 5, 2018 11:17:47 PM
To: NAMD list; Abhishek TYAGI
Subject: Re: namd-l: Query for coarse grain system

This is squarely a NAMD question: removing vmd-l.

The CUDA compute kernels were significantly modified multiple times in the past versions. Without stating which versions you used, you are not really asking a question that can be answered.

Also, have you looked at the trajectory before the error shows up? Well, this will involve VMD but there should be no question involved, since it should be standard practice to monitor the progress of the simulation.

Giacomo

On Thu, Apr 5, 2018 at 10:53 AM, Abhishek TYAGI <atyagiaa_at_connect.ust.hk> wrote:

Hi,

I am working on a graphene-lipid system for coarse-grained MD and I have encountered a strange problem (it may be that I do not understand it correctly). I need your suggestions to understand what is causing the warning.

During MD, when I run the system on a CUDA-based GPU machine (+p32; 4 GPUs), I encounter the following warning:

Warning: Low global CUDA exclusion count! (6547 vs 6552) System unstable or pairlistdist or cutoff too small.

I understand it is due to the reasons explained in these threads:
http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2013-2014/3253.html
http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2009-2010/2786.html
http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2010-2011/3814.html

However, when I run the same system on my Linux PC (+p8; 1 GPU), I do not observe this warning, and the MD runs smoothly without any error or warning.

Can anyone please explain what I am missing here, or what potential problem I need to fix?
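
Since the warning text names pairlistdist and cutoff as possible causes, one quick check is to compare those two settings in the configuration files used on the two machines. The sketch below is illustrative only: it reads plain `keyword value` lines from a NAMD configuration, ignores Tcl variables and logic, and assumes (as I understand the NAMD documentation) that pairlistdist falls back to cutoff when it is not set. The function names and the sample values are hypothetical and are not taken from the poster's files.

```python
def read_namd_keywords(config_text):
    """Collect simple `keyword value` pairs from NAMD config text.

    Keywords are lower-cased because NAMD treats them case-insensitively.
    Anything after '#' is dropped as a comment.
    """
    values = {}
    for raw in config_text.splitlines():
        parts = raw.split("#", 1)[0].split()
        if len(parts) >= 2:
            values[parts[0].lower()] = parts[1]
    return values


def pairlist_report(config_text):
    """Return cutoff, pairlistdist and their gap, or None if unreadable."""
    keywords = read_namd_keywords(config_text)
    try:
        cutoff = float(keywords["cutoff"])
        # Assumption: pairlistdist defaults to cutoff when absent.
        pairlistdist = float(keywords.get("pairlistdist", keywords["cutoff"]))
    except (KeyError, ValueError):
        return None  # missing cutoff, or a value that is not a plain number
    return {
        "cutoff": cutoff,
        "pairlistdist": pairlistdist,
        "gap": pairlistdist - cutoff,
        "ok": pairlistdist >= cutoff,
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    sample = "cutoff 12.0\nswitchdist 9.0\npairlistdist 14.0 ;# example\n"
    print(pairlist_report(sample))
```

A small or zero gap does not prove that the pair list is the cause, but it gives a concrete number to compare between the cluster run and the PC run.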

Thanks in advance

Abhi

Abhishek Tyagi, PhD

Chemical and Biological Engineering

Hong Kong University of Science and Technology

Clear Water Bay, Hong Kong

--
Giacomo Fiorin
Associate Professor of Research, Temple University, Philadelphia, PA
Contractor, National Institutes of Health, Bethesda, MD
http://goo.gl/Q3TBQU
https://github.com/giacomofiorin

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2019 - 23:19:49 CST