Re: Query for coarse grain system

From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Mon Apr 09 2018 - 09:02:23 CDT

The error means that your GPUs (Tesla M2090 cards from 2011, compute
capability 2.0) fall below the compute capability 3.0 minimum required by
recent NAMD CUDA builds, so they are no longer supported.
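
If you want to double-check this on your node, here is a minimal sketch
using the standard CUDA runtime API (the file name and compile line are
just examples):

/* check_cc.cu -- print each GPU's compute capability.
   Compile with: nvcc check_cc.cu -o check_cc */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0) {
        fprintf(stderr, "no usable CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < n; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        /* recent NAMD CUDA builds require compute capability >= 3.0,
           as the error message below states */
        printf("device %d: %s, compute capability %d.%d%s\n",
               i, prop.name, prop.major, prop.minor,
               prop.major < 3 ? " (too old for NAMD 2.12+)" : "");
    }
    return 0;
}

Your Tesla M2090 cards are Fermi parts, which report 2.0.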

Giacomo

On Mon, Apr 9, 2018 at 9:42 AM, Abhishek TYAGI <atyagiaa_at_connect.ust.hk>
wrote:

> Dear Giacomo,
>
>
> I have tried the nightly build of NAMD as well as NAMD_2.12; they both
> result in the following error:
>
>
> $ charmrun +ignoresharing +idlepoll +isomalloc_sync +p4 namd2 +devices 0,1,2,3 system-nvt-02.conf > system-nvt-02.log &
> [1] 22476
> [keatyagiaa_at_login-0 cg-test]$
> ------------- Processor 2 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 2 (login-0.local device 2)
>
> ------------- Processor 0 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 0 (login-0.local device 0)
>
> ------------- Processor 1 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 1 (login-0.local device 1)
>
> ------------- Processor 3 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 3 (login-0.local device 3)
>
> Charm++ fatal error:
> FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 2 (login-0.local device 2)
>
> The GPUs we are using:
>
> nvidia-smi
> Mon Apr 9 21:40:46 2018
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla M2090         On   | 0000:09:00.0     Off |                  Off |
> | N/A   N/A    P12    29W / N/A |      0MiB /  6066MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla M2090         On   | 0000:0A:00.0     Off |                  Off |
> | N/A   N/A    P12    29W / N/A |      0MiB /  6066MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla M2090         On   | 0000:0D:00.0     Off |                  Off |
> | N/A   N/A    P12    29W / N/A |      0MiB /  6066MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  Tesla M2090         On   | 0000:0E:00.0     Off |                  Off |
> | N/A   N/A    P12    29W / N/A |      0MiB /  6066MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
> |=============================================================================|
> |  No running processes found                                                 |
> +-----------------------------------------------------------------------------+
>
> It seems that NAMD 2.12 and the nightly builds do not support the GPUs we
> are using, as mentioned in the following thread:
> https://www-s.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2016-2017/2089.html
>
> However, if you have any solution that would let us run a recent version
> of NAMD on our cluster, I would be glad to hear it.
>
>
> Thanks again for your time
>
>
> Regards
>
> Abhi
>
>
> Abhishek Tyagi, PhD
>
> Chemical and Biological Engineering
>
> Hong Kong University of Science and Technology
>
> Clear Water Bay, Hong Kong
>
>
> ------------------------------
> *From:* Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
> *Sent:* Thursday, April 5, 2018 11:27:16 PM
> *To:* Abhishek TYAGI
> *Cc:* NAMD list
>
> *Subject:* Re: namd-l: Query for coarse grain system
>
> CUDA support was significantly improved in 2.12 and refined in the later
> versions (nightly builds). Try upgrading and see whether it solves your
> problem; the speedup alone would be worth the upgrade.
>
> Giacomo
>
> On Thu, Apr 5, 2018 at 11:23 AM, Abhishek TYAGI <atyagiaa_at_connect.ust.hk>
> wrote:
>
> Dear Giacomo,
>
>
> Thanks for the reply.
>
> I am using NAMD version 2.11. I have looked at the trajectories and they
> seem normal to me. On the CUDA GPU machine the warning still appears, but
> the MD runs the same as it does on my PC.
>
>
> Thanks
>
> Abhi
>
> Abhishek Tyagi, PhD
>
> Chemical and Biological Engineering
>
> Hong Kong University of Science and Technology
>
> Clear Water Bay, Hong Kong
>
>
> ------------------------------
> *From:* Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
> *Sent:* Thursday, April 5, 2018 11:17:47 PM
> *To:* NAMD list; Abhishek TYAGI
> *Subject:* Re: namd-l: Query for coarse grain system
>
> This is squarely a NAMD question: removing vmd-l.
>
> The CUDA compute kernels have been significantly modified multiple times
> across past versions. Without stating which version you used, you are not
> really asking a question that can be answered.
>
> Also, have you looked at the trajectory before the error shows up? Well,
> this *will* involve VMD, but there should be no question involved, since
> it should be standard practice to monitor the progress of the simulation.
>
> Giacomo
>
> On Thu, Apr 5, 2018 at 10:53 AM, Abhishek TYAGI <atyagiaa_at_connect.ust.hk>
> wrote:
>
> Hi,
>
>
> I am working on a graphene-lipid system for coarse-grained MD, and I have
> encountered a strange problem (it may be that I don't understand it
> correctly). I need your suggestions to understand what is causing the
> warning.
>
> During MD, when I run the system on a CUDA-based GPU machine (+p32; 4
> GPUs), I encounter the following warning:
>
> Warning: Low global CUDA exclusion count! (6547 vs 6552) System unstable
> or pairlistdist or cutoff too small.
>
> I understand it is due to the reasons explained here:
> http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2013-2014/3253.html
> http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2009-2010/2786.html
> and here:
> http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2010-2011/3814.html
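>
> As I understand it, the warning means the GPU code found fewer excluded
> pairs (6547) than the total it expected (6552). A rough sketch of the
> arithmetic behind the "pairlistdist too small" explanation, with made-up
> numbers (the cutoff and pairlistdist values here are hypothetical, not
> taken from my config):
>
> /* margin.c -- illustrative sketch only.
>    Between pairlist rebuilds, a pair is only guaranteed to be found if
>    neither atom drifts farther than half of (pairlistdist - cutoff). */
> #include <stdio.h>
>
> int main(void) {
>     double cutoff = 12.0;        /* hypothetical, in Angstroms */
>     double pairlistdist = 13.5;  /* hypothetical, in Angstroms */
>     double max_drift = (pairlistdist - cutoff) / 2.0;
>     printf("each atom may safely drift %.2f A per pairlist cycle\n",
>            max_drift);  /* prints 0.75 */
>     /* Atoms moving farther than this per cycle (an unstable system, or
>        the large timesteps typical of coarse-grained MD) can cause
>        excluded pairs to be missed, matching the warning above. */
>     return 0;
> }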
>
> However, when I run this same system on my Linux PC (+p8; 1 GPU), I don't
> observe this warning, and the MD runs smoothly without any error or
> warning.
>
> Can anyone please explain what I am missing here, or what potential
> problem I need to fix?
>
>
> Thanks in advance
>
> Abhi
>
>
> Abhishek Tyagi, PhD
>
> Chemical and Biological Engineering
>
> Hong Kong University of Science and Technology
>
> Clear Water Bay, Hong Kong
>
>
>
>
>
> --
> Giacomo Fiorin
> Associate Professor of Research, Temple University, Philadelphia, PA
> Contractor, National Institutes of Health, Bethesda, MD
> http://goo.gl/Q3TBQU
> https://github.com/giacomofiorin
>
>
>
>
> --
> Giacomo Fiorin
> Associate Professor of Research, Temple University, Philadelphia, PA
> Contractor, National Institutes of Health, Bethesda, MD
> http://goo.gl/Q3TBQU
> https://github.com/giacomofiorin
>

-- 
Giacomo Fiorin
Associate Professor of Research, Temple University, Philadelphia, PA
Contractor, National Institutes of Health, Bethesda, MD
http://goo.gl/Q3TBQU
https://github.com/giacomofiorin
