Re: Query for coarse grain system

From: Abhishek TYAGI (atyagiaa_at_connect.ust.hk)
Date: Mon Apr 09 2018 - 12:17:06 CDT

That makes sense; we need to buy newer GPUs.

Thanks for your time

Abhi

Abhishek Tyagi, PhD

Chemical and Biological Engineering

Hong Kong University of Science and Technology

Clear Water Bay, Hong Kong

________________________________
From: Axel Kohlmeyer <akohlmey_at_gmail.com>
Sent: Monday, April 9, 2018 10:22:11 PM
To: NAMD list; Abhishek TYAGI
Cc: Giacomo Fiorin
Subject: Re: namd-l: Query for coarse grain system

An NVIDIA Tesla M2090 is a 7-year-old GPU. Considering the fast pace of
GPU hardware development, it has outlived its usefulness. It will also not
be included in future driver releases from NVIDIA and will consequently be
dropped from compatibility with the CUDA toolkit.

The only way to use a newer version of NAMD with GPUs is to replace the
GPU hardware.
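
To double-check what the CUDA runtime reports for these cards, a minimal
sketch like the following (my illustration, not NAMD code) prints each
device's compute capability; the M2090 is Fermi, i.e. compute capability
2.0, below the 3.0 that NAMD 2.12 and later require:

// check_cc.cu - print each GPU's compute capability.
// Compile with: nvcc check_cc.cu -o check_cc
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no usable CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // NAMD 2.12+ rejects devices below compute capability 3.0 (Kepler).
        printf("device %d: %s, compute capability %d.%d%s\n",
               i, prop.name, prop.major, prop.minor,
               prop.major < 3 ? "  <- too old for NAMD 2.12+" : "");
    }
    return 0;
}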

axel.

On Mon, Apr 9, 2018 at 9:42 AM, Abhishek TYAGI <atyagiaa_at_connect.ust.hk> wrote:
> Dear Giacomo,
>
>
> I have tried the nightly-build version of NAMD as well as NAMD 2.12; they
> both result in the following error:
>
>
> $ charmrun +ignoresharing +idlepoll +isomalloc_sync +p4 namd2 +devices 0,1,2,3 system-nvt-02.conf > system-nvt-02.log &
> [1] 22476
> [keatyagiaa_at_login-0 cg-test]$ ------------- Processor 2 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 2 (login-0.local device 2)
>
> ------------- Processor 0 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 0 (login-0.local device 0)
>
> ------------- Processor 1 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 1 (login-0.local device 1)
>
> ------------- Processor 3 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 3 (login-0.local device 3)
>
> Charm++ fatal error:
> FATAL ERROR: CUDA error device not of compute capability 3.0 or higher on Pe 2 (login-0.local device 2)
>
>
> The GPU we are using:
>
> nvidia-smi
> Mon Apr  9 21:40:46 2018
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla M2090         On   | 0000:09:00.0     Off |                  Off |
> | N/A   N/A    P12    29W / N/A |      0MiB /  6066MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla M2090         On   | 0000:0A:00.0     Off |                  Off |
> | N/A   N/A    P12    29W / N/A |      0MiB /  6066MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla M2090         On   | 0000:0D:00.0     Off |                  Off |
> | N/A   N/A    P12    29W / N/A |      0MiB /  6066MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  Tesla M2090         On   | 0000:0E:00.0     Off |                  Off |
> | N/A   N/A    P12    29W / N/A |      0MiB /  6066MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
> |=============================================================================|
> |  No running processes found                                                 |
> +-----------------------------------------------------------------------------+
>
> It seems that NAMD 2.12 and the nightly build do not support the GPUs we
> are using, as mentioned in the following thread:
> https://www-s.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2016-2017/2089.html
>
> However, if you have any solution that would let us run a recent version
> of NAMD on our cluster, I would be glad to hear it.
>
>
> Thanks again for your time
>
>
> Regards
>
> Abhi
>
>
> Abhishek Tyagi, PhD
>
> Chemical and Biological Engineering
>
> Hong Kong University of Science and Technology
>
> Clear Water Bay, Hong Kong
>
>
> ________________________________
> From: Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
> Sent: Thursday, April 5, 2018 11:27:16 PM
> To: Abhishek TYAGI
> Cc: NAMD list
>
> Subject: Re: namd-l: Query for coarse grain system
>
> CUDA support was significantly improved in 2.12 and refined in the later
> versions (nightly). Try upgrading and see if that solves your problem. The
> speedup alone would be worth the upgrade.
>
> Giacomo
>
> On Thu, Apr 5, 2018 at 11:23 AM, Abhishek TYAGI <atyagiaa_at_connect.ust.hk>
> wrote:
>
> Dear Giacomo,
>
>
> Thanks for the reply.
>
> I am using NAMD 2.11. I have looked at the trajectories and they seem
> normal to me. On the CUDA GPU the warning still appears, but the MD runs
> much as it does on my PC.
>
>
> Thanks
>
> Abhi
>
> Abhishek Tyagi, PhD
>
> Chemical and Biological Engineering
>
> Hong Kong University of Science and Technology
>
> Clear Water Bay, Hong Kong
>
>
> ________________________________
> From: Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
> Sent: Thursday, April 5, 2018 11:17:47 PM
> To: NAMD list; Abhishek TYAGI
> Subject: Re: namd-l: Query for coarse grain system
>
> This is squarely a NAMD question: removing vmd-l.
>
> The CUDA compute kernels were significantly modified multiple times across
> past versions. Without stating which version you used, you are not really
> asking a question that can be answered.
>
> Also, have you looked at the trajectory before the error shows up? That
> will involve VMD, but it should raise no question, since monitoring the
> progress of a simulation should be standard practice.
>
> Giacomo
>
> On Thu, Apr 5, 2018 at 10:53 AM, Abhishek TYAGI <atyagiaa_at_connect.ust.hk>
> wrote:
>
> Hi,
>
>
> I am working on a graphene-lipid system for coarse-grained MD and have
> encountered a strange problem (it may be that I don't understand it
> correctly). I need your suggestions to understand what is causing the
> warning.
>
> During MD, when I run the system on a CUDA-based GPU node (+p32; 4 GPUs),
> I encounter the following warning:
>
> Warning: Low global CUDA exclusion count! (6547 vs 6552) System unstable or pairlistdist or cutoff too small.
>
> I understand it may be due to the reasons explained here:
> http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2013-2014/3253.html,
> http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2009-2010/2786.html
> and here:
> http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2010-2011/3814.html
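>
> The parameters the warning points to look like this in a NAMD config file
> (a minimal sketch with illustrative values, not the exact ones from my
> configuration):
>
> # Nonbonded distances in Angstroms; values here are illustrative only.
> cutoff              12.0
> switching           on
> switchdist          10.0
> # pairlistdist must be >= cutoff; the warning suggests that pairlistdist
> # (or cutoff) may be too small for the system.
> pairlistdist        14.0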
>
> However, when I run the same system on my Linux PC (+p8; 1 GPU), I don't
> observe this warning, and the MD runs smoothly without any error or
> warning.
>
> Can anyone please explain what I am missing here, or what potential
> problem I need to fix?
>
>
> Thanks in advance
>
> Abhi
>
>
> Abhishek Tyagi, PhD
>
> Chemical and Biological Engineering
>
> Hong Kong University of Science and Technology
>
> Clear Water Bay, Hong Kong
>
>
>
>
>
> --
> Giacomo Fiorin
> Associate Professor of Research, Temple University, Philadelphia, PA
> Contractor, National Institutes of Health, Bethesda, MD
> http://goo.gl/Q3TBQU
> https://github.com/giacomofiorin
>
>
>
>
> --
> Giacomo Fiorin
> Associate Professor of Research, Temple University, Philadelphia, PA
> Contractor, National Institutes of Health, Bethesda, MD
> http://goo.gl/Q3TBQU
> https://github.com/giacomofiorin

--
Dr. Axel Kohlmeyer  akohlmey_at_gmail.com  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste, Italy.
