RE: NAMD 2.9 with CUDA runs

From: Ashley Chew (ashley.chew_at_uwa.edu.au)
Date: Sun Sep 02 2012 - 20:37:30 CDT

From what I understand, it is very reproducible with different data sets. (I'm pretty sure it is not a hardware problem: there are 96 identical compute nodes with Tesla cards, so jobs are distributed to whichever nodes are free.)

I'll get him to lodge the input data and follow the process outlined below. (Thanks for the quick response.)

For bug reports, mail namd_at_ks.uiuc.edu with:
- A synopsis of the problem as the subject (not "need help", "URGENT", "NAMD", or "need urgent NAMD help!!!").
- The NAMD version, platform and number of CPUs the problem occurs with (to be very complete just copy the first 20 lines of output).
- A description of the problematic behavior and any error messages.
- Whether the problem is consistent or random.
- A complete log of your run showing any error messages.
- The URL of your compressed tar file on a web server.
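
A minimal sketch of how the log excerpt and the compressed tar file for such a report might be prepared, assuming Python 3 is available on the cluster. Both helper functions and any file names passed to them are hypothetical and are not part of NAMD:

```python
import tarfile
from itertools import islice
from pathlib import Path


def log_header(log_path, n_lines=20):
    """Return the first n_lines of a NAMD log file as a single string."""
    with open(log_path, "r", errors="replace") as fh:
        return "".join(islice(fh, n_lines))


def bundle_inputs(paths, archive_path):
    """Write the given files into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in paths:
            p = Path(p)
            # Store each file under its base name (flat layout in the archive).
            tar.add(p, arcname=p.name)
    return Path(archive_path)
```

For example, `print(log_header("run.log"))` would give the text to paste into the report, and `bundle_inputs(["run.conf", "system.psf", "system.pdb"], "namd_report.tar.gz")` would produce the archive to upload. Those file names are placeholders for whatever the failing run actually uses.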

Ashley Chew
HPC System Administrator
iVEC_at_UWA (MBDP: M024)
The University of Western Australia
35 Stirling Highway
CRAWLEY  WA  6009

E:             ashley.chew_at_uwa.edu.au
P:             +61 8 6488 8742
F:             +61 8 6488 1015

CRICOS Provider Code: 00126G

Confidentiality and Privacy Notice
The contents of this email are strictly private and intended only for the addressee. This email may contain legally privileged or confidential information.  If you receive this communication in error, please notify the sender immediately by reply email and delete both emails and any attachments contained therein. No further disclosure, copying or relaying of any part of this correspondence is permitted without the express permission of the sender. The contents of this email, and any response or further correspondence, may be stored on an electronic filing record system pursuant to the privacy statement for records at The University of Western Australia. The University accepts no liability in connection with computer virus, data corruption, delay, interruption, unauthorized access or unauthorized amendment. This notice should not be removed.

 Save a tree...please don't print this e-mail unless you really need to

-----Original Message-----
From: Chris Harrison [mailto:charris5_at_gmail.com]
Sent: Saturday, September 01, 2012 2:14 AM
To: Ashley Chew
Cc: namd-l_at_ks.uiuc.edu
Subject: Re: namd-l: NAMD 2.9 with CUDA runs

Ashley,

How reproducible is the error, and does it occur on other GPU boards? I ask because, if you have a system where it occurs reproducibly at or very close to ~320K steps, we would ask you to send us the inputs so we can use them to track down the problem.

Best,
Chris

Ashley Chew <ashley.chew_at_uwa.edu.au> writes:
> Date: Fri, 31 Aug 2012 17:02:48 +0800
> From: Ashley Chew <ashley.chew_at_uwa.edu.au>
> To: "namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>
> Subject: namd-l: NAMD 2.9 with CUDA runs
>
> Hi, this is my first post regarding NAMD.
>
> I was wondering whether anyone in the community is having problems with
> NAMD built with CUDA (using a single Tesla M2075 6 GB; the node has 72 GB
> of RAM) once a run passes a certain point (in our researcher's case, past
> 320K steps).
>
> In our case, one of the researchers noticed that the errors returned in the output are the common internal errors associated with unstable simulations. However, if they checkpoint and stop the runs before 320K steps and then restart from the restart files generated internally by NAMD, the restarted simulation runs past the previous crash point.
>
> I have even rebuilt NAMD from the CVS 20120828 source with fftw3 (which works), but it did much the same thing once it passed a certain point.
>
> Ashley Chew
> HPC System Administrator
> iVEC_at_UWA (MBDP: M024)
> The University of Western Australia
> 35 Stirling Highway
> CRAWLEY WA 6009
>
> E: ashley.chew_at_uwa.edu.au
> P: +61 8 6488 8742
> F: +61 8 6488 1015
>
>
> CRICOS Provider Code: 00126G
>

--
Chris Harrison, Ph.D.
NIH Center for Macromolecular Modeling and Bioinformatics
Theoretical and Computational Biophysics Group
Beckman Institute for Advanced Science and Technology
University of Illinois, 405 N. Mathews Ave., Urbana, IL 61801
http://www.ks.uiuc.edu/Research/namd       Voice: 773-570-6078
http://www.ks.uiuc.edu/~char               Fax:   217-244-6078

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:22:30 CST