Re: Periodic cell too small with GPUs, not with pure CPUs

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Fri Aug 04 2017 - 06:44:37 CDT

Dear David;

I am answering late (I had problems with the cluster); however, adding
"useCUDA2 off" to the NAMD config file worked well. Thank you.
francesco

On Tue, Jul 18, 2017 at 6:05 PM, David Hardy <dhardy_at_ks.uiuc.edu> wrote:

> Dear Francesco,
>
> You could try disabling the newer CUDA kernels with "useCUDA2 off" in your
> config file.
>
> Others have also reported issues with NPT simulation using the new CUDA
> kernels. I am looking into it.
>
> Best regards,
> Dave
>
>
> On Jul 17, 2017, at 9:09 AM, Francesco Pietra <chiendarret_at_gmail.com>
> wrote:
>
> Dear Dave:
> I got the impression that problems with weak GPUs such as my GTX680 are
> exacerbated with version 2.12 of NAMD, or at least with the nightly build
> that I mentioned in my previous mail.
>
> I am now using a larger box for the same system that I mentioned (just to
> allow the protein to tumble). Minimization on a 4-core desktop gave no
> problems. Gradual heating on the GTX680 box led to chaining of the TIP3P
> waters. Again, no problem on the desktop, at its very low speed
> (rigidbonds water, ts=1.0fs, margin 0).
>
> NPT (based on the heating output) could not be run on the GTX680 box, even
> after going to ts=0.1fs and a larger margin.
>
> Again, no problems on the desktop (the NextScale cluster is being tuned for
> NAMD 2.12 with KNL and presently gives very poor performance).
>
> I had no similar problems with previous versions of namd on the same
> GTX680 box.
>
> francesco
>
> On Wed, Jul 5, 2017 at 6:02 PM, David Hardy <dhardy_at_ks.uiuc.edu> wrote:
>
>> Dear Francesco,
>>
>> This issue shouldn't have anything to do with the "wrapall" option, since
>> this just affects the output of atom coordinates to the DCD file.
>>
>> The "periodic cell became too small" error occurs when using a barostat
>> that shrinks the periodic cell to the point where the patch size along a
>> dimension becomes smaller than the extended cutoff distance, which is why
>> increasing "margin" is helping.
>>
>> Overall, your problem between running on GPUs vs CPUs is most likely due
>> to differences in the calculation of the virial, which then affects the
>> barostat. The virial appears to be, numerically speaking, not a well
>> conditioned quantity to compute, so the use of single precision (GPUs)
>> versus double precision (CPUs) in its calculation is probably at the root
>> of your issue.
>>
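A toy illustration of the precision point above, in Python. This is not NAMD's virial code; the numbers are made up, and the helper names (to_f32, sum_f32, sum_f64) are hypothetical. It only shows how a sum of large contributions of opposite sign, which nearly cancel, can lose a small residual when every addition is rounded to single precision.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE-754 double) to the nearest single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate with the running total rounded to single precision
    after every addition, mimicking a single-precision accumulator."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


def sum_f64(values):
    """Accumulate in ordinary double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


# Made-up terms: two large contributions that cancel exactly, plus two
# small ones that carry the actual answer.
terms = [1.0e8, 1.0, -1.0e8, 1.0]

print("double:", sum_f64(terms))  # 2.0
print("single:", sum_f32(terms))  # 1.0, the first small term was lost
```

Near 1.0e8 the spacing between adjacent single-precision values is 8, so adding 1.0 to the running total has no effect there. A barostat fed by such a sum would see a different pressure than one fed by the double-precision result.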
>> In terms of using NAMD on GPUs, increasing the "margin," as Josh
>> recommends, is probably your best course of action. Increasing the margin
>> to 10 seems to me to be really extreme, as this will impede performance.
>> Ask yourself: Is your periodic cell really expected to shrink as much as
>> (minimum number of patches along a given dimension)*margin? Maybe first
>> try setting it more modestly to 1 or 2 to see if you can then successfully
>> run.
>>
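A rough Python sketch of the patch-size condition and the margin trade-off described above. The sizing rule used here (each patch at least min_patch + margin wide when the grid is set up) is an assumption made for illustration, not taken from the NAMD source; min_patch stands in for the "extended cutoff distance", and all function names and numbers are hypothetical.

```python
import math


def count_patches(cell_length: float, min_patch: float, margin: float) -> int:
    """Patches along one dimension, assuming each must start out at least
    (min_patch + margin) wide. Assumed rule, for illustration only."""
    return max(1, math.floor(cell_length / (min_patch + margin)))


def smallest_allowed_cell(n_patches: int, min_patch: float) -> float:
    """Cell length below which a patch would be narrower than min_patch."""
    return n_patches * min_patch


def cell_too_small(cell_length: float, n_patches: int, min_patch: float) -> bool:
    """True when the patch size along this dimension falls below min_patch,
    the condition behind "periodic cell became too small"."""
    return cell_length / n_patches < min_patch


# Hypothetical numbers, in angstroms.
cell, min_patch = 90.0, 15.0

for margin in (0.0, 1.0):
    n = count_patches(cell, min_patch, margin)
    floor_len = smallest_allowed_cell(n, min_patch)
    shrunk = 89.5  # cell length after the barostat shrinks the box a little
    print(f"margin={margin}: {n} patches, cell may shrink to {floor_len}, "
          f"too small at {shrunk}? {cell_too_small(shrunk, n, min_patch)}")
```

With these numbers, margin 0 gives 6 patches and no headroom at all, while margin 1 gives 5 patches and lets the cell shrink to 75 before the condition triggers. The larger margin buys headroom at the price of fewer, larger patches, which is the performance cost mentioned above.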
>> Best regards,
>> Dave
>>
>> --
>> David J. Hardy, Ph.D.
>> Theoretical and Computational Biophysics
>> Beckman Institute, University of Illinois
>> dhardy_at_ks.uiuc.edu
>> http://www.ks.uiuc.edu/~dhardy/
>>
>>
>> On Jul 5, 2017, at 9:39 AM, Joshua . <joshua.timmons1_at_gmail.com> wrote:
>>
>> Hello Francesco,
>>
>> I am new to NAMD, but I had a similar problem "periodic cell became too
>> small" and resolved it by setting margin to 10 (while keeping wrapAll on).
>> Documentation from 2.6 says to leave it alone unless trying to optimize
>> performance, but it solved the issue for me immediately:
>> http://www.ks.uiuc.edu/Research/namd/2.6/olddocs/ug/node26.html
>>
>> Josh
>>
>> On Wed, Jul 5, 2017 at 3:09 AM, Francesco Pietra <chiendarret_at_gmail.com>
>> wrote:
>>
>>> Hello:
>>> I am running an unbiased MD simulation of a large protein containing
>>> organic ligands in a TIP3P box that (wrapall on) gave no trouble on a
>>> NextScale cluster on 264 pure CPUs over a 58.2ns simulation, ts=1.0fs.
>>>
>>> On trying to continue the simulation on my workstation with a couple
>>> of GTX680s, I immediately get "periodic cell became too small" with
>>> either "wrapall on" or "wrapall no" (I had used this hardware successfully
>>> for MD until this case). NAMD_CVS_2017-05-25_Linux-x86_64_multicore-CUDA.
>>>
>>> In contrast, the simulation continues without problems (albeit very
>>> slowly) on pure CPUs on a desktop, with either "wrapall no" (which was the
>>> reason for continuing the simulation, in order to safely measure the
>>> distances between the centers of mass of protein and ligands) or "wrapall
>>> on".
>>>
>>> francesco pietra
>>>
>>
>>
>>
>
>
