Re: Namd-3 alpha 7 error

From: Dr. Eddie (eackad_at_gmail.com)
Date: Wed Feb 24 2021 - 14:25:16 CST

They are not NVLinked. I saw David's email, which said:
"I had thought that multi-GPU alpha 7 required NVLink. It happens to be
recommended, but not required."
so I thought I'd try.

On Wed, Feb 24, 2021 at 1:42 PM Bassam Haddad <bhaddad_at_pdx.edu> wrote:

> Are all of your GPUs NV-linked? It's my understanding that you cannot use
> the multi-gpu NAMD 3.0 unless the GPUs are linked together.
>
> Best,
> Bassam
>
> On Wed, Feb 24, 2021 at 11:37 AM David Hardy <dhardy_at_ks.uiuc.edu> wrote:
>
>> Hi Eddie,
>>
>> This looks similar to the error that Lorenzo recently reported.
>>
>> Would you be willing to share your data set for us to try to reproduce
>> this error locally? Although we don’t have your exact hardware setup, we do
>> have a few multi-GPU platforms in-house that we could try.
>>
>> Best regards,
>> Dave
>>
>> --
>> David J. Hardy, Ph.D.
>> Beckman Institute
>> University of Illinois at Urbana-Champaign
>> 405 N. Mathews Ave., Urbana, IL 61801
>> dhardy_at_ks.uiuc.edu, http://www.ks.uiuc.edu/~dhardy/
>>
>> On Feb 23, 2021, at 11:49 AM, Dr. Eddie <eackad_at_gmail.com> wrote:
>>
>> Hello,
>> I'm trying to get a small 150k system working with namd3.
>> I am using 4 GTX 1080s with the command
>> nice -n 5
>> /home/eddie/binaries/NAMD_3.0alpha7_Linux-x86_64-multicore-CUDA/namd3 +p4
>> +idlepoll +setcpuaffinity +devices 0,1,2,3 step5_production.inp
>>
>> I get the error:
>> FATAL ERROR: CUDA error cub::DeviceSelect::If(d_temp_storage,
>> temp_storage_bytes, hgi, hgi, d_nHG, natoms, notZero(), stream) in file
>> src/SequencerCUDAKernel.cu, function buildRattleLists, line 4461
>> on Pe 0 (node10.cl.siue.edu device 0 pci 0:2:0): invalid device function
>>
>> and
>> CUDANBOND[2]: Allocating patch data structure with 87 patches!
>> CUDANBOND[3]: Allocating patch data structure with 101 patches!
>> CUDANBOND[1]: Allocating patch data structure with 89 patches!
>> CUDANBOND[0]: Allocating patch data structure with 114 patches!
>> FATAL ERROR: CUDA error cub::DeviceSelect::If(d_temp_storage,
>> temp_storage_bytes, hgi, hgi, d_nHG, natoms, notZero(), stream) in file
>> src/SequencerCUDAKernel.cu, function buildRattleLists, line 4461
>> on Pe 0 (node10.cl.siue.edu device 0 pci 0:2:0): invalid device function
>>
>> Any ideas? I know these are commercial GPUs; is that an issue?
>> Thanks,
>> Eddie
>>
>>
>>

-- 
Eddie

This archive was generated by hypermail 2.1.6 : Fri Dec 31 2021 - 23:17:10 CST