From: jrhau lung (jrhaulung_at_gmail.com)
Date: Fri Feb 01 2019 - 17:33:45 CST

Hi Joao,
      The simulation is running on a newly set up small desktop server with
dual CPUs (112 cores maximum; the runs that produced the error messages were
tested with both 24 and 96 cores), an RTX 2080, and 16 GB of DDR4-2133
registered DRAM. A transformed protein structure based on PDB ID 3FN3 was
simulated under implicit solvent conditions (the same setup ran without error
on another server with 48 cores and a GTX 970). When I checked GPU usage with
nvidia-smi during the short period the simulation was running, the GPU load
was quite low (~3%). One thing worth adding: after adopting John's suggestion
to switch to the nightly-build CUDA NAMD, the first simulation did persist a
little longer (<0.005 ns) with a higher GPU load (~30%), but once the first
termination happened, subsequent tests were terminated much sooner after
launch.
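
For reference, the GPU-load numbers above came from polling nvidia-smi while
namd2 was running; a minimal sketch of that check (the one-second interval and
the query fields are just my own choice):

  # poll GPU utilization and memory use once per second during the run
  nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 1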
Since our building is closed during the Chinese New Year break, I will get
back to you with the results of the two-step simulation you suggested as soon
as possible. Thanks for all your help.

Best,
Jrhau

On Fri, Feb 1, 2019 at 10:59 PM, João Ribeiro <jribeiro_at_ks.uiuc.edu> wrote:

> Hi Jrhau,
>
> Would it be possible for you to give us more details, like the size of
> the system, the solvent type (if any) and the number of CPUs that you are
> using? From what you are describing it must be quite small (at least for
> the GPU in use).
>
> Another thing to consider is how scattered your system is. Are you running
> the simulation in implicit solvent or in vacuum? If not, it might be useful
> to run the minimization and short equilibration using the multicore version
> and then run the rest of the simulation using the CUDA version, to
> eliminate sudden changes in the volume of the box during the equilibration.
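>
> As a rough sketch of what I mean (the binary paths and config file names
> here are placeholders, and the core count is only an example): run the
> minimization and short equilibration with the plain multicore binary, then
> continue the production run from its restart files with the CUDA binary.
>
>   # Step 1: minimization + short equilibration on the CPU-only multicore build
>   ~/NAMD_multicore/namd2 +p24 +idlepoll equilibration.conf > equilibration.log
>
>   # Step 2: production run on the CUDA build, starting from the restart
>   # files written by step 1
>   ~/NAMD_multicore-CUDA/namd2 +p24 +idlepoll production.conf > production.log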
>
> Best
>
> João
>
>
> On Thu, Jan 31, 2019 at 5:58 PM jrhau lung <jrhaulung_at_gmail.com> wrote:
>
>> Hi Joao and John,
>> Thanks for your guidance. I tried reducing the number of CPU cores used
>> in the run, but the simulation does not persist any longer and is
>> eventually terminated automatically for the same reason.
>> The error message is attached below. The problem persists even when
>> switching to the nightly-build Linux-x86_64-multicore-CUDA
>> <https://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=&AccessCode=&ArchiveID=1584>
>> NAMD.
>> The current system is running Ubuntu 16.04 LTS with VMD LinuxAMD64
>> (1.9.4a12).
>>
>> sincerely
>>
>> Jrhau
>>
>> Error message from running with fewer CPU cores:
>>
>> ------------- Processor 16 Exiting: Called CmiAbort ------------
>> Reason: FATAL ERROR: ComputeBondedCUDA::copyTupleData, invalid number of
>> exclusions
>> FATAL ERROR: See http://www.ks.uiuc.edu/Research/namd/bugreport.html
>>
>> Charm++ fatal error:
>> FATAL ERROR: ComputeBondedCUDA::copyTupleData, invalid number of
>> exclusions
>> FATAL ERROR: See http://www.ks.uiuc.edu/Research/namd/bugreport.html
>>
>> Info) IMD connection ended unexpectedly; connection terminated.
>>
>> Error message from running with the nightly-build Linux-x86_64-multicore-CUDA
>> <https://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=&AccessCode=&ArchiveID=1584>
>> NAMD:
>>
>> ------------ Processor 64 Exiting: Called CmiAbort ------------
>> Reason: FATAL ERROR: ComputeBondedCUDA::copyTupleData, invalid number of
>> exclusions
>> FATAL ERROR: See http://www.ks.uiuc.edu/Research/namd/bugreport.html
>>
>> Charm++ fatal error:
>> FATAL ERROR: ComputeBondedCUDA::copyTupleData, invalid number of
>> exclusions
>> FATAL ERROR: See http://www.ks.uiuc.edu/Research/namd/bugreport.html
>>
>> ------------- Processor 64 Exiting: Called CmiAbort ------------
>> Reason: FATAL ERROR: ComputeBondedCUDA::copyTupleData, invalid number of
>> exclusions
>> FATAL ERROR: See http://www.ks.uiuc.edu/Research/namd/bugreport.html
>>
>> Charm++ fatal error:
>> FATAL ERROR: ComputeBondedCUDA::copyTupleData, invalid number of
>> exclusions
>> FATAL ERROR: See http://www.ks.uiuc.edu/Research/namd/bugreport.html
>>
>> while executing
>> "::exec
>> /home/jrhau/Downloads/NAMD_Git-2019-01-31_Linux-x86_64-multicore-CUDA/namd2
>> +idlepoll +setcpuaffinity +p96 qwikmd_equilibration_0.conf >> qwikm..."
>> ("eval" body line 1)
>> invoked from within
>> "eval ::exec [list $exec_path] [lrange $args 1 end]"
>> (procedure "::ExecTool::exec" line 14)
>> invoked from within
>> "::ExecTool::exec namd2 +idlepoll +setcpuaffinity +p96
>> qwikmd_equilibration_0.conf >> qwikmd_equilibration_0.log"
>> ("eval" body line 1)
>> invoked from within
>> "eval ::ExecTool::exec $exec_command >> $conf.log"
>> (procedure "QWIKMD::Run" line 250)
>> invoked from within
>> "QWIKMD::Run"
>> invoked from within
>> ".qwikmd.nbinput.f1.fcontrol.fcolapse.f1.run.button_Calculate invoke "
>> invoked from within
>> ".qwikmd.nbinput.f1.fcontrol.fcolapse.f1.run.button_Calculate instate
>> {pressed !disabled} {
>> .qwikmd.nbinput.f1.fcontrol.fcolapse.f1.run.button_Calculat..."
>> (command bound to event)
>>
>> On Thu, Jan 31, 2019 at 11:04 PM, João Ribeiro <jribeiro_at_ks.uiuc.edu> wrote:
>>
>>> Hi Jrhau,
>>>
>>> Thank you for reporting the error. The error that you are seeing is
>>> related to the system size and the number of CPU cores + GPU that you
>>> selected to run your simulation. I would guess that the system you are
>>> running is not big enough to justify that many cores plus a GPU. Reduce the
>>> number of CPU cores and the simulation should run smoothly.
>>>
>>> Now, regarding the performance of the multicore version of NAMD, did you
>>> run exactly the same configuration file?
>>>
>>> Please allow me to add some notes about the configuration files produced
>>> by QwikMD. QwikMD selects MD parameters in the config files that are geared
>>> toward learning and live inspection, namely a high frequency of energy
>>> output and trajectory saving (dcdfreq). This is useful when you are
>>> starting to run simulations but carries a significant NAMD performance
>>> penalty. Also, running your simulations in "Live View" mode (Interactive
>>> Molecular Dynamics activated) decreases your performance substantially, as
>>> NAMD-VMD communication occurs frequently.
>>>
>>> In summary, if you are trying to squeeze the most ns/day out of your
>>> machine, please increase the intervals between saved frames (dcdfreq) and
>>> output events (outputPressure, outputEnergies, etc.) and run your
>>> simulations in the background (Live View off, i.e. IMD off); a sketch of
>>> the relevant settings follows below.
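>>>
>>> For example, something along these lines in the NAMD configuration file
>>> (the step counts are only illustrative; choose values that suit your
>>> system and how closely you want to monitor it):
>>>
>>>   # write energies/pressure and trajectory frames less often to cut output overhead
>>>   outputEnergies   5000
>>>   outputPressure   5000
>>>   dcdfreq          5000
>>>   restartfreq      5000
>>>   # run in the background, without live IMD streaming to VMD
>>>   IMDon            off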
>>>
>>> I hope this helps and I am also copying the NAMD developers on this
>>> thread so they can comment further on the NAMD issues and how to improve
>>> NAMD performance.
>>>
>>> Best
>>>
>>> Joao
>>>
>>>
>>>
>>> On Thu, Jan 31, 2019 at 5:44 AM jrhau lung <jrhaulung_at_gmail.com> wrote:
>>>
>>>> Dear VMD friends:
>>>> In order to run simulations on a new GeForce RTX 20-series video card,
>>>> NAMD was compiled from the nightly-build Git source to generate a
>>>> Linux-x86_64-multicore-CUDA
>>>> <https://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=&AccessCode=&ArchiveID=1584>
>>>> build, following the recommended process in the release notes. The
>>>> compile finished without any errors and appears to have been successful,
>>>> since the Linux-x86_64-multicore
>>>> <https://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=&AccessCode=&ArchiveID=1584>
>>>> version of NAMD was also compiled before generating the CUDA-supported
>>>> version, and that multicore version works fine with QwikMD for MD
>>>> simulation. Unfortunately, when running an MD simulation with the
>>>> self-built CUDA NAMD, the simulation aborted shortly after launch with
>>>> the following messages. Any suggestions and hints would be highly
>>>> appreciated.
>>>>
>>>> Info) Using multithreaded IMD implementation.
>>>> ------------- Processor 64 Exiting: Called CmiAbort ------------
>>>> Reason: FATAL ERROR: ComputeBondedCUDA::copyTupleData, invalid number
>>>> of exclusions
>>>> FATAL ERROR: See http://www.ks.uiuc.edu/Research/namd/bugreport.html
>>>>
>>>> Charm++ fatal error:
>>>> FATAL ERROR: ComputeBondedCUDA::copyTupleData, invalid number of
>>>> exclusions
>>>> FATAL ERROR: See http://www.ks.uiuc.edu/Research/namd/bugreport.html
>>>>
>>>> Info) IMD connection ended unexpectedly; connection terminated.
>>>>
>>>> Another issue I would like your comments on is that the simulation
>>>> speed using the self-built Linux-x86_64-multicore
>>>> <https://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=&AccessCode=&ArchiveID=1584>
>>>> NAMD is significantly slower than that of the 2.13 multicore release.
>>>> What could be the potential causes for this? Is it related to the
>>>> compiling tools or libraries? Thanks.
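>>>>
>>>> For reference, the build followed roughly these steps (the Charm++
>>>> directory name and the CUDA prefix are placeholders for my local paths;
>>>> the release notes remain the authoritative reference):
>>>>
>>>>   # build Charm++ with the multicore backend, then configure and build NAMD with CUDA
>>>>   cd charm-6.8.2
>>>>   ./build charm++ multicore-linux-x86_64 --with-production
>>>>   cd ..
>>>>   ./config Linux-x86_64-g++ --charm-arch multicore-linux-x86_64 \
>>>>       --with-cuda --cuda-prefix /usr/local/cuda
>>>>   cd Linux-x86_64-g++
>>>>   make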
>>>>
>>>> sincerely,
>>>>
>>>> Jrhau
>>>>
>>>>
>>>>
>>>
>>> --
>>> ……………………………………………………...
>>> João Vieira Ribeiro
>>> Theoretical and Computational Biophysics Group
>>> Beckman Institute, University of Illinois
>>> http://www.ks.uiuc.edu/~jribeiro/
>>> jribeiro_at_ks.uiuc.edu
>>> +1 (217) 3005851
>>>
>>
>
> --
> ……………………………………………………...
> João Vieira Ribeiro
> Theoretical and Computational Biophysics Group
> Beckman Institute, University of Illinois
> http://www.ks.uiuc.edu/~jribeiro/
> jribeiro_at_ks.uiuc.edu
> +1 (217) 3005851
>