From: 张驭洲 (zhangyuzhou15_at_mails.ucas.edu.cn)
Date: Tue Mar 03 2020 - 01:04:07 CST
I have a question about the GPU PME force-gather code.
In the __global__ void gather_force kernel in CudaPmeSolverUtilKernel.cu, the CUDA warp vote function __ballot_sync is used to mask out any threads that are not running the force-gather loop, and a comment there says this happens when the number of atoms is not a multiple of 32. However, on older CUDA versions and NVIDIA GPU devices, the mask parameter of the ballot function has no effect. How are the tail atoms handled in that situation? Is it really a problem when the number of atoms is not a multiple of 32? I tried assigning warp_mask an arbitrary integer instead of the value returned by the ballot function, and the final results did not seem to be affected.
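To make the tail case concrete, here is a minimal host-side C++ sketch of which lanes a ballot would report for the last warp. It is not the NAMD kernel: the names emulate_ballot, active_lanes and kWarpSize are hypothetical, and the predicate "global index < number of atoms" is my assumption about what the kernel tests before entering the gather loop.

```cpp
#include <cstdint>

// Assumed warp width of 32 lanes, matching the "multiple of 32" remark above.
constexpr int kWarpSize = 32;

// Host-side emulation of a warp-wide ballot (not the CUDA intrinsic itself).
// Bit `lane` of the result is set when that lane's predicate is true; the
// predicate assumed here is "this lane's global atom index is in range".
constexpr std::uint32_t emulate_ballot(int warpStart, int numAtoms) {
    std::uint32_t mask = 0;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        if (warpStart + lane < numAtoms) {
            mask |= (std::uint32_t{1} << lane);
        }
    }
    return mask;
}

// Count the lanes whose bit is set in a ballot mask.
constexpr int active_lanes(std::uint32_t mask) {
    int count = 0;
    for (; mask != 0; mask >>= 1) {
        count += static_cast<int>(mask & 1u);
    }
    return count;
}

// With 70 atoms, the warps starting at atoms 0 and 32 are full, while the
// warp starting at atom 64 has only 6 lanes with a valid atom.
static_assert(emulate_ballot(0, 70) == 0xFFFFFFFFu, "full warp");
static_assert(emulate_ballot(64, 70) == 0x3Fu, "tail warp: lanes 0..5 only");
static_assert(active_lanes(emulate_ballot(64, 70)) == 6, "6 tail atoms");
```

Under these assumptions, only the last warp ever produces a mask other than 0xFFFFFFFF, so the mask value can only matter for that one warp, and only on hardware where the mask argument is actually honored.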