From: Ray, William (ray.29_at_osu.edu)
Date: Tue May 04 2021 - 18:15:24 CDT

I've been mostly away from VMD for a depressingly long time now, but I'd like to put this a bit more bluntly than the others:

Your script almost certainly takes a long time because of disk access, not because of the time it takes to compute and prepare the results for writing.

As a result, parallelizing across processors is unlikely to have any significant benefit, and may actually slow things down.
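A rough way to check this on your own machine is to time the two phases separately. The sketch below is Python rather than VMD Tcl, and everything in it is an assumption made for illustration: the function name `time_format_vs_write`, the fake PDB-like lines, and the file naming. It builds the text for each "frame" in memory, then writes it and forces it to disk with `os.fsync`, and reports the accumulated time for each phase. The numbers depend entirely on your hardware and file system, so treat it as a measuring stick, not as proof either way.

```python
import os
import tempfile
import time


def time_format_vs_write(n_files=200, n_atoms=2000, out_dir=None):
    """Return (format_seconds, write_seconds) for n_files fake PDB-like files.

    If out_dir is None, a temporary directory is used and removed afterwards.
    If out_dir is given, the files are left there for inspection.
    """
    own_tmp = None
    if out_dir is None:
        own_tmp = tempfile.TemporaryDirectory()
        out_dir = own_tmp.name
    format_s = 0.0
    write_s = 0.0
    try:
        for i in range(n_files):
            t0 = time.perf_counter()
            # "Compute" phase: build the text for one frame in memory.
            text = "".join(
                "ATOM  %5d  CA  ALA A%4d    %8.3f%8.3f%8.3f\n"
                % (a % 100000, a % 10000, a * 0.001, a * 0.002, a * 0.003)
                for a in range(n_atoms)
            )
            t1 = time.perf_counter()
            # "Disk access" phase: write the file and push it past the OS cache.
            path = os.path.join(out_dir, "frame_%06d.pdb" % i)
            with open(path, "w") as fh:
                fh.write(text)
                fh.flush()
                os.fsync(fh.fileno())
            t2 = time.perf_counter()
            format_s += t1 - t0
            write_s += t2 - t1
    finally:
        if own_tmp is not None:
            own_tmp.cleanup()
    return format_s, write_s


if __name__ == "__main__":
    fmt, wrt = time_format_vs_write()
    print("formatting: %.3f s, writing: %.3f s" % (fmt, wrt))
```

Point `out_dir` at the file system you actually write your PDBs to; a temporary directory may sit on a different (and faster) device.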

What may be useful is parallelizing across physical storage units. If you have separate file systems on separate physical spinning media, or on other media with a useful write buffer, your OS will (probably) be able to write to the separate file systems (essentially) simultaneously. In that case, the "parallel" command approach can be helpful, because each parallel process can monopolize a different physical storage device. It won't help you a bit, however, to split things up into multiple simultaneous writes onto the _same_ physical storage; that'll actually be worse.
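Before splitting output across directories, it may be worth checking whether those directories really live on different devices. The following Python sketch groups paths by the `st_dev` value reported by `os.stat`. The helper names `group_by_device` and `all_on_separate_devices` are made up for this example. Note the limitation: `st_dev` identifies a file system, not a physical disk, so two partitions on the same drive will look "separate" here even though they share one set of hardware.

```python
import os
from collections import defaultdict


def group_by_device(paths):
    """Map each device id (st_dev) to the list of paths that live on it."""
    groups = defaultdict(list)
    for p in paths:
        groups[os.stat(p).st_dev].append(p)
    return dict(groups)


def all_on_separate_devices(paths):
    """True only if no two paths share the same st_dev."""
    return len(group_by_device(paths)) == len(paths)
```

If `all_on_separate_devices` returns False for your planned output directories, parallel writers would be competing for the same file system, which is the case described above as likely to be worse than a single writer.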

________________________________________
From: owner-vmd-l_at_ks.uiuc.edu [owner-vmd-l_at_ks.uiuc.edu] on behalf of Mcguire, Kelly [klmcguire_at_UCSD.EDU]
Sent: Tuesday, May 4, 2021 4:53 PM
To: John Stone; Vermaas, Josh
Cc: vmd-l_at_ks.uiuc.edu
Subject: Re: vmd-l: Tcl Scripting Question

Thanks for all of the suggestions!

Dr. Kelly McGuire
Postdoc
Chemistry/Biochemistry Department
Natural Science Building, 4104A, 4106A, 4017
________________________________
From: John Stone <johns_at_ks.uiuc.edu>
Sent: Tuesday, May 4, 2021 1:44 PM
To: Vermaas, Josh <vermaasj_at_msu.edu>
Cc: Mcguire, Kelly <klmcguire_at_UCSD.EDU>; vmd-l_at_ks.uiuc.edu <vmd-l_at_ks.uiuc.edu>
Subject: Re: vmd-l: Tcl Scripting Question

Hi,
  Josh's suggestion to use the "parallel" commands is correct,
but I would warn you that I/O is one of the things that tends
not to parallelize much. It is pretty easy for a well-written
program to become I/O bound on modern hardware.

If all you're doing is splitting out a DCD file into thousands of PDBs
with a relatively inexpensive atom selection operation, then
the most important thing will be to ensure that you're writing
those PDBs to a very fast storage system.

I would expect that VMD should be able to write those files
at a rate that approaches your disk's maximum write speed,
even with a single process and without MPI.

One question I would ask is why you have to write the PDB
files in the first place? Maybe it would be more efficient
to teach other software tool(s) to read the DCD file directly
rather than processing PDBs? What are you doing with the resulting
PDB files?

In my mind, it wouldn't make much sense to go through the trouble of compiling
VMD for MPI just to emit a zillion PDB files to accommodate a slow or poorly
written analysis tool. I would seriously question using PDB files
for anything important since they also truncate your coordinate precision.
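
To make the precision point concrete: PDB ATOM records store each coordinate in a fixed 8-column field with three decimal places, whereas DCD files hold coordinates as 32-bit floats. The Python sketch below (the helper names `pdb_coord_field` and `roundtrip_through_pdb` are made up for illustration) shows what a coordinate looks like after a round trip through that field.

```python
def pdb_coord_field(value):
    """Format one coordinate as a PDB ATOM record stores it (8 columns, 3 decimals)."""
    field = "%8.3f" % value
    if len(field) > 8:
        raise ValueError("coordinate does not fit in an 8-column PDB field")
    return field


def roundtrip_through_pdb(value):
    """Return the value an analysis tool would read back from the PDB field."""
    return float(pdb_coord_field(value))


x = 12.3456789
print(repr(pdb_coord_field(x)))   # '  12.346'
print(roundtrip_through_pdb(x))   # 12.346
```

Anything beyond a thousandth of an Angstrom is gone after the write, and coordinates that need more than eight characters do not fit in the field at all.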