From: Bogdan Costescu (bcostescu_at_gmail.com)
Date: Thu Sep 02 2010 - 23:51:23 CDT

On Thu, Sep 2, 2010 at 9:28 PM, Thomas C. Bishop <bishop_at_tulane.edu> wrote:
> For purposes of analysis I often  break a DCD into pdbs that are written to disk, analyze them with another program and remove the pdbs.
> Can I avoid writing them to disk by using named pipes or sockets.

If the analysis program only needs to read each PDB file once, then
what you describe is doable. It is normally easier with named pipes,
because you can substitute them directly for file names; with sockets,
some small code changes are most likely also needed.

> I've considered a ram disk as another option.

This is a good option on Linux, especially in modern distributions
where tmpfs is available.
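
As a minimal sketch (assuming a Linux system where /dev/shm is mounted
as tmpfs, which is the case on most modern distributions; the file name
below is made up), the only thing that changes in the scripts is the
directory prefix:

```python
import os

# Assumption: /dev/shm is a tmpfs mount (true on most modern Linux
# systems); files created there live in RAM and never touch the disk.
shm_dir = "/dev/shm" if os.path.isdir("/dev/shm") else "/tmp"  # fallback
frame = os.path.join(shm_dir, "frame0001.pdb")  # hypothetical file name

with open(frame, "w") as f:
    f.write("ATOM      1  N   ALA A   1\n")  # stand-in for a real PDB record

# ...run the analysis program on `frame` here, then clean up...
with open(frame) as f:
    data = f.read()
os.unlink(frame)
```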

> To my surprise, I noticed that sometimes when I write a pdb,  analyze it and remove it  w/in a script
> that there seems to be no actual disc activity. Every thing happens in buffers before anything is committed to disc.

This depends on various OS settings, such as the type of file system
the files are located on and its mount options. But it also depends on
how you 'measure' it: the HDD activity light is quite unreliable, since
there is no way to know under exactly which conditions it lights up. It
is best to use tools that report what is actually written to disk, e.g.
iostat.
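
One way to check without extra tools is to sample the kernel's
per-device write counters (the same data iostat reports) before and
after the script runs; this is a rough Linux-only sketch:

```python
import os

def sectors_written():
    """Sum of 'sectors written' (field 10 of /proc/diskstats) over all
    block devices; partitions are counted too, so treat this as a rough
    indicator of disk activity rather than an exact byte count."""
    if not os.path.exists("/proc/diskstats"):
        return None  # not Linux
    total = 0
    with open("/proc/diskstats") as f:
        for line in f:
            total += int(line.split()[9])
    return total

before = sectors_written()
# ...run the PDB write/analyze/remove cycle here...
after = sectors_written()
if before is not None:
    print("sectors written in between:", after - before)
```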

> Given this observation should I just let the OS handle things rather than force the issue w/ pipes/sockets?

If you are doing lots of such operations, so that the potential for a
speed increase exists (both of these are subjective, of course ;-)),
then you could at least try it once for your peace of mind :-)

If the workflow writes a small PDB file and then waits seconds or more
for the analysis of each PDB, then don't bother: the I/O time is most
likely a negligible fraction of the total and you won't see any
significant improvement.

> What if any chances are there for data corruption if I pass data via pdbs in this manner.

Using tmpfs won't change the workflow in any way: the temporary files
will be created in memory instead of being written to disk, but this is
completely transparent to both VMD and the analysis program. Of course,
if the computer crashes at just that moment, you won't have a copy of
the last PDB file written... but you would need to restart the analysis
anyway, so it probably doesn't matter.

With named pipes and sockets, the data cannot be corrupted in the sense
of being modified in random ways; what can happen is receiving
incomplete data in certain corner cases (e.g. a socket closed forcefully
without waiting for confirmation from the other end), but very likely
you won't have to deal with these. To prevent the pipe/socket buffer
from filling up and stalling the process that writes to it, you should
start the reader (the analysis process, if my guess is right) before
you start writing.
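
A minimal sketch of that reader-before-writer ordering, using a named
pipe in place of a PDB file (the thread stands in for the external
analysis program; all names below are made up):

```python
import os
import tempfile
import threading

pipe_dir = tempfile.mkdtemp()
pipe_path = os.path.join(pipe_dir, "frame.pdb")  # hypothetical name
os.mkfifo(pipe_path)  # the "file" the analysis program will read

result = []

def analyze():
    # Stand-in for the analysis program; it reads the pipe like a file.
    with open(pipe_path) as f:
        result.append(f.read())

reader = threading.Thread(target=analyze)
reader.start()  # start the reader FIRST, so the writer never stalls

# The writer (VMD, in the real workflow) opens the pipe like any file;
# open() blocks until the reader is connected, then data flows through.
with open(pipe_path, "w") as f:
    f.write("ATOM      1  N   ALA A   1\n")  # stand-in for a real PDB record

reader.join()
os.unlink(pipe_path)
os.rmdir(pipe_dir)
```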

Bogdan