From: John Stone (johns_at_ks.uiuc.edu)
Date: Thu Jul 31 2014 - 10:02:02 CDT

Hi,
  I wanted to summarize my answer to Chris's question since other
VMD users will definitely have similar experiences when doing large
analysis runs on network-attached drives.

Chris ran his analysis job on a workstation mounting an NFS
network filesystem on a 1Gbps ethernet network. When VMD reports
the I/O rate (as below) for trajectory frames, it is reporting only the
achieved timestep read/write rates, and these rates don't count any
computing time associated with per-timestep analysis work done in between
loading or writing of timesteps. So in the case Chris asked about below,
his analysis script itself has no bearing on the reported I/O rates.
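
To make the distinction concrete, here is a minimal Python sketch of a frame loop that accumulates read time and analysis time separately. It is not VMD's code; run_analysis, read_frame, analyze, and io_rate_mb_per_sec are hypothetical names used only for illustration. The point is that a rate computed from the read timer alone is unaffected by how slow the per-frame analysis is.

```python
import time

def run_analysis(read_frame, analyze, nframes):
    """Loop over frames, timing reads and per-frame analysis separately.

    read_frame(i) must return a bytes-like frame; analyze(frame) does the work.
    Both callables are hypothetical stand-ins, not VMD functions.
    """
    io_s = 0.0
    work_s = 0.0
    nbytes = 0
    for i in range(nframes):
        t0 = time.perf_counter()
        frame = read_frame(i)          # only this interval counts as I/O
        t1 = time.perf_counter()
        analyze(frame)                 # analysis time is tracked separately
        t2 = time.perf_counter()
        io_s += t1 - t0
        work_s += t2 - t1
        nbytes += len(frame)
    return io_s, work_s, nbytes

def io_rate_mb_per_sec(nbytes, io_s):
    """I/O rate in MB/sec based on read time only."""
    if io_s <= 0.0:
        return float("inf")
    return nbytes / 1e6 / io_s
```

Doubling the cost of analyze() in a loop like this increases work_s but leaves the reported I/O rate unchanged.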

The second part of the question is why the I/O rates initially start
very high and then degrade by as much as two orders of magnitude
during part of the run he shows below. In this particular case, the file
format is DCD, which is read using traditional buffered I/O APIs such as
Unix lseek()/read()/readv(). The Linux kernel uses otherwise unused system
memory as a huge filesystem cache. At the beginning of Chris' run, he's
seeing I/O rates of 1652 MB/sec, but this is way too fast to be due to anything
but the Linux in-memory filesystem cache, since he's loading a file over
an NFS mount on a 1Gbps network. The peak network I/O rate, assuming a
perfect transfer rate (not achievable by a long shot using normal NFS), is
only about 100 MB/sec (1Gbps is 125 MB/sec of raw line rate, before protocol
overhead), so that clearly shows that the peak observed rate was
due to Linux filesystem caching. As the run progresses, the I/O rate
quickly drops under 100 MB/sec, which shows that the network is not
able to keep up with the rate at which VMD is flying through the in-cache
DCD data, and the network+cache are falling behind VMD's read rate. Once the
Linux filesystem cache no longer holds any data that VMD has yet to read,
the I/O rate drops down into a range between 13 MB/sec and 43 MB/sec.
Since the NFS server Chris used is shared by others, his analysis run is
competing for I/O with other jobs and any other network traffic. The I/O
rate looks somewhat randomized as
his run progresses, and this is due to a combination of Linux trying to do
file read-ahead, combined with competition for the shared network and
NFS server with anything else that might have been running such as
other VMD analysis jobs, simulation jobs, backups being done at night,
and so on.
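
The reasoning above can be checked mechanically. The following Python sketch parses the "Coordinate I/O rate" lines from a VMD log and compares each rate to the raw 1Gbps line rate of 125 MB/sec (1000 Mbit/sec divided by 8). Any rate above that ceiling cannot have come over the network alone, so at least part of the data must have come from the filesystem cache. The regular expression is written against the log lines quoted below; the function name and the labels it returns are my own and purely illustrative.

```python
import re

LINK_GBPS = 1.0                               # assumed link speed
CEILING_MB_PER_SEC = LINK_GBPS * 1000 / 8     # 125 MB/sec raw line rate

# Matches lines such as:
#   Info) Coordinate I/O rate 73.8 frames/sec, 85 MB/sec, 86.8 sec
RATE_RE = re.compile(
    r"Coordinate I/O rate ([\d.]+) frames/sec, ([\d.]+) MB/sec, ([\d.]+) sec"
)

def classify_rates(log_text, ceiling=CEILING_MB_PER_SEC):
    """Return a list of (MB/sec, label) tuples, one per rate line in the log."""
    results = []
    for match in RATE_RE.finditer(log_text):
        mb_per_sec = float(match.group(2))
        if mb_per_sec > ceiling:
            label = "faster than the network: cache-assisted"
        else:
            label = "within network limits"
        results.append((mb_per_sec, label))
    return results
```

Applied to Chris's log, the first two files (1652 and 515 MB/sec) exceed the ceiling, and every later file falls below it.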

I would characterize the I/O rates Chris saw as being quite "normal" for
access to DCD files on a shared NFS server through a typical 1Gbps network.
The results could be better, and they could be worse, but the numbers
he reported are not unreasonable given the scenario.

The best-case scenario for doing high-performance trajectory analysis
is to use the "js" file format (rather than DCD) and a locally attached
array of fast SSDs in a RAID0 configuration on a PCIe 3.0 RAID controller.
Using such a configuration, it is possible to hit I/O rates on the order of
8,000 MB/sec without trying too hard. I plan to develop new VMD tutorials
that describe how to do this for people that are interested.
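
For a rough sense of scale, the Python sketch below converts I/O rates into frames per second for Chris's trajectory. The frame size is estimated from the first line of his log (1652 MB/sec at 1429.2 frames/sec). The big assumption here is that the frame size stays the same across file formats and storage, which is only approximately true; the helper names are hypothetical.

```python
def mb_per_frame(frames_per_sec, mb_per_sec):
    """Estimate frame size in MB from one reported rate line."""
    return mb_per_sec / frames_per_sec

def frames_per_sec_at(rate_mb_per_sec, frame_mb):
    """Frames per second achievable at a given I/O rate."""
    return rate_mb_per_sec / frame_mb

# Frame size estimated from the first rate line of the quoted log.
frame_mb = mb_per_frame(1429.2, 1652)

# 13 and 43 MB/sec are rates from the log; 8000 MB/sec is the
# local SSD RAID0 figure mentioned above.
for label, rate in [("NFS, slow file", 13),
                    ("NFS, fast file", 43),
                    ("local SSD RAID0", 8000)]:
    fps = frames_per_sec_at(rate, frame_mb)
    print(f"{label:16s} {rate:5d} MB/sec  ~{fps:7.1f} frames/sec")
```

This is only back-of-the-envelope arithmetic, but it shows the gap between shared NFS and fast local storage spans more than two orders of magnitude.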

Cheers,
  John Stone
  vmd_at_ks.uiuc.edu

On Thu, Jul 31, 2014 at 02:20:53AM +0000, Mayne, Christopher G wrote:
> I'm using the bigdcd script for analysis and I'm seeing huge drop-offs in IO speed followed by fluctuations. The per-frame proc is building a histogram -- measure bond distances (indices stored in global variables) and increment an element (i.e. my "bin") held in a global array. The only thing I can think of is that the initial array is small, and hence, fast. Because of structural constraints on my system, the final array size converges at around 100 elements, which I wouldn't guess is this much slower and it doesn't explain the fluctuations with the later dcd loads.
>
> Any ideas?
>
> Thanks,
> Chris
>
> VMD 1.9.2a39 running in text mode
> Linux, CentOS
>
> DCD IO output
> Info) Coordinate I/O rate 1429.2 frames/sec, 1652 MB/sec, 4.5 sec
> Info) Finished with coordinate file <redacted>.0.eq.0.0.dcd.
> Info) Coordinate I/O rate 446.1 frames/sec, 515 MB/sec, 6.1 sec
> Info) Finished with coordinate file <redacted>.0.eq.0.1.dcd.
> Info) Coordinate I/O rate 73.8 frames/sec, 85 MB/sec, 86.8 sec
> Info) Finished with coordinate file <redacted>.0.eq.0.2.dcd.
> Info) Coordinate I/O rate 22.3 frames/sec, 25 MB/sec, 120.9 sec
> Info) Finished with coordinate file <redacted>.0.eq.0.3.dcd.
> Info) Coordinate I/O rate 37.7 frames/sec, 43 MB/sec, 238.9 sec
> Info) Finished with coordinate file <redacted>.0.eq.0.4.dcd.
> Info) Coordinate I/O rate 25.4 frames/sec, 29 MB/sec, 353.9 sec
> Info) Finished with coordinate file <redacted>.0.eq.0.5.dcd.
> Info) Coordinate I/O rate 11.3 frames/sec, 13 MB/sec, 416.6 sec
> Info) Finished with coordinate file <redacted>.0.eq.0.6.dcd.
> Info) Coordinate I/O rate 11.7 frames/sec, 13 MB/sec, 482.8 sec
> Info) Finished with coordinate file <redacted>.0.eq.0.7.dcd.
> Info) Coordinate I/O rate 34.0 frames/sec, 39 MB/sec, 806.0 sec
> Info) Finished with coordinate file <redacted>.0.eq.0.8.dcd.
>
>
> VMD Startup Details:
> Info) VMD for LINUXAMD64, version 1.9.2a39 (March 21, 2014)
> Info) Exiting normally.
>
> Info) VMD for LINUXAMD64, version 1.9.2a39 (March 21, 2014)
> Info) http://www.ks.uiuc.edu/Research/vmd/
> Info) Email questions and bug reports to vmd_at_ks.uiuc.edu
> Info) Please include this reference in published work using VMD:
> Info) Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual
> Info) Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
> Info) -------------------------------------------------------------
> Info) Multithreading available, 8 CPUs detected.
> Info) Free system memory: 23683MB (98%)
> Info) Creating CUDA device pool and initializing hardware...
> Info) Detected 1 available CUDA accelerator:
> Info) [0] GeForce GTX 570 15 SM_2.0 @ 1.46 GHz, 1.2GB RAM, KTO, OIO, ZCP
> Info) Dynamically loaded 2 plugins in directory:
> Info) /Projects/vmd/pub/linux64/lib/vmd192a39/plugins/LINUXAMD64/molfile

-- 
NIH Center for Macromolecular Modeling and Bioinformatics
Beckman Institute for Advanced Science and Technology
University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
http://www.ks.uiuc.edu/~johns/           Phone: 217-244-3349
http://www.ks.uiuc.edu/Research/vmd/