Re: Storage of large files

From: Vincent Kraeutler (vincent_at_kraeutler.net)
Date: Thu Aug 09 2007 - 04:03:32 CDT

Monika Sharma wrote:
> Dear All,
> We have recently started our venture into MD, using our in-house
> resources. The runs produce very large output files (dcd files in
> particular), which keep piling up and consuming space on the work
> machines; every run depletes it further. Can anyone suggest an
> "economical and efficient" way to back up such large files, on the
> order of GB, so that we don't end up filling our work machines with
> them? The data need to be kept for future reference.
> Thanks in advance.
> Regards,
> Monika
>
After a long period of experimenting with tapes and various network
storage solutions, I have in the end returned to what might seem a
trivial solution: external hard drives. In terms of cost they're on par
with all the rest; in terms of ease of use, they're much better. And
don't forget about bzip2. ;-)
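
For the actual archiving step, something along these lines does the job
(a minimal sketch in Python; the filename is made up, and a plain
"bzip2 run01.dcd" from the shell is equivalent):

import bz2
import shutil

# Stream-compress a finished trajectory before moving it to the
# external drive; chunked copying keeps memory use flat even for
# multi-GB files. A dcd holds dense binary coordinates, so expect
# more modest ratios than you would get on text output.
with open("run01.dcd", "rb") as src, bz2.open("run01.dcd.bz2", "wb") as dst:
    shutil.copyfileobj(src, dst, length=1 << 20)  # 1 MiB chunks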

Also, it typically helps to play with these parameters:

dcdfreq 500
xstFreq 500

These are old default values from times when trajectory lengths were correspondingly shorter. I've had good
success using

dcdfreq 1000
xstFreq 1000

(occasionally even 5000) instead. Watch your trajectories shrink from GB to MB! ;-)
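
To put rough numbers on that (a back-of-the-envelope sketch; the atom
count and run length are invented, and the 12 bytes/atom/frame figure
assumes the usual single-precision dcd coordinates, ignoring headers):

natoms = 100_000      # hypothetical solvated system
steps  = 5_000_000    # hypothetical 10 ns run at a 2 fs timestep

# Each dcd frame stores 3 single-precision floats per atom (~12 bytes).
for dcdfreq in (500, 1000, 5000):
    frames  = steps // dcdfreq
    size_gb = frames * natoms * 3 * 4 / 1e9
    print(f"dcdfreq {dcdfreq:>4}: {frames:>6} frames, ~{size_gb:.1f} GB")

# dcdfreq  500:  10000 frames, ~12.0 GB
# dcdfreq 1000:   5000 frames, ~6.0 GB
# dcdfreq 5000:   1000 frames, ~1.2 GB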
Think about it this way: for time-series plots, you're limited by the resolution of the printed plot
(at ~500 dpi across a ~5 inch axis, only about 2,500 positions are distinguishable, so even 10'000 points is generous).
For statistical analyses, where you want as much data as you can get, the highly correlated samples obtained
by writing out frames at short intervals will most often not buy you much (if anything) either: frames closer
together than the correlation time add few effectively independent observations.
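
As a sanity check on the plotting argument (same hypothetical run as
above; the dpi and axis width are the figures quoted in the text):

dpi, axis_inches = 500, 5
max_points = dpi * axis_inches   # ~2,500 distinguishable positions
steps = 5_000_000                # hypothetical 10 ns run, as above

# The smallest write interval the plot can still make use of:
dcdfreq = steps // max_points
print(f"{max_points} points fill the axis; dcdfreq >= {dcdfreq} already "
      f"writes no more frames than the plot can resolve")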

In short: my proposed solution to the data problem (if it is one) is to think about what you intend to look at,
and then (conservatively!) tune the input file for it.

Cheers,
v.

