From: Axel Kohlmeyer (akohlmey_at_cmm.chem.upenn.edu)
Date: Wed Oct 22 2008 - 12:49:56 CDT

On Wed, 22 Oct 2008, Tibbitt, Jeffrey A. wrote:

hi,

please let me point out that there is a significant difference
between reading binary files from fortran and from c/c++/perl/etc.
the fortran code only works, if the platform reading the .dcd file
is equivalent to the way it is written.

.dcd is a binary format, and thus "endianness" of a machine matters.
to explain: there are some machines that store a (32-bit) integer
like 66100 as the four bytes 0, 1, 2, 52 and others that would do
52, 2, 1, 0. so you have to know which machine a dcd file was written
on or what endiannes it was encoded into (some fortran compilers can
swap on the fly based on compiler flags and environment variables).

the really nasty part is, that fortran binary files do not only contain
the data that is written, but a record length indicator that tells
the fortran binary reader how large a block (a single write) exactly is.

to explain: in fortran you can do on writing (in unformatted binary).

write(8) a, b, c
write(8) d, e, f
write(8) (g(i),i=1,100)

if you are only interested in a part of the data you can do:

read(8) a, b
read(8) d
read(8) (g(i),i=1,50)

and it will _still_ read what was in "a" into "a" and what was
in "d" into "d" and so on.

now the size of this record length indicator can be set manually
for some compilers or is dependent on the platform (most have 32-bit
even on 64-bit machines, some have 64-bit if their integer type is
64-bit by default). if you try to read a .dcd file with fortran, it
will either work or fail miserably (frequenly with a "failed to seek
to the end of block" or a "file too short" type error message or
even a segmentation fault), depending on your platform. in c you
have a bit more flexibility and the same applied to perl, but the
correct handling of all these issues is tricky.

please have a look at the dcdplugin source code of the molfile_plugin
package that reads .dcd files for VMD (and after many, many tests is
almost perfect at handling endiannes and record size length
automatically and correctly...

cheers,
    axel.

JT> Jonathan,
JT> My FORTRAN code for reading an entire trajectory is different than what you have and may be easier to follow. It is very short and well commented. I'll explain it. The lines beginning with a c are comments. The first line is just a my name for the program. The second group of lines declare the variables. Here you notice the * in different places. It specifies the sizes of the data types read in (here character*4 means a word 4 characters in length). It works similarly for integers, but it signifies bytes (I think) when applied to real numbers. The third group of lines open up the trajectory file and start reading it from the beginning. Other information is stored at the beginning before the coordinates. The fourth group of lines loop through the frames and read the coordinates.
JT>
JT> The first thing in a trajectory to read is a word (here I call it header), 4 characters in length, containing the characters: CORD or VELO. It should be one of these two only and defines the trajectory as containing coordinates or velocities.
JT>
JT> The second thing in the trajectory coming right after that is a 20 dimensional array of integers (called control). This contains information (control(1)=number of frames, control(2)=number of previous dynamic steps, control(3)=skip frequency for writing frames, control(4)=number of dynamics steps).
JT>
JT> The third thing read is an integer (called ntitle). This is the dimension of the 4th thing read. This dimension is the same as the number of title lines written out by the MD software.
JT>
JT> The fourth thing read is an array of characters of dimension ntitle (the dimension previously read). This array is called title here.
JT>
JT> The fifth thing read is an integer stating the number of atoms (called natom here).
JT>
JT> The sixth thing read are the coordinates. First frame coordinates come first. But all the x-coordinates come first, then all the ys then all the zs. First, all the x coordinates for the first frame are read. It is read in at once as an array natom in length (here I call it xx(natom)). Next, all the y coordinates of the first frame are read (yy(natom)). Then all the z coordinates are read (zz(natom). Repeat reading the coordinates with the second frame.
JT>
JT> I've gone on at length here, because I know it is invaluable to have your own code to get at the information in the DCD trajectory. And I've seen this question before. Note that this is used only for a simple DCD trajectory with no fixed atoms. They are a little different if they contain that sort of stuff. I am not an expert at DCD format, but I know that this format works for my trajectories. For fixed atoms and other stuff included someone else can help. But hopefully this helps you in constructing one in Perl for yourself.
JT>
JT> Jeff Tibbitt
JT>
JT>
JT> ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
JT> program readdcd
JT>
JT> c TRAJECTORY VARIABLES
JT> character*4 header
JT> integer control(20)
JT> integer i,j,ntitle,natom,nf
JT> character*80 title(4)
JT> real*4 xx(natom)
JT> real*4 yy(natom)
JT> real*4 zz(natom)
JT>
JT> c OPEN TRAJECTORY FILE
JT> open(unit=56,file="tmp.dcd",status='old',form='unformatted')
JT> c READ IN HEADER AND CONTROL INFO
JT> read(56) header,control
JT> c READ IN THE TITLE LINES
JT> read(56) ntitle,(title(j),j=1,ntitle)
JT> c READ IN NUMBER OF ATOMS
JT> read(56) natom
JT>
JT> c LOOP OVER FRAMES
JT> do i=1,nf
JT> c READ IN COORDINATES
JT> read(56) xx
JT> read(56) yy
JT> read(56) zz
JT> enddo
JT>
JT> end program
JT> cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
JT>
JT>
JT>
JT>
JT>
JT>
JT>
JT>
JT>
JT> ________________________________________
JT> From: owner-vmd-l_at_ks.uiuc.edu [owner-vmd-l_at_ks.uiuc.edu] On Behalf Of JONATHAN BLACK [nihilique_at_hotmail.com]
JT> Sent: Wednesday, October 22, 2008 5:30 AM
JT> To: vmd-l_at_ks.uiuc.edu
JT> Subject: vmd-l: dcd format
JT>
JT> actually i have to write a perl script that can do that as part of a larger program. i've found a fortran script that can do that but it will take me a while to "decode" how it does what it does since i don't know fortran. however i would like to ask for the format with which a dcd is packed. i've found this :
JT>
JT> #===DCD=FORMAT==============================================
JT> #HDR NSET ISTRT NSAVC 5-ZEROS NATOM-NFREAT DELTA 9-ZEROS
JT> #CORD files step1 step zeroes (zero) timestep zeroes
JT> #C*4 INT INT INT 5INT INT DOUBLE 9INT
JT> # [CHARACTER*20]
JT> #===========================================================
JT> #NTITLE TITLE
JT> #INT C*MAXTITL
JT> #C*2 C*80
JT> #===========================================================
JT> #NATOM
JT> #INT
JT> #===========================================================
JT> #X(I), I=1,NATOM (DOUBLE)
JT> #Y(I), I=1,NATOM
JT> #Z(I), I=1,NATOM
JT> #===========================================================
JT>
JT> as part of another script but i don't know if it's true and accurate, and even if it is, i can't understand some parts of it (eg : character*20 at 5th line). Any guidelines would be appreciated. Thank you...
JT>
JT>
JT>
JT> ----------------------------------------
JT> > From: gillescche_at_gmail.com
JT> > Subject: Re: vmd-l:
JT> > Date: Tue, 21 Oct 2008 08:22:01 -0400
JT> > To: nihilique_at_hotmail.com
JT> >
JT> > Since you want the information contained in the dcd file I would use
JT> > catdcd to write out the coordinates of a certain frame in some format
JT> > you can use such as pdb or xyz.
JT> >
JT> > You can find catdcd here:
JT> >
JT> > http://www.ks.uiuc.edu/Development/MDTools/catdcd/
JT> >
JT> > If you want to include the files in a script I know there are some
JT> > fortran reads that are given on the mailing list.
JT> >
JT> > Best
JT> > Chris
JT> >
JT> > On Oct 21, 2008, at 5:16 AM, JONATHAN BLACK wrote:
JT> >
JT> >>
JT> >> hi guys. i am a biologist and i haven't been programming in perl
JT> >> for a long time so my question may seem a little naive but i'm at a
JT> >> total loss and any help would be appreciated. i need a perl script
JT> >> to convert some binary files (.dcd if anyone knows what they are)
JT> >> to ascii or just extract the ascii that they contain. i tried plain
JT> >> read with no results. i searched the net and i only found this code :
JT> >>
JT> >> use warnings;
JT> >> use strict;
JT> >> @ARGV == 2 or die "usage: $0 in_filename out_filename\n";
JT> >> # get first argument, i.e filename my $in_filename = shift;
JT> >> print "You chose input \n";
JT> >> my $out_filename = shift;
JT> >> print "You chose output \n";
JT> >> #set infile to binary mode open INFILE, '', $out_filename or die
JT> >> "can't open $out_filename: $!";
JT> >> # read 8 bytes at a time
JT> >> $/ = \8;
JT> >> while ( )
JT> >> {
JT> >> print OUTFILE join( ', ', map sprintf( '0x%04x', $_ ), unpack 'S*',
JT> >> $_ + ), "\n";
JT> >> }
JT> >>
JT> >> but when i run it it comes out with four columns of something
JT> >> seeming like hexadecimal (0xe15d ...). The dcd files are about 700
JT> >> mb so i can't upload it for you to see it, but i managed to cut the
JT> >> first 80kb of it with hexedit and i upload it here: http://
JT> >> www.gigasize.com/get.php?d=7qfs5f331bf Does anyone have any idea or
JT> >> some plain guidelines because i really need get it done for my
JT> >> work. Thanks anyway!
JT> >> _________________________________________________________________
JT> >> You live life beyond your PC. So now Windows goes beyond your PC.
JT> >> http://clk.atdmt.com/MRT/go/115298556/direct/01/
JT> >
JT>
JT> _________________________________________________________________
JT> You live life beyond your PC. So now Windows goes beyond your PC.
JT> http://clk.atdmt.com/MRT/go/115298556/direct/01/
JT> _________________________________________________________________
JT> Stay organized with simple drag and drop from Windows Live Hotmail.
JT> http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_102008
JT>

-- 
=======================================================================
Axel Kohlmeyer   akohlmey_at_cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.