From: Axel Kohlmeyer (akohlmey_at_cmm.chem.upenn.edu)
Date: Wed Oct 22 2008 - 15:59:08 CDT

On Wed, 22 Oct 2008, Leandro Martínez wrote:

LM> Thanks Axel for the explanations.
LM>
LM> Anyway, just to not confuse people that, like me, only
LM> wants to write custom analysis programs with fortran,
LM> my experience is that in reading NAMD written dcd files:
LM>
LM> 1 - In fortran77, in "normal" machines (that is, those
LM> you can buy at the store at the corner, which are the
LM> only ones we have), the small code sent here should
LM> work. And, if you are using g77 and a 64 bit machine, you
LM> must compile the program with the "-m32" option.
LM>
LM> 2 - If the program is in fortran90, the code should work
LM> as well, and the "-m32" option is not required, being
LM> the code portable between 32 and 64 machines, at
LM> least with gfortran or the intel fortran compiler.

with gfortran it always depends on the value for -frecord-marker
i.e. whether the compiler has been compiled with large file
support or not. this is on a 32-bit linux fedora laptop:

[akohlmey_at_zero ~]$ uname -p
i686
[akohlmey_at_zero ~]$ cat test.f90
program test
   open(20,form='unformatted')
   write(20) 66100 ! = 1*65536 +2*256 + 52*1 (52= 3*16 +4)
   close(20)
end program test

now when running this and looking at the binary output
as a sequence of hexdecimal numbers.

[akohlmey_at_zero ~]$ gfortran test.f90 && ./a.out
[akohlmey_at_zero ~]$ hexdump -f /tmp/hexdump-fmt-2183 fort.20
04 00 00 00 00 00 00 00 34 02 01 00 04 00 00 00 00 00 00 00

[akohlmey_at_zero ~]$ gfortran -frecord-marker=4 test.f90 && ./a.out
[akohlmey_at_zero ~]$ hexdump -f /tmp/hexdump-fmt-2183 fort.20
04 00 00 00 34 02 01 00 04 00 00 00

[akohlmey_at_zero ~]$ gfortran -frecord-marker=4 -fconvert=big-endian test.f90 && ./a.out
[akohlmey_at_zero ~]$ hexdump -f /tmp/hexdump-fmt-2183 fort.20
00 00 00 04 00 01 02 34 00 00 00 04

you can see that a) the machine is little-endian since
the integer written takes 4 bytes and thus the record
marker should be 4bytes and the "little" byte comes first
in the native version (note this is -m32 !) it has a 64-bit
record marker. only in the second compile, i get the 32-bit
record marker which uses 4 bytes instead of 8. the third
shows the binary data in big-endian ordering.

if i run this on a x86_64 machine, i get _identical_ output.
regardless of whether i use -m64 or -m32.

with intel fortran it is different, since intel always uses
32-bit record markers (so far).

whether you use fortran77 or fortran90 makes no difference,
but i suspect that you have compilers that were compiled with
different settings and that may have given you the wrong
impression. also the size of integer is always the
same unless the compiler is instructed to change it.

[akohlmey_at_zero ~]$ gfortran -frecord-marker=4 -fdefault-integer-8
test.f90 && ./a.out
[akohlmey_at_zero ~]$ hexdump -f /tmp/hexdump-fmt-2183 fort.20 000000 08 00
00 00 34 02 01 00 00 00 00 00 08 00 00 00

hope that clears matters up, finally.

cheers,
   axel.

LM>
LM> I cannot say anything about perl or anything else.
LM>
LM> Leandro.
LM>
LM> On Wed, Oct 22, 2008 at 5:17 PM, Axel Kohlmeyer
LM> <akohlmey_at_cmm.chem.upenn.edu> wrote:
LM> > On Wed, 22 Oct 2008, Leandro Martínez wrote:
LM> >
LM> > LM> In fortran77, if you write a program to read a DCD file and you
LM> > LM> want to use it in a 64 bit machine, you need to compile it
LM> > LM> with the "-m32" flag.
LM> >
LM> > leandro,
LM> >
LM> > this is NOT correct. the fortran standard mandates that lenght
LM> > of an INTEGER is generally the same as a the length of a REAL
LM> > on 64-bit as well as on 32-bit machnes the real is a IEEE-754
LM> > single precision real with 32-bit, so the integer has to be
LM> > 32-bit as well. on cray vector machines (YMP and similar) the
LM> > default real was double precision and so the integer was 64-bit.
LM> >
LM> > in any case some compilers used to violate that (e.g. there is
LM> > a g95 version for x86_64 with default 64-bit integer) and some
LM> > compilers allow to override this via flags (usually -i8 or so).
LM> >
LM> > in any case to adjust the record length byte you have to use
LM> > yet another flag, if at all available. with gfortran for example
LM> > this is -frecord-marker=4 or -frecord-marker=8.
LM> >
LM> > LM> Just to add some information on this, when using Fortran90 (or latter
LM> > LM> versions), the problem of the size of the record is not present anymore,
LM> >
LM> > is is not the size of the record, but the size of the record marker,
LM> > something that a fortran programmer never gets to mess with. see above.
LM> >
LM> > LM> because the size of the variables is defined independently on the
LM> > LM> platform (that is, a double precision variable is the same size in
LM> > LM> a 32 or 64 machine in Fortran90+). I think you can also deal with
LM> >
LM> > this was always true in fortran. this is what the standard requires.
LM> > it is only that some compilers violate it. gfortran for example
LM> > decided to make it dependent on the size of "off_t" which changes
LM> > between platforms.
LM> >
LM> > LM> the endianess problem in these newer fortran versions, but that
LM> > LM> will be more tricky as it is in the other languages (and I have never
LM> > LM> faced that problem, probably because the scope of architectures
LM> > LM> I use is limited).
LM> >
LM> > actually, if you know about the low level details it is possible to
LM> > deal with it, but it requires programming in c/c++ and cannot be done
LM> > automatically in fortran. since the original request was about decoding
LM> > dcd in perl (which is even trickier, since perl usually reads everything
LM> > as text and thus always has to "pack/unpack" binary data) it is better
LM> > too look at the problem from the c/c++ perspective, instead of the
LM> > fortran perspective.
LM> >
LM> > as i wrote before. the dcdplugin is proof that it can be done, but it
LM> > was messy to handle all common (and uncommon cases) in an automatic way.
LM> >
LM> > cheers,
LM> > axel.
LM> >
LM> >
LM> >
LM> >
LM> > LM>
LM> > LM> Leandro.
LM> > LM>
LM> > LM>
LM> > LM>
LM> > LM> On Wed, Oct 22, 2008 at 3:49 PM, Axel Kohlmeyer
LM> > LM> <akohlmey_at_cmm.chem.upenn.edu> wrote:
LM> > LM> > On Wed, 22 Oct 2008, Tibbitt, Jeffrey A. wrote:
LM> > LM> >
LM> > LM> >
LM> > LM> > hi,
LM> > LM> >
LM> > LM> > please let me point out that there is a significant difference
LM> > LM> > between reading binary files from fortran and from c/c++/perl/etc.
LM> > LM> > the fortran code only works, if the platform reading the .dcd file
LM> > LM> > is equivalent to the way it is written.
LM> > LM> >
LM> > LM> > .dcd is a binary format, and thus "endianness" of a machine matters.
LM> > LM> > to explain: there are some machines that store a (32-bit) integer
LM> > LM> > like 66100 as the four bytes 0, 1, 2, 52 and others that would do
LM> > LM> > 52, 2, 1, 0. so you have to know which machine a dcd file was written
LM> > LM> > on or what endiannes it was encoded into (some fortran compilers can
LM> > LM> > swap on the fly based on compiler flags and environment variables).
LM> > LM> >
LM> > LM> > the really nasty part is, that fortran binary files do not only contain
LM> > LM> > the data that is written, but a record length indicator that tells
LM> > LM> > the fortran binary reader how large a block (a single write) exactly is.
LM> > LM> >
LM> > LM> > to explain: in fortran you can do on writing (in unformatted binary).
LM> > LM> >
LM> > LM> > write(8) a, b, c
LM> > LM> > write(8) d, e, f
LM> > LM> > write(8) (g(i),i=1,100)
LM> > LM> >
LM> > LM> > if you are only interested in a part of the data you can do:
LM> > LM> >
LM> > LM> > read(8) a, b
LM> > LM> > read(8) d
LM> > LM> > read(8) (g(i),i=1,50)
LM> > LM> >
LM> > LM> > and it will _still_ read what was in "a" into "a" and what was
LM> > LM> > in "d" into "d" and so on.
LM> > LM> >
LM> > LM> > now the size of this record length indicator can be set manually
LM> > LM> > for some compilers or is dependent on the platform (most have 32-bit
LM> > LM> > even on 64-bit machines, some have 64-bit if their integer type is
LM> > LM> > 64-bit by default). if you try to read a .dcd file with fortran, it
LM> > LM> > will either work or fail miserably (frequenly with a "failed to seek
LM> > LM> > to the end of block" or a "file too short" type error message or
LM> > LM> > even a segmentation fault), depending on your platform. in c you
LM> > LM> > have a bit more flexibility and the same applied to perl, but the
LM> > LM> > correct handling of all these issues is tricky.
LM> > LM> >
LM> > LM> > please have a look at the dcdplugin source code of the molfile_plugin
LM> > LM> > package that reads .dcd files for VMD (and after many, many tests is
LM> > LM> > almost perfect at handling endiannes and record size length
LM> > LM> > automatically and correctly...
LM> > LM> >
LM> > LM> > cheers,
LM> > LM> > axel.
LM> > LM> >
LM> > LM> >
LM> > LM> > JT> Jonathan,
LM> > LM> > JT> My FORTRAN code for reading an entire trajectory is different than what you have and may be easier to follow. It is very short and well commented. I'll explain it. The lines beginning with a c are comments. The first line is just a my name for the program. The second group of lines declare the variables. Here you notice the * in different places. It specifies the sizes of the data types read in (here character*4 means a word 4 characters in length). It works similarly for integers, but it signifies bytes (I think) when applied to real numbers. The third group of lines open up the trajectory file and start reading it from the beginning. Other information is stored at the beginning before the coordinates. The fourth group of lines loop through the frames and read the coordinates.
LM> > LM> > JT>
LM> > LM> > JT> The first thing in a trajectory to read is a word (here I call it header), 4 characters in length, containing the characters: CORD or VELO. It should be one of these two only and defines the trajectory as containing coordinates or velocities.
LM> > LM> > JT>
LM> > LM> > JT> The second thing in the trajectory coming right after that is a 20 dimensional array of integers (called control). This contains information (control(1)=number of frames, control(2)=number of previous dynamic steps, control(3)=skip frequency for writing frames, control(4)=number of dynamics steps).
LM> > LM> > JT>
LM> > LM> > JT> The third thing read is an integer (called ntitle). This is the dimension of the 4th thing read. This dimension is the same as the number of title lines written out by the MD software.
LM> > LM> > JT>
LM> > LM> > JT> The fourth thing read is an array of characters of dimension ntitle (the dimension previously read). This array is called title here.
LM> > LM> > JT>
LM> > LM> > JT> The fifth thing read is an integer stating the number of atoms (called natom here).
LM> > LM> > JT>
LM> > LM> > JT> The sixth thing read are the coordinates. First frame coordinates come first. But all the x-coordinates come first, then all the ys then all the zs. First, all the x coordinates for the first frame are read. It is read in at once as an array natom in length (here I call it xx(natom)). Next, all the y coordinates of the first frame are read (yy(natom)). Then all the z coordinates are read (zz(natom). Repeat reading the coordinates with the second frame.
LM> > LM> > JT>
LM> > LM> > JT> I've gone on at length here, because I know it is invaluable to have your own code to get at the information in the DCD trajectory. And I've seen this question before. Note that this is used only for a simple DCD trajectory with no fixed atoms. They are a little different if they contain that sort of stuff. I am not an expert at DCD format, but I know that this format works for my trajectories. For fixed atoms and other stuff included someone else can help. But hopefully this helps you in constructing one in Perl for yourself.
LM> > LM> > JT>
LM> > LM> > JT> Jeff Tibbitt
LM> > LM> > JT>
LM> > LM> > JT>
LM> > LM> > JT> ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
LM> > LM> > JT> program readdcd
LM> > LM> > JT>
LM> > LM> > JT> c TRAJECTORY VARIABLES
LM> > LM> > JT> character*4 header
LM> > LM> > JT> integer control(20)
LM> > LM> > JT> integer i,j,ntitle,natom,nf
LM> > LM> > JT> character*80 title(4)
LM> > LM> > JT> real*4 xx(natom)
LM> > LM> > JT> real*4 yy(natom)
LM> > LM> > JT> real*4 zz(natom)
LM> > LM> > JT>
LM> > LM> > JT> c OPEN TRAJECTORY FILE
LM> > LM> > JT> open(unit=56,file="tmp.dcd",status='old',form='unformatted')
LM> > LM> > JT> c READ IN HEADER AND CONTROL INFO
LM> > LM> > JT> read(56) header,control
LM> > LM> > JT> c READ IN THE TITLE LINES
LM> > LM> > JT> read(56) ntitle,(title(j),j=1,ntitle)
LM> > LM> > JT> c READ IN NUMBER OF ATOMS
LM> > LM> > JT> read(56) natom
LM> > LM> > JT>
LM> > LM> > JT> c LOOP OVER FRAMES
LM> > LM> > JT> do i=1,nf
LM> > LM> > JT> c READ IN COORDINATES
LM> > LM> > JT> read(56) xx
LM> > LM> > JT> read(56) yy
LM> > LM> > JT> read(56) zz
LM> > LM> > JT> enddo
LM> > LM> > JT>
LM> > LM> > JT> end program
LM> > LM> > JT> cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
LM> > LM> > JT>
LM> > LM> > JT>
LM> > LM> > JT>
LM> > LM> > JT>
LM> > LM> > JT>
LM> > LM> > JT>
LM> > LM> > JT>
LM> > LM> > JT>
LM> > LM> > JT>
LM> > LM> > JT> ________________________________________
LM> > LM> > JT> From: owner-vmd-l_at_ks.uiuc.edu [owner-vmd-l_at_ks.uiuc.edu] On Behalf Of JONATHAN BLACK [nihilique_at_hotmail.com]
LM> > LM> > JT> Sent: Wednesday, October 22, 2008 5:30 AM
LM> > LM> > JT> To: vmd-l_at_ks.uiuc.edu
LM> > LM> > JT> Subject: vmd-l: dcd format
LM> > LM> > JT>
LM> > LM> > JT> actually i have to write a perl script that can do that as part of a larger program. i've found a fortran script that can do that but it will take me a while to "decode" how it does what it does since i don't know fortran. however i would like to ask for the format with which a dcd is packed. i've found this :
LM> > LM> > JT>
LM> > LM> > JT> #===DCD=FORMAT==============================================
LM> > LM> > JT> #HDR NSET ISTRT NSAVC 5-ZEROS NATOM-NFREAT DELTA 9-ZEROS
LM> > LM> > JT> #CORD files step1 step zeroes (zero) timestep zeroes
LM> > LM> > JT> #C*4 INT INT INT 5INT INT DOUBLE 9INT
LM> > LM> > JT> # [CHARACTER*20]
LM> > LM> > JT> #===========================================================
LM> > LM> > JT> #NTITLE TITLE
LM> > LM> > JT> #INT C*MAXTITL
LM> > LM> > JT> #C*2 C*80
LM> > LM> > JT> #===========================================================
LM> > LM> > JT> #NATOM
LM> > LM> > JT> #INT
LM> > LM> > JT> #===========================================================
LM> > LM> > JT> #X(I), I=1,NATOM (DOUBLE)
LM> > LM> > JT> #Y(I), I=1,NATOM
LM> > LM> > JT> #Z(I), I=1,NATOM
LM> > LM> > JT> #===========================================================
LM> > LM> > JT>
LM> > LM> > JT> as part of another script but i don't know if it's true and accurate, and even if it is, i can't understand some parts of it (eg : character*20 at 5th line). Any guidelines would be appreciated. Thank you...
LM> > LM> > JT>
LM> > LM> > JT>
LM> > LM> > JT>
LM> > LM> > JT> ----------------------------------------
LM> > LM> > JT> > From: gillescche_at_gmail.com
LM> > LM> > JT> > Subject: Re: vmd-l:
LM> > LM> > JT> > Date: Tue, 21 Oct 2008 08:22:01 -0400
LM> > LM> > JT> > To: nihilique_at_hotmail.com
LM> > LM> > JT> >
LM> > LM> > JT> > Since you want the information contained in the dcd file I would use
LM> > LM> > JT> > catdcd to write out the coordinates of a certain frame in some format
LM> > LM> > JT> > you can use such as pdb or xyz.
LM> > LM> > JT> >
LM> > LM> > JT> > You can find catdcd here:
LM> > LM> > JT> >
LM> > LM> > JT> > http://www.ks.uiuc.edu/Development/MDTools/catdcd/
LM> > LM> > JT> >
LM> > LM> > JT> > If you want to include the files in a script I know there are some
LM> > LM> > JT> > fortran reads that are given on the mailing list.
LM> > LM> > JT> >
LM> > LM> > JT> > Best
LM> > LM> > JT> > Chris
LM> > LM> > JT> >
LM> > LM> > JT> > On Oct 21, 2008, at 5:16 AM, JONATHAN BLACK wrote:
LM> > LM> > JT> >
LM> > LM> > JT> >>
LM> > LM> > JT> >> hi guys. i am a biologist and i haven't been programming in perl
LM> > LM> > JT> >> for a long time so my question may seem a little naive but i'm at a
LM> > LM> > JT> >> total loss and any help would be appreciated. i need a perl script
LM> > LM> > JT> >> to convert some binary files (.dcd if anyone knows what they are)
LM> > LM> > JT> >> to ascii or just extract the ascii that they contain. i tried plain
LM> > LM> > JT> >> read with no results. i searched the net and i only found this code :
LM> > LM> > JT> >>
LM> > LM> > JT> >> use warnings;
LM> > LM> > JT> >> use strict;
LM> > LM> > JT> >> @ARGV == 2 or die "usage: $0 in_filename out_filename\n";
LM> > LM> > JT> >> # get first argument, i.e filename my $in_filename = shift;
LM> > LM> > JT> >> print "You chose input \n";
LM> > LM> > JT> >> my $out_filename = shift;
LM> > LM> > JT> >> print "You chose output \n";
LM> > LM> > JT> >> #set infile to binary mode open INFILE, '', $out_filename or die
LM> > LM> > JT> >> "can't open $out_filename: $!";
LM> > LM> > JT> >> # read 8 bytes at a time
LM> > LM> > JT> >> $/ = \8;
LM> > LM> > JT> >> while ( )
LM> > LM> > JT> >> {
LM> > LM> > JT> >> print OUTFILE join( ', ', map sprintf( '0x%04x', $_ ), unpack 'S*',
LM> > LM> > JT> >> $_ + ), "\n";
LM> > LM> > JT> >> }
LM> > LM> > JT> >>
LM> > LM> > JT> >> but when i run it it comes out with four columns of something
LM> > LM> > JT> >> seeming like hexadecimal (0xe15d ...). The dcd files are about 700
LM> > LM> > JT> >> mb so i can't upload it for you to see it, but i managed to cut the
LM> > LM> > JT> >> first 80kb of it with hexedit and i upload it here: http://
LM> > LM> > JT> >> www.gigasize.com/get.php?d=7qfs5f331bf Does anyone have any idea or
LM> > LM> > JT> >> some plain guidelines because i really need get it done for my
LM> > LM> > JT> >> work. Thanks anyway!
LM> > LM> > JT> >> _________________________________________________________________
LM> > LM> > JT> >> You live life beyond your PC. So now Windows goes beyond your PC.
LM> > LM> > JT> >> http://clk.atdmt.com/MRT/go/115298556/direct/01/
LM> > LM> > JT> >
LM> > LM> > JT>
LM> > LM> > JT> _________________________________________________________________
LM> > LM> > JT> You live life beyond your PC. So now Windows goes beyond your PC.
LM> > LM> > JT> http://clk.atdmt.com/MRT/go/115298556/direct/01/
LM> > LM> > JT> _________________________________________________________________
LM> > LM> > JT> Stay organized with simple drag and drop from Windows Live Hotmail.
LM> > LM> > JT> http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_102008
LM> > LM> > JT>
LM> > LM> >
LM> > LM> > --
LM> > LM> > =======================================================================
LM> > LM> > Axel Kohlmeyer akohlmey_at_cmm.chem.upenn.edu http://www.cmm.upenn.edu
LM> > LM> > Center for Molecular Modeling -- University of Pennsylvania
LM> > LM> > Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
LM> > LM> > tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
LM> > LM> > =======================================================================
LM> > LM> > If you make something idiot-proof, the universe creates a better idiot.
LM> > LM> >
LM> > LM>
LM> >
LM> > --
LM> > =======================================================================
LM> > Axel Kohlmeyer akohlmey_at_cmm.chem.upenn.edu http://www.cmm.upenn.edu
LM> > Center for Molecular Modeling -- University of Pennsylvania
LM> > Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
LM> > tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
LM> > =======================================================================
LM> > If you make something idiot-proof, the universe creates a better idiot.
LM>

-- 
=======================================================================
Axel Kohlmeyer   akohlmey_at_cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.