From: Leandro Martínez (leandromartinez98_at_gmail.com)
Date: Wed Oct 22 2008 - 14:41:32 CDT

Thanks Axel for the explanations.

Anyway, just to not confuse people that, like me, only
wants to write custom analysis programs with fortran,
my experience is that in reading NAMD written dcd files:

1 - In fortran77, in "normal" machines (that is, those
you can buy at the store at the corner, which are the
only ones we have), the small code sent here should
work. And, if you are using g77 and a 64 bit machine, you
must compile the program with the "-m32" option.

2 - If the program is in fortran90, the code should work
as well, and the "-m32" option is not required, being
the code portable between 32 and 64 machines, at
least with gfortran or the intel fortran compiler.

I cannot say anything about perl or anything else.

Leandro.

On Wed, Oct 22, 2008 at 5:17 PM, Axel Kohlmeyer
<akohlmey_at_cmm.chem.upenn.edu> wrote:
> On Wed, 22 Oct 2008, Leandro Martínez wrote:
>
> LM> In fortran77, if you write a program to read a DCD file and you
> LM> want to use it in a 64 bit machine, you need to compile it
> LM> with the "-m32" flag.
>
> leandro,
>
> this is NOT correct. the fortran standard mandates that lenght
> of an INTEGER is generally the same as a the length of a REAL
> on 64-bit as well as on 32-bit machnes the real is a IEEE-754
> single precision real with 32-bit, so the integer has to be
> 32-bit as well. on cray vector machines (YMP and similar) the
> default real was double precision and so the integer was 64-bit.
>
> in any case some compilers used to violate that (e.g. there is
> a g95 version for x86_64 with default 64-bit integer) and some
> compilers allow to override this via flags (usually -i8 or so).
>
> in any case to adjust the record length byte you have to use
> yet another flag, if at all available. with gfortran for example
> this is -frecord-marker=4 or -frecord-marker=8.
>
> LM> Just to add some information on this, when using Fortran90 (or latter
> LM> versions), the problem of the size of the record is not present anymore,
>
> is is not the size of the record, but the size of the record marker,
> something that a fortran programmer never gets to mess with. see above.
>
> LM> because the size of the variables is defined independently on the
> LM> platform (that is, a double precision variable is the same size in
> LM> a 32 or 64 machine in Fortran90+). I think you can also deal with
>
> this was always true in fortran. this is what the standard requires.
> it is only that some compilers violate it. gfortran for example
> decided to make it dependent on the size of "off_t" which changes
> between platforms.
>
> LM> the endianess problem in these newer fortran versions, but that
> LM> will be more tricky as it is in the other languages (and I have never
> LM> faced that problem, probably because the scope of architectures
> LM> I use is limited).
>
> actually, if you know about the low level details it is possible to
> deal with it, but it requires programming in c/c++ and cannot be done
> automatically in fortran. since the original request was about decoding
> dcd in perl (which is even trickier, since perl usually reads everything
> as text and thus always has to "pack/unpack" binary data) it is better
> too look at the problem from the c/c++ perspective, instead of the
> fortran perspective.
>
> as i wrote before. the dcdplugin is proof that it can be done, but it
> was messy to handle all common (and uncommon cases) in an automatic way.
>
> cheers,
> axel.
>
>
>
>
> LM>
> LM> Leandro.
> LM>
> LM>
> LM>
> LM> On Wed, Oct 22, 2008 at 3:49 PM, Axel Kohlmeyer
> LM> <akohlmey_at_cmm.chem.upenn.edu> wrote:
> LM> > On Wed, 22 Oct 2008, Tibbitt, Jeffrey A. wrote:
> LM> >
> LM> >
> LM> > hi,
> LM> >
> LM> > please let me point out that there is a significant difference
> LM> > between reading binary files from fortran and from c/c++/perl/etc.
> LM> > the fortran code only works, if the platform reading the .dcd file
> LM> > is equivalent to the way it is written.
> LM> >
> LM> > .dcd is a binary format, and thus "endianness" of a machine matters.
> LM> > to explain: there are some machines that store a (32-bit) integer
> LM> > like 66100 as the four bytes 0, 1, 2, 52 and others that would do
> LM> > 52, 2, 1, 0. so you have to know which machine a dcd file was written
> LM> > on or what endiannes it was encoded into (some fortran compilers can
> LM> > swap on the fly based on compiler flags and environment variables).
> LM> >
> LM> > the really nasty part is, that fortran binary files do not only contain
> LM> > the data that is written, but a record length indicator that tells
> LM> > the fortran binary reader how large a block (a single write) exactly is.
> LM> >
> LM> > to explain: in fortran you can do on writing (in unformatted binary).
> LM> >
> LM> > write(8) a, b, c
> LM> > write(8) d, e, f
> LM> > write(8) (g(i),i=1,100)
> LM> >
> LM> > if you are only interested in a part of the data you can do:
> LM> >
> LM> > read(8) a, b
> LM> > read(8) d
> LM> > read(8) (g(i),i=1,50)
> LM> >
> LM> > and it will _still_ read what was in "a" into "a" and what was
> LM> > in "d" into "d" and so on.
> LM> >
> LM> > now the size of this record length indicator can be set manually
> LM> > for some compilers or is dependent on the platform (most have 32-bit
> LM> > even on 64-bit machines, some have 64-bit if their integer type is
> LM> > 64-bit by default). if you try to read a .dcd file with fortran, it
> LM> > will either work or fail miserably (frequenly with a "failed to seek
> LM> > to the end of block" or a "file too short" type error message or
> LM> > even a segmentation fault), depending on your platform. in c you
> LM> > have a bit more flexibility and the same applied to perl, but the
> LM> > correct handling of all these issues is tricky.
> LM> >
> LM> > please have a look at the dcdplugin source code of the molfile_plugin
> LM> > package that reads .dcd files for VMD (and after many, many tests is
> LM> > almost perfect at handling endiannes and record size length
> LM> > automatically and correctly...
> LM> >
> LM> > cheers,
> LM> > axel.
> LM> >
> LM> >
> LM> > JT> Jonathan,
> LM> > JT> My FORTRAN code for reading an entire trajectory is different than what you have and may be easier to follow. It is very short and well commented. I'll explain it. The lines beginning with a c are comments. The first line is just a my name for the program. The second group of lines declare the variables. Here you notice the * in different places. It specifies the sizes of the data types read in (here character*4 means a word 4 characters in length). It works similarly for integers, but it signifies bytes (I think) when applied to real numbers. The third group of lines open up the trajectory file and start reading it from the beginning. Other information is stored at the beginning before the coordinates. The fourth group of lines loop through the frames and read the coordinates.
> LM> > JT>
> LM> > JT> The first thing in a trajectory to read is a word (here I call it header), 4 characters in length, containing the characters: CORD or VELO. It should be one of these two only and defines the trajectory as containing coordinates or velocities.
> LM> > JT>
> LM> > JT> The second thing in the trajectory coming right after that is a 20 dimensional array of integers (called control). This contains information (control(1)=number of frames, control(2)=number of previous dynamic steps, control(3)=skip frequency for writing frames, control(4)=number of dynamics steps).
> LM> > JT>
> LM> > JT> The third thing read is an integer (called ntitle). This is the dimension of the 4th thing read. This dimension is the same as the number of title lines written out by the MD software.
> LM> > JT>
> LM> > JT> The fourth thing read is an array of characters of dimension ntitle (the dimension previously read). This array is called title here.
> LM> > JT>
> LM> > JT> The fifth thing read is an integer stating the number of atoms (called natom here).
> LM> > JT>
> LM> > JT> The sixth thing read are the coordinates. First frame coordinates come first. But all the x-coordinates come first, then all the ys then all the zs. First, all the x coordinates for the first frame are read. It is read in at once as an array natom in length (here I call it xx(natom)). Next, all the y coordinates of the first frame are read (yy(natom)). Then all the z coordinates are read (zz(natom). Repeat reading the coordinates with the second frame.
> LM> > JT>
> LM> > JT> I've gone on at length here, because I know it is invaluable to have your own code to get at the information in the DCD trajectory. And I've seen this question before. Note that this is used only for a simple DCD trajectory with no fixed atoms. They are a little different if they contain that sort of stuff. I am not an expert at DCD format, but I know that this format works for my trajectories. For fixed atoms and other stuff included someone else can help. But hopefully this helps you in constructing one in Perl for yourself.
> LM> > JT>
> LM> > JT> Jeff Tibbitt
> LM> > JT>
> LM> > JT>
> LM> > JT> ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
> LM> > JT> program readdcd
> LM> > JT>
> LM> > JT> c TRAJECTORY VARIABLES
> LM> > JT> character*4 header
> LM> > JT> integer control(20)
> LM> > JT> integer i,j,ntitle,natom,nf
> LM> > JT> character*80 title(4)
> LM> > JT> real*4 xx(natom)
> LM> > JT> real*4 yy(natom)
> LM> > JT> real*4 zz(natom)
> LM> > JT>
> LM> > JT> c OPEN TRAJECTORY FILE
> LM> > JT> open(unit=56,file="tmp.dcd",status='old',form='unformatted')
> LM> > JT> c READ IN HEADER AND CONTROL INFO
> LM> > JT> read(56) header,control
> LM> > JT> c READ IN THE TITLE LINES
> LM> > JT> read(56) ntitle,(title(j),j=1,ntitle)
> LM> > JT> c READ IN NUMBER OF ATOMS
> LM> > JT> read(56) natom
> LM> > JT>
> LM> > JT> c LOOP OVER FRAMES
> LM> > JT> do i=1,nf
> LM> > JT> c READ IN COORDINATES
> LM> > JT> read(56) xx
> LM> > JT> read(56) yy
> LM> > JT> read(56) zz
> LM> > JT> enddo
> LM> > JT>
> LM> > JT> end program
> LM> > JT> cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
> LM> > JT>
> LM> > JT>
> LM> > JT>
> LM> > JT>
> LM> > JT>
> LM> > JT>
> LM> > JT>
> LM> > JT>
> LM> > JT>
> LM> > JT> ________________________________________
> LM> > JT> From: owner-vmd-l_at_ks.uiuc.edu [owner-vmd-l_at_ks.uiuc.edu] On Behalf Of JONATHAN BLACK [nihilique_at_hotmail.com]
> LM> > JT> Sent: Wednesday, October 22, 2008 5:30 AM
> LM> > JT> To: vmd-l_at_ks.uiuc.edu
> LM> > JT> Subject: vmd-l: dcd format
> LM> > JT>
> LM> > JT> actually i have to write a perl script that can do that as part of a larger program. i've found a fortran script that can do that but it will take me a while to "decode" how it does what it does since i don't know fortran. however i would like to ask for the format with which a dcd is packed. i've found this :
> LM> > JT>
> LM> > JT> #===DCD=FORMAT==============================================
> LM> > JT> #HDR NSET ISTRT NSAVC 5-ZEROS NATOM-NFREAT DELTA 9-ZEROS
> LM> > JT> #CORD files step1 step zeroes (zero) timestep zeroes
> LM> > JT> #C*4 INT INT INT 5INT INT DOUBLE 9INT
> LM> > JT> # [CHARACTER*20]
> LM> > JT> #===========================================================
> LM> > JT> #NTITLE TITLE
> LM> > JT> #INT C*MAXTITL
> LM> > JT> #C*2 C*80
> LM> > JT> #===========================================================
> LM> > JT> #NATOM
> LM> > JT> #INT
> LM> > JT> #===========================================================
> LM> > JT> #X(I), I=1,NATOM (DOUBLE)
> LM> > JT> #Y(I), I=1,NATOM
> LM> > JT> #Z(I), I=1,NATOM
> LM> > JT> #===========================================================
> LM> > JT>
> LM> > JT> as part of another script but i don't know if it's true and accurate, and even if it is, i can't understand some parts of it (eg : character*20 at 5th line). Any guidelines would be appreciated. Thank you...
> LM> > JT>
> LM> > JT>
> LM> > JT>
> LM> > JT> ----------------------------------------
> LM> > JT> > From: gillescche_at_gmail.com
> LM> > JT> > Subject: Re: vmd-l:
> LM> > JT> > Date: Tue, 21 Oct 2008 08:22:01 -0400
> LM> > JT> > To: nihilique_at_hotmail.com
> LM> > JT> >
> LM> > JT> > Since you want the information contained in the dcd file I would use
> LM> > JT> > catdcd to write out the coordinates of a certain frame in some format
> LM> > JT> > you can use such as pdb or xyz.
> LM> > JT> >
> LM> > JT> > You can find catdcd here:
> LM> > JT> >
> LM> > JT> > http://www.ks.uiuc.edu/Development/MDTools/catdcd/
> LM> > JT> >
> LM> > JT> > If you want to include the files in a script I know there are some
> LM> > JT> > fortran reads that are given on the mailing list.
> LM> > JT> >
> LM> > JT> > Best
> LM> > JT> > Chris
> LM> > JT> >
> LM> > JT> > On Oct 21, 2008, at 5:16 AM, JONATHAN BLACK wrote:
> LM> > JT> >
> LM> > JT> >>
> LM> > JT> >> hi guys. i am a biologist and i haven't been programming in perl
> LM> > JT> >> for a long time so my question may seem a little naive but i'm at a
> LM> > JT> >> total loss and any help would be appreciated. i need a perl script
> LM> > JT> >> to convert some binary files (.dcd if anyone knows what they are)
> LM> > JT> >> to ascii or just extract the ascii that they contain. i tried plain
> LM> > JT> >> read with no results. i searched the net and i only found this code :
> LM> > JT> >>
> LM> > JT> >> use warnings;
> LM> > JT> >> use strict;
> LM> > JT> >> @ARGV == 2 or die "usage: $0 in_filename out_filename\n";
> LM> > JT> >> # get first argument, i.e filename my $in_filename = shift;
> LM> > JT> >> print "You chose input \n";
> LM> > JT> >> my $out_filename = shift;
> LM> > JT> >> print "You chose output \n";
> LM> > JT> >> #set infile to binary mode open INFILE, '', $out_filename or die
> LM> > JT> >> "can't open $out_filename: $!";
> LM> > JT> >> # read 8 bytes at a time
> LM> > JT> >> $/ = \8;
> LM> > JT> >> while ( )
> LM> > JT> >> {
> LM> > JT> >> print OUTFILE join( ', ', map sprintf( '0x%04x', $_ ), unpack 'S*',
> LM> > JT> >> $_ + ), "\n";
> LM> > JT> >> }
> LM> > JT> >>
> LM> > JT> >> but when i run it it comes out with four columns of something
> LM> > JT> >> seeming like hexadecimal (0xe15d ...). The dcd files are about 700
> LM> > JT> >> mb so i can't upload it for you to see it, but i managed to cut the
> LM> > JT> >> first 80kb of it with hexedit and i upload it here: http://
> LM> > JT> >> www.gigasize.com/get.php?d=7qfs5f331bf Does anyone have any idea or
> LM> > JT> >> some plain guidelines because i really need get it done for my
> LM> > JT> >> work. Thanks anyway!
> LM> > JT> >> _________________________________________________________________
> LM> > JT> >> You live life beyond your PC. So now Windows goes beyond your PC.
> LM> > JT> >> http://clk.atdmt.com/MRT/go/115298556/direct/01/
> LM> > JT> >
> LM> > JT>
> LM> > JT> _________________________________________________________________
> LM> > JT> You live life beyond your PC. So now Windows goes beyond your PC.
> LM> > JT> http://clk.atdmt.com/MRT/go/115298556/direct/01/
> LM> > JT> _________________________________________________________________
> LM> > JT> Stay organized with simple drag and drop from Windows Live Hotmail.
> LM> > JT> http://windowslive.com/Explore/hotmail?ocid=TXT_TAGLM_WL_hotmail_102008
> LM> > JT>
> LM> >
> LM> > --
> LM> > =======================================================================
> LM> > Axel Kohlmeyer akohlmey_at_cmm.chem.upenn.edu http://www.cmm.upenn.edu
> LM> > Center for Molecular Modeling -- University of Pennsylvania
> LM> > Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
> LM> > tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
> LM> > =======================================================================
> LM> > If you make something idiot-proof, the universe creates a better idiot.
> LM> >
> LM>
>
> --
> =======================================================================
> Axel Kohlmeyer akohlmey_at_cmm.chem.upenn.edu http://www.cmm.upenn.edu
> Center for Molecular Modeling -- University of Pennsylvania
> Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
> tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
> =======================================================================
> If you make something idiot-proof, the universe creates a better idiot.