From: Axel Kohlmeyer (akohlmey_at_cmm.chem.upenn.edu)
Date: Thu Jun 18 2009 - 11:13:59 CDT

On Thu, 2009-06-18 at 09:59 -0500, John Stone wrote:
> Rob,
>
> On Thu, Jun 18, 2009 at 12:27:08AM -0700, Rob wrote:
> [...]
> > Remains the problem: what was the byte size of integers and
> > doubles on the computer, which generated the binary file?
> > (I guess this is a matter of just trying to read the file and check
> > whether the numbers make sense.....).
>
> Determining the integer size is usually the more common problem case.

nod. this happens particularly when people download "g95", as it
offers two variants for x86_64 and ia64: 32-bit default integer
and 64-bit default integer, and many people don't know that
bigger is not always better.

the fortran standard stipulates that the size of default INTEGER and
default REAL has to be the same. they can be both 32-bit or both
64-bit (i don't think you will run across the latter these days,
since operating a Cray T3E or Cray T90 is a bit expensive,
considering that a powerful desktop runs about as fast or faster).
so the default 64-bit integer actually violates the fortran standard,
unless DOUBLE PRECISION would actually be "quadruple precision",
and REAL the IEEE-754 double precision.
but for traditional gaussian based quantum chemistry codes,
64-bit integers are very helpful to conveniently index the huge
number of integrals on large problems without having to rewrite
the entire code.

the sizes of integer*4, integer*8, real*4, real*8, and their modern
style counterparts integer_4, integer_8 etc. are, of course, fixed.

most codes, and i believe abinit, too, use a fixed real*8 or
real(kind=8) definition. so the only variables you need to care
about are indeed the integers. if one of the next records after
the version string contains a fixed number of integers, you can
deduce their size from the record length marker, as john said.
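
a minimal python sketch of that deduction. it assumes 4-byte,
little-endian record length markers and a record that holds nothing
but a known number of default integers. the function names and the
sample record are made up for illustration and are not taken from
abinit or any other code.

```python
import struct

def read_marker(buf, offset=0, marker_fmt="<i"):
    """Unpack one record length marker from a bytes object.
    marker_fmt is an assumption: 4-byte little-endian."""
    return struct.unpack_from(marker_fmt, buf, offset)[0]

def guess_integer_size(record_length, n_integers):
    """Return 4 or 8 if a record of record_length bytes can hold exactly
    n_integers integers of that size, otherwise None."""
    if n_integers <= 0 or record_length % n_integers != 0:
        return None
    size = record_length // n_integers
    return size if size in (4, 8) else None

# hypothetical record: three integers written as 8-byte values,
# framed by a leading and a trailing length marker
payload = struct.pack("<3q", 1, 2, 3)
marker = struct.pack("<i", len(payload))
record = marker + payload + marker

print(guess_integer_size(read_marker(record), 3))
```

if the record also contains reals or strings, the division no longer
isolates the integer size, so this only works on records whose layout
you already know.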

> Depends on how your code works and how the file is structured.
> If the file is structured with long runs of floating point values, or
> long runs of integer values all grouped together, then it's pretty
> easy to write a read routine that simply takes flags according to
> the size of the integer or floating point types it must read.

you can also do the seeking based on the record length indicators.
each block is of exactly that size, plus 2x the size of the record
length marker (there is one at the beginning and one at the end; the
one at the end is 0 on a few exotic platforms). this is the safest,
if a bit more complicated, way to navigate through unformatted
fortran output, and it is exactly what the fortran runtime does.
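
the record walk described above could look roughly like this in
python. it is a sketch under stated assumptions, not how any
particular fortran runtime is implemented: markers are taken to be
4-byte little-endian integers, and iter_records is a hypothetical
helper name.

```python
import io
import struct

def iter_records(stream, marker_fmt="<i"):
    """Yield (payload_offset, payload_length) for each record in an
    unformatted sequential file, skipping over the payloads.
    marker_fmt is an assumption: 4-byte little-endian markers."""
    marker_size = struct.calcsize(marker_fmt)
    offset = 0
    while True:
        head = stream.read(marker_size)
        if not head:
            return  # clean end of file
        if len(head) < marker_size:
            raise ValueError("truncated leading marker")
        (length,) = struct.unpack(marker_fmt, head)
        if length < 0:
            raise ValueError("negative record length")
        stream.seek(length, io.SEEK_CUR)  # jump over the payload
        tail = stream.read(marker_size)
        if len(tail) < marker_size:
            raise ValueError("truncated record")
        (trailer,) = struct.unpack(marker_fmt, tail)
        # the trailing marker normally repeats the length; 0 is
        # tolerated for the exotic platforms mentioned above
        if trailer not in (length, 0):
            raise ValueError("marker mismatch: wrong marker size or endianness?")
        yield offset + marker_size, length
        offset += length + 2 * marker_size

# hypothetical two-record file: three 4-byte integers, then two doubles
def frame(payload):
    m = struct.pack("<i", len(payload))
    return m + payload + m

data = frame(struct.pack("<3i", 1, 2, 3)) + frame(struct.pack("<2d", 1.0, 2.0))
print(list(iter_records(io.BytesIO(data))))
```

a marker mismatch on the very first record is a useful hint that the
assumed marker size or byte order is wrong for the file at hand.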

> > For now I assume that integers are 4 bytes and doubles 8.
> > May I assume that this is almost generic? Or generic enough?
>
> Assuming integers are 4 bytes is a good default, and you can wait until
> you get your hands on a file produced by a compilation that used 8 byte
> integers to worry about the alternative case.

right, with the one exception of g95 where the default can be set
differently (and from extensive experience with cp2k and
quantum-espresso, this is the version people most frequently
install), all other commonly used compilers have 4-byte integers,
unless you explicitly set a flag at compile time.

cheers,
   axel.

>
> Double precision FP numbers will always be 8 bytes, as I mention above.
>
> Cheers,
> John Stone
> vmd_at_ks.uiuc.edu
>

-- 
=======================================================================
Axel Kohlmeyer   akohlmey_at_cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.