From: Andrew Dalke (dalke_at_ks.uiuc.edu)
Date: Fri Feb 28 1997 - 16:38:21 CST

  In this week's VMDTech we will describe some of the ways you can
modify or access the secondary structure definitions. With a single
command you can make parts of your structure be any of 7 different
types. Following that, we'll describe a bit of the inner details of
how VMD reads and stores theose definitions. Finally, after
describing a bit about the STRIDE interface, we'll show you how to
modify VMD (using the command language) to set the HELIX and SHEET
definitions based on the PDB file.

  VMDTech, for the people that joined the VMD mailing list in the last
week, is a weekly column we write about some aspect of VMD about which
we believe many people may be interested in reading. This is not the
only reason for the mailing list, so feel free to bring up other
topics on the vmd-l list, or send you comments directly to the VMD
authors at vmd_at_ks.uiuc.edu.

                                                Andrew Dalke
                                                dalke_at_ks.uiuc.edu

 = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

                Changing the Secondary Structure

  There are records in the PDB file that describe the secondary
structure but VMD doesn't use them. Instead, it dynamically calls
Dmitri Frishman's STRIDE program (for more information about it see
http://www.embl-heidelberg.de/stride/stride_info.html) to compute the
definition as needed.

  Note the expression "as needed"? Unless you provide a definition
yourself, VMD won't try to figure out the secondary structure until
you use a option that requires it; for instance, if you want to show
the secondary structure cartoon or color the system by structure type.

Of course, sometimes the definition it comes up with isn't quite the
one you want, so there is a command in VMD to override that
definition. The command is called "structure" and takes three
parameters:

  structure <molecule id> <atom selection text> <secondary structure>

The molecule id and selection text are pretty obvious from the
examples below. The secondary structure can be one of many things,
which are listed here:

  secondary structure valid name(s)
  - - - - - - - - - - - - - - - - -
   alpha helix helix, alpha_helix, H
   pi helix pi_helix, I
   3-10 helix helix_310, G

   extended beta sheet, extended_beta, E
   bridge beta bridge_beta, B

   turn turn, T
   random coil coil, C

So some examples are:

   # Make residues 5 to 23 of the top molecule be an alpha helix
   structure top "resid 5 to 23" helix
   # with a little pi-helix at the beginning
   structure top "resid 2 to 4" pi_helix

   # Make the prolines in molecule 9 be in a random coil
   structure 9 "resname PRO" coil
   # and add some beta sheets (these are the same type of structure)
   structure 9 "resid 50 to 58" sheet
   structure 9 "resid 62 to 71" E

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                        The Details

The "structure" function is a wrapper for the standard text-based atom
selection accessor commands, which described in the manual. The first
example is really done as:

         set sel [atomselect top {resid 5 to 23}]
         $sel set structure helix

You can use the converse of "set", "get" to retrieve the information
and use it, or save it to a file. For instance, use the "structure"
command to tweak the secondary structure definition to the way you
want it, then (assuming you are still using the top molecule):

         # Each residue has only one secondary structure definition
         # for all the atoms, so get the definition for the CA atom.
         # Also, for now, only protein secondary structure makes any
         # sense.
         set sel [atomselect top "protein and name CA"]
         set sstruct [$sel get structure]
 
The info it prints out can be used to make a script. For example, for
the small system I'm testing things out on here, the value of sstruct
is:

{C} {C} {C} {C} {E} {E} {E} {E} {E} {E} {E} {E} {E} {E} {E} {E} {E}
{E} {E} {E} {E} {E} {E} {E} {E} {C} {C} {C} {C} {C} {C} {C} {C} {C}
{C} {C} {C} {C} {C} {C} {C} {C} {C} {C} {C} {C} {C} {C} {C} {C} {C}
{C} {C} {C} {C} {C} {C} {C}

and indicates that this system has a mix of coil and beta sheets. Now
feel free to modify the structure as you please. You might try
the following:

        structure top all coil
        structure top {residue % 8 < 6} helix

which sets the first 6 out of every eight residues to a helix and
everything else as a coil. If you are looking at your structure you
will have to press the "Apply Changes" button in the bottom left of
the Graphics form to update the display.

Once you've played around with it you can restore the original
secondary definition:

         ## If not already done (you don't have to do this again)
         # set sel [atomselect top "protein and name CA"]
         # set the definition by hand
         $sel set structure {{C} {C} {C} {C} {E} {E} {E} ...}
 
or use the variable
         $sel set structure $sstruct

Or you can use the Tcl commands to save to a file; something like:
         # save to a file
         set outfile [open tmp.dat w]
         puts $outfile $sstruct
         close $outfile

And later on restore it with the following:
         # restore
         set infile [open tmp.dat r]
         gets $outfile sstruct
         # set sel [atomselect top "protein and name CA"]
         $sel set structure $sstruct

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

                STRIDE and DSSP (and others?)

  When the secondary structure computation is needed, VMD uses a Tcl
routine to call the STRIDE executable, which is included (with
permission) in the standard distribution. The Tcl routine is located
in $VMDDIR/scripts/vmd/stride.tcl and is named
"vmd_calculate_structure". The full definition is:

proc vmd_calculate_structure {molid} {
    vmd_stride_full $molid
}

(***** BUG ALERT****: As I just found out, the current release of VMD
has a bug in that it calls "vmd_stride_full" directly instead of going
through "vmd_calculate_structure". To get the most current version of
the vmd_IRIX5 binary, go to ftp.ks.uiuc.edu in
/pub/group/dalke/vmd_IRIX5.Z, get it, uncompress, chmod +x, and
replace your version of vmd_IRIX5 in your VMDDIR directory *)

The vmd_stride_full then extracts all the protein atoms from the given
molecule and passes it off to STRIDE. STRIDE computes the secondary
structure so that routine reads the STRIDE output file and calls the
appropriate commands to set the secondary structure.

Now, I realize that people use DSSP (for more information see
http://www.sander.embl-heidelberg.de/dssp/) to Determine the Secondary
Structure of Proteins, but the output of that program was hard to
machine parse. For normal proteins it was okay, but I could never
figure out how to deal with modified residues so I decided to
interface with STRIDE, which is much also easier to use and cleaner in
implementation (and available in source and we could redistribute it
without a problem). However, if someone else want to write an
interface to DSSP, connecting it to VMD should be easy; just replace
the above function to call (eg) vmd_dssp_full with the appropriate
molecule id.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

                Secondary Structure Records from a PDB file

  In a similar vein, it is possible to post-process the PDB file to
extract the secondary structure definitions from it. Since this has
been a highly requested features, I decided to implement it for this
week's VMDTech. I have before me the documentation of the PDB file
format (available from http://www.pdb.bnl.gov/doc_help.html as "The
Protein Data Bank Contents Guide").

The record of interest to VMD are:

COLUMNS DATA TYPE FIELD DEFINITION
--------------------------------------------------------------
 1 - 6 Record name "HELIX "
20 Character initChainID
           Chain identifier for the chain containing this helix.
22 - 25 Integer initSeqNum
           Sequence number of the initial residue
32 Character endChainID
           Chain identifier for the chain containing this helix
34 - 37 Integer endSeqNum
           Sequence number of the terminal residue
39 - 40 Integer class Helix class
     (there are actually many types of helicies listed -- we'll call
them all "helix")

COLUMNS DATA TYPE FIELD DEFINITION
--------------------------------------------------------
 1 - 6 Record name "SHEET "
22 Character initChainID
                  Chain identifier of initial residue in strand
23 - 26 Integer initSeqNum
                  Sequence number of initial residue in strand
33 Character endChainID
                  Chain identifier of terminal residue
34 - 37 Integer endSeqNum
                  Sequence number of terminal residue

Here is a routine which, when given the name of a PDB file, will
extract the fields. All you have to do is copy and paste.

---------------------------------- start of copy and paste #1
# read the secondary structure records from a file and return
# the information in the form:
# {SSType chain1 resid1 chain2 resid2} {... }
proc vmd_read_pdb_ss {pdb_filename} {
    set response {}
    # open the PDB file
    set infile [open $pdb_filename r]
    # read until the end of file
    while {[gets $infile line]} {
        set str [string range $line 0 5]
        if {$str == "HELIX "} {
            set ss helix
            set chain1 [string range $line 19 19]
            set resid1 [string range $line 21 24]
            set chain2 [string range $line 31 31]
            set resid2 [string range $line 33 36]
            lappend response [list $ss $chain1 $resid1 $chain2 $resid2]
            continue
        }
        if {$str == "SHEET "} {
            # get the needed fields
            set ss sheet
            set chain1 [string range $line 21 21]
            set resid1 [string range $line 22 25]
            set chain2 [string range $line 32 32]
            set resid2 [string range $line 33 36]
            lappend response [list $ss $chain1 $resid1 $chain2 $resid2]
            continue
        }
        
        # also, if read ATOM then there are no more def's
        if {$str == "ATOM "} {
            break
        }
    }
    # close the file and return the list of info
    close $infile
    return $response
}

#### Now a driver to get info from this routine
# Return 0 if no information available
# Return 1 otherwise
proc vmd_use_pdb_ss {molid pdb_filename} {
    # get the list of information
    set ssdata [vmd_read_pdb_ss $pdb_filename]
    # was there data?
    if {$ssdata == {}} {return 0}
    # Go through each of the element
    foreach ele $ssdata {
        lassign $ele ss chain1 resid1 chain2 resid2
        # if the chains are " ", don't use them
        if {$chain1 == " "} {
            set seltext "protein and (resid $resid1 to $resid2)"
        } else {
            set seltext "chain $chain1 and (resid $resid1 to $resid2)"
        }
        # get the selection/ make it the right structure/ free it
        set sel [atomselect $molid $seltext]
        $sel set structure $ss
        $sel delete
    }
    # all done, so return that I did something
    return 1
}
---------------------------------- end of copy and paste #1

Amazingly enough, it seems to work. Suppose you have a PDB file
loaded, in my case, pdb2plv.pdb, which is a protomer of the
poliovirus. Suppose too it is the "top" molecule. Then call the
above routines as:

        vmd_use_pdb_ss top pdb2plv.pdb

(To be exact, this says
Wait a bit (I don't think I ever said Tcl was fast :( ) then look at
the structure in cartoon mode (remember, you may have to press the
"Apply Changes" button in the lower left of the Graphics form).

For the final bit, this method can be called automatically by VMD by
replacing the vmd_calculate_structure. Again, just cut and paste the
following into the text window:

--------------------------- start cut and paste #2

# read in stride.tcl for otherwise the autoloader messes
# things up by re-defining vmd_calculate_structure
source $env(VMDDIR)/scripts/vmd/stride.tcl

# first read from the PDB file. If that doesn't work,
# use STRIDE
proc vmd_calculate_structure {molid} {
    # get the input PDB filename if one was used
    set pdb_file {}
    if {[molinfo $molid get filetype] == "pdb"} {
        set pdb_file [molinfo $molid get filename]
    } else {
        if {[molinfo $molid get filetype2] == "pdb"} {
        set pdb_file [molinfo $molid get filename2]
        }
    }
    # was there a PDB file?
    if {$pdb_file != {} } {
        # try and get the structure info from that file
        if [vmd_use_pdb_ss $molid $pdb_file] {
            # if returned true, it worked
            return
        }
    }
    # if all else files (no PDB file or no info in it)
    # call the original routine
    puts "Can't get info from the structure file"
    vmd_stride_full $molid
}
---------------------------------- end of copy and paste #2

So
 0) remember to read the BUG ALERT message above!
 1) load your favorite PDB structure (mine is 9pti :)
 2) copy and paste these two sections of code into your text window
(or save them in a file and source it)
 3) choose the "cartoon" drawing method -- ta-da! (I hope)

 = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

                                The End

  As usual, this VMDTech probably went into too much detail (but be
glad I didn't start describing how to animate the secondary structure
over time! :). On the other hand, these are highly useful script, so
you may be glad to know that we are in the middle of rewriting the VMD
script library to make it easier to use them and tailor them for your
own research.

                                                Andrew
                                                dalke_at_ks.uiuc.edu