Next: save/load VMD state information Up: Advanced Script Writing Previous: Drawing a distance matrix

Analysis of PDB files

The VMD atom selection commands were prototypes in two in-house programs developed previously, pdblang, which showed the need for an easy-to-use language for manipulating structures, and parse which tested the usefulness of Tcl for analyzing large numbers of structures.

Specifically, the goal of the second project was to find what features were needed to write a script to analyze the propensity of various residues to be located near metal ions. Such a script would need to do the following:

given a list of representative PDB files, get them from the PDB
find if a metal ion is present
find the residues within 3, 5, and 7 Å from the ion
keep track of the results

The hardest part of the script is determining if a metal ion is present, as it is hard to distinguish between a CA calcium and a CA alpha carbon. That still hasn't been solved, though the method below should work for nearly all cases, except when ions are inadvertently bonded to other atoms. The new PDB definition has a new field for element type, but VMD does not yet recognize it.

There are several uncommon methods used here. They are described in the comment statements. Note that this uses the 'pdbload' command to ftp the structure files directly from the PDB ftp site.

# adds the given (floating point) value to the value
# if the value doesn't exist, sets it to 0
# This procedure is used because "incr" fails if the variable doesn't exist
proc myincr {var val} {
  regexp {^[^(]*} $var prefix
  global $prefix
  if {![info exists $var]} {
    set $var 0
  }
  set $var [eval "expr $var + $val"]
}

# given the atom index, find the ions within the given distance
# return 
proc find_nearby_residues {index ion distance} {
  set nearby [atomselect top "(within $distance of index $index) \
			and not index $index"]
 
 # I need to count each residue once, but I need to distinguish
 # two successive residues, so using just the residue name is not
 # enough.  "resname residue" is unique and, since atoms on the
 # same residue have successive indices, the luniq gets just one
 # of them.
 foreach res_pair [luniq [$nearby get {resname residue}]] {
   lassign $res_pair resname
   myincr count($ion,$distance,$resname) 1
 }
}


proc analyze_ion_propensity {pdblist metals} {
  global count
  # get each of the entries from the list of PDB files
  foreach entry $pdblist {
    # load them from the PDB ftp site
    mol pdbload $entry
    # go through the search list of metal names
    foreach ion $metals {
      set sel [atomselect top "name $ion and numbonds == 0"]
      foreach atom [$sel list] {
        # find neighbors for each of the test ranges
        find_nearby_residues $atom $ion 3
        find_nearby_residues $atom $ion 5
        find_nearby_residues $atom $ion 7
      }
      # save memory space by forcing the deletion of the 
      # temporary selection.  (Otherwise they wouldn't be purged
      # until the end of the procedure.)
      $sel delete
    }
    mol delete top
  }
  # the array ``count'' contains the data in the form
  # (ion name,distance,residue name)
  # For now just print the values for the normal residues within
  # 7A of a Zn.  Use a histogram of '*'
  set resnames {ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET \
          PHE PRO SER THR TRP TYR VAL}
  foreach resname $resnames {
    puts -nonewline "$resname :"
    myincr count(ZN,7,$resname) 0
    puts [replicate * $count(ZN,7,$resname)]
  }
}

For the example, the files 1TRZ, 1LND, and 1EZM contain zincs.

vmd> unset count
vmd> analyze_ion_propensity {1trz 1lnd 1ezm} ZN
ALA :****
ARG :***
ASN :****
ASP :***
CYS :**
GLN :
GLU :******
GLY :
HIS :***********
ILE :
LEU :*
LYS :**
MET :
PHE :*
PRO :
SER :****
THR :
TRP :
TYR :***
VAL :****

As expected, histidines were one of the most common zinc neighbors. Of course, there will still be problems of missampling (for instance, overcounting molecules with zinc finger dimers) so you should be very sure of what you are doing when using this type of automated analysis.

Justin Gullingsrud
Tue Apr 6 09:22:39 CDT 1999