From: Andrew Dalke (dalke_at_ks.uiuc.edu)
Date: Tue Jun 10 1997 - 13:05:11 CDT

(for the bug, jump to ===)

I was recently asked a question about how to find two pairs of
residues that are within 5 A of each other as it might be possible to
mutate them to CYSs to (hopefully) stabilze the structure with a S-S
bond.

This specification isn't quite sufficient -- for a given CA you don't
want to select those atoms that are on neighboring residues of the
same chain, since it doesn't make sense to bond residue i with residue
i+3.

A VMD function to do something like that is:

proc find_possible_mutation_sites {{molid top} {distance 5}} {
    set protein_selection [atomselect $molid "protein and name CA"]
    foreach res_info [$protein_selection get {index resname resid chain pfrag}] {
        lassign $res_info index resname resid chain pfrag
        
        set ca_selection [atomselect $molid "protein and name CA \
and (within $distance of index $index) and not (pfrag == $pfrag \
and resid >= [expr $resid - 7] and resid <= [expr $resid + 7])"]

        if {[$ca_selection num] > 0} {
            puts "[list $index $resname $resid $chain is within $distance A of:"
            foreach atom [$ca_selection get {index resname resid chain}] {
                lassign $atom index resname resid chain
                puts " $index $resname $resid $chain"
            }
        }
    }
}

To use it, load a molecule, copy and paste this function into the VMD
terminal window, and run:

find_possible_mutation_sites

(It defaults to the "top" molecule and the default distance for the
search is 5 angstroms.)

The output looks something like:

39 ASN 5 is within 5 A of:
      1538 ASP 193
47 TYR 6 is within 5 A of:
      1546 PHE 194
59 GLY 7 is within 5 A of:
      339 TYR 42
      351 ASP 43
      359 VAL 44
      1557 LEU 195
63 VAL 8 is within 5 A of:
      366 VAL 45
      1557 LEU 195
      1565 PHE 196
   [...]

For some implementation notes, the "ca_selection" is:
  protein and name CA - this helps eliminate lipids and calcium ions

  pfrag == $pfrag - This is part of the step that gets rid of residues
that are within a few residues of the current protein on the same
chain. I used "pfrag" instead of "chain" because the chain value comes
from the PDB record, and I don't necessarily trust it.
  Imagine the case where the PDB file has 1/2 of a dimer. Someone
reads it in, makes the other half, and saves it, but doesn't change
the chain name. There will be two proteins with the same chain, and
if resid 55 of one protein could S-S bond with resid 55 of the other,
this script wouldn't detect that possibility since it sees the two
residues are on the same chain.

  === bug report ===

resid >= [expr $resid - 7]' and resid <= [expr $resid + 7]
  This is the result of a bug in the VMD atom selection language. A
cleaner version of it would be:
      resid [expr $resid - 7] to [expr $resid + 7]
However, VMD doesn't not allow you to give a keyword value with
a negative sign before it. It is not possible to say
   resid -3 to 8
 or
   x -5

Until that is fixed, there are two ways around it:

    resid '-3' to 8
 (putting it in single quotes forces the lexer to recognize it as a
string, but the selection engine, when it realizes that resid takes an
integer value, will autoconvert the string to an integer)

   (resid >= -3 and resid <= 8)
because the expression code does allow a unary negation.

   === end ==

There is also a slight problem if the output it is to be used by other programs.
If there is no chain identifier (and there isn't for the example I gave above)
then a space will be printed for that location. If this is to be automated,
there should be a check if the chain field is blank and instead use a '-'.
This is the normal convention.

                                                Andrew Dalke
                                                dalke_at_ks.uiuc.edu