From: Tristan Croll (tristan.croll_at_qut.edu.au)
Date: Wed Mar 19 2014 - 21:27:20 CDT

Allowing PSFgen to guess all coordinates for missing residues is going to get VERY messy for gaps any longer than 1-2 residues. I wonder if there is a decent loop modelling package that you can make scripted calls to?

Tristan Croll
Lecturer
Faculty of Science and Technology
Institute of Health and Biomedical Engineering
Queensland University of Technology
60 Musk Ave
Kelvin Grove QLD 4059 Australia
+61 7 3138 6443

This email and its attachments (if any) contain confidential information intended for use by the addressee and may be privileged. We do not waive any confidentiality, privilege or copyright associated with the email or the attachments. If you are not the intended addressee, you must not use, transmit, disclose or copy the email or any attachments. If you receive this email by mistake, please notify the sender immediately and delete the original email.

On 20 Mar 2014, at 11:14 am, "Ivan Gregoretti" <ivangreg_at_gmail.com<mailto:ivangreg_at_gmail.com>> wrote:

Hi Josh

Valid Question:
Why to avoid splitting the original pdb file into multiple, manually built partial pdb files?

Answer:
Because I need to create tools (scripts) that will allow a group of co-workers with less computational skills to start from non-ideal PDB structures and move quickly to a molecular dynamic simulations. (Needless to say, I need to create the scripts that will analyse the trajectories for them too.)

Unfortunately, I must expect that if they do it manually, they will make mistakes or overlook missing amino-acids and only realise it after a week of MD simulation. The solution is automation. Programming.

Thanks for asking. I am pretty sure you are not the only one wondering the same.

Ivan

Ivan Gregoretti, PhD
Bioinformatics

On Wed, Mar 19, 2014 at 7:39 PM, Josh Vermaas <vermaas2_at_illinois.edu<mailto:vermaas2_at_illinois.edu>> wrote:
Hi Ivan,

If you have access to CHARMM, I know that its structure building does allow the specification of ranges when dealing with these kinds of cases and it might make your life easier without exploding the number of files. Why is an explosion of pdb files bad though? You can just delete them once the full structure is built if they bug you, and since you know where the gaps are, its reasonably simple to script the generation of these temporary pdbs if you are so inclined.

-Josh

On 03/19/2014 05:58 PM, Ivan Gregoretti wrote:
Hi Josh,

Splitting the pdb into nterm.pdb and cterm.pdb does the trick. Thank you.

Now I am trying to figure out a way to do that without splitting the original file into two. Why? Because in the structures I will be working on I expect multiple missing amino-acids per monomer. And I will be working with many structures. I better try not to provoke and explosion of partial pdb files.

The solution would be (it may not exist) to call the pdb command with a range, something like

segment H {
pdb 1:134 4HBC_p.pdb
residue 135 SER
residue 136 SER
pdb 137:end 4HBC_p.pdb
}

I do not think that a call to "pdb" offers that option. I will keep trying, I have no choice honestly.

Thank you

Ivan

Ivan Gregoretti, PhD
Bioinformatics

On Wed, Mar 19, 2014 at 3:01 PM, Josh Vermaas <vermaas2_at_illinois.edu<mailto:vermaas2_at_illinois.edu>> wrote:
Hi Ivan,

The residue command operates within the segment. For example, this is what I did for a protein that was missing its N-terminal methionine:

segment EMU {
        residue 1 MET
        pdb 3VVJ-chainA.pdb
}

For your case, I think what would be cleanest would be to make two pdbs, one containing residues 1 to 134 (call it nterm.pdb), one containing the rest (call it cterm.pdb), and then do the following:

segment H {
pdb nterm.pdb
residue 135 SER
residue 136 SER
pdb cterm.pdb
}

The coordpdb command can use the original pdb, as I believe psfgen doesn't count on the residues being contiguous when adding coordinates to the topology (although I've been known to be wrong in the past).

Good luck!
-Josh Vermaas

On 03/19/2014 01:07 PM, Ivan Gregoretti wrote:
Hi everybody,

Can somebody share an example of how to use "residue" in psfgen to fill in amino-acids that are missing in a PDB?

I am working with structure 4HBC. The monomer named H has two serines in positions 135 and 136 that are missing in the positional information.

This is an excerpt fro the 4HBC.pdb wher you can see that positional information skips from resid 134 to 137.
...
ATOM 1017 N THR H 133 -60.259 26.798<tel:60.259%C2%A0%2026.798> -21.481 1.00 44.57 N
ATOM 1018 CA THR H 133 -60.887 25.485<tel:60.887%C2%A0%2025.485> -21.267 1.00 51.05 C
ATOM 1019 C THR H 133 -59.916 24.273 -21.475 1.00 54.98 C
ATOM 1020 O THR H 133 -59.389 24.074 -22.579 1.00 57.77 O
ATOM 1021 CB THR H 133 -62.200 25.381 -22.089 1.00 53.68 C
ATOM 1022 OG1 THR H 133 -62.808 24.096 -21.894 1.00 60.25 O
ATOM 1023 CG2 THR H 133 -61.958 25.650<tel:61.958%C2%A0%2025.650> -23.594 1.00 56.91 C
ATOM 1024 N PRO H 134 -59.670 23.473 -20.405 1.00 57.37 N
ATOM 1025 CA PRO H 134 -58.706 22.338 -20.445 1.00 56.94 C
ATOM 1026 C PRO H 134 -59.074 21.172 -21.373 1.00 62.62 C
ATOM 1027 O PRO H 134 -58.254 20.268 -21.584 1.00 62.58 O
ATOM 1028 CB PRO H 134 -58.654 21.855<tel:58.654%C2%A0%2021.855> -18.984 1.00 58.07 C
ATOM 1029 CG PRO H 134 -59.897 22.397 -18.347 1.00 58.81 C
ATOM 1030 CD PRO H 134 -60.205 23.690 -19.044 1.00 50.39 C
ATOM 1031 N THR H 137 -56.064 18.107 -18.639 1.00 50.63 N
ATOM 1032 CA THR H 137 -54.612 18.345 -18.474 1.00 44.58 C
ATOM 1033 C THR H 137 -54.202 19.838 -18.557 1.00 40.24 C
ATOM 1034 O THR H 137 -54.496 20.527<tel:54.496%C2%A0%2020.527> -19.551 1.00 40.54 O
ATOM 1035 CB THR H 137 -53.780 17.473 -19.454 1.00 52.37 C
ATOM 1036 OG1 THR H 137 -52.386 17.754 -19.298 1.00 53.79 O
ATOM 1037 CG2 THR H 137 -54.188 17.709<tel:54.188%C2%A0%2017.709> -20.924 1.00 53.97 C
ATOM 1038 N VAL H 138 -53.514 20.333 -17.524 1.00 31.81 N
ATOM 1039 CA VAL H 138 -53.117 21.759 -17.499 1.00 28.77 C
ATOM 1040 C VAL H 138 -51.619 21.892 -17.199 1.00 26.21 C
ATOM 1041 O VAL H 138 -51.065 21.118<tel:51.065%C2%A0%2021.118> -16.437 1.00 27.01 O
ATOM 1042 CB VAL H 138 -54.025 22.634<tel:54.025%C2%A0%2022.634> -16.556 1.00 29.39 C
ATOM 1043 CG1 VAL H 138 -53.992 22.156<tel:53.992%C2%A0%2022.156> -15.122 1.00 31.74 C
ATOM 1044 CG2 VAL H 138 -53.651 24.114 -16.583 1.00 27.45 C
...

I am trying to figure out how to modify my VMD script to fill in those missing serines.

At the moment I am only establishing the disulfide bonds and adding the hydrogen:

package require psfgen
topology top_all27_prot_lipid.inp
pdbalias residue HIS HSE
pdbalias atom ILE CD1 CD

segment H {pdb 4HBC_p_H.pdb}
patch DISU H:21 H:92
patch DISU H:142 H:195
patch DISU H:130 H:215
coordpdb 4HBC_p_H.pdb H
guesscoord
writepdb 4HBC_p_H_H.pdb
writepsf 4HBC_p_H_H.psf

quit

I tried inserting the statement

residue 135 SER H

into a number of positions in that script but I consistently get the same error: "no segment in progress for residue".

Could somebody enlighten me with a proper segment definition?

Thank you,

Ivan

I have a tcl script that takes a protein only pdb file and

Ivan Gregoretti, PhD
Bioinformatics