Next: Modeling amino acid insertions Up: Rosetta/MDFF Tutorial Previous: Required software Contents


Folding protein termini using ModelMaker

Here, we predict a structural model for the truncated C-terminal tail of chain B of 4OCM, from amino acid 217 to 306. Create a folder named terminus for this section and navigate to it. Sample input and output files are provided in the 3.terminus folder of the tutorial folder. First, you need to obtain a template structure whose structurally unresolved segments will be completed by modeling. In this tutorial we complete Rpn11, the deubiquitylating subunit of the 26S proteasome, using chain B of PDB structure 4OCM as the modeling template. Go to the Protein Data Bank (PDB, http://www.rcsb.org/pdb/home/home.do) and download the structure with PDB-ID 4OCM. Alternatively, you can download the structure directly through VMD: go to File → New Molecule, type 4OCM in the Filename box and click the Load button. The PDB structure 4OCM contains two Rpn8/Rpn11 dimers in the unit cell. For now we only need one Rpn11, so we use chain B. To create a PDB file containing only chain B, run the following command in the Tk Console (Extensions → Tk Console):

[atomselect top "chain B and protein"] writepdb rpn11_yeast_4ocm.pdb
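If VMD is not at hand, the same extraction can be approximated with a few lines of Python. The sketch below is hypothetical (extract_chain and write_chain are not part of the tutorial files, and the downloaded file name 4ocm.pdb is assumed). It relies only on the fixed-width PDB format, where the chain identifier sits in column 22, and keeps ATOM records only, which approximates but is not identical to VMD's "protein" selection keyword.

```python
from pathlib import Path


def extract_chain(pdb_text: str, chain_id: str) -> str:
    """Return the ATOM records of one chain, terminated by an END record."""
    kept = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB format: record name in columns 1-6, chain ID in column 22.
        # HETATM records (waters, ligands) are dropped, approximating "protein".
        if line.startswith("ATOM  ") and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"


def write_chain(in_path: str, out_path: str, chain_id: str) -> None:
    """Hypothetical wrapper, e.g. write_chain("4ocm.pdb", "rpn11_yeast_4ocm.pdb", "B")."""
    Path(out_path).write_text(extract_chain(Path(in_path).read_text(), chain_id))
```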

Structure prediction

In order to use ModelMaker for structure prediction you first need to modify the configuration file.

1: Preparing the configuration file:
Copy the prepared configuration file fold_rpn11_terminus.tcl from the tutorial files to your terminus folder, open it in a text editor and change the following variables to fit your workstation configuration.

variable	description
`packagePath`	path of the ModelMaker plugin files
`vmdexe`	path of your vmd executable
`gnuplotexe`	path of your gnuplot executable
`rosettapath`	directory path containing the Rosetta binaries
`rosettaDBpath`	Rosetta database path
`platform`	`"linuxgccrelease"` or `"macosclangrelease"`
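The variables can simply be edited by hand. For scripted setups, a minimal Python sketch like the following could patch them, assuming (as the `set nstruct 100` line described in step 5 suggests) that the configuration file assigns each variable with a Tcl `set <name> <value>` line. The helper name and any example paths are hypothetical.

```python
import re

# Matches Tcl assignments such as:  set vmdexe /usr/local/bin/vmd
SET_LINE = re.compile(r"^(\s*set\s+)(\w+)(\s+)\S.*$")


def set_tcl_variables(config_text: str, values: dict[str, str]) -> str:
    """Replace the value of each `set <name> ...` line whose name is in values."""
    out_lines = []
    for line in config_text.splitlines():
        match = SET_LINE.match(line)
        if match and match.group(2) in values:
            prefix, name, gap = match.groups()
            # The whole rest of the line is replaced, so trailing comments are lost.
            line = f"{prefix}{name}{gap}{values[name]}"
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

Values that Tcl needs quoted must be passed already quoted, for example `'"linuxgccrelease"'` for `platform`.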

2: Obtaining the target amino acid residue sequence:
As we are only going to fold the C-terminus of Rpn11, we discard the missing N-terminus in the following procedures. Go to the UniProt website (http://www.uniprot.org) and download the FASTA sequence of yeast Rpn11 (UniProt ID: P43588). Create a folder called input. With a text editor, remove the first 22 amino acids from the sequence and save the file in the input folder as rpn11_yeast_23-306.fasta. To simplify the sub-sequence extraction, the scripts folder provides a Python script, subrange.py, that extracts the amino acid codes for a given range. Copy the script to your working directory and make the following changes:

line 3: start defines the first residue of the sub-sequence (here 23)
line 4: end defines the last residue of the sub-sequence (here 306)
line 6: name defines the name of the input FASTA sequence

Running

python subrange.py

creates a file <name>_<start>-<end>.fasta from the input fasta sequence.
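The contents of subrange.py are not reproduced here. As an illustration of what such a script has to do, the following hypothetical Python sketch reads the first record of a FASTA file, slices it with 1-based, inclusive residue numbers (the convention UniProt uses) and formats the result as FASTA again. Calling subrange with start 23 and end 306 on the full Rpn11 sequence should give the same residues as deleting the first 22 amino acids by hand.

```python
def read_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) of the first record in a FASTA string."""
    header, chunks = "", []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            if header:
                break  # ignore any further records
            header = line
        elif line:
            chunks.append(line)
    return header, "".join(chunks)


def subrange(sequence: str, start: int, end: int) -> str:
    """Residues start..end using 1-based, inclusive numbering."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return sequence[start - 1:end]


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one record with the sequence wrapped at `width` characters."""
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *rows]) + "\n"
```

Typical use would be `header, seq = read_fasta(text)` followed by `to_fasta(header, subrange(seq, 23, 306))`, where `text` is the downloaded FASTA file read as a string.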

3: Generating the fragment file library:
Use the Robetta server to generate two fragment library files containing internal coordinates for the target sequence. The server runs a homology search against the Protein Data Bank using sliding windows of 3 and 9 amino acids and produces two files (3mer and 9mer) listing the best 200 results for each window. To do so, go to http://www.robetta.org and register an academic user account. Submit the target sequence rpn11_yeast_23-306.fasta to the fragment file server; as soon as the search is finished, you will receive an email with a link to download the results. Save the 3mer and 9mer files as rpn11_yeast_23-306.frag3 and rpn11_yeast_23-306.frag9 in your input folder.
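To get a feeling for the size of the resulting libraries, the number of windows follows from simple sliding-window arithmetic. The sketch below assumes one window per start position and exactly 200 fragments per window, as stated above; the files actually returned by Robetta may differ in detail.

```python
def window_count(seq_length: int, frag_length: int) -> int:
    """Number of overlapping windows of frag_length residues along a sequence."""
    return max(seq_length - frag_length + 1, 0)


FRAGMENTS_PER_WINDOW = 200
target_length = 306 - 23 + 1  # residues 23-306, inclusive

for frag_length in (3, 9):
    windows = window_count(target_length, frag_length)
    print(f"{frag_length}-mer: {windows} windows, {windows * FRAGMENTS_PER_WINDOW} fragments")
```

For the 284-residue target this gives 282 three-residue windows and 276 nine-residue windows.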

4: Building one complete model for the target amino acid sequence:
Create a full_length_model folder and copy the file rpn11_yeast_4ocm.pdb to it. Furthermore, copy the file run_full_length_model.sh from the tutorial files to your full_length_model folder. Make the following changes in run_full_length_model.sh:

line 9: rosetta=/path/to/rosetta/bin
indicates the path to your Rosetta binary directory.
line 11: platform="linuxgccrelease"
specifies the platform Rosetta has been built on; either
platform="linuxgccrelease" or
platform="macosclangrelease" is accepted.

Run run_full_length_model.sh to generate a complete template model. Afterwards, rename the output file rpn11_yeast_4ocm.pdb_full_length.pdb to rpn11_yeast_23-306_complete.pdb.

./run_full_length_model.sh

mv rpn11_yeast_4ocm.pdb_full_length.pdb rpn11_yeast_23-306_complete.pdb

Rosetta's full-length model application writes PDB files that do not keep the original amino acid numbering. To restore it, copy the script renumber.tcl from the scripts folder to the full_length_model folder. The mols list holds the file name of the input PDB file; in this case, the line should look like:

set mols [list rpn11_yeast_23-306_complete.pdb]

In the next line, set the newstart variable to 23 so that the residue numbering of the output PDB file starts at 23, as in the FASTA file. Run

vmd -dispdev text -e renumber.tcl

to get the output file rpn11_yeast_23-306_complete-numb.pdb, then replace the old file with the new one:

mv rpn11_yeast_23-306_complete-numb.pdb rpn11_yeast_23-306_complete.pdb
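renumber.tcl itself is not shown here. The following hypothetical Python sketch illustrates the underlying operation, assuming that residues are simply renumbered consecutively from newstart in file order. It rewrites the residue sequence number (columns 23-26 of ATOM/HETATM records) and clears the insertion code (column 27), leaving all other columns untouched.

```python
def renumber_pdb(pdb_text: str, new_start: int) -> str:
    """Renumber residues consecutively from new_start, in the order they appear."""
    out = []
    previous_key = None
    current = new_start - 1
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            # A residue is identified by chain ID (col 22), number (cols 23-26)
            # and insertion code (col 27); a new key means a new residue.
            key = (line[21:22], line[22:26], line[26:27])
            if key != previous_key:
                current += 1
                previous_key = key
            # Write the new number right-aligned and blank the insertion code.
            line = f"{line[:22]}{current:>4} {line[27:]}"
        out.append(line)
    return "\n".join(out) + "\n"
```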

5: Running Rosetta from VMD:
Now that all necessary input files are prepared, we can complete the configuration file and run Rosetta from VMD to predict the C-terminal structure of Rpn11. The literature recommends predicting between 5,000 and 20,000 models; in our test case we predict only 100 structures for demonstration purposes. We use RosettaScripts with a Brokered Environment and run the classic Rosetta de novo protocol within it. The ModelMaker plugin generates the Rosetta input files automatically, so we only need to add a few lines to fold_rpn11_terminus.tcl.

line 14: set nstruct 100
creates a variable that is later on passed to Rosetta and indicates the number of structures to generate.
line 15: set bestN 25
sets the number of best-scoring structures that are taken into account for the analysis.
line 18: set tempPath [pwd]/full_length_model
points to the path containing the full-length Rpn11 model created in the previous step.
line 20: set tempdir [pwd]/full_length_model
points to the path containing the PDB file for alignment in the analysis step. If you have an alternative PDB file you want to align the predicted models to, you can specify it by setting tempdir. In our test case, we simply use the full length model for alignment as well.
line 22: set comps [list [list ss 196 284 "A"]]
defines a list of analysis tasks for the predicted structures. ModelMaker accepts a list of several analysis tasks; here we define only a secondary structure analysis task, which determines the average secondary structure of amino acids 196 to 284 of chain A. Each task in $comps is itself a list whose elements are: [list ss <start residue ID> <end residue ID> <chain>]
line 25: start_rosetta_abinitio ...
starts a Rosetta structure prediction task with the given arguments in Tab. 2.
line 27: analyze_abinitio ...
starts the analysis procedure. The argument configuration is explained in Tab. 3. This command automatically calls Rosetta energy scoring, aligns the best ($bestN) structures and performs the analysis tasks defined in $comps.

Execute the configuration file in VMD text mode and wait for the tasks to finish. Depending on the number of structures to generate, this may take a while.

vmd -dispdev text -e fold_rpn11_terminus.tcl

If no error occurs, go to the folder called rosetta_output_rpn11_terminus, which contains the results of your run.

Table 2: Rosetta ab initio procedure arguments

Table 3: Rosetta ab initio analysis procedure arguments

Interactive fitting to a cryo-EM density with iMDFF and QwikMD

Create a new folder named mdff in your working directory.

1: Aligning the predicted model with the cryo-EM density map:
To perform interactive molecular dynamics flexible fitting, you first need to place the modeled structure in the right position inside the density map. Download the cryo-EM density map of the 26S proteasome (EMDB-ID 2594) from the Electron Microscopy Data Bank (http://www.ebi.ac.uk/pdbe/emdb/): emd_2594.map. In this special case, a near-atomic structural model (PDB-ID 4CR2) already exists for this map. Download the structure with PDB-ID 4CR2 from the PDB (http://www.rcsb.org/pdb/home/home.do). Use align_segments.tcl to align the output structure of the structure prediction (ss_average_100.pdb) to chain V of 4CR2.pdb in order to get ss_average_100_aligned.pdb.

2: Generating a density map file for MDFF:
To generate a density map file readable by MDFF, first rename emd_2594.map to emd_2594.ccp4, then run the following command in your terminal (the script is contained in the scripts folder):

vmd -dispdev text -e get_density.tcl

which will execute

mdff griddx -i emdb_2594.ccp4 -o emdb_2594_potential.dx
mdff griddx -i emdb_2594_potential.dx -o emdb_2594_density.dx

to obtain the density file emdb_2594_density.dx, which can be read by MDFF.

3: Crop the density:
To crop the density to the area of interest, here the region around the predicted structural model of Rpn11, run crop_density.tcl:
vmd -dispdev text -e crop_density.tcl

The script generates the density file rpn11_model_5_2594_density.dx, which contains the density of emd_2594_density.dx within a cutoff of around .

4: Fitting the modeled part to the cryo-EM density:
We use the VMD plugin QwikMD to set up the interactive MDFF run, as it automatically generates all the required input files and structures.

Structure preparation with QwikMD

Run VMD and open the QwikMD plugin (Extensions → Simulation → QwikMD). Browse to the average secondary structure PDB file ss_average_25.pdb located at ./rosetta_output_rpn11_terminus/analysis/ss_196_284/ and load it into QwikMD. Click on Structure Manipulation and ignore any errors that occur. At the top, navigate to the Advanced Run tab and select the MDFF tab. In the Protocol dropdown menu, set Fixed to "resid 1 to 195" and select "same fragment as protein" for the Sec. Structure, Chirality and Cispeptide restraints. Click on Prepare and name your QwikMD file rpn11_terminus_mdff when prompted. QwikMD automatically generates the necessary PSF and restraint files and redirects you to the MDFF graphical user interface. Open the MDFF files dropdown and add the cropped density rpn11_model_5_2594_density.dx. To improve performance, you can adjust the number of CPU cores NAMD uses for the MDFF run by changing the value of Processors in the IMD parameters dropdown.

Run MDFF

Click on the IMDFF Connect tab at the top of the MDFF GUI and open the Cross Correlation Analysis dropdown. Check Calculate real-time Cross Correlation, set Experimental Density (Mol ID) to the Mol ID of the loaded density (you can find the ID in the first column of the VMD Main window) and set the Map Resolution to . Click on Submit and Connect to start the simulation and interact with it through the VMD window.
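The real-time value reported by VMD compares a map simulated from the current atomic coordinates at the given resolution with the experimental density. As a hedged illustration of the quantity itself, the sketch below computes the standard normalized (Pearson) cross-correlation of two maps that are assumed to be already sampled on the same grid and flattened to lists; VMD's own implementation may additionally threshold or mask voxels.

```python
import math


def cross_correlation(sim: list[float], exp: list[float]) -> float:
    """Normalized cross-correlation of two maps sampled on the same grid."""
    if len(sim) != len(exp) or not sim:
        raise ValueError("maps must be non-empty and share the same grid")
    n = len(sim)
    mean_s = sum(sim) / n
    mean_e = sum(exp) / n
    cov = sum((s - mean_s) * (e - mean_e) for s, e in zip(sim, exp))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_e = sum((e - mean_e) ** 2 for e in exp)
    if var_s == 0 or var_e == 0:
        raise ValueError("a constant map has no defined correlation")
    return cov / math.sqrt(var_s * var_e)
```

A value close to 1 means the simulated density of the model follows the experimental map closely, which is why the number should rise as the flexible segment is pulled into the density.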
Hint: If you have problems moving the structure to the correct position, load the final Rpn11 structure rpn11_yeast.pdb from the tutorial folder into VMD as a guide while performing iMDFF.

Short introduction to iMDFF: In the first step, drag the predicted structure into the density by applying forces (Mouse → Force → Atom) and clicking on an atom. In this step, a grid scaling factor of 0.3 should be applied. For detailed instructions on using the MDFF GUI and interactive fitting, see the MDFF tutorial and the YouTube tutorial https://www.youtube.com/watch?v=-KJiH_WF65s. As soon as the predicted segment fits the density, a second MDFF run with a grid scaling factor of 0.6 can be performed.

Hint: Use the cartoon representation for the protein, with different colors for the fixed and flexible segments. Represent the density as a solid surface with white color and transparent material. Use the CPK representation for the backbone atoms of the flexible segment and apply forces only to these backbone atoms.
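The grid scaling factor enters the MDFF fitting energy linearly. In the MDFF method, the map potential at a point with density Φ is V = ξ(1 − (Φ − Φ_thr)/(Φ_max − Φ_thr)) for Φ ≥ Φ_thr and V = ξ otherwise, and each atom's contribution is weighted by its mass. The Python sketch below treats the grid scaling factor as ξ (an assumption about how the GUI value maps onto the formula) and shows that raising it from 0.3 to 0.6 doubles the fitting energy and hence the forces. Applying the same transformation twice with Φ_thr = 0 and ξ = 1 returns Φ/Φ_max with negative voxels set to zero; if mdff griddx uses this form (again an assumption), that would explain the two griddx calls in step 2. The densities and masses are illustrative values only.

```python
def mdff_potential(phi: float, phi_max: float, phi_thr: float = 0.0, xi: float = 1.0) -> float:
    """MDFF map potential: xi where phi < phi_thr, falling linearly to 0 at phi_max."""
    if phi < phi_thr:
        return xi
    return xi * (1.0 - (phi - phi_thr) / (phi_max - phi_thr))


def fitting_energy(densities: list[float], masses: list[float], phi_max: float, gscale: float) -> float:
    """Mass-weighted sum of map potentials at the atom positions, with xi = gscale."""
    return sum(m * mdff_potential(phi, phi_max, xi=gscale) for phi, m in zip(densities, masses))


# Density values at two illustrative atom positions and their masses (C, N).
densities, masses = [0.2, 0.8], [12.011, 14.007]
e_low = fitting_energy(densities, masses, phi_max=1.0, gscale=0.3)
e_high = fitting_energy(densities, masses, phi_max=1.0, gscale=0.6)
print(f"gscale 0.3: {e_low:.3f}  gscale 0.6: {e_high:.3f}  ratio: {e_high / e_low:.1f}")
```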