
Setting Things Up With arp_warp_setup.sh

This is the most important script on which all the others rely. It consists of a series of relatively simple questions about your protein and about how you would like to run the refinement. Based on your answers and package defaults a parameter file called warp.par is set up. This file is then fed into all subsequent applications.
To give an impression, a typical arp_warp_setup.sh session follows.
For this example we have a set of amplitudes and phases from a MAD experiment, extending to 2.0 Å resolution, and a native dataset extending to 1.7 Å.
% arp_warp_setup.sh
===========================================================================
| This is the setup procedure for ARP/wARP version 5.1 |
===========================================================================
This setup must be run for all modes of operation.

Enter the name of the mtz file:test.mtz
--------------------
Here are the data included in that file.

OVERALL FILE STATISTICS for resolution range 0.000 - 0.346
=======================

Col Sort     Min     Max     Num        %    Mean    Mean   Resolution Type Column
num order                Missing complete            abs.    Low  High      label

  1 ASC        0      36       0   100.00    13.6    13.6  45.01  1.70 H    H
  2 NONE       0      38       0   100.00    14.1    14.1  45.01  1.70 H    K
  3 NONE       0      43       0   100.00    16.2    16.2  45.01  1.70 H    L
  4 NONE     6.3  1068.6    9890    41.71  162.59  162.59  45.01  1.70 F    FSE1
  5 NONE     0.8    24.3    9890    41.71    4.09    4.09  45.01  1.70 Q    SIGFSE1
  6 NONE     0.0   360.0    9796    42.26  171.93  171.93  45.01  1.70 P    PHIB_123p
  7 NONE   0.000   1.000    9796    42.26   0.560   0.560  45.01  1.70 W    FOM_123p
  8 NONE     0.0   360.0    9674    42.98  169.49  169.49  45.01  1.70 P    PHIDM_123p
  9 NONE   0.000   1.000    9674    42.98   0.736   0.736  45.01  1.70 W    FOMDM_123p
 10 NONE     1.3   930.8    1567    90.76  109.03  109.03  45.01  1.70 F    F17
 11 NONE     0.6    36.2    1567    90.76    3.15    3.15  45.01  1.70 Q    SIGF17
 12 NONE     9.9  1048.5    5126    69.79  152.04  152.04  45.01  1.70 F    FP
 13 NONE     0.9    46.8    5126    69.79    3.85    3.85  45.01  1.70 Q    SIGFP
 14 NONE     0.0    19.0       0   100.00    9.51    9.51  45.01  1.70 I    FreeR_flag


No. of reflections used in FILE STATISTICS    16967
-----------------------

A nice report of all the file's contents has appeared.

You need to specify some labels from above

Native data amplitude: F17
Native data sigma amplitude: SIGF17
Double-click with the left mouse button on the relevant label, paste it by pressing the middle mouse button and then press 'Enter'. This helps to avoid typos. As native we choose the 1.7 Å data, i.e. the columns labelled F17 and SIGF17.
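The choice of native columns can be cross-checked against the file statistics printed by the setup script. The following is only an illustrative sketch, not part of ARP/wARP: it hard-codes the three amplitude (type F) rows from the report and ranks them by completeness. The names `AMPLITUDES`, `most_complete` and `sigma_label` are hypothetical, and the "SIG" prefix rule is an assumption that merely matches the labels in this example file.

```python
# Amplitude (type F) columns copied by hand from the file statistics:
# label -> (number of missing reflections, percent complete)
AMPLITUDES = {
    "FSE1": (9890, 41.71),
    "F17": (1567, 90.76),
    "FP": (5126, 69.79),
}


def most_complete(columns):
    """Return the label of the column with the highest completeness."""
    return max(columns, key=lambda label: columns[label][1])


def sigma_label(amplitude_label):
    """Assumed naming rule for this file only: sigma column = 'SIG' + label."""
    return "SIG" + amplitude_label


native = most_complete(AMPLITUDES)
print(native, sigma_label(native))
```

For this file the most complete amplitude column is also the 1.7 Å native set chosen in the session.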

Now enter the size of the protein in RESIDUES / AU: 145

Protein size estimated at about 1117 atoms
Average B factor from Wilson Plot estimated to be 31

Do you plan to use experimental phases as input (i.e. for mode warpNtrace or warp) (Y/N) ? Y
Amplitude (weighted) for initial map calculation: FSE1
Phase for initial map calculation: PHIDM_123p
FOM. Press <Enter> if amplitude is already weighted : FOMDM_123p

First you enter the number of residues in the asymmetric unit. Since we want to start from experimental phases, the answer to the next question was Y. Answering N means that you intend to start from a molecular replacement solution, build the solvent of a refined structure, or try the ab initio option.

How many total cycles of arp/warp you plan to run? 100

How many refinement cycles between rebuilding (for warpNtrace only) ? 10
How many molecules per asymmetric unit (for warpNtrace only)? 1

For warpNtrace and molrep modes (restrained modes)
a proper weight must be set for Xray/geometry contributions.
Matrix suggested Default: 0.7
-Decrease to tighten geometry
-Increase to increase X-ray terms contribution
Enter an appropriate number (Enter for default)

The first parameter (total cycles) applies to all of the applications and is the total number of cycles to be run. The following two questions refer exclusively to the newest ARP/wARP mode, warpNtrace. The first one defines the number of ARP cycles that a 'big' warpNtrace cycle consists of, and the second one gives the number of monomers in the asymmetric unit, so that the programs can try to assemble a 'real' molecule as well as possible. The last parameter is the weight between the X-ray terms and geometry. The default will do in most cases.
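As a rough illustration of how the first two answers relate, the sketch below counts how many 'big' warpNtrace cycles fit into a run. It assumes that the total cycle count is simply divided into blocks of refinement cycles between rebuilding steps; how the scripts treat any leftover cycles is not stated here, and the function name `big_cycles` is hypothetical.

```python
def big_cycles(total_cycles, cycles_between_rebuilding):
    """Number of complete 'big' warpNtrace cycles under the stated assumption."""
    if cycles_between_rebuilding <= 0:
        raise ValueError("cycles_between_rebuilding must be positive")
    # Integer division: only complete blocks are counted (an assumption).
    return total_cycles // cycles_between_rebuilding


# Values from the example session: 100 total cycles, 10 between rebuilding.
print(big_cycles(100, 10))
```

With the answers given in the example session this yields 10 big cycles.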

Do you plan to use free atom model density modification (mode warp) (Y/N) ? Y

wARP usually makes 3 big iterations before averaging.
Typical values for each cycle is 10-30, read the manual for details.
Note that for good starting maps you can skip the 2nd and 3rd round.
(just specify 0 cycles for this round)
How many refinement cycles for 1st wARP iteration ? 15
How many refinement cycles for 2nd wARP iteration ? 15
How many refinement cycles for 3rd wARP iteration ? 15

The initial question was whether you would like to use free atom model density modification at all. If you had answered 'N' (for example, if you had good phases and just wanted to run warpNtrace), the setup would skip to the choice of refinement protocol, but in this example we assumed 'Y'.
We now have to decide how many cycles we need per iteration. In one iteration each model is refined with unrestrained ARP. After it has finished, it rejects lots of bad atoms, limits B factors and randomises coordinates a bit, to escape from local minima. The higher the resolution, the fewer cycles you need. In the last iteration it is recommended to use a few more cycles to let the models converge better. In most cases you can use just a few cycles for the 1st iteration (around 15) and skip the second and the third one.

Do you want multiple free atom models averaging (Y/N) ? Y

How many models do you plan to use for averaging ? 6

You will be now asked how many processors you can use at the SAME time
for running arp/warp jobs. Remember that these machines should share
a common home directory.
If you are not sure of what you are doing please consult the local System manager.

How many processors can you use simultaneously ? 3
Processor 1 is in a machine named: edmund
edmund is OK.
Processor 2 is in a machine named: baldrick
baldrick is OK.
Processor 3 is in a machine named: percy
percy is OK.

Multiple models setup finished.

The use of multiple models is extensively described in [3]. Given the power of maximum likelihood refinement, we recommend using this (time-consuming) option if your data are poorer than 1.8 - 2.0 Å, which, to be honest, is quite likely. Single unrestrained ARP jobs are perfectly reasonable provided your data extend beyond 1.8 Å resolution. Anyhow, here the answer was Y. If you answer N, the setup assumes that you will be using a single model and goes on to the choice of refinement protocol.
Since the answer was Y, you are asked for some details: how many models you want to use, the machine names, how many cycles of wARP you would like to run, and so on. In this case we have decided to average 6 models. Averaging 2 models is not very helpful. Averaging 3, 4 or 5 models is possible but not recommended; 6 is a much better number. Then you are asked how many processors you can use. Suppose you have a 4-processor machine. Using all 4 is not very wise and may be impolite to others: since we request 6 refinement runs, the script would first run 4 of them and then the remaining 2, taking 2 'job cycles' in total. If we choose to use 3 processors instead, the script will run 3+3 jobs, again 2 job cycles, while leaving the fourth processor free for something else.
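The job-cycle arithmetic in this paragraph can be sketched as follows. This is only an illustration of the counting, not how the ARP/wARP scripts are implemented; the function name `job_batches` is hypothetical.

```python
def job_batches(n_models, n_processors):
    """Split n_models refinement runs into batches of at most n_processors."""
    if n_models < 0:
        raise ValueError("n_models must not be negative")
    if n_processors <= 0:
        raise ValueError("n_processors must be positive")
    batches = []
    remaining = n_models
    while remaining > 0:
        size = min(remaining, n_processors)  # fill all processors if possible
        batches.append(size)
        remaining -= size
    return batches


# 6 models on 4 processors versus 6 models on 3 processors.
print(job_batches(6, 4))
print(job_batches(6, 3))
```

Both choices need two job cycles for 6 models, which is why using only 3 of the 4 processors costs nothing in elapsed job cycles.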


Now we proceed to the choice of protocol, which is where every path through the setup ends up:

You can choose between the following REFMAC protocols:
F: A fast protocol that works with good data.
S: A considerably slower one which might work better in difficult cases.
R: The slow protocol together with Rfree.
P: Phased maximum likelihood refinement.
O: The good old SFALL ...
H: Optimised parameters for starting from heavy atoms alone.
W: Optimised parameters for solvent building.
A: Advanced mode for setting parameters manually.
What is your choice ? (F/S/O/R/H/W) F

Advanced parameters set to default values for mode F

These questions are basically specific to the use of REFMAC.
The Fast protocol sets up the job so that it does not use the $R_{\rm free}$ factor for monitoring refinement progress. Lots of people like using $R_{\rm free}$ (and in general they do well to do so!) and you are right to get suspicious if this is not done. Just to clarify things: the ARP/wARP authors believe that $R_{\rm free}$ is essential for a restrained model refinement to validate the protocol (unless the protocol has already been proven to be valid under the conditions used). However, if no geometry is present there is certainly no danger of over-weighting or down-weighting X-ray data against geometry terms, which is basically what $R_{\rm free}$ tells you ...
The Slow protocol will run CDIR minimisation applying 0.3 of calculated shifts. It will also run 4 internal REFMAC cycles before the model is updated by ARP.
The Rfree protocol is the slow one plus the use of $R_{\rm free}$: the free set serves not only as a test set but, most importantly, as the basis for calculating $\sigma_A$ weights. Although theoretically more sound, it often fails with very bad starting models, but it is worth a try.
The Phased protocol uses phased maximum likelihood refinement as implemented in REFMAC. It has not been tested much, but we would preferably use it with realistic HL coefficients (e.g. from SHARP).
The Heavy protocol is optimised for starting from very few atoms. It runs lots of REFMAC cycles, fixes solvent scaling parameters, etc. We must say that we do not have much experience with it. The parameters chosen will work with rubredoxin. If you have some high resolution data on a metalloprotein and this protocol does not work, we strongly encourage you to contact us.
The Water protocol is basically the same as the Rfree one, i.e. it DOES use $R_{\rm free}$, since it involves serious model building and you should monitor $R_{\rm free}$ to see whether you are doing anything sensible. It is also assumed that the model is in a good state.
The Advanced protocol is naturally meant for advanced users. These parameters are REFMAC specific (see the CCP4 documentation). If you don't set them, standard default values will be chosen; these should work well but perhaps not optimally. Before setting up advanced parameters on your own, please at least make sure you understand the following points; otherwise don't bother.


The setup script has now finished and if nothing went too wrong there should now be a file named warp.par in your directory. If you want to change some of the parameters without going through the setup once again - just edit the file, but make sure you know what you are doing!
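If you do edit warp.par by hand, keeping a copy of the generated file lets you return to a known state. The following is a minimal sketch of such a backup step; it is not part of ARP/wARP, the function name `backup_parameter_file` and the `.bak` suffix are hypothetical, and it assumes nothing about the contents of warp.par.

```python
import shutil
from pathlib import Path


def backup_parameter_file(path="warp.par", suffix=".bak"):
    """Copy the parameter file next to itself and return the backup path."""
    source = Path(path)
    if not source.is_file():
        # The file only exists after the setup script has finished.
        raise FileNotFoundError(f"{source} not found; run arp_warp_setup.sh first")
    backup = source.with_name(source.name + suffix)
    shutil.copy2(source, backup)  # copy2 also preserves the timestamps
    return backup
```

Calling `backup_parameter_file()` in the working directory before editing would leave a `warp.par.bak` beside the original.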
Richard J. Morris
1999-12-22