SHELXD / XM beta test instructions

The new structure solution program SHELXD (called XM in the Bruker SHELXTL system) is able to solve larger ab initio problems than SHELXS-97, and is also useful for locating the heavy atoms or anomalous scatterers from SIR, SAS, SIRAS or MAD data. The input consists of two files, name.ins and name.hkl, and, apart from the different commands in the .ins file, is very similar to the input for SHELXS. Since SHELXD is still being developed, it is only available as a precompiled beta-test release with a built-in expiry date; the final version will be made available as open source without an expiry date, as for the other SHELX programs. XM is available as part of SHELXTL without an expiry date or as a demo/test version with one. There are currently precompiled versions of SHELXD and XM for Linux (Intel), IRIX (5.3 and 6.5) and Windows 95/98/NT/2000.

SHELXD expects ONE and only one source of starting atoms. This can take the form:

A: Input atoms in normal SHELX format for expansion using PLOP

B: PATS to generate 'slightly better than random' atoms consistent with the Patterson

C: GROP and a PDB-format model fragment

D: Random atoms (used if none of the above apply)

In each case the action is specified in the .ins file that also contains crystal data in the usual SHELX form. The reflection data consists of an .hkl file containing F2 (HKLF 4) or F-values (HKLF 3). These may correspond to either native data for ab initio structure solution or structure expansion, or MAD, SAD, SIR or SIRAS FA or delta-F values for heavy or anomalous atom location.

Dual-space recycling using the largest E-values (FIND) is followed by peaklist optimization (PLOP); one or both of these commands must be present. In the case of structure expansion only PLOP can be used, and the program then stops. When the starting atoms are generated randomly or by PATS or GROP, the calculations are repeated for a new set of starting atoms. The total number of such tries may be specified with NTRY; otherwise the program runs indefinitely. While the job is running, however, the calculation may be terminated at the end of the current try by creating a name.fin file in the current working directory.
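
The stop request can also be issued from a script. The following is a minimal Python sketch and not part of SHELXD; the helper name request_stop and its arguments are hypothetical, and it assumes that an empty name.fin file is sufficient, since only the creation of the file is mentioned here.

```python
from pathlib import Path


def request_stop(name, workdir="."):
    """Create <name>.fin in workdir so the job ends after the current try.

    Assumption: the file content is irrelevant and an empty file is enough.
    """
    fin = Path(workdir) / f"{name}.fin"
    fin.touch()
    return fin
```

For a job started as 'shelxd name' in the current directory, request_stop("name") would be called from a second window.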

In the following examples, TITL...UNIT in the normal SHELX format is assumed at the start of the .ins file and HKLF 4 (or 3) followed by END at the end of the file. The cell contents defined by SFAC and UNIT are only used by PLOP; in the FIND stage the atoms are assumed to be of the same type but with occupancies proportional to the square root of the peak height.

a. To solve an approximately equal-atom structure using native data to atomic resolution (1.2 Å or better) the middle of the .ins file (between UNIT and HKLF) might be as follows (for 500 unique non-hydrogen atoms):

FIND 400
PLOP 500 600

b. To solve the same structure by first locating a disulfide bond (PATS with a super-sharp Patterson) then expanding to the complete structure (FIND/PLOP):

PATS -2.06
PSMF -4
FIND 400
PLOP 500 600

Instead of PATS -2.06 simply PATS (which uses a different algorithm) would also be a good way to try to locate the sulphur atoms in this example.

c. To locate 30 selenium atoms from MAD data:

PATS
FIND 30
MIND -3.5

[the .hkl file could contain h, k, l, FA and sigma(FA) in FORMAT(3I4,2F8.2)].
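
If the FA values come from another program, a short script can write them in the fixed-width layout quoted above. This is a sketch only: the function names are hypothetical, and nothing is assumed about the .hkl file beyond the FORMAT(3I4,2F8.2) record layout.

```python
def format_hkl_line(h, k, l, fa, sigma):
    """Format one reflection as FORTRAN FORMAT(3I4,2F8.2).

    Note: values too large for their field make the line longer than
    28 characters instead of being flagged, so check the scale first.
    """
    return f"{h:4d}{k:4d}{l:4d}{fa:8.2f}{sigma:8.2f}"


def write_hkl(path, reflections):
    """Write an iterable of (h, k, l, FA, sigma(FA)) tuples, one per line."""
    with open(path, "w") as fh:
        for h, k, l, fa, sigma in reflections:
            fh.write(format_hkl_line(h, k, l, fa, sigma) + "\n")
```

For example, format_hkl_line(1, -2, 3, 12.5, 0.75) gives a 28-character record with the indices right-justified in columns 1-12.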

d. To solve a cyclodextrin structure with four beta-cyclodextrins in the asymmetric unit and with data barely to atomic resolution, the following could be tried:

GROP -1.8
FIND 240
PLOP 320 400
GEOM 4
ATOM 1 C41 MOL 1 -3.859 4.863 7.904 1.000 10.00
ATOM 2 C31 MOL 1 -5.081 4.209 8.524 1.000 10.00
ATOM 3 C21 MOL 1 -5.211 2.740 8.155 1.000 10.00 


   ... diglucose fragment in PDB format ...


ATOM 21 C52 MOL 1 -0.292 4.714 7.025 1.000 10.00
ATOM 22 O52 MOL 1 -0.642 5.837 6.253 1.000 10.00
 

SHELXD is started with the command line:

shelxd name

(similarly xm name) and expects to find both input files name.ins and name.hkl in the current directory. It writes a summary to the current window (standard output) and creates the files name.lst (more extensive listing file) and name.res (SHELX format atoms, crystal coordinates).
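
When SHELXD is driven from a script, it can be convenient to confirm that both input files are present before starting the job. The sketch below is one possible wrapper; the function names are hypothetical, and it assumes that the shelxd executable is on the search path.

```python
import subprocess
from pathlib import Path


def missing_inputs(name, workdir="."):
    """Return the names of the input files that are absent from workdir."""
    needed = [Path(workdir) / f"{name}{ext}" for ext in (".ins", ".hkl")]
    return [p.name for p in needed if not p.is_file()]


def run_shelxd(name, workdir=".", program="shelxd"):
    """Start 'shelxd name' in workdir after checking name.ins and name.hkl."""
    missing = missing_inputs(name, workdir)
    if missing:
        raise FileNotFoundError("missing input: " + ", ".join(missing))
    # Assumption: 'program' (shelxd or xm) can be found on the search path.
    return subprocess.run([program, name], cwd=workdir, check=True)
```

After the run, name.lst and name.res should appear in the same directory.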
 
 

The following instructions may be included in the .ins file. Default values are given in square brackets; the # sign indicates that the default depends on other instructions:

TITL, CELL, ZERR, LATT, SYMM, SFAC and UNIT as usual (see the SHELX manual).
 
 

TRIC (or TRIK)

Flags expansion to non-centrosymmetric triclinic for all calculations.
 
 

SHEL dmax [infinity], dmin [0]

Resolution limits in Å for all calculations.
 
 

NTRY ntry [0]

Number of global tries if starting from random atoms, PATS or GROP. If ntry is zero or absent, the program runs until it is interrupted by writing a name.fin file in the current working directory.
 
 

PATS +np or -dis [100], npt [#], nf [5]

Calculates and stores the Patterson. Using the top np peaks or a randomly oriented vector of length |dis|, the program tries npt random translations, selecting the one with the best Patterson minimum function PMF (see PSMF). When selecting a vector from the list of unique Patterson peaks, special vectors are ignored and the highest vector is chosen from nf random selections. This favors the highest peaks but (if nf is not too large) also gives lower peaks a chance. For example, with the default np = 100 and nf = 5, the chance is 39.5% that one of the first 10 vectors will be chosen and 91.9% that one of the first 50 will be chosen.

If the first parameter is negative, nf randomly oriented vectors of length |dis| are compared on the basis of their heights in the Patterson and the 'best' is used for the translation search.

If PATS is used together with a second FIND parameter ncy greater than zero (or FIND followed by only one number) a full-symmetry Patterson superposition minimum function (i.e. a superposition based on the two peaks and all their symmetry equivalents) is used to locate the atoms in the first FIND cycle. PATS and GROP are mutually exclusive.
 
 

GROP +ZZ or -Egr [0], +/-ngt [99], nor [99], ntr [9999]

6D Patterson search for a small rigid group. If the first parameter is positive, the search is performed using the Patterson minimum function PMF (see PSMF), using interatomic vectors for which the product of the two atomic numbers is greater than ZZ. For each of |ngt| attempts, nor random orientations are generated. The orientation with the best PMF (based on intramolecular vectors only) for each attempt is subjected to ntr translations. The solution with the best PMF in the translational search (using both intra- and intermolecular vectors) in all the |ngt| attempts is used to generate the starting atoms for the next stage (usually FIND). If the first parameter is negative, an analogous procedure is employed but the function maximized is the sum of Ec2(Eo2-1) for reflections with E > |Egr| and resolution d > dlim (see ESEL).

If the second parameter ngt is negative, the above procedure is used for the rotation and translation search, but then a correlation coefficient (CC20) between Eo2 and Ec2 is calculated for each 'best' rotation/translation combination using 20% of all reflections up to the limiting resolution of dlim (20% rather than 100% is used to speed up the calculation). Thus one CC20 value is calculated for each of the |ngt| attempts. The solution with the highest CC20 provides the starting atoms for the next stage. This is slower but almost always better than the other criteria.

The search model is read from PDB-format ATOM or HETATM records in the .ins file. All other PDB records should be removed. The atomic number is deduced from the atom name applying PDB rules. The PMF search is recommended for searching for a heavy-atom cluster (e.g. from SAS or MAD data) whereas the (slower) structure-factor based search is suitable for equal-atom fragments such as a short piece of alpha-helix (for solving small proteins) or a diglucose fragment (for solving cyclodextrins).

In practice, a six-dimensional search using GROP is too time-consuming, but it works well in combination with TRIK because then only a three-dimensional search is required (e.g. GROP -1.8 1 99999 1).
 
 

PSMF pres [4.0], psfac [0.34]

pres is the resolution of the Patterson in terms of the minimum ratio of the number of grid points along an axis to the maximum reflection index along that axis. If pres is negative a 'super-sharp' Patterson with coefficients √(E3F) (the square root of E cubed times F) is calculated; otherwise a normal F2 Patterson is used. psfac is the fraction of the lowest values in the sorted list of Patterson heights that is summed to get the PMF.
 
 

FRES res [3.0]

Resolution of all Fourier syntheses (including the PSMF but excluding the Patterson itself) in terms of the minimum ratio of the number of grid points along an axis and the maximum reflection index used along that axis.
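
To make the 'minimum ratio' definition concrete, the sketch below computes the smallest number of grid points along one axis that satisfies it. The function name is hypothetical, and the grid actually chosen by the program may be larger (for example, rounded up to a size convenient for the FFT); that detail is not specified here.

```python
import math


def min_grid_points(max_index, ratio=3.0):
    """Smallest grid size n along an axis with n / max_index >= ratio.

    max_index is the largest reflection index used along that axis and
    ratio is the FRES value (or the absolute value of PSMF pres).
    """
    return math.ceil(abs(ratio) * max_index)
```

With the default FRES of 3.0 and a maximum index of 20 along an axis, at least 60 grid points are needed along that axis; the default PSMF value of 4.0 would require at least 80.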
 
 

ESEL Emin [#], dlim [1.0]

Minimum E and high-resolution limit for FIND and TANG. The E2 values are normalized to 1 in resolution shells, then smoothed. Emin defaults to 1.2 for ab initio structure solution and to 1.5 for heavy atom location (the absolute value of the first MIND parameter is used to distinguish between these two cases depending on whether it is less than 1.6 or not).
 
 

FIND na [0], ncy [#]

Search for na atoms in ncy internal loop cycles (tangent formula + E-Fourier). ncy defaults to 20 (for heavy-atom location) or the maximum of 20 or na (for ab initio direct methods, distinguished using MIND mdis). Set occupancy proportional to square root of peak height for na / ( 1 - fr ) atoms, where fr is the WEED parameter, before 'WEEDing'. If FIND is absent, PLOP expands from input atoms.
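
The peak bookkeeping in FIND can be illustrated with a little arithmetic. The following sketch is not taken from the program: the function names are hypothetical, the occupancies are scaled so that the highest peak has occupancy 1 (an assumption; only the proportionality is stated above), and no rounding of the peak count is attempted.

```python
import math


def peaks_before_weed(na, fr=0.3):
    """Number of peaks given occupancies before WEED: na / (1 - fr)."""
    if not 0.0 <= fr < 1.0:
        raise ValueError("fr must lie in the range 0 <= fr < 1")
    return na / (1.0 - fr)


def relative_occupancies(heights):
    """Occupancies proportional to sqrt(peak height), highest peak = 1.

    Assumption: all heights are positive.
    """
    top = max(heights)
    return [math.sqrt(h / top) for h in heights]
```

For FIND 400 with the default WEED 0.3, about 571 peaks receive occupancies, so that roughly 400 remain after 30% have been omitted at random.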
 
 

TANG ftan [0.9], fex [0.4]

Fraction ftan of the ncy FIND cycles are performed using the tangent formula, the rest using a Sim-weighted E-map. fex is the fraction of reflections with the largest Ecalc values to hold fixed when doing tangent expansion to find the remaining phases.
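
As a rough guide to what ftan means in practice, the sketch below splits the ncy cycles into tangent-formula and E-map cycles. The function name is hypothetical, and rounding to the nearest integer is an assumption; the order in which the two kinds of cycle are performed is not addressed.

```python
def split_find_cycles(ncy, ftan=0.9):
    """Return (tangent-formula cycles, Sim-weighted E-map cycles)."""
    n_tangent = round(ftan * ncy)  # assumption: nearest-integer rounding
    return n_tangent, ncy - n_tangent
```

With ncy = 20 and the default ftan, this gives 18 tangent-formula cycles and 2 E-map cycles.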
 
 

NTPR ntpr [100]

Maximum number of (largest) triple phase relations (TPR) per reflection; negative for output of mean phase errors (if phases were input).
 
 

MIND mdis [1.0], mdeq [2.2]

|mdis| is the shortest distance allowed between atoms for PATS and FIND. If mdis is negative PATFOM is calculated, and the crossword table for the best PATFOM value so far is output to the .lst file. In this case the solution is passed on to the PLOP stage if either the CC is the best so far or the PATFOM is the best so far. mdeq is the minimum distance between symmetry equivalents for FIND (for PATS the |mdis| distance is used). Thus the default setting of mdeq prevents FIND from placing atoms on special positions. This is usually desirable because it helps to avoid pseudo-solutions such as the 'uranium atom solution' that are incorrect but fit the tangent formula, but it might be better to change this setting to -0.1 to allow special positions when looking for e.g. metal ions. For PLOP the PREJ instruction can be used to control whether peaks on special positions are selected. Note also that a |mdis| threshold of 1.6 Å is used to decide between all-atom ab initio and heavy atom location for the purpose of setting various defaults for other parameters.
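
The defaults that depend on |mdis| are scattered over the ESEL, FIND and TEST entries, so it may help to see them collected in one place. The sketch below only restates the values given in those entries; the function name and the dictionary keys are hypothetical, and it is an assumption that a value of exactly 1.6 counts as heavy-atom location.

```python
def derived_defaults(mdis=1.0, na=0):
    """Collect the defaults that depend on the first MIND parameter.

    |mdis| < 1.6 is treated as ab initio solution, otherwise as
    heavy-atom location (assumption for the boundary value itself).
    """
    if abs(mdis) < 1.6:
        return {"mode": "ab initio", "Emin": 1.2, "ncy": max(20, na),
                "CCmin": 45.0, "delCC": 1.0}
    return {"mode": "heavy-atom location", "Emin": 1.5, "ncy": 20,
            "CCmin": 10.0, "delCC": 5.0}
```

For example, MIND -3.5 with FIND 30 selects the heavy-atom defaults, whereas the default MIND with FIND 400 gives ncy = 400.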
 
 

SKIP min2 [0.5]

During FIND, if the second peak height is less than min2 times the first, the first peak is rejected (before applying WEED to reject other peaks). This is sometimes useful to suppress 'uranium atom' solutions. In fact, for large equal-atom structures in space group P1 it is a good idea to specify 'SKIP 0.999' so that the first peak is ALWAYS rejected!
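
The SKIP criterion amounts to a single comparison of the two highest peaks. A minimal sketch, with a hypothetical function name and assuming the peak heights are already sorted in descending order:

```python
def reject_first_peak(heights, min2=0.5):
    """True if the highest peak should be rejected under SKIP min2.

    heights must be sorted in descending order.
    """
    if len(heights) < 2:
        return False  # nothing to compare against
    return heights[1] < min2 * heights[0]
```

With the default min2 of 0.5, peak heights of 100 and 40 lead to rejection of the first peak, whereas heights of 100 and 60 do not. With min2 = 0.999 the first peak is rejected unless the second is within 0.1% of it, which is why this setting removes the top peak in practically every case.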
 
 

WEED fr [0.3]

Randomly OMIT fraction fr of atoms in FIND stage (except in the last cycle). Does not apply to PLOP.
 
 

GEOM ngm [0], ndwt [1.0], nha [0], d13 [2.45], dd [0.3]

After the peak search in the FIND and PLOP routines, ngm cycles (typically 2 to 5) of geometry optimization are performed so that distances within dd of d13 are brought closer to d13. In addition, all peak heights after the highest nha (heavy atoms) are multiplied by ndwt (typically 0.7; 1.0 for no action) if the peaks have no other atoms or peaks within the distance range (d13-dd) to (d13+dd). This instruction is an attempt to build in a little chemical information, and it is hoped that it will enable the resolution requirement to be relaxed a little.

Initial tests of GEOM indicate that in practice it does not decrease the time required per solution and it will probably be removed in future versions.
 

TEST CCmin [#], delCC [#]

Go on to PLOP if CC > CCmin. CCmin is reduced by 0.1% each cycle until a solution passes this test. In subsequent attempts, go on to PLOP if CC is within delCC of best CC value so far. The defaults are 45 and 1 resp. for ab initio solutions, and 10 and 5 resp. for heavy atom location (MIND mdis test).
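
The way the TEST thresholds interact over a series of tries can be followed with a short simulation. This is only one reading of the description, not the program's code: the function name is hypothetical, the '0.1%' reduction is taken to mean 0.1 in CC units per failed try, and 'best CC value so far' is taken to mean the best value among the tries examined since the first one passed.

```python
def tries_passed_to_plop(cc_values, cc_min=45.0, del_cc=1.0, step=0.1):
    """Return the indices of the tries that would go on to PLOP."""
    passed = []
    best = None          # best CC so far, once a try has passed
    threshold = cc_min   # lowered by 'step' after every failed try
    for i, cc in enumerate(cc_values):
        if best is None:
            if cc > threshold:
                passed.append(i)
                best = cc
            else:
                threshold -= step
        else:
            if cc >= best - del_cc:
                passed.append(i)
            best = max(best, cc)
    return passed
```

For CC values of 30, 44.95, 50, 49.5 and 48 with the ab initio defaults, the first try fails and lowers the threshold to 44.9, so the second try passes; the third and fourth are within delCC of the best value and also pass, while the last is not.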
 
 

KEEP nh [0]

Number of (heavy) atoms to retain during PLOP expansion.
 
 

PLOP followed by up to 10 numbers

PLOP specifies the number of peaks to start with in each cycle of the 'peaklist optimization' algorithm of Sheldrick & Gould (1995). Peaks are then eliminated one at a time until either the correlation coefficient cannot be increased any more or 50% of the peaks have been eliminated.
 
 

PREJ maxb [3], dsp [-0.01], mf [1]

maxb is the maximum number of bonds to atoms or higher peaks; a peak is deleted if it has more. Peaks are also deleted if they are less than dsp from their symmetry equivalents (PLOP only; FIND uses the second MIND parameter). Atoms are not output to the final .res file if there are fewer than mf atoms in the 'molecule'.
 
 

SEED nrand [0]

Set random number seed so that exactly the same results are generated if the job is repeated; each integer nrand defines a different sequence of random numbers. If nrand is omitted or zero, the seed is randomized so a different sequence is always generated.
 
 

MOVE dx [0], dy [0], dz [0], sign [1]

Shift following coordinates (not ATOM/HETATM).
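
The effect of MOVE on a set of fractional coordinates can be sketched as follows. This assumes that SHELXD applies the same convention as the MOVE instruction in SHELXL, where each coordinate is replaced by the shift plus sign times the old value; the function name is hypothetical.

```python
def move(xyz, dx=0.0, dy=0.0, dz=0.0, sign=1):
    """Apply MOVE to fractional coordinates: x' = dx + sign * x, etc.

    Assumption: same convention as the SHELXL MOVE instruction.
    """
    x, y, z = xyz
    return (dx + sign * x, dy + sign * y, dz + sign * z)
```

For example, MOVE 1 1 1 -1 inverts the following atoms through the point (0.5, 0.5, 0.5), so that (0.25, 0.5, 0.75) becomes (0.75, 0.5, 0.25).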
 
 

ATOM and HETATM

PDB format atoms for GROP
 
 

HKLF m

m = 4 for F2 in .hkl file, m = 3 for F (or FA or delF)
 
 

END