The new structure solution program SHELXD (called XM in the Bruker SHELXTL system) is able to solve larger ab initio problems than SHELXS-97, and is also useful for locating the heavy atoms or anomalous scatterers from SIR, SAS, SIRAS or MAD data. The input consists of two files, name.ins and name.hkl, and with the exception of the different commands in the .ins file is very similar to the input for SHELXS. Since SHELXD is still being developed, it is only available as a precompiled beta-test release with a built-in expiry date; the final version will be made available as open source without an expiry date as for the other SHELX programs. XM is available as part of SHELXTL without an expiry date or as a demo/test version with one. There are currently precompiled versions of SHELXD and XM for Linux(Intel), IRIX (5.3 and 6.5) and Windows 95/98/NT/2000.
SHELXD expects ONE and only one source of starting atoms. This can take the form:
A: Input atoms in normal SHELX format for expansion using PLOP
B: PATS to generate 'slightly better than random' atoms consistent with the Patterson
C: GROP and a PDB-format model fragment
D: Random atoms (used if none of the above apply)
In each case the action is specified in the .ins file that also contains crystal data in the usual SHELX form. The reflection data consist of an .hkl file containing F² (HKLF 4) or F values (HKLF 3). These may correspond either to native data for ab initio structure solution or structure expansion, or to MAD, SAD, SIR or SIRAS FA or delta-F values for heavy or anomalous atom location.
Dual-space recycling using the largest E-values (FIND) is followed by peaklist optimization (PLOP); one or both of these commands must be present. In the case of structure expansion only PLOP can be used, and the program then stops. When the starting atoms are generated randomly or by PATS or GROP, the calculations are repeated for a new set of starting atoms. The total number of such tries may be specified with NTRY; otherwise the program runs indefinitely. A running job may, however, be terminated at the end of the current try by creating a name.fin file in the current working directory.
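The name.fin mechanism can be scripted. The following is a minimal sketch, assuming that an empty file is sufficient (the text above only says the file must be created) and using a hypothetical helper name request_stop:

```python
from pathlib import Path


def request_stop(name: str, workdir: str = ".") -> Path:
    """Create name.fin in workdir so a running job ends after the current try.

    Assumption: the file content does not matter, so an empty file is written.
    """
    fin = Path(workdir) / f"{name}.fin"
    fin.touch()  # creates the file if it does not exist yet
    return fin
```

For a job started as 'shelxd lyso' in the current directory, request_stop("lyso") would create lyso.fin there.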
In the following examples, TITL...UNIT in the normal SHELX format is assumed at the start of the .ins file and HKLF 4 (or 3) followed by END at the end of the file. The cell contents defined by SFAC and UNIT are only used by PLOP; in the FIND stage the atoms are assumed to be of the same type but with occupancies proportional to the square root of the peak height.
a. To solve an approximately equal-atom structure using native data to atomic resolution (1.2Å or better) the middle of the .ins file (between UNIT and HKLF) might be as follows (for 500 unique non-hydrogen atoms):
FIND 400
PLOP 500 600
b. To solve the same structure by first locating a disulfide bond (PATS with a super-sharp Patterson) then expanding to the complete structure (FIND/PLOP):
PATS -2.06
PSMF -4
FIND 400
PLOP 500 600
Instead of PATS -2.06 simply PATS (which uses a different algorithm) would also be a good way to try to locate the sulphur atoms in this example.
c. To locate 30 selenium atoms from MAD data:
PATS
FIND 30
MIND -3.5
[the .hkl file could contain h, k, l, FA and sigma(FA) in FORMAT(3I4,2F8.2)].
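As an illustration of the FORMAT(3I4,2F8.2) layout mentioned above, the following sketch writes one reflection record with fixed-width fields. The function name format_fa_record is hypothetical, and the overflow check is an added safeguard rather than something the program is documented to require:

```python
def format_fa_record(h: int, k: int, l: int, fa: float, sigma: float) -> str:
    """Format h, k, l, FA and sigma(FA) as in Fortran FORMAT(3I4,2F8.2)."""
    line = f"{h:4d}{k:4d}{l:4d}{fa:8.2f}{sigma:8.2f}"
    # Three 4-character integers plus two 8-character reals give 28 characters.
    # Python widens a field that overflows, which would shift the columns.
    if len(line) != 28:
        raise ValueError("value too large for the fixed-width .hkl fields")
    return line
```

For example, format_fa_record(1, -2, 3, 12.5, 0.75) gives '   1  -2   3   12.50    0.75'.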
d. To solve a cyclodextrin structure with four beta-cyclodextrins in the asymmetric unit and with data barely to atomic resolution, the following could be tried:
GROP -1.8
FIND 240
PLOP 320 400
GEOM 4
ATOM 1 C41 MOL 1 -3.859 4.863 7.904 1.000 10.00
ATOM 2 C31 MOL 1 -5.081 4.209 8.524 1.000 10.00
ATOM 3 C21 MOL 1 -5.211 2.740 8.155 1.000 10.00
... diglucose fragment in PDB format ...
ATOM 21 C52 MOL 1 -0.292 4.714 7.025 1.000 10.00
ATOM 22 O52 MOL 1 -0.642 5.837 6.253 1.000 10.00
SHELXD is started with the command line:
shelxd name
(similarly xm name) and expects to find both input files
name.ins and name.hkl in the current directory. It writes
a summary to the current window (standard output) and creates the files
name.lst (more extensive listing file) and name.res (SHELX
format atoms, crystal coordinates).
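Starting a job from a script might look like the sketch below. It assumes that the executable is called shelxd (or xm) and is on the PATH; the helper names are hypothetical, and only the file names listed above are taken from the text:

```python
import subprocess
from pathlib import Path


def shelxd_command(name: str, executable: str = "shelxd") -> list:
    """Build the command line; pass executable='xm' for the SHELXTL version."""
    return [executable, name]


def expected_files(name: str) -> dict:
    """Files read and written by the program for a given job name."""
    return {
        "inputs": [f"{name}.ins", f"{name}.hkl"],
        "outputs": [f"{name}.lst", f"{name}.res"],
    }


def run_shelxd(name: str, workdir: str = ".", executable: str = "shelxd") -> int:
    """Check that both input files exist in workdir, then start the program there."""
    for filename in expected_files(name)["inputs"]:
        if not (Path(workdir) / filename).exists():
            raise FileNotFoundError(filename)
    return subprocess.run(shelxd_command(name, executable), cwd=workdir).returncode
```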
The following instructions may be included in the .ins file. Default values are given in square brackets; the # sign indicates that the default depends on other instructions:
TITL, CELL, ZERR, LATT, SYMM, SFAC
and UNIT as usual (see the SHELX manual).
TRIC (or TRIK)
Flags expansion to non-centrosymmetric triclinic for all calculations.
SHEL dmax [infinity], dmin [0]
Resolution limits in Å for all calculations.
NTRY ntry [0]
Number of global tries if starting from random atoms, PATS or GROP.
If ntry is zero or absent, the program runs until it is interrupted
by writing a name.fin file in the current working directory.
PATS +np or -dis [100], npt [#], nf [5]
Calculates and stores Patterson. Using top np peaks or a random orientation vector of length |dis|, tries npt random translations, selecting the one with the best Patterson minimum function PMF (see PSMF). When selecting a vector from the list of unique Patterson peaks, special vectors are ignored and the highest vector is chosen from nf random selections. This favors the highest peaks but (if nf is not too large) also allows lower peaks a chance. For example, with the default np = 100 and nf = 5, the chance is 39.5% that one of the first 10 vectors will be chosen and 91.9% that one of the first 50 will be chosen.
If the first parameter is negative, nf randomly oriented vectors of length |dis| are compared on the basis of their heights in the Patterson and the 'best' is used for the translation search.
If PATS is used together with a second FIND parameter ncy greater
than zero (or FIND followed by only one number) a full-symmetry Patterson
superposition minimum function (i.e. a superposition based on the two peaks
and all their symmetry equivalents) is used to locate the atoms in the
first FIND cycle. PATS and GROP are mutually exclusive.
GROP +ZZ or -Egr [0], +/-ngt [99], nor [99], ntr [9999]
6D Patterson search for small rigid group. If the first parameter is positive, the search is performed using the Patterson minimum function PMF (see PSMF), using interatomic vectors for which the product of the two atomic numbers is greater than ZZ. For each of |ngt| attempts, nor random orientations are generated. The orientation with the best PMF (based on intramolecular vectors only) for each attempt is subject to ntr translations. The solution with the best PMF in the translational search (using both intra- and intermolecular vectors) in all the |ngt| attempts is used to generate the starting atoms for the next stage (usually FIND). If the first parameter is negative, an analogous procedure is employed but the function maximized is the sum of Ec²(Eo²-1) for reflections with E > |Egr| and resolution d > dlim (see ESEL).
If the second parameter ngt is negative, the above procedure is used for the rotation and translation search, but then a correlation coefficient (CC20) between Eo² and Ec² is calculated for each 'best' rotation/translation combination using 20% of all reflections up to the limiting resolution of dlim (20% rather than 100% is used to speed up the calculation). Thus one CC20 value is calculated for each of the |ngt| attempts. The solution with the highest CC20 provides the starting atoms for the next stage. This is slower but almost always better than the other criteria.
The search model is read from PDB-format ATOM or HETATM records in the .ins file. All other PDB records should be removed. The atomic number is deduced from the atom name applying PDB rules. The PMF search is recommended for searching for a heavy-atom cluster (e.g. from SAS or MAD data) whereas the (slower) structure-factor based search is suitable for equal-atom fragments such as a short piece of alpha-helix (for solving small proteins) or a diglucose fragment (for solving cyclodextrins).
In practice, a six-dimensional search using GROP is too time-consuming,
but it works well in combination with TRIK because then only a three-dimensional
search is required (e.g. GROP -1.8 1 99999 1).
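The structure-factor based criterion used when the first GROP parameter is negative can be written out as a short sketch. It assumes that the E in the selection test is the observed value Eo and that each reflection is supplied as a tuple (Eo, Ec, d); the function name grop_score is hypothetical:

```python
def grop_score(reflections, egr: float, dlim: float = 1.0) -> float:
    """Sum Ec^2 * (Eo^2 - 1) over reflections with Eo > |Egr| and d > dlim.

    reflections: iterable of (e_obs, e_calc, d) tuples, d in Angstroms.
    Assumption: the selection test is applied to the observed E-value.
    """
    threshold = abs(egr)
    return sum(
        ec ** 2 * (eo ** 2 - 1.0)
        for eo, ec, d in reflections
        if eo > threshold and d > dlim
    )
```

With GROP -1.8 and the default dlim of 1.0, a reflection with Eo = 1.0 is excluded by the E test and one with d = 0.9 by the resolution test.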
PSMF pres [4.0], psfac [0.34]
pres is the resolution of the Patterson in terms of the minimum ratio
of the number of grid points along an axis to the maximum reflection index
along that axis. If pres is negative a 'super-sharp' Patterson with
coefficients √(E³F)
is calculated; otherwise a normal F² Patterson is used.
psfac is the fraction of the lowest values in the sorted list of
Patterson heights that is summed to get the PMF.
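The role of psfac can be illustrated with a small sketch. It assumes that the input is the list of Patterson heights found at the vectors being tested and that the number of values summed is rounded down (with at least one value kept); neither detail is stated above, and the function name is hypothetical:

```python
def patterson_minimum_function(heights, psfac: float = 0.34) -> float:
    """Sum the lowest fraction psfac of the sorted Patterson heights.

    Assumption: the count is truncated to an integer, with a minimum of one.
    """
    if not heights:
        raise ValueError("no Patterson heights supplied")
    ordered = sorted(heights)  # lowest values first
    n_low = max(1, int(psfac * len(ordered)))
    return sum(ordered[:n_low])
```

Because only the lowest values contribute, a trial position scores well only if even its weakest vectors fall on significant Patterson density.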
FRES res [3.0]
Resolution of all Fourier syntheses (including the PSMF but excluding
the Patterson itself) in terms of the minimum ratio of the number of grid
points along an axis and the maximum reflection index used along that axis.
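The grid criterion used by FRES (and by PSMF for the Patterson) amounts to a lower bound on the number of grid points along each axis. A minimal sketch, assuming that only the magnitude of the parameter matters for the grid and ignoring any rounding up to sizes convenient for the Fourier transform:

```python
import math


def min_grid_points(max_index: int, res: float = 3.0) -> int:
    """Smallest grid count along an axis satisfying points / max_index >= |res|.

    Assumption: a negative value (as in PSMF -4) affects only the type of
    Patterson, so its absolute value is used here.
    """
    return math.ceil(abs(res) * max_index)
```

For a maximum index of 20 along an axis and the default FRES value of 3.0, at least 60 grid points would be needed along that axis.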
ESEL Emin [#], dlim [1.0]
Minimum E and high-resolution limit for FIND and TANG. The E²
values are normalized to 1 in resolution shells, then smoothed. Emin
defaults to 1.2 for ab initio structure solution and to 1.5 for
heavy atom location (the absolute value of the first MIND parameter is
used to distinguish between these two cases depending on whether it is
less than 1.6 or not).
FIND na [0], ncy [#]
Search for na atoms in ncy internal loop cycles (tangent
formula + E-Fourier). ncy defaults to 20 (for heavy-atom
location) or the maximum of 20 or na (for ab initio direct
methods, distinguished using MIND mdis). Set occupancy proportional
to square root of peak height for na / ( 1 - fr ) atoms,
where fr is the WEED parameter, before 'WEEDing'. If FIND is absent,
PLOP expands from input atoms.
TANG ftan [0.9], fex [0.4]
Fraction ftan of the ncy FIND cycles are performed using
the tangent formula, the rest using a Sim-weighted E-map. fex
is the fraction of reflections with the largest Ecalc
values to hold fixed when doing tangent expansion to find the remaining
phases.
NTPR ntpr [100]
Maximum number of (largest) TPR per reflection; negative for output
of mean phase errors (if phases were input).
MIND mdis [1.0], mdeq [2.2]
|mdis| is the shortest distance allowed between atoms for PATS
and FIND. If mdis is negative PATFOM is calculated, and the crossword
table for the best PATFOM value so far is output to the .lst
file. In this case the solution is passed on to the PLOP stage if either
the CC is the best so far or the PATFOM is the best so far. mdeq
is the minimum distance between symmetry equivalents for FIND (for PATS
the |mdis| distance is used). Thus the default setting of mdeq
prevents FIND from placing atoms on special positions. This is usually
desirable because it helps to avoid pseudo-solutions such as the 'uranium
atom solution' that are incorrect but fit the tangent formula, but it might
be better to change this setting to -0.1 to allow special positions when
looking for e.g. metal ions. For PLOP the PREJ instruction can be used
to control whether peaks on special positions are selected. Note also that
a |mdis| threshold of 1.6 Å is used to decide between all-atom ab
initio and heavy atom location for the purpose of setting various defaults
for other parameters.
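The way the |mdis| threshold selects defaults can be summarized in a sketch that collects the values quoted under ESEL, FIND and TEST. It assumes that |mdis| below 1.6 means ab initio solution and anything else means heavy-atom location, which is consistent with the default mdis of 1.0 and with MIND -3.5 in the selenium example; the function name is hypothetical:

```python
def shelxd_defaults(mdis: float = 1.0, na: int = 0) -> dict:
    """Return the mode-dependent defaults for Emin, ncy, CCmin and delCC.

    Assumption: abs(mdis) < 1.6 selects ab initio mode, otherwise heavy-atom mode.
    """
    if abs(mdis) < 1.6:
        return {"mode": "ab initio", "Emin": 1.2, "ncy": max(20, na),
                "CCmin": 45.0, "delCC": 1.0}
    return {"mode": "heavy atom", "Emin": 1.5, "ncy": 20,
            "CCmin": 10.0, "delCC": 5.0}
```

For example, FIND 400 with the default MIND gives ncy = 400, whereas FIND 30 with MIND -3.5 gives ncy = 20.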
SKIP min2 [0.5]
During FIND, if the second peak height is less than min2 times
the first, the first peak is rejected (before applying WEED to reject other
peaks). This is sometimes useful to suppress 'uranium atom' solutions.
In fact, for large equal-atom structures in space group P1 it is a good
idea to specify 'SKIP 0.999' so that the first peak is ALWAYS rejected!
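The SKIP test can be expressed as a short sketch operating on a list of peak heights. The function name apply_skip is hypothetical, and sorting the peaks inside the function is a convenience rather than a documented step:

```python
def apply_skip(peak_heights, min2: float = 0.5) -> list:
    """Drop the top peak if the second peak is lower than min2 times the first."""
    peaks = sorted(peak_heights, reverse=True)  # highest peak first
    if len(peaks) >= 2 and peaks[1] < min2 * peaks[0]:
        return peaks[1:]
    return peaks
```

With min2 = 0.999 the first peak survives only if the second is almost exactly as high, which is why this setting rejects it in practice.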
WEED fr [0.3]
Randomly OMIT fraction fr of atoms in FIND stage (except in the
last cycle). Does not apply to PLOP.
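The relationship between FIND na and WEED fr can be checked numerically: na / (1 - fr) peaks are taken so that about na remain after a fraction fr has been omitted. The sketch below assumes that exactly round(fr × N) atoms are removed at random; the program may instead omit each atom with probability fr. The function names are hypothetical:

```python
import random


def peaks_before_weed(na: int, fr: float = 0.3) -> float:
    """Number of peaks assigned occupancies before weeding: na / (1 - fr)."""
    return na / (1.0 - fr)


def weed(atoms, fr: float = 0.3, rng=None) -> list:
    """Randomly omit a fraction fr of the atoms, keeping the original order.

    Assumption: exactly round(fr * len(atoms)) atoms are removed.
    """
    rng = rng or random.Random()
    n_omit = int(round(fr * len(atoms)))
    omitted = set(rng.sample(range(len(atoms)), n_omit))
    return [atom for i, atom in enumerate(atoms) if i not in omitted]
```

For FIND 400 with the default fr of 0.3, about 571 peaks would be considered before weeding.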
GEOM ngm [0], ndwt [1.0], nha [0], d13 [2.45], dd [0.3]
After the peak search in the FIND and PLOP routines, ngm cycles (typically 2 to 5) of geometry optimization are performed so that distances within dd of d13 are brought closer to d13. In addition, all peak heights after the highest nha (heavy atoms) are multiplied by ndwt (typically 0.7; 1.0 for no action) if the peaks have no other atoms or peaks within the distance range (d13-dd) to (d13+dd). This instruction is an attempt to build in a little chemical information and it is hoped that it will enable the resolution requirement to be relaxed a little.
Initial tests of GEOM indicate that in practice it does not decrease
the time required per solution and it will probably be removed in future
versions.
TEST CCmin [#], delCC [#]
Go on to PLOP if CC > CCmin. CCmin is reduced by 0.1%
each cycle until a solution passes this test. In subsequent attempts, go
on to PLOP if CC is within delCC of best CC value so far. The defaults
are 45 and 1 resp. for ab initio solutions, and 10 and 5 resp. for
heavy atom location (MIND mdis test).
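The TEST logic can be sketched as a small state machine. Several details are assumptions: the 0.1% reduction is treated as 0.1 in absolute CC units, the best CC is tracked only from the first solution that passes, and 'within delCC' is read as CC >= best - delCC. The class name PlopGate is hypothetical:

```python
class PlopGate:
    """Decide whether a try goes on to PLOP, following the TEST description."""

    def __init__(self, cc_min: float = 45.0, del_cc: float = 1.0, step: float = 0.1):
        self.cc_min = cc_min
        self.del_cc = del_cc
        self.step = step      # assumed absolute reduction of CCmin per failed cycle
        self.best_cc = None   # set once a first solution has passed

    def accept(self, cc: float) -> bool:
        if self.best_cc is None:
            if cc > self.cc_min:
                self.best_cc = cc
                return True
            self.cc_min -= self.step  # relax the threshold for the next cycle
            return False
        passed = cc >= self.best_cc - self.del_cc
        self.best_cc = max(self.best_cc, cc)
        return passed
```

With the ab initio defaults, a first try with CC = 40 fails and lowers CCmin to 44.9; after a try with CC = 46 has passed, later tries need CC of at least 45 to reach PLOP.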
KEEP nh [0]
Number of (heavy) atoms to retain during PLOP expansion.
PLOP followed by up to 10 numbers
PLOP specifies the number of peaks to start with in each cycle of the
'peaklist optimization' algorithm of Sheldrick & Gould (1995). Peaks
are then eliminated one at a time until either the correlation coefficient
cannot be increased any more or 50% of the peaks have been eliminated.
PREJ maxb [3], dsp [-0.01], mf [1]
maxb is the maximum number of bonds to atoms or higher peaks;
the peak is deleted if there are more. Peaks are also deleted if they are
less than dsp from their equivalents (PLOP only; FIND uses the second
MIND parameter). Atoms are not output to the final .res file if there are
fewer than mf atoms in the 'molecule'.
SEED nrand [0]
Set random number seed so that exactly the same results are generated
if the job is repeated; each integer nrand defines a different sequence
of random numbers. If nrand is omitted or zero, the seed is randomized
so a different sequence is always generated.
MOVE dx [0], dy [0], dz [0], sign [1]
Shift following coordinates (not ATOM/HETATM).
ATOM and HETATM
PDB format atoms for GROP
HKLF m
m = 4 for F² in .hkl file, m = 3 for F (or FA or delF)
END