Development of Software to Automate the Building of Protein Structures Into X-ray Crystallographic Electron Density Maps
Department of Physiology
2-164A Jackson Hall
321 Church St SE
Minneapolis, MN 55455
612-625-7649
levitt@dcmir.med.umn.edu
The determination of protein structure by X-ray crystallography requires four separate steps: 1) Purification and crystallization; 2) Collection of X-ray diffraction intensities and calculation of approximate phases that can be used to make a preliminary electron density map; 3) Building the protein structure into the preliminary map; and 4) Refinement of this initial structure. Recent advances in synchrotron radiation sources (Hendrickson, 1991; Ogata, 1998), combined with the use of selenomethionine cloned proteins (Hendrickson et al., 1990; Doublie, 1997), have markedly reduced the time required for step 2. These advances, fueled by recent funding programs such as the Protein Structure Initiative (Smaglik, 2000; Abbott, 2000) that are designed to stimulate the production of thousands of protein structures, have increased the incentive for automating the other steps involved in protein structure determination. Dr. Levitt has recently developed a new software routine (MAID) for automating step 3 (Levitt, 2001). A beta version of the program is freely distributed from http://www.msi.umn.edu/~levitt and has been tested by a large number of users. His current research is directed at continuing to develop and improve this routine and at developing algorithms for interfacing steps 3 and 4.
The classical approach to step 3 requires several weeks to months of a skilled investigator's time, manually building the protein using a graphical workstation. The approach used by MAID is essentially an automation of these manual steps. The current version of MAID requires only that the user input the preliminary map and the amino acid sequence; MAID then outputs the final protein structure without any additional user intervention. Figure 1 shows a dramatic example of one application of MAID: a comparison of the main chain protein structure output by MAID (dark line) with the final refined structure (light line) determined by classical methods for the protein fumarylacetoacetate hydrolase (Timm et al., 1999). MAID accurately positioned all 418 amino acid residues of the protein. The average deviation of the MAID structure from the final refined structure was 0.46 Å for the main chain atoms and 1.00 Å for all atoms (main chain plus side chain). In addition to automating step 3, MAID produces a structure that is much more accurate than that produced by the classical technique. This increased accuracy significantly decreases the time required for the subsequent refinement steps. The success of MAID depends, of course, on the quality of the initial electron density map. For the example shown in Figure 1, the starting map was of very good quality, and MAID is not always this successful on lower quality maps. Current research efforts are directed at improving the performance of MAID on these lower quality maps.
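As an illustration of how a figure such as the 0.46 Å main chain deviation could be obtained, the following is a minimal sketch that averages the distance between corresponding atoms in a built structure and a reference structure. It assumes that "average deviation" means the mean Euclidean distance over matched atoms (the text does not say whether a mean distance or a root-mean-square value was used), and the function name `mean_deviation` and the coordinates are hypothetical, not taken from MAID.

```python
import math


def mean_deviation(built, reference):
    """Mean Euclidean distance (in Å) between corresponding atoms.

    `built` and `reference` are equal-length sequences of (x, y, z)
    coordinates, listed in the same atom order.
    """
    if len(built) != len(reference) or not built:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) for a, b in zip(built, reference))
    return total / len(built)


# Hypothetical coordinates for illustration only (not MAID output).
built = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.3, 0.0), (1.5, 0.0, 0.4), (3.0, 0.0, 0.5)]

print(f"mean deviation: {mean_deviation(built, reference):.2f} Å")
```

Restricting the two coordinate lists to main chain atoms, or passing all atoms, would give the two kinds of figure quoted above; the sketch also assumes the two structures are already in the same coordinate frame.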
The MAID program involves a complex branching tree consisting of more than a hundred different routines. Each of these routines is characterized by a number of parameters that were chosen by a crude trial-and-error procedure in which a few select parameters are changed and the result of applying MAID to a test map is then determined. A major research effort is now directed at improving these parameter assignments. For example, the current version of MAID uses just one set of parameters for all map conditions, whereas preliminary results clearly indicate that the optimum parameter values should depend on the resolution limits of the map. Each adjustment involves changing a small set of parameters and then running MAID on a suite of five test proteins of different quality and resolution. This is extremely CPU intensive, since each run of MAID on one protein takes from 5 to 24 hours (using a 300 MHz SGI R12000 processor) and there are hundreds of different parameter combinations that need to be tested. The quality of the final parameter set is essentially limited by the availability of SGI processors. The additional 32 high speed SGI processors requested in this proposal will significantly improve this parameter set and reduce the development time required to obtain it.
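The scale of this computation can be made concrete with a rough estimate. The sketch below is only a back-of-the-envelope model: it assumes 200 parameter combinations as a stand-in for "hundreds", treats every run as independent and of equal length, and assumes runs are spread evenly over the available processors. The function name `sweep_wall_clock_hours` and the constants are hypothetical; only the five test proteins, the 5 to 24 hour run time, and the 32 processors come from the text.

```python
import math


def sweep_wall_clock_hours(n_combinations, n_proteins, hours_per_run, n_processors):
    """Wall-clock hours for a parameter sweep of independent, equal-length runs.

    Each parameter combination is run once on every test protein, and runs
    are executed in rounds of up to `n_processors` at a time.
    """
    n_runs = n_combinations * n_proteins
    rounds = math.ceil(n_runs / n_processors)
    return rounds * hours_per_run


N_COMBINATIONS = 200  # assumption: stand-in for "hundreds" of combinations
N_PROTEINS = 5        # from the text: suite of five test proteins

for hours_per_run in (5, 24):      # from the text: 5 to 24 hours per run
    for n_processors in (1, 32):   # one processor vs. the 32 requested
        hours = sweep_wall_clock_hours(
            N_COMBINATIONS, N_PROTEINS, hours_per_run, n_processors
        )
        print(
            f"{hours_per_run:>2} h/run on {n_processors:>2} processor(s): "
            f"{hours} h (about {hours / 24:.0f} days)"
        )
```

Under these assumptions the sweep amounts to 1,000 runs, or 5,000 to 24,000 hours on a single processor, which is why processor availability limits how thoroughly the parameter space can be explored.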