Supercomputing Institute Research Bulletin online

Volume 15 Number 3

July 1999

Building of Protein Structures

In recent years, there have been remarkable advances in the determination of protein and nucleic acid structures by X-ray crystallography. These include synchrotron X-ray sources that allow a complete data set to be collected in about five minutes, and sophisticated software that automates many of the steps involved in converting X-ray diffraction data into the final structure. X-rays are scattered by electrons, so the diffraction pattern is determined by the electron density in the crystal, and knowledge of this electron density "map" allows one to identify the spatial positions of all atoms in the structure.
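
The link between diffraction data and the electron density map can be illustrated with a minimal one-dimensional sketch. It assumes point "atoms" of equal scattering power in a unit cell of length 1 and uses the standard crystallographic sign convention. The function names (`structure_factors`, `fourier_synthesis`, `peak_positions`) are hypothetical. Structure factors F_h are computed from known atom positions and then Fourier-summed back into a density whose highest peaks mark the atoms. In a real experiment only the amplitudes |F_h| are measured and the phases must be estimated, which is what the first step described below provides.

```python
import cmath
import math


def structure_factors(positions, max_index):
    """F_h = sum_j exp(2*pi*i*h*x_j) for unit point scatterers at fractional
    coordinates `positions`, for h = -max_index..max_index."""
    return {
        h: sum(cmath.exp(2j * math.pi * h * x) for x in positions)
        for h in range(-max_index, max_index + 1)
    }


def fourier_synthesis(factors, n_grid):
    """Density rho(x) = sum_h F_h exp(-2*pi*i*h*x) sampled on n_grid points.
    The 1/V prefactor is omitted because the cell length is 1."""
    density = []
    for k in range(n_grid):
        x = k / n_grid
        value = sum(f * cmath.exp(-2j * math.pi * h * x) for h, f in factors.items())
        # F_{-h} is the complex conjugate of F_h, so the imaginary parts cancel.
        density.append(value.real)
    return density


def peak_positions(density, count):
    """Fractional coordinates of the `count` highest local maxima (periodic grid)."""
    n = len(density)
    maxima = [k for k in range(n)
              if density[k] > density[k - 1] and density[k] >= density[(k + 1) % n]]
    maxima.sort(key=lambda k: density[k], reverse=True)
    return sorted(k / n for k in maxima[:count])


atoms = [0.1, 0.35, 0.7]
rho = fourier_synthesis(structure_factors(atoms, max_index=20), n_grid=100)
print(peak_positions(rho, 3))
```

With reflections h = -20..20, the three highest peaks should fall at or beside the grid points nearest the input coordinates. Truncating the series at a smaller `max_index` broadens the peaks, much as limited resolution blurs a real map.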

The first step in determining a new protein structure is obtaining a rough estimate of this electron density map. This estimate is obtained either by combining diffraction data from native crystals and heavy-atom derivatives or, if synchrotron radiation is available, by collecting data at two different wavelengths from a protein in which selenomethionine has been substituted for the normal amino acid methionine.

The second step is positioning the amino acid residues of the protein in this electron density map, using the previously determined amino acid sequence. This is currently a time-consuming step that requires building the protein manually into the map, one residue at a time, on a high-resolution graphics monitor. Depending on the size of the protein, this can take from several weeks to more than a month of a skilled investigator's time.

[Figure: Snapshot of a screen generated by MAID showing the agreement between the automated fit (maroon and white) and the final refined protein structure (yellow and blue).]

The final step is to "refine" this initial structure by adjusting it until it yields the optimal fit to the X-ray diffraction data. Sophisticated computer programs have been developed to automate this refinement.
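
Refinement can be pictured as an optimization problem. The following is a toy sketch using the same one-dimensional point-atom assumptions. It measures fit with the crystallographic R factor, R = sum|F_obs - k*F_calc| / sum F_obs, where k is a least-squares scale factor, and it moves one misplaced atom by a brute-force grid search until R is smallest. All function names and numbers are hypothetical. Real refinement programs adjust thousands of coordinates simultaneously, usually with stereochemical restraints and least-squares or maximum-likelihood targets, rather than scanning a single coordinate.

```python
import cmath
import math


def amplitudes(positions, max_index):
    """|F_h| for h = 1..max_index from unit point scatterers at fractional 1-D coordinates."""
    return [abs(sum(cmath.exp(2j * math.pi * h * x) for x in positions))
            for h in range(1, max_index + 1)]


def r_factor(f_obs, f_calc):
    """R = sum |Fo - k*Fc| / sum Fo, with k minimizing sum (Fo - k*Fc)^2."""
    k = sum(fo * fc for fo, fc in zip(f_obs, f_calc)) / sum(fc * fc for fc in f_calc)
    return sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)


def refine_one_atom(model, index, f_obs, max_index, step=0.001, span=0.05):
    """Scan model[index] over +/- span in steps of `step`; keep the position with lowest R."""
    best_r, best_x = r_factor(f_obs, amplitudes(model, max_index)), model[index]
    n = round(span / step)
    for i in range(-n, n + 1):
        trial = list(model)
        trial[index] = model[index] + i * step
        r = r_factor(f_obs, amplitudes(trial, max_index))
        if r < best_r:
            best_r, best_x = r, trial[index]
    refined = list(model)
    refined[index] = best_x
    return refined, best_r


true_atoms = [0.1, 0.35, 0.7]
f_obs = amplitudes(true_atoms, max_index=10)  # stand-in for measured amplitudes
start = [0.1, 0.38, 0.7]                      # rough initial model: middle atom misplaced
refined, r = refine_one_atom(start, 1, f_obs, max_index=10)
print(f"refined x = {refined[1]:.3f}, R = {r:.4f}")  # refined x = 0.350, R = 0.0000
```

The scan range is kept narrow because amplitudes alone cannot distinguish some alternative arrangements (here, placing the middle atom at 0.45 gives identical amplitudes). This is one reason refinement starts from a reasonably good initial model.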

Professor David Levitt of the Physiology Department at the University of Minnesota has been developing a program (MAID) to automate the second step in this procedure: building the protein into the initial map. He has divided this task into two stages. In the first stage, the map is searched for regions that are either α-helices or β-sheets. In these regions there are strong constraints on the possible positions of the atoms. Once these regions are located, a generic amino acid sequence is optimally fitted into the map. In the second stage, these fits are extended into the "loop" regions. This is a much more difficult problem because there are fewer constraints on the possible structures and because the map is usually poor in these regions.
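
Searching a map for secondary structure can be illustrated with simple template matching. This is not MAID's published algorithm. It is a minimal sketch that rests on several assumptions. The template is an ideal α-helix C-alpha trace with about 1.5 Å rise and 100° twist per residue (3.6 residues per turn) and a C-alpha radius of roughly 2.3 Å. The map is synthetic, built from Gaussian blobs 1 Å wide. Only translations are searched. The template is scored by the summed density at its atoms, and the best-scoring offset on a grid is reported. All names are hypothetical. A practical search would also scan orientations and read density from an interpolated grid.

```python
import itertools
import math

# Approximate ideal alpha-helix geometry for the C-alpha trace (assumed values).
RISE = 1.5                  # Angstrom rise per residue along the helix axis
TWIST = math.radians(100)   # 100 degrees per residue, i.e. 3.6 residues per turn
RADIUS = 2.3                # Angstrom, approximate C-alpha radius


def ideal_helix(n_residues):
    """C-alpha coordinates of an ideal alpha-helix running along the z axis."""
    return [(RADIUS * math.cos(i * TWIST), RADIUS * math.sin(i * TWIST), i * RISE)
            for i in range(n_residues)]


def shifted(points, offset):
    """Translate every point by `offset`."""
    return [tuple(p + o for p, o in zip(point, offset)) for point in points]


def gaussian_map(centers, width=1.0):
    """Synthetic density: one Gaussian blob of the given width at each center."""
    def density(point):
        return sum(math.exp(-math.dist(point, c) ** 2 / (2 * width ** 2)) for c in centers)
    return density


def search_translation(density, template, box, step):
    """Offset on a cubic grid (-box..box) maximizing summed density at the template atoms."""
    n = round(box / step)
    grid = [i * step for i in range(-n, n + 1)]

    def score(offset):
        return sum(density(atom) for atom in shifted(template, offset))

    return max(itertools.product(grid, repeat=3), key=score)


helix = ideal_helix(10)
true_offset = (1.5, -2.0, 3.0)   # where the "unknown" helix actually sits in the map
rho = gaussian_map(shifted(helix, true_offset))
best = search_translation(rho, helix, box=3.0, step=0.5)
print(best)  # (1.5, -2.0, 3.0)
```

Consecutive C-alpha atoms in this template are about 3.8 Å apart, as in real proteins, which is a quick check that the assumed helix geometry is sensible.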

Professor Levitt has developed a complete graphical visualization program to aid in writing and testing the routines required for MAID. The program is written in C++ and uses OpenGL and Motif. The figure shows a snapshot of MAID applied to a preliminary experimental electron density map (not shown in the figure) that had been used to solve the final structure by the conventional manual fitting technique. The final protein structure, obtained after intensive refinement, is indicated by the yellow (main chain atoms) and blue (side chain atoms) lines. The test of MAID is whether the automatically fitted protein structure, indicated by the maroon (main chain) and white (side chain) lines, agrees with this final refined structure. The figure illustrates one example of the type of fit produced by MAID: the fit to one helix region (amino acids 53 to 68) and its extension into the loop region (amino acids 68 to 77). This fit is very good, accurately placing all of the main chain atoms and most of the side chain atoms.
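
Agreement between an automated fit and the refined structure over a segment, such as the helix of residues 53 to 68, is commonly summarized by a root-mean-square deviation (RMSD) over matched atoms. The following is a minimal sketch. It assumes C-alpha coordinates stored in dictionaries keyed by residue number (a hypothetical layout) and assumes both models already lie in the same crystal coordinate frame, so no superposition is applied.

```python
import math


def segment_rmsd(model, reference, first, last):
    """C-alpha RMSD over residues first..last (inclusive) present in both models.

    Both structures are assumed to lie in the same crystal frame, so the
    coordinates are compared directly without a superposition step.
    """
    residues = [r for r in range(first, last + 1) if r in model and r in reference]
    if not residues:
        raise ValueError("no residues in common over the requested range")
    squared = sum(math.dist(model[r], reference[r]) ** 2 for r in residues)
    return math.sqrt(squared / len(residues))


# Toy coordinates (Angstrom): each fitted atom is 1 Angstrom from its reference.
fitted = {53: (0.0, 0.0, 0.0), 54: (3.8, 0.0, 0.0)}
refined = {53: (0.0, 0.0, 1.0), 54: (3.8, 0.0, 1.0)}
print(segment_rmsd(fitted, refined, 53, 68))  # 1.0
```

Residues missing from either model are skipped, so an RMSD is most informative when reported together with the number of residues actually built.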

At present, MAID can accurately fit more than 95% of the α-helix and β-sheet regions in the map. The extension into the loops, however, is much poorer, accurately fitting less than 50% of the amino acids. Future work will focus on improving the fits in the loop regions, which would make it possible to completely automate what is now the most time-consuming step in the solution of protein structures.
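
Percentages like these depend on what counts as an accurate fit. One hypothetical way to compute them is to count a residue as correctly fitted when its C-alpha lies within a distance cutoff of its refined position. The fraction is then reported separately for helix/sheet and loop residues. The 1.0 Å cutoff, the dictionary layout, and the class labels are assumptions for illustration, not MAID's criterion. Residues the program did not build count as misses.

```python
import math


def accuracy_by_class(model, reference, classes, cutoff=1.0):
    """Fraction of residues per class whose C-alpha is within `cutoff` Angstrom.

    model, reference: dict residue number -> (x, y, z)
    classes: dict residue number -> label such as "helix/sheet" or "loop"
    Residues absent from `model` (not built) count as incorrectly fitted.
    """
    totals, hits = {}, {}
    for res, ref_xyz in reference.items():
        label = classes[res]
        totals[label] = totals.get(label, 0) + 1
        xyz = model.get(res)
        if xyz is not None and math.dist(xyz, ref_xyz) <= cutoff:
            hits[label] = hits.get(label, 0) + 1
    return {label: hits.get(label, 0) / n for label, n in totals.items()}


reference = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0), 4: (11.4, 0.0, 0.0)}
classes = {1: "helix/sheet", 2: "helix/sheet", 3: "loop", 4: "loop"}
model = {1: (0.2, 0.0, 0.0), 2: (3.8, 0.5, 0.0), 3: (7.6, 2.0, 0.0)}  # residue 4 not built
print(accuracy_by_class(model, reference, classes))
# {'helix/sheet': 1.0, 'loop': 0.0}
```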
