Segment assembly, structure alignment and iterative simulation in protein structure prediction
© Zhang and Skolnick; licensee BioMed Central Ltd. 2013
Received: 7 January 2013
Accepted: 12 April 2013
Published: 15 April 2013
It has been 50 years since Anfinsen first showed that the native structure of a protein molecule is determined solely by its amino acid sequence, with the folded state representing a unique and kinetically accessible minimum of the free energy [1]. This finding, also known as Anfinsen's thermodynamic hypothesis, motivated the belief that the solution to the protein structure prediction problem should be based on physicochemical principles; that is, all one has to do to find the native structure of a protein is to identify its lowest free-energy state. However, the success of such first-principles methods has been modest at best. Knowledge-based approaches, which predict structural models using the regularities and rules of known protein structures in the Protein Data Bank (PDB) library, have enjoyed more extensive success in protein structure prediction [2–4]. Among these approaches, TASSER (Threading ASSEmbly Refinement) is a hierarchical structure modeling method designed to predict full-length atomic models from primary amino acid sequences [4]. For a given query sequence, TASSER first threads the sequence through the PDB to identify template proteins that may have a topology similar to that of the query. Continuous segments are then excised from the top-ranked template structures following the query-template alignments, and these are reassembled into full-length atomic models by Monte Carlo simulations. An essential advantage of TASSER over traditional comparative modeling methods, which often degrade the quality of template models, is its ability to drive the template structures closer to the native structure than the input templates were. This is mainly attributed to its highly optimized knowledge-based potential and its efficiency in combining complementary threading alignments from multiple template structures.
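The segment-excision step described above can be illustrated with a minimal sketch. It is not TASSER's actual code: the alignment representation (one template residue index per query position, or `None` for a gap), the function name `continuous_segments`, and the `min_len` cutoff are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def continuous_segments(
    alignment: List[Optional[int]], min_len: int = 5
) -> List[Tuple[int, int]]:
    """Find continuous aligned segments in a query-template alignment.

    alignment[i] is the template residue index aligned to query residue i,
    or None if query residue i is unaligned (hypothetical representation).
    Returns (start, end) query ranges, end exclusive, in which consecutive
    query residues map to consecutive template residues.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, t in enumerate(alignment):
        prev = alignment[i - 1] if i > 0 else None
        # The run continues only if both residues are aligned and adjacent
        # in the template as well as in the query.
        continues = t is not None and prev is not None and t == prev + 1
        if start is not None and not continues:
            if i - start >= min_len:  # keep only sufficiently long runs
                segments.append((start, i))
            start = None
        if t is not None and start is None:
            start = i  # open a new run at this aligned residue
    if start is not None and len(alignment) - start >= min_len:
        segments.append((start, len(alignment)))
    return segments


if __name__ == "__main__":
    example = [None, 10, 11, 12, None, 20, 21, 30, 31, 32]
    print(continuous_segments(example, min_len=3))
```

In this toy alignment, the run mapped to template residues 20 and 21 is shorter than the cutoff and is dropped. The jump from 21 to 30 starts a new segment even though the query positions are adjacent, because the template chain is not continuous there.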
The I-TASSER (Iterative Threading ASSEmbly Refinement) method that we published in BMC Biology in 2007 is an extension of the TASSER algorithm for iterative structure assembly and refinement of protein molecules [5]. The idea of I-TASSER was inspired by the encouraging template structure refinement seen in the TASSER simulations. The appealing question was whether the quality of protein structural models could be improved continuously by repeatedly re-folding the query sequence, starting each time from the output of the previous assembly simulation. The results of initial tests were not encouraging: the final structural models stayed essentially the same as the first-round TASSER models, although the local structural quality, including hydrogen-bonding networks and steric clashes of backbone atoms, was generally improved. The reason for the failure in structural refinement became obvious once it was realized that simply restarting from the last-round TASSER models introduced no new structural information into the reassembly simulations. As long as the search of conformational space is complete in the first round of simulations, the iterations should in principle lead to exactly the same modeling results. More pronounced topology-level improvements were achieved when new structural templates identified from the PDB were incorporated into the folding iterations; these templates were detected by the structure alignment program TM-align [6], which matches the TASSER models against each of the known proteins in the PDB to identify the templates that are structurally closest to the TASSER models. In a benchmark test on a set of known proteins, the TM-score (a measure of similarity between model and native structure, with values in [0, 1]) of the template structures identified by TM-align was generally lower than that of the TASSER models (only 21% of the TM-align alignments had a higher TM-score, as seen in Figure 4A of [5]). Nevertheless, incorporating the new structural alignment information into the I-TASSER simulations eventually resulted in final models with improved TM-score for 77% of the test proteins, or an overall TM-score increase of approximately 3%, together with improved local structure quality [5].
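To make the TM-score concrete, the sketch below evaluates the standard TM-score sum for a single, already superposed model. It is an illustration under stated assumptions, not the TM-align implementation: the real score is the maximum over all superpositions, whereas this function takes the per-residue distances as given. The function names and the 0.5 Å floor on the distance scale for short chains are assumptions of this sketch.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms).

    Uses d0 = 1.24 * (L - 15)^(1/3) - 1.8; the 0.5 floor for short
    chains is an assumption of this sketch.
    """
    if target_length <= 21:
        return 0.5
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distance (angstroms) between each aligned residue pair
    after superposition; unaligned residues are simply omitted.
    target_length: length of the native (target) protein, used for
    normalization so that the score lies in [0, 1].
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # A perfect model of a 100-residue protein scores 1.0;
    # covering only half of the residues perfectly scores 0.5.
    print(tm_score([0.0] * 100, 100))
    print(tm_score([0.0] * 50, 100))
```

Because each term is normalized by the target length and damped by d0, residues that are far from their native positions contribute little, which is why the score is more sensitive to overall topology than to local deviations.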
The major goal of protein structure prediction is to help understand the biological roles of protein molecules in living cells. Since the launch, at the beginning of this century, of structural genomics projects that aim to solve experimentally the structures of a set of proteins covering all representative structural types in nature [11], the general challenges to the field of structure prediction have been the development of methods for better template identification and subsequent structural refinement. Progress has been substantial, but significant difficulties remain in distant-homology identification and atomic-level structure refinement, despite the fact that the PDB library is approaching completeness in structural space [12, 13]. Meanwhile, it is now known that nearly 10% of all proteins (or partial sequences in 40% of eukaryotic proteins) do not follow Anfinsen's dogma and fold into unique states; these sequences remain unfolded or intrinsically disordered while carrying out their physiological functions [14]. Template-based approaches cannot be used to deduce the structural and functional characteristics of these molecules. While recent progress has demonstrated promising use of structure-based iteration, driven by ab initio modeling, in both fold-recognition and structure refinement procedures, the development of efficient ab initio folding algorithms will remain a major theme in the field and should have an important impact on all aspects of protein structure prediction.
This article is part of the BMC Biology tenth anniversary series. Other articles in this series can be found at http://www.biomedcentral.com/bmcbiol/series/tenthanniversary.
1. Anfinsen CB, Haber E, Sela M, White FH: The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc Natl Acad Sci USA. 1961, 47: 1309-1314. 10.1073/pnas.47.9.1309.
2. Sali A, Blundell TL: Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol. 1993, 234: 779-815. 10.1006/jmbi.1993.1626.
3. Simons KT, Kooperberg C, Huang E, Baker D: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol. 1997, 268: 209-225. 10.1006/jmbi.1997.0959.
4. Zhang Y, Skolnick J: Automated structure prediction of weakly homologous proteins on a genomic scale. Proc Natl Acad Sci USA. 2004, 101: 7594-7599. 10.1073/pnas.0305695101.
5. Wu S, Skolnick J, Zhang Y: Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol. 2007, 5: 17. 10.1186/1741-7007-5-17.
6. Zhang Y, Skolnick J: TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005, 33: 2302-2309. 10.1093/nar/gki524.
7. Zhang J, Liang Y, Zhang Y: Atomic-level protein structure refinement using fragment-guided molecular dynamics conformation sampling. Structure. 2011, 19: 1784-1795. 10.1016/j.str.2011.09.022.
8. Xu D, Zhang Y: Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins. 2012, 80: 1715-1735.
9. Wang JM, Cieplak P, Kollman PA: How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J Comput Chem. 2000, 21: 1049-1074. 10.1002/1096-987X(200009)21:12<1049::AID-JCC3>3.0.CO;2-F.
10. Moult J: A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr Opin Struct Biol. 2005, 15: 285-289. 10.1016/j.sbi.2005.05.011.
11. Burley SK, Almo SC, Bonanno JB, Capel M, Chance MR, Gaasterland T, Lin D, Sali A, Studier FW, Swaminathan S: Structural genomics: beyond the human genome project. Nat Genet. 1999, 23: 151-157. 10.1038/13783.
12. Zhang Y, Skolnick J: The protein structure prediction problem could be solved using the current PDB library. Proc Natl Acad Sci USA. 2005, 102: 1029-1034. 10.1073/pnas.0407152101.
13. Skolnick J, Zhou HY, Brylinski M: Further evidence for the likely completeness of the library of solved single domain protein structures. J Phys Chem B. 2012, 116: 6654-6664. 10.1021/jp211052j.
14. Uversky VN, Dunker AK: Understanding protein non-folding. Biochim Biophys Acta. 2010, 1804: 1231-1264. 10.1016/j.bbapap.2010.01.017.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.