- Open Access
MESSA: MEta-Server for protein Sequence Analysis
© Cong and Grishin; licensee BioMed Central Ltd. 2012
- Received: 14 September 2012
- Accepted: 2 October 2012
- Published: 2 October 2012
Computational sequence analysis, that is, prediction of local sequence properties, homologs, spatial structure and function from the sequence of a protein, offers an efficient way to obtain needed information about proteins under study. Since reliable prediction is usually based on the consensus of many computer programs, meta-servers have been developed to fit such needs. Most meta-servers focus on one aspect of sequence analysis, while others incorporate more information, such as PredictProtein for local sequence feature predictions, SMART for domain architecture and sequence motif annotation, and GeneSilico for secondary and spatial structure prediction. However, as predictions of local sequence properties, three-dimensional structure and function are usually intertwined, it is beneficial to address them together.
We developed a MEta-Server for protein Sequence Analysis (MESSA) to facilitate comprehensive protein sequence analysis and gather structural and functional predictions for a protein of interest. For an input sequence, the server exploits a number of select tools to predict local sequence properties, such as secondary structure, structurally disordered regions, coiled coils, signal peptides and transmembrane helices; detect homologous proteins and assign the query to a protein family; identify three-dimensional structure templates and generate structure models; and provide predictive statements about the protein's function, including functional annotations, Gene Ontology terms, enzyme classification and possible functionally associated proteins. We tested MESSA on the proteome of Candidatus Liberibacter asiaticus. Manual curation shows that three-dimensional structure models generated by MESSA covered around 75% of all the residues in this proteome and the function of 80% of all proteins could be predicted.
MESSA is free for non-commercial use at http://prodata.swmed.edu/MESSA/
- Signal Peptide
- Protein Data Bank
- Transmembrane Helix
- Confidence Score
- Coiled Coil
It is very beneficial to begin research on a protein with computational analysis of its sequence. On the one hand, well-designed sequence analysis is an efficient way to obtain predictive information. Because a common evolutionary origin leaves distinct imprints on the sequences, structures and functions of protein molecules, comparative computational methods supported by accumulating biomolecular data often provide shortcuts to valuable hypotheses about a protein, reducing the cost and time of experimental work. On the other hand, computational sequence analysis can prevent misinterpretation of experimental data. The well-known controversy over the report of a plant G-protein coupled receptor [1], which both computational analysis and experimental verification subsequently suggested to be a cytoplasmic lanthionine synthetase-like protein [2, 3], illustrates the value of sequence analysis.
Many tools have been developed to serve the growing need for computational analysis of protein sequences. Such tools typically predict a particular local sequence property, the spatial structure or the function of a query sequence. However, consensus-based meta-predictors usually produce better results than the individual tools they include [4, 5]. In addition, when independent predictions are combined, errors in individual predictions can be revealed, leading to even better performance. For instance, in the recent ninth Critical Assessment of Structure Prediction (CASP9) experiment, even the top-performing three-dimensional structure predictors did not detect and remove the signal peptides in the target sequences [5], which lowered prediction quality because a hydrophobic signal peptide is likely to be packed incorrectly into the hydrophobic core. Therefore, to generate a reliable hypothesis from computational analysis, one needs to consult many predictors and integrate their results, which makes comprehensive analysis of a given protein sequence a non-trivial task.
Meta-servers have been developed to reduce this difficulty by combining various tools and integrating and displaying their results. Most meta-servers focus on one aspect of predictive analysis, for instance, Jpred on secondary structure [6], metaPrDOS on disordered regions [7], MetaTM on transmembrane topology [8], Pcons [9] and 3D-Jury [10] on three-dimensional structure, and CombFunc [11] and ProKnow on function. Other meta-servers incorporate more information to further accelerate sequence analysis, such as PredictProtein for predictions of local sequence properties, SMART for domain annotation and sequence motif prediction, and GeneSilico, which focuses on secondary and spatial structure prediction.
However, because predictions of local sequence properties, spatial structure and function are usually interconnected, more accurate conclusions can be drawn by addressing these questions together. The presence of transmembrane helices or signal peptides, the conserved domains identified in the protein and the predicted three-dimensional structure provide essential clues for interpreting function. At the same time, the predicted three-dimensional structure and function of a protein can validate transmembrane helix or signal peptide predictions and thus prevent false positives. We therefore developed a MEta-Server for protein Sequence Analysis (MESSA), which balances these predictions, integrates them and outputs results about local sequence features (secondary structure, disordered regions, transmembrane helices, signal peptides, coiled coils and positional conservation), function, three-dimensional structure and domain architecture. We tested MESSA on the proteome of a citrus pathogen, Candidatus Liberibacter asiaticus, and the results showed that MESSA provides structural and functional characterization for the majority of Ca. L. asiaticus proteins, which facilitates further understanding of these proteins and will aid experimental study of this bacterium.
Interpretation of results from MESSA
MESSA runs a number of well-established programs, integrates their results and returns two web pages: a full page showing the key information from every predictor together with links to its original results, and a summary page displaying consensus-based final predictions. The full version offers extensive information and is designed for careful manual analysis of a protein. The summary page is considerably simplified and provides predictions, with their confidence, that non-expert users can use directly.
Description of the full output
The full output presents the key information from all programs and provides links to the original results. It contains the following seven sections:
Section I. Prediction of local sequence features
Table 1. Programs used in MESSA for prediction of local sequence features and their interpretation

| Feature | Purpose | Program(s) | Interpretation of output |
|---|---|---|---|
| Secondary structure | Assist three-dimensional structure and domain boundary prediction | PSIPRED (v2.0); SSPRO (v4.0); DISEMBL (v1.5), coils | PSIPRED and SSPRO predict three-state secondary structure (H: α-helix, E: β-strand, C: coil); DISEMBL predicts coils (lower-case letters highlighted in pink) |
| Disordered and flexible regions | Assist three-dimensional structure prediction | DISEMBL (v1.5), hot loops | Loops likely to have high B-factors in X-ray crystal structures (lower-case letters highlighted in pink) |
| | | DISEMBL (v1.5), missing; DISPRO (v1.0); DISOPRED (v2.0); IsUnstruct (v2.02) | Residues without a defined structure (star marks highlighted in red) |
| Transmembrane helices and signal peptides | Predict subcellular localization and transmembrane segments, reveal the topology of transmembrane proteins and provide hints about protein function | TMHMM (v2.0); TOPPRED^a (v2.0); HMMTOP^a (v2.0) | H: transmembrane helix (colored blue); h: transmembrane helix not confidently predicted; o: periplasmic loop; i: cytoplasmic loop; x: loop region not specified as periplasmic or cytoplasmic |
| | | MEMSAT (v3.0) | H: transmembrane helix (colored blue); S: signal peptide (colored green); h: transmembrane helix not confidently predicted; o: periplasmic loop; i: cytoplasmic loop |
| | | SignalP (v3.0), HMM mode; SignalP (v3.0), NN mode | S: signal peptide (highlighted in green); o: periplasmic region; x: no signal peptide |
| Low-complexity regions | Reveal false-positive homology hits caused by matches between low-complexity regions | | Segments with low diversity in amino acid composition (highlighted in pink), which are likely to be disordered or to form α-helices such as coiled coils |
| Coiled coils | Assist three-dimensional structure prediction | | x: coiled coil (highlighted in yellow) |
| Positional conservation | Reveal residues essential for the folding and function of a protein | BLAST (hits with > 40% coverage and < 90% identity are included in the profile); AL2CO (conservation indices calculated from the profile) | Sequence shaded by conservation, from white through yellow to dark red as conservation increases |
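The positional-conservation row combines a hit filter with a per-column conservation score. The sketch below shows both steps under stated assumptions: the `Hit` record and function names are hypothetical, and `column_entropy` is a plain Shannon-entropy score chosen for illustration (AL2CO offers several measures; which one MESSA uses is not stated here).

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical BLAST hit record holding only what the profile filter needs."""
    hit_id: str
    identity: float   # percent identity to the query (0-100)
    coverage: float   # percent of the query covered by the alignment (0-100)


def select_profile_hits(hits, min_coverage=40.0, max_identity=90.0):
    """Keep hits covering > 40% of the query and sharing < 90% identity with it."""
    return [h for h in hits if h.coverage > min_coverage and h.identity < max_identity]


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters.

    Lower entropy means a more conserved position. Illustrative only; this is
    not a reimplementation of AL2CO.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
```

Excluding near-identical hits (identity of 90% or more) keeps the profile from being dominated by sequences that add little information about which positions vary.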
Section II. Close homologs for annotation transfer
Close homologs and orthologs usually preserve the function inherited from their common ancestor. MESSA shows the 10 closest confident homologs in the Swiss-Prot and non-redundant (NR) databases detected by BLAST (e-value cut-off: 0.001), together with the functional annotations of the homologs from Swiss-Prot. Because Swiss-Prot annotations are of high quality, they offer a basis for function prediction by annotation transfer.
Section III. Prediction of Gene Ontology terms
Gene Ontology (GO) terms are the standard representation of protein attributes and are widely used by researchers. MESSA predicts the GO terms associated with the query using the AmiGO server. The 10 closest homologs in the GO database detected by AmiGO and their associated GO terms are provided; many of these GO terms can be transferred directly to the query.
Section IV. Prediction of Enzyme Commission numbers
Enzyme Commission (EC) numbers describe the types of reactions that enzymes catalyze and are essential for understanding protein function in the context of metabolic pathways. This section contains EC number predictions by three methods: transfer from close homologs in the Swiss-Prot database, and de novo prediction by the Ezypred server and by the Enzyme Function Inference by a Combined Approach (EFICAz; version 2.5) software package [24, 25]. For the first approach, the closely related Swiss-Prot entries and their assigned EC numbers are shown; for the other two, the predicted EC numbers and their definitions in the ENZYME nomenclature database are listed.
Section V. Identification of functionally associated proteins
This section shows proteins that may function together with the query. The prediction relies mostly on the STRING database, which assigns functional associations between proteins using multiple criteria, such as physical interaction, expression pattern and genomic context. Moreover, when the query comes from a user-specified organism with a complete genome sequence available, MESSA provides a link to the National Center for Biotechnology Information (NCBI) Gene database showing the genomic context of the query.
Section VI. Homologous protein families
Protein classifications and the extensive information about each protein family in several databases [28–33] greatly assist functional annotation. In this section, we provide ranked lists of top-scoring homologous protein families and conserved domains in the NCBI Conserved Domain Database, identified by RPS-BLAST (e-value cut-off: 0.005) and the HHpred server [35, 36] (probability cut-off: 90%). For each confidently detected domain, the relevant information and its alignment to the query are shown. This section allows users to explore the rich information available for related protein families and is another useful resource for function prediction.
Section VII. Homologous structures and structure domains
Spatial structure prediction is an important aspect of sequence analysis. The predicted structure is indicative of protein function: conserved active sites and binding surfaces help generate hypotheses about function. Because three-dimensional structure is usually more conserved among homologous proteins than function, reliable structure prediction is achievable for most proteins, including many for which confident function prediction is not feasible. This section shows homologous structures in the Protein Data Bank (PDB) and structure domains in the Structural Classification of Proteins (SCOP) database detected by BLAST (e-value below 0.001), RPS-BLAST (e-value below 0.001) and the HHpred server (probability higher than 80%). For each detected protein and protein domain, the alignment and the corresponding structure, displayed with Jmol, can be retrieved. Because protein structures are conserved among homologs, these structures in most cases represent the general fold of the query protein and are suitable templates for structure modeling. For structure domains detected in SCOP, we provide their classification hierarchy to highlight their evolutionary history and suggest similarities to other proteins.
Description of the summary page
By integrating results from different methods, we generate the consensus-based final predictions for local sequence features, three-dimensional structure and function. We present these predictions as a summary page, which contains three sections:
Section I. Consensus-based prediction of local sequence properties (Figure 1A)
This section contains predictions of secondary structure, disordered regions, transmembrane helices, signal peptides, coiled coils and positional conservation indices. Except for the last two, the predictions are based on the consensus between multiple predictors (described in Methods).
Section II. Function prediction (Figure 1B)
The predicted function annotation, GO terms and EC numbers (if the query is an enzyme) are shown in this section. Predictions are ranked by their confidence scores (details in Methods) assigned by MESSA. In addition, a confidence level ('very confident', 'confident' or 'probable') is provided for each prediction.
Section III. Spatial structure prediction (Figure 1C)
This section displays three-dimensional structure models of the query in Jmol if a MODELLER key is provided to enable homology modeling by MODELLER [42, 43]. Otherwise, the templates selected by MESSA, their alignments to the query and their confidence levels (details in Methods) are listed.
To initiate a MESSA job, users must input a query sequence of 30 to 4,000 amino acids in FASTA or plain-text format and provide a non-commercial email address. Users are encouraged to select the name and type (such as eukaryote, Gram-negative or Gram-positive) of the organism from which the input sequence comes; this information is needed for signal peptide prediction, reciprocal BLAST and mapping the protein to its genomic locus. Once a job is submitted, MESSA redirects the user first to a web page that summarizes the input and later to a web page showing the status of the job. A job generally takes about 30 minutes to complete, although for proteins from very large families the whole process may take several hours. While a job is in progress, MESSA can integrate and display the available intermediate results on request, so users can view results from fast programs as soon as they are ready. Users are notified by email once the job is completed.
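The input constraints above (FASTA or plain text, 30 to 4,000 residues) can be checked client-side before submission. This is a minimal sketch, not MESSA's own validator: `parse_query` is a hypothetical helper, and the accepted alphabet (20 standard residues plus X) is an assumption.

```python
import re

MIN_LEN, MAX_LEN = 30, 4000
# Assumption: 20 standard amino acids plus X for unknown; MESSA's accepted alphabet may differ.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_query(text):
    """Return the sequence from FASTA or plain-text input, or raise ValueError."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    if lines and lines[0].startswith(">"):
        lines = lines[1:]  # drop a single FASTA header line
    seq = re.sub(r"\s+", "", "".join(lines)).upper()
    bad = set(seq) - ALLOWED
    if bad:
        raise ValueError(f"unexpected characters: {''.join(sorted(bad))}")
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        raise ValueError(f"length {len(seq)} is outside {MIN_LEN}-{MAX_LEN}")
    return seq
```

Only one sequence per query is handled here; a second FASTA header would fail the character check, which matches a one-sequence-per-job submission.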
Features of MESSA and comparison to other similar meta-servers
The most important feature of MESSA is its broad and balanced incorporation of predictions about local sequence features, domain architecture, three-dimensional structure and function. Another key feature is that MESSA integrates results from multiple predictors into consensus-based final predictions, which summarize the most important information and are very convenient for non-expert users. In addition, MESSA presents results in a user-friendly way: for instance, local sequence feature predictions are shown as single lines aligned to the sequence, and detected structure templates can be visualized directly and interactively on the results page. Finally, MESSA relies on confident homology, inferred from sequence and profile similarity, for structure and function prediction. On the one hand, structure and function prediction without experimentally studied homologs, such as de novo folding, remains highly challenging, whereas the conservative homology-based approach yields confident predictions in most cases. On the other hand, the rapid growth in the number of experimentally studied proteins and available three-dimensional structures has greatly increased the power of homology-based structure and function annotation and ensures reasonable prediction coverage.
Widely used web servers similar to MESSA include PredictProtein, SMART and GeneSilico. These meta-servers use many programs and aim to facilitate highly integrated sequence analysis. PredictProtein offers rich information about the local sequence features of a protein, such as secondary structure, transmembrane helices, protein sorting signals and functional sites. Unlike MESSA, PredictProtein does not detect related protein families and pays less attention to three-dimensional structure and function prediction. Moreover, it does not integrate results from different tools into a final prediction. Finally, because of its high volume of usage, PredictProtein offers only three free queries per year to academic users. SMART specializes in annotating domain architecture. It offers predictions of signal peptides, transmembrane helices, low-complexity regions and homologous structures detectable by BLAST. Compared with SMART, MESSA incorporates a broader range of programs and can predict three-dimensional structure and function and integrate results from multiple predictors. We consider GeneSilico to be the most similar to MESSA. Although GeneSilico is mainly a fold recognition meta-server for three-dimensional structure prediction, it also offers information about related protein families and prediction of transmembrane helices. In contrast to GeneSilico's emphasis on three-dimensional structure prediction, MESSA aims to offer a well-balanced set of sequence-derived data to support comprehensive analysis of local sequence features, three-dimensional structure and function. Accordingly, MESSA limits its tools for structural template identification to the few that are known to perform best. In addition, MESSA includes prediction of signal peptides, positional conservation, function annotation, GO terms and EC numbers, all of which help with function interpretation.
Application of MESSA
We tested MESSA on the proteome of Ca. L. asiaticus, a Gram-negative bacterium suggested to be the pathogen causing citrus greening disease. The results, together with information about this genome from other databases, were assembled into a website. In the genome sequence of Ca. L. asiaticus, the gene prediction pipelines from NCBI and the SEED detected 1,233 protein-coding genes, 1,046 of which were predicted by both. In addition, 58 protein-coding genes identified by only one pipeline display confident homology to other proteins in the NR database. We consider these 1,104 protein-coding genes to be confidently predicted. The remaining 128 inconsistently predicted genes encode relatively small products (usually fewer than 60 residues) that include low-complexity sequences and lack similarity to any known protein. A large portion of them may represent falsely predicted open reading frames, and they were not considered in the analysis.
Moreover, MESSA detects homologous structures for template-based structure modeling of Ca. L. asiaticus proteins. The confident structure templates identified by MESSA (HHsearch probability above 90%, PSI-BLAST or RPS-BLAST e-value below 0.005) and verified manually cover 74.3% of all residues in the Ca. L. asiaticus proteome. In addition, some sequence regions without confidently identified templates are predicted to be disordered by at least two predictors and tend to occur at domain boundaries; these regions account for another 5.8% of all residues. At the protein level, 65.9% of all Ca. L. asiaticus proteins have more than 80% of their residues covered by confident structure templates and predicted disordered regions. Note that we adopted conservative criteria for selecting structure templates, which may underestimate the number of proteins in a bacterial genome that can be confidently modeled by homology. In summary, our results indicate that MESSA can help biologists efficiently gain an understanding of proteins and will be useful for suggesting hypotheses for experimental pursuit.
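The residue-level and protein-level coverage figures above are interval-union calculations. The sketch below shows one way to compute them, assuming each protein is given as a length plus lists of 1-based inclusive intervals for template alignments and for consensus disordered regions (regions already called by at least two predictors); the data layout and function names are hypothetical.

```python
def covered_positions(length, intervals):
    """Positions (1-based) covered by any (start, end) interval, clipped to the sequence."""
    covered = set()
    for start, end in intervals:
        covered.update(range(max(1, start), min(length, end) + 1))
    return covered


def proteome_coverage(proteins, protein_threshold=0.8):
    """proteins: dict name -> (length, template_intervals, disorder_intervals).

    Returns (fraction of residues covered by templates,
             additional fraction covered only by disordered regions,
             fraction of proteins whose combined coverage exceeds protein_threshold).
    """
    total = by_template = by_disorder_only = well_covered = 0
    for length, templates, disorder in proteins.values():
        t = covered_positions(length, templates)
        d = covered_positions(length, disorder) - t  # count each residue only once
        total += length
        by_template += len(t)
        by_disorder_only += len(d)
        if (len(t) + len(d)) / length > protein_threshold:
            well_covered += 1
    return by_template / total, by_disorder_only / total, well_covered / len(proteins)
```

Subtracting template-covered positions from the disordered set mirrors the text, where disordered regions are counted only where no confident template was found.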
Integration of several approaches enhances the quality of sequence analysis
To illustrate how comprehensive information can be integrated for more confident predictions, we carried out a pilot study to identify Ca. L. asiaticus proteins that can be secreted into the periplasm through the Sec protein secretion pathway. These proteins are of particular interest, as some of them could be virulence factors of this pathogenic bacterium. Proteins secreted by the Sec machinery are characterized by a signal peptide at their N-termini, which can be predicted by the well-established algorithms included in MESSA. Of the 1,104 proteins in Ca. L. asiaticus, 217 are predicted to have signal peptides by at least one algorithm. However, signal peptide prediction alone is not enough to establish subcellular localization, because of false predictions and because some transmembrane proteins also possess signal peptides.
We manually examined all 217 candidates with predicted signal peptides. In addition, we briefly curated all other proteins predicted to have transmembrane helices to identify possible false negatives, as some signal peptides might be falsely predicted as transmembrane helices, especially when translation initiation sites are mispredicted. Predictions and supporting evidence for each protein are listed in Table S2 in Additional file 2. As a result, we hypothesize that 84 proteins in this bacterium are secreted into the periplasm through the Sec machinery. The consensus between different predictors is the main indicator of prediction confidence, and most of these 84 verified proteins and their orthologs have signal peptides that are consistently identified by at least two of the four methods. Beyond simple consensus, other evidence provided by MESSA was essential for reliable predictions.
In one case, the hypothetical ribosomal protein L35 (locus: CLIBASIA_01020; gi: 254780319) is predicted to have a signal peptide by three of the four predictors. However, all the closely related proteins and protein families identified by MESSA support its association with the ribosome rather than secretion. Additionally, the gene encoding this protein lies in an operon together with other genes encoding predicted ribosomal proteins. In the three-dimensional structure of the ribosome complex (PDB id: 3BBO), the N-terminus of ribosomal protein L35 is buried in the complex, which more plausibly explains the hydrophobic segment that is falsely predicted as a signal peptide.
Many proteins from the initial list of 217 candidates were excluded for the following reasons: the signal peptide could not be consistently predicted (only one of the four methods predicted it); the protein has multiple transmembrane helices, such as the sensory box/GGDEF family protein (locus: CLIBASIA_01765; gi: 254780468); the confidently predicted function of the protein suggests that it is located in the inner membrane or cytoplasm; or close homologs lack signal peptides. Multispan transmembrane helical proteins with N-terminal signal peptides do exist, although they are not common in bacteria. However, such proteins are anchored in the membrane by their other transmembrane helices regardless of whether the signal peptide is cleaved.
In summary, the signal peptide predictors provided the initial candidates for secreted proteins. Starting from these 217 candidates, integrating additional information collected by MESSA, such as the consensus between different predictors, other sequence features (transmembrane helices), features of close homologs, and the predicted function and spatial structure, allowed us to propose a more confident list of 84 proteins that are likely secreted by the Sec pathway. The comprehensive information collected by MESSA thus allows us to correct mistakes made by individual programs and to generate more reliable hypotheses about a protein. Because limited information is available for some proteins, and because we curated only proteins with predicted signal peptides or transmembrane helices, incorrect predictions may remain even after careful manual curation.
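The curation above was manual, but its exclusion criteria can be written down as a simple rule to make the logic explicit. This is a hypothetical codification, not a MESSA feature: the `Candidate` fields and the zero-tolerance rule for extra transmembrane helices are assumptions based on the criteria listed in the text, and the field values in the usage check are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of the evidence MESSA collects for one protein."""
    locus: str
    sp_votes: int                         # signal-peptide calls from the four predictors
    tm_helices_outside_sp: int            # predicted TM helices other than the signal peptide
    function_implies_not_secreted: bool   # e.g. ribosomal, cytoplasmic or inner-membrane function
    homologs_have_sp: bool                # close homologs also carry a predicted signal peptide


def likely_sec_secreted(c, min_votes=2):
    """Apply the exclusion criteria described in the text to one candidate."""
    if c.sp_votes < min_votes:            # signal peptide not predicted consistently
        return False
    if c.tm_helices_outside_sp > 0:       # other TM helices would anchor it in the membrane
        return False
    if c.function_implies_not_secreted:   # confident function contradicts periplasmic location
        return False
    return c.homologs_have_sp             # homologs lacking signal peptides argue against secretion
```

For example, ribosomal protein L35 would pass the vote threshold (three of four predictors) but be rejected by the function rule, which is the reasoning described for CLIBASIA_01020.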
We developed MESSA, a web server that integrates the results of a dozen state-of-the-art sequence analysis tools to provide predictions on local sequence properties, three-dimensional structure and function of a given protein. MESSA offers a user-friendly interface and displays the results in a manner convenient for navigation. Our benchmark study showed that MESSA was able to offer extensive information for most of the proteins in a genome. We hope MESSA can help biologists gain insights into proteins under study.
Assemble computational sequence analysis tools and integrate their results
For a given protein sequence, MESSA carries out the following analyses:
First, MESSA runs multiple predictors (listed in Table 1) [49–64] with default parameters to predict secondary structure, disordered regions, low-complexity regions, transmembrane helices, signal peptides, coiled coils and positional conservation indices. The results from multiple tools for each local feature are then combined into a final prediction by voting at each sequence position. Most methods have a single vote, whereas PSIPRED (secondary structure) and Phobius (signal peptides) each count as two votes because of their documented high accuracy [60, 65].
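Per-position weighted voting can be sketched in a few lines. The sketch assumes every predictor's output has already been converted to one label per residue on a shared alphabet; the method name `method_c` and the label strings are hypothetical, and the tie-breaking rule (the label seen first, that is, from the first method listed, wins) is an assumption because the text does not specify one.

```python
from collections import Counter


def consensus(predictions, weights=None):
    """Per-residue weighted majority vote.

    predictions: dict method -> label string, one label per residue.
    weights: dict method -> number of votes (methods not listed get 1).
    Ties go to the label encountered first, i.e. from the earliest method.
    """
    weights = weights or {}
    lengths = {len(s) for s in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must cover the same sequence length")
    (length,) = lengths
    result = []
    for i in range(length):
        votes = Counter()
        for method, labels in predictions.items():
            votes[labels[i]] += weights.get(method, 1)
        result.append(votes.most_common(1)[0][0])
    return "".join(result)


preds = {"PSIPRED": "HHHCC", "SSPRO": "HHECC", "method_c": "CCEEC"}
weighted = consensus(preds, weights={"PSIPRED": 2})
unweighted = consensus(preds)
```

At the third residue PSIPRED's double vote ties the two "E" calls, so the weighted consensus keeps "H" (`"HHHCC"`), while equal weights give `"HHECC"`.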
Confidence score of homologs from the Swiss-Prot database

| Criterion | Condition |
|---|---|
| Sequence identity between the query and the hit | 30% to 50% identity, coverage > 40% |
| | 50% to 70% identity, coverage > 40% |
| | 70% to 90% identity, coverage > 40% |
| | 90% to 99% identity, coverage > 40% |
| | > 99% identity, coverage > 40% |
| BLAST alignment coverage for both query and hit | 60% to 80% |
| | 80% to 100% |
| Reciprocal BLAST | The query against the proteome associated with the hit |
| | The hit against the proteome associated with the query |
Confidence score of predicted Gene Ontology terms

| Criterion | Condition |
|---|---|
| Sequence identity between the query and the hit | 30% to 50% identity, coverage > 40% |
| | 50% to 70% identity, coverage > 40% |
| | 70% to 90% identity, coverage > 40% |
| | 90% to 100% identity, coverage > 40% |
| Alignment coverage for query and hit | 60% to 80% |
| | 80% to 100% |
| Evidence code of the GO term assigned to the hit | IPI, IMP, IGI, IEP, ISO, TAS |
| | ISS, ISA, ISM, IGC, IBA, IBD, IKR, IRD, RCA, NAS, IC, IEA |
| Support from multiple hits | GO term associated with at least three hits |
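The GO-term criteria above can be gathered per term before any scoring. The numeric score attached to each criterion is not reproduced here, so this sketch only collects the evidence the criteria refer to: the best identity band among supporting hits, the evidence-code group(s), and whether at least three hits carry the term. The function names, the tuple layout of `hits` and the boundary handling are assumptions; the evidence-code groups are copied from the table.

```python
from collections import defaultdict

# Evidence-code groups as listed in the table; the table order is kept, not a ranking claim.
GROUP_1 = {"IPI", "IMP", "IGI", "IEP", "ISO", "TAS"}
GROUP_2 = {"ISS", "ISA", "ISM", "IGC", "IBA", "IBD", "IKR", "IRD", "RCA", "NAS", "IC", "IEA"}
IDENTITY_BANDS = [(30, 50), (50, 70), (70, 90), (90, 100)]


def identity_band(identity, coverage):
    """Identity band for a hit with > 40% coverage, else None.

    Assumption: lower bounds are inclusive and 100% falls in the last band.
    """
    if coverage <= 40:
        return None
    for lo, hi in IDENTITY_BANDS:
        if lo <= identity < hi or identity == hi == 100:
            return (lo, hi)
    return None


def summarize_go_terms(hits):
    """hits: list of (hit_id, identity, coverage, [(go_id, evidence_code), ...])."""
    summary = defaultdict(lambda: {"hits": set(), "best_band": None, "groups": set()})
    for hit_id, identity, coverage, annotations in hits:
        band = identity_band(identity, coverage)
        if band is None:
            continue  # hit too distant or alignment too short
        for go_id, code in annotations:
            entry = summary[go_id]
            entry["hits"].add(hit_id)
            if entry["best_band"] is None or band > entry["best_band"]:
                entry["best_band"] = band
            if code in GROUP_1:
                entry["groups"].add(1)
            elif code in GROUP_2:
                entry["groups"].add(2)
    for entry in summary.values():
        entry["three_or_more_hits"] = len(entry["hits"]) >= 3
    return dict(summary)
```

Turning these collected features into a single confidence score would require the per-criterion values from the original table, which this sketch deliberately does not guess.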
Confidence score of predicted Enzyme Commission numbers

| Criterion | Condition |
|---|---|
| Confidence score of the homologous Swiss-Prot hit used for EC number transfer | ≥ 6 and < 8 |
| | ≥ 8 and < 10 |
| Support from multiple hits | EC number assigned to at least three different Swiss-Prot hits |
| Ezypred prediction | No confidence assigned to the prediction |
| Agreement with Ezypred | EC number agrees with the Ezypred prediction |
| EFICAz prediction confidence | Low-confidence prediction |
| | 0.6 to 0.7 |
| | 0.7 to 0.8 |
| | 0.8 to 0.9 |
| | 0.9 to 1 |
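A similar aggregation applies to EC numbers: count the distinct Swiss-Prot hits supporting each EC number, keep the best homolog score, check agreement with Ezypred, and place the EFICAz confidence in the table's bands. This is a sketch under several assumptions: hits scoring below 6 are ignored because the table's lowest band starts there, agreement with a partial Ezypred prediction (e.g. "3.1") is treated as a match on the leading EC levels, and EFICAz values below 0.6 are labeled "low". All names are hypothetical.

```python
EFICAZ_BANDS = [(0.6, 0.7), (0.7, 0.8), (0.8, 0.9), (0.9, 1.0)]


def eficaz_band(confidence):
    """Band for an EFICAz confidence in [0, 1]; values below 0.6 are labeled 'low'."""
    for lo, hi in EFICAZ_BANDS:
        if lo <= confidence < hi or confidence == hi == 1.0:
            return (lo, hi)
    return "low"


def ec_agrees(full_ec, predicted_ec):
    """True if a possibly partial prediction ('3.1' or '3.1.-.-') matches full_ec's leading levels."""
    if not predicted_ec:
        return False
    levels = [p for p in predicted_ec.split(".") if p != "-"]
    return bool(levels) and full_ec.split(".")[:len(levels)] == levels


def summarize_ec(swissprot_hits, ezypred_ec=None, min_score=6.0):
    """swissprot_hits: list of (hit_id, homolog_confidence_score, ec_number)."""
    support = {}
    for hit_id, score, ec in swissprot_hits:
        if score < min_score:  # assumption: below the table's lowest band, not used
            continue
        entry = support.setdefault(ec, {"hits": set(), "best_score": score})
        entry["hits"].add(hit_id)
        entry["best_score"] = max(entry["best_score"], score)
    for ec, entry in support.items():
        entry["three_or_more_hits"] = len(entry["hits"]) >= 3
        entry["agrees_with_ezypred"] = ec_agrees(ec, ezypred_ec)
    return support
```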
Fifth, when the query sequence is from a user-specified organism with available complete genome sequence, MESSA will map the query to its genomic locus by BLAST and show the genomic context of the query through NCBI E-utilities. Moreover, MESSA sends the query sequence to the STRING server to predict the functionally associated proteins.
Sixth, homologous protein families are detected from the Conserved Domain database by RPS-BLAST (e-value cutoff: 0.005) and HHpred server (probability cutoff: 90%). These protein families and protein domains are mapped to the query sequence.
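Mapping confident family hits onto the query can be sketched as follows, using the cut-offs stated above (RPS-BLAST e-value 0.005, HHpred probability 90%). The `FamilyHit` record, the family names and the choice of inclusive cut-offs are assumptions for illustration; the real tools report richer alignment data than a query start and end.

```python
from dataclasses import dataclass


@dataclass
class FamilyHit:
    """Hypothetical summary of one family/domain hit."""
    family: str
    method: str     # "RPS-BLAST" or "HHpred"
    score: float    # e-value for RPS-BLAST, probability (%) for HHpred
    q_start: int    # 1-based inclusive query range covered by the alignment
    q_end: int


def is_confident(hit, max_evalue=0.005, min_prob=90.0):
    """Apply the per-method cut-offs; boundary values are treated as passing (assumption)."""
    if hit.method == "RPS-BLAST":
        return hit.score <= max_evalue
    if hit.method == "HHpred":
        return hit.score >= min_prob
    return False


def map_families(length, hits):
    """For each query position (1-based), list the confident families aligned there."""
    positions = {i: [] for i in range(1, length + 1)}
    for hit in hits:
        if not is_confident(hit):
            continue
        for i in range(max(1, hit.q_start), min(length, hit.q_end) + 1):
            if hit.family not in positions[i]:
                positions[i].append(hit.family)
    return positions


hits = [
    FamilyHit("fam_A", "RPS-BLAST", 1e-10, 5, 20),
    FamilyHit("fam_B", "HHpred", 95.0, 15, 30),
    FamilyHit("fam_C", "RPS-BLAST", 0.1, 1, 40),  # fails the e-value cut-off
]
domain_map = map_families(40, hits)
```

Overlapping hits are kept side by side (positions 15 to 20 carry both `fam_A` and `fam_B`), leaving the choice of domain boundaries to the display or to the user.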
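Evaluation of homology modeling templates

| Criterion | Condition |
|---|---|
| Sequence identity (BLAST, RPS-BLAST and HHsearch) | 20% to 40% |
| | 40% to 60% |
| | 60% to 80% |
| | 80% to 90% |
| HHsearch probability | 80% to 85% |
| | 85% to 90% |
| | 90% to 99% |
| | 99% to 99.99% |
| | 99.99% to 100% |
| BLAST and RPS-BLAST e-value | 1e-2 to 1e-6 |
| | 1e-6 to 1e-18 |
| | 1e-18 to 1e-54 |
| Consensus between methods | Predicted by two methods |
| | Predicted by three methods |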
Finally, the results from all these procedures are parsed and presented as two web pages: the first one presents all the original results; the second one contains the final consensus-based predictions by integrating the results from different predictors.
Application of MESSA to the proteome of Ca. L. asiaticus and manual curation
All sequences of Ca. L. asiaticus proteins predicted by the NCBI gene prediction pipeline were downloaded from the GenBank database, and additional proteins detected by the SEED (a genome annotation web service based on subsystems) [71, 72] but missed by NCBI were added. Relevant information about these proteins was obtained from NCBI, the SEED and the Kyoto Encyclopedia of Genes and Genomes (KEGG) [74, 75]. Each protein was analyzed by MESSA, and the results were assembled into a website.
Based on the information from this website, we manually curated the functional assignment, predicted the subcellular localization and selected structure templates for each protein. Functional annotations were based mainly on close relationships to particular protein families or to reviewed entries in the Swiss-Prot database. Each relationship was verified both by the statistical significance, coverage and alignment quality of the match between the Ca. L. asiaticus protein and the identified proteins or families, and by the consensus between different methods. Where methods disagreed or statistical support was marginal, identification of conserved sequence motifs, inspection of the predicted structure and clustering of homologous proteins were used to obtain function predictions.
Project name: MEta-Server for protein Sequence Analysis (MESSA)
Project home page: http://prodata.swmed.edu/MESSA/
Operating system(s): Platform independent; MESSA is a web server accessed through a web browser.
Other requirements: The server has been tested on Mozilla Firefox (version >= 12.0), Microsoft Internet Explorer (version >= 8.0), Google Chrome and Safari (version >= 5.0). Correct display of the results page requires the Java platform to be installed and enabled in the browser.
License: Academic Free License
Any restrictions to use by non-academics: The users need to provide an academic email address to initiate a job.
The project is supported by National Institutes of Health (GM094575 to NVG), the Citrus Research and Development Foundation (10-018-314 to NVG) and the Welch Foundation (I-1505 to NVG). QC is a Howard Hughes Medical Institute International Student Research fellow. We thank Jeremy Semeiks, Lisa N. Kinch and Bong-Hyun Kim for inspiring discussions and technical support. We are grateful to Janusz M. Bujnicki for helpful suggestions to improve the work.
- Liu X, Yue Y, Li B, Nie Y, Li W, Wu WH, Ma L: A G protein-coupled receptor is a plasma membrane receptor for the plant hormone abscisic acid. Science. 2007, 315 (5819): 1712-1716. 10.1126/science.1135882.
- Johnston CA, Temple BR, Chen JG, Gao Y, Moriyama EN, Jones AM, Siderovski DP, Willard FS: Comment on 'A G protein coupled receptor is a plasma membrane receptor for the plant hormone abscisic acid'. Science. 2007, 318 (5852): 914; author reply 914.
- Gao Y, Zeng Q, Guo J, Cheng J, Ellis BE, Chen JG: Genetic characterization reveals no role for the reported ABA receptor, GCR2, in ABA control of seed germination and early seedling development in Arabidopsis. Plant J. 2007, 52 (6): 1001-1013. 10.1111/j.1365-313X.2007.03291.x.
- Fischer D: Servers for protein structure prediction. Curr Opin Struct Biol. 2006, 16 (2): 178-182. 10.1016/j.sbi.2006.03.004.
- Kinch L, Yong Shi S, Cong Q, Cheng H, Liao Y, Grishin NV: CASP9 assessment of free modeling target predictions. Proteins. 2011, 79 (Suppl 10): 59-73.
- Cuff JA, Clamp ME, Siddiqui AS, Finlay M, Barton GJ: JPred: a consensus secondary structure prediction server. Bioinformatics. 1998, 14 (10): 892-893. 10.1093/bioinformatics/14.10.892.
- Ishida T, Kinoshita K: Prediction of disordered regions in proteins based on the meta approach. Bioinformatics. 2008, 24 (11): 1344-1348. 10.1093/bioinformatics/btn195.
- Klammer M, Messina DN, Schmitt T, Sonnhammer EL: MetaTM - a consensus method for transmembrane protein topology prediction. BMC Bioinformatics. 2009, 10: 314. 10.1186/1471-2105-10-314.
- Wallner B, Larsson P, Elofsson A: Pcons.net: protein structure prediction meta server. Nucleic Acids Res. 2007, 35 (Web Server issue): W369-374.
- Ginalski K, Elofsson A, Fischer D, Rychlewski L: 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics. 2003, 19 (8): 1015-1018. 10.1093/bioinformatics/btg124.
- Wass MN, Barton G, Sternberg MJ: CombFunc: predicting protein function using heterogeneous data sources. Nucleic Acids Res. 2012, 40 (Web Server issue): W466-470.
- Pal D, Eisenberg D: Inference of protein function from protein structure. Structure. 2005, 13 (1): 121-130. 10.1016/j.str.2004.10.015.
- Rost B, Liu J: The PredictProtein server. Nucleic Acids Res. 2003, 31 (13): 3300-3304. 10.1093/nar/gkg508.
- Letunic I, Doerks T, Bork P: SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res. 2012, 40 (Database issue): D302-305.
- Kurowski MA, Bujnicki JM: GeneSilico protein structure prediction meta-server. Nucleic Acids Res. 2003, 31 (13): 3305-3307. 10.1093/nar/gkg557.
- Duan Y, Zhou L, Hall DG, Li W, Doddapaneni H, Lin H, Liu L, Vahling CM, Gabriel DW, Williams KP, Dickerman A, Sun Y, Gottwald T: Complete genome sequence of citrus huanglongbing bacterium, 'Candidatus Liberibacter asiaticus' obtained through metagenomics. Mol Plant Microbe Interact. 2009, 22 (8): 1011-1020. 10.1094/MPMI-22-8-1011.
- An example of MESSA result pages. [http://prodata.swmed.edu/MESSA/results/example.html]
- Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012, 40 (Database issue): D71-75.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
- Schnoes AM, Brown SD, Dodevski I, Babbitt PC: Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput Biol. 2009, 5 (12): e1000605. 10.1371/journal.pcbi.1000605.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
- Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S: AmiGO: online access to ontology and annotation data. Bioinformatics. 2009, 25 (2): 288-289. 10.1093/bioinformatics/btn615.
- Shen HB, Chou KC: EzyPred: a top-down approach for predicting enzyme functional classes and subclasses. Biochem Biophys Res Commun. 2007, 364 (1): 53-59. 10.1016/j.bbrc.2007.09.098.
- Arakaki AK, Huang Y, Skolnick J: EFICAz2: enzyme function inference by a combined approach enhanced by machine learning. BMC Bioinformatics. 2009, 10: 107. 10.1186/1471-2105-10-107.
- Tian W, Arakaki AK, Skolnick J: EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference. Nucleic Acids Res. 2004, 32 (21): 6226-6239. 10.1093/nar/gkh956.
- Bairoch A: The ENZYME database in 2000. Nucleic Acids Res. 2000, 28 (1): 304-305. 10.1093/nar/28.1.304.
- Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, Doerks T, Stark M, Muller J, Bork P, Jensen LJ, von Mering C: The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011, 39 (Database issue): D561-568.
- Ouzounis CA, Coulson RM, Enright AJ, Kunin V, Pereira-Leal JB: Classification schemes for protein structure and function. Nat Rev Genet. 2003, 4 (7): 508-519.
- Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003, 4: 41. 10.1186/1471-2105-4-41.
- Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Lu F, Marchler GH, Mullokandov M, Omelchenko MV, Robertson CL, Song JS, Thanki N, Yamashita RA, Zhang D, Zhang N, Zheng C, Bryant SH: CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 2011, 39 (Database issue): D225-229.
- Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer EL, Eddy SR, Bateman A, Finn RD: The Pfam protein families database. Nucleic Acids Res. 2012, 40 (D1): D290-D301. 10.1093/nar/gkr1065.
- Klimke W, Agarwala R, Badretdin A, Chetvernin S, Ciufo S, Fedorov B, Kiryutin B, O'Neill K, Resch W, Resenchuk S, Schafer S, Tolstoy I, Tatusova T: The National Center for Biotechnology Information's Protein Clusters Database. Nucleic Acids Res. 2009, 37 (Database issue): D216-223.
- Letunic I, Doerks T, Bork P: SMART 6: recent updates and new developments. Nucleic Acids Res. 2009, 37 (Database issue): D229-232.
- Marchler-Bauer A, Bryant SH: CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004, 32 (Web Server issue): W327-331.
- Soding J: Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005, 21 (7): 951-960. 10.1093/bioinformatics/bti125.
- Remmert M, Biegert A, Hauser A, Soding J: HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2012, 9 (2): 173-175.
- Zhang Y, Skolnick J: Automated structure prediction of weakly homologous proteins on a genomic scale. Proc Natl Acad Sci USA. 2004, 101 (20): 7594-7599. 10.1073/pnas.0305695101.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res. 2000, 28 (1): 235-242. 10.1093/nar/28.1.235.
- Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995, 247 (4): 536-540.
- Jmol: an open-source Java viewer for chemical structures in 3D. [http://www.jmol.org/]
- Apply for MODELLER key. [http://salilab.org/modeller/registration.html]
- Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A: Comparative protein structure modeling using MODELLER. Curr Protoc Protein Sci. 2007, Chapter 2: Unit 2.9.
- Eswar N, Eramian D, Webb B, Shen MY, Sali A: Protein structure modeling with MODELLER. Methods Mol Biol. 2008, 426: 145-159. 10.1007/978-1-60327-058-8_8.
- Predictive analysis of the Candidatus Liberibacter asiaticus proteome. [http://prodata.swmed.edu/liberibacter_asiaticus/]
- Natale P, Bruser T, Driessen AJ: Sec- and Tat-mediated protein secretion across the bacterial cytoplasmic membrane--distinct translocases and mechanisms. Biochim Biophys Acta. 2008, 1778 (9): 1735-1756. 10.1016/j.bbamem.2007.07.015.
- The MESSA result page for the hypothetical ribosomal protein L35 from Candidatus Liberibacter asiaticus. [http://prodata.swmed.edu/liberibacter_asiaticus/254780319.html]
- Sharma MR, Wilson DN, Datta PP, Barat C, Schluenzen F, Fucini P, Agrawal RK: Cryo-EM study of the spinach chloroplast ribosome reveals the structural and functional roles of plastid-specific ribosomal proteins. Proc Natl Acad Sci USA. 2007, 104 (49): 19315-19320. 10.1073/pnas.0709856104.
- Broome-Smith JK, Gnaneshan S, Hunt LA, Mehraein-Ghomi F, Hashemzadeh-Bonehi L, Tadayyon M, Hennessey ES: Cleavable signal peptides are rarely found in bacterial cytoplasmic membrane proteins (review). Mol Membr Biol. 1994, 11 (1): 3-8. 10.3109/09687689409161023.
- Jones DT: Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 1999, 292 (2): 195-202. 10.1006/jmbi.1999.3091.
- Pollastri G, Przybylski D, Rost B, Baldi P: Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins. 2002, 47 (2): 228-235. 10.1002/prot.10082.
- Linding R, Jensen LJ, Diella F, Bork P, Gibson TJ, Russell RB: Protein disorder prediction: implications for structural proteomics. Structure. 2003, 11 (11): 1453-1459. 10.1016/j.str.2003.10.002.
- Cheng J, Sweredoski M, Baldi P: Accurate prediction of protein disordered regions by mining protein structure data. Data Mining and Knowledge Discovery. 2005, 11 (3): 213-222. 10.1007/s10618-005-0001-y.
- Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT: Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol. 2004, 337 (3): 635-645. 10.1016/j.jmb.2004.02.002.
- Lobanov MY, Galzitskaya OV: The Ising model for prediction of disordered residues from protein sequence alone. Phys Biol. 2011, 8 (3): 035004. 10.1088/1478-3975/8/3/035004.
- Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001, 305 (3): 567-580. 10.1006/jmbi.2000.4315.
- von Heijne G: Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J Mol Biol. 1992, 225 (2): 487-494. 10.1016/0022-2836(92)90934-C.
- Tusnady GE, Simon I: Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J Mol Biol. 1998, 283 (2): 489-506. 10.1006/jmbi.1998.2107.
- Jones DT: Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics. 2007, 23 (5): 538-544. 10.1093/bioinformatics/btl677.
- Nugent T, Jones DT: Transmembrane protein topology prediction using support vector machines. BMC Bioinformatics. 2009, 10: 159. 10.1186/1471-2105-10-159.
- Kall L, Krogh A, Sonnhammer EL: A combined transmembrane topology and signal peptide prediction method. J Mol Biol. 2004, 338 (5): 1027-1036. 10.1016/j.jmb.2004.03.016.
- Bendtsen JD, Nielsen H, von Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 3.0. J Mol Biol. 2004, 340 (4): 783-795. 10.1016/j.jmb.2004.05.028.
- Wootton JC: Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput Chem. 1994, 18 (3): 269-285. 10.1016/0097-8485(94)85023-2.
- Lupas A, Van Dyke M, Stock J: Predicting coiled coils from protein sequences. Science. 1991, 252 (5009): 1162-1164. 10.1126/science.252.5009.1162.
- Pei J, Grishin NV: AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics. 2001, 17 (8): 700-712. 10.1093/bioinformatics/17.8.700.
- Aloy P, Stark A, Hadley C, Russell RB: Predictions without templates: new folds, secondary structure, and contacts in CASP5. Proteins. 2003, 53 (Suppl 6): 436-456.
- Moreno-Hagelsieb G, Latimer K: Choosing BLAST options for better detection of orthologs as reciprocal best hits. Bioinformatics. 2008, 24 (3): 319-324. 10.1093/bioinformatics/btm585.
- Salichos L, Rokas A: Evaluating ortholog prediction algorithms in a yeast model clade. PLoS One. 2011, 6 (4): e18755. 10.1371/journal.pone.0018755.
- Wilson D, Pethica R, Zhou Y, Talbot C, Vogel C, Madera M, Chothia C, Gough J: SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res. 2009, 37 (Database issue): D380-386.
- Besemer J, Lomsadze A, Borodovsky M: GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 2001, 29 (12): 2607-2618. 10.1093/nar/29.12.2607.
- Data for Candidatus Liberibacter asiaticus from the GenBank database. [ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Candidatus_Liberibacter_asiaticus_psy62_uid29835]
- Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O: The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 2008, 9: 75. 10.1186/1471-2164-9-75.
- Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crecy-Lagard V, Diaz N, Disz T, Edwards R, Fonstein M, Frank ED, Gerdes S, Glass EM, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N, Krause L, Kubal M, Larsen N, Linke B, McHardy AC, Meyer F, Neuweger H, Olsen G, Olson R, Osterman A, Portnoy V: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 2005, 33 (17): 5691-5702. 10.1093/nar/gki866.
- Data for Candidatus Liberibacter asiaticus proteins in NCBI Nucleotide database. [http://www.ncbi.nlm.nih.gov/nuccore/CP001677]
- Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28 (1): 27-30. 10.1093/nar/28.1.27.
- Data for Candidatus Liberibacter asiaticus proteins in the KEGG database. [http://www.genome.jp/kegg-bin/show_genomemap_top?org_id=las]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.