Charged residues next to transmembrane regions revisited: “Positive-inside rule” is complemented by the “negative inside depletion/outside enrichment rule”
© Eisenhaber et al. 2017
Received: 3 July 2017
Accepted: 7 July 2017
Published: 24 July 2017
The Erratum to this article has been published in BMC Biology 2017 15:72
Transmembrane helices (TMHs) frequently occur amongst protein architectures as a means for proteins to attach to or embed into biological membranes. Physical constraints such as the membrane’s hydrophobicity and electrostatic potential apply uniform requirements to TMHs and their flanking regions; consequently, they are mirrored in their sequence patterns (in addition to TMHs being a span of generally hydrophobic residues) on top of variations enforced by the specific protein’s biological functions.
With statistics derived from a large body of protein sequences, we demonstrate that, in addition to the positive charge preference at the cytoplasmic inside (positive-inside rule), negatively charged residues preferentially occur or are even enriched at the non-cytoplasmic flank or, at least, they are suppressed at the cytoplasmic flank (negative-not-inside/negative-outside (NNI/NO) rule). As negative residues are generally rare within or near TMHs, the statistical significance is sensitive with regard to details of TMH alignment and residue frequency normalisation and also to dataset size; therefore, this trend was obscured in previous work. We observe variations amongst taxa as well as for organelles along the secretory pathway. The effect is most pronounced for TMHs from single-pass transmembrane (bitopic) proteins compared to those with multiple TMHs (polytopic proteins) and especially for the class of simple TMHs that evolved for the sole role as membrane anchors.
The charged-residue flank bias is only one of the TMH sequence features with a role in the anchorage mechanisms, others apparently being the leucine intra-helix propensity skew towards the cytoplasmic side, tryptophan flanking as well as the cysteine and tyrosine inside preference. These observations will stimulate new prediction methods for TMHs and protein topology from a sequence as well as new engineering designs for artificial membrane proteins.
Two decades ago, the classic concept of a transmembrane helical region was a rather simple story: Typical transmembrane proteins were thought to be anchored in the membrane by membrane-spanning bundles of non-polar α-helices of roughly 20 residues in length, with a consistent orientation of being perpendicular to the membrane surface. Although this is broadly true, hundreds of high-quality membrane structures have shown that membrane-embedded helices can adopt a plethora of lengths and orientations within the membrane. They are capable of just partially spanning the membrane, spanning it at oblique angles, and even lying flat on the membrane surface [1, 2]. The insertion and formation of the transmembrane helices (TMHs) follow a complex thermodynamic equilibrium. From the biological function point of view, many TMHs have multiple roles besides being just hydrophobic anchors; for example, certain TMHs have been identified as regulators of protein quality control and trafficking mechanisms. As these additional biological functions are mirrored in the TMHs’ sequence patterns, TMHs can be classified as simple (just hydrophobic anchors) and complex sequence segments [5–7].
The relationship between sequence patterns in and in the vicinity of TMHs and their structural and functional properties, as well as their interaction with the lipid bilayer membrane, has been a field of intensive research in the last three decades. Besides the span of generally hydrophobic residues in the TMH, there are other trends in the sequence such as a saddle-like distribution of polar residues (depressed incidence of charged residues in the TMH itself), an enriched occurrence of positively charged residues in the cytosolic flanking regions as well as an increased likelihood of tryptophan and tyrosine at either flank edge [9–14]. These properties vary somewhat in length and intensity between various biological organelle membranes, between prokaryotes and eukaryotes and even amongst eukaryotic species studied due to slightly different membrane constraints [9, 16]. These biological dispositions are exploitable in terms of transmembrane region prediction in query protein sequences [17, 18], and tools such as the quite reliable TMHMM (software for predicting TMHs based on a hidden Markov model), Phobius or the dense alignment surface-transmembrane filter (DAS-TMfilter) represent today’s prediction limit of TMHs’ hydrophobic cores within the protein sequence [19–25]. The prediction accuracy for true positives and negatives is reported to be close to 100%, and the remaining main cause of false positive prediction is hydrophobic α-helices completely buried in the hydrophobic core of proteins. Note that reliable prediction of TMHs and protein topology strongly constrains the possible function of even otherwise non-characterised proteins [26–28] and thus provides very valuable information.
The “positive-inside rule” reported by von Heijne [2, 12] postulates the preferential occurrence of positively charged residues (lysine and arginine) at the cytoplasmic edge of TMHs. The practical value of positively charged residue sequence clustering in topology prediction of TMHs was first shown for the plasmalemma in bacteria [12, 29]. As a trend, the positive-inside rule has since been confirmed with statistical observations for most membrane proteins and biological membrane types [13, 30–32]. However, more recent evidence suggests that, in thylakoid membranes, the positive-inside rule is less applicable due to the co-occurrence of aspartic acid and glutamic acid residues together with positively charged residues.
The positive-inside rule also received support from protein engineering experiments that revealed conclusive evidence for positive charges as a topological determinant [12, 33–35]. Mutational experiments demonstrated that charged residues, when inserted into the centre of the helix, had a large effect on insertion capabilities of the TMH via the translocon. Insertion becomes more unfavourable when the charge is placed closer to the TMH core.
It remains unclear exactly why and how the positive charge determines topology from a biophysical perspective. Positively charged residues are suggested to be stronger determinants of topology than negatively charged residues due to a dampening of the translocation potential of negatively charged residues. This dampening factor is the result of protein-lipid interactions with the net-zero-charged phospholipid phosphatidylethanolamine and other neutral lipids. This effect favours cytoplasmic retention of positively charged residues.
The recent accumulation of transmembrane protein sequences and structures allowed us to revisit the problem of charged residue distribution in TMHs (see also http://blanco.biomol.uci.edu/mpstruc/). For example, whilst β-sheets contain charged residues in the transmembrane region, α-helices generally do not. Large-scale sequence analysis of TMHs from various organelle membrane surfaces in eukaryotic proteomes confirms the clustering of positive charge having a statistical bias for the cytosolic side of the membrane. At the same time, there are many TMHs that are exceptions to the positive-inside rule; however, as a trend, topology can be determined by simply looking for the most positive loop region between helices [9, 13].
When the observation of positively charged residues preferentially localised at the cytoplasmic edge of TMHs emerged, it was also asked whether negatively charged residues work in concert with TMH orientation. It was shown that a single additional lysine residue can reverse the topology of a model Escherichia coli protein, whereas many more negatively charged residues are needed to achieve the same. Nevertheless, a sufficiently large negative charge can overturn the positive-inside rule [39, 40]; thus indeed, negative residues are topologically active to a point. Negatively charged residues were observed in the flanks of TMHs, especially in those of marginally hydrophobic transmembrane regions. It is known that the negatively charged acidic residues in transmembrane regions have a non-trivial role in the biological context. In E. coli, negative residues experience electrical pulling forces when travelling through the SecYEG translocon, indicating that negative charges are biologically relevant during the electrostatic interactions of insertion [42, 43].
Unfortunately, there is a problem with statistical evidence for preferential negative charge occurrence next to TMH regions. Early investigations indicated that overall both positive and negative charge were influential topology factors; this idea was dubbed the charge balance rule. If true, one would also expect to see a skew in the negative charge distribution if a cooperation between oppositely charged residues oriented a TMH [29, 44]. It might be expected that, if positive residues force the loop or tail to stay inside, negative residues would be drawn outside, and the topology would be determined, not unlike electrophoresis. Yet, there are plenty of individual protein examples but no conclusive statistical evidence in the current literature for a negatively charged skew [9, 13, 14, 16, 31, 45].
There are many observations described in the literature that charged residues determine topology more predictably in single-pass proteins than in multi-pass TMHs [40, 46]. It is thought that the charges only determine the initial orientation of the TMH in the biological membrane; yet, the ultimate orientation must be determined together with the totality of subsequent downstream regions.
With sequence-based hydrophobicity and volume analysis and consensus sequence studies, Sharpe et al. demonstrated that there is asymmetry in the intra-membranous space of some membranes. Crucially, this asymmetry differs amongst the membranes of various organelles. They conclude that there are general differences between the lipid composition and organisation in membranes of the Golgi and endoplasmic reticulum (ER). Functional aspects are also important. For example, the abundance of serines in the region following the luminal end of Golgi TMHs appears to reflect the fact that this part of many Golgi enzymes forms a flexible linker that tethers the catalytic domain to the membrane.
A study by Baeza-Delgado et al. analysed the distribution of amino acid residue types in TMHs in 170 integral membrane proteins from a manually maintained database of experimentally confirmed TMPs (MPtopo) as well as in 930 structures from the Protein Data Bank (PDB). As expected, half of the natural amino acids are equally distributed along TMHs, whereas aromatic, polar and charged amino acids along with proline are biased near the flanks of the TMHs. Unsurprisingly, leucine and other non-polar residues are far more abundant than the charged residues in the transmembrane region.
In this work, we revisit the issue of statistical evidence for the preferential distribution of negatively charged (and a few other) residues within and nearby TMHs. We rely on the improved availability of comprehensive and large sequence and structure datasets for transmembrane proteins. We also show that several methodological aspects have hindered previous studies [9, 13, 16] from seeing the consistent non-trivial skew for negatively charged residues disfavouring the cytosolic interfacial region and/or preferring the outside flank. First, we show that acidic residues are especially rare within and in the close sequence environment of TMHs, even when compared to positively charged lysine and arginine. Second, therefore, the manner of normalisation is critical: Taken together with the difficulty of properly aligning TMHs relative to their boundaries, column-wise frequency calculations relative to all amino acid types as in previous studies will blur possible preferential localisations of negative charges in the sequence. However, the outcome changes when we ask where a negative charge occurs in the sequence relative to the total amount of negative charges in the respective sequence region. Thus, by accounting for the rarity of acidic residues with sensitive normalisation, the “non-negative inside rule/negative-outside rule” is clearly supported by the statistical data. We find that minor changes in the flank definitions, such as taking the TMH boundaries from the database or generating flanks by centrally aligning TMHs and applying some standardised TMH length, do not have a noticeable influence on the charge bias detected.
Third, there are significant differences in the distribution of amino acid residues between single-pass and multi-pass transmembrane regions in both the intra-membrane helix and the flanking regions, with further variations introduced by taxa and by the organelles along the secretory pathway. Importantly, we find that it is critical to weigh down the effect of TMHs in multi-pass transmembrane proteins with no or super-short flanks to observe statistical significance for the charge bias. Bluntly stated, if there are no flanks of sufficient length, there is also no negative charge bias to be observed.
The charge bias effect is even clearer when a classification of TMHs into so-called simple TMHs (which, as a trend, are mostly single-pass and mere anchors) and so-called complex ones (which typically have functions beyond anchorage) is considered [5–7]. We also observe parallel skews with regard to leucine, tyrosine, tryptophan and cysteine distributions. With these large-scale datasets and a sensitive normalisation approach, new sequence features are revealed that provide spatial insight into TMH membrane anchoring, recognition, helix-lipid, and helix-helix interactions.
Acidic residues within and nearby TMH segments are rare
[Table 1: Acidic residues are rarer in TMHs of single-pass proteins than in TMHs of multi-pass proteins. Columns: H statistic and P value for each of acidic residues (D and E), aspartic acid (D only) and glutamic acid (E only).]
The effect is most pronounced in single-pass transmembrane proteins (Fig. 1a). There are only 666 glutamates (just 1.24% of all residues) and 560 aspartates (1.05% respectively) amongst the total set of 53,238 residues comprising 1705 TMHs and their flanks. Within just the TMH regions, there are 71 glutamates (0.20% of all residues in TMHs and flanks) and 58 aspartates (0.16% respectively). This cannot be an artefact of UniProt TMH assignments since this feature is repeated in ExpAll. There are only 582 glutamates (1.22%) and 520 aspartates (1.09%) amongst the 47,568 residues involved. Within the TMH itself, there are 64 glutamates (0.19%) and 69 aspartates (0.21%). In both cases, the negatively charged residues represent the ultimate end of the distribution. Note that acidic residues are rare even compared to positively charged residues, which are about three to four times more frequent. On a much smaller dataset of single-spanning transmembrane proteins, Nakashima and Nishikawa made similar compositional studies. To compare, they found 0.94% glutamate and 0.94% aspartate within just the TMH region (these values are very similar to ours from TMHs with small flanks; apparently, they used more outwardly defined TMH boundaries), but the content of each of glutamate and aspartate within the extracellular or cytoplasmic domains is larger by an order of magnitude, between 5.26% and 9.34%. These latter values tend to be even higher than the average glutamate and aspartate composition throughout the protein database (5–6%).
In the case of multi-pass transmembrane proteins (Fig. 1c and d), glutamates and aspartates are still very rare in TMHs and their ±5 residue flanks (1.94% and 1.92% from the total of 377,207 in the case of UniHuman, 1.79% and 1.70% from the total of 454,700 in the case of ExpAll). Yet, their occurrence is similar to those of histidine and tryptophan and, notably, acidic residues are only about 1.5 times less frequent than positively charged residues. The observation that acidic residues are more suppressed in single-pass TMHs compared with multi-pass TMHs is statistically significant. In Table 1, the acidic residues are counted in the helices (excluding flanking regions) belonging to either multi-pass or single-pass helices. Indeed, single-pass helices appear to tolerate negative charge to a far lesser extent than multi-pass helices, as the data in the top two rows of Table 1 indicate (for datasets UniHuman and ExpAll). The trend is strictly observed throughout subcellular localisations (rows 3–5 in Table 1) and taxa (rows 6–10). Statistical significance (P ≤ 0.001) is found in all but six cases. These are UniEcoli (D + E, D, E), UniArch (D + E, E) and UniFungi (E). The problem is, most likely, that the respective datasets are quite small. Notably, the difference between single- and multi-pass TMHs is greatest in UniPM; here, TMHs from multi-pass proteins have on average 0.400 negative residues per helix, whereas single-pass TMHs contain just 0.039 (P = 3.86e-28).
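The comparison of acidic residue counts per helix can be illustrated with a small sketch. Table 1 reports an H statistic, which suggests a Kruskal-Wallis test; the pure-Python implementation below, the function name `kruskal_wallis_two_groups` and the toy counts are assumptions made for illustration and are not taken from the authors' pipeline.

```python
import math
from itertools import chain


def kruskal_wallis_two_groups(a, b):
    """Return (H, P) of a Kruskal-Wallis test for two samples, with tie correction.

    With two groups, H has one degree of freedom, so the chi-square
    survival function reduces to erfc(sqrt(H / 2)).
    """
    pooled = sorted(chain(a, b))
    n = len(pooled)
    rank_of = {}   # value -> mid-rank shared by all tied observations
    tie_term = 0   # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j
    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        return 0.0, 1.0  # every observation identical: no evidence of a difference
    h = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in (a, b))
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    h = max(h / correction, 0.0)  # guard against tiny negative rounding errors
    return h, math.erfc(math.sqrt(h / 2))


# Hypothetical numbers of D + E residues per helix (toy data, not from the paper).
single_pass = [0, 0, 0, 0, 1, 0, 0, 0]
multi_pass = [0, 1, 0, 2, 1, 0, 1, 0]
h_stat, p_value = kruskal_wallis_two_groups(single_pass, multi_pass)
print(f"H = {h_stat:.3f}, P = {p_value:.3g}")
```

Because counts per helix are small integers with many ties, the tie correction matters here; without it, H would be underestimated.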
Amino acid residue distribution analysis reveals a “negative-not-inside/negative-outside” signal in single-pass TMH segments
The trends become clearer if the occurrence of specific residues is normalised with the total number of residues of the given amino acid type in the dataset observed in the sequence region studied as shown for UniHuman in Fig. 2c and for ExpAll in Fig. 2d. For comparison, we indicated background residue occurrences (dashed lines calculated as averages for positions –25 to –30 and 25 to 30). The respective average occurrences in the inside and outside flanks (calculated from an average of the values at positions –20 to –10 and 10 to 20 respectively) are shown with wide lines.
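The difference between the two normalisations can be made concrete with a minimal sketch. It assumes the TMHs and flanks are already available as equal-length, centrally aligned strings with `-` marking absent positions; the function names and the toy segments are hypothetical. The window bounds used in the text (for example –20 to –10 for the inside flank) would be passed to `window_mean` in the same way as the toy bounds below.

```python
def per_type_distribution(segments, residue):
    """Share (in %) of all `residue` occurrences that fall at each position.

    This is the normalisation relative to the total amount of the given
    residue type in the region studied.
    """
    width = len(segments[0])
    counts = [sum(seg[i] == residue for seg in segments) for i in range(width)]
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


def column_frequency(segments, residue):
    """Share (in %) of `residue` amongst all residues at each position.

    This is the column-wise normalisation relative to all amino acid types;
    gap characters are ignored.
    """
    width = len(segments[0])
    result = []
    for i in range(width):
        column = [seg[i] for seg in segments if seg[i] != "-"]
        result.append(100.0 * column.count(residue) / len(column) if column else 0.0)
    return result


def window_mean(values, positions, low, high):
    """Average of `values` over positions low..high (inclusive)."""
    picked = [v for v, p in zip(values, positions) if low <= p <= high]
    if not picked:
        raise ValueError("no positions inside the requested window")
    return sum(picked) / len(picked)


# Toy alignment (invented): position 0 is the TMH centre, negative is inside.
segments = ["KLLLD", "RLLLE", "KALLD", "-LLLD"]
positions = [-2, -1, 0, 1, 2]

d_by_type = per_type_distribution(segments, "D")
d_by_column = column_frequency(segments, "D")
outside_mean = window_mean(d_by_type, positions, 1, 2)
print(d_by_type, d_by_column, outside_mean)
```

For a rare residue type, the per-type profile sums to 100% over the region regardless of how rare the residue is, which is why positional preferences are easier to see than in the column-wise profile.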
The “positive-inside rule” becomes even more evident in this normalisation: Whereas the occurrence of positively charged residues is about the background level at the outside flank, it is about two to three times higher both for the UniHuman and the ExpAll datasets at the inside flank. Note that the background level was found to be 1.7% (lysine) and 1.6% (arginine) in UniHuman and 1.4% (lysine and arginine) in ExpAll. The inside flank average is 4.3% (lysine) and 4.6% (arginine) in UniHuman and 4.2% (lysine) and 4.6% (arginine) in ExpAll. The outside flank is similar to the background noise levels: about 1.4% (lysine) and 1.5% (arginine) in UniHuman and about 1.5% (lysine) and 1.4% (arginine) in ExpAll.
Most interestingly, a "negative-inside depletion" trend for the negatively charged residues is apparent from the distribution bias. The inside flank averages for glutamic acid were 1.1% and 1.4% in UniHuman and ExpAll respectively; for aspartic acid, 1.2% and 1.4% in UniHuman and ExpAll respectively. Meanwhile, the outside flanks for aspartic acid and glutamic acid occurrences were measured at 2.9% and 2.4% respectively in UniHuman, and in ExpAll, these values for aspartic acid and glutamic acid were found to be 2.5% and 2.1% respectively. Against the background level of aspartic acid (2.8% and 2.9% in UniHuman) and glutamic acid (2.6% and 2.9% in ExpAll), the inside flank averages were found to be about 2 to 3 times lower than the background level whilst the outside flank averages were comparable to the background level (Fig. 2c and d). Taken together, this indicates a clear suppression of negatively charged residues at the inside flank of single-pass TMHs and a possible trend for negatively charged residues occurring preferentially at the outside flank. This is not an effect of the flank definition selection since the trend remains the same when using the database-defined flanks without the context of the TMH (Fig. 2e and f). For UniHuman (Fig. 2e), the negative charge expectancy on the inside flank does not reach above 2% until position –10 (D) and position –11 (E), whereas, on the outside flank, both D and E start >2%. The same can be seen in ExpAll (Fig. 2f), where negative residues reach above 2% only as far from the membrane boundary as at position –9 (D) and position –7 (E) on the inside but exceed 2% beginning with positions 1 (D) and 3 (E) on the outside.
[Table 2: Statistical significances for negative charge distribution skew on either side of the membrane in single-pass TMHs. Column group: flanks after central alignment.]
Amino acid residue distribution analysis reveals a general negative charge bias signal in outside flank of multi-pass TMH segments: the negative-outside enrichment rule
With regard to the positive-inside preference, positively charged residues have a background value of 2.0% for arginine and 2.2% for lysine in UniHuman, and 1.7% for arginine and 1.9% for lysine in ExpAll. At the inside flank, this rises to 4.6% for arginine and 4.1% for lysine in UniHuman and 4.6% for arginine and 4.2% for lysine in ExpAll. The mean net charge at each position was calculated for multi-pass and single-pass datasets from UniHuman and ExpAll (Additional file 1: Figure S1). The positive-inside rule clearly becomes visible, as the net charge has a positive skew approximately between residues –10 and –25. What is noteworthy is that the peaks found for single-pass helices were almost three times greater than those of multi-pass helices. For single-pass TMHs, the peak is +0.30 at position –15 in UniHuman and +0.31 at position –14 in ExpAll, whereas TMHs from multi-pass proteins had lower peaks of +0.15 at position –13 in UniHuman and +0.10 at position –14 in ExpAll. Thus, there is a positive charge bias towards the cytoplasmic side; yet, it is much weaker for multi-pass than for single-pass TMHs.
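A mean net charge profile of this kind can be sketched as follows. The sketch assumes that lysine and arginine count as +1, aspartic acid and glutamic acid as –1 and every other residue (including histidine) as neutral; the exact charge assignment used for Additional file 1: Figure S1 is not stated in this section, and the function name and toy segments are hypothetical.

```python
# Assumed charge assignment: K/R positive, D/E negative, all others neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def mean_net_charge(segments):
    """Mean net charge per aligned position; gap characters ('-') are skipped."""
    width = len(segments[0])
    profile = []
    for i in range(width):
        column = [seg[i] for seg in segments if seg[i] != "-"]
        net = sum(CHARGE.get(res, 0) for res in column)
        profile.append(net / len(column) if column else 0.0)
    return profile


# Toy alignment (invented): negative positions are on the cytoplasmic side.
segments = ["KLD", "RLE", "ALA"]
positions = [-1, 0, 1]

profile = mean_net_charge(segments)
peak_value, peak_position = max(zip(profile, positions))
print(profile, peak_value, peak_position)
```

Reading off the maximum of the profile together with its position corresponds to the peak values and positions quoted in the text.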
[Table 3: Statistical significances for negative charge distribution skew on either side of the membrane in multi-pass TMHs. Column groups: flanks after central alignment; database-defined viable* flanks.]
Surprisingly, the result could not straightforwardly be repeated with the considerably smaller ExpAll. Under condition (1), we find with ExpAll that aspartic acid has a background level of 1.0%, an average of 2.6% on the inside flank and of 2.9% on the outside flank, whereas glutamic acid has a background level of 1.2%, an average of 2.8% on the inside flank and of 2.5% on the outside flank. Statistical tests do not support finding a negative charge bias in conditions (1) and (2). Apparently, the problem is TMHs having no or almost no flanks at one of the sides. Statistical significance for the negative charge bias is detected as soon as this problem is dealt with, either by allowing extended flanks to overlap amongst neighbouring TMHs as in condition (3) or by excluding examples without proper flank lengths from the dataset as in condition (4). The respective P values under these conditions are 2.05e-6 and 9.81e-15.
The issues we had with ExpAll raised the question of whether sequence redundancy in the UniHuman set may have played a role. Therefore, we repeated all calculations but with UniRef50 instead of UniRef90 for mapping into sequence clusters (see the Methods section for details). We were surprised to see that harsher sequence redundancy requirements do not affect the outcome of the statistical tests in any major way. For the conditions (1)–(4), we computed the following P values: (1) 1.31e-28 (5940 negatively charged residues inside vs 7492 outside), (2) 1.38e-36 (5516 vs 7320), (3) 5.60e-53 (7089 vs 9233) and (4) 4.18e-41 (4232 vs 5730).
So, the amplifying effect of some subsets in the overall dataset on the statistical test that might be caused by allowing overlapping flanks (condition (3)) is not the major factor leading to the negative charge skew. Similarly, the trend is also not caused by sequence redundancy. Thus, we have learned that the negative charge bias does also exist in multi-pass transmembrane proteins but only under the condition that there are sufficiently long loops between TMHs. Bluntly stated: No loops means no charge bias. As soon as the loops reach some critical length, the inside-suppression/outside-enrichment negative charge bias appears, although differences remain between single-pass and multi-pass TMHs with regard to the occurrence and distribution of negative charges. Not only are there more negative charges within the multi-pass TMH itself (in fact, negative charges are almost not tolerated in single-pass TMHs; see Table 1), but also, there is a much stronger negative-outside skew in the TMHs of single-pass proteins than in those of multi-pass proteins.
Further significant sequence differences between single-pass and multi-pass helices: distribution of tryptophan, tyrosine, proline and cysteine
In accordance with expectations, enrichment for hydrophobic residues in the TMH and for the positively charged residues on the inside flank, as well as the negative charge distribution bias, was found in both datasets. Additionally, the inside interfacial region showed consistent enrichment hotspots for tryptophan (e.g. 7.1% at position –11 in ExpAll, 6.2% at position –10 in UniHuman with flanks after central TMH alignment) and tyrosine (6.4% at –11 in ExpAll, 7.1% at –11 in UniHuman), and some preference can also be seen for the outer interfacial region (e.g. 5.2% at position 11 for tryptophan in ExpAll and 5.8% at position 10 for tryptophan in UniHuman), albeit the “hot” cluster of the outer flank covers fewer positions than that of the inner flank. Further, there is an apparent bias of cysteine on the inner flank and interfacial region (e.g. 5.5% at position –10 in ExpAll, 5.9% at position –11 in UniHuman) and a depression in the outer interfacial region and flank (down to a minimum of 0.3% in both ExpAll and UniHuman). Proline appears to have a depression signal on the outer flank. Note that, in a similar way to Figs. 2 and 3, the distributions of the flanks derived from centrally aligned TMHs are corroborated by the distributions from the database-defined TMH boundary flanks (see outside bands in Fig. 4a–d).
A similar heatmap was generated for UniHuman multi-pass TMHs (Fig. 4c; from 12,353 TMHs with flanks having 452,708 residues) and ExpAll multi-pass (Fig. 4d; from 15,563 TMHs with flanks having 535,599 residues). Whereas the heatmaps of Fig. 4a–c appear quite noisy, the plot for ExpAll multi-pass TMHs appears almost to have undergone Gaussian-like smoothing, thus indicating the quality of this dataset. Tyrosine and tryptophan in the multi-pass case do not appear as enriched in the interfacial regions as they are in single-pass TMHs, in both UniHuman and ExpAll. Prolines are only suppressed in the TMH itself and are not suppressed in the outer flank as in the single-pass case but, indeed, are tolerated if not slightly enriched in the flanks.
Hydrophobicity and leucine distribution in TMHs in single- and multi-pass proteins
Leucine is the most abundant residue in TMHs (Fig. 1) and is considered one of the most hydrophobic residues by all hydrophobicity scales. Therefore, it plays a very influential role in TMH helix-helix and lipid-helix interactions in the membrane and recognition by the insertion machinery. When looking at the difference in the abundance of leucine between the inner and outer halves, we find that TMHs from single-pass proteins have a trend to contain more leucine residues at the cytoplasmic side of TMHs (see Figs. 2 and 4).
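The inner versus outer half comparison for leucine can be sketched in a few lines. The sketch assumes that each TMH string is written from its cytoplasmic (inner) end to its non-cytoplasmic (outer) end and that, for odd lengths, the central residue is left out of both halves; the function names and the toy helices are hypothetical.

```python
def leucine_half_counts(tmh):
    """Count leucines in the inner and outer halves of one TMH.

    `tmh` is assumed to run from the cytoplasmic end to the non-cytoplasmic
    end. For odd lengths the central residue belongs to neither half.
    """
    half = len(tmh) // 2
    inner = tmh[:half].count("L")
    outer = tmh[len(tmh) - half:].count("L")
    return inner, outer


def leucine_skew(tmhs):
    """Total leucine counts (inner half, outer half) over a set of TMHs."""
    inner_total = outer_total = 0
    for tmh in tmhs:
        inner, outer = leucine_half_counts(tmh)
        inner_total += inner
        outer_total += outer
    return inner_total, outer_total


# Toy helices (invented), each oriented inside -> outside.
toy_tmhs = ["LLLAAVVL", "LLALA"]
print(leucine_skew(toy_tmhs))
```

A surplus in the first number of the returned pair would correspond to the cytoplasmic-side leucine preference described for single-pass TMHs.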
Leucines at the inner and outer leaflets of the membrane in TMHs
A negative-outside (or negative-not-inside) signal is present across many membrane types
We explored whether the amino acid residue compositional skews described above for human transmembrane proteins are also present in other taxa and, specifically for human proteins, in membranes at various subcellular localisations. Acidic residues for TMHs from single-pass and multi-pass helices were plotted according to their relative percentage distributions (of the total amount of this residue type in the respective segment) for five taxon-specific datasets: UniCress (Fig. 6a), UniFungi (Fig. 6b), UniEcoli (Fig. 6c), UniBacilli (Fig. 6d), UniArch (Fig. 6e), and for three organelle-specific datasets: UniER (Fig. 6f), UniGolgi (Fig. 6g), UniPM (Fig. 6h).
For single-pass proteins in all taxon-specific datasets (with the exception of UniArch), there are more negative residues at the outside than at the inside. The skew is statistically significant (see Table 2, P < 0.001) except for UniBacilli. However, despite statistical significance found for UniFungi (P value = 1.12e-7 for database-defined and P value = 6.79e-10 for flanks after central alignment; Table 2), the trend is not very strong in this case (Fig. 6b). Whereas the skew is just a suppression of negatively charged residues at the inside flank for ExpAll and UniHuman (as well as in UniCress), the bias observed for UniEcoli also involves a negative charge enrichment at the outside flank. In the case of UniArch (Fig. 6e), we see a negative inside preference that is 6.0% in the case of aspartic acid and 6.3% for glutamic acid (not shown), with much lower values close to 0% on the outside. Whilst the difference is statistically significant for both TMHs (Table 2) from single-pass proteins (P value = 1.83e-12 and P value = 1.43e-11 for two versions of flank determination) and multi-pass proteins (P values 4.72e-3, 7.81e-3, 1.28e-4 for three versions of flank determination, see Table 3A and B), the distribution along the position axis is heavily fluctuating, perhaps as a result of the small size of the dataset. However, one can assuredly assign a “negative-inside” tendency to the flanking regions of Archaean TMHs.
In the human organelle datasets, we see trend shifts at different stages in the secretory pathway. In UniER, there is an enrichment of negative charge on the outside flank of 1–1.5% that is comparable to the magnitude of the positive inside signal. In UniGolgi, there is a suppression of negatively charged residues on the inside flank as well as an enrichment on the outside flank resulting in ~2% distribution difference. For UniPM, there is a negative-inside suppression (but no outside enrichment) as well as a positive-inside signal. All observed trends are statistically significant (see Table 2, P < 1.e-5).
For multi-pass TMH proteins, we either see the same trends but in a weaker form, or no skews are observed at all, as inspection of the graphs in Fig. 6 shows. For datasets UniER, UniGolgi, UniCress, UniFungi and UniBacilli, the hypothesis of equal distribution of negatively charged residues cannot be rejected (P value > 0.001, see Table 3); thus, a skew is statistically non-significant. Although UniPM has a statistically significant bias (P value < 4.30e-12, Table 3), the trends are more subtle and most present for aspartic acid of UniPM. We see many more negative and positive charges tolerated within the multi-pass TMHs themselves throughout all datasets (Table 1). We note that there is a positive-inside rule for all multi-pass datasets studied herein.
To conclude, we find that negative charge bias distribution is a feature of single-pass protein TMHs that is present across many membrane types, and it can have the form of a negative charge suppression at the inside flank or an enrichment of those charges at the outside flank.
Amino acid compositional skews in relation to TMH complexity and anchorage function
[Table 5: Simple TMHs are less similar than complex TMHs to TMHs from multi-pass proteins in UniHuman. For each of the χ2, Kolmogorov-Smirnov and Kruskal-Wallis tests, the table gives P values and Bahadur slopes, including rows for R + K and D + E.]
[Table 6: Simple TMHs are less similar than complex TMHs to TMHs from multi-pass proteins in ExpAll. For each of the χ2, Kolmogorov-Smirnov and Kruskal-Wallis tests, the table gives P values and Bahadur slopes, including rows for R + K and D + E.]
The many low P values in Tables 5 and 6 indicate significant differences between the three distributions studied. For the UniHuman dataset (Table 5), we find the most striking, significant differences between charged residue distributions (R, K, D, E) of simple and complex single-pass TMH + flank regions (χ2 P value < 2.23e-3 for single amino acid types). Similarly, simple single-pass TMH + flank segments differ significantly from multi-pass TMH + flank segments (KW test P values < 3.e-2 for R, K, D, E, Y, W amino acid types as well as for K + R and D + E). The trends are the same for the ExpAll dataset (Table 6): simple and complex single-pass TMH + flank regions differ in charged amino acid type distributions (χ2 P value < 4.21e-3 for all cases), as do simple single-pass and multi-pass ones (KW test P values < 5.e-2 for R, D, E, Y, W amino acid types and D + E).
Whereas P value tests for significant differences between distributions depend strongly on the amount of data, the more informative Bahadur slopes that measure the distance from the null hypothesis are independent of the amount of data [55–57]. As we can see in Tables 5 and 6, the absolute Bahadur slopes for the simple single-pass to multi-pass comparison are always larger than those for the complex single-pass to multi-pass comparison (even by at least an order of magnitude): (1) for all three statistical tests applied (χ2, KS and KW), (2) for all amino acid types, for K + R and E + D and (3) for both datasets UniHuman and ExpAll. Thus, complex single-pass TMH + flanks have compositional properties that are indeed very similar to those of multi-pass ones (which are known to have a large fraction of complex TMHs [6, 7]). This strong evidence implies that the actual issue is not so much the distinction between single- and multi-pass TMH segments as that between simple and complex TMHs: the former are exclusively guided by the anchor requirements, whereas the latter have more complex restraints to fulfil.
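The data-size independence of the Bahadur slope can be illustrated with a short sketch. It assumes the slope is estimated as |ln(P)| / N, where N is the sample size; conventions differ by sign and by a constant factor (for example, a factor of 2), and the exact definition used for Tables 5 and 6 is not restated in this section. The function name and the numbers below are hypothetical.

```python
import math


def bahadur_slope(p_value: float, n: int) -> float:
    """Approximate absolute Bahadur slope, assumed here to be |ln(P)| / N."""
    if not (0.0 < p_value <= 1.0):
        raise ValueError("p_value must lie in (0, 1]")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return abs(math.log(p_value)) / n


# Hypothetical numbers: if the underlying effect is the same, P shrinks
# exponentially with N, so the slope stays (approximately) constant.
slope_small = bahadur_slope(1e-6, 1_000)
slope_large = bahadur_slope(1e-60, 10_000)

# The same P value obtained from more data corresponds to a weaker effect.
slope_diluted = bahadur_slope(1e-6, 100_000)
```

Under this assumption, a P value alone cannot rank two comparisons drawn from datasets of different size, whereas the slope can.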
Several distribution features of simple TMHs from single-pass proteins, when compared to complex TMHs from single-pass proteins and TMHs from multi-pass proteins, that contribute to the statistical differences (Fig. 7) are especially notable. There is a more pronounced trend for positively charged residues and tyrosine to be preferentially located on the inside flanks and for negatively charged residues to be on the outside flanks. The symmetrical peaks in the percentage distribution of tyrosine in complex single-pass TMHs are more akin to multi-pass TMHs, whereas in simple TMHs the distribution resembles a more typical single-pass helix (compare with Fig. 3). Furthermore, the depression of charged residues within the TMH itself is strongest in simple single-pass TMHs.
To emphasise, tryptophan is essentially not tolerated within the simple TMHs, and there are higher peaks of tryptophan occurrence at either flank. We also see a strong inside skew for leucine clustering within the core of simple TMHs which is not present in the “flatter” distributions of complex single-pass TMHs and TMHs from multi-pass proteins.
There is clearly a cysteine-inside preference for simple single-pass TMHs, but less so in complex single-pass and multi-pass TMHs (Fig. 7). This conclusion is contrary to that of a previous study, but that deduction was drawn from a much smaller dataset of 45 single-pass TMHs and 24 multi-pass transmembrane proteins.
The “negative-not-inside/negative-outside” skew in TMHs and their flanks is statistically significant
We have seen that, consistently throughout the datasets, there is a trend for generally rare negatively charged residues to prefer the outside flank of a TMH rather than the inside (and to almost completely avoid the TMH itself), be it by suppression on the inside and/or enrichment on the outside. The trend is much stronger in single-pass protein datasets than in multi-pass protein datasets. However, as we have elaborated, the real crux of the bias appears to be associated with the TMH being simple or complex [6, 7] and, thus, whether or not the TMH has a role beyond anchorage. The existence of this bias has implications for topology prediction of proteins with TMHs, engineering membrane proteins and also for models of protein transport via membranes and protein-membrane stability considerations.
It should be noted that the controversy in the scientific community about the existence of a negative charge bias at TMHs was mainly with regard to multi-pass transmembrane proteins. Despite having access to much larger, better annotated sequence datasets and many more three-dimensional (3D) structures than our predecessors, we also had our share of difficulties here (see the Results section titled “Amino acid residue distribution analysis reveals a general negative charge bias signal in outside flank of multi-pass TMH segments: the negative-outside enrichment rule” and Table 3). The straightforward approach results in inconclusive statistical tests if datasets become small (for example, if selections are restricted to subcellular localisations or 3D structures or if very harsh sequence redundancy criteria are applied) and, especially, if TMHs with very short or no flanks are included. Therefore, in the case of multi-pass proteins, we studied flanks as taken from the transmembrane boundaries in the databases under several conditions: (1) without allowing flank overlap between neighbouring TMHs, (2) as a subset of (1) but with requiring some minimal flank length at either side and (3) with overlapping flanks. We also studied flanks after central alignment of TMHs and assuming standardised TMH length. Multi-pass TMHs do not show a statistically significant negative charge bias under condition (1), apparently because many TMHs have no flank, or only a very short one, on at least one side. Significance appears as soon as subsets of TMHs with flanks at both sides are studied. Not surprisingly, there is no charge bias if there are no flanks in the first place. It is perhaps worth noting that the results from multi-pass TMHs with overlapping flanks may involve amplification of skews since this involves multiple counting of the same residues.
Given the redundancy threshold of UniRef90, we cannot rule out that these statistical skews are the result of a trend from only a small subgroup of TMPs which is being amplified. Hence, we also needed to check whether the same biases hold under condition (2), which is indeed the case.
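The three flank conditions for multi-pass proteins can be made concrete with a minimal sketch. It is not the authors' implementation: the function names, the flank length parameter, the rule of splitting a shared loop at its midpoint when overlap is forbidden, and the rule that overlapping flanks stop at the neighbouring TMH boundary are all assumptions made for illustration. Spans are 0-based and half-open, and "N-side" and "C-side" refer to sequence order before any inside/outside orientation.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # 0-based, half-open [start, end)


def flank_spans(seq_len: int, tmhs: List[Span], flank_len: int,
                allow_overlap: bool) -> List[Tuple[Span, Span]]:
    """Return (N-side flank, C-side flank) for each TMH, in sequence order."""
    tmhs = sorted(tmhs)
    result = []
    for i, (start, end) in enumerate(tmhs):
        left_limit, right_limit = 0, seq_len
        if i > 0:
            prev_end = tmhs[i - 1][1]
            if allow_overlap:
                left_limit = prev_end  # may use the whole loop
            else:
                left_limit = prev_end + (start - prev_end) // 2  # loop midpoint
        if i + 1 < len(tmhs):
            next_start = tmhs[i + 1][0]
            if allow_overlap:
                right_limit = next_start
            else:
                right_limit = end + (next_start - end) // 2  # same midpoint
        n_flank = (max(left_limit, start - flank_len), start)
        c_flank = (end, min(right_limit, end + flank_len))
        result.append((n_flank, c_flank))
    return result


def require_min_flanks(pairs: List[Tuple[Span, Span]],
                       min_len: int) -> List[Tuple[Span, Span]]:
    """Condition (2): keep only TMHs whose flanks on both sides reach min_len."""
    return [(n, c) for n, c in pairs
            if n[1] - n[0] >= min_len and c[1] - c[0] >= min_len]


# Hypothetical protein of 100 residues with three TMHs and a 4-residue loop.
example_tmhs = [(10, 30), (34, 54), (80, 90)]
cond1 = flank_spans(100, example_tmhs, flank_len=10, allow_overlap=False)
cond2 = require_min_flanks(cond1, min_len=5)
cond3 = flank_spans(100, example_tmhs, flank_len=10, allow_overlap=True)
```

In this toy case, condition (1) leaves the first two TMHs with 2-residue flanks around the short loop, condition (2) retains only the third TMH, and condition (3) assigns the same four loop residues to both neighbouring TMHs, which is the multiple counting mentioned above.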
As the “negative-not-inside/negative-outside” skew is widely observed amongst varying taxa and subcellular localisations with statistical significance, it appears, at least to a certain extent, to be caused by physical reasons and to be associated with the background membrane potential. Several earlier considerations and observations support this thought: (1) A concert between the negative and positive charge on the TMH flanks drives anchorage and the direction of insertion of engineered TMHs [29, 44]. (2) The inner leaflet of the plasmalemma tends to be more negatively charged. Specifically, phosphatidylserine was found to distribute in the cytosolic leaflets of the plasma membrane, and it was found to electrostatically interact with moderately positive-charged proteins enough to redirect the proteins into the endocytic pathway. The negative charge of proteins at the inside of the plasma membrane would decrease the anchoring potency of the TMH via electrostatic repulsion. (3) In membranes that maintain a membrane potential, there are inevitably electrical forces acting on charged residues during chain translocation, as this influences the translocon machinery when orienting the TMH. Therefore, it is no surprise that we see an inside-outside bias for negatively charged residues that is opposite to the one for positively charged residues. The negative charges in TMH residues have been shown to experience an electrical pulling force as they pass through the bacterial SecYEG translocon during import [42, 43]. Also, they are known to be involved in intra-membrane helix-helix interactions. For example, aspartic acid and glutamic acid can drive efficient di- or trimerisation of TMHs in lipid bilayers and, furthermore, aspartic acid interactions with neighbouring TMHs can directly increase insertion efficiency of marginally hydrophobic TMHs via the Sec61 translocon.
In support of this, fewer acidic residues are found in single-pass TMHs, amongst which only some will undergo intra-membrane helix-helix interactions. As mutation studies have shown negative charge to be a topological determinant, it is perhaps no surprise that we observe a skew in negatively charged residues in a similar manner to the skew in positively charged residues.
Whereas the “negative-not-inside/negative-outside” skew is observed for distantly related eukaryotic species and is also present in Gram-negative bacteria such as E. coli, this sequence pattern was not observed for the Gram-positive bacteria, in which there is no observable bias. In contrast, Archaea have a statistically significant “negative-inside” propensity both for single- and multi-pass TM proteins. It is known that Archaea have remarkably different membranes compared to other kingdoms of life due to their extremophile adaptations to stress. Whilst it is unclear why negative charge is distributed so differently in UniArch compared to the other taxonomic datasets, one must appreciate that a much more nuanced approach would be needed to draw formal conclusions about Archaea, which current databases cannot provide due to the relatively limited information and annotation of Archaean proteomes.
Methodological issues made previous studies struggle to identify negatively charged skews with statistical significance
Whereas the influence of a negative charge bias in engineered proteins with transmembrane regions on the direction of insertion into the membrane was solidly established [35, 39, 40, 45, 62], the search for the negative charge distribution pattern in the statistics of sequences of transmembrane proteins from databases failed to find significance for the expected negative charge skew [9, 13, 14, 16, 31, 45].
Acidic residues are rare near and within TMHs, and biases in their distribution are easily blurred by minor fluctuations of much more frequent amino acid types, most notably leucine. Therefore, the method of normalisation is critical. We have shown that normalising by the total amount of residues of the amino acid type studied within the sequence region under consideration is appropriate to answer the question of where to find a negatively charged residue if there is any at all (called “relative percentage” in this work).
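The “relative percentage” normalisation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the use of equal-length pre-aligned segments and the toy sequences are hypothetical. For each aligned position, the count of the residue types of interest is divided by their total count over the whole region, so the result answers where such a residue sits if one is present at all, regardless of how abundant other residue types (such as leucine) are.

```python
from typing import List, Set


def relative_percentage(aligned: List[str], residues: Set[str]) -> List[float]:
    """Per-position share (in %) of all occurrences of `residues` in the region."""
    if not aligned:
        return []
    width = len(aligned[0])
    counts = [0] * width
    for segment in aligned:
        if len(segment) != width:
            raise ValueError("all aligned segments must have the same length")
        for pos, aa in enumerate(segment):
            if aa in residues:
                counts[pos] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * width  # residue type absent from the region
    return [100.0 * c / total for c in counts]


# Toy aligned segments (inside flank on the left, outside flank on the right).
toy_segments = ["DLLLE", "KLLLE", "KLLLD", "RLLLL"]
acidic_profile = relative_percentage(toy_segments, {"D", "E"})
basic_profile = relative_percentage(toy_segments, {"K", "R"})
```

Here the acidic profile is 25% at the first position and 75% at the last, even though acidic residues make up only a fifth of all residues in the toy set.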
The alignment of the TMHs is critical. It was common practice to align the TMH according to the most cytosolic residue, although it is known that the membrane/cytosol boundary of the TMH is not well defined (and the exact boundary is even less well understood at the non-cytosolic side). Aligning the transmembrane regions and their flanks from the centre of the TMH was first proposed by Baeza-Delgado et al. Since we know now that acidic residues are often suppressed in the cytosolic flank and within the TMH, alignment at the cytosolic boundary implies that the few acidic residues found in the cytosolic interface would appear more comparable to those in the poorly defined non-cytosolic interface, as the respective residues are spread over more potential positions, diminishing any observable bias.
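Central alignment can be sketched in a few lines. The function name, the gap character, the window half-width and the choice of the lower middle residue for even-length TMHs are assumptions for illustration only. Each TMH is placed so that its central residue occupies the same column, and positions beyond the sequence ends are padded.

```python
def centre_window(seq: str, start: int, end: int, half_width: int,
                  pad: str = "-") -> str:
    """Window of 2*half_width + 1 residues centred on the TMH [start, end).

    Coordinates are 0-based and half-open; for even-length TMHs the lower
    of the two middle residues is taken as the centre (an assumption).
    """
    centre = (start + end - 1) // 2
    window = []
    for i in range(centre - half_width, centre + half_width + 1):
        window.append(seq[i] if 0 <= i < len(seq) else pad)
    return "".join(window)


# Hypothetical sequence: 3 flank residues, a 9-residue hydrophobic stretch,
# 2 flank residues (far shorter than a real TMH, for readability).
toy_seq = "MKR" + "L" * 9 + "DE"
narrow = centre_window(toy_seq, 3, 12, half_width=6)
wide = centre_window(toy_seq, 3, 12, half_width=8)
```

Because every window has the same width and the same central column, windows from TMHs of different lengths can be stacked directly for per-position counting; any inside/outside orientation would be applied to the sequence beforehand.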
We find that separation into single- and multi-pass transmembrane datasets (or, even better, simple and complex transmembranes [6, 7]) is critical to study the inside/outside bias. As many TMHs in multi-pass transmembrane proteins have essentially no flanks or very short flanks if the condition of non-overlap is applied to flanks of neighbouring TMHs, this might also obscure the observation of the negative charge bias. If there are no flanks, then there will be no residue distribution bias in these flanks. The problem can be alleviated by either studying only subsets with minimal flank lengths on both sides (although datasets might become too small for statistical analysis) or by allowing flank overlaps between neighbouring TMHs.
This classification is even more justified in the light of previous reports about the “missing hydrophobicity” in multi-pass TMHs [36, 63–65]. Otherwise, the distribution bias well observed amongst the exclusive anchors could be lost to noise. This addresses the more biologically contextualised issue that there are different evolutionary pressures on different types of TMHs. The negative charge skew is most pronounced for dedicated anchors frequently found with simple TMHs typically observed in single-pass TM proteins. These TMHs are pressured to exhibit residue biases that may aid anchorage in a topologically correct manner. Complex TMHs, typically within multi-pass membrane proteins that have a function beyond anchorage, comply with a multitude of structural and functional constraints, and the negative charge skew is just one of them.
The most representative precedent papers are those of Sharpe et al. from 2010 (with 1192 human and 1119 yeast single-pass TMHs) and, both from 2013, Baeza-Delgado et al. (with 792 TMHs mixed from single- and multi-pass TM proteins) and Pogozheva et al. (TMHs from 191 single- and multi-pass TM proteins with structural information). Whereas the first analysis would have benefitted from the central alignment approach and the first two studies from another normalisation as described above, the third study did come close to our findings. To note, their dataset, a mix of single- and multi-pass proteins, was too small for revealing the negative charge bias with significance; yet, they observed total charge differences at either side of the membrane varying for both single- and multi-pass proteins. Membrane asymmetry due to positively charged residues occurring more frequently on the cytosolic side causes net charge unevenness at both sides of the membrane. This observation has been known to correlate with orientation for decades [12, 13, 60]. Our data show that the negative charge skew contributes to this asymmetry.
There are differences in charged amino acid residue biases in TMH flanks through each stage of the secretory pathway
Here, we observe differences throughout subcellular locations along the secretory pathway. We found that negative charges are enriched at the outside flank (in the ER), both enriched outside and suppressed inside for the Golgi membrane and suppressed on the inside flank in the plasma membrane (PM). It has been suggested that the leaflets of different membranes have different lipid compositions throughout the secretory pathway, and this has led to general biochemical conservation in terms of TMH length and amino acid composition in different membranes [9, 16].
Lipid asymmetry in the Golgi and PM (in contrast to the ER) has been known for more than a decade [67, 68]. To note, the Golgi and PM have lipid asymmetry with sphingomyelin and glycosphingolipids on the non-cytosolic leaflet and phosphatidylserine and phosphatidylethanolamine enriched in the cytosolic leaflet. Although the ER is the main site for cholesterol synthesis, it has markedly low concentrations of sphingolipids. The Golgi synthesises sphingomyelin, a lipid not present in the ER, but present in both the Golgi and in the PM [71, 72]. The PM is also enriched with densely packed sphingolipids and sterols. Another factor influencing the sequence patterns of TMHs and their flanking regions along the secretory pathway appears to be the variation in membrane potentials [74–76].
Several sequence features can be assigned to anchor TMHs: charged-residue flank biases, leucine intra-helix asymmetry and the “aromatic belt”
We investigated the difference between TMHs from single-pass and multi-pass proteins and found significant differences in sequence composition that are reflective of the biologically different roles the TMHs play. To emphasise and validate these findings, we separated TMHs from single-pass proteins into simple and complex TMHs [6, 7]: one type that likely contains mostly TMHs that act as exclusive anchors, and another that has roles beyond anchorage. This leaves us with “anchors” (simple TMHs from single-pass proteins) and “non-anchors” (complex TMHs from single-pass proteins and TMHs from multi-pass proteins). If there are strong sequence feature differences between anchors and non-anchors, it is likely that the sequence feature has a role in satisfying membrane constraints to act as an energetically optimally stable anchor.
Future studies in the area would desirably directly include a comprehensive analysis of datasets of oligomerised TMHs from single-pass proteins and ascertain if they appear to be more similar to simple anchors, multi-pass proteins or generally neither. Currently, no sufficiently complete set of intra-membrane oligomerised single-pass proteins exists that can be compared to a large set of known non-oligomerising proteins. The current work sidesteps this issue by comparing single-pass proteins with simple TMHs, which tend to be simple anchors (as shown in previous work [6, 7]), against datasets that contain TMHs that will form intra-membrane bundles. Bluntly, the simple/complex status of a TMH can be easily computed from its sequence with TMSOC, whereas the oligomerisation state of most membrane proteins still needs to be experimentally determined.
Unsurprisingly, both positively and negatively charged residues can be seen to be more strongly distributed with bias in anchors than non-anchors. Both the “positive-inside” rule and the “negative-not-inside/negative-outside” bias are mostly observable in simple single-pass TMHs (although they are statistically significant elsewhere). It is perhaps true that where a bias is clearly present in both non-anchors and anchors alike, it is a strong topological determinant, whereas if the residue is only distributed with topological bias in exclusively anchoring TMHs, we can attribute these features more specifically to biophysical anchorage. This being said, we should not rule out that the same features aid topological determination, since negative charge has been shown to be a weaker topological determinant than positively charged residues .
Tyrosine and tryptophan residues commonly are found at the interfacial boundaries of the TMH, and this feature, called the “aromatic belt” [9, 13, 14, 31, 36], was thought to be caused by their affinity to the carbonyl groups in the lipid bilayer. Not all types of aromatic residues are found in the aromatic belt; phenylalanine has no particular preference for this region [14, 78]. It is still unclear if the aromatic belt has to do with anchorage or with translocon recognition. Here, TMHs with exclusively anchorage functions showed stronger preferences for the W and Y in the aromatic belt region, otherwise known as the water-lipid interface region, than TMHs with function beyond anchorage. This is strong evidence that the aromatic belt indeed assists with anchorage and is less conserved where the TMH must conform to other restraints beyond membrane anchorage. Furthermore, we see that tyrosine's preference for the inside interface region also appears to be involved with anchorage, and this trend is somewhat true for tryptophan, too.
Finally, our findings corroborate earlier reports that many multi-pass TMHs are much less hydrophobic than typical single-pass TMHs and about 30% of them fail the hydrophobicity requirements of ΔG TMH insertion prediction (“missing hydrophobicity”) [36, 63–65]. We also find that the leucine skew and the hydrophobic asymmetry towards the cytosolic leaflet of the membrane are more pronounced in simple, single-pass TMHs than in complex or multi-pass ones; thus, they appear to be another anchoring feature. It was found previously that TMHs of multi-pass proteins share similar hydrophobicity profiles on average, irrespective of the number of TMHs, and that TMHs from single-pass proteins are typically more hydrophobic than TMHs from multi-pass proteins. Sharpe et al. report an asymmetric hydrophobic length for single-pass TMHs. Our study reiterates the hydrophobic asymmetry and attributes it mainly to the leucine distribution. The leucine asymmetry might be linked to the different lipid compositions of either leaflet of biological membranes.
To conclude, the large fraction of functionally uncharacterised genomic sequences is the great bottleneck in life sciences at this moment that hinders many biomedical and biotechnological applications, some with tremendous societal need [27, 79]. Amongst these uncharacterised genomic regions, there are ~10,000 protein-coding genes, especially many membrane-embedded proteins. It is hoped that the NNI/NO rule as well as the other sequence properties of membrane anchoring TMHs described in this article will add new insights for membrane protein function discovery, design and engineering.
All datasets used for analysis are listed in Table 1. Transmembrane protein sequences and annotations were taken from TOPDB and UniProt. UniProt-derived datasets are the most comprehensive datasets, built with (1) robust transmembrane prediction methods, providing the limit of today’s achievable accuracy with regard to hydrophobic core localisation, and (2) subcellular location annotation that can be used for orientation determination. However, they mostly rely on predicted transmembrane regions. TOPDB has meticulous experimental verifications of the orientation from the literature that are independent of prediction algorithms. Unfortunately, this dataset is much smaller with too few entries to have it divided with regard to taxonomy or subcellular locations.
UniProt database files were downloaded by querying the server for different taxonomic groups as well as different subcellular membrane locations: UniHuman (human representative proteome), UniCress (Arabidopsis thaliana, otherwise known as mouse-ear cress, representative proteome), UniER (human endoplasmic reticulum representative proteome), UniPM (human plasma membrane representative proteome) and UniGolgi (human Golgi representative proteome). To enforce a level of quality control, the queries were restricted to manually reviewed records and transmembrane proteins with manually asserted TRANSMEM annotation. Proteins were then sorted into multi-pass and single-pass groups according to whether they had more than one or exactly one TRANSMEM region, respectively. TRANSMEM regions are validated by either experimental evidence or according to a robust transmembrane consensus of the predictors TMHMM, Memsat, Phobius [21, 22] and the hydrophobic moment plot method of Eisenberg and co-workers. TMHs and flanking regions were oriented using the UniProt TOPO_DOM annotation, based on the keyword “cytoplasmic”. If “cytoplasmic” was found in the TOPO_DOM preceding the TRANSMEM region, then the sequence remained the same. If “cytoplasmic” was found in the TOPO_DOM following the TRANSMEM region, then the sequence was reversed. Proteins without the “cytoplasmic” keyword in their TOPO_DOM annotation were omitted from further analysis.
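The sorting and orientation steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline actually used: the `Feature` record, the function names, the flank length parameter and the rule that a TOPO_DOM note must start with "cytoplasmic" (case-insensitive) are hypothetical, and feature coordinates are taken to be 1-based and inclusive.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    kind: str   # "TRANSMEM" or "TOPO_DOM"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    note: str   # e.g. "Cytoplasmic", "Extracellular", "Helical"


def is_cytoplasmic(feature: Feature) -> bool:
    return (feature.kind == "TOPO_DOM"
            and feature.note.strip().lower().startswith("cytoplasmic"))


def classify(features: List[Feature]) -> str:
    """Single-pass = exactly one TRANSMEM region, multi-pass = more than one."""
    n_tm = sum(1 for f in features if f.kind == "TRANSMEM")
    if n_tm == 0:
        return "none"
    return "single-pass" if n_tm == 1 else "multi-pass"


def oriented_segments(seq: str, features: List[Feature],
                      flank_len: int) -> List[str]:
    """TMH + flank segments, each written with the cytoplasmic side first."""
    feats = sorted(features, key=lambda f: f.start)
    segments = []
    for i, f in enumerate(feats):
        if f.kind != "TRANSMEM":
            continue
        prev_f = feats[i - 1] if i > 0 else None
        next_f = feats[i + 1] if i + 1 < len(feats) else None
        lo = max(0, f.start - 1 - flank_len)
        hi = min(len(seq), f.end + flank_len)
        segment = seq[lo:hi]
        if prev_f is not None and is_cytoplasmic(prev_f):
            segments.append(segment)        # already inside -> outside
        elif next_f is not None and is_cytoplasmic(next_f):
            segments.append(segment[::-1])  # reverse to inside -> outside
        # No "cytoplasmic" neighbour: the TMH is left out.
    return segments


# Hypothetical single-pass protein with an extracellular N-terminus.
toy_seq = "SDET" + "L" * 18 + "RKKR"
toy_features = [
    Feature("TOPO_DOM", 1, 4, "Extracellular"),
    Feature("TRANSMEM", 5, 22, "Helical"),
    Feature("TOPO_DOM", 23, 26, "Cytoplasmic"),
]
toy_class = classify(toy_features)
toy_oriented = oriented_segments(toy_seq, toy_features, flank_len=4)
```

After orientation, the cytoplasmic flank of every segment sits at the same end, so inside and outside positions can be compared across proteins regardless of their N-terminal topology.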
The TOPDB database is a manually curated database composed of experimental records from the literature that allow determination of the protein topology. Experiments include fusion proteins, posttranslational modifications, protease experiments, immunolocalisation, chemical modifications as well as revertants, sequence motifs with known mandatory membrane-embedded topologies and tailoring mutants (Additional file 3: Table S1). Length cut-offs for the TMH were set with 16 as the shortest length and 38 as the longest.
The datasets described in the following subsections are used throughout this work.
TOPDB contained 4190 manually annotated transmembrane proteins at the time of download. CD-HIT identified 3857 representative sequences using sequence clusters of >90% sequence identity. This similarity threshold was chosen since CD-HIT ultimately underlies the clustering behind UniRef. Unlike the other datasets, which by definition contain reasonably typical TMHs, many of the transmembrane segments annotated in TOPDB are extremely short or long, and this would cause severe unrealistic hydrophobic mismatches. The short segments in particular could be the result of misannotation, TMHs broken into pieces due to kinks or segments that peripherally insert only into the interface of the membrane bilayer. To remove the atypical lengths, cut-offs were set with 16 as the lower cut-off and 38 as the upper cut-off after inspecting the length histogram. We found that, for the single-pass TMHs in TOPDB, 1215 out of 1544 are within the length limits (78.7%). Amongst the 17,141 multi-pass TMHs, we find 15,563 within our global length limits (from 2205 TOPDB records corresponding to 2281 UniProt entries). This removed 1578 very short TMHs and none of the long TMHs. Our cut-off selection is very similar to the one used by Baeza-Delgado et al.
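The length filter and the reported counts can be checked with a short sketch. The cut-offs of 16 and 38 residues come from the text and are treated as inclusive bounds; the function names and the toy spans are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from typing import List, Tuple

MIN_TMH_LEN = 16  # shortest accepted TMH, inclusive
MAX_TMH_LEN = 38  # longest accepted TMH, inclusive


def within_length_limits(length: int) -> bool:
    return MIN_TMH_LEN <= length <= MAX_TMH_LEN


def filter_tmhs(spans: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep TMHs (1-based, inclusive start/end) with an accepted length."""
    return [(s, e) for s, e in spans if within_length_limits(e - s + 1)]


# Toy spans with lengths 9, 21 and 45: only the second one passes.
kept = filter_tmhs([(12, 20), (40, 60), (100, 144)])

# Arithmetic behind the figures quoted in the text.
single_pass_kept_pct = round(100 * 1215 / 1544, 1)
multi_pass_removed = 17141 - 15563
```

The two derived values reproduce the 78.7% retention of single-pass TMHs and the 1578 removed multi-pass TMHs stated above.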
To get an idea of the taxonomical breakdown in the ExpAll dataset, the UniProt ID tags were extracted and mapped to UniProtKB. The combined dataset of multi-pass (single-pass) proteins was mapped to 1288 (1343) eukaryotic records, 404 (776) of which were human records, 926 (191) bacterial records, 46 (5) Archaea records and 14 (22) viral records.
This is a set of mostly human TMH-containing proteins or their close mammalian homologues. UniProtKB contains 5187 human protein records that are manually annotated with TRANSMEM regions (query = “annotation:(type:transmem) AND reviewed:yes AND organism:"Homo sapiens (Human) " AND proteome:up000005640”). To reduce sequence redundancy, these sequences were submitted to UniRef90. To note, UniRef90 was chosen over UniRef50 to maintain a viable size of datasets for statistical analysis of occurrence of negatively charged residues, which are very rare in the vicinity of TMHs. There were 5015 UniRef90 clusters representing the 5187 sequences. A list of sequences representing those clusters was submitted back to UniProtKB, and 5014 representative entries were recovered. There is a small issue in that the list of representatives from UniRef includes non-canonical isoforms, whilst the batch retrieve query of UniProtKB only supports complete entries, i.e. canonical isoforms. This resulted in the loss of one record at this point due to two splice isoforms acting as representative identifiers. Of those 5014 records, 4714 were records from human entries, 197 were from mice, 94 from rats, 5 from bovines, 2 from chimps, 1 from Chinese hamsters, and 1 from pigs. Although the TMH length variations within the UniHuman dataset are much smaller than for ExpAll, we applied the same length cut-offs for the sake of comparability. Out of the 1709 single-pass cases, 1705 entered the final dataset. Of those, 1596 were from human records, 87 were from mouse, 19 were from rat, and 2 were from chimpanzee. The further loss of a record in the taxonomic query is again due to multiple splice isoform records being represented by a single UniProt record. Amongst the 12,390 multi-pass TMHs, 12,353 were included into UniHuman. The other, multi-pass record identifiers were mapped to 1789 UniProtKB entries.
Of these, 1660 were human entries, 63 from rat, 61 from mouse, 4 from bovines and 1 from Chinese hamsters. This clustered human dataset was then queried for subcellular locations to make the UniER, UniGolgi and UniPM datasets (detailed below).
The clustered UniHuman dataset was queried using UniProtKB for endoplasmic reticulum subcellular location (locations:(location:"Endoplasmic reticulum [SL-0095]" evidence:manual)). This returned 487 protein entries, 457 of which belonged to human, 24 to mouse and 6 to rat. Of these records, 287 contained sufficient annotation for orientation determination. One hundred thirty-two were single-pass entries, of which 120 records were from humans, 11 from mouse, and 1 from rat. One hundred fifty-five were multi-pass entries containing 898 TMHs. One hundred forty-four were records from human, 8 were from mouse and 3 were from rat.
The clustered human dataset was queried using UniProtKB for Golgi subcellular location (locations:(location:"Golgi apparatus [SL-0132]" evidence:manual)). This returned 323 protein entries, 301 of which belonged to human, 19 to mice, 2 to rat and 1 to pig. Of these records, 269 contained sufficient annotation for orientation determination. Two hundred six were single-pass entries, of which 195 records were from human, 9 from mouse, and 1 from rat. Sixty-one were multi-pass entries containing 383 transmembrane regions. Fifty-four were records from human, 6 were from mouse and 1 was from rat.
The clustered human dataset was queried using UniProtKB for the cell membrane subcellular location (locations:(location:"Cell membrane [SL-0039]" evidence:manual)). This returned 1036 protein entries, 948 of which belonged to humans, 62 to mice, and 26 to rats. Of these records, 920 contained sufficient annotation for orientation determination. Four hundred ninety-three were single-pass entries, of which 451 records were from human, 37 from mouse, and 5 from rat. Four hundred twenty-seven were multi-pass entries containing 3079 transmembrane regions. Three hundred ninety-four were records from human, 17 were from mouse and 16 were from rat.
For the mouse-ear cress, a representative proteome dataset was acquired with the query annotation:proteomes:(reference:yes) AND reviewed:yes AND organism:"Arabidopsis thaliana (Mouse-ear cress) " AND proteome:up000006548. This returned 3174 records in UniProtKB. UniRef90 identified 3111 clusters. Of the representative sequences, 3110 were mapped back to UniProtKB. Of those, 3090 were from Arabidopsis thaliana, 2 from Hornwort, 1 from cucumber, 1 from tall dodder, 1 from soybean (Glycine max), 2 from Indian wild rice, 2 from rice, 2 from garden pea, 1 from potato, 4 from spinach, 1 from Thermosynechococcus elongatus (thermophilic cyanobacterium), 1 from wheat, and 2 from maize. Of those there were 1146 with suitable TOPO_DOM annotation for topological orientation determination. Of those records, 632 were identified as single-pass, all of which were from Arabidopsis thaliana. Five hundred seven protein records were from multi-pass records, which contained 3823 TMHs. Five hundred six of those records were from Arabidopsis thaliana, whilst 1 was from Thermosynechococcus elongatus.
For the Fungi dataset, the query “annotation:(type:transmem) taxonomy:"Fungi " AND reviewed:yes” was used. This returned 5628 records that were submitted to UniRef90. UniRef90 identified 4934 representative records, all of which were successfully mapped back to UniProtKB. Of those, 2070 had suitable annotation for orientation. A total of 1990 records belonged to Ascomycota including 1243 Saccharomycetales. Seventy-three were Basidiomycota, and 6 were Apansporoblastina. Seven hundred twenty-nine records contained a single TMH region, 702 of which belonged to Ascomycota, 26 to Basidiomycota and 1 to Encephalitozoon cuniculi, a Microsporidium parasite. There were 8698 helices contained in 1338 records of multi-pass proteins. Of these records, 1285 were Ascomycota, 47 were Basidiomycota, and 5 were Apansporoblastina. One TMH from UniFungi was discounted from P32897 due to an unknown position.
This dataset was generated by querying UniProt with “reviewed:yes AND organism:"Escherichia coli (strain K12)"”, which returned 941 hits. The hits were submitted to UniRef90, which returned 935 clusters. The representative IDs were then resubmitted to UniProtKB, all of which returned successfully. Nine hundred thirty-four were from bacteria, whilst one was from lambdalike viruses. Of the bacterial records, 862 were from various Escherichia species, of which 565 were from E. coli strain K12, 28 were from Salmonella choleraesuis, 25 were from Shigella and the rest all also fell under the Gammaproteobacteria class. This dataset contains 54 single-pass proteins and 3888 helices from 529 multi-pass proteins with sufficient annotation for topological determination.
The Bacilli dataset was constructed by querying UniProt for “reviewed:yes AND taxonomy:"Bacilli"”. This returned 5044 records, which were submitted to UniRef90. There were 2591 clusters found in UniRef from these records. The representative IDs were successfully resubmitted to UniProtKB. Of these, 2031 were of the order Bacillales, whilst 560 were of the order Lactobacillales. This dataset contains 124 single-pass proteins and 822 helices from 140 multi-pass proteins.
The Archaea dataset was constructed by querying UniProt for “reviewed:yes AND taxonomy:"Archaea "”. This returned 1152 records, which were submitted to UniRef90. One thousand fifty-four clusters were found in UniRef from these records. The representative IDs were successfully resubmitted to UniProtKB. Nine hundred forty-six records belonged to the Euryarchaeota, 101 to Thermoprotei, 4 to Thaumarchaeota, and 3 to Korarchaeum cryptofilum. This dataset contains 48 single-pass proteins and 327 helices from 59 multi-pass proteins.
We are aware that proteome datasets are “moving targets” that have dramatically changed over the years and probably will continue to do so to some extent in the future. Yet, we think that currently available protein sequence sets are sufficiently good for our purposes, as we search for statistical properties in the TMH context only.
On the determination of flanking regions for TMHs and the TMH alignment
The determination of the boundary point in the sequence between the TMH in a membrane and the sequence immersed in the cytoplasm, extracellular space, vesicular lumen, etc. is not as trivial as it initially appears. TMH positioning is highly dynamic, and the actual boundary point will be occupied by different residues at different time points. Whilst detection of the TMH core region from a sequence is trivial with modern software, the exact determination of TMH boundaries remains difficult, since it is unclear exactly how far in or out of the membrane a given helix extends. Previous studies have dealt with this issue in various ways [9, 13, 16, 85].
In this work, we explore two boundary definitions. First, we assign TMH boundary locations as described in the respective databases. These flanks are the ones that are reported in our TMH data files that are available at http://mendel.bii.a-star.edu.sg/SEQUENCES/NNI/. We studied flank lengths of ±5, ±10, and ±20 residues preceding and following the inside and outside TMH boundaries. In these cases, the flanks are aligned relative to the residue closest to the TMH.
In cases where the loops before and after the TMH are shorter than the predefined flank lengths, further precautions are necessary. In the multi-pass datasets particularly (Additional file 4: Figure S4, Additional file 3: Table S1), the flanks overlap with other membrane region flanks. We explore several variants. On the one hand, we work with data files where the flank residue stretches are equally truncated so that no overlap occurs. If the loop length was odd, the central residue was not included in any flank. We find, surprisingly, that a large number of TMHs have no flank or just a very short one, a circumstance that can distort any statistical analysis because few or no residues are available to count. Therefore, we also work with alternative datasets: (1) with flanks overlapping between consecutive TMHs (e.g. in Table 3B, although this leads to some residues being counted more than once) and (2) with subsets of the data where the flanks at both sides have a defined minimal length (50% or 100% of the required flanks; unfortunately, some of these subsets become too small for analysis).
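The equal-truncation variant can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the 0-based end-exclusive coordinate convention, and the handling of the protein termini are assumptions made for the example.

```python
def split_loop(loop_length, flank_length):
    """Residues of a loop assigned to (flank after previous TMH, flank before next TMH).

    If both full flanks fit, nothing is truncated. Otherwise the loop is
    halved; for an odd loop the central residue belongs to neither flank.
    """
    if loop_length >= 2 * flank_length:
        return flank_length, flank_length
    half = loop_length // 2
    return half, half


def truncated_flanks(sequence, tmhs, flank_length=10):
    """Return one (n_side_flank, c_side_flank) pair per TMH without overlaps.

    tmhs: list of (start, end) pairs sorted by position, 0-based and
    end-exclusive (a hypothetical convention chosen for this sketch).
    """
    flanks = []
    for k, (start, end) in enumerate(tmhs):
        if k == 0:
            # First TMH: limited only by the N-terminus of the protein.
            n_len = min(flank_length, start)
        else:
            n_len = split_loop(start - tmhs[k - 1][1], flank_length)[1]
        if k == len(tmhs) - 1:
            # Last TMH: limited only by the C-terminus of the protein.
            c_len = min(flank_length, len(sequence) - end)
        else:
            c_len = split_loop(tmhs[k + 1][0] - end, flank_length)[0]
        flanks.append((sequence[start - n_len:start], sequence[end:end + c_len]))
    return flanks
```

For a toy sequence with a five-residue loop between two TMHs and a requested flank length of 4, each neighbouring flank receives two loop residues and the central residue is dropped.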
The problem of flanks overlapping also affects some single-pass and multi-pass TMH proteins with INTRAMEM regions as described in some UniProt entries. We do not include INTRAMEM regions in the datasets as TMHs, but sometimes the flanking regions of TMHs were truncated to avoid overlap with INTRAMEM flanking regions (Additional file 5: Table S2). The identifiers affected for single-pass TMH proteins are Q01628, P13164, Q01629, Q5JRA8, A2ANU3 (UniHuman), P13164, Q01629, A2ANU3 (UniPM) and Q5JRA8 (UniER).
The second form of boundary point definition for flank determination was achieved by gaplessly aligning all TMHs relative to their central residue at the position equal to half the length of the TMHs at either side. Though there is some length variation amongst TMHs, most of them are centred around a length of 20–22 residues. In this case, flanks are the sequence extensions beyond the standardised-length 21-residue TMHs. We define the inside flanking segments as the positions –20 to –10 and the outside flanking regions to be +10 to +20 from the central TMH residue (with the label “0”). Instead of emphasising some artificially selected boundary residue, this definition allows the average TMH boundary transition to become apparent.
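A minimal sketch of this centre-based alignment is given below. The function names, the 0-based end-exclusive coordinates, the padding character and the choice to reverse the window when the C-terminal side is cytoplasmic (so that negative positions always point inside) are assumptions made for illustration.

```python
def centred_window(sequence, start, end, n_term_inside, half_width=20, pad="-"):
    """Cut positions -half_width..+half_width around the central TMH residue.

    start/end: TMH coordinates, 0-based and end-exclusive (assumed convention).
    Positions beyond the protein termini are filled with `pad`.
    The window is reversed if the N-terminal side is not inside, so that
    negative positions always lie on the cytoplasmic side.
    """
    centre = start + (end - start) // 2
    residues = []
    for offset in range(-half_width, half_width + 1):
        pos = centre + offset
        residues.append(sequence[pos] if 0 <= pos < len(sequence) else pad)
    window = "".join(residues)
    return window if n_term_inside else window[::-1]


def inside_outside_flanks(window, half_width=20, inner=10):
    """Return the inside flank (-20..-10) and the outside flank (+10..+20)."""
    inside = window[:half_width - inner + 1]
    outside = window[half_width + inner:]
    return inside, outside
```

Because every window has the same length and the same centre, stacking the windows of all TMHs yields a gapless alignment in which column index minus `half_width` is the position label.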
Separating simple and complex single-pass helices
Single-pass helices from the ExpAll and UniHuman datasets were split into two groups, simple and complex, following a previously described classification [6, 7] to roughly distinguish simple hydrophobic anchors from TMHs with additional structural/functional roles. Simple and complex helices were determined using TMSOC. The complexity class is determined by calculating the hydrophobicity and sequence entropy. The resulting coordinates cluster with anchors being more hydrophobic and less complex, whilst more complex and more polar TMHs are associated with non-anchorage functions. In UniHuman there were 889 simple helices and 570 complex TMHs. In ExpAll there were 769 simple helices and 570 complex helices.
In this work, we have used normalisation techniques described in previous investigations as well as new approaches designed to more sensitively identify biases of rare residues. Baeza-Delgado and co-workers used LogOdds normalisation column-wise in TMH alignments. Critically, this is based on their definition of probability, which takes into account the total number of amino acids in the dataset as a denominator. Since aliphatic residues such as leucine and other highly abundant slightly polar residues dominate the denominator, the distribution of the rare acidic residues will be easily lost in the “background noise” of those highly abundant residues. Pogozheva and co-workers used two approaches, (1) the total accessible surface area (ASA total) and (2) the total number of charged residues (N total), as a denominator in their distribution normalisation.
Here, the denominator is the maximal number of all residues in any alignment column (i.e., the number of sequences in the alignment) and, to emphasise, this will make p_{i,r} mostly dependent on the most abundant residue types. This type of normalisation reveals the most preferred residue types at given sequence positions.
The value a_i is the total abundance of residues of just amino acid type i in a given alignment of TMH-containing segments (i.e., in the TMH together with its two adjoining flanks summed over all cases of TMHs in the given dataset). Peaks in q_{i,r} as a function of r reveal the preferred positions of residues of type i. The difference between the p_{i,r} and q_{i,r} normalisations is visualised in Additional file 6: Figure S3.
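The two normalisations can be contrasted with a short sketch. The defining formulas are not reproduced in this section, so the code assumes that the numerator in both cases is the count of residue type i in alignment column r, that p divides this count by the largest column total and that q divides it by the total abundance a_i. The function names and the gap character are hypothetical.

```python
from collections import Counter


def column_counts(alignment, gap="-"):
    """Count residue types per alignment column, ignoring gap/padding symbols."""
    length = len(alignment[0])
    return [Counter(seq[r] for seq in alignment if seq[r] != gap)
            for r in range(length)]


def normalise(alignment, residue, gap="-"):
    """Return (p, q) profiles for one residue type over all columns.

    p: count at column r divided by the maximal column occupancy
       (assumed reading of the column-based denominator).
    q: count at column r divided by the total count of this residue
       type in the whole alignment (a_i in the text).
    """
    counts = column_counts(alignment, gap)
    a_ir = [c[residue] for c in counts]
    max_column = max(sum(c.values()) for c in counts)
    a_i = sum(a_ir)
    p = [a / max_column for a in a_ir]
    q = [a / a_i if a_i else 0.0 for a in a_ir]
    return p, q
```

Under these assumptions a rare residue type gives small p values everywhere, whereas its q profile still sums to one, so positional peaks of rare residues remain visible.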
Hydrophobicity profiles were calculated using the Kyte and Doolittle hydrophobicity scale and validated with the Eisenberg scale, the Hessa biological scale and the White and Wimley whole residue scale (Additional file 1: Figure S1). The hydrophobicity profile uses un-weighted windowing of the residue hydrophobicity scores from end to end of the TMD slice. Three residues were used as full window lengths, and partial windows were permitted.
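The windowing can be illustrated with a small sketch based on the Kyte and Doolittle scale. It assumes that the window is centred on each residue and that "partial windows" means the shortened windows at the two ends of the segment are averaged over the residues actually present. The function name is hypothetical.

```python
# Kyte and Doolittle hydropathy values per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(segment, window=3):
    """Un-weighted sliding-window mean of residue hydropathy scores.

    The window is centred on each residue; at the segment ends the
    window is cut short (partial window) instead of being skipped.
    """
    half = window // 2
    scores = [KD[aa] for aa in segment]
    profile = []
    for k in range(len(scores)):
        chunk = scores[max(0, k - half):k + half + 1]
        profile.append(sum(chunk) / len(chunk))
    return profile
```

Swapping `KD` for another dictionary of per-residue values would give the corresponding profile for a different scale.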
Normalised net charge calculations
The inside/outside bias of negative residues was quantified using the independent-samples Kruskal-Wallis (KW) test and the two-sample t test from the Python scipy.stats package v0.15 (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html, https://docs.scipy.org/doc/scipy-0.15.0/reference/generated/scipy.stats.ttest_ind.html). These tests assess whether the two groups actually differ in the statistical sense. For the leucine residues, each TMH region was divided into two sections, representing the inner and outer leaflets (Table 4). For the hydrophobicity plot, three window values of hydrophobicity were taken for each TMH at each position. The statistical analyses were performed separately for single-pass and multi-pass transmembrane proteins. At each position, the two groups were compared using the KW test.
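A comparison of this kind might look as follows with `scipy.stats.kruskal` and `scipy.stats.ttest_ind`. The per-TMH counting of Asp and Glu, the function names and the example numbers are illustrative assumptions and not data or code from the study.

```python
from scipy.stats import kruskal, ttest_ind


def count_negative(flank):
    """Number of negatively charged residues (Asp, Glu) in one flank."""
    return flank.count("D") + flank.count("E")


def compare_flanks(inside_counts, outside_counts):
    """Compare per-TMH counts of the inside and outside flanks.

    Returns the Kruskal-Wallis H statistic and the two-sample t statistic
    together with their p values.
    """
    h_stat, kw_p = kruskal(inside_counts, outside_counts)
    t_stat, t_p = ttest_ind(inside_counts, outside_counts)
    return {"kw_H": h_stat, "kw_p": kw_p, "t": t_stat, "t_p": t_p}


# Made-up counts for ten hypothetical TMHs, for demonstration only.
example_inside = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
example_outside = [2, 1, 3, 2, 1, 2, 3, 1, 2, 2]
example_result = compare_flanks(example_inside, example_outside)
```

A negative t statistic in `example_result` indicates a lower mean count on the inside flank than on the outside flank for these invented numbers.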
The larger the absolute Bahadur slope, the greater the difference between the two distributions.
The authors acknowledge the support by ARAP AGA A*STAR for JAB. The authors declare that none of the authors has any competing interests with regard to the conclusions in this article.
Availability of data and materials
Datasets and several programs (Python or Perl code) used can be downloaded from http://mendel.bii.a-star.edu.sg/SEQUENCES/NNI/ or from the authors by request.
The study was initiated and designed by JW, BE and FE. JAB carried out the overwhelming part of the computational work including data gathering, programming and result assessment. Statistical assessments were contributed by JAB, WCW, BE and FE. All authors contributed to writing the manuscript and approved the final version.
Ethics approval and consent to participate
Consent for publication
All authors agree with the publication of this article.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Elofsson A, von Heijne G. Membrane protein structure: prediction versus reality. Annu Rev Biochem. 2007;76:125–40.
- von Heijne G. Membrane-protein topology. Nat Rev Mol Cell Biol. 2006;7:909–18.
- Cymer F, von Heijne G, White SH. Mechanisms of integral membrane protein insertion and folding. J Mol Biol. 2015;427:999–1022.
- Hessa T, Sharma A, Mariappan M, Eshleman HD, Gutierrez E, Hegde RS. Protein targeting and degradation are coupled for elimination of mislocalized proteins. Nature. 2011;475:394–7.
- Wong WC, Maurer-Stroh S, Eisenhaber F. More than 1,001 problems with protein domain databases: transmembrane regions, signal peptides and the issue of sequence homology. PLoS Comput Biol. 2010;6:e1000867.
- Wong WC, Maurer-Stroh S, Eisenhaber F. Not all transmembrane helices are born equal: towards the extension of the sequence homology concept to membrane proteins. Biol Direct. 2011;6:57.
- Wong WC, Maurer-Stroh S, Schneider G, Eisenhaber F. Transmembrane helix: simple or complex. Nucleic Acids Res. 2012;40:W370–5.
- Ladokhin AS. Membrane protein folding & lipid interactions: theory & experiment. J Membr Biol. 2015;248:369–70.
- Sharpe HJ, Stevens TJ, Munro S. A comprehensive comparison of transmembrane domains reveals organelle-specific properties. Cell. 2010;142:158–69.
- von Heijne G. Net N-C charge imbalance may be important for signal sequence function in bacteria. J Mol Biol. 1986;192:287–90.
- von Heijne G, Gavel Y. Topogenic signals in integral membrane proteins. Eur J Biochem. 1988;174:671–8.
- von Heijne G. Control of topology and mode of assembly of a polytopic membrane protein by positively charged residues. Nature. 1989;341:456–8.
- Baeza-Delgado C, Marti-Renom MA, Mingarro I. Structure-based statistical analysis of transmembrane helices. Eur Biophys J. 2013;42:199–207.
- Granseth E, von Heijne G, Elofsson A. A study of the membrane-water interface region of membrane proteins. J Mol Biol. 2005;346:377–85.
- Ojemalm K, Botelho SC, Studle C, von Heijne G. Quantitative analysis of SecYEG-mediated insertion of transmembrane alpha-helices into the bacterial inner membrane. J Mol Biol. 2013;425:2813–22.
- Pogozheva ID, Tristram-Nagle S, Mosberg HI, Lomize AL. Structural adaptations of proteins to different biological membranes. Biochim Biophys Acta. 2013;1828:2592–608.
- Beuming T, Weinstein H. A knowledge-based scale for the analysis and prediction of buried and exposed faces of transmembrane domain proteins. Bioinformatics. 2004;20:1822–35.
- Zhao G, London E. An amino acid "transmembrane tendency" scale that approaches the theoretical limit to accuracy for prediction of transmembrane helices: relationship to biological hydrophobicity. Protein Sci. 2006;15:1987–2001.
- Cserzo M, Eisenhaber F, Eisenhaber B, Simon I. On filtering false positive transmembrane protein predictions. Protein Eng. 2002;15:745–52.
- Cserzo M, Eisenhaber F, Eisenhaber B, Simon I. TM or not TM: transmembrane protein prediction with low false positive rate using DAS-TMfilter. Bioinformatics. 2004;20:136–7.
- Kall L, Krogh A, Sonnhammer EL. A combined transmembrane topology and signal peptide prediction method. J Mol Biol. 2004;338:1027–36.
- Kall L, Krogh A, Sonnhammer EL. Advantages of combined transmembrane topology and signal peptide prediction—the Phobius web server. Nucleic Acids Res. 2007;35:W429–32.
- Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001;305:567–80.
- Sonnhammer EL, von Heijne G, Krogh A. A hidden Markov model for predicting transmembrane helices in protein sequences. Proc Int Conf Intell Syst Mol Biol. 1998;6:175–82.
- Kall L, Sonnhammer EL. Reliability of transmembrane predictions in whole-genome data. FEBS Lett. 2002;532:415–8.
- Eisenhaber B, Kuchibhatla D, Sherman W, Sirota FL, Berezovsky IN, Wong WC, Eisenhaber F. The recipe for protein sequence-based function prediction and its implementation in the ANNOTATOR software environment. Methods Mol Biol. 2016;1415:477–506.
- Eisenhaber F. A decade after the first full human genome sequencing: when will we understand our own genome? J Bioinform Comput Biol. 2012;10:1271001.
- Sherman WA, Kuchibhatla DB, Limviphuvadh V, Maurer-Stroh S, Eisenhaber B, Eisenhaber F. HPMV: human protein mutation viewer — relating sequence mutations to protein sequence architecture and function changes. J Bioinform Comput Biol. 2015;13:1550028.
- Sipos L, von Heijne G. Predicting the topology of eukaryotic membrane proteins. Eur J Biochem. 1993;213:1333–40.
- Gavel Y, Steppuhn J, Herrmann R, von Heijne G. The 'positive-inside rule' applies to thylakoid membrane proteins. FEBS Lett. 1991;282:41–6.
- Nilsson J, Persson B, von Heijne G. Comparative analysis of amino acid distributions in integral membrane proteins from 107 genomes. Proteins. 2005;60:606–16.
- Wallin E, von Heijne G. Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci. 1998;7:1029–38.
- Beltzer JP, Fiedler K, Fuhrer C, Geffen I, Handschin C, Wessels HP, Spiess M. Charged residues are major determinants of the transmembrane orientation of a signal-anchor sequence. J Biol Chem. 1991;266:973–8.
- Kida Y, Morimoto F, Mihara K, Sakaguchi M. Function of positive charges following signal-anchor sequences during translocation of the N-terminal domain. J Biol Chem. 2006;281:1152–8.
- Nilsson I, von Heijne G. Fine-tuning the topology of a polytopic membrane protein: role of positively and negatively charged amino acids. Cell. 1990;62:1135–41.
- Hessa T, Kim H, Bihlmaier K, Lundin C, Boekel J, Andersson H, Nilsson I, White SH, von Heijne G. Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature. 2005;433:377–81.
- Bogdanov M, Dowhan W, Vitrac H. Lipids and topological rules governing membrane protein assembly. Biochim Biophys Acta. 2014;1843:1475–88.
- Ulmschneider MB, Sansom MS. Amino acid distributions in integral membrane protein structures. Biochim Biophys Acta. 2001;1512:1–14.
- Andersson H, von Heijne G. Position-specific Asp-Lys pairing can affect signal sequence function and membrane protein topology. J Biol Chem. 1993;268:21389–93.
- Kim H, Paul S, Gennity J, Jennity J, Inouye M. Reversible topology of a bifunctional transmembrane protein depends upon the charge balance around its transmembrane domain. Mol Microbiol. 1994;11:819–31.
- Delgado-Partin VM, Dalbey RE. The proton motive force, acting on acidic residues, promotes translocation of amino-terminal domains of membrane proteins when the hydrophobicity of the translocation signal is low. J Biol Chem. 1998;273:9927–34.
- Ismail N, Hedman R, Schiller N, von Heijne G. A biphasic pulling force acts on transmembrane helices during translocon-mediated membrane integration. Nat Struct Mol Biol. 2012;19:1018–22.
- Ismail N, Hedman R, Linden M, von Heijne G. Charge-driven dynamics of nascent-chain movement through the SecYEG translocon. Nat Struct Mol Biol. 2015;22:145–9.
- Hartmann E, Rapoport TA, Lodish HF. Predicting the orientation of eukaryotic membrane-spanning proteins. Proc Natl Acad Sci U S A. 1989;86:5786–90.
- Andersson H, Bakker E, von Heijne G. Different positively charged amino acids have similar effects on the topology of a polytopic transmembrane protein in Escherichia coli. J Biol Chem. 1992;267:1491–5.
- Harley CA, Holt JA, Turner R, Tipper DJ. Transmembrane protein insertion orientation in yeast depends on the charge difference across transmembrane segments, their total hydrophobicity, and its distribution. J Biol Chem. 1998;273:24963–71.
- Sato M, Hresko R, Mueckler M. Testing the charge difference hypothesis for the assembly of a eucaryotic multispanning membrane protein. J Biol Chem. 1998;273:25203–8.
- Jayasinghe S, Hristova K, White SH. MPtopo: a database of membrane protein topology. Protein Sci. 2001;10:455–8.
- The UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43:D204–12.
- Dobson L, Lango T, Remenyi I, Tusnady GE. Expediting topology data gathering for the TOPDB database. Nucleic Acids Res. 2015;43:D283–9.
- Nakashima H, Nishikawa K. The amino acid composition is different between the cytoplasmic and extracellular sides in membrane proteins. FEBS Lett. 1992;303:141–6.
- Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982;157:105–32.
- White SH, Wimley WC. Membrane protein folding and stability: physical principles. Annu Rev Biophys Biomol Struct. 1999;28:319–65.
- Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984;179:125–42.
- Bahadur RR. Rates of convergence of estimates and test statistics. Ann Math Stat. 1967;38:303–24.
- Bahadur RR. Some limit theorems in statistics. Philadelphia: SIAM; 1971.
- Sunyaev SR, Eisenhaber F, Argos P, Kuznetsov EN, Tumanyan VG. Are knowledge-based potentials derived from protein structure sets discriminative with respect to amino acid types? Proteins. 1998;31:225–46.
- Zachowski A. Phospholipids in animal eukaryotic membranes: transverse asymmetry and movement. Biochem J. 1993;294(Pt 1):1–14.
- Yeung T, Gilbert GE, Shi J, Silvius J, Kapus A, Grinstein S. Membrane phosphatidylserine regulates surface charge and protein localization. Science. 2008;319:210–3.
- Meindl-Beinker NM, Lundin C, Nilsson I, White SH, von Heijne G. Asn- and Asp-mediated interactions between transmembrane helices during translocon-mediated membrane protein assembly. EMBO Rep. 2006;7:1111–6.
- Oger PM, Cario A. Adaptation of the membrane in Archaea. Biophys Chem. 2013;183:42–56.
- Rutz C, Rosenthal W, Schulein R. A single negatively charged residue affects the orientation of a membrane protein in the inner membrane of Escherichia coli only when it is located adjacent to a transmembrane domain. J Biol Chem. 1999;274:33757–63.
- Hedin LE, Ojemalm K, Bernsel A, Hennerdal A, Illergard K, Enquist K, Kauko A, Cristobal S, von Heijne G, Lerch-Bader M, et al. Membrane insertion of marginally hydrophobic transmembrane helices depends on sequence context. J Mol Biol. 2010;396:221–9.
- Hessa T, Meindl-Beinker NM, Bernsel A, Kim H, Sato Y, Lerch-Bader M, Nilsson I, White SH, von Heijne G. Molecular code for transmembrane-helix recognition by the Sec61 translocon. Nature. 2007;450:1026–30.
- Ojemalm K, Halling KK, Nilsson I, von Heijne G. Orientational preferences of neighboring helices can drive ER insertion of a marginally hydrophobic transmembrane helix. Mol Cell. 2012;45:529–40.
- van Meer G, Voelker DR, Feigenson GW. Membrane lipids: where they are and how they behave. Nat Rev Mol Cell Biol. 2008;9:112–24.
- Daleke DL. Phospholipid flippases. J Biol Chem. 2007;282:821–5.
- Devaux PF, Morris R. Transmembrane asymmetry and lateral domains in biological membranes. Traffic. 2004;5:241–6.
- Bell RM, Ballas LM, Coleman RA. Lipid topogenesis. J Lipid Res. 1981;22:391–403.
- Futerman AH, Riezman H. The ins and outs of sphingolipid synthesis. Trends Cell Biol. 2005;15:312–8.
- Li Z, Hailemariam TK, Zhou H, Li Y, Duckworth DC, Peake DA, Zhang Y, Kuo MS, Cao G, Jiang XC. Inhibition of sphingomyelin synthase (SMS) affects intracellular sphingomyelin accumulation and plasma membrane lipid organization. Biochim Biophys Acta. 2007;1771:1186–94.
- Tafesse FG, Huitema K, Hermansson M, van der Poel S, van den Dikkenberg J, Uphoff A, Somerharju P, Holthuis JC. Both sphingomyelin synthases SMS1 and SMS2 are required for sphingomyelin homeostasis and growth in human HeLa cells. J Biol Chem. 2007;282:17537–47.
- Di Paolo G, De Camilli P. Phosphoinositides in cell regulation and membrane dynamics. Nature. 2006;443:651–7.
- Qin Y, Dittmer PJ, Park JG, Jansen KB, Palmer AE. Measuring steady-state and dynamic endoplasmic reticulum and Golgi Zn2+ with genetically encoded sensors. Proc Natl Acad Sci U S A. 2011;108:7351–6.
- Worley III JF, McIntyre MS, Spencer B, Mertz RJ, Roe MW, Dukes ID. Endoplasmic reticulum calcium store regulates membrane potential in mouse islet beta-cells. J Biol Chem. 1994;269:14359–62.
- Schapiro FB, Grinstein S. Determinants of the pH of the Golgi complex. J Biol Chem. 2000;275:21025–32.
- Killian JA, von Heijne G. How proteins adapt to a membrane-water interface. Trends Biochem Sci. 2000;25:429–34.
- Braun P, von Heijne G. The aromatic residues Trp and Phe have different effects on the positioning of a transmembrane helix in the microsomal membrane. Biochemistry. 1999;38:9778–82.
- Kuznetsov V, Lee HK, Maurer-Stroh S, Molnar MJ, Pongor S, Eisenhaber B, Eisenhaber F. How bioinformatics influences health informatics: usage of biomolecular sequences, expression profiles and automated microscopic image analyses for clinical needs and public health. Health Inf Sci Syst. 2013;1:2.
- Jones DT. Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics. 2007;23:538–44.
- Huang Y, Niu B, Gao Y, Fu L, Li W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 2010;26:680–2.
- Suzek BE, Wang Y, Huang H, McGarvey PB, Wu CH. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics. 2015;31:926–32.
- Sirota FL, Batagov A, Schneider G, Eisenhaber B, Eisenhaber F, Maurer-Stroh S. Beware of moving targets: reference proteome content fluctuates substantially over the years. J Bioinform Comput Biol. 2012;10:1250020.
- Ojemalm K, Watson HR, Roboti P, Cross BC, Warwicker J, von Heijne G, High S. Positional editing of transmembrane domains during ion channel assembly. J Cell Sci. 2013;126:464–72.
- White SH, von Heijne G. How translocons select transmembrane helices. Annu Rev Biophys. 2008;37:23–42.
- Nozaki Y, Tanford C. The solubility of amino acids and two glycine peptides in aqueous ethanol and dioxane solutions. Establishment of a hydrophobicity scale. J Biol Chem. 1971;246:2211–7.
- Wolfenden R, Andersson L, Cullis PM, Southgate CC. Affinities of amino acid side chains for solvent water. Biochemistry. 1981;20:849–55.
- Chothia C. The nature of the accessible and buried surfaces in proteins. J Mol Biol. 1976;105:1–12.
- Janin J. Surface and inside volumes in globular proteins. Nature. 1979;277:491–2.
- von Heijne G, Blomberg C. Trans-membrane translocation of proteins. The direct transfer model. Eur J Biochem. 1979;97:175–81.