Alternative cleavage and polyadenylation in spermatogenesis connects chromatin regulation with post-transcriptional control
© Li et al. 2016
Received: 29 October 2015
Accepted: 12 January 2016
Published: 22 January 2016
Most mammalian genes display alternative cleavage and polyadenylation (APA). Previous studies have indicated preferential expression of APA isoforms with short 3’ untranslated regions (3’UTRs) in testes.
By deep sequencing of the 3’ end region of poly(A) + transcripts, we report widespread shortening of 3’UTR through APA during the first wave of spermatogenesis in mouse, with 3’UTR size being the shortest in spermatids. Using genes without APA as a control, we show that shortening of 3’UTR eliminates destabilizing elements, such as U-rich elements and transposable elements, which appear highly potent during spermatogenesis. We additionally found widespread regulation of APA events in introns and exons that can affect the coding sequence of transcripts and global activation of antisense transcripts upstream of the transcription start site, suggesting modulation of splicing and initiation of transcription during spermatogenesis. Importantly, genes that display significant 3’UTR shortening tend to have functions critical for further sperm maturation, and testis-specific genes display greater 3’UTR shortening than ubiquitously expressed ones, indicating functional relevance of APA to spermatogenesis. Interestingly, genes with shortened 3’UTRs tend to have higher RNA polymerase II and H3K4me3 levels in spermatids as compared to spermatocytes, features previously known to be associated with open chromatin state.
Our data suggest that open chromatin may create a favorable cis environment for 3’ end processing, leading to global shortening of 3’UTR during spermatogenesis. mRNAs with shortened 3’UTRs are relatively stable thanks to evasion of powerful mRNA degradation mechanisms acting on 3’UTR elements. Stable mRNAs generated in spermatids may be important for protein production at later stages of sperm maturation, when transcription is globally halted.
Pre-mRNA cleavage/polyadenylation (C/P) is a 3’ end processing mechanism employed in eukaryotes for expression of almost all protein-coding transcripts and long non-coding RNAs by RNA polymerase II (RNAPII) [1, 2]. The site for C/P, commonly known as the polyA site or pA, is defined by both upstream and downstream cis elements [3, 4]. In mammals, the cis elements include the polyadenylation signal (PAS), such as AAUAAA, AUUAAA, or close variants, located within ~40 nucleotides (nt) from the pA; UGUA elements, typically located upstream of the PAS; U-rich elements located around the PAS; and downstream U-rich and GU-rich elements, generally located within ~100 nt from the pA. The C/P machinery in mammalian cells is composed of over 20 core factors [5–7]. Some form subcomplexes, including cleavage and polyadenylation specificity factor (CPSF), containing CPSF-160, CPSF-100, CPSF-73, CPSF-30, Fip1 and WDR33; cleavage stimulation factor (CstF), containing CstF-77, CstF-64/CstF-64τ and CstF-50; cleavage factor I (CFI), containing CFI-68 or CFI-59 and CFI-25; and cleavage factor II (CFII), containing Pcf11 and Clp1. Single proteins involved in C/P include symplekin, poly(A) polymerases (PAPs), nuclear poly(A) binding protein (PABPN1) and RNAPII. In addition, RBBP6, PP1α and PP1β are present in the C/P complex and homologous to yeast C/P factors [6, 8].
Most mammalian genes express alternative cleavage and polyadenylation (APA) isoforms [9–12]. APA in the 3’ untranslated region (3’UTR), called 3’UTR-APA, leads to isoforms with different 3’UTR lengths. Because the 3’UTR plays important roles in mRNA metabolism, 3’UTR-APA can impact the post-transcriptional control of gene expression [13–15]. Studies have shown that the APA pattern of genes is tissue-specific [10, 16–18]. For example, long and short 3’UTR APA isoforms tend to be expressed in the brain and testis, respectively [10, 16, 19, 20]. In addition, APA is regulated in cell proliferation, differentiation and development [11, 20–22], as well as in response to extracellular signals . A growing number of mechanisms have been found to modulate APA, including core C/P factor activities, binding of RNA-binding proteins (RBPs) near the pA, splicing and transcriptional controls, etc. [13, 14, 24].
A sizable fraction of mammalian genes display APA in introns and exons, which affects coding sequences (CDS) [12, 25]. These APA events are called CDS-APA and are largely controlled by the dynamic competition between splicing and C/P [25, 26]. However, transcripts utilizing pAs located near promoters are specifically controlled by U1 snRNP , which has been implicated in controlling transcriptional directionality . On the other hand, a large fraction of promoters lead to transcription of short antisense transcripts using pAs near the transcription start site (TSS). These transcripts are commonly known as promoter-upstream transcripts (PROMPTs) or upstream antisense RNAs (uaRNAs), and are subject to the regulation by the exosome/PABPN1 [26, 29].
Spermatogenesis involves substantial regulation of gene expression at multiple levels. First, there is pervasive transcription in spermatocytes and spermatids [30, 31], which is attributable to permissive open chromatin state, as indicated by histone marks such as H3K4 methylation [30, 32]. Nuclear DNA condenses during differentiation into elongating spermatids, leading to halting of transcriptional activity. As such, protein expression in the later phase of spermiogenesis relies on the mRNAs transcribed earlier . In addition, RNA processing is highly regulated during spermatogenesis, as indicated by rampant alternative splicing changes [31, 34–37] and alternative polyadenylation changes [16, 38–40].
One of the major players in gene expression regulation during spermatogenesis is Piwi-interacting small RNA (piRNA) . Two waves of piRNAs are expressed during the process, i.e., pre-pachytene piRNAs and pachytene piRNAs. While the former has been implicated in inhibition of transposable elements (TEs) through epigenetic mechanisms [42, 43], the latter has recently been shown to cause degradation of mRNA through the function of Miwi [44–47]. Inhibition of TE expression is important for germ cells, because uncontrolled expression can lead to genome instability .
We and others have previously found that pAs usage in testis is rather unique as compared to other tissues [10, 16, 39]. In general, transcripts with short 3’UTRs tend to be expressed in testis. However, how this APA pattern fits into the timeline of spermatogenesis is unclear, nor are the mechanisms and consequences. In addition, how CDS-APA events, which account for over 40 % of all APA events in the mouse genome , are regulated in spermatogenesis has never been studied. Further, we recently found that many bidirectional promoters express uaRNAs with pAs located within 2 kb from the promoter . How these uaRNAs are regulated in spermatogenesis is completely unknown. Here we examine APA isoform expression in the first wave of spermatogenesis to identify the cell type that has the shortest 3’UTRs and reveal potential mechanisms behind the global 3’UTR shortening. We also analyze CDS-APA and uaRNA expression. We present evidence that 3’UTR-APA regulation is important for RNA metabolism in sperm maturation.
Significant 3’UTR shortening in spermatogenesis
To examine how APA is regulated during spermatogenesis, we extracted RNA from mouse testes at different time points after birth, namely 1 week (w), 2w, 3w, 4w, 5w and 6w. These time points correspond to key phases in the first wave of spermatogenesis (indicated in Fig. 1a), during which cells are largely synchronized [49, 50]. RNA was subjected to 3’ region extraction and deep sequencing (3’READS), a deep sequencing method we recently developed to examine the expression of poly(A) + RNAs and identify their pAs .
We first compared the relative expression of the top two most abundant 3’UTR-APA isoforms of each gene at different time points. Based on the location of their pAs relative to the coding region, they were named proximal pA (Prx-pA) isoform and distal pA (Dis-pA) isoform, respectively (illustrated in Fig. 1b). Using relative expression (RE) values between Prx-pA and Dis-pA isoforms (Fig. 1b), we found that the relative expression of Dis-pA isoforms to Prx-pA isoforms were significantly lower in 4–6 week samples than in 1–3 week samples (Fig. 1c). Consistently, 1–3 week samples could be separated from 4–6 week samples using cluster analysis based on RE values (Fig. 1c).
We next applied significant analysis of alternative polyadenylation (SAAP), our recently developed method to statistically identify regulated APA events using a bootstrapping approach , and global analysis of alternative polyadenylation (GAAP), a method for comparing the extent of APA regulation between samples using a read sampling method . With sequencing depth controlled across all samples (1.5 M) and false discovery rate (FDR) set at 5 %, we found that 3’UTRs began to shorten between 3w and 2w, and significant shortening took place between 4w and 3w as well as between 5w and 4w, as indicated by the ratio of number of genes with shortened 3’UTRs (Sh) to that with lengthened 3’UTRs (Le) (Fig. 1d, right). While 4w vs. 3w and 5w vs. 4w were similar in the extent of regulation (log2(Sh#/Le#) = 4.4 and 4.3, respectively), the former involved much more APA events than the latter (by ~4-fold, Fig. 1d, left). To examine 3’UTR length changes more directly, we calculated the 3’UTR size for each gene in each sample using weighted mean of all 3’UTR-APA isoforms, with weight being the read number in the sample. In line with other analysis results, the median 3’UTR size decreased progressively from 949 nt at 1w to 431 nt at 6w (Fig. 1e). An example gene Eif4h is shown in Fig. 1f, which had two detected pA isoforms and displayed substantial 3’UTR shortening in spermatogenesis (weighted mean of 3’UTR size = 1,470 nt and 239 nt at 1w and 6w, respectively). Consistent with the global trend, the period between 3w and 4w involved a switch-like change of isoform expression (Fig. 1f). This is also supported by the relative expression difference (RED) value, which is the difference between RE values from two adjacent time points (Fig. 1f).
3’UTR shortening is coupled with upregulation of gene expression that is important for sperm maturation
Gene ontology terms enriched for genes with significant 3’UTR shortening in spermatogenesis
Gene ontology terms
calcium ion import
ethanolamine-containing compound metabolic process
protein K48-linked deubiquitination
microtubule associated complex
3’UTR shortening correlates with high RNA polymerase II signals and open chromatin state
We next examined cis elements near the regulated pAs. Because proximal and distal pAs are surrounded by distinct cis elements [9, 52], we analyzed proximal and distal pAs separately (Additional file 1: Figure S2A). As such, upregulated proximal pAs of genes with shortened 3’UTR were compared with proximal pAs of other genes, and so were downregulated distal pAs. Overall, very few 4-mers were found to be significantly enriched in the −40 to +100 nt region around the regulated pAs (Additional file 1: Figure S2B). The top ones were UUGU and UUUU in the −40 to −1 nt region and the +1 to +100 nt region of distal pAs, respectively (Additional file 1: Figure S2B). More significant 4-mers were found in the −100 to −41 nt regions of proximal pAs, such as CGAC, AAGA and CACC, and of distal pAs, such as UAUU, UUUU and AUUU (Additional file 1: Figure S2B). However, their significance of enrichment is substantially lower than what we previously observed with APA regulation by C/P factors , and they appear to be related to mRNA stability regulation (see below). Thus, while it remains possible that some cis elements near the pA may regulate pA usage through binding to certain RBPs, this result does not support a global, cis element-based APA mechanism in spermatogenesis.
Prompted by the correlation between upregulation of gene expression and 3’UTR shortening (Fig. 3), we next asked whether APA changes in spermatogenesis were related to transcriptional regulation, as we previously observed across different tissue types and cell conditions . To this end, we analyzed a chromatin immunoprecipitation and deep sequencing (ChIP-seq) data set for RNA polymerase II (RNAPII) , which included data for spermatids and pachytene stage spermatocytes (Additional file 1: Figure S3A). We found that genes with shortened 3’UTRs tended to have significantly higher RNAPII signals around the transcription start site (TSS), in the gene body and around the last pA, as compared to genes without 3’UTR-APA changes (P = 2.6 × 10−9, 2.0 × 10−12 and 5.7 × 10−5, respectively, K–S test, Fig. 4c). This result indicates that 3’UTR shortening is coupled with transcriptional upregulation in spermatogenesis.
We next asked whether chromatin structure, which is substantially remodeled in spermatogenesis and has been implicated in regulation of RNAPII activities , was related to 3’UTR-APA changes. To this end, we analyzed a ChIP-seq data set for H3K4 tri-methylation (H3K4me3) levels in spermatids and spermatocytes . As expected, H3K4me3 levels were high around the TSS and downstream of the last pA (Additional file 1: Figure S3B). Importantly, the H3K4me3 level was significantly higher in spermatids vs. spermatocytes for genes with significant 3’UTR shortening than other genes (Fig. 4d) around the TSS, in the gene body and around the last pA (P = 7.3 × 10−12, 5.8 × 10−63 and 4.7 × 10−8, respectively, K–S test), indicating that APA regulation correlates with the H3K4me3 level. Since high H3K4me3 levels represent open chromatin state, this result suggests that chromatin structure change may lead to more efficient 3’ end processing, resulting in preferential usage of proximal pAs in 3’UTRs.
3’UTR shortening eliminates destabilizing elements that are highly potent in spermatogenesis
To address if 3’UTR cis elements can regulate mRNA abundance in spermatogenesis, we selected genes with a single 3’UTR, or sUTR. As such, their expression could not be affected by APA. Interestingly, U-rich and UG-rich elements, such as UUUU, GUUU and UUGU, were significantly enriched in the sUTRs of downregulated genes (4w vs. 2w, Fig. 5d), indicating that these elements correlate with transcript abundance changes. Since these elements have previously been shown to play roles in mRNA stability [55, 56], it is likely that there exist potent mechanisms during spermatogenesis that degrade mRNAs containing these elements.
We next used the significance score (SS, derived from P value from the Fisher’s exact test, see Methods for detail) for each 4-mer to indicate its significance of enrichment in one set of sequences vs. another. We found that the SS values of 4-mers for sUTR regulation were positively correlated with those derived from cUTR analysis (r = 0.65, Pearson correlation, Fig. 5e), i.e., those associated with downregulated sUTR genes were also enriched in cUTRs of lengthened 3’UTRs, and those with upregulated sUTR genes were also enriched in cUTRs of shortened 3’UTRs (top 4-mers were highlighted in red in Fig. 5e). By contrast, a negative correlation was observed between the SS values of 4-mers associated with sUTR gene expression and those derived from aUTR analysis (r = −0.49, Pearson correlation, Fig. 5f). Overall, U-rich UG-rich and UA-rich 4-mers were highly enriched for sUTRs of downregulated genes, cUTRs of lengthened genes and aUTRs of shortened genes (Fig. 5e and f). It is also noteworthy that analyses using 6-mers gave very similar results (Additional file 1: Figure S4). Taken together, these results indicate that cis elements in 3’UTRs play a significant role in regulating mRNA abundance in spermatogenesis, likely through control of mRNA stability, and hence contribute to the APA profiles. Conversely, our data also suggest that shortening of 3’UTR can significantly remove destabilizing elements, impacting gene expression.
3’UTR shortening substantially removes transposable elements in the transcriptome
To examine how 3’UTR shortening impacts TE-mediated mRNA degradation, we divided genes into four groups based on both 3’UTR regulation between 4w and 2w and TE location in the 3’UTR (Fig. 6d, top): 1) genes with 3’UTRs shortened and with TEs in aUTR (TEs can be eliminated by 3’UTR shortening); 2) genes with 3’UTRs shortened and with TEs in cUTR (TEs cannot be eliminated despite 3’UTR shortening); 3) genes with unchanged 3’UTR and with TEs in aUTR (no APA regulation and TEs cannot be eliminated); 4) genes with a single 3’UTR which contains TEs (no APA and TEs cannot be eliminated). Both 3’READS data and RNA-seq data showed that genes with shortened 3’UTRs and with TEs in aUTR (group 1) tended to have significantly higher expression levels than genes in other groups between 4w and 2w (P <1 × 10−3, K–S test, Fig. 6d, bottom). This result suggests that elimination of TEs in 3’UTRs by 3’UTR shortening leads to higher transcript abundance.
To further address whether the TE-mediated gene regulation is due to mRNA degradation by the piRNA/Miwi pathway, we analyzed RNA-seq data from Miwi−/− and Miwi−/+ mice . Focusing on the four groups of genes described above, we found that group 1 genes were significantly less activated in Miwi−/− vs. Miwi−/+ than the other three groups of genes (P <0.05, K–S test, Fig. 6e), indicating that downregulation of gene groups 2–4 in wild type mice were mediated by Miwi. This result further supports the notion that 3’UTR shortening helps avoid TE/piRNA/Miwi-mediated mRNA degradation in spermatogenesis. It is also notable that TE regulation was independent of the U-rich elements described above, because U-rich elements could still be detected after excluding TE sequences in analysis (Additional file 1: Figure S5A) and TE-mediated regulation identified by the Miwi−/− vs. Miwi−/+ comparison did not involve U-rich elements (Additional file 1: Figure S5B).
Widespread regulation of C/P events in introns
Gene ontology terms associated with genes with regulated CDS-APA in spermatogenesis
−log10(P), downregulated CDS-APA
−log10(P), upregulated CDS-APA
Gene ontology terms
3w vs. 2w
4w vs. 3w
3w vs. 2w
4w vs. 3w
cellular process involved in reproduction in multicellular organism
intracellular non-membrane-bounded organelle
Compared to genes with unchanged CDS-APA, genes with downregulated CDS-APA tended to be significantly upregulated (P = 2 × 10−155 and 2 × 10−21 for 3w vs. 2w and 4w vs. 3w, respectively, K–S test, Fig. 7e), whereas genes with upregulated CDS-APA tended to be either mildly downregulated in 3w vs. 2w, or unchanged in 4w vs. 3w (Fig. 7e). In addition, genes with downregulated CDS-APA tended to have increased H3K4me3 signals around the TSS and in the gene body (Fig. 7f), whereas those with upregulated CDS-APA tended to have mildly decreased levels, as compared to genes with unchanged CDS-APA (Fig. 7f). Thus, the usage of upstream pAs appears to be inhibited when gene expression is upregulated and chromatin becomes more open, a trend that is opposite to the usage of proximal pA in 3’UTRs (see above). Whether this difference is due to the involvement of splicing activity, which plays a major role in CDS-APA , remains to be examined in the future.
Significant activation of bidirectional transcription in spermatogenesis
Here we show widespread shortening of 3’UTR by APA in spermatogenesis, with 3’UTRs being the shortest in spermatids. While this mechanism appears to be general, some genes tend to be regulated to a greater extent than others. Testis-specific genes display more substantial 3’UTR shortening than ubiquitously expressed genes, highlighting the importance of this mechanism for spermatogenesis. Interestingly, protein ubiquitination is the most significant pathway associated with genes with 3’UTR shortening. Because of the importance of ubiquitination for sperm development after spermatids, such as nucleosome removal and remodeling of cell structure, we posit that 3’UTR shortening plays an important role for spermiogenesis. Further experimental studies are needed to confirm this hypothesis.
Using genes without APA as a control group, we reveal that 3’UTR elements play an important role in mRNA abundance during spermatogenesis, including U-rich, UA-rich and UG-rich elements, possibly through regulation of mRNA stability. This may be important for elimination of mRNAs before spermatozoa, which contain little mRNA . Future studies are needed to elucidate proteins involved in the mRNA degradation mediated by these cis elements. Another important question to be addressed is how much of a role APA plays to help transcripts evade degradation. Compared to sUTR transcripts, long 3’UTR isoforms decreased in expression by ~1.7-fold from 2 weeks to 4 weeks and short isoforms increased by about 2.1-fold in the same period (Additional file 1: Figure S7). Assuming long isoforms have similar decay rates to sUTR transcripts, their decrease in expression can be attributable to APA changes. Therefore, while future experiments are needed to precisely address the question, our analysis indicates a significant impact on mRNA stability through APA changes. In the same vein, our cis element analysis result can parsimoniously explain why some genes display 3’UTR lengthening while the global trend is shortening: for genes whose cUTRs are enriched with destabilizing elements, their short 3’UTR isoforms are less stable than the long isoform, which presumably contains additional stabilizing elements in the aUTR to negate the destabilizing effect of cUTR, leading to overall display of 3’UTR lengthening.
By analyzing APA isoforms containing TEs at different portions of 3’UTR, we corroborated the recent finding by Gou et al.  that 3’UTR TEs are targeted for degradation in spermatogenesis. Using gene expression data from Miwi−/− mice , we further found that the degradation is through the piRNA-Miwi pathway. Thus, 3’UTR shortening can help genes evade the TE/piRNA/Miwi-based mRNA elimination during spermatogenesis. Previous studies have shown that TEs in 3’UTRs can play regulatory roles for mRNA metabolism [62–64] and some TEs can confer functional pAs to the host gene . APA regulation in spermatogenesis can thus effectively permit evolution of TEs in 3’UTRs without inhibiting the expression of host genes, contributing to exaptation of TEs into 3’UTRs. Further studies are needed to elucidate how important this mechanism is for 3’UTR evolution. Also to be examined is whether some TE-containing aUTR sequences can give rise to piRNAs, which in turn regulate the host gene post-transcriptionally.
We found that 3’UTR shortening is coupled with upregulation of gene transcription and open state of chromatin, as indicated by RNAPII and H3K4me3 levels, respectively. Open chromatin has been suggested to cause widespread transcription in testis , leading to high complexity of the transcriptome. While this result is consistent with our previous finding implicating a role of transcriptional activity in APA regulation , the mechanism behind the coupling is unclear. One possibility is that permissive chromatin structure makes it more efficient to assemble the cleavage/polyadenylation machinery, leading to more usage of proximal pAs. However, other mechanisms involving specific factors to facilitate recruitment of the C/P machinery, such as that mediated by transcription factors , cannot be ruled out.
Widespread regulation of APA events in introns and internal exons suggests modulation of splicing activity during spermatogenesis, which is consistent with previous reports [31, 34–37]. Notably, the CDS-APA regulation between 3 and 4 weeks is reminiscent of the APA regulation by U1 snRNP inhibition [26, 27], where activation of pAs is largely biased to 5’ introns. Whether there is a localized shortage of U1 snRNP for certain genes leading to activation of intronic pAs needs to be examined in the future. Interestingly, genes with suppressed CDS-APA isoforms tend to be upregulated in expression (Fig. 7e) and have higher H3K4me3 levels (Fig. 7f), suggesting that open chromatin state may lead to efficient splicing, resulting in inhibition of intronic C/P.
We found global activation of uaRNAs during spermatogenesis (Fig. 8). These non-coding transcripts are generated by divergent promoters and are typically under the surveillance of the exosome [26, 29]. Their expression can significantly enrich the transcriptome during spermatogenesis, potentially impacting evolution of new genes . uaRNA expression appears to be associated with open chromatin around the TSS and correlates with the expression of sense transcripts. It is not clear, however, why they are not eliminated by the nuclear exosome. Whether the function of exosome is suppressed in spermatogenesis or is overwhelmed by substantial activation of uaRNA expression needs to be addressed in the future.
3’READS of testis samples
Testis samples were obtained from mice (FVB) by surgical removal according to the approved Institutional Animal Care and Use Committee (IACUC) protocols. The 3’ region extraction and deep sequencing (3’READS) method was previously described in . Briefly, 25 μg of input RNA was used for each sample, and poly(A) + RNA was selected using oligo d(T)25 magnetic beads (NEB, Ipswich, MA, USA), followed by on-bead fragmentation using RNase III (NEB). Poly(A) + RNA fragments were then selected using the chimeric U5 and T45 (CU5T45) oligo conjugated on streptavidin beads, followed by RNase H (NEB) digestion. Eluted RNA fragments were ligated with 5’ and 3’ adapters, followed by RT and PCR (15x) to obtain cDNA libraries for sequencing on the Illumina platform. Processing of 3’READS data was carried out as previously described . Briefly, reads were mapped to the mouse genome using Bowtie 2 (Langmead and Salzberg 2012). Reads with ≥2 unaligned Ts at the 5’ end are called poly(A) site-supporting (PASS) reads, which were used to identify pAs. pAs located within 24 nt from each other were clustered together. The number of PASS reads generated for each sample is listed in Additional file 1: Table S2.
Relative expression (RE) of two 3’UTR-APA isoforms, e.g., Dis-pA and Prx-pA, was calculated by log2(RPMDis-pA/RPMPrx-pA), where RPM was reads per million PASS reads. Relative expression difference (RED) of two isoforms in two comparing samples was based on difference in RE of the two isoforms in the two samples. For CDS-APA analysis, RE was based on comparison of all CDS-pA isoforms combined (CDS-pA set) with all 3’UTR-pA isoforms combined (3’UTR-pA set), and RED was also based on the two sets. We used the significance analysis of alternative polyadenylation (SAAP) method to identify significantly regulated APA events, as previously described . Briefly, for two pAs (or two pA sets) from two comparing samples, a RED score was first calculated and was called observed RED. The PASS reads were then sampled based on the assumption that the relative abundance of each pA isoform was the same in two samples. Sampling was performed 20 times to obtain expected mean and standard deviation of RED, which were then used to convert observed RED to Z score (minus mean and divided by standard deviation). False discovery rate (FDR) was calculated by comparing observed Z (Zo) and expected Z (Ze) for a given Z cutoff value (Zc). For 3’UTR-APA, we selected the two most abundant pA isoforms for analysis. For CDS-APA, we combined all isoforms using pAs in upstream regions of the 3’-most exon and compared their expression change with that of isoforms using pAs in the 3’-most exon. Individual CDS-pAs were also analyzed by comparing to all other pA isoforms of the gene. For uaRNA analysis, we combined all antisense transcripts using pAs within 2 kb upstream from the transcription start site (TSS), excluding those mapped to other mRNA genes, and compared them to all sense strand transcripts, excluding pAs located within 2 kb downstream of the TSS. We used FDR = 5 % to select significantly regulated APA events. Global analysis of alternative polyadenylation (GAAP) was previously described in . Briefly, for two 3’READS data sets A and B, we sampled by bootstrapping 1.5 M PASS reads from A and B, respectively. The number of genes with significant APA changes based on SAAP analysis was calculated and called observed value. The data were also randomly permutated (shuffled across samples) to obtain the expected value. The observed value − expected value was normalized number of genes with significant APA regulation. This process was repeated 20 times to obtain standard deviation. The weighted mean of 3’UTR size for each gene was based on 3’UTR sizes of all 3’UTR-APA isoforms, weighted by the expression level of each isoform based on the number of PASS reads. Only genes with ≥50 PASS reads were used for the analysis.
Cis element analysis
To identify cis elements enriched for a set of sequences, we compared k-mer (k = 4 or 6) frequencies in the set vs. a background set and derived P values indicating significance of enrichment using the Fisher’s exact test. This was carried out to identify enriched k-mers for cUTR, aUTR and sUTR sets. For cis elements around regulated pAs, we first put proximal and distal pAs into two separate sets to mitigate the possibility that identified cis elements were related to location, rather than regulation. Regulated proximal pAs were then compared to other proximal pAs to identify overrepresented k-mers. The same approach was used for distal pAs. We examined three regions around the pA, i.e., −100 to −41 nt, −40 to −1 nt and +1 to +100 nt. For each region, the Fisher’s exact test was used to examine whether a k-mer was enriched for a set of pAs vs. other pAs.
Analysis of introns
The intron location was based on the RefSeq database, considering all RefSeq-supported splicing isoforms. Distribution of regulated intronic pA isoforms was compared to that of background set, which was derived from all detected intronic pAs in the mouse genome . To calculate 5’ splice site (5’SS) or 3’ splice site (3’SS) strength, we used all 5’SS or 3’SS supported by mouse RefSeq sequences and applied MaxEntScan to obtain strength scores . The 5’SS or 3’SS strength of introns containing regulated pAs was compared to that of background introns with the same relative locations in genes by the Wilcoxon rank-sum test.
Gene expression using 3’READS data was based on all PASS reads mapped to the 3’-most exon of a gene, represented by the reads per million total PASS reads (RPM) value. For RNA-seq data analysis, reads were mapped to the mouse genome (mm9) using Bowtie 2 (local mode). Only uniquely mapped reads with MAPQ score >10 were used. RefSeq gene models were used to calculate gene expression level. In order to eliminate the effect of 3’UTR-APA on gene expression calculation, only reads mapped to CDS were used. For ChIP-seq data analysis, reads were mapped to the mouse genome (mm9) using Bowtie 2. Only uniquely mapped reads with MAPQ score >10 were used. Reads mapped to the same genomic loci (defined by chromosome, strand and start and end positions) were collapsed into one. The genomic regions were binned for enrichment score calculation, i.e., every 50-nt before the TSS and after the last pA, or every percentile of the gene body. The enrichment score was defined as log2(ratio) of reads per million total mapped reads (RPM) of immunoprecipitated sample to that of input sample. The 5’ end position of a read was used to represent the read. For TE content analysis, 5’UTR and CDS were defined by RefSeq annotations, and 3’UTR was based on both RefSeq annotations and 3’READS data. All TE types were analyzed, including LINE, SINE, LTR and DNA. Gene ontology (GO) analysis was carried out using the Fisher’s exact test. GO annotation of genes was obtained from the NCBI Gene database. Testis-specific genes and ubiquitously expressed genes were defined in . The following data sets were downloaded from the GEO database of NCBI: [NCBI GEO:GSE45441] (RNAPII ChIP-seq data); [NCBI GEO:GSE49621] (H3K4me3 ChIP-seq data); [NCBI GEO:GSE43717] (directional RNA-seq data for different cell types in testis); [NCBI GEO:GSE42004] (directional RNA-seq data for different stages of testis from Miwi +/− and Miwi−/− strains).
Availability of supporting data
All data can be obtained from the NCBI GEO database [NCBI GEO:GSE73973].
We thank other members of BT’s laboratory, Mofang Liu and Donny Licatalosi for helpful discussions. This work was supported by an NIH grant (GM084089) to BT.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Zhao J, Hyman L, Moore C. Formation of mRNA 3' ends in eukaryotes: mechanism, regulation, and interrelationships with other steps in mRNA synthesis. Microbiol Mol Biol Rev. 1999;63(2):405–45.PubMed CentralPubMedGoogle Scholar
- Colgan DF, Manley JL. Mechanism and regulation of mRNA polyadenylation. Genes Dev. 1997;11(21):2755–66.View ArticlePubMedGoogle Scholar
- Tian B, Graber JH. Signals for pre-mRNA cleavage and polyadenylation. Wiley Interdiscip Rev RNA. 2012;3(3):385–96.PubMed CentralView ArticlePubMedGoogle Scholar
- Proudfoot NJ. Ending the message: poly(A) signals then and now. Genes Dev. 2011;25(17):1770–82.PubMed CentralView ArticlePubMedGoogle Scholar
- Mandel CR, Bai Y, Tong L. Protein factors in pre-mRNA 3'-end processing. Cell Mol Life Sci. 2008;65(7-8):1099–122.PubMed CentralView ArticlePubMedGoogle Scholar
- Shi Y, Di Giammartino DC, Taylor D, Sarkeshik A, Rice WJ, Yates 3rd JR, et al. Molecular architecture of the human pre-mRNA 3' processing complex. Mol Cell. 2009;33(3):365–76.PubMed CentralView ArticlePubMedGoogle Scholar
- Shi Y, Manley JL. The end of the message: multiple protein-RNA interactions define the mRNA polyadenylation site. Genes Dev. 2015;29(9):889–97.PubMed CentralView ArticlePubMedGoogle Scholar
- Di Giammartino DC, Li W, Ogami K, Yashinskie JJ, Hoque M, Tian B, et al. RBBP6 isoforms regulate the human polyadenylation machinery and modulate expression of mRNAs with AU-rich 3' UTRs. Genes Dev. 2014;28(20):2248–60.PubMed CentralView ArticlePubMedGoogle Scholar
- Tian B, Hu J, Zhang H, Lutz CS. A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res. 2005;33(1):201–12.PubMed CentralView ArticlePubMedGoogle Scholar
- Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, et al. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008;456(7221):470–6.PubMed CentralView ArticlePubMedGoogle Scholar
- Shepard PJ, Choi EA, Lu J, Flanagan LA, Hertel KJ, Shi Y. Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq. RNA. 2011;17(4):761–72.PubMed CentralView ArticlePubMedGoogle Scholar
- Hoque M, Ji Z, Zheng D, Luo W, Li W, You B, et al. Analysis of alternative cleavage and polyadenylation by 3' region extraction and deep sequencing. Nat Methods. 2013;10(2):133–9.PubMed CentralView ArticlePubMedGoogle Scholar
- Tian B, Manley JL. Alternative cleavage and polyadenylation: the long and short of it. Trends Biochem Sci. 2013;38(6):312–20.PubMed CentralView ArticlePubMedGoogle Scholar
- Elkon R, Ugalde AP, Agami R. Alternative cleavage and polyadenylation: extent, regulation and function. Nat Rev Genet. 2013;14(7):496–506.View ArticlePubMedGoogle Scholar
- Lutz CS, Moreira A. Alternative mRNA polyadenylation in eukaryotes: an effective regulator of gene expression. WIREs RNA. 2011;2(1):23–31.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang H, Lee JY, Tian B. Biased alternative polyadenylation in human tissues. Genome Biol. 2005;6:R100.PubMed CentralView ArticlePubMedGoogle Scholar
- Derti A, Garrett-Engele P, Macisaac KD, Stevens RC, Sriram S, Chen R, et al. A quantitative atlas of polyadenylation in five mammals. Genome Res. 2012;22(6):1173–83.PubMed CentralView ArticlePubMedGoogle Scholar
- Lianoglou S, Garg V, Yang JL, Leslie CS, Mayr C. Ubiquitously transcribed genes use alternative polyadenylation to achieve tissue-specific expression. Genes Dev. 2013;27(21):2380–96.PubMed CentralView ArticlePubMedGoogle Scholar
- Miura P, Shenker S, Andreu-Agullo C, Westholm JO, Lai EC. Widespread and extensive lengthening of 3' UTRs in the mammalian brain. Genome Res. 2013;23(5):812–25.PubMed CentralView ArticlePubMedGoogle Scholar
- Ji Z, Lee JY, Pan Z, Jiang B, Tian B. Progressive lengthening of 3' untranslated regions of mRNAs by alternative polyadenylation during mouse embryonic development. Proc Natl Acad Sci U S A. 2009;106:7028–33.PubMed CentralView ArticlePubMedGoogle Scholar
- Sandberg R, Neilson JR, Sarma A, Sharp PA, Burge CB. Proliferating cells express mRNAs with shortened 3' untranslated regions and fewer microRNA target sites. Science. 2008;320(5883):1643–7.PubMed CentralView ArticlePubMedGoogle Scholar
- Ji Z, Tian B. Reprogramming of 3' untranslated regions of mRNAs by alternative polyadenylation in generation of pluripotent stem cells from different cell types. PLoS One. 2009;4(12):e8419.PubMed CentralView ArticlePubMedGoogle Scholar
- Flavell SW, Kim TK, Gray JM, Harmin DA, Hemberg M, Hong EJ, et al. Genome-wide analysis of MEF2 transcriptional program reveals synaptic target genes and neuronal activity-dependent polyadenylation site selection. Neuron. 2008;60(6):1022–38.PubMed CentralView ArticlePubMedGoogle Scholar
- Zheng D, Tian B. RNA-binding proteins in regulation of alternative cleavage and polyadenylation. Adv Exp Med Biol. 2014;825:97–127.View ArticlePubMedGoogle Scholar
- Tian B, Pan Z, Lee JY. Widespread mRNA polyadenylation events in introns indicate dynamic interplay between polyadenylation and splicing. Genome Res. 2007;17(2):156–65.PubMed CentralView ArticlePubMedGoogle Scholar
- Li W, You B, Hoque M, Zheng D, Luo W, Ji Z, et al. Systematic profiling of poly(A) + transcripts modulated by core 3' end processing and splicing factors reveals regulatory rules of alternative cleavage and polyadenylation. PLoS Genet. 2015;11(4):e1005166.PubMed CentralView ArticlePubMedGoogle Scholar
- Kaida D, Berg MG, Younis I, Kasim M, Singh LN, Wan L, et al. U1 snRNP protects pre-mRNAs from premature cleavage and polyadenylation. Nature. 2010;468(7324):664–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Almada AE, Wu X, Kriz AJ, Burge CB, Sharp PA. Promoter directionality is controlled by U1 snRNP and polyadenylation signals. Nature. 2013;499(7458):360–3.PubMed CentralView ArticlePubMedGoogle Scholar
- Ntini E, Jarvelin AI, Bornholdt J, Chen Y, Boyd M, Jorgensen M, et al. Polyadenylation site-induced decay of upstream transcripts enforces promoter directionality. Nat Struct Mol Biol. 2013;20(8):923–8.View ArticlePubMedGoogle Scholar
- Soumillon M, Necsulea A, Weier M, Brawand D, Zhang X, Gu H, et al. Cellular source and mechanisms of high transcriptome complexity in the mammalian testis. Cell Rep. 2013;3(6):2179–90.View ArticlePubMedGoogle Scholar
- Margolin G, Khil PP, Kim J, Bellani MA, Camerini-Otero RD. Integrated transcriptome analysis of mouse spermatogenesis. BMC Genomics. 2014;15:39.PubMed CentralView ArticlePubMedGoogle Scholar
- Hammoud SS, Low DH, Yi C, Carrell DT, Guccione E, Cairns BR. Chromatin and transcription transitions of mammalian adult germline stem cells and spermatogenesis. Cell Stem Cell. 2014;15(2):239–53.View ArticlePubMedGoogle Scholar
- Steger K. Transcriptional and translational regulation of gene expression in haploid spermatids. Anat Embryol. 1999;199(6):471–87.View ArticlePubMedGoogle Scholar
- Yeo G, Holste D, Kreiman G, Burge CB. Variation in alternative splicing across human tissues. Genome Biol. 2004;5(10):R74.PubMed CentralView ArticlePubMedGoogle Scholar
- Venables JP. Alternative splicing in the testes. Curr Opin Genet Dev. 2002;12(5):615–9.View ArticlePubMedGoogle Scholar
- Elliott DJ, Grellscheid SN. Alternative RNA splicing regulation in the testis. Reproduction. 2006;132(6):811–9.View ArticlePubMedGoogle Scholar
- Schmid R, Grellscheid SN, Ehrmann I, Dalgliesh C, Danilenko M, Paronetto MP, et al. The splicing landscape is globally reprogrammed during male meiosis. Nucleic Acids Res. 2013;41(22):10170–84.PubMed CentralView ArticlePubMedGoogle Scholar
- Li W, Yeh HJ, Shankarling GS, Ji Z, Tian B, Macdonald CC. The tauCstF-64 polyadenylation protein controls genome expression in testis. PLoS One. 2012;7(10):e48373.PubMed CentralView ArticlePubMedGoogle Scholar
- Liu D, Brockman JM, Dass B, Hutchins LN, Singh P, McCarrey JR, et al. Systematic variation in mRNA 3'-processing signals during mouse spermatogenesis. Nucleic Acids Res. 2007;35(1):234–46.PubMed CentralView ArticlePubMedGoogle Scholar
- Dass B, Tardif S, Park JY, Tian B, Weitlauf HM, Hess RA, et al. Loss of polyadenylation protein tauCstF-64 causes spermatogenic defects and male infertility. Proc Natl Acad Sci U S A. 2007;104(51):20374–9.PubMed CentralView ArticlePubMedGoogle Scholar
- Watanabe T, Lin H. Posttranscriptional regulation of gene expression by Piwi proteins and piRNAs. Mol Cell. 2014;56(1):18–27.PubMed CentralView ArticlePubMedGoogle Scholar
- Aravin AA, Sachidanandam R, Bourc'his D, Schaefer C, Pezic D, Toth KF, et al. A piRNA pathway primed by individual transposons is linked to de novo DNA methylation in mice. Mol Cell. 2008;31(6):785–99.PubMed CentralView ArticlePubMedGoogle Scholar
- Kuramochi-Miyagawa S, Watanabe T, Gotoh K, Totoki Y, Toyoda A, Ikawa M, et al. DNA methylation of retrotransposon genes is regulated by Piwi family members MILI and MIWI2 in murine fetal testes. Genes Dev. 2008;22(7):908–17.PubMed CentralView ArticlePubMedGoogle Scholar
- Watanabe T, Cheng EC, Zhong M, Lin H. Retrotransposons and pseudogenes regulate mRNAs and lncRNAs via the piRNA pathway in the germline. Genome Res. 2015;25(3):368–80.PubMed CentralView ArticlePubMedGoogle Scholar
- Gou LT, Dai P, Yang JH, Xue Y, Hu YP, Zhou Y, et al. Pachytene piRNAs instruct massive mRNA elimination during late spermiogenesis. Cell Res. 2014;24(6):680–700.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang P, Kang JY, Gou LT, Wang J, Xue Y, Skogerboe G, et al. MIWI and piRNA-mediated cleavage of messenger RNAs in mouse testes. Cell Res. 2015;25(2):193–207.PubMed CentralView ArticlePubMedGoogle Scholar
- Goh WS, Falciatori I, Tam OH, Burgess R, Meikar O, Kotaja N, et al. piRNA-directed cleavage of meiotic transcripts regulates spermatogenesis. Genes Dev. 2015;29(10):1032–44.PubMed CentralView ArticlePubMedGoogle Scholar
- Zamudio N, Bourc'his D. Transposable elements in the mammalian germline: a comfortable niche or a deadly trap? Heredity. 2010;105(1):92–104.View ArticlePubMedGoogle Scholar
- Yoshida S, Sukeno M, Nakagawa T, Ohbo K, Nagamatsu G, Suda T, et al. The first round of mouse spermatogenesis is a distinctive program that lacks the self-renewing spermatogonia stage. Development. 2006;133(8):1495–505.View ArticlePubMedGoogle Scholar
- Laiho A, Kotaja N, Gyenesei A, Sironen A. Transcriptome profiling of the murine testis during the first wave of spermatogenesis. PLoS One. 2013;8(4):e61558.PubMed CentralView ArticlePubMedGoogle Scholar
- Bose R, Manku G, Culty M, Wing SS. Ubiquitin-proteasome system in spermatogenesis. Adv Exp Med Biol. 2014;759:181–213.View ArticlePubMedGoogle Scholar
- Hu J, Lutz CS, Wilusz J, Tian B. Bioinformatic identification of candidate cis-regulatory elements involved in human mRNA polyadenylation. RNA. 2005;11(10):1485–93.PubMed CentralView ArticlePubMedGoogle Scholar
- Ji Z, Luo W, Li W, Hoque M, Pan Z, Zhao Y, et al. Transcriptional activity regulates alternative cleavage and polyadenylation. Mol Syst Biol. 2011;7:534.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang J, Bonasio R, Strino F, Kluger Y, Holloway JK, Modzelewski AJ, et al. SFMBT1 functions with LSD1 to regulate expression of canonical histone genes and chromatin-related factors. Genes Dev. 2013;27(7):749–66.PubMed CentralView ArticlePubMedGoogle Scholar
- Masuda A, Andersen HS, Doktor TK, Okamoto T, Ito M, Andresen BS, et al. CUGBP1 and MBNL1 preferentially bind to 3' UTRs and facilitate mRNA decay. Sci Rep. 2012;2:209.PubMed CentralPubMedGoogle Scholar
- Mukherjee N, Corcoran DL, Nusbaum JD, Reid DW, Georgiev S, Hafner M, et al. Integrative regulatory mapping indicates that the RNA-binding protein HuR couples pre-mRNA processing and mRNA stability. Mol Cell. 2011;43(3):327–39.PubMed CentralView ArticlePubMedGoogle Scholar
- Jurka J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 2000;16(9):418–20.View ArticlePubMedGoogle Scholar
- Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015;6:11.PubMed CentralView ArticlePubMedGoogle Scholar
- Richard P, Manley JL. How bidirectional becomes unidirectional. Nat Struct Mol Biol. 2013;20(9):1022–4.View ArticlePubMedGoogle Scholar
- Seila AC, Core LJ, Lis JT, Sharp PA. Divergent transcription: a new feature of active promoters. Cell Cycle. 2009;8(16):2557–64.View ArticlePubMedGoogle Scholar
- Miller D, Ostermeier GC, Krawetz SA. The controversy, potential and roles of spermatozoal RNA. Trends Mol Med. 2005;11(4):156–63.View ArticlePubMedGoogle Scholar
- Elbarbary RA, Li W, Tian B, Maquat LE. STAU1 binding 3' UTR IRAlus complements nuclear retention to protect cells from PKR-mediated translational shutdown. Genes Dev. 2013;27(13):1495–510.PubMed CentralView ArticlePubMedGoogle Scholar
- Fitzpatrick T, Huang S. 3'-UTR-located inverted Alu repeats facilitate mRNA translational repression and stress granule accumulation. Nucleus. 2012;3(4):359–69.PubMed CentralView ArticlePubMedGoogle Scholar
- Chen LL, Carmichael GG. Altered nuclear retention of mRNAs containing inverted repeats in human embryonic stem cells: functional role of a nuclear noncoding RNA. Mol Cell. 2009;35(4):467–78.PubMed CentralView ArticlePubMedGoogle Scholar
- Lee JY, Ji Z, Tian B. Phylogenetic analysis of mRNA polyadenylation sites reveals a role of transposable elements in evolution of the 3'-end of genes. Nucleic Acids Res. 2008;36(17):5581–90.PubMed CentralView ArticlePubMedGoogle Scholar
- Nagaike T, Logan C, Hotta I, Rozenblatt-Rosen O, Meyerson M, Manley JL. Transcriptional activators enhance polyadenylation of mRNA precursors. Mol Cell. 2011;41(4):409–18.PubMed CentralView ArticlePubMedGoogle Scholar
- Wu X, Sharp PA. Divergent transcription: a driving force for new gene origination? Cell. 2013;155(5):990–6.PubMed CentralView ArticlePubMedGoogle Scholar
- Hoque M, Li W, Tian B. Accurate mapping of cleavage and polyadenylation sites by 3' region extraction and deep sequencing. Methods Mol Biol. 2014;1125:119–29.View ArticlePubMedGoogle Scholar
- Yeo G, Burge CB. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol. 2004;11(2-3):377–94.View ArticlePubMedGoogle Scholar
- Pan JB, Hu SC, Shi D, Cai MC, Li YB, Zou Q, et al. PaGenBase: a pattern gene database for the global and dynamic understanding of gene function. PLoS One. 2013;8(12):e80747.PubMed CentralView ArticlePubMedGoogle Scholar