Welcome back to our RNA-seq article series! Now that we’ve gone through the general workflow and the ins and outs of RNA quality, we will look at some of the popular methods and platforms that you are likely to encounter if you embark on RNA-seq experiments.
While RNA-seq technology allows us to study many different aspects of RNA biology, e.g., splice variants, RNA regulation, and novel non-coding RNAs, the most widely used application is differential gene expression analysis. Advances in both laboratory techniques and bioinformatics have driven such rapid progress in sequencing that about 100 distinct RNA-seq methods exist today. Covering them all is obviously beyond the scope of this article, so instead we will group these methods into three categories, depending on the length of the individual fragments that are sequenced (read length) and the level of RNA processing required prior to sequencing:
- Short-read cDNA sequencing – Most methods fall into this category
- Long-read cDNA sequencing
- Long-read direct RNA sequencing (dRNA-seq)
Common Processing Steps
The library preparation protocols required for short-read and long-read cDNA sequencing (the first two categories above) have much in common, and both approaches are affected by RNA quality issues upstream of library preparation and by computational issues downstream of it.
Long-read direct RNA sequencing (dRNA-seq) is an emerging technology that sequences isolated RNA molecules directly, without processing steps such as enrichment, conversion to cDNA or PCR amplification. As such, this technology bypasses the biases associated with these steps and also retains epitranscriptomic information, i.e., chemical modifications of the RNA bases that are lost when RNA is copied into cDNA. dRNA-seq won’t be discussed further here, but you can find links to further reading at the end of this article.
Before we go any further into the short- and long-read cDNA sequencing technologies, let’s first look at some of the important steps shared by these approaches:
mRNA Fragmentation – This is one of the first steps in library preparation. Here, isolated RNA is broken in a controlled manner into fragments that are short enough to be sequenced by the platform in question, so the desired fragment length depends on the read length of that platform. RNA may be fragmented physically (by sonication), enzymatically (by RNase III digestion), or chemically (by exposure to divalent metal cations, e.g., zinc or magnesium, at elevated temperature).
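To illustrate why fragment length is something a protocol tunes rather than a fixed property of the RNA, here is a minimal Python sketch that breaks a molecule at random positions. It assumes a deliberately simple model in which a cut is made after each nucleotide independently with probability 1/`mean_fragment_nt`, which gives roughly geometrically distributed fragment sizes. It is not a model of sonication, RNase III or metal-ion chemistry, and `fragment` and its parameters are hypothetical names.

```python
import random
from statistics import mean


def fragment(length: int, mean_fragment_nt: float, seed: int = 0) -> list[int]:
    """Split a molecule of `length` nt into fragments at random breakpoints.

    Simplified model (an assumption): after each nucleotide, a cut is made
    with probability 1 / mean_fragment_nt, independently of all other
    positions, so fragment sizes are roughly geometric with that mean.
    """
    if length <= 0 or mean_fragment_nt < 1:
        raise ValueError("length must be positive and mean_fragment_nt >= 1")
    rng = random.Random(seed)  # seeded so results are reproducible
    cut_probability = 1.0 / mean_fragment_nt
    sizes: list[int] = []
    current = 0
    for _ in range(length):
        current += 1
        if rng.random() < cut_probability:
            sizes.append(current)  # close the current fragment
            current = 0
    if current:
        sizes.append(current)  # trailing piece after the last cut
    return sizes


# Harsher fragmentation (smaller mean) suits platforms with shorter reads.
for target in (150, 2000):
    pieces = fragment(length=10_000, mean_fragment_nt=target, seed=42)
    print(f"target ~{target} nt: {len(pieces)} fragments, mean {mean(pieces):.0f} nt")
```

Every nucleotide ends up in exactly one fragment, so the fragment sizes always sum to the original length; only the number and size of the pieces change with the cut probability.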
Size Selection – This step controls the size distribution of the cDNA or DNA fragments that are fed into the sequencing platform. It is performed on the sequencing library, not on the isolated RNA, and is not to be confused with RNA fragmentation. The various RNA-seq methods benefit differently from size selection, but here’s a handy rule of thumb. Short-read methods work best with sequencing libraries that contain fragments of similar sizes in the appropriate range. Very short fragments, e.g., primer-dimers, waste sequencing capacity, possibly increasing the number of lanes required to generate sufficient sequencing data and potentially skewing results. Long-read sequencing platforms, not surprisingly, produce the best results with long fragments, as short fragments yield reads that do not cover complete transcripts, which may reduce data quality. The exact size range for each protocol and platform is usually vendor-specific.
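The rule of thumb above can be made concrete with a small Python sketch that sorts a library’s fragment lengths into “kept”, “too short” and “too long” bins and reports how much of the library would be spent on very short fragments. The 150-200 bp window and the fragment lengths are hypothetical example values, and `size_select` and `SizeSelection` are illustrative names. The sketch models only the outcome of size selection, not the laboratory procedure itself.

```python
from dataclasses import dataclass


@dataclass
class SizeSelection:
    kept: list[int]       # fragment lengths inside the target window
    too_short: list[int]  # e.g. primer-dimers that would waste reads
    too_long: list[int]   # fragments above the window

    @property
    def total(self) -> int:
        return len(self.kept) + len(self.too_short) + len(self.too_long)

    @property
    def wasted_fraction(self) -> float:
        """Share of fragments that are too short to yield useful reads."""
        return len(self.too_short) / self.total if self.total else 0.0


def size_select(lengths_bp: list[int], min_bp: int, max_bp: int) -> SizeSelection:
    """Partition library fragment lengths around a [min_bp, max_bp] window.

    The window is an assumption supplied by the caller; real ranges are
    protocol- and vendor-specific.
    """
    if min_bp > max_bp:
        raise ValueError("min_bp must not exceed max_bp")
    return SizeSelection(
        kept=[n for n in lengths_bp if min_bp <= n <= max_bp],
        too_short=[n for n in lengths_bp if n < min_bp],
        too_long=[n for n in lengths_bp if n > max_bp],
    )


# Hypothetical short-read library, including two primer-dimer-sized fragments.
library = [45, 50, 160, 175, 190, 210, 400]
result = size_select(library, min_bp=150, max_bp=200)
print(result.kept, result.too_short, result.too_long)
print(f"{result.wasted_fraction:.0%} of fragments would waste sequencing capacity")
```

For a long-read library, the same check would simply use a much higher window, reflecting the preference for long fragments.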
Adaptor Ligation – During library preparation, platform-specific adaptors are added to the cDNA fragments. These adaptors are short DNA sequences that contain functional elements required for sequencing. Adaptors usually also harbour short sequences known as barcodes or tags that identify the sample of origin. This matters because samples are often pooled (multiplexed) and sequenced together on the same sequencing lane or cell, and the barcodes allow the reads to be assigned back to their samples afterwards.
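Assigning pooled reads back to their samples is usually called demultiplexing. The minimal Python sketch below assumes, for simplicity, that each read starts with an inline barcode of fixed length and tolerates up to one mismatch. Real platforms and pipelines often handle barcodes differently (on Illumina instruments, for example, as a separate index read) and far more thoroughly. The barcode sequences, sample names and the `demultiplex` function are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads: list[str], barcodes: dict[str, str],
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Bin reads by an inline barcode at the start of each read.

    `barcodes` maps barcode sequence -> sample name; all barcodes are
    assumed to have the same length. Reads that match no barcode closely
    enough, or match two barcodes equally well, go to "undetermined".
    The barcode is trimmed from reads that are assigned to a sample.
    """
    bc_len = len(next(iter(barcodes)))
    bins: dict[str, list[str]] = {sample: [] for sample in barcodes.values()}
    bins["undetermined"] = []
    for read in reads:
        if len(read) < bc_len:
            bins["undetermined"].append(read)  # too short to hold a barcode
            continue
        prefix = read[:bc_len]
        distances = {sample: hamming(prefix, bc) for bc, sample in barcodes.items()}
        best = min(distances.values())
        best_samples = [s for s, d in distances.items() if d == best]
        if best <= max_mismatches and len(best_samples) == 1:
            bins[best_samples[0]].append(read[bc_len:])
        else:
            bins["undetermined"].append(read)
    return bins


samples = {"ACGT": "sample_A", "TGCA": "sample_B"}  # hypothetical barcodes
pooled = ["ACGTTTTT", "TGCAGGGG", "ACGAAAAA", "CCCCCCCC"]
print(demultiplex(pooled, samples))
# -> {'sample_A': ['TTTT', 'AAAA'], 'sample_B': ['GGGG'], 'undetermined': ['CCCCCCCC']}
```

Allowing a mismatch rescues reads with a sequencing error in the barcode ("ACGA" is still assigned to sample_A), while sending ambiguous reads to "undetermined" avoids assigning them to the wrong sample.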
Short-Read cDNA Sequencing
Short-read RNA sequencing is the go-to method for differential gene expression analysis as it provides cost-effective and high-quality quantitative data that represents entire transcriptomes. Although other platforms exist, the vast majority of short-read RNA-seq data held in public databases has been generated using Illumina’s short-read sequencing technology.
Short-read protocols usually result in cDNA fragments that are below 200 bp in length due to mRNA fragmentation and a size selection step during bead-based library preparation to enrich for fragments in the 150-200 bp range.
Illumina RNA-seq – Sequencing by Synthesis
Illumina RNA-seq uses sequencing by synthesis technology. This is carried out on a flow cell where the cDNA molecules in the sequencing library are sequenced simultaneously in a base-by-base manner.
In a sequencing by synthesis reaction, fluorescently labelled nucleotides (A, G, C and T), each carrying a reversible terminator at its 3’ end, are added as the substrates for DNA synthesis, using the cDNA library as a template. Because of the terminator, only one nucleotide can be incorporated per round. In each round of sequencing, the growing DNA strand is imaged to determine which of the 4 fluorophores has been incorporated, i.e., which nucleotide is present at that position in the cDNA sequence. The terminator and fluorophore are then cleaved off to allow the next round of synthesis to take place. As the DNA strand grows, it is imaged after every round, and the fluorescent data are collected to reveal the sequence of the newly synthesised DNA strands, or reads. This technology can generate reads of 50-500 bp in length.
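The core idea of reading out one base per cycle can be sketched in a few lines of Python. This toy caller assumes the four-dye scheme described above, with one intensity per dye per cycle, and simply picks the brightest channel, reporting `N` when no channel is bright enough. The channel order, the `min_signal` threshold and the intensity values are hypothetical, and real base-calling software also has to correct for effects this sketch ignores, such as overlapping dye signals.

```python
# Assumed mapping from fluorescence channel index to nucleotide.
CHANNEL_BASES = ("A", "C", "G", "T")


def call_bases(cycles: list[tuple[float, float, float, float]],
               min_signal: float = 0.1) -> str:
    """Call one base per sequencing cycle from four-channel intensities."""
    read = []
    for intensities in cycles:
        if len(intensities) != len(CHANNEL_BASES):
            raise ValueError("expected one intensity per channel (A, C, G, T)")
        # The channel with the strongest signal marks the incorporated base.
        brightest = max(range(len(intensities)), key=lambda i: intensities[i])
        if intensities[brightest] < min_signal:
            read.append("N")  # signal too weak to call confidently
        else:
            read.append(CHANNEL_BASES[brightest])
    return "".join(read)


# Hypothetical normalised intensities for four cycles (A, C, G, T channels).
cycles = [
    (0.90, 0.10, 0.05, 0.02),
    (0.10, 0.05, 0.80, 0.10),
    (0.02, 0.03, 0.01, 0.05),
    (0.10, 0.70, 0.20, 0.10),
]
print(call_bases(cycles))  # -> AGNC
```

Each tuple corresponds to one imaging step, so the length of the read equals the number of synthesis rounds performed.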
Long-Read cDNA Sequencing
Although Illumina’s technology is the most widely used for mRNA analysis, both Pacific Biosciences (PacBio) and Oxford Nanopore provide alternative long-read technologies that permit single-molecule sequencing of intact individual mRNA molecules after conversion to cDNA. By eliminating the need to assemble shorter reads into longer sequences during computational data analysis, these long-read approaches circumvent some of the issues associated with short-read approaches. For example, read mapping is less ambiguous, because each long read spans more of its transcript and is therefore less likely to match several locations equally well, and full-length transcripts can be captured in a single read, making it possible to detect and identify diverse isoforms.
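Why read length matters for isoforms can be shown with a toy example in Python. Two hypothetical isoforms share exons 1 and 3 but differ in whether exon 2 is included: a short read lying entirely inside exon 1 is compatible with both, whereas a full-length read matches only one. The sequences, the isoform names and the exact-substring matching are simplifying assumptions; real aligners tolerate sequencing errors and search whole genomes or transcriptomes.

```python
def compatible_isoforms(isoforms: dict[str, str], read: str) -> list[str]:
    """Return the names of isoforms that contain `read` as an exact substring."""
    return sorted(name for name, seq in isoforms.items() if read in seq)


# Hypothetical exon sequences; real exons are typically far longer.
EXON1 = "ATGGCTAGCT"
EXON2 = "GGATCCTTAA"
EXON3 = "CCGGTACGTA"

isoforms = {
    "iso_with_exon2": EXON1 + EXON2 + EXON3,
    "iso_skipping_exon2": EXON1 + EXON3,
}

short_read = EXON1[2:8]            # falls entirely within the shared exon 1
long_read = EXON1 + EXON2 + EXON3  # spans the whole transcript

print(compatible_isoforms(isoforms, short_read))  # ambiguous: both isoforms
print(compatible_isoforms(isoforms, long_read))   # unambiguous: one isoform
```

A short read could still resolve the isoforms if it happened to span an exon junction, but a full-length read always does, because it contains the entire exon structure at once.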
In PacBio’s standard Iso-Seq protocols for RNA-seq, high-quality RNA is converted to full-length cDNA using a template-switching reverse transcriptase that is capable of generating cDNA molecules up to 10 kb in length. These cDNA molecules are PCR amplified and then used as the input material for single molecule real-time (SMRT) sequencing.
Because PacBio sequencing requires a large amount of DNA template that is generated through high-volume PCR reactions, it is also necessary to optimise the PCR amplification step to avoid biases associated with over-amplification. Following PCR amplification, DNA ends are repaired enzymatically to facilitate adaptor ligation and long-read sequencing is then performed.
With PacBio’s technology, individual cDNA molecules are loaded into a sequencing chip containing millions of nanowells known as zero-mode waveguides (ZMWs). A single DNA polymerase molecule is immobilised at the bottom of each ZMW and uses a single cDNA molecule (or DNA molecule, if library amplification is performed) as its template for DNA synthesis. Fluorescently labelled nucleotides serve as the substrates for DNA synthesis. As nucleotides are incorporated into the growing DNA strand, fluorescent signals are released and detected in real time. This technology can yield reads of up to 50 kb in length.
Shorter cDNA molecules diffuse to the active surface of the sequencing chips in PacBio’s platforms more quickly than longer ones and are therefore overrepresented in the data. To avoid this sequencing bias, size selection is advised when studying transcripts in the 1-4 kb range. Size selection bias can be further controlled by modifying the loading conditions of the sequencing chips.
Oxford Nanopore RNA-seq
Oxford Nanopore’s cDNA sequencing technology also generates full-length transcript reads. A template-switching reverse transcriptase is used during cDNA synthesis, resulting in long cDNA molecules that may subsequently be PCR amplified as necessary. The cDNA (or, after amplification, DNA) is then ligated to adaptors, producing the library that serves as the template for sequencing. Direct cDNA sequencing eliminates PCR bias, which may lead to higher-quality data, but with the drawback of fewer sequencing reads than can be obtained from PCR-amplified cDNA libraries. In spite of the risk of bias, PCR amplification of cDNA libraries is sometimes necessary, e.g., when input RNA material is limited.
The cDNA or DNA library is loaded into a flow cell that contains nanopores. Motor proteins, which are attached to the cDNA or DNA molecules during adaptor ligation, dock with the nanopores so that the library is read one molecule per pore. The motor protein controls the movement of a single cDNA or DNA strand through a nanopore, and as the strand moves, it causes characteristic changes in the electrical current through the pore that are detected and decoded to generate the cDNA or DNA sequence in real time. Size selection bias has not yet been reported for this technology.
Despite the advantages of long-read sequencing outlined above, both of the long-read technologies described here suffer from a significant drawback: the template-switching reverse transcriptase used to generate the long cDNA molecules does not discriminate between intact and truncated RNAs. Alternative reverse transcriptases are available that specifically convert 5’ capped mRNAs (i.e., intact RNAs) to cDNA. Using these will reduce the amount of cDNA originating from degraded or truncated RNA. However, these transcriptases may affect read length depending on your platform, so it’s wise to investigate this before starting out.
At a Crossroads? Get in Touch With Us!
If you’re about to embark on an RNA-seq project, you may already have an idea of which type of method, short- or long-read, you are likely to choose. Because of the broad range of methods available and the pros and cons of each, as well as the many important considerations involved in setting up any RNA-seq project, it is difficult to give a complete overview here. If this article raises any questions or you would like to discuss any RNA-seq method with us, you are very welcome to get in touch by email or using the contact details listed here.
Do also stay tuned for the next article in this series, where we will go through some of the main factors to consider when planning an RNA-seq experiment, e.g., whether to use single-read or paired-end sequencing, sequencing depth, strand-specific or non-strand-specific protocols, and more!
- Deamer, D.; Akeson, M.; Branton, D., Three decades of nanopore sequencing. Nat Biotechnol 2016, 34 (5), 518-24.
- Stark, R.; Grzelak, M.; Hadfield, J., RNA sequencing: the teenage years. Nat Rev Genet 2019, 20 (11), 631-656.
- Harel, N.; Meir, M.; Gophna, U.; Stern, A., Direct sequencing of RNA with MinION Nanopore: detecting mutations based on associations. Nucleic Acids Res 2019, 47 (22), e148.