Transcriptomics

Transcriptomics is the study of gene expression levels and pathway activities and of how they differ between samples. Beyond expression, RNA-seq data can reveal novel fusion genes and lncRNAs, which are often of interest to cancer researchers. For non-model organisms, RNA-seq data help in assembling genomes as well as in assembling and annotating entire transcriptomes. The resulting high-quality gene models then enable expression studies just as in a model organism.

Why limit your study to expression levels when you can do so much more with your RNA-seq data – identify transcript isoforms, fusion genes, and lncRNAs, for example?

Thomas Liuksiala, Business Development Manager, Genevia Technologies Oy
  • Transcriptome assembly and annotation
    • Whole-transcriptome sequencing data enable computational assembly of short RNA-seq reads into entire transcripts, either de novo or guided by a reference genome. We then annotate the assembled transcripts with their genomic coordinates, if a genome sequence exists, and compare their sequences to transcript databases to identify them. We can also search the transcripts for known motifs to predict, for example, the enzymatic function of their products.

      Deliverables:

      • List of assembled transcript sequences
      • Transcript coordinates on a genome sequence
      • Annotated names and/or functions for each transcript
  • Expression analysis
    • The most common type of study with RNA-seq data is gene (or microRNA) expression analysis. First, we identify which gene each sequencing read comes from and compute expression levels for all annotated genes. We then test for differential expression between sample groups using robust statistical methods. Co-expressed genes can be found using clustering approaches, and the resulting gene groups often form the basis for more advanced analyses, for example of gene regulation.

      Deliverables:

      • Gene expression levels for all genes/isoforms in all samples
      • Differentially expressed genes between sample groups
      • Time-dependent genes in a time-series experiment
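As a rough illustration of the quantification step described above, the following Python sketch converts raw read counts into transcripts per million (TPM) and computes a log2 fold change between two samples. The gene names, counts, transcript lengths, and the pseudocount are made up for illustration. A real differential expression analysis would rely on replicate samples and a dedicated statistical model rather than a bare fold change.

```python
import math

def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM)."""
    # Reads per kilobase of transcript length
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Scale so that the values of one sample sum to one million
    scale = sum(rpk.values()) / 1e6
    return {gene: value / scale for gene, value in rpk.items()}

def log2_fold_change(reference, test, pseudocount=1.0):
    """log2 ratio of test over reference; the pseudocount avoids division by zero."""
    return math.log2((test + pseudocount) / (reference + pseudocount))

# Hypothetical toy data: read counts per gene and transcript lengths in bp
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
control = {"geneA": 100, "geneB": 200, "geneC": 50}
treated = {"geneA": 100, "geneB": 800, "geneC": 50}

tpm_control = tpm(control, lengths)
tpm_treated = tpm(treated, lengths)
lfc = {g: log2_fold_change(tpm_control[g], tpm_treated[g]) for g in lengths}

for gene in sorted(lfc, key=lfc.get, reverse=True):
    print(f"{gene}: log2FC = {lfc[gene]:+.2f}")
```

Note that TPM is a relative measure: in this toy data only geneB gains reads, yet the TPM values of geneA and geneC drop because every sample is scaled to the same total.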
  • Pathway analysis
    • Interpreting the common biological themes in gene groups can be a daunting task. We summarize the differences between samples on the level of metabolic and signaling pathways or functional categories using pathway enrichment or gene set enrichment methods. Pathway activities can then be visualized to enable data exploration. This facilitates understanding the results of an expression study and helps, for instance, in determining the mode of action of a drug compound.

      Deliverables:

      • Enriched metabolic and signaling pathways
      • Enriched functional categories
      • Visualizations of changes in key pathways
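One common form of pathway enrichment is an over-representation test based on the hypergeometric distribution. The sketch below is a minimal version of that idea, not necessarily the method used in any particular project. The gene identifiers and set sizes are hypothetical, and a real analysis would also correct for testing many pathways at once.

```python
from math import comb

def pathway_enrichment(universe, pathway, selected):
    """One-sided hypergeometric p-value for over-representation.

    universe: all genes that were tested
    pathway:  genes belonging to the pathway
    selected: genes of interest, e.g. differentially expressed genes
    Returns P(overlap >= observed) when `selected` is drawn at random.
    """
    pathway = pathway & universe
    selected = selected & universe
    observed = len(pathway & selected)
    n, k, m = len(universe), len(pathway), len(selected)
    # Sum the probabilities of all overlaps at least as large as the observed one
    tail = sum(
        comb(k, i) * comb(n - k, m - i)
        for i in range(observed, min(k, m) + 1)
    )
    return tail / comb(n, m)

# Hypothetical toy data: 20 tested genes, a 5-gene pathway, 4 selected genes
universe = {f"g{i}" for i in range(20)}
pathway = {"g0", "g1", "g2", "g3", "g4"}
selected = {"g0", "g1", "g2", "g10"}

p_value = pathway_enrichment(universe, pathway, selected)
print(f"p = {p_value:.4f}")
```

The choice of gene universe matters: it should contain only the genes that could have been selected, for example those expressed in the experiment, rather than every gene in the genome.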
  • Alternative splicing analysis
    • RNA-sequencing data allow identification of novel splicing isoforms and quantification of expression levels for all transcript variants of a gene, provided that the gene is expressed at a high enough level and sequenced to sufficient depth. For practical reasons, we usually do not differentiate between the isoforms of a gene in expression analysis, but some of our projects center on the expression levels of all splice variants, novel and known.

      Deliverables:

      • List of expressed transcript isoforms
      • List of previously unidentified splicing isoforms
      • Quantified expression levels for all isoforms
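Once isoform-level expression estimates exist, a simple way to summarize splicing is the share of a gene's expression that each isoform contributes. The sketch below assumes isoform-level TPM values are already available; the gene and isoform names and all numbers are hypothetical.

```python
def isoform_fractions(isoform_tpm):
    """Return each isoform's share of its gene's total expression.

    isoform_tpm maps gene -> {isoform: TPM}. Genes with zero total
    expression are skipped because their fractions are undefined.
    """
    fractions = {}
    for gene, isoforms in isoform_tpm.items():
        total = sum(isoforms.values())
        if total == 0:
            continue
        fractions[gene] = {iso: value / total for iso, value in isoforms.items()}
    return fractions

# Hypothetical isoform-level TPM values for one sample
sample = {
    "geneA": {"geneA-201": 30.0, "geneA-202": 10.0},
    "geneB": {"geneB-201": 0.0},
}

usage = isoform_fractions(sample)
for gene, shares in usage.items():
    for isoform, share in shares.items():
        print(f"{gene}\t{isoform}\t{share:.2f}")
```

Comparing these fractions between sample groups can point to isoform switching even when the total expression of the gene stays the same.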
  • Fusion gene detection
    • Fusion genes, often caused by genomic rearrangements and leading to expression of dysregulated or fusion proteins, play a central role in the development of some cancers. We detect transcripts formed from two wild-type genes in standard RNA-sequencing data. The candidate fusions are ranked based on the available sequence evidence and other data.

      Deliverables:

      • Evidence for existence of known fusion genes
      • List of novel fusion genes
      • Expression estimates for fusion partners
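The ranking step could be sketched as follows. This is only an illustration under assumed evidence types: the `FusionCandidate` fields, the `min_junction_reads` threshold, and the sort order are hypothetical choices, and the gene names and read counts are placeholders.

```python
from dataclasses import dataclass

@dataclass
class FusionCandidate:
    gene5: str            # 5' fusion partner
    gene3: str            # 3' fusion partner
    junction_reads: int   # reads that span the fusion breakpoint itself
    spanning_pairs: int   # read pairs whose mates map to different partners

def rank_fusions(candidates, min_junction_reads=2):
    """Drop weakly supported candidates and sort the rest by evidence."""
    supported = [c for c in candidates if c.junction_reads >= min_junction_reads]
    # Junction reads are the primary key; spanning pairs break ties
    return sorted(
        supported,
        key=lambda c: (c.junction_reads, c.spanning_pairs),
        reverse=True,
    )

# Placeholder candidates with made-up read support
candidates = [
    FusionCandidate("GENE_A", "GENE_B", junction_reads=12, spanning_pairs=30),
    FusionCandidate("GENE_C", "GENE_D", junction_reads=1, spanning_pairs=40),
    FusionCandidate("GENE_E", "GENE_F", junction_reads=12, spanning_pairs=8),
    FusionCandidate("GENE_G", "GENE_H", junction_reads=5, spanning_pairs=2),
]

ranked = rank_fusions(candidates)
for c in ranked:
    print(f"{c.gene5}-{c.gene3}: {c.junction_reads} junction reads, "
          f"{c.spanning_pairs} spanning pairs")
```

In this sketch a candidate supported only by spanning pairs is filtered out, on the assumption that reads covering the breakpoint are the more direct evidence.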
  • Novel transcript detection
    • Not all genes, even in the human genome, have been recorded in databases to date. Using RNA-sequencing, we can reveal transcription from genomic loci where no genes have been annotated. These are usually non-coding transcripts, which may be highly relevant tissue- or condition-specific regulators of gene expression.

      Deliverables:

      • Assembled and annotated expressed transcripts
      • List of previously unidentified transcripts
      • Expression estimates for all transcripts
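At its simplest, finding transcription from unannotated loci comes down to checking which assembled transcripts overlap no annotated gene. The sketch below assumes half-open genomic intervals and ignores strand and exon structure for brevity; all names and coordinates are hypothetical.

```python
def unannotated_transcripts(assembled, annotated):
    """Return names of assembled transcripts that overlap no annotated gene.

    Both arguments are lists of (name, chromosome, start, end) tuples with
    half-open coordinates: start is included, end is excluded.
    """
    novel = []
    for name, chrom, start, end in assembled:
        overlaps_gene = any(
            chrom == gene_chrom and start < gene_end and gene_start < end
            for _, gene_chrom, gene_start, gene_end in annotated
        )
        if not overlaps_gene:
            novel.append(name)
    return novel

# Hypothetical assembled transcripts and a one-gene annotation
assembled = [
    ("tx1", "chr1", 100, 500),    # overlaps geneA
    ("tx2", "chr1", 5000, 6000),  # same chromosome, no overlap
    ("tx3", "chr2", 100, 500),    # different chromosome
]
annotated = [("geneA", "chr1", 400, 900)]

novel = unannotated_transcripts(assembled, annotated)
print(novel)
```

This pairwise comparison is quadratic in the number of features; for genome-scale data an interval tree or sorted sweep would be the more practical choice.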
  • GRO-seq
    • Global run-on sequencing (GRO-seq) assays identify the genes that are being transcribed at a given point in time rather than genes that have been recently transcribed. This difference may be essential when studying gene regulation with ChIP-seq measurements, for example. Another difference from mRNA-seq is that the data come from primary transcripts instead of mature mRNAs. Our analysis of GRO-seq data yields quantified expression levels and transcription start sites for the transcribed genes.

      Deliverables:

      • List of expressed genes
      • Expression estimates for all genes
      • Genomic coordinates for primary transcripts
