Genomics

Whole-genome, whole-exome and targeted resequencing of well-characterized organisms, such as human, is often performed to map genetic variants or mutations accurately. Our genome variation analysis identifies high-confidence SNPs, indels, gene copy number changes and genomic rearrangements from DNA-seq data. For non-model organisms, we produce annotated genome assemblies refined with computational post-processing to provide the best possible starting point for future studies.

Annotation is key to interpreting genome analysis results: we compare the findings against the public databases relevant to your research.

Sergei Häyrynen, Technology Specialist, Genevia Technologies Oy
  • Variant calling
    • Our statistical approaches to variant calling follow current best practices, resulting in a reliable set of variants. Natural variants and single nucleotide polymorphisms (SNPs) can be called against any reference genome in any organism, or even against a combined reference compiled from individual genomes of sequencing projects for better representation. In addition to the high-confidence variants, we report low-coverage regions where the variant caller could not determine the sequence of the samples. Variant calling works with whole-genome, whole-exome and targeted DNA sequencing data alike, within the regions each method covers. The variant lists can be further combined, compared and filtered, for example to find disease-causing de novo germline variants in trio studies.

      Deliverables:

      • Full variant lists for all samples with supporting evidence from the data
      • Filtered variant lists based on any criteria (e.g. removing germline variants using a matched control sample)
      • Low-coverage regions where variants could not be called
  • Variant annotation
    • Genetic variants are annotated with their location in the genome, variant type and zygosity (homozygous/heterozygous), evidence from the data (supporting reads), functional classification for exonic variants, amino acid changes in all isoforms, database identifiers for known variants, and minor allele frequencies observed in several genome databases or even in your own data. We also provide pathogenicity predictions for each exonic variant using several prediction tools. Flexible ranking and filtering of variants based on these annotations makes complex genomic data easier for a geneticist or a physician to interpret.

      Deliverables:

      • Functional and location annotation for every variant
      • Minor allele frequencies in relevant databases
      • Database identifiers for known variants
      • Pathogenicity predictions
  • Copy number analysis
    • Gene copy numbers can be deduced from sequencing data using statistical approaches that analyze both sequencing coverage and allele frequencies. The analysis yields copy numbers for chromosome-scale segments, for each gene and for each exon independently. Gene copy numbers can be further integrated with expression data, for example to find significant gene dosage effects.

      Deliverables:

      • Copy number for each chromosome-scale segment
      • Gene copy number for each gene
      • Copy number for each exon
  • Genomic rearrangements
    • Whole-genome sequencing data combined with read-pair information from paired-end or mate-pair sequencing can be used to study copy-number-neutral genomic rearrangements such as inversions and translocations. These can create fusion genes, which are critically linked to cancer development, for example. We deliver the altered genome structure together with a ranked list of fusion genes that can be validated with RNA sequencing data.

      Deliverables:

      • List of potential fusion genes
      • List of all rearrangements
  • Genome assembly and refinement
    • For simpler organisms, we offer de novo genome assembly from DNA sequencing data. Our approach builds a consensus assembly from the outputs of several assembly tools and then applies post-assembly improvement software. If a draft genome exists, we can refine it computationally by joining contigs and resolving errors, using the improvement tools or additional DNA-seq or RNA-seq data.

      Deliverables:

      • Assembled contigs in FASTA format
      • Computationally refined genome assembly
      • Quality estimation scores
  • Genome annotation
    • Assembled genomes can be annotated using gene prediction and origin of replication (oriC) prediction software and/or RNA-seq data. We predict the identity of every putative gene by comparing its sequence to several genome databases; for genes with low sequence similarity, functions can be predicted from identified functional domains. If annotated genomes of close relatives exist, we can improve the annotation by transferring gene information to the unannotated genome using alignment-based approaches. The result is a comprehensive list of genes with their coordinates in the genome.

      Deliverables:

      • Loci of predicted genes
      • Fully annotated genes based on homolog searches
      • Validated genes based on RNA-seq data
  • Neoantigen discovery
    • Identification of patient-specific tumor neoantigens (novel protein sequences created by tumor-specific DNA alterations) is one of the cornerstones of cancer immunotherapy. We can interrogate exome sequencing data for non-synonymous somatic mutations in coding regions and translate them in silico into peptides containing the mutation. Additional RNA sequencing can be used to focus on highly expressed genes, which helps ensure high epitope abundance, and to look for neoantigens arising from alternative splicing, exon skipping and translocations. The epitope lists can be further filtered or ranked algorithmically based on aspects such as the likelihood of proteasomal processing, transport into the endoplasmic reticulum and binding affinity for the relevant MHC class I alleles.

      Deliverables:

      • Full non-synonymous DNA variant lists
      • Expression levels for each mutated exon
      • Computationally ranked list of potential epitopes
  • Cell-free DNA biomarker discovery
    • Circulating cell-free DNA is a promising source of non-invasive genomic biomarkers, in particular for prenatal diagnosis and oncology. The mere presence of certain DNA sequences in plasma can reveal a tumor undetected by other means. Furthermore, mutations detected in circulating DNA can serve as markers for personalizing treatment and estimating prognosis. Our pipeline for cell-free DNA biomarker discovery starts with full quality control of the data, followed by statistical comparison of pathological and control groups to uncover biomarkers with the best combination of sensitivity and specificity. Taking both biological factors and clinical feasibility into account, we summarize the analysis by highlighting the most promising biomarker candidates.

      Deliverables:

      • List of biomarker candidates from cell-free DNA
      • Sensitivity and specificity estimations for each candidate
      • Database identifiers for known mutations and pathogenicity predictions
  • Metagenomic analysis
    • Metagenomics offers an unbiased view of the microbial diversity of ecological niches, including samples from host organisms and soil. With whole-genome shotgun data, we assemble the sequence reads into contigs and assign them to species; with 16S rRNA sequencing data, we group the reads into operational taxonomic units (OTUs). We then quantify the abundance of each taxon. When multiple samples are available, we compare the abundances and associate them with host phenotype or environmental factors. For whole-genome studies, we identify and annotate genes using both sequence homology and computational gene prediction.

      Deliverables:

      • Quantitative characterization of microbial diversity
      • Associations between species/OTUs and host phenotype or other environmental factors
      • Identified and predicted genes with custom annotations
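The low-coverage regions reported alongside variant calls can be found by scanning per-base read depth for runs below a threshold. The following is a minimal Python sketch, assuming read depths for consecutive positions are already available (for example, exported from an alignment file); the function name `low_coverage_regions` and the default cut-off of 10 reads are hypothetical choices for illustration, not recommended values.

```python
def low_coverage_regions(depths, min_depth=10, start=1):
    """Return 1-based, inclusive (start, end) intervals with depth < min_depth.

    `depths` holds read depths for consecutive positions beginning at `start`.
    The default `min_depth` is an illustrative assumption.
    """
    regions = []
    run_start = None
    for offset, depth in enumerate(depths):
        pos = start + offset
        if depth < min_depth:
            if run_start is None:
                run_start = pos  # a low-coverage run begins here
        elif run_start is not None:
            regions.append((run_start, pos - 1))  # run ended at the previous base
            run_start = None
    if run_start is not None:  # run extends to the last position
        regions.append((run_start, start + len(depths) - 1))
    return regions


print(low_coverage_regions([25, 30, 4, 3, 0, 22, 18, 7, 40]))  # prints [(3, 5), (8, 8)]
```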
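For the trio studies mentioned under Variant calling, the basic idea of a de novo filter is to keep sites where the child carries an alternate allele that neither parent carries. The Python sketch below assumes genotypes have already been called and are encoded as alternate-allele counts (0, 1 or 2); the `TrioSite` type and the minimum depth of 20 reads are hypothetical. Production filters typically also consider genotype quality, allele balance and population databases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioSite:
    """Genotypes at one site as alternate-allele counts (0, 1 or 2)."""
    chrom: str
    pos: int
    child: int
    mother: int
    father: int
    min_depth: int  # lowest read depth among the three samples


def candidate_de_novo(sites, min_depth=20):
    """Keep sites where only the child carries the alternate allele and
    all three samples have at least `min_depth` reads (illustrative cut-off)."""
    return [
        s for s in sites
        if s.child > 0 and s.mother == 0 and s.father == 0
        and s.min_depth >= min_depth
    ]


sites = [
    TrioSite("chr1", 1000, child=1, mother=0, father=0, min_depth=35),  # candidate
    TrioSite("chr1", 2000, child=1, mother=1, father=0, min_depth=40),  # inherited
    TrioSite("chr2", 3000, child=1, mother=0, father=0, min_depth=8),   # too shallow
]
for s in candidate_de_novo(sites):
    print(s.chrom, s.pos)  # prints: chr1 1000
```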
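The coverage side of copy number analysis can be illustrated by comparing depth-normalized read counts between a tumor sample and a matched normal sample. This minimal sketch assumes per-region read counts are already available, a diploid normal sample and a tumor of 100% purity; the function and region names are placeholders. It deliberately ignores the allele frequency information, tumor purity, GC bias and segmentation that a real analysis has to model.

```python
import math


def estimate_copy_numbers(tumor_counts, normal_counts, ploidy=2):
    """Return {region: (log2 ratio, estimated copy number)} from read counts.

    Assumes a normal sample with the given ploidy and a fully pure tumor;
    regions without normal coverage are skipped.
    """
    tumor_total = sum(tumor_counts.values())
    normal_total = sum(normal_counts.values())
    estimates = {}
    for region, t in tumor_counts.items():
        n = normal_counts.get(region, 0)
        if n == 0:
            continue  # no baseline to compare against
        # Normalize each sample for sequencing depth, then take the ratio.
        ratio = (t / tumor_total) / (n / normal_total)
        log2_ratio = math.log2(ratio) if ratio > 0 else float("-inf")
        estimates[region] = (round(log2_ratio, 2), round(ploidy * ratio, 1))
    return estimates


tumor = {"REGION_A": 300, "REGION_B": 200, "REGION_C": 100}
normal = {"REGION_A": 200, "REGION_B": 200, "REGION_C": 200}
for region, (lr, cn) in estimate_copy_numbers(tumor, normal).items():
    print(region, lr, cn)
```

In this toy input, REGION_A comes out as a gain (about 3 copies) and REGION_C as a single-copy loss.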
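Assembly quality is often summarized with contiguity statistics such as N50: the contig length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal sketch, assuming the contigs are provided as FASTA text; the helper names are hypothetical, and N50 is shown only as one example of an assembly quality score, not as the full set of scores delivered.

```python
def contig_lengths(fasta_text):
    """Return the sequence length of each record in FASTA-formatted text."""
    lengths, current = [], None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0  # start a new record
        elif line and current is not None:
            current += len(line)  # sequences may span several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


fasta = ">contig1\nACGTACGTAC\nGT\n>contig2\nACGTAC\n>contig3\nACG\n"
print(contig_lengths(fasta), n50(contig_lengths(fasta)))  # prints [12, 6, 3] 12
```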
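The in silico translation step in neoantigen discovery can be illustrated for the simplest case, a single amino acid substitution: every peptide of a given length that overlaps the mutated residue is a candidate epitope. This sketch assumes the protein sequence and the 1-based position of the substitution are already known; the function name `mutant_peptides` is hypothetical, and 9-mers are used only because they are a common length for MHC class I ligands. Ranking the peptides by proteasomal processing, transport and MHC binding requires dedicated prediction tools and is not shown.

```python
def mutant_peptides(protein, position, alt_aa, k=9):
    """Return every k-mer of the mutant protein that contains a missense change.

    `position` is 1-based; only single amino acid substitutions are handled.
    """
    idx = position - 1
    if not 0 <= idx < len(protein):
        raise ValueError("position is outside the protein sequence")
    mutant = protein[:idx] + alt_aa + protein[idx + 1:]
    # A window starting at s covers residues s .. s + k - 1 (0-based).
    first = max(0, idx - k + 1)
    last = min(idx, len(mutant) - k)
    return [mutant[s:s + k] for s in range(first, last + 1)]


# Example: a glycine-to-aspartate substitution at residue 12 of a 20-residue fragment.
for peptide in mutant_peptides("MTEYKLVVVGAGGVGKSALT", 12, "D"):
    print(peptide)
```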
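Sensitivity and specificity estimates for a biomarker candidate follow from how often it is positive in the pathological group and negative in the control group. A minimal sketch, assuming each sample has a single numeric score for the candidate (for example, a measured mutant fragment fraction) and a known group label; choosing the threshold by Youden's J statistic (sensitivity + specificity - 1) is one common way to balance the two and is used here as an illustrative assumption, not as a description of the pipeline's criterion. The scores are made up.

```python
def sensitivity_specificity(scores, is_case, threshold):
    """Call a sample positive if its score >= threshold and compare with labels."""
    tp = fn = fp = tn = 0
    for score, case in zip(scores, is_case):
        positive = score >= threshold
        if case and positive:
            tp += 1
        elif case:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def best_threshold(scores, is_case):
    """Return (J, threshold, sensitivity, specificity) maximizing Youden's J."""
    best = None
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, is_case, t)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best


# Hypothetical scores: four pathological samples followed by six controls.
scores = [0.9, 0.7, 0.4, 0.2, 0.1, 0.05, 0.3, 0.0, 0.15, 0.6]
is_case = [True] * 4 + [False] * 6
print(sensitivity_specificity(scores, is_case, 0.35))
print(best_threshold(scores, is_case))
```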
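Once reads or contigs have been assigned to taxa, relative abundances and a simple diversity summary can be computed directly from the assignments. This sketch assumes one taxon label per read, with `None` marking unclassified reads (which are excluded), and uses the Shannon index as one example of a diversity measure; the taxon names and counts are made up for illustration.

```python
import math
from collections import Counter


def relative_abundance(assignments):
    """Fraction of classified reads assigned to each taxon (None = unclassified)."""
    counts = Counter(a for a in assignments if a is not None)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


def shannon_index(abundances):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    return -sum(p * math.log(p) for p in abundances.values() if p > 0)


reads = ["Bacteroides"] * 50 + ["Prevotella"] * 30 + ["Escherichia"] * 20 + [None] * 10
abundances = relative_abundance(reads)
print(abundances)
print(round(shannon_index(abundances), 3))
```

Comparing such abundance tables across samples is the starting point for associating taxa with host phenotype or environmental factors.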
