Data Analysis and Bioinformatics

Data Format and Security

The Illumina Genome Analyzer II(x) generates raw data in the form of fluorescence images of clonal DNA clusters. Our staff process these images through Illumina’s data analysis pipeline (CASAVA), producing large text files of sequence reads with associated base-call quality data.

The Illumina pipeline can produce a variety of sequence data formats. GeneWorks most commonly provides data in FASTQ format (four lines of text per read, including ASCII character-encoded quality scores), but other formats are available on request. The volume of data delivered to a customer on project completion ranges from about a gigabyte for a small project to hundreds of gigabytes for a large genome. Data are normally supplied on DVD or hard disk.
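As an illustration of the four-line FASTQ layout described above, the following Python sketch reads records from a text stream and decodes the quality characters into integer scores. It is a minimal example, not part of the Illumina pipeline or of any GeneWorks tool. The names `FastqRead` and `parse_fastq` are hypothetical, and the default quality offset of 33 is an assumption: some Illumina pipeline versions encode qualities with an offset of 64, so the offset should be checked against the data actually delivered.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRead(NamedTuple):
    """One FASTQ record (hypothetical container for this sketch)."""
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRead]:
    """Yield reads from a four-line-per-record FASTQ stream.

    offset is the ASCII offset of the quality encoding. The default of 33
    is an assumption; pass 64 for data encoded with that offset.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        # Each quality character encodes one score as chr(score + offset).
        scores = [ord(char) - offset for char in quality]
        yield FastqRead(header[1:], sequence, scores)


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nII5!\n"
    for read in parse_fastq(io.StringIO(example)):
        print(read.name, read.sequence, read.qualities)
```

Because the function is a generator, it handles one record at a time and does not need to hold a multi-gigabyte file in memory.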

GeneWorks realises that the integrity, security and confidentiality of customer data are of paramount importance and has measures in place to protect them.

 

Analysis Options

Like other next-generation sequencing (NGS) platforms, the Illumina Genome Analyzer II(x) generates a large volume of data. Solutions for analysing Genome Analyzer and other NGS data are becoming increasingly available. In addition to basic data manipulation such as sequence trimming and counting, GeneWorks offers a growing range of its own analysis solutions using commercial software from DNASTAR Inc. Central to this is our investment in the SeqMan NGen v3 assembler, which is specifically designed to assemble and analyse data from NGS platforms. NGen supports templated assembly (i.e. assembly against a closely related reference sequence) of genome or whole-exome sequence, as well as de novo assembly of smaller genomes or RNAseq data. Assemblies made by GeneWorks using NGen can be further analysed in SeqMan Pro, one of the modules of DNASTAR’s Lasergene suite (v 7.2 or higher).
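To give a sense of what basic trimming and counting of reads can involve, here is a short Python sketch. It is illustrative only and does not represent the method used by GeneWorks or by the DNASTAR software. The function names `trim_3prime` and `count_reads`, the quality threshold of 20 and the minimum-length filter are all assumptions chosen for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def trim_3prime(sequence: str, qualities: List[int],
                min_quality: int = 20) -> Tuple[str, List[int]]:
    """Drop bases from the 3' end until one reaches min_quality.

    The threshold of 20 is an assumed example value, not a recommendation.
    """
    end = len(sequence)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[:end], qualities[:end]


def count_reads(sequences: Iterable[str], min_length: int = 1) -> Counter:
    """Count identical sequences, ignoring any shorter than min_length."""
    return Counter(seq for seq in sequences if len(seq) >= min_length)


if __name__ == "__main__":
    trimmed, _ = trim_3prime("ACGTAC", [30, 30, 30, 30, 10, 5])
    print(trimmed)
    print(count_reads(["ACGT", "ACGT", "AC"], min_length=3))
```

This simple approach only removes a run of low-quality bases at the end of a read. Dedicated trimming tools typically offer more refined strategies, such as sliding-window trimming and adapter removal.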

GeneWorks currently provides the following solutions:

  • Read trimming and sorting.
  • Templated assembly (re-sequencing) of small genomes or targeted areas of large genomes.
  • De novo assembly of small genomes and RNAseq data.
  • Mapping of RNAseq data for quantitative gene expression analysis.
  • Access to third party service providers for large genome assembly, ChIP-seq and more complex bioinformatic analyses.
  • Advice on accessing expertise within the academic community.