Team I Webserver Group

From Compgenomics 2018

Introduction

Background

The objective of the BIOL 7210: Computational Genomics teams was to take unassembled genome sequence data from the Weiss Lab at the Emory University School of Medicine through five distinct stages of analysis and interpretation: genome assembly, gene prediction, functional annotation, comparative genomics, and production of a predictive webserver. In this final stage, our goal was to create a predictive webserver that performs some, if not all, of the analyses developed by the previous groups.

Goals

Our goals for a predictive webserver were as follows:

  • Assemble input reads
  • Analyze assemblies
  • Visualize results in a user-friendly format
  • Implement a way for results to be downloaded

KAREN

Klebsiella Antibiotics REsistance PredictioN (KAREN) is the culmination of these objectives. Given raw sequence reads as input, it performs the following analyses:

  • Raw read trimming and quality control checks
  • De novo assembly
  • Species identification
  • Strain identification
  • Average Nucleotide Identity
  • Computational Phenotyping
  • Visualization of results

Technologies Used

To build this webserver, we used PHP for server-side programming. PHP has strong support for MySQL and the Apache HTTP Server, and it supports Model-View-Controller (MVC) frameworks, which separate the data model, the application logic, and the user interface and so keep interface code simple. Of the many MVC frameworks available, we chose Laravel. Laravel is built on Symfony components and provides three features we wanted to use in our webserver:

  1. Blade Templates (User Interface)
  2. Migrations (Database Management)
  3. Job Chaining

This webserver is built on PHP v7.0.0 and Laravel v5.5.
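In Laravel, a chained job runs only after the previous job in the chain has succeeded, and a failure stops the remaining jobs. The Python sketch below illustrates only those semantics for a KAREN-style pipeline. It is not Laravel code, and the `run_chain` helper and the stage functions are hypothetical stand-ins for the analyses listed above.

```python
from typing import Callable, Dict, List, Tuple

# A job takes a shared context dict and either returns normally or raises.
Job = Tuple[str, Callable[[Dict[str, str]], None]]


def run_chain(jobs: List[Job], context: Dict[str, str]) -> List[str]:
    """Run jobs strictly in order and stop at the first failure.

    Returns the names of the jobs that completed successfully.
    """
    completed: List[str] = []
    for name, job in jobs:
        try:
            job(context)
        except Exception as err:  # a failed job halts the rest of the chain
            context["failed_job"] = name
            context["error"] = str(err)
            break
        completed.append(name)
    return completed


# Hypothetical stages standing in for KAREN's analyses.
def trim(ctx: Dict[str, str]) -> None:
    ctx["reads"] = "trimmed_reads.fastq"


def assemble(ctx: Dict[str, str]) -> None:
    ctx["assembly"] = "contigs.fasta"


def type_strain(ctx: Dict[str, str]) -> None:
    raise RuntimeError("strain database not found")  # simulated failure


def phenotype(ctx: Dict[str, str]) -> None:
    ctx["phenotype"] = "done"


if __name__ == "__main__":
    ctx: Dict[str, str] = {}
    chain = [("trim", trim), ("assemble", assemble),
             ("type_strain", type_strain), ("phenotype", phenotype)]
    print(run_chain(chain, ctx))  # ['trim', 'assemble']
    print(ctx["failed_job"])      # type_strain
```

Chaining keeps each analysis a small, separately retryable unit while guaranteeing that, for example, phenotyping never runs on an assembly that was not produced.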

Functionalities

De novo Genome Assembly using Skesa

Trimmomatic and FastQC were used to trim the raw input reads and to check their quality. We then performed de novo assembly, which requires no reference sequence, using Skesa. At the time of writing, this tool was unpublished.
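Trimmomatic offers a sliding-window quality trimming step (SLIDINGWINDOW). The Python sketch below is a simplified re-implementation of that idea, not Trimmomatic's code: it decodes Phred+33 quality characters and cuts the read at the start of the first window whose mean quality falls below a threshold. The 4-base window and the threshold of 15 are assumed values, not the settings used in KAREN.

```python
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        min_mean: float = 15.0) -> Tuple[str, str]:
    """Cut the read at the start of the first window with mean quality < min_mean."""
    scores = phred_scores(quality)
    for start in range(len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < min_mean:
            return seq[:start], quality[:start]
    return seq, quality  # no low-quality window found: keep the whole read


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))
```

Trimming before assembly removes low-quality read ends that would otherwise introduce errors into the contigs.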

Species & Strain Typing by StrainSeeker

StrainSeeker is a tool for rapidly and accurately assessing the species and strain of a bacterial sample. It uses a pre-built database for species identification and can work directly on paired-end reads to identify the strain type. It can also identify novel strains, which makes it useful for further assessment of samples of unknown origin.

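A custom database can be built from a reference guide tree (Newick format) and a directory of strain FASTA files:

perl builder.pl -n refseq_guide_tree.nwk -d strain_fasta_directory -w 32 -o my_database

A sample is then classified against a database (here, ss_db_w32):

perl seeker.pl -i sample_file.fastq -d ss_db_w32 -o sample_result.txt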
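StrainSeeker builds its database from a guide tree and strain sequences, as the builder command shows. The Python sketch below illustrates only the general idea of k-mer based typing and is not StrainSeeker's algorithm: it uses a flat set of references instead of a tree, keeps the k-mers unique to each reference, and scores a sample by the fraction of those k-mers found in its reads. The functions, the k = 5 toy k-mer length, and the 0.5 cutoff are all hypothetical; real typing uses far longer k-mers.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All k-mers (substrings of length k) of a sequence, upper-cased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """For each reference, keep only the k-mers no other reference contains."""
    sets = {name: kmers(seq, k) for name, seq in references.items()}
    unique: Dict[str, Set[str]] = {}
    for name, ks in sets.items():
        others = set().union(*(v for n, v in sets.items() if n != name))
        unique[name] = ks - others
    return unique


def classify(reads: Iterable[str], references: Dict[str, str], k: int = 5,
             min_fraction: float = 0.5) -> Tuple[Optional[str], Dict[str, float]]:
    """Pick the reference whose unique k-mers are best covered by the reads.

    Returns (None, scores) when no reference reaches min_fraction, which
    loosely corresponds to a sample that may be a novel strain.
    """
    sample: Set[str] = set()
    for read in reads:
        sample |= kmers(read, k)
    scores = {name: (len(ks & sample) / len(ks) if ks else 0.0)
              for name, ks in unique_kmers(references, k).items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= min_fraction else None), scores


if __name__ == "__main__":
    refs = {"strain_A": "ACGTACGTTTGACCA", "strain_B": "TTTTGGGGCCCCAAAA"}
    print(classify(["ACGTACGTTT", "GTTTGACCA"], refs))
```

Using only k-mers unique to each reference means that shared, uninformative sequence does not count as evidence for any particular strain.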

CARD and VFDB Databases

The Comprehensive Antibiotic Resistance Database (CARD) contains information on resistance genes, the proteins they encode, and their associated phenotypes. Because we wanted to understand the causes of heteroresistance and/or heterosusceptibility, we performed computational phenotyping against CARD to determine which antibiotic resistance genes were present in the genome assembly.

The Virulence Factor Database (VFDB) is a reference database of virulence factors of pathogenic bacteria. It holds about 2,353 virulence factors, including bacterial toxins, cell-surface proteins, cell-surface carbohydrates, and hydrolytic enzymes, that may contribute to a bacterium's pathogenicity. We also performed computational phenotyping against VFDB.
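Computational phenotyping against a reference database usually comes down to deciding which alignment hits are strong enough to call a gene present. The Python sketch below shows that filtering step under stated assumptions: hits are already summarised as percent identity and percent coverage of the reference gene, and the 90% identity and 80% coverage cutoffs are hypothetical, not KAREN's settings. The `Hit` record and `present_genes` function are illustrative names, not part of CARD's or VFDB's tooling.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class Hit:
    """One alignment of an assembly sequence against a reference database entry."""
    query: str       # gene or contig ID from the assembly
    subject: str     # reference gene name in CARD or VFDB
    database: str    # "CARD" or "VFDB"
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the reference gene covered


def present_genes(hits: Iterable[Hit], min_identity: float = 90.0,
                  min_coverage: float = 80.0) -> Dict[str, List[str]]:
    """Report, per database, the reference genes whose hits pass both cutoffs."""
    present: Dict[str, Set[str]] = {}
    for hit in hits:
        if hit.identity >= min_identity and hit.coverage >= min_coverage:
            present.setdefault(hit.database, set()).add(hit.subject)
    return {db: sorted(genes) for db, genes in present.items()}


if __name__ == "__main__":
    # Hypothetical hits with placeholder gene names.
    example = [
        Hit("contig_1", "gene_a", "CARD", 99.5, 100.0),
        Hit("contig_2", "gene_b", "CARD", 82.0, 100.0),  # fails the identity cutoff
        Hit("contig_3", "gene_c", "VFDB", 96.0, 91.0),
    ]
    print(present_genes(example))
```

Requiring coverage as well as identity avoids calling a gene present on the strength of a short, highly similar fragment.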


pyANI

To calculate the Average Nucleotide Identity (ANI) between genomes, we used the Python tool pyANI. ANI is a measure of genome relatedness: the mean percentage of identical nucleotides across the regions that two genomes share. ANI values correlate with DNA-DNA hybridization values, which have traditionally been used to delimit microbial species, and ANI values above about 95% are generally taken to indicate that two genomes belong to the same species (Goris et al., 2007).
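As a worked illustration, ANI can be computed as the number of identical aligned bases divided by the total aligned length, so longer alignments carry more weight. The Python sketch below assumes alignments are already summarised as (aligned length, mismatches) pairs, for example parsed from MUMmer output; it is a simplification, not pyANI's code. The 95% cutoff follows the convention stated above.

```python
from typing import Iterable, Tuple


def average_nucleotide_identity(alignments: Iterable[Tuple[int, int]]) -> float:
    """ANI (%) from (aligned_length, mismatches) pairs, weighted by alignment length."""
    total = identical = 0
    for length, mismatches in alignments:
        total += length
        identical += length - mismatches
    if total == 0:
        raise ValueError("no aligned bases")
    return 100.0 * identical / total


def same_species(ani: float, threshold: float = 95.0) -> bool:
    """Apply the conventional ~95% ANI species boundary."""
    return ani >= threshold


if __name__ == "__main__":
    ani = average_nucleotide_identity([(1000, 10), (3000, 60)])
    print(ani, same_species(ani))  # 98.25 True
```

Here 3,930 of 4,000 aligned bases are identical, giving 98.25%, well above the species boundary.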

We ran pyANI with the fast alignment tool MUMmer (pyANI's ANIm method). Our server runs pyANI on six genomes, which the user chooses from 20 reference genomes and any genomes they have uploaded. ANI can therefore be used both to compare genomes within a dataset and to estimate their identity to Klebsiella reference genomes.
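With six genomes there are 6 × 5 / 2 = 15 distinct pairs to compare. The Python sketch below shows how a server could validate a user's selection and enumerate those pairs. The `plan_comparisons` name and the rule of accepting between two and six distinct genomes are assumptions for illustration, not KAREN's actual validation logic.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

MAX_GENOMES = 6  # assumption: at most six genomes are compared per run


def plan_comparisons(selected: Sequence[str],
                     max_genomes: int = MAX_GENOMES) -> List[Tuple[str, str]]:
    """Return every unordered pair of distinct selected genomes."""
    unique = list(dict.fromkeys(selected))  # drop duplicates, keep selection order
    if not 2 <= len(unique) <= max_genomes:
        raise ValueError(f"select between 2 and {max_genomes} distinct genomes")
    return list(combinations(unique, 2))


if __name__ == "__main__":
    chosen = ["upload_1", "ref_1", "ref_2", "ref_3", "ref_4", "ref_5"]
    print(len(plan_comparisons(chosen)))  # 15
```

Capping the selection keeps the number of alignments, which grows quadratically with the number of genomes, small enough for an interactive server.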

WebPage

Content to be updated.


References

Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, et al. (2007) DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol 57: 81-91. doi:10.1099/ijs.0.64483-0