Team I Webserver Group
Revision as of 21:06, 23 April 2018
Web Server
Introduction
Background
The goal of the Klebsiella Antibiotics REsistance PredictioN (KAREN) webserver is to assemble and annotate genomes of Klebsiella spp. and present the results in a user-friendly format. KAREN could also be used to assemble genomes of other bacteria; however, the server is currently designed to annotate only Klebsiella genomes.
This year, the objective of BIOL7210: Computational Genomics was to assemble and functionally annotate 258 genomes. Recent studies have shown the emergence of colistin and fosfomycin resistance in Klebsiella spp.
Given raw sequence reads as input, KAREN performs the following analyses:
- de novo genome assembly
- Species and strain typing
- Antibiotic resistance gene detection (CARD)
- Virulence factor detection (VFDB)
Goals
- Assemble input reads
- Analyze assemblies
- Visualize results
- Implement a way for results to be downloaded
Technologies Used
We developed this webserver with a PHP framework for server-side programming. PHP has strong framework support for MySQL and the Apache HTTP Server, and it makes it feasible to build on a Model-View-Controller (MVC) framework, which keeps the user interface simpler to develop. Of the many such frameworks available, we used Laravel.
Laravel, created by Taylor Otwell and built on Symfony components, provides three features we wanted in our webserver: (1) Blade templates (user interface), (2) migrations (database management), and (3) job chaining. This webserver is built on PHP v7.0.0 and Laravel v5.5.
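Laravel's job chaining runs a list of queued jobs in order and stops the chain when a job fails. The Python sketch below only illustrates that control flow; it is not Laravel's API, and the stage functions (`assemble`, `analyze`, `visualize`, `package_download`) are hypothetical stand-ins for the pipeline steps in the Goals list.

```python
from typing import Callable, List

# A job takes a sample ID and reports success (True) or failure (False).
Job = Callable[[str], bool]


def run_chain(sample_id: str, jobs: List[Job]) -> List[str]:
    """Run jobs in order and return the names of those that succeeded.

    As in a chained queue, the first failing job stops the chain, so
    later jobs never run for this sample.
    """
    completed: List[str] = []
    for job in jobs:
        if not job(sample_id):
            break
        completed.append(job.__name__)
    return completed


# Hypothetical pipeline stages; real stages would call the assembly and
# annotation tools and store their output.
def assemble(sample_id: str) -> bool:
    return True


def analyze(sample_id: str) -> bool:
    return True


def visualize(sample_id: str) -> bool:
    return False  # simulate a failure to show that the chain stops here


def package_download(sample_id: str) -> bool:
    return True


if __name__ == "__main__":
    print(run_chain("sample_01", [assemble, analyze, visualize, package_download]))
    # prints ['assemble', 'analyze']
```

Stopping at the first failure keeps a sample from being annotated or packaged for download when an earlier step has not produced usable output.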
Functionalities
de novo Genome Assembly
FastQC was used to perform quality control checks on the raw input sequence data. We then used de novo assembly in our pipeline because it does not require a reference sequence. Sequence reads are assembled into contigs, and the quality of a de novo assembly depends on the size and continuity of those contigs. We used Skesa for de novo genome assembly. This tool is currently unpublished.
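Contig size and continuity are commonly summarized with the N50: the length L such that contigs of length L or longer contain at least half of the assembled bases, so a higher N50 indicates a more contiguous assembly. The minimal Python sketch below computes it from FASTA text of contigs, such as an assembler's output; the helper names are our own, and the FASTA reader assumes well-formed records whose headers start with `>`.

```python
from typing import Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA text."""
    lengths: List[int] = []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)  # start a new record
        elif lengths:
            lengths[-1] += len(line)  # a sequence may span several lines
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs >= L cover at least half the total bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no assembled bases")
    covered = 0
    for length in ordered:
        covered += length
        if 2 * covered >= total:
            return length
    return ordered[-1]  # not reached: the loop always covers the full total


if __name__ == "__main__":
    print(n50([80, 70, 50, 40, 30, 20]))  # 70, since 80 + 70 = 150 >= 290 / 2
```

For a file on disk, `with open("contigs.fasta") as fh: print(n50(contig_lengths(fh)))` would work the same way (the file name is a placeholder).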
Species & Strain Typing by StrainSeeker
We used MEGA7, GenomeTester4, and StrainSeeker for species and strain typing. StrainSeeker constructs a list of specific k-mers for each node of any given Newick-format tree, which enables the identification of bacterial isolates in 1–2 min. MEGA7 was used to align the sequences and construct a neighbor-joining tree. StrainSeeker then built a custom database from the 258 Klebsiella genomes we were given, using the MEGA7 tree as the guide tree that describes the relationships between the strains. Finally, StrainSeeker was used to detect novel strains related to strains in the database.
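```shell
# Build a custom database from the guide tree (-n) and a directory of strain
# FASTA files (-d), using a k-mer (word) length of 32 (-w); output to -o
perl builder.pl -n refseq_guide_tree.nwk -d strain_fasta_directory -w 32 -o my_database
# Identify a sample (-i) against a database (-d), here ss_db_w32; results go to -o
perl seeker.pl -i sample_file.fastq -d ss_db_w32 -o sample_result.txt
```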
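To make "specific k-mers for each node" concrete, here is a toy Python sketch. It assumes a node-specific k-mer is one present in every genome below the node and absent from all other genomes in the tree. StrainSeeker's real implementation works on Newick files and GenomeTester4 k-mer lists and differs in detail; the nested-tuple tree, the forward-strand-only k-mers, and the function names here are hypothetical simplifications.

```python
from typing import Dict, Set, Tuple, Union

# Toy tree: a leaf is a strain name; an internal node is a tuple of children.
# The Newick tree ((A,B),C) becomes (("A", "B"), "C").
Tree = Union[str, Tuple["Tree", ...]]


def kmers(seq: str, k: int) -> Set[str]:
    """All k-mers of a sequence (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def leaves(node: Tree) -> Set[str]:
    """Names of all strains at or below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def node_specific_kmers(tree: Tree, genomes: Dict[str, str],
                        k: int) -> Dict[Tuple[str, ...], Set[str]]:
    """For every node, keep k-mers present in all strains below it and
    absent from every other strain in the tree."""
    kmer_sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    specific: Dict[Tuple[str, ...], Set[str]] = {}

    def visit(node: Tree) -> None:
        inside = leaves(node)
        shared = set.intersection(*(kmer_sets[name] for name in inside))
        outside = set().union(
            *(kmer_sets[name] for name in genomes if name not in inside))
        specific[tuple(sorted(inside))] = shared - outside
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return specific


if __name__ == "__main__":
    genomes = {"A": "AAAACCCC", "B": "AAAAGGGG", "C": "TTTTGGGG"}
    table = node_specific_kmers((("A", "B"), "C"), genomes, k=4)
    print(table[("A", "B")])  # {'AAAA'}: shared by A and B, absent from C
```

A sample could then be placed on the tree by checking which nodes' specific k-mers appear among its own k-mers; StrainSeeker's actual scoring is not reproduced here.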
StrainSeeker uses a pre-built database for species identification. It rapidly and accurately assesses the species and strain of a bacterial assembly, runs in a matter of minutes, and can be customized to use a user-created database. It works on paired-end reads and can even identify novel strains, placing them near their closest relatives on the phylogenetic tree. This makes it a useful tool for assessing samples of unknown origin.
For KAREN, we are concerned only with Klebsiella spp. When we tested the pre-built database, it appeared accurate at identifying the Klebsiella strains. For this reason, we chose to use the pre-built database for species and strain identification.
CARD Database
The Comprehensive Antibiotic Resistance Database (CARD) includes information on resistance genes, the proteins encoded by those genes, and their associated phenotypes. Because one of the objectives of the class was to understand the causes of hetero-resistance and hetero-susceptibility, we performed computational phenotyping: we screened the genome assembly produced by the webserver against CARD to determine which antibiotic resistance genes are present.
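One simple way to decide which reference genes are present in an assembly is to align the contigs against the reference sequences and keep hits that pass identity and length filters. The sketch below assumes BLAST tabular output (`-outfmt 6`, default 12 columns) from aligning the assembly against CARD reference sequences. The thresholds are illustrative assumptions rather than CARD's own detection criteria, and `geneA`/`geneB` are placeholder names.

```python
import csv
from typing import Dict, Iterable

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def genes_present(lines: Iterable[str], min_identity: float = 90.0,
                  min_length: int = 100) -> Dict[str, float]:
    """Return {reference gene: best bitscore} for hits passing the filters.

    The default thresholds are illustrative assumptions.
    """
    best: Dict[str, float] = {}
    for row in csv.DictReader(lines, fieldnames=FIELDS, delimiter="\t"):
        if float(row["pident"]) < min_identity or int(row["length"]) < min_length:
            continue  # too divergent or too short to call the gene present
        gene, score = row["sseqid"], float(row["bitscore"])
        if score > best.get(gene, float("-inf")):
            best[gene] = score  # keep only the best hit per reference gene
    return best


if __name__ == "__main__":
    hits = [
        "contig_1\tgeneA\t99.5\t300\t1\t0\t1\t300\t1\t300\t1e-100\t550\n",
        "contig_2\tgeneA\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-90\t480\n",
        "contig_3\tgeneB\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-50\t300\n",
    ]
    print(genes_present(hits))  # {'geneA': 550.0}; geneB fails the identity filter
```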
<Image>
The graph above shows the counts of the genes found and the efflux mechanisms associated with them. Because Klebsiella spp. are among the bacteria known to develop multi-drug resistance, this information is useful for interpreting the results and for getting a quick overview of the assembled organism.
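Counts like those in the graph can be produced by tallying the detected genes per mechanism. The sketch below assumes a tab-separated results table with a header row and a mechanism column, named `Resistance Mechanism` here; that column name and the semicolon-separated format for genes with several mechanisms are assumptions about the input, and the gene names are placeholders.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_mechanism(lines: Iterable[str],
                       column: str = "Resistance Mechanism") -> Counter:
    """Tally detected genes per mechanism from tab-separated text with a header."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        # A gene listing several mechanisms is counted once under each of them.
        for mechanism in row[column].split(";"):
            if mechanism.strip():
                counts[mechanism.strip()] += 1
    return counts


if __name__ == "__main__":
    table = [
        "Gene\tResistance Mechanism\n",
        "geneA\tantibiotic efflux\n",
        "geneB\tantibiotic inactivation\n",
        "geneC\tantibiotic efflux; antibiotic target alteration\n",
    ]
    for mechanism, n in count_by_mechanism(table).most_common():
        print(f"{mechanism}\t{n}")
```

The resulting counts can be passed directly to any plotting library to draw a bar chart like the one above.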
VFDB Database
The Virulence Factor Database (VFDB) is a reference database of virulence factors of pathogenic bacteria. It holds about 2,353 virulence factors, including bacterial toxins, cell-surface proteins, cell-surface carbohydrates, and hydrolytic enzymes that may contribute to a bacterium's pathogenicity.
<Image>