Team I Webserver Group
Revision as of 12:29, 23 April 2018
Web Server
Team
Dongjo Ban, Genevieve Brandt, Saurabh Gulati, Yuntian He, Ryan Place, Nirav Shah, Casey Smith, Mohit Thakur, Stephen Wist
Introduction
Background
The goal of our predictive webserver is to process biological data and output the results of the analysis in a user-friendly format. We will provide information about an input sample's antibiotic resistance and other biological traits such as genus, species, and strain.
Goals
- Assemble input reads
- Analyze assemblies
- Visualize results
- Implement a way for results to be downloaded
Technologies Used
PHP was used because it is a widely supported web language that has numerous useful libraries and is easy to integrate with HTML. We used it with the Laravel framework to allow for rapid application development.
Functionalities
De Novo Genome Assembly
FastQC was used to perform quality control checks on the raw input sequence data. We then used de novo assembly in our pipeline because it requires no reference sequence. Sequence reads are assembled into contigs, and the quality of a de novo assembly depends on the size and continuity of those contigs. We used SKESA for de novo genome assembly. This tool is currently unpublished.
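Contig size and continuity are commonly summarized with the N50 statistic: the length of the contig at which the contigs, sorted from longest to shortest, first cover at least half of the total assembly length. The page does not say which metric was used to judge assemblies, so the following is only a minimal sketch of that common measure. The function name and the example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    # Walk the contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that brings the running total to at least
        # half of the assembly length defines the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

A higher N50 for the same total length generally indicates a more contiguous assembly, though it says nothing about correctness.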
Species & Strain Typing
MEGA, GenomeTester4 and StrainSeeker were used for species and strain typing. StrainSeeker constructs a list of specific k-mers for each node of any given Newick-format tree and enables the identification of bacterial isolates in 1–2 min. MEGA7 was used to align the sequences and construct a neighbor-joining tree. StrainSeeker was then used to build a custom database from the 258 Klebsiella genomes we were given, with the tree generated by MEGA7 serving as the guide tree that describes the relationships between the given strains. Finally, StrainSeeker was used to detect novel strains that are related to strains in the database.
perl builder.pl -n refseq_guide_tree.nwk -d strain_fasta_directory -w 32 -o my_database
perl seeker.pl -i sample_file.fastq -d ss_db_w32 -o sample_result.txt
Because aligning 258 genome sequences takes too much time, we used a pre-built database provided by StrainSeeker, which has a k-mer length of 16. It took 3 hours to get the result for the 258 Klebsiella genome sequences combined. We plan to visualize the result the way the StrainSeeker web tool does, but we have not yet found a way to do so.
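The idea of node-specific k-mers used in this section can be illustrated with a small sketch: a k-mer is treated as specific to a tree node if it occurs in every sequence under that node and in no sequence outside it. This is a simplified illustration under that assumption, not StrainSeeker's actual implementation (which relies on GenomeTester4 for k-mer lists). The function names and toy sequences are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all k-length substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def node_specific_kmers(clade_seqs, other_seqs, k):
    """Return k-mers shared by every sequence under a node and absent elsewhere.

    clade_seqs must contain at least one sequence.
    """
    # k-mers present in all sequences that descend from the node.
    shared = set.intersection(*(kmers(s, k) for s in clade_seqs))
    # k-mers seen in any sequence outside the node.
    outside = set().union(*(kmers(s, k) for s in other_seqs))
    return shared - outside


# Toy sequences, for illustration only.
clade = ["ACGTAC", "TACGTA"]
others = ["GGGTAC"]
print(sorted(node_specific_kmers(clade, others, 3)))
```

A sample whose reads contain a node's specific k-mers can then be placed at or below that node in the guide tree.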
Genome Database
We created a curated list of genes that are included in the 258 Klebsiella genomes we were given, performed a literature review to find genes that may indicate colistin resistance, and built a gene panel to help us find phenotypic indicators in our assembled genomes. We will be using a MySQL database that will show 0 (absent) or 1 (present) for both antibiotic resistance genes and virulence factor genes.
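A presence/absence table of this kind could be laid out as sketched below. The schema, table name, genome identifiers, and gene names are all hypothetical placeholders, and SQLite (through Python's built-in sqlite3 module) stands in for MySQL so that the example runs without a database server.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the planned MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE gene_presence (
        genome_id TEXT NOT NULL,
        gene_name TEXT NOT NULL,
        category  TEXT NOT NULL CHECK (category IN ('resistance', 'virulence')),
        present   INTEGER NOT NULL CHECK (present IN (0, 1)),
        PRIMARY KEY (genome_id, gene_name)
    )
    """
)

# Placeholder rows: 1 means the gene is present, 0 means it is absent.
rows = [
    ("genome_001", "gene_A", "resistance", 1),
    ("genome_001", "gene_B", "resistance", 0),
    ("genome_001", "gene_C", "virulence", 1),
    ("genome_002", "gene_A", "resistance", 0),
]
conn.executemany("INSERT INTO gene_presence VALUES (?, ?, ?, ?)", rows)


def genes_present(connection, genome_id, category):
    """Return the sorted names of genes marked present for one genome and category."""
    cursor = connection.execute(
        "SELECT gene_name FROM gene_presence "
        "WHERE genome_id = ? AND category = ? AND present = 1 "
        "ORDER BY gene_name",
        (genome_id, category),
    )
    return [row[0] for row in cursor]


print(genes_present(conn, "genome_001", "resistance"))
```

Storing one row per genome and gene keeps the table easy to extend when new genes are added to the panel, at the cost of more rows than a wide one-column-per-gene layout.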