Team I Genome Assembly Group: Difference between revisions

From Compgenomics 2018
Scho92 (talk | contribs)
Kagarwal (talk | contribs)

Revision as of 01:47, 3 March 2018

The group is conducting its discussions, planning, etc. on the Open Academic Environment. Our group is public.

Introduction

Bacterial Genomics

Figure 1. Bacterial Genomics [1]

Bacterial genomics is the discipline that studies the genome of a bacterium, which comprises all of that bacterium's hereditary information. Bacterial genomics helps in studying bacterial evolution and in determining the causative agent of disease outbreaks. It also helps identify bacterial pathogens (and their antibiotic resistance) and how these pathogens interact with their hosts.

Figure 2. Klebsiella (http://healthcare.bioquell.com)

Features of Klebsiella[2]:

Klebsiella is a genus of Gram-negative, non-motile, rod-shaped bacteria enclosed in a capsule (which helps them evade phagocytosis). These bacteria can be found singly, in pairs, or as short chains. Klebsiella are oxidase-negative facultative anaerobes with both respiratory and fermentative metabolism. They are found in water, in soil, and on plants, and include species that help plants fix nitrogen.

Species of Klebsiella, especially Klebsiella pneumoniae, are known to cause respiratory tract infections such as pneumonia, as well as urinary tract infections. They produce a number of virulence factors, such as multiple adhesins, capsular polysaccharide, siderophores, and lipopolysaccharide, that help them resist host defenses.

Assembly

Assembly is the process of combining sequence reads into contiguous stretches of DNA called contigs.

Data

We downloaded sequencing data for 258 Klebsiella spp. samples (paired-end reads, 250 base pairs in length) from the NCBI SRA database.

Objective

This study on Klebsiella spp. is motivated by the finding of colistin heteroresistance in K. pneumoniae.

Figure 3. Heteroresistant subpopulation [3]


Bacterial resistance to antibiotics has been observed ever since antibiotics were discovered, and it has been rising since. This makes it challenging to treat patients who have acquired multidrug-resistant bacterial infections, especially immunocompromised patients, who are susceptible to opportunistic pathogens resistant to virtually all currently available antibiotics. Many strains of Klebsiella pneumoniae have been identified as resistant to all major antibiotics. The resistance strategies used by this bacterium include the production of carbapenem-hydrolyzing enzymes, oxacillin-hydrolyzing enzymes, and beta-lactamases, including plasmid-borne extended-spectrum beta-lactamases. Multi-resistant K. pneumoniae (MRKP) have been found to resist antibiotics such as third-generation cephalosporins, gentamicin, and tobramycin.

As the scientific and pharmaceutical communities battle these antibiotic-resistant strains, another phenomenon has come into focus: heteroresistance. According to El-Halfawy and Valvano, heteroresistance is a variable response shown by a population to a specific antibiotic [4]. Bacterial heteroresistance has been known for a while, but the actual mechanism by which this resistance is acquired is unclear. Several mechanisms have been proposed, including a mutation in the gene encoding the PhoP protein of the PhoP/PhoQ pathway, which confers resistance to colistin.

The current study focuses on Klebsiella spp. that have been found to be genetically identical and that lack the above-mentioned mutation in the PhoP protein. "Genetically identical, but phenotypically distinct, subpopulation of colistin-resistant bacteria can mediate in vivo treatment failure" --David Weiss.

Methods

Pipeline (general workflow)

Pre-processing

Library Prep

Trimming

Trimmomatic is a multi-purpose trimming tool that can perform adapter trimming, quality trimming, and force trimming in a single run. We used three of its steps. ILLUMINACLIP removes adapter sequences from the reads. SLIDINGWINDOW trims reads based on a user-defined quality threshold; we set it to 4:20, so the program scans each read with a 4-base-wide sliding window and cuts the read once the average quality per base within the window drops below 20. MINLEN drops reads shorter than a given length; we set it to 20, so any read shorter than 20 bases was discarded.
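The SLIDINGWINDOW:4:20 and MINLEN:20 settings described above can be illustrated with a minimal Python sketch. This is a simplified illustration of the idea, not Trimmomatic's actual implementation: the function names are hypothetical, the read is cut at the start of the first failing window, and adapter clipping (ILLUMINACLIP) is not modeled.

```python
def sliding_window_keep(quals, window=4, min_avg=20):
    """Return how many leading bases survive the sliding-window check.

    quals is a list of per-base Phred quality scores. The read is cut at the
    start of the first window whose average quality is below min_avg.
    (Simplification: reads shorter than one window are left untouched here.)
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)


def trim_read(seq, quals, window=4, min_avg=20, min_len=20):
    """Apply the window trim, then drop the read if it is shorter than min_len."""
    keep = sliding_window_keep(quals, window, min_avg)
    if keep < min_len:
        return None  # read discarded, as with MINLEN
    return seq[:keep], quals[:keep]


# Hypothetical read: 25 good bases followed by 5 poor ones.
read = "A" * 30
scores = [30] * 25 + [10] * 5
print(trim_read(read, scores))
```

A read whose quality collapses early is cut so short that the minimum-length filter removes it entirely, which is why the two steps are usually applied together.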


Figure 4. Quality Control with Trimmomatic

Assembly

Algorithms

Burrows–Wheeler Transform (BWT)

The Burrows–Wheeler transform (BWT) was originally designed for data compression [5]. The transformation stands out for being reversible without needing to store any additional data. After the advent of Next Generation Sequencing (NGS), the BWT found application in bioinformatics: tools implementing it, such as Bowtie [6] and BWA [7], were created to map short reads to a reference.

There are three main steps in using BWT:

1) Sort all rotations of the text into lexicographic order (the end marker $ sorts before every other character, so the rotation starting with $ is always the first row). Keep the first and last columns and the index information.

2) Invert the BWT matrix (BWM).

3) Map patterns to the data structure.
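The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Bowtie or BWA are implemented: the function names are hypothetical, the text must not contain `$`, and the sorting and counting are done naively, whereas real aligners use suffix arrays and precomputed rank tables.

```python
def bwt(text):
    """Step 1: sort all rotations of text + '$' and keep the last column."""
    t = text + "$"  # '$' sorts before A, C, G, T in ASCII
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(row[-1] for row in rotations)


def first_column_offsets(last):
    """Row at which each character's block starts in the sorted first column."""
    offsets, total = {}, 0
    for c in sorted(set(last)):
        offsets[c] = total
        total += last.count(c)
    return offsets


def inverse_bwt(last):
    """Step 2: rebuild the text from the last column (last-to-first mapping)."""
    offsets = first_column_offsets(last)
    seen, ranks = {}, []
    for c in last:  # rank = number of earlier copies of c in the last column
        ranks.append(seen.get(c, 0))
        seen[c] = seen.get(c, 0) + 1
    row, out = 0, []  # row 0 is the rotation that starts with '$'
    while last[row] != "$":
        c = last[row]
        out.append(c)  # characters are recovered from the end of the text
        row = offsets[c] + ranks[row]
    return "".join(reversed(out))


def count_matches(last, pattern):
    """Step 3: backward search; returns how often pattern occurs in the text."""
    offsets = first_column_offsets(last)
    lo, hi = 0, len(last)  # half-open range of rows still matching
    for c in reversed(pattern):
        if c not in offsets:
            return 0
        lo = offsets[c] + last[:lo].count(c)
        hi = offsets[c] + last[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


last_column = bwt("GATTACA")
print(last_column, inverse_bwt(last_column), count_matches(last_column, "TA"))
```

Note that the pattern search only ever touches the last column and the per-character offsets, which is what makes the structure compact enough for mapping short reads against a whole genome.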


De Bruijn graph

Reference Assembly

Reference assembly was performed using the Burrows-Wheeler Aligner (BWA). Because we did not know the species of the reads, we used the Mash tool to select a reference genome based on the lowest mutation distance from the reads. Mash uses the MinHash algorithm to estimate global mutation distance.
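The MinHash idea behind this reference selection can be sketched as follows. This is a rough illustration, not Mash itself: the function names are hypothetical, SHA-1 stands in for the hash function Mash uses, reverse complements are ignored, and k=21 and s=1000 are illustrative values rather than settings reported here. The distance is computed from the estimated Jaccard index j as D = -(1/k) ln(2j / (1 + j)).

```python
import hashlib
import math


def kmer_hashes(seq, k):
    """Hash every k-mer of seq to a 64-bit integer (SHA-1 is a stand-in hash)."""
    return {
        int.from_bytes(hashlib.sha1(seq[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(seq) - k + 1)
    }


def sketch(seq, k=21, s=1000):
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes."""
    return sorted(kmer_hashes(seq, k))[:s]


def jaccard_estimate(sketch_a, sketch_b, s=1000):
    """Estimate the Jaccard index from the s smallest hashes of the merged sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k=21):
    """Distance D = -(1/k) * ln(2j / (1 + j)); 1.0 when no k-mers are shared."""
    if j <= 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


def closest_reference(query_seq, references, k=21, s=1000):
    """Return the name of the reference with the smallest estimated distance.

    references is a hypothetical dict mapping a genome name to its sequence.
    """
    query = sketch(query_seq, k, s)
    return min(
        references,
        key=lambda name: mash_distance(
            jaccard_estimate(query, sketch(references[name], k, s), s), k
        ),
    )
```

Because only the sketches are compared, each candidate reference needs to be hashed once, after which every comparison costs at most s hash lookups regardless of genome size.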

Results

Discussion

Conclusions

References

1. Perna, Nicole T., Guy Plunkett III, Valerie Burland, Bob Mau, Jeremy D. Glasner, Debra J. Rose, George F. Mayhew et al. "Genome sequence of enterohaemorrhagic Escherichia coli O157: H7." Nature 409, no. 6819 (2001): 529.

2. Bergey, David Hendricks, Robert Stanley Breed, Everitt George Dunne Murray, and A. Parker Hitchens. Bergey's manual of determinative bacteriology. Baltimore: Williams & Wilkins, 1934.

3. Jayol, Aurélie, Patrice Nordmann, Adrian Brink, and Laurent Poirel. "Heteroresistance to colistin in Klebsiella pneumoniae associated with alterations in the PhoPQ regulatory system." Antimicrobial agents and chemotherapy 59, no. 5 (2015): 2780-2784.

4. El-Halfawy, O. M., and M. A. Valvano. "Chemical communication of antibiotic resistance by a highly resistant subpopulation of bacterial cells." PLoS One 8 (2013): e68874.

5. Burrows, Michael, and David J. Wheeler. "A block-sorting lossless data compression algorithm." (1994).

6. Langmead, Ben, Cole Trapnell, Mihai Pop, and Steven L. Salzberg. "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome." Genome biology 10, no. 3 (2009): R25.

7. Li, Heng, and Richard Durbin. "Fast and accurate short read alignment with Burrows–Wheeler transform." Bioinformatics 25, no. 14 (2009): 1754-1760.