Team II Comparative Genomics Group: Difference between revisions

From Compgenomics 2018

Revision as of 16:37, 18 April 2018

Introduction

Background

Comparative genomics is the study of comparing genome sequences to better understand the structure and function of genes. This field has explored areas ranging from organism development and behavior to metabolism and susceptibility to disease.

Fosfomycin

Fosfomycin is a natural antibacterial produced by various Streptomyces and Pseudomonas species. It is the only antibiotic currently in clinical use that targets a Mur enzyme. It is a broad-spectrum bactericidal antibiotic that can be employed against both Gram-positive and Gram-negative bacteria (e.g., Klebsiella). It interferes with cell wall synthesis: as a phosphoenolpyruvate analog, it inhibits MurA (UDP-N-acetylglucosamine enolpyruvyl transferase), the enzyme that catalyzes the initial step, as shown below.

Resistance to fosfomycin arises through a wide range of mechanisms. These include reduced uptake, target site modification, expression of antibiotic-degrading enzymes, and rescue of the UDP-MurNAc biogenesis pathway (e.g., mutation within the MurA enzyme).

Objectives

To identify genetic determinants that could potentially cause fosfomycin heteroresistance in the isolates provided.

Data

The following is the metadata of our study:

Whole Genome Approach

The Whole Genome approach to comparative genomics attempts to broadly identify similarities and differences across samples.

Similarity analysis

Identifying similarities among our samples would tell us if phenotypic similarities correlate with overall genome similarity and help us choose representatives.

We computed min-hash distances between all samples with a known antibiotic resistance phenotype and clustered them using complete-linkage hierarchical clustering.
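The min-hash distance computation can be illustrated with a minimal Python sketch. It is not Mash itself: the function names are hypothetical, MD5 stands in for the hash function Mash uses, and reverse-complement (canonical) k-mers are ignored for brevity. The k-mer length of 21 and sketch size of 1000 are assumed here as illustrative defaults. The distance formula is the published Mash distance, D = (1/k) ln((1 + j) / (2j)), where j is the estimated Jaccard index.

```python
import hashlib
import math


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def sketch(seq, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes.

    Assumption: MD5 is used only for illustration.
    """
    hashes = {int(hashlib.md5(km.encode()).hexdigest(), 16) for km in kmers(seq, k)}
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=1000):
    """Estimate the Jaccard index from two bottom-s sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k=21):
    """Mash distance from a Jaccard estimate j; 1.0 when no k-mers are shared."""
    if j == 0:
        return 1.0
    return math.log((1 + j) / (2 * j)) / k
```

Two identical sequences give a Jaccard estimate of 1 and a distance of 0, while sequences that share no k-mers are assigned the maximum distance of 1.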

Fig #: Hierarchical clustering based on min-hash distances. Distances were computed using Mash, and clustering was done using the R function hclust. Samples labeled "NA" were not included in the analysis.

The dendrogram shows no clear demarcation between the different resistance phenotypes, suggesting that there is no broad genomic signature that distinguishes the heteroresistant samples from the non-heteroresistant samples.

After removing the outliers, we found a single highly similar cluster of isolates that spanned all three phenotypes (resistant, heteroresistant, and susceptible).
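For readers unfamiliar with complete linkage, the following is a small pure-Python sketch of the idea, not the R code used for the figure. The sample names and distances are hypothetical. At each step the two clusters whose farthest members are closest to each other are merged, and the merge height is that farthest pairwise distance.

```python
def complete_linkage(names, dist):
    """Agglomerative clustering with complete linkage.

    names: list of sample labels
    dist:  dict mapping frozenset({a, b}) -> distance between samples a and b
    Returns a list of (cluster_a, cluster_b, height) tuples in merge order.
    """
    clusters = [(name,) for name in names]
    merges = []

    def linkage(c1, c2):
        # Complete linkage: cluster distance is the largest pairwise distance.
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        height, i, j = min(
            (linkage(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        )
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical distances between three samples.
example_dist = {
    frozenset(("A", "B")): 0.1,
    frozenset(("A", "C")): 0.5,
    frozenset(("B", "C")): 0.4,
}
example_merges = complete_linkage(["A", "B", "C"], example_dist)
```

In this toy example A and B merge first, and C then joins at the larger of its two distances to the merged cluster.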

Fig #: Zoomed in view of a single highly related cluster that encompasses all three resistance phenotypes


These 11 samples have Mash distances of approximately 0.0001 relative to one another, suggesting that any genetic element responsible for the different antibiotic resistance phenotypes must lie within the extremely small fraction of sequence that differs between these samples. We believe this cluster is a good representative of our complete dataset.

Difference analysis

To identify the genomic differences between our samples, we decided to perform a genome-wide association study (GWAS). A GWAS correlates the presence or absence of variants in a genome with the presence or absence of a trait, which in our case is heteroresistance.

We executed a pan-genome GWAS using the tool 'bacterial GWAS'. The tool runs Prodigal on assembled contigs to annotate ORFs and performs CD-HIT clustering to construct a pan-genome. It then performs a GWAS using a logistic regression model on the given phenotype (presence or absence of heteroresistance) and outputs a list of significant predicted genes along with the frequency of their occurrence in each phenotype. We then ran BLAST on the predicted genes to identify their function.
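The association step can be sketched for a single gene as follows. This is an illustration under stated assumptions, not the tool's implementation: the function name and input dictionaries are hypothetical, and no p-value is computed. For one binary predictor, the slope of an unpenalised logistic regression equals the log odds ratio of the 2x2 presence/phenotype table. The sketch adds 0.5 to every cell (the Haldane-Anscombe correction), so the value is an approximation that stays finite when a cell is empty.

```python
import math


def gene_association(presence, phenotype):
    """Summarise one gene's association with a binary phenotype.

    presence:  dict sample -> True if the gene cluster is present
    phenotype: dict sample -> True if the sample is heteroresistant
    """
    a = b = c = d = 0  # cells of the 2x2 table
    for sample, is_case in phenotype.items():
        has_gene = presence.get(sample, False)
        if is_case and has_gene:
            a += 1  # heteroresistant, gene present
        elif is_case:
            b += 1  # heteroresistant, gene absent
        elif has_gene:
            c += 1  # not heteroresistant, gene present
        else:
            d += 1  # not heteroresistant, gene absent
    # Corrected log odds ratio (approximates the logistic regression slope).
    log_or = math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))
    return {
        "freq_heteroresistant": a / (a + b) if a + b else 0.0,
        "freq_other": c / (c + d) if c + d else 0.0,
        "log_odds_ratio": log_or,
    }
```

A positive log odds ratio indicates that the gene is more frequent among heteroresistant samples, and a negative value indicates the opposite.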

Phylogeny Approach

Phylogeny-based approaches aim to pare down analysis by focusing on small changes to a directed set of genes between samples. Our group chose to focus on comparison of highly conserved genes and single nucleotide polymorphisms. These approaches attempted both to sequence type our samples and to understand the underlying mechanisms of action for heteroresistance in Klebsiella pneumoniae.

Multilocus Sequence Typing (MLST)

Traditional MLST schemes focus on allelic diversity across a small subset of highly conserved genes commonly referred to as housekeeping genes. Compiling allelic variants into compound identification profiles creates unique types which have been shown to have specificity down to the strain level. However, because housekeeping genes are highly conserved, MLST schemes have difficulty distinguishing between closely related organisms, such as samples from the same culture.

MLST schemes take years and large amounts of funding to establish and verify. Fortunately, an MLST scheme already existed for Klebsiella pneumoniae. We chose to use this existing scheme along with known phenotypic profiles of our samples in hopes of being able to easily sequence type heteroresistance. STing, an MLST tool developed by the Jordan Lab at Georgia Tech, was used to quickly assign allelic profiles to our sample set.

The MLST scheme for Klebsiella pneumoniae contains 7 genes and is as follows:

  • gapA - Glyceraldehyde-3-phosphate dehydrogenase A
  • infB - Translation initiation factor IF-2
  • mdh - Malate dehydrogenase
  • pgi - Glucose-6-phosphate isomerase
  • phoE - Outer membrane pore protein E
  • rpoB - RNA polymerase beta subunit
  • tonB - Protein TonB
Fig #: MLST distribution across samples
Fig #: MLST distribution across samples
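The way a seven-locus allelic profile maps to a sequence type, and how the distribution across samples is tallied, can be sketched as below. This is not how STing works internally. The allele numbers, sequence type labels, and sample names are all hypothetical placeholders, not values from the real Klebsiella pneumoniae scheme or from our data.

```python
from collections import Counter

# Loci of the scheme, in a fixed order so profiles are comparable.
LOCI = ("gapA", "infB", "mdh", "pgi", "phoE", "rpoB", "tonB")


def allelic_profile(calls):
    """Turn a dict of locus -> allele number into an ordered tuple."""
    return tuple(calls[locus] for locus in LOCI)


def assign_sequence_type(calls, scheme):
    """Look up a profile in the scheme; unknown profiles are 'unassigned'."""
    return scheme.get(allelic_profile(calls), "unassigned")


# Hypothetical scheme table: allelic profile -> sequence type label.
scheme = {
    (1, 1, 1, 1, 1, 1, 1): "ST-hypothetical-1",
    (1, 1, 1, 1, 1, 1, 2): "ST-hypothetical-2",
}

# Hypothetical allele calls for four samples.
samples = {
    "sample_1": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1))),
    "sample_2": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 2))),
    "sample_3": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1))),
    "sample_4": dict(zip(LOCI, (9, 1, 1, 1, 1, 1, 1))),
}

types = {name: assign_sequence_type(calls, scheme) for name, calls in samples.items()}
distribution = Counter(types.values())
```

A single differing allele at any one locus is enough to place two samples in different sequence types, which is why the profile is treated as an exact-match key.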

Single Nucleotide Polymorphism (SNP) Analysis

SNP analysis compares genetic sequences across samples in search of single nucleotide differences. Even single nucleotide changes have been shown to drastically affect gene expression, transcriptional mechanisms, and protein composition and configuration. Our team chose to compare these sites using kSNP3. kSNP3 relies on pre-existing programs to break the sequences into k-mers, and it identifies SNPs without a genome alignment or reference genome by finding k-mers that match across samples except at the central base. kSNP3 also possesses the ability to annotate the SNPs found in the genome based on either NCBI reference genomes or provided GenBank files.

Using self-contained scripts, a k-mer size of 23, and GenBank files from the Functional Annotation team, we analyzed both the whole sample population and our previously identified cluster of interest. Initial analysis of the trees generated by kSNP3 did not return clear phylogenetic groupings along phenotypic populations. The tree for our cluster of interest is shown below.
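The core idea of k-mer based SNP detection can be shown with a simplified sketch. It is not kSNP3's implementation: the function names are hypothetical, reverse complements are ignored, and the toy example uses k = 5 rather than a realistic value such as 23. A candidate SNP locus is an odd-length k-mer whose flanking bases are identical across genomes while its central base differs.

```python
from collections import defaultdict


def central_base_kmers(seq, k):
    """Map (left flank, right flank) -> set of central bases seen in seq."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that a central base exists")
    half = k // 2
    table = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        table[(kmer[:half], kmer[half + 1:])].add(kmer[half])
    return table


def find_snps(genomes, k=23):
    """Return {flank: {genome name: central base}} for variable loci.

    genomes: dict of genome name -> nucleotide sequence
    """
    loci = defaultdict(dict)
    for name, seq in genomes.items():
        for flank, bases in central_base_kmers(seq, k).items():
            if len(bases) == 1:  # skip flanks that are ambiguous within a genome
                loci[flank][name] = next(iter(bases))
    # Keep only loci where at least two genomes disagree on the central base.
    return {f: calls for f, calls in loci.items() if len(set(calls.values())) > 1}


# Toy genomes that differ at one position (C versus T).
toy_genomes = {"g1": "TTGACAGGCT", "g2": "TTGATAGGCT"}
toy_snps = find_snps(toy_genomes, k=5)
```

Longer k-mers make it less likely that the same flanking sequence occurs at unrelated positions, at the cost of missing SNPs that lie close to one another.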

Fig #: SNP Tree generated from analysis of cluster of interest

After obtaining the trees and the SNPs contained within, our focus shifted toward finding SNPs which were homogeneous within and unique to our phenotypic populations.

In the end, we were unable to discover any SNPs which were homogeneous across our heteroresistant sample population and were not found in the susceptible and/or resistant populations. This led us to conclude that heteroresistance in our samples was not being caused by a shared single nucleotide polymorphism.
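The filtering criterion described above can be expressed as a short sketch. The function name, locus identifiers, sample names, and allele calls are hypothetical, and this is not the script we ran. A locus is reported only if every sample in the target phenotype has the same allele and no sample from any other phenotype carries that allele.

```python
def phenotype_specific_snps(snps, phenotypes, target):
    """Find SNP loci homogeneous within and unique to one phenotype.

    snps:       dict locus -> {sample: allele}
    phenotypes: dict sample -> phenotype label
    target:     phenotype label of interest
    """
    hits = {}
    for locus, calls in snps.items():
        inside = [calls.get(s) for s, p in phenotypes.items() if p == target]
        outside = {calls.get(s) for s, p in phenotypes.items() if p != target}
        if not inside or None in inside:
            continue  # require an allele call in every target sample
        alleles = set(inside)
        if len(alleles) == 1 and not (alleles & outside):
            hits[locus] = inside[0]
    return hits


# Hypothetical allele calls and phenotype labels.
example_snps = {
    "locus_1": {"s1": "A", "s2": "A", "s3": "G", "s4": "G"},
    "locus_2": {"s1": "A", "s2": "C", "s3": "G", "s4": "G"},
    "locus_3": {"s1": "T", "s2": "T", "s3": "T", "s4": "C"},
}
example_phenotypes = {
    "s1": "heteroresistant",
    "s2": "heteroresistant",
    "s3": "susceptible",
    "s4": "resistant",
}
example_hits = phenotype_specific_snps(
    example_snps, example_phenotypes, "heteroresistant"
)
```

In the toy data only locus_1 passes: locus_2 is not homogeneous within the heteroresistant samples, and the locus_3 allele also appears in a susceptible sample.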

Results and Discussion

References

Castañeda-García, Alfredo, Jesús Blázquez, and Alexandro Rodríguez-Rojas. "Molecular mechanisms and clinical impact of acquired and intrinsic fosfomycin resistance." Antibiotics 2.2 (2013): 217-236.
Nikolaidis I, Favini-Stabile S, Dessen A. 2014. Resistance to antibiotics targeted to the bacterial cell wall. Protein Sci 23: 243–259.
Kidd, Timothy J et al. “A Klebsiella Pneumoniae Antibiotic Resistance Mechanism That Subdues Host Defences and Promotes Virulence.” EMBO Molecular Medicine 9.4 (2017): 430–447.
Guo, Qinglan et al. “Glutathione-S-Transferase FosA6 of Klebsiella Pneumoniae Origin Conferring Fosfomycin Resistance in ESBL-Producing Escherichia Coli.” Journal of Antimicrobial Chemotherapy 71.9 (2016): 2460–2465.
Gardner, Shea N., Tom Slezak, and Barry G. Hall. "kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome." Bioinformatics 31.17 (2015): 2877-2878.
Kim, Mincheol, et al. "Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes." International journal of systematic and evolutionary microbiology 64.2 (2014): 346-351.