Team I Gene Prediction Group: Difference between revisions
Added Prodigal section and skeleton for some other sections |
Line 23: | Line 23: | ||
=== Ab-initio Methods === | === Ab-initio Methods === | ||
'''Prodigal''' | |||
Pros: | |||
<li> ''Straightforward installation and use''</li> | |||
<li> ''Accuracy'': sensitivity - 94.7%, positive predictive value - 94.1%. Comparable to other tools</li> | |||
<li> ''Speed'': averaged 17 seconds/genome</li> | |||
Cons: | |||
<li> ''Fairly limited number of options''</li> | |||
Parameters used: | |||
<li> ''mode'': normal mode (DEFAULT-single genome, any number of sequences)</li> | |||
<li> ''translation table'': 11 (DEFAULT-standard bacteria, archaea); table 4 is for mycoplasma/spiroplasma</li> | |||
<li> ''gap mode'': 0 (DEFAULT-partial genes run into gaps)</li> | |||
<li> ''output format'': gff (GFF format for the gene coordinates file)</li> | |||
<li> ''rbs motif'': (DEFAULT-looks for default Shine-Dalgarno motif)</li> | |||
Input: assembly file (.fna) | |||
Output: gene coordinates (.gff), nucleotide sequences (.fna), protein sequences (.fna) | |||
Command: | |||
<pre>prodigal -i input.fna -o output.gff -f gff -d nucleotide_seq.fna -a protein_seq.fna</pre> | |||
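The sensitivity and positive predictive value listed under Pros follow the standard definitions: sensitivity is the fraction of true genes that were predicted, and positive predictive value is the fraction of predictions that are true genes. A minimal sketch of the arithmetic is below; the function names and the counts are hypothetical and are not the data behind the figures quoted above.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of reference genes that the tool predicted: TP / (TP + FN)."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no reference genes to compare against")
    return true_pos / total


def positive_predictive_value(true_pos, false_pos):
    """Fraction of predicted genes that match a reference gene: TP / (TP + FP)."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("no predicted genes to evaluate")
    return true_pos / total


# Hypothetical counts, for illustration only.
example_sensitivity = sensitivity(true_pos=90, false_neg=10)
example_ppv = positive_predictive_value(true_pos=90, false_pos=30)
```

How a prediction is counted as a true positive (exact coordinates, matching stop codon only, and so on) depends on the benchmark and is not specified here.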
'''GeneMark''' | |||
Pros: | |||
<li>text</li> | |||
Parameters used: | |||
<li>text</li> | |||
Input: assembly file (.fna) | |||
Output: | |||
Command: | |||
<pre>Genemark</pre> | |||
=== ncRNA Methods === | === ncRNA Methods === | ||
'''Infernal''' | |||
Pros: | |||
<li>text</li> | |||
Parameters used: | |||
<li>text</li> | |||
Input: assembly file (.fna) | |||
Output: | |||
Command: | |||
<pre>infernal</pre> | |||
== ''' Results ''' == | == ''' Results ''' == |
Revision as of 10:22, 24 March 2018
Introduction
Data
We were given assemblies of 258 isolates of Klebsiella spp.
Background
Our overarching goal is to understand what causes heteroresistance in Klebsiella spp. At this step, given the assembled genomes, our objective was to predict Klebsiella genes that could later be annotated to understand their function.
Gene Prediction
Gene prediction is the process of identifying the regions of genomic DNA that encode genes. After sequencing and assembly, gene prediction is one of the first steps in understanding the genome of a species. In the past, confirming that a gene prediction was accurate demanded in vivo experimentation through gene knockout and other assays. Today, bioinformatics research has made it possible to predict the function of a gene based on its sequence alone. There are two general approaches to gene prediction: homology-based tools and ab-initio tools.
There is a big difference between prokaryotic and eukaryotic gene prediction. In eukaryotes, the coding regions of a gene may be separated by introns, which makes it challenging to find the whole gene sequence. Promoter sequences are also more complex and less well understood in eukaryotes. In prokaryotes, the promoter regions are well understood, which is useful for ab-initio tools, since these tools search for specific signals of protein-coding genes. Prokaryotic genes are also contiguous open reading frames (ORFs). Because stop codons are frequent outside of coding regions, a long ORF indicates, with high probability, that a gene is present. Our challenge when looking at ORFs is that every gene is an ORF, but not every ORF is a gene.
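To make the ORF idea concrete, here is a minimal sketch of a naive six-frame ORF scan. It is only an illustration of why "not every ORF is a gene": it reports every start-to-stop stretch above a length cutoff and applies none of the statistical scoring that tools such as Prodigal use. The function names, the start codon set (ATG, GTG, TTG, the common bacterial starts), and the 90-nucleotide cutoff are assumptions chosen for the example.

```python
# Naive ORF scan: an illustration only, not the algorithm used by Prodigal or GeneMark.
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed: common bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_len=90):
    """Return (strand, frame, start, end) for each start-to-stop ORF >= min_len nt.

    Coordinates are 0-based and end-exclusive, relative to the strand that was
    scanned (the reverse complement for strand "-"). The stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first start codon since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


# Toy sequence: one start codon, ten AAA codons, one stop codon (36 nt).
toy = "ATG" + "AAA" * 10 + "TAA"
toy_orfs = find_orfs(toy, min_len=30)
```

Lowering `min_len` on a real genome quickly produces many short ORFs that are not genes, which is the reason gene finders score candidates instead of reporting every ORF.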
Methods
Pipeline (general workflow)
Homology-Based Methods
Ab-initio Methods
Prodigal
Pros:
Cons:
Parameters used:
Input: assembly file (.fna)
Output: gene coordinates (.gff), nucleotide sequences (.fna), protein sequences (.fna)
Command:
prodigal -i input.fna -o output.gff -f gff -d nucleotide_seq.fna -a protein_seq.fna
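Since the data set contains many assemblies, the command above would typically be run once per file. The following is a minimal sketch of how that could be scripted, assuming `prodigal` is on the PATH and the assemblies sit in a single directory as `.fna` files. The function names, directory layout, and output file naming are hypothetical; only the flags shown in the command above are used.

```python
import subprocess
from pathlib import Path


def prodigal_command(assembly, out_dir):
    """Build the Prodigal argument list for one assembly file.

    Output names are derived from the assembly's file stem (a naming choice
    made for this sketch, not taken from the pipeline).
    """
    assembly = Path(assembly)
    out_dir = Path(out_dir)
    stem = assembly.stem
    return [
        "prodigal",
        "-i", str(assembly),                                # input assembly
        "-o", str(out_dir / f"{stem}.gff"),                 # gene coordinates
        "-f", "gff",                                        # output format
        "-d", str(out_dir / f"{stem}_nucleotide_seq.fna"),  # nucleotide sequences
        "-a", str(out_dir / f"{stem}_protein_seq.fna"),     # protein sequences
    ]


def run_all(assembly_dir, out_dir):
    """Run Prodigal on every .fna file in assembly_dir, stopping on the first failure."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for assembly in sorted(Path(assembly_dir).glob("*.fna")):
        subprocess.run(prodigal_command(assembly, out_dir), check=True)


# Building the command does not require Prodigal to be installed.
example_cmd = prodigal_command("assemblies/isolate1.fna", "predictions")
```

Calling `run_all("assemblies", "predictions")` would then process every assembly in turn; the directory names here are placeholders.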
GeneMark
Pros:
Parameters used:
Input: assembly file (.fna)
Output:
Command:
Genemark
ncRNA Methods
Infernal
Pros:
Parameters used:
Input: assembly file (.fna)
Output:
Command:
infernal
Results
References
https://ghr.nlm.nih.gov/primer/basics/gene
https://www.biostat.wisc.edu/bmi776/spring-15/lectures/IMMs.pdf
http://ece.drexel.edu/gailr/ECE-S690-503/markov_models.ppt.pdf
http://onlinelibrary.wiley.com/doi/10.1042/BC20070137/full#footer-citing
https://iweb.langara.bc.ca/biology/mario/Biol2315notes/biol2315chap11.h