Team I Functional Annotation Group: Difference between revisions
Revision as of 16:54, 9 April 2018
Introduction
Background
Functional annotation is the process of locating genes and identifying their functions (biochemical functions, regulatory functions, etc.) in the genome.
Objective
- Fully annotate 258 genomes from the Gene Prediction group, focusing on antibiotic resistance
- Provide the Comparative Genomics group with the data required to perform a genome-wide association study (GWAS)
Pipeline
Tools
Prokka
Command:
prokka --outdir <output_directory> --kingdom <species' kingdom> --genus <species' genus> --gram <gram_stain> --prefix <output_file> --rfam --rnammer <input_file>
- Runtime: ~16 min/genome
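With 258 genomes to annotate, the Prokka command above has to be repeated once per input file. The following is a minimal sketch of how the per-genome command lines could be generated, not the group's actual script. It only builds and prints the commands (nothing is executed), and it assumes that the output prefix is taken from the FASTA file name. The function names, paths, genus, and Gram value are placeholders chosen for illustration.

```python
import shlex
from pathlib import PurePosixPath


def prokka_command(fasta, out_root, kingdom, genus, gram):
    """Build the Prokka argument list for one genome, using the flags listed above."""
    # Assumption: reuse the FASTA basename (without extension) as the prefix.
    prefix = PurePosixPath(fasta).stem
    return [
        "prokka",
        "--outdir", f"{out_root}/{prefix}",
        "--kingdom", kingdom,
        "--genus", genus,
        "--gram", gram,
        "--prefix", prefix,
        "--rfam",
        "--rnammer",
        fasta,
    ]


def batch_commands(fastas, out_root, kingdom, genus, gram):
    """Return one shell-quoted command line per genome, in sorted file order."""
    return [
        shlex.join(prokka_command(f, out_root, kingdom, genus, gram))
        for f in sorted(fastas)
    ]


if __name__ == "__main__":
    # Placeholder inputs; the commands are printed for review, not run.
    example = ["genomes/sample_002.fasta", "genomes/sample_001.fasta"]
    for line in batch_commands(example, "annotation", "Bacteria", "Escherichia", "neg"):
        print(line)
```

Printing the commands first makes it easy to check them before handing the list to a job scheduler or a shell loop.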
Eggnog
eggNOG-mapper performs functional annotation of genes and proteins using orthology assignments from the pre-computed clusters and phylogenies in the eggNOG database.
Command:
python emapper.py -i <input_file> --output <output_file> -m <diamond|hmmer> --usemem -d <database_name>
PilerCR
PilerCR identifies and analyzes CRISPR repeats.
Command:
pilercr -in <input_file> -out <output_file>
- Runtime: <5 sec/genome
Phobius
Phobius predicts transmembrane topology and signal peptides from amino acid sequences. This is a challenging problem because the hydrophobic region of a transmembrane helix closely resembles that of a signal peptide, which leads to cross-reaction between the two types of prediction.
Phobius is based on a hidden Markov model (HMM) that models the different sequence regions of a signal peptide and the different regions of a transmembrane protein in a series of interconnected states. Modeling both in a single HMM reduces misclassification between the two (Käll et al., 2004).
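To illustrate the idea of competing, interconnected states, here is a toy three-state HMM decoded with the Viterbi algorithm. It is only a sketch under strong assumptions and is not Phobius's architecture or parameters. The alphabet is reduced to 'h' (hydrophobic) and 'p' (polar), all probabilities are invented for illustration, and the toy model only allows the "signal" state at the start of the sequence.

```python
import math

STATES = ("signal", "loop", "helix")

# Toy parameters chosen for illustration only; they are not Phobius's values.
START = {"signal": 0.5, "loop": 0.5, "helix": 0.0}
TRANS = {
    "signal": {"signal": 0.8, "loop": 0.2, "helix": 0.0},
    "loop":   {"signal": 0.0, "loop": 0.8, "helix": 0.2},
    "helix":  {"signal": 0.0, "loop": 0.2, "helix": 0.8},
}
# "signal" and "helix" emit the same symbols with the same probabilities,
# mimicking the similarity of their hydrophobic regions.
EMIT = {
    "signal": {"h": 0.9, "p": 0.1},
    "loop":   {"h": 0.1, "p": 0.9},
    "helix":  {"h": 0.9, "p": 0.1},
}


def _log(p):
    """Logarithm that maps a zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return the most probable state path for a string of 'h'/'p' symbols."""
    if not seq:
        return []
    score = {s: _log(START[s]) + _log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for symbol in seq[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            # Best predecessor of state s at this position.
            prev = max(STATES, key=lambda r: score[r] + _log(TRANS[r][s]))
            new_score[s] = score[prev] + _log(TRANS[prev][s]) + _log(EMIT[s][symbol])
            pointer[s] = prev
        backpointers.append(pointer)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]
```

In this toy model, `viterbi("hhhhpppphhhhpp")` labels the first hydrophobic stretch as "signal" and the second as "helix", while `viterbi("pphhhhpp")` labels the same kind of stretch as "helix" because it does not start the sequence. The emissions alone cannot separate the two states; the decision comes from the state connections.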
Command:
phobius.pl -<output_format> <input_file> > <output_file>
- Runtime: 12-16 min/genome
LipoP
LipoP predicts lipoprotein signal peptides in Gram-negative bacteria. Its hidden Markov model (HMM) distinguishes between lipoproteins (SPaseII-cleaved proteins), SPaseI-cleaved proteins, cytoplasmic proteins, and transmembrane proteins.
Command:
LipoP -<output_format> <input_file> > <output_file>
- Runtime: ~2 min/genome
TMHMM
Command:
tmhmm -<output_format> <input_file> > <output_file>
- Runtime: ~6 min/genome
SignalP
Command:
signalp -t <organism_type> -f <output_format> <input_file>
- Runtime: ~4 min/genome
DeepARG
DeepARG is a deep learning tool that characterizes and annotates antibiotic resistance genes in metagenomes. It contains two models for different inputs: short sequence reads from next-generation sequencing, and gene-like sequences.
Command:
- Runtime: 3 min 27 s/genome
Interproscan
InterProScan scans sequences against the InterPro database, which integrates predictive models, known as signatures, provided by its member databases.
Command:
interproscan.sh -appl <application_you_want> -iprlookup -pa -i <input_file> -f <output_format>
- -iprlookup: include the corresponding InterPro annotation in the TSV and GFF3 output
- -pa: include the corresponding pathway annotation
- Runtime: ~1 min/genome, depending on the applications selected
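The per-genome runtimes listed for the tools above can be combined into a rough estimate of the total compute time for all 258 genomes. The sketch below assumes the tools are run one after another on a single machine, treats "<5 sec" as a range from 0 to 5 seconds, and leaves out Eggnog because no runtime is reported for it. The variable and function names are illustrative.

```python
GENOMES = 258  # number of genomes stated in the Objective section

# (low, high) runtime per genome in minutes, transcribed from the Tools section.
# Eggnog is omitted because no runtime is reported for it.
RUNTIME_MIN = {
    "Prokka": (16.0, 16.0),
    "PilerCR": (0.0, 5 / 60),        # "<5 sec" taken as 0 to 5 seconds
    "Phobius": (12.0, 16.0),
    "LipoP": (2.0, 2.0),
    "TMHMM": (6.0, 6.0),
    "SignalP": (4.0, 4.0),
    "DeepARG": (3 + 27 / 60, 3 + 27 / 60),
    "InterProScan": (1.0, 1.0),      # depends on the applications selected
}


def serial_hours(runtimes, genomes):
    """Return (low, high) total hours if every tool runs serially on every genome."""
    low = sum(lo for lo, _ in runtimes.values()) * genomes / 60
    high = sum(hi for _, hi in runtimes.values()) * genomes / 60
    return low, high


if __name__ == "__main__":
    low, high = serial_hours(RUNTIME_MIN, GENOMES)
    print(f"Estimated serial runtime: {low:.0f} to {high:.0f} hours")
```

Under these assumptions the total comes to roughly 191 to 209 hours, or about eight to nine days of serial computation, which is a strong argument for running genomes in parallel.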
Result
Reference
- Lukas Käll, Anders Krogh, Erik L. L. Sonnhammer. "A Combined Transmembrane Topology and Signal Peptide Prediction Method." Journal of Molecular Biology, 14 May 2004, Pages 1027-1036.