Team II Gene Prediction Group: Difference between revisions
Revision as of 12:44, 26 March 2018
Introduction
Gene Prediction
Data
Approaches
1. Ab-initio
2. Comparative
Ab-initio Approaches
Comparative Approaches
Comparative (similarity-based or homology-based) gene prediction uses previously sequenced genes and their protein products as templates for recognizing unknown genes in newly sequenced DNA fragments. In short, it uses "Known Genes" to predict "New Genes".
The number of sequenced genomes has increased drastically in recent years. About 99% of genes have a homologous partner and 80% have an orthologous partner, and sequence identity is higher in protein-coding DNA (85%) than in intronic DNA (69%). Together, these observations motivate the comparative approach to gene prediction.
Figure 2: Given a known gene and an unannotated genome sequence, find a set of substrings in the genomic sequence whose concatenation best matches the known gene
Sequence alignment is a way of arranging sequences to identify regions of similarity that may result from functional, structural, or evolutionary relationships between the genomes. Two methods based on similarity search are local alignment and global alignment.
Local alignment tries to match the query with a substring of the reference; the Smith–Waterman algorithm computes local alignments. Global alignment, in contrast, forces the alignment to span the entire length of all query sequences. It is most useful when the sequences are similar and of roughly equal size; otherwise, it may end up with many gaps. The Needleman–Wunsch algorithm, which is based on dynamic programming, computes global alignments.
Figure 3: (Left) Local alignment, 2 mismatches, 0 gaps. (Right) Global alignment, 1 mismatch, 2 gaps of lengths 4 and 2
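The difference between the two alignment modes can be illustrated with a small dynamic-programming sketch. This is not the group's pipeline code. The function name `alignment_score`, the scoring values (match +1, mismatch -1) and the linear gap penalty (-1 per gap position) are assumptions chosen for illustration, and real tools usually use substitution matrices and affine gap penalties instead.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Return the best alignment score of sequences a and b.

    local=False follows the Needleman-Wunsch recurrence (global alignment);
    local=True follows the Smith-Waterman recurrence (local alignment).
    Scoring values are illustrative assumptions with a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: gaps at the start of either sequence are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0  # best cell seen so far (only used for local alignment)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                # Local alignment: a negative prefix is dropped by restarting at 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both sequences end to end.
    # Local: score of the best-matching pair of substrings.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    reference, query = "TTTACGTTT", "ACG"
    print("local :", alignment_score(reference, query, local=True))
    print("global:", alignment_score(reference, query))
```

With sequences of very different lengths, as in the usage example, the global score is dragged down by the gaps needed to span the whole reference, while the local score only reflects the matching substring.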
Several comparative gene prediction tools exist, such as "SGP2", "TwinScan" and "GenomeScan", but they were developed for only a limited number of species. For example, TwinScan is currently available for mammals, Caenorhabditis (worm), dicot plants, and Cryptococci. Therefore, we cannot use them for our dataset.
BLAST: Basic Local Alignment Search Tool