Notice
3.6. Boyer-Moore algorithm
- document 1 document 2 document 3
- niveau 1 niveau 2 niveau 3
Descriptif
We have seen how we can make gene predictions more reliable through searching for all the patterns,all the occurrences of patterns. We have seen, for example, howif we locate the RBS, Ribosome
Binding Site, upstream gene we can make the prediction of the coding sequence more reliable. So it is clear that pattern searching isa central topic in sequence analysis. So let's have a look at searching algorithms for strings or patterns and their performance. First,what we call the naive algorithm. What does it mean? The naive algorithm consists in comparing every letter of the pattern toevery letter of the text, so if N is the length of the stringor the pattern to be searched and M is the length of the searchable text then in the worst case, the number of comparisons to be executed is in the order of M multiplied by M minus N. Let's see why. Let's see this example of a pattern of six characters to be searched in a textof 32 characters. Well you understand the principle,you first compare the first character, if it matches youlook for the second one, if it doesn't match so you just advance of one position and do the comparison again. Here it doesn't match so it's useless to have a look at the other position of the pattern tobe searched and we can go ahead, here this is correct but this is not, so again and again and again.
Thème
Documentation
Dans la même collection
-
3.7. Index and suffix trees
RECHENMANN François
We have seen with the Boyer-Moore algorithm how we can increase the efficiency of spin searching through the pre-processing of the pattern to be searched. Now we will see that an alternative way of
-
3.1. All genes end on a stop codon
RECHENMANN François
Last week we studied genes and proteins and so how genes, portions of DNA, are translated into proteins. We also saw the very fast evolutionof the sequencing technology which allows for producing
-
3.10. Gene prediction in eukaryotic genomes
RECHENMANN François
If it is possible to have verygood predictions for bacterial genes, it's certainly not the caseyet for eukaryotic genomes. Eukaryotic cells have manydifferences in comparison to prokaryotic cells. You
-
3.4. Predicting all the genes in a sequence
RECHENMANN François
We have written an algorithm whichis able to locate potential genes on a sequence but only on one phase because we are looking triplets after triplets. Now remember that the genes maybe located on
-
3.8. Probabilistic methods
RECHENMANN François
Up to now, to predict our gene,we only rely on the process of searching certain strings or patterns. In order to further improve our gene predictor, the idea is to use, to rely onprobabilistic methods
-
3.2. A simple algorithm for gene prediction
RECHENMANN François
Based on the principle we statedin the last session, we will now write in pseudo code a firstalgorithm for locating genes on a bacterial genome. Remember first how this algorithm should work, we first
-
3.5. Making the predictions more reliable
RECHENMANN François
We have got a bacterial gene predictor but the way this predictor works is rather crude and if we want to have more reliable results, we have to inject into this algorithmmore biological knowledge. We
-
3.9. Benchmarking the prediction methods
RECHENMANN François
It is necessary to underline that gene predictors produce predictions. Predictions mean that you have no guarantees that the coding sequences, the coding regions,the genes you get when applying your
-
3.3. Searching for start and stop codons
RECHENMANN François
We have written an algorithm for finding genes. But you remember that we arestill to write the two functions for finding the next stop codonand the next start codon. Let's see how we can do that. We
Avec les mêmes intervenants et intervenantes
-
1.6. GC and AT contents of DNA sequence
RECHENMANN François
We have designed our first algorithmfor counting nucleotides. Remember, what we have writtenin pseudo code is first declaration of variables. We have several integer variables that are variables which
-
2.5. Implementing the genetic code
RECHENMANN François
Remember we were designing our translation algorithm and since we are a bit lazy, we decided to make the hypothesis that there was the adequate function forimplementing the genetic code. It's now time
-
3.2. A simple algorithm for gene prediction
RECHENMANN François
Based on the principle we statedin the last session, we will now write in pseudo code a firstalgorithm for locating genes on a bacterial genome. Remember first how this algorithm should work, we first
-
4.1. How to predict gene/protein functions?
RECHENMANN François
Last week we have seen that annotating a genome means first locating the genes on the DNA sequences that is the genes, the region coding for proteins. But this is indeed the first step,the next very
-
4.10. How efficient is this algorithm?
RECHENMANN François
We have seen the principle of an iterative algorithm in two paths for aligning and comparing two sequences of characters, here DNA sequences. And we understoodwhy the iterative version is much more
-
5.7. The application domains in microbiology
RECHENMANN François
Bioinformatics relies on many domains of mathematics and computer science. Of course, algorithms themselves on character strings are important in bioinformatics, we have seen them. Algorithms and
-
1.1. The cell, atom of the living world
RECHENMANN François
Welcome to this introduction to bioinformatics. We will speak of genomes and algorithms. More specifically, we will see how genetic information can be analysed by algorithms. In these five weeks to
-
1.9. Predicting the origin of DNA replication?
RECHENMANN François
We have seen a nice algorithm to draw, let's say, a DNA sequence. We will see that first, we have to correct a little bit this algorithm. And then we will see how such as imple algorithm can provide
-
2.8. DNA sequencing
RECHENMANN François
During the last session, I explained several times how it was important to increase the efficiency of sequences processing algorithm because sequences arevery long and there are large volumes of
-
3.5. Making the predictions more reliable
RECHENMANN François
We have got a bacterial gene predictor but the way this predictor works is rather crude and if we want to have more reliable results, we have to inject into this algorithmmore biological knowledge. We
-
4.5. A sequence alignment as a path
RECHENMANN François
Comparing two sequences and thenmeasuring their similarities is an optimization problem. Why? Because we have seen thatwe have to take into account substitution and deletion. During the alignment, the
-
5.5. Differences are not always what they look like
RECHENMANN François
The algorithm we have presented works on an array of distance between sequences. These distances are evaluated on the basis of differences between the sequences. The problem is that behind the