Vidéo pédagogique
Notice
Lieu de réalisation
Grenoble
Sous-titrage
Sous-titre
Langue :
Anglais
Crédits
François Rechenmann (Intervention)
Conditions d'utilisation
Ces ressources de cours sont, sauf mention contraire, diffusées sous Licence Creative Commons. L’utilisateur doit mentionner le nom de l’auteur, il peut exploiter l’œuvre sauf dans un contexte commercial et il ne peut apporter de modifications à l’œuvre originale.
DOI : 10.60527/c0x5-4639
Citer cette ressource :
François Rechenmann. Inria. (2015, 5 février). 3.6. Boyer-Moore algorithm , in 3. Gene prediction. [Vidéo]. Canal-U. https://doi.org/10.60527/c0x5-4639. (Consultée le 6 août 2024)

# 3.6. Boyer-Moore algorithm

Réalisation : 5 février 2015 - Mise en ligne : 9 mai 2017
• document 1 document 2 document 3
• niveau 1 niveau 2 niveau 3
Descriptif

We have seen how we can make gene predictions more reliable through searching for all the patterns,all the occurrences of patterns. We have seen, for example, howif we locate the RBS, Ribosome Binding Site, upstream gene we can make the prediction of the coding sequence more reliable. So it is clear that pattern searching isa central topic in sequence analysis. So let's have a look at searching algorithms for strings or patterns and their performance. First,what we call the naive algorithm. What does it mean? The naive algorithm consists in comparing every letter of the pattern toevery letter of the text, so if N is the length of the stringor the pattern to be searched and M is the length of the searchable text then in the worst case, the number of comparisons to be executed is in the order of M multiplied by M minus N. Let's see why. Let's see this example of a pattern of six characters to be searched in a textof 32 characters. Well you understand the principle,you first compare the first character, if it matches youlook for the second one, if it doesn't match so you just advance of one position and do the comparison again. Here it doesn't match so it's useless to have a look at the other position of the pattern tobe searched and we can go ahead, here this is correct but this is not, so again and again and again.

Intervention
Thème
Documentation

## Dans la même collection

• Vidéo pédagogique
00:07:06

### 3.7. Index and suffix trees

Rechenmann
François

We have seen with the Boyer-Moore algorithm how we can increase the efficiency of spin searching through the pre-processing of the pattern to be searched. Now we will see that an alternative way of

• Vidéo pédagogique
00:05:41

### 3.1. All genes end on a stop codon

Rechenmann
François

Last week we studied genes and proteins and so how genes, portions of DNA, are translated into proteins. We also saw the very fast evolutionof the sequencing technology which allows for producing

• Vidéo pédagogique
00:08:56

### 3.10. Gene prediction in eukaryotic genomes

Rechenmann
François

If it is possible to have verygood predictions for bacterial genes, it's certainly not the caseyet for eukaryotic genomes. Eukaryotic cells have manydifferences in comparison to prokaryotic cells. You

• Vidéo pédagogique
00:06:22

### 3.4. Predicting all the genes in a sequence

Rechenmann
François

We have written an algorithm whichis able to locate potential genes on a sequence but only on one phase because we are looking triplets after triplets. Now remember that the genes maybe located on

• Vidéo pédagogique
00:06:09

### 3.8. Probabilistic methods

Rechenmann
François

Up to now, to predict our gene,we only rely on the process of searching certain strings or patterns. In order to further improve our gene predictor, the idea is to use, to rely onprobabilistic methods

• Vidéo pédagogique
00:05:13

### 3.2. A simple algorithm for gene prediction

Rechenmann
François

Based on the principle we statedin the last session, we will now write in pseudo code a firstalgorithm for locating genes on a bacterial genome. Remember first how this algorithm should work, we first

• Vidéo pédagogique
00:04:45

### 3.5. Making the predictions more reliable

Rechenmann
François

We have got a bacterial gene predictor but the way this predictor works is rather crude and if we want to have more reliable results, we have to inject into this algorithmmore biological knowledge. We

• Vidéo pédagogique
00:05:35

### 3.9. Benchmarking the prediction methods

Rechenmann
François

It is necessary to underline that gene predictors produce predictions. Predictions mean that you have no guarantees that the coding sequences, the coding regions,the genes you get when applying your

• Vidéo pédagogique
00:04:45

### 3.3. Searching for start and stop codons

Rechenmann
François

We have written an algorithm for finding genes. But you remember that we arestill to write the two functions for finding the next stop codonand the next start codon. Let's see how we can do that. We

## Avec les mêmes intervenants et intervenantes

• Vidéo pédagogique
00:04:52

### 1.2. At the heart of the cell: the DNA macromolecule

Rechenmann
François

During the last session, we saw how at the heart of the cell there's DNA in the nucleus, sometimes of cells, or directly in the cytoplasm of the bacteria. The DNA is what we call a macromolecule, that

• Vidéo pédagogique
00:07:29

### 1.10. Overlapping sliding window

Rechenmann
François

We have made some drawings along a genomic sequence. And we have seen that although the algorithm is quite simple, even if some points of the algorithmare bit trickier than the others, we were able to

• Vidéo pédagogique
00:05:53

### 2.3. The genetic code

Rechenmann
François

Genes code for proteins. What is the correspondence betweenthe genes, DNA sequences, and the structure of proteins? The correspondence isthe genetic code. Proteins have indeedsequences of amino acids.

• Vidéo pédagogique
00:07:06

### 3.7. Index and suffix trees

Rechenmann
François

We have seen with the Boyer-Moore algorithm how we can increase the efficiency of spin searching through the pre-processing of the pattern to be searched. Now we will see that an alternative way of

• Vidéo pédagogique
00:04:22

### 4.4. Aligning sequences is an optimization problem

Rechenmann
François

We have seen a nice and a quitesimple solution for measuring the similarity between two sequences. It relied on the so-called hammingdistance that is counting the number of differencesbetween two

• Vidéo pédagogique
00:04:45

### 5.2. The tree, an abstract object

Rechenmann
François

When we speak of trees, of species,of phylogenetic trees, of course, it's a metaphoric view of a real tree. Our trees are abstract objects. Here is a tree and the different components of this tree.

• Vidéo pédagogique
00:05:10

### 1.5. Counting nucleotides

Rechenmann
François

In this session, don't panic. We will design our first algorithm. This algorithm is forcounting nucleotides. The idea here is that as an input,you have a sequence of nucleotides, of bases, of letters,

• Vidéo pédagogique
00:06:09

### 2.4. A translation algorithm

Rechenmann
François

We have seen that the genetic codeis a correspondence between the DNA or RNA sequences and aminoacid sequences that is proteins. Our aim here is to design atranslation algorithm, we make the

• Vidéo pédagogique
00:05:41

### 3.1. All genes end on a stop codon

Rechenmann
François

Last week we studied genes and proteins and so how genes, portions of DNA, are translated into proteins. We also saw the very fast evolutionof the sequencing technology which allows for producing

• Vidéo pédagogique
00:08:56

### 3.10. Gene prediction in eukaryotic genomes

Rechenmann
François

If it is possible to have verygood predictions for bacterial genes, it's certainly not the caseyet for eukaryotic genomes. Eukaryotic cells have manydifferences in comparison to prokaryotic cells. You

• Vidéo pédagogique
00:07:41

### 4.8. A recursive algorithm

Rechenmann
François

We have seen how we can computethe optimal cost, the ending node of our grid if we know the optimal cost of the three adjacent nodes. This is this computation scheme we can see here using the notation

• Vidéo pédagogique
00:08:29

### 5.6. The diversity of bioinformatics algorithms

Rechenmann
François

In this course, we have seen a very little set of bioinformatic algorithms. There exist numerous various algorithms in bioinformatics which deal with a large span of classes of problems. For example,