Vidéo pédagogique

Notice

Lieu de réalisation

Grenoble

Sous-titrage

Sous-titre

Langue :

Anglais

Crédits

François Rechenmann (Intervention)

Conditions d'utilisation

Ces ressources de cours sont, sauf mention contraire, diffusées sous Licence Creative Commons. L’utilisateur doit mentionner le nom de l’auteur, il peut exploiter l’œuvre sauf dans un contexte commercial et il ne peut apporter de modifications à l’œuvre originale.

DOI : 10.60527/c0x5-4639

Citer cette ressource :

François Rechenmann. Inria. (2015, 5 février). 3.6. Boyer-Moore algorithm , in 3. Gene prediction. [Vidéo]. Canal-U. https://doi.org/10.60527/c0x5-4639. (Consultée le 6 juillet 2025)

3.6. Boyer-Moore algorithm

Réalisation : 5 février 2015 - Mise en ligne : 9 mai 2017

document 1 document 2 document 3
niveau 1 niveau 2 niveau 3

Descriptif

We have seen how we can make gene predictions more reliable through searching for all the patterns,all the occurrences of patterns. We have seen, for example, howif we locate the RBS, Ribosome
Binding Site, upstream gene we can make the prediction of the coding sequence more reliable. So it is clear that pattern searching isa central topic in sequence analysis. So let's have a look at searching algorithms for strings or patterns and their performance. First,what we call the naive algorithm. What does it mean? The naive algorithm consists in comparing every letter of the pattern toevery letter of the text, so if N is the length of the stringor the pattern to be searched and M is the length of the searchable text then in the worst case, the number of comparisons to be executed is in the order of M multiplied by M minus N. Let's see why. Let's see this example of a pattern of six characters to be searched in a textof 32 characters. Well you understand the principle,you first compare the first character, if it matches youlook for the second one, if it doesn't match so you just advance of one position and do the comparison again. Here it doesn't match so it's useless to have a look at the other position of the pattern tobe searched and we can go ahead, here this is correct but this is not, so again and again and again.

Intervention

Rechenmann

François

Ingénieur. Auteur d'une thèse de docteur-ingénieur en sciences appliquées (Grenoble INPG, 1976). - HDR. Directeur de thèse à Grenoble INPG (1990-1994-) et à l'université de Grenoble 1. Directeur de recherche au centre Inria Grenoble – Rhône-Alpes (2002, 2015)

Thème

Disciplines :

Documentation

Liens

Support de présentation au format PDF

Dans la même collection

Vidéo pédagogique

00:07:06

Favoris
3.7. Index and suffix trees

Rechenmann

François

We have seen with the Boyer-Moore algorithm how we can increase the efficiency of spin searching through the pre-processing of the pattern to be searched. Now we will see that an alternative way of
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:05:41

Favoris
3.1. All genes end on a stop codon

Rechenmann

François

Last week we studied genes and proteins and so how genes, portions of DNA, are translated into proteins. We also saw the very fast evolutionof the sequencing technology which allows for producing
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:08:56

Favoris
3.10. Gene prediction in eukaryotic genomes

Rechenmann

François

If it is possible to have verygood predictions for bacterial genes, it's certainly not the caseyet for eukaryotic genomes. Eukaryotic cells have manydifferences in comparison to prokaryotic cells. You
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:06:22

Favoris
3.4. Predicting all the genes in a sequence

Rechenmann

François

We have written an algorithm whichis able to locate potential genes on a sequence but only on one phase because we are looking triplets after triplets. Now remember that the genes maybe located on
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:06:09

Favoris
3.8. Probabilistic methods

Rechenmann

François

Up to now, to predict our gene,we only rely on the process of searching certain strings or patterns. In order to further improve our gene predictor, the idea is to use, to rely onprobabilistic methods
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:05:13

Favoris
3.2. A simple algorithm for gene prediction

Rechenmann

François

Based on the principle we statedin the last session, we will now write in pseudo code a firstalgorithm for locating genes on a bacterial genome. Remember first how this algorithm should work, we first
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:04:45

Favoris
3.5. Making the predictions more reliable

Rechenmann

François

We have got a bacterial gene predictor but the way this predictor works is rather crude and if we want to have more reliable results, we have to inject into this algorithmmore biological knowledge. We
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:05:35

Favoris
3.9. Benchmarking the prediction methods

Rechenmann

François

It is necessary to underline that gene predictors produce predictions. Predictions mean that you have no guarantees that the coding sequences, the coding regions,the genes you get when applying your
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:04:45

Favoris
3.3. Searching for start and stop codons

Rechenmann

François

We have written an algorithm for finding genes. But you remember that we arestill to write the two functions for finding the next stop codonand the next start codon. Let's see how we can do that. We
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3

Voir tout

Avec les mêmes intervenants et intervenantes

Vidéo pédagogique

00:05:48

Favoris
1.4. What is an algorithm?

Rechenmann

François

We have seen that a genomic textcan be indeed a very long sequence of characters. And to interpret this sequence of characters, we will need to use computers. Using computers means writing program.
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:04:58

Favoris
2.2. Genes: from Mendel to molecular biology

Rechenmann

François

The notion of gene emerged withthe works of Gregor Mendel. Mendel studied the inheritance on some traits like the shape of pea plant seeds,through generations. He stated the famous laws of inheritance
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:05:37

Favoris
2.10. How to find genes?

Rechenmann

François

Getting the sequence of the genome is only the beginning, as I explained, once you have the sequence what you want to do is to locate the gene, to predict the function of the gene and maybe study the
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:05:35

Favoris
3.9. Benchmarking the prediction methods

Rechenmann

François

It is necessary to underline that gene predictors produce predictions. Predictions mean that you have no guarantees that the coding sequences, the coding regions,the genes you get when applying your
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:04:29

Favoris
4.2. Why gene/protein sequences may be similar?

Rechenmann

François

Before measuring the similaritybetween the sequences, it's interesting to answer the question: why gene or protein sequences may be similar? It is indeed veryinteresting because the answer is related
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:04:59

Favoris
5.4. The UPGMA algorithm

Rechenmann

François

We know how to fill an array with the values of the distances between sequences, pairs of sequences which are available in the file. This array of distances will be the input of our algorithm for
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:06:06

Favoris
1.7. DNA walk

Rechenmann

François

We will now design a more graphical algorithm which is called "the DNA walk". We shall see what does it mean "DNA walk". Walk on to DNA. Something like that, yes. But first, just have a look again at
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:05:47

Favoris
2.6. Algorithms + data structures = programs

Rechenmann

François

By writing the Lookup GeneticCode Function, we completed our translation algorithm. So we may ask the question about the algorithm, does it terminate? Andthe answer is yes, obviously. Is it pertinent,
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:04:45

Favoris
3.3. Searching for start and stop codons

Rechenmann

François

We have written an algorithm for finding genes. But you remember that we arestill to write the two functions for finding the next stop codonand the next start codon. Let's see how we can do that. We
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:06:38

Favoris
4.7. Alignment costs

Rechenmann

François

We have seen how we can compute the cost of the path ending on the last node of our grid if we know the cost of the sub-path ending on the three adjacent nodes. It is time now to see more deeply why
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:06:58

Favoris
4.9. Recursion can be avoided: an iterative version

Rechenmann

François

We have written a recursive function to compute the optimal path that is an optimal alignment between two sequences. Here all the examples I gave were onDNA sequences, four letter alphabet. OK. The
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3
Vidéo pédagogique

00:04:52

Favoris
1.2. At the heart of the cell: the DNA macromolecule

Rechenmann

François

During the last session, we saw how at the heart of the cell there's DNA in the nucleus, sometimes of cells, or directly in the cytoplasm of the bacteria. The DNA is what we call a macromolecule, that
Biologie application informatique
DNA
Genome
Algorithm
Cell
09.05.2017
document 1 document 2 document 3
niveau 1 niveau 2 niveau 3