Educational video
Record
Place of production
Grenoble
Subtitles
Subtitle
Language:
English
Credits
François Rechenmann (Speaker)
Terms of use
Unless otherwise stated, these course resources are distributed under a Creative Commons license. Users must credit the author; they may use the work except in a commercial context, and they may not modify the original work.
DOI: 10.60527/tjnv-nb73
Cite this resource:
François Rechenmann. Inria. (2015, February 5). 4.3. Measuring sequence similarity, in 4. Sequences comparison. [Video]. Canal-U. https://doi.org/10.60527/tjnv-nb73. (Accessed July 24, 2024)

# 4.3. Measuring sequence similarity

Recorded: February 5, 2015 - Published online: May 9, 2017
Description

So we understand why gene or protein sequences may be similar: it's because they evolve together with the species, over time. Modifications accumulate in the sequences, yet the sequences may remain similar, similar enough for us to retrieve information on one sequence and transfer it to another sequence of interest. So the question, for now, is: how can we measure this similarity between two sequences?

The first approach to similarity is a very simple one: apply a distance called the Hamming distance. The idea is very basic. You take two sequences, like these two sequences here, you look at the differences and you count them. Here, for example, there are two differences, so we say that the distance between the two sequences is two. Here we have another pair of sequences which are less similar, because there are three differences. That's quite nice: it's the Hamming distance.

But is it really a distance? A distance is a mathematical concept, and to be a distance a measure must satisfy three conditions: the distance between a sequence and itself must be zero; the distance from a first sequence to a second one must be the same as the distance from the second one to the first one; and we must have the triangle inequality, d(x, z) ≤ d(x, y) + d(y, z), which must always hold.
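A minimal Python sketch of the Hamming distance described above, together with a brute-force check of the three distance conditions on a small set of sequences. The function names `hamming_distance` and `check_metric_axioms` and the example sequences are hypothetical (they are not the sequences shown on the lecture slides), and the sketch assumes the sequences have equal length, since the Hamming distance only compares positions one by one.

```python
def hamming_distance(s: str, t: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        # Assumption: the Hamming distance is only defined for equal lengths.
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(1 for a, b in zip(s, t) if a != b)


def check_metric_axioms(seqs: list[str]) -> bool:
    """Brute-force check of the three distance conditions on all pairs/triples."""
    for x in seqs:
        # 1. The distance between a sequence and itself is zero
        #    (a strict metric also requires d(x, y) == 0 only when x == y).
        if hamming_distance(x, x) != 0:
            return False
        for y in seqs:
            # 2. Symmetry: d(x, y) == d(y, x).
            if hamming_distance(x, y) != hamming_distance(y, x):
                return False
            for z in seqs:
                # 3. Triangle inequality: d(x, z) <= d(x, y) + d(y, z).
                if hamming_distance(x, z) > hamming_distance(x, y) + hamming_distance(y, z):
                    return False
    return True


if __name__ == "__main__":
    # Hypothetical example sequences, chosen to give two and three differences.
    a = "ACGTACGT"
    b = "ACGAACGA"
    c = "TCGAACGA"
    print(hamming_distance(a, b))  # → 2
    print(hamming_distance(a, c))  # → 3
    print(check_metric_axioms([a, b, c]))  # → True
```

Raising an error on unequal lengths is a deliberate choice here: the Hamming distance has no notion of insertions or deletions, so it cannot compare sequences of different lengths.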


## In the same collection

• Educational video
00:03:50

### 4.5. A sequence alignment as a path

François Rechenmann

Comparing two sequences and then measuring their similarity is an optimization problem. Why? Because we have seen that we have to take into account substitutions and deletions. During the alignment, the…

• Educational video
00:07:41

### 4.8. A recursive algorithm

François Rechenmann

We have seen how we can compute the optimal cost of the ending node of our grid if we know the optimal costs of the three adjacent nodes. This is the computation scheme we can see here, using the notation…

• Educational video
00:04:29

### 4.2. Why gene/protein sequences may be similar?

François Rechenmann

Before measuring the similarity between the sequences, it's interesting to answer the question: why may gene or protein sequences be similar? It is indeed very interesting because the answer is related…

• Educational video
00:04:11

### 4.6. A path is optimal if all its sub-paths are optimal

François Rechenmann

A sequence alignment between two sequences is a path in a grid. So an optimal sequence alignment is an optimal path in the same grid. We will now see that a property of this optimal path provides…

• Educational video
00:06:58

### 4.9. Recursion can be avoided: an iterative version

François Rechenmann

We have written a recursive function to compute the optimal path, that is, an optimal alignment between two sequences. All the examples I gave here were on DNA sequences, with a four-letter alphabet. OK. The…

• Educational video
00:04:22

### 4.4. Aligning sequences is an optimization problem

François Rechenmann

We have seen a nice and quite simple solution for measuring the similarity between two sequences. It relied on the so-called Hamming distance, that is, counting the number of differences between two…

• Educational video
00:06:38

### 4.7. Alignment costs

François Rechenmann

We have seen how we can compute the cost of the path ending on the last node of our grid if we know the costs of the sub-paths ending on the three adjacent nodes. It is now time to see in more depth why…

• Educational video
00:04:54

### 4.1. How to predict gene/protein functions?

François Rechenmann

Last week we saw that annotating a genome first means locating the genes on the DNA sequences, that is, the regions coding for proteins. But this is indeed the first step; the next very…

• Educational video
00:09:26

### 4.10. How efficient is this algorithm?

François Rechenmann

We have seen the principle of an iterative algorithm in two paths for aligning and comparing two sequences of characters, here DNA sequences. And we understood why the iterative version is much more…

## With the same speakers

• Educational video
00:05:24

### 1.1. The cell, atom of the living world

François Rechenmann

Welcome to this introduction to bioinformatics. We will speak about genomes and algorithms. More specifically, we will see how genetic information can be analysed by algorithms. In these five weeks to…

• Educational video
00:09:07

### 1.9. Predicting the origin of DNA replication?

François Rechenmann

We have seen a nice algorithm to draw, let's say, a DNA sequence. We will see that we first have to correct this algorithm a little. Then we will see how such a simple algorithm can provide…

• Educational video
00:08:21

### 2.8. DNA sequencing

François Rechenmann

During the last session, I explained several times how important it was to increase the efficiency of sequence-processing algorithms, because sequences are very long and there are large volumes of…

• Educational video
00:04:45

### 3.5. Making the predictions more reliable

François Rechenmann

We have got a bacterial gene predictor, but the way this predictor works is rather crude, and if we want more reliable results, we have to inject more biological knowledge into this algorithm. We…

• Educational video
00:07:39

### 5.5. Differences are not always what they look like

François Rechenmann

The algorithm we have presented works on an array of distances between sequences. These distances are evaluated on the basis of differences between the sequences. The problem is that behind the…

• Educational video
00:05:48

### 1.4. What is an algorithm?

François Rechenmann

We have seen that a genomic text can indeed be a very long sequence of characters. To interpret this sequence of characters, we will need to use computers. Using computers means writing programs.

• Educational video
00:04:58

### 2.2. Genes: from Mendel to molecular biology

François Rechenmann

The notion of gene emerged with the work of Gregor Mendel. Mendel studied the inheritance of some traits, like the shape of pea plant seeds, through generations. He stated the famous laws of inheritance…

• Educational video
00:05:37

### 2.10. How to find genes?

François Rechenmann

Getting the sequence of the genome is only the beginning. As I explained, once you have the sequence, what you want to do is locate the genes, predict the function of the genes and maybe study the…

• Educational video
00:06:09

### 3.8. Probabilistic methods

François Rechenmann

Up to now, to predict our genes, we have only relied on searching for certain strings or patterns. In order to further improve our gene predictor, the idea is to rely on probabilistic methods…

• Educational video
00:04:59

### 5.4. The UPGMA algorithm

François Rechenmann

We know how to fill an array with the values of the distances between pairs of sequences which are available in the file. This array of distances will be the input of our algorithm for…