
# Search results

**537**


## 4.4. Aligning sequences is an optimization problem

We have seen a nice and quite simple solution for measuring the similarity between two sequences. It relied on the so-called Hamming distance, which counts the number of differences between two sequences. But the real situation is a bit more complex, as we'll see now, and it needs an adequate solution and algorithm. Why is it a bit more complex? Let's have a look at this pair of sequences. If we compute the Hamming distance between these two sequences, we find ten differences. OK. But you must remember that a mutation may be a substitution, a deletion or an insertion. So if we take deletions and insertions into account, the situation is very different in the case of these two sequences. ...
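To make the point about insertions and deletions concrete, here is a small sketch in Python. The two sequences and the gapped alignment are made up for illustration (they are not the pair shown in the video), and `count_differing_columns` is a hypothetical helper name.

```python
def count_differing_columns(a: str, b: str) -> int:
    """Count the columns where two equal-length rows differ ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("both rows must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


# Illustrative pair: the second sequence is the first one with its leading
# letter deleted and one letter inserted at the end.
seq1 = "ACGTTGCAAT"
seq2 = "CGTTGCAATA"

# Position-by-position comparison, with no gaps allowed.
print(count_differing_columns(seq1, seq2))  # 8

# The same pair with one gap placed by hand in each row.
print(count_differing_columns("ACGTTGCAAT-", "-CGTTGCAATA"))  # 2
```

With gaps allowed, eight apparent differences shrink to two events, one deletion and one insertion.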

## 4.3. Measuring sequence similarity

So we understand why gene or protein sequences may be similar. It's because they evolve together with the species: over time there are modifications in the sequences, yet the sequences may still be similar, similar enough to retrieve information on one sequence and transfer it to another sequence of interest. So the question now is how we can measure this similarity between two sequences. The first approach to similarity is a very simple one: apply a distance which is called here the editing distance or the Hamming distance. The idea is very basic. You take two sequences, like these two sequences here, you look at the differences and you count ...
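The counting idea described in this excerpt can be sketched in a few lines of Python. The function name and the example sequences are assumptions chosen for illustration. The sketch also assumes both sequences have the same length, since the Hamming distance is only defined in that case.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


# Illustrative DNA sequences differing at the third and the last position.
print(hamming_distance("ACGTACGT", "ACCTACGA"))  # 2
```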

## 4.2. Why gene/protein sequences may be similar?

Before measuring the similarity between sequences, it's interesting to answer the question: why may gene or protein sequences be similar? It is indeed very interesting because the answer is related to the theory of evolution which is due, as you all know, to Darwin. What Darwin says is that species evolve in time and that new species are created from existing ones. So there is an evolution of species over time. He was a very thinking man, huh. This evolution can also be seen in the genomic sequences. Let's look at this very small, partial and hypothetical tree of life. Here you have the species and you have this phenomenon of speciation giving ...

## 4.8. A recursive algorithm

We have seen how we can compute the optimal cost of the ending node of our grid if we know the optimal cost of the three adjacent nodes. This is the computation scheme we can see here, using the notation of the pseudocode and not the mathematical notation we used in the previous sessions. So again, we can compute the cost of this node if we know the cost of that node, that node and that node, and we have to add respectively the insertion cost, the substitution cost or the deletion cost. The substitution cost here depends on the letter at this position in the sequence and this letter ...
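As an illustration of this computation scheme, here is a minimal recursive sketch in Python. It is not the pseudocode shown in the video. The name `align_cost`, the unit costs for insertion, deletion and mismatch, and the zero cost for a match are all assumptions chosen to keep the example small.

```python
def align_cost(s: str, t: str, i: int, j: int,
               indel: int = 1, mismatch: int = 1) -> int:
    """Optimal cost of aligning the prefixes s[:i] and t[:j] (node (i, j))."""
    # Border nodes of the grid: only insertions or only deletions remain.
    if i == 0:
        return j * indel
    if j == 0:
        return i * indel
    # The substitution cost depends on the two letters being compared.
    subst = 0 if s[i - 1] == t[j - 1] else mismatch
    # Cost of this node from the costs of its three adjacent nodes.
    return min(
        align_cost(s, t, i - 1, j - 1, indel, mismatch) + subst,  # diagonal
        align_cost(s, t, i - 1, j, indel, mismatch) + indel,      # gap in t
        align_cost(s, t, i, j - 1, indel, mismatch) + indel,      # gap in s
    )


print(align_cost("ACGT", "AGT", 4, 3))   # 1: one letter is deleted
print(align_cost("GATT", "GCAT", 4, 4))  # 2
```

The cost of the ending node is obtained by calling the function with `i = len(s)` and `j = len(t)`. The sketch is only practical for short sequences.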

## 4.10. How efficient is this algorithm?

We have seen the principle of an iterative algorithm in two passes for aligning and comparing two sequences of characters, here DNA sequences. And we understood why the iterative version is much more efficient than the recursive version. But how efficient is this iterative algorithm really? You remember that in order to measure the efficiency of algorithms, computer scientists do not measure the time or anything of that kind. They evaluate the number of times the main operation inside the algorithm is executed. In the case of this Needleman and Wunsch algorithm, which was published 40 years ago, the critical operation is the comparison between two letters of ...
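One way to see this way of measuring efficiency is to count the letter comparisons directly. The sketch below is a simplified table-filling loop with assumed unit costs, not the exact algorithm from the course, and `cost_and_comparisons` is a hypothetical name. It returns the optimal cost together with the number of letter comparisons performed.

```python
def cost_and_comparisons(s: str, t: str,
                         indel: int = 1, mismatch: int = 1) -> tuple[int, int]:
    """Fill the cost grid and count how many letter comparisons were made."""
    n, m = len(s), len(t)
    # First row of the grid: aligning an empty prefix of s with t[:j].
    previous = [j * indel for j in range(m + 1)]
    comparisons = 0
    for i in range(1, n + 1):
        current = [i * indel] + [0] * m
        for j in range(1, m + 1):
            comparisons += 1  # the critical operation: compare two letters
            subst = 0 if s[i - 1] == t[j - 1] else mismatch
            current[j] = min(previous[j - 1] + subst,
                             previous[j] + indel,
                             current[j - 1] + indel)
        previous = current
    return previous[m], comparisons


print(cost_and_comparisons("ACGT", "AGT"))  # (1, 12)
```

In this sketch each pair of positions is compared exactly once, so the count is the product of the two sequence lengths.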

## 4.9. Recursion can be avoided: an iterative version

We have written a recursive function to compute the optimal path, that is, an optimal alignment between two sequences. All the examples I gave here were on DNA sequences, a four-letter alphabet. OK. The writing of this recursive function is very elegant, but unfortunately we will see now that it is not very efficient in execution time. Let's see why. Remember the computing scheme we apply during the recursion: for example here, to compute the cost of this node, we saw that it was required to compute recursively the cost of that node, that node and that node. OK, but to compute the cost of that node here, you need to compute the cost ...
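A common way to avoid recomputing the same nodes is to fill the grid once, row by row, so that each node's cost is computed a single time and then reused. The Python sketch below shows that idea under assumed unit costs. The name `align_cost_iterative` is hypothetical, and the sketch computes only the optimal cost, not the alignment itself.

```python
def align_cost_iterative(s: str, t: str,
                         indel: int = 1, mismatch: int = 1) -> int:
    """Optimal alignment cost, computed by filling the whole grid once."""
    n, m = len(s), len(t)
    # cost[i][j] is the optimal cost of aligning s[:i] with t[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * indel
    for j in range(1, m + 1):
        cost[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            subst = 0 if s[i - 1] == t[j - 1] else mismatch
            # The three adjacent nodes are already in the table.
            cost[i][j] = min(cost[i - 1][j - 1] + subst,
                             cost[i - 1][j] + indel,
                             cost[i][j - 1] + indel)
    return cost[n][m]


print(align_cost_iterative("ACGTTGCAAT", "CGTTGCAATA"))  # 2
```

Recovering the alignment itself would need a second pass that walks back through the table from the ending node, which is left out here.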

## 5.1. The tree of life

Welcome to this fifth and last week of our course on genomes and algorithms, that is, the computer analysis of genetic information. During this week, we will first see what phylogenetic trees are and how we can reconstruct these trees from the available data. Then, to conclude this week and this course, we will present a larger overview of bioinformatics algorithms, and we will conclude on the applications of bioinformatics, at least in the microbial world. So first, the tree of life. We have already seen that, due to the ideas of Darwin, we know that species evolve and the evolution of these species can be seen as a ...

## 5.5. Differences are not always what they look like

The algorithm we have presented works on an array of distances between sequences. These distances are evaluated on the basis of the differences between the sequences. The problem is that behind the differences we observe in the set of sequences, there may be other mutations which cannot be observed, and we should modify the distances accordingly. We will have a look at some simple cases of observed differences which may correspond to hidden differences, and then we will see how the computation of the number of differences may be affected. The simple case is this one, a unique substitution: in sequence One we have a C and it turns out that in ...