François Rechenmann
4.4. Aligning sequences is an optimization problem

5 February 2015
We have seen a nice and quite simple solution for measuring the similarity between two sequences. It relied on the so-called Hamming distance, that is, counting the number of differences between two sequences. But the real situation is a bit more complex, as we'll see now, and it needs an adequate solution and algorithm.

Why is it a bit more complex? Let's have a look at this pair of sequences. If we apply the Hamming distance, that is, compute the Hamming distance between these two sequences, we find ten differences. OK. But you must remember that a mutation may be a substitution, a deletion or an insertion. So if we take deletions and insertions into account, the situation is very different for these two sequences. Let's see why.

We can now compare: it is the same pair of sequences, but here and here we made the hypothesis of an insertion or a deletion (whether it is an insertion or a deletion depends on which sequence you take as the reference), plus one substitution. Instead of ten differences, the same sequences show only two insertions/deletions and one substitution. That is because we have taken into account the notion of insertion/deletion by inserting this blank character.
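To make the contrast concrete, here is a minimal Python sketch. The function names `hamming_distance` and `count_alignment_events` and the two example sequences are hypothetical illustrations, not the sequences shown in the lecture. The first function counts position-by-position differences between two sequences of equal length. The second takes two rows that have already been aligned with a gap character `-` and counts substitutions and insertions/deletions separately. It does not search for the best alignment; finding that alignment is the optimization problem this section introduces.

```python
def hamming_distance(s: str, t: str) -> int:
    """Count the positions where two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(1 for a, b in zip(s, t) if a != b)


def count_alignment_events(s: str, t: str, gap: str = "-") -> dict:
    """Count substitutions and insertions/deletions in an already gapped alignment.

    Each column holding a gap in either row is counted as one insertion/deletion
    (which of the two depends on the sequence taken as the reference).
    Each column with two different, non-gap characters is one substitution.
    """
    if len(s) != len(t):
        raise ValueError("aligned rows must have equal length")
    substitutions = 0
    indels = 0
    for a, b in zip(s, t):
        if a == gap or b == gap:
            indels += 1
        elif a != b:
            substitutions += 1
    return {"substitutions": substitutions, "indels": indels}


# Hypothetical sequences: v is u with one deletion near the start,
# one substitution, and one insertion near the end.
u = "ACGTACGTAC"
v = "AGTATGTAGC"
print(hamming_distance(u, v))  # 8

# The same two sequences, aligned with gap characters.
print(count_alignment_events("ACGTACGTA-C", "A-GTATGTAGC"))
# {'substitutions': 1, 'indels': 2}
```

As in the lecture, the Hamming distance reports many differences because a single deletion shifts every following position, while the gapped alignment explains the same pair with only two insertions/deletions and one substitution.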


