4.9. Recursion can be avoided: an iterative version
Description
We have written a recursive function to compute the optimal path, that is, an optimal alignment between two sequences. All the examples I gave here were on DNA sequences, a four-letter alphabet. The writing of this recursive function is very elegant, but unfortunately we will now see that it is not very efficient in execution time. Let's see why.

Remember the computing scheme we apply during the recursion. For example, to compute the cost of this node, we saw that it was required to recursively compute the cost of that node, that node and that node. But to compute the cost of that node here, you need to compute the cost of that one, that one and that one again. That is, this cost, which was computed in order to compute the cost of the ending node, has to be recomputed in the recursive function to compute the cost of that node.

In a more general way, to compute the cost of a node like this one, you need to compute all of these nodes here, but to compute the cost of that node, you need to compute all the costs of these nodes again and again and again. So in this writing of the function, the cost of one node is computed many, many times. It is because we use this recursive function: it was nice, but it is expensive in terms of execution time. So we can imagine a new version of the algorithm which is not recursive but iterative, in two phases. Let's see how it works.
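To make the repeated computation concrete, here is a minimal sketch in Python. It is not the lecture's own code: the unit costs (0 for a match, 1 for a substitution, 1 for a gap), the function names and the sample sequences are all assumptions chosen for illustration. The recursive version counts how often each node of the grid is evaluated, and the iterative version fills a table so that each node is computed only once. Only the cost-filling phase is sketched here.

```python
from collections import Counter

GAP = 1  # assumed gap (insertion/deletion) cost, not taken from the lecture


def sub_cost(a, b):
    """Assumed substitution cost: 0 for identical letters, 1 otherwise."""
    return 0 if a == b else 1


def recursive_cost(s, t, i, j, calls):
    """Optimal cost of aligning s[:i] with t[:j], computed recursively.

    `calls` records how many times each node (i, j) is evaluated.
    """
    calls[(i, j)] += 1
    if i == 0:
        return j * GAP
    if j == 0:
        return i * GAP
    return min(
        recursive_cost(s, t, i - 1, j - 1, calls) + sub_cost(s[i - 1], t[j - 1]),
        recursive_cost(s, t, i - 1, j, calls) + GAP,
        recursive_cost(s, t, i, j - 1, calls) + GAP,
    )


def iterative_cost(s, t):
    """Same optimal cost, filling the grid once, node by node."""
    n, m = len(s), len(t)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * GAP
    for j in range(1, m + 1):
        cost[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),
                cost[i - 1][j] + GAP,
                cost[i][j - 1] + GAP,
            )
    return cost[n][m]


s, t = "ACGT", "AGT"  # hypothetical sample sequences
calls = Counter()
print("recursive cost:", recursive_cost(s, t, len(s), len(t), calls))
print("iterative cost:", iterative_cost(s, t))
print("recursive evaluations in total:", sum(calls.values()))
print("evaluations of node (1, 1):", calls[(1, 1)])
print("nodes in the grid:", (len(s) + 1) * (len(t) + 1))
```

Both functions return the same cost, but the recursive one evaluates far more nodes than the grid contains, because nodes close to the origin are reached along many different paths.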
In the same collection
-
4.8. A recursive algorithm
RECHENMANN François
We have seen how we can compute the optimal cost, the ending node of our grid if we know the optimal cost of the three adjacent nodes. This is this computation scheme we can see here using the notation
-
4.3. Measuring sequence similarity
RECHENMANN François
So we understand why gene or protein sequences may be similar. It's because they evolve together with the species and they evolve in time, there are modifications in the sequence and that the sequence
-
4.6. A path is optimal if all its sub-paths are optimal
RECHENMANN François
A sequence alignment between two sequences is a path in a grid. So an optimal sequence alignment is an optimal path in the same grid. We'll see now that a property of this optimal path provides
-
4.1. How to predict gene/protein functions?
RECHENMANN François
Last week we saw that annotating a genome means first locating the genes on the DNA sequences, that is, the regions coding for proteins. But this is indeed the first step, the next very
-
4.10. How efficient is this algorithm?
RECHENMANN François
We have seen the principle of an iterative algorithm in two paths for aligning and comparing two sequences of characters, here DNA sequences. And we understood why the iterative version is much more
-
4.4. Aligning sequences is an optimization problem
RECHENMANN François
We have seen a nice and quite simple solution for measuring the similarity between two sequences. It relied on the so-called Hamming distance, that is, counting the number of differences between two
-
4.7. Alignment costs
RECHENMANN François
We have seen how we can compute the cost of the path ending on the last node of our grid if we know the cost of the sub-path ending on the three adjacent nodes. It is time now to see more deeply why
-
4.2. Why gene/protein sequences may be similar?
RECHENMANN François
Before measuring the similarity between the sequences, it's interesting to answer the question: why may gene or protein sequences be similar? It is indeed very interesting because the answer is related
-
4.5. A sequence alignment as a path
RECHENMANN François
Comparing two sequences and then measuring their similarities is an optimization problem. Why? Because we have seen that we have to take into account substitution and deletion. During the alignment, the
With the same speakers
-
1.5. Counting nucleotides
RECHENMANN François
In this session, don't panic. We will design our first algorithm. This algorithm is for counting nucleotides. The idea here is that as an input, you have a sequence of nucleotides, of bases, of letters,
-
2.4. A translation algorithm
RECHENMANN François
We have seen that the genetic code is a correspondence between the DNA or RNA sequences and amino acid sequences, that is, proteins. Our aim here is to design a translation algorithm, we make the
-
3.1. All genes end on a stop codon
RECHENMANN François
Last week we studied genes and proteins and so how genes, portions of DNA, are translated into proteins. We also saw the very fast evolution of the sequencing technology which allows for producing
-
3.9. Benchmarking the prediction methods
RECHENMANN François
It is necessary to underline that gene predictors produce predictions. Predictions mean that you have no guarantees that the coding sequences, the coding regions, the genes you get when applying your
-
4.2. Why gene/protein sequences may be similar?
RECHENMANN François
Before measuring the similarity between the sequences, it's interesting to answer the question: why may gene or protein sequences be similar? It is indeed very interesting because the answer is related
-
5.6. The diversity of bioinformatics algorithms
RECHENMANN François
In this course, we have seen a very small set of bioinformatics algorithms. There exist numerous and varied algorithms in bioinformatics which deal with a wide range of classes of problems. For example,
-
1.8. Compressing the DNA walk
RECHENMANN François
We have written the algorithm for the circle DNA walk. Just a clarification here: the kind of drawing we get has nothing to do with the physical drawing of the DNA molecule. It is a symbolic
-
2.7. The algorithm design trade-off
RECHENMANN François
We saw how to increase the efficiency of our algorithm through the introduction of a data structure. Now let's see if we can do even better. We had a table of index and we explain how the use of these
-
3.4. Predicting all the genes in a sequence
RECHENMANN François
We have written an algorithm which is able to locate potential genes on a sequence but only on one phase because we are looking at triplet after triplet. Now remember that the genes may be located on
-
4.7. Alignment costs
RECHENMANN François
We have seen how we can compute the cost of the path ending on the last node of our grid if we know the cost of the sub-path ending on the three adjacent nodes. It is time now to see more deeply why
-
5.1. The tree of life
RECHENMANN François
Welcome to this fifth and last week of our course on genomes and algorithms, that is, the computer analysis of genetic information. During this week, we will first see what phylogenetic trees are and how
-
1.3. DNA codes for genetic information
RECHENMANN François
Remember at the heart of any cell, there is this very long molecule which is called a macromolecule for this reason, which is the DNA molecule. Now we will see that DNA molecules support what is called