Notice
3.9. Benchmarking the prediction methods
Description
It is necessary to underline that gene predictors produce predictions. Predictions mean that you have no guarantee that the coding sequences, the coding regions, the genes you get when applying your algorithm are true genes, that is, genes which have a biological existence. Only experimental analysis can confirm or refute your predictions. Nevertheless, it is interesting and also important to be able to evaluate your algorithm; this is the role of benchmarking. Benchmarking means measuring the capacity of your algorithm to produce good predictions. How can we make this kind of measurement? We need a reference. An ideal reference would be a genome which is well annotated and for which all of the annotations have been confirmed through experimental results. Unfortunately, there are very few genomes for which we have this experimental confirmation. Even the E. coli genome, for example, which belongs to a well-known organism and is well annotated, is not the ideal reference. However, it is also interesting to compare prediction algorithms and methods with one another, to run a competition: to apply several predictors to the same genomes and to compare their results.
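The measurement described above can be illustrated with a minimal Python sketch. It is not the method used in the lecture: it assumes that a gene is represented by its start position, stop position and strand, that a prediction counts as correct only when it matches a reference annotation exactly, and that sensitivity and precision are the scores of interest. The names `Gene` and `benchmark` and the coordinates are hypothetical, chosen only for illustration.

```python
from typing import Dict, NamedTuple, Set


class Gene(NamedTuple):
    """Hypothetical gene record: coordinates on the genome plus strand."""
    start: int
    stop: int
    strand: str  # '+' or '-'


def benchmark(predicted: Set[Gene], reference: Set[Gene]) -> Dict[str, float]:
    """Compare predicted genes with a reference annotation.

    Assumption: a prediction is a true positive only if start, stop and
    strand all match a reference gene exactly.
    """
    true_pos = len(predicted & reference)    # predicted and annotated
    false_pos = len(predicted - reference)   # predicted, not annotated
    false_neg = len(reference - predicted)   # annotated, not predicted
    # Sensitivity: fraction of reference genes that were found.
    sensitivity = true_pos / len(reference) if reference else 0.0
    # Precision: fraction of predictions that match the reference.
    precision = true_pos / len(predicted) if predicted else 0.0
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "sensitivity": sensitivity,
        "precision": precision,
    }


# Made-up coordinates, for illustration only.
reference = {Gene(100, 400, "+"), Gene(500, 900, "-"), Gene(1000, 1300, "+")}
predicted = {Gene(100, 400, "+"), Gene(500, 900, "-"), Gene(1500, 1800, "+")}

for name, value in benchmark(predicted, reference).items():
    print(name, value)
```

To compare several predictors, the same function could be called once per predictor on the same reference genome and the resulting scores placed side by side. Exact matching is a strict criterion; a more lenient benchmark might, for instance, accept a prediction that has the right stop codon but a different start.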
Theme
Documentation
In the same collection
-
3.1. All genes end on a stop codon
RECHENMANN François
Last week we studied genes and proteins and so how genes, portions of DNA, are translated into proteins. We also saw the very fast evolution of the sequencing technology which allows for producing
-
3.10. Gene prediction in eukaryotic genomes
RECHENMANN François
If it is possible to have very good predictions for bacterial genes, it's certainly not the case yet for eukaryotic genomes. Eukaryotic cells have many differences in comparison to prokaryotic cells. You
-
3.4. Predicting all the genes in a sequence
RECHENMANN François
We have written an algorithm which is able to locate potential genes on a sequence, but only on one phase, because we are looking at triplet after triplet. Now remember that the genes may be located on
-
3.7. Index and suffix trees
RECHENMANN François
We have seen with the Boyer-Moore algorithm how we can increase the efficiency of string searching through the pre-processing of the pattern to be searched. Now we will see that an alternative way of
-
3.2. A simple algorithm for gene prediction
RECHENMANN François
Based on the principle we stated in the last session, we will now write in pseudo code a first algorithm for locating genes on a bacterial genome. Remember first how this algorithm should work, we first
-
3.5. Making the predictions more reliable
RECHENMANN François
We have got a bacterial gene predictor but the way this predictor works is rather crude and if we want to have more reliable results, we have to inject into this algorithm more biological knowledge. We
-
3.8. Probabilistic methods
RECHENMANN François
Up to now, to predict our genes, we have relied only on the process of searching for certain strings or patterns. In order to further improve our gene predictor, the idea is to rely on probabilistic methods
-
3.3. Searching for start and stop codons
RECHENMANN François
We have written an algorithm for finding genes. But you remember that we still have to write the two functions for finding the next stop codon and the next start codon. Let's see how we can do that. We
-
3.6. Boyer-Moore algorithm
RECHENMANN François
We have seen how we can make gene predictions more reliable through searching for all the patterns, all the occurrences of patterns. We have seen, for example, how if we locate the RBS, Ribosome
With the same speakers
-
1.6. GC and AT contents of DNA sequence
RECHENMANN François
We have designed our first algorithm for counting nucleotides. Remember, what we have written in pseudo code is first the declaration of variables. We have several integer variables that are variables which
-
2.5. Implementing the genetic code
RECHENMANN François
Remember we were designing our translation algorithm and since we are a bit lazy, we decided to make the hypothesis that there was an adequate function for implementing the genetic code. It's now time
-
3.2. A simple algorithm for gene prediction
RECHENMANN François
Based on the principle we stated in the last session, we will now write in pseudo code a first algorithm for locating genes on a bacterial genome. Remember first how this algorithm should work, we first
-
4.1. How to predict gene/protein functions?
RECHENMANN François
Last week we saw that annotating a genome means first locating the genes on the DNA sequences, that is, the regions coding for proteins. But this is indeed the first step, the next very
-
4.10. How efficient is this algorithm?
RECHENMANN François
We have seen the principle of an iterative algorithm in two paths for aligning and comparing two sequences of characters, here DNA sequences. And we understood why the iterative version is much more
-
5.7. The application domains in microbiology
RECHENMANN François
Bioinformatics relies on many domains of mathematics and computer science. Of course, algorithms themselves on character strings are important in bioinformatics, we have seen them. Algorithms and
-
1.1. The cell, atom of the living world
RECHENMANN François
Welcome to this introduction to bioinformatics. We will speak of genomes and algorithms. More specifically, we will see how genetic information can be analysed by algorithms. In these five weeks to
-
1.9. Predicting the origin of DNA replication?
RECHENMANN François
We have seen a nice algorithm to draw, let's say, a DNA sequence. We will see that first, we have to correct this algorithm a little bit. And then we will see how such a simple algorithm can provide
-
2.8. DNA sequencing
RECHENMANN François
During the last session, I explained several times how important it was to increase the efficiency of sequence-processing algorithms because sequences are very long and there are large volumes of
-
3.5. Making the predictions more reliable
RECHENMANN François
We have got a bacterial gene predictor but the way this predictor works is rather crude and if we want to have more reliable results, we have to inject into this algorithm more biological knowledge. We
-
4.5. A sequence alignment as a path
RECHENMANN François
Comparing two sequences and then measuring their similarities is an optimization problem. Why? Because we have seen that we have to take into account substitution and deletion. During the alignment, the
-
5.5. Differences are not always what they look like
RECHENMANN François
The algorithm we have presented works on an array of distance between sequences. These distances are evaluated on the basis of differences between the sequences. The problem is that behind the