3.9. Benchmarking the prediction methods

It is necessary to underline that gene predictors produce predictions. A prediction comes with no guarantee that the coding sequences, the coding regions, the genes you get when applying your algorithm are true genes, that is, genes which have a biological existence. Only experimental analysis can confirm or refute your predictions. Nevertheless, it is interesting and also important to be able to evaluate your algorithm; this is the role of benchmarking. Benchmarking means measuring the capacity of your algorithm to produce good predictions. How can we make this kind of measurement? We need a reference. An ideal reference would be a genome which is well annotated and for which all of the annotations have been confirmed through ...
2.10. How to find genes?

Getting the sequence of the genome is only the beginning. As I explained, once you have the sequence, what you want to do is to locate the genes, to predict the function of the genes and maybe study the interactions between genes and proteins. Let's concentrate on the prediction of genes on a genome. How can we find genes using, of course, algorithms? That's what we call genome annotation: the prediction of gene locations and the prediction of the function of the genes, of the proteins coded by the genes. A typical bacterial genome like the E. coli genome is four to five megabases long and carries about 4,500 genes. A ...
3.1. All genes end on a stop codon

Last week we studied genes and proteins, and how genes, portions of DNA, are translated into proteins. We also saw the very fast evolution of sequencing technology, which allows for producing large genomic texts; it is now possible to sequence a whole genome. But that is just the beginning of the story. The challenge to come is to analyse the texts of these genomes and find genes, so this week we will see how we can design a very first algorithm for predicting genes on a bacterial genome. We will first recall the conditions for finding genes, we will design and propose an algorithm for that and we will see that a part of ...
2.6. Algorithms + data structures = programs

By writing the Lookup GeneticCode function, we completed our translation algorithm. So we may ask some questions about the algorithm. Does it terminate? The answer is yes, obviously. Is it pertinent, that is, does it return the expected answer? The answer is yes: if you give a sequence of DNA as input, you will get a sequence of amino acids as output unless, of course, one of the triplets is not one of the 64 expected triplets, and then you will get a nonsense protein sequence. Is it efficient? Well, to measure the efficiency of an algorithm, you can ask how many basic operations you have to execute. In ...
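
To make the three questions concrete, here is a minimal Python sketch of such a translation algorithm. It is an illustration, not the course's own code. The names `GENETIC_CODE`, `lookup_genetic_code` and `translate` are hypothetical, the table is the standard genetic code, and returning `?` for an unexpected triplet is an assumption made here to show where a nonsense protein sequence would come from.

```python
# Hypothetical sketch of a translation algorithm (names are illustrative).
# Standard genetic code, built from the usual T, C, A, G ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def lookup_genetic_code(triplet):
    """Return the one-letter amino acid for a triplet ('*' for a stop codon).

    Assumption: a triplet outside the 64 expected ones yields '?'.
    """
    return GENETIC_CODE.get(triplet.upper(), "?")


def translate(dna):
    """Translate a DNA sequence triplet by triplet, starting at position 0.

    The loop visits each complete triplet exactly once, so it terminates
    and performs one table lookup per triplet. A trailing incomplete
    triplet is ignored.
    """
    protein = []
    for start in range(0, len(dna) - 2, 3):
        protein.append(lookup_genetic_code(dna[start:start + 3]))
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
    print(translate("ATGNNN"))
```

In this sketch the number of basic operations grows in proportion to the length of the sequence, since each triplet costs a single dictionary lookup.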
