3.10. Gene prediction in eukaryotic genomes
Description
While we can obtain very good predictions for bacterial genes, this is certainly not yet the case for eukaryotic genomes. Eukaryotic cells differ from prokaryotic cells in many ways. You remember the existence of a nucleus, and you also remember from one of the diagrams in the first week that there are more structures within a eukaryotic cell. But the differences also lie in the organization of the genomes. In eukaryotic genomes, the so-called intergenic regions, that is, the regions which separate genes, are very long. A bacterial genome is very dense indeed: if you could put your finger at a random position on the genome, it would almost certainly land on a gene. If you did the same on a eukaryotic genome, the probability would be very, very high that your finger lands in an intergenic region. Take the example of the human genome: less than 5% of its sequence is made up of genes, and about 95% of it is not genes. What is it, then? This is still an open question. Years ago, a biologist spoke of "junk DNA", meaning DNA which is useless. Now the feeling is somewhat different: this DNA certainly has a reason to exist. We understand some of these reasons, but not all.
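The "finger on the genome" argument can be made concrete: the probability that a random position falls inside a gene is simply the fraction of the genome covered by genes. The minimal sketch below assumes genes are given as 0-based, half-open (start, end) intervals; the function names `gene_coverage` and `finger_lands_on_gene` and both toy genomes are hypothetical, invented for illustration rather than taken from real annotations.

```python
import random


def gene_coverage(genome_length, genes):
    """Return the fraction of genome positions covered by at least one gene.

    Assumption (hypothetical format): genes are (start, end) pairs, 0-based
    and half-open, so a gene (10, 20) covers positions 10..19.
    """
    covered = 0
    block_start, block_end = None, None
    for start, end in sorted(genes):
        if block_end is None or start > block_end:
            # New disjoint block: add the length of the previous block, if any.
            if block_end is not None:
                covered += block_end - block_start
            block_start, block_end = start, end
        else:
            # Overlapping or adjacent gene: extend the current block.
            block_end = max(block_end, end)
    if block_end is not None:
        covered += block_end - block_start
    return covered / genome_length


def finger_lands_on_gene(genome_length, genes, trials=100_000, seed=0):
    """Estimate by random sampling how often a random position falls in a gene."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        position = rng.randrange(genome_length)  # "put a finger" somewhere
        if any(start <= position < end for start, end in genes):
            hits += 1
    return hits / trials


# Toy annotations, invented for illustration only (not real genomes).
dense_genome = (10_000, [(0, 4_500), (4_600, 9_000)])            # "bacterial-like"
sparse_genome = (100_000, [(10_000, 12_000), (60_000, 63_000)])  # "eukaryotic-like"

for name, (length, genes) in [("dense", dense_genome), ("sparse", sparse_genome)]:
    print(name, gene_coverage(length, genes), finger_lands_on_gene(length, genes))
```

For these toy inputs the exact coverage is 0.89 for the dense genome and 0.05 for the sparse one, and the sampled estimates come out close to those values: the random finger almost always hits a gene in the first case and almost never in the second. Merging overlapping intervals before summing avoids counting twice a region where two genes overlap.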
In the same collection
-
3.4. Predicting all the genes in a sequence
RECHENMANN François
We have written an algorithm which is able to locate potential genes on a sequence, but only on one phase, because we are looking at triplet after triplet. Now remember that the genes may be located on
-
3.7. Index and suffix trees
RECHENMANN François
We have seen with the Boyer-Moore algorithm how we can increase the efficiency of string searching through the pre-processing of the pattern to be searched. Now we will see that an alternative way of
-
3.2. A simple algorithm for gene prediction
RECHENMANN François
Based on the principle we stated in the last session, we will now write in pseudo code a first algorithm for locating genes on a bacterial genome. Remember first how this algorithm should work, we first
-
3.5. Making the predictions more reliable
RECHENMANN François
We have got a bacterial gene predictor, but the way this predictor works is rather crude, and if we want to have more reliable results, we have to inject more biological knowledge into this algorithm. We
-
3.8. Probabilistic methods
RECHENMANN François
Up to now, to predict our genes, we only relied on the process of searching for certain strings or patterns. In order to further improve our gene predictor, the idea is to use, to rely on probabilistic methods
-
3.3. Searching for start and stop codons
RECHENMANN François
We have written an algorithm for finding genes. But you remember that we are still to write the two functions for finding the next stop codon and the next start codon. Let's see how we can do that. We
-
3.6. Boyer-Moore algorithm
RECHENMANN François
We have seen how we can make gene predictions more reliable through searching for all the patterns, all the occurrences of patterns. We have seen, for example, how if we locate the RBS, Ribosome
-
3.1. All genes end on a stop codon
RECHENMANN François
Last week we studied genes and proteins and so how genes, portions of DNA, are translated into proteins. We also saw the very fast evolution of the sequencing technology which allows for producing
-
3.9. Benchmarking the prediction methods
RECHENMANN François
It is necessary to underline that gene predictors produce predictions. Predictions mean that you have no guarantees that the coding sequences, the coding regions, the genes you get when applying your
With the same speakers
-
1.1. The cell, atom of the living world
RECHENMANN François
Welcome to this introduction to bioinformatics. We will speak of genomes and algorithms. More specifically, we will see how genetic information can be analysed by algorithms. In these five weeks to
-
1.9. Predicting the origin of DNA replication?
RECHENMANN François
We have seen a nice algorithm to draw, let's say, a DNA sequence. We will see that, first, we have to correct this algorithm a little. And then we will see how such a simple algorithm can provide
-
2.8. DNA sequencing
RECHENMANN François
During the last session, I explained several times how important it was to increase the efficiency of sequence processing algorithms, because sequences are very long and there are large volumes of
-
4.5. A sequence alignment as a path
RECHENMANN François
Comparing two sequences and then measuring their similarities is an optimization problem. Why? Because we have seen that we have to take into account substitution and deletion. During the alignment, the
-
5.5. Differences are not always what they look like
RECHENMANN François
The algorithm we have presented works on an array of distances between sequences. These distances are evaluated on the basis of differences between the sequences. The problem is that behind the
-
1.4. What is an algorithm?
RECHENMANN François
We have seen that a genomic text can indeed be a very long sequence of characters. And to interpret this sequence of characters, we will need to use computers. Using computers means writing programs.
-
2.2. Genes: from Mendel to molecular biology
RECHENMANN François
The notion of gene emerged with the works of Gregor Mendel. Mendel studied the inheritance of some traits, like the shape of pea plant seeds, through generations. He stated the famous laws of inheritance
-
2.10. How to find genes?
RECHENMANN François
Getting the sequence of the genome is only the beginning. As I explained, once you have the sequence, what you want to do is to locate the genes, to predict the function of the genes and maybe study the
-
4.2. Why gene/protein sequences may be similar?
RECHENMANN François
Before measuring the similarity between the sequences, it's interesting to answer the question: why may gene or protein sequences be similar? It is indeed very interesting because the answer is related
-
5.4. The UPGMA algorithm
RECHENMANN François
We know how to fill an array with the values of the distances between sequences, pairs of sequences which are available in the file. This array of distances will be the input of our algorithm for