Canal-U

Search results

Number of programs found: 933
UNT label: video lecture

(5m38s)

2.10. How to find genes?

Getting the sequence of the genome is only the beginning. As I explained, once you have the sequence, what you want to do is locate the genes, predict their function, and maybe study the interactions between genes and proteins. Let's concentrate on the prediction of genes on a genome. How can we find genes using, of course, algorithms? That's what we call genome annotation: the prediction of gene locations and the prediction of the function of the genes, that is, of the proteins coded by the genes. A typical bacterial genome like the E. coli genome is four to five megabases long and carries about 4,500 genes. A ...

(5m42s)

3.1. All genes end on a stop codon

Last week we studied genes and proteins, and how genes, portions of DNA, are translated into proteins. We also saw the very fast evolution of sequencing technology, which allows for producing large genomic texts; it is now possible to sequence a whole genome. But that is just the beginning of the story. The challenge to come is to analyse the texts of these genomes and find genes, so this week we will see how we can design a very first algorithm for predicting genes on a bacterial genome. We will first recall the condition for finding genes, we will design and propose an algorithm for that, and we will see that a part of ...

(5m14s)

3.2. A simple algorithm for gene prediction

Based on the principle we stated in the last session, we will now write in pseudocode a first algorithm for locating genes on a bacterial genome. Remember first how this algorithm should work: we first need to find two consecutive stop triplets in the same phase. Same phase means the number of letters between these two stop triplets must be a multiple of three, so that the sequence between them can be divided into triplets. This is called an open reading frame. Once we have an open reading frame, we look for the start triplet which is situated leftmost on the open reading frame, and we make the hypothesis that this is a coding ...
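The principle in this excerpt can be sketched in Python as follows. This is a minimal illustration, not the pseudocode from the lecture: the function name `find_genes_one_phase`, the 0-based `phase` parameter, and the use of the standard triplets ATG (start) and TAA, TAG, TGA (stop) are assumptions made for the example.

```python
# Assumption: standard start and stop triplets; the excerpt does not list them.
STOPS = {"TAA", "TAG", "TGA"}
START = "ATG"

def find_genes_one_phase(seq, phase=0):
    """Return (start, end) pairs of candidate coding regions in one phase.

    start is the index of the leftmost start triplet inside an open
    reading frame; end is the index just past the closing stop triplet.
    """
    genes = []
    first_start = None   # leftmost start triplet seen since the last stop
    seen_stop = False    # an ORF lies between two stops in the same phase
    for i in range(phase, len(seq) - 2, 3):
        triplet = seq[i:i + 3]
        if triplet in STOPS:
            if seen_stop and first_start is not None:
                genes.append((first_start, i + 3))
            seen_stop = True
            first_start = None
        elif triplet == START and seen_stop and first_start is None:
            first_start = i
    return genes

# Hypothetical toy sequence: stop, filler, start, filler, start, filler, stop.
toy = "TAA" + "CCC" + "ATG" + "AAA" + "ATG" + "CCC" + "TGA"
print(find_genes_one_phase(toy))
```

Because the loop advances three letters at a time, both stop triplets are automatically in the same phase, and only the leftmost start triplet of each open reading frame is retained.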

(4m46s)

3.3. Searching for start and stop codons

We have written an algorithm for finding genes. But you remember that we still have to write the two functions for finding the next stop codon and the next start codon. Let's see how we can do that. We are looking for triplets. We use the term triplets as long as we have no proof that they are codons. You can have triplets outside genes. Within genes, they are called codons. In general, we are looking for triplets. If you have a sequence like this one and you are looking for occurrences of this triplet, what you have to do is: position your triplet at the beginning of the sequence. Compare the first letter. If it is not ...
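The letter-by-letter comparison described here can be sketched as a naive search. The excerpt is cut off before the full procedure is given, so this is only one plausible reading; the function name `find_next_triplet` and the convention of returning -1 when nothing is found are assumptions for the example.

```python
def find_next_triplet(seq, triplet, pos=0):
    """Return the index of the first occurrence of triplet at or after pos.

    The triplet is aligned with the sequence at position i and compared
    letter by letter; on the first mismatch it is shifted one letter to
    the right. Returns -1 if there is no occurrence.
    """
    for i in range(pos, len(seq) - len(triplet) + 1):
        j = 0
        while j < len(triplet) and seq[i + j] == triplet[j]:
            j += 1
        if j == len(triplet):   # every letter matched
            return i
    return -1

# Hypothetical toy sequence for illustration.
print(find_next_triplet("GCATGATAG", "ATG"))
print(find_next_triplet("GCATGATAG", "TAG", 3))
```

This version shifts by one letter at a time; to stay within a single phase, a caller would step `pos` by three instead.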

(6m23s)

3.4. Predicting all the genes in a sequence

We have written an algorithm which is able to locate potential genes on a sequence, but only on one phase, because we are looking at triplet after triplet. Now remember that the genes may be located on different phases and on the two strands. It means that to retrieve all the genes on a genome we have to look at six different sequences, three phases on each strand. Let's look now at how we can deal with this kind of search. First we have to modify our algorithm a little bit so that, instead of starting at position one, I want to introduce a variable, a parameter which could be one or two ...
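The six sequences mentioned here (three phases on each strand) can be enumerated with a short sketch. The names `reverse_complement` and `six_reading_frames` are assumptions for the example, and the phase is numbered 0, 1, 2 to match Python indexing, whereas the lecture counts positions from one.

```python
# Watson-Crick complement table for the four DNA letters.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return seq.translate(COMPLEMENT)[::-1]

def six_reading_frames(seq):
    """Return a list of (strand, phase, triplets) for the six frames."""
    frames = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for phase in range(3):
            # Cut the strand into complete triplets starting at this phase.
            triplets = [s[i:i + 3] for i in range(phase, len(s) - 2, 3)]
            frames.append((strand, phase, triplets))
    return frames

# Hypothetical toy sequence for illustration.
for strand, phase, triplets in six_reading_frames("ATGCCA"):
    print(strand, phase, triplets)
```

A one-phase gene finder can then be run once per frame; positions found on the "-" strand refer to the reverse complement and need converting back to coordinates on the original sequence.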

(4m46s)

3.5. Making the predictions more reliable

We have got a bacterial gene predictor, but the way this predictor works is rather crude, and if we want more reliable results, we have to inject more biological knowledge into this algorithm. We will use the notion of RBS; RBS stands for Ribosome Binding Site. What is it? OK. Let's have a look at the cell machinery, or part of it, here. You certainly see here that we are dealing with a eukaryotic cell. Why? Because you have a nucleus, and you remember that the difference between a prokaryotic cell and a eukaryotic cell lies in the existence of a nucleus. Within the nucleus you have the DNA. The DNA is transcribed into ...

(7m7s)

3.7. Index and suffix trees

We have seen with the Boyer-Moore algorithm how we can increase the efficiency of string searching through the pre-processing of the pattern to be searched. Now we will see that an alternative way of improving the performance is to pre-process the text itself, the searchable text, and for that we will study two methods: the construction of indexes of fixed-length words and the algorithm which uses suffix trees. An index of fixed-length words, what does it mean? Imagine you have a text, a searchable text, that is, a text in which you want to search for a pattern. Here is quite a short text; the sequence is 14 characters. We will ...
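An index of fixed-length words can be sketched as a dictionary from each word of length k to the positions where it occurs in the text. This is a minimal illustration under assumptions: the function names `build_index` and `search`, the word length k = 3, and the 14-letter text are all made up for the example and are not the ones used in the lecture.

```python
from collections import defaultdict

def build_index(text, k):
    """Map every word of length k to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return dict(index)

def search(text, index, k, pattern):
    """Return all positions of pattern, using the index on its first k letters."""
    if len(pattern) < k:
        raise ValueError("pattern must be at least k letters long")
    candidates = index.get(pattern[:k], [])
    # Only the candidate positions are checked against the full pattern.
    return [i for i in candidates if text.startswith(pattern, i)]

# Hypothetical 14-letter text for illustration.
text = "ACGTACGTTACGAC"
index = build_index(text, 3)
print(index["ACG"])
print(search(text, index, 3, "ACGT"))
```

The text is scanned once to build the index; each later search inspects only the positions listed for the first k letters of the pattern instead of the whole text.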

(6m10s)

3.8. Probabilistic methods

Up to now, to predict our genes, we have only relied on searching for certain strings or patterns. In order to further improve our gene predictor, the idea is to rely on probabilistic methods. What does it mean? I will first take an example which is not related to genomics, but I think it's good for understanding the idea. Imagine you have a very long text, and you know that only some passages of this text are written in a human-understandable language, maybe English, maybe French and so on, but you don't know which one. How ...

(5m36s)

3.9. Benchmarking the prediction methods

It is necessary to underline that gene predictors produce predictions. Predictions mean that you have no guarantee that the coding sequences, the coding regions, the genes you get when applying your algorithm are true genes, that is, genes which have a biological existence. Only experimental analysis can confirm or refute your predictions. Nevertheless, it is interesting and also important to be able to evaluate your algorithm; this is the role of benchmarking. Benchmarking means measuring the capacity of your algorithm to produce good predictions. How can we make this kind of measurement? We need a reference. An ideal reference would be a genome which is well annotated and for which all of the annotations have been confirmed through ...

 
FMSH
 