Educational video
Details
Place of production
Grenoble
Subtitling
Subtitle
Language:
English
Credits
François Rechenmann (Speaker)
Terms of use
Unless otherwise stated, these course resources are distributed under a Creative Commons license. Users must credit the author; they may use the work except in a commercial context, and they may not modify the original work.
DOI: 10.60527/z08e-dq41
Cite this resource:
François Rechenmann. Inria. (2015, February 5). 3.8. Probabilistic methods, in 3. Gene prediction. [Video]. Canal-U. https://doi.org/10.60527/z08e-dq41. (Accessed July 21, 2024)

# 3.8. Probabilistic methods

Recorded: February 5, 2015 - Published online: May 9, 2017
Description

Up to now, to predict our genes, we have relied only on searching for certain strings or patterns. To further improve our gene predictor, the idea is to rely on probabilistic methods. What does that mean? I will first take an example which is not related to genomics, but I think it is a good way to understand the idea.

Imagine you have a very long text in which you know that only some passages are written in a human-understandable language, maybe English, maybe French and so on; you don't know which one. How can you retrieve these passages with the very little information you have about the text? The idea is to make use of the fact that the frequencies of letters in a human-readable language are different from random frequencies. For example, here you have the tables of letter frequencies in French and in English. You can see that in French, W has a very low frequency, while the highest frequency is E. The key point is that if you count the frequencies of letters in a human-readable text, these frequencies are not all equal. That is normal, because the text is written with words, and so on.
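The idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method presented in the course: letter frequencies are estimated from a reference sample of the language (a hypothetical input, not the French and English tables shown in the video), "random" text is modeled as a uniform distribution over 26 letters, and each window of the long text is scored with a log-likelihood ratio. The function names and the `width`/`step` parameters are hypothetical choices for illustration.

```python
import math
import string
from collections import Counter

# Assumption: a plain 26-letter alphabet; accents, digits and punctuation are ignored.
ALPHABET = string.ascii_uppercase


def letter_frequencies(reference_text):
    """Estimate P(letter) from a reference sample of the language.

    Add-one (Laplace) smoothing keeps every letter's probability above zero,
    so its logarithm is always defined.
    """
    letters = [c for c in reference_text.upper() if c in ALPHABET]
    counts = Counter(letters)
    total = len(letters) + len(ALPHABET)
    return {a: (counts[a] + 1) / total for a in ALPHABET}


def log_likelihood_ratio(window, freqs):
    """Score a piece of text: sum of log(P_language / P_random) over its letters.

    Assumption: "random" means uniform, i.e. every letter has probability 1/26.
    Positive scores mean the letters fit the language model better than chance.
    """
    p_random = 1 / len(ALPHABET)
    return sum(math.log(freqs[c] / p_random)
               for c in window.upper() if c in ALPHABET)


def score_windows(text, freqs, width=50, step=10):
    """Slide a fixed-width window along the text and score each position.

    Returns (start, score) pairs; runs of positive scores are candidate
    human-language passages.
    """
    last_start = max(len(text) - width, 0)
    return [(start, log_likelihood_ratio(text[start:start + width], freqs))
            for start in range(0, last_start + 1, step)]
```

In use, one would build `freqs` from a sample of known English or French text and look for stretches of the long text where consecutive windows score above zero. The window width and step trade resolution against noise and were picked arbitrarily here.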

## In the same collection

• Educational video
00:05:58

### 3.6. Boyer-Moore algorithm

François Rechenmann

We have seen how we can make gene predictions more reliable by searching for all the patterns, all the occurrences of patterns. We have seen, for example, how if we locate the RBS, Ribosome

• Educational video
00:05:41

### 3.1. All genes end on a stop codon

François Rechenmann

Last week we studied genes and proteins, and how genes, portions of DNA, are translated into proteins. We also saw the very fast evolution of sequencing technology, which allows for producing

• Educational video
00:08:56

### 3.10. Gene prediction in eukaryotic genomes

François Rechenmann

While it is possible to make very good predictions for bacterial genes, this is certainly not yet the case for eukaryotic genomes. Eukaryotic cells have many differences in comparison to prokaryotic cells. You

• Educational video
00:06:22

### 3.4. Predicting all the genes in a sequence

François Rechenmann

We have written an algorithm which is able to locate potential genes on a sequence, but only in one phase, because we are looking at one triplet after another. Now remember that genes may be located on

• Educational video
00:07:06

### 3.7. Index and suffix trees

François Rechenmann

We have seen with the Boyer-Moore algorithm how we can increase the efficiency of string searching through pre-processing of the pattern to be searched. Now we will see that an alternative way of

• Educational video
00:05:13

### 3.2. A simple algorithm for gene prediction

François Rechenmann

Based on the principle we stated in the last session, we will now write in pseudocode a first algorithm for locating genes on a bacterial genome. Remember first how this algorithm should work: we first

• Educational video
00:04:45

### 3.5. Making the predictions more reliable

François Rechenmann

We have a bacterial gene predictor, but the way this predictor works is rather crude, and if we want more reliable results, we have to inject more biological knowledge into this algorithm. We

• Educational video
00:05:35

### 3.9. Benchmarking the prediction methods

François Rechenmann

It is necessary to underline that gene predictors produce predictions. Predictions mean that you have no guarantee that the coding sequences, the coding regions, the genes you get when applying your

• Educational video
00:04:45

### 3.3. Searching for start and stop codons

François Rechenmann

We have written an algorithm for finding genes. But remember that we still have to write the two functions for finding the next stop codon and the next start codon. Let's see how we can do that. We

## With the same speakers

• Educational video
00:05:24

### 1.1. The cell, atom of the living world

François Rechenmann

Welcome to this introduction to bioinformatics. We will speak of genomes and algorithms. More specifically, we will see how genetic information can be analysed by algorithms. In these five weeks to

• Educational video
00:09:07

### 1.9. Predicting the origin of DNA replication?

François Rechenmann

We have seen a nice algorithm to draw, let's say, a DNA sequence. We will see that first, we have to correct this algorithm a little. And then we will see how such a simple algorithm can provide

• Educational video
00:08:21

### 2.8. DNA sequencing

François Rechenmann

During the last session, I explained several times how important it was to increase the efficiency of sequence-processing algorithms, because sequences are very long and there are large volumes of

• Educational video
00:03:50

### 4.5. A sequence alignment as a path

François Rechenmann

Comparing two sequences and then measuring their similarity is an optimization problem. Why? Because we have seen that we have to take into account substitutions and deletions. During the alignment, the

• Educational video
00:07:39

### 5.5. Differences are not always what they look like

François Rechenmann

The algorithm we have presented works on an array of distances between sequences. These distances are evaluated on the basis of differences between the sequences. The problem is that behind the

• Educational video
00:05:48

### 1.4. What is an algorithm?

François Rechenmann

We have seen that a genomic text can indeed be a very long sequence of characters. And to interpret this sequence of characters, we will need to use computers. Using computers means writing programs.

• Educational video
00:04:58

### 2.2. Genes: from Mendel to molecular biology

François Rechenmann

The notion of gene emerged with the work of Gregor Mendel. Mendel studied the inheritance of some traits, like the shape of pea plant seeds, through generations. He stated the famous laws of inheritance

• Educational video
00:05:37

### 2.10. How to find genes?

François Rechenmann

Getting the sequence of the genome is only the beginning. As I explained, once you have the sequence, what you want to do is to locate the genes, to predict their function and maybe study the

• Educational video
00:04:29

### 4.2. Why gene/protein sequences may be similar?

François Rechenmann

Before measuring the similarity between sequences, it's interesting to answer the question: why may gene or protein sequences be similar? It is indeed very interesting because the answer is related

• Educational video
00:04:59

### 5.4. The UPGMA algorithm

François Rechenmann

We know how to fill an array with the values of the distances between sequences, pairs of sequences which are available in the file. This array of distances will be the input of our algorithm for