WEBVTT
00:00:00.830 --> 00:00:07.660
We know how to fill an array with
the values of the distances
00:00:08.110 --> 00:00:14.620
between sequences, pairs of sequences
which are available in the file.
00:00:16.230 --> 00:00:21.180
This array of distances will be
the input of our algorithm for
00:00:21.180 --> 00:00:22.620
reconstructing phylogenetic trees.
00:00:27.040 --> 00:00:32.560
The name of this algorithm is
rather complicated but the method
00:00:32.560 --> 00:00:36.600
itself is rather simple,
too simple indeed.
00:00:36.570 --> 00:00:40.900
We will see that. The name stands
for Unweighted Pair Group
00:00:41.360 --> 00:00:45.090
Method with Arithmetic Mean. We
will understand these terms
00:00:45.450 --> 00:00:47.780
during the presentation
of the algorithm.
00:00:47.750 --> 00:00:52.220
The algorithm starts with
an array of distances.
00:00:53.330 --> 00:00:58.630
Let's take this very simple
example. It involves seven species
00:00:59.400 --> 00:01:05.630
and here we have the values of the
distances between these different
00:01:06.670 --> 00:01:09.040
sequences associated with a species.
00:01:09.530 --> 00:01:15.090
As you remember, the array is
symmetrical and all the values
00:01:15.120 --> 00:01:21.190
on the diagonal are equal to zero
so here we display only the
00:01:21.190 --> 00:01:28.110
meaningful values. So not all the cells of
the array are displayed here.
00:01:29.040 --> 00:01:37.880
OK. The first step consists in selecting
the smallest value of the
00:01:37.880 --> 00:01:44.880
array. This is two here, which is
the distance between F and C.
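A minimal sketch of this selection step in Python. It assumes the array is stored as a dictionary holding only the meaningful cells. The function name `closest_pair` and every value except the distance of two between F and C are hypothetical, chosen for illustration.

```python
def closest_pair(dist):
    """Return the pair of labels with the smallest distance, and that distance.

    `dist` maps frozenset({a, b}) to the distance between a and b,
    so the symmetric cells and the zero diagonal are never stored.
    """
    pair = min(dist, key=dist.get)
    return tuple(sorted(pair)), dist[pair]


# Hypothetical values, except d(F, C) = 2, which is quoted in the lecture.
dist = {
    frozenset({"C", "F"}): 2,
    frozenset({"C", "D"}): 5,
    frozenset({"D", "F"}): 7,
}

print(closest_pair(dist))  # (('C', 'F'), 2)
```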
00:01:45.420 --> 00:01:50.990
So since it is the smallest
distance, we can group these
00:01:50.990 --> 00:01:56.240
two species, these two nodes, into
a first subtree and create
00:01:57.030 --> 00:02:02.810
a new node here which is the
root of this subtree.
00:02:04.370 --> 00:02:10.650
Now we must compute the distance
between these nodes and the
00:02:10.650 --> 00:02:13.400
remaining species,
the remaining nodes.
00:02:15.570 --> 00:02:20.350
This is done with these formulas.
00:02:20.460 --> 00:02:24.030
Here is an example and you understand
why the name of the algorithm
00:02:24.030 --> 00:02:30.380
includes mean. OK? Because, for
example, here we compute the
00:02:30.380 --> 00:02:34.990
distance between this node FC
and D as the arithmetic
00:02:35.620 --> 00:02:40.890
mean of the distances between F
and D and between C and D, a very simple
00:02:41.450 --> 00:02:44.150
computation indeed, here it is six.
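The computation just described can be sketched in Python as follows. The formulas on the slide are not in the transcript, so this sketch assumes the usual UPGMA average weighted by the number of species in each group. When both nodes are single species, as F and C are here, it reduces to the plain arithmetic mean. The function name `merged_distance` and the input values 5 and 7 are hypothetical: the lecture only gives the result, six.

```python
def merged_distance(d_a_x, d_b_x, size_a=1, size_b=1):
    """Distance from the new node grouping a and b to another node x.

    d_a_x, d_b_x: distances from a to x and from b to x.
    size_a, size_b: number of species under a and under b (assumed weighting).
    """
    return (size_a * d_a_x + size_b * d_b_x) / (size_a + size_b)


# Hypothetical d(F, D) = 7 and d(C, D) = 5 give the mean quoted in the lecture.
print(merged_distance(7, 5))  # 6.0
```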
00:02:44.250 --> 00:02:50.630
We do the other computation and
we get a new array with a new
00:02:50.630 --> 00:02:55.320
value, you see here we
have this node FC.
00:02:56.290 --> 00:03:03.470
OK. So we repeat the step again,
that is find the lowest value,
00:03:04.010 --> 00:03:10.090
here we have two minimal
values, four, here and here.
00:03:10.410 --> 00:03:11.640
We select the first one.
00:03:12.890 --> 00:03:15.540
OK. It could be the other one,
it would not change
00:03:15.770 --> 00:03:24.610
the result, of course, and we
group this node with this one
00:03:25.090 --> 00:03:30.490
to build a new subtree
with a new node here.
00:03:31.950 --> 00:03:39.160
Again we have to compute the new
distances between these nodes
00:03:39.410 --> 00:03:46.560
and the remaining nodes which are for
the moment not positioned in the tree.
00:03:47.410 --> 00:03:56.880
We will get a new array, we will
apply another step selecting
00:03:57.700 --> 00:04:06.020
the lowest value, here six, that is
grouping these two nodes into a subtree.
00:04:08.660 --> 00:04:15.600
We compute again the new values for
the remaining nodes with the
00:04:15.600 --> 00:04:21.600
same kind of formula. We get a very
simple matrix, and with that
00:04:21.830 --> 00:04:27.010
we can complete the construction of
the tree by adding the last species.
00:04:27.510 --> 00:04:34.040
Here is the phylogenetic tree we
can build from the input array.
00:04:34.430 --> 00:04:38.970
We can write it with an expression
with parentheses, as we have
00:04:38.970 --> 00:04:40.740
seen in a previous session.
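The whole procedure can be sketched in a few lines of Python, under stated assumptions. The function name `upgma`, the dictionary layout and the four-species values are hypothetical; only the distance of two between F and C comes from the lecture. The update uses the size-weighted average commonly associated with UPGMA, which may differ from the formula shown on the slide. On ties, `min` keeps the first minimal value it meets.

```python
def upgma(labels, dist):
    """Build a tree, as a parenthesized expression, from pairwise distances.

    labels: list of species names.
    dist: dict mapping frozenset({a, b}) to the distance between a and b.
    """
    dist = dict(dist)                    # work on a copy of the input array
    size = {name: 1 for name in labels}  # number of species under each node
    while len(size) > 1:
        # Select the smallest remaining distance.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        new = f"({a},{b})"
        # Compute the distance from the new node to each remaining node.
        for x in size:
            if x in pair:
                continue
            d_ax = dist.pop(frozenset({a, x}))
            d_bx = dist.pop(frozenset({b, x}))
            total = size[a] * d_ax + size[b] * d_bx
            dist[frozenset({new, x})] = total / (size[a] + size[b])
        del dist[pair]
        size[new] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Hypothetical four-species array; only d(F, C) = 2 is taken from the lecture.
labels = ["A", "C", "D", "F"]
dist = {
    frozenset({"C", "F"}): 2,
    frozenset({"C", "D"}): 5,
    frozenset({"D", "F"}): 7,
    frozenset({"A", "C"}): 10,
    frozenset({"A", "D"}): 9,
    frozenset({"A", "F"}): 12,
}

print(upgma(labels, dist))  # (((C,F),D),A)
```

The sketch records only the grouping order, not branch lengths, which is enough to produce the parenthesized expression.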
00:04:42.360 --> 00:04:47.400
So it is indeed quite a simple
algorithm with iterative steps
00:04:47.910 --> 00:04:52.410
which are easy to understand
but it's too simple to be used
00:04:52.480 --> 00:04:57.630
in a realistic situation. We will
see in the next session why.