- What does variation look like? Well, it depends on what kind of data
we're talking about. To be concrete, let's focus on continuous
data and by that, I mean real numbers or numbers measured with decimal
points. So, thinking about continuous data, a lot of the data in
experimental biology follows a normal distribution, also called a bell
curve.
By that, I mean that if these are the values that you can observe,
there is some typical value, the mean, that is often observed and that
we'd denote with a Greek letter, usually mu. But variability refers to the fact
that when you make a measurement of this, there may be some values
that are either higher or lower than the mean. This axis in the plot
is showing how likely it is to get these different observations. So, a
point here, quite far below the mean, has a low value on the vertical
axis. That means that it's possible but unlikely to see this kind of a
measurement. As we move closer to the mean value, the probability of
seeing it or the likelihood of seeing it is higher and the mean is the
most likely value here in the middle of the distribution.
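
To make the vertical axis concrete, here is a minimal Python sketch that evaluates the height of a bell curve at a few values. The mean of 10.0 and standard deviation of 2.0 are made-up numbers for illustration, not values from the plot, and the helper name `density` is hypothetical. For continuous data the height is strictly a probability density, which is what "how likely" refers to here.

```python
from statistics import NormalDist

# Hypothetical measurement distribution: mean 10.0, standard deviation 2.0.
dist = NormalDist(mu=10.0, sigma=2.0)

def density(x):
    """Height of the bell curve (the vertical axis) at the value x."""
    return dist.pdf(x)

far_below = density(4.0)   # three standard deviations below the mean
closer = density(8.0)      # one standard deviation below the mean
at_mean = density(10.0)    # the peak, in the middle of the distribution

print(f"far below the mean: {far_below:.4f}")
print(f"closer to the mean: {closer:.4f}")
print(f"at the mean:        {at_mean:.4f}")
```

The three printed heights increase as the value moves toward the mean, which matches the description of the curve above.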
This distribution is symmetric, meaning that this side is a mirror
image of this side. So likewise, as we move away from the mean and get
higher and higher, we get values here that are certainly possible but
they are less likely than the mean to occur. So, most variation is
occurring in here in this space, close to the mean, meaning most of
your data is here, but there is some small chance that if you collect
enough data, you're gonna see some very extreme values that are quite
different from the mean.
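
The symmetry, and the idea that most of the data sits close to the mean, can be checked with a short sketch. It again assumes a made-up normal distribution (mean 10.0, standard deviation 2.0), and `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# Hypothetical distribution, chosen only for illustration.
dist = NormalDist(mu=10.0, sigma=2.0)

# Symmetry: the curve has the same height at equal distances
# below and above the mean.
left = dist.pdf(dist.mean - 3.0)
right = dist.pdf(dist.mean + 3.0)
print(f"height below: {left:.6f}, height above: {right:.6f}")

def share_within(k):
    """Expected fraction of measurements within k standard deviations of the mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

For a normal distribution these shares come out to roughly 68%, 95%, and 99.7%, so extreme values are possible but rare.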
This is what we're talking about when we say variation or variability
in the data. We have a technical term for it, which is the variance. It
describes how wide this distribution is, and its square root is the
standard deviation, which we denote by sigma. We have a mean and a
variance or standard deviation for this distribution. Suppose we had
another distribution that had the same mean but was narrower,
meaning that seeing values farther away from the mean is even less
likely. So, this same value here is even lower on the vertical
axis, and the chance of seeing the mean value here, even though it's the
same mean value, is even higher. It's way up here on the
vertical axis. So, this distribution, being narrower, is going to have a
smaller variance or a smaller standard deviation. It has less noise
than the wider, blue distribution. That's what we mean when we talk
about variation in your data.
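
As a rough sketch of the comparison between the two curves, the code below builds a wider and a narrower normal distribution with the same mean. The specific numbers (mean 10.0, standard deviations 2.0 and 1.0) and the names `wide` and `narrow` are assumptions for illustration only.

```python
import math
from statistics import NormalDist

mean = 10.0                               # same mean for both (hypothetical)
wide = NormalDist(mu=mean, sigma=2.0)     # the wider distribution
narrow = NormalDist(mu=mean, sigma=1.0)   # the narrower distribution

# The standard deviation (sigma) is the square root of the variance.
print(f"wide:   variance {wide.variance:.2f}, sigma {math.sqrt(wide.variance):.2f}")
print(f"narrow: variance {narrow.variance:.2f}, sigma {math.sqrt(narrow.variance):.2f}")

# Height of each curve at the shared mean and at a value far below it.
peak_wide, peak_narrow = wide.pdf(mean), narrow.pdf(mean)
tail_wide, tail_narrow = wide.pdf(mean - 4.0), narrow.pdf(mean - 4.0)

print(f"at the mean:    wide {peak_wide:.4f}, narrow {peak_narrow:.4f}")
print(f"far below mean: wide {tail_wide:.6f}, narrow {tail_narrow:.6f}")
```

The narrower distribution is higher at the mean and lower far from it, which is what a smaller variance looks like on the plot.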