MT2 stats: μ ≈ 77, σ ≈ 16
Updated MT2 solutions posted on Piazza
Exams can be picked up from Soda front office
PA3 due Thursday
HW8 due next Wednesday
Final exam next Thursday

Review
Recall that a probability space consists of the following:
(1) A random experiment.
(2) The sample space (set of possible outcomes).
(3) The probability of each possible outcome of the experiment.

Further recall that the probabilities must satisfy the following:
(1) ∀ω∈Ω . 0 <= Pr[ω] <= 1
(2) ∑_{ω∈Ω} Pr[ω] = 1

An event is a subset of the sample space, i.e. a set of outcomes
from the sample space. The probability of an event E is the sum of
the probabilities of the outcomes in E:
Pr[E] = ∑_{ω∈E} Pr[ω].
In the case of a uniform distribution, this simplifies to
Pr[E] = |E|/|Ω|.
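These definitions are easy to play with in code. A minimal Python sketch (the six-sided die is our own example, not from the notes):

```python
from fractions import Fraction

# Uniform probability space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

def pr(event):
    """Pr[E] = |E| / |Omega| for a uniform distribution."""
    return Fraction(len(event & omega), len(omega))

# The event "the roll is even" has probability 3/6 = 1/2.
assert pr({2, 4, 6}) == Fraction(1, 2)
```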

A random variable is a function from the sample space Ω to the
reals.

Continuous Probability
Suppose that SPECTRE has captured James Bond. They handcuff him,
knock him out, and stuff him in the back of a plane to be
transported from their hideout to their secret underground lair 1000
miles away. When the plane lands at the lair, they discover that
Bond has apparently woken up mid-flight, slipped out of his
handcuffs, killed the flight attendant, and parachuted out of the
plane, to land in the desert somewhere along the 1000-mile flight
path.

Since SPECTRE knows nothing about when Bond escaped, his escape
point is equally likely to be anywhere along this 1000-mile segment.
What is the probability that he is at each point along this segment?

In the discrete version of this problem, we would assume that he
could be at any of a finite number of positions, say one per mile.
The sample space would be
Ω = {1, 2, ..., 1000},
with
Pr[ω] = 1/|Ω| = 1/1000
for each sample point ω ∈ Ω.

If we try to follow the same procedure here, we would define the
sample space as the set of real numbers
Ω = [0, 1000].
There are infinitely many possibilities, so we get
Pr[ω] = 0
for each sample point! And this is in fact the case; the probability
that Bond lands at any exact position is 0.

How do we make sense of this sample space? Rather than working with
outcomes, we work directly with events. Recall that an event E is a
subset of the sample space, and that in a uniform sample space, the
probability of E is the size of E divided by the size of the sample
space:
Pr[E] = |E|/|Ω|.
By analogy here, we use intervals as events. For example, the
interval [0, 50] is the event that Bond lands within 50 miles of
the hideout. What is the probability of this event? Again, it should
be the "size" of this interval divided by the "size" of the sample
space. Exactly what "size" means requires measure theory and is
beyond the scope of this class, but it should be intuitively clear
that the size we need here is the length of an interval. (Later, we
will see another definition of "size" for infinite sets that is not
applicable here.) Thus,
Pr[[0, 50]] = (length of [0, 50]) / (length of [0, 1000])
= 1/20.

Suppose that SPECTRE sends out its henchmen in special dune buggies
(with frickin' lasers!) from each base. However, these buggies have
only a 50-mile range. What is the probability that Bond landed in
range of the buggies?

Let E be the event that Bond landed within 50 miles of either base.
Then
E = [0, 50] ∪ [950, 1000],
so
Pr[E] = Pr[[0, 50]] + Pr[[950, 1000]]
(since the intervals are disjoint)
= 1/10.

Suppose that one of the buggies finds Bond. In running away, he
shoots over his shoulder at the buggy. Suppose that he hits a random
spot on the buggy, which is 5 feet long and 4 feet high. The gas
tank presents a circular target with a radius of 1 foot. What is the
probability that Bond hits the gas tank, causing the buggy to
explode and allowing him to escape?

By analogy with the 1D case, the sample space here is the rectangle
Ω = [0, 5] × [0, 4],
and the "size" of Ω is its area, 20 square feet. Then the
"size" of the event G, that Bond hits the gas tank, is the area of
the gas tank, π square feet. So the probability that he hits the
gas tank is
Pr[G] = π/20.
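This area ratio admits a quick Monte Carlo sanity check. A Python sketch of our own (the notes don't say where the tank sits on the buggy, so the center position below is an assumption; the probability depends only on the areas):

```python
import math
import random

random.seed(0)  # deterministic run

# Sample space: the 5 ft x 4 ft side of the buggy.
# Assumed tank center (2.5, 2.0), radius 1 ft; only the areas matter.
CX, CY, R = 2.5, 2.0, 1.0

def estimate_hit_prob(trials=200_000):
    """Estimate Pr[G] by sampling uniform points in the rectangle."""
    hits = 0
    for _ in range(trials):
        x = random.uniform(0, 5)
        y = random.uniform(0, 4)
        if (x - CX) ** 2 + (y - CY) ** 2 <= R ** 2:
            hits += 1
    return hits / trials

# The estimate should be close to pi/20, about 0.157.
assert abs(estimate_hit_prob() - math.pi / 20) < 0.01
```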

Continuous Random Variables
In discrete probability, we defined random variables as functions
from the sample space Ω to R. In reality, the range of a
discrete random variable must be a finite or countably infinite
subset of R (we will define what countably infinite means later). It
is impossible for the range to be a continuous subset of R.

In continuous probability, however, the range of a random variable
may be a continuous subset of R. For example, if we define a random
variable X corresponding to Bond's landing position, then X takes on
any value in the range [0, 1000], with uniform probability. Thus, we
have Pr[X = a] = 0 for all possible values of a.

As with the events above, we can instead use intervals
Pr[a < X <= b]
that have meaningful probabilities. (Note that it doesn't matter if
we include the endpoints or not, since they have 0 probability.) If
we know the value of
Pr[X <= x]
for all values of x, then we can compute the probability of an
interval as
Pr[a < X <= b] = Pr[X <= b] - Pr[X <= a].
Thus, if we have a function
G(x) = Pr[X <= x],
we have all the information we need about a continuous random
variable. This is called the "cumulative distribution function",
abbreviated "cdf". Note that it is necessary to define this function
for all values of x ∈ R.

In the case of Bond's position, Pr[X <= x] is the probability of the
interval [0, x] when x is in the range [0, 1000]. Thus, the cdf is
       { 0       if x < 0
G(x) = { x/1000  if 0 <= x <= 1000
       { 1       if x > 1000.
Now what is the probability that Bond is within 50 miles of the center?
We have
Pr[450 < X <= 550] = Pr[X <= 550] - Pr[X <= 450]
= G(550) - G(450)
= 550/1000 - 450/1000
= 1/10.
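The cdf and the interval computation translate directly into code. A minimal Python sketch (the function names are ours):

```python
def G(x):
    """cdf of X, Bond's landing position, uniform on [0, 1000]."""
    if x < 0:
        return 0.0
    if x > 1000:
        return 1.0
    return x / 1000

def interval_prob(a, b):
    """Pr[a < X <= b] = G(b) - G(a)."""
    return G(b) - G(a)

# Within 50 miles of the center: Pr[450 < X <= 550] = 1/10.
assert abs(interval_prob(450, 550) - 0.1) < 1e-12
```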

As a more complex example, suppose that Bond hits the gas tank when
shooting at the buggy at some uniformly random location on the gas
tank. Let Y be the distance (in feet) of where he hits from the center
of the tank. What is the cdf of Y?

The area of the tank is π (we will leave off units, but know that we
are working in feet and square feet). The probability that he hits
less than y from the center is the area of the circle of radius y
divided by the total area, or πy^2/(π·1^2) = y^2. Thus, the
cdf of Y is
       { 0    if y < 0
F(y) = { y^2  if 0 <= y <= 1
       { 1    if y > 1.

Now we determine the probability of any interval for Y. For example,
Pr[0.5 < Y <= 0.6] = Pr[Y <= 0.6] - Pr[Y <= 0.5]
= F(0.6) - F(0.5)
= 0.36 - 0.25
= 0.11.
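As a Python sketch of our own, the cdf F makes these interval probabilities one-liners:

```python
def F(y):
    """cdf of Y, the distance of Bond's hit from the tank's center."""
    if y < 0:
        return 0.0
    if y > 1:
        return 1.0
    return y ** 2

# Pr[0.5 < Y <= 0.6] = F(0.6) - F(0.5) = 0.36 - 0.25 = 0.11
assert abs((F(0.6) - F(0.5)) - 0.11) < 1e-12
```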

While the cdf lets us do probability calculations, it does not give
us a good idea about where the value of the random variable is more
likely to be. For example, Y above is not uniformly distributed but
is more likely to be further away from the center, since there is
more area to hit there. This is hard to tell from the cdf.

What we'd like to know is the probability in some tiny interval
around a particular y:
Pr[y < Y <= y + δ].
If we compare the value of this probability for different y, we get
an idea of where the value of the random variable Y is more likely
to be located.

Of course, the probability above depends on how small of an interval
we use, i.e. the size of δ. To remove this dependency, let's
look at the ratio
Pr[y < Y <= y + δ] / δ.
This is the probability per unit length near y, or the "probability
density" at y. To get an exact expression, we take the limit
lim_{δ->0} Pr[y < Y <= y + δ] / δ
= lim_{δ->0} (Pr[Y <= y + δ] - Pr[Y <= y]) / δ
= lim_{δ->0} (F(y + δ) - F(y)) / δ
= d/dy F(y),
by the definition of the derivative. This leaves us with a
function
f(y) = d/dy F(y),
where f(y) is the "probability density function" or "pdf". Of
course, the fundamental theorem of calculus also tells us how to
undo this operation to get from the pdf to the cdf, by integrating:
F(y) = ∫_{-∞}^y f(x) dx.
Thus, the pdf and cdf contain the same information, and we can
obtain probabilities for intervals by integrating:
Pr[a < Y <= b] = F(b) - F(a)
= ∫_a^b f(x) dx.
We can make sense of this by discretizing: divide the interval into
a large number n of smaller intervals, each of size
δ = (b - a) / n.
Since
Pr[y < Y <= y + δ] ≈ f(y) δ,
we get
Pr[a < Y <= b]
≈ ∑_{i=0}^{n-1} Pr[a + iδ < Y <= a + (i+1)δ]
≈ ∑_{i=0}^{n-1} f(a + iδ) δ.
Taking the limit of n -> ∞ gives us the integral above.
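This discretization can be checked numerically. A Python sketch of our own, using the density f(y) = 2y obtained by differentiating the cdf F(y) = y^2 from the gas-tank example:

```python
def f(y):
    """pdf of Y: f(y) = 2y on [0, 1], 0 elsewhere."""
    return 2 * y if 0 <= y <= 1 else 0.0

def riemann_prob(a, b, n=100_000):
    """Approximate Pr[a < Y <= b] as a sum of f(y) * delta terms."""
    delta = (b - a) / n
    return sum(f(a + i * delta) * delta for i in range(n))

# Exact value: F(0.6) - F(0.5) = 0.36 - 0.25 = 0.11.
assert abs(riemann_prob(0.5, 0.6) - 0.11) < 1e-4
```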

As an example, what is the pdf of X, Bond's position? It is
                   { 0       if x < 0
g(x) = d/dx G(x) = { 1/1000  if 0 <= x <= 1000
                   { 0       if x > 1000.
The probability density is 0 outside the range [0, 1000] and uniform
within that range, as we expect. Then we can use the pdf to compute
Pr[450 < X <= 550] = ∫_{450}^{550} g(x) dx
= ∫_{450}^{550} 1/1000 dx
= x/1000 |_{450}^{550}
= 550/1000 - 450/1000
= 1/10.

The pdf of Y, the distance from the center of the gas tank to the
location of Bond's shot, is
                   { 0   if y < 0
f(y) = d/dy F(y) = { 2y  if 0 <= y <= 1
                   { 0   if y > 1.
So the density is higher further away from the center, as expected.

Since the pdf and cdf give us the same information, we will use
whichever is more convenient. Often, the cdf is easier to determine
directly. However, the pdf is what allows us to compute expectations
and variances.

Continuous Expectation and Variance
In the discrete case, the expectation of a random variable Z is
E(Z) = ∑_{a ∈ A} a * Pr[Z = a],
where A is the set of all values that Z can take on.

Continuous random variables can take on any real number. However, if
we discretize a continuous random variable Y, we get something like
E(Y) ≈ ∑_{b ∈ B} b Pr[b < Y <= b + δ]
= ∑_{b ∈ B} b f(b) δ,
where B is a countably infinite set of values that are δ
apart. Then if we undo the discretization, we get
E(Y) = ∫_{-∞}^{+∞} x f(x) dx.
This is the expectation of a continuous random variable.

The variance of a continuous random variable is exactly the same as
for a discrete random variable:
Var(Y) = E(Y^2) - E(Y)^2,
where E(Y) is as above and
E(Y^2) = ∫_{-∞}^{+∞} x^2 f(x) dx.

As an example, what is E(X), Bond's expected position? It is
E(X) = ∫_{-∞}^{+∞} x g(x) dx
= ∫_{0}^{1000} x/1000 dx
= x^2/2000 |_0^{1000}
= 1000^2/2000
= 1000/2 = 500,
as we would expect.

Then
E(X^2) = ∫_{-∞}^{+∞} x^2 g(x) dx
= ∫_{0}^{1000} x^2/1000 dx
= x^3/3000 |_0^{1000}
= 1000^3/3000
= 1000^2/3.
Then
Var(X) = E(X^2) - E(X)^2
= 1000^2/3 - 1000^2/4
= 1000^2/12.
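These integrals can be sanity-checked with a midpoint Riemann sum in Python (a sketch of our own):

```python
def g(x):
    """pdf of X, uniform on [0, 1000]."""
    return 1 / 1000 if 0 <= x <= 1000 else 0.0

def moment(k, n=100_000):
    """Approximate E(X^k) = integral of x^k g(x) dx (midpoint rule)."""
    delta = 1000 / n
    return sum(
        ((i + 0.5) * delta) ** k * g((i + 0.5) * delta) * delta
        for i in range(n)
    )

mean = moment(1)               # E(X) = 500
var = moment(2) - mean ** 2    # Var(X) = 1000^2 / 12
assert abs(mean - 500) < 1e-6
assert abs(var - 1000 ** 2 / 12) < 1e-2
```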

In general, for a random variable Z that is uniformly distributed in
the interval [0, d], we get
E(Z) = d/2
Var(Z) = d^2/12.
A random variable W that is uniformly distributed in the interval
[a, a+d] is just
W = Z + a,
so we get
E(W) = E(Z) + a
= a + d/2
Var(W) = Var(Z)
= d^2/12.
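A seeded Monte Carlo check of these formulas in Python (the choice a = 3, d = 10 is our own):

```python
import random

random.seed(1)  # deterministic run

def sample_stats(a, d, trials=200_000):
    """Empirical mean and variance of W ~ Uniform[a, a + d]."""
    xs = [random.uniform(a, a + d) for _ in range(trials)]
    m = sum(xs) / trials
    v = sum((x - m) ** 2 for x in xs) / trials
    return m, v

m, v = sample_stats(3.0, 10.0)
assert abs(m - (3 + 10 / 2)) < 0.1    # E(W) = a + d/2 = 8
assert abs(v - 10 ** 2 / 12) < 0.2    # Var(W) = d^2/12
```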