HW5 due tomorrow
HW6 out tonight

Review
In defining events, we noted that we often do not care about a
specific outcome to a random experiment but whether or not the
outcome is part of a special set of outcomes, which we called
events. For example, when flipping a fair coin 100 times, we may
care about whether or not the number of heads and tails is the same.
So we define E as the event that we get 50 heads and compute Pr[E] =
C(100, 50) / 2^100.
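As a quick numerical check, this probability is easy to evaluate (a Python sketch; math.comb is the standard-library binomial coefficient):

```python
from math import comb

# Pr[E] = C(100, 50) / 2^100: the chance of exactly 50 heads in 100 fair flips
pr_E = comb(100, 50) / 2**100
print(pr_E)  # about 0.08, so equal heads and tails is fairly unlikely
```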

Random Variables
Sometimes, what we care about is a numerical value of an outcome.
For example, if we receive \$1 for each heads in 100 flips of a fair
coin and lose \$1 for each tails, we care about how much total money
we earn or lose. For any particular outcome ω, this is just
the number of heads in ω minus the number of tails. We can
compute the amount of money we win or lose for every one of the 2^100
outcomes. This is a "random variable."

A random variable is a function that assigns a value to each sample
point. More formally, a random variable X is a function from
Ω, the sample space, to R, the set of real numbers. The value
of the random variable at sample point ω is denoted as
X(ω) (like any other function). (Note: a random variable is
neither random nor a variable, since it is a function. Why is it
called a random variable? I don't know, but since the outcome of an
experiment is random, the value of the random variable is a function
of a random outcome.)

Let's go back to coin flipping. Suppose I flip a fair coin once. Let
X be a random variable that is +1 if I get heads, -1 if I get tails.
What is X(ω) for each outcome ω? Well, X(h) = +1 and
X(t) = -1.

What if I flip a fair coin three times, where X is the amount I win
if I win \$1 for each heads, lose \$1 for each tails? Then
X(hhh) = 3    X(hth) = 1     X(thh) = 1     X(tth) = -1
X(hht) = 1    X(htt) = -1    X(tht) = -1    X(ttt) = -3

In general, if I flip a fair coin n times, then if X is the amount I
win, X(ω) = H(ω) - T(ω), where H(ω) is the
number of heads in ω and T(ω) is the number of tails in
ω. Notice that H and T are also random variables, since they
assign a real number to each sample point. Defining a random
variable in terms of simpler random variables is a very useful
procedure. (Notice a common theme with induction, counting,
probability, and now random variables? All involve reducing a hard
problem to simpler problems.)

We can actually note that H(ω) + T(ω) = n, since each
flip must be heads or tails. So we can further write
X(ω) = H(ω) - (n - H(ω))
= 2 H(ω) - n.
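We can check the identity X = 2H - n by brute force over all 2^n outcomes (a small Python sketch; strings of 'h'/'t' stand in for the outcomes):

```python
from itertools import product

n = 3
for omega in product("ht", repeat=n):
    H = omega.count("h")           # number of heads in omega
    T = omega.count("t")           # number of tails in omega
    X = H - T                      # winnings: +$1 per heads, -$1 per tails
    assert X == 2 * H - n          # the identity derived above
    print("".join(omega), X)
```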

Suppose that rather than handing back your exams individually in
section, we hand back a random exam to each person in lecture. Let
X be the number of students who get their own exams back. What
is X(ω) for each sample point ω?

First, let us determine the sample space. Each outcome is just a
permutation of the n students in class, {1, ..., n}. For example,
ω = (2, 3, ..., n, 1) corresponds to student i getting the
exam for student (i mod n) + 1. Since each outcome is a permutation,
there are n! outcomes, which we assume to have uniform probability.

Now let's define a series of simpler random variables. Let X_i be a
random variable that is 1 if the ith person gets his or her own
exam back, 0 otherwise. Such a 0/1-valued random variable is
called an "indicator random variable" and is a very common and
useful type of random variable. Then we have
X(ω) = X_1(ω) + ... + X_n(ω).

As a concrete example, suppose n = 3. Then the outcomes are
(1,2,3)  (1,3,2)  (2,1,3)  (2,3,1)  (3,1,2)  (3,2,1)
Then the values of the X_i are
1        1        0        0        0        0       X_1
1        0        0        0        0        1       X_2
1        0        1        0        0        0       X_3
and the values of X are
3        1        1        0        0        1        X.
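This table is easy to reproduce by enumerating the permutations (a hypothetical sketch; outcome ω is a tuple whose ith entry is the exam that student i receives):

```python
from itertools import permutations

n = 3
for omega in permutations(range(1, n + 1)):
    # indicator X_i = 1 iff student i gets his or her own exam back
    X_i = [1 if omega[i - 1] == i else 0 for i in range(1, n + 1)]
    X = sum(X_i)                   # X = X_1 + ... + X_n
    print(omega, X_i, X)
```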

We can use the same procedure in the case of coin flipping. Here, X
is the total amount won in n flips. Let X_i be the amount won in the
ith flip, +1 if it is heads, -1 if it is tails. Then
X(ω) = X_1(ω) + ... + X_n(ω).
Then in the case of n = 3, we have
hhh   hht   hth   htt   thh   tht   tth   ttt
1     1     1     1    -1    -1    -1    -1           X_1
1     1    -1    -1     1     1    -1    -1           X_2
1    -1     1    -1     1    -1     1    -1           X_3
3     1     1    -1     1    -1    -1    -3            X
as before.

Distributions
As with events, we often don't care about the value of a random
variable at each outcome. Rather, we care about the probability that
the random variable takes any particular value. In fact, we can
define events in terms of random variables. We define "X = a" to be
the event that the random variable X takes on value a. More
formally,
X = a ≡ {ω : ω ∈ Ω ∧ X(ω) = a},
i.e. X = a is the set of outcomes ω for which X(ω) = a.

In the above example with three coin flips, we have
(X = 3) = {hhh}
(X = 1) = {hht, hth, thh}
(X = -1) = {htt, tht, tth}
(X = -3) = {ttt}.

Now since each X = a is an event, we can compute the probability
Pr[X = a]. With the coin flips, we have
Pr[X = 3] = 1/8
Pr[X = 1] = 3/8
Pr[X = -1] = 3/8
Pr[X = -3] = 1/8
This set of probabilities is called the "distribution" of the random
variable X.
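Tallying this distribution mechanically (a sketch using exact fractions so no rounding creeps in):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n = 3
dist = Counter()
for omega in product("ht", repeat=n):
    X = omega.count("h") - omega.count("t")
    dist[X] += Fraction(1, 2**n)   # uniform: each outcome has probability 1/2^n
for a in sorted(dist, reverse=True):
    print(f"Pr[X = {a}] = {dist[a]}")
```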

We can also draw a graph to depict the distribution.
Pr[X=a]
    |
3/8 |        *     *
    |        *     *
1/8 |  *     *     *     *
    +--+--+--+--+--+--+--+--
      -3 -2 -1  0  1  2  3   a

Note that since X is a function from Ω to R, each ω has
exactly one value a such that X(ω) = a, so ω is in exactly
one of the events X = a. Thus, the events X = a partition the sample
space. This means that
(1) (X = a_1 ∩ X = a_2) = ∅ if a_1 ≠ a_2
(2) ∪_{a ∈ A} (X = a) = Ω, where A is the set of all
possible values that X(ω) can take on.
These two facts imply that the sum of all probabilities in the
distribution of X is 1.
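These two facts can be verified directly for the three-flip example (a sketch; sets of outcomes stand in for the events X = a):

```python
from itertools import product

n = 3
Omega = list(product("ht", repeat=n))

def X(w):
    # winnings on outcome w: X = 2H - n
    return 2 * w.count("h") - n

A = {X(w) for w in Omega}                      # all values X takes on
events = {a: {w for w in Omega if X(w) == a} for a in A}

# (1) the events X = a_1 and X = a_2 are disjoint for a_1 != a_2
assert all(events[a].isdisjoint(events[b]) for a in A for b in A if a != b)
# (2) the union of the events X = a over a in A is all of Omega
assert set().union(*events.values()) == set(Omega)
# hence the probabilities in the distribution sum to 1
assert sum(len(events[a]) for a in A) == 2**n
```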

In the example of passing back exams, what is the distribution of X,
the number of students who get their own exam back? For n = 3, we
have
Pr[X = 3] = 1/6
Pr[X = 1] = 1/2
Pr[X = 0] = 1/3.
What about arbitrary n? Let's come back to that later.
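For small n, though, we can brute-force the distribution by enumerating all n! permutations (a sketch; this is only feasible for small n):

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def fixed_point_distribution(n):
    # Pr[X = k], where X = number of students who get their own exam back
    counts = Counter(sum(1 for i, x in enumerate(p, 1) if x == i)
                     for p in permutations(range(1, n + 1)))
    return {k: Fraction(c, factorial(n)) for k, c in sorted(counts.items())}

for k, pr in fixed_point_distribution(3).items():
    print(f"Pr[X = {k}] = {pr}")
```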

Let's take another look at the coin flipping example, but for
arbitrary n. The distribution of X, the amount of money won, seems
non-trivial. But since we know that X(ω) = 2 H(ω) - n,
let's first compute a distribution for H, the number of heads.

If we flip a fair coin n times, in how many outcomes are there
exactly i heads? This is just choosing i out of the n flips to be
heads, so there are C(n, i) outcomes. Then |H = i| = C(n, i), so
Pr[H = i] = C(n, i) / 2^n. This is the distribution of H, where i is
an integer 0 <= i <= n.
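Counting outcomes directly confirms this formula (a brute-force sketch for n = 5):

```python
from fractions import Fraction
from itertools import product
from math import comb

n = 5
for i in range(n + 1):
    count = sum(1 for w in product("ht", repeat=n) if w.count("h") == i)
    assert count == comb(n, i)     # |H = i| = C(n, i)
    print(f"Pr[H = {i}] = {Fraction(count, 2**n)}")
```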

Here is a graph representation of the distribution of H when n = 5:
Pr[H=i]
     |        *  *
9/32 |        *  *
     |        *  *
7/32 |        *  *
     |        *  *
5/32 |     *  *  *  *
     |     *  *  *  *
3/32 |     *  *  *  *
     |     *  *  *  *
1/32 |  *  *  *  *  *  *
     +--+--+--+--+--+--+--
        0  1  2  3  4  5   i
You can see the beginnings of a bell curve.

It follows that the distribution of X is Pr[X = i] = Pr[2H - n = i] =
Pr[H = (i+n)/2], where -n <= i <= n. If (i+n)/2 is not an integer
(i.e. i and n have different parity, e.g. i is odd and n is even),
then this probability is 0.
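Translating the distribution of H into the distribution of X, with the parity condition handled explicitly (a sketch, checked here against the three-flip table):

```python
from fractions import Fraction
from math import comb

def pr_X(i, n):
    # Pr[X = i] = Pr[H = (i + n)/2]; zero when i and n have different parity
    if (i + n) % 2 != 0:
        return Fraction(0)
    return Fraction(comb(n, (i + n) // 2), 2**n)

n = 3
for i in range(-n, n + 1):
    print(f"Pr[X = {i}] = {pr_X(i, n)}")
```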

Now suppose we are flipping a biased coin with probability p of heads.
Then what is the distribution of H, the number of heads?

First, how many outcomes are in H = i? As before, there are C(n, i).
But now we can't just divide by the size of the sample space, since
it is not uniform. Instead, we use the definition of the probability
of an event, that it is the sum of the probabilities of the outcomes
in the event. What is the probability of each outcome in H = i? We
already computed this in a previous lecture as p^i (1-p)^(n-i). So
Pr[H = i] = C(n, i) p^i (1-p)^(n-i),
where i is an integer 0 <= i <= n.
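The formula is easy to evaluate, and the probabilities over all i should sum to 1 (a sketch; n = 5 and p = 0.3 are illustrative parameters, not from the notes):

```python
from math import comb

def binomial_pmf(i, n, p):
    # Pr[H = i] = C(n, i) p^i (1 - p)^(n - i)
    return comb(n, i) * p**i * (1 - p)**(n - i)

n, p = 5, 0.3
for i in range(n + 1):
    print(i, binomial_pmf(i, n, p))
total = sum(binomial_pmf(i, n, p) for i in range(n + 1))
print(total)   # sums to 1, up to floating-point error
```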

This is known as the "binomial distribution" with parameters p and
n, where p is the probability of getting heads in any one flip and n
is the number of flips. We use the shorthand H ~ Bin(n, p) to denote
that H is a random variable with a binomial distribution with
parameters n and p.

The graph of a binomial distribution with parameters p and n is
bell-shaped, though it will be skewed in one direction if p is not
1/2. See the reader for an example.

The binomial distribution comes up in any experiment with n
independent trials, each with probability of success p. As another
example, suppose we are sending n packets over a network, where we
choose the path from source to destination randomly and
independently for each packet. Suppose that the probability that a
single packet reaches its destination is p. Then if X is the number
of packets that reach the destination, X ~ Bin(n, p), so Pr[X = i] =
C(n, i) p^i (1-p)^(n-i) for 0 <= i <= n.
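A simulation of the packet experiment should match the binomial formula (a hypothetical sketch; n = 4 packets and p = 0.9 are made-up parameters):

```python
import random
from math import comb

def simulate(n, p, trials=100_000):
    # empirical distribution of X = number of packets that arrive
    counts = [0] * (n + 1)
    for _ in range(trials):
        X = sum(1 for _ in range(n) if random.random() < p)  # one experiment
        counts[X] += 1
    return [c / trials for c in counts]

n, p = 4, 0.9
exact = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
empirical = simulate(n, p)
for i in range(n + 1):
    print(i, round(exact[i], 4), round(empirical[i], 4))
```

With enough trials the empirical frequencies settle close to the exact probabilities, which is one way to sanity-check a claimed distribution.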