PA3 out in the next day or so

Review
Recall the coin flipping game from last time. We flip a biased coin
that has probability p of heads n times; for each heads we win $1,
and for each tails we lose $1. We are interested in W, the total
amount of money we win.

In general, we determined that if we flip the coin n times, then
W(ω) = 2 H(ω) - n, where H(ω) is the number of
heads in ω. We abbreviate this statement as W = 2H - n.

We then demonstrated that the distribution of H is
Pr[H = i] = C(n, i) p^i (1-p)^(n-i)
for integer i, 0 <= i <= n. This is a binomial distribution with
parameters n and p, denoted by H ~ Bin(n, p).

Once we computed the distribution of H, we computed the distribution
of W = 2H - n. We have
Pr[W = j] = Pr[2H - n = j]
= Pr[H = (j+n)/2]
= C(n, (j+n)/2) p^[(j+n)/2] (1-p)^[n-(j+n)/2]
for integer (j+n)/2, 0 <= (j+n)/2 <= n. Solving for j, we get -n
<= j <= n as the range of values W can take on, with the caveat
that W takes on only even values if n is even and only odd values if
n is odd, so that (j+n)/2 is an integer.
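
As a quick sanity check on the formula above, here is a minimal Python sketch that tabulates Pr[W = j] by looping over the number of heads. The function name winnings_distribution and the bias p = 1/3 are illustrative choices, not part of the lecture.

```python
from fractions import Fraction
from math import comb

def winnings_distribution(n, p):
    """Return {j: Pr[W = j]} for W = 2H - n with H ~ Bin(n, p)."""
    dist = {}
    for i in range(n + 1):          # i = number of heads
        j = 2 * i - n               # corresponding winnings
        dist[j] = comb(n, i) * p**i * (1 - p)**(n - i)
    return dist

p = Fraction(1, 3)                  # illustrative bias (assumption)
dist3 = winnings_distribution(3, p)
dist4 = winnings_distribution(4, p)
print(sorted(dist3))                # values W can take when n = 3
print(sorted(dist4))                # values W can take when n = 4
print(sum(dist3.values()), sum(dist4.values()))
```

For n = 3 the support contains only odd values and for n = 4 only even values, and in both cases the probabilities sum to 1.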

Recall the exam example from last time. We had n students, each of
whom receives a random exam back. We were interested in how many
students get their own exam back. Calling this random variable X, we
then defined X_i to be an indicator random variable that is 1 if the
ith student gets his or her own exam back and 0 otherwise. Then X =
X_1 + ... + X_n.

We defined the expected value of a random variable X to
be
E(X) = ∑_{ω ∈ Ω} X(ω) Pr[ω].
or equivalently
E(X) = ∑_{a ∈ A} a * Pr[X = a],
where A is the set of all values that X can take on.

We determined that in the coin flipping example, E(W) = 6p - 3 when
n = 3.

In the example of passing back exams, for n = 3, we had
E(X) = 3 Pr[X=3] + 1 Pr[X=1] + 0 Pr[X=0]
= 3/6 + 1/2
= 1.
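
The two definitions of expectation can be compared on this example with a short Python sketch. It assumes each of the 3! ways of handing back exams is equally likely; the names matches, e_by_outcome, and e_by_value are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

n = 3
outcomes = list(permutations(range(n)))   # perm[i] = exam given to student i
pr_outcome = Fraction(1, len(outcomes))   # uniform: 1/6 each

def matches(perm):
    """X(omega): number of students who receive their own exam."""
    return sum(1 for student, exam in enumerate(perm) if student == exam)

# First definition: sum over outcomes of X(omega) * Pr[omega].
e_by_outcome = sum(matches(w) * pr_outcome for w in outcomes)

# Second definition: sum over values a of a * Pr[X = a].
counts = Counter(matches(w) for w in outcomes)
dist = {a: Fraction(c, len(outcomes)) for a, c in counts.items()}
e_by_value = sum(a * pr for a, pr in dist.items())

print(dist)
print(e_by_outcome, e_by_value)
```

Both sums come out to 1, and the computed distribution has Pr[X=3] = 1/6, Pr[X=1] = 1/2, and Pr[X=0] = 1/3.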

Suppose we roll a fair die. We calculated the expected value of N, the
number that shows, as
Pr[N=i] = 1/6 for 1 <= i <= 6, and
E(N) = 1 Pr[N=1] + 2 Pr[N=2] + ... + 6 Pr[N=6]
= 1 * 1/6 + 2 * 1/6 + ... + 6 * 1/6
= 7/2.

We then computed in a tedious manner that if we roll two dice, the
expected value of their sum S is E(S) = 7.

As a final example, suppose we pick 100 Californians uniformly at
random. How many Democrats do we expect out of this group, given
that 44.5% of Californians are Democrats? Intuitively, we'd expect
44.5, but how can we arrive at that without computing a large
distribution?

Linearity of Expectation
Suppose we have two random variables X and Y. Let Z = X + Y. What is
E(Z)? We have, from the first definition of expectation,
E(Z) = ∑_{ω ∈ Ω} Z(ω) Pr[ω]
= ∑_{ω} (X(ω) + Y(ω)) Pr[ω]
= ∑_{ω} X(ω) Pr[ω] +
∑_{ω} Y(ω) Pr[ω]
= E(X) + E(Y).
Thus, E(X+Y) = E(X) + E(Y). We can similarly show that E(cX) =
cE(X), where c is a constant. These two facts are known as
"linearity of expectation."

Linearity of expectation is a powerful tool for computing
expectations. We have already seen examples of defining a random
variable in terms of other random variables, which allows us to use
linearity of expectation.

Let's go back to the example of rolling two dice. Let N_1 be the
value of the first die, N_2 the value of the second die, and S = N_1
+ N_2 the sum. Then we compute E(N_1) = E(N_2) = 7/2. Then E(S) =
E(N_1) + E(N_2) = 7, as before. This computation, however, is much
simpler than using the distribution of S.
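
To see both routes side by side, the following Python sketch computes E(S) by summing over all 36 equally likely outcomes and again by linearity. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
e_one_die = sum(Fraction(i, 6) for i in faces)   # E(N_1) = E(N_2)

# Tedious route: sum (a + b) * 1/36 over all 36 outcomes.
e_sum_direct = sum(Fraction(a + b, 36) for a, b in product(faces, repeat=2))

# Linearity route: E(S) = E(N_1) + E(N_2).
e_sum_linear = e_one_die + e_one_die

print(e_one_die, e_sum_direct, e_sum_linear)
```

The direct sum touches 36 outcomes, while the linearity route needs only the 6 terms for a single die.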

In the exam example, let us compute the distribution of X_i, which
is 1 if the ith student gets his or her own exam back and 0
otherwise. There are n choices of exam, only one of which is a
match, so
Pr[X_i=1] = 1/n
Pr[X_i=0] = 1 - 1/n.
What is E(X_i)? It is
E(X_i) = Pr[X_i=1] = 1/n.

Note that in general for an indicator random variable Y, E(Y) =
Pr[Y=1].

Now let us compute E(X), where X is the total number of students
who get their own exam back. Since X = X_1 + ... + X_n, we have
E(X) = E(X_1) + ... + E(X_n)
= 1/n + ... + 1/n
= 1.
This matches what we get when n = 3. Notice that the expected number
of students who get their own exam back is always 1, regardless of
n! This is quite surprising.
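
The claim that E(X) = 1 for every n can be checked by brute force for small n. This is a minimal Python sketch that enumerates all n! equally likely ways of returning the exams; expected_matches is a hypothetical helper name, and the enumeration is only practical for small n.

```python
from fractions import Fraction
from itertools import permutations

def expected_matches(n):
    """E(X) by enumeration: average number of fixed points over all n! orders."""
    perms = list(permutations(range(n)))
    total = sum(sum(1 for i, e in enumerate(perm) if i == e) for perm in perms)
    return Fraction(total, len(perms))

for n in range(1, 7):
    print(n, expected_matches(n))
```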

Let us proceed in the same manner to compute E(H), the expected
number of heads when flipping a biased coin n times. Let H_i be an
indicator random variable that is 1 if the ith flip is heads, 0
otherwise. Then Pr[H_i = 1] = p, so E(H_i) = p. Then the number of heads
is just
H = H_1 + ... + H_n,
so
E(H) = E(H_1) + ... + E(H_n)
= p + ... + p
= np.
Again, this is much simpler than using the distribution of H.

In general, for a random variable X ~ Bin(n, p), we have E(X) = np.
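
For comparison, E(X) can also be computed directly from the binomial distribution as the sum of i * Pr[X = i]. The sketch below does this with exact fractions; the helper name binomial_mean and the value p = 2/5 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def binomial_mean(n, p):
    """Sum of i * C(n, i) p^i (1-p)^(n-i) over i = 0..n."""
    return sum(i * comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1))

p = Fraction(2, 5)                  # illustrative bias (assumption)
for n in (1, 5, 10):
    print(n, binomial_mean(n, p), n * p)
```

The two printed values agree for each n tried here.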

In our coin flipping game, our winnings W were given by W = 2H - n.
Thus,
E(W) = 2 E(H) - n
= 2np - n
= n (2p - 1).
(Note that the expectation of a constant E(c) is just c, so E(n) =
n.) Plugging in n = 3, we get E(W) = 6p - 3, as before. Again, this
method is much easier than using the distribution of W.
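
The n = 3 case can be checked against the distribution of W for a few values of p. This is a small Python sketch; expected_winnings is a hypothetical helper and the tested values of p are arbitrary.

```python
from fractions import Fraction
from math import comb

def expected_winnings(n, p):
    """E(W) from the distribution of W = 2H - n, with H ~ Bin(n, p)."""
    return sum((2 * i - n) * comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n + 1))

for p in (Fraction(1, 2), Fraction(2, 3), Fraction(1, 4)):
    print(p, expected_winnings(3, p), 6 * p - 3)
```

A fair coin gives E(W) = 0, and the sign of E(W) follows whether p is above or below 1/2.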

Finally, how many Democrats do we expect in a group of 100 random
Californians? Let D be the number of Democrats, and let D_i be an
indicator random variable that is 1 if the ith person is a Democrat
and 0 otherwise. Then Pr[D_i=1] = 0.445, so E(D_i) = 0.445. Since
D = D_1 + ... + D_100, we get E(D) = 100 * 0.445 = 44.5, as we
expected.

We could have also noticed that D ~ Bin(100, 0.445) and immediately
concluded that E(D) = 100 * 0.445 = 44.5.
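
A simulation gives a rough empirical check of this value. The sketch below assumes each of the 100 people is independently a Democrat with probability 0.445, matching the binomial model; the function name, the seed, and the number of trials are arbitrary choices.

```python
import random

def sample_democrat_count(rng, group_size=100, p_democrat=0.445):
    """Draw one group and count how many members are Democrats."""
    return sum(1 for _ in range(group_size) if rng.random() < p_democrat)

rng = random.Random(0)              # fixed seed so the run is repeatable
trials = 2000
average = sum(sample_democrat_count(rng) for _ in range(trials)) / trials
print(average)
```

The printed average should land close to 44.5, though any single run is only an estimate.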

Geometric Distribution
We have already seen one important distribution, the binomial
distribution. We will look at two more important discrete
distributions.

Suppose I take a written driver's license test. Since I don't study,
I only have a probability p of passing the test, mostly by getting
lucky. Let T be the number of times I have to take the test before I
pass. (Assume I can take it as many times as necessary, perhaps by
paying a non-negligible fee.) What is the distribution of T?

(Fun fact: A South Korean woman took the test 950 times before
passing.)
[Note: By "before passing," we mean that she passed on the 950th
attempt, not the 951st. We may use this phrase again.]

Before we determine the distribution of T, we should figure out what
the sample space of the experiment is. An outcome consists of a
series of 0 or more failures followed by a success, since I keep
retaking the test until I pass it. Thus, if f is a failure and c is
passing, we get the outcomes
Ω = {c, fc, ffc, fffc, ffffc, ...}.
How many outcomes are there? There is no upper bound on how many
times I will have to take the test, since I can get very unlucky and
keep failing. So the number of outcomes is infinite!

What is the probability of each outcome? Well, let's assume that the
result of a test is independent each time I take it. (I really
haven't studied, so I'm just guessing blindly each time.) Then the
probability of passing a test is p and of failing is 1-p, so we get
Pr[c] = p, Pr[fc] = (1-p)p, Pr[ffc] = (1-p)^2 p, ...
Do these probabilities add to 1? Well, their sum is
∑_{ω ∈ Ω} Pr[ω]
= ∑_{i=0}^∞ (1-p)^i p
= p ∑_{i=0}^∞ (1-p)^i
= p * 1/(1-(1-p))   [sum of geom. series r^i is 1/(1-r) if -1 < r < 1]
= 1.
So this probability assignment is valid.
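
The convergence can also be observed numerically. The Python sketch below adds up the probabilities of the first k outcomes (c, fc, ffc, ...) and compares each partial sum with 1 - (1-p)^k, the probability of passing within k attempts. The helper name partial_sum and the value p = 1/4 are illustrative assumptions.

```python
from fractions import Fraction

def partial_sum(p, k):
    """Sum of Pr[omega] over the first k outcomes: i failures then a pass, i < k."""
    return sum((1 - p)**i * p for i in range(k))

p = Fraction(1, 4)                  # illustrative pass probability (assumption)
for k in (1, 2, 5, 20):
    s = partial_sum(p, k)
    print(k, float(s), s == 1 - (1 - p)**k)
```

The partial sums increase toward 1 as k grows, since (1-p)^k shrinks to 0 whenever 0 < p <= 1.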

We continue with this example next time.