HW6 due tomorrow
MT2 next Tuesday
Same location and policies as MT1
Covers material through polling/LLN (Wednesday)

Review
We have already seen one important distribution, the binomial
distribution. A random variable X ~ Bin(n, p) has the distribution
Pr[X = i] = C(n, i) p^i (1-p)^(n-i)
for integer i, 0 <= i <= n. This distribution arises whenever we
have a fixed number of trials n, the trials are mutually
independent, the probability of success of any one trial is p, and
we are counting the number of successes.

We also computed the expectation of a binomial random variable using
indicator random variables:
X = X_1 + ... + X_n
X_i = { 1 if the ith trial is successful
{ 0 otherwise
E(X_i) = Pr[X_i = 1] = p
E(X) = E(X_1) + ... + E(X_n)
= np.
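As a quick numerical sanity check of E(X) = np, the sketch below sums i · Pr[X = i] directly from the definition and compares the result with np. The function names `binomial_pmf` and `binomial_mean` are illustrative, not from the notes; `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binomial_pmf(n: int, p: float, i: int) -> float:
    """Pr[X = i] = C(n, i) p^i (1-p)^(n-i) for X ~ Bin(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def binomial_mean(n: int, p: float) -> float:
    """E(X) computed from the definition: sum of i * Pr[X = i] over 0 <= i <= n."""
    return sum(i * binomial_pmf(n, p, i) for i in range(n + 1))

n, p = 20, 0.3
print(binomial_mean(n, p), n * p)  # the two values agree up to floating-point rounding
```

The indicator argument gives np with no summation at all; the brute-force sum is only a cross-check.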

Now we turn our attention to two more important discrete
distributions.

Geometric Distribution
Suppose I take a written driver's license test. Since I don't study,
I only have a probability p of passing the test, mostly by getting
lucky. Let T be the number of times I have to take the test before I
pass. (Assume I can take it as many times as necessary, perhaps by
paying a non-negligible fee.) What is the distribution of T?

(Fun fact: A South Korean woman took the test 950 times before
passing.)
[Note: By "before passing," we mean that she passed on the 950th
attempt, not the 951st. We use the phrase in this sense throughout.]

Before we determine the distribution of T, we should figure out what
the sample space of the experiment is. An outcome consists of a
series of 0 or more failures followed by a success, since I keep
retaking the test until I pass it. Thus, if f is a failure and c is
passing, we get the outcomes
Ω = {c, fc, ffc, fffc, ffffc, ...}.
How many outcomes are there? There is no upper bound on how many
times I will have to take the test, since I can get very unlucky and
keep failing. So the number of outcomes is infinite!

What is the probability of each outcome? Well, let's assume that the
result of a test is independent each time I take it. (I really
haven't studied, so I'm just guessing blindly each time.) Then the
probability of passing a test is p and of failing is 1-p, so we get
Pr[c] = p, Pr[fc] = (1-p)p, Pr[ffc] = (1-p)^2 p, ...
Do these probabilities add to 1? Well, their sum is
∑_{ω ∈ Ω} Pr[ω]
= ∑_{i=0}^∞ (1-p)^i p
= p ∑_{i=0}^∞ (1-p)^i
= p 1/(1-(1-p))
(sum of geom. series r^i is 1/(1-r) if -1 < r < 1; here r = 1-p, assuming 0 < p <= 1)
= 1.
So this probability assignment is valid.
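To watch the probabilities of c, fc, ffc, ... accumulate toward 1, this hypothetical helper adds up the probabilities of the first N outcomes. The finite sum equals 1 - (1-p)^N, so it approaches 1 as N grows (for 0 < p <= 1).

```python
def geometric_partial_sum(p: float, terms: int) -> float:
    """Sum of Pr[c], Pr[fc], Pr[ffc], ... over the first `terms` outcomes."""
    return sum((1 - p) ** i * p for i in range(terms))

for terms in (1, 10, 100):
    # Exact value of this finite sum: 1 - (1-p)^terms.
    print(terms, geometric_partial_sum(0.1, terms))
```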

Since the event T = i has only the single outcome f^{i-1}c, we get
Pr[T=1] = p, Pr[T=2] = (1-p)p, Pr[T=3] = (1-p)^2 p, ...
as the distribution of T, and the probabilities sum to 1, as
required for a random variable.

The distribution of T is known as a "geometric distribution" with
parameter p, T ~ Geom(p). This arises any time we have a sequence of
independent trials, each of which has probability p of success, and
we want to know when the first success occurs. (This is unlike the
binomial distribution, where we wanted to know how many successes occur
in a fixed number n of independent trials.)
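A minimal simulation of the defining experiment, assuming Python's `random` module as the source of independent trials: `sample_geometric` (a name chosen here) repeats a trial that succeeds with probability p and reports the index of the first success. The empirical frequency of T = 1 should be close to p, and that of T = 2 close to (1-p)p.

```python
import random

def sample_geometric(p: float, rng: random.Random) -> int:
    """Repeat independent trials (success w.p. p); return the index of the first success."""
    trials = 1
    while rng.random() >= p:  # rng.random() < p is a success, so >= p is a failure
        trials += 1
    return trials

p = 0.25
rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [sample_geometric(p, rng) for _ in range(100_000)]
freq1 = samples.count(1) / len(samples)
freq2 = samples.count(2) / len(samples)
print(freq1, p)            # empirical Pr[T = 1] vs p
print(freq2, (1 - p) * p)  # empirical Pr[T = 2] vs (1-p)p
```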

Now how many times can I expect to take the test before passing? We
want E(T). We get
E(T) = p + 2(1-p)p + 3(1-p)^2 p + ...
= ∑_{i=1}^∞ i(1-p)^{i-1} p.
This isn't a pure geometric series, so directly computing the sum is
harder.

Let's use another method. It turns out that for any random variable
X that only takes on values in N = {0, 1, 2, ...},
E(X) = Pr[X >= 1] + Pr[X >= 2] + Pr[X >= 3] + ...
= ∑_{i=1}^∞ Pr[X >= i].
Proof:
Let p_i = Pr[X = i]. Then by definition,
E(X) = 0 p_0 + 1 p_1 + 2 p_2 + 3 p_3 + 4 p_4 + ...
= p_1 +
(p_2 + p_2) +
(p_3 + p_3 + p_3) +
(p_4 + p_4 + p_4 + p_4) +
...
= (p_1 + p_2 + p_3 + p_4 + ...) +
(p_2 + p_3 + p_4 + ...) +
(p_3 + p_4 + ...) +
...      (row k sums the kth copy of each p_i from the previous step; regrouping is valid since all p_i >= 0)
= Pr[X >= 1] + Pr[X >= 2] + Pr[X >= 3] + Pr[X >= 4] + ...
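The identity can be checked exactly on small examples using Python's `fractions.Fraction`, which avoids rounding. This sketch assumes a distribution given as a dictionary from values in N to probabilities, with finite support; `mean_by_definition` and `mean_by_tail_sum` are illustrative names, and the `dict[int, Fraction]` hints need Python 3.9 or later.

```python
from fractions import Fraction

def mean_by_definition(pmf: dict[int, Fraction]) -> Fraction:
    """E(X) = sum over i of i * Pr[X = i]."""
    return sum(i * q for i, q in pmf.items())

def mean_by_tail_sum(pmf: dict[int, Fraction]) -> Fraction:
    """E(X) = sum_{i>=1} Pr[X >= i]; finite support, so stop at the largest value."""
    return sum(
        sum(q for j, q in pmf.items() if j >= i)  # Pr[X >= i]
        for i in range(1, max(pmf) + 1)
    )

die = {k: Fraction(1, 6) for k in range(1, 7)}           # fair six-sided die
coin = {0: Fraction(1, 2), 3: Fraction(1, 2)}             # 0 or 3, equally likely
print(mean_by_definition(die), mean_by_tail_sum(die))     # 7/2 7/2
print(mean_by_definition(coin), mean_by_tail_sum(coin))   # 3/2 3/2
```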

Now what is Pr[T >= i]? This is the probability that I fail the
first i-1 tests, so
Pr[T >= i] = (1-p)^(i-1).
Then
E(T) = ∑_{i=1}^∞ Pr[T >= i]
= ∑_{i=1}^∞ (1-p)^(i-1)
= ∑_{j=0}^∞ (1-p)^j                 (with j = i - 1)
= 1/(1-(1-p))                              (geometric series)
= 1/p.
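Both the original series for E(T) and the tail-sum series can be truncated and evaluated numerically. This is a rough sketch: the cutoff `terms` is an arbitrary choice made here, and it must be much larger than 1/p for the truncation error to be negligible.

```python
def geom_mean_direct(p: float, terms: int = 10_000) -> float:
    """Truncation of E(T) = sum_{i>=1} i (1-p)^(i-1) p."""
    return sum(i * (1 - p) ** (i - 1) * p for i in range(1, terms + 1))

def geom_mean_tail(p: float, terms: int = 10_000) -> float:
    """Truncation of E(T) = sum_{i>=1} Pr[T >= i] = sum_{i>=1} (1-p)^(i-1)."""
    return sum((1 - p) ** (i - 1) for i in range(1, terms + 1))

p = 0.2
print(geom_mean_direct(p), geom_mean_tail(p), 1 / p)  # all three should be about 5
```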

So I expect to take the test 1/p times before passing.

Here's another way to calculate E(T).
     E(T) = p + 2p(1-p) + 3p(1-p)^2 + 4p(1-p)^3 + ...
(1-p)E(T) =      p(1-p) + 2p(1-p)^2 + 3p(1-p)^3 + ...
    pE(T) = p +  p(1-p) +  p(1-p)^2 +  p(1-p)^3 + ...
          = 1
     E(T) = 1/p
In the second line, we multiplied E(T) by (1-p) and added some
whitespace to line up terms with the previous line. Then we
subtracted (1-p)E(T) from E(T) to get the third line (this is
legitimate because E(T) is finite, as the tail-sum computation shows).
The resulting right-hand side is the sum of the probabilities of the
events T = i, so it must be 1.

To summarize, for a random variable X ~ Geom(p), we've computed
(1) Pr[X = i] = (1-p)^(i-1) p
(2) Pr[X >= i] = (1-p)^(i-1)
(3) E(X) = 1/p.
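Formula (2) follows from (1) by summing the pmf from i onward. The sketch below checks this numerically with a truncated sum; `geom_pmf` and `geom_tail` are illustrative names, and the 500-term cutoff is an assumption that is ample for p = 0.3.

```python
def geom_pmf(p: float, i: int) -> float:
    """(1) Pr[X = i] = (1-p)^(i-1) p for i = 1, 2, 3, ..."""
    return (1 - p) ** (i - 1) * p

def geom_tail(p: float, i: int) -> float:
    """(2) Pr[X >= i] = (1-p)^(i-1)."""
    return (1 - p) ** (i - 1)

p = 0.3
for i in (1, 2, 5, 10):
    truncated = sum(geom_pmf(p, j) for j in range(i, i + 500))  # Pr[i <= X < i + 500]
    print(i, truncated, geom_tail(p, i))
```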

Other examples of geometrically distributed random variables are the
number of runs before a system fails, the number of shots that must
be taken before hitting a target, and the number of coin flips until
the first heads.

Coupon Collector Redux
Recall the coupon collector problem. I buy cereal boxes, each of
which contains a baseball card for one of the n Giants players, chosen
uniformly at random and independently of the other boxes. How many
boxes do I expect to buy before I get a Panda card?

Let P be the number of boxes I buy before I get the Panda. Then,
each time I buy a box, I have 1/n chance of getting the Panda, and
the boxes are independent. So P ~ Geom(1/n), and E(P) = n.

Now suppose I want the entire team. Let T be the number of boxes I
buy to get the entire team. It is tempting to define a separate random
variable for each player,
P = # of boxes to get the Panda
B = # of boxes to get the Beard
F = # of boxes to get the Freak
...
but T ≠ P + B + F + ... (Can you see why? If we just consider
these three players and it takes me 1 box to get the Panda, 2 to get
the Beard, 3 to get the Freak, then T = 3, but P + B + F = 1 + 2 + 3
= 6.) So we need another approach.

Let's instead define random variables P_i as the number of boxes it
takes to get a new player after I have collected i-1 distinct players. (In
the above example, P_1 = P_2 = P_3 = 1, so T = P_1 + P_2 + P_3.)
Then it is the case that T = P_1 + ... + P_n, and we can appeal to
linearity of expectation.

Now E(P_i) is not the same for all i. In particular, I always get a
new player in the first box, so Pr[P_1 = 1] = 1 and E(P_1) = 1. But
the second box can contain the same player as the first, so
Pr[P_2 = 1] < 1 (assuming n > 1).

Note, however, that I do have probability (n-1)/n of getting a new
player, and P_2 is the first occurrence of a new player. So P_2 ~
Geom((n-1)/n), and E(P_2) = n/(n-1).

By the same reasoning, P_i ~ Geom((n-i+1)/n), so E(P_i) = n/(n-i+1).
So by linearity of expectation,
E(T) = n/n + n/(n-1) + n/(n-2) + ... + n/2 + n/1
= n ∑_{i=1}^n 1/i.
The above sum (the nth harmonic number) is well approximated by
∑_{i=1}^n 1/i ≈ ln(n) + γ,
where γ ≈ 0.5772 is Euler's constant. So we get
E(T) ≈ n(ln(n) + 0.58).
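A sketch comparing the exact expectation n(1 + 1/2 + ... + 1/n), the approximation n(ln n + γ), and a Monte Carlo estimate. The simulation assumes, as above, that each box holds a uniformly random player independently of the other boxes; n = 30, the seed, and the number of runs are arbitrary choices, and the function names are illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler's constant γ

def expected_boxes(n: int) -> float:
    """E(T) = n/n + n/(n-1) + ... + n/1 = n * (1 + 1/2 + ... + 1/n)."""
    return n * sum(1 / i for i in range(1, n + 1))

def approx_boxes(n: int) -> float:
    """Approximation E(T) ≈ n (ln n + γ)."""
    return n * (math.log(n) + EULER_GAMMA)

def simulate_boxes(n: int, rng: random.Random) -> int:
    """Buy boxes (each a uniform random player out of n) until all n players are collected."""
    seen = set()
    boxes = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        boxes += 1
    return boxes

n = 30
rng = random.Random(1)  # fixed seed for reproducibility
runs = [simulate_boxes(n, rng) for _ in range(5_000)]
print(expected_boxes(n), approx_boxes(n), sum(runs) / len(runs))
```

The approximation slightly undershoots the exact value; the gap is about 1/2 box, since the harmonic sum exceeds ln(n) + γ by roughly 1/(2n).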

Recall our previous result, where we computed that to have a 50%
chance of getting all n cards, it suffices to buy about n ln(2n) =
n(ln(n) + ln(2)) ≈ n(ln(n) + 0.69) boxes.

It is not the case in general that Pr[X > E(X)] ≈ 1/2. The
simplest counterexample is an indicator random variable Y with
Pr[Y = 1] = p, 0 < p < 1. Then E(Y) = p, so Pr[Y > E(Y)] = p, which
need not be 1/2. So the two results for coupon collecting, one an
expectation and the other a 50% threshold, are not directly comparable.

Poisson Distribution
Suppose we throw n balls independently and uniformly at random into
n/λ bins, where n is large and λ is a constant. We are interested in
how many balls land in bin 1. Call this X; then X ~ Bin(n, λ/n), and
E(X) = λ. In more detail, the distribution is
Pr[X = i] = C(n, i) (λ/n)^i (1 - λ/n)^(n-i),
for 0 <= i <= n.
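A small simulation of this experiment, under the added assumption that λ is an integer dividing n (so that n/λ is a whole number of bins). `balls_in_bin_one` is an illustrative name; the average count in bin 1 over many repetitions should be close to E(X) = λ.

```python
import random

def balls_in_bin_one(n: int, lam: int, rng: random.Random) -> int:
    """Throw n balls independently and uniformly into n/lam bins; count those in bin 1 (index 0)."""
    bins = n // lam  # assumes lam divides n
    return sum(1 for _ in range(n) if rng.randrange(bins) == 0)

rng = random.Random(2)  # fixed seed for reproducibility
n, lam = 1000, 2
counts = [balls_in_bin_one(n, lam, rng) for _ in range(2000)]
print(sum(counts) / len(counts))  # empirical E(X); should be close to lam
```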

We know n is large, so let's approximate this distribution. Let's
define p_i ≡ Pr[X = i]. Then we have
p_0 = Pr[X = 0] = (1 - λ/n)^n.
Recall the Taylor series for e^x:
e^x = 1 + x + x^2/2! + x^3/3! + ...
Plugging in x = -y, we get e^{-y} ≈ 1 - y for small y, and λ/n is small since n is large, so
(1 - λ/n) ≈ e^{-λ/n}
(1 - λ/n)^n ≈ (e^{-λ/n})^n
= e^{-λ}.
Thus, p_0 ≈ e^{-λ}.
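Numerically, the exact p_0 = (1 - λ/n)^n approaches e^{-λ} as n grows. The sketch fixes λ = 3 as an arbitrary example; `p0_binomial` is an illustrative name.

```python
import math

def p0_binomial(n: int, lam: float) -> float:
    """Exact Pr[X = 0] = (1 - lam/n)^n for X ~ Bin(n, lam/n)."""
    return (1 - lam / n) ** n

lam = 3.0
for n in (10, 100, 1000, 100_000):
    # The gap to e^{-lam} shrinks roughly like 1/n.
    print(n, p0_binomial(n, lam), math.exp(-lam))
```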

What about p_i in the general case? Let's look at the ratio
p_i/p_{i-1}.
p_i/p_{i-1} = [C(n,i) (λ/n)^i (1-λ/n)^{n-i}]/
[C(n,i-1) (λ/n)^{i-1} (1-λ/n)^{n-i+1}]
= [C(n,i) λ/n]/
[C(n,i-1) (1-λ/n)]
= [C(n,i) λ/n]/
[C(n,i-1) (n-λ)/n]
= C(n,i)/C(n,i-1) λ/(n-λ).
Now let's look at the ratio C(n,i)/C(n,i-1). We have
C(n,i)/C(n,i-1) = (n!/[i!(n-i)!])/
(n!/[(i-1)!(n-i+1)!])
= (i-1)!/i! (n-i+1)!/(n-i)!
= 1/i (n-i+1)
= (n-i+1)/i.
Plugging this in to our expression for p_i/p_{i-1}, we get
p_i/p_{i-1} = (n-i+1)/i λ/(n-λ)
= (n-i+1)/(n-λ) λ/i.
Now for fixed i, in the limit n -> ∞, (n-i+1)/(n-λ) -> 1, so
p_i/p_{i-1} ≈ λ/i,
p_i ≈ p_{i-1} λ/i.

This gives us a recurrence:
p_0 = exp(-λ)
p_1 = exp(-λ) λ
p_2 = exp(-λ) λ^2/2
p_3 = exp(-λ) λ^3/(2*3)
p_4 = exp(-λ) λ^4/(2*3*4)
...
p_i = exp(-λ) λ^i/i!
So we get a new distribution
Pr[X = i] = (λ^i)/i! e^{-λ}, i ∈ N.
This is a "Poisson distribution" with parameter λ, and we
write X ~ Poiss(λ). (Note that whereas the original binomial
distribution restricts i to 0 <= i <= n, here i can be any natural
number.)
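The recurrence p_i = p_{i-1} λ/i and the closed form λ^i/i! e^{-λ} can be compared directly. In this sketch λ = 4 is an arbitrary choice, and `poisson_pmf_recurrence` and `poisson_pmf` are illustrative names.

```python
import math

def poisson_pmf_recurrence(lam: float, imax: int) -> list[float]:
    """p_0 = e^{-lam}, then p_i = p_{i-1} * lam / i for i = 1..imax."""
    probs = [math.exp(-lam)]
    for i in range(1, imax + 1):
        probs.append(probs[-1] * lam / i)
    return probs

def poisson_pmf(lam: float, i: int) -> float:
    """Closed form Pr[X = i] = lam^i / i! * e^{-lam}."""
    return lam**i / math.factorial(i) * math.exp(-lam)

lam = 4.0
rec = poisson_pmf_recurrence(lam, 10)
print(all(math.isclose(rec[i], poisson_pmf(lam, i)) for i in range(11)))  # True
```

Building the probabilities by the recurrence avoids computing large powers and factorials separately, which is convenient when many consecutive values are needed.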

Let's check to make sure this is a proper distribution. We have
∑_{i=0}^∞ p_i
= ∑_{i=0}^∞ (λ^i)/i! e^{-λ}
= e^{-λ} ∑_{i=0}^∞ (λ^i)/i!
= e^{-λ} e^{λ}    (using the Taylor series above)
= 1.

Now let's compute E(X):
E(X) = ∑_{i=0}^∞ i (λ^i)/i! e^{-λ}
= e^{-λ} ∑_{i=0}^∞ i (λ^i)/i!
= e^{-λ} ∑_{i=1}^∞ i (λ^i)/i!
= e^{-λ} ∑_{i=1}^∞ (λ^i)/(i-1)!
= λ e^{-λ} ∑_{i=1}^∞ (λ^(i-1))/(i-1)!
= λ e^{-λ} ∑_{j=0}^∞ (λ^j)/j!        (with j = i - 1)
= λ e^{-λ} e^{λ}
= λ.
This is the same as the expectation of the original binomial distribution.
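Both facts (the probabilities sum to 1, and E(X) = λ) can be checked numerically by truncating the series. In this sketch λ = 4 and the cutoff of 100 terms are arbitrary choices; the cutoff must stay at or below 170, because the division converts `math.factorial(i)` to a float, which overflows for larger i.

```python
import math

lam = 4.0
imax = 100  # truncation point; terms beyond it are negligible for lam = 4
probs = [lam**i / math.factorial(i) * math.exp(-lam) for i in range(imax + 1)]
print(sum(probs))                               # approximately 1
print(sum(i * q for i, q in enumerate(probs)))  # approximately lam = 4
```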

The Poisson distribution is widely used for modeling rare events. As
a rule of thumb, it is a good approximation of the binomial
distribution when n >= 20 and p <= 0.05, and a very good
approximation when n >= 100 and np <= 10.
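One concrete case of the rule of thumb, n = 100 and p = 0.05 (so λ = np = 5): the sketch computes the largest pointwise gap between the Bin(n, p) and Poiss(np) probabilities. This illustrates a single parameter choice and is not a proof of the stated thresholds; the function names are illustrative.

```python
import math

def binomial_pmf(n: int, p: float, i: int) -> float:
    """Pr[X = i] for X ~ Bin(n, p)."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson_pmf(lam: float, i: int) -> float:
    """Pr[X = i] for X ~ Poiss(lam)."""
    return lam**i / math.factorial(i) * math.exp(-lam)

n, p = 100, 0.05
gap = max(abs(binomial_pmf(n, p, i) - poisson_pmf(n * p, i)) for i in range(n + 1))
print(gap)  # largest pointwise difference between the two distributions
```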