Administrative info
Updated MT2 solutions posted on Piazza
Exams can be picked up from Soda front office
PA3 due today
HW8 due next Wednesday
Final exam next Thursday
No regrades for HW8 (not enough time) or final exam (UCB policy)

Review
In the continuous sample spaces we consider in this class, the
probability of any particular outcome is 0. So instead of working
with outcomes, we work directly with events.

EX: James Bond jumps out of a plane and lands at a position
uniformly distributed in [0, 1000]. The probability that he
lands in an interval [a, b], where 0 <= a <= b <= 1000, is
(b-a)/1000.

Often, we work directly with random variables. A continuous random
variable takes values in a continuous subset of R, such as an interval.
Thus, for a continuous random variable X, Pr[X = a] = 0 for any a.
Again, we work with intervals, such as Pr[a < X <= b], which can have
non-zero probability.

We can describe a continuous random variable X in two ways.
(1) The cumulative distribution function (cdf):
F(x) = Pr[X <= x].
Then
Pr[a < X <= b] = F(b) - F(a).
The cdf F(x) must satisfy a few properties.
(a) 0 <= F(x) <= 1 for all x∈R, since F(x) is a
probability.
(b) F(x) must be monotonically non-decreasing:
F(x) <= F(y) if x <= y.
(2) The probability density function (pdf):
f(x) = d/dx F(x).
Then
F(x) = ∫_{-∞}^x f(y) dy
Pr[a < X <= b] = ∫_a^b f(y) dy.
The pdf f(x) must satisfy a few properties.
(a) f(x) >= 0 for all x∈R. Otherwise it would be possible
to find an interval for which the integral is negative,
resulting in a negative probability.
(b) ∫_{-∞}^{+∞} f(x) dx = 1, i.e. the probability
Pr[-∞ < X < +∞] is 1.

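The pdf tells us where probability is concentrated: X is more likely
to fall in a short interval where f is large than in one of the same
length where f is small. Its graph plays the same role as the plot of
the distribution Pr[X = a] of a discrete random variable.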

EX: Let X be Bond's landing position when he jumps out of the plane.
Then the cdf of X is
       { 0       if x < 0
G(x) = { x/1000  if 0 <= x <= 1000
       { 1       if x > 1000
The pdf of X is
       { 0       if x < 0
g(x) = { 1/1000  if 0 <= x <= 1000
       { 0       if x > 1000
[Plot of the pdf g(x): constant at height 1/1000 on [0, 1000], 0 elsewhere.]
Since the pdf is flat, every subinterval of [0, 1000] of a given
length is equally likely to contain his landing position.
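A minimal Python sketch of this example, assuming `random.uniform(0, 1000)` as a stand-in for Bond's landing position. `G` and `g` mirror the cdf and pdf above; `estimate_interval` is a hypothetical helper that compares a Monte Carlo estimate of Pr[a < X <= b] with G(b) - G(a).

```python
import random

def G(x: float) -> float:
    """cdf of Bond's landing position, uniform on [0, 1000]."""
    if x < 0:
        return 0.0
    if x <= 1000:
        return x / 1000
    return 1.0

def g(x: float) -> float:
    """pdf of Bond's landing position."""
    return 1 / 1000 if 0 <= x <= 1000 else 0.0

def estimate_interval(a: float, b: float, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated landings that fall in (a, b]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if a < rng.uniform(0, 1000) <= b)
    return hits / trials

exact = G(700) - G(200)              # (700 - 200) / 1000, up to float rounding
approx = estimate_interval(200, 700)
print(exact, approx)                 # approx should be close to exact
```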

EX: James Bond shoots at a 1-foot-radius gas tank, hitting any point
on it with uniform probability. What is the pdf of the distance
from the center to where he hits?

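Let Y = distance of hit from center. Every point on the tank is
equally likely to be hit, so Pr[Y <= y] is the fraction of the tank's
area within distance y of the center: π y^2 / (π · 1^2) = y^2 for
0 <= y <= 1. So the cdf of Y is
       { 0    if y < 0
F(y) = { y^2  if 0 <= y <= 1
       { 1    if y > 1
The pdf of Y is
       { 0    if y < 0
f(y) = { 2y   if 0 <= y <= 1
       { 0    if y > 1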
[Plot of the pdf f(y): a straight line rising from 0 at y = 0 to 2 at y = 1, 0 elsewhere.]
Since the density grows with y, Bond is more likely to hit far from
the center than close to it: a thin ring near the edge of the tank has
more area than a ring of the same width near the center.
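To check F(y) = y^2 empirically, here is a small simulation sketch. It assumes we may treat the tank as the unit disk and generate uniform hits by rejection sampling (draw points uniformly in the square [-1, 1]^2 and keep those inside the disk); `sample_hit_distance` is a hypothetical helper name.

```python
import math
import random

def sample_hit_distance(rng: random.Random) -> float:
    """Distance from the center of a uniformly random point in the unit disk.

    Rejection sampling: draw points uniformly in [-1, 1]^2 and return the
    distance of the first one that lands inside the disk.
    """
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            return math.hypot(x, y)

rng = random.Random(1)
distances = [sample_hit_distance(rng) for _ in range(100_000)]

for r in (0.25, 0.5, 0.75):
    empirical = sum(d <= r for d in distances) / len(distances)
    print(f"Pr[Y <= {r}] ~ {empirical:.3f}   vs   F({r}) = {r ** 2:.4f}")
```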

As you can see in the above example, the pdf is not restricted to the
range [0, 1]. This is because the pdf is a density, not an
actual probability. We defined the pdf as
f(y) = lim_{δ->0} Pr[y < Y <= y + δ] / δ,
so it is the limit of a tiny probability divided by a tiny length,
which can give us any non-negative value.
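The limit definition can be checked numerically for the gas-tank cdf F(y) = y^2. This sketch evaluates the difference quotient (F(y + δ) - F(y)) / δ at y = 0.9 for shrinking δ; the values exceed 1 and approach f(0.9) = 2 · 0.9 = 1.8, which illustrates that a density is not a probability.

```python
def F(y: float) -> float:
    """cdf of the hit distance Y in the gas-tank example."""
    if y < 0:
        return 0.0
    if y <= 1:
        return y * y
    return 1.0

y = 0.9
quotients = []
for delta in (0.1, 0.01, 0.001):
    # Pr[y < Y <= y + delta] / delta, written in terms of the cdf F.
    q = (F(y + delta) - F(y)) / delta
    quotients.append(q)
    print(delta, q)
```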

We derived an expression for the expectation of a continuous random
variable:
E(X) = ∫_{-∞}^{+∞} x f(x) dx.
The variance is then defined as in the discrete case,
Var(X) = E((X - E(X))^2) = E(X^2) - E(X)^2,
where
E(X^2) = ∫_{-∞}^{+∞} x^2 f(x) dx.

EX: What is E(X), Bond's expected landing position? It is
E(X) = ∫_{-∞}^{+∞} x g(x) dx
     = ∫_{0}^{1000} x/1000 dx
     = x^2/2000 |_0^{1000}
     = 1000^2/2000
     = 1000/2 = 500.
Then
E(X^2) = ∫_{-∞}^{+∞} x^2 g(x) dx
       = ∫_{0}^{1000} x^2/1000 dx
       = x^3/3000 |_0^{1000}
       = 1000^3/3000
       = 1000^2/3.
So
Var(X) = E(X^2) - E(X)^2
       = 1000^2/3 - 1000^2/4
       = 1000^2/12.
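The same integrals can be approximated numerically as a sanity check. This sketch uses a simple midpoint rule (our own choice of quadrature, via a hypothetical helper `integrate`) over [0, 1000], where g is nonzero, and should reproduce E(X) = 500 and Var(X) = 1000^2/12 ≈ 83333.33.

```python
def integrate(h, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of h over [lo, hi]."""
    width = (hi - lo) / n
    return width * sum(h(lo + (i + 0.5) * width) for i in range(n))

def g(x: float) -> float:
    """pdf of Bond's landing position, uniform on [0, 1000]."""
    return 1 / 1000 if 0 <= x <= 1000 else 0.0

mean = integrate(lambda x: x * g(x), 0, 1000)
second_moment = integrate(lambda x: x * x * g(x), 0, 1000)
variance = second_moment - mean ** 2      # Var(X) = E(X^2) - E(X)^2
print(mean, variance, 1000 ** 2 / 12)
```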

In general, for a random variable Z that is uniformly distributed in
the interval [0, d], we get
E(Z) = d/2
Var(Z) = d^2/12.
A random variable W that is uniformly distributed in the interval
[a, a+d] is just
W = Z + a,
so we get
E(W) = E(Z) + a
     = a + d/2
Var(W) = Var(Z)      (shifting by a constant does not change the variance)
       = d^2/12.
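A quick simulation sketch of these formulas, assuming `random.uniform(a, a + d)` as the sampler and the arbitrary illustrative values a = 3, d = 4; `uniform_moments` is a hypothetical helper that returns the closed-form answers for comparison.

```python
import random
import statistics

def uniform_moments(a: float, d: float):
    """Closed-form (mean, variance) of a uniform random variable on [a, a + d]."""
    return a + d / 2, d * d / 12

a, d = 3.0, 4.0
rng = random.Random(2)
samples = [rng.uniform(a, a + d) for _ in range(100_000)]

print(uniform_moments(a, d))                                       # (5.0, 1.333...)
print(statistics.fmean(samples), statistics.pvariance(samples))    # simulated estimates
```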

Exponential Distribution
Recall that if we have a number of independent trials, each of which
has a probability p of success, then the number of trials T until
the first success follows a geometric distribution
T ~ Geom(p),
so
Pr[T = i] = p(1-p)^{i-1} for all i∈Z^+,
and
Pr[T > i] = (1-p)^i for all i∈N.
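The tail formula says that the first i trials all fail. This sketch checks it by summing the pmf over i+1, i+2, ... (truncated after many terms, an approximation that is negligible here since (1-p)^j decays geometrically), for the illustrative value p = 0.2.

```python
def geom_pmf(i: int, p: float) -> float:
    """Pr[T = i] for T ~ Geom(p), i = 1, 2, ..."""
    return p * (1 - p) ** (i - 1)

def geom_tail(i: int, p: float, terms: int = 10_000) -> float:
    """Pr[T > i], approximated by summing the pmf over i+1, ..., i+terms."""
    return sum(geom_pmf(j, p) for j in range(i + 1, i + 1 + terms))

p = 0.2
for i in range(5):
    print(i, geom_tail(i, p), (1 - p) ** i)   # the two columns should agree
```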

Suppose now that we perform a trial every δ seconds, for some small
δ, so that there are 1/δ trials per second, and that each trial has a
small probability p of success. By linearity of expectation, the
average number of successes per second, which we call the rate λ, is
λ = p / δ,
since each of the 1/δ trials per second succeeds with probability p.
Thus
p = λ δ.
Let S be the number of seconds until the first success. Then
Pr[T > k] = Pr[S > kδ]   (since each trial takes δ seconds)
          = Pr[S > t],
where t = kδ, and t >= 0 since k >= 0. Then
Pr[S > t] = Pr[T > k]
          = (1 - p)^k            (since T ~ Geom(p))
          = (1 - p)^{t/δ}        (since k = t / δ)
          = (1 - λδ)^{t/δ}       (since p = λ δ)
          ≈ (e^{-λδ})^{t/δ}      (since 1 - x ≈ e^{-x} for small x = λδ)
          = e^{-λ t}.
The approximation becomes exact in the limit δ -> 0.
Finally, we get, for t >= 0,
Pr[S <= t] = 1 - Pr[S > t]
           = 1 - e^{-λ t}
as the cdf of S, which gives us a pdf
f(t) = d/dt (1 - e^{-λ t})
     = λ e^{-λ t}.
Both the pdf and cdf are 0 if t < 0.
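The key step of the derivation is that (1 - λδ)^{t/δ} approaches e^{-λ t} as δ shrinks. This sketch evaluates the discrete survival probability (1 - p)^k with p = λδ and k = t/δ for the illustrative values λ = 1.5 and t = 2, so the limit is e^{-3} ≈ 0.0498.

```python
import math

lam, t = 1.5, 2.0
approximations = []
for delta in (0.1, 0.01, 0.001, 0.0001):
    p = lam * delta              # success probability of one trial
    k = round(t / delta)         # number of trials in t seconds
    approximations.append((1 - p) ** k)
    print(delta, approximations[-1])
print("e^{-lam t} =", math.exp(-lam * t))
```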

This is the "exponential distribution," which has pdf
f(t) = { λ e^{-λ t}   if t >= 0
       { 0            if t < 0
It is the continuous version of the geometric distribution and tells
us how long we need to wait for a success, if successes can occur at
any time and λ is the average rate of success per unit time.
We write
S ~ Exp(λ).
Computing the expectation and variance of an exponential random
variable requires integration by parts, and we get
E(S) = 1/λ
Var(S) = 1/λ^2.
These are similar to the geometric distribution, where we got an
expectation of 1/p and a variance of (1-p)/p^2.
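Instead of integrating by parts, we can confirm E(S) = 1/λ and Var(S) = 1/λ^2 numerically. This sketch assumes λ = 2 for illustration, uses a midpoint rule (hypothetical helper `integrate`), and truncates the improper integral at 40/λ, where the neglected tail has probability e^{-40}.

```python
import math

def integrate(h, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of h over [lo, hi]."""
    width = (hi - lo) / n
    return width * sum(h(lo + (i + 0.5) * width) for i in range(n))

lam = 2.0

def exp_pdf(t):
    """pdf of Exp(lam) for t >= 0."""
    return lam * math.exp(-lam * t)

upper = 40 / lam   # truncation point for the integral to +infinity

mean = integrate(lambda t: t * exp_pdf(t), 0, upper)
var = integrate(lambda t: t * t * exp_pdf(t), 0, upper) - mean ** 2
print(mean, 1 / lam)        # both about 0.5
print(var, 1 / lam ** 2)    # both about 0.25
```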

Note that although p is restricted to [0, 1], since it is a
probability, λ can be any positive value, since it is an average rate
of success. In particular, if we expect more than one success per
unit of time, then λ will be greater than 1.

Recall the relationship between the binomial and the geometric
distribution. They both examine what happens when we have a series of
independent trials, each with probability p of success. The binomial
distribution tells us how many successes we get in a fixed number of
trials, while the geometric distribution tells us in which trial the
first success occurs.

The exponential distribution has a similar relationship to the
Poisson distribution. They both examine what happens when we have a
particular average rate of success λ. The Poisson distribution tells
us how many successes we get in one unit of time, and the exponential
distribution tells us at what time the first success occurs.
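This relationship can be illustrated by simulation. The sketch below assumes the gaps between successes are independent Exp(λ) draws (generated with `random.expovariate`) and counts the successes in [0, 1]; if the count behaves like Poiss(λ), its average should be near λ and the fraction of zero counts near e^{-λ}. The helper `arrivals_in_unit_time` is hypothetical, and λ = 1.2 is chosen only for illustration.

```python
import math
import random

def arrivals_in_unit_time(lam: float, rng: random.Random) -> int:
    """Count successes in [0, 1] when the gaps between them are Exp(lam)."""
    count, clock = 0, rng.expovariate(lam)
    while clock <= 1:
        count += 1
        clock += rng.expovariate(lam)
    return count

lam = 1.2
rng = random.Random(3)
counts = [arrivals_in_unit_time(lam, rng) for _ in range(100_000)]

print(sum(counts) / len(counts))                       # should be close to lam
print(counts.count(0) / len(counts), math.exp(-lam))   # Pr[count = 0] vs e^{-lam}
```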

EX: Suppose a web server receives on average 1.2 requests per second,
with requests arriving independently at random times. Then the amount
of time between consecutive requests follows an exponential
distribution with λ = 1.2. Suppose a request comes in. What is the
probability that another request will come in within the next second?

Let S be the amount of time until the next request. Then
S ~ Exp(1.2),
so
Pr[S <= 1] = ∫_0^1 1.2 e^{-1.2 t} dt
           = -e^{-1.2 t} |_0^1
           = 1 - e^{-1.2}
           ≈ 0.7.
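A short sketch comparing the closed-form cdf 1 - e^{-λ t} with a direct midpoint-rule evaluation of the integral above; `exp_cdf` is a hypothetical helper name, and both numbers come out near 0.699.

```python
import math

lam = 1.2

def exp_cdf(t: float, lam: float) -> float:
    """Pr[S <= t] for S ~ Exp(lam)."""
    return 1 - math.exp(-lam * t) if t >= 0 else 0.0

# Midpoint-rule approximation of the integral of 1.2 e^{-1.2 t} over [0, 1].
n = 10_000
width = 1 / n
numeric = width * sum(lam * math.exp(-lam * (i + 0.5) * width) for i in range(n))

print(exp_cdf(1, lam), numeric)   # both about 0.699
```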

Note that we could also use the Poisson distribution to solve this
problem. Let R be the number of requests in the next second.
Then
R ~ Poiss(1.2).
The next request arrives within a second exactly when at least one
request arrives in that second, so
Pr[S <= 1] = Pr[R >= 1]
           = 1 - Pr[R = 0]
           = 1 - 1.2^0/0! e^{-1.2}
           = 1 - e^{-1.2},
as we computed before.
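The same number from the Poisson side, as a minimal sketch with a hypothetical `poisson_pmf` helper. The event "the next request arrives within a second" is the event R >= 1; the sketch also computes Pr[R > 1] (at least two requests) to show that it is a different, smaller number.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Pr[R = k] for R ~ Poiss(lam)."""
    return lam ** k / math.factorial(k) * math.exp(-lam)

lam = 1.2
at_least_one = 1 - poisson_pmf(0, lam)                          # Pr[R >= 1]
more_than_one = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)   # Pr[R > 1]
print(at_least_one, 1 - math.exp(-lam))   # equal, about 0.699
print(more_than_one)                      # about 0.337
```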

Normal Distribution
A random variable X has a "normal distribution", also called a
"Gaussian distribution", if it has a pdf of the form
f(x) = 1/√{2πσ^2} e^{-(x-μ)^2/(2σ^2)}
for some values of μ and σ > 0. It can then be shown that
∫_{-∞}^{+∞} f(x) dx = 1,
as required for a pdf, and that
E(X) = μ
Var(X) = σ^2,
which explains the names of the parameters. We write
X ~ N(μ, σ^2).

The pdf of a normal distribution is a symmetric bell-shaped curve
centered at μ, with a width determined by σ.
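The three facts quoted above (total mass 1, mean μ, variance σ^2) can be checked numerically. This sketch assumes the illustrative values μ = 3, σ = 2, integrates with a midpoint rule (hypothetical helper `integrate`), and truncates at μ ± 12σ, beyond which the pdf is negligible.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """pdf of N(mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def integrate(h, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of h over [lo, hi]."""
    width = (hi - lo) / n
    return width * sum(h(lo + (i + 0.5) * width) for i in range(n))

mu, sigma = 3.0, 2.0
lo, hi = mu - 12 * sigma, mu + 12 * sigma   # truncated stand-in for (-inf, +inf)

total = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
# Var(X) = E((X - mu)^2), since E(X) = mu.
var = integrate(lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma), lo, hi)
print(total, mean, var)   # close to 1, mu, sigma^2
```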

The "standard normal distribution" has parameters μ = 0 and σ = 1.
So if Y is a standard normal, then
Y ~ N(0, 1),
and the pdf of Y is
g(y) = 1/√{2π} e^{-y^2/2}.
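The pdf g and the cdf of Y are different functions. As a sketch, the cdf Pr[Y <= y] can be obtained by integrating g numerically (hypothetical helper `cdf_by_integration`, truncating at -12 as a stand-in for -∞), and compared with the standard identity Pr[Y <= y] = (1 + erf(y/√2))/2, using Python's `math.erf`.

```python
import math

def std_normal_pdf(y: float) -> float:
    """pdf of the standard normal N(0, 1)."""
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

def std_normal_cdf(y: float) -> float:
    """cdf of N(0, 1), written with the error function math.erf."""
    return 0.5 * (1 + math.erf(y / math.sqrt(2)))

def cdf_by_integration(y: float, lo: float = -12.0, n: int = 100_000) -> float:
    """Midpoint-rule integral of the pdf from lo (standing in for -inf) to y."""
    width = (y - lo) / n
    return width * sum(std_normal_pdf(lo + (i + 0.5) * width) for i in range(n))

for y in (-1.0, 0.0, 1.0, 2.0):
    print(y, std_normal_cdf(y), cdf_by_integration(y))
```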

More on the normal distribution next time.
