Things covered: (only an overview here, see slides for details)
- motivation: we need a framework to handle a continuous space of outcomes, like "picking a random point on the unit sphere"
- expectations and law of large numbers
- densities $\phi$ and distribution functions $F$, where $\phi(t) = \frac{d}{dt}F(t)$
- probability spaces $(\Omega, \Sigma, \Bbb P)$
- union bound: $P\left(\bigcup_\ell B_\ell\right) \leq \sum_\ell P\left(B_\ell\right)$
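A quick Monte Carlo sanity check of the union bound; a minimal sketch where the events $B_\ell$ are arbitrary overlapping intervals (my choice for illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
omega = rng.uniform(0, 1, size=100_000)  # sample outcomes from ([0,1], Lebesgue)

# Three overlapping events B_l, chosen arbitrarily for illustration.
events = [omega < 0.3, (omega > 0.2) & (omega < 0.5), omega > 0.4]

p_union = np.mean(np.any(events, axis=0))  # P(union of the B_l)
sum_p = sum(np.mean(b) for b in events)    # sum_l P(B_l)
print(p_union, "<=", sum_p)                # bound holds; strict, since the events overlap
```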
- moments $E(X^p)$ and absolute moments $E(|X|^p)$
- $L^p(\Omega, P)$ space
- "triangle inequality" (Minkowski) for $p \geq 1$: $\left(E(|X+Y|^p)\right)^{\frac{1}{p}} \leq \left(E(|X|^p)\right)^{\frac{1}{p}} + \left(E(|Y|^p)\right)^{\frac{1}{p}}$
- Hölder inequality: $|E(XY)| \leq \left(E(|X|^p)\right)^{\frac{1}{p}} \left(E(|Y|^q)\right)^{\frac{1}{q}}$ if $\frac{1}{p} + \frac{1}{q} = 1$
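A minimal Monte Carlo check of the Hölder inequality; the particular $X$, $Y$ (dependent, to make the bound non-trivial) and the exponents are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)
y = x**2 + rng.standard_normal(1_000_000)   # deliberately dependent on x

p, q = 3.0, 1.5                             # conjugate exponents: 1/p + 1/q = 1
lhs = abs(np.mean(x * y))
rhs = np.mean(np.abs(x)**p)**(1/p) * np.mean(np.abs(y)**q)**(1/q)
print(lhs, "<=", rhs)                       # Hoelder holds exactly for the empirical moments
```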
- Cavalieri's formula: for $p > 0$, $$E(|X|^p) = p \int_0^\infty P(|X| \geq t)\, t^{p-1}\,dt$$
  - corollary: $E(X) = \int_0^\infty P(X \geq t)\,dt - \int_0^\infty P(X \leq -t)\,dt$
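Cavalieri's formula is easy to verify numerically; a sketch for a standard exponential variable, where $P(X \geq t) = e^{-t}$ and the exact moment is $E(X^p) = \Gamma(p+1)$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

p = 2.5
# E(|X|^p) = p * int_0^inf P(|X| >= t) t^(p-1) dt, with P(X >= t) = exp(-t)
integral, _ = quad(lambda t: np.exp(-t) * t**(p - 1), 0, np.inf)
print(p * integral, "vs", gamma(p + 1))   # both are Gamma(p+1) = E(X^p)
```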
- Lebesgue dominated convergence theorem: if all $X_n$ are dominated in absolute value by some $Y$ with $E(|Y|) < \infty$ almost surely, and $X_n \to X$ almost surely, then $E(X_n) \to E(X)$
- Fubini's theorem: if $f: A \times B \to C$ is measurable and $\int_{A\times B} |f(x,y)| \,d(\nu \otimes \mu)(x,y) < \infty$, then $$\int_A \left(\int_B f(x,y) \,d\mu(y) \right)\,d\nu(x) = \int_B \left( \int_A f(x,y) \, d\nu(x)\right) \,d\mu(y)$$
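A small numerical illustration of Fubini's theorem; a sketch taking both measures to be Lebesgue on $[0,1]$ and an arbitrary (absolutely integrable) $f$:

```python
import numpy as np
from scipy.integrate import quad

def f(x, y):
    return np.exp(-x * y) * np.sin(x + y)   # arbitrary integrable choice

# Integrate over y first, then over x ...
order_xy = quad(lambda x: quad(lambda y: f(x, y), 0, 1)[0], 0, 1)[0]
# ... and in the opposite order.
order_yx = quad(lambda y: quad(lambda x: f(x, y), 0, 1)[0], 0, 1)[0]
print(order_xy, order_yx)                   # agree up to quadrature tolerance
```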
- tail bounds
  - tail $t \mapsto P(|X| \geq t)$
  - Markov inequality: $P(|X| \geq t) \leq \frac{E(|X|)}{t}$ for all $t > 0$
  - as a consequence, for $p, t > 0$: $P(|X| \geq t) \leq t^{-p} E(|X|^p)$
    - for $p=2$, this is called the Chebyshev inequality
  - also, for $\theta > 0$: $P(X \geq t) \leq \exp(-\theta t)\, E(\exp(\theta X))$
    - the function $\theta \mapsto E(\exp(\theta X))$ is the Laplace transform, or moment generating function, of $X$
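How the three bounds compare to the true tail; a sketch for a standard exponential $X$, where $E(X) = 1$, $E(X^2) = 2$, and $E(e^{\theta X}) = \frac{1}{1-\theta}$ for $\theta < 1$:

```python
import numpy as np

t = np.array([2.0, 4.0, 8.0])
true_tail = np.exp(-t)                        # P(X >= t) for Exp(1)

markov = 1.0 / t                              # E(X)/t, with E(X) = 1
moment_p2 = 2.0 / t**2                        # t^{-2} E(X^2), with E(X^2) = 2
theta = 0.5                                   # any fixed theta in (0, 1)
exp_bound = np.exp(-theta * t) / (1 - theta)  # exp(-theta t) E(exp(theta X))

for row in zip(t, true_tail, markov, moment_p2, exp_bound):
    print("t=%g: true=%.5f Markov=%.5f p=2: %.5f exp: %.5f" % row)
```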
- normal distribution $\mathcal N(\mu, \sigma)$
- random vectors
  - joint distribution function $F(t_1, \dots, t_n) = P(X_i \leq t_i \;\forall i)$
  - joint density $\phi$ if $P(X \in D) = \int_D \phi(t_1,\dots,t_n)\,dt_1 \cdots dt_n$ for all measurable $D$
  - complex random vectors
- independence
  - $E(\cdot)$ and multiplication commute for independent random variables
  - joint density factorizes for independent random variables
  - applying a measurable function maintains independence
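A one-line Monte Carlo check that expectation and multiplication commute for independent variables; the two distributions here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 1, 1_000_000)             # X and Y sampled independently
y = rng.standard_normal(1_000_000) + 2.0

print(np.mean(x * y), "vs", np.mean(x) * np.mean(y))   # both ≈ 0.5 * 2 = 1
```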
%% #Lecture 12, 21.06. [[Foundations Section 3.1 part I printout.pdf]], [[Foundations Section 3.1 part II printout.pdf]] %%
- for independent RVs, $\phi_{X+Y}(t) = (\phi_X \ast \phi_Y)(t) = \int_{-\infty}^\infty \phi_X(u)\, \phi_Y(t-u)\,du$
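The convolution formula, illustrated numerically; a sketch for two independent Uniform(0,1) variables, whose sum has the triangular density on $[0,2]$ peaking at $1$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
s = rng.uniform(0, 1, n) + rng.uniform(0, 1, n)   # samples of X + Y

# Convolution of the two uniform densities, evaluated at t = 1 on a grid.
du = 1e-3
u = np.arange(0.0, 1.0, du)                       # support of phi_X
t = 1.0
phi_conv = np.sum(((t - u) >= 0) & ((t - u) <= 1)) * du   # = 1, the triangle peak

hist, _ = np.histogram(s, bins=100, range=(0, 2), density=True)
print(phi_conv, "vs empirical", hist[49])         # density of X + Y near t = 1
```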
- conditional expectations
  - keep some RVs $X_1, \dots, X_k$ fixed
  - pick the smallest $\sigma$-algebra $\mathcal F$ w.r.t. which $X_1,\dots,X_k$ are measurable
  - then: for any RV $Y$ with $E(|Y|) < \infty$, there is an (a.s. unique) $Z$ measurable w.r.t. $\mathcal F$ s.t. $E(Y \mathbf{1}_A) = E(Z \mathbf{1}_A)$ for all $A \in \mathcal F$. We call $Z =: E(Y \mid X_1, \dots, X_k)$ the conditional expectation.
  - relationship: $P(A \mid X_1, \dots, X_k) = E(\mathbf{1}_A \mid X_1, \dots, X_k)$ allows computing conditional probabilities in the continuous setting
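A hands-on way to see $E(Y \mid X)$: average $Y$ over a thin slab of $X$-values; a sketch assuming the toy model $Y = X^2 + \text{noise}$, so that $E(Y \mid X) = X^2$ by construction:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)
y = x**2 + rng.standard_normal(1_000_000)   # E(Y | X) = X^2 by construction

# Approximate E(Y | X = 1) by averaging y over a thin slab around x = 1.
mask = np.abs(x - 1.0) < 0.01
print(y[mask].mean(), "vs", 1.0)            # ≈ 1 = 1^2
```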
- Fubini for expectations: $E_X(E_{Y}(f(X, Y))) = E_Y(E_X(f(X, Y))) = E_{X,Y}(f(X,Y))$
  - the slides write this more formally by introducing $f_1(x) = E(f(x, Y))$ and $f_2(y) = E(f(X, y))$; then $E(f_1(X)) = E(f_2(Y)) = E(f(X, Y))$
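A Monte Carlo version of the $f_1$, $f_2$ formulation; a sketch assuming $X$, $Y$ independent (here Uniform(0,1) and standard normal) and an arbitrary $f$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
x = rng.uniform(0, 1, n)                    # X and Y independent
y = rng.standard_normal(n)

def f(xv, yv):
    return np.cos(xv) * yv**2               # arbitrary choice of f

direct = np.mean(f(x, y))                   # E(f(X, Y)) over the joint sample
f1 = np.cos(x) * np.mean(y**2)              # f1(x) = E(f(x, Y)) = cos(x) E(Y^2)
print(direct, "vs", np.mean(f1))            # E(f1(X)) ≈ E(f(X, Y))
```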
- Gaussian vectors
  - Gaussian distribution is rotation invariant
  - sum of independent Gaussian variables is Gaussian
  - multidimensional Gaussian density $$\psi(x) = \frac{1}{(2\pi)^\frac{n}{2} \sqrt{\det(\Sigma)}} \exp\left(-\tfrac{1}{2} \langle x-\mu, \Sigma^{-1}(x-\mu)\rangle\right)$$
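The density formula, checked against scipy's reference implementation; the mean, covariance, and evaluation point are arbitrary:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, -1.0])

# The formula above, written out directly.
n = len(mu)
diff = x - mu
psi = np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / (
    (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))

print(psi, "vs", multivariate_normal(mu, Sigma).pdf(x))   # identical
```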
- Jensen's inequality: for $f: \Bbb R^n \to \Bbb R$ convex, $f(E(X)) \leq E(f(X))$
  - for concave $f$, the inequality is reversed
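A quick Monte Carlo illustration of Jensen's inequality with the convex function $\exp$; for a standard normal $X$, $\exp(E(X)) = 1$ while $E(\exp(X)) = e^{1/2}$:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(1_000_000)

print(np.exp(x.mean()), "<=", np.exp(x).mean())   # ≈ 1 <= ≈ exp(1/2) ≈ 1.6487
```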