
Commit 7c9af26

[likelihood_bayes] Update Two Likelihood Lectures (#506)
* Tom's July 31 edits of likelihood and bayes law lecture
* Tom's edits of Blume Easley section July 31
* updates
* minor updates
* Tom's Aug 2 edits of likelihood ratio lecture, especially Blume Easley model
* minor typo fixes
* fix minor typos
* update labels

---------

Co-authored-by: thomassargent30 <[email protected]>
1 parent 566c278 commit 7c9af26

3 files changed: +243 −85 lines changed


lectures/likelihood_bayes.md

Lines changed: 39 additions & 15 deletions
@@ -196,7 +196,13 @@ l_seq_f = np.cumprod(l_arr_f, axis=1)



-## Likelihood Ratio Process and Bayes’ Law
+## Likelihood Ratio Processes and Bayes’ Law

+Let $\pi_0 \in [0,1]$ be a Bayesian statistician's prior probability that nature generates $w^t$ as a sequence of i.i.d. draws from
+distribution $f$.

+* here "probability" is to be interpreted as a way to summarize or express a subjective opinion
+* it does **not** mean an anticipated relative frequency as sample size grows without limit

Let $\pi_{t+1}$ be a Bayesian posterior probability defined as

@@ -225,11 +231,26 @@ With no data in hand, our Bayesian statistician thinks that the probability dens


$$
-{\rm Prob}(w^{t+1} |\emptyset) = \pi_0 f(w^{t+1})+ (1 - \pi_0)
+{\rm Prob}(w^{t+1} |\emptyset) = \pi_0 f(w^{t+1})+ (1 - \pi_0) g(w^{t+1})
+$$

+Laws of probability say that the joint distribution ${\rm Prob}(AB)$ of events $A$ and $B$ is connected to the conditional distributions
+${\rm Prob}(A |B)$ and ${\rm Prob}(B |A)$ by

+$$
+{\rm Prob}(AB) = {\rm Prob}(A |B) {\rm Prob}(B) = {\rm Prob}(B |A) {\rm Prob}(A) .
+$$ (eq:problawAB)

+We are interested in events

+$$
+A = \{q=f\}, \quad B = \{w^{t+1}\},
$$

-Probability laws connecting joint probability distributions and conditional probability distributions imply that
+where braces $\{\cdot\}$ are our shorthand for "event".

+So in our setting, probability laws {eq}`eq:problawAB` imply that

$$
{\rm Prob}(q=f |w^{t+1}) {\rm Prob}(w^{t+1} |\emptyset) = {\rm Prob}(w^{t+1} |q=f) {\rm Prob}(q=f | \emptyset)
$$
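
Dividing this last relation by ${\rm Prob}(w^{t+1} |\emptyset)$ and substituting the mixture formula for the predictive density gives

$$
{\rm Prob}(q=f |w^{t+1}) = \frac{{\rm Prob}(w^{t+1} |q=f) {\rm Prob}(q=f | \emptyset)}{{\rm Prob}(w^{t+1} |\emptyset)}
= \frac{\pi_0 f(w^{t+1})}{\pi_0 f(w^{t+1}) + (1 - \pi_0) g(w^{t+1})} ,
$$

the posterior probability that nature drew the history $w^{t+1}$ from $f$.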
@@ -293,7 +314,7 @@ Dividing both the numerator and the denominator on the right side of the equat
```{math}
:label: eq_recur1

-\pi_{t+1}=\frac{\pi_{t} l_t(w_{t+1})}{\pi_{t} l_t(w_t)+1-\pi_{t}}
+\pi_{t+1}=\frac{\pi_{t} l_t(w_{t+1})}{\pi_{t} l_t(w_{t+1})+1-\pi_{t}}
```

with $\pi_{0}$ being a Bayesian prior probability that $q = f$,
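
The one-character change matters: the likelihood ratio in the denominator must also be evaluated at the newly observed draw $w_{t+1}$. A minimal sketch of the corrected update, assuming for illustration that $f$ and $g$ are Beta densities and that the data are actually generated by $f$:

```python
# Illustrative sketch of the recursion in eq_recur1; the Beta parameters and the
# simulated sample are assumptions for this example, not the lecture's exact setup.
import numpy as np
from scipy.stats import beta

f, g = beta(1, 1), beta(3, 1.2)           # hypothetical densities f and g

def update(π, w):
    """One Bayes' law update of the posterior probability that nature draws from f."""
    l = f.pdf(w) / g.pdf(w)               # likelihood ratio evaluated at the new draw w_{t+1}
    return π * l / (π * l + 1 - π)

π = 0.5                                   # prior π_0
for w in f.rvs(size=50, random_state=0):  # a sample generated by f
    π = update(π, w)
print(π)                                  # typically close to 1 when f generates the data
```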
@@ -412,7 +433,7 @@ np.abs(π_seq - π_seq_f).max() < 1e-10
```

We thus conclude that the likelihood ratio process is a key ingredient of the formula {eq}`eq_Bayeslaw1033` for
-a Bayesian's posterior probabilty that nature has drawn history $w^t$ as repeated draws from density
+a Bayesian's posterior probability that nature has drawn history $w^t$ as repeated draws from density
$f$.

@@ -425,8 +446,11 @@ Until now we assumed that before time $1$ nature somehow chose to draw $w^t$ as

Nature's decision about whether to draw from $f$ or $g$ was thus **permanent**.

-We now assume a different timing protocol in which before **each period** $t =1, 2, \ldots$ nature flips an $x$-weighted coin and with probability
-$x \in (0,1)$ draws from $f$ in period $t$ and with probability $1 - x $ draws from $g$.
+We now assume a different timing protocol in which before **each period** $t =1, 2, \ldots$ nature

+* flips an $x$-weighted coin, then
+* draws from $f$ if it has drawn a "head"
+* draws from $g$ if it has drawn a "tail".

Under this timing protocol, nature draws permanently from **neither** $f$ **nor** $g$, so a statistician who thinks that nature is drawing
i.i.d. draws **permanently** from one of them is mistaken.
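
The new bullet-point timing protocol is easy to mimic in simulation. A short sketch, with Beta densities for $f$ and $g$ and a mixing probability $x = 0.5$ assumed purely for illustration:

```python
# Sketch of nature's per-period timing protocol; the densities and x are assumptions.
import numpy as np

rng = np.random.default_rng(0)
x, T = 0.5, 200
heads = rng.random(T) < x                 # one x-weighted coin flip per period
w = np.where(heads,
             rng.beta(1, 1, size=T),      # draw from f after a "head"
             rng.beta(3, 1.2, size=T))    # draw from g after a "tail"
```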
@@ -479,7 +503,7 @@ Let's generate a sequence of observations from this mixture model with a true mi
We will first use this sequence to study how $\pi_t$ behaves.

```{note}
-Later, we can use it to study how a statistician who knows that an $x$-mixture of $f$ and $g$ could construct maximum likelihood or Bayesian estimators of $x$ along with the free parameters of $f$ and $g$.
+Later, we can use it to study how a statistician who knows that nature generates data from an $x$-mixture of $f$ and $g$ could construct maximum likelihood or Bayesian estimators of $x$ along with the free parameters of $f$ and $g$.
```

```{code-cell} ipython3
@@ -563,7 +587,7 @@ print(f'KL(m, f) = {KL_f:.3f}\nKL(m, g) = {KL_g:.3f}')
Since $KL(m, f) < KL(m, g)$, $f$ is "closer" to the mixture distribution $m$.

Hence by our discussion on KL divergence and likelihood ratio process in
-{doc}`likelihood_ratio_process`, $log(L_t) \to \infty$ as $t \to \infty$.
+{doc}`likelihood_ratio_process`, $\log(L_t) \to \infty$ as $t \to \infty$.

Now looking back to the key equation {eq}`eq_Bayeslaw1033`.

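The corrected claim is easy to check numerically: under the mixture, the cumulative log likelihood ratio drifts upward at rate $KL(m, g) - KL(m, f) > 0$ per draw. A sketch, again with illustrative Beta densities and $x = 0.5$ assumed:

```python
# Sketch of log(L_t) → ∞ under the mixture; Beta densities and x = 0.5 are
# illustrative assumptions, not the lecture's exact parameter values.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
f, g = beta(1, 1), beta(3, 1.2)
x, T = 0.5, 2000

heads = rng.random(T) < x
w = np.where(heads,
             f.rvs(size=T, random_state=rng),
             g.rvs(size=T, random_state=rng))
log_L = np.cumsum(np.log(f.pdf(w)) - np.log(g.pdf(w)))  # cumulative log likelihood ratio
print(log_L[::500])                                      # grows roughly linearly in t
```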
@@ -611,7 +635,7 @@ The worker's initial beliefs induce a joint probability distribution
Bayes' law is simply an application of laws of
probability to compute the conditional distribution of the $t$th draw $w_t$ conditional on $[w_0, \ldots, w_{t-1}]$.

-After our worker puts a subjective probability $\pi_{-1}$ on nature having selected distribution $F$, we have in effect assumes from the start that the decision maker **knows** the joint distribution for the process $\{w_t\}_{t=0}$.
+After our worker puts a subjective probability $\pi_{-1}$ on nature having selected distribution $F$, we have in effect assumed from the start that the decision maker **knows** the joint distribution for the process $\{w_t\}_{t=0}$.

We assume that the worker also knows the laws of probability theory.

@@ -632,7 +656,7 @@ $$
Let $a \in \{ f, g\} $ be an index that indicates whether nature chose permanently to draw from distribution $f$ or from distribution $g$.

After drawing $w_0$, the worker uses Bayes' law to deduce that
-the posterior probability $\pi_0 = {\rm Prob} ({a = f | w_0}) $
+the posterior probability $\pi_0 = {\rm Prob}({a = f | w_0}) $
that the density is $f(w)$ is

$$
@@ -691,7 +715,7 @@ Because $\{\pi_t\}$ is a bounded martingale sequence, it follows from the **mart
Practically, this means that probability one is attached to sample paths
$\{\pi_t\}_{t=0}^\infty$ that converge.

-According to the theorem, it different sample paths can converge to different limiting values.
+According to the theorem, different sample paths can converge to different limiting values.

Thus, let $\{\pi_t(\omega)\}_{t=0}^\infty$ denote a particular sample path indexed by a particular $\omega
\in \Omega$.
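
One way to see that different sample paths can converge to different limits is to simulate $\pi_t$ when the data come from the worker's subjective joint distribution: nature first selects $f$ or $g$ with probabilities $\pi_{-1}$ and $1 - \pi_{-1}$, then draws i.i.d. from its selection. The sketch below assumes Beta densities and $\pi_{-1} = 0.5$ purely for illustration:

```python
# Sketch of martingale convergence to path-dependent limits; the densities and
# the prior are assumptions for illustration, not the lecture's code.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(7)
f, g = beta(1, 1), beta(3, 1.2)
π_prior, T = 0.5, 500

def one_path():
    a = f if rng.random() < π_prior else g       # nature picks f or g once, per the prior
    π = π_prior
    for _ in range(T):
        w = a.rvs(random_state=rng)
        l = f.pdf(w) / g.pdf(w)
        π = π * l / (π * l + 1 - π)
    return π

print([round(one_path(), 3) for _ in range(8)])  # limiting values cluster near 0 or 1
```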
@@ -908,7 +932,7 @@ $w_t$'s and the $\pi_t$ sequences that gave rise to them.

Notice that one of the paths involves systematically higher $w_t$'s, outcomes that push $\pi_t$ upward.

-The luck of the draw early in a simulation push the subjective distribution to draw from
+The luck of the draw early in a simulation pushes the subjective distribution to draw from
$F$ more frequently along a sample path, and this pushes $\pi_t$ toward $0$.

```{code-cell} ipython3
@@ -938,7 +962,7 @@ In the following table, the left column in bold face reports an assumed value of

The second column reports the fraction of $N = 10000$ simulations for which $\pi_{t}$ had converged to $0$ at the terminal date $T=500$ for each simulation.

-The third column reports the fraction of $N = 10000$ simulations for which $\pi_{t}$ had converged to $1$ as the terminal date $T=500$ for each simulation.
+The third column reports the fraction of $N = 10000$ simulations for which $\pi_{t}$ had converged to $1$ at the terminal date $T=500$ for each simulation.

```{code-cell} ipython3
# create table
@@ -994,7 +1018,7 @@ ax.set_ylabel(r'$\sigma^{2}(\pi_{t}\vert \pi_{t-1})$')
plt.show()
```

-The shape of the the conditional variance as a function of $\pi_{t-1}$ is informative about the behavior of sample paths of $\{\pi_t\}$.
+The shape of the conditional variance as a function of $\pi_{t-1}$ is informative about the behavior of sample paths of $\{\pi_t\}$.

Notice how the conditional variance approaches $0$ for $\pi_{t-1}$ near either $0$ or $1$.

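A small Monte Carlo sketch of $\sigma^{2}(\pi_{t}\vert \pi_{t-1})$ shows the same endpoint behavior; the Beta densities are illustrative assumptions and this is not the lecture's own computation:

```python
# Monte Carlo sketch of the conditional variance of the one-step Bayes update;
# the densities are assumptions and the numbers are illustrative only.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
f, g = beta(1, 1), beta(3, 1.2)

def cond_var(π_prev, n=200_000):
    """Variance of π_t given π_{t-1}, with w drawn from the predictive mixture."""
    from_f = rng.random(n) < π_prev
    w = np.where(from_f,
                 f.rvs(size=n, random_state=rng),
                 g.rvs(size=n, random_state=rng))
    l = f.pdf(w) / g.pdf(w)
    π_next = π_prev * l / (π_prev * l + 1 - π_prev)
    return π_next.var()

for π_prev in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(f'π = {π_prev:.2f}:  conditional variance ≈ {cond_var(π_prev):.5f}')
```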
