lectures/wald_friedman.md: 22 additions & 24 deletions
@@ -4,7 +4,7 @@ jupytext:
     extension: .md
     format_name: myst
     format_version: 0.13
-    jupytext_version: 1.17.2
+    jupytext_version: 1.17.1
 kernelspec:
   display_name: Python 3 (ipykernel)
   language: python
@@ -53,7 +53,7 @@ alternative **hypotheses**, key ideas in this lecture

 - Type I and type II statistical errors
     - a type I error occurs when you reject a null hypothesis that is true
-    - a type II error occures when you accept a null hypothesis that is false
+    - a type II error occurs when you accept a null hypothesis that is false
 - The **power** of a frequentist statistical test
 - The **size** of a frequentist statistical test
 - The **critical region** of a statistical test
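For reference while reading the hunks below, here is a minimal Monte Carlo sketch of size (the type I error rate) and power (one minus the type II error rate) for a fixed-sample test; the Bernoulli hypotheses, sample size, and cutoff are illustrative placeholders rather than anything taken from the lecture.

```python
import numpy as np

# Test H0: p = 0.5 against H1: p = 0.6 with n Bernoulli draws,
# rejecting H0 whenever the sample mean exceeds the cutoff c.
rng = np.random.default_rng(0)
n, c, reps = 100, 0.55, 10_000

reject_under_H0 = rng.binomial(n, 0.5, reps) / n > c
reject_under_H1 = rng.binomial(n, 0.6, reps) / n > c

size = reject_under_H0.mean()    # type I error rate: reject H0 when it is true
power = reject_under_H1.mean()   # 1 - type II error rate: reject H0 when it is false
print(f"size ≈ {size:.3f}, power ≈ {power:.3f}")
```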
@@ -109,9 +109,9 @@ Let's listen to Milton Friedman tell us what happened
 > can be so regarded.

 > When Allen Wallis was discussing such a problem with (Navy) Captain
-> Garret L. Schyler, the captain objected that such a test, to quote from
+> Garret L. Schuyler, the captain objected that such a test, to quote from
 > Allen's account, may prove wasteful. If a wise and seasoned ordnance
-> officer like Schyler were on the premises, he would see after the first
+> officer like Schuyler were on the premises, he would see after the first
 > few thousand or even few hundred [rounds] that the experiment need not
 > be completed either because the new method is obviously inferior or
 > because it is obviously superior beyond what was hoped for
@@ -128,7 +128,7 @@ That set Wald on a path that led him to create *Sequential Analysis* {cite}`W
 It is useful to begin by describing the theory underlying the test
 that the U.S. Navy told Captain G. S. Schuyler to use.

-Captain Schulyer's doubts motivated him to tell Milton Friedman and Allan Wallis his conjecture
+Captain Schuyler's doubts motivated him to tell Milton Friedman and Allen Wallis his conjecture
 that superior practical procedures existed.

 Evidently, the Navy had told Captain Schuyler to use what was then a state-of-the-art
@@ -256,13 +256,13 @@ Wald summarizes Neyman and Pearson's setup as follows:
 > will have the required size $\alpha$.

 Wald goes on to discuss Neyman and Pearson's concept of *uniformly most
-powerful* test.
+powerful* test.

 Here is how Wald introduces the notion of a sequential test

 > A rule is given for making one of the following three decisions at any stage of
-> the experiment (at the m th trial for each integral value of m ): (1) to
-> accept the hypothesis H , (2) to reject the hypothesis H , (3) to
+> the experiment (at the $m$ th trial for each integral value of $m$): (1) to
+> accept the hypothesis $H$, (2) to reject the hypothesis $H$, (3) to
 > continue the experiment by making an additional observation. Thus, such
 > a test procedure is carried out sequentially. On the basis of the first
 > observation, one of the aforementioned decision is made. If the first or
@@ -271,8 +271,8 @@ Here is how Wald introduces the notion of a sequential test
 > the first two observations, one of the three decision is made. If the
 > third decision is made, a third trial is performed, and so on. The
 > process is continued until either the first or the second decisions is
-> made. The number n of observations required by such a test procedure is
-> a random variable, since the value of n depends on the outcome of the
+> made. The number $n$ of observations required by such a test procedure is
+> a random variable, since the value of $n$ depends on the outcome of the
 > observations.

 ## Wald's Sequential Formulation
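The three-decision rule Wald describes in the quoted passage is the sequential probability ratio test. A minimal sketch of that loop follows, assuming illustrative Beta densities and placeholder thresholds `A` and `B` rather than the lecture's calibration.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)

f0, f1 = beta(1, 1).pdf, beta(3, 1.2).pdf   # illustrative null and alternative densities
A, B = 20.0, 1 / 20.0                       # placeholder upper / lower thresholds

L, m = 1.0, 0                               # likelihood ratio process and trial count
while B < L < A:                            # decision (3): continue sampling
    z = rng.beta(3, 1.2)                    # here the data really come from f1
    L *= f1(z) / f0(z)                      # update the likelihood ratio
    m += 1

decision = "reject H (favor f1)" if L >= A else "accept H (favor f0)"
print(f"stopped after {m} observations: {decision}")
```

The stopping time `m` is a random variable, exactly as the quoted passage emphasizes.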
@@ -334,7 +334,7 @@ random variables is also independently and identically distributed (IID).

 But the observer does not know which of the two distributions generated the sequence.

-For reasons explained in [Exchangeability and Bayesian Updating](https://python.quantecon.org/exchangeable.html), this means that the observer thinks that sequence is not IID.
+For reasons explained in [Exchangeability and Bayesian Updating](https://python.quantecon.org/exchangeable.html), this means that the observer thinks that the sequence is not IID.

 Consequently, the observer has something to learn, namely, whether the observations are drawn from $f_0$ or from $f_1$.

@@ -414,7 +414,7 @@ B \approx b(\alpha,\beta) & \equiv \frac{\beta}{1-\alpha}
 \end{aligned}
 $$ (eq:Waldrule)

-For small values of $\alpha$ and $\beta$, Wald shows that approximation {eq}`eq:Waldrule` provides a good way to set $A$ and $B$.
+For small values of $\alpha$ and $\beta$, Wald shows that approximation {eq}`eq:Waldrule` provides a good way to set $A$ and $B$.

 In particular, Wald constructs a mathematical argument that leads him to conclude that the use of approximation
 {eq}`eq:Waldrule` rather than the true functions $A (\alpha, \beta), B(\alpha,\beta)$ for setting $A$ and $B$
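As a quick numerical illustration of Wald's approximations $A \approx (1-\beta)/\alpha$ and $B \approx \beta/(1-\alpha)$, here is a two-line sketch; the error probabilities are arbitrary example values, not ones used in the lecture.

```python
# Wald's approximate thresholds for target type I error α and type II error β
α, β = 0.05, 0.10
A_approx = (1 - β) / α    # upper threshold, ≈ 18.0
B_approx = β / (1 - α)    # lower threshold, ≈ 0.105
print(A_approx, B_approx)
```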
@@ -781,7 +779,7 @@ When two distributions are "close", it should takes longer to decide which one

 It is tempting to link this pattern to our discussion of [Kullback–Leibler divergence](rel_entropy) in {doc}`likelihood_ratio_process`.

-While, KL divergence is larger when two distribution differ more, KL divergence is not symmetric, meaning that the KL divergence of distribution $f$ from distribution $g$ is not necessarily equal to the KL
+While, KL divergence is larger when two distributions differ more, KL divergence is not symmetric, meaning that the KL divergence of distribution $f$ from distribution $g$ is not necessarily equal to the KL
 divergence of $g$ from $f$.

 If we want a symmetric measure of divergence that actually a metric, we can instead use [Jensen-Shannon distance](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.jensenshannon.html).
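A small numerical check of both points, using two arbitrary Beta densities (a sketch, not the lecture's code). The `kl_div` helper mirrors the corrected definition in the next hunk, and the Jensen-Shannon call works on densities discretized over a grid.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta
from scipy.spatial.distance import jensenshannon

f, g = beta(1, 1).pdf, beta(3, 1.2).pdf   # arbitrary example densities on (0, 1)

def kl_div(h, f):
    """KL divergence D(h ‖ f) = ∫ h(w) log(h(w)/f(w)) dw"""
    integrand = lambda w: h(w) * np.log(h(w) / f(w))
    val, _ = quad(integrand, 0, 1)
    return val

# Asymmetry: D(f ‖ g) generally differs from D(g ‖ f)
print(kl_div(f, g), kl_div(g, f))

# Jensen-Shannon distance is symmetric in its arguments (and is a metric)
w = np.linspace(1e-4, 1 - 1e-4, 1_000)
print(jensenshannon(f(w), g(w)), jensenshannon(g(w), f(w)))
```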
@@ -793,7 +791,7 @@ We shall compute Jensen-Shannon distance and plot it against the average stoppi
 ```{code-cell} ipython3
 def kl_div(h, f):
     """KL divergence"""
-    integrand = lambda w: f(w) * np.log(f(w) / h(w))
+    integrand = lambda w: h(w) * np.log(h(w) / f(w))
     val, _ = quad(integrand, 0, 1)
     return val

@@ -896,7 +894,7 @@ plt.tight_layout()
 plt.show()
 ```

-Again, we find that the stopping time is shorter when the distributions are more separated
+Again, we find that the stopping time is shorter when the distributions are more separated, as
 measured by Jensen-Shannon distance.

 Let's visualize individual likelihood ratio processes to see how they evolve toward the decision boundaries.
@@ -981,12 +979,12 @@ In the code below, we adjust Wald's rule by adjusting the thresholds $A$ and $B
lectures/wald_friedman_2.md: 10 additions & 12 deletions
@@ -4,7 +4,7 @@ jupytext:
     extension: .md
     format_name: myst
    format_version: 0.13
-    jupytext_version: 1.17.2
+    jupytext_version: 1.17.1
 kernelspec:
   display_name: Python 3 (ipykernel)
   language: python
@@ -52,16 +52,16 @@ A frequentist statistician studies the distribution of that statistic under that
 * when the distribution is a member of a set of parameterized probability distributions, his hypothesis takes the form of a particular parameter vector.
 * this is what we mean when we say that the frequentist statistician 'conditions on the parameters'
 * he regards the parameters as fixed numbers that are known to nature, but not to him.
-* the statistician copes with his ignorance of thoe parameters by constructing type I and type II errors associated with frequentist hypothesis testing.
+* the statistician copes with his ignorance of those parameters by constructing type I and type II errors associated with frequentist hypothesis testing.

 In this lecture, we reformulate Friedman and Wald's problem by transforming our point of view from the 'objective' frequentist perspective of {doc}`the lecture on Wald's sequential analysis<wald_friedman>` to an explicitly 'subjective' perspective taken by a Bayesian decision maker who regards parameters not as fixed numbers but as (hidden) random variables that are jointly distributed with the random variables that he can observe by sampling from that joint distribution.

 To form that joint distribution, the Bayesian statistician supplements the conditional distributions used by the frequentist statistician with
-a prior probability distribution over the parameters that representive his personal, subjective opinion about those them.
+a prior probability distribution over the parameters that represents his personal, subjective opinion about them.

 That lets the Bayesian statistician calculate the joint distribution that he requires to calculate the conditional distributions that he wants.

-To proceed in the way, we endow our decision maker with
+To proceed in this way, we endow our decision maker with

 - an initial prior subjective probability $\pi_{-1} \in (0,1)$ that nature uses to generate $\{z_k\}$ as a sequence of i.i.d. draws from $f_1$ rather than $f_0$.
 - faith in Bayes' law as a way to revise his subjective beliefs as observations on $\{z_k\}$ sequence arrive.
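A minimal sketch of that Bayes'-law revision of beliefs, with illustrative Beta densities standing in for $f_0$ and $f_1$ (not the lecture's parameterization):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(2)

f0, f1 = beta(1, 1).pdf, beta(3, 1.2).pdf   # stand-ins for the two candidate densities
π = 0.5                                     # initial prior π_{-1} that nature draws from f1

for _ in range(20):
    z = rng.beta(3, 1.2)                    # observations here really come from f1
    # Bayes' law: posterior probability that the z sequence comes from f1
    π = π * f1(z) / (π * f1(z) + (1 - π) * f0(z))

print(f"posterior belief that draws come from f1: {π:.3f}")
```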
@@ -97,12 +97,10 @@ from numba.experimental import jitclass
 from math import gamma
 ```

-
-
 ## A Dynamic Programming Approach

 The following presentation of the problem closely follows Dmitri
-Berskekas's treatment in **Dynamic Programming and Stochastic Control** {cite}`Bertekas75`.
+Bertsekas's treatment in **Dynamic Programming and Stochastic Control** {cite}`Bertsekas75`.

 A decision-maker can observe a sequence of draws of a random variable $z$.

@@ -134,7 +132,7 @@ $$
 $$

 ```{note}
-In {cite:t}`Bertekas75`, the belief is associated with the distribution $f_0$, but here
+In {cite:t}`Bertsekas75`, the belief is associated with the distribution $f_0$, but here
 we associate the belief with the distribution $f_1$ to match the discussions in {doc}`the lecture on Wald's sequential analysis<wald_friedman>`.