lectures/wald_friedman.md: 22 additions & 24 deletions
@@ -4,7 +4,7 @@ jupytext:
     extension: .md
     format_name: myst
     format_version: 0.13
-    jupytext_version: 1.17.2
+    jupytext_version: 1.17.1
 kernelspec:
   display_name: Python 3 (ipykernel)
   language: python
@@ -53,7 +53,7 @@ alternative **hypotheses**, key ideas in this lecture

 - Type I and type II statistical errors
     - a type I error occurs when you reject a null hypothesis that is true
-    - a type II error occures when you accept a null hypothesis that is false
+    - a type II error occurs when you accept a null hypothesis that is false
 - The **power** of a frequentist statistical test
 - The **size** of a frequentist statistical test
 - The **critical region** of a statistical test
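For reference while reading the hunks below, here is a minimal Monte Carlo sketch of size (the type I error rate) and power (one minus the type II error rate) for a fixed-sample test; the Bernoulli hypotheses, sample size, and cutoff are illustrative placeholders rather than anything taken from the lecture.

```python
import numpy as np

# Test H0: p = 0.5 against H1: p = 0.6 with n Bernoulli draws,
# rejecting H0 whenever the sample mean exceeds the cutoff c.
rng = np.random.default_rng(0)
n, c, reps = 100, 0.55, 10_000

reject_under_H0 = rng.binomial(n, 0.5, reps) / n > c
reject_under_H1 = rng.binomial(n, 0.6, reps) / n > c

size = reject_under_H0.mean()    # type I error rate: reject H0 when it is true
power = reject_under_H1.mean()   # 1 - type II error rate: reject H0 when it is false
print(f"size ≈ {size:.3f}, power ≈ {power:.3f}")
```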
@@ -109,9 +109,9 @@ Let's listen to Milton Friedman tell us what happened
 > can be so regarded.

 > When Allen Wallis was discussing such a problem with (Navy) Captain
-> Garret L. Schyler, the captain objected that such a test, to quote from
+> Garret L. Schuyler, the captain objected that such a test, to quote from
 > Allen's account, may prove wasteful. If a wise and seasoned ordnance
-> officer like Schyler were on the premises, he would see after the first
+> officer like Schuyler were on the premises, he would see after the first
 > few thousand or even few hundred [rounds] that the experiment need not
 > be completed either because the new method is obviously inferior or
 > because it is obviously superior beyond what was hoped for
@@ -128,7 +128,7 @@ That set Wald on a path that led him to create *Sequential Analysis* {cite}`W
 It is useful to begin by describing the theory underlying the test
 that the U.S. Navy told Captain G. S. Schuyler to use.

-Captain Schulyer's doubts motivated him to tell Milton Friedman and Allan Wallis his conjecture
+Captain Schuyler's doubts motivated him to tell Milton Friedman and Allen Wallis his conjecture
 that superior practical procedures existed.

 Evidently, the Navy had told Captain Schuyler to use what was then a state-of-the-art
@@ -256,13 +256,13 @@ Wald summarizes Neyman and Pearson's setup as follows:
 > will have the required size $\alpha$.

 Wald goes on to discuss Neyman and Pearson's concept of *uniformly most
-powerful* test.
+powerful* test.

 Here is how Wald introduces the notion of a sequential test

 > A rule is given for making one of the following three decisions at any stage of
-> the experiment (at the m th trial for each integral value of m ): (1) to
-> accept the hypothesis H , (2) to reject the hypothesis H , (3) to
+> the experiment (at the $m$ th trial for each integral value of $m$): (1) to
+> accept the hypothesis $H$, (2) to reject the hypothesis $H$, (3) to
 > continue the experiment by making an additional observation. Thus, such
 > a test procedure is carried out sequentially. On the basis of the first
 > observation, one of the aforementioned decision is made. If the first or
@@ -271,8 +271,8 @@ Here is how Wald introduces the notion of a sequential test
 > the first two observations, one of the three decision is made. If the
 > third decision is made, a third trial is performed, and so on. The
 > process is continued until either the first or the second decisions is
-> made. The number n of observations required by such a test procedure is
-> a random variable, since the value of n depends on the outcome of the
+> made. The number $n$ of observations required by such a test procedure is
+> a random variable, since the value of $n$ depends on the outcome of the
 > observations.

 ## Wald's Sequential Formulation
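The three-decision rule Wald describes in the quoted passage is the sequential probability ratio test. A minimal sketch of that loop follows, assuming illustrative Beta densities and placeholder thresholds `A` and `B` rather than the lecture's calibration.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)

f0, f1 = beta(1, 1).pdf, beta(3, 1.2).pdf   # illustrative null and alternative densities
A, B = 20.0, 1 / 20.0                       # placeholder upper / lower thresholds

L, m = 1.0, 0                               # likelihood ratio process and trial count
while B < L < A:                            # decision (3): continue sampling
    z = rng.beta(3, 1.2)                    # here the data really come from f1
    L *= f1(z) / f0(z)                      # update the likelihood ratio
    m += 1

decision = "reject H (favor f1)" if L >= A else "accept H (favor f0)"
print(f"stopped after {m} observations: {decision}")
```

The stopping time `m` is a random variable, exactly as the quoted passage emphasizes.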
@@ -334,7 +334,7 @@ random variables is also independently and identically distributed (IID).

 But the observer does not know which of the two distributions generated the sequence.

-For reasons explained in [Exchangeability and Bayesian Updating](https://python.quantecon.org/exchangeable.html), this means that the observer thinks that sequence is not IID.
+For reasons explained in [Exchangeability and Bayesian Updating](https://python.quantecon.org/exchangeable.html), this means that the observer thinks that the sequence is not IID.

 Consequently, the observer has something to learn, namely, whether the observations are drawn from $f_0$ or from $f_1$.

@@ -414,7 +414,7 @@ B \approx b(\alpha,\beta) & \equiv \frac{\beta}{1-\alpha}
 \end{aligned}
 $$ (eq:Waldrule)

-For small values of $\alpha$ and $\beta$, Wald shows that approximation {eq}`eq:Waldrule` provides a good way to set $A$ and $B$.
+For small values of $\alpha$ and $\beta$, Wald shows that approximation {eq}`eq:Waldrule` provides a good way to set $A$ and $B$.

 In particular, Wald constructs a mathematical argument that leads him to conclude that the use of approximation
 {eq}`eq:Waldrule` rather than the true functions $A (\alpha, \beta), B(\alpha,\beta)$ for setting $A$ and $B$
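As a quick numerical illustration of Wald's approximations $A \approx (1-\beta)/\alpha$ and $B \approx \beta/(1-\alpha)$, here is a two-line sketch; the error probabilities are arbitrary example values, not ones used in the lecture.

```python
# Wald's approximate thresholds for target type I error α and type II error β
α, β = 0.05, 0.10
A_approx = (1 - β) / α    # upper threshold, ≈ 18.0
B_approx = β / (1 - α)    # lower threshold, ≈ 0.105
print(A_approx, B_approx)
```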
@@ -781,7 +779,7 @@ When two distributions are "close", it should takes longer to decide which one

 It is tempting to link this pattern to our discussion of [Kullback–Leibler divergence](rel_entropy) in {doc}`likelihood_ratio_process`.

-While, KL divergence is larger when two distribution differ more, KL divergence is not symmetric, meaning that the KL divergence of distribution $f$ from distribution $g$ is not necessarily equal to the KL
+While, KL divergence is larger when two distributions differ more, KL divergence is not symmetric, meaning that the KL divergence of distribution $f$ from distribution $g$ is not necessarily equal to the KL
 divergence of $g$ from $f$.

 If we want a symmetric measure of divergence that actually a metric, we can instead use [Jensen-Shannon distance](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.jensenshannon.html).
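A small numerical check of both points, using two arbitrary Beta densities (a sketch, not the lecture's code). The `kl_div` helper mirrors the corrected definition in the next hunk, and the Jensen-Shannon call works on densities discretized over a grid.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta
from scipy.spatial.distance import jensenshannon

f, g = beta(1, 1).pdf, beta(3, 1.2).pdf   # arbitrary example densities on (0, 1)

def kl_div(h, f):
    """KL divergence D(h ‖ f) = ∫ h(w) log(h(w)/f(w)) dw"""
    integrand = lambda w: h(w) * np.log(h(w) / f(w))
    val, _ = quad(integrand, 0, 1)
    return val

# Asymmetry: D(f ‖ g) generally differs from D(g ‖ f)
print(kl_div(f, g), kl_div(g, f))

# Jensen-Shannon distance is symmetric in its arguments (and is a metric)
w = np.linspace(1e-4, 1 - 1e-4, 1_000)
print(jensenshannon(f(w), g(w)), jensenshannon(g(w), f(w)))
```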
@@ -793,7 +791,7 @@ We shall compute Jensen-Shannon distance and plot it against the average stoppi
 ```{code-cell} ipython3
 def kl_div(h, f):
     """KL divergence"""
-    integrand = lambda w: f(w) * np.log(f(w) / h(w))
+    integrand = lambda w: h(w) * np.log(h(w) / f(w))
     val, _ = quad(integrand, 0, 1)
     return val

@@ -896,7 +894,7 @@ plt.tight_layout()
 plt.show()
 ```

-Again, we find that the stopping time is shorter when the distributions are more separated
+Again, we find that the stopping time is shorter when the distributions are more separated, as
 measured by Jensen-Shannon distance.

 Let's visualize individual likelihood ratio processes to see how they evolve toward the decision boundaries.
@@ -981,12 +979,12 @@ In the code below, we adjust Wald's rule by adjusting the thresholds $A$ and $B
lectures/wald_friedman_2.md: 10 additions & 12 deletions
@@ -4,7 +4,7 @@ jupytext:
     extension: .md
     format_name: myst
    format_version: 0.13
-    jupytext_version: 1.17.2
+    jupytext_version: 1.17.1
 kernelspec:
   display_name: Python 3 (ipykernel)
   language: python
@@ -52,16 +52,16 @@ A frequentist statistician studies the distribution of that statistic under that
 * when the distribution is a member of a set of parameterized probability distributions, his hypothesis takes the form of a particular parameter vector.
 * this is what we mean when we say that the frequentist statistician 'conditions on the parameters'
 * he regards the parameters as fixed numbers that are known to nature, but not to him.
-* the statistician copes with his ignorance of thoe parameters by constructing type I and type II errors associated with frequentist hypothesis testing.
+* the statistician copes with his ignorance of those parameters by constructing type I and type II errors associated with frequentist hypothesis testing.

 In this lecture, we reformulate Friedman and Wald's problem by transforming our point of view from the 'objective' frequentist perspective of {doc}`the lecture on Wald's sequential analysis<wald_friedman>` to an explicitly 'subjective' perspective taken by a Bayesian decision maker who regards parameters not as fixed numbers but as (hidden) random variables that are jointly distributed with the random variables that he can observe by sampling from that joint distribution.

 To form that joint distribution, the Bayesian statistician supplements the conditional distributions used by the frequentist statistician with
-a prior probability distribution over the parameters that representive his personal, subjective opinion about those them.
+a prior probability distribution over the parameters that represents his personal, subjective opinion about them.

 That lets the Bayesian statistician calculate the joint distribution that he requires to calculate the conditional distributions that he wants.

-To proceed in the way, we endow our decision maker with
+To proceed in this way, we endow our decision maker with

 - an initial prior subjective probability $\pi_{-1} \in (0,1)$ that nature uses to generate $\{z_k\}$ as a sequence of i.i.d. draws from $f_1$ rather than $f_0$.
 - faith in Bayes' law as a way to revise his subjective beliefs as observations on $\{z_k\}$ sequence arrive.
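A minimal sketch of that Bayes'-law revision of beliefs, with illustrative Beta densities standing in for $f_0$ and $f_1$ (not the lecture's parameterization):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(2)

f0, f1 = beta(1, 1).pdf, beta(3, 1.2).pdf   # stand-ins for the two candidate densities
π = 0.5                                     # initial prior π_{-1} that nature draws from f1

for _ in range(20):
    z = rng.beta(3, 1.2)                    # observations here really come from f1
    # Bayes' law: posterior probability that the z sequence comes from f1
    π = π * f1(z) / (π * f1(z) + (1 - π) * f0(z))

print(f"posterior belief that draws come from f1: {π:.3f}")
```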
@@ -97,12 +97,10 @@ from numba.experimental import jitclass
 from math import gamma
 ```

-
-
 ## A Dynamic Programming Approach

 The following presentation of the problem closely follows Dmitri
-Berskekas's treatment in **Dynamic Programming and Stochastic Control** {cite}`Bertekas75`.
+Bertsekas's treatment in **Dynamic Programming and Stochastic Control** {cite}`Bertsekas75`.

 A decision-maker can observe a sequence of draws of a random variable $z$.

@@ -134,7 +132,7 @@ $$
 $$

 ```{note}
-In {cite:t}`Bertekas75`, the belief is associated with the distribution $f_0$, but here
+In {cite:t}`Bertsekas75`, the belief is associated with the distribution $f_0$, but here
 we associate the belief with the distribution $f_1$ to match the discussions in {doc}`the lecture on Wald's sequential analysis<wald_friedman>`.