
Commit

added pits
act65 committed Oct 11, 2024
1 parent 1bf34ed commit a84bc10
Showing 9 changed files with 494 additions and 102 deletions.
3 changes: 2 additions & 1 deletion TODO
@@ -17,4 +17,5 @@ ideas
- are markets really efficient? What are the alternatives?
- the importance of good audio design?! A set of 3d scenarios. With the player turning their head / moving.
- what's the best way to shuffle? mixing?
- bury EC
- (when) would it make sense to start a recycling company? (let's track the price of materials, and understand how to break down / separate efficiently)
222 changes: 222 additions & 0 deletions _bibliography/pits.bib

Large diffs are not rendered by default.

45 changes: 45 additions & 0 deletions _drafts/technical-posts/pits/2024-08-10-pits-flow.md
@@ -0,0 +1,45 @@
---
title: "PITS plus flows"
subtitle: "A method to apply PITS to arbitrary distributions using neural flows"
layout: post
permalink: /pits/flow
scholar:
bibliography: "pits.bib"
---

Constrained search in the typical set is intractable for arbitrary distributions.
We propose a method to apply PITS to arbitrary distributions using neural flows.


<!-- how easily can we ask. is x in the typical set? -->


% TODO in which cases does $x - f^{-1}(\alpha f(x))$ approximate $\nabla_x p_f(x)$??

In general, the typical set, $\mathcal T_{p(x)}^{\epsilon}$, is intractable to compute for arbitrary continuous distributions.
However, we assume we have access to a flow that maps from clean data to a Gaussian source distribution, $f_{push}: P(X) \to \mathcal N(Y)$.

% (needs proof)
We conjecture that it is possible to use a flow to sample from the typical set of arbitrary distributions (see future work \ref{futurePOTS}).
This can be achieved by exploiting the structure of the flow-based model's Gaussian source distribution.

For Gaussian distributions, the typical set has a simple closed form: an annulus, with radius and thickness depending on the dimension and standard deviation of the Gaussian.

% (needs proof)
Projection into the typical set for a Gaussian can be approximated via a simple rescaling of the latent vector onto the sphere of radius $\sqrt{d}\sigma$.

Thus, we implement POTS as:

\begin{align*}
h = f(y) \tag{forward flow}\\
\hat h = \text{proj}(h) \tag{project onto typical set}\\
\hat x = f^{-1}(\hat h) \tag{backward flow}
\end{align*}
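
As a rough sketch in code (assuming a trained flow exposed as two hypothetical callables, `flow_forward` and `flow_inverse`, and an isotropic source $\mathcal N(0, \sigma^2 I)$):

```python
import numpy as np

def project_onto_gaussian_typical_set(h, sigma=1.0):
    """Rescale h onto the sphere of radius sqrt(d) * sigma, the (epsilon -> 0)
    typical set of an isotropic Gaussian N(0, sigma^2 I)."""
    d = h.shape[-1]
    return np.sqrt(d) * sigma * h / np.linalg.norm(h, axis=-1, keepdims=True)

def pots(y, flow_forward, flow_inverse, sigma=1.0):
    """POTS sketch: forward flow, projection in latent space, backward flow."""
    h = flow_forward(y)                                    # h = f(y)
    h_hat = project_onto_gaussian_typical_set(h, sigma)    # project onto the annulus
    return flow_inverse(h_hat)                             # x_hat = f^{-1}(h_hat)
```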


\begin{figure}[H]
\centering
\includegraphics[width=0.75\textwidth]{assets/pots-diagram.png}
\vspace{-1em}
\caption{A diagram of the POTS method. We start with the clean signal $x$, shown as a blue dot. The clean signal is then corrupted to produce the observed signal $y$, shown as a red dot. Next, we project the corrupted signal into the typical set to produce our denoised signal $\hat x$, shown as a green dot. The typical set is shown as a teal annulus. \label{f.A}}
\end{figure}
54 changes: 0 additions & 54 deletions _drafts/technical-posts/pits/2024-08-10-pits-main.md

This file was deleted.

69 changes: 69 additions & 0 deletions _posts/technical-posts/pits/2024-08-10-pits-dps.md
@@ -0,0 +1,69 @@
---
title: "Diffusion posterior sampling"
subtitle: "A review of recent work"
layout: post
permalink: /pits/review-dps
categories:
- "tutorial"
scholar:
bibliography: "pits.bib"
---

(_lit review as of 07/2024_)

Given a pretrained diffusion model, we seek to generate conditional samples based on an observed signal $y$.
For example, we may be given a noisy image and tasked with denoising it, or a black and white image and tasked with recoloring it.

One approach seeks to augment the dynamics of the pretrained diffusion model. We call these guided diffusion models, after 'guided diffusion' {% cite ho2022classifierfreediffusionguidance %}.

Early approaches were rather heuristic, for example: a mask-based approach {% cite lugmayr_repaint_2022 %}, an SVD-inspired method {% cite kawar_denoising_2022 %}, and null-space projection {% cite wang_zero-shot_2022 %}.

Next came diffusion posterior sampling (DPS), a more principled approach. It starts by rewriting the diffusion SDE to use the unknown posterior score, $\nabla_x \log p(x \mid y)$, rather than the prior score, $\nabla_x \log p(x)$.

$$
\begin{align*}
dx &= \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t) dw \tag{unconditional SDE} \\
dx &= \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x | y) \right] dt + g(t) dw \tag{conditional SDE} \\
\end{align*}
$$


This allows us to generate samples from the posterior by solving the conditional SDE.

But, we don't know the score of the posterior, so we use Bayes' rule to rewrite the posterior score in terms of the likelihood and prior scores.

$$
\begin{align*}
p(x | y) &= \frac{p(y | x) p(x)}{p(y)} \\
\log p(x | y) &= \log p(y | x) + \log p(x) - \log p(y) \\
\nabla_x \log p(x | y) &= \nabla_x \log p(y | x) + \nabla_x \log p(x) \\
\end{align*}
$$

DPS {% cite chung_diffusion_2023 %}, $\Pi$GDM {% cite song_pseudoinverse-guided_2023 %} and others have shown that it is possible to construct / approximate $\nabla_x \log p_t(x \mid y)$.
Note that $\Pi$GDM has also been applied to flows {% cite pokle_training-free_2024 %}.

$$
\begin{align*}
\nabla_x \log p(y | x_t) &\approx -\frac{1}{2\sigma^2} \nabla_{x_t} \parallel y - C(\hat x_t) \parallel^2_2 \tag{DPS} \\
\nabla_x \log p(y | x_t) &\approx (y - H\hat x)^T (r_t^2 H H^T + \sigma^2I)^{-1} H \frac{\partial \hat x_t}{\partial x_t} \tag{$\Pi$GDM}
\end{align*}
$$
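
A sketch of what a single guided reverse step looks like under these approximations (this is not the exact algorithm from either paper: `score_model`, `denoiser`, and the forward operator `C` are hypothetical callables, `zeta` stands in for the guidance scale, and the drift and injected noise are omitted for brevity):

```python
import torch

def dps_guided_step(x_t, t, y, score_model, denoiser, C, step_size, zeta=1.0):
    """One guided reverse update: prior score plus an approximate likelihood score,
    obtained by differentiating the data-fidelity term through the denoiser."""
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)                        # estimate of x_0 given x_t
    residual = (y - C(x0_hat)).pow(2).sum()          # ||y - C(x0_hat)||^2
    likelihood_score = -zeta * torch.autograd.grad(residual, x_t)[0]
    prior_score = score_model(x_t, t)                # approximates grad_x log p_t(x)
    # Simplified Euler update on the conditional dynamics (drift and noise omitted).
    return x_t.detach() + step_size * (prior_score + likelihood_score)
```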

In parallel, a variational approach frames the conditional generation problem as an optimization problem {% cite benhamu2024dflowdifferentiatingflowscontrolled mardani_variational_2023 mardani_variational_2023-1 %}.

$$
\begin{align*}
z^* &= \arg \min_z \parallel y - f(D(z)) \parallel^2_2, \qquad x^* = D(z^*) \tag{variational} \\
\end{align*}
$$

where $D$ is the diffusion model, $f$ is the forward model, $z$ is the latent variable, and $y$ is the observed signal.
These variational approaches provide high-quality samples, but are computationally expensive (approximately 5-15 minutes for ImageNet-128 on an NVIDIA V100 GPU).
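
A sketch of this variational formulation (hypothetical callables: `D` for the pretrained generative model mapping a latent $z$ to a sample, and `f` for the forward model; the cited methods add regularisation and other refinements on top of this basic objective):

```python
import torch

def variational_inverse(y, D, f, z_init, n_steps=200, lr=0.05):
    """Optimise the latent z so that the generated sample D(z) explains y under f."""
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = (y - f(D(z))).pow(2).sum()    # data-fidelity term ||y - f(D(z))||^2
        loss.backward()                      # backprop through the generative model
        opt.step()
    return D(z).detach()
```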


Finally, {% cite dou_diffusion_2024 %} present a Bayesian filtering perspective, which leads to an algorithm that converges to the true posterior.


## Bibliography
{% bibliography --cited %}
82 changes: 82 additions & 0 deletions _posts/technical-posts/pits/2024-08-10-pits-main.md
@@ -0,0 +1,82 @@
---
title: "Projection into the typical set: PITS"
subtitle: "A new approach to solving inverse problems"
layout: post
permalink: /pits
categories:
- "research"
scholar:
bibliography: "pits.bib"
---


Advances in generative modelling have shown that we can generate high-quality samples from complex distributions.
A next step is to use these generative models as priors to help solve inverse problems.

Diffusion models {% cite song2021scorebasedgenerativemodelingstochastic %} don't support likelihood estimates, only sample generation. Inverse-problem solvers therefore resort to sampling from the posterior to generate solutions {% cite chung_diffusion_2023 %}, though it's best to think of these solutions as proposals, as there is no guarantee on their quality or accuracy.

Neural flows {% cite albergo_stochastic_2023 liu_flow_2022 lipman_flow_2023 %} have recently achieved s.o.t.a results {% cite esser2024scalingrectifiedflowtransformers %} and do support likelihood estimates. They can be used to find a local maximum of the posterior {% cite benhamu2024dflowdifferentiatingflowscontrolled %}. However, differentiating through a flow is extremely expensive.

So, solving inverse problems via a principled approach like MAP is not quite possible with s.o.t.a generative models.
Maybe we can provide a viable alternative.

***

Inverse problems are a class of problems where we want to find the input to a function given the output. For example (within generative machine learning) we care about;

- image recoloring, where we want to find the original image given the black and white image.
- image inpainting, where we want to find the original image given the image with a hole in it.
- speech enhancement, where we want to find the clean speech given the noisy speech.

We consider the setting where we have access to a prior $p(x)$ (e.g. normal, clear speech) and likelihood function $p(y \mid x)$ (the environment adding background noise and interference). We observe $y$ and want to recover $x$.

Using Bayes' rule, we can write the posterior and our goal as;

$$
\begin{align*}
p(x | y) &= \frac{p(y | x) p(x)}{p(y)} \tag{posterior} \\
x^* &= \arg \max_x p(x | y) \tag{the MAP solution}
\end{align*}
$$

> MAP will return the most likely value of $x$, given $y$.
However, is the most likely value of $x$ the 'best' guess of $x$?

We offer an alternative approach, suggesting that our guess of $x$ should be typical of the prior.
We write this as;

$$
\begin{align*}
x^* &= \arg \max_{x \in \mathcal T(p(x))_\epsilon} p(y | x) \tag{PITS}
\end{align*}
$$

where $\mathcal T(p(x))_\epsilon$ is the $\epsilon$-typical set of $p(x)$. Thus we have Projection Into the Typical Set (PITS).

<!-- Note: This assumes we are working in high enough dimensions that the typical set has concentrated and any sample from the prior is very likely to be typical. -->
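
Operationally, one way to approximate this objective is projected gradient ascent on the log-likelihood, repeatedly projecting back onto the typical set. A minimal sketch with hypothetical helpers (`log_likelihood(x, y)` returning a scalar $\log p(y \mid x)$, and `project` for the projection onto $\mathcal T(p(x))_\epsilon$, which the flow post below constructs for the Gaussian case):

```python
import torch

def pits_estimate(y, log_likelihood, project, x_init, n_steps=100, lr=0.1):
    """Projected gradient ascent on log p(y | x), constrained to the typical set."""
    x = project(x_init.clone())
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(log_likelihood(x, y), x)[0]
        x = project(x + lr * grad)           # ascend the likelihood, then re-project
    return x.detach()
```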

I wrote a few posts to help you understand PITS;

[1.]({{ site.baseurl }}/pits/typical) Background on typicality \
[2.]({{ site.baseurl }}/pits/map) A simple worked example showing that MAP produces solutions that are not typical. \
[3.]({{ site.baseurl }}/pits/non-typical) (WIP) Does it matter if solutions are not typical? \
[4.]({{ site.baseurl }}/pits/flow) (WIP) A method to apply PITS to arbitrary distributions (using neural flows). \
[5.]({{ site.baseurl }}/pits/flow-theory) (WIP) Theory showing that in the Gaussian case, PITS combined with flows is principled. \
[6.]({{ site.baseurl }}/pits/mnist-demo) (WIP) A demonstration of the PITS approach to inverse problems applied to neural flows. \
[7.]({{ site.baseurl }}/pits/review-dps) A brief review of methods attempting to solve inverse problems using s.o.t.a generative models.


<!-- A main advantage of the PITS approach is that it provides a way to control the quality (/typicality) of the solutions. -->

<!-- what if the true x is not typical? -->
<!-- why not find the MAP solution and then project it into the typical set? -->

<!-- why is it a problem if my diffusion model produces samples that are not typical? -->

## Bibliography

{% bibliography --cited %}

***

These ideas were generated while studying at [VUW](https://www.wgtn.ac.nz/) with [Bastiaan Kleijn](https://people.wgtn.ac.nz/bastiaan.kleijn) and [Marcus Frean](https://people.wgtn.ac.nz/marcus.frean). I was funded by [GN](https://www.gn.com/).
@@ -7,6 +7,7 @@ scholar:
bibliography: "pits.bib"
---

We solve an inverse problem with MAP vs PITS.
Here we pick the prior and likelihood to be Gaussian distributions.

$$
@@ -16,9 +17,16 @@ p(y|x) &= \mathcal{N}(x, \sigma_y^2)
\end{align*}
$$

Let's pick the prior to be a Gaussian distribution with mean 0 and standard deviation 1, and the likelihood to be a Gaussian distribution with mean 1 and standard deviation 0.5.
We then find the MAP solution and the PITS solution.

<img src="{{ site.baseurl }}/assets/pits/pits-ip.png">
<figcaption>Illustration of the inverse problem for a Gaussian prior and likelihood, where $y$ is the observation and $x$ is what we want to recover. Top: the typical sets of our two distributions and the solutions found by MAP and PITS (where 'mid' is simply the midpoint). Bottom: a reminder that we are working with Gaussians.
</figcaption>

## MAP solution

For a Gaussian prior and likelihood, we can derive the MAP solution in closed form as;

$$
\begin{align*}
@@ -34,50 +42,46 @@
\nabla_x \left( -\frac{1}{2\sigma_y^2}||y - x||^2 - \frac{1}{2\sigma_x^2}||x||^2 \right) &= \nabla_x \left( -\frac{1}{2\sigma_y^2}||y - x||^2 \right) + \nabla_x \left( -\frac{1}{2\sigma_x^2}||x||^2 \right) \\
&= \frac{1}{\sigma_y^2}(y - x) - \frac{1}{\sigma_x^2}x \\
&= 0 \\
x &= \frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2}y \\
p(x|y) &= \mathcal{N}\left(\frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2}y, \frac{\sigma_x^2\sigma_y^2}{\sigma_x^2 + \sigma_y^2}\right) \\
x^* &= \frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2}y \\
\end{align*}
$$
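
A quick numeric check of this closed form, taking the illustrative values above ($\sigma_x = 1$, $\sigma_y = 0.5$) and treating the observation as $y = 1$:

```python
# MAP estimate for the scalar example: prior N(0, 1), measurement noise std 0.5, y = 1.
sigma_x, sigma_y, y = 1.0, 0.5, 1.0
x_map = sigma_x**2 / (sigma_x**2 + sigma_y**2) * y
print(x_map)  # 0.8 -- the observation is shrunk towards the prior mean of 0
```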

## PITS solution
<!-- p(x|y) &= \mathcal{N}\left(\frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2}y, \frac{\sigma_x^2\sigma_y^2}{\sigma_x^2 + \sigma_y^2}\right) \\ -->

For Gaussian distributions we can rewrite PITS as;
<img src="{{ site.baseurl }}/assets/pits/map-solns.png">
<figcaption>
The observed $y$'s are updated to be more likely under the prior.
</figcaption>

$$
\begin{align*}
x^* &= \arg \max_{x \in \mathcal T(p(x))_\epsilon} p(y | x) \tag{PITS} \\
&= \arg \min_{x \in \mathcal T(p(x))_\epsilon} \parallel x - y \parallel^2
\end{align*}
$$
## PITS solution

Thus finding $x^*$ becomes a problem of finding the closest point to $y$ in the $\epsilon$-typical set of $p(x)$, which can be calculated by normalising $y$.
For Gaussian distributions, we derive the PITS solution as;

$$
\begin{align*}
x^* &= \frac{y}{\parallel y \parallel}
x^* &= \arg \max_{x \in \mathcal T(p(x))_\epsilon} p(y | x) \tag{PITS} \\
&= \arg \max_{x \in \mathcal T(p(x))_\epsilon} \mathcal{N}(x, \sigma_y^2) \\
&= \arg \max_{x \in \mathcal T(p(x))_\epsilon} \exp \left( -\frac{1}{2\sigma_y^2}||y - x||^2 \right) \\
&= \arg \min_{x \in \mathcal T(p(x))_\epsilon} \parallel x - y \parallel^2 \\
\lim_{\epsilon\to 0} \mathcal T(\mathcal N(0, \sigma^2 I))_\epsilon &= \{x ; \parallel x \parallel = \sqrt{d} \sigma \} \\
x^* &= \sqrt{d} \sigma\frac{y}{\parallel y \parallel}
\end{align*}
$$
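
In code, the PITS estimate is just a rescaling of $y$ onto the sphere of radius $\sqrt{d}\sigma$; a small check with toy values:

```python
import numpy as np

sigma = 1.0
y = np.array([3.0, 4.0])                                 # d = 2, ||y|| = 5
x_pits = np.sqrt(y.size) * sigma * y / np.linalg.norm(y)
print(x_pits, np.linalg.norm(x_pits))                    # the norm comes out as sqrt(2)
```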


## An illustrated example

Let's pick the prior to be a Gaussian distribution with mean 0 and standard deviation 1. And the likelihood to be a Gaussian distribution with mean 1 and standard deviation 0.5.


<img src="{{ site.baseurl }}/assets/pits/pits-ip.png">
<figcaption>Illustration of the inverse problem for a Gaussian prior and likelihood. Top;
</figcaption>

<img src="{{ site.baseurl }}/assets/pits/map-solns.png">
<figcaption>
The observed $y$'s are updated to be more likely under the prior.
</figcaption>
<!-- Finding the closest $x$ to $y$ such that $x$ is in $\epsilon$-typical set becomes -->

<img src="{{ site.baseurl }}/assets/pits/pits-solns.png">
<figcaption>
The observed $y$'s are projected into the typical set.
</figcaption>

## Bibliography
## Accuracy

We can compare the accuracy of the MAP and PITS solutions by looking at the mean squared error (MSE) between the true $x$ and the estimated $x$.

$$
\text{err} = \frac{1}{N \sqrt{d}} \sum_{i=1}^N ||x_i - x_i^*||
$$

{% bibliography --cited %}
We find that MAP provides slightly more accurate solutions than PITS.
For example, picking $d=2048$ and averaging over 10,000 samples, we get an average normalised error of $0.45$ for MAP and $0.46$ for PITS.
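
A minimal sketch of this comparison (assuming the same $\sigma_x = 1$ and $\sigma_y = 0.5$ as in the illustration above; the original experiment may differ in its details):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_samples = 2048, 10_000
sigma_x, sigma_y = 1.0, 0.5

x = rng.normal(0.0, sigma_x, size=(n_samples, d))        # true signals, drawn from the prior
y = x + rng.normal(0.0, sigma_y, size=(n_samples, d))    # noisy observations

x_map = sigma_x**2 / (sigma_x**2 + sigma_y**2) * y
x_pits = np.sqrt(d) * sigma_x * y / np.linalg.norm(y, axis=-1, keepdims=True)

def norm_err(x_hat):
    # err = (1 / (N * sqrt(d))) * sum_i ||x_i - x_hat_i||
    return np.mean(np.linalg.norm(x - x_hat, axis=-1)) / np.sqrt(d)

print("MAP:", norm_err(x_map), "PITS:", norm_err(x_pits))  # roughly 0.45 vs 0.46
```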
