
Commit

added pits
act65 committed Oct 11, 2024
1 parent 1bf34ed commit a84bc10
Showing 9 changed files with 494 additions and 102 deletions.
3 changes: 2 additions & 1 deletion TODO
@@ -17,4 +17,5 @@ ideas
- are markets really efficient? What are the alternatives?
- the importance of good audio design?! A set of 3d scenarios. With the player turning their head / moving.
- what's the best way to shuffle? mixing?
- bury EC
- (when) would it make sense to start a recycling company? (let's track the price of materials, and understand how to break down / separate efficiently)
222 changes: 222 additions & 0 deletions _bibliography/pits.bib

Large diffs are not rendered by default.

45 changes: 45 additions & 0 deletions _drafts/technical-posts/pits/2024-08-10-pits-flow.md
@@ -0,0 +1,45 @@
---
title: "PITS plus flows"
subtitle: "A method to apply PITS to arbitrary distributions using neural flows"
layout: post
permalink: /pits/flow
scholar:
bibliography: "pits.bib"
---

Constrained search in the typical set is intractable for arbitrary distributions.
We propose a method to apply PITS to arbitrary distributions using neural flows.


<!-- how easily can we ask. is x in the typical set? -->


% TODO in which cases does $x - f^{-1}(\alpha f(x))$ approximate $\nabla_x p_f(x)$??

In general, the typical set, $\mathcal T_{p(x)}^{\epsilon}$, is intractable to compute for arbitrary continuous distributions.
However, we assume we have access to a flow that maps from clean data to a Gaussian source distribution, $f_{push}: P(X) \to \mathcal N(Y)$.

% (needs proof)
We conjecture that it is possible to use a flow to sample from the typical set of arbitrary distributions (see future work \ref{futurePOTS}).
This can be achieved by exploiting the structure of the flow-based model's Gaussian source distribution.

For Gaussian distributions, the typical set has a simple closed form: an annulus, with radius and thickness depending on the dimension and standard deviation of the Gaussian.

% (needs proof)
Projection into the typical set for a Gaussian can be approximated via a simple rescaling of the latent vector onto the sphere of radius $\sqrt{d}\sigma$.

Thus, we implement POTS as:

\begin{align*}
h = f(y) \tag{forward flow}\\
\hat h = \text{proj}(h) \tag{project onto typical set}\\
\hat x = f^{-1}(\hat h) \tag{backward flow}
\end{align*}
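
As a rough sketch in code (assuming a trained flow exposed as two hypothetical callables, `flow_forward` and `flow_inverse`, and an isotropic source $\mathcal N(0, \sigma^2 I)$):

```python
import numpy as np

def project_onto_gaussian_typical_set(h, sigma=1.0):
    """Rescale h onto the sphere of radius sqrt(d) * sigma, the (epsilon -> 0)
    typical set of an isotropic Gaussian N(0, sigma^2 I)."""
    d = h.shape[-1]
    return np.sqrt(d) * sigma * h / np.linalg.norm(h, axis=-1, keepdims=True)

def pots(y, flow_forward, flow_inverse, sigma=1.0):
    """POTS sketch: forward flow, projection in latent space, backward flow."""
    h = flow_forward(y)                                    # h = f(y)
    h_hat = project_onto_gaussian_typical_set(h, sigma)    # project onto the annulus
    return flow_inverse(h_hat)                             # x_hat = f^{-1}(h_hat)
```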


\begin{figure}[H]
\centering
\includegraphics[width=0.75\textwidth]{assets/pots-diagram.png}
\vspace{-1em}
\caption{A diagram of the POTS method. We start with the clean signal $x$, shown as a blue dot. The clean signal is then corrupted to produce the observed signal $y$, shown as a red dot. Next, we project the corrupted signal into the typical set to produce our denoised signal $\hat x$, shown as a green dot. The typical set is shown as a teal annulus. \label{f.A}}
\end{figure}
54 changes: 0 additions & 54 deletions _drafts/technical-posts/pits/2024-08-10-pits-main.md

This file was deleted.

69 changes: 69 additions & 0 deletions _posts/technical-posts/pits/2024-08-10-pits-dps.md
@@ -0,0 +1,69 @@
---
title: "Diffusion posterior sampling"
subtitle: "A review of recent work"
layout: post
permalink: /pits/review-dps
categories:
- "tutorial"
scholar:
bibliography: "pits.bib"
---

(_lit review as of 07/2024_)

Given a pretrained diffusion model, we seek to generate conditional samples based on an observed signal $y$.
For example, we may be given a noisy image and tasked with denoising it, or a black and white image and tasked with recoloring it.

One approach seeks to augment the dynamics of the pretrained diffusion model. We call these guided diffusion models, after 'guided diffusion' {% cite ho2022classifierfreediffusionguidance %}.

Early approaches were rather heuristic, for example: a mask-based approach {% cite lugmayr_repaint_2022 %}, an SVD-inspired method {% cite kawar_denoising_2022 %}, and null-space projection {% cite wang_zero-shot_2022 %}.

Next came diffusion posterior sampling (DPS), a more principled approach. It starts by rewriting the diffusion SDE to use the unknown posterior score, $\nabla_x \log p(x \mid y)$, rather than the prior score, $\nabla_x \log p(x)$.

$$
\begin{align*}
dx &= \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t) dw \tag{unconditional SDE} \\
dx &= \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x | y) \right] dt + g(t) dw \tag{conditional SDE} \\
\end{align*}
$$


This allows us to generate samples from the posterior by solving the conditional SDE.

But, we don't know the score of the posterior, so we use Bayes' rule to rewrite the posterior score in terms of the likelihood and prior scores.

$$
\begin{align*}
p(x | y) &= \frac{p(y | x) p(x)}{p(y)} \\
\log p(x | y) &= \log p(y | x) + \log p(x) - \log p(y) \\
\nabla_x \log p(x | y) &= \nabla_x \log p(y | x) + \nabla_x \log p(x) \\
\end{align*}
$$

DPS {% cite chung_diffusion_2023 %}, $\Pi$GDM {% cite song_pseudoinverse-guided_2023 %} and others have shown that it is possible to construct / approximate $\nabla_x \log p_t(x \mid y)$.
Note that $\Pi$GDM has also been applied to flows {% cite pokle_training-free_2024 %}.

$$
\begin{align*}
\nabla_x \log p(y | x_t) &\approx -\frac{1}{2\sigma^2} \nabla_{x_t} \parallel y - C(\hat x_t) \parallel^2_2 \tag{DPS} \\
\nabla_x \log p(y | x_t) &\approx (y - H\hat x)^T (r_t^2 H H^T + \sigma^2I)^{-1} H \frac{\partial \hat x_t}{\partial x_t} \tag{$\Pi$GDM}
\end{align*}
$$
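
A sketch of what a single guided reverse step looks like under these approximations (this is not the exact algorithm from either paper: `score_model`, `denoiser`, and the forward operator `C` are hypothetical callables, `zeta` stands in for the guidance scale, and the drift and injected noise are omitted for brevity):

```python
import torch

def dps_guided_step(x_t, t, y, score_model, denoiser, C, step_size, zeta=1.0):
    """One guided reverse update: prior score plus an approximate likelihood score,
    obtained by differentiating the data-fidelity term through the denoiser."""
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)                        # estimate of x_0 given x_t
    residual = (y - C(x0_hat)).pow(2).sum()          # ||y - C(x0_hat)||^2
    likelihood_score = -zeta * torch.autograd.grad(residual, x_t)[0]
    prior_score = score_model(x_t, t)                # approximates grad_x log p_t(x)
    # Simplified Euler update on the conditional dynamics (drift and noise omitted).
    return x_t.detach() + step_size * (prior_score + likelihood_score)
```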

In parallel, a variational approach frames the conditional generation problem as an optimization problem {% cite benhamu2024dflowdifferentiatingflowscontrolled mardani_variational_2023 mardani_variational_2023-1 %}.

$$
\begin{align*}
z^* &= \arg \min_z \parallel y - f(D(z)) \parallel^2_2, \qquad x^* = D(z^*) \tag{variational} \\
\end{align*}
$$

where $D$ is the diffusion model, $f$ is the forward model, $z$ is the latent variable, and $y$ is the observed signal.
These variational approaches provide high-quality samples, but are computationally expensive (approximately 5-15 minutes for ImageNet-128 on an NVIDIA V100 GPU).
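
A sketch of this variational formulation (hypothetical callables: `D` for the pretrained generative model mapping a latent $z$ to a sample, and `f` for the forward model; the cited methods add regularisation and other refinements on top of this basic objective):

```python
import torch

def variational_inverse(y, D, f, z_init, n_steps=200, lr=0.05):
    """Optimise the latent z so that the generated sample D(z) explains y under f."""
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = (y - f(D(z))).pow(2).sum()    # data-fidelity term ||y - f(D(z))||^2
        loss.backward()                      # backprop through the generative model
        opt.step()
    return D(z).detach()
```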


Finally, {% cite dou_diffusion_2024 %} present a Bayesian filtering perspective, which leads to an algorithm that converges to the true posterior.


## Bibliography
{% bibliography --cited %}
82 changes: 82 additions & 0 deletions _posts/technical-posts/pits/2024-08-10-pits-main.md
@@ -0,0 +1,82 @@
---
title: "Projection into the typical set: PITS"
subtitle: "A new approach to solving inverse problems"
layout: post
permalink: /pits
categories:
- "research"
scholar:
bibliography: "pits.bib"
---


Advances in generative modelling have shown that we can generate high-quality samples from complex distributions.
A next step is to use these generative models as priors to help solve inverse problems.

Diffusion models {% cite song2021scorebasedgenerativemodelingstochastic %} don't support likelihood estimates, only sample generation. Inverse-problem solvers therefore resort to sampling from the posterior to generate solutions {% cite chung_diffusion_2023 %}, though it's best to think of these solutions as proposals, as there is no guarantee on their quality or accuracy.

Neural flows {% cite albergo_stochastic_2023 liu_flow_2022 lipman_flow_2023 %} have recently achieved s.o.t.a results {% cite esser2024scalingrectifiedflowtransformers %} and do support likelihood estimates. They can be used to find a local maximum of the posterior {% cite benhamu2024dflowdifferentiatingflowscontrolled %}. However, differentiating through a flow is extremely expensive.

So, solving inverse problems via a principled approach like MAP is not quite possible with s.o.t.a generative models.
Maybe we can provide a viable alternative.

***

Inverse problems are a class of problems where we want to find the input to a function given the output. For example (within generative machine learning) we care about;

- image recoloring, where we want to find the original image given the black and white image.
- image inpainting, where we want to find the original image given the image with a hole in it.
- speech enhancement, where we want to find the clean speech given the noisy speech.

We consider the setting where we have access to a prior $p(x)$ (e.g. normal, clear speech) and likelihood function $p(y \mid x)$ (the environment adding background noise and interference). We observe $y$ and want to recover $x$.

Using Bayes' rule, we can write the posterior and our goal as;

$$
\begin{align*}
p(x | y) &= \frac{p(y | x) p(x)}{p(y)} \tag{posterior} \\
x^* &= \arg \max_x p(x | y) \tag{the MAP solution}
\end{align*}
$$

> MAP will return the most likely value of $x$, given $y$.
However, is the most likely value of $x$ the 'best' guess of $x$?

We offer an alternative approach, suggesting that our guess of $x$ should be typical of the prior.
We write this as;

$$
\begin{align*}
x^* &= \arg \max_{x \in \mathcal T(p(x))_\epsilon} p(y | x) \tag{PITS}
\end{align*}
$$

where $\mathcal T(p(x))_\epsilon$ is the $\epsilon$-typical set of $p(x)$. Thus we have Projection Into the Typical Set (PITS).

<!-- Note: This assumes we are working in high enough dimensions that the typical set has concentrated and any sample from the prior is very likely to be typical. -->
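
Operationally, one way to approximate this objective is projected gradient ascent on the log-likelihood, repeatedly projecting back onto the typical set. A minimal sketch with hypothetical helpers (`log_likelihood(x, y)` returning a scalar $\log p(y \mid x)$, and `project` for the projection onto $\mathcal T(p(x))_\epsilon$, which the flow post below constructs for the Gaussian case):

```python
import torch

def pits_estimate(y, log_likelihood, project, x_init, n_steps=100, lr=0.1):
    """Projected gradient ascent on log p(y | x), constrained to the typical set."""
    x = project(x_init.clone())
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(log_likelihood(x, y), x)[0]
        x = project(x + lr * grad)           # ascend the likelihood, then re-project
    return x.detach()
```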

I wrote a few posts to help you understand PITS;

[1.]({{ site.baseurl }}/pits/typical) Background on typicality \
[2.]({{ site.baseurl }}/pits/map) A simple worked example showing that MAP produces solutions that are not typical. \
[3.]({{ site.baseurl }}/pits/non-typical) (WIP) Does it matter if solutions are not typical? \
[4.]({{ site.baseurl }}/pits/flow) (WIP) A method to apply PITS to arbitrary distributions (using neural flows). \
[5.]({{ site.baseurl }}/pits/flow-theory) (WIP) Theory showing that in the Gaussian case, PITS combined with flows is principled. \
[6.]({{ site.baseurl }}/pits/mnist-demo) (WIP) A demonstration of the PITS approach to inverse problems applied to neural flows. \
[7.]({{ site.baseurl }}/pits/review-dps) A brief review of methods attempting to solve inverse problems using s.o.t.a generative models.


<!-- A main advantage of the PITS approach is that it provides a way to control the quality (/typicality) of the solutions. -->

<!-- what if the true x is not typical? -->
<!-- why not find the MAP solution and then project it into the typical set? -->

<!-- why is it a problem if my diffusion model produces samples that are not typical? -->

## Bibliography

{% bibliography --cited %}

***

These ideas were generated while studying at [VUW](https://www.wgtn.ac.nz/) with [Bastiaan Kleijn](https://people.wgtn.ac.nz/bastiaan.kleijn) and [Marcus Frean](https://people.wgtn.ac.nz/marcus.frean). I was funded by [GN](https://www.gn.com/).
@@ -7,6 +7,7 @@ scholar:
bibliography: "pits.bib"
---

We solve an inverse problem with MAP vs PITS.
Here we pick the prior and likelihood to be Gaussian distributions.

$$
@@ -16,9 +17,16 @@ p(y|x) &= \mathcal{N}(x, \sigma_y^2)
\end{align*}
$$

Let's pick the prior to be a Gaussian distribution with mean 0 and standard deviation 1, and the likelihood to be a Gaussian distribution with mean 1 and standard deviation 0.5.
We then find the MAP solution and the PITS solution.

<img src="{{ site.baseurl }}/assets/pits/pits-ip.png">
<figcaption>Illustration of the inverse problem for a Gaussian prior and likelihood, where $y$ is the observation and $x$ is what we want to recover. Top: the typical sets of our two distributions and the solutions found by MAP and PITS (where 'mid' is simply the midpoint). Bottom: a reminder that we are working with Gaussians.
</figcaption>

## MAP solution

For a Gaussian prior and likelihood, we can derive the MAP solution in closed form as;

$$
\begin{align*}
@@ -34,50 +42,46 @@
\nabla_x \left( -\frac{1}{2\sigma_y^2}||y - x||^2 - \frac{1}{2\sigma_x^2}||x||^2 \right) &= \nabla_x \left( -\frac{1}{2\sigma_y^2}||y - x||^2 \right) + \nabla_x \left( -\frac{1}{2\sigma_x^2}||x||^2 \right) \\
&= \frac{1}{\sigma_y^2}(y - x) - \frac{1}{\sigma_x^2}x \\
&= 0 \\
x &= \frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2}y \\
p(x|y) &= \mathcal{N}\left(\frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2}y, \frac{\sigma_x^2\sigma_y^2}{\sigma_x^2 + \sigma_y^2}\right) \\
x^* &= \frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2}y \\
\end{align*}
$$
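
A quick numeric check of this closed form, taking the illustrative values above ($\sigma_x = 1$, $\sigma_y = 0.5$) and treating the observation as $y = 1$:

```python
# MAP estimate for the scalar example: prior N(0, 1), measurement noise std 0.5, y = 1.
sigma_x, sigma_y, y = 1.0, 0.5, 1.0
x_map = sigma_x**2 / (sigma_x**2 + sigma_y**2) * y
print(x_map)  # 0.8 -- the observation is shrunk towards the prior mean of 0
```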

## PITS solution
<!-- p(x|y) &= \mathcal{N}\left(\frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2}y, \frac{\sigma_x^2\sigma_y^2}{\sigma_x^2 + \sigma_y^2}\right) \\ -->

For Gaussian distributions we can rewrite PITS as;
<img src="{{ site.baseurl }}/assets/pits/map-solns.png">
<figcaption>
The observed $y$'s are updated to be more likely under the prior.
</figcaption>

$$
\begin{align*}
x^* &= \arg \max_{x \in \mathcal T(p(x))_\epsilon} p(y | x) \tag{PITS} \\
&= \arg \min_{x \in \mathcal T(p(x))_\epsilon} \parallel x - y \parallel^2
\end{align*}
$$
## PITS solution

Thus finding $x^*$ becomes a problem of finding the closest point to $y$ in the $\epsilon$-typical set of $p(x)$, which can be calculated by normalising $y$.
For Gaussian distributions, we derive the PITS solution as;

$$
\begin{align*}
x^* &= \frac{y}{\parallel y \parallel}
x^* &= \arg \max_{x \in \mathcal T(p(x))_\epsilon} p(y | x) \tag{PITS} \\
&= \arg \max_{x \in \mathcal T(p(x))_\epsilon} \mathcal{N}(x, \sigma_y^2) \\
&= \arg \max_{x \in \mathcal T(p(x))_\epsilon} \exp \left( -\frac{1}{2\sigma_y^2}||y - x||^2 \right) \\
&= \arg \min_{x \in \mathcal T(p(x))_\epsilon} \parallel x - y \parallel^2 \\
\lim_{\epsilon\to 0} \mathcal T(\mathcal N(0, \sigma^2 I))_\epsilon &= \{x ; \parallel x \parallel = \sqrt{d} \sigma \} \\
x^* &= \sqrt{d} \sigma\frac{y}{\parallel y \parallel}
\end{align*}
$$
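
In code, the PITS estimate is just a rescaling of $y$ onto the sphere of radius $\sqrt{d}\sigma$; a small check with toy values:

```python
import numpy as np

sigma = 1.0
y = np.array([3.0, 4.0])                                 # d = 2, ||y|| = 5
x_pits = np.sqrt(y.size) * sigma * y / np.linalg.norm(y)
print(x_pits, np.linalg.norm(x_pits))                    # the norm comes out as sqrt(2)
```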


## An illustrated example

Let's pick the prior to be a Gaussian distribution with mean 0 and standard deviation 1. And the likelihood to be a Gaussian distribution with mean 1 and standard deviation 0.5.


<img src="{{ site.baseurl }}/assets/pits/pits-ip.png">
<figcaption>Illustration of the inverse problem for a Gaussian prior and likelihood. Top;
</figcaption>

<img src="{{ site.baseurl }}/assets/pits/map-solns.png">
<figcaption>
The observed $y$'s are updated to be more likely under the prior.
</figcaption>
<!-- Finding the closest $x$ to $y$ such that $x$ is in $\epsilon$-typical set becomes -->

<img src="{{ site.baseurl }}/assets/pits/pits-solns.png">
<figcaption>
The observed $y$'s are projected into the typical set.
</figcaption>

## Bibliography
## Accuracy

We can compare the accuracy of the MAP and PITS solutions by looking at the mean squared error (MSE) between the true $x$ and the estimated $x$.

$$
\text{err} = \frac{1}{N \sqrt{d}} \sum_{i=1}^N ||x_i - x_i^*||
$$

{% bibliography --cited %}
We find that MAP provides slightly more accurate solutions than PITS.
For example, picking $d=2048$ and averaging over 10,000 samples, we get an average normalised error of $0.45$ for MAP and $0.46$ for PITS.
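
A minimal sketch of this comparison (assuming the same $\sigma_x = 1$ and $\sigma_y = 0.5$ as in the illustration above; the original experiment may differ in its details):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_samples = 2048, 10_000
sigma_x, sigma_y = 1.0, 0.5

x = rng.normal(0.0, sigma_x, size=(n_samples, d))        # true signals, drawn from the prior
y = x + rng.normal(0.0, sigma_y, size=(n_samples, d))    # noisy observations

x_map = sigma_x**2 / (sigma_x**2 + sigma_y**2) * y
x_pits = np.sqrt(d) * sigma_x * y / np.linalg.norm(y, axis=-1, keepdims=True)

def norm_err(x_hat):
    # err = (1 / (N * sqrt(d))) * sum_i ||x_i - x_hat_i||
    return np.mean(np.linalg.norm(x - x_hat, axis=-1)) / np.sqrt(d)

print("MAP:", norm_err(x_map), "PITS:", norm_err(x_pits))  # roughly 0.45 vs 0.46
```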
