---
---
References
==========

@misc{albergo_stochastic_2023,
  title = {Stochastic Interpolants: A Unifying Framework for Flows and Diffusions},
  shorttitle = {Stochastic Interpolants},
  url = {http://arxiv.org/abs/2303.08797},
  abstract = {A class of generative models that unifies flow-based and diffusion-based methods is introduced. These models extend the framework proposed in [1], enabling the use of a broad class of continuous-time stochastic processes called ‘stochastic interpolants’ to bridge any two arbitrary probability density functions exactly in finite time. These interpolants are built by combining data from the two prescribed densities with an additional latent variable that shapes the bridge in a flexible way. The time-dependent probability density function of the stochastic interpolant is shown to satisfy a first-order transport equation as well as a family of forward and backward Fokker-Planck equations with tunable diffusion. Upon consideration of the time evolution of an individual sample, this viewpoint immediately leads to both deterministic and stochastic generative models based on probability flow equations or stochastic differential equations with an adjustable level of noise. The drift coefficients entering these models are time-dependent velocity fields characterized as the unique minimizers of simple quadratic objective functions, one of which is a new objective for the score of the interpolant density. Remarkably, we show that minimization of these quadratic objectives leads to control of the likelihood for any of our generative models built upon stochastic dynamics. By contrast, we establish that generative models based upon a deterministic dynamics must, in addition, control the Fisher divergence between the target and the model. We also construct estimators for the likelihood and the cross-entropy of interpolant-based generative models, discuss connections with other stochastic bridges, and demonstrate that such models recover the Schrödinger bridge between the two target densities when explicitly optimizing over the interpolant.},
  language = {en},
  urldate = {2023-07-25},
  publisher = {arXiv},
  author = {Albergo, Michael S. and Boffi, Nicholas M. and Vanden-Eijnden, Eric},
  month = mar,
  year = {2023},
  note = {arXiv:2303.08797 [cond-mat]},
  keywords = {Computer Science - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Mathematics - Probability},
}

@misc{liu_flow_2022,
  title = {Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow},
  shorttitle = {Flow Straight and Fast},
  url = {http://arxiv.org/abs/2209.03003},
  abstract = {We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions π0 and π1, hence providing a unified solution to generative modeling and domain transfer, among various other tasks involving distribution transport. The idea of rectified flow is to learn the ODE to follow the straight paths connecting the points drawn from π0 and π1 as much as possible. This is achieved by solving a straightforward nonlinear least squares optimization problem, which can be easily scaled to large models without introducing extra parameters beyond standard supervised learning. The straight paths are special and preferred because they are the shortest paths between two points, and can be simulated exactly without time discretization and hence yield computationally efficient models. We show that the procedure of learning a rectified flow from data, called rectification, turns an arbitrary coupling of π0 and π1 to a new deterministic coupling with provably non-increasing convex transport costs. In addition, recursively applying rectification allows us to obtain a sequence of flows with increasingly straight paths, which can be simulated accurately with coarse time discretization in the inference phase. In empirical studies, we show that rectified flow performs superbly on image generation, image-to-image translation, and domain adaptation. In particular, on image generation and translation, our method yields nearly straight flows that give high quality results even with a single Euler discretization step.},
  language = {en},
  urldate = {2023-08-03},
  publisher = {arXiv},
  author = {Liu, Xingchao and Gong, Chengyue and Liu, Qiang},
  month = sep,
  year = {2022},
  note = {arXiv:2209.03003 [cs]},
  keywords = {Computer Science - Machine Learning},
}

@article{lipman_flow_2023,
  title = {Flow Matching for Generative Modeling},
  abstract = {We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples—which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization. Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality, and allows fast and reliable sample generation using off-the-shelf numerical ODE solvers.},
  language = {en},
  author = {Lipman, Yaron and Chen, Ricky T. Q. and Ben-Hamu, Heli and Nickel, Maximilian and Le, Matt},
  year = {2023},
}

@misc{dao2023flowmatchinglatentspace,
  title = {Flow Matching in Latent Space},
  author = {Quan Dao and Hao Phung and Binh Nguyen and Anh Tran},
  year = {2023},
  eprint = {2307.08698},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2307.08698},
}

@misc{esser2024scalingrectifiedflowtransformers,
  title = {Scaling Rectified Flow Transformers for High-Resolution Image Synthesis},
  author = {Patrick Esser and Sumith Kulal and Andreas Blattmann and Rahim Entezari and Jonas Müller and Harry Saini and Yam Levi and Dominik Lorenz and Axel Sauer and Frederic Boesel and Dustin Podell and Tim Dockhorn and Zion English and Kyle Lacey and Alex Goodwin and Yannik Marek and Robin Rombach},
  year = {2024},
  eprint = {2403.03206},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2403.03206},
}
---
title: The behaviour of ideal generative flows
layout: post
permalink: /ideal-si/
scholar:
  bibliography: "ideal-si.bib"
---

Stochastic interpolants are a recent innovation that frames generative modelling as building a transport map between distributions. {% cite albergo_stochastic_2023 liu_flow_2022 lipman_flow_2023 %}

We are given two distributions, $p(x)$ and $q(x)$, over the same space $X$. Our goal is to find a vector field $v$ that allows us to map from $p(x)$ to $q(x)$.
We can do so by minimising the following objective:

$$
\begin{aligned}
b(z, t) &= \mathbb{E}\big[\partial_t I(x, y, t) \mid I(x, y, t) = z \big] \\
\mathcal{L}(\theta) &= \int_0^1 \mathbb{E} \big[ \| v(z, t, \theta) - b(z, t) \|_2^2 \big] \, dt
\end{aligned}
$$

where $I(x, y, t)$ is the interpolant function (with $I(x, y, 0) = x \sim p$ and $I(x, y, 1) = y \sim q$), and $v(z, t, \theta)$ is the parameterised vector field.
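In practice the expectation runs over $x \sim p$, $y \sim q$ (coupled independently) and $t \sim \mathcal{U}[0, 1]$, and the conditional expectation can be dropped from the regression target: regressing $v$ directly onto $\partial_t I(x, y, t)$ has the same minimiser. Below is a minimal PyTorch sketch of that training loop, assuming the linear interpolant $I(x, y, t) = (1 - t)x + ty$ (so $\partial_t I = y - x$) and two toy Gaussian samplers; the architecture and hyperparameters are illustrative, not the exact setup used for the figures in this post.

```python
import torch
import torch.nn as nn

# Toy data: p(x) and q(x) are 2D Gaussians (illustrative choices; any two samplers work).
def sample_p(n):
    return torch.randn(n, 2) * 0.5 + torch.tensor([-2.0, 0.0])

def sample_q(n):
    return torch.randn(n, 2) * 1.0 + torch.tensor([+2.0, 0.0])

# v(z, t, theta): a small MLP taking (z, t) and returning a velocity in R^2.
model = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 64), nn.SiLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    x, y = sample_p(256), sample_q(256)        # independent coupling of p and q
    t = torch.rand(256, 1)
    z = (1 - t) * x + t * y                    # I(x, y, t) = (1 - t) x + t y
    target = y - x                             # d/dt I(x, y, t)
    v = model(torch.cat([z, t], dim=-1))
    loss = ((v - target) ** 2).sum(-1).mean()  # Monte Carlo estimate of L(theta)
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: push x0 ~ p through dz/dt = v(z, t) with Euler steps.
@torch.no_grad()
def transport(x0, n_steps=100):
    z, dt = x0.clone(), 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((z.shape[0], 1), i * dt)
        z = z + dt * model(torch.cat([z, t], dim=-1))
    return z
```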

Here we explore the behaviour of transport maps generated by:

- stochastic interpolants / linear flows
- optimal transport maps
<!-- what about a comparison to NODE or ?? -->

## Stochastic interpolants

Here are a few examples of what stochastic interpolants do.
(These SI transport maps can be calculated exactly for Gaussian distributions, so we can verify that the behaviour we observe is not due to the approximations made by a neural network.)
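To make the exact computation concrete: for the linear interpolant with an independent coupling of $p = \mathcal{N}(\mu_0, \Sigma_0)$ and $q = \mathcal{N}(\mu_1, \Sigma_1)$, the velocity $b(z, t) = \mathbb{E}[\,y - x \mid (1-t)x + ty = z\,]$ is affine in $z$ and can be written down exactly as $b(z, t) = (\mu_1 - \mu_0) + \big(t\Sigma_1 - (1-t)\Sigma_0\big)\big((1-t)^2\Sigma_0 + t^2\Sigma_1\big)^{-1}(z - \mu_t)$ with $\mu_t = (1-t)\mu_0 + t\mu_1$. Here is a small numpy sketch of this; the means and covariances are placeholders, not the parameters used for the figures.

```python
import numpy as np

# Two 2D Gaussians: p = N(mu0, S0), q = N(mu1, S1). (Placeholder parameters.)
mu0, S0 = np.array([-2.0, 0.0]), np.diag([0.25, 1.0])
mu1, S1 = np.array([+2.0, 0.0]), np.diag([1.0, 0.25])

def velocity(z, t):
    """Exact b(z, t) = E[y - x | (1-t)x + ty = z] for independent Gaussian x, y."""
    mu_t = (1 - t) * mu0 + t * mu1
    S_t = (1 - t) ** 2 * S0 + t ** 2 * S1          # covariance of the interpolant at time t
    C = t * S1 - (1 - t) * S0                      # Cov(y - x, z_t)
    return (mu1 - mu0) + (C @ np.linalg.solve(S_t, (z - mu_t).T)).T

def transport(x0, n_steps=200):
    """Euler-integrate dz/dt = b(z, t) from t = 0 to t = 1."""
    z, dt = np.atleast_2d(x0).astype(float), 1.0 / n_steps
    for i in range(n_steps):
        z = z + dt * velocity(z, i * dt)
    return z

samples_p = np.random.multivariate_normal(mu0, S0, size=1000)
samples_q_hat = transport(samples_p)               # should be distributed ~ N(mu1, S1)
```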
### Splitting modes

> __Q:__ If I sample from a mode of $p(x)$, must it map to a mode of $q(x)$? No.



> We have two 2D Gaussian mixture distributions (in blue and cyan). We use SI to learn a map from $p(x)$ (blue) to $q(x)$ (cyan). The learned mapping 'splits' the modes in $p(x)$ when mapping from $p(x)$ to $q(x)$: if we sample from a mode in $p(x)$ (circle or dot) we get 50:50 samples from the modes in $q(x)$. Note that this mapping was approximated using a neural network.
In the more trivial case, if we map from a single Gaussian distribution to a multi-modal Gaussian mixture, then of course the mode of the single Gaussian must be split.
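This trivial case is easy to check numerically: when $p$ is a single Gaussian and $q$ is a Gaussian mixture, the exact velocity is still available in closed form, since conditioning on the mixture component makes everything jointly Gaussian, and the components are re-weighted by their posterior given $z_t$. A 1D sketch with placeholder means and weights, assuming $p = \mathcal{N}(0, 1)$ and $q = \tfrac{1}{2}\mathcal{N}(-3, s^2) + \tfrac{1}{2}\mathcal{N}(+3, s^2)$:

```python
import numpy as np
from scipy.stats import norm

# p = N(0, 1); q = 0.5 N(-3, s^2) + 0.5 N(+3, s^2); x and y drawn independently.
mus, w, s = np.array([-3.0, 3.0]), np.array([0.5, 0.5]), 0.5

def velocity(z, t):
    """Exact b(z, t) = E[y - x | (1-t)x + ty = z] for Gaussian p and mixture q."""
    var_t = (1 - t) ** 2 + t ** 2 * s ** 2          # Var(z_t | component k), same for each k
    # Posterior over mixture components given z_t = z (responsibilities).
    r = w * norm.pdf(z[:, None], loc=t * mus, scale=np.sqrt(var_t))
    r = r / r.sum(axis=1, keepdims=True)
    # E[y - x | z_t = z, component k] is affine in z within each component.
    cond = mus + (t * s ** 2 - (1 - t)) / var_t * (z[:, None] - t * mus)
    return (r * cond).sum(axis=1)

z = np.random.randn(10_000)                         # samples from the single mode of p
dt = 1.0 / 200
for i in range(200):
    z = z + dt * velocity(z, i * dt)

print("fraction mapped to the +3 mode:", (z > 0).mean())   # ~0.5: the mode is split
```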
### Maximum likelihood sample

> __Q:__ If I take the max likelihood sample from p, must it map to a max likelihood sample from q?


> Taking the max likelihood $x$ from $p(x)$, we generate a sample using our mapping from $p(x)$ to $q(x)$. We start with $p(x) \approx 0.8$, and the mapped sample has density $q(y) \approx 0.07$.
So it's possible for a high probability sample from $p(x)$ to map to a low probability sample in $q(x)$!
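This is easy to reproduce in the exact Gaussian setting sketched above (the numbers here are placeholders, not the ones in the figure): with the affine closed-form velocity, the argmax of $p$ (its mean) is transported to the argmax of $q$, but when $q$ is broader than $p$ the density value there is much lower.

```python
import numpy as np
from scipy.stats import multivariate_normal

# A sharp p and a broad q (placeholder parameters).
mu0, S0 = np.zeros(2), 0.3 ** 2 * np.eye(2)
mu1, S1 = np.array([3.0, 0.0]), 2.0 ** 2 * np.eye(2)

def velocity(z, t):
    """Exact b(z, t) for the linear interpolant between two Gaussians."""
    mu_t = (1 - t) * mu0 + t * mu1
    S_t = (1 - t) ** 2 * S0 + t ** 2 * S1
    return (mu1 - mu0) + (t * S1 - (1 - t) * S0) @ np.linalg.solve(S_t, z - mu_t)

z, dt = mu0.copy(), 1.0 / 200         # start at the argmax (mean) of p
for i in range(200):
    z = z + dt * velocity(z, i * dt)  # ends (numerically) at mu1, the argmax of q

print("p at its argmax:", multivariate_normal(mu0, S0).pdf(mu0))  # ~1.77
print("q at the image: ", multivariate_normal(mu1, S1).pdf(z))    # ~0.04
```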

This observation calls into question the validity / reliability of using flow-based approaches to generate solutions to problems with optimal solutions, such as [sudoku](https://arxiv.org/abs/2210.11633), [source separation](https://ieeexplore.ieee.org/document/10095310/), ...

### Mapping the identity

> __Q:__ If I learn to map from $p(x)$ to $q(x)$ where $p(x) \mathop{=}_d q(x)$, do I learn the identity map? No.



> The learned map from $p(x)$ to $q(x)$ when $p(x) \mathop{=}_d q(x)$.
Mapping from $p(x)$ to $p(x)$ learns a non-linear transform.
However, it is possible to 'rectify' the flow. See {% cite liu_flow_2022 %} for more details.

<!-- While it may seem strange to see such a non-uniform mapping, it is the result of the all-to-all pairings. -->

<!-- ### Topology of modes
This should be preserved?! -->


## Optimal transport maps

WIP
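One concrete baseline to build on here: between two Gaussians, the quadratic-cost optimal transport (Monge) map is known in closed form, $T(x) = \mu_1 + A(x - \mu_0)$ with $A = \Sigma_0^{-1/2}\big(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2}\big)^{1/2}\Sigma_0^{-1/2}$, which can be compared directly against the SI maps above. A small numpy sketch with illustrative parameters:

```python
import numpy as np
from scipy.linalg import sqrtm

mu0, S0 = np.array([-2.0, 0.0]), np.diag([0.25, 1.0])
mu1, S1 = np.array([+2.0, 0.0]), np.diag([1.0, 0.25])

# Closed-form Monge map between Gaussians under the quadratic cost.
S0_half = np.real(sqrtm(S0))
S0_half_inv = np.linalg.inv(S0_half)
A = S0_half_inv @ np.real(sqrtm(S0_half @ S1 @ S0_half)) @ S0_half_inv

def ot_map(x):
    return mu1 + (x - mu0) @ A.T   # A is symmetric here, but keep the general form

x = np.random.multivariate_normal(mu0, S0, size=1000)
y = ot_map(x)   # distributed as N(mu1, S1); the coupling (x, y) minimises E||x - y||^2
```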

<!-- ## Thoughts
Consider the problem of speech enhancement. The -->

<!-- None of this matters for the speech - noisy speech setting since the feature spaces will align. And solving an optimal transport problem should give us good results (since p(x) and p(y) share similar feature interpretations. ie X and Y represent the same state space). and p(y) is (approximately) a slightly higher variance version of p(x) (ie convolved with a blurring gaussian). -->



<!-- - how similar do these spaces need to be? -->

## Alternative setting

The 'unsupervised translation' problem hints at an alternative setting. [@grave_unsupervised_nodate] show it's possible to "infer a bilingual lexicon, without supervised data, by aligning word embeddings trained on monolingual data".

So, imagine a setting where we have two 'similar' distributions on different spaces. We want to find a way to 'align' them.

<!-- want a kind of topology preserving map. can be done by enforcing a cost to local changes? -->

$$
T\, p(x) \to q(y), \quad X \neq Y
$$

Other applications could include:

- unsupervised phoneme translation (or accent 'correction')
- unsupervised

Open questions:

- is it possible to achieve this within the transport framework (with the right cost function)?


## Discussion

More generally, which other properties (modes, max likelihood, ...) of a distribution are (not) invariant to the transport map?

\begin{align}
??
\end{align}

If we are mapping from text to images, then X and Y are clearly different. But if we are mapping from

<!-- HOW DOES HIGH DIMENSIONALITY AFFECT THESE OBSERVATIONS -->

## Bibliography

{% bibliography %}
---
title: "Note on ??"
layout: post
permalink: /opt-init-const/
---

Consider a constrained optimisation problem.
We want to minimise an objective, while also staying close to a specific point $y$ (the constraint).

Is there a rigorous connection between:

1. an optimisation with a penalty term (the typical approach)
2. a finite-step optimisation started from a specific initialisation

$$
\begin{aligned}
x^* &= \arg\min_x f(x) + \lambda D(x, y) \\
D(x, y) &= \tfrac{1}{2} \| x - y \|^2 \\
x_0 &\sim \mathcal{N}(0, \sigma^2)
\end{aligned}
$$

***

We have a fixed budget of $T$ steps to optimise a function $f(x)$, and we start from a specific initialisation $x_0$.

$$
\begin{aligned}
x^* &= \arg\min_x f(x) \\
x_0 &= y
\end{aligned}
$$
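To make the comparison concrete, here is a small numpy sketch that runs both procedures side by side on a toy quadratic $f$ (the objective, step size, $\lambda$ and budget are all illustrative, and the comparison says nothing about the general case):

```python
import numpy as np

def f_grad(x):
    """Gradient of the toy objective f(x) = (x - 5)^2."""
    return 2.0 * (x - 5.0)

y, lam, lr, T = 0.0, 1.0, 0.1, 20
rng = np.random.default_rng(0)

# 1. Penalised objective: minimise f(x) + lam * 0.5 * ||x - y||^2 from a random init.
x = rng.normal(0.0, 1.0)
for _ in range(1000):                        # run (approximately) to convergence
    x = x - lr * (f_grad(x) + lam * (x - y))
x_penalty = x

# 2. A finite budget of T gradient steps on f alone, initialised at x_0 = y.
x = y
for _ in range(T):
    x = x - lr * f_grad(x)
x_finite = x

print("penalty solution:", x_penalty)        # analytic optimum: (2*5 + lam*y) / (2 + lam)
print("T-step-from-y solution:", x_finite)   # 5 * (1 - 0.8**T) for this quadratic
# Both land between y and the unconstrained minimiser; how close depends on lam vs. (lr, T).
```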

***

What's the point of this? / Questions

- (2) will vary dramatically depending on the optimiser used, the learning rate, etc.
- (2) would allow easier worst-case distance bounds?
- relationship to trust-region methods?