Recap

The previous section introduced Langevin dynamics, a special diffusion process that aims to generate samples from a distribution $p$. It is defined as:

$$\mathrm{d}X_t = \nabla_x \log p(X_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t,$$

or equivalently

$$X_{t+\Delta t} = X_t + \nabla_x \log p(X_t)\,\Delta t + \sqrt{2}\,\Delta B_t,$$

where $\Delta B_t$ could be roughly treated as $\sqrt{\Delta t}\,\epsilon$, where $\epsilon$ is a standard Gaussian random variable. $\nabla_x \log p(x)$ is the score function. The Langevin dynamics for $p$ acts as an identity operation on the distribution, transforming samples from $p$ into new samples from the same distribution.
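To make the recap concrete, here is a minimal NumPy sketch of the discretized update above, using the standard Gaussian as the target $p$ (so the score is simply $-x$); the function name `score_std_normal` and all constants are illustrative choices, not part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(0)

def score_std_normal(x):
    # Score of the standard Gaussian N(0, 1): grad_x log p(x) = -x.
    return -x

# Start far from the target and let Langevin dynamics pull samples toward p.
x = rng.uniform(-5.0, 5.0, size=10_000)
dt = 1e-2
for _ in range(2_000):
    # One Euler-Maruyama step of dX = grad log p(X) dt + sqrt(2) dB,
    # with Delta B treated as sqrt(dt) * standard normal, as in the recap.
    x = x + score_std_normal(x) * dt + np.sqrt(2 * dt) * rng.standard_normal(x.shape)

print(x.mean(), x.std())  # approximately 0 and 1: samples now follow N(0, 1)
```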
In this section, we present the fundamental theory of Denoising Diffusion Probabilistic Models (DDPMs):
- Forward Diffusion Process: How DDPMs gradually corrupt an image into pure Gaussian noise
- Backward Diffusion Process: How DDPMs generate images by gradually denoising pure Gaussian noise
Prerequisites: calculus, SDEs, and Langevin dynamics.
The Denoising Diffusion Probabilistic Model (DDPM)
DDPMs [1] are models that generate high-quality images from noise via a sequence of denoising steps. Denoting images as a random variable $X_0$ with probability density $q$, the DDPM aims to learn a model distribution $p_\theta$ that mimics the image distribution $q$ and to draw samples from it. The training and sampling of the DDPM utilize two diffusion processes: the forward and the backward diffusion process.
The Forward Diffusion Process
The forward diffusion process in DDPM generates the necessary training data: clean images and their progressively noised counterparts. It gradually adds noise to existing images using the Ornstein-Uhlenbeck diffusion process (OU process) [2] within a finite time interval $[0, T]$. The OU process is defined by the stochastic differential equation (SDE):

$$\mathrm{d}X_t = -X_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t,$$

in which $t \in [0, T]$ is the forward time of the diffusion process, $X_t$ is the noise-contaminated image at time $t$, and $B_t$ is a Brownian motion.
Note that $-x$ is just the score function of the standard Gaussian distribution $\mathcal{N}(0, I)$: $\nabla_x \log \mathcal{N}(x; 0, I) = -x$. Thus, the forward diffusion process corresponds to the Langevin dynamics of the standard Gaussian $\mathcal{N}(0, I)$.
The forward diffusion process has $\mathcal{N}(0, I)$ as its stationary distribution. This means, for any initial distribution of positions $X_0 \sim p_0 = q$, their density $p_t$ converges to $\mathcal{N}(0, I)$ as $t \to \infty$. When these positions represent vectors of clean images, the process describes a gradual noising operation that transforms clean images into Gaussian noise.
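As a quick numerical sanity check of this convergence claim (a sketch under our own toy setup, not from the original post), the following snippet pushes a bimodal 1-D distribution through Euler-Maruyama steps of the OU SDE and confirms the marginal approaches $\mathcal{N}(0, 1)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "images": a bimodal distribution standing in for clean data.
x = np.concatenate([rng.normal(-2.0, 0.3, 5_000), rng.normal(3.0, 0.5, 5_000)])

dt, T = 1e-2, 5.0
for _ in range(int(T / dt)):
    # Euler-Maruyama step of the OU SDE dX = -X dt + sqrt(2) dB.
    x = x - x * dt + np.sqrt(2 * dt) * rng.standard_normal(x.shape)

# Whatever the initial shape, the marginal is now close to N(0, 1).
print(x.mean(), x.std())
```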
One forward diffusion step with a step size of $\Delta t$ is displayed in the following picture.
The Backward Diffusion Process
The backward diffusion process is the conjugate of the forward process. While the forward process evolves $q$ toward $\mathcal{N}(0, I)$, the backward process reverses this evolution, restoring $\mathcal{N}(0, I)$ to $q$.
To derive it, we employ Langevin dynamics as a stepping stone, which provides the fastest way to obtain the backward diffusion process:
NOTE: Langevin dynamics acts as an “identity” operation on a distribution. Thus, the composition of forward and backward processes, at time $t$, must yield the Langevin dynamics for $p_t$, as shown in the following picture.
To formalize this, consider the Langevin dynamics for $p_t$ with a distinct time variable $s$, distinguished from the forward diffusion time $t$. One Langevin step of size $2\Delta s$,

$$X_{s+2\Delta s} = X_s + 2\,\nabla_x \log p_t(X_s)\,\Delta s + \sqrt{2}\,\left(B_{s+2\Delta s} - B_s\right),$$

can be decomposed into forward and backward components as follows:

$$X_{s+2\Delta s} = X_s + \underbrace{\left[-X_s\,\Delta s + \sqrt{2}\,\Delta B^{(1)}_s\right]}_{\text{Forward}} + \underbrace{\left[\left(X_s + 2\,\nabla_x \log p_t(X_s)\right)\Delta s + \sqrt{2}\,\Delta B^{(2)}_s\right]}_{\text{Backward}},$$

where $\nabla_x \log p_t$ is the score function of $p_t$, and $\Delta B^{(1)}_s$, $\Delta B^{(2)}_s$ are independent Brownian increments over a step $\Delta s$. We have utilized the property that $\sqrt{2}\,\Delta B^{(1)}_s + \sqrt{2}\,\Delta B^{(2)}_s$ has the same distribution as $\sqrt{2}\,\left(B_{s+2\Delta s} - B_s\right)$, since variances of independent Gaussian increments add: $2\Delta s + 2\Delta s = 2 \cdot 2\Delta s$.
The “Forward” part in this decomposition corresponds to the forward diffusion process, effectively increasing the forward diffusion time by $\Delta s$ and bringing the distribution to $p_{t+\Delta s}$. Since the forward and backward components combine to form an “identity” operation, the “Backward” part must reverse the forward process, decreasing the forward diffusion time by $\Delta s$ and restoring the distribution back to $p_t$.
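This decomposition can be verified numerically. In the sketch below (our construction for illustration), the data are 1-D Gaussian, so the forward marginal $p_t$ and its score have closed forms; one forward step followed by one backward step leaves the marginal $p_t$ unchanged, to first order in the step size, exactly as an identity Langevin step should:

```python
import numpy as np

rng = np.random.default_rng(0)

# For Gaussian data X_0 ~ N(mu0, sig0^2) the OU marginal p_t is Gaussian:
# mean m = exp(-t) mu0, variance v = exp(-2t) sig0^2 + 1 - exp(-2t).
mu0, sig0, t = 1.5, 0.7, 0.4
m = np.exp(-t) * mu0
v = np.exp(-2 * t) * sig0**2 + 1.0 - np.exp(-2 * t)

def score(x):
    # Closed-form score of p_t = N(m, v).
    return -(x - m) / v

x = rng.normal(m, np.sqrt(v), size=200_000)  # samples from p_t
ds = 1e-3

# "Forward" part: one OU step of size ds.
x = x - x * ds + np.sqrt(2 * ds) * rng.standard_normal(x.shape)
# "Backward" part: one step of size ds with drift x + 2 * score(x).
x = x + (x + 2 * score(x)) * ds + np.sqrt(2 * ds) * rng.standard_normal(x.shape)

# The composition acts as an identity on the distribution: still ~N(m, v).
print(x.mean(), x.var(), "vs", m, v)
```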
Now we can define the backward process according to the backward part in the equation above, with a backward diffusion time $\tau$ different from the forward diffusion time $t$:

$$X_{\tau+\Delta\tau} = X_\tau + \left(X_\tau + 2\,\nabla_x \log p_t(X_\tau)\right)\Delta\tau + \sqrt{2}\,\Delta B_\tau.$$
One step of this backward diffusion process with step size $\Delta\tau = \Delta s$ acts as a reversal of one forward step of the same size.
The backward diffusion process itself is also a standalone SDE that advances the backward diffusion time $\tau$:

$$\mathrm{d}X_\tau = \left[X_\tau + 2\,\nabla_x \log p_t(X_\tau)\right]\mathrm{d}\tau + \sqrt{2}\,\mathrm{d}B_\tau.$$
These two interpretations help us determine the relationship between the forward diffusion time $t$ and the backward diffusion time $\tau$. Since $\Delta\tau$ is interpreted as a “decrease” in the forward diffusion time $t$, we have

$$\mathrm{d}\tau = -\mathrm{d}t,$$
which means the backward diffusion time runs inversely to the forward one. To make $\tau$ lie in the same range as the forward diffusion time, we define $\tau = T - t$. In this notation, the backward diffusion process [3] is

$$\mathrm{d}X_\tau = \left[X_\tau + 2\,\nabla_x \log p_{T-\tau}(X_\tau)\right]\mathrm{d}\tau + \sqrt{2}\,\mathrm{d}B_\tau, \qquad \tau \in [0, T],$$

in which $\tau$ is the backward time, and $\nabla_x \log p_{T-\tau}$ is the score function of the density $p_t$ of $X_t$ in the forward process, evaluated at $t = T - \tau$.
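Putting the pieces together, here is a hedged end-to-end sketch: for 1-D Gaussian data the score of $p_t$ is known analytically, so we can integrate the backward SDE from pure noise and watch it reproduce the data distribution. In a real DDPM this analytic score is replaced by a learned network; the constants below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data distribution q = N(mu0, sig0^2); the forward OU marginal p_t stays
# Gaussian, so its score is available in closed form (no network needed here).
mu0, sig0, T = 2.0, 0.5, 5.0

def score(x, t):
    m = np.exp(-t) * mu0
    v = np.exp(-2 * t) * sig0**2 + 1.0 - np.exp(-2 * t)
    return -(x - m) / v

# Backward diffusion: start from pure noise at backward time tau = 0.
x = rng.standard_normal(100_000)
dtau = 1e-3
for i in range(int(T / dtau)):
    t = T - i * dtau  # forward time corresponding to the current backward time
    x = x + (x + 2 * score(x, t)) * dtau + np.sqrt(2 * dtau) * rng.standard_normal(x.shape)

print(x.mean(), x.std())  # approximately mu0 and sig0: noise denoised into data
```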
Forward-Backward Duality
The forward and backward processes form a dual pair: advancing the backward diffusion time $\tau$ means receding the forward diffusion time $t$ by the same amount. The following figure illustrates consecutive steps of the forward process $X_t$ and the backward process $X_\tau$.
The green arrows in the above picture represent consecutive forward process steps that advance the forward diffusion time $t$, while the blue arrows indicate consecutive backward process steps that advance the backward diffusion time $\tau$.
Each horizontal row in this picture corresponds to consecutive steps of Langevin dynamics, which alter the samples while maintaining the probability density. This illustrates the dual relationship between the probability densities of samples evolving through the forward diffusion process and the backward diffusion process.
TIP: It’s important to note that the backward diffusion process does not generate samples identical to those of the forward process; rather, it produces samples according to the same probability distribution, due to the identity property of Langevin dynamics.
To formalize the duality, we define the densities of $X_t$ (forward) as $p_t$, and the densities of $X_\tau$ (backward) as $\tilde{p}_\tau$. If we initialize

$$\tilde{p}_0 = p_T,$$

then their evolutions are related by

$$\tilde{p}_\tau = p_{T-\tau}, \qquad \tau \in [0, T].$$

For large $T$, $p_T$ converges to $\mathcal{N}(0, I)$. Thus, the backward process starts at $\tau = 0$ with $\tilde{p}_0 = \mathcal{N}(0, I)$ and, after evolving to $\tau = T$, generates samples from the data distribution:

$$\tilde{p}_T = p_0 = q.$$
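Extending the previous sketch, we can check the relation $\tilde{p}_\tau = p_{T-\tau}$ along the whole trajectory, not just at the endpoint, by comparing empirical moments of the backward samples with the analytic forward marginals (again for our illustrative 1-D Gaussian data):

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, sig0, T, dtau = 2.0, 0.5, 5.0, 1e-3

def marginal(t):
    # Analytic mean and variance of the forward marginal p_t for Gaussian data.
    m = np.exp(-t) * mu0
    v = np.exp(-2 * t) * sig0**2 + 1.0 - np.exp(-2 * t)
    return m, v

x = rng.standard_normal(100_000)  # backward start: p~_0 = N(0, 1), close to p_T
for i in range(int(T / dtau)):
    t = T - i * dtau                 # forward time at the start of this step
    m, v = marginal(t)
    score = -(x - m) / v             # closed-form score of p_t
    x = x + (x + 2 * score) * dtau + np.sqrt(2 * dtau) * rng.standard_normal(x.shape)
    if (i + 1) % 1000 == 0:          # checkpoint once per unit of backward time
        tau = (i + 1) * dtau
        m_ref, v_ref = marginal(T - tau)
        print(f"tau={tau:.1f}  empirical=({x.mean():+.3f}, {x.var():.3f})  "
              f"p_(T-tau)=({m_ref:+.3f}, {v_ref:.3f})")
```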
This establishes an exact correspondence between the forward diffusion process and the backward diffusion process, indicating that the backward diffusion process can generate image data from pure Gaussian noise.
What is Next
We demonstrated that backward diffusion, the dual of the forward process, can generate image data from noise. However, this requires access to the score function $\nabla_x \log p_t$ at every timestep $t$. In practice, we approximate this function using a neural network. In the next section, we will explain how to train such score networks.
Stay tuned for the next installment!
Discussion
If you have questions, suggestions, or ideas to share, please visit the discussion post.
Footnotes
1. Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840-6851.
2. Uhlenbeck, G. E., & Ornstein, L. S. (1930). On the theory of the Brownian motion. Physical Review, 36(5), 823-841.
3. Anderson, B. D. O. (1982). Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3), 313-326.