The Fastest Way to Diffusion Model Theory - III
Recap

The previous section introduced the Forward Process and the Backward Process of the Denoising Diffusion Probabilistic Model (DDPM).

Forward Process

$$
dX_t = -\frac{1}{2} X_t \, dt + dW_t, \qquad t \in [0, T],
$$

where $t$ is the forward diffusion time. This process describes a gradual noising operation that transforms clean images into Gaussian noise.

Backward Process

$$
d\bar{X}_s = \left[ \frac{1}{2} \bar{X}_s + \nabla_x \log p_{T-s}(\bar{X}_s) \right] ds + d\bar{W}_s, \qquad s \in [0, T],
$$

where $s = T - t$ is the backward diffusion time, and $\nabla_x \log p_t(x)$ is the score function of the density $p_t$ of $X_t$ in the forward process.

In this section, we will show how to train a neural network that models the score function $\nabla_x \log p_t(x)$.

Prerequisites: Calculus.

Implementation of the Denoising Diffusion Probabilistic Model (DDPM)#

Numerical Implementation of the Forward Process#

To numerically simulate the forward diffusion process, we divide the time range $[0, T]$ into $N$ intervals of length $\beta_k$, where $\sum_{k=1}^{N} \beta_k = T$. We denote the intermediate times as $t_k = \sum_{j=1}^{k} \beta_j$, so that $t_0 = 0$ and $t_N = T$.

The vanilla discretization of the forward process is given by:

$$
X_{t_k} = \left( 1 - \frac{\beta_k}{2} \right) X_{t_{k-1}} + \sqrt{\beta_k} \, \epsilon_k,
$$

where we approximate $dt$ as $\beta_k$, and $\epsilon_k \sim \mathcal{N}(0, I)$ is a standard Gaussian random variable (refer to the previous section).

A more subtle but equivalent implementation is the variance-preserving (VP) form [1]:

$$
X_{t_k} = \sqrt{1 - \beta_k} \, X_{t_{k-1}} + \sqrt{\beta_k} \, \epsilon_k.
$$

This formulation ensures that if $X_{t_0}$ is initialized with unit variance, then the variance of $X_{t_k}$ remains equal to 1, since $(1 - \beta_k) \cdot 1 + \beta_k = 1$. It agrees with the vanilla discretization to first order, because $\sqrt{1 - \beta_k} \approx 1 - \beta_k / 2$ for small $\beta_k$. The process adds a small amount of Gaussian noise to the image at each time step $t_k$, gradually contaminating the image until $X_{t_N}$ is approximately pure Gaussian noise.
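To make the iteration concrete, here is a minimal NumPy sketch of the VP forward process; the constant schedule `betas` and the flattened stand-in image `x0` are placeholder assumptions for illustration, not part of the derivation above.

```python
import numpy as np

def forward_noising(x0, betas, rng):
    """Iterate X_{t_k} = sqrt(1 - beta_k) X_{t_{k-1}} + sqrt(beta_k) eps_k."""
    x = x0.copy()
    for beta in betas:
        eps = rng.standard_normal(x.shape)   # eps_k ~ N(0, I)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * eps
    return x

rng = np.random.default_rng(0)
betas = np.full(1000, 0.005)                 # hypothetical constant schedule, T = 5
x0 = rng.standard_normal(4096)               # stand-in "image" with unit variance
x_T = forward_noising(x0, betas, rng)
print(np.var(x0), np.var(x_T))               # both close to 1 (variance preserving)
```

Running this, the sample variance stays near 1 at every step, which is exactly the variance-preserving property claimed above.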

WARNING

Note that our interpretation of $\beta_k$ differs from that in [1]: we treat $\beta_k$ as a varying time-step size used to solve the autonomous SDE (Eq. 1.5, the OU process) rather than as the coefficient of a time-dependent SDE. Our interpretation greatly simplifies the later analysis, but it holds only if every $\beta_k$ is sufficiently small.

Instead of expressing the iterative relationship between $X_{t_k}$ and $X_{t_{k-1}}$, we can directly represent the dependency of $X_{t_k}$ on the clean image $X_{t_0}$ using the following forward relation:

$$
X_{t_k} = \sqrt{\bar{\alpha}_k} \, X_{t_0} + \sqrt{1 - \bar{\alpha}_k} \, \epsilon,
$$

where $\bar{\alpha}_k = \prod_{j=1}^{k} (1 - \beta_j)$ denotes the contamination weight, and $\epsilon \sim \mathcal{N}(0, I)$ represents standard Gaussian noise.

TIP

A useful property we shall exploit later is that for infinitesimal time steps $\beta_k$, the contamination weight is the exponential of the (negative) diffusion time:

$$
\bar{\alpha}_k = \prod_{j=1}^{k} (1 - \beta_j) \approx \exp\left( -\sum_{j=1}^{k} \beta_j \right) = e^{-t_k}.
$$
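This approximation is easy to verify numerically. The sketch below (reusing the same hypothetical schedule as before) also shows how the forward relation produces $X_{t_k}$ in a single step instead of $k$ iterations.

```python
import numpy as np

rng = np.random.default_rng(0)
betas = np.full(1000, 0.005)                   # hypothetical constant schedule
alpha_bar = np.cumprod(1.0 - betas)            # contamination weights prod_j (1 - beta_j)
t = np.cumsum(betas)                           # intermediate times t_k = sum_j beta_j
print(np.max(np.abs(alpha_bar - np.exp(-t))))  # ~1e-3: alpha_bar_k ≈ exp(-t_k)

# One-shot sampling of X_{t_k} via the forward relation:
x0 = rng.standard_normal(4096)
k = 500
x_k = np.sqrt(alpha_bar[k]) * x0 + np.sqrt(1.0 - alpha_bar[k]) * rng.standard_normal(4096)
```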

Numerical Implementation of the Backward Process#

The backward diffusion process is used to sample from the DDPM by removing the noise from an image step by step. It is the time-reversed version of the OU process: starting at $X_{t_N} \sim \mathcal{N}(0, I)$, we simulate the reverse diffusion process of the OU process (Eq. 1.5).

The vanilla discretization of the backward process is given by:

$$
X_{t_{k-1}} = \left( 1 + \frac{\beta_k}{2} \right) X_{t_k} + \beta_k \, \nabla_x \log p_{t_k}(X_{t_k}) + \sqrt{\beta_k} \, \epsilon_k,
$$

where $\beta_k$ represents the backward time step, and $X_{t_k}$ is the image at the $k$-th step with time $t_k$.

A more common discretization is:

$$
X_{t_{k-1}} = \frac{1}{\sqrt{1 - \beta_k}} \left( X_{t_k} + \beta_k \, \nabla_x \log p_{t_k}(X_{t_k}) \right) + \sqrt{\beta_k} \, \epsilon_k.
$$

This formulation is equivalent to the vanilla discretization when $\beta_k$ is small, since $1 / \sqrt{1 - \beta_k} \approx 1 + \beta_k / 2$. The score function $\nabla_x \log p_t(x)$ is typically modeled by a neural network trained with a denoising objective.
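For concreteness, here is a minimal NumPy sketch of this sampler; `score_fn(x, k)` is a placeholder for whatever model of $\nabla_x \log p_{t_k}(x)$ is available (we construct one from a trained network later in this section).

```python
import numpy as np

def sample_backward(score_fn, betas, shape, rng):
    """Iterate X_{t_{k-1}} = (X_{t_k} + beta_k * score) / sqrt(1 - beta_k) + sqrt(beta_k) * eps_k."""
    x = rng.standard_normal(shape)             # start from X_{t_N} ~ N(0, I)
    for k in reversed(range(len(betas))):
        beta = betas[k]
        score = score_fn(x, k)                 # estimate of grad_x log p_{t_k}(x)
        x = (x + beta * score) / np.sqrt(1.0 - beta)
        if k > 0:                              # common convention: no noise on the last step
            x += np.sqrt(beta) * rng.standard_normal(shape)
    return x
```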

Training the Score Function#

Training the score function $\nabla_x \log p_t(x)$ requires a training objective. We will show that the score function can be trained with a denoising objective.

DDPM is trained to remove the noise $\epsilon$ from $X_{t_k}$ in the forward diffusion process, by training a denoising neural network $\epsilon_\theta(x, t)$ to predict and remove the noise $\epsilon$. This means that DDPM minimizes the denoising objective [2]:

$$
L = \mathbb{E}_{k, X_{t_0}, \epsilon} \left[ \left\| \epsilon - \epsilon_\theta(X_{t_k}, t_k) \right\|^2 \right],
$$

where $X_{t_k} = \sqrt{\bar{\alpha}_k} \, X_{t_0} + \sqrt{1 - \bar{\alpha}_k} \, \epsilon$ is determined from $X_{t_0}$ according to the forward relation.
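As an illustration, a single evaluation of this objective might look like the following PyTorch sketch; the network `eps_theta(x, k)` and its interface are assumptions made for the example, not a prescribed architecture.

```python
import torch

def ddpm_loss(eps_theta, x0, alpha_bar):
    """Monte Carlo estimate of E || eps - eps_theta(X_{t_k}, k) ||^2 over a batch.

    alpha_bar: 1-D tensor of contamination weights prod_j (1 - beta_j).
    """
    batch = x0.shape[0]
    k = torch.randint(0, len(alpha_bar), (batch,))         # random diffusion step per sample
    a = alpha_bar[k].view(batch, *([1] * (x0.dim() - 1)))  # broadcast to image shape
    eps = torch.randn_like(x0)                             # the noise the network must predict
    x_k = a.sqrt() * x0 + (1.0 - a).sqrt() * eps           # forward relation in one step
    return ((eps - eps_theta(x_k, k)) ** 2).mean()
```

Minimizing this loss over batches of clean images $X_{t_0}$ with a standard optimizer is the entire DDPM training loop.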

Now we show that $\epsilon_\theta$ trained with the above objective is proportional to the score function $\nabla_x \log p_t(x)$. There are two important properties regarding the relationship between the noise $\epsilon$ and the score function:

  1. Gaussian Distribution of $X_{t_k}$:
    According to the forward relation, the distribution of $X_{t_k}$ given $X_{t_0}$ is a Gaussian distribution, expressed as:

    $$
    p(X_{t_k} \mid X_{t_0}) = \mathcal{N}\left( X_{t_k}; \ \sqrt{\bar{\alpha}_k} \, X_{t_0}, \ (1 - \bar{\alpha}_k) I \right).
    $$

  2. Proportionality of Noise to Score Function:
    The noise $\epsilon$ is directly proportional to a score function (see the short derivation after this list), given by:

    $$
    \epsilon = -\sqrt{1 - \bar{\alpha}_k} \ \nabla_{X_{t_k}} \log p(X_{t_k} \mid X_{t_0}),
    $$

    where $\nabla_{X_{t_k}} \log p(X_{t_k} \mid X_{t_0})$ represents the score of the conditional probability density at $X_{t_k}$.
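To verify property 2, differentiate the Gaussian log-density of property 1 with respect to $X_{t_k}$:

$$
\nabla_{X_{t_k}} \log p(X_{t_k} \mid X_{t_0}) = -\frac{X_{t_k} - \sqrt{\bar{\alpha}_k} \, X_{t_0}}{1 - \bar{\alpha}_k} = -\frac{\sqrt{1 - \bar{\alpha}_k} \, \epsilon}{1 - \bar{\alpha}_k} = -\frac{\epsilon}{\sqrt{1 - \bar{\alpha}_k}},
$$

where the second equality uses the forward relation $X_{t_k} - \sqrt{\bar{\alpha}_k} \, X_{t_0} = \sqrt{1 - \bar{\alpha}_k} \, \epsilon$. Rearranging gives property 2.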

These properties indicate that the noise $\epsilon$ is directly proportional to a conditional score function. It remains to connect the conditional score function to the marginal score function $\nabla_x \log p_t(x)$, which we do next.

Now we are very close to our target. The conditional score function is connected to the score function through the following equation:

$$
\mathbb{E}_{X_{t_0}, X_{t_k}} \left[ f(X_{t_k})^\top \, \nabla_{X_{t_k}} \log p(X_{t_k} \mid X_{t_0}) \right] = \mathbb{E}_{X_{t_k}} \left[ f(X_{t_k})^\top \, \nabla_{X_{t_k}} \log p(X_{t_k}) \right],
$$

where $f$ is an arbitrary function and $\nabla_{X_{t_k}} \log p(X_{t_k})$ is the score function of the marginal probability density of $X_{t_k}$. The identity holds because averaging the conditional score over $X_{t_0} \mid X_{t_k}$ yields the marginal score: $\mathbb{E}_{X_{t_0} \mid X_{t_k}} \left[ \nabla_{X_{t_k}} \log p(X_{t_k} \mid X_{t_0}) \right] = \nabla_{X_{t_k}} \log p(X_{t_k})$.

Substituting the forward relation into the denoising objective, expanding the squares, and utilizing the above equation, we can derive that the denoising objective is equivalent (up to a constant independent of the network) to a denoising score matching objective:

$$
L = \mathbb{E}_{k, X_{t_k}} \left[ \left\| \epsilon_\theta(X_{t_k}, t_k) + \sqrt{1 - \bar{\alpha}_k} \, \nabla_{X_{t_k}} \log p(X_{t_k}) \right\|^2 \right] + \text{const}.
$$

This objective says that the denoising neural network $\epsilon_\theta$ is trained to approximate a scaled score function [3]:

$$
\epsilon_\theta(x, t_k) \approx -\sqrt{1 - \bar{\alpha}_k} \, \nabla_x \log p_{t_k}(x).
$$
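This is also how the trained network plugs into the backward sampler sketched earlier: rearranging the relation above gives a score estimate. The helper below assumes the hypothetical `eps_theta` and `alpha_bar` from the previous sketches.

```python
import numpy as np

def make_score_fn(eps_theta, alpha_bar):
    """Convert a trained noise predictor into a score estimate for sample_backward."""
    def score_fn(x, k):
        # grad_x log p_{t_k}(x) ≈ -eps_theta(x, k) / sqrt(1 - alpha_bar_k)
        return -eps_theta(x, k) / np.sqrt(1.0 - alpha_bar[k])
    return score_fn
```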

Summary#

We have covered all aspects of the DDPM theory. You can now find a suitable dataset, perform the forward process, train a denoising neural network using the denoising objective, and subsequently generate new samples with the backward process and the trained score function.
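Assembled from the hypothetical sketches above, the whole pipeline reads roughly as follows.

```python
# Sketch of the end-to-end DDPM pipeline, reusing the helpers defined above:
# betas, alpha_bar = ...                                   # choose a noise schedule
# train eps_theta by minimizing ddpm_loss(eps_theta, x0_batch, alpha_bar)
# score_fn = make_score_fn(eps_theta, alpha_bar)           # scaled-score conversion
# samples = sample_backward(score_fn, betas, shape, rng)   # backward diffusion sampling
```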

What is Next#

In the next section, we will discuss an alternative version of the backward diffusion process: ordinary differential equation (ODE) based backward sampling. This approach serves as the foundation for several modern architectures, such as rectified flow diffusion models.

Stay tuned for the next installment!

Discussion#

If you have questions, suggestions, or ideas to share, please visit the discussion post.


Footnotes#

  1. Yang Song, et al. "Score-Based Generative Modeling through Stochastic Differential Equations." ArXiv (2020).

  2. Jonathan Ho, et al. “Denoising Diffusion Probabilistic Models.” ArXiv (2020).

  3. Ling Yang, et al. “Diffusion Models: A Comprehensive Survey of Methods and Applications.” ACM Computing Surveys (2022).

Author: Candi Zheng · Published: 2025-06-23 · License: CC BY-NC-SA 4.0