Recap

The previous section introduced the Forward Process and the Backward Process of the Denoising Diffusion Probabilistic Model (DDPM).
Forward Process
$$
\mathrm{d}x_t = -x_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}w_t,
$$

where $t \in [0, T]$ is the forward diffusion time. This process describes a gradual noising operation that transforms clean images into Gaussian noise.
Backward Process
$$
\mathrm{d}\bar{x}_\tau = \left[\bar{x}_\tau + 2\,\nabla_{\bar{x}_\tau}\log p_{T-\tau}(\bar{x}_\tau)\right]\mathrm{d}\tau + \sqrt{2}\,\mathrm{d}\bar{w}_\tau,
$$

where $\tau = T - t$ is the backward diffusion time, and $\nabla_x \log p_t(x)$ is the score function of the density of $x_t$ in the forward process.
In this section, we will show how to train a neural network that models the score function $\nabla_x \log p_t(x)$.
Prerequisites: Calculus.
Training DDPM with the Denoising Objective
Numerical Implementation of the Forward Process
To numerically simulate the forward diffusion process, we divide the time range $[0, T]$ into $N$ intervals of length $\Delta t_i$, where $\sum_{i=0}^{N-1} \Delta t_i = T$. We denote the intermediate times as $t_i$, with $t_0 = 0$ and $t_N = T$.
The vanilla discretization of the forward process is given by:

$$
x_{t_{i+1}} = x_{t_i} - x_{t_i}\,\Delta t_i + \sqrt{2\,\Delta t_i}\,\epsilon_i,
$$

where we approximate the Brownian increment $\mathrm{d}w_t$ as $\sqrt{\Delta t_i}\,\epsilon_i$, and $\epsilon_i$ is a standard Gaussian random variable (refer to the previous section).
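As a concrete reference, here is a minimal sketch of this vanilla discretization in PyTorch. It assumes the forward SDE $\mathrm{d}x_t = -x_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}w_t$ written above; the function and variable names are illustrative, not from the original post.

```python
import torch

def forward_vanilla(x0: torch.Tensor, dts: torch.Tensor) -> torch.Tensor:
    """Euler-Maruyama simulation of dx = -x dt + sqrt(2) dw from clean images x0."""
    x = x0.clone()
    for dt in dts:                                   # dts: 1-D tensor of step sizes
        eps = torch.randn_like(x)                    # standard Gaussian noise
        x = x - x * dt + torch.sqrt(2 * dt) * eps    # one discretized forward step
    return x

# e.g. 1000 uniform steps covering [0, T] with T = 5:
# x_T = forward_vanilla(x0, torch.full((1000,), 5.0 / 1000))
```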
A more subtle but equivalent implementation is the variance-preserving (VP) form [1]:

$$
x_{t_{i+1}} = \sqrt{1-\beta_i}\;x_{t_i} + \sqrt{\beta_i}\,\epsilon_i, \qquad \beta_i = 2\,\Delta t_i.
$$

The variance-preserving form adds small Gaussian noise to an image step by step, eventually turning it into pure Gaussian noise. When $\Delta t_i$ is small, it is equivalent to the vanilla discretization, since $\sqrt{1-\beta_i} \approx 1 - \Delta t_i$. Its key benefit is maintaining unit variance: if $x_{t_i}$ starts with unit variance, $x_{t_{i+1}}$ keeps it too. This is because $\mathrm{Var}(x_{t_{i+1}}) = (1-\beta_i)\,\mathrm{Var}(x_{t_i}) + \beta_i = 1$.
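For comparison with the vanilla step above, a sketch of the VP update under the same assumptions (again with illustrative names, and with $\beta_i = 2\,\Delta t_i$):

```python
def forward_vp(x0: torch.Tensor, dts: torch.Tensor) -> torch.Tensor:
    """Variance-preserving discretization of the same forward process.
    If x0 has unit variance, every intermediate x keeps unit variance exactly."""
    x = x0.clone()
    for dt in dts:
        beta = 2 * dt                                # step size reinterpreted as beta_i
        eps = torch.randn_like(x)
        x = torch.sqrt(1 - beta) * x + torch.sqrt(beta) * eps
    return x
```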
WARNING: Note that our interpretation of $\beta_i$ differs from that in [1]: we treat $\beta_i = 2\,\Delta t_i$ as a varying time-step size used to solve the autonomous SDE (1.5, OU process noise), instead of as the coefficients of a time-dependent SDE. Our interpretation greatly simplifies the later analysis, but it holds only if every $\beta_i$ is sufficiently small.
Instead of expressing the iterative relationship between $x_{t_i}$ and $x_{t_{i+1}}$, we can directly represent the dependency of $x_{t_i}$ on $x_0$ using the following forward relation:

$$
x_{t_i} = \bar{\alpha}_i\,x_0 + \sqrt{1-\bar{\alpha}_i^2}\;\bar{\epsilon}_i, \qquad \bar{\alpha}_i = \prod_{j<i}\sqrt{1-\beta_j},
$$

where $x_0$ are clean images from the dataset, $\bar{\alpha}_i$ denotes the contamination weight, and $\bar{\epsilon}_i$ represents standard Gaussian noise that accumulates the noises from $t_0$ to $t_i$.
TIP: A useful property we shall exploit later is that for infinitesimal time steps $\Delta t_j \to 0$, the contamination weight is the exponential of the (negative) diffusion time:

$$
\bar{\alpha}_i = \prod_{j<i}\sqrt{1-2\,\Delta t_j} \approx \prod_{j<i} e^{-\Delta t_j} = e^{-t_i}.
$$
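In this infinitesimal-step limit, $x_t$ can be sampled in a single shot instead of by iterating. A minimal sketch, assuming the closed-form relation $x_t = e^{-t}x_0 + \sqrt{1-e^{-2t}}\,\epsilon$ implied above (the interface and names are illustrative):

```python
def sample_xt(x0: torch.Tensor, t: torch.Tensor):
    """Jump directly to x_t using the closed-form forward relation.
    `t` holds one diffusion time per sample in the batch."""
    t = t.view(-1, *([1] * (x0.dim() - 1)))          # make t broadcastable over x0
    alpha_bar = torch.exp(-t)                         # contamination weight e^{-t}
    eps = torch.randn_like(x0)                        # the accumulated Gaussian noise
    return alpha_bar * x0 + torch.sqrt(1.0 - alpha_bar ** 2) * eps, eps
```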
Numerical Implementation of the Backward Process
The backward diffusion process is used to sample from the DDPM by removing the noise of an image step by step. It is the time-reversed version of the OU process: it starts at a pure Gaussian noise sample $\bar{x}_0 \sim \mathcal{N}(0, I)$ and follows the reverse of the OU process (1.5, reverse diffusion process).
The vanilla discretization of the backward process is given by:

$$
\bar{x}_{\tau_{i+1}} = \bar{x}_{\tau_i} + \left[\bar{x}_{\tau_i} + 2\,\nabla\log p_{T-\tau_i}(\bar{x}_{\tau_i})\right]\Delta\tau_i + \sqrt{2\,\Delta\tau_i}\,\epsilon_i,
$$

where $\Delta\tau_i$ represents the backward time step, and $\bar{x}_{\tau_i}$ is the image at the $i$th step with backward time $\tau_i$.
A more common discretization [1] is:

$$
\bar{x}_{\tau_{i+1}} = \frac{1}{\sqrt{1-\bar{\beta}_i}}\left[\bar{x}_{\tau_i} + \bar{\beta}_i\,\nabla\log p_{T-\tau_i}(\bar{x}_{\tau_i})\right] + \sqrt{\bar{\beta}_i}\,\epsilon_i, \qquad \bar{\beta}_i = 2\,\Delta\tau_i.
$$

This formulation is equivalent to the vanilla discretization when $\bar{\beta}_i$ is small. The score function $\nabla\log p_t(x)$ is typically modeled by a neural network trained using a denoising objective.
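Here is a minimal sampling-loop sketch of this backward discretization. It assumes a uniform time grid and some callable `score_fn(x, t)` returning an estimate of $\nabla_x\log p_t(x)$; both the interface and the default values are illustrative assumptions.

```python
@torch.no_grad()
def sample_backward(score_fn, shape, n_steps: int = 1000, T: float = 5.0):
    """Integrate the discretized backward process, starting from pure noise at time T."""
    dt = T / n_steps
    beta = 2.0 * dt                                  # bar-beta_i on a uniform grid
    x = torch.randn(shape)                           # initial state ~ N(0, I)
    for i in range(n_steps):
        t = T - i * dt                               # forward time of the current state
        eps = torch.randn_like(x)
        x = (x + beta * score_fn(x, t)) / (1.0 - beta) ** 0.5 + (beta ** 0.5) * eps
    return x
```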
Training the Score Function
Training the score function requires a training objective. We will show that the score function can be trained with a denoising objective.
DDPM is trained to remove the noise $\bar{\epsilon}_i$ from $x_{t_i}$ in the forward diffusion process, by training a denoising neural network $\epsilon_\theta(x, t)$ to predict and remove the noise $\bar{\epsilon}_i$. This means that DDPM minimizes the denoising objective [2]:

$$
\mathcal{L}(\theta) = \mathbb{E}_{x_0,\,\bar{\epsilon}_i,\,i}\left\|\epsilon_\theta(x_{t_i}, t_i) - \bar{\epsilon}_i\right\|^2,
$$

where $x_{t_i}$ is determined from $x_0$ and $\bar{\epsilon}_i$ according to the forward relation.
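A sketch of this objective as a per-batch loss, reusing the `sample_xt` helper from the TIP above; `model(x, t)` is an assumed noise-prediction interface, not something defined in the original post.

```python
def denoising_loss(model, x0: torch.Tensor, T: float = 5.0) -> torch.Tensor:
    """Monte-Carlo estimate of E || eps_theta(x_t, t) - eps ||^2 on one batch."""
    t = torch.rand(x0.shape[0], device=x0.device) * T   # random diffusion times in [0, T]
    xt, eps = sample_xt(x0, t)                           # forward relation, recorded noise
    return ((model(xt, t) - eps) ** 2).mean()
```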
Now we show that $\epsilon_\theta(x, t)$ trained with the above objective is proportional to the score function $\nabla_x\log p_t(x)$. There are two important properties regarding the relationship between the noise $\bar{\epsilon}_i$ and the score function:
- Gaussian Distribution of $x_{t_i}$: According to the forward relation, the distribution of $x_{t_i}$ given $x_0$ is a Gaussian distribution, expressed as:

$$
p(x_{t_i} \mid x_0) = \mathcal{N}\!\left(x_{t_i};\; \bar{\alpha}_i\,x_0,\; (1-\bar{\alpha}_i^2)\,I\right).
$$

- Proportionality of Noise to Score Function: The noise $\bar{\epsilon}_i$ is directly proportional to a score function, given by:

$$
\bar{\epsilon}_i = -\sqrt{1-\bar{\alpha}_i^2}\;\nabla_{x_{t_i}}\log p(x_{t_i}\mid x_0),
$$

where $\nabla_{x_{t_i}}\log p(x_{t_i}\mid x_0)$ represents the score of the conditional probability density $p(x_{t_i}\mid x_0)$ at $x_{t_i}$.
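The proportionality follows directly from differentiating the Gaussian log-density above and substituting the forward relation:

$$
\nabla_{x_{t_i}}\log p(x_{t_i}\mid x_0)
= -\frac{x_{t_i}-\bar{\alpha}_i\,x_0}{1-\bar{\alpha}_i^2}
= -\frac{\sqrt{1-\bar{\alpha}_i^2}\;\bar{\epsilon}_i}{1-\bar{\alpha}_i^2}
= -\frac{\bar{\epsilon}_i}{\sqrt{1-\bar{\alpha}_i^2}}.
$$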
These properties indicate that the noise $\bar{\epsilon}_i$ is directly related to a conditional score function; it remains to connect this conditional score to the score function $\nabla_{x_{t_i}}\log p(x_{t_i})$ itself.
Now we are very close to our target. The conditional score function is connected to the score function through the following equation:

$$
\mathbb{E}_{x_0,\,x_{t_i}}\!\left[f(x_{t_i})\,\nabla_{x_{t_i}}\log p(x_{t_i}\mid x_0)\right] = \mathbb{E}_{x_{t_i}}\!\left[f(x_{t_i})\,\nabla_{x_{t_i}}\log p(x_{t_i})\right],
$$

where $f$ is an arbitrary function, $p(x_{t_i}) = \int p(x_{t_i}\mid x_0)\,p(x_0)\,\mathrm{d}x_0$ is the marginal density, and $\nabla_{x_{t_i}}\log p(x_{t_i})$ is the score function of the probability density of $x_{t_i}$.
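This identity follows by marginalizing over $x_0$ and using $p\,\nabla\log p = \nabla p$:

$$
\mathbb{E}_{x_0,\,x_{t_i}}\!\left[f\,\nabla\log p(x_{t_i}\mid x_0)\right]
= \iint f(x_{t_i})\,p(x_0)\,\nabla_{x_{t_i}} p(x_{t_i}\mid x_0)\,\mathrm{d}x_{t_i}\,\mathrm{d}x_0
= \int f(x_{t_i})\,\nabla_{x_{t_i}} p(x_{t_i})\,\mathrm{d}x_{t_i}
= \mathbb{E}_{x_{t_i}}\!\left[f\,\nabla\log p(x_{t_i})\right].
$$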
Substituting the proportionality relation into the denoising objective, expanding the squares, utilizing the above equation, and dropping terms that are irrelevant to the parameters $\theta$, we can show that optimizing the denoising objective is equivalent to optimizing a denoising score matching objective:

$$
\mathcal{L}_{\mathrm{DSM}}(\theta) = \mathbb{E}_{x_{t_i},\,i}\left\|\epsilon_\theta(x_{t_i}, t_i) + \sqrt{1-\bar{\alpha}_i^2}\;\nabla_{x_{t_i}}\log p(x_{t_i})\right\|^2.
$$
This objective says that the denoising neural network $\epsilon_\theta$ is trained to approximate a scaled score function [3]:

$$
\epsilon_\theta(x_{t_i}, t_i) \approx -\sqrt{1-\bar{\alpha}_i^2}\;\nabla_{x_{t_i}}\log p(x_{t_i}).
$$
With the help of this relation between $\epsilon_\theta$ and the score function, we can rewrite the backward process as

$$
\bar{x}_{\tau_{i+1}} = \frac{1}{\sqrt{1-\bar{\beta}_i}}\left[\bar{x}_{\tau_i} - \frac{\bar{\beta}_i}{\sqrt{1-\bar{\alpha}^2}}\;\epsilon_\theta\!\left(\bar{x}_{\tau_i},\,T-\tau_i\right)\right] + \sqrt{\bar{\beta}_i}\,\epsilon_i,
$$

where $\bar{\alpha} = e^{-(T-\tau_i)}$ is the contamination weight evaluated at the corresponding forward time $T-\tau_i$.
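In code, this rewrite amounts to wrapping the trained noise-prediction network into a score estimate and reusing the backward sampler sketched earlier; the interface is again an assumption of these sketches.

```python
@torch.no_grad()
def sample_ddpm(model, shape, n_steps: int = 1000, T: float = 5.0):
    """Backward sampling that converts eps_theta into a score estimate via
    score(x, t) ~ -eps_theta(x, t) / sqrt(1 - e^{-2t})."""
    def score_fn(x, t):
        t_batch = torch.full((x.shape[0],), t, device=x.device)
        alpha_bar = torch.exp(torch.tensor(-t))
        return -model(x, t_batch) / torch.sqrt(1.0 - alpha_bar ** 2)
    return sample_backward(score_fn, shape, n_steps, T)
```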
Summary:
We've covered all aspects of DDPM theory. To implement it (a compact end-to-end sketch follows this list):
- Select a suitable dataset as the empirical distribution and sample data points as $x_0$.
- Apply the forward relation to add noise to the data.
- Record the noise $\bar{\epsilon}_i$ used during this process.
- Train a denoising neural network $\epsilon_\theta$ using the recorded noise and the denoising objective.
- Generate new samples using the backward process.
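Putting the sketches above together, a hypothetical training loop might look as follows. Here `dataset` yields batches of clean images, and the optimizer choice and hyperparameters are assumptions for illustration, not recommendations from the post.

```python
def train(model, dataset, n_epochs: int = 10, T: float = 5.0, lr: float = 1e-4):
    """Fit the noise-prediction network with the denoising objective."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(n_epochs):
        for x0 in dataset:                         # step 1: clean data x_0
            loss = denoising_loss(model, x0, T)    # steps 2-4: noise, record, fit
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# After training, generate new images (step 5):
# samples = sample_ddpm(model, shape=(16, 3, 32, 32), n_steps=1000, T=5.0)
```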
What is Next
In the next section, we will discuss an alternative version of the backward diffusion process: ordinary differential equation (ODE) based backward sampling. This approach serves as the foundation for several modern architectures, such as rectified flow diffusion models.
Stay tuned for the next installment!
Discussion
If you have questions, suggestions, or ideas to share, please visit the discussion post.
Cite this blog
This blog is a reformulation of the appendix of the following paper.
@misc{zheng2025lanpainttrainingfreediffusioninpainting,
  title={LanPaint: Training-Free Diffusion Inpainting with Asymptotically Exact and Fast Conditional Sampling},
  author={Candi Zheng and Yuan Lan and Yang Wang},
  year={2025},
  eprint={2502.03491},
  archivePrefix={arXiv},
  primaryClass={eess.IV},
  url={https://arxiv.org/abs/2502.03491},
}
Footnotes
[1] Yang Song, et al. "Score-Based Generative Modeling through Stochastic Differential Equations." ArXiv (2020).
[2] Jonathan Ho, et al. "Denoising Diffusion Probabilistic Models." ArXiv (2020).
[3] Ling Yang, et al. "Diffusion Models: A Comprehensive Survey of Methods and Applications." ACM Computing Surveys (2022).