When I first learned about diffusion models, I was introduced to them as a type of variational autoencoder (VAE) applied to a sequence of progressively noised latent variables. Deriving the forward and reverse processes required lengthy derivations spanning multiple pages, dense with priors, posteriors, Bayes' theorem, and mathematical intricacies. Later, I encountered the stochastic differential equation (SDE) perspective, which frames diffusion models through the Fokker-Planck and Kolmogorov backward equations, concepts no simpler to grasp than the VAE approach.
This blog series aims to provide a concise, self-contained, and rigorous introduction to diffusion models, specifically from the perspective of Langevin dynamics. The key to understanding diffusion models lies in the following triangle relation:
which illustrates the connections among the forward diffusion process, the backward diffusion process, and Langevin dynamics.
While some concepts may be challenging, I believe this approach offers the fastest and most straightforward pathway to understanding diffusion model theory. We will focus exclusively on fundamental principles of stochastic differential equations (SDEs) and calculus to intuitively derive the core theory, revealing its intrinsic structure without the need for advanced machinery.
Prelude: Langevin Dynamics as Identity
In this section, we cover the basics of Stochastic Differential Equations (SDEs), focusing on two fundamental concepts:
- Brownian noise ($dW_t$): the core random process driving SDE dynamics.
- Langevin Dynamics: the basic SDE for generating samples from a probability distribution $p(x)$.
By the end of this section, you will grasp one edge of the triangle relation: the “identity” property of Langevin dynamics on its stationary distribution $p(x)$.
Prerequisites: Calculus, particularly series expansions and vector calculus (gradients, Laplacians).
Diffusion Process
The Diffusion Process forms the mathematical foundation of diffusion models, describing a system’s evolution through deterministic drift and stochastic noise. Here we consider a diffusion process described by a stochastic differential equation (SDE) of the following form:

$$
dx_t = f(x_t, t)\,dt + g(t)\,dW_t,
$$

where the drift term $f(x_t, t)$ governs deterministic motion, while $g(t)\,dW_t$ adds Brownian noise.
NOTE: When no quadratic terms of $dW_t$ are involved, $dW_t$ can often be roughly treated as $\sqrt{dt}\,\epsilon$,
where $\epsilon \sim \mathcal{N}(0, I)$ is a standard Gaussian random variable.
Brownian noise, denoted as $dW_t$, is a core feature of stochastic differential equations (SDEs), highlighting their random behavior. It acts like tiny bursts of Gaussian random noise.
To grasp it better, consider this formal definition:

$$
dW_t = \lim_{n \to \infty} \sum_{i=1}^{n} \sqrt{\tfrac{dt}{n}}\,\epsilon_i,
$$

where each $\epsilon_i$ is an independent standard Gaussian random variable with mean $0$ and covariance matrix $I$ (the identity matrix).
This limit emphasizes that $dW_t$ isn’t simply one Gaussian random variable with mean $0$. Instead, it’s the buildup from countless tiny, independent Gaussian steps. This buildup lets us calculate its covariance as a vector product:

$$
\mathbb{E}\!\left[dW_t\,dW_t^{\top}\right] = \lim_{n \to \infty} \sum_{i=1}^{n} \tfrac{dt}{n}\,\mathbb{E}\!\left[\epsilon_i\,\epsilon_i^{\top}\right] = I\,dt,
$$

where $I$ is the identity matrix. This covariance structure reveals that $dW_t$ has variance proportional to the infinitesimal time increment $dt$, linking its magnitude intrinsically to $\sqrt{dt}$ while differing from ordinary calculus.
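To make the $\sqrt{dt}$ scaling tangible, here is a minimal NumPy sketch (with toy dimensions, step counts, and sample sizes chosen purely for illustration) that builds $dW_t$ as a sum of many tiny independent Gaussian steps and checks that the resulting covariance is approximately $I\,dt$:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n, dim, n_samples = 0.01, 200, 2, 20_000

# Build dW_t as the sum of n tiny independent Gaussian steps,
# each with standard deviation sqrt(dt / n).
steps = rng.normal(0.0, np.sqrt(dt / n), size=(n_samples, n, dim))
dW = steps.sum(axis=1)                 # shape: (n_samples, dim)

# The empirical covariance should be close to dt * I.
cov = dW.T @ dW / n_samples
print("empirical covariance:\n", cov)  # roughly [[0.01, 0], [0, 0.01]]
print("target dt * I:\n", dt * np.eye(dim))
```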
Itô’s Lemma
Because Brownian noise scales like $\sqrt{dt}$, it bends the ordinary rules of calculus for SDEs.
For instance, rescaling time from $t$ to $\alpha t$ in regular calculus gives $dt \to \alpha\,dt$, but for Brownian noise $dW_t$, since its magnitude scales with $\sqrt{dt}$, the transformation becomes $dW_t \to \sqrt{\alpha}\,dW_t$ to preserve that scaling.
Similarly, differentiating a function $\phi(x_t)$ in ordinary calculus yields $d\phi(x_t) = \nabla\phi(x_t)\cdot dx_t$. But for $x_t$ following the SDE above, it obeys Itô’s lemma:

$$
d\phi(x_t) = \nabla\phi(x_t)\cdot dx_t + \tfrac{1}{2}\,g(t)^2\,\Delta\phi(x_t)\,dt.
$$
We won’t derive it step by step here. Intuitively, it comes from a Taylor expansion: plug in the SDE for $dx_t$, and since $dW_t\,dW_t^{\top} = I\,dt$ to leading order, the second-order Laplacian term persists as a first-order contribution. This Laplacian highlights the key difference from deterministic calculus. We’ll later use this lemma to analyze how the probability distribution of $x_t$ evolves.
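For a quick numerical sanity check of the Laplacian correction, take the test function $\phi(x) = \|x\|^2$ (whose Laplacian is $2\,\mathrm{dim}$) and pure Brownian motion $dx_t = dW_t$, i.e. $f = 0$ and $g = 1$. Ordinary calculus would predict no systematic change in $\phi$, while Itô’s lemma predicts a drift of $\tfrac{1}{2}\Delta\phi\,dt = \mathrm{dim}\cdot dt$ per step. The following minimal sketch (toy parameters) confirms this:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, dim, n_samples = 0.001, 3, 1_000_000

x0 = rng.normal(size=(n_samples, dim))             # arbitrary starting points
dW = rng.normal(0.0, np.sqrt(dt), size=(n_samples, dim))

# Pure Brownian motion step: dx_t = dW_t  (f = 0, g = 1).
x1 = x0 + dW

phi = lambda x: (x ** 2).sum(axis=1)               # phi(x) = ||x||^2, Laplacian = 2 * dim

d_phi = phi(x1) - phi(x0)
print("mean d_phi       :", d_phi.mean())          # ~ dim * dt
print("Ito prediction   :", 0.5 * (2 * dim) * dt)  # 0.5 * g^2 * Laplacian * dt
print("naive prediction :", 0.0)                   # ordinary calculus (no correction)
```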
Langevin Dynamics
Langevin Dynamics is a special diffusion process that aims to generate samples from a probability distribution $p(x)$. It is defined as:

$$
dx_t = \nabla_x \log p(x_t)\,dt + \sqrt{2}\,dW_t,
$$
where $\nabla_x \log p(x)$ is the score function of $p(x)$. This dynamics is often used as a Monte Carlo sampler to draw samples from $p(x)$, since $p(x)$ is its stationary distribution: the distribution that $x_t$ converges to and remains at as $t \to \infty$, regardless of the initial distribution of $x_0$.
NOTE: Stationary distribution
$p(x)$ is the stationary distribution of the Langevin dynamics. This means: if you start with particles whose initial positions already follow $p(x)$ (like sampling $x_0 \sim p(x)$), then when you evolve those same particles using Langevin dynamics, their positions at any future time $t$ will still follow $p(x)$. The distribution doesn’t change over time.
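To make this concrete, here is a small worked example in one dimension, assuming the target is a standard Gaussian $p(x) = \mathcal{N}(0, 1)$: the score is $-x$, Langevin dynamics reduces to an Ornstein-Uhlenbeck process, and a single small step started from $p(x)$ lands back in $p(x)$ up to $O(dt^2)$.

```latex
% One-dimensional example: p(x) = N(0, 1), so the score is \nabla_x \log p(x) = -x
% and Langevin dynamics becomes the Ornstein--Uhlenbeck SDE dx_t = -x_t dt + \sqrt{2} dW_t.
% One Euler step of size dt starting from x_0 ~ N(0, 1):
\begin{align*}
  x_{dt} &= (1 - dt)\,x_0 + \sqrt{2\,dt}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, 1),\\
  \mathbb{E}[x_{dt}] &= (1 - dt)\cdot 0 = 0,\\
  \operatorname{Var}[x_{dt}] &= (1 - dt)^2 \cdot 1 + 2\,dt = 1 + dt^2 \approx 1,
\end{align*}
% so, to first order in dt, x_{dt} is again distributed as N(0, 1).
```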
If you’re comfortable assuming that $p(x)$ is the stationary distribution for the Langevin dynamics, that’s fine. If not, here’s a short proof.
To check stationarity, we show that after a small time step from $0$ to $dt$, the distribution of $x_{dt}$ remains $p(x)$.
Pick any smooth test function $\phi(x)$. Start with initial points $x_0$ drawn from $p(x)$. We track the change in the average value of $\phi$ at time $dt$, i.e., $\mathbb{E}[\phi(x_{dt})] - \mathbb{E}[\phi(x_0)]$.
Using Itô’s lemma (substituting $dx_t$ with the Langevin dynamics) we have, to first-order accuracy,

$$
\phi(x_{dt}) - \phi(x_0) = \nabla\phi(x_0)\cdot\left(\nabla_x \log p(x_0)\,dt + \sqrt{2}\,dW_t\right) + \Delta\phi(x_0)\,dt,
$$

and noting that the noise term averages to zero ($\mathbb{E}[dW_t] = 0$), we get:

$$
\mathbb{E}[\phi(x_{dt})] - \mathbb{E}[\phi(x_0)]
= dt\int \left[\nabla\phi(x)\cdot\nabla_x \log p(x) + \Delta\phi(x)\right]p(x)\,dx
= dt\int \nabla\cdot\!\left(\nabla p(x) - p(x)\,\nabla_x \log p(x)\right)\phi(x)\,dx
= 0,
$$

where the second equality follows from integration by parts, and the zero comes from plugging in $\nabla_x \log p(x) = \nabla p(x)/p(x)$ (making the term inside the divergence vanish).
Since this average change is zero for any test function $\phi$, the distribution of $x_{dt}$ must stay unchanged at $p(x)$.
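The same conclusion can be checked numerically. Below is a minimal NumPy sketch for the one-dimensional Gaussian example from the note above (the step size, step count, and particle count are arbitrary illustrative choices): particles initialized from $p(x) = \mathcal{N}(0, 1)$ are evolved with the score $-x$, and their mean and variance stay near $0$ and $1$ up to discretization error.

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles, dt, n_steps = 200_000, 0.01, 200

score = lambda x: -x                 # score of p(x) = N(0, 1)

x = rng.normal(size=n_particles)     # start already distributed as p(x)
for _ in range(n_steps):
    # One discretized Langevin step: x <- x + score(x) dt + sqrt(2 dt) * noise
    x = x + score(x) * dt + np.sqrt(2 * dt) * rng.normal(size=n_particles)

print("mean     :", x.mean())        # ~ 0
print("variance :", x.var())         # ~ 1  (stationary distribution preserved)
```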
Alternative form of the Langevin Dynamics:
Recall that the $dW_t$ term in Langevin dynamics scales as $\sqrt{dt}$. We can reformulate the Langevin dynamics by substituting $dW_t$ with $\sqrt{dt}\,\epsilon$, resulting in the alternative form of the Langevin Dynamics:

$$
dx_t = \nabla_x \log p(x_t)\,dt + \sqrt{2\,dt}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).
$$
IMPORTANT: Langevin Dynamics as ‘Identity’
The stationarity of $p(x)$ is very important: the Langevin dynamics for $p(x)$ acts as an “identity” operation on the distribution, transforming samples from $p(x)$ into new samples from the same distribution. This property enables a simple way to derive the forward and backward diffusion processes of diffusion models.
Langevin Dynamics as Monte Carlo Sampler
Langevin dynamics can be used to generate samples from a distribution $p(x)$, given its score function $\nabla_x \log p(x)$. But its success hinges on two critical factors. First, the method is highly sensitive to initialization: a poorly chosen $x_0$ may trap the sampling process in local likelihood maxima, failing to explore the full distribution. Second, inaccuracies in the score estimation, particularly in low-density regions of $p(x)$, can prevent convergence altogether. These limitations led to the development of diffusion models, which eliminate the difficulty of choosing $x_0$: all samples are generated by gradually denoising pure Gaussian noise.
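To see Langevin dynamics at work as a sampler, here is a minimal NumPy sketch on a toy target (a two-component 1D Gaussian mixture; the target, initialization, and step size are illustrative choices, not anything prescribed above). Started between the two modes, the discretized dynamics splits the particles roughly evenly across them; started deep inside one mode, it would tend to stay there for a long time, which is exactly the initialization sensitivity described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: p(x) = 0.5 * N(-2, 0.5^2) + 0.5 * N(+2, 0.5^2).
mus, sigma = np.array([-2.0, 2.0]), 0.5

def score(x):
    """Score of the mixture: responsibility-weighted sum of per-component scores."""
    log_w = -0.5 * ((x[:, None] - mus) / sigma) ** 2          # unnormalized log responsibilities
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    return (w * (-(x[:, None] - mus) / sigma ** 2)).sum(axis=1)

n_particles, dt, n_steps = 50_000, 0.01, 1_000
x = np.zeros(n_particles)            # fixed initialization at x = 0, between the modes

for _ in range(n_steps):
    # Discretized Langevin update: x <- x + score(x) dt + sqrt(2 dt) * noise
    x = x + score(x) * dt + np.sqrt(2 * dt) * rng.normal(size=n_particles)

# Roughly half of the particles should end up near each mode.
print("fraction near -2:", (x < 0).mean())
print("fraction near +2:", (x > 0).mean())
```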
What is Next
In the next section, we will use Langevin dynamics as a stepping stone to derive the forward and backward diffusion processes. We will examine their mathematical formulation and how they form a dual pair—each reversing the other’s evolution.
Stay tuned for the next installment!
Discussion
If you have questions, suggestions, or ideas to share, please visit the discussion post.
Cite this blog
This blog is a reformulation of the appendix of the following paper.
@misc{zheng2025lanpainttrainingfreediffusioninpainting,
  title={LanPaint: Training-Free Diffusion Inpainting with Asymptotically Exact and Fast Conditional Sampling},
  author={Candi Zheng and Yuan Lan and Yang Wang},
  year={2025},
  eprint={2502.03491},
  archivePrefix={arXiv},
  primaryClass={eess.IV},
  url={https://arxiv.org/abs/2502.03491},
}