Tag: sampling

Normalized Uniform Disk Pattern

Normalized Uniform Disk Pattern

Motivation

In computer graphics, one of the approaches to simulate a depth-of-field (DOF) effect with bokeh is to “mix” nearby pixel values. The idea is to use a sampling pattern to retrieve nearby pixel values (usually referred to as “scatter-as-gather”). One option is to use a normalized sampling pattern that determines the bokeh shape, while the circle-of-confusion (COC) is used to dictate the scale of this pattern (in texels). That way, more out-of-focus pixels will tend to gather pixel values that are further away, increasing the out-of-focus blur and bokeh effect. 

Figure 1: DOF and Bokeh Effect

The goal of this article is to introduce a normalized symmetric disk sampling computation method that will yield a circular bokeh shape. I will not explain how to simulate the DOF effect using the pattern in this article. 

My initial motivation for this derivation was to understand the math behind a sampling pattern such as Unity’s disk kernel. If you’re interested in seeing an example of a DOF effect that uses this pattern, I highly encourage you to read the excellent Catlike Coding Depth of Field tutorial.

Problem Statement

Let’s formalize the problem at hand. We want to symmetrically sample the normalized 2D circle using a disk pattern. That is, equally-spaced concentric rings with equally distributed sample points (referred to as samples going forward) on their perimeters.

Let L be the total number of rings (layers). We will be using l to denote a running ring index, where 0≤l<L. Note that the innermost ring (l = 0) always has one sample exactly at the origin (i.e. when simulating DOF, this sample corresponds to the current pixel being blurred). 

Let S be the base-ring size. That is, the number of samples in the second innermost ring (l = 1).

Let NTotal be the total number of samples ( NTotal depends on L,S). We will be using i to denote a sample index, where 0≤i<NTotal . NTotal is usually used as a proxy for the sampling pattern efficiency (i.e. number of taps required to blur each pixel).

We parameterize our disk sampling pattern by L>0 and S>0, which will dictate a specific NTotal. 

Figure 2: Disk Pattern Parametrization and Notation

To make the code highly parallelizable, I want to find a way to compute each sample point directly without introducing any dependency on other samples. 

Hence, our final goal can be stated as: given a disk parametrization L,S and a sample index 0≤i<NTotal – find the 2D normalized disk sampling point pi(x,y) with minimum machine instructions.

About the Demos

I will be using shadertoy to present the sampling pattern. I’m using a signed distance field (SDF) calculation to drive the pattern visualization because it’s easy and quick to implement – although not efficient. 
The important part (i.e. the function that we will design in this blog post) is called disk. That’s the function which takes a sample index i and disk parameters L,S and computes the sampling point pi(x,y).

Naive Approach

We start with a straight-forward and naive approach to gain some intuition. Let’s fill the samples on a ring-by-ring basis: We start at the innermost ring, then add the base ring by equally distribute S points along the base ring perimeter. Then, we proceed to the next ring (assuming that all rings have equal spacing), until we filled a total of L rings. We will denote the ring radius by R(l) (radius depends on the ring-index / layer).

We have a total of NTotal=L·S samples and the index i can be rewritten as:

i = S·l + s

Where:

l = floor(i / S)
s = i mod S

For a given ring (or layer), l, we can easily determine its radius by remembering that we’re interested in a unit disk distribution. Hence, the outermost ring should correspond to the unit circle, R(L-1)=1. In other words, we’re always normalizing the outermost disk layer. 
We already decided that the innermost ring maps to the origin, R(0)=0. Finally, since we choose a uniform spacing between our concentric rings, we can write a linear equation given these constraints R(0)=0 and R(L-1)=1:

R(l) = l /(L – 1)

For a given layer, equal distribution of points on the ring can be done by equally spacing the polar angle along the [0,2π) segment:

θ(s)=2πs / S

Putting it all together, for a given sample i parametrized by (l,s) we can derive the sampling point using 2D polar coordinates as follows:

pl,s(x,y)=R(l) ·vec2(cos(θ(s)),sin( θ (s)))

And by substituting the above equations we get:

pi(x,y)=(floor(i/S)/(L-1))·vec2(cos(2π/S (i mod S)),sin(2π/S (i mod S)))

Here’s a shadertoy code and visualization based on this approach:

vec2 disk(int i, int L, int S)
{
    float r = float(i / S) / max(EPSILON, float(L-1));   
    float theta = 2.0 * PI * float(i % S) / float(S);
    
    return r * vec2(cos(theta), sin(theta));    
}

Listing 3: Naive Approach Disk Pattern (glsl code)

Figure 3: Naive Approach Disk Pattern (L=6, S=8)

Better Attempt

The naive approach did not yield the pattern we are after. Intuitively, outer rings should contain more samples because they have bigger circumferences. Furthermore, because R(0)=0, S number of sampling points overlap at the origin, resulting in an overstatement of the origin by S-1. This is obviously wrong since we should not sample any point more than once. Let’s change our approach to fix these issues. 

First, we’ll add an if statement to handle the innermost ring (l=0 ⇔ i=0) case by returning vec2(0,0) immediately (we will get rid of this branch towards the end of this blog post). 

Second, we’ll close the “outer ring gaps” (Figure 3) by doubling the number of samples between subsequent rings. That is, the base ring (l=1) would have S samples, the next ring (l=2) will have 2S samples, and so on (Figure 2). 

The number of samples for each ring varies. Let C(l) denote the number of samples for a given ring l:

C(0)=1 ; C(l)=Sl  ; 0<l<L

Let N(l) denote the the total number of samples up to (including) ring l:

N(l)=C(0)+C(1)+…+C(l)

N(l)=1+S+2S+…+lS

N(l)=1+S(1+2+…+l)

N(l)=1+Sl(l+1)/2

And note that the overall total number of samples, which we previously denoted by NTotal, is given by that outermost layer L-1. Using the above notation we can write NTotal=N(L-1), and we get:

NTotal=1+S(L-1)L/2

We can repeat the construction we made in our first naive attempt:

R(l)=l /(L-1)

θ(s)=2πs/C(l)=2s/Sl ; 0<l

pl,s(x,y)=R(l)·vec2(cos(θ(s)),sin( θ (s)))

Note that s can be easily derived from the sample index i and the ring-index l

s = i – N(l – 1) = i – 1 – S(l – 1)l/2 ; (Eq. 1)

The whole problem is now reduced for finding a ring-index l given a sample index i. We’re going to solve this problem by making a good initial guess for the ring-index based on the sample index and correct our initial guess as needed. We will see that we can do it at a surprisingly relatively low and simple computation cost.

Finding Layer Bounds

Let’s assume that the index i is on some layer, l’ (that is, the ring-index is l’). Since we already took care of the innermost layer l’=0 by returning the origin, we can assume i>0.

By the construction definition, the index i must conform to:

N(l’-1) ≤ i < N(l’)

1+S(l’-1)l’/2 ≤ i < 1+Sl'(l’+1)/2

l'(l’-1) ≤ 2(i-1)/S < l'(l’+1)

Let’s denote x = sqrt(2(i-1)/S) (note that x is well defined since i,S>0), we can write:

l'(l’-1) ≤ x2 < l'(l’+1); (Eq. 2)

Let’s find lower and upper bounds for the ring-index l’. The lower bound:

x2 < l'(l’+1) < (l’+1)2

x<l’+1

x-1<l’

And the upper bound:

(l’-1)2<l'(l’-1)≤x2

l’-1<x

l'<x+1

Putting it all together, we can “sandwich” the unknown ring-index l’ by an expression that depends only on x and therefore i:

x-1<l'<x+1

Recall that l’ must be an integer. Let’s divide to two cases based on x:

x is a non-integer

If x is a non-integer, we can write:

ceil(x-1)≤l’ ≤ floor(x+1)

ceil(x)-1 ≤ l’ ≤ floor(x)+1

floor(x) ≤ l’ ≤ ceil(x)

Hence the ring-index l’ can be either floor(x) or ceil(x).

x is an integer

If x is an integer, l’ can be x-1,x or x+1. Let’s find a better condition for this case. Recall the bounds from the last section (Eq. 2):

l'(l’-1)≤x2<l'(l’+1)

The only integer that satisfies these conditions is l’. Therefore, x=l’.

Although the initial analysis gave three options for l’x-1,x or x+1. We ended up proving that l’=x in this case. Note that if x is an integer, we can also write:

l’ = x = floor(x) = ceil(x)

Initial guess and Correction

From the last section we know that there are only two candidates at most to consider: floor(x) and ceil(x) (If x is an integer, these two candidates are actually the same, so we’re down to only one candidate).

Let’s take an initial guess of l’=floor(x)=floor(sqrt(2(i-1)/S))

We can verify if i is indeed mapped to l’ ring-index by checking its boundary condition: 

i<N(l’)=1+Sl'(l’+1)/2

x2=2(i-1)/S<l'(l’+1)

If this condition doesn’t hold, we increment our initial guess to get floor(x)+1=ceil(x)

Since there are only at most two candidates for the ring-index to consider, the above condition is sufficient.

Here’s a shadertoy code and visualization based on this approach:

vec2 disk(int i, int L, int S)
{
    if (i == 0) // first branch
    {
        return vec2(0.0); // innermost ring (l=0)
    }
    float x2 = float(2 * (i - 1)) / float(S);    	
    int l = int(floor(sqrt(x2))); // initial guess        
    if (x2 >= float(l * ( l + 1)))  // second branch
    {
		++l; // guess correction
    }
    int s = i - N(l-1, S);
    
    float r = float(l) / float(L-1);   
    float theta = 2.0 * PI * float(s) / float(S*l);
    
    return r * vec2(cos(theta), sin(theta));    
}

Listing 4: Guess-Based Approach Disk Pattern (glsl code)

Figure 4: Guess-Based Approach Disk Pattern (L=6, S=8)

Code Robustness and Simplification

In this section we will rely on the suggested implementation. Please look at the previous glsl code snippet (Figure 4) before reading on.

As promised, here are a few things we can do to get rid of branches, improve robustness and simplify the code. 

Let’s address the first branch. For i=0 (or l=0), we notice that x2<0. If we clamp our initial guess to 0 (by making sure we’re not taking a square-root of a negative number – which has undefined behavior in glsl), and notice that x2<l(l+1)=0 always evaluates true, the initial guess would stay at 0, yielding the right ring-index. To sum up, by clamping the term under the square-root we can omit this branch.

For the second branch, we would like to avoid accumulated floating point computational errors by using integral expressions instead of floating point. Recall x definition x=sqrt(2(i-1)/S) and since S>0 we can change the second conditional as follows:

if (x2≥l(l+1)) ++l;

if (x2S l(l+1)S) ++l;

if (2(i-1) l(l+1)S) ++l;

Moreover, we could have changed this branch by using a step function:

if (2(i-1)≥l(l+1)S) ++l; ⇔ l+=step(l(l+1)S,2(i-1));

But this won’t really eliminate the branch (in fact, on some GPU architectures it might even be less performant). This is merely a coding style preference and I have decided to leave the conditional as is

Let’s also simplify θ as follows (using Eq. 1):

θ = 2πs/Sl = 2π(i-N(l-1))/Sl = 2πi/Sl – 2πN(l-1)/Sl

cos(θ) and sin(θ) will span C(l)=Sl points for the layer l. Hence, applying any phase of the form 2πk/Sl (for any integer k) will span the same points. Since we don’t care about the internal ordering of the samples within the layer, we can just take:

θ = 2πi/Sl

The additional changes I made were to rewrite some terms in MAD form, save registers, remove the floor function (as casting to int will use a floor) and add ε when needed to increase code robustness (avoid division by 0).

Here’s the final shadertoy code and visualization

/* 
Returns the ith sampling location of a normalized disk sampling pattern with 0 < L rings (layers) and 0 < S base-ring samples. 0 <= i < 1 + SL(L+1)/2.
*/
vec2 disk(int i, int L, int S)
{    
    float fS = float(S);    
    int x2S = 2 * i - 2;
    int l = int(sqrt(max(0.0, float(x2S) / fS)));    
    if (x2S >= ((l * l + l) * S)) ++l;	
    float fl = float(l);
    
    float r = fl / (float(L-1) + EPSILON);       
    float theta = (2.0 * PI * float(i)) / (fS * fl + EPSILON);
    
    return r * vec2(cos(theta), sin(theta));    
}

Listing 5: Normalized Disk Pattern Generation (glsl code)

Figure 5a: Normalized Uniform Disk Pattern (L=4, S=12)

Figure 5b: Normalized Uniform Disk Pattern (L=10, S=12)

Figure 5c: Normalized Uniform Disk Pattern (L=4, S=26)

Figure 5d: Normalized Uniform Disk Pattern (L=10, S=26)

Closing Thoughts

It goes without saying that if either L or S are known, one can optimize the shader by hardcoding their values. If L,S are both known and the total number of samples NTotal is relatively small – the entire pattern can be coded as a shader immediate constant buffer (similar to this table used by Unity).

Another interesting idea is to reuse the “guess approach” to find the best disk pattern within a sampling budget. Suppose you’re limited by using a maximum of T taps and you want to find the best disk sampling pattern within this budget. That is, given S and T, find L such that NTotal = N(L-1)T, or in other words, find the number of rings within the given tap budget. You can first repeat the guessing process we did to find L, followed by our “regular” disk sampling pattern generation. Here’s a shadertoy demonstration.

If concentric rings aren’t required, it is possible to apply any shaping function (e.g. pow) for the radius before returning the sampling point. Ring radius change will give different weight for the further / closer samples and can affect, for example, the DOF blur effect. It is also possible to reduce the total number of required taps NTotal by “pushing” the samples towards the verge of the 2D unit circle (e.g. r = pow(r, 0.5)) while reducing the ring count L, thus reducing NTotal = N(L-1).

A final closing remark about one of the expensive instructions in my approach – the square-root evaluation of x2. Note that we don’t really care about the actual value of x itself, we’re only interested in the rounded-down square-root value (i.e. floor(..)).It is possible to use a “good-enough” rounded-down square-root approximation, one that gives the precise square-root integral part, but is allowed to have an error in the fractional part. For more information, please check the iterative implementation flavor of the integer square root algorithm.