Tag: graphics

Fast Inverse Smoothstep

Fast Inverse Smoothstep

Motivation

Smoothstep is useful in computer graphics and sometimes the inverse effect is needed. Iñigo Quilez derived a closed formula for inverse smoothstep:

invsmoothstep(x) = smoothstep-1(x) =0.5 – sin(arcsin(1-2x)/3)

Figure 1: smoothstep (blue) and invsmoothstep (yellow) (source)

While no one can deny the beauty of this closed form, I was wondering if we could find another practical way to compute this function without using sine and arc-sine built-in functions (which potentially can be ALU prohibitive).

As it turns out, we can. The rest of this article shows how to do it step-by-step.

Cubic Equations

We start the derivation similar to Iñigo. Note that smoothstep (between 0 to 1, excluding clamping) could be re-written implicitly as:

y = smoothstep(x) = x2(3-2x)

-2x3+3x2-y=0 (Eq. 1)

The cubic disciminant is:

Q = -1/4 , R = (1-2y)/8 => Δ = y(y-1)/16

And since y∈[0,1], the discriminant is always negative in this range, except y=0 and y=1 for which the discriminant is zero. Hence, the equation has 3 different real roots, except the boundaries (zero and one) – for which it has at most two real roots. The conclusion is that we indeed deal with non-complex solutions and in the general case we have 3 roots.

Let’s continue by erasing the quadratic term by making a substitution using t (similar to Cardano’s formula):

x = t+0.5 => t3 – 3t/4 + (2y-1)/4 = 0
t(t2 – 3/4) + (2y-1)/4 = 0 (Eq. 2)

To summarize this section, I created a shadertoy demo that shows the cubic equations for (x,y)∈[-1,2]x[-1,1]. The green graph shows Eq. 1 and the blue graph shows Eq. 2. Roots are highlighted in white. Observe, for example, that for y=0.5, the root for Eq. 1 is x=0.5 and for Eq. 2 is t=0, and the substitution x=t+0.5 is correct as expected. This also confirms our intuition since we know that smoothestep(0.5) = 0.5, and therefore invsmoothstep(0.5) = 0.5.

Figure 2: invsmoothstep cubic equations and roots (green is Eq. 1, blue is Eq 2.)

Numerical Analysis

Now we’re going to solve Eq. 2 for t using Newton-Raphson method.

Define:

g(t) = t(t2 – 3/4) + (2y-1)/4

And compute its derivative:

g'(t) = 3t2-3/4 (Eq. 3)

Now, for a “good” guess tn we can update our next guess, tn+1, as follows:

tn+1 = tn– g(tn) / g'(tn) = tn – (t(t2 – 3/4) + (2y-1)/4) / (3t2-3/4)
tn+1 = tn – (t(4tn2-3) + 2y-1) / (12tn2-3) (Eq. 4)

The concerning part about this strategy is convergence. Picking the “wrong” initial guess will yield a wrong root, that is, a root outside of the t∈[-0.5,0.5] range. Recall that we’re interested in finding the root within the [-0.5,0.5] segment for Eq. 2. By examining the blue chart (Figure 2, see animation in shadertoy) we see that there’s always one root inside the segment we care about. Two roots converge at -0.5 and +0.5 for y=0 and y=1 respectively.

Thus, picking an initial guess of t0=0 will ensure that the closest root is indeed the root that we care about and the algorithm should converge to the right answer.

Putting it all together. we start with t0=0 and run the update step (Eq. 4) N times until the result converges. Finally, we return tN+0.5.

After experimenting a bit with this strategy, I found that 5 steps are sufficient to ensure reasonable convergence. Here’s the code:

float fastinvsmoothstep_naive(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.0;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;  
}

Better Initial Guesses

We want to speed-up convergence by taking a reasonable first guess. To simplify, let’s first remove the constant term by assuming that y=0.5 in Eq. 2. Note that from Eq. 3 we see that the tangent slope at the origin is -3/4. Since we’re solving Eq. 2 for any y, we can intersect the tangent with the horizontal line -(2y-1)/4 to get a reasonable first approximation for the cubic function and our horizontal line intersection. That is:

t0 near= -(2y-1)/4 / -3/4 = (2y-1)/3

Alas, this approximation is only good for when the root is “close” to the origin. Using the tangent always underestimates the guess (the derivative is monotonic-increasing for both sides around the origin, see Eq. 3). Thus, if the solution resides towards the verge of the segment [-0.5,0.5] – it will require more iterations to converge using t0near. Indeed, using the suggested t0near above yields good results after just one iteration towards the center of the range, but requires additional iterations towards its boundaries. A better approach would be to correct for this underestimation by adding an additional factor to compensate if the solution goes towards the segment’s boundaries. This compensation shouldn’t hinder the fast convergence near the origin though.

Let’s repeat the same logic, this time searching for a “reasonable” guess using roots that lie outside the segment. Note that for y=0, ±sqrt(3/4) are also roots with a tangent slope of 3/2. There’s a subtle detail in here – we actually want to move “away” from the root that is outside the segment so we move towards the root which is inside the segment. So we move “backwards” – using the mirrored tangent slope of -3/2. Now, assuming abs(y) is “close enough” to the boundary at 0.5, we can intersect the horizontal line –(2y-1)/4 again to get:

t0 far= -(2y-1)/4 / -3/2 = (2y-1)/6

We are now ready to compensate for the underestimation by adding the t0 far term:

t0 = t0 near + αt0 far = (2y-1)/3 + α(2y-1)/6 = (2+α)(2y-1)/6

The α term controls the amount of overestimation we’re adding back in (again, relying on our algorithm to correct for the overestimation near the origin). We need to pick our α carefully as this additional factor t0far might overshoot and cause our numerical method to converge to a root outside [-0.5,0.5]. I made a shadertoy demo that animates α with just a single update step (Eq. 4). Note that for smaller values, α ≅ 0 – the boundaries aren’t converging fast enough. However, for larger values, α ≅ 1.0 – the boundaries overshoot.

Figure 3a: invsmoothestep (blue) fastinvsmoothstep (yellow) and error (red) for α∈[0,1] and a single update step

By eyeballing the graph, a good α value resides somewhere in [~0.6,~0.8]. I was satisfied with α = 0.7. Here’s the implementation code that uses just a single update step from the last section and a better initial guess based on the above strategy. Note that if better precision is required – one can run a second update step again.

float fastinvsmoothstep_linearguess(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.45*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  // Uncomment for increased precision
  //t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0); 
  return t + 0.5;  
}

Figure 3b: invsmoothestep (blue) fastinvsmoothstep (yellow) and error (red) for a single update step

Figure 3c: invsmoothestep (blue) fastinvsmoothstep (yellow) and error (red) for two update steps

Reducing Errors

The approximation looks good but we’re not done yet. A possible application for invsmoothstep could be a continuous transition such as an animation. It might be useful to preserve properties of the original invsmoothstep function at certain points of interest. For example, imagine that we use the curve to animate the position of an object. If the starting and ending points of the approximation don’t match the original curve exactly – we might get “jerky” movements as the object “snaps” to its new position at the beginning and the end of the animation sequence. This is not great.

In particular, we would like to preserve the beginning, ending and midpoint of the original curve as follows:

invsmoothstep(0.0)=0.0
invsmoothstep(0.5)=0.5
invsmoothstep(1.0)=1.0

How bad are the errors using our current implementation? I measured errors for the selected points using a 32 bit floating point on the CPU:

Error (#updates / y)y = 0.0y = 0.5y = 1.0
1 update step0.0245610.00.024561
2 update steps0.0121770.00.012177
3 update steps0.0060630.00.006063

Table 1: Error per interest point and number of update steps

We can see that the error is roughly decreasing by a multiple of 2 for each additional update step. Unfortunately, the errors are still quite high. We would like to redesign our implementation to have minimum error at these selected points. Ideally, the update step should return exactly -0.5,0.0,0.5 for input y of 0.0,0.5,1.0 respectively (we made the substitution of x=t+0.5, and we calculated t in our implementation).

Recall our update step (Eq. 4) and solve for tN given our selected y points to “back-calculate” what the input tN-1 for the update step would have been. So result of the equation yields the desired result discussed earlier:

Figure 4: Solving the update equation for a desired output of +0.5 (red), 0.0 (blue) and -0.5 (green).

We can see that we need to constrain our initial guess t0 according to:

t0(y=0) = -0.5 or +0.25
t0(y=0.5) = 0
t0(y=1)= +0.5 or -0.25
(Eq. 5)

To preserve a symmetric solution, we also require that t0(y) = -t0(1-y). Meaning we either pick t0(0)=-0.5, t0(1)=0.5 or t0(0)=-0.25, t0(1)=0.25. We will also require a continuous and smooth t0(y) function, to support a pleasant, “non-jerky” animation curve.

As a side note, now we can explain why we got a zero error for y=0.5 in Table 1. In the last section we were using a linear function with no offset for the initial guess, so we conformed to the requirement t0(0.5) = 0 and therefore we had zero errors for this point.

Although these constraints can be satisfied using a linear equation, unfortunately, the results don’t look good. Let’s try instead a “banded” curve that satisfies the requirements. Let’s use a signed power curve as follows:

t0(y) = k sign(2y-1) abs(2y-1)p

Note that by either picking k = 0.5 or -0.25, the requirements above are satisfied for any p≠0. Here’s a shadertoy demo that animates the solution for various p.

Figure 5a: invsmoothestep with an animated power curved initial guess (k=0.5, p∈[0,20], single update step)

Figure 5b: invsmoothestep with a power curved initial guess (k=-0.25, p∈[0,20], single update step)

Again, by eyeballing the graphs, and restricting ourselves to pick an integral p value to save some additional ALU by avoid using the pow function, I’ve decided to go with p=4,3 for k=0.5,-0.25 respectively. Here are the results and source code:

float fastinvsmoothstep_edgeoption1(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.5*abs(yn)*yn*yn*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  // Uncomment for increased precision
  //t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;     
}

float fastinvsmoothstep_edgeoption2(in float y)
{
  float yn = 2.0*y-1.0;
  float t = -0.25*yn*yn*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  // Uncomment for increased precision
  //t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5; 
}

Figure 5c: invsmoothestep with a power curved initial guess (k=0.5, p=4, single update step)

Figure 5d: invsmoothestep with a power curved initial guess (k=0.5, p=4, two update steps)

Figure 5e: invsmoothestep with a power curved initial guess (k=-0.25, p=3, single update step)

Figure 5f: invsmoothestep with a power curved initial guess (k=-0.25, p=3, two update steps)

Indeed, running this new implementation on the CPU confirms that it holds the restrictions and we get zero errors for the points of interest (i.e. y=0,0.5,1), regardless of the number of update steps.

Which implementation is better? For a single update step, the curves are very similar but the second option is slightly better regarding its ALU usage. If one can afford multiple update steps, however, the first option gives better results near the boundaries when multiple update steps are being used (compare Figure 5d and 5f). Observe also the difference in the curves’ slope close to the boundaries. We will revisit these two options in the conclusions section again.

Note that we analyzed a single update step. There’s a subtle assumption here that applying additional steps would not introduce any additional errors (at y=0,0.5,1) and can only improve results. This requires a short discussion. Please refer back to Figure 4 and Eq. 4 and observe that the initial guess we picked for option 1 is actually a fixed point for the update function (in Figure 4, for each of the graphs, the desired output of the update step is also a fixed point). For option 2, after the first update step the points of interest will reach the fixed points as well. In other words:

n>0 tn(0) = -0.5
n>0 tn(0.5) = 0
n>0 tn(1) = +0.5

Thus additional update steps do not introduce additional errors for these particular solutions.

Stability and Robustness

We have already established that the algorithm will converge to the right answer in [0,1] and learned how to bring errors to zero at certain strategic points. We now turn our attention to discuss numerical instability issues.

Recall the update step (Eq. 4):

tn+1 = tn – (t(4tn-3) + 2y-1) / (12tn2-3)

We won’t have numerical issues as long as the denominator isn’t too close to zero, that is tn±0.5.

Let’s consider our first solution, the initial guess is t0 = 0.45(2y-1) and substituting it in the update step (Eq. 4) and denoting x(y)=0.45(2y-1) we get:

f(x)= t1 = x – (x(4x2-3) + x/0.45) / (12x2-3)

Since x(y) and f(x) are monotonic-increasing the conclusion is that f(y) is also monotonic-increasing. I derived f(x) and made sure the derivative is always positive for x∈[-0.45,0.45], but you can also look at the following graph to convince yourself.

Figure 6: single update step with initial guess of 0.45(2y-1)

Since f(y) is continuous and monotonic-increasing, it’s enough to check the edge of the segment to find maximum and minimum. Now we can compute:

t1max = t1(y=1.0) = 0.475438
t1min = t1(y=0.0) = -0.475438

Similar logic shows that when a second update step is used, t2(y) is also continuous monotonic-increasing. Therefore:

t2max = t2(y=1.0) = 0.487822
t2min = t2(y=1.0) = -0.487822

So to wrap things up, even after two updates, the minimum denominator (in absolute value) is at most 0.144351, which is not close enough to zero. Therefore, the first solution is numerically stable.

For the second solution, we can use a similar construction to prove monotonous (to be accurate, the second option, with k = -0.25, is actually monotonic-decreasing, this is still a sufficient argument for checking only the boundaries of the segment).

So again, we only need to check the extremum. The alert reader might have noticed that we actually designed the update step to return -0.5,+0.5 for input of 0.0,1.0 respectively. How do we reconcile it with the fact that the denominator in Eq. 4 will be zero?

Recall from Figure 4 that for the particular cases of y=0.0,1.0, the corresponding update step functions have removable discontinuity at these particular points. It’s easy to see it by applying L’hôpital rule, but I also provided a visualization (Figure 7) to show the update step error as a function of the initial guess. The animation shows the function for a particular input.

Figure 7: error function of the update step for various y

Since we’re getting too close to a discontinuity we should correct to avoid a possible division by zero. Let’s add a small ε to the denominator for this case. It’s probably better to actually subtract an ε in this case to keep the denominator sign (the denominator 12x2-3 is always non-positive in our [-0.5,0.5] range).

Note that the update step’s nominator equals exactly zero for our points of interest (y=0.0,0.5,1.0). That proves that changing the update equation as suggested above didn’t introduce any errors for the evaluation of our interest points and so we’re free to use the ε in the denominator.

Best of Both Worlds

When comparing the overall errors after a single update step from the three solutions we found so far (compare Figure 3c, 5c and 5e), we can observe that even though the linear initial guess gives the lowest errors in the entire segment, it does not guarantee zero errors at the boundaries. Can we do better?

A simple idea would be to lerp between the linear initial guess and the other two solutions in a way that combines the benefits; having overall reduced errors with zero errors on the boundaries. To do so, we write the following two options:

t0combined(y) = lerp(0.45(2y-1), 0.5sign(2y-1)abs(2y-1)4, β(y))
t0combined(y) = lerp(0.45(2y-1), -0.25sign(2y-1)abs(2y-1)3, β(y))
(Eq. 6)

We will also constrain ourselves to continuous β(y) with symmetry around the middle point. That is β(0.5+y) = β(0.5-y). As we’ve seen earlier, in order to get zero errors on the boundary, we have to satisfy Eq. 5, therefore:

β(y=0) = 1
β(y=1) = 1

Note that we get zero error in the middle for any β(y=0.5).

In terms of stability and robustness, since we’re picking a linear combination of two stable solutions (i.e. both solutions are in [-0.5,0.5]), the combined solution will also be stable. Hence the denominator of the update step (Eq. 4) cannot be zero. That means that the solution is stable for any β(y) as long as we decrease ε in the denominator (see last section).

Let’s pick β(y). Intuitively, we want to give more weight to the linear guess everywhere except the boundaries, in which we want to give full weight to the second guess. Let’s pick again a power curve function as follows:

β(y) = abs(2y-1)r

Intuitively, it looks like the second option in Eq. 6 (for k=-0.25) won’t give good results since the difference between the two lerped functions is quite large. Indeed, the overall solution doesn’t look good: the approximation overshoots / undershoots where the combined initial guess evaluates to zero. That leaves us with the first equation in Eq. 6. Here’s a shadertoy animation of the solution for different r values:

Figure 8a: invsmoothestep with a combined initial guess (r∈[0,20], single update step)

Again, by eyeballing the graphs and picking an integral r value, I’ve decided to go with p=3. We can also expand the lerp and rearrenge terms using MAD instructions to obtain:

t0combined(y) = lerp(0.45(2y-1), 0.5sign(2y-1)abs(2y-1)4, abs(2y-1)3)
t0combined(y) = 0.45(2y-1) + 0.5(2y-1) (abs(2y-1)6 – 0.9 abs(2y-1)3)

Here are the results and source code:

float fastinvsmoothstep_combined(in float y)
{
  float yn = 2.0*y-1.0;
  float absyn3 = abs(yn)*yn*yn;
  float t = 0.45*yn+0.5*yn*(absyn3*absyn3-0.9*absyn3);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  // Uncomment for increased precision
  //t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;     
}

Figure 8b: invsmoothestep with a combined initial guess (r=3, single update step)

Figure 8c: invsmoothestep with a combined initial guess (r=3, two update steps)

Conclusions

In this article I presented novel, efficient and robust methods for computing the inverse-smoothstep function. I presented several flavors and discussed theoretical trade-offs. To compare these methods in practice and draw conclusions, I created this sandbox to demonstrate fast and slow invsmoothstep-based animations for position, scale and fade.

Figure 9: invsmoothstep based animations using different methods (purple – ground-truth, olive – linear guess, pink – power guess (k=0.5), light blue – power guess (k=-0.25), green – combined guess)

I encourage the reader to explore the sandbox to better understand the differences. Overall, I think that the linear guess is both efficient and plausible when used in reasonable animation speeds and should be the go-to implementation. If invsmoothstep must be accurate for 0.0 and 1.0, I suggest using the power guess for k=0.5 since the additional ALU required for the combined guess does not justify the additional quality gain, in my opinion.

We still have issues really close to the boundaries for extremely slow animations. This is an inherent limitation of our methods since we’re using a continuous guessing function that gets close to the discontinuity point at the edge (review Figure 7). Recall the two options we found earlier for zero boundary errors (compare Figure 5c and Figure 5e). Now we can see another difference between the approximations – they both reach invsmoothstep from different sides of the curve! In other words, the implementation for k = 0.5 would reach the boundary “slower” than invsmoothstep, while the implementation for k = -0.25 would reach invsmoothstep faster. In practice, it means that for k = 0.5 we will always get a “jerk” close to the boundaries, while for k = -0.25, we will reach the boundary smoother, but “too fast”.

Reducing the “abrupt jump” for k = 0.5 can be achieved by running additional update steps. Note that (due to the discontinuity) the “abrupt jump” will always be there – it will just be pushed closer to the boundary itself. That’s why this issue is being highlighted in a super-slow animation (refer again to Figure 9). On the other hand, for k = -0.25, we have a much smoother rate of change, although it’s a bit faster compared to the original invsmoothstep. In Figure 9, I think that this difference is exaggerated because we are comparing the implementations side-by-side. If a slow and pleasant animation is required, the k = -0.25 solution might be a reasonable compromise.

Based on the previous sections, it goes without saying that if increased precision is required and more ALU cycles are allowed, all implementations can increase quality by running the update step more times without jeopardizing stability or robustness (see sandbox for demonstration).

Finally, the implementation you pick should be based on your requirements from approximating invsmoothstep. For completion, here’s the final source code and implementation:

// NOTE: Plausible but doesn't reach boundaries correctly.
float fastinvsmoothstep(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.45*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;  
}

// NOTE: Abrupt changes towards true boundaries.
float fastinvsmoothstep(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.5*abs(yn)*yn*yn*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0-EPSILON);
  return t + 0.5;     
}

// NOTE: Smoother but faster changes towards true boundaries.
float fastinvsmoothstep(in float y)
{
  float yn = 2.0*y-1.0;
  float t = -0.25*yn*yn*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;     
}

Closing Thoughts

I used an ad-hoc approach for picking α, p and r for the suggsested solutions. It’s possible to use more rigorous methods by defining a metric (e.g. RMSE) and finding optimal parameters (e.g. by using simulated annealing). Though I’m fairly satisfied with the parameters I found – I would greatly appreciate it if you shared with me any parameters with better results so that I can update this post accordingly.

Let’s circle back to the beginning of this article and recall the analytic form of invsmoothstep. By rearranging a few terms, it’s possible to use the method developed here for fast computation of an expression that has the form sin(arcsin(x)/3). This expression might be useful for other applications such as accelerating other cubic equation solvers. I haven’t yet experimented with this idea though.

Normalized Uniform Disk Pattern

Normalized Uniform Disk Pattern

Motivation

In computer graphics, one of the approaches to simulate a depth-of-field (DOF) effect with bokeh is to “mix” nearby pixel values. The idea is to use a sampling pattern to retrieve nearby pixel values (usually referred to as “scatter-as-gather”). One option is to use a normalized sampling pattern that determines the bokeh shape, while the circle-of-confusion (COC) is used to dictate the scale of this pattern (in texels). That way, more out-of-focus pixels will tend to gather pixel values that are further away, increasing the out-of-focus blur and bokeh effect. 

Figure 1: DOF and Bokeh Effect

The goal of this article is to introduce a normalized symmetric disk sampling computation method that will yield a circular bokeh shape. I will not explain how to simulate the DOF effect using the pattern in this article. 

My initial motivation for this derivation was to understand the math behind a sampling pattern such as Unity’s disk kernel. If you’re interested in seeing an example of a DOF effect that uses this pattern, I highly encourage you to read the excellent Catlike Coding Depth of Field tutorial.

Problem Statement

Let’s formalize the problem at hand. We want to symmetrically sample the normalized 2D circle using a disk pattern. That is, equally-spaced concentric rings with equally distributed sample points (referred to as samples going forward) on their perimeters.

Let L be the total number of rings (layers). We will be using l to denote a running ring index, where 0≤l<L. Note that the innermost ring (l = 0) always has one sample exactly at the origin (i.e. when simulating DOF, this sample corresponds to the current pixel being blurred). 

Let S be the base-ring size. That is, the number of samples in the second innermost ring (l = 1).

Let NTotal be the total number of samples ( NTotal depends on L,S). We will be using i to denote a sample index, where 0≤i<NTotal . NTotal is usually used as a proxy for the sampling pattern efficiency (i.e. number of taps required to blur each pixel).

We parameterize our disk sampling pattern by L>0 and S>0, which will dictate a specific NTotal. 

Figure 2: Disk Pattern Parametrization and Notation

To make the code highly parallelizable, I want to find a way to compute each sample point directly without introducing any dependency on other samples. 

Hence, our final goal can be stated as: given a disk parametrization L,S and a sample index 0≤i<NTotal – find the 2D normalized disk sampling point pi(x,y) with minimum machine instructions.

About the Demos

I will be using shadertoy to present the sampling pattern. I’m using a signed distance field (SDF) calculation to drive the pattern visualization because it’s easy and quick to implement – although not efficient. 
The important part (i.e. the function that we will design in this blog post) is called disk. That’s the function which takes a sample index i and disk parameters L,S and computes the sampling point pi(x,y).

Naive Approach

We start with a straight-forward and naive approach to gain some intuition. Let’s fill the samples on a ring-by-ring basis: We start at the innermost ring, then add the base ring by equally distribute S points along the base ring perimeter. Then, we proceed to the next ring (assuming that all rings have equal spacing), until we filled a total of L rings. We will denote the ring radius by R(l) (radius depends on the ring-index / layer).

We have a total of NTotal=L·S samples and the index i can be rewritten as:

i = S·l + s

Where:

l = floor(i / S)
s = i mod S

For a given ring (or layer), l, we can easily determine its radius by remembering that we’re interested in a unit disk distribution. Hence, the outermost ring should correspond to the unit circle, R(L-1)=1. In other words, we’re always normalizing the outermost disk layer. 
We already decided that the innermost ring maps to the origin, R(0)=0. Finally, since we choose a uniform spacing between our concentric rings, we can write a linear equation given these constraints R(0)=0 and R(L-1)=1:

R(l) = l /(L – 1)

For a given layer, equal distribution of points on the ring can be done by equally spacing the polar angle along the [0,2π) segment:

θ(s)=2πs / S

Putting it all together, for a given sample i parametrized by (l,s) we can derive the sampling point using 2D polar coordinates as follows:

pl,s(x,y)=R(l) ·vec2(cos(θ(s)),sin( θ (s)))

And by substituting the above equations we get:

pi(x,y)=(floor(i/S)/(L-1))·vec2(cos(2π/S (i mod S)),sin(2π/S (i mod S)))

Here’s a shadertoy code and visualization based on this approach:

vec2 disk(int i, int L, int S)
{
    float r = float(i / S) / max(EPSILON, float(L-1));   
    float theta = 2.0 * PI * float(i % S) / float(S);
    
    return r * vec2(cos(theta), sin(theta));    
}

Listing 3: Naive Approach Disk Pattern (glsl code)

Figure 3: Naive Approach Disk Pattern (L=6, S=8)

Better Attempt

The naive approach did not yield the pattern we are after. Intuitively, outer rings should contain more samples because they have bigger circumferences. Furthermore, because R(0)=0, S number of sampling points overlap at the origin, resulting in an overstatement of the origin by S-1. This is obviously wrong since we should not sample any point more than once. Let’s change our approach to fix these issues. 

First, we’ll add an if statement to handle the innermost ring (l=0 ⇔ i=0) case by returning vec2(0,0) immediately (we will get rid of this branch towards the end of this blog post). 

Second, we’ll close the “outer ring gaps” (Figure 3) by doubling the number of samples between subsequent rings. That is, the base ring (l=1) would have S samples, the next ring (l=2) will have 2S samples, and so on (Figure 2). 

The number of samples for each ring varies. Let C(l) denote the number of samples for a given ring l:

C(0)=1 ; C(l)=Sl  ; 0<l<L

Let N(l) denote the the total number of samples up to (including) ring l:

N(l)=C(0)+C(1)+…+C(l)

N(l)=1+S+2S+…+lS

N(l)=1+S(1+2+…+l)

N(l)=1+Sl(l+1)/2

And note that the overall total number of samples, which we previously denoted by NTotal, is given by that outermost layer L-1. Using the above notation we can write NTotal=N(L-1), and we get:

NTotal=1+S(L-1)L/2

We can repeat the construction we made in our first naive attempt:

R(l)=l /(L-1)

θ(s)=2πs/C(l)=2s/Sl ; 0<l

pl,s(x,y)=R(l)·vec2(cos(θ(s)),sin( θ (s)))

Note that s can be easily derived from the sample index i and the ring-index l

s = i – N(l – 1) = i – 1 – S(l – 1)l/2 ; (Eq. 1)

The whole problem is now reduced for finding a ring-index l given a sample index i. We’re going to solve this problem by making a good initial guess for the ring-index based on the sample index and correct our initial guess as needed. We will see that we can do it at a surprisingly relatively low and simple computation cost.

Finding Layer Bounds

Let’s assume that the index i is on some layer, l’ (that is, the ring-index is l’). Since we already took care of the innermost layer l’=0 by returning the origin, we can assume i>0.

By the construction definition, the index i must conform to:

N(l’-1) ≤ i < N(l’)

1+S(l’-1)l’/2 ≤ i < 1+Sl'(l’+1)/2

l'(l’-1) ≤ 2(i-1)/S < l'(l’+1)

Let’s denote x = sqrt(2(i-1)/S) (note that x is well defined since i,S>0), we can write:

l'(l’-1) ≤ x2 < l'(l’+1); (Eq. 2)

Let’s find lower and upper bounds for the ring-index l’. The lower bound:

x2 < l'(l’+1) < (l’+1)2

x<l’+1

x-1<l’

And the upper bound:

(l’-1)2<l'(l’-1)≤x2

l’-1<x

l'<x+1

Putting it all together, we can “sandwich” the unknown ring-index l’ by an expression that depends only on x and therefore i:

x-1<l'<x+1

Recall that l’ must be an integer. Let’s divide to two cases based on x:

x is a non-integer

If x is a non-integer, we can write:

ceil(x-1)≤l’ ≤ floor(x+1)

ceil(x)-1 ≤ l’ ≤ floor(x)+1

floor(x) ≤ l’ ≤ ceil(x)

Hence the ring-index l’ can be either floor(x) or ceil(x).

x is an integer

If x is an integer, l’ can be x-1,x or x+1. Let’s find a better condition for this case. Recall the bounds from the last section (Eq. 2):

l'(l’-1)≤x2<l'(l’+1)

The only integer that satisfies these conditions is l’. Therefore, x=l’.

Although the initial analysis gave three options for l’x-1,x or x+1. We ended up proving that l’=x in this case. Note that if x is an integer, we can also write:

l’ = x = floor(x) = ceil(x)

Initial guess and Correction

From the last section we know that there are only two candidates at most to consider: floor(x) and ceil(x) (If x is an integer, these two candidates are actually the same, so we’re down to only one candidate).

Let’s take an initial guess of l’=floor(x)=floor(sqrt(2(i-1)/S))

We can verify if i is indeed mapped to l’ ring-index by checking its boundary condition: 

i<N(l’)=1+Sl'(l’+1)/2

x2=2(i-1)/S<l'(l’+1)

If this condition doesn’t hold, we increment our initial guess to get floor(x)+1=ceil(x)

Since there are only at most two candidates for the ring-index to consider, the above condition is sufficient.

Here’s a shadertoy code and visualization based on this approach:

vec2 disk(int i, int L, int S)
{
    if (i == 0) // first branch
    {
        return vec2(0.0); // innermost ring (l=0)
    }
    float x2 = float(2 * (i - 1)) / float(S);    	
    int l = int(floor(sqrt(x2))); // initial guess        
    if (x2 >= float(l * ( l + 1)))  // second branch
    {
		++l; // guess correction
    }
    int s = i - N(l-1, S);
    
    float r = float(l) / float(L-1);   
    float theta = 2.0 * PI * float(s) / float(S*l);
    
    return r * vec2(cos(theta), sin(theta));    
}

Listing 4: Guess-Based Approach Disk Pattern (glsl code)

Figure 4: Guess-Based Approach Disk Pattern (L=6, S=8)

Code Robustness and Simplification

In this section we will rely on the suggested implementation. Please look at the previous glsl code snippet (Figure 4) before reading on.

As promised, here are a few things we can do to get rid of branches, improve robustness and simplify the code. 

Let’s address the first branch. For i=0 (or l=0), we notice that x2<0. If we clamp our initial guess to 0 (by making sure we’re not taking a square-root of a negative number – which has undefined behavior in glsl), and notice that x2<l(l+1)=0 always evaluates true, the initial guess would stay at 0, yielding the right ring-index. To sum up, by clamping the term under the square-root we can omit this branch.

For the second branch, we would like to avoid accumulated floating point computational errors by using integral expressions instead of floating point. Recall x definition x=sqrt(2(i-1)/S) and since S>0 we can change the second conditional as follows:

if (x2≥l(l+1)) ++l;

if (x2S l(l+1)S) ++l;

if (2(i-1) l(l+1)S) ++l;

Moreover, we could have changed this branch by using a step function:

if (2(i-1)≥l(l+1)S) ++l; ⇔ l+=step(l(l+1)S,2(i-1));

But this won’t really eliminate the branch (in fact, on some GPU architectures it might even be less performant). This is merely a coding style preference and I have decided to leave the conditional as is

Let’s also simplify θ as follows (using Eq. 1):

θ = 2πs/Sl = 2π(i-N(l-1))/Sl = 2πi/Sl – 2πN(l-1)/Sl

cos(θ) and sin(θ) will span C(l)=Sl points for the layer l. Hence, applying any phase of the form 2πk/Sl (for any integer k) will span the same points. Since we don’t care about the internal ordering of the samples within the layer, we can just take:

θ = 2πi/Sl

The additional changes I made were to rewrite some terms in MAD form, save registers, remove the floor function (as casting to int will use a floor) and add ε when needed to increase code robustness (avoid division by 0).

Here’s the final shadertoy code and visualization

/* 
Returns the ith sampling location of a normalized disk sampling pattern with 0 < L rings (layers) and 0 < S base-ring samples. 0 <= i < 1 + SL(L+1)/2.
*/
vec2 disk(int i, int L, int S)
{    
    float fS = float(S);    
    int x2S = 2 * i - 2;
    int l = int(sqrt(max(0.0, float(x2S) / fS)));    
    if (x2S >= ((l * l + l) * S)) ++l;	
    float fl = float(l);
    
    float r = fl / (float(L-1) + EPSILON);       
    float theta = (2.0 * PI * float(i)) / (fS * fl + EPSILON);
    
    return r * vec2(cos(theta), sin(theta));    
}

Listing 5: Normalized Disk Pattern Generation (glsl code)

Figure 5a: Normalized Uniform Disk Pattern (L=4, S=12)

Figure 5b: Normalized Uniform Disk Pattern (L=10, S=12)

Figure 5c: Normalized Uniform Disk Pattern (L=4, S=26)

Figure 5d: Normalized Uniform Disk Pattern (L=10, S=26)

Closing Thoughts

It goes without saying that if either L or S are known, one can optimize the shader by hardcoding their values. If L,S are both known and the total number of samples NTotal is relatively small – the entire pattern can be coded as a shader immediate constant buffer (similar to this table used by Unity).

Another interesting idea is to reuse the “guess approach” to find the best disk pattern within a sampling budget. Suppose you’re limited by using a maximum of T taps and you want to find the best disk sampling pattern within this budget. That is, given S and T, find L such that NTotal = N(L-1)T, or in other words, find the number of rings within the given tap budget. You can first repeat the guessing process we did to find L, followed by our “regular” disk sampling pattern generation. Here’s a shadertoy demonstration.

If concentric rings aren’t required, it is possible to apply any shaping function (e.g. pow) for the radius before returning the sampling point. Ring radius change will give different weight for the further / closer samples and can affect, for example, the DOF blur effect. It is also possible to reduce the total number of required taps NTotal by “pushing” the samples towards the verge of the 2D unit circle (e.g. r = pow(r, 0.5)) while reducing the ring count L, thus reducing NTotal = N(L-1).

A final closing remark about one of the expensive instructions in my approach – the square-root evaluation of x2. Note that we don’t really care about the actual value of x itself, we’re only interested in the rounded-down square-root value (i.e. floor(..)).It is possible to use a “good-enough” rounded-down square-root approximation, one that gives the precise square-root integral part, but is allowed to have an error in the fractional part. For more information, please check the iterative implementation flavor of the integer square root algorithm.