Author: iradicator

2D Surface Reconstruction: Marching Squares with Meta-balls

Motivation

An SPH fluid simulator solves the Navier-Stokes equations by animating particles that represent the density of water droplets (or another fluid) over time. To render the actual fluid, one first has to reconstruct the fluid surface from the density data and then apply a rendering method using the fluid's BxDF properties.

This is where the Marching Cubes algorithm kicks in – it allows us to reconstruct a surface from density, as demonstrated in Figure 1. In fact, the algorithm works for any scalar field, which is usually stored in a 3D grid (texture).

Figure 1a: SPH simulator before surface reconstruction (source: SPH Fluids in Computer Graphics).

Figure 1b: SPH simulator after surface reconstruction and rendering (source: SPH Fluids in Computer Graphics).

It’s Demo Time

Don’t worry if you don’t know how fluid simulation works. In this post we will implement a 2D version of the Marching Cubes algorithm (namely – Marching Squares).

The density in our demonstration is being generated using Meta-balls animated over time as suggested by Jamie Wong, though as I mentioned, any scalar-field would work. I strongly suggest that you first read his blog post where he covers the theory.

The goal of this post is to visualize the Marching Squares algorithm in practice and provide a glsl code reference for others. To that end, the simulation runs entirely on the GPU, and the code was not optimized for performance, as I wanted it to be readable. I also stayed consistent with the mentioned post's coding scheme for the cells' corners.

Without further ado, here’s the shadertoy:

Figure 2: Marching Squares with Meta-balls shadertoy demo.

Please watch at least one full cycle of the shader (76 seconds) prior to reading on.

Shader Breakdown

Let's unpack this shader: when the value of the scalar field, generated using Meta-balls, approaches 1, the shape perimeters turn purple. You can think of the Meta-balls' generated data as a "height" pointing towards the screen, where black is "far away" and white represents surface peaks (see Figure 3).

Figure 3: Meta-balls scalar-field data
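
For reference, here's a minimal glsl sketch of such a scalar field – the classic Meta-balls falloff covered in Jamie Wong's post (the ball count, centers and radii here are hypothetical placeholders):

float metaball_field(in vec2 p, in vec2 centers[3], in float radii[3])
{
    // f(p) = sum_i r_i^2 / |p - c_i|^2 ; the iso-contour f = 1 traces the surface.
    float f = 0.0;
    for (int i = 0; i < 3; ++i)
    {
        vec2 d = p - centers[i];
        f += radii[i] * radii[i] / max(dot(d, d), 1e-6); // guard against division by zero
    }
    return f;
}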

Reconstructing the surface when we actually have an analytic expression seems superfluous, but it’s still useful: it will serve as a quick sanity check for the algorithm’s correctness and quality.

First, I voxelized the function into a low-resolution grid and recorded the value at each cell's center (Figure 4). Voxels with a scalar-field value lower than 1 are considered "empty" and marked red – otherwise they are "non-empty" (i.e. have fluid density) and are painted green. Note that due to under-sampling, some of the "smaller" balls may not be recorded as they travel through voxels. The net result is voxels "popping" in and out – this is expected and can be fixed by increasing the sampling rate (at an additional memory cost).

Figure 4: Voxelizing the scalar-field.

Voxelizing the scalar field on the GPU is definitely overkill, and in practice the data would be provided via a texture (e.g. an MRI scan). However, I've decided to pack the whole simulation into a single fragment shader for simplicity and readability.

Moving on to the actual algorithm implementation, we have two flavors: surface reconstruction using the "classic" 16 corner states (in cyan, Figure 5a), and a reconstruction that uses the corners' weights to linearly interpolate the intersection point between the voxel boundary and the surface (in yellow, Figure 5b). As you can see in Figures 5a,b, the linear interpolation yields a "less blocky" reconstruction and animates better as the Meta-balls pass through voxels.

Figure 5a: Marching Squares without linear-interpolation.

Figure 5b: Marching Squares with linear-interpolation.
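
To make the two flavors concrete, here's a minimal sketch of the per-cell logic (the corner coding follows Jamie Wong's post; function names are illustrative, not taken from the demo):

// Classify a cell by testing its 4 corner values against the iso-value (16 cases).
int cell_case(in vec4 corners, in float iso)
{
    int c = 0;
    if (corners.x > iso) c |= 1;
    if (corners.y > iso) c |= 2;
    if (corners.z > iso) c |= 4;
    if (corners.w > iso) c |= 8;
    return c;
}

// The "classic" flavor places each crossing at the edge midpoint (t = 0.5);
// the smoother flavor linearly interpolates the crossing from the corner weights.
float edge_crossing(in float fa, in float fb, in float iso)
{
    return clamp((iso - fa) / (fb - fa), 0.0, 1.0);
}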

In my implementation, the surface is drawn directly on the GPU by calculating line equations and their SDFs within each cell on a per-pixel basis. In practice though, this is the place where an implementation would feed the vertex shader with triangles, and potentially merge vertices to compress vertex data buffering. The nice thing about drawing the surface contour in the fragment shader is that we avoid triangle-facing issues (i.e. the ambiguity in cases 5 and 10 as mentioned here). A full implementation will have to address these issues.

As we saw in the last section, both reconstructions suffer from under-sampling issues. As expected, by increasing our grid resolution we get superior results (Figures 6a,b). Note that linear interpolation yields better results at a relatively cheap cost.

Figure 6a: Marching Squares without linear-interpolation and higher resolution.

Figure 6b: Marching Squares with linear-interpolation and higher resolution.

Observe also that the under-sampling issues are mitigated: the "small" Meta-balls are now recorded in voxels and their surfaces can be reconstructed. The higher volumetric resolution comes, of course, with additional memory and texture-bandwidth costs, and the best grid size depends on the application and the desired quality.

Finally, I'll note that the algorithm can easily be extended to extract iso-surfaces with different iso-values, as shown in Figure 7. This can be useful for generating something like contour maps (e.g. Figure 8).

Figure 7: Marching Squares with iso-surfaces of 0.5, 1.0, 2.0 and 3.0.

Figure 8: Example contour map.

Closing Thoughts

Now that we've explained how the shader works, please try to re-watch a full animation cycle (76 seconds) below and take note of the generated surfaces.

The next step would be, naturally, to extend the implementation for full 3D, reconstruct surface normals and shade the produced surface. This is left as an exercise for the reader.

I hope you found this visualization to be both helpful and inspiring. If you happen to extend this work, please let me know so I can link to your extensions.

Fast Inverse Smoothstep

Motivation

Smoothstep is useful in computer graphics and sometimes the inverse effect is needed. Iñigo Quilez derived a closed formula for inverse smoothstep:

invsmoothstep(x) = smoothstep⁻¹(x) = 0.5 - sin(arcsin(1 - 2x)/3)

Figure 1: smoothstep (blue) and invsmoothstep (yellow) (source)
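
For reference, the closed form translates directly into glsl – a straightforward sketch that is exact, but relies on the very built-ins we would like to avoid:

float invsmoothstep_exact(in float y)
{
  return 0.5 - sin(asin(1.0 - 2.0*y) / 3.0);
}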

While no one can deny the beauty of this closed form, I was wondering whether we could find another practical way to compute this function without using the sine and arc-sine built-in functions (which can potentially be ALU-prohibitive).

As it turns out, we can. The rest of this article shows how to do it step-by-step.

Cubic Equations

We start the derivation similarly to Iñigo. Note that smoothstep (between 0 and 1, excluding clamping) can be rewritten implicitly as:

y = smoothstep(x) = x²(3 - 2x)

-2x³ + 3x² - y = 0 (Eq. 1)

The cubic discriminant is:

Q = -1/4 , R = (1 - 2y)/8 ⇒ Δ = y(y - 1)/16

And since y ∈ [0,1], the discriminant is always negative in this range, except at y=0 and y=1, for which it is zero. Hence the equation has 3 distinct real roots, except at the boundaries (zero and one), where it has at most two distinct real roots. The conclusion is that we indeed deal with non-complex solutions, and in the general case we have 3 roots.

Let's continue by eliminating the quadratic term with a substitution in t (similar to Cardano's formula):

x = t + 0.5 ⇒ t³ - 3t/4 + (2y - 1)/4 = 0
t(t² - 3/4) + (2y - 1)/4 = 0 (Eq. 2)

To summarize this section, I created a shadertoy demo that shows the cubic equations for (x,y) ∈ [-1,2]×[-1,1]. The green graph shows Eq. 1 and the blue graph shows Eq. 2. Roots are highlighted in white. Observe, for example, that for y=0.5 the root of Eq. 1 is x=0.5 and of Eq. 2 is t=0, so the substitution x = t + 0.5 checks out as expected. This also confirms our intuition, since we know that smoothstep(0.5) = 0.5, and therefore invsmoothstep(0.5) = 0.5.

Figure 2: invsmoothstep cubic equations and roots (green is Eq. 1, blue is Eq 2.)

Numerical Analysis

Now we're going to solve Eq. 2 for t using the Newton-Raphson method.

Define:

g(t) = t(t² - 3/4) + (2y - 1)/4

And compute its derivative:

g'(t) = 3t² - 3/4 (Eq. 3)

Now, for a "good" guess t_n, we can update our next guess, t_{n+1}, as follows:

t_{n+1} = t_n - g(t_n) / g'(t_n) = t_n - (t_n(t_n² - 3/4) + (2y - 1)/4) / (3t_n² - 3/4)
t_{n+1} = t_n - (t_n(4t_n² - 3) + 2y - 1) / (12t_n² - 3) (Eq. 4)

The concerning part about this strategy is convergence. Picking the "wrong" initial guess will yield a wrong root – that is, a root outside the t ∈ [-0.5,0.5] range. Recall that we're interested in finding the root of Eq. 2 within the [-0.5,0.5] segment. By examining the blue chart (Figure 2; see the animation in shadertoy) we see that there's always exactly one root inside the segment we care about. Two roots converge at -0.5 and +0.5 for y=0 and y=1 respectively.

Thus, picking an initial guess of t0=0 will ensure that the closest root is indeed the root that we care about and the algorithm should converge to the right answer.

Putting it all together: we start with t_0 = 0 and run the update step (Eq. 4) N times until the result converges. Finally, we return t_N + 0.5.

After experimenting a bit with this strategy, I found that 5 steps are sufficient to ensure reasonable convergence. Here’s the code:

float fastinvsmoothstep_naive(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.0;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;  
}

Better Initial Guesses

We want to speed up convergence by taking a reasonable first guess. To simplify, let's first remove the constant term by assuming y=0.5 in Eq. 2. Note that from Eq. 3, the tangent slope at the origin is -3/4. Since we're solving Eq. 2 for any y, we can intersect this tangent with the horizontal line -(2y - 1)/4 to get a reasonable first approximation of the intersection between the cubic function and our horizontal line. That is:

t_0near = (-(2y - 1)/4) / (-3/4) = (2y - 1)/3

Alas, this approximation is only good when the root is "close" to the origin. Using the tangent always underestimates the guess (the derivative increases on both sides as we move away from the origin, see Eq. 3). Thus, if the solution resides towards the edge of the segment [-0.5,0.5], it requires more iterations to converge using t_0near. Indeed, using the suggested t_0near yields good results after just one iteration towards the center of the range, but requires additional iterations towards its boundaries. A better approach is to correct for this underestimation by adding a factor that compensates as the solution moves towards the segment's boundaries. This compensation shouldn't hinder the fast convergence near the origin, though.

Let's repeat the same logic, this time searching for a "reasonable" guess using the roots that lie outside the segment. Note that for y=0.5, ±sqrt(3/4) are also roots, with a tangent slope of 3/2. There's a subtle detail here – we actually want to move "away" from the root that is outside the segment, so that we move towards the root that is inside it. So we move "backwards", using the mirrored tangent slope of -3/2. Now, assuming the solution is "close enough" to the segment boundary at ±0.5, we can intersect the horizontal line -(2y - 1)/4 again to get:

t_0far = (-(2y - 1)/4) / (-3/2) = (2y - 1)/6

We are now ready to compensate for the underestimation by adding the t0 far term:

t_0 = t_0near + α·t_0far = (2y - 1)/3 + α(2y - 1)/6 = (2 + α)(2y - 1)/6

The α term controls the amount of overestimation we add back in (again, relying on our algorithm to correct for the overestimation near the origin). We need to pick α carefully, as the additional t_0far factor might overshoot and cause our numerical method to converge to a root outside [-0.5,0.5]. I made a shadertoy demo that animates α with just a single update step (Eq. 4). Note that for small values (α ≈ 0) the boundaries don't converge fast enough, while for large values (α ≈ 1.0) the boundaries overshoot.

Figure 3a: invsmoothstep (blue), fastinvsmoothstep (yellow) and error (red) for α∈[0,1] and a single update step

By eyeballing the graph, a good α value resides somewhere in [~0.6, ~0.8]. I was satisfied with α = 0.7, giving t_0 = (2 + 0.7)(2y - 1)/6 = 0.45(2y - 1). Here's the implementation code, which uses just a single update step from the last section and the better initial guess based on the above strategy. Note that if better precision is required, one can run a second update step.

float fastinvsmoothstep_linearguess(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.45*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  // Uncomment for increased precision
  //t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0); 
  return t + 0.5;  
}

Figure 3b: invsmoothstep (blue), fastinvsmoothstep (yellow) and error (red) for a single update step

Figure 3c: invsmoothstep (blue), fastinvsmoothstep (yellow) and error (red) for two update steps

Reducing Errors

The approximation looks good but we’re not done yet. A possible application for invsmoothstep could be a continuous transition such as an animation. It might be useful to preserve properties of the original invsmoothstep function at certain points of interest. For example, imagine that we use the curve to animate the position of an object. If the starting and ending points of the approximation don’t match the original curve exactly – we might get “jerky” movements as the object “snaps” to its new position at the beginning and the end of the animation sequence. This is not great.

In particular, we would like to preserve the beginning, ending and midpoint of the original curve as follows:

invsmoothstep(0.0)=0.0
invsmoothstep(0.5)=0.5
invsmoothstep(1.0)=1.0

How bad are the errors using our current implementation? I measured the errors at the selected points using 32-bit floating point on the CPU:

# update steps     y = 0.0      y = 0.5    y = 1.0
1 update step      0.024561     0.0        0.024561
2 update steps     0.012177     0.0        0.012177
3 update steps     0.006063     0.0        0.006063

Table 1: Error per interest point and number of update steps

We can see that the error roughly halves with each additional update step. Unfortunately, the errors are still quite high. We would like to redesign our implementation to have minimum error at these selected points. Ideally, the update step should return exactly -0.5, 0.0, +0.5 for inputs y of 0.0, 0.5, 1.0 respectively (recall we made the substitution x = t + 0.5, and we calculate t in our implementation).

Recall our update step (Eq. 4) and solve it for t_N given our selected y points, to "back-calculate" what the input t_{N-1} to the update step must have been. The solutions of this equation yield the desired results discussed earlier:

Figure 4: Solving the update equation for a desired output of +0.5 (red), 0.0 (blue) and -0.5 (green).

We can see that we need to constrain our initial guess t0 according to:

t_0(y=0) = -0.5 or +0.25
t_0(y=0.5) = 0
t_0(y=1) = +0.5 or -0.25
(Eq. 5)

To preserve a symmetric solution, we also require that t_0(y) = -t_0(1 - y). Meaning we either pick t_0(0) = -0.5, t_0(1) = +0.5, or t_0(0) = +0.25, t_0(1) = -0.25. We will also require a continuous and smooth t_0(y) function, to support a pleasant, "non-jerky" animation curve.

As a side note, we can now explain why we got zero error for y=0.5 in Table 1: in the last section we used a linear function with no offset for the initial guess, so we conformed to the requirement t_0(0.5) = 0 and therefore had zero error at this point.

Although these constraints can be satisfied using a linear equation, unfortunately the results don't look good. Let's instead try a "banded" curve that satisfies the requirements – a signed power curve as follows:

t_0(y) = k·sign(2y - 1)·abs(2y - 1)^p

Note that by picking either k = 0.5 or k = -0.25, the requirements above are satisfied for any p ≠ 0. Here's a shadertoy demo that animates the solution for various p.

Figure 5a: invsmoothstep with an animated power-curve initial guess (k=0.5, p∈[0,20], single update step)

Figure 5b: invsmoothstep with a power-curve initial guess (k=-0.25, p∈[0,20], single update step)

Again, by eyeballing the graphs, and restricting ourselves to integral p values to save some additional ALU by avoiding the pow function, I've decided to go with p=4 for k=0.5 and p=3 for k=-0.25. Here are the results and source code:

float fastinvsmoothstep_edgeoption1(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.5*abs(yn)*yn*yn*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  // Uncomment for increased precision
  //t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;     
}

float fastinvsmoothstep_edgeoption2(in float y)
{
  float yn = 2.0*y-1.0;
  float t = -0.25*yn*yn*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  // Uncomment for increased precision
  //t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5; 
}

Figure 5c: invsmoothstep with a power-curve initial guess (k=0.5, p=4, single update step)

Figure 5d: invsmoothstep with a power-curve initial guess (k=0.5, p=4, two update steps)

Figure 5e: invsmoothstep with a power-curve initial guess (k=-0.25, p=3, single update step)

Figure 5f: invsmoothstep with a power-curve initial guess (k=-0.25, p=3, two update steps)

Indeed, running this new implementation on the CPU confirms that it satisfies the constraints: we get zero error at the points of interest (i.e. y=0, 0.5, 1), regardless of the number of update steps.

Which implementation is better? For a single update step the curves are very similar, but the second option is slightly better in terms of ALU usage. If one can afford multiple update steps, however, the first option gives better results near the boundaries (compare Figures 5d and 5f). Observe also the difference in the curves' slopes close to the boundaries. We will revisit these two options in the conclusions section.

Note that we analyzed a single update step. There's a subtle assumption here that applying additional steps will not introduce any additional errors (at y=0, 0.5, 1) and can only improve results. This requires a short discussion. Refer back to Figure 4 and Eq. 4 and observe that the initial guess we picked for option 1 is actually a fixed point of the update function (in Figure 4, for each of the graphs, the desired output of the update step is also a fixed point). For option 2, the points of interest reach the fixed points after the first update step. In other words:

∀n>0: t_n(0) = -0.5
∀n>0: t_n(0.5) = 0
∀n>0: t_n(1) = +0.5

Thus additional update steps do not introduce additional errors for these particular solutions.

Stability and Robustness

We have already established that the algorithm will converge to the right answer in [0,1] and learned how to bring errors to zero at certain strategic points. We now turn our attention to discuss numerical instability issues.

Recall the update step (Eq. 4):

t_{n+1} = t_n - (t_n(4t_n² - 3) + 2y - 1) / (12t_n² - 3)

We won't have numerical issues as long as the denominator isn't too close to zero – that is, as long as t_n doesn't approach ±0.5.

Let's consider our first solution: the initial guess is t_0 = 0.45(2y - 1). Substituting it into the update step (Eq. 4) and denoting x(y) = 0.45(2y - 1), we get:

f(x) = t_1 = x - (x(4x² - 3) + x/0.45) / (12x² - 3)

Since x(y) and f(x) are monotonically increasing, the composition f(x(y)) is also monotonically increasing. I differentiated f(x) and made sure the derivative is always positive for x ∈ [-0.45,0.45], but you can also look at the following graph to convince yourself.

Figure 6: single update step with initial guess of 0.45(2y-1)

Since t_1(y) = f(x(y)) is continuous and monotonically increasing, it's enough to check the edges of the segment to find its maximum and minimum. Now we can compute:

t_1max = t_1(y=1.0) = 0.475438
t_1min = t_1(y=0.0) = -0.475438

Similar logic shows that when a second update step is used, t_2(y) is also continuous and monotonically increasing. Therefore:

t_2max = t_2(y=1.0) = 0.487822
t_2min = t_2(y=0.0) = -0.487822

So to wrap things up: even after two update steps, the denominator's absolute value is at least 0.144351, which is not close to zero. Therefore, the first solution is numerically stable.

For the second solution, we can use a similar construction to prove monotonicity (to be accurate, the second option, with k = -0.25, is actually monotonically decreasing; this is still a sufficient argument for checking only the boundaries of the segment).

So again, we only need to check the extrema. The alert reader might have noticed that we actually designed the update step to return -0.5, +0.5 for inputs of 0.0, 1.0 respectively. How do we reconcile this with the fact that the denominator in Eq. 4 will be zero there?

Recall from Figure 4 that for the particular cases y=0.0, 1.0, the corresponding update-step functions have a removable discontinuity at these points. It's easy to see this by applying L'Hôpital's rule, but I also provided a visualization (Figure 7) showing the update-step error as a function of the initial guess. The animation shows the function for a particular input.

Figure 7: error function of the update step for various y

Since we're getting so close to a discontinuity, we should protect against a possible division by zero. Let's add a small ε to the denominator for this case. It's actually better to subtract an ε here, to preserve the denominator's sign (the denominator 12t² - 3 is always non-positive in our [-0.5,0.5] range).

Note that the update step's numerator equals exactly zero at our points of interest (y=0.0, 0.5, 1.0). That proves that changing the update equation as suggested above doesn't introduce any errors when evaluating our interest points, so we're free to use the ε in the denominator.

Best of Both Worlds

When comparing the overall errors after a single update step from the three solutions we found so far (compare Figure 3c, 5c and 5e), we can observe that even though the linear initial guess gives the lowest errors in the entire segment, it does not guarantee zero errors at the boundaries. Can we do better?

A simple idea would be to lerp between the linear initial guess and the other two solutions in a way that combines the benefits: overall reduced errors with zero errors at the boundaries. To do so, we write the following two options:

t_0combined(y) = lerp(0.45(2y - 1), 0.5·sign(2y - 1)·abs(2y - 1)⁴, β(y))
t_0combined(y) = lerp(0.45(2y - 1), -0.25·sign(2y - 1)·abs(2y - 1)³, β(y))
(Eq. 6)

We will also constrain ourselves to a continuous β(y) with symmetry around the middle point – that is, β(0.5 + y) = β(0.5 - y). As we've seen earlier, in order to get zero errors at the boundaries we have to satisfy Eq. 5, therefore:

β(y=0) = 1
β(y=1) = 1

Note that we get zero error in the middle for any β(y=0.5).

In terms of stability and robustness: since we're picking a linear combination of two stable solutions (i.e. both guesses are in [-0.5,0.5]), the combined solution will also be stable, and the denominator of the update step (Eq. 4) cannot reach zero. That means the solution is stable for any β(y), as long as we keep subtracting the ε in the denominator (see the last section).

Let's pick β(y). Intuitively, we want to give more weight to the linear guess everywhere except near the boundaries, where we want to give full weight to the second guess. Let's again pick a power-curve function:

β(y) = abs(2y - 1)^r

Intuitively, it looks like the second option in Eq. 6 (for k=-0.25) won't give good results, since the difference between the two lerped functions is quite large. Indeed, the overall solution doesn't look good: the approximation overshoots / undershoots where the combined initial guess evaluates to zero. That leaves us with the first equation in Eq. 6. Here's a shadertoy animation of the solution for different r values:

Figure 8a: invsmoothstep with a combined initial guess (r∈[0,20], single update step)

Again, by eyeballing the graphs and picking an integral r value, I've decided to go with r=3. We can also expand the lerp and rearrange terms using MAD instructions to obtain:

t_0combined(y) = lerp(0.45(2y - 1), 0.5·sign(2y - 1)·abs(2y - 1)⁴, abs(2y - 1)³)
t_0combined(y) = 0.45(2y - 1) + 0.5(2y - 1)·(abs(2y - 1)⁶ - 0.9·abs(2y - 1)³)

Here are the results and source code:

float fastinvsmoothstep_combined(in float y)
{
  float yn = 2.0*y-1.0;
  float absyn3 = abs(yn)*yn*yn;
  float t = 0.45*yn+0.5*yn*(absyn3*absyn3-0.9*absyn3);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  // Uncomment for increased precision
  //t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;     
}

Figure 8b: invsmoothstep with a combined initial guess (r=3, single update step)

Figure 8c: invsmoothstep with a combined initial guess (r=3, two update steps)

Conclusions

In this article I presented novel, efficient and robust methods for computing the inverse-smoothstep function. I presented several flavors and discussed theoretical trade-offs. To compare these methods in practice and draw conclusions, I created this sandbox to demonstrate fast and slow invsmoothstep-based animations for position, scale and fade.

Figure 9: invsmoothstep based animations using different methods (purple – ground-truth, olive – linear guess, pink – power guess (k=0.5), light blue – power guess (k=-0.25), green – combined guess)

I encourage the reader to explore the sandbox to better understand the differences. Overall, I think that the linear guess is both efficient and plausible at reasonable animation speeds, and should be the go-to implementation. If invsmoothstep must be accurate at 0.0 and 1.0, I suggest using the power guess with k=0.5, since the additional ALU required for the combined guess does not justify the additional quality gain, in my opinion.

We still have issues very close to the boundaries for extremely slow animations. This is an inherent limitation of our methods, since we're using a continuous guessing function that gets close to the discontinuity point at the edge (review Figure 7). Recall the two options we found earlier for zero boundary errors (compare Figure 5c and Figure 5e). Now we can see another difference between the approximations – they reach invsmoothstep from different sides of the curve! In other words, the implementation with k = 0.5 reaches the boundary "slower" than invsmoothstep, while the implementation with k = -0.25 reaches it faster. In practice, it means that for k = 0.5 we will always get a "jerk" close to the boundaries, while for k = -0.25 we reach the boundary more smoothly, but "too fast".

Reducing the “abrupt jump” for k = 0.5 can be achieved by running additional update steps. Note that (due to the discontinuity) the “abrupt jump” will always be there – it will just be pushed closer to the boundary itself. That’s why this issue is being highlighted in a super-slow animation (refer again to Figure 9). On the other hand, for k = -0.25, we have a much smoother rate of change, although it’s a bit faster compared to the original invsmoothstep. In Figure 9, I think that this difference is exaggerated because we are comparing the implementations side-by-side. If a slow and pleasant animation is required, the k = -0.25 solution might be a reasonable compromise.

Based on the previous sections, it goes without saying that if increased precision is required and more ALU cycles are allowed, all implementations can increase quality by running the update step more times without jeopardizing stability or robustness (see sandbox for demonstration).

Finally, the implementation you pick should be based on your requirements for approximating invsmoothstep. For completeness, here's the final source code and implementation:

// NOTE: Plausible but doesn't reach boundaries correctly.
float fastinvsmoothstep(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.45*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;  
}

// NOTE: Abrupt changes towards true boundaries.
float fastinvsmoothstep(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.5*abs(yn)*yn*yn*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0-EPSILON);
  return t + 0.5;     
}

// NOTE: Smoother but faster changes towards true boundaries.
float fastinvsmoothstep(in float y)
{
  float yn = 2.0*y-1.0;
  float t = -0.25*yn*yn*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;     
}

Closing Thoughts

I used an ad-hoc approach for picking α, p and r for the suggested solutions. It's possible to use more rigorous methods by defining a metric (e.g. RMSE) and finding optimal parameters (e.g. using simulated annealing). Though I'm fairly satisfied with the parameters I found, I would greatly appreciate it if you shared any parameters with better results, so that I can update this post accordingly.

Let's circle back to the beginning of this article and recall the analytic form of invsmoothstep. By rearranging a few terms, it's possible to use the method developed here for fast computation of any expression of the form sin(arcsin(x)/3). This might be useful for other applications, such as accelerating other cubic-equation solvers. I haven't experimented with this idea yet, though.

Screen-Space Effects Using Polar Coordinates Linear Transformations

Motivation

Like most graphics engineers I’m familiar with linear transformations applied in texture-space and their usage. I was always impressed by how easy it is to achieve different intuitive effects by simply applying matrix multiplications. 

After implementing a swirl / twist effect that uses an underlying connection between texture-space radius and angle, I was intrigued to investigate the math behind it and look at other effects that could be achieved by applying a simple matrix transformation – this time in polar coordinate space.

This article explores common linear transformations applied in texture polar coordinate space. I will present different possible artistic effects that use these transformations and provide a shadertoy sandbox that allows the reader to experiment further with polar coordinate transformations and their effects.

While these techniques might not have a practical use in a production environment, I thought it would still be valuable to share my findings with the community. I hope you find my experiments interesting and inspiring and would appreciate it if you share with me applications that were not presented in this article.

Notation and Basic Math

The 2D point p can be expressed directly using the cartesian coordinates (x,y) or the polar coordinates (r,θ) as follows:

p = vec2(x,y) = r·vec2(cosθ,sin θ)

The relationship between cartesian and polar coordinates representation is given by:

r(x,y) = sqrt(x² + y²)

θ(x,y) = atan2(y,x)

and:

x(r,θ) = r·cosθ

y(r, θ) = r·sinθ

The atan2 function has a discontinuity along the negative x axis, which will cause visual artifacts if not handled properly. For that reason, most of the simple angle transformations will not work "out of the box" and will require additional thought.

In texture space, we usually consider normalized values p_norm-tex (between 0 and 1), where the behavior for values outside of this range depends on the texture sampling / wrapping mode (e.g. clamp, repeat, border, etc.). In the examples below, we will instead use p_tex in the [-1,1] segment, as it places the origin of the coordinate system at the center of the image by default. In addition, for maximum flexibility, we will allow an arbitrary origin by translating the coordinates.

Putting it all together, we get the following equations:

p_tex = 2·p_norm-tex - vec2(1,1) - p_origin

p_norm-tex = 0.5·(p_tex + vec2(1,1) + p_origin)

Finally, when a transformation needs to be applied to only part of the image, I will mix the original image with the transformed one using a coefficient named α (alpha). This effectively controls the area in which the transformation is applied; unless otherwise specified, it is assumed to be 1 (i.e. the transform is applied to the whole image).
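
Here's a minimal glsl sketch of this plumbing (the helper names are mine, not taken from the demos):

// Cartesian <-> polar conversions (glsl's two-argument atan is the atan2 above).
vec2 to_polar(in vec2 p)     { return vec2(length(p), atan(p.y, p.x)); } // (r, theta)
vec2 to_cartesian(in vec2 q) { return q.x * vec2(cos(q.y), sin(q.y)); }

// [0,1] normalized texture coordinates <-> [-1,1] centered coordinates with an arbitrary origin.
vec2 norm_to_tex(in vec2 pnorm, in vec2 origin) { return 2.0*pnorm - vec2(1.0) - origin; }
vec2 tex_to_norm(in vec2 ptex,  in vec2 origin) { return 0.5*(ptex + vec2(1.0) + origin); }

A transformed sample is then blended with the original via mix(original, transformed, alpha).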

Linear Transformations

Linear transformations are quite common in graphics programming because they can be easily expressed in matrix form and implemented efficiently on GPU hardware. 

When using a screen-space cartesian coordinate system, several common linear transformations can be used to achieve different effects – namely: translation, scale, rotation, and shear.

The next sections will survey these transformations in matrix form and briefly discuss their effects when applied in the cartesian coordinate system. We then focus our attention on examining their effects when applied in the polar coordinate system.

The careful reader may observe that translation is not a linear transform (the origin is not necessarily mapped to the origin). We will use the 2D homogeneous coordinates notation for translations, so we can express affine transforms using matrices (i.e. the translation matrix dimensions will be 3×3).

Translation

Though translation is a non-linear transformation (the origin isn't mapped to itself), it can be expressed in matrix notation by using 2D homogeneous coordinates. Thus:

[x']   [1 0 tx] [x]
[y'] = [0 1 ty] [y]
[1 ]   [0 0 1 ] [1]

In the cartesian system, one use of translations (also sometimes referred to as bias) is to animate a seamless texture by “sliding” it by a given offset (tx,ty) that varies over time (with a repeating sampling mode).

Radius Translation (Pinch and Explode)

Consider the polar coordinates equation after applying a radius bias:

(r + t_r)·vec2(cosθ, sinθ)

Pinch

When t_r > 0, the radius "contracts" towards the origin. The contraction is most pronounced closest to the origin, and since the image borders also contract towards the origin, special care should be taken to deal with the image borders. The overall effect looks like the image was "pinched", with distortion towards the image borders (Figure 1a).

Figure 1a: Radius Translation (tr=0.1 and vec2(0.7,0.2) origin, border sampling mode)

We can somewhat avoid the border distortion by setting the α coefficient to use the pinched image around the origin and gradually mix in the original image as the distance from the origin grows. Unfortunately, this idea won't avoid the distortion completely when the origin is too close to the image border (Figure 1b). Here's the shadertoy implementation (note the visible border distortion as the origin moves closer to the image border and the pinch intensity gets stronger). Crafting images that have at least a 1-pixel border and setting a clamp sampling mode can further reduce the distortion.

Figure 1b: Radius Translation (tr=0.1 and vec2(0.7,0.2) origin, PINCH_POWER=0.2, clamp sampling mode)
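
A minimal sketch of the pinch itself, using the helpers from the notation section (the α falloff shown in the comment is just one possible choice, not the exact one from the demo):

vec2 radius_translate(in vec2 ptex, in float tr)
{
    vec2 rt = to_polar(ptex);
    return to_cartesian(vec2(rt.x + tr, rt.y)); // sample at (r + tr, theta)
}

// One possible alpha falloff: fade the pinch out with distance from the origin.
// float alpha = exp(-length(ptex) / pinch_power);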

Explode and Raindrops

When t_r < 0, the radius "expands" from the origin. Note that the transformed radius r + t_r can also take negative values in this case. Where r + t_r > 0, the pixels are "pushed" away from the origin. Where r + t_r < 0, the image is essentially flipped. The overall effect looks like a "flipped-image bubble" expanding from the origin (Figure 1c).

Figure 1c: Radius Translation (tr=-0.1 and vec2(0.7,0.2) origin, border sampling mode) 

One usage of the explosion effect is to simulate fake screen-space raindrops. Consider multiple small animated explosions at different locations (origins), with an α coefficient that decays very quickly and peaks only at the raindrop locations. The result is blended small explosions that look like (depending on the parameters) fake raindrops (Figure 1d). This shadertoy code demonstrates the idea.

Figure 1d: Raindrops effect (15 animated drops, repeat sampling mode)

Angle Translation (Rotation)

It’s easy to see from the polar coordinate definition that angle translation is equivalent to image rotation around the origin point. 

r·vec2(cos(θ + t_θ), sin(θ + t_θ))

Positive bias is equivalent to clockwise rotation, while negative bias is equivalent to counterclockwise rotation (Figure 2). 

See the following shadertoy example. Note that just like the last example, when using cartesian coordinates (by setting IS_USING_POLAR_COORD to 0), we can observe, as expected, a sliding animation on the vertical axis (similarly, when dealing with radius translation and setting IS_USING_POLAR_COORDS to 0, we will observe a horizontal sliding movement).

Figure 2: Angle Translation (Clockwise rotation of 45 degrees around (0,0) rotation axis, border sampling mode)

Scale

The traditional scaling matrix is just a diagonal matrix:

[x']   [sx  0] [x]
[y'] = [ 0 sy] [y]

In the cartesian system, we use scale for stretching (i.e. shrinking or expanding) the cartesian axes. This technique is useful, for instance, when tiling a pattern by using a seamless periodic texture with a repeating sampling mode. Another usage is to apply the same texture a few times to fill a larger area with a repeating pattern.

Radius Scale (Zoom)

Scaling the radius in polar coordinates is similar to the cartesian case when using a uniform axis scale factor (meaning x and y are scaled by exactly the same factor s_r = s_x = s_y = s):

r(sx, sy) = sqrt((sx)² + (sy)²) = |s|·sqrt(x² + y²) = |s|·r(x,y)

The image is flipped if a negative scaling factor s < 0 is used. Additionally, the image will be zoomed in if |s| < 1 and zoomed out if |s| > 1, around the coordinate system's origin point.

To achieve a zoom-in effect, one can choose any monotonically decreasing function f(t) between 0 and 1 (e.g. e^(-at) or 1/(1+at), with a>0). Conversely, to get a zoom-out effect, choose an increasing function in the same range (e.g. 1 - e^(-at) or at/(1+at), with a>0).

Here’s a shadertoy code that demonstrates the polar coordinates zoom-in and-out effects. 

Note that if cartesian coordinates are being used (by setting IS_USING_POLAR_COORD to 0), we get a “stretching” effect on the horizontal axis around the origin point (ZOOM_POINT), as expected.

Figure 3a: Original image (Radius Scale 1.0, zoom point vec2(-0.25,0.5), any sampling mode)

Figure 3b: Zoomed-in image (Radius Scale 0.2, zoom point vec2(-0.25,0.5), any sampling mode)

Angle Scale (Starburst and Rings)

When scaling the angle, we are essentially shrinking / expanding the image according to the angle and around the origin:

r·vec2(cos(s_θ·θ), sin(s_θ·θ))

Fake Starburst

If s_θ < 0, the image is mirrored according to the angle around the origin.

When |s_θ| > 1, the replication of the image looks like a "fan", where the "fins" themselves are stretched versions of the lines between the origin and the image corners (Figure 4a). Furthermore, the positive x axis is a "compressed" version of the image, while the negative side has a discontinuity. These artifacts can be somewhat alleviated when a large number of fins is used along with a repeating texture sampling mode – that is, s_θ should be large. Caution though: setting a very large s_θ will lead to aliasing (Figure 4b). Experimentation shows that a good number is somewhere between 50 and 200 (Figure 4c).

Figure 4a: Angle Scale (sθ=10, origin=vec2(-0.6,-0.2), border sampling mode)

Figure 4b: Angle Scale (sθ=1000, origin=vec2(-0.6,-0.2), border sampling mode)

Figure 4c: Angle Scale (sθ=70, origin=vec2(-0.6,-0.2), repeat sampling mode)

We can use this pattern to create an interesting starburst effect centered around the origin (Figure 4d): set the α coefficient to use the original image around the origin and change it gradually to use the above pattern as the distance from the origin grows. See this shadertoy example for more information.

Figure 4d: Starburst Effect (sθ=160, origin=vec2(-0.6,-0.2), repeat sampling mode)
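
A minimal sketch of the angle scale itself, using the earlier helpers:

vec2 angle_scale(in vec2 ptex, in float s_theta)
{
    vec2 rt = to_polar(ptex);
    return to_cartesian(vec2(rt.x, s_theta * rt.y)); // sample at (r, s_theta * theta)
}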

Concentric Rings

When |s_θ| < 1, the image compresses along concentric rings around the transformation origin, with a visible negative x axis discontinuity (Figure 4e, Figure 4f). The discontinuity disappears when both ends of each ring are identical, meaning an "infinite" amount of compression, or s_θ = 0.

When s_θ = 0, the color of each ring is determined by the positive x axis (Figure 4g):

r·vec2(cos 0, sin 0) = r·vec2(1,0)

We can combine this method with an angle translation t_θ to color the rings according to a generic ray that shoots from the origin. This adds a somewhat continuous ring-color animation, due to neighbouring-pixel similarity (except at object edges).

r·vec2(cos(t_θ), sin(t_θ))

Please refer to the shadertoy demo to see the animation.

Figure 4e: Rings Effect (sθ=0.4, origin=vec2(-0.6,-0.2), repeat sampling mode)

Figure 4f: Rings Effect (sθ=0.1, origin=vec2(-0.6,-0.2), repeat sampling mode)

Figure 4g: Rings Effect (sθ=0.0, origin=vec2(-0.6,-0.2), repeat sampling mode)

A potential usage of the ring image could be pseudo lens-flare generation. Note that we can save the expensive atan2 computation, since we don't actually use θ for ring generation.
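
A minimal sketch of this shortcut (tex_to_norm is the helper from the earlier sketch, and the animated bias t_theta is optional):

vec3 ring_color(in sampler2D img, in vec2 ptex, in float t_theta, in vec2 origin)
{
    float r = length(ptex); // theta – and thus atan2 – is not needed at all
    vec2 q = r * vec2(cos(t_theta), sin(t_theta)); // color each ring by a rotating ray
    return texture(img, tex_to_norm(q, origin)).rgb;
}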

Rotation

A 2D rotation matrix is a unitary matrix that can be written in the following form for both coordinate systems:

[cosΦ  -sinΦ]
[sinΦ   cosΦ]

In the cartesian system, we use this transformation to counterclockwise rotate the image by an angle of Φ.

While rotation in cartesian coordinates makes a lot of sense and is quite common, I could not find a good use – beyond the math – for rotating polar coordinates (Figure 5).

Here's the shadertoy prototype I used for experimentation. Please let me know if you have any ideas for how polar-coordinate rotation could be used to create screen-space effects.

Figure 5: Polar Coordinates Rotation (30 degrees around (0,0) origin, border sampling mode)

Shear

Shearing is a very interesting linear operation: it essentially displaces points by an amount proportional to the other axis. 2D shearing can be written as:

[x']   [1  mx] [x]
[y'] = [my  1] [y]

For example, in the cartesian coordinate system, when m_y = 0 we get vec2(x + m_x·y, y). So the shear strength depends on the absolute value of y, while the direction depends on its sign (i.e. -m_x mirrors the effect).

Radius Shear

Let's shear only the radius (i.e. m_θ = 0) to get:

(r, θ) → (r + m_r·θ, θ)

In other words:

(r + m_r·θ)·vec2(cosθ, sinθ)

Recall the analysis we made for radius translation. By applying similar arguments, we conclude that the overall effect is a pinch / explode, depending on the sign of m_r·θ. Since θ ∈ [-π,π], we actually get part of the image with a pinch effect (above the x axis that intersects the transform origin) and the other part with an explode effect. Note that the negative x axis has a discontinuity (Figure 6a).

Figure 6a: Radius Shear (mr=0.12 with origin vec2(0.4, 0.4), border sampling mode)

Here's the shadertoy demo I used to produce the image. I couldn't find a practical effect for this transformation, and I would appreciate it if you shared any interesting thoughts / artistic usages for it. Note that if cartesian coordinates are used (by setting IS_USING_POLAR_COORD to 0), we get, as expected, horizontal shearing around the origin point, where the sign of m_x dictates the shearing direction.

Angle Shear (Spiral)

Now let's consider an angle shear (i.e. m_r = 0) to get:

(r, θ) → (r, m_θ·r + θ)

In other words:

r·vec2(cos(m_θ·r + θ), sin(m_θ·r + θ))

Recall the analysis we made earlier for angle translation. We're essentially rotating each concentric ring by a different rotation amount that depends on the ring's radius. Since the rotation addition is linear (and therefore monotonic and continuous), we're actually creating a continuous spiral around the transformation origin (Figure 6b)!

Since, by definition, r ≥ 0, we see that the angle-shearing direction is dictated solely by the sign of m_θ. Meaning, the sign of m_θ determines the spiral orientation (clockwise or counterclockwise for positive and negative values respectively).

Figure 6b: Spirals (mθ=15 with origin vec2(0.0, 0.0), border sampling mode)

Now, obviously we will use clamp or repeat sampling mode going forward to produce better looking images. I’m also interested in animating these spirals over time, so let’s explore several options for doing so.

First, let's take m_θ to be linearly dependent on time (denoted by t), and write m_θ = (2πf_θ/r_max)·t, where r_max is the maximum radius in image coordinates. For example, if the origin is exactly at the center (i.e. vec2(0,0)), we get the maximum radius at the corners, so r_max = sqrt(2). If the origin is located at one of the corners, the maximum radius is at the opposite corner – r_max = 2·sqrt(2).

This notation means that the whole spiral completes a full cycle with frequency f_θ – that is, one cycle every 1/f_θ time units (e.g. seconds).

r·vec2(cos((2πf_θ/r_max)·r·t + θ), sin((2πf_θ/r_max)·r·t + θ))

From the above equation we can see that larger radii get stronger frequencies, which makes the spiral look like it is spinning "inwards". Intuitively, if we want the opposite effect – the spiral looking like it is spinning "outwards" – the inner radii should get stronger frequencies than the outer radii. All we need to do is invert the frequency's dependency on r:

r·vec2(cos((2πf_θ/r_max)·(r_max - r)·t + θ), sin((2πf_θ/r_max)·(r_max - r)·t + θ))

But the above equation is equivalent to flipping the sign of the original m_θ and adding an angle bias of:

t_θ = 2πf_θ·t = -m_θ·r_max (with m_θ denoting the sign-flipped shear)

To sum up, we can get clockwise and counterclockwise spirals that grow inwards or outwards simply by setting the angle shearing and angle bias parameters with the right sign. Here are the formulas:

Growth     Orientation         m_θ                  t_θ
Inward     Clockwise           +(2πf_θ/r_max)·t     0
Inward     Counterclockwise    -(2πf_θ/r_max)·t     0
Outward    Counterclockwise    +(2πf_θ/r_max)·t     -2πf_θ·t
Outward    Clockwise           -(2πf_θ/r_max)·t     +2πf_θ·t

Listing 1: Spiral parameterization (angle shear and bias)

Note that if m_θ and t_θ have the same sign, the result is just a faster-spinning spiral, so I haven't listed these cases here.
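
A minimal sketch of the parameterization above (ptex is assumed to already be relative to the chosen origin):

vec2 spiral(in vec2 ptex, in float m_theta, in float t_theta)
{
    float r = length(ptex);
    float theta = atan(ptex.y, ptex.x) + m_theta * r + t_theta; // angle shear + angle bias
    return r * vec2(cos(theta), sin(theta));
}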

Since we’re investigating spiral animations, I strongly suggest the reader check out the shadertoy demo, which highlights the differences (by switching SPIRAL_GROWTH and SPIRAL_ORIENTATION). 

In practice, since the fθ parameter expresses a degree of freedom, we don’t really need to calculate rmax, mθ and tθ. Instead, we can let the artist plug in numbers until a visually pleasing result is achieved. In fact, we can choose any combination of mθ (controlling growth speed and direction (inwards / outwards)) and tθ (controlling spin speed and orientation (clockwise / counterclockwise)) to achieve interesting spiral effects. 

We can even use a radius scale s_r to add a zoom-in or zoom-out effect that interacts with the spiral's inwards / outwards growth, as well as animate these parameters individually over time. Here's a shadertoy example that allows you to control these three parameters and see the animated spiral results (Figure 6c). Here's another demo showing a "pulsing" portal effect that reaches a steady state (Figure 6d). There are many possibilities here…

Figure 6c: Animated Spiral (25% of animation sequence, origin at vec2(0.75,0.0), repeat sampling mode)

Figure 6d: Portal (steady state, origin at vec2(-0.52,-0.2), repeat sampling mode)

Closing Thoughts

In this article we investigated how common linear transformations interact with screen-space 2D polar coordinates, and suggested a few artistic effects based on these transformations.

I hope you found these examples inspiring and that you'll investigate more yourself. I made this shadertoy sandbox to help you get started – all the other demos in this article were initially based on it. To use it, just change the get_origin and transform functions, as well as the alpha variable, to achieve your desired effect. You can also use the preprocessor flags to toggle different options on and off for better visualization and debugging. Have fun exploring polar coordinates!


Normalized Uniform Disk Pattern

Motivation

In computer graphics, one of the approaches to simulate a depth-of-field (DOF) effect with bokeh is to “mix” nearby pixel values. The idea is to use a sampling pattern to retrieve nearby pixel values (usually referred to as “scatter-as-gather”). One option is to use a normalized sampling pattern that determines the bokeh shape, while the circle-of-confusion (COC) is used to dictate the scale of this pattern (in texels). That way, more out-of-focus pixels will tend to gather pixel values that are further away, increasing the out-of-focus blur and bokeh effect. 

Figure 1: DOF and Bokeh Effect

The goal of this article is to introduce a normalized symmetric disk sampling computation method that will yield a circular bokeh shape. I will not explain how to simulate the DOF effect using the pattern in this article. 

My initial motivation for this derivation was to understand the math behind a sampling pattern such as Unity’s disk kernel. If you’re interested in seeing an example of a DOF effect that uses this pattern, I highly encourage you to read the excellent Catlike Coding Depth of Field tutorial.

Problem Statement

Let's formalize the problem at hand. We want to symmetrically sample the normalized 2D disk using a disk pattern – that is, equally spaced concentric rings with equally distributed sample points (referred to as samples going forward) on their perimeters.

Let L be the total number of rings (layers). We will be using l to denote a running ring index, where 0≤l<L. Note that the innermost ring (l = 0) always has one sample exactly at the origin (i.e. when simulating DOF, this sample corresponds to the current pixel being blurred). 

Let S be the base-ring size. That is, the number of samples in the second innermost ring (l = 1).

Let N_Total be the total number of samples (N_Total depends on L and S). We will use i to denote a sample index, where 0 ≤ i < N_Total. N_Total is usually used as a proxy for the sampling pattern's efficiency (i.e. the number of taps required to blur each pixel).

We parameterize our disk sampling pattern by L>0 and S>0, which will dictate a specific NTotal. 

Figure 2: Disk Pattern Parametrization and Notation

To make the code highly parallelizable, I want to find a way to compute each sample point directly without introducing any dependency on other samples. 

Hence, our final goal can be stated as: given a disk parametrization L, S and a sample index 0 ≤ i < N_Total, find the 2D normalized disk sampling point p_i(x,y) with minimal machine instructions.

About the Demos

I will be using shadertoy to present the sampling pattern. I'm using a signed distance field (SDF) calculation to drive the pattern visualization, because it's easy and quick to implement – although not efficient. The important part (i.e. the function that we will design in this blog post) is called disk. That's the function that takes a sample index i and disk parameters L, S, and computes the sampling point p_i(x,y).

Naive Approach

We start with a straightforward, naive approach to gain some intuition. Let's fill the samples ring by ring: we start at the innermost ring, then add the base ring by equally distributing S points along its perimeter. Then we proceed to the next ring (assuming all rings have equal spacing), until we have filled a total of L rings. We will denote the ring radius by R(l) (the radius depends on the ring index / layer).

We have a total of N_Total = L·S samples, and the index i can be rewritten as:

i = S·l + s

Where:

l = floor(i / S)
s = i mod S

For a given ring (or layer) l, we can easily determine its radius by remembering that we're interested in a unit-disk distribution. Hence, the outermost ring should correspond to the unit circle, R(L-1) = 1. In other words, we're always normalizing the outermost disk layer. We already decided that the innermost ring maps to the origin, R(0) = 0. Finally, since we chose uniform spacing between our concentric rings, we can write a linear equation given these constraints R(0) = 0 and R(L-1) = 1:

R(l) = l /(L – 1)

For a given layer, equal distribution of points on the ring can be done by equally spacing the polar angle along the [0,2π) segment:

θ(s)=2πs / S

Putting it all together, for a given sample i parametrized by (l,s) we can derive the sampling point using 2D polar coordinates as follows:

p_{l,s}(x,y) = R(l)·vec2(cos(θ(s)), sin(θ(s)))

And by substituting the above equations we get:

p_i(x,y) = (floor(i/S)/(L-1))·vec2(cos(2π(i mod S)/S), sin(2π(i mod S)/S))

Here’s a shadertoy code and visualization based on this approach:

vec2 disk(int i, int L, int S)
{
    float r = float(i / S) / max(EPSILON, float(L-1));   
    float theta = 2.0 * PI * float(i % S) / float(S);
    
    return r * vec2(cos(theta), sin(theta));    
}

Listing 3: Naive Approach Disk Pattern (glsl code)

Figure 3: Naive Approach Disk Pattern (L=6, S=8)

Better Attempt

The naive approach did not yield the pattern we are after. Intuitively, outer rings should contain more samples because they have bigger circumferences. Furthermore, because R(0) = 0, S sampling points overlap at the origin, i.e. the origin is sampled S-1 times too many. This is obviously wrong, since we should not sample any point more than once. Let's change our approach to fix these issues.

First, we'll add an if statement to handle the innermost ring (l=0 ⇔ i=0) by returning vec2(0,0) immediately (we will get rid of this branch towards the end of this blog post).

Second, we'll close the "outer ring gaps" (Figure 3) by growing the number of samples linearly with the ring index. That is, the base ring (l=1) has S samples, the next ring (l=2) has 2S samples, and so on (Figure 2).

The number of samples for each ring varies. Let C(l) denote the number of samples for a given ring l:

C(0) = 1 ; C(l) = S·l ; 0 < l < L

Let N(l) denote the total number of samples up to (and including) ring l:

N(l)=C(0)+C(1)+…+C(l)

N(l)=1+S+2S+…+lS

N(l)=1+S(1+2+…+l)

N(l)=1+Sl(l+1)/2

And note that the overall total number of samples, which we previously denoted by N_Total, is given by the outermost layer L-1. Using the above notation we can write N_Total = N(L-1), and we get:

N_Total = 1 + S(L-1)L/2

We can repeat the construction we made in our first naive attempt:

R(l) = l/(L-1)

θ(s) = 2πs/C(l) = 2πs/(S·l) ; 0 < l

p_{l,s}(x,y) = R(l)·vec2(cos(θ(s)), sin(θ(s)))

Note that s can easily be derived from the sample index i and the ring index l:

s = i - N(l-1) = i - 1 - S(l-1)l/2 (Eq. 1)

The whole problem is now reduced to finding a ring index l given a sample index i. We're going to solve this by making a good initial guess for the ring index based on the sample index, and correcting our guess as needed. We will see that we can do this at a surprisingly low and simple computational cost.

Finding Layer Bounds

Let’s assume that the index i is on some layer, l’ (that is, the ring-index is l’). Since we already took care of the innermost layer l’=0 by returning the origin, we can assume i>0.

By the construction definition, the index i must conform to:

N(l'-1) ≤ i < N(l')

1 + S(l'-1)l'/2 ≤ i < 1 + S·l'(l'+1)/2

l'(l'-1) ≤ 2(i-1)/S < l'(l'+1)

Let's denote x = sqrt(2(i-1)/S) (note that x is well defined, since i,S > 0). We can write:

l'(l'-1) ≤ x² < l'(l'+1) (Eq. 2)

Let's find lower and upper bounds for the ring index l'. The lower bound:

x² < l'(l'+1) < (l'+1)²

x < l' + 1

x - 1 < l'

And the upper bound:

(l'-1)² < l'(l'-1) ≤ x²

l' - 1 < x

l' < x + 1

Putting it all together, we can "sandwich" the unknown ring index l' by an expression that depends only on x, and therefore only on i:

x - 1 < l' < x + 1

Recall that l' must be an integer. Let's split into two cases based on x:

x is a non-integer

If x is a non-integer, we can write:

ceil(x - 1) ≤ l' ≤ floor(x + 1)

ceil(x) - 1 ≤ l' ≤ floor(x) + 1

floor(x) ≤ l' ≤ ceil(x)

Hence the ring index l' can be either floor(x) or ceil(x).

x is an integer

If x is an integer, l' can be x-1, x or x+1. Let's find a better condition for this case. Recall the bounds from the last section (Eq. 2):

l'(l'-1) ≤ x² < l'(l'+1)

The only one of these integers that satisfies the conditions is l'. Therefore, x = l'.

Although the initial analysis gave three options for l' – x-1, x or x+1 – we ended up proving that l' = x in this case. Note that if x is an integer, we can also write:

l' = x = floor(x) = ceil(x)

Initial Guess and Correction

From the last section we know that there are at most two candidates to consider: floor(x) and ceil(x) (if x is an integer, these two candidates coincide, so we're down to only one candidate).

Let's take an initial guess of l' = floor(x) = floor(sqrt(2(i-1)/S)).

We can verify whether i is indeed mapped to ring index l' by checking its boundary condition:

i < N(l') = 1 + S·l'(l'+1)/2

x² = 2(i-1)/S < l'(l'+1)

If this condition doesn't hold, we increment our initial guess to get floor(x) + 1 = ceil(x).

Since there are at most two candidates for the ring index to consider, the above condition is sufficient.

Here’s a shadertoy code and visualization based on this approach:

vec2 disk(int i, int L, int S)
{
    if (i == 0) // first branch
    {
        return vec2(0.0); // innermost ring (l=0)
    }
    float x2 = float(2 * (i - 1)) / float(S);
    int l = int(floor(sqrt(x2))); // initial guess
    if (x2 >= float(l * (l + 1))) // second branch
    {
        ++l; // guess correction
    }
    int s = i - N(l-1, S);

    float r = float(l) / float(L-1);
    float theta = 2.0 * PI * float(s) / float(S*l);

    return r * vec2(cos(theta), sin(theta));
}

Listing 4: Guess-Based Approach Disk Pattern (glsl code)
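
Note that the listing assumes a helper N(l, S) that isn't shown; from the derivation above it would be (my reconstruction):

int N(int l, int S)
{
    // Total number of samples up to and including ring l: N(l) = 1 + S*l*(l+1)/2.
    return 1 + (S * l * (l + 1)) / 2;
}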

Figure 4: Guess-Based Approach Disk Pattern (L=6, S=8)

Code Robustness and Simplification

In this section we will rely on the suggested implementation. Please look at the previous glsl code snippet (Listing 4) before reading on.

As promised, here are a few things we can do to get rid of branches, improve robustness and simplify the code. 

Let's address the first branch. For i=0 (or l=0), we notice that x² < 0. If we clamp the term under the square root to 0 (making sure we never take the square root of a negative number – which has undefined behavior in glsl), the initial guess evaluates to 0; and since the unclamped x² < l(l+1) = 0, the guess-correction condition never triggers and the initial guess stays at 0, yielding the right ring index. To sum up, by clamping the term under the square root we can omit this branch.

For the second branch, we would like to avoid accumulated floating-point computational errors by using integral expressions instead of floating point. Recall the definition x = sqrt(2(i-1)/S); since S > 0, we can transform the second conditional as follows:

if (x² ≥ l(l+1)) ++l;

if (x²·S ≥ l(l+1)·S) ++l;

if (2(i-1) ≥ l(l+1)·S) ++l;

Moreover, we could have replaced this branch with a step function:

if (2(i-1) ≥ l(l+1)·S) ++l; ⇔ l += step(l(l+1)·S, 2(i-1));

But this won't really eliminate the branch (in fact, on some GPU architectures it might even be less performant). This is merely a coding-style preference, and I have decided to leave the conditional as is.

Let's also simplify θ as follows (using Eq. 1):

θ = 2πs/(S·l) = 2π(i - N(l-1))/(S·l) = 2πi/(S·l) - 2πN(l-1)/(S·l)

cos(θ) and sin(θ) will span C(l) = S·l points for the layer l. Hence, applying any phase of the form 2πk/(S·l) (for any integer k) will span the same points – and N(l-1) is such an integer. Since we don't care about the internal ordering of the samples within the layer, we can just take:

θ = 2πi/(S·l)

The additional changes I made were to rewrite some terms in MAD form, save registers, remove the floor function (casting to int truncates, which matches floor for our non-negative values) and add ε where needed to increase code robustness (avoiding division by 0).

Here’s the final shadertoy code and visualization

/* 
Returns the ith sampling location of a normalized disk sampling pattern with 0 < L rings (layers) and 0 < S base-ring samples. 0 <= i < 1 + S*L*(L-1)/2.
*/
vec2 disk(int i, int L, int S)
{    
    float fS = float(S);    
    int x2S = 2 * i - 2;
    int l = int(sqrt(max(0.0, float(x2S) / fS)));    
    if (x2S >= ((l * l + l) * S)) ++l;	
    float fl = float(l);
    
    float r = fl / (float(L-1) + EPSILON);       
    float theta = (2.0 * PI * float(i)) / (fS * fl + EPSILON);
    
    return r * vec2(cos(theta), sin(theta));    
}

Listing 5: Normalized Disk Pattern Generation (glsl code)

Figure 5a: Normalized Uniform Disk Pattern (L=4, S=12)

Figure 5b: Normalized Uniform Disk Pattern (L=10, S=12)

Figure 5c: Normalized Uniform Disk Pattern (L=4, S=26)

Figure 5d: Normalized Uniform Disk Pattern (L=10, S=26)

Closing Thoughts

It goes without saying that if either L or S is known, one can optimize the shader by hardcoding their values. If both L and S are known and the total number of samples N_Total is relatively small, the entire pattern can be coded as a shader immediate constant buffer (similar to this table used by Unity).

Another interesting idea is to reuse the "guess approach" to find the best disk pattern within a sampling budget. Suppose you're limited to a maximum of T taps and you want to find the best disk sampling pattern within this budget. That is, given S and T, find the largest L such that N_Total = N(L-1) ≤ T – in other words, find the number of rings that fits within the given tap budget. You can first repeat the guessing process we did to find L, followed by our "regular" disk sampling pattern generation. Here's a shadertoy demonstration.
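
Here's a sketch of that idea (the function name is mine; it follows the same guess-and-correct pattern as Listing 5):

int rings_for_budget(int T, int S)
{
    // Largest L with N(L-1) = 1 + S*L*(L-1)/2 <= T, i.e. L*(L-1) <= 2*(T-1)/S.
    int x2S = 2 * (T - 1);                               // = x^2 * S
    int L = int(sqrt(max(0.0, float(x2S) / float(S)))); // initial guess: floor(x)
    if (x2S >= ((L * (L + 1)) * S)) ++L;                 // correction: ring L+1 still fits
    return max(L, 1);                                    // at least the origin ring
}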

If concentric rings aren't required, it is possible to apply any shaping function (e.g. pow) to the radius before returning the sampling point. Changing the ring radii gives different weights to the farther / closer samples and can affect, for example, the DOF blur effect. It is also possible to reduce the total number of required taps N_Total by "pushing" the samples towards the edge of the 2D unit circle (e.g. r = pow(r, 0.5)) while reducing the ring count L, thus reducing N_Total = N(L-1).

A final closing remark about one of the expensive instructions in my approach – the square-root evaluation of x². Note that we don't really care about the actual value of x itself; we're only interested in the rounded-down square-root value (i.e. floor(sqrt(..))). It is possible to use a "good-enough" rounded-down square-root approximation – one that gives the precise integral part of the square root but is allowed to have an error in the fractional part. For more information, please check the iterative implementation flavor of the integer square root algorithm.