Tag: perf

Fast Inverse Smoothstep

Fast Inverse Smoothstep

Motivation

Smoothstep is useful in computer graphics and sometimes the inverse effect is needed. Iñigo Quilez derived a closed formula for inverse smoothstep:

invsmoothstep(x) = smoothstep-1(x) =0.5 – sin(arcsin(1-2x)/3)

Figure 1: smoothstep (blue) and invsmoothstep (yellow) (source)

While no one can deny the beauty of this closed form, I was wondering if we could find another practical way to compute this function without using sine and arc-sine built-in functions (which potentially can be ALU prohibitive).

As it turns out, we can. The rest of this article shows how to do it step-by-step.

Cubic Equations

We start the derivation similar to Iñigo. Note that smoothstep (between 0 to 1, excluding clamping) could be re-written implicitly as:

y = smoothstep(x) = x2(3-2x)

-2x3+3x2-y=0 (Eq. 1)

The cubic disciminant is:

Q = -1/4 , R = (1-2y)/8 => Δ = y(y-1)/16

And since y∈[0,1], the discriminant is always negative in this range, except y=0 and y=1 for which the discriminant is zero. Hence, the equation has 3 different real roots, except the boundaries (zero and one) – for which it has at most two real roots. The conclusion is that we indeed deal with non-complex solutions and in the general case we have 3 roots.

Let’s continue by erasing the quadratic term by making a substitution using t (similar to Cardano’s formula):

x = t+0.5 => t3 – 3t/4 + (2y-1)/4 = 0
t(t2 – 3/4) + (2y-1)/4 = 0 (Eq. 2)

To summarize this section, I created a shadertoy demo that shows the cubic equations for (x,y)∈[-1,2]x[-1,1]. The green graph shows Eq. 1 and the blue graph shows Eq. 2. Roots are highlighted in white. Observe, for example, that for y=0.5, the root for Eq. 1 is x=0.5 and for Eq. 2 is t=0, and the substitution x=t+0.5 is correct as expected. This also confirms our intuition since we know that smoothestep(0.5) = 0.5, and therefore invsmoothstep(0.5) = 0.5.

Figure 2: invsmoothstep cubic equations and roots (green is Eq. 1, blue is Eq 2.)

Numerical Analysis

Now we’re going to solve Eq. 2 for t using Newton-Raphson method.

Define:

g(t) = t(t2 – 3/4) + (2y-1)/4

And compute its derivative:

g'(t) = 3t2-3/4 (Eq. 3)

Now, for a “good” guess tn we can update our next guess, tn+1, as follows:

tn+1 = tn– g(tn) / g'(tn) = tn – (t(t2 – 3/4) + (2y-1)/4) / (3t2-3/4)
tn+1 = tn – (t(4tn2-3) + 2y-1) / (12tn2-3) (Eq. 4)

The concerning part about this strategy is convergence. Picking the “wrong” initial guess will yield a wrong root, that is, a root outside of the t∈[-0.5,0.5] range. Recall that we’re interested in finding the root within the [-0.5,0.5] segment for Eq. 2. By examining the blue chart (Figure 2, see animation in shadertoy) we see that there’s always one root inside the segment we care about. Two roots converge at -0.5 and +0.5 for y=0 and y=1 respectively.

Thus, picking an initial guess of t0=0 will ensure that the closest root is indeed the root that we care about and the algorithm should converge to the right answer.

Putting it all together. we start with t0=0 and run the update step (Eq. 4) N times until the result converges. Finally, we return tN+0.5.

After experimenting a bit with this strategy, I found that 5 steps are sufficient to ensure reasonable convergence. Here’s the code:

float fastinvsmoothstep_naive(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.0;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;  
}

Better Initial Guesses

We want to speed-up convergence by taking a reasonable first guess. To simplify, let’s first remove the constant term by assuming that y=0.5 in Eq. 2. Note that from Eq. 3 we see that the tangent slope at the origin is -3/4. Since we’re solving Eq. 2 for any y, we can intersect the tangent with the horizontal line -(2y-1)/4 to get a reasonable first approximation for the cubic function and our horizontal line intersection. That is:

t0 near= -(2y-1)/4 / -3/4 = (2y-1)/3

Alas, this approximation is only good for when the root is “close” to the origin. Using the tangent always underestimates the guess (the derivative is monotonic-increasing for both sides around the origin, see Eq. 3). Thus, if the solution resides towards the verge of the segment [-0.5,0.5] – it will require more iterations to converge using t0near. Indeed, using the suggested t0near above yields good results after just one iteration towards the center of the range, but requires additional iterations towards its boundaries. A better approach would be to correct for this underestimation by adding an additional factor to compensate if the solution goes towards the segment’s boundaries. This compensation shouldn’t hinder the fast convergence near the origin though.

Let’s repeat the same logic, this time searching for a “reasonable” guess using roots that lie outside the segment. Note that for y=0, ±sqrt(3/4) are also roots with a tangent slope of 3/2. There’s a subtle detail in here – we actually want to move “away” from the root that is outside the segment so we move towards the root which is inside the segment. So we move “backwards” – using the mirrored tangent slope of -3/2. Now, assuming abs(y) is “close enough” to the boundary at 0.5, we can intersect the horizontal line –(2y-1)/4 again to get:

t0 far= -(2y-1)/4 / -3/2 = (2y-1)/6

We are now ready to compensate for the underestimation by adding the t0 far term:

t0 = t0 near + αt0 far = (2y-1)/3 + α(2y-1)/6 = (2+α)(2y-1)/6

The α term controls the amount of overestimation we’re adding back in (again, relying on our algorithm to correct for the overestimation near the origin). We need to pick our α carefully as this additional factor t0far might overshoot and cause our numerical method to converge to a root outside [-0.5,0.5]. I made a shadertoy demo that animates α with just a single update step (Eq. 4). Note that for smaller values, α ≅ 0 – the boundaries aren’t converging fast enough. However, for larger values, α ≅ 1.0 – the boundaries overshoot.

Figure 3a: invsmoothestep (blue) fastinvsmoothstep (yellow) and error (red) for α∈[0,1] and a single update step

By eyeballing the graph, a good α value resides somewhere in [~0.6,~0.8]. I was satisfied with α = 0.7. Here’s the implementation code that uses just a single update step from the last section and a better initial guess based on the above strategy. Note that if better precision is required – one can run a second update step again.

float fastinvsmoothstep_linearguess(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.45*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  // Uncomment for increased precision
  //t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0); 
  return t + 0.5;  
}

Figure 3b: invsmoothestep (blue) fastinvsmoothstep (yellow) and error (red) for a single update step

Figure 3c: invsmoothestep (blue) fastinvsmoothstep (yellow) and error (red) for two update steps

Reducing Errors

The approximation looks good but we’re not done yet. A possible application for invsmoothstep could be a continuous transition such as an animation. It might be useful to preserve properties of the original invsmoothstep function at certain points of interest. For example, imagine that we use the curve to animate the position of an object. If the starting and ending points of the approximation don’t match the original curve exactly – we might get “jerky” movements as the object “snaps” to its new position at the beginning and the end of the animation sequence. This is not great.

In particular, we would like to preserve the beginning, ending and midpoint of the original curve as follows:

invsmoothstep(0.0)=0.0
invsmoothstep(0.5)=0.5
invsmoothstep(1.0)=1.0

How bad are the errors using our current implementation? I measured errors for the selected points using a 32 bit floating point on the CPU:

Error (#updates / y)y = 0.0y = 0.5y = 1.0
1 update step0.0245610.00.024561
2 update steps0.0121770.00.012177
3 update steps0.0060630.00.006063

Table 1: Error per interest point and number of update steps

We can see that the error is roughly decreasing by a multiple of 2 for each additional update step. Unfortunately, the errors are still quite high. We would like to redesign our implementation to have minimum error at these selected points. Ideally, the update step should return exactly -0.5,0.0,0.5 for input y of 0.0,0.5,1.0 respectively (we made the substitution of x=t+0.5, and we calculated t in our implementation).

Recall our update step (Eq. 4) and solve for tN given our selected y points to “back-calculate” what the input tN-1 for the update step would have been. So result of the equation yields the desired result discussed earlier:

Figure 4: Solving the update equation for a desired output of +0.5 (red), 0.0 (blue) and -0.5 (green).

We can see that we need to constrain our initial guess t0 according to:

t0(y=0) = -0.5 or +0.25
t0(y=0.5) = 0
t0(y=1)= +0.5 or -0.25
(Eq. 5)

To preserve a symmetric solution, we also require that t0(y) = -t0(1-y). Meaning we either pick t0(0)=-0.5, t0(1)=0.5 or t0(0)=-0.25, t0(1)=0.25. We will also require a continuous and smooth t0(y) function, to support a pleasant, “non-jerky” animation curve.

As a side note, now we can explain why we got a zero error for y=0.5 in Table 1. In the last section we were using a linear function with no offset for the initial guess, so we conformed to the requirement t0(0.5) = 0 and therefore we had zero errors for this point.

Although these constraints can be satisfied using a linear equation, unfortunately, the results don’t look good. Let’s try instead a “banded” curve that satisfies the requirements. Let’s use a signed power curve as follows:

t0(y) = k sign(2y-1) abs(2y-1)p

Note that by either picking k = 0.5 or -0.25, the requirements above are satisfied for any p≠0. Here’s a shadertoy demo that animates the solution for various p.

Figure 5a: invsmoothestep with an animated power curved initial guess (k=0.5, p∈[0,20], single update step)

Figure 5b: invsmoothestep with a power curved initial guess (k=-0.25, p∈[0,20], single update step)

Again, by eyeballing the graphs, and restricting ourselves to pick an integral p value to save some additional ALU by avoid using the pow function, I’ve decided to go with p=4,3 for k=0.5,-0.25 respectively. Here are the results and source code:

float fastinvsmoothstep_edgeoption1(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.5*abs(yn)*yn*yn*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  // Uncomment for increased precision
  //t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;     
}

float fastinvsmoothstep_edgeoption2(in float y)
{
  float yn = 2.0*y-1.0;
  float t = -0.25*yn*yn*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  // Uncomment for increased precision
  //t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5; 
}

Figure 5c: invsmoothestep with a power curved initial guess (k=0.5, p=4, single update step)

Figure 5d: invsmoothestep with a power curved initial guess (k=0.5, p=4, two update steps)

Figure 5e: invsmoothestep with a power curved initial guess (k=-0.25, p=3, single update step)

Figure 5f: invsmoothestep with a power curved initial guess (k=-0.25, p=3, two update steps)

Indeed, running this new implementation on the CPU confirms that it holds the restrictions and we get zero errors for the points of interest (i.e. y=0,0.5,1), regardless of the number of update steps.

Which implementation is better? For a single update step, the curves are very similar but the second option is slightly better regarding its ALU usage. If one can afford multiple update steps, however, the first option gives better results near the boundaries when multiple update steps are being used (compare Figure 5d and 5f). Observe also the difference in the curves’ slope close to the boundaries. We will revisit these two options in the conclusions section again.

Note that we analyzed a single update step. There’s a subtle assumption here that applying additional steps would not introduce any additional errors (at y=0,0.5,1) and can only improve results. This requires a short discussion. Please refer back to Figure 4 and Eq. 4 and observe that the initial guess we picked for option 1 is actually a fixed point for the update function (in Figure 4, for each of the graphs, the desired output of the update step is also a fixed point). For option 2, after the first update step the points of interest will reach the fixed points as well. In other words:

n>0 tn(0) = -0.5
n>0 tn(0.5) = 0
n>0 tn(1) = +0.5

Thus additional update steps do not introduce additional errors for these particular solutions.

Stability and Robustness

We have already established that the algorithm will converge to the right answer in [0,1] and learned how to bring errors to zero at certain strategic points. We now turn our attention to discuss numerical instability issues.

Recall the update step (Eq. 4):

tn+1 = tn – (t(4tn-3) + 2y-1) / (12tn2-3)

We won’t have numerical issues as long as the denominator isn’t too close to zero, that is tn±0.5.

Let’s consider our first solution, the initial guess is t0 = 0.45(2y-1) and substituting it in the update step (Eq. 4) and denoting x(y)=0.45(2y-1) we get:

f(x)= t1 = x – (x(4x2-3) + x/0.45) / (12x2-3)

Since x(y) and f(x) are monotonic-increasing the conclusion is that f(y) is also monotonic-increasing. I derived f(x) and made sure the derivative is always positive for x∈[-0.45,0.45], but you can also look at the following graph to convince yourself.

Figure 6: single update step with initial guess of 0.45(2y-1)

Since f(y) is continuous and monotonic-increasing, it’s enough to check the edge of the segment to find maximum and minimum. Now we can compute:

t1max = t1(y=1.0) = 0.475438
t1min = t1(y=0.0) = -0.475438

Similar logic shows that when a second update step is used, t2(y) is also continuous monotonic-increasing. Therefore:

t2max = t2(y=1.0) = 0.487822
t2min = t2(y=1.0) = -0.487822

So to wrap things up, even after two updates, the minimum denominator (in absolute value) is at most 0.144351, which is not close enough to zero. Therefore, the first solution is numerically stable.

For the second solution, we can use a similar construction to prove monotonous (to be accurate, the second option, with k = -0.25, is actually monotonic-decreasing, this is still a sufficient argument for checking only the boundaries of the segment).

So again, we only need to check the extremum. The alert reader might have noticed that we actually designed the update step to return -0.5,+0.5 for input of 0.0,1.0 respectively. How do we reconcile it with the fact that the denominator in Eq. 4 will be zero?

Recall from Figure 4 that for the particular cases of y=0.0,1.0, the corresponding update step functions have removable discontinuity at these particular points. It’s easy to see it by applying L’hôpital rule, but I also provided a visualization (Figure 7) to show the update step error as a function of the initial guess. The animation shows the function for a particular input.

Figure 7: error function of the update step for various y

Since we’re getting too close to a discontinuity we should correct to avoid a possible division by zero. Let’s add a small ε to the denominator for this case. It’s probably better to actually subtract an ε in this case to keep the denominator sign (the denominator 12x2-3 is always non-positive in our [-0.5,0.5] range).

Note that the update step’s nominator equals exactly zero for our points of interest (y=0.0,0.5,1.0). That proves that changing the update equation as suggested above didn’t introduce any errors for the evaluation of our interest points and so we’re free to use the ε in the denominator.

Best of Both Worlds

When comparing the overall errors after a single update step from the three solutions we found so far (compare Figure 3c, 5c and 5e), we can observe that even though the linear initial guess gives the lowest errors in the entire segment, it does not guarantee zero errors at the boundaries. Can we do better?

A simple idea would be to lerp between the linear initial guess and the other two solutions in a way that combines the benefits; having overall reduced errors with zero errors on the boundaries. To do so, we write the following two options:

t0combined(y) = lerp(0.45(2y-1), 0.5sign(2y-1)abs(2y-1)4, β(y))
t0combined(y) = lerp(0.45(2y-1), -0.25sign(2y-1)abs(2y-1)3, β(y))
(Eq. 6)

We will also constrain ourselves to continuous β(y) with symmetry around the middle point. That is β(0.5+y) = β(0.5-y). As we’ve seen earlier, in order to get zero errors on the boundary, we have to satisfy Eq. 5, therefore:

β(y=0) = 1
β(y=1) = 1

Note that we get zero error in the middle for any β(y=0.5).

In terms of stability and robustness, since we’re picking a linear combination of two stable solutions (i.e. both solutions are in [-0.5,0.5]), the combined solution will also be stable. Hence the denominator of the update step (Eq. 4) cannot be zero. That means that the solution is stable for any β(y) as long as we decrease ε in the denominator (see last section).

Let’s pick β(y). Intuitively, we want to give more weight to the linear guess everywhere except the boundaries, in which we want to give full weight to the second guess. Let’s pick again a power curve function as follows:

β(y) = abs(2y-1)r

Intuitively, it looks like the second option in Eq. 6 (for k=-0.25) won’t give good results since the difference between the two lerped functions is quite large. Indeed, the overall solution doesn’t look good: the approximation overshoots / undershoots where the combined initial guess evaluates to zero. That leaves us with the first equation in Eq. 6. Here’s a shadertoy animation of the solution for different r values:

Figure 8a: invsmoothestep with a combined initial guess (r∈[0,20], single update step)

Again, by eyeballing the graphs and picking an integral r value, I’ve decided to go with p=3. We can also expand the lerp and rearrenge terms using MAD instructions to obtain:

t0combined(y) = lerp(0.45(2y-1), 0.5sign(2y-1)abs(2y-1)4, abs(2y-1)3)
t0combined(y) = 0.45(2y-1) + 0.5(2y-1) (abs(2y-1)6 – 0.9 abs(2y-1)3)

Here are the results and source code:

float fastinvsmoothstep_combined(in float y)
{
  float yn = 2.0*y-1.0;
  float absyn3 = abs(yn)*yn*yn;
  float t = 0.45*yn+0.5*yn*(absyn3*absyn3-0.9*absyn3);
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  // Uncomment for increased precision
  //t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;     
}

Figure 8b: invsmoothestep with a combined initial guess (r=3, single update step)

Figure 8c: invsmoothestep with a combined initial guess (r=3, two update steps)

Conclusions

In this article I presented novel, efficient and robust methods for computing the inverse-smoothstep function. I presented several flavors and discussed theoretical trade-offs. To compare these methods in practice and draw conclusions, I created this sandbox to demonstrate fast and slow invsmoothstep-based animations for position, scale and fade.

Figure 9: invsmoothstep based animations using different methods (purple – ground-truth, olive – linear guess, pink – power guess (k=0.5), light blue – power guess (k=-0.25), green – combined guess)

I encourage the reader to explore the sandbox to better understand the differences. Overall, I think that the linear guess is both efficient and plausible when used in reasonable animation speeds and should be the go-to implementation. If invsmoothstep must be accurate for 0.0 and 1.0, I suggest using the power guess for k=0.5 since the additional ALU required for the combined guess does not justify the additional quality gain, in my opinion.

We still have issues really close to the boundaries for extremely slow animations. This is an inherent limitation of our methods since we’re using a continuous guessing function that gets close to the discontinuity point at the edge (review Figure 7). Recall the two options we found earlier for zero boundary errors (compare Figure 5c and Figure 5e). Now we can see another difference between the approximations – they both reach invsmoothstep from different sides of the curve! In other words, the implementation for k = 0.5 would reach the boundary “slower” than invsmoothstep, while the implementation for k = -0.25 would reach invsmoothstep faster. In practice, it means that for k = 0.5 we will always get a “jerk” close to the boundaries, while for k = -0.25, we will reach the boundary smoother, but “too fast”.

Reducing the “abrupt jump” for k = 0.5 can be achieved by running additional update steps. Note that (due to the discontinuity) the “abrupt jump” will always be there – it will just be pushed closer to the boundary itself. That’s why this issue is being highlighted in a super-slow animation (refer again to Figure 9). On the other hand, for k = -0.25, we have a much smoother rate of change, although it’s a bit faster compared to the original invsmoothstep. In Figure 9, I think that this difference is exaggerated because we are comparing the implementations side-by-side. If a slow and pleasant animation is required, the k = -0.25 solution might be a reasonable compromise.

Based on the previous sections, it goes without saying that if increased precision is required and more ALU cycles are allowed, all implementations can increase quality by running the update step more times without jeopardizing stability or robustness (see sandbox for demonstration).

Finally, the implementation you pick should be based on your requirements from approximating invsmoothstep. For completion, here’s the final source code and implementation:

// NOTE: Plausible but doesn't reach boundaries correctly.
float fastinvsmoothstep(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.45*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;  
}

// NOTE: Abrupt changes towards true boundaries.
float fastinvsmoothstep(in float y)
{
  float yn = 2.0*y-1.0;
  float t = 0.5*abs(yn)*yn*yn*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0-EPSILON);
  return t + 0.5;     
}

// NOTE: Smoother but faster changes towards true boundaries.
float fastinvsmoothstep(in float y)
{
  float yn = 2.0*y-1.0;
  float t = -0.25*yn*yn*yn;
  t -= ( t*(4.0*t*t-3.0) + yn) / (12.0*t*t-3.0);
  return t + 0.5;     
}

Closing Thoughts

I used an ad-hoc approach for picking α, p and r for the suggsested solutions. It’s possible to use more rigorous methods by defining a metric (e.g. RMSE) and finding optimal parameters (e.g. by using simulated annealing). Though I’m fairly satisfied with the parameters I found – I would greatly appreciate it if you shared with me any parameters with better results so that I can update this post accordingly.

Let’s circle back to the beginning of this article and recall the analytic form of invsmoothstep. By rearranging a few terms, it’s possible to use the method developed here for fast computation of an expression that has the form sin(arcsin(x)/3). This expression might be useful for other applications such as accelerating other cubic equation solvers. I haven’t yet experimented with this idea though.