pow(2,X) approximation

6 May 2019

While building a VST instrument I needed to support modulation of lots of things (more on this in a later post, if I finish it). For most things this isn’t a problem, but pitch gets tricky as it’s non-linear.

To shift a note up an octave you must double the frequency, and shifting it down requires halving. To save things getting too complicated I decided to use the 1-volt per octave (1V/O) scale that’s used in analogue synths. This makes modulation easy, as the conversion from 1V/O takes care of the weirdness.

There’s only one little problem, the formula for conversion from 1V/O to frequency is:

(2^voltage) * 6.875 = frequency
^ means power, so 2 to the power of voltage multiplied by 6.875.

Now I like the idea of being able to use the stuff I build on microcontrollers, so I’m careful to not use anything too processor intensive, and the powf function isn’t exactly short.

Another way

So, since I’m only ever doing 2^X, rather than Y^X, this makes things a lot simpler. While 2^ isn’t a linear function itself, at the small scale it is, so a lookup table with interpolation should give good results. I've used this method before for generating Sine waves, so I’m not going to cover the basics again.

Covering the whole range of 0-10V with one table wouldn’t work very well, but you can split the power function into parts. So to calculate powf(2, 3.576) you can do powf(2, 3.0) * powf(2, 0.576) .

This means we can just have a lookup table covering the range of 0-1V and multiply this with the integer part.

Testing the theory

I think it makes the most sense to explain with code so, here it is in C:

static uint32_t powFastTable = 1 << (POW_FAST_TABLE_BITS + POW_FAST_FRAC_BITS);
static uint32_t powFastDiv = (1 << (POW_FAST_FRAC_BITS)) - 1;

float pow_fast(float voltage)
{
    // Split off the integer part, and raise 2^X
	float whole = (float)(1 << (uint32_t)(voltage));

    // Separate off the fractional part
	float frac = voltage - (float)((uint32_t)(voltage));

    // Multiply the fractional part with an n-bit long interger
    // This gives a value between 0 and powFastTable
	uint32_t tableIndex = (uint32_t)(powFastTable * frac);

    // Shift right to leave a number between 0 and the size of the lookup table
	uint32_t tI = tableIndex >> POW_FAST_FRAC_BITS;

    // Mask off the lookup table bits, leaving just the interpolation ones
    // Divide this by the highest value to get a float between 0 and 1
	float tF = (float)(tableIndex & powFastDiv) / powFastDiv;

    // Interpolate between the two lookup table values,
    // the multiply this by the integer part to get the result
	return whole * (powLookup[tI] + ((powLookup[tI + 1] - powLookup[tI]) * tF));
}

Now there’s two parameters you can adjust here, the lookup table size, and number of bits to interpolate with. I wrote a quick script to test how this varies between 5-11 bits of both and an 8-bit lookup table with 11 bits of interpolation gave <1ppm error (1 part per million, is an error of 0.0001%) over 0.0 to 1.0 in steps of 0.000001. This is good enough for audio stuff.

Speed testing

So, this looks like it should work, the question is, is it faster?

I'm going to use by STM32F4 dev board for testing as the Cortex-m4 core on it has a useful feature called SysTick.

This is a counter that, when enabled, counts down by 1 on every CPU cycle. If you set it to a known value, enable it, do stuff, then read the counter you know the exact number of cycles it took.

If you’re using STCube it looks like:

SysTick->LOAD = 0x00FFFFFF;
SysTick->VAL = 0;
// Set the source to CPU clock, disable interupt on zero, and enable SysTick
SysTick->CTRL = SysTick_CTRL_CLKSOURCE_Msk | SysTick_CTRL_ENABLE_Msk;
// Do stuff
ticks = SysTick->VAL;
ticks = 0x00FFFFFF - ticks;

Other parts of STCube use the SysTick timer (e.g. HAL_Delay), so some stuff might break if you use this, but for testing it’s fine.

Results

Taking the C code above (so no assembly optimization) we get the following results:

MethodCPU cycles% of original
newlib-c, powf658100%
pow_fast7511%

Now this is only for 2^X, rather than X^Y, but that’s all I need and at 11% of the CPU time that’s worth it.