more optimizations...
calculate the offsets from the phase differently, this happens to reduce the register pressure in the main loop, which in turns allows the compiler to generate much better code (doesn't need to spill a lot of stuff on the stack). this gives another 15% performance increase Change-Id: I2ce3479dd48b9e6941adb80e6d443d6e14d64d96
Loading
Please register or sign in to comment