Name: Anonymous 2020-09-17 20:05
Ok. The new voxel rendering algorithm performs 1.4 times faster. Not much, but voxel models can also be split into 16-bit chunks, which locally use only 16-bit addressing. That should speedup the algorithm further, if the bottleneck is with the memory bandwidth.
I've also discovered that GCC's __builtin_popcount is slower than a lookup table, since it branches into subroutine __popcountdi2. Why GCC doesnt use the x86 popcnt opcode? No idea. No idea. But it probably would be easier to answer why Stallman molests little children.
I've also discovered that GCC's __builtin_popcount is slower than a lookup table, since it branches into subroutine __popcountdi2. Why GCC doesnt use the x86 popcnt opcode? No idea. No idea. But it probably would be easier to answer why Stallman molests little children.