
x86 SSE is not that fast

Name: Anonymous 2019-08-31 15:13

Name: Anonymous 2019-09-01 10:13

https://danluu.com/assembly-intrinsics/
For example, as of this writing, the first two Google hits for popcnt benchmark (and 2 out of the top 3 bing hits) claim that Intel's hardware popcnt instruction is slower than a software implementation that counts the number of bits set in a buffer, via a table lookup using the SSSE3 pshufb instruction. This turns out to be untrue, but it must not be obvious, or this claim wouldn't be so persistent. Let's see why someone might have come to the conclusion that the popcnt instruction is slow if they coded up a solution using intrinsics.
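For anyone who wants to poke at this themselves, here is a rough sketch of the two approaches being compared. It is not the code from the linked article. The function names are made up, the table version is a scalar stand-in for the pshufb trick (same 16-entry nibble table, one byte at a time instead of 16), and `__builtin_popcountll` is a GCC/Clang builtin that only becomes the hardware popcnt instruction if you build with something like -mpopcnt.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bits set in each 4-bit value. A pshufb-based popcount keeps this same
   16-entry table in a vector register; here it is a plain array. */
static const uint8_t NIBBLE_BITS[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Hypothetical name: table-lookup count, one byte (two nibbles) at a time. */
static uint64_t count_table(const uint8_t *buf, size_t len)
{
    uint64_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += NIBBLE_BITS[buf[i] & 0x0F] + NIBBLE_BITS[buf[i] >> 4];
    return total;
}

/* Hypothetical name: builtin count, 8 bytes at a time.
   Assumes GCC or Clang; without -mpopcnt the builtin may be lowered
   to a software routine instead of the popcnt instruction. */
static uint64_t count_builtin(const uint8_t *buf, size_t len)
{
    uint64_t total = 0;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, buf + i, sizeof word); /* avoids unaligned loads */
        total += (uint64_t)__builtin_popcountll(word);
    }
    for (; i < len; i++)                     /* leftover tail bytes */
        total += (uint64_t)__builtin_popcount(buf[i]);
    return total;
}
```

Both functions should return the same count for any buffer. Which one is faster depends on compiler flags and on what the compiler does with the loop, so check the generated assembly before trusting a benchmark.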

Name: Anonymous 2019-09-01 11:51

popcunt

Name: Anonymous 2019-09-01 14:19

https://www.learning2.de/cs/partdiff/
>Clang also supports these intrinsics, but lowers most of them only to the corresponding LLVM IR instructions. Consequently, as optimizations in Clang are performed on the level of the LLVM IR, the produced assembly code diverges from the original code. This implies that the intrinsics are not as useful as for compilers like GCC, and the programs produced by Clang are in many cases slower than GCC-compiled programs if intrinsics are employed.

Name: Anonymous 2019-09-02 6:10

make your're are game

Name: Anonymous 2019-09-05 2:31

>>4
Doesn't matter, at least Clang is free software

Name: Anonymous 2019-09-05 21:11

>>6
Free as in shit.
