
RLE Blitting

Name: Anonymous 2019-08-17 15:20

Ok. Transitioning to RLE blitting hasn't improved performance that much - just a 20% speedup, while code complexity greatly increased. One thing I noticed while measuring performance (for both the RLE and non-RLE code) was that at times my static code completed two times faster, which should be impossible, because the test used all-static data (a sprite blitted a million times in a loop), so the only variable is branch prediction; CPU load was at 0% and no syscalls are made inside the measured code. What does that even mean? Branch misprediction does affect performance, but not by a factor of two in the long run, because the predictor would quickly retrain itself by the thousandth iteration.
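
One way to tell whether the 2x swings come from the code or from the environment (scheduling, frequency scaling) is to repeat the whole measurement several times and keep the fastest run. A minimal harness sketch along those lines, assuming a hypothetical `blit_sprite_rle()` stand-in for the routine under test and `clock_gettime(CLOCK_MONOTONIC)`, which OSX has since 10.12:

```c
/* Minimal timing-harness sketch. blit_sprite_rle() is a hypothetical stand-in;
 * replace it with the real routine being measured. */
#include <stdio.h>
#include <time.h>

static void blit_sprite_rle(void) { /* hypothetical: the routine under test */ }

static double run_once(long iters)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        blit_sprite_rle();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int main(void)
{
    double best = 1e30;
    for (int rep = 0; rep < 10; rep++) {      /* repeat the whole run... */
        double t = run_once(1000000);         /* ...a million blits each time */
        if (t < best)
            best = t;
        printf("run %d: %.4f s (best so far %.4f s)\n", rep, t, best);
    }
    return 0;
}
```

If the best-of-N time is stable but individual runs still vary by 2x, the variance is coming from outside the measured loop.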

Broken scheduling, or OSX intentionally slowing down the code? Or maybe the Intel CPU itself does that? My MacBook is relatively old, so if it had any time bomb, it would have been activated by now. Or maybe it is the infamous Meltdown fix slowing down my code two times? How does one disable the Meltdown patch? For Linux there is https://make-linux-fast-again.com/, but what about OSX? I don't care about security - it is overrated.

Name: Anonymous 2019-08-17 21:20

>>6
Nope. It is 100% pure C integer code, doing RLE pixel skips or just copying pixels.
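
A sketch of that kind of inner loop, with an assumed run encoding (high bit of a count byte marks an opaque run, zero terminates the row); the actual format used here isn't shown, so treat this as illustrative:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative RLE row blit: skip transparent runs, copy opaque runs.
 * The encoding (high bit = opaque, 0 = end of row) is an assumption. */
static void blit_row_rle(uint32_t *dst, const uint8_t *rle, const uint32_t *pixels)
{
    for (;;) {
        uint8_t run = *rle++;
        if (run == 0)                 /* end of row */
            break;
        uint8_t len = run & 0x7F;     /* run length */
        if (run & 0x80) {             /* opaque run: copy pixels */
            memcpy(dst, pixels, len * sizeof *dst);
            pixels += len;
        }
        dst += len;                   /* transparent run: just skip ahead */
    }
}
```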

Regarding `__builtin_expect`, nemequ on Stack Overflow explained that this is what one should expect from it, because it actually does some unexpected black magic:
https://stackoverflow.com/questions/57538301/clang-mishandles-builtin-expect?noredirect=1#comment101543316_57538301

https://pastebin.com/S8Y8tqZy
`__builtin_expect` can alter lots of different optimizations with different trade-offs…

Your assumption about compiler writers optimizing for their own architecture is probably invalid. You can control exactly which architecture the code is tuned for (see the `-mtune` option). There may still be a bit of bias in instruction selection, but for the most part the instructions are chosen automatically.

It also doesn't help that until recently (GCC 9, IIRC) there was no set probability for what `__builtin_expect` meant. Sometimes you would see a slowdown if it failed more than around 1% of the time, other times it's more like 10%. GCC recently added a `__builtin_expect_with_probability` and defined the probability for `__builtin_expect` to be 90%; I'd suggest taking a look at using that. Unfortunately clang hasn't (yet?) picked it up, but in the meantime you can use a macro like [`HEDLEY_PREDICT`](https://nemequ.github.io/hedley/api-reference.html#HEDLEY_PREDICT), which has a few possible definitions depending on the availability of `__builtin_expect_with_probability` and `__builtin_expect`:

```c
/* when __builtin_expect_with_probability is available (GCC 9+): */
# define HEDLEY_PREDICT(expr, value, probability) __builtin_expect_with_probability(expr, value, probability)
/* when only __builtin_expect is available: apply the hint only for high probabilities */
# define HEDLEY_PREDICT(expr, expected, probability) \
(((probability) >= 0.9) ? __builtin_expect(!!(expr), (expected)) : (((void) (expected)), !!(expr)))
/* when neither builtin exists: evaluate the arguments and drop the hint */
# define HEDLEY_PREDICT(expr, expected, probability) (((void) (expected)), !!(expr))
```
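
As a usage sketch (not the actual code from this thread): assuming most runs in a sprite are opaque, the hint could go on the run-type branch of the illustrative RLE loop above, with `hedley.h` being the single-header library that provides `HEDLEY_PREDICT`:

```c
/* Usage sketch only: the run encoding mirrors the illustrative loop above and is
 * an assumption, not the real format used here. hedley.h provides HEDLEY_PREDICT. */
#include <stdint.h>
#include <string.h>
#include "hedley.h"

static void blit_row_rle_hinted(uint32_t *dst, const uint8_t *rle, const uint32_t *pixels)
{
    for (;;) {
        uint8_t run = *rle++;
        if (run == 0)                  /* 0 terminates the row */
            break;
        uint8_t len = run & 0x7F;
        /* hint: assume roughly 90% of runs are opaque copies */
        if (HEDLEY_PREDICT((run & 0x80) != 0, 1, 0.9)) {
            /* likely path: opaque run, copy the pixels */
            memcpy(dst, pixels, len * sizeof *dst);
            pixels += len;
        }
        /* either way, advance past the run (transparent runs are just skipped) */
        dst += len;
    }
}
```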
