Hue Shift

Name: Anonymous 2019-07-19 9:30

Ok. The formula for converting an RGB triplet into HSV form is widely published, but I've failed to find any hints as to how one arrives at it, which I'd need in order to optimize it for my game's blitter routine. So I tried to derive the h, s, v formulas myself, using basic vector math.

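v = (r+g+b)/3 //value (the mean of the channels; HSV proper uses max(r,g,b))
hs = (v-r, v-g, v-b) //hue*saturation vector; its components always sum to zero
s = hs.length //saturation
h = angle(hs/s) //hue is the angle of the vector within the plane perpendicular to (1,1,1)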


I guess when one only needs to change the saturation, there is no reason to compute the angle or even the full vector length (scaling hs is enough), but the common requirement of changing the hue requires nasty computation. Can anyone hint at an efficient way of shifting hue?

Name: Anonymous 2019-07-31 18:45

>>40
I posted it as an answer to https://stackoverflow.com/questions/1100090/looking-for-an-efficient-integer-square-root-algorithm-for-arm-thumb2/57293481 and got downvoted, despite it being faster than anything but the native sqrtss instruction on x86.

Name: Anonymous 2019-08-01 3:59

This is not needed; it just slows down the code.
Who would compute the sqrt of 0?
if (!value) return 0;

Name: Anonymous 2019-08-01 7:45

>>42
if r, g or b is zero, then the sqrt is zero.

Name: Anonymous 2019-08-01 9:07

>>42
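Then get rid of the early return:
#include <stdint.h>
#define CLZ(x) __builtin_clz(x) /* GCC/Clang builtin; undefined for x == 0 */

uint32_t clz_sqrt(uint32_t value) {
    uint32_t v = value | (value == 0); /* never 0: avoids CLZ(0) and a later division by zero */
    uint32_t xn = 1u << ((32 - CLZ(v))/2); /* initial guess within a factor of sqrt(2) */
    xn = (xn + v/xn)/2; /* three Newton steps: close, but not always the exact floor */
    xn = (xn + v/xn)/2; /* (e.g. value = 2^31 gives 46341 rather than 46340) */
    xn = (xn + v/xn)/2;
    return xn*(value!=0); /* branchless zero result for zero input */
}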

Name: Anonymous 2019-08-01 9:16

>>45
If the range is 0-255, then a lookup table will be much faster: 256 bytes of table (every sqrt is < 16, so one byte per entry) will fit in the cache.
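A sketch of that table in C, assuming 8-bit inputs; the names are hypothetical. Filling it once at startup avoids typing out 256 constants, though a `static const` initializer would work just as well and avoids the init call.

```c
#include <stdint.h>

/* isqrt_lut[x] = floor(sqrt(x)) for x in 0..255.
   Every result is at most 15, so one byte per entry is enough. */
static uint8_t isqrt_lut[256];

void init_isqrt_lut(void) {
    int root = 0;
    for (int x = 0; x < 256; x++) {
        /* Advance the root whenever x reaches the next perfect square. */
        while ((root + 1) * (root + 1) <= x)
            root++;
        isqrt_lut[x] = (uint8_t)root;
    }
}

static inline uint8_t isqrt8(uint8_t x) {
    return isqrt_lut[x]; /* one (usually cached) memory load */
}
```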

Name: Anonymous 2019-08-01 9:49

>>41
and got downvoted despite it being faster than anything but native sqrtss on x86.
question is about arm and you reply with x86 benchmarks. what did you expect, bydlita? make your're are game

Name: Anonymous 2019-08-01 10:28

>>46
can you prove it is slow on ARM?

Name: Anonymous 2019-08-01 10:31

>>44
That branch doesn't affect anything, because of x86 branch prediction. So eliminating it solves nothing.

Name: Anonymous 2019-08-01 11:01

>>48
Not with the recent fixes for branch prediction exploits and compiler trampolines.

Name: Anonymous 2019-08-01 11:04

>>49
Just turn these fixes off. Security is overrated when you're using a smartphone. Edited on 01/08/2019 11:05.

Name: Anonymous 2019-08-01 11:04

Persistent threat without a possibility of mitigation in software

In February 2019, it was reported that there are variants of Spectre threat that cannot be effectively mitigated in software at all.[98][99]

Name: Anonymous 2019-08-01 11:11

>>50
Oh... I forgot, you can't turn them off, because Microsoft, Apple and the Linux Foundation know better.

Name: Anonymous 2019-08-01 11:15

Benchmark it with current Spectre patches.
Branch prediction is getting riskier and riskier.
https://en.wikipedia.org/wiki/Category:Speculative_execution_security_vulnerabilities Edited on 01/08/2019 11:20.

Name: Anonymous 2019-08-01 11:18

>>47
OP asked a question about ARM. you posted an answer about x86. it is you who needs to prove its relevance to the question, you hamster-killing psychopathic bydlo

Name: Anonymous 2019-08-01 11:19

sqrt my dubs

Name: Anonymous 2019-08-01 11:26

>>54
I posted code that works fast on any CPU.

Name: Anonymous 2019-08-01 11:27

>>53
I've disabled OS auto-update to avoid all that crud. Moreover, auto-update easily eats several gigabytes of my precious SSD space.

Name: Anonymous 2019-08-01 12:30

>>56
It's several orders of magnitude slower than a lookup table.

Name: Anonymous 2019-08-01 12:43

>>58
proof?

Name: Anonymous 2019-08-01 12:49

>>56
how do you know this? your're are poast mentions it being fast on x86, on which you benchmarked it. this does not always map 1:1 to speed on ARM

Name: Anonymous 2019-08-01 12:57

>>60
Test it yourself.

Name: Anonymous 2019-08-01 13:23

>>61
StackOverflower: how much is 2+2?
Bydlita: 3+3 is 6
SO: but I want to know how much is 2+2
B: it's 6
SO: I don't think your're are right
B: prove it!

Name: Anonymous 2019-08-01 13:48

Hue shit

Name: Anonymous 2019-08-01 13:53

>>59
Lookup table: one (likely cached) memory load.
sqrt: 3 divisions, each dependent on the previous one, plus 1 early branch.

Name: Anonymous 2019-08-01 14:06

Lookup tables win EVEN if they don't fit in the cache. IIRC most chess programs have precomputed "bitboard" tables, often several megabytes of different piece tables, to quickly resolve intersection/bijection tests for attacks. Only the first accesses to such a table are penalized; after that the L2/L3 cache kicks in and no algorithm can compete.

Name: Anonymous 2019-08-01 14:15

look up the repeating digits in my poast number

Name: Anonymous 2019-08-01 16:02

>>64
ARMs are not that cache-dependent, and the LUT is just 1000 bytes, small enough to fit in a cache.

Name: Anonymous 2019-08-01 16:04

>>65
In most cases lookup tables are accessed with good locality. E.g. if you're processing an RGB color photo, the RGB values will vary smoothly across the image.

Name: Anonymous 2019-08-01 16:06

>>64
Also, that early branch isn't my code; it's copied from the upvoted answer on Stack Overflow.

Name: Anonymous 2019-08-01 16:52

>>67
256 bytes, if the sqrt is in the range 0-15 (inputs 0-255).
128 bytes, with more complex addressing (store 2 sqrts in one byte; 4 bits fit 0-15 exactly).
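A sketch of the packed variant, with hypothetical names: the table halves to 128 bytes, and the extra addressing is one shift and one mask per lookup. Which nibble holds the even entry is an arbitrary choice here.

```c
#include <stdint.h>

/* Two 4-bit roots per byte: entry x lives in byte x >> 1,
   low nibble for even x, high nibble for odd x. */
static uint8_t isqrt_nib[128];

void init_isqrt_nib(void) {
    int root = 0;
    for (int x = 0; x < 256; x++) {
        while ((root + 1) * (root + 1) <= x)
            root++;
        /* OR-ing the same bits again is harmless if init runs twice. */
        isqrt_nib[x >> 1] |= (uint8_t)(root << ((x & 1) * 4));
    }
}

static inline uint8_t isqrt8_packed(uint8_t x) {
    /* Load the shared byte, then shift the wanted nibble down and mask it. */
    return (uint8_t)((isqrt_nib[x >> 1] >> ((x & 1) * 4)) & 0x0F);
}
```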

Name: Anonymous 2019-08-01 17:54

Lookups are non-constant-time operations.

Name: Anonymous 2019-08-01 19:10

>>71
Actually memory/cache latency is fairly constant.

Name: Anonymous 2019-08-02 5:28

>>36
Do you still have the presquared values?

function sqraprx(a, b) // ~= sqrt(a^2 + b^2)
c = max(a,b) + 0.41 * min(a,b) / max(a,b)
return c;

Name: Anonymous 2019-08-02 5:33

Missed a multiply i think
c = max(a,b) + (0.41* min(a,b) / max(a,b)) * max(a,b)

Name: Anonymous 2019-08-02 5:35

lol, simplified
c = max(a,b) + 0.41* min(a,b)

Name: Anonymous 2019-08-02 5:40

sqraprx(3,4) = 5.23 (exact value: 5)

Name: Anonymous 2019-08-02 5:57

\(c = \max(a,b) + \frac{0.41\,\min(a,b)}{\max(a,b)} \cdot \max(a,b)\)

Name: Anonymous 2019-08-02 7:13

>>77
It's using a lazy estimate/precalc of sqrt(2) as 1 + 1 * 0.41 = 1.41.
An error of +23 on sqrt(300^2 + 400^2) = 500 doesn't seem too bad, about +4.6%.
About +7% for (100, 400), and about +7.8% for (200, 400).

The calculation should just about be competitive with the sum-of-squares precalculation.

Name: Anonymous 2019-08-02 13:30

>>73

I'm using it for gamma packing, not distance, and precision is somewhat important. I considered using 9-bit floating-point numbers, but they mapped badly onto gamma-encoded RGB values, losing even more precision.

I'm doing it all in software, so I can't really afford true 16-bit floats like GPUs use.
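If "gamma packing" here means storing the square root of a linear value, i.e. a gamma of 2.0 (an assumption; the post doesn't give the curve or the bit widths), then a 16-bit linear channel packs into 8 bits with one integer square root and unpacks with one multiply. A sketch with hypothetical names, using the digit-by-digit integer square root, which is exact and needs no divisions:

```c
#include <stdint.h>

/* Exact floor(sqrt(n)) by the digit-by-digit method:
   one compare-and-subtract per result bit, no divisions. */
static uint32_t isqrt32(uint32_t n) {
    uint32_t root = 0;
    uint32_t bit = 1u << 30;   /* largest power of four in 32 bits */
    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

/* Hypothetical gamma-2.0 packing: 16-bit linear -> 8-bit encoded.
   Truncates; dark values get more of the 256 codes than bright ones. */
uint8_t gamma_pack(uint16_t linear) {
    return (uint8_t)isqrt32(linear);
}

/* Inverse by squaring. Note gamma_unpack(255) == 65025, not 65535. */
uint16_t gamma_unpack(uint8_t packed) {
    return (uint16_t)(packed * packed);
}
```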

Name: Anonymous 2019-08-03 1:53

I was just introduced to YCoCg-R and I'm in love.
—FLIF user
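For reference, the usual YCoCg-R lifting steps, sketched in C with hypothetical names. Every step is an integer add or shift that the inverse undoes in reverse order, so the round trip is lossless; Co and Cg need 9 signed bits each for 8-bit RGB input.

```c
#include <stdint.h>

typedef struct { int y, co, cg; } ycocg;

/* Right-shifting a negative int is implementation-defined in C (arithmetic
   on common compilers); the round trip only needs both directions to use
   the same shift, which they do. */
ycocg rgb_to_ycocg_r(int r, int g, int b) {
    ycocg o;
    o.co = r - b;
    int t = b + (o.co >> 1);
    o.cg = g - t;
    o.y = t + (o.cg >> 1);
    return o;
}

/* Exact inverse: undo each lifting step in reverse order. */
void ycocg_r_to_rgb(ycocg in, int *r, int *g, int *b) {
    int t = in.y - (in.cg >> 1);
    *g = in.cg + t;
    *b = t - (in.co >> 1);
    *r = *b + in.co;
}
```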
