Ok. The formula for converting an RGB triplet into HSV form is widely published. But I've failed to find any hints on how one arrives at it, which I need in order to optimize it for my game's blitter routine. Therefore I tried to derive the h,s,v formula myself, using basic vector math.
v = (r+g+b)/3 //value
hs = v-r,v-g,v-b //hue*saturation vector
s = hs.length //saturation
h = angle(hs/s) //hue is the angle of the vector
I guess when one needs to just change the saturation, there is no reason to compute the angle or even the vector length in full, but the common requirement to change the hue requires nasty computation. Can anyone hint at an efficient way of shifting hue?
Note that the elements of the hs/s vector always sum to zero, so the sum of any two elements equals the negated third, and its length is always 1, so it can be safely reduced to a single angle value.
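One common way to shift hue without ever extracting the angle is to rotate the RGB vector around the gray axis (1,1,1). That leaves v = (r+g+b)/3 and the length of the hs vector untouched. A minimal sketch, assuming the cosine and sine of the shift are computed once outside the pixel loop; `hue_rotate` and its parameters are hypothetical names, not something from this thread:

```c
/* Rotate an RGB vector around the gray axis (1,1,1).
 * c = cos(angle), s = sin(angle), precomputed by the caller.
 * Hypothetical helper for illustration; results can leave the
 * 0..1 cube, so the caller still has to clamp. */
static void hue_rotate(float *r, float *g, float *b, float c, float s)
{
    const float k = 0.57735027f;             /* 1/sqrt(3) */
    float d = c + (1.0f - c) / 3.0f;         /* diagonal term */
    float p = (1.0f - c) / 3.0f + s * k;     /* "plus" off-diagonal term */
    float m = (1.0f - c) / 3.0f - s * k;     /* "minus" off-diagonal term */
    float nr = d * *r + m * *g + p * *b;
    float ng = p * *r + d * *g + m * *b;
    float nb = m * *r + p * *g + d * *b;
    *r = nr; *g = ng; *b = nb;
}
```

For a fixed shift the three coefficients d, p, m are constants, so the per-pixel cost is nine multiplies and six adds with no trigonometry. A 120 degree shift (c = -0.5, s = 0.8660254) sends pure red to pure green.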
Name:
Anonymous2019-07-19 18:31
Ok. Tried to complete the implementation of that h = angle(hs/s)
And found that I can't really represent hs/s as an angle, but I can still represent it as a single value by solving a quadratic equation. I also had to solve the issue with v, doing hs=(v-r,v-g,v-b)/v instead, to normalize the shit. Now it is better than HSV, because it handles saturation properly. But the problem is: it is not HSV or HSL, because it still works inside the RGB cube.
Still needs further research on how its hue part works, because I noticed that quadratic equation solely by accident.
Name:
Anonymous2019-07-19 18:40
Here is the full code, in case anyone can help with it.
qsolve B C =
| D = B*B - 4.0*C
| when D < 0.0: leave 0
| D = D.sqrt
| S0 = (D - B)/2.0
| S1 = (-D - B)/2.0
| S0,S1
dergb RGB =
| R = 0
| G = 0
| B = 0
| unrgb RGB R G B
| R = R.float/255.0
| G = G.float/255.0
| B = B.float/255.0
| V = (R+G+B)/3.0
| HS = [R-V G-V B-V]/V
| S = HS.abs
| H = HS/S
| Hue = H.0
| D = 0
| A = H.0
| Bs = qsolve A (A*A - 0.5)
| when Bs:
  | B0,B1 = Bs
  | B = H.1
  | D <= if (B-B0).abs < (B-B1).abs then 1 else 2
| Hue,S,V,D
enrgb HSV =
| Hue,S,V,D = HSV
| A = Hue
| B = 0
| if D then
  | Bs = qsolve A (A*A - 0.5)
  | less Bs: leave rgb{ 0 0}
  | B <= if D><1 then Bs.0 else Bs.1
  else
  | B <= -(A*0.5)
| C = -(A+B)
| H = [A B C]
| HS = H*S
| R,G,B = (HS*V + [V V V])*255.0
| rgb R.int G.int B.int
Name:
Anonymous2019-07-19 20:26
Warning: you're posting in a Nikita thread.
Name:
Anonymous2019-07-20 7:38
Ok. Dropped this shit. It solved the brightness-saturation clash problem, but I found that at higher brightness values it gets non-uniform. Beyond my current grasp in math to analyze further: https://www.youtube.com/watch?v=qMgVFyn7M14
>>8 In addition it required at least 16 bits to hold the hue. But this Lab also requires 16 bits, so I can't use a 32-bit array to hold values, or I have to sacrifice the alpha channel :(
Basically they call that crap the XYZ plane, due to the property of X+Y+Z = 1.0. I botched the calculation, because I have no experience with linear algebra and plane equations. But well, I've admitted that my math skills are near zero and I prefer doing hacks to solving the shit analytically.
Name:
Anonymous2019-07-24 10:40
With the usual units (degrees), r is 0, g is 120, b is 240; scaled to 0, 80, 160 they would fit in a byte.
You should be able to change both saturation and value without changing hue. It might do some funny things, like 0.0 sat, 1.0 val being #ffffff.
Name:
Anonymous2019-07-24 11:34
I think val might be max(r,g,b)/255, sat = max(r,g,b) - min(r,g,b), and then hue ~= (mid - min) / (max - min)
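The guess above can be written out in integers as a quick sketch. The `msv_t` struct, the function name and the 0..255 scaling are assumptions made for illustration:

```c
#include <stdint.h>

/* Hypothetical container: all three fields scaled to 0..255. */
typedef struct { uint8_t val, sat, hue; } msv_t;

static msv_t rgb_to_msv(uint8_t r, uint8_t g, uint8_t b)
{
    uint8_t max = r > g ? (r > b ? r : b) : (g > b ? g : b);
    uint8_t min = r < g ? (r < b ? r : b) : (g < b ? g : b);
    /* The middle channel is whatever is left after removing max and min. */
    uint8_t mid = (uint8_t)(r + g + b - max - min);
    msv_t o;
    o.val = max;
    o.sat = (uint8_t)(max - min);
    /* Gray has no hue; avoid dividing by zero. */
    o.hue = (max == min) ? 0 : (uint8_t)(((mid - min) * 255) / (max - min));
    return o;
}
```

Note that this hue only gives the position between a primary and its neighbouring secondary. Which of the six sectors the color sits in (i.e. which channel is max and which is min) has to be stored separately to get back to RGB.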
Name:
Anonymous2019-07-24 11:44
~hue = 0 is a solid r/g/b, or 0 degrees; ~hue = 1 is yellow/cyan/magenta, or +/- 60 degrees from it
Name:
Anonymous2019-07-24 12:57
>>16 That is the problem with the usual HSV: its saturation interferes with its value. Therefore HSV is considered a bad choice even for modern video game graphics.
Name:
Anonymous2019-07-24 17:01
( r + g + b ) / 3
Name:
Anonymous2019-07-24 21:54
(r * 0.299f + g * 0.587f + b * 0.114f)
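For comparison, here are both brightness estimates in integer form. This is a sketch only: the function names are made up, and the weights 77/150/29 are just 0.299/0.587/0.114 scaled by 256 and rounded so that they sum to 256.

```c
#include <stdint.h>

/* Plain average: treats all three channels as equally bright. */
static inline uint8_t luma_avg(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((r + g + b) / 3);
}

/* Weighted luma in 8.8 fixed point: 77 + 150 + 29 == 256. */
static inline uint8_t luma_weighted(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((77 * r + 150 * g + 29 * b) >> 8);
}
```

Pure blue (0,0,255) comes out as 85 with the average but only 28 with the weighted version, which is the difference the two posts above are arguing about.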
Name:
Anonymous2019-07-24 21:57
make blacker nigger
Name:
Anonymous2019-07-25 4:11
>>21 you could use this to keep its greyscale value constant while changing hues
Hue is more of a mock-angle, the rgb vector length will vary as the hue is changed
Name:
Anonymous2019-07-25 7:42
>>24 Well, I actually used cosine to interpolate, because I disliked these pronounced magenta, cyan and yellow lines. And used proper weighted gamma (instead of (r + g + b)/3) to avoid botching brightness.
Here is the final encoder routine, converting the usual RGB into my version of HSL. Had to use int64_t, because of overflow in RGB_METRIC:
#define RGB_METRIC(dist, item, rr, gg, bb) \
  do { \
    int64_t x = (int64_t)(item)->r - rr; \
    int64_t y = (int64_t)(item)->g - gg; \
    int64_t z = (int64_t)(item)->b - bb; \
    dist = x*x + y*y + z*z; \
  } while (0)
uint32_t rgb2hsl(uint32_t rgb) {
  uint32_t lm;
  uint32_t hs;
  uint32_t r = rgb>>16; //no div by 255, cuz luma() gives us L*255
  uint32_t g = (rgb>>8)&0xff;
  uint32_t b = rgb&0xff;
  uint32_t rgb16 = R5G6B5(r,g,b);
  uint32_t l = LUMA8(r,g,b);
  lut_item_t *item = rgb_to_hs_lut + rgb16*3;
  lut_item_t *best_item = item;
  uint64_t dist,best_dist;
  item += 1;
  if (!item->hs) return (l<<16) | best_item->hs;
  lm = inv8[l];
  r = (r*lm)<<6;
  g = (g*lm)<<6;
  b = (b*lm)<<6;
  RGB_METRIC(best_dist, item, r, g, b);
  item += 1;
  RGB_METRIC(dist, item, r, g, b);
  if (dist < best_dist) {
    best_dist = dist;
    best_item = item;
  }
  item += 1;
  if (!item->hs) return (l<<16) | best_item->hs;
  RGB_METRIC(dist, item, r, g, b);
  if (dist < best_dist) {
    best_dist = dist;
    best_item = item;
  }
  return (l<<16) | best_item->hs;
}
Name:
Anonymous2019-07-25 16:51
Why do you need to implement your own hueshift op? There are multiple good options available.
Name:
Anonymous2019-07-25 17:29
>>28 All options are some variation of usual HSV, or this academic Lab, which gives you useless a and b params.
>>25 I guess the trouble with using hue for gradients is you always get the colour cycling /rainbow type effect, as it separates out the rgb channel fades
It's probably good for storing a colour palette, beats trying to cycle through rgb values in code, and it'll even do a slight compression for groups of S/V/L
Although it's not the same as in RGB, because (Color1.h+Color2.h)/2 doesn't work with the spectral wheel. I.e. using this format makes mixing colors more expensive.
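To illustrate why the plain average fails on a wheel: hues 250 and 10 (on an assumed 0..255 byte wheel) are neighbours across the wrap point, yet (250+10)/2 gives 130, the opposite side. A sketch of the shortest-arc midpoint, with `hue_mid` being a made-up name:

```c
#include <stdint.h>

/* Midpoint of two hues on a 256-step wheel, taken along the shorter arc.
 * Hypothetical helper; the byte-sized wheel is an assumption. */
static uint8_t hue_mid(uint8_t h1, uint8_t h2)
{
    int d = (h2 - h1 + 256) & 255;   /* forward distance, 0..255 */
    if (d > 128) d -= 256;           /* going backwards is shorter */
    return (uint8_t)((h1 + d / 2 + 256) & 255);
}
```

That is one subtract, one compare and a couple of masks per mix, on top of whatever the saturation and lightness channels need, so mixing is indeed pricier than a per-channel RGB average.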
Name:
Anonymous2019-07-31 10:22
TLDR: I came up with a different color space, and now need a fast integer sqrt for gamma crunching.
>>36 everything depends on how much accuracy you want to sacrifice for speed, how fast you really need to be and do you have other requirements like memory use. iterative methods based on Newton's formula are a classic, and you can set max number of iterations to control the tradeoff. if this isn't enough, pre-computed lookup tables can help. here's a solution using those made by some stackoverflowgrammer: https://stackoverflow.com/a/1100591
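A sketch of the Newton approach with an iteration cap, as described above. The function name and the `max_iter` parameter are hypothetical; with enough iterations it returns floor(sqrt(n)), and with fewer it returns an overestimate:

```c
#include <stdint.h>

/* Integer square root by Newton's iteration, x <- (x + n/x) / 2.
 * The start value n/2 + 1 is never below the true root, so the
 * sequence only decreases; we stop when it no longer does. */
static uint32_t isqrt_newton(uint32_t n, int max_iter)
{
    if (n == 0) return 0;
    uint32_t x = (n >> 1) + 1;
    for (int i = 0; i < max_iter; i++) {
        uint32_t y = (x + n / x) / 2;
        if (y >= x) break;   /* converged to floor(sqrt(n)) */
        x = y;
    }
    return x;
}
```

Each step costs one division, which is the slow part. A better starting guess (for example from the bit length of n) cuts the number of steps needed, and that is the knob that trades accuracy for speed.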
Name:
Anonymous2019-07-31 12:07
>>38 I would have gone with just blending raw RGB, ignoring their gamma nature, but people say that is incorrect:
If it helps, you can think of sRGB as being an opaque compression format. You wouldn’t try to add two ZIP files together, and you wouldn’t try to multiply a CRC32 result by 2 and expect to get something useful, so don’t do it with sRGB! The fact that you can get something kinda reasonable out is a red herring, and will lead you down the path of pain and deep deep bugs. Before doing any maths, you have to “decompress” from sRGB to linear, do the maths, and then “recompress” back.
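The "decompress, do the maths, recompress" advice can be sketched with the cheap gamma 2.0 approximation that comes up later in this thread (square to decode, square root to encode). Real sRGB uses a different curve, so treat this as an illustration only; all names are made up:

```c
#include <stdint.h>

/* Bit-by-bit integer square root, returns floor(sqrt(n)). */
static uint32_t isqrt_u32(uint32_t n)
{
    uint32_t res = 0, bit = 1u << 30;
    while (bit > n) bit >>= 2;
    while (bit) {
        if (n >= res + bit) { n -= res + bit; res = (res >> 1) + bit; }
        else res >>= 1;
        bit >>= 2;
    }
    return res;
}

/* 50/50 mix done directly on the gamma-encoded bytes. */
static uint8_t blend_naive(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) / 2);
}

/* 50/50 mix done in (approximately) linear light: decode by
 * squaring, average, encode by square root. */
static uint8_t blend_linear(uint8_t a, uint8_t b)
{
    uint32_t la = (uint32_t)a * a;
    uint32_t lb = (uint32_t)b * b;
    return (uint8_t)isqrt_u32((la + lb) / 2);
}
```

Mixing black (0) with white (255) gives 127 the naive way and 180 the linear way. The naive result is the one that looks too dark on screen.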
Checked my sqrt against the log2 based sqrt, using clang's __builtin_clz (which should expand to a single assembly opcode), and the library's sqrtf, called using (int)sqrtf((float)i):
#define CLZ(x) __builtin_clz(x)
uint32_t clz_sqrt(uint32_t value) {
  if (!value) return 0;
  uint32_t xn = 1 << ((32 - CLZ(value))/2);
  xn = (xn + value/xn)/2;
  xn = (xn + value/xn)/2;
  xn = (xn + value/xn)/2;
  return xn;
}
got rather strange results:
$ gcc -O3 test.c -o test && ./test
isqrt16: 6.498955
sqrtf: 6.981861
log2_sqrt: 61.755873
Clang provided the CPU-based sqrtss, which is nearly as fast as mine. Lesson learned: on x86 the compiler can provide a fast enough sqrt, which is less than 10% slower than what you can come up with yourself while wasting a lot of time, and about 10 times faster than some ugly bitwise hacks. Still, sqrtss is a bit slower than the custom function, so if you really need these 5%, you can get them. Yet ARM, for example, has no sqrtss, so log2_sqrt shouldn't lag that badly there.
>>48 Benchmark it with current Spectre patches. Branch prediction is getting riskier and riskier.
https://en.wikipedia.org/wiki/Category:Speculative_execution_security_vulnerabilities
Name:
Anonymous2019-08-01 11:18
>>47 OP asked a question about ARM. you post answer about x86. it is you who needs to prove its relevance to the question you hamster-killing psychopathic bydlo
>>61
StackOverflower: how much is 2+2?
Bydlita: 3+3 is 6
SO: but I want to know how much is 2+2
B: it's 6
SO: I don't think you're right
B: prove it!
Name:
Anonymous2019-08-01 13:48
Hue shit
Name:
Anonymous2019-08-01 13:53
>>59 Lookup table: one(likely cached) memory load. sqrt: 3 divs with consequent dependence, 1 early branch.
Name:
Anonymous2019-08-01 14:06
Lookup tables win EVEN if they don't fit in the cache. IIRC most chess programs have precomputed "bitboard" tables, often several megabytes of different piece tables, to quickly solve intersection/bijection tests for attacks. Only the first accesses to such a table are penalized; then the L2/L3 cache begins to kick in and no algorithm can compete.
Name:
Anonymous2019-08-01 14:15
look up the repeating digits in my poast number
Name:
Anonymous2019-08-01 16:02
>>64 ARMs are not that cache dependent and the LUT is just 1000 bytes - enough to fit in a cache.
Name:
Anonymous2019-08-01 16:04
>>65 In most cases lookup tables are accessed locally. I.e. if you're processing an RGB color photo, then the RGB values will vary smoothly across the image.
Name:
Anonymous2019-08-01 16:06
>>64 Also, that early branch dependency isn't my code, but a copy from the upvoted answer on Stack Overflow.
Name:
Anonymous2019-08-01 16:52
>>67 256 bytes, if the sqrt is in range 0-15 (inputs 0-255). 128 bytes with more complex addressing (store 2 sqrts in one byte; 4 bits fit 0-15 exactly).
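A sketch of the 128-byte variant, two 4-bit roots per byte. The table and function names are invented for illustration, and the table is filled at startup rather than hardcoded:

```c
#include <stdint.h>

/* 256 square roots (each 0..15) packed two per byte: 128 bytes total. */
static uint8_t sqrt_nib[128];

static void sqrt_nib_init(void)
{
    for (unsigned i = 0; i < 256; i++) {
        unsigned s = 0;
        while ((s + 1) * (s + 1) <= i) s++;   /* floor(sqrt(i)) */
        if (i & 1) sqrt_nib[i >> 1] |= (uint8_t)(s << 4);  /* odd: high nibble */
        else       sqrt_nib[i >> 1]  = (uint8_t)s;         /* even: low nibble */
    }
}

static inline unsigned sqrt_nib_get(uint8_t v)
{
    /* Pick the byte, then shift by 0 or 4 to select the nibble. */
    return (sqrt_nib[v >> 1] >> ((v & 1) * 4)) & 15;
}
```

The lookup costs an extra shift and mask compared to the flat 256-byte table, so the packed form only pays off when the halved footprint actually matters.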
Name:
Anonymous2019-08-01 17:54
Lookups = non constant time operation.
Name:
Anonymous2019-08-01 19:10
>>71 Actually memory/cache latency is fairly constant.
>>77 It's using a lazy estimate/precalc of sqrt(2) in 1 + 1 * 0.41. An error value of 20 on ~sqrt(300^2 + 400^2) doesn't seem too bad: +4% error, 6% for (100, 400), and similar for (200, 400).
Calculation should just about be competitive with the sum of square precalculation
I'm using it for gamma packing, not distance. And precision is somewhat important. I considered using 9bit floating point numbers, but they mapped badly to gamma rgbs, producing more loss of precision.
I'm doing it all in software, so I can't really afford true 16-bit floats, like GPUs do.
Name:
Anonymous2019-08-03 1:53
I was just introduced to YCoCg-R, I'm in love. —FLIF user
r/4 + g/2 + b/4 is not a proper luma function. Proper luma is 0.299*r + 0.587*g + 0.114*b.
You can't even do a proper sprite recolor inside YCoCg. I.e. if you have a colorable font and want to draw it in blue, then in YCoCg your blue would be too dark and unreadable.
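For reference, a sketch of the YCoCg-R lifting transform being discussed, which is what makes it attractive for lossless codecs. Function names are made up, and the code assumes `>>` on a negative int is an arithmetic shift, which is what gcc and clang do:

```c
/* Forward YCoCg-R: integer lifting steps, exactly invertible.
 * Assumes >> on negative ints is an arithmetic shift. */
static void rgb_to_ycocg_r(int r, int g, int b, int *y, int *co, int *cg)
{
    *co = r - b;
    int t = b + (*co >> 1);
    *cg = g - t;
    *y = t + (*cg >> 1);
}

/* Inverse: undo the lifting steps in reverse order. */
static void ycocg_r_to_rgb(int y, int co, int cg, int *r, int *g, int *b)
{
    int t = y - (cg >> 1);
    *g = cg + t;
    *b = t - (co >> 1);
    *r = *b + co;
}
```

Pure blue (0,0,255) gets Y = 63 here, while the weighted luma formula gives roughly 29 for the same color, so Y in this space is not a perceptual brightness.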
Name:
Anonymous2019-08-03 15:38
>>72 Not fully consistent, however. Every single AES implementation that uses S-boxes that I know of has been broken. Would you like some CIA agent to be able to see what is on your screen based on a timing attack?
Name:
Anonymous2019-08-04 0:15
>>82 *Using sbox values specially chosen by the people trying to break in
Name:
Anonymous2019-08-05 15:35
>>82 Not with real-time kernel patches. With fine-grained multithreading it's impossible once you run a background task.
Name:
Anonymous2019-08-05 22:44
Instead of introducing the previously devised custom color space, I want to see how fast I can do a hue-saturation change in plain RGB. It seems not exactly fast. But would it be fast enough for my game? Here is the saturation multiplier function.
void saturate(int *sr, int *sg, int *sb, int f) {
  int r, g, b, l;
  r = *sr;
  g = *sg;
  b = *sb;
  r = r*r;
  g = g*g;
  b = b*b;
  l = LUMA8(r,g,b);
  l = l*(256-f);
  r = (r*f + l)>>8;
  g = (g*f + l)>>8;
  b = (b*f + l)>>8;
  if (r < 0) r = 0;
  if (g < 0) g = 0;
  if (b < 0) b = 0;
  r = isqrt16(r);
  g = isqrt16(g);
  b = isqrt16(b);
  *sr = clamp_byte255(r);
  *sg = clamp_byte255(g);
  *sb = clamp_byte255(b);
}
Yes. You see it right. A mere saturation boost/reduce requires 3 square roots and a lot of other operations. That is for each pixel. Ideally the gamma should be 2.2, but that would be even more expensive; square roots map better to a lookup table, and there is that old Quake hack you can use to compute them lightning fast. In addition, gamma=2.2 would require doing r=pow22lut[r] instead of the less expensive r*r, but a 256-entry LUT isn't that expensive. Disabling gamma correction leads to heavy artifacts, like a de-saturated sprite being too dark.
Still, in my format changing saturation would be as simple as moving the U,V coords towards the whitepoint:
NV = (V-WV)*Saturation + WV
NU = (U-WU)*Saturation + WU
I.e. far simpler code. So if one does a full-scene saturation change, then RGB is not an option. Hue shifting can be solved in part by recoloring, but even recoloring is expensive in the general case, because it requires computing a 256-byte LUT for every shade of the recolored color.
Generally one reduces saturation to make special effects more eye-popping. I.e. if on a bomb explosion you reduce the saturation of the surroundings, that explosion would look heavier.
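The two formulas above in fixed point, as a sketch. The `uv_t` struct, the function name and the 8.8 saturation factor (256 = unchanged) are assumptions; the thread does not show the actual U,V representation:

```c
#include <stdint.h>

/* Hypothetical chroma pair; units are whatever the color space uses. */
typedef struct { int32_t u, v; } uv_t;

/* Move (u,v) towards or away from the white point.
 * sat_q8 is an 8.8 fixed-point factor: 256 = unchanged, 0 = gray. */
static uv_t uv_saturate(uv_t c, uv_t white, int32_t sat_q8)
{
    uv_t o;
    /* Division instead of >> so negative offsets round towards zero. */
    o.u = ((c.u - white.u) * sat_q8) / 256 + white.u;
    o.v = ((c.v - white.v) * sat_q8) / 256 + white.v;
    return o;
}
```

That is two multiplies per pixel and no square roots, which is the whole argument for doing scene-wide saturation changes outside of RGB.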
TLDR: gamedev isn't easy.
Name:
Anonymous2019-08-06 3:26
>>85 Use the sqrt lookup tables instead of isqrt. Replace if (r < 0) r = 0; with r *= (r > 0) or something similar, which doesn't require a branch. For *sr = clamp_byte255(r); use a ternary: *sr = (r > 255 ? 255 : r)
Name:
Anonymous2019-08-06 7:22
>>86 cmov is faster than imul. Although I should probably use uint32_t instead of int; then just one r > 255 check would suffice.
Name:
Anonymous2019-08-06 8:28
Also, 32-bit ints are not enough for gamma-unpacked r,g,b, so one has to use floats or 64-bit ints.
Name:
Anonymous2019-08-06 9:39
Ok. For now I will use the following code:
static INLINE void saturate(int *sr, int *sg, int *sb, int f) {
  int r = unglut[*sr];
  int g = unglut[*sg];
  int b = unglut[*sb];
  int l = LUMA8(r,g,b)*(256-f);
  *sr = glut[clamp(0,MAX_GAMMA,(r*f + l)>>8)];
  *sg = glut[clamp(0,MAX_GAMMA,(g*f + l)>>8)];
  *sb = glut[clamp(0,MAX_GAMMA,(b*f + l)>>8)];
}
Before transitioning to proper color space.
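The post does not show how unglut and glut are built, so here is one possible sketch, assuming the gamma 2.0 approximation used earlier in the thread (square to decode, floor square root to encode) and MAX_GAMMA = 255*255. The real tables may well use a different curve or rounding:

```c
#include <stdint.h>

#define MAX_GAMMA (255 * 255)   /* assumed: largest decoded value */

static uint16_t unglut[256];            /* byte -> linear, i*i */
static uint8_t  glut[MAX_GAMMA + 1];    /* linear -> byte, floor(sqrt) */

static void gamma_luts_init(void)
{
    uint32_t lin = 0;
    for (uint32_t i = 0; i < 256; i++) {
        unglut[i] = (uint16_t)(i * i);
        /* Every linear value in [i*i, (i+1)*(i+1)) encodes back to i. */
        uint32_t next = (i + 1) * (i + 1);
        for (; lin < next && lin <= MAX_GAMMA; lin++)
            glut[lin] = (uint8_t)i;
    }
}
```

The encode table is about 64 KB, so it fits in L2 but not in a typical L1 data cache. Whether it beats an isqrt call in practice depends on how coherent the pixel values are.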
The problem is that I still have to support RGB color space for stuff like sprite sheet packing, because such transition from RGB to another color space is not one-to-one, and therefore lossy.
Nice article about gamma. Unfortunately I stumbled upon it only after learning the hard way that sRGB is non-linear. Still, that guy explains a few quirks, like how to notice an incorrect render and why font rasterizers use an unusual 1.42 gamma.