/prog/ - Software pushups

Name: Anonymous 2014-11-24 23:01

Let's exercise together, /prog/!

1) Write a subroutine that accepts an array of unsigned ints and returns 4. It must operate on the array in-place and partition it so that all nonzero values are at the beginning of the array, and all zero values are moved to the end. For example, the input [0, 2, 0, 0, 4, 1, 4, 5] could be changed to [2, 4, 1, 4, 5, 0, 0, 0]. The relative order of the nonzero values is unimportant.

Name: Anonymous 2014-11-26 19:27

>>38
>the needs of others
Shalom!

Name: Anonymous 2014-11-27 7:22

>>40
Because it isn't taught in school and appers are lazy niggers

Name: Cudder !MhMRSATORI 2014-11-27 15:13

>>37
Better to save some cache space, a miss will take more cycles than you save with uop fusion... and LOOP is basically as fast on a modern CPU when you're limited by memory bandwidth.

Name: Anonymous 2014-11-27 18:43

Eh, now that actually tested it I realized there's no [reg - reg] addressing mode (been awhile since I did x86), and the subtract obviously didn't do element wise, just number of bytes, gotta divide by 4.
Revised and working version:

; esi=array, ecx=len
partition:
	mov edi, esi
L1:	lodsd
	test eax, eax
	jz L2
	stosd
L2:	loop L1
	mov ecx, esi
	sub ecx, edi
	shr ecx, 2
	xor eax, eax
	rep stosd
	ret

Name: Anonymous 2014-11-27 20:17

>>44
>loop
Good goy!

Name: Anonymous 2014-11-27 22:37

>>43

there's no [reg - reg] addressing mode (been awhile since I did x86), and the subtract obviously didn't do element wise, just number of bytes, gotta divide by 4.

how kum u miss det 1 coddo? i thot u wer a kool dude what knoed da ass embly

Name: Anonymous 2014-11-28 7:19

      INTEGER FUNCTION ZP(A, N)
      INTEGER N, A(N)

      INTEGER DEST, TEMP, I
      DEST = 1
      DO 10 I = 1, N
         IF (A(I) .NE. 0) THEN
            TEMP = A(I)
            A(I) = A(DEST)
            A(DEST) = TEMP
            DEST = DEST + 1
         END IF
 10   CONTINUE

      ZP = 4
      RETURN
      END

Name: Anonymous 2014-11-28 8:52

Not a big C guy so critique is welcome.

int f(unsigned int *array, size_t length)
{
  int i = -1;
  unsigned int *end = &array[length/sizeof(unsigned int)-1];
	
  top:
    if(array+i >= end)
    {
      return 4;
    }
    if(*end == 0)
    {
      end--;
      goto top;
    }
    i++;
    if(array[i] == 0)
    {
      array[i] = *end;
      *end = 0;
    }
    goto top;
}

Name: Cudder !MhMRSATORI 2014-11-28 15:36

>>46
Saw dec/jnz, didn't read the rest.

Name: Anonymous 2014-11-28 16:44

>>49
Shalom!

Name: Anonymous 2014-11-29 1:44

>>48
That struck me as quite an interesting approach, particularly because I only thought of one way to do it (which has been done in this thread already), but then I kind of stopped there and didn't think that there might be other ways. Nice work.

Some criticiism relating to C:

One suggestion is to pass the number of elements in the array to length, instead of the size in bytes. Also using a for or while statement instead of label+goto is usually considered better practice.

Might also be a good idea to stick with either int or size_t, and avoid mixing them (they've different ranges).

One issue that pops out at me is array+i -- this is done when i stores -1, so array+i will result in an invalid pointer.

Name: Anonymous 2014-11-29 3:38

>>43
So either all real compiler writers are morons, or there's more to performance optimization than using archaic complex instructions to avoid hypothetical icache misses. I wonder which it is.

Name: Cudder !MhMRSATORI 2014-11-29 14:56

>>52
Yes they are. Ever seen any real compiler output?

archaic complex instructions

Idiot RISCtard. http://progrider.org/prog/read/1405230133

Name: Anonymous 2014-11-29 15:21

>>53
This guy disagrees with you.

Q: If you believe your other arguments, you can do the reference counting or local pooling, and point out when it's actually wrong.

Right. The philosophical answer to you guys is: compilers will eventually get smart enough to deal with these problems better than a programmer can. This has happened time and again. [For instance] compilers generate better assembly code [than programmers do].

http://steve-yegge.blogspot.ru/2008/05/dynamic-languages-strike-back.html

Name: Anonymous 2014-11-29 15:27

>>53
CISC fucktard is mad because his instructions take 420 cycles to complete

Name: Anonymous 2014-11-29 16:31

>>54
Compilers simply aren't able to generate the kind of fast code that a human expert can. The world would be better off if we relied on human experts to do all our code generation.

Name: Cudder !MhMRSATORI 2014-11-29 16:41

>>54

[For instance] compilers generate better assembly code [than programmers do].

No they bloody well don't. This stupid myth has to die. It's all just wishful thinking. Ditto for those ultra-high-level languages being more performant than C "because their compilers can do more optimisation" - keep dreaming. Maybe the hoards of incompetents that CS educations are shitting out these days can't program anymore, but that's a different issue... or perhaps it's intentional: if you make programmers stupid enough that almost all of them don't know Asm anymore, or if they do it's only from looking at the god-awful shit that compilers emit, then they certainly can't generate code better than a compiler because they've never seen anything better! A disgusting self-fulfilling prophecy.

As for the rest of that link... WTF. It's like that guy is seriously stoned.

>>55
Have you seen any real compiler output? And 420 cycles in 1 byte is better than 420 cycles and 1680 bytes.

Name: Anonymous 2014-11-29 16:50

Ditto for those ultra-high-level languages being more performant than C "because their compilers can do more optimisation" - keep dreaming.

TIL: Cudder is a crack addict.

Name: Anonymous 2014-11-29 16:50

I went to the University of Washington and [then] I got hired by this company called Geoworks, doing assembly-language programming, and I did it for five years. To us, the Geoworkers, we wrote a whole operating system, the libraries, drivers, apps, you know: a desktop operating system in assembly. 8086 assembly! It wasn't even good assembly! We had four registers! [Plus the] si [register] if you counted, you know, if you counted 386, right? It was horrible.

I mean, actually we kind of liked it. It was Object-Oriented Assembly. It's amazing what you can talk yourself into liking, which is the real irony of all this. And to us, C++ was the ultimate in Roman decadence. I mean, it was equivalent to going and vomiting so you could eat more. They had IF! We had jump CX zero! Right? They had "Objects". Well we did too, but I mean they had syntax for it, right? I mean it was all just such weeniness. And we knew that we could outperform any compiler out there because at the time, we could!

So what happened? Well, they went bankrupt. Why? Now I'm probably disagreeing – I know for a fact that I'm disagreeing with every Geoworker out there. I'm the only one that holds this belief. But it's because we wrote fifteen million lines of 8086 assembly language. We had really good tools, world class tools: trust me, you need 'em. But at some point, man...

The problem is, picture an ant walking across your garage floor, trying to make a straight line of it. It ain't gonna make a straight line. And you know this because you have perspective. You can see the ant walking around, going hee hee hee, look at him locally optimize for that rock, and now he's going off this way, right?

This is what we were, when we were writing this giant assembly-language system. Because what happened was, Microsoft eventually released a platform for mobile devices that was much faster than ours. OK? And I started going in with my debugger, going, what? What is up with this? This rendering is just really slow, it's like sluggish, you know. And I went in and found out that some title bar was getting rendered 140 times every time you refreshed the screen. It wasn't just the title bar. Everything was getting called multiple times.

Because we couldn't see how the system worked anymore!

Name: Anonymous 2014-11-29 20:10

>>53

RISCtard

Hardly. I checked again and the Intel optimization manual still advises against the use of loop.

Name: Anonymous 2014-11-29 20:44

x86 gets compiled into intel's proprietary RISC at the run-time.

Name: Anonymous 2014-11-29 21:40

>>61
That's not what RISC means, you fucking ivory tower shill.

Name: Anonymous 2014-11-29 22:04

>>62
I never said that this is what RISC means.

Name: Anonymous 2014-11-29 22:28

Draining an ocean through a straw and measuring efficiency in the amount of pressure the straw can withstand.

Name: Cudder !MhMRSATORI 2014-11-30 2:44

>>60
That's probably more because it could give AMD the advantage:

http://dis.4chan.org/read/prog/1359605422/52

Look at Agner's timing tables for the true story: http://www.agner.org/optimize/instruction_tables.pdf

K7 - 7uops, 3-4 clocks
P4 - 4+4 uops, 2-4 clocks
K8 - 7uops, 3-4 clocks
Prescott - 4uops, 4 clocks
K10 - 7uops, 3 clocks
Merom - 11uops, 5 clocks
Wolfdale - 11uops, 5 clocks
Nehalem - 6uops, 4 clocks
Sandy Bridge - 7uops, 5 clocks
Bulldozer - 1uop, 1-2 clocks
Ivy Bridge - 7uops, 4-5 clocks
Piledriver - 1uop, 1-2 clocks
Haswell - 7 uops, 5 clocks
Steamroller - 1uop, 1-2 clocks

The story is similar for the low-power microarchitectures:

Atom - 8uops, 8 clocks
Bobcat - 8uops, 4 clocks
Silvermont - 7uops, 10-20 clocks(!)
Jaguar - 8uops, 5 clocks

Name: Anonymous 2014-11-30 2:46

>>65
>AMD
Shalooom!

Name: Anonymous 2014-11-30 3:51

>>65
Working from Agner's tables, using loop is a losing decision unless you happen to be using one of the few AMD architectures you've highlighted. I don't understand why you are so adamant about using it in the general case.

Name: Anonymous 2014-11-30 4:31

>>65
BULLDOZER!

Name: Anonymous 2014-11-30 9:02

>>65
Real world example of a palette lookup blitting loop:

align 64
@@:
	mov eax, dword ptr [rsi]
	add rsi, r11
	movzx edx, ah
	movzx ebp, al
	shr eax, 16
	movzx ebx, ah
	movzx eax, al
	pinsrd xmm0, dword ptr[r10+rbp*4], 0
	pinsrd xmm0, dword ptr[r10+rdx*4], 1
	pinsrd xmm0, dword ptr[r10+rax*4], 2
	pinsrd xmm0, dword ptr[r10+rbx*4], 3
	movdqa xmmword ptr [rdi], xmm0
	add rdi, 16
	dec ecx
	jnz @b
	ret

Replacing dec+jnz with loop slows things down from ~900 fps to ~830 fps
This is on an Intel i7 920 (Nehalem)
The loop code is exactly 64 bytes long and fits an icache line

Name: Cudder !MhMRSATORI 2014-11-30 16:27

>>67,69
I didn't mention that the parallel execution of uops means that unless you're saturating the decode bandwidth by some other means, it doesn't matter - I have benchmarks that show absolutely NO difference between dec/jnz and loop. In the general case you should optimise for size, and only the areas that are speed-constrained may need some "expansion".

>>69
That's only an 8% difference, and this is that rare case where it does matter since you're saturating the decoders with all those big SSE instructions.

Have you tried doing 8 bytes at a time with an expanded lookup table (64k elements) using pinsrq?

Name: Anonymous 2014-12-03 21:13

Okay /prog/! Let's keep moving forward and all become satori programmers together! Yeah!

2) Write a program that simulates physical card shuffling by partitoning and merging the deck. You can do this in place or using temporary structures at your own discretion, and you can use as many helper functions as you like.

The input will always contain 52 cards. You can represent them as pairs of characters (strings, tuples, etc.) identifying rank and suite: "6H" is the six of hearts, "AS" is the ace of spades, "XC" is the ten of clubs, and "QD" is the queen of diamonds. You can generate your own input in sorted order or hard-code a particular deck.

Name: Anonymous 2014-12-05 5:50

>>5

that isn't a subroutine, but a function, and it doesn't return "4".

you're disqualified for being a mental midget.

Name: Anonymous 2014-12-05 7:37

>>72
You're damn fast, you know that?

Name: Anonymous 2014-12-05 7:42

>>72
hey kodak, just thought I'd let you know that I'm the guy who pees all over the floor at work. hope you enjoyed cleaning it

Name: Anonymous 2014-12-05 8:12

>>71
(<A 2 3 4 5 6 7 8 9 X J Q K> X~ <C H S D>).pick(*);

Name: Anonymous 2014-12-05 10:45

shuffleM $ liftA2 (,) "A23456789XJQK" "HDCS"

Name: Anonymous 2014-12-05 22:49

>>71
maybe try posting one that isn't a sorting problem

Name: Anonymous 2014-12-05 23:20

OK. Write a program that check's >>77-kun's dubs.

Name: Anonymous 2014-12-06 2:24

function subroutine(ints) {
    return ints.sort().reverse();
}

Name: Anonymous 2014-12-06 10:40

JACKSON 80 GET!

Software pushups

1 Name: Anonymous 2014-11-24 23:01

41 Name: Anonymous 2014-11-26 19:27

42 Name: Anonymous 2014-11-27 7:22

43 Name: Cudder !MhMRSATORI 2014-11-27 15:13

44 Name: Anonymous 2014-11-27 18:43

45 Name: Anonymous 2014-11-27 20:17

46 Name: Anonymous 2014-11-27 22:37

47 Name: Anonymous 2014-11-28 7:19

48 Name: Anonymous 2014-11-28 8:52

49 Name: Cudder !MhMRSATORI 2014-11-28 15:36

50 Name: Anonymous 2014-11-28 16:44

51 Name: Anonymous 2014-11-29 1:44

52 Name: Anonymous 2014-11-29 3:38

53 Name: Cudder !MhMRSATORI 2014-11-29 14:56

54 Name: Anonymous 2014-11-29 15:21

55 Name: Anonymous 2014-11-29 15:27

56 Name: Anonymous 2014-11-29 16:31

57 Name: Cudder !MhMRSATORI 2014-11-29 16:41

58 Name: Anonymous 2014-11-29 16:50

59 Name: Anonymous 2014-11-29 16:50

60 Name: Anonymous 2014-11-29 20:10

61 Name: Anonymous 2014-11-29 20:44

62 Name: Anonymous 2014-11-29 21:40

63 Name: Anonymous 2014-11-29 22:04

64 Name: Anonymous 2014-11-29 22:28

65 Name: Cudder !MhMRSATORI 2014-11-30 2:44

66 Name: Anonymous 2014-11-30 2:46

67 Name: Anonymous 2014-11-30 3:51

68 Name: Anonymous 2014-11-30 4:31

69 Name: Anonymous 2014-11-30 9:02

70 Name: Cudder !MhMRSATORI 2014-11-30 16:27

71 Name: Anonymous 2014-12-03 21:13

72 Name: Anonymous 2014-12-05 5:50

73 Name: Anonymous 2014-12-05 7:37

74 Name: Anonymous 2014-12-05 7:42

75 Name: Anonymous 2014-12-05 8:12

76 Name: Anonymous 2014-12-05 10:45

77 Name: Anonymous 2014-12-05 22:49

78 Name: Anonymous 2014-12-05 23:20

79 Name: Anonymous 2014-12-06 2:24

80 Name: Anonymous 2014-12-06 10:40

Name: Anonymous 2014-11-24 23:01

Name: Anonymous 2014-11-26 19:27

Name: Anonymous 2014-11-27 7:22

Name: Cudder !MhMRSATORI 2014-11-27 15:13

Name: Anonymous 2014-11-27 18:43

Name: Anonymous 2014-11-27 20:17

Name: Anonymous 2014-11-27 22:37

Name: Anonymous 2014-11-28 7:19

Name: Anonymous 2014-11-28 8:52

Name: Cudder !MhMRSATORI 2014-11-28 15:36

Name: Anonymous 2014-11-28 16:44

Name: Anonymous 2014-11-29 1:44

Name: Anonymous 2014-11-29 3:38

Name: Cudder !MhMRSATORI 2014-11-29 14:56

Name: Anonymous 2014-11-29 15:21

Name: Anonymous 2014-11-29 15:27

Name: Anonymous 2014-11-29 16:31

Name: Cudder !MhMRSATORI 2014-11-29 16:41

Name: Anonymous 2014-11-29 16:50

Name: Anonymous 2014-11-29 16:50

Name: Anonymous 2014-11-29 20:10

Name: Anonymous 2014-11-29 20:44

Name: Anonymous 2014-11-29 21:40

Name: Anonymous 2014-11-29 22:04

Name: Anonymous 2014-11-29 22:28

Name: Cudder !MhMRSATORI 2014-11-30 2:44

Name: Anonymous 2014-11-30 2:46

Name: Anonymous 2014-11-30 3:51

Name: Anonymous 2014-11-30 4:31

Name: Anonymous 2014-11-30 9:02

Name: Cudder !MhMRSATORI 2014-11-30 16:27

Name: Anonymous 2014-12-03 21:13

Name: Anonymous 2014-12-05 5:50

Name: Anonymous 2014-12-05 7:37

Name: Anonymous 2014-12-05 7:42

Name: Anonymous 2014-12-05 8:12

Name: Anonymous 2014-12-05 10:45

Name: Anonymous 2014-12-05 22:49

Name: Anonymous 2014-12-05 23:20

Name: Anonymous 2014-12-06 2:24

Name: Anonymous 2014-12-06 10:40