
Software pushups

Name: Anonymous 2014-11-24 23:01

Let's exercise together, /prog/!

1) Write a subroutine that accepts an array of unsigned ints and returns 4. It must operate on the array in-place and partition it so that all nonzero values are at the beginning of the array, and all zero values are moved to the end. For example, the input [0, 2, 0, 0, 4, 1, 4, 5] could be changed to [2, 4, 1, 4, 5, 0, 0, 0]. The relative order of the nonzero values is unimportant.
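A minimal C sketch of exercise 1, assuming a stable compaction is acceptable (the exercise does not require keeping the order, it just happens to reproduce the example above). The name partition_nonzero and the signature are made up for illustration, and the function returns the constant 4 only because the exercise literally says so.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name and signature, not from the thread.
 * Moves all nonzero values to the front of a[0..n), in place,
 * then pads the tail with zeros. */
unsigned partition_nonzero(unsigned *a, size_t n)
{
    assert(a != NULL || n == 0);

    size_t w = 0;                       /* next slot for a nonzero value */
    for (size_t r = 0; r < n; r++)
        if (a[r] != 0)
            a[w++] = a[r];              /* compact nonzero values forward */

    while (w < n)
        a[w++] = 0;                     /* everything after them is zero */

    return 4;                           /* as literally specified */
}
```

This is one pass over the array plus a zero fill, so O(n) time and O(1) extra space. Since order is unimportant, swapping each zero with the last unprocessed element would work just as well and saves the fill loop.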

Name: Anonymous 2014-11-30 9:02

>>65
Real world example of a palette lookup blitting loop:
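align 64
@@:
mov eax, dword ptr [rsi]                ; load 4 palette indices (one byte each)
add rsi, r11                            ; advance source by the step held in r11
movzx edx, ah                           ; index 1
movzx ebp, al                           ; index 0
shr eax, 16                             ; bring indices 2 and 3 into al/ah
movzx ebx, ah                           ; index 3
movzx eax, al                           ; index 2
pinsrd xmm0, dword ptr[r10+rbp*4], 0    ; r10 = palette base, 4 bytes per entry
pinsrd xmm0, dword ptr[r10+rdx*4], 1
pinsrd xmm0, dword ptr[r10+rax*4], 2
pinsrd xmm0, dword ptr[r10+rbx*4], 3
movdqa xmmword ptr [rdi], xmm0          ; store 4 pixels, rdi must be 16-byte aligned
add rdi, 16
dec ecx                                 ; ecx = remaining groups of 4 pixels
jnz @b
ret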

Replacing dec+jnz with the loop instruction slows things down from ~900 fps to ~830 fps.
This is on an Intel i7 920 (Nehalem).
The loop code is exactly 64 bytes long and fits in one icache line.
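
For anyone who doesn't read asm, a rough C sketch of what the loop above computes. Assumptions, none of which are stated in the post: the source step in r11 is taken to be 4 bytes, the palette has 256 32-bit entries, and the names blit_palette, dst, src, palette and groups are made up. It shows the lookup only and says nothing about the timings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C equivalent of the palette lookup loop.
 * src:     8-bit palette indices, 4 consumed per iteration (assumed step)
 * palette: 256 entries of 32-bit pixels (assumed size)
 * dst:     receives 4 pixels (16 bytes) per iteration
 * groups:  number of 4-pixel groups, the role ecx plays in the asm */
void blit_palette(uint32_t *dst, const uint8_t *src,
                  const uint32_t *palette, size_t groups)
{
    assert(groups == 0 || (dst && src && palette));

    for (size_t g = 0; g < groups; g++) {
        dst[0] = palette[src[0]];   /* lane 0 of xmm0 */
        dst[1] = palette[src[1]];   /* lane 1 */
        dst[2] = palette[src[2]];   /* lane 2 */
        dst[3] = palette[src[3]];   /* lane 3 */
        src += 4;
        dst += 4;
    }
}
```

One difference: the asm decrements ecx before testing it, so it always runs at least once, while this sketch does nothing when groups is 0.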
