>>19
How exactly were you able to get GCC output in Intel syntax? I was able to do it with the online tool at gcc.godbolt.org; however, even at maximum optimization it gives
add DWORD PTR [rdi], 1
rather than using the inc instruction. This seems to be the case with all x86 GCC versions. However, both Clang and ICC generate something along the lines of
inc DWORD PTR [rdi]
which is much closer to your optimized version.
In any case, I do agree it's silly to generate half a page of assembly and use six registers just to perform a dereference-and-increment.
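For anyone who wants to reproduce the comparison, here is a minimal C function. It assumes the code in question boils down to incrementing a 32-bit int through a pointer, and the name increment is hypothetical. Outside of Godbolt, GCC can print Intel syntax directly with the -masm=intel option, e.g. gcc -O3 -S -masm=intel inc.c. Whether you get add or inc may depend on the compiler version and the -mtune setting.

```c
/* Hypothetical example: increment a 32-bit int through a pointer.
   On x86-64 System V the first pointer argument arrives in rdi, so an
   optimizing compiler can reduce the whole body to one memory-operand
   instruction such as "add DWORD PTR [rdi], 1" or "inc DWORD PTR [rdi]". */
void increment(int *p)
{
    ++*p;  /* dereference, then increment the value in place */
}
```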