>>39It's the same one I've been repeating for over 20 years (albeit in slightly modified form) - back then it was mostly prediction, but I knew that raising the clock frequency would have its limits.
>>40Backwards-endian, minimum length of 2 bytes per instruction, can't use GPRs for memory addressing, and an encoding that is one of the most random ones I've seen (e.g. compare
http://goldencrystal.free.fr/M68kOpcodes.pdf ,
http://i.stack.imgur.com/07zKL.png , and
http://www.z80.info/decoding.htm )... it's CISC done wrong, like the VAX.
>>41The amount of software you can write without ever having to divide is
tiny - probably an application better served by an 8-bit or even 4-bit MCU, and when you do have to divide, it's far better for the hardware to have a divide instruction than have to do it manually with individual instructions, because at best it'll be the same speed as a hardware divide instruction expanding to the same series of uops but using more memory, and at worst it'll be both slower and bigger. The funny thing is that although you seem to think embedded aplications don't need divide instructions, the only ARM cores that are guaranteed to have divide instructions are the embedded ones - ARMv7-{M,R}, and ARMv7-R only in Thumb mode.
>>42modern caches are huge.
That's the sort of idiotic thinking that lead to the wasteful state of software today, and a similar attitude ("the Earth is huge, we can dump whatever into the oceans and atmosphere and it'll be OK") is why the environment is getting fucked up. If we don't realise that all resources are intrinsically limited and use them responsibly, everyone suffers because we're all sharing it. See also: programs that think they can use all the RAM and HDD they want, or that network bandwidth is free, etc.
simple instructions have orders of magnitude faster execution
When you need orders of magnitude more of them, no. Also see what I wrote above about recognising and combining simple instructions to be executed faster by dedicated hardware.
>>45,46That doesn't solve the problem of every single instruction being the same length, which means that the simpler and more common operations aren't any shorter.
This is what an ARM memcpy looks like:
https://android.googlesource.com/platform/bionic/+/5b349fc/libc/arch-arm/bionic/memcpy.Shttps://android.googlesource.com/platform/bionic/+/5b349fc/libc/arch-arm/bionic/memcpy.a15.SDo we really need all that bloat in the cache? A disgustingly large amount of that code is just for handling different alignment cases, something that the hardware could easily figure out by itself (and ARM has thankfully realised this, later versions do handle alignment in hardware.) It's still no match for REP MOVS, however.
RISC: stupid processors by stupid hardware designers, for stupid compilers, stupid programmers, and stupid users.