/prog/ - /prague/ Q&A — Tinychan BBS

Name: Anonymous 2013-11-20 13:45

>>17
Pretty bad, yeah, at least if you have any kind of performance concern.

The problem is that they're wasteful. Let's say you have a hundred elements and you want to iterate over all of them. The processor tries to access the first element, loads it into the cache, and does a thing, maybe some quick arithmetic. We'll just agree not to worry about the time cost up to this point, because it's the same for both an array and a list.

So the CPU finishes doing the thing, and now it needs the second element. With an array, the second element is already in the cache, but with a list, it's somewhere else in the heap, and chances are low that it happens to be in the same cache line (which is probably only 64 bytes long). So right away, with the list implementation, you have a cache miss and you need to fetch from main memory. Meanwhile the array version just keeps plugging away on the subsequent elements. In the time it takes for the list program to retrieve element 2, the array program could be ready to look for element 4 or 8 or 16.

This effect is compounded over the length of the collection. In the worst case scenario, the list-based code needs to go all the way to main memory for every element, whereas the elements of an array are guaranteed to be lined up one after another. You probably won't see it get this bad in real code unless your collections are extraordinarily long, like for example in scientific computing applications. Usually you have bigger fish to fry, like disk or network access. And in the event that your elements are large data structures, you can't really do anything without blowing out multiple cache lines, so it no longer matters how you've organized them. At any rate, linked lists are a code smell for a cache-aware programmer, especially a list of boxed elements, which is almost always the case in dynamically typed languages.

Memory consumption is a little more obvious. A 64-bit pointer is 8 bytes. If your elements are one byte each (like a UTF-8 character sequence) and your processor expects everything to be word-aligned, each list element consumes 16 bytes plus whatever the allocator uses for bookkeeping. A hundred-element char list eats up 1600 bytes. A hundred-element char array uses 101 bytes (if you're using end markers instead of a length counter). Try writing a text editor with a linked list as your main data structure, and see what happens when it reads its own source code.

Now, it's possible to expose linked lists as an abstraction for the sake of pattern matching, and then when you actually compile the program, you instead generate tight inner loops and array manipulation code. That's how it ought to work. Haskell, to its credit, does this some of the time. However, I get the feeling the actual optimizations involved aren't low-level to make it totally pervasive, which is a weakness that needs to be addressed.

I'm not aware of any Lisp capable of such optimizations.

/prague/ Q&A

1 Name: Anonymous 2013-11-19 12:18

19 Name: Anonymous 2013-11-20 13:45

Name: Anonymous 2013-11-19 12:18

Name: Anonymous 2013-11-20 13:45