Well, >>1 just copies the behavior of PHP explode() <http://php.net/manual/en/function.explode.php>, although it works on chunks of bytes, so you can explode, for example, an array of integers.
Now, >>2 simply searches for a string in a file and reports back as soon as it is found. The algorithm is O(nm), with a cellular matrix implemented in place. This is written with memory restrictions in mind.
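The >>2 post only describes its search loosely; a minimal sketch of an O(nm) early-exit byte search with constant extra memory (the name `find_bytes` and the exact loop shape are illustrative assumptions, not the original code) might look like:

```c
/* Hedged sketch in the spirit of >>2: naive O(n*m) search over raw
 * bytes, reporting as soon as the needle is found. Constant extra
 * memory. find_bytes is an illustrative name, not from the post. */
#include <stddef.h>

long find_bytes(const unsigned char *hay, size_t n,
                const unsigned char *needle, size_t m)
{
    if (m == 0)
        return 0;                       /* empty needle matches at 0 */
    if (m > n)
        return -1;
    for (size_t i = 0; i + m <= n; i++) {   /* O(n) start positions */
        size_t j = 0;
        while (j < m && hay[i + j] == needle[j])  /* O(m) compare */
            j++;
        if (j == m)
            return (long)i;             /* report first hit immediately */
    }
    return -1;
}
```

Because it works on byte lengths rather than NUL terminators, the same loop searches arbitrary binary data.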
The last code, >>3 is a unique sort of a character array. It modifies the original array and uses a bit array (again, memory restrictions were assumed).
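A unique sort over chars with a bit array, as >>3 describes, can be sketched like this (the function name and exact layout are assumptions; the post's actual code is not shown):

```c
/* Hedged sketch of the >>3 idea: sort a char array and drop
 * duplicates in place, using a 256-bit presence bitmap (32 bytes)
 * instead of a comparison sort. uniq_sort is an illustrative name. */
#include <stddef.h>

size_t uniq_sort(unsigned char *a, size_t n)
{
    unsigned char seen[32] = {0};        /* one bit per byte value */
    for (size_t i = 0; i < n; i++)
        seen[a[i] >> 3] |= (unsigned char)(1u << (a[i] & 7));
    size_t out = 0;
    for (unsigned v = 0; v < 256; v++)   /* emit set bits in order */
        if (seen[v >> 3] & (1u << (v & 7)))
            a[out++] = (unsigned char)v;
    return out;                          /* new (deduplicated) length */
}
```

It modifies the original array, runs in O(n + 256), and the only auxiliary storage is the 32-byte bitmap, which fits the stated memory restrictions.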
>>13 [i][b][o]IGNORE CUDDER POSTERS DO NOT REPLY TO CUDDER POSTS HIDE CUDDER THREADS[/o][/b][/i]
Name:
Anonymous2015-07-29 19:51
>>13 for(i = 0;; i++) { — it's quadratic in the number of fragments. i is the number of rv blocks, and since the loop tries to grow the array of rv pointers by only 1 each time, realloc() potentially has to move and copy the memory if it can't expand the current block in place.
Even if you fix it to use a geometric progression, it's still way too many allocations and deallocations. You should run through p once and record all the offsets where it should be split into an index array, then allocate the entire rv once and copy the fragments there using the index array. Two passes through p, but I reckon it's way better than allocating and reallocing all the time.
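The two-pass scheme proposed here can be sketched roughly as follows. For brevity this variant splits on a single delimiter byte and points into p rather than copying fragments out; the name `explode2` and those simplifications are assumptions, not the poster's code:

```c
/* Hedged sketch of the two-pass split: pass 1 counts fragments so
 * the pointer array is allocated exactly once; pass 2 records them.
 * Splits in place (delimiters become '\0'). explode2 is illustrative. */
#include <stdlib.h>

/* Returns a NULL-terminated array of pointers to p's fragments,
 * or NULL on malloc failure. p is modified. */
char **explode2(char *p, char delim)
{
    size_t count = 1;                    /* pass 1: count fragments */
    for (char *s = p; *s; s++)
        if (*s == delim)
            count++;

    char **rv = malloc((count + 1) * sizeof *rv);  /* one allocation */
    if (!rv)
        return NULL;

    size_t i = 0;                        /* pass 2: record fragments */
    rv[i++] = p;
    for (char *s = p; *s; s++)
        if (*s == delim) {
            *s = '\0';
            rv[i++] = s + 1;
        }
    rv[i] = NULL;
    return rv;
}
```

Two linear passes, one malloc, no reallocs at all.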
Name:
Anonymous2015-07-29 20:12
>>15 You're right. So the question is, which is better: two passes through p, or n reallocations? It really depends on the problem, doesn't it? Big strings with mostly one occurrence come to mind. I'm not trying to say my code is the best, but that it has its place.
Most of the time, people who criticise code in the wild for how it could be done better are missing this.
False dichotomy. You can do it in log(n) reallocations by doubling the size of rv each time, with a final realloc to trim off the excess allocated space (which typically won't relocate it in memory, though the standard doesn't guarantee that). In fact, I think this is the way most resizeable containers are implemented.
It never allocates more than twice what it needs, which I think is a reasonable overhead when you take into account the difference in the number of reallocations required.
Notice log(n) and n are equivalent for small n.
If you start by allocating ~16 entries or so, small n will never need to realloc to a larger size. Of course, this contradicts what I just said above (for small n), but I don't think a very-short-lived block of 16 pointers is going to run anyone out of memory. In my opinion this is the optimal point in the speed/memory tradeoff.
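The strategy described above — start at ~16 slots, double on overflow, trim once at the end — can be sketched as a small growable pointer array. The `vec` type and function names are illustrative assumptions:

```c
/* Hedged sketch of geometric growth with a final shrink-to-fit:
 * log(n) reallocations while growing, never more than 2x overhead,
 * one trimming realloc at the end. Names are illustrative. */
#include <stdlib.h>

typedef struct { void **items; size_t len, cap; } vec;

int vec_push(vec *v, void *x)
{
    if (v->len == v->cap) {
        size_t ncap = v->cap ? v->cap * 2 : 16;   /* 16, 32, 64, ... */
        void **p = realloc(v->items, ncap * sizeof *p);
        if (!p)
            return -1;                  /* old block still valid */
        v->items = p;
        v->cap = ncap;
    }
    v->items[v->len++] = x;
    return 0;
}

/* Trim excess capacity; shrinking typically doesn't relocate,
 * though the C standard doesn't promise that. */
void vec_trim(vec *v)
{
    if (v->len && v->len < v->cap) {
        void **p = realloc(v->items, v->len * sizeof *p);
        if (p) { v->items = p; v->cap = v->len; }
    }
}
```

Pushing 100 elements touches realloc only four times (16, 32, 64, 128) plus the trim, versus 100 calls for grow-by-one.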
Name:
Anonymous2015-07-29 22:10
>>20 You can also take into account the fact that realloc is probably doing that itself. In implementations where realloc doesn't do this by itself, you are left wondering whether your code should (why would you know better than the libc implementers?).
10KB of lorem ipsum tokenized by spaces:
real 0m0.020s
user 0m0.007s
sys 0m0.006s
5KB of lorem ipsum tokenized by spaces using OP's program:
real 0m0.630s
user 0m0.052s
sys 0m0.009s
Name:
Cudder !cXCudderUE2015-07-30 14:37
>>16
; in:
;   esi = source
;   edi = delim
;   edx = source length
;   ecx = delim length
; out:
;   eax = ptr to splits, {ptr,len}, null-terminated, 0 on err
; all registers change, except esp and ebx
explode:
        mov ebp, esp
        jmp begin_loop
exploop:
        push esi
        push ecx
        push edi
        repnz cmpsb
        pop edi
        pop ecx
        jz match
        pop esi
        inc esi
        dec edx
        jmp loop_cmp
match:
        pop eax
        sub eax, [esp]
        push eax
        sub edx, ecx
begin_loop:
        push esi
loop_cmp:
        cmp ecx, edx
        jbe exploop
expdone:
        push edx
        push 0
        sub ebp, esp
        push ebp
        shr ebp, 2
        call malloc
        test eax, eax
        pop ecx
        jz malerr
        mov ecx, ebp    ; not needed if malloc does not modify arg
retloop:
        pop [eax+ecx*4-4]
        loop retloop
malerr:
        ret
One pass through p, and one "reallocation" (try doing that in C!)
>>25 But edx holds the source length. That's edx = strlen(str);, which is a hidden second pass.
Name:
Cudder !cXCudderUE2015-07-31 1:46
>>28 You're assuming the strings are null-terminated and not explicitly length-delimited. Like the OP's code, >>25 works for arbitrary data and not only null-terminated strings.
Name:
Anonymous2015-07-31 4:34
assume my anus
Name:
Anonymous2015-07-31 5:15
>>25 Less code, quite easy to understand, and very efficient. Why do we use C again?
Name:
Anonymous2015-07-31 5:55
>>32 It's normally a waste of programmer time to require the mental overhead of writing the whole system this way. It's normally more productive to have the programmers complete the system in a higher-level paradigm and then focus on the provably time-critical parts, i.e. the parts whose run times we've actually profiled. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
Name:
Cudder !cXCudderUE2015-07-31 10:46
>>33 That mentality creates systems which are slow and resource-consuming everywhere. The 3% gets spread out over the whole thing and it's hard to see what needs to be optimised because everything is as slow as everything else and nothing stands out anymore.
That mentality creates systems which are slow and resource-consuming everywhere.
I don't think this problem is significant for the commodity high-performance computing of modern personal computers. If something is irritatingly slow, it's often not too difficult to refactor the experience so it feels less irritating. There are certainly areas of computing that require tight resource accounting, such as embedded programming or battery-powered computing, but the vast majority of computing doesn't mandate such strict accountability, since most of it is done on high-performance machines.
Name:
Anonymous2015-07-31 19:00
>>37 Actually, most computing is done on low-resource embedded devices:
If you round off the fractions, embedded systems consume 100% of the worldwide production of microprocessors.