
Benchmark Contest: Count character ratio in DNA file

Name: Anonymous 2015-04-12 12:19

RULES:
The program must compute the GC-content of the source file (without prior knowledge of the file beyond its format).
GC-content is the fraction of G and C chars among all A, T, G and C chars found in the file, expressed as a floating-point number. http://en.wikipedia.org/wiki/GC-content
Source file: https://github.com/dubst3pp4/GC-Content-OOC/blob/master/Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa
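A minimal sketch of the definition in the rules, for reference. gc_fraction is a made-up helper name, not something from the contest; it assumes the sequence is already in memory and that only upper-case A, C, G, T count (N, newlines and anything else are skipped).

```c
#include <stddef.h>

/* Hypothetical helper: fraction of G and C among all A, C, G, T bytes.
   Returns 0.0 when the buffer holds no A/C/G/T at all. */
double gc_fraction(const char *seq, size_t len)
{
    size_t gc = 0, total = 0;
    for (size_t i = 0; i < len; i++) {
        switch (seq[i]) {
        case 'G': case 'C': gc++; total++; break;
        case 'A': case 'T': total++; break;
        default: break; /* N, newlines, lower case: ignored */
        }
    }
    return total ? (double)gc / (double)total : 0.0;
}
```

Multiply the result by 100 to get a percentage like the figures quoted in this thread.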

Name: Anonymous 2015-04-13 6:32

"fastest" version from
http://saml.rilspace.org/moar-languagez-gc-content-in-python-d-fpc-c-and-c is
https://gist.github.com/samuell/5559717
37.6217301394 in 922,484,508 cycles (20 times slower than 46 million)
According to Cudder (>>4) it's "2 clocks/byte"; mine is 20 times faster, so 0.1 clocks/byte or 10 bytes per clock (and no SSE intrinsics used).
(In reality it's 0.76 clocks/byte.)
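The clocks/byte figures are just the cycle count divided by the number of bytes processed. A small sketch of that arithmetic; clocks_per_byte is a hypothetical helper, and the byte count has to be measured from the actual file since the size isn't stated here.

```c
/* Hypothetical helper: cycles spent per input byte.
   'cycles' would be the rdtsc difference, 'bytes' the file size in bytes. */
double clocks_per_byte(unsigned long long cycles, unsigned long long bytes)
{
    return bytes ? (double)cycles / (double)bytes : 0.0;
}
```

Bytes per clock is the reciprocal, so 0.1 clocks/byte is the same thing as 10 bytes per clock.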
Here is the source of the "2 clocks per byte" version (with added rdtsc):
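#define __USE_MINGW_ANSI_STDIO 1
#include <stdio.h>
#include <x86intrin.h> /* __rdtsc() on GCC/MinGW; MSVC uses <intrin.h> */

int main(void)
{
    char buf[1000];
    int gc = 0;
    int total = 0;
    /* Lookup tables indexed by byte value: 1 if the byte counts, else 0. */
    char tablegc[256] = {0};
    char tabletotal[256] = {0};
    FILE *f = fopen("Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    unsigned long long st = __rdtsc();
    tabletotal['A'] = 1;
    tabletotal['T'] = 1;
    tabletotal['C'] = 1;
    tabletotal['G'] = 1;
    tablegc['C'] = 1;
    tablegc['G'] = 1;
    while (fgets(buf, sizeof buf, f))
        if (*buf != '>') { /* skip FASTA header lines */
            unsigned char c; /* unsigned, so the table index is never negative */
            char *ptr = buf;
            while ((c = (unsigned char)*ptr++)) {
                total += tabletotal[c];
                gc += tablegc[c];
            }
        }

    unsigned long long et = __rdtsc();
    fclose(f);
    printf("%.10f in %llu cycles\n", (100. * gc) / total, et - st);
    return 0;
}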
