/prog/ - Why browsers are bloated [Part 2]

Name: Cudder !cXCudderUE 2016-06-13 1:34

Found another HTML parser/tokeniser, and it's no better than any of the existing ones I've seen:

https://github.com/google/gumbo-parser/blob/master/src/tokenizer.c

Nearly 3k lines of disgustingly verbose, redundant C. Look at the functions handle_doctype_system_id_double_quoted_state and handle_doctype_system_id_single_quoted_state, or handle_attr_value_double_quoted_state and handle_attr_value_single_quoted_state, for good examples of this utter idiocy. I refuse to believe that a programmer with a functioning brain could generate such filth. Did the thought "this state looks almost exactly like that state except for this one thing, I should probably merge them" ever cross that retard's mind? Probably not, because there was no mind for the thought to cross!
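A minimal sketch of what that merge could look like. This is not gumbo's code; every name here (State, Tokenizer, step, parse_value) is hypothetical. The single ST_VALUE_QUOTED state remembers which quote character opened the value, so single- and double-quoted attribute values share one code path. It deliberately leaves out character references, NUL replacement and unquoted values, all of which a real HTML5 tokenizer has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical states: one quoted-value state instead of two near-copies. */
typedef enum { ST_BEFORE_VALUE, ST_VALUE_QUOTED, ST_DONE, ST_ERROR } State;

typedef struct {
    State state;
    char quote;      /* '"' or '\'' while in ST_VALUE_QUOTED */
    char value[64];  /* accumulated value; silently truncated past 63 chars */
    size_t len;
} Tokenizer;

/* Feed one character to the tokenizer and return the resulting state. */
static State step(Tokenizer *t, char c) {
    switch (t->state) {
    case ST_BEFORE_VALUE:
        if (c == '"' || c == '\'') {
            t->quote = c;               /* remember which quote opened it */
            t->state = ST_VALUE_QUOTED;
        } else if (c != ' ' && c != '\t' && c != '\n') {
            t->state = ST_ERROR;        /* unquoted values not covered here */
        }
        break;
    case ST_VALUE_QUOTED:
        if (c == t->quote) {
            t->state = ST_DONE;         /* same path for both quote kinds */
        } else if (t->len + 1 < sizeof t->value) {
            t->value[t->len++] = c;
            t->value[t->len] = '\0';
        }
        break;
    default:
        break;
    }
    return t->state;
}

/* Tokenize one attribute value from s into out. Returns 0 on success,
   -1 if the value is unquoted or unterminated. */
static int parse_value(const char *s, char *out, size_t outsz) {
    assert(out != NULL && outsz > 0);
    Tokenizer t = { ST_BEFORE_VALUE, 0, {0}, 0 };
    for (; *s && t.state != ST_DONE && t.state != ST_ERROR; s++)
        step(&t, *s);
    if (t.state != ST_DONE)
        return -1;
    size_t n = strlen(t.value);
    if (n >= outsz)
        n = outsz - 1;
    memcpy(out, t.value, n);
    out[n] = '\0';
    return 0;
}
```

Calling parse_value("'say \"hi\"'", buf, sizeof buf) yields say "hi", and parse_value("\"it's\"", buf, sizeof buf) yields it's: the opening quote alone decides which character terminates the value.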

The most valuable part of that file is contained within these 4 lines:

    // Initial size chosen by statistical analysis of a corpus of 60k webpages.
    // 99.5% of elements have 0 attributes, 93% of the remainder have 1.  These
    // numbers are a bit higher for more modern websites (eg. ~45% = 0, ~40% = 1
    // for the HTML5 Spec), but still have basically 99% of nodes with <= 2 attrs.
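One way to exploit that statistic more aggressively than just picking an initial vector size is small-buffer storage: keep room for 2 attributes inside the element itself and only hit the heap when a third one shows up. The sketch below is an illustration of that swapped-in technique, not what gumbo does; AttrList, attrs_push and the other names are hypothetical, and the attribute strings are borrowed pointers, not copies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute record; a real parser tracks more (namespace, position). */
typedef struct {
    const char *name;
    const char *value;
} Attr;

/* Inline capacity taken from the quoted figure: ~99% of nodes have <= 2 attrs. */
#define INLINE_ATTRS 2

typedef struct {
    size_t count;
    size_t cap;                     /* INLINE_ATTRS until the list spills */
    Attr *items;                    /* points at inline_buf or a heap block */
    Attr inline_buf[INLINE_ATTRS];
} AttrList;

static void attrs_init(AttrList *l) {
    l->count = 0;
    l->cap = INLINE_ATTRS;
    l->items = l->inline_buf;
}

static int attrs_on_heap(const AttrList *l) {
    return l->items != l->inline_buf;
}

/* Append an attribute. Returns 0 on success, -1 on allocation failure. */
static int attrs_push(AttrList *l, const char *name, const char *value) {
    assert(l->count <= l->cap);
    if (l->count == l->cap) {
        size_t ncap = l->cap * 2;
        Attr *p;
        if (!attrs_on_heap(l)) {
            /* First spill: move the inline entries to the heap. */
            p = malloc(ncap * sizeof *p);
            if (!p)
                return -1;
            memcpy(p, l->inline_buf, l->count * sizeof *p);
        } else {
            p = realloc(l->items, ncap * sizeof *p);
            if (!p)
                return -1;
        }
        l->items = p;
        l->cap = ncap;
    }
    l->items[l->count].name = name;
    l->items[l->count].value = value;
    l->count++;
    return 0;
}

/* Release any heap block and reset to the empty inline state. */
static void attrs_free(AttrList *l) {
    if (attrs_on_heap(l))
        free(l->items);
    attrs_init(l);
}
```

Because items may point into the struct itself, an AttrList must not be copied by value after attrs_init; elements would hold it in place and pass it around by pointer.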

Ironically, the description here...

https://github.com/google/gumbo-parser

... says it's "relatively lightweight". No, it's not - far from it. You just haven't seen what real "lightweight" is.

1 Name: Anonymous 2016-04-23 22:49