
Why browsers are bloated

Name: Anonymous 2014-07-27 0:20

https://github.com/WebKit/webkit/blob/master/Source/WebCore/platform/Scrollbar.cpp
https://github.com/WebKit/webkit/blob/master/Source/WebCore/platform/win/ScrollbarThemeWin.cpp
Let's reinvent the fucking scrollbar, which every goddamn platform with a UI already has, and make it behave subtly different from the native one!

Right-click a native scrollbar in some other app:
- Scroll Here
- Top
- Bottom
- Page Up
- Page Down
- Scroll Up
- Scroll Down

Right-click a scrollbar in Chrome:
- Back
- Forward
- Reload
- Save As...
...

Right-click a scrollbar in Firefox and Opera:
Absolutely fucking nothing happens!

What the fuck!? How did these terminally retarded idiots get involved in creating one of the most important pieces of software to the average user?

Name: Anonymous 2016-03-09 4:25

>>840
The application is unordered. It's like a partial derivative in that sense.

Name: Anonymous 2016-03-10 4:24

Can someone tell me what this thread is about? I've been seeing it on the front page since it was created but it always seemed boring so I never read it, and now it's so long that I can't be bothered reading the whole thing. Just a quick summary with bullets for key points and pivotal moments will do. Thanks.

Name: Anonymous 2016-03-10 5:28

>>842
Cudder is pregnant.

Name: Anonymous 2016-03-10 22:43

>>842
*Cudder is all talk and no action*

Name: Anonymous 2016-03-11 4:05

>>842
Cudder is making a browser but hasn't demonstrated anything yet.

Name: Anonymous 2016-03-11 5:30

>>845
Cudder is dicking around with weak text processing but hasn't demonstrated anything yet.
FTFY

Name: Anonymous 2016-03-11 10:33

Why do you hate elf cudder? How's PE better?

Name: Cudder !cXCudderUE 2016-03-11 12:12

I'm currently busy with other things.

>>847
The prominent design-by-committee infestation throughout the standard, the fact that it tries to be an object and executable format, and the seriously-fucked-up dynamic linking system.

Also it's basically impossible to get the normal GNU tools to produce a good single-section ELF with all the shit stripped out, whereas MS' linker will easily do that for a PE.

Name: Anonymous 2016-03-12 10:46

Name: Cudder !cXCudderUE 2016-03-12 18:51

>>849
That is not mine... and I wouldn't even want to call it mine.

https://github.com/lexborisov/myhtml/blob/master/source/myhtml/tokenizer.c

Holy. Fucking. Shit. :facepalm: :facepalm: :visagearecaceae:

Over 2K lines of C spread across half a dozen files just to implement an HTML tokeniser!? Mine is less than 768 lines of x86 Asm, in one file. What sort of - dogmatic - insanity causes someone to turn a reasonably simple state machine that would most sanely be implemented as a single switch into a bloody mess of several dozen functions with massive code duplication? This is like copy-paste coding taken to the next level. A lot of what I wrote about Hubbub (NetSurf)'s tokeniser in >>549 applies to this abomination too, including the several-dozen-character-long variable names, horrifically duplicated EOF checking every-fucking-where, and allocating (and freeing) on every goddamn token the thing processes.

The most amusing part is there is not a single switch statement anywhere!!!

This level of bloatiness would only be slightly reasonable if he were being paid per line of code.
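
For comparison, a minimal sketch of the "single switch" shape described above, in C rather than Asm. The state names and count_start_tags are made up for illustration and cover only a sliver of the states in the spec; this is not code from either tokeniser discussed here. It only counts start tags, but it shows one loop, one switch, no allocation, and EOF handled in exactly one place (the loop condition).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; a real HTML tokeniser has dozens more. */
enum state { ST_DATA, ST_TAG_OPEN, ST_TAG_NAME, ST_IN_TAG };

static int is_alpha(int c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Counts start tags in buf[0..len) with one loop and one switch.
   Nothing is allocated; running out of input simply ends the loop. */
size_t count_start_tags(const char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    enum state st = ST_DATA;
    size_t tags = 0;
    for (size_t i = 0; i < len; i++) {
        int c = (unsigned char)buf[i];
        switch (st) {
        case ST_DATA:
            if (c == '<') st = ST_TAG_OPEN;
            break;
        case ST_TAG_OPEN:
            if (is_alpha(c)) { tags++; st = ST_TAG_NAME; }
            else if (c != '<') st = ST_DATA;      /* "<3" is just text */
            break;
        case ST_TAG_NAME:
            if (c == '>') st = ST_DATA;
            else if (c == ' ' || c == '\t' || c == '\n' || c == '/')
                st = ST_IN_TAG;
            break;
        case ST_IN_TAG:
            if (c == '>') st = ST_DATA;           /* attributes and quotes ignored */
            break;
        }
    }
    return tags;
}
```

Whether this shape is actually faster than one function per state depends on the compiler and the input, so treat it as an illustration of the structure, not a benchmark claim.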

Name: Anonymous 2016-03-12 21:34

>>850
Why the fuck can't people comment their goddamn code? All the naming is around internal conventions, and who knows what the fuck if((qnode->begin - 2) > tmp_begin) or whatever is actually testing for without reverse engineering the entire fucking thing.

Also, Cudder, you haven't shown your 768 lines, so we have no clue if it actually works, or how slow it is. Duplicating code like EOF checks and avoiding inner-loop switch cases tends to speed things up. Larger code is VERY often faster, if it's larger because it knows the exact cases coming up and does custom handling for each.

You waffle around with such vague ideas of "bloat", like a nigger waffles around with concepts in a math class.

Name: Cudder !cXCudderUE 2016-03-13 1:21

> Larger code is VERY often faster
You obviously don't know what a cache is. Loop unrolling and similar bloating "optimisations" are dead, they died a decade ago along with RISC, ultra-high clock speeds, and long pipelines.

I have posted benchmarks above in >>131. The tokeniser takes around 20 clock cycles per input byte on average and that's including all the processing to determine tags and attributes. There is no fucking way his tokeniser can do better than that - the overhead of all the unnecessary instructions introduced by manually keeping track of state and calling functions (through a pretty unpredictable indirect call) and accessing and allocating memory when it doesn't need to is going to slow things down significantly. Looking at the benchmark you can see the slowest part is the tree construction, which is - currently - in C, and does do the stupid one-function-per-state thing, but that's just because I wanted to quickly get something working, and never got around to rewriting it better/examining the spec more closely to find what could be simplified.

Name: Anonymous 2016-03-13 3:22

>>852
Stop thinking tiny, like a tiny brained fuck.

Consider a system with a caching filesystem, compared to one without. The caching one is larger, both in code and memory footprint, but is going to be hella faster than any simple disk access code that hits the drive every time.

Consider a JIT compiler, versus a bytecode interpreter. Which one is "bigger" and "bloated", and which do you think runs faster?

Consider any system with actual brains in it, be it predictive analysis and heuristics, or just something which has been hardcoded & optimized for AOT measured slow paths.

And that's not even getting into tons and tons of benefits of larger architectures for being able to configure, debug, and instrument what the fuck is going on in your computer, instead of your shitty opaque asm code.

You are clueless and bring no value to the world, defending your absolute shit to the very end. I hate hearing your ignorant, anti-reality blather. Not just this, but every single word you've said. Kill yourself.

Name: Anonymous 2016-03-13 3:56

>>853
No bully

Name: Anonymous 2016-03-13 4:18

> JIT compiler,
I have two words for you:

LOL JAVA

Name: Anonymous 2016-03-13 4:33

>>855
Java is a shitty language, but the JVM is pretty damn impressive. There's a reason there's a ton of other languages for it.

Name: Anonymous 2016-03-13 4:39

> There's a reason there's a ton of other languages for it.
Because their creators were too lazy to learn x86 assembly and optimisation techniques?

Name: Anonymous 2016-03-13 5:18

>>856
> but the JVM is pretty damn impressive.
The only thing impressive about the JVM is how much memory and CPU it uses. It only exists to help sell hugely overpriced Enterprise Quality™ hardware.

Name: Anonymous 2016-03-13 8:11

myhtml_status_t myhtml_tokenizer_state_init(myhtml_t* myhtml)
{
myhtml->parse_state_func = (myhtml_tokenizer_state_f*)mymalloc(sizeof(myhtml_tokenizer_state_f) *
((MyHTML_TOKENIZER_STATE_LAST_ENTRY *
MyHTML_TOKENIZER_STATE_LAST_ENTRY) + 1));

if(myhtml->parse_state_func == NULL)
return MyHTML_STATUS_TOKENIZER_ERROR_MEMORY_ALLOCATION;

myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_DATA] = myhtml_tokenizer_state_data;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_TAG_OPEN] = myhtml_tokenizer_state_tag_open;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_TAG_NAME] = myhtml_tokenizer_state_tag_name;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_END_TAG_OPEN] = myhtml_tokenizer_state_end_tag_open;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SELF_CLOSING_START_TAG] = myhtml_tokenizer_state_self_closing_start_tag;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_MARKUP_DECLARATION_OPEN] = myhtml_tokenizer_state_markup_declaration_open;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_BEFORE_ATTRIBUTE_NAME] = myhtml_tokenizer_state_before_attribute_name;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_ATTRIBUTE_NAME] = myhtml_tokenizer_state_attribute_name;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_AFTER_ATTRIBUTE_NAME] = myhtml_tokenizer_state_after_attribute_name;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_BEFORE_ATTRIBUTE_VALUE] = myhtml_tokenizer_state_before_attribute_value;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_ATTRIBUTE_VALUE_DOUBLE_QUOTED] = myhtml_tokenizer_state_attribute_value_double_quoted;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_ATTRIBUTE_VALUE_SINGLE_QUOTED] = myhtml_tokenizer_state_attribute_value_single_quoted;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_ATTRIBUTE_VALUE_UNQUOTED] = myhtml_tokenizer_state_attribute_value_unquoted;

// comments
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_COMMENT] = myhtml_tokenizer_state_comment;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_COMMENT_END] = myhtml_tokenizer_state_comment_end;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_COMMENT_END_DASH] = myhtml_tokenizer_state_comment_end_dash;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_COMMENT_END_BANG] = myhtml_tokenizer_state_comment_end_bang;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_BOGUS_COMMENT] = myhtml_tokenizer_state_bogus_comment;

// cdata
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_CDATA_SECTION] = myhtml_tokenizer_state_cdata_section;

// rcdata
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_RCDATA] = myhtml_tokenizer_state_rcdata;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_RCDATA_LESS_THAN_SIGN] = myhtml_tokenizer_state_rcdata_less_than_sign;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_RCDATA_END_TAG_OPEN] = myhtml_tokenizer_state_rcdata_end_tag_open;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_RCDATA_END_TAG_NAME] = myhtml_tokenizer_state_rcdata_end_tag_name;

// rawtext
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_RAWTEXT] = myhtml_tokenizer_state_rawtext;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_RAWTEXT_LESS_THAN_SIGN] = myhtml_tokenizer_state_rawtext_less_than_sign;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_RAWTEXT_END_TAG_OPEN] = myhtml_tokenizer_state_rawtext_end_tag_open;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_RAWTEXT_END_TAG_NAME] = myhtml_tokenizer_state_rawtext_end_tag_name;

// plaintext
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_PLAINTEXT] = myhtml_tokenizer_state_plaintext;

// doctype
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_DOCTYPE] = myhtml_tokenizer_state_doctype;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_BEFORE_DOCTYPE_NAME] = myhtml_tokenizer_state_before_doctype_name;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_DOCTYPE_NAME] = myhtml_tokenizer_state_doctype_name;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_AFTER_DOCTYPE_NAME] = myhtml_tokenizer_state_after_doctype_name;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_CUSTOM_AFTER_DOCTYPE_NAME_A_Z] = myhtml_tokenizer_state_custom_after_doctype_name_a_z;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_BEFORE_DOCTYPE_PUBLIC_IDENTIFIER] = myhtml_tokenizer_state_before_doctype_public_identifier;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_DOCTYPE_PUBLIC_IDENTIFIER_DOUBLE_QUOTED] = myhtml_tokenizer_state_doctype_public_identifier_double_quoted;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_DOCTYPE_PUBLIC_IDENTIFIER_SINGLE_QUOTED] = myhtml_tokenizer_state_doctype_public_identifier_single_quoted;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_AFTER_DOCTYPE_PUBLIC_IDENTIFIER] = myhtml_tokenizer_state_after_doctype_public_identifier;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_DOCTYPE_SYSTEM_IDENTIFIER_DOUBLE_QUOTED] = myhtml_tokenizer_state_doctype_system_identifier_double_quoted;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_DOCTYPE_SYSTEM_IDENTIFIER_SINGLE_QUOTED] = myhtml_tokenizer_state_doctype_system_identifier_single_quoted;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_AFTER_DOCTYPE_SYSTEM_IDENTIFIER] = myhtml_tokenizer_state_after_doctype_system_identifier;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_BOGUS_DOCTYPE] = myhtml_tokenizer_state_bogus_doctype;

// script
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA] = myhtml_tokenizer_state_script_data;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_LESS_THAN_SIGN] = myhtml_tokenizer_state_script_data_less_than_sign;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_END_TAG_OPEN] = myhtml_tokenizer_state_script_data_end_tag_open;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_END_TAG_NAME] = myhtml_tokenizer_state_script_data_end_tag_name;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPE_START] = myhtml_tokenizer_state_script_data_escape_start;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPE_START_DASH] = myhtml_tokenizer_state_script_data_escape_start_dash;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPED] = myhtml_tokenizer_state_script_data_escaped;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPED_DASH] = myhtml_tokenizer_state_script_data_escaped_dash;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPED_DASH_DASH] = myhtml_tokenizer_state_script_data_escaped_dash_dash;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPED_LESS_THAN_SIGN] = myhtml_tokenizer_state_script_data_escaped_less_than_sign;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPED_END_TAG_OPEN] = myhtml_tokenizer_state_script_data_escaped_end_tag_open;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPED_END_TAG_NAME] = myhtml_tokenizer_state_script_data_escaped_end_tag_name;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_DOUBLE_ESCAPE_START] = myhtml_tokenizer_state_script_data_double_escape_start;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_DOUBLE_ESCAPED] = myhtml_tokenizer_state_script_data_double_escaped;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_DOUBLE_ESCAPED_DASH] = myhtml_tokenizer_state_script_data_double_escaped_dash;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_DOUBLE_ESCAPED_DASH_DASH] = myhtml_tokenizer_state_script_data_double_escaped_dash_dash;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_DOUBLE_ESCAPED_LESS_THAN_SIGN] = myhtml_tokenizer_state_script_data_double_escaped_less_than_sign;
myhtml->parse_state_func[MyHTML_TOKENIZER_STATE_SCRIPT_DATA_DOUBLE_ESCAPE_END] = myhtml_tokenizer_state_script_data_double_escape_end;

// ***********
// for ends
// *********
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_DATA)] = myhtml_tokenizer_end_state_data;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_TAG_OPEN)] = myhtml_tokenizer_end_state_tag_open;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_TAG_NAME)] = myhtml_tokenizer_end_state_tag_name;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_END_TAG_OPEN)] = myhtml_tokenizer_end_state_end_tag_open;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SELF_CLOSING_START_TAG)] = myhtml_tokenizer_end_state_self_closing_start_tag;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_MARKUP_DECLARATION_OPEN)] = myhtml_tokenizer_end_state_markup_declaration_open;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_BEFORE_ATTRIBUTE_NAME)] = myhtml_tokenizer_end_state_before_attribute_name;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_ATTRIBUTE_NAME)] = myhtml_tokenizer_end_state_attribute_name;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_AFTER_ATTRIBUTE_NAME)] = myhtml_tokenizer_end_state_after_attribute_name;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_BEFORE_ATTRIBUTE_VALUE)] = myhtml_tokenizer_end_state_before_attribute_value;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_ATTRIBUTE_VALUE_DOUBLE_QUOTED)] = myhtml_tokenizer_end_state_attribute_value_double_quoted;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_ATTRIBUTE_VALUE_SINGLE_QUOTED)] = myhtml_tokenizer_end_state_attribute_value_single_quoted;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_ATTRIBUTE_VALUE_UNQUOTED)] = myhtml_tokenizer_end_state_attribute_value_unquoted;

// for ends comments
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_COMMENT)] = myhtml_tokenizer_end_state_comment;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_COMMENT_END)] = myhtml_tokenizer_end_state_comment_end;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_COMMENT_END_DASH)] = myhtml_tokenizer_end_state_comment_end_dash;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_COMMENT_END_BANG)] = myhtml_tokenizer_end_state_comment_end_bang;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_BOGUS_COMMENT)] = myhtml_tokenizer_end_state_bogus_comment;

// for ends cdata
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_CDATA_SECTION)] = myhtml_tokenizer_end_state_cdata_section;

// rcdata
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_RCDATA)] = myhtml_tokenizer_end_state_rcdata;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_RCDATA_LESS_THAN_SIGN)] = myhtml_tokenizer_end_state_rcdata_less_than_sign;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_RCDATA_END_TAG_OPEN)] = myhtml_tokenizer_end_state_rcdata_end_tag_open;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_RCDATA_END_TAG_NAME)] = myhtml_tokenizer_end_state_rcdata_end_tag_name;

// rawtext
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_RAWTEXT)] = myhtml_tokenizer_end_state_rawtext;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_RAWTEXT_LESS_THAN_SIGN)] = myhtml_tokenizer_end_state_rawtext_less_than_sign;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_RAWTEXT_END_TAG_OPEN)] = myhtml_tokenizer_end_state_rawtext_end_tag_open;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_RAWTEXT_END_TAG_NAME)] = myhtml_tokenizer_end_state_rawtext_end_tag_name;

// for ends plaintext
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_PLAINTEXT)] = myhtml_tokenizer_end_state_plaintext;

// for ends doctype
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_DOCTYPE)] = myhtml_tokenizer_end_state_doctype;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_BEFORE_DOCTYPE_NAME)] = myhtml_tokenizer_end_state_before_doctype_name;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_DOCTYPE_NAME)] = myhtml_tokenizer_end_state_doctype_name;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_AFTER_DOCTYPE_NAME)] = myhtml_tokenizer_end_state_after_doctype_name;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_CUSTOM_AFTER_DOCTYPE_NAME_A_Z)] = myhtml_tokenizer_end_state_custom_after_doctype_name_a_z;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_BEFORE_DOCTYPE_PUBLIC_IDENTIFIER)] = myhtml_tokenizer_end_state_before_doctype_public_identifier;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_DOCTYPE_PUBLIC_IDENTIFIER_DOUBLE_QUOTED)] = myhtml_tokenizer_end_state_doctype_public_identifier_double_quoted;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_DOCTYPE_PUBLIC_IDENTIFIER_SINGLE_QUOTED)] = myhtml_tokenizer_end_state_doctype_public_identifier_single_quoted;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_AFTER_DOCTYPE_PUBLIC_IDENTIFIER)] = myhtml_tokenizer_end_state_after_doctype_public_identifier;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_DOCTYPE_SYSTEM_IDENTIFIER_DOUBLE_QUOTED)] = myhtml_tokenizer_end_state_doctype_system_identifier_double_quoted;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_DOCTYPE_SYSTEM_IDENTIFIER_SINGLE_QUOTED)] = myhtml_tokenizer_end_state_doctype_system_identifier_single_quoted;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_AFTER_DOCTYPE_SYSTEM_IDENTIFIER)] = myhtml_tokenizer_end_state_after_doctype_system_identifier;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_BOGUS_DOCTYPE)] = myhtml_tokenizer_end_state_bogus_doctype;

// for ends script
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA)] = myhtml_tokenizer_end_state_script_data;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_LESS_THAN_SIGN)] = myhtml_tokenizer_end_state_script_data_less_than_sign;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_END_TAG_OPEN)] = myhtml_tokenizer_end_state_script_data_end_tag_open;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_END_TAG_NAME)] = myhtml_tokenizer_end_state_script_data_end_tag_name;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPE_START)] = myhtml_tokenizer_end_state_script_data_escape_start;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPE_START_DASH)] = myhtml_tokenizer_end_state_script_data_escape_start_dash;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPED)] = myhtml_tokenizer_end_state_script_data_escaped;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPED_DASH)] = myhtml_tokenizer_end_state_script_data_escaped_dash;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPED_DASH_DASH)] = myhtml_tokenizer_end_state_script_data_escaped_dash_dash;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPED_LESS_THAN_SIGN)] = myhtml_tokenizer_end_state_script_data_escaped_less_than_sign;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPED_END_TAG_OPEN)] = myhtml_tokenizer_end_state_script_data_escaped_end_tag_open;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_ESCAPED_END_TAG_NAME)] = myhtml_tokenizer_end_state_script_data_escaped_end_tag_name;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_DOUBLE_ESCAPE_START)] = myhtml_tokenizer_end_state_script_data_double_escape_start;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_DOUBLE_ESCAPED)] = myhtml_tokenizer_end_state_script_data_double_escaped;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_DOUBLE_ESCAPED_DASH)] = myhtml_tokenizer_end_state_script_data_double_escaped_dash;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_DOUBLE_ESCAPED_DASH_DASH)] = myhtml_tokenizer_end_state_script_data_double_escaped_dash_dash;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_DOUBLE_ESCAPED_LESS_THAN_SIGN)] = myhtml_tokenizer_end_state_script_data_double_escaped_less_than_sign;
myhtml->parse_state_func[(MyHTML_TOKENIZER_STATE_LAST_ENTRY
+ MyHTML_TOKENIZER_STATE_SCRIPT_DATA_DOUBLE_ESCAPE_END)] = myhtml_tokenizer_end_state_script_data_double_escape_end;

return MyHTML_STATUS_OK;
}

Name: Anonymous 2016-03-13 9:52

>>856
Pretty impressive at devouring memory? Yeah.

Name: Anonymous 2016-03-13 12:50

>>859
This makes me depressed.

Name: Cudder !cXCudderUE 2016-03-13 16:20

>>856
It took SunOracle over two decades to get Java even slightly approaching what could be considered reasonable efficiency. And it's still a ridiculously bloated pig.

>>859
The disgustingly long variable names, excessive use of typedefs, "== NULL", and general idiocy of dynamically allocating an array which will be filled completely with constants in every fucking parser object strongly reeks of Enterprise Java-ism.
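
To make the allocation point concrete: a table that only ever holds constants can be a static const array built at compile time and shared by every parser object. A rough sketch in C follows, with a made-up three-state tokeniser (tok_state, state_fn, on_* and run_states are all hypothetical names, not myhtml's API). It keeps the one-function-per-state dispatch and only removes the per-object malloc and the runtime fill.

```c
#include <stddef.h>

/* Hypothetical three-state tokeniser, only to show the table layout. */
enum tok_state { TS_DATA, TS_TAG_OPEN, TS_TAG_NAME, TS_COUNT };

/* Each handler looks at one input character and returns the next state. */
typedef enum tok_state (*state_fn)(int c);

static enum tok_state on_data(int c) {
    return c == '<' ? TS_TAG_OPEN : TS_DATA;
}
static enum tok_state on_tag_open(int c) {
    int alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    return alpha ? TS_TAG_NAME : TS_DATA;
}
static enum tok_state on_tag_name(int c) {
    return c == '>' ? TS_DATA : TS_TAG_NAME;
}

/* Filled in by the compiler, placed in read-only data, never freed,
   and shared by every parser instead of being rebuilt per object. */
static const state_fn state_table[TS_COUNT] = {
    [TS_DATA]     = on_data,
    [TS_TAG_OPEN] = on_tag_open,
    [TS_TAG_NAME] = on_tag_name,
};

/* Runs the machine over a NUL-terminated string; returns the final state. */
enum tok_state run_states(const char *s) {
    enum tok_state st = TS_DATA;
    for (; *s; s++)
        st = state_table[st]((unsigned char)*s);
    return st;
}
```

With this layout there is no allocation-failure path to handle at init time, because there is no init.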

>>861
Here's something to cheer you up, the tag name processing in my tokeniser:

    ; found '<', what comes next?
    call [ebp+htmlp.func_getchar]       ; EOF - emit text
eof_emittext1:
    call isalpha
    jc not_alpha
tag_name:
    call emit_text                      ; emit this text
    lea eax, [esi-1]
    mov [ebp+htmlp.data_ptr], eax       ; start of tag name
    push 1
    pop [ebp+htmlp.data_len]
    mov [ebp+htmlp.n_attrs], 0
tag_name_loop:
    call [ebp+htmlp.func_getchar]       ; EOF - ignore
    call chk_tag_end_iswhs
    jz find_attr_loop
    inc dword ptr [ebp+htmlp.data_len]
    jmp tag_name_loop


Those 14 instructions (and the procs it calls) are essentially all that's needed to detect and process a tag name. The spec and the monster posted above need substantially more to do the same thing. The jumps to code you can't see are the state transitions directly. None of that mh_state_set (try to find in which of those files it's defined and how many levels of indirection there are, then facepalm when you see what it actually does), returning from a function, then going through an indirect call maybe back into the same damn function bullshit.
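
For readers who don't speak x86, a loose C rendering of the same idea: the tag name is recorded as a pointer into the input plus a length, with nothing copied or allocated. This is a sketch under assumptions, not a translation of the Asm above. In particular, ends_tag_name only guesses at what chk_tag_end_iswhs tests (whitespace, '>' or '/'), the explicit end pointer replaces the getchar callback, and name_span and scan_tag_name are made-up names.

```c
#include <stddef.h>

/* A tag name is a view into the input buffer, never a copy. */
struct name_span { const char *ptr; size_t len; };

static int is_alpha(int c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Assumed stand-in for chk_tag_end_iswhs: whitespace, '>' or '/'. */
static int ends_tag_name(int c) {
    return c == '>' || c == '/' || c == ' ' || c == '\t' ||
           c == '\n' || c == '\r' || c == '\f';
}

/* p points just past a '<'; end is one past the last input byte.
   Returns 1 and fills *out if a tag name starts at p, otherwise 0. */
int scan_tag_name(const char *p, const char *end, struct name_span *out) {
    if (p == end || !is_alpha((unsigned char)*p))
        return 0;                      /* the "not_alpha" path: plain text */
    out->ptr = p;                      /* start of tag name */
    out->len = 1;
    while (++p != end && !ends_tag_name((unsigned char)*p))
        out->len++;                    /* the tag_name_loop */
    return 1;
}
```

Hitting the end of input mid-name just stops the loop, which is one way of reading the "EOF - ignore" comment above.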

Name: Anonymous 2016-03-13 17:27

>>862
Are you seriously saying you'd write HTML parser in assembly?

What a load of shit.

Name: Anonymous 2016-03-14 1:47

>>863
Are you seriously saying you decided to post without ever having read most of this thread?

What a fucking idiot.

Name: Anonymous 2016-03-14 15:15

The number of times you piss and moan about long variable names is inversely proportional to your IQ.

Go solve a real problem, you annoying fuck. Everything you do is just code masturbation, and nobody wants to see it or touch it.

Name: Anonymous 2016-03-15 12:16

>>865
Check 'em

Name: Anonymous 2016-03-15 12:20

>>865
I want to touch Cudder's penis!

Name: Anonymous 2016-03-15 14:26

I want to touch the penis of Cudder's daughter!

Name: Anonymous 2016-03-17 18:18

Enterprise leads to death

Name: Cudder !cXCudderUE 2016-03-18 11:31

CSS parsing. The way the spec is written makes it seem like you have to extract the whole series of tokens for the selector before parsing them separately, when all that's needed is to parse the selectors at the same time as everything else, as they come from the tokeniser. I wonder how many implementations out there needlessly allocate a secondary array of tokens (or allocate tokens, for that matter) just so it can rescan them to parse selectors...

It's also funny to see developers chasing the parallelism dream with things like Servo that have only demonstrated at best 2-3x speedup with multiple cores and even more complex code, when I can probably get 10x speedup with simple, single-threaded Asm that's at least an order of magnitude smaller. The CSS matching is also easily parallelisable if I really want to.

Name: Anonymous 2016-03-18 12:55

>>870
Protip: The world of computing and web browsing is far larger than just Windows XP on x86.

Name: Anonymous 2016-03-18 18:01

>>870
Boss, I've just obliterated the number of instructions and allocations in our CSS selector parser!
Great, what's the speedup gain?
It runs in a thousandth of the time for simple selectors!
I mean when loading our test pages.
Well, time spent parsing CSS is already dwarfed by network and graphical operations, and I haven't done most of the spec yet, so not that much. Like a fraction of a percent.
How long did this take?
About four weeks.
Will the interns be able to read it?
Sure, the ASM fits on two screens!
You're fired.

Name: Anonymous 2016-03-18 18:32

>>870
expert shitposting

Name: Anonymous 2016-03-19 7:37

>>871
It's not that much larger.

>>872
Get The Fuck Out of my /prog/ you corporate drone. We are artists. Not greedy intellectually dead whores who care only about money and increasing the number of bullets we can write on word document formatted resumes.

Name: Anonymous 2016-03-19 9:38

>>874
I second that. Programming is the art, coding is the job. And I sure haven't seen this board called /coding/.

Name: Anonymous 2016-03-19 11:28

>>874
If you followed the thread, Cudder is writing this browser as part of shim's job.

>>875
Coding is merely the act of writing code down. Programming is making computers do things, usually via coding. Development is producing a software piece.
To make a house-building analogy: Coding is hammering, programming is building, and development is, well, development.

Name: Anonymous 2016-03-19 20:36

checkem

Name: Anonymous 2016-03-23 13:04

>>876
> If you followed the thread, Cudder is writing this browser as part of shim's job.
You are wrong.

> Programming is making computers do things, usually via coding
No. Programming is conjuring the spirits of the computer with our spells. Nothing more and nothing less. All other things within the software world are irrelevant, a waste of one's time, and intellect.

Name: Cudder !cXCudderUE 2016-03-26 17:01

CSS parser can be really simple.

css-file       = (WHS* | CDO | CDC | '@' ID [ '{' block '}' ] | selector-block)*
block          = (WHS* | '@' ID [ '{' block '}' ] | selector-block)*
selector-block = selector [ ',' WHS* selector ]* '{' decl-list '}'
selector       = nodeselect [ sel-comb nodeselect ]*
sel-comb       = WHS* ['+' | '>' | '~'] WHS* | WHS+
nodeselect     = [ID | '*'] attrsel* | attrsel+
attrsel        = '[' WHS* ID WHS* [matchop WHS* [ID|STR] WHS*]? ']'
               | '#' NAME
               | '.' ID
               | ':' ':'? ID
matchop        = ['~' | '|' | '^' | '$' | '*' ]? '='

A few hundred bytes should be sufficient to parse this.
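
As a sanity check on how small this can be, here is a sketch of a recogniser for the selector rule alone, written as plain recursive descent in C, working straight off the characters with no token array. The lexical rules are simplified assumptions, not the real CSS ones (ID is [A-Za-z_-][A-Za-z0-9_-]*, NAME is [A-Za-z0-9_-]+, STR is a quoted string without escapes), and every function name is made up for the example. It says yes or no for one selector; comma lists and the '{' decl-list '}' part are left out.

```c
#include <stddef.h>

static int is_whs(int c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }
static int is_namech(int c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_' || c == '-';
}
static int is_idstart(int c) { return is_namech(c) && !(c >= '0' && c <= '9'); }
static const char *skip_whs(const char *p) { while (is_whs(*p)) p++; return p; }

/* Each parse_* returns the position just past the construct, or NULL. */
static const char *parse_name(const char *p) {
    if (!is_namech(*p)) return NULL;
    while (is_namech(*p)) p++;
    return p;
}
static const char *parse_id(const char *p) {
    return is_idstart(*p) ? parse_name(p) : NULL;
}
static const char *parse_str(const char *p) {
    char q = *p;
    if (q != '"' && q != '\'') return NULL;
    for (p++; *p && *p != q; p++) {}
    return *p ? p + 1 : NULL;          /* NULL if the quote never closes */
}

static const char *parse_attrsel(const char *p) {
    switch (*p) {
    case '#': return parse_name(p + 1);
    case '.': return parse_id(p + 1);
    case ':': return parse_id(p[1] == ':' ? p + 2 : p + 1);
    case '[':
        p = parse_id(skip_whs(p + 1));
        if (!p) return NULL;
        p = skip_whs(p);
        if (*p != ']') {               /* optional: matchop and a value */
            if (*p == '~' || *p == '|' || *p == '^' || *p == '$' || *p == '*') p++;
            if (*p != '=') return NULL;
            p = skip_whs(p + 1);
            const char *v = parse_id(p);
            if (!v) v = parse_str(p);
            if (!v) return NULL;
            p = skip_whs(v);
        }
        return *p == ']' ? p + 1 : NULL;
    default:
        return NULL;
    }
}

/* nodeselect = [ID | '*'] attrsel* | attrsel+  (must consume something) */
static const char *parse_nodeselect(const char *p) {
    const char *start = p, *q;
    if (*p == '*') p++;
    else if ((q = parse_id(p)) != NULL) p = q;
    while ((q = parse_attrsel(p)) != NULL) p = q;
    return p != start ? p : NULL;
}

/* Returns 1 if s is exactly one selector per the grammar above, else 0. */
int is_selector(const char *s) {
    const char *p = parse_nodeselect(s);
    while (p && *p) {
        const char *q = skip_whs(p);
        int had_whs = (q != p);
        int had_op = (*q == '+' || *q == '>' || *q == '~');
        if (had_op) q = skip_whs(q + 1);
        if (!had_whs && !had_op) return 0;   /* junk after a nodeselect */
        p = parse_nodeselect(q);
    }
    return p != NULL;
}
```

A real parser would emit match instructions as it goes instead of only answering yes or no, but the control flow stays the same.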

Name: Anonymous 2016-03-26 17:10

>>879
How's progress on CSS parser, tokeniser and box generation?
I want to see you get into layout development already and pass the freaking Acid2 test.
Show those bastards, Cudder!
