>>25 It's a parser in the sense that it 'parses' (regex search-and-replace) by finding the relevant elements, not in the sense that it builds DOM trees, walking the markup and constructing list nodes:
From the article:
Many of Cloudflare's services rely on parsing and modifying HTML pages as they pass through our edge servers. For example, we can insert the Google Analytics tag, safely rewrite http:// links to https://, exclude parts of a page from bad bots, obfuscate email addresses, enable AMP, and more by modifying the HTML of a page.
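To make the "search-and-replace, no DOM" point concrete, here is a minimal sketch of rewriting http:// links in a page with a single regex pass. The pattern and the function name `rewrite_http_links` are my own illustration, not anything from Cloudflare's code, and a real rewriter would also have to check that the target actually supports HTTPS.

```python
import re

# Hypothetical pattern: an href/src attribute whose value starts with http://
HTTP_ATTR = re.compile(r'''(\b(?:href|src)\s*=\s*["'])http://''', re.IGNORECASE)

def rewrite_http_links(html: str) -> str:
    """Upgrade http:// to https:// in href/src attributes without building a DOM."""
    # \1 puts back the attribute name, '=' and opening quote unchanged
    return HTTP_ATTR.sub(r'\1https://', html)

page = '<a href="http://example.com/">x</a><img src=\'http://example.com/a.png\'>'
rewritten = rewrite_http_links(page)
plain = rewrite_http_links('see http://example.com/ in plain text')
```

Only attribute values are touched; the plain-text URL in the last call is left alone because it does not follow href= or src=.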
And it worked until several conditions of 'normal HTML' were violated:
In order for the memory to leak the following had to be true:
The final buffer containing data had to finish with a malformed script or img tag
The buffer had to be less than 4k in length (otherwise NGINX would crash)
The customer had to either have Email Obfuscation enabled (because it uses both the old and new parsers as we transition),
… or Automatic HTTPS Rewrites/Server Side Excludes (which use the new parser) in combination with another Cloudflare feature that uses the old parser.
… and Server-Side Excludes only execute if the client IP has a poor reputation (i.e. it does not work for most visitors).
That explains why the buffer overrun resulting in a leak of memory occurred so infrequently.
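A toy model of why a malformed tag at the very end of the buffer matters. As I understand the incident report, the generated parser tested for end-of-buffer with an equality check, so a pointer that jumped past the end was never caught. Everything below (the `scan` function, the two-byte step on '<', the fake "adjacent memory") is an assumption made up for illustration, not Cloudflare's code.

```python
def scan(memory: bytes, start: int, end: int, safe_check: bool) -> bytes:
    """Toy scanner over memory[start:end]; returns the bytes it visited.

    A '<' makes the scanner advance two bytes in one step, so a '<' in the
    last position moves the pointer to end + 1, skipping 'end' itself.
    """
    visited = bytearray()
    p = start
    while p < len(memory):  # hard stop so the simulation itself terminates
        # safe_check uses >=, the buggy variant uses == (hypothetical model)
        at_end = (p >= end) if safe_check else (p == end)
        if at_end:
            break
        byte = memory[p]
        visited.append(byte)
        p += 2 if byte == ord('<') else 1
    return bytes(visited)

buffer = b'hello <'          # final buffer ends with an unfinished tag
secret = b'SECRET-KEY'       # stands in for whatever sits next in memory
memory = buffer + secret

leaky = scan(memory, 0, len(buffer), safe_check=False)
fixed = scan(memory, 0, len(buffer), safe_check=True)
normal = scan(b'<b>hi' + secret, 0, 5, safe_check=False)
```

With the equality check the scanner walks straight into the neighbouring bytes; with >= it stops at the end of the buffer. A buffer that does not end in '<' behaves the same under both checks, which fits the quoted point that the leak needed a malformed tag at the end of the final buffer.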