Diffstat (limited to 'content/posts/2022-03-09-fastest-js-html-escape.md')
| -rw-r--r-- | content/posts/2022-03-09-fastest-js-html-escape.md | 194 | 
1 files changed, 194 insertions, 0 deletions
diff --git a/content/posts/2022-03-09-fastest-js-html-escape.md b/content/posts/2022-03-09-fastest-js-html-escape.md
new file mode 100644
index 0000000..ee28920
--- /dev/null
+++ b/content/posts/2022-03-09-fastest-js-html-escape.md
@@ -0,0 +1,194 @@
+---
+slug: fastest-js-html-escape
+title: "Fastest JavaScript HTML Escape"
+date: "2022-03-09T21:42:57-04:00"
+---
+What is the fastest [JavaScript][js] [HTML][] escape implementation?  To
+answer that question I did the following:
+
+1. Wrote [10 different JavaScript HTML escape implementations][impls].
+2. Created a web-based benchmarking tool which uses [web workers][] and the
+   [Performance API][] to test them with a variety of string sizes and
+   generates a downloadable [CSV][] of results.
+3. Wrote a set of scripts to aggregate and plot the results.
+
+## Results
+
+The times are from 64-bit [Chrome 99][chrome] running in [Debian][] on a
+[Lenovo Thinkpad X1 Carbon (9th Gen)][laptop]; the specific timing
+results may vary for your system, but the relative results should be
+comparable.
+
+The first chart shows implementations' mean call time (95% [CI][]) as
+the string length varies:
+
+{{< figure
+  src="/files/posts/fastest-js-html-escape/sizes.svg"
+  class=image
+  caption="String Size vs. HTML Escape Function Call Time (μs)"
+>}}
+
+The second chart compares implementations' mean call time (95% [CI][])
+for 3000 character strings:
+
+{{< figure
+  src="/files/posts/fastest-js-html-escape/times.svg"
+  class=image
+  caption="HTML Escape Function Call Times"
+>}}
+
+The red, blue, and green bars in this chart indicate the slow, medium,
+and fast functions, respectively.
+
+### Slow Functions
+
+Anything that uses a capturing [regular expression][re].
+
+#### Example: h2
+
+```js
+const h2 = (() => {
+  // characters to match
+  const M = /([&<>'"])/g;
+
+  // map of char to entity
+  const E = {
+    '&': '&amp;',
+    '<': '&lt;',
+    '>': '&gt;',
+    "'": '&apos;',
+    '"': '&quot;',
+  };
+
+  // build and return escape function
+  return (v) => v.replace(M, (_, c) => E[c]);
+})();
+```
+&nbsp;
+
+The capture is definitely at fault, because the call times for otherwise
+identical non-capturing implementations (example: `h4`) are comparable to
+everything else.
+
+### Medium Functions
+
+Except for the capturing [regular expression][re] implementations in the
+previous section, the remaining implementations' call times were comparable
+with one another.  This includes:
+
+* Reducing an array of string literals and calling `replace()`.
+* Several variants of reducing an array of non-capturing [regular
+  expressions][re] with `replace()`.
+
+#### Example: h4
+
+```js
+const h4 = (() => {
+  // characters to match
+  const M = /[&<>'"]/g;
+
+  // map of char to entity
+  const E = {
+    '&': '&amp;',
+    '<': '&lt;',
+    '>': '&gt;',
+    "'": '&apos;',
+    '"': '&quot;',
+  };
+
+  // build and return escape function
+  return (v) => v.replace(M, (c) => E[c]);
+})();
+```
+
+### Fast Functions
+
+Three implementations are slightly faster than the others.  They all use
+`replaceAll()` and match on string literals.  Their call times are
+indistinguishable from one another:
+
+* h7: Reduce, Replace All
+* h8: Reduce, Replace All, Frozen
+* h9: Replace All Literal
+
+#### Example: h7
+
+```js
+const h7 = (() => {
+  const E = [
+    ['&', '&amp;'],
+    ['<', '&lt;'],
+    ['>', '&gt;'],
+    ["'", '&apos;'],
+    ['"', '&quot;'],
+  ];
+
+  return (v) => E.reduce((r, e) => r.replaceAll(e[0], e[1]), v);
+})();
+```
+&nbsp;
+
+## The Winner: h9
+
+Even though the call times for `h7`, `h8`, and `h9` are
+indistinguishable, I actually prefer `h9` because:
+
+* It is the most legible.  It is the easiest implementation to read for
+  beginning developers and developers who are uncomfortable with
+  functional programming.
+* It is the simplest to parse (probably).
+* It is slightly easier for browsers to optimize (probably).
+
+Here it is:
+
+```js
+// html escape (replaceall explicit)
+const h9 = (v) => {
+  return v.replaceAll('&', '&amp;')
+    .replaceAll('<', '&lt;')
+    .replaceAll('>', '&gt;')
+    .replaceAll("'", '&apos;')
+    .replaceAll('"', '&quot;');
+};
+```
+&nbsp;
+
+## Notes
+
+* The benchmarking interface, aggregation and plotting scripts, and
+  additional information are available in the [companion GitHub
+  repository][repo].
+* I also wrote a [DOM][]/`textContent` implementation, but I couldn't
+  compare it with the other implementations because [web workers][]
+  don't have [DOM][] access.  I would be surprised if it was as fast as
+  the fast functions above.
+* `Object.freeze()` doesn't appear to help, at least not in
+  [Chrome][].
+
+
+[repo]: https://github.com/pablotron/fastest-js-html-escape
+  "Fastest JavaScript HTML Escape"
+[js]: https://en.wikipedia.org/wiki/ECMAScript
+  "JavaScript programming language."
+[html]: https://en.wikipedia.org/wiki/HTML
+  "HyperText Markup Language"
+[impls]: https://github.com/pablotron/fastest-js-html-escape/blob/main/public/common.js
+  "Variety of JavaScript HTML escape implementations."
+[web workers]: https://en.wikipedia.org/wiki/Web_worker
+  "JavaScript that runs in a background thread and communicates via messages with HTML page."
+[performance api]: https://developer.mozilla.org/en-US/docs/Web/API/Performance
+  "Web performance measurement API."
+[csv]: https://en.wikipedia.org/wiki/Comma-separated_values
+  "Comma-Separated Value file."
+[chrome]: https://www.google.com/chrome/
+  "Google Chrome web browser."
+[debian]: https://debian.org/
+  "Debian Linux distribution."
+[laptop]: https://en.wikipedia.org/wiki/ThinkPad_X1_series#X1_Carbon_(9th_Gen)
+  "Lenovo Thinkpad X1 Carbon (9th Gen)"
+[re]: https://en.wikipedia.org/wiki/Regular_expression
+  "Regular expression."
+[ci]: https://en.wikipedia.org/wiki/Confidence_interval
+  "Confidence interval."
+[dom]: https://en.wikipedia.org/wiki/Document_Object_Model
+  "Document Object Model"
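
The post times each implementation with the Performance API and compares a regex-based escape against chained `replaceAll()` calls. A minimal sketch of that idea follows. It is not the author's harness (that lives in the companion repository and runs in web workers): the `timeCall` helper, the names `escapeRegex` and `escapeReplaceAll`, the sample string, and the iteration count are all hypothetical, and the `&apos;` entity for the apostrophe is an assumption.

```js
// Hypothetical sketch; not the benchmark code from the companion repository.

// h4-style: non-capturing regex plus a lookup table.
// Entity values are assumed; '&apos;' in particular is a guess.
const ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  "'": '&apos;',
  '"': '&quot;',
};
const escapeRegex = (v) => v.replace(/[&<>'"]/g, (c) => ENTITIES[c]);

// h9-style: chained replaceAll() calls on string literals.
// '&' must be replaced first so later entities are not double-escaped.
const escapeReplaceAll = (v) => v
  .replaceAll('&', '&amp;')
  .replaceAll('<', '&lt;')
  .replaceAll('>', '&gt;')
  .replaceAll("'", '&apos;')
  .replaceAll('"', '&quot;');

// Mean wall-clock time per call in milliseconds (assumed helper).
const timeCall = (fn, input, iterations = 1000) => {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    fn(input);
  }
  return (performance.now() - start) / iterations;
};

// 3000-character test string, the size used in the post's second chart.
const sample = `<a href="x">Tom & Jerry's</a>`.repeat(120).slice(0, 3000);

console.log('regex      ', timeCall(escapeRegex, sample), 'ms/call');
console.log('replaceAll ', timeCall(escapeReplaceAll, sample), 'ms/call');
```

A single-threaded loop like this is much noisier than the worker-based benchmark the post describes, so it is only useful for a rough comparison on your own machine.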
