Diffstat (limited to 'content/posts/2022-03-09-fastest-js-html-escape.md')
-rw-r--r-- | content/posts/2022-03-09-fastest-js-html-escape.md | 194 |
1 files changed, 194 insertions, 0 deletions
diff --git a/content/posts/2022-03-09-fastest-js-html-escape.md b/content/posts/2022-03-09-fastest-js-html-escape.md
new file mode 100644
index 0000000..ee28920
--- /dev/null
+++ b/content/posts/2022-03-09-fastest-js-html-escape.md
@@ -0,0 +1,194 @@
+---
+slug: fastest-js-html-escape
+title: "Fastest JavaScript HTML Escape"
+date: "2022-03-09T21:42:57-04:00"
+---
+What is the fastest [JavaScript][js] [HTML][] escape implementation? To
+answer that question I did the following:
+
+1. Wrote [10 different JavaScript HTML escape implementations][impls].
+2. Created a web-based benchmarking tool which uses [web workers][] and the
+   [Performance API][] to test them with a variety of string sizes and
+   generates a downloadable [CSV][] of results.
+3. Wrote a set of scripts to aggregate and plot the results.
+
+## Results
+
+The times are from 64-bit [Chrome 99][chrome] running in [Debian][] on a
+[Lenovo Thinkpad X1 Carbon (9th Gen)][laptop]; the specific timing
+results may vary for your system, but the relative results should be
+comparable.
+
+The first chart shows implementations' mean call time (95% [CI][]) as
+the string length varies:
+
+{{< figure
+  src="/files/posts/fastest-js-html-escape/sizes.svg"
+  class=image
+  caption="String Size vs. HTML Escape Function Call Time (μs)"
+>}}
+
+The second chart compares implementations' mean call time (95% [CI][])
+for 3000-character strings:
+
+{{< figure
+  src="/files/posts/fastest-js-html-escape/times.svg"
+  class=image
+  caption="HTML Escape Function Call Times"
+>}}
+
+The red, blue, and green bars in this chart indicate the slow, medium,
+and fast functions, respectively.
+
+### Slow Functions
+
+Anything that uses a capturing [regular expression][re].
+
+#### Example: h2
+
+```js
+const h2 = (() => {
+  // characters to match
+  const M = /([&<>'"])/g;
+
+  // map of char to entity
+  const E = {
+    '&': '&amp;',
+    '<': '&lt;',
+    '>': '&gt;',
+    "'": '&apos;',
+    '"': '&quot;',
+  };
+
+  // build and return escape function
+  return (v) => v.replace(M, (_, c) => E[c]);
+})();
+```
+
+
+The capture is definitely at fault, because the call times for otherwise
+identical non-capturing implementations (example: `h4`) are comparable
+to everything else.
+
+### Medium Functions
+
+Except for the capturing [regular expression][re] implementations in the
+previous section, the remaining implementations' call times were comparable
+with one another. This includes:
+
+* Reducing an array of string literals and calling `replace()`.
+* Several variants of reducing an array of non-capturing [regular
+  expressions][re] with `replace()`.
+
+#### Example: h4
+
+```js
+const h4 = (() => {
+  // characters to match
+  const M = /[&<>'"]/g;
+
+  // map of char to entity
+  const E = {
+    '&': '&amp;',
+    '<': '&lt;',
+    '>': '&gt;',
+    "'": '&apos;',
+    '"': '&quot;',
+  };
+
+  // build and return escape function
+  return (v) => v.replace(M, (c) => E[c]);
+})();
+```
+
+### Fast Functions
+
+Three implementations are slightly faster than the others. They all use
+`replaceAll()` and match on string literals. Their call times are
+indistinguishable from one another:
+
+* h7: Reduce, Replace All
+* h8: Reduce, Replace All, Frozen
+* h9: Replace All Literal
+
+#### Example: h7
+
+```js
+const h7 = (() => {
+  const E = [
+    ['&', '&amp;'],
+    ['<', '&lt;'],
+    ['>', '&gt;'],
+    ["'", '&apos;'],
+    ['"', '&quot;'],
+  ];
+
+  return (v) => E.reduce((r, e) => r.replaceAll(e[0], e[1]), v);
+})();
+```
+
+
+## The Winner: h9
+
+Even though the call times for `h7`, `h8`, and `h9` are
+indistinguishable, I prefer `h9` because it is:
+
+* The most legible. It is the easiest implementation to read for
+  beginning developers and developers who are uncomfortable with
+  functional programming.
+* The simplest to parse (probably).
+* Slightly easier for browsers to optimize (probably).
+
+Here it is:
+
+```js
+// html escape (replaceall explicit)
+const h9 = (v) => {
+  return v.replaceAll('&', '&amp;')
+    .replaceAll('<', '&lt;')
+    .replaceAll('>', '&gt;')
+    .replaceAll("'", '&apos;')
+    .replaceAll('"', '&quot;');
+};
+```
+
+
+## Notes
+
+* The benchmarking interface, aggregation and plotting scripts, and
+  additional information are available in the [companion GitHub
+  repository][repo].
+* I also wrote a [DOM][]/`textContent` implementation, but I couldn't
+  compare it with the other implementations because [web workers][]
+  don't have [DOM][] access. I would be surprised if it were as fast as
+  the fast functions above.
+* `Object.freeze()` doesn't appear to help, at least not in
+  [Chrome][].
+
+
+[repo]: https://github.com/pablotron/fastest-js-html-escape
+  "Fastest JavaScript HTML Escape"
+[js]: https://en.wikipedia.org/wiki/ECMAScript
+  "JavaScript programming language."
+[html]: https://en.wikipedia.org/wiki/HTML
+  "HyperText Markup Language"
+[impls]: https://github.com/pablotron/fastest-js-html-escape/blob/main/public/common.js
+  "Variety of JavaScript HTML escape implementations."
+[web workers]: https://en.wikipedia.org/wiki/Web_worker
+  "JavaScript that runs in a background thread and communicates via messages with HTML page."
+[performance api]: https://developer.mozilla.org/en-US/docs/Web/API/Performance
+  "Web performance measurement API."
+[csv]: https://en.wikipedia.org/wiki/Comma-separated_values
+  "Comma-Separated Value file."
+[chrome]: https://www.google.com/chrome/
+  "Google Chrome web browser."
+[debian]: https://debian.org/
+  "Debian Linux distribution."
+[laptop]: https://en.wikipedia.org/wiki/ThinkPad_X1_series#X1_Carbon_(9th_Gen)
+  "Lenovo Thinkpad X1 Carbon (9th Gen)"
+[re]: https://en.wikipedia.org/wiki/Regular_expression
+  "Regular expression."
+[ci]: https://en.wikipedia.org/wiki/Confidence_interval
+  "Confidence interval."
+[dom]: https://en.wikipedia.org/wiki/Document_Object_Model
+  "Document Object Model"
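
The measurement the post describes (mean call time with a 95% confidence interval, taken with the Performance API) can be sketched roughly as follows. This is a minimal illustration, not the benchmarking tool from the companion repository: `bench`, `meanCI`, `escCapture`, `escReplaceAll`, the run and call counts, and the sample input are all hypothetical. It also assumes `&apos;` as the single-quote entity, a normal approximation for the interval, and a runtime where `performance` and `String.prototype.replaceAll` are available globally.

```js
// Sketch only: all names and constants here are illustrative assumptions,
// not code from the companion repository.

// Capturing regular expression (the shape the post reports as slow).
const escCapture = (() => {
  const M = /([&<>'"])/g;
  const E = { '&': '&amp;', '<': '&lt;', '>': '&gt;', "'": '&apos;', '"': '&quot;' };
  return (v) => v.replace(M, (_, c) => E[c]);
})();

// Chained replaceAll() on string literals (the shape the post reports as fast).
const escReplaceAll = (v) => v.replaceAll('&', '&amp;')
  .replaceAll('<', '&lt;')
  .replaceAll('>', '&gt;')
  .replaceAll("'", '&apos;')
  .replaceAll('"', '&quot;');

// Sample mean and 95% confidence half-width (normal approximation).
const meanCI = (xs) => {
  const n = xs.length;
  const mean = xs.reduce((a, b) => a + b, 0) / n;
  const variance = xs.reduce((a, x) => a + (x - mean) ** 2, 0) / (n - 1);
  return { mean, ci95: 1.96 * Math.sqrt(variance / n) };
};

// Time `runs` batches of `calls` calls each; samples are per-call microseconds.
const bench = (fn, input, runs = 20, calls = 200) => {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    for (let j = 0; j < calls; j++) fn(input);
    samples.push(((performance.now() - t0) * 1000) / calls);
  }
  return meanCI(samples);
};

// Test string of roughly 3000 characters containing every escaped character.
const input = `<a href="x">Tom & Jerry's</a>`.repeat(100);

for (const [name, fn] of [['capture', escCapture], ['replaceAll', escReplaceAll]]) {
  const { mean, ci95 } = bench(fn, input);
  console.log(`${name}: ${mean.toFixed(2)} +/- ${ci95.toFixed(2)} us per call`);
}
```

Batching many calls per sample keeps each measurement well above the timer's resolution, which browsers may deliberately coarsen. Absolute numbers from a sketch like this will differ from the charts in the post.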