---
slug: fastest-js-html-escape
title: "Fastest JavaScript HTML Escape"
date: "2022-03-09T21:42:57-04:00"
---
What is the fastest [JavaScript][js] [HTML][] escape implementation?  To
find out, I:

1. Wrote [10 different JavaScript HTML escape implementations][impls].
2. Created a web-based benchmarking tool which uses [web workers][] and
   the [Performance API][] to test with a variety of string sizes and
   generates a downloadable [CSV][] of results.
3. Created a set of scripts to aggregate and plot the results as
   [SVGs][svg].
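
As a rough illustration of the timing step, here is a minimal sketch that measures mean call time with `performance.now()`.  The `meanCallTime` helper, the `escapeAmp` function, and the iteration count are made up for this example; the real tool runs in a web worker and records many samples per string size.

```js
// Hypothetical helper (not from the benchmark tool): call `fn` on
// `input` a fixed number of times and return the mean time per call
// in microseconds.
const meanCallTime = (fn, input, iterations = 1000) => {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    fn(input);
  }
  const elapsedMs = performance.now() - start;

  // convert milliseconds to microseconds, then average
  return (elapsedMs * 1000) / iterations;
};

// Toy function under test; it only escapes ampersands.
const escapeAmp = (v) => v.replaceAll('&', '&amp;');

// 6 characters repeated 500 times = 3000 characters
const input = 'a & b '.repeat(500);
console.log(meanCallTime(escapeAmp, input).toFixed(3), 'µs per call');
```

A single loop like this is noisy, so treat the number it prints as a ballpark figure rather than a measurement.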

## Results

The times are from 64-bit [Chrome 99][chrome] running in [Debian][] on a
[Lenovo Thinkpad X1 Carbon (9th Gen)][laptop]; the specific timing
results may vary for your system, but the relative results should be
comparable.

The first chart shows implementations' mean call time (95% [CI][]) as
the string length varies:

{{< figure
  src="/files/posts/fastest-js-html-escape/sizes.svg"
  class=image
  caption="String Size vs. HTML Escape Function Call Time (&mu;s)"
>}}

The second chart compares implementations' mean call time (95% [CI][])
for 3000 character strings:

{{< figure
  src="/files/posts/fastest-js-html-escape/times.svg"
  class=image
  caption="HTML Escape Function Call Times"
>}}

The red, blue, and green bars in this chart indicate the slow, medium,
and fast functions, respectively.

### Slow Functions

Any implementation that uses a capturing [regular expression][re] falls
into this group.

#### Example: h2

```js
const h2 = (() => {
  // characters to match
  const M = /([&<>'"])/g;

  // map of char to entity
  const E = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    "'": '&apos;',
    '"': '&quot;',
  };

  // build and return escape function
  return (v) => v.replace(M, (_, c) => E[c]);
})();
```
&nbsp;

The capture is definitely at fault, because the call times for identical
non-capturing implementations (example: `h4`) are comparable to
everything else.
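
To make the difference concrete, the sketch below records the arguments that `replace()` hands to its callback with and without a capturing group.  The `callbackArgs` helper is made up for this example, and it only shows what changes in the call; it does not measure or explain the slowdown.

```js
// Hypothetical helper: collect the argument list of every callback
// invocation made by replace() for the given pattern and input.
const callbackArgs = (pattern, input) => {
  const calls = [];
  input.replace(pattern, (...args) => {
    calls.push(args);
    return '';
  });
  return calls;
};

// capturing group: match, captured group, offset, whole string
console.log(callbackArgs(/([<])/g, 'a<b'));

// no capturing group: match, offset, whole string
console.log(callbackArgs(/[<]/g, 'a<b'));
```

With the capturing pattern each call receives four arguments (the captured group is passed in addition to the match); without it each call receives three.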

### Medium Functions

Except for the capturing [regular expression][re] implementations in the
previous section, the remaining implementations' call times were comparable
with one another.  This includes:

* Reducing an array of string literals and calling `replace()`.
* Several variants of reducing an array of non-capturing [regular
  expressions][re] with `replace()`.
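
For illustration, the second approach could look something like this.  It is a sketch written for this post, not a copy of one of the benchmarked implementations, and the names `RULES` and `escapeWithRegexes` are made up.

```js
// Hypothetical sketch: one global, non-capturing regular expression
// per character, applied in order with reduce().
const escapeWithRegexes = (() => {
  // '&' comes first so the ampersands in later entities are not
  // escaped a second time
  const RULES = [
    [/&/g, '&amp;'],
    [/</g, '&lt;'],
    [/>/g, '&gt;'],
    [/'/g, '&apos;'],
    [/"/g, '&quot;'],
  ];

  return (v) => RULES.reduce((r, [re, entity]) => r.replace(re, entity), v);
})();

console.log(escapeWithRegexes('<b>"R&D"</b>'));
// prints: &lt;b&gt;&quot;R&amp;D&quot;&lt;/b&gt;
```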

#### Example: h4

```js
const h4 = (() => {
  // characters to match
  const M = /[&<>'"]/g;

  // map of char to entity
  const E = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    "'": '&apos;',
    '"': '&quot;',
  };

  // build and return escape function
  return (v) => v.replace(M, (c) => E[c]);
})();
```

### Fast Functions

Three implementations are slightly faster than the others.  They all use
`replaceAll()` and match on string literals.  Their call times are
indistinguishable from one another:

* h7: Reduce, Replace All
* h8: Reduce, Replace All, Frozen
* h9: Replace All Literal

#### Example: h7

```js
const h7 = (() => {
  const E = [
    ['&', '&amp;'],
    ['<', '&lt;'],
    ['>', '&gt;'],
    ["'", '&apos;'],
    ['"', '&quot;'],
  ];

  return (v) => E.reduce((r, e) => r.replaceAll(e[0], e[1]), v);
})();
```
&nbsp;

## The Winner: h9

Even though the call times for `h7`, `h8`, and `h9` are
indistinguishable, I actually prefer `h9` because:

* It is the most legible.  It is the easiest implementation to read for
  beginning developers and developers who are uncomfortable with
  functional programming.
* It is the simplest to parse (probably).
* It is slightly easier for browsers to optimize (probably).

Here it is:

```js
// html escape (replaceall explicit)
const h9 = (v) => {
  return v.replaceAll('&', '&amp;')
    .replaceAll('<', '&lt;')
    .replaceAll('>', '&gt;')
    .replaceAll("'", '&apos;')
    .replaceAll('"', '&quot;');
};
```
&nbsp;
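
One detail worth spelling out: in the chained implementations the `&` replacement has to run first, otherwise the ampersands introduced by earlier replacements get escaped again.  The sketch below repeats the `replaceAll()` chain under the made-up name `escapeHtml` so that it runs on its own, and adds a deliberately wrong `wrongOrder` variant for comparison.

```js
// Same chain as h9, repeated here so the example is self-contained.
const escapeHtml = (v) => {
  return v.replaceAll('&', '&amp;')
    .replaceAll('<', '&lt;')
    .replaceAll('>', '&gt;')
    .replaceAll("'", '&apos;')
    .replaceAll('"', '&quot;');
};

// Deliberately wrong (hypothetical): '&' is replaced after '<', so
// the ampersand in '&lt;' is escaped a second time.
const wrongOrder = (v) => {
  return v.replaceAll('<', '&lt;')
    .replaceAll('&', '&amp;');
};

console.log(escapeHtml('<a href="x">Tom & Jerry\'s</a>'));
// prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&apos;s&lt;/a&gt;

console.log(wrongOrder('<'));
// prints: &amp;lt;
```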

## Notes

* The benchmarking interface, aggregation and plotting scripts, and
  additional information are available in the [companion GitHub
  repository][repo].
* I also wrote a [DOM][]/`textContent` implementation, but I couldn't
  compare it with the other implementations because [web workers][]
  don't have [DOM][] access.  I would be surprised if it were as fast as
  the fast functions above.
* `Object.freeze()` doesn't appear to help, at least not in
  [Chrome][].
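
For reference, a frozen variant of the reduce approach might look like the sketch below.  This is a guess at the shape of such a function rather than a copy of `h8` (see the repository for the real one), and the name `escapeFrozen` is made up.

```js
// Hypothetical sketch: same reduce/replaceAll approach as h7, but
// with the lookup table and each of its rows frozen.
const escapeFrozen = (() => {
  const E = Object.freeze([
    Object.freeze(['&', '&amp;']),
    Object.freeze(['<', '&lt;']),
    Object.freeze(['>', '&gt;']),
    Object.freeze(["'", '&apos;']),
    Object.freeze(['"', '&quot;']),
  ]);

  return (v) => E.reduce((r, e) => r.replaceAll(e[0], e[1]), v);
})();

console.log(escapeFrozen('a < b && c > d'));
// prints: a &lt; b &amp;&amp; c &gt; d
```

Freezing only prevents the table from being modified; the escaped output is the same as the unfrozen version.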

**Update (2022-03-11):** I posted the benchmarking tool online at the
following URL: [https://pmdn.org/fastest-js-html-escape/][site].

[repo]: https://github.com/pablotron/fastest-js-html-escape
  "Fastest JavaScript HTML Escape"
[js]: https://en.wikipedia.org/wiki/ECMAScript
  "JavaScript programming language."
[html]: https://en.wikipedia.org/wiki/HTML
  "HyperText Markup Language"
[impls]: https://github.com/pablotron/fastest-js-html-escape/blob/main/public/common.js
  "Variety of JavaScript HTML escape implementations."
[web workers]: https://en.wikipedia.org/wiki/Web_worker
  "JavaScript that runs in a background thread and communicates via messages with HTML page."
[performance api]: https://developer.mozilla.org/en-US/docs/Web/API/Performance
  "Web performance measurement API."
[csv]: https://en.wikipedia.org/wiki/Comma-separated_values
  "Comma-Separated Value file."
[chrome]: https://www.google.com/chrome/
  "Google Chrome web browser."
[debian]: https://debian.org/
  "Debian Linux distribution."
[laptop]: https://en.wikipedia.org/wiki/ThinkPad_X1_series#X1_Carbon_(9th_Gen)
  "Lenovo Thinkpad X1 Carbon (9th Gen)"
[re]: https://en.wikipedia.org/wiki/Regular_expression
  "Regular expression."
[ci]: https://en.wikipedia.org/wiki/Confidence_interval
  "Confidence interval."
[dom]: https://en.wikipedia.org/wiki/Document_Object_Model
  "Document Object Model"
[svg]: https://en.wikipedia.org/wiki/Scalable_Vector_Graphics
  "Scalable Vector Graphics"
[site]: https://pmdn.org/fastest-js-html-escape/
  "JavaScript HTML Escape benchmark tool."