aboutsummaryrefslogtreecommitdiff
path: root/content/articles/the-birthday-paradox.md
diff options
context:
space:
mode:
Diffstat (limited to 'content/articles/the-birthday-paradox.md')
-rw-r--r--content/articles/the-birthday-paradox.md114
1 files changed, 78 insertions, 36 deletions
diff --git a/content/articles/the-birthday-paradox.md b/content/articles/the-birthday-paradox.md
index 6259677..aa7f14b 100644
--- a/content/articles/the-birthday-paradox.md
+++ b/content/articles/the-birthday-paradox.md
@@ -60,8 +60,8 @@ tables:
## Introduction
-I share my birthday with my mom. Yesterday while talking her on ~~my~~
-her birthday I asked if she had heard of the [Birthday Problem][bp]:
+I share my birthday with my mom. On ~~my~~ her birthday I asked if she
+had heard of the following puzzle:
> How many people have to be in a room before the probability that there
> is a shared birthday is greater than 50%?
@@ -72,12 +72,11 @@ the *birthday paradox*. It is an interesting problem because:
* The answer is counterintuitive (hence the name *birthday paradox*).
* The solution relies on elements of [combinatorics][], [set theory][],
and [probability][].
-* The ramifications affect security, particularly [cryptographic
- hash algorithms][hash].
+* The ramifications affect the security of [cryptographic hash
+ algorithms][hash].
The next section covers the math that you'll need. If you're already
-familiar with [probability][], feel free to skip ahead to [Solving the
-Birthday Problem][solving].
+familiar with [probability][], feel free to [skip ahead][solving].
## Probability Crash Course
@@ -154,16 +153,14 @@ P(R ⋂ D) = 3/8 × 1/3 = 0.125 = 12.5%
## Solving the Birthday Problem
-Let's get back to the solution. I'll save you a bit of time and a lot
-of headache by telling you in advance that the [birthday problem][bp] is
-one of those pesky probability problems where it much easier to
-calculate the compliment; that is, the [probability][] that everyone
-has a *unique* birthday.
+Let's get back to finding a solution.
-Now that that piece of bookkeeping is out of the way, let's get started.
+To save time, I'll tell you in up front advance that this is one of
+those problems where it much easier to calculate the compliment; that
+is, the [probability][] that everyone has a *unique* birthday.
-Ff there is one person in the room, then the probability that everyone
-has a unique birthday is `1.0` (`100%`).
+If there is only one person in the room, then the probability that
+everyone present has a unique birthday is `1.0` (`100%`).
```sh
P1 = P(unique first birthday) = 365/365
@@ -172,9 +169,9 @@ P1 = 1.0 = 100%
 
If there are two people in the room, the probability that everyone has a
-unique birthday can be calculated by multiplying `P1` -- the probability
-from the previous step -- by the probability that the second person's
-birthday is not the same as the first person's birthday. Algebraically:
+unique birthday can be calculated by multiplying the probability from
+the previous step by the probability that the second person's birthday
+is not the same as the first person's birthday. Algebraically:
```sh
P2 = P(unique second birthday)
@@ -200,17 +197,19 @@ The probability that everyone in a group of `N` people has a unique
birthday is the product of the numbers between `365` and `365 - (N - 1)`
(inclusive), divided by `365{{< sup "N" >}}`.
-To calculate the product of all the numbers between `365` and `365 - (N
-- 1)` in the numerator, we can use a [factorial][] trick to cancel out
-the terms that we don't need:
+We can calculate the product of the numbers between `365 - (N - 1)` and
+`365` by dividing `N!` by the product of terms we don't care about. I
+don't have a name for this (the equation is the same as a [permutation
+without repetition][], but that's just a coincidence), so let's call it
+a "truncated factorial":
```sh
-Π(X, Y) = product of numbers between X and Y (inclusive) where X < Y
-Π(X, Y) = Y!/(X - 1))!
+TF(X, Y) = product of numbers between X and Y (inclusive) where X ∈ [1, Y)
+TF(X, Y) = Y!/(X - 1))!
-Π((365 - (N - 1)), 365) = 365!/((365 - (N - 1) - 1))!
-Π((365 - (N - 1)), 365) = 365!/((365 + -N + 1 + -1))!
-Π((365 - (N - 1)), 365) = 365!/(365 - N)!
+TF((365 - (N - 1)), 365) = 365!/((365 - (N - 1) - 1))!
+TF(366 - N, 365) = 365!/((365 + -N + 1 + -1))!
+TF(366 - N, 365) = 365!/(365 - N)!
```
&nbsp;
@@ -244,7 +243,7 @@ following:
`N` people for `N ∈ [1, 50]`.
2. Exclude intermediate rows in the table to keep it short.
3. Highlight the relevant results (values of `N` where `S(N) > 0.5`).
-4. Render the bar chart and table shown below.
+4. Render the bar chart and table in the next section.
The full script is available in the [companion GitHub
repository][github]. Here is the [Ruby][] code for our derived
@@ -264,35 +263,62 @@ end
```
&nbsp;
-Here are the results:
+## Solution
+
+Here are the results of the script from the previous section:
{{< figure
src="/files/articles/birthday-paradox/people-vs-probability.svg"
class="image"
width=614
height=461
- caption="Number of People versus Probability of Shared Birthday."
+ caption="Number of People versus Probability of Shared Birthdays."
>}}
{{< table "probs" >}}
So the answer to the birthday problem is:
-**If there are 23 people in the room, then the probability of at least
-one shared birthday is greater than 50%**.
+**If there are 23 or more people in the room, then the probability of at
+least one shared birthday is greater than 50%**.
## Implications
-This result is called the *birthday paradox* is because it doesn't feel
-intuitive. When asked, most people will guess that a much higher
-number of people (50 or 130) are needed before the probability is
-greater than 50%.
+The answer is called the *birthday paradox* is because it doesn't feel
+intuitive. When asked, most people guess that a much higher number of
+people (50 or 130) are needed before the probability of a shared
+birthday is greater than 50%.
+
+The mistake is that we aren't just looking for a person in the room that
+shares a birthday with a single person. Instead, we are looking for
+*anyone* who shares a birthday with *anyone else*. As a result, the
+probability of a shared birthday increases much faster than expected.
+
+This has implications for the [collision resistance][] property of
+[cryptographic hash functions][hash]. [Collision resistance][] means
+that it is hard to find two inputs which hash to the same output.
+
+The birthday paradox means that for all [hash functions][] with an N-bit
+output, the probability of finding a collision for a random input is
+greater than `0.5` after calculating `2{{< sup "N/2" >}}` hashes. To
+make matters worse, searching for hash collisions is an [embarassingly
+parallel][] problem, and programs like [hashcat][] are network-aware and
+[GPGPU][]-accelerated.
+
+This is one of the reasons that the output sizes of [cryptographic hash
+functions][hash] are much larger than non-cryptographic [hash
+functions][]. [SHA-2][] and [BLAKE2][], for example, generate an output
+of 256 bits or more, while non-cryptographic [hash functions][]
+typically only generate an output of 32 or 64 bits.
+
+## Conclusion
-*TODO:* segue into hash algorithms.
+My mom was not amused when I asked if 23 people were present when I was
+born.
## Further Reading
-* [GitHub repository for the scripts used in this article][github]
+* [Companion GitHub repository][github]
* [The Birthday Problem (Wikipedia)][bp]
[bp]: https://en.wikipedia.org/wiki/Birthday_problem
@@ -331,3 +357,19 @@ greater than 50%.
"GitHub repository for scripts used in this article."
[solving]: #solving-the-birthday-problem
"Solving the Birthday Problem"
+[collision resistance]: https://en.wikipedia.org/wiki/Collision_resistance
+ "Property of cryptographic hash functions which makes it hard to find two inputs which hash to the same output."
+[hash functions]: https://en.wikipedia.org/wiki/Hash_function
+ "Hash function"
+[embarassingly parallel]: https://en.wikipedia.org/wiki/Embarrassingly_parallel
+ "Problems with solutions that can be calculated in parallel."
+[sha-2]: https://en.wikipedia.org/wiki/SHA-2
+ "SHA-2 cryptographic hash algorithm."
+[blake2]: https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE2
+ "BLAKE2 cryptographic hash function."
+[hashcat]: https://hashcat.net/hashcat/
+ "Password hash cracking tool."
+[gpgpu]: https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units
+ "General Purpose computation on Graphics Processing Units."
+[permutation without repetition]: https://en.wikipedia.org/wiki/Permutation#Permutations_without_repetitions
+ "Permutation without repetition."