diff options
author | Paul Duncan <pabs@pablotron.org> | 2021-11-13 02:28:18 -0500 |
---|---|---|
committer | Paul Duncan <pabs@pablotron.org> | 2021-11-13 02:28:18 -0500 |
commit | a4ee49e77cb24c2f85b47b161a1accaab57b574a (patch) | |
tree | 8373aa32fdbce9ee7dffd7b78201ab030186dc1c /content/articles/the-birthday-paradox.md | |
parent | 34a56b93819dd6080ef7c0fcdf3baa67884fca62 (diff) | |
download | pablotron.org-a4ee49e77cb24c2f85b47b161a1accaab57b574a.tar.bz2 pablotron.org-a4ee49e77cb24c2f85b47b161a1accaab57b574a.zip |
articles/the-birthday-paradox.md: explain truncated factorial, fix wording, populate implications, add conclusion
Diffstat (limited to 'content/articles/the-birthday-paradox.md')
-rw-r--r-- | content/articles/the-birthday-paradox.md | 114 |
1 files changed, 78 insertions, 36 deletions
diff --git a/content/articles/the-birthday-paradox.md b/content/articles/the-birthday-paradox.md index 6259677..aa7f14b 100644 --- a/content/articles/the-birthday-paradox.md +++ b/content/articles/the-birthday-paradox.md @@ -60,8 +60,8 @@ tables: ## Introduction -I share my birthday with my mom. Yesterday while talking her on ~~my~~ -her birthday I asked if she had heard of the [Birthday Problem][bp]: +I share my birthday with my mom. On ~~my~~ her birthday I asked if she +had heard of the following puzzle: > How many people have to be in a room before the probability that there > is a shared birthday is greater than 50%? @@ -72,12 +72,11 @@ the *birthday paradox*. It is an interesting problem because: * The answer is counterintuitive (hence the name *birthday paradox*). * The solution relies on elements of [combinatorics][], [set theory][], and [probability][]. -* The ramifications affect security, particularly [cryptographic - hash algorithms][hash]. +* The ramifications affect the security of [cryptographic hash + algorithms][hash]. The next section covers the math that you'll need. If you're already -familiar with [probability][], feel free to skip ahead to [Solving the -Birthday Problem][solving]. +familiar with [probability][], feel free to [skip ahead][solving]. ## Probability Crash Course @@ -154,16 +153,14 @@ P(R ⋂ D) = 3/8 × 1/3 = 0.125 = 12.5% ## Solving the Birthday Problem -Let's get back to the solution. I'll save you a bit of time and a lot -of headache by telling you in advance that the [birthday problem][bp] is -one of those pesky probability problems where it much easier to -calculate the compliment; that is, the [probability][] that everyone -has a *unique* birthday. +Let's get back to finding a solution. -Now that that piece of bookkeeping is out of the way, let's get started. +To save time, I'll tell you in up front advance that this is one of +those problems where it much easier to calculate the compliment; that +is, the [probability][] that everyone has a *unique* birthday. -Ff there is one person in the room, then the probability that everyone -has a unique birthday is `1.0` (`100%`). +If there is only one person in the room, then the probability that +everyone present has a unique birthday is `1.0` (`100%`). ```sh P1 = P(unique first birthday) = 365/365 @@ -172,9 +169,9 @@ P1 = 1.0 = 100% If there are two people in the room, the probability that everyone has a -unique birthday can be calculated by multiplying `P1` -- the probability -from the previous step -- by the probability that the second person's -birthday is not the same as the first person's birthday. Algebraically: +unique birthday can be calculated by multiplying the probability from +the previous step by the probability that the second person's birthday +is not the same as the first person's birthday. Algebraically: ```sh P2 = P(unique second birthday) @@ -200,17 +197,19 @@ The probability that everyone in a group of `N` people has a unique birthday is the product of the numbers between `365` and `365 - (N - 1)` (inclusive), divided by `365{{< sup "N" >}}`. -To calculate the product of all the numbers between `365` and `365 - (N -- 1)` in the numerator, we can use a [factorial][] trick to cancel out -the terms that we don't need: +We can calculate the product of the numbers between `365 - (N - 1)` and +`365` by dividing `N!` by the product of terms we don't care about. I +don't have a name for this (the equation is the same as a [permutation +without repetition][], but that's just a coincidence), so let's call it +a "truncated factorial": ```sh -Π(X, Y) = product of numbers between X and Y (inclusive) where X < Y -Π(X, Y) = Y!/(X - 1))! +TF(X, Y) = product of numbers between X and Y (inclusive) where X ∈ [1, Y) +TF(X, Y) = Y!/(X - 1))! -Π((365 - (N - 1)), 365) = 365!/((365 - (N - 1) - 1))! -Π((365 - (N - 1)), 365) = 365!/((365 + -N + 1 + -1))! -Π((365 - (N - 1)), 365) = 365!/(365 - N)! +TF((365 - (N - 1)), 365) = 365!/((365 - (N - 1) - 1))! +TF(366 - N, 365) = 365!/((365 + -N + 1 + -1))! +TF(366 - N, 365) = 365!/(365 - N)! ``` @@ -244,7 +243,7 @@ following: `N` people for `N ∈ [1, 50]`. 2. Exclude intermediate rows in the table to keep it short. 3. Highlight the relevant results (values of `N` where `S(N) > 0.5`). -4. Render the bar chart and table shown below. +4. Render the bar chart and table in the next section. The full script is available in the [companion GitHub repository][github]. Here is the [Ruby][] code for our derived @@ -264,35 +263,62 @@ end ``` -Here are the results: +## Solution + +Here are the results of the script from the previous section: {{< figure src="/files/articles/birthday-paradox/people-vs-probability.svg" class="image" width=614 height=461 - caption="Number of People versus Probability of Shared Birthday." + caption="Number of People versus Probability of Shared Birthdays." >}} {{< table "probs" >}} So the answer to the birthday problem is: -**If there are 23 people in the room, then the probability of at least -one shared birthday is greater than 50%**. +**If there are 23 or more people in the room, then the probability of at +least one shared birthday is greater than 50%**. ## Implications -This result is called the *birthday paradox* is because it doesn't feel -intuitive. When asked, most people will guess that a much higher -number of people (50 or 130) are needed before the probability is -greater than 50%. +The answer is called the *birthday paradox* is because it doesn't feel +intuitive. When asked, most people guess that a much higher number of +people (50 or 130) are needed before the probability of a shared +birthday is greater than 50%. + +The mistake is that we aren't just looking for a person in the room that +shares a birthday with a single person. Instead, we are looking for +*anyone* who shares a birthday with *anyone else*. As a result, the +probability of a shared birthday increases much faster than expected. + +This has implications for the [collision resistance][] property of +[cryptographic hash functions][hash]. [Collision resistance][] means +that it is hard to find two inputs which hash to the same output. + +The birthday paradox means that for all [hash functions][] with an N-bit +output, the probability of finding a collision for a random input is +greater than `0.5` after calculating `2{{< sup "N/2" >}}` hashes. To +make matters worse, searching for hash collisions is an [embarassingly +parallel][] problem, and programs like [hashcat][] are network-aware and +[GPGPU][]-accelerated. + +This is one of the reasons that the output sizes of [cryptographic hash +functions][hash] are much larger than non-cryptographic [hash +functions][]. [SHA-2][] and [BLAKE2][], for example, generate an output +of 256 bits or more, while non-cryptographic [hash functions][] +typically only generate an output of 32 or 64 bits. + +## Conclusion -*TODO:* segue into hash algorithms. +My mom was not amused when I asked if 23 people were present when I was +born. ## Further Reading -* [GitHub repository for the scripts used in this article][github] +* [Companion GitHub repository][github] * [The Birthday Problem (Wikipedia)][bp] [bp]: https://en.wikipedia.org/wiki/Birthday_problem @@ -331,3 +357,19 @@ greater than 50%. "GitHub repository for scripts used in this article." [solving]: #solving-the-birthday-problem "Solving the Birthday Problem" +[collision resistance]: https://en.wikipedia.org/wiki/Collision_resistance + "Property of cryptographic hash functions which makes it hard to find two inputs which hash to the same output." +[hash functions]: https://en.wikipedia.org/wiki/Hash_function + "Hash function" +[embarassingly parallel]: https://en.wikipedia.org/wiki/Embarrassingly_parallel + "Problems with solutions that can be calculated in parallel." +[sha-2]: https://en.wikipedia.org/wiki/SHA-2 + "SHA-2 cryptographic hash algorithm." +[blake2]: https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE2 + "BLAKE2 cryptographic hash function." +[hashcat]: https://hashcat.net/hashcat/ + "Password hash cracking tool." +[gpgpu]: https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units + "General Purpose computation on Graphics Processing Units." +[permutation without repetition]: https://en.wikipedia.org/wiki/Permutation#Permutations_without_repetitions + "Permutation without repetition." |