diff options
Diffstat (limited to 'content/articles')
| -rw-r--r-- | content/articles/the-birthday-paradox.md | 114 | 
1 files changed, 78 insertions, 36 deletions
| diff --git a/content/articles/the-birthday-paradox.md b/content/articles/the-birthday-paradox.md index 6259677..aa7f14b 100644 --- a/content/articles/the-birthday-paradox.md +++ b/content/articles/the-birthday-paradox.md @@ -60,8 +60,8 @@ tables:  ## Introduction -I share my birthday with my mom.  Yesterday while talking her on ~~my~~ -her birthday I asked if she had heard of the [Birthday Problem][bp]: +I share my birthday with my mom.  On ~~my~~ her birthday I asked if she +had heard of the following puzzle:  > How many people have to be in a room before the probability that there  > is a shared birthday is greater than 50%? @@ -72,12 +72,11 @@ the *birthday paradox*.  It is an interesting problem because:  * The answer is counterintuitive (hence the name *birthday paradox*).  * The solution relies on elements of [combinatorics][], [set theory][],    and [probability][]. -* The ramifications affect security, particularly [cryptographic -  hash algorithms][hash]. +* The ramifications affect the security of [cryptographic hash +  algorithms][hash].  The next section covers the math that you'll need.  If you're already -familiar with [probability][], feel free to skip ahead to [Solving the -Birthday Problem][solving]. +familiar with [probability][], feel free to [skip ahead][solving].  ## Probability Crash Course @@ -154,16 +153,14 @@ P(R ⋂ D) = 3/8 × 1/3 = 0.125 = 12.5%  ## Solving the Birthday Problem -Let's get back to the solution.  I'll save you a bit of time and a lot -of headache by telling you in advance that the [birthday problem][bp] is -one of those pesky probability problems where it much easier to -calculate the compliment; that is, the [probability][] that everyone -has a *unique* birthday. +Let's get back to finding a solution. -Now that that piece of bookkeeping is out of the way, let's get started. +To save time, I'll tell you in up front advance that this is one of +those problems where it much easier to calculate the compliment; that +is, the [probability][] that everyone has a *unique* birthday. -Ff there is one person in the room, then the probability that everyone -has a unique birthday is `1.0` (`100%`). +If there is only one person in the room, then the probability that +everyone present has a unique birthday is `1.0` (`100%`).  ```sh  P1 = P(unique first birthday) = 365/365 @@ -172,9 +169,9 @@ P1 = 1.0 = 100%     If there are two people in the room, the probability that everyone has a -unique birthday can be calculated by multiplying `P1` -- the probability -from the previous step -- by the probability that the second person's -birthday is not the same as the first person's birthday.  Algebraically: +unique birthday can be calculated by multiplying the probability from +the previous step by the probability that the second person's birthday +is not the same as the first person's birthday.  Algebraically:  ```sh  P2 = P(unique second birthday) @@ -200,17 +197,19 @@ The probability that everyone in a group of `N` people has a unique  birthday is the product of the numbers between `365` and `365 - (N - 1)`  (inclusive), divided by `365{{< sup "N" >}}`. -To calculate the product of all the numbers between `365` and `365 - (N -- 1)` in the numerator, we can use a [factorial][] trick to cancel out -the terms that we don't need: +We can calculate the product of the numbers between `365 - (N - 1)` and +`365` by dividing `N!` by the product of terms we don't care about.  I +don't have a name for this (the equation is the same as a [permutation +without repetition][], but that's just a coincidence), so let's call it +a "truncated factorial":  ```sh -Π(X, Y) = product of numbers between X and Y (inclusive) where X < Y -Π(X, Y) = Y!/(X - 1))! +TF(X, Y) = product of numbers between X and Y (inclusive) where X ∈ [1, Y) +TF(X, Y) = Y!/(X - 1))! -Π((365 - (N - 1)), 365) = 365!/((365 - (N - 1) - 1))! -Π((365 - (N - 1)), 365) = 365!/((365 + -N + 1 + -1))! -Π((365 - (N - 1)), 365) = 365!/(365 - N)! +TF((365 - (N - 1)), 365) = 365!/((365 - (N - 1) - 1))! +TF(366 - N, 365) = 365!/((365 + -N + 1 + -1))! +TF(366 - N, 365) = 365!/(365 - N)!  ```    @@ -244,7 +243,7 @@ following:     `N` people for `N ∈ [1, 50]`.  2. Exclude intermediate rows in the table to keep it short.  3. Highlight the relevant results (values of `N` where `S(N) > 0.5`). -4. Render the bar chart and table shown below. +4. Render the bar chart and table in the next section.  The full script is available in the [companion GitHub  repository][github].  Here is the [Ruby][] code for our derived @@ -264,35 +263,62 @@ end  ```    -Here are the results: +## Solution + +Here are the results of the script from the previous section:  {{< figure    src="/files/articles/birthday-paradox/people-vs-probability.svg"    class="image"    width=614    height=461 -  caption="Number of People versus Probability of Shared Birthday." +  caption="Number of People versus Probability of Shared Birthdays."  >}}  {{< table "probs" >}}  So the answer to the birthday problem is: -**If there are 23 people in the room, then the probability of at least -one shared birthday is greater than 50%**. +**If there are 23 or more people in the room, then the probability of at +least one shared birthday is greater than 50%**.  ## Implications -This result is called the *birthday paradox* is because it doesn't feel -intuitive.  When asked, most people will guess that a much higher -number of people (50 or 130) are needed before the probability is -greater than 50%. +The answer is called the *birthday paradox* is because it doesn't feel +intuitive.  When asked, most people guess that a much higher number of +people (50 or 130) are needed before the probability of a shared +birthday is greater than 50%. + +The mistake is that we aren't just looking for a person in the room that +shares a birthday with a single person.  Instead, we are looking for +*anyone* who shares a birthday with *anyone else*.  As a result, the +probability of a shared birthday increases much faster than expected. + +This has implications for the [collision resistance][] property of +[cryptographic hash functions][hash].  [Collision resistance][] means +that it is hard to find two inputs which hash to the same output. + +The birthday paradox means that for all [hash functions][] with an N-bit +output, the probability of finding a collision for a random input is +greater than `0.5` after calculating `2{{< sup "N/2" >}}` hashes.  To +make matters worse, searching for hash collisions is an [embarassingly +parallel][] problem, and programs like [hashcat][] are network-aware and +[GPGPU][]-accelerated. + +This is one of the reasons that the output sizes of [cryptographic hash +functions][hash] are much larger than non-cryptographic [hash +functions][].  [SHA-2][] and [BLAKE2][], for example, generate an output +of 256 bits or more, while non-cryptographic [hash functions][] +typically only generate an output of 32 or 64 bits. + +## Conclusion -*TODO:* segue into hash algorithms. +My mom was not amused when I asked if 23 people were present when I was +born.  ## Further Reading -* [GitHub repository for the scripts used in this article][github] +* [Companion GitHub repository][github]  * [The Birthday Problem (Wikipedia)][bp]  [bp]: https://en.wikipedia.org/wiki/Birthday_problem @@ -331,3 +357,19 @@ greater than 50%.    "GitHub repository for scripts used in this article."  [solving]: #solving-the-birthday-problem    "Solving the Birthday Problem" +[collision resistance]: https://en.wikipedia.org/wiki/Collision_resistance +  "Property of cryptographic hash functions which makes it hard to find two inputs which hash to the same output." +[hash functions]: https://en.wikipedia.org/wiki/Hash_function +  "Hash function" +[embarassingly parallel]: https://en.wikipedia.org/wiki/Embarrassingly_parallel +  "Problems with solutions that can be calculated in parallel." +[sha-2]: https://en.wikipedia.org/wiki/SHA-2 +  "SHA-2 cryptographic hash algorithm." +[blake2]: https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE2 +  "BLAKE2 cryptographic hash function." +[hashcat]: https://hashcat.net/hashcat/ +  "Password hash cracking tool." +[gpgpu]: https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units +  "General Purpose computation on Graphics Processing Units." +[permutation without repetition]: https://en.wikipedia.org/wiki/Permutation#Permutations_without_repetitions +  "Permutation without repetition." | 
