aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--content/articles/the-birthday-paradox.md152
1 files changed, 91 insertions, 61 deletions
diff --git a/content/articles/the-birthday-paradox.md b/content/articles/the-birthday-paradox.md
index 1ab1e07..6259677 100644
--- a/content/articles/the-birthday-paradox.md
+++ b/content/articles/the-birthday-paradox.md
@@ -61,36 +61,33 @@ tables:
## Introduction
I share my birthday with my mom. Yesterday while talking her on ~~my~~
-her birthday I asked if she had heard of the [Birthday Paradox][bp]
-(also known as the [Birthday Problem][bp]).
+her birthday I asked if she had heard of the [Birthday Problem][bp]:
-The [birthday problem][bp] goes like this:
+> How many people have to be in a room before the probability that there
+> is a shared birthday is greater than 50%?
-> How many people have to be in a room before the probability that two
-> people in the room share a birthday is greater than 50%?
-
-Even if you already know the answer, the [birthday problem][bp] is a fun
-puzzle because:
+This is called the [Birthday Problem][bp], and the solution is known as
+the *birthday paradox*. It is an interesting problem because:
* The answer is counterintuitive (hence the name *birthday paradox*).
* The solution relies on elements of [combinatorics][], [set theory][],
and [probability][].
-* The implications affect security, particularly [cryptographic
+* The ramifications affect security, particularly [cryptographic
hash algorithms][hash].
The next section covers the math that you'll need. If you're already
-familiar with [probability][], feel free to skip to [Solving the
+familiar with [probability][], feel free to skip ahead to [Solving the
Birthday Problem][solving].
## Probability Crash Course
-Before we get started on the solution to the [Birthday Problem][bp],
-here is a brief crash course in [probability][]:
+Here is an extremely abbreviated summary of of math that you'll need
+from [combinatorics][], [probability][], and [set theory][]:
1. The expression `N!`, where `N` is a non-negative integer, is called a
- [factorial][], and it is means "the product of all the numbers from
- `N` to `1`". So `5! = 5 × 4 × 3 × 2 × 1 = 120`. To make our lives
- easier, `0!` is defined to be `1`.
+ [factorial][], and it is means "the product of all the integers from
+ `N` down to `1`". So `5! = 5 × 4 × 3 × 2 × 1 = 120`. To make our
+ lives easier, `0! = 1` by definition.
2. In [probability][], an occurance is called an *event*, and the
probability of an *event* `A` is denoted `P(A)`.
3. The value of a probability is a real number between `0` and `1`,
@@ -107,12 +104,12 @@ here is a brief crash course in [probability][]:
of three [independent][] events `A`, `B`, and `C` occurring is `P(A ⋂
B ⋂ C) = P(A) × P(B) × P(C)`, and so on. The upside-down U symbols
represent an [intersection][intersection].
-6. The probability of an event `A` **not** occurring is written `P(¬A)`
- and calculated as `1 - P(A)`. This is useful because sometimes it
- is much easier to calculate the probability that an event will not
- occur, as we'll see below.
+6. The probability of an event `A` **not** occurring is written `P(¬A)`,
+ called the *compliment*, and calculated as `1 - P(A)`. This is
+ useful because sometimes it is much easier to calculate the
+ probability that an event will not occur, as we'll see below.
-Here's what all chicanery above like visually:
+Here's what all the chicanery above looks like visually:
{{< figure
src="/files/articles/birthday-paradox/probability-basics.svg"
@@ -130,18 +127,19 @@ For example, if you put one red marble and three blue marbles in a jar
and then randomly choose a single marble out of the jar, then the
probability that you will choose the red marble is:
-```
+```sh
P(R) = probability of choosing a red marble
P(R) = number of red marbles / total number of marbles
P(R) = 1/(1 + 3) = 0.25 = 25%
```
+&nbsp;
Here's a more complicated example: Let's say you put 3 red marbles and 5
blue marbles in a jar, pick a marble from the jar at random, and then
roll a fair, [6-sided die][die]. What is the probability that you will
pick a red marble from the jar **and** roll a 5 or a 6 on the [die][]?
-```
+```sh
P(R) = probability of choosing a red marble
P(R) = number of red marbles / total number of marbles
P(R) = 3/(3 + 5) = 0.375 = 37.5%
@@ -156,77 +154,109 @@ P(R ⋂ D) = 3/8 × 1/3 = 0.125 = 12.5%
## Solving the Birthday Problem
-Let's get back to the problem at hand. I'll save you a bit of time and
-a lot of headache by telling you now that the [birthday problem][bp] is
-one of those pesky probability problems where it easier to calculate the
-opposite; that is, the [probability][] that everyone in the room has a
-*unique* birthday.
+Let's get back to the solution. I'll save you a bit of time and a lot
+of headache by telling you in advance that the [birthday problem][bp] is
+one of those pesky probability problems where it much easier to
+calculate the compliment; that is, the [probability][] that everyone
+has a *unique* birthday.
Now that that piece of bookkeeping is out of the way, let's get started.
Ff there is one person in the room, then the probability that everyone
has a unique birthday is `1.0` (`100%`).
-```
-P1 = P(one person) = 365/365
+```sh
+P1 = P(unique first birthday) = 365/365
P1 = 1.0 = 100%
```
+&nbsp;
If there are two people in the room, the probability that everyone has a
-unique birthday can be calculated by multiplying the probability from
-the previous step by the probability that the second person's birthday
-is not the same as the first person's birthday. Algebraically:
+unique birthday can be calculated by multiplying `P1` -- the probability
+from the previous step -- by the probability that the second person's
+birthday is not the same as the first person's birthday. Algebraically:
-```
-P2 = P(second person has a unique birthday)
+```sh
+P2 = P(unique second birthday)
P2 = P1 × 364/365
P2 = 365/365 × 364/365 = (365 × 364)/(365^2) = 364/365
P2 = ~0.9972 = 99.7%
```
+&nbsp;
-When we a third person, a pattern begins to emerge:
+Add a third person, and a pattern begins to emerge:
-```
-P2 = P(third person has a unique birthday)
+```r
+P3 = P(unique second birthday ⋂ unique third birthday)
P3 = P2 × 363/365
P3 = 365/365 × 364/365 × 363/365 = (365 × 364 × 363)/(365^3)
P3 = ~0.9917 = ~99.2%
```
+&nbsp;
Do you see the pattern?
-The probability that everyone in a room of `N` people has a unique
-birthday is the product of the numbers between `365` and `365 - N`
-(inclusive), divided by `365^N` (365 to the Nth power). In other words:
+The probability that everyone in a group of `N` people has a unique
+birthday is the product of the numbers between `365` and `365 - (N - 1)`
+(inclusive), divided by `365{{< sup "N" >}}`.
+
+To calculate the product of all the numbers between `365` and `365 - (N
+- 1)` in the numerator, we can use a [factorial][] trick to cancel out
+the terms that we don't need:
+```sh
+Π(X, Y) = product of numbers between X and Y (inclusive) where X < Y
+Π(X, Y) = Y!/(X - 1))!
+
+Π((365 - (N - 1)), 365) = 365!/((365 - (N - 1) - 1))!
+Π((365 - (N - 1)), 365) = 365!/((365 + -N + 1 + -1))!
+Π((365 - (N - 1)), 365) = 365!/(365 - N)!
```
-B(N) = P(everyone in a room of N people has a unique birthday)
-B(N) = 365!/((365 - N)! * 365^N)
+&nbsp;
+
+Combined with the previous equation:
+
+```sh
+U(N) = P(N unique birthdays)
+U(N) = 365!/((365 - N)! × 365^N)
```
+&nbsp;
-At this point you can continue and find a precise algebraic solution,
-but often for discrete propabilities it's convenient to write a script
-to do the work for you.
+Then we calculate the compliment of the probability of N unique
+birthdays to determine the probability of shared birthdays in a group of
+N people:
-So I wrote a [Ruby][] script which uses the complement of the algorithm
-derived above to calculate the probability of shared birthdays in a room
-containing `N` for `N ∈ [1, 50]`, and render the results as a table and
-a chart.
+```sh
+S(N) = P(shared birthdays among N people)
+S(N) = P(¬U(N)) = 1 - U(N)
+S(N) = 1 - 365!/((365 - N)! × 365^N)
+```
+&nbsp;
+
+At this point you can set `S(N) = 0.5` and solve for `N` to find an
+exact solution, but often for discrete probabilities it's easier to
+write a script and let a computer do the work for you.
+
+So I wrote a couple of scripts which use the derived algorithm to do the
+following:
-The full script a bit unwieldy for this article, but it is available in
-the [GitHub repository for this article][github].
+1. Calculate the probability of shared birthdays in a room containing
+ `N` people for `N ∈ [1, 50]`.
+2. Exclude intermediate rows in the table to keep it short.
+3. Highlight the relevant results (values of `N` where `S(N) > 0.5`).
+4. Render the bar chart and table shown below.
-Here is the portion of the script for the derived equasion:
+The full script is available in the [companion GitHub
+repository][github]. Here is the [Ruby][] code for our derived
+equation:
```ruby
# memoize 365!
F365 = 365.factorial
#
-# Return the probability that at least one pair of people in a room of N
-# people share a birthday.
-#
-# (This is the formula that we derived in the original article).
+# Calculate the probability of at least one shared birthday in a group
+# of N people.
#
def shared_birthdays(v)
1 - F365 / (365 - v).factorial * (365 ** -v)
@@ -234,8 +264,7 @@ end
```
&nbsp;
-Below is are the script results with the relevant chart bars and table
-rows highlighted:
+Here are the results:
{{< figure
src="/files/articles/birthday-paradox/people-vs-probability.svg"
@@ -247,9 +276,10 @@ rows highlighted:
{{< table "probs" >}}
-So the answer to the birthday problem is as follows: **If there are 23
-people in the room, then the probability of at least one shared birthday
-is greater than 50%**.
+So the answer to the birthday problem is:
+
+**If there are 23 people in the room, then the probability of at least
+one shared birthday is greater than 50%**.
## Implications