diff options
-rw-r--r-- | content/articles/the-birthday-paradox.md | 152 |
1 files changed, 91 insertions, 61 deletions
diff --git a/content/articles/the-birthday-paradox.md b/content/articles/the-birthday-paradox.md index 1ab1e07..6259677 100644 --- a/content/articles/the-birthday-paradox.md +++ b/content/articles/the-birthday-paradox.md @@ -61,36 +61,33 @@ tables: ## Introduction I share my birthday with my mom. Yesterday while talking her on ~~my~~ -her birthday I asked if she had heard of the [Birthday Paradox][bp] -(also known as the [Birthday Problem][bp]). +her birthday I asked if she had heard of the [Birthday Problem][bp]: -The [birthday problem][bp] goes like this: +> How many people have to be in a room before the probability that there +> is a shared birthday is greater than 50%? -> How many people have to be in a room before the probability that two -> people in the room share a birthday is greater than 50%? - -Even if you already know the answer, the [birthday problem][bp] is a fun -puzzle because: +This is called the [Birthday Problem][bp], and the solution is known as +the *birthday paradox*. It is an interesting problem because: * The answer is counterintuitive (hence the name *birthday paradox*). * The solution relies on elements of [combinatorics][], [set theory][], and [probability][]. -* The implications affect security, particularly [cryptographic +* The ramifications affect security, particularly [cryptographic hash algorithms][hash]. The next section covers the math that you'll need. If you're already -familiar with [probability][], feel free to skip to [Solving the +familiar with [probability][], feel free to skip ahead to [Solving the Birthday Problem][solving]. ## Probability Crash Course -Before we get started on the solution to the [Birthday Problem][bp], -here is a brief crash course in [probability][]: +Here is an extremely abbreviated summary of of math that you'll need +from [combinatorics][], [probability][], and [set theory][]: 1. The expression `N!`, where `N` is a non-negative integer, is called a - [factorial][], and it is means "the product of all the numbers from - `N` to `1`". So `5! = 5 × 4 × 3 × 2 × 1 = 120`. To make our lives - easier, `0!` is defined to be `1`. + [factorial][], and it is means "the product of all the integers from + `N` down to `1`". So `5! = 5 × 4 × 3 × 2 × 1 = 120`. To make our + lives easier, `0! = 1` by definition. 2. In [probability][], an occurance is called an *event*, and the probability of an *event* `A` is denoted `P(A)`. 3. The value of a probability is a real number between `0` and `1`, @@ -107,12 +104,12 @@ here is a brief crash course in [probability][]: of three [independent][] events `A`, `B`, and `C` occurring is `P(A ⋂ B ⋂ C) = P(A) × P(B) × P(C)`, and so on. The upside-down U symbols represent an [intersection][intersection]. -6. The probability of an event `A` **not** occurring is written `P(¬A)` - and calculated as `1 - P(A)`. This is useful because sometimes it - is much easier to calculate the probability that an event will not - occur, as we'll see below. +6. The probability of an event `A` **not** occurring is written `P(¬A)`, + called the *compliment*, and calculated as `1 - P(A)`. This is + useful because sometimes it is much easier to calculate the + probability that an event will not occur, as we'll see below. -Here's what all chicanery above like visually: +Here's what all the chicanery above looks like visually: {{< figure src="/files/articles/birthday-paradox/probability-basics.svg" @@ -130,18 +127,19 @@ For example, if you put one red marble and three blue marbles in a jar and then randomly choose a single marble out of the jar, then the probability that you will choose the red marble is: -``` +```sh P(R) = probability of choosing a red marble P(R) = number of red marbles / total number of marbles P(R) = 1/(1 + 3) = 0.25 = 25% ``` + Here's a more complicated example: Let's say you put 3 red marbles and 5 blue marbles in a jar, pick a marble from the jar at random, and then roll a fair, [6-sided die][die]. What is the probability that you will pick a red marble from the jar **and** roll a 5 or a 6 on the [die][]? -``` +```sh P(R) = probability of choosing a red marble P(R) = number of red marbles / total number of marbles P(R) = 3/(3 + 5) = 0.375 = 37.5% @@ -156,77 +154,109 @@ P(R ⋂ D) = 3/8 × 1/3 = 0.125 = 12.5% ## Solving the Birthday Problem -Let's get back to the problem at hand. I'll save you a bit of time and -a lot of headache by telling you now that the [birthday problem][bp] is -one of those pesky probability problems where it easier to calculate the -opposite; that is, the [probability][] that everyone in the room has a -*unique* birthday. +Let's get back to the solution. I'll save you a bit of time and a lot +of headache by telling you in advance that the [birthday problem][bp] is +one of those pesky probability problems where it much easier to +calculate the compliment; that is, the [probability][] that everyone +has a *unique* birthday. Now that that piece of bookkeeping is out of the way, let's get started. Ff there is one person in the room, then the probability that everyone has a unique birthday is `1.0` (`100%`). -``` -P1 = P(one person) = 365/365 +```sh +P1 = P(unique first birthday) = 365/365 P1 = 1.0 = 100% ``` + If there are two people in the room, the probability that everyone has a -unique birthday can be calculated by multiplying the probability from -the previous step by the probability that the second person's birthday -is not the same as the first person's birthday. Algebraically: +unique birthday can be calculated by multiplying `P1` -- the probability +from the previous step -- by the probability that the second person's +birthday is not the same as the first person's birthday. Algebraically: -``` -P2 = P(second person has a unique birthday) +```sh +P2 = P(unique second birthday) P2 = P1 × 364/365 P2 = 365/365 × 364/365 = (365 × 364)/(365^2) = 364/365 P2 = ~0.9972 = 99.7% ``` + -When we a third person, a pattern begins to emerge: +Add a third person, and a pattern begins to emerge: -``` -P2 = P(third person has a unique birthday) +```r +P3 = P(unique second birthday ⋂ unique third birthday) P3 = P2 × 363/365 P3 = 365/365 × 364/365 × 363/365 = (365 × 364 × 363)/(365^3) P3 = ~0.9917 = ~99.2% ``` + Do you see the pattern? -The probability that everyone in a room of `N` people has a unique -birthday is the product of the numbers between `365` and `365 - N` -(inclusive), divided by `365^N` (365 to the Nth power). In other words: +The probability that everyone in a group of `N` people has a unique +birthday is the product of the numbers between `365` and `365 - (N - 1)` +(inclusive), divided by `365{{< sup "N" >}}`. + +To calculate the product of all the numbers between `365` and `365 - (N +- 1)` in the numerator, we can use a [factorial][] trick to cancel out +the terms that we don't need: +```sh +Π(X, Y) = product of numbers between X and Y (inclusive) where X < Y +Π(X, Y) = Y!/(X - 1))! + +Π((365 - (N - 1)), 365) = 365!/((365 - (N - 1) - 1))! +Π((365 - (N - 1)), 365) = 365!/((365 + -N + 1 + -1))! +Π((365 - (N - 1)), 365) = 365!/(365 - N)! ``` -B(N) = P(everyone in a room of N people has a unique birthday) -B(N) = 365!/((365 - N)! * 365^N) + + +Combined with the previous equation: + +```sh +U(N) = P(N unique birthdays) +U(N) = 365!/((365 - N)! × 365^N) ``` + -At this point you can continue and find a precise algebraic solution, -but often for discrete propabilities it's convenient to write a script -to do the work for you. +Then we calculate the compliment of the probability of N unique +birthdays to determine the probability of shared birthdays in a group of +N people: -So I wrote a [Ruby][] script which uses the complement of the algorithm -derived above to calculate the probability of shared birthdays in a room -containing `N` for `N ∈ [1, 50]`, and render the results as a table and -a chart. +```sh +S(N) = P(shared birthdays among N people) +S(N) = P(¬U(N)) = 1 - U(N) +S(N) = 1 - 365!/((365 - N)! × 365^N) +``` + + +At this point you can set `S(N) = 0.5` and solve for `N` to find an +exact solution, but often for discrete probabilities it's easier to +write a script and let a computer do the work for you. + +So I wrote a couple of scripts which use the derived algorithm to do the +following: -The full script a bit unwieldy for this article, but it is available in -the [GitHub repository for this article][github]. +1. Calculate the probability of shared birthdays in a room containing + `N` people for `N ∈ [1, 50]`. +2. Exclude intermediate rows in the table to keep it short. +3. Highlight the relevant results (values of `N` where `S(N) > 0.5`). +4. Render the bar chart and table shown below. -Here is the portion of the script for the derived equasion: +The full script is available in the [companion GitHub +repository][github]. Here is the [Ruby][] code for our derived +equation: ```ruby # memoize 365! F365 = 365.factorial # -# Return the probability that at least one pair of people in a room of N -# people share a birthday. -# -# (This is the formula that we derived in the original article). +# Calculate the probability of at least one shared birthday in a group +# of N people. # def shared_birthdays(v) 1 - F365 / (365 - v).factorial * (365 ** -v) @@ -234,8 +264,7 @@ end ``` -Below is are the script results with the relevant chart bars and table -rows highlighted: +Here are the results: {{< figure src="/files/articles/birthday-paradox/people-vs-probability.svg" @@ -247,9 +276,10 @@ rows highlighted: {{< table "probs" >}} -So the answer to the birthday problem is as follows: **If there are 23 -people in the room, then the probability of at least one shared birthday -is greater than 50%**. +So the answer to the birthday problem is: + +**If there are 23 people in the room, then the probability of at least +one shared birthday is greater than 50%**. ## Implications |