--- slug: the-birthday-paradox title: "The Birthday Paradox" # date (optional for articles) date: "2021-11-11T06:46:00-04:00" # draft articles are not visible on live site draft: true # show on articles page show: true # uncomment to show table of contents toc: true tables: probs: cols: - id: num_people name: Number of People tip: Number of people in room. - id: probability name: Probability tip: Probability that at least two people in a room with this many people share a birthday. - id: percent name: Percent tip: Probability represented as a percentage. align: right rows: - num_people: 1 probability: 0.0 percent: 0.00% - num_people: 2 probability: 0.00273 percent: 0.27% - num_people: 3 probability: 0.0082 percent: 0.82% - num_people: 4 probability: 0.01635 percent: 1.64% - num_people: "..." - num_people: 22 probability: 0.47569 percent: 47.57% - num_people: 23 probability: 0.50729 percent: 50.73% _css: has-text-weight-bold has-background-info-light - num_people: 24 probability: 0.53834 percent: 53.83% _css: has-text-weight-bold has-background-info-light - num_people: 25 probability: 0.56869 percent: 56.87% _css: has-text-weight-bold has-background-info-light --- ## Introduction I share my birthday with my mom. Yesterday while talking her on ~~my~~ her birthday I asked if she had heard of the [Birthday Paradox][bp] (also known as the [Birthday Problem][bp]). The [birthday problem][bp] goes like this: > How many people have to be in a room before the probability that two > people in the room share a birthday is greater than 50%? Even if you already know the answer, the [birthday problem][bp] is a fun puzzle because: * The answer is counterintuitive (hence the name *birthday paradox*). * The solution relies on elements of [combinatorics][], [set theory][], and [probability][]. * The implications affect security, particularly [cryptographic hash algorithms][hash]. The next section covers the math that you'll need. If you're already familiar with [probability][], feel free to skip to [Solving the Birthday Problem][solving]. ## Probability Crash Course Before we get started on the solution to the [Birthday Problem][bp], here is a brief crash course in [probability][]: 1. The expression `N!`, where `N` is a non-negative integer, is called a [factorial][], and it is means "the product of all the numbers from `N` to `1`". So `5! = 5 × 4 × 3 × 2 × 1 = 120`. To make our lives easier, `0!` is defined to be `1`. 2. In [probability][], an occurance is called an *event*, and the probability of an *event* `A` is denoted `P(A)`. 3. The value of a probability is a real number between `0` and `1`, inclusive (written `P(A) ∈ [0, 1]`). Inclusive means that `0` and `1` are valid probabilities, where `0` means "this event is impossible" and `1` means "this event will occur with certainty". 4. You can convert a probability to a percentage by multiplying by `100` and appending a trailing `%`. So `P(A) = 0.1` means `10%`, `P(A) = 0.2` means `20%`, and so on. 5. The probability of two [independent][] events `A` and `B` both occurring is written `P(A ⋂ B)` and read as "the [joint probability][] of A and B". The value is calculated as `P(A) × P(B)`. This pattern of multiplication continues, so the probability of three [independent][] events `A`, `B`, and `C` occurring is `P(A ⋂ B ⋂ C) = P(A) × P(B) × P(C)`, and so on. The upside-down U symbols represent an [intersection][intersection]. 6. The probability of an event `A` **not** occurring is written `P(¬A)` and calculated as `1 - P(A)`. This is useful because sometimes it is much easier to calculate the probability that an event will not occur, as we'll see below. Here's what all chicanery above like visually: {{< figure src="/files/articles/birthday-paradox/probability-basics.svg" class="image" width=492 height=556 caption="Visual representation of probability basics." >}} To calculate a [discrete][] (countable) probability, you sum up all the matching events, then divide the sum by the total number of possible events. For example, if you put one red marble and three blue marbles in a jar and then randomly choose a single marble out of the jar, then the probability that you will choose the red marble is: ``` P(R) = probability of choosing a red marble P(R) = number of red marbles / total number of marbles P(R) = 1/(1 + 3) = 0.25 = 25% ``` Here's a more complicated example: Let's say you put 3 red marbles and 5 blue marbles in a jar, pick a marble from the jar at random, and then roll a fair, [6-sided die][die]. What is the probability that you will pick a red marble from the jar **and** roll a 5 or a 6 on the [die][]? ``` P(R) = probability of choosing a red marble P(R) = number of red marbles / total number of marbles P(R) = 3/(3 + 5) = 0.375 = 37.5% P(D) = probability of rolling a 5 or a 6 on the die P(D) = number of matching die faces / total number of die faces P(D) = 2/6 = ~0.333 = ~33.3% P(R ⋂ D) = P(R) × P(D) P(R ⋂ D) = 3/8 × 1/3 = 0.125 = 12.5% ``` ## Solving the Birthday Problem Let's get back to the problem at hand. I'll save you a bit of time and a lot of headache by telling you now that the [birthday problem][bp] is one of those pesky probability problems where it easier to calculate the opposite; that is, the [probability][] that everyone in the room has a *unique* birthday. Now that that piece of bookkeeping is out of the way, let's get started. Ff there is one person in the room, then the probability that everyone has a unique birthday is `1.0` (`100%`). ``` P1 = P(one person) = 365/365 P1 = 1.0 = 100% ``` If there are two people in the room, the probability that everyone has a unique birthday can be calculated by multiplying the probability from the previous step by the probability that the second person's birthday is not the same as the first person's birthday. Algebraically: ``` P2 = P(second person has a unique birthday) P2 = P1 × 364/365 P2 = 365/365 × 364/365 = (365 × 364)/(365^2) = 364/365 P2 = ~0.9972 = 99.7% ``` When we a third person, a pattern begins to emerge: ``` P2 = P(third person has a unique birthday) P3 = P2 × 363/365 P3 = 365/365 × 364/365 × 363/365 = (365 × 364 × 363)/(365^3) P3 = ~0.9917 = ~99.2% ``` Do you see the pattern? The probability that everyone in a room of `N` people has a unique birthday is the product of the numbers between `365` and `365 - N` (inclusive), divided by `365^N` (365 to the Nth power). In other words: ``` B(N) = P(everyone in a room of N people has a unique birthday) B(N) = 365!/((365 - N)! * 365^N) ``` At this point you can continue and find a precise algebraic solution, but often for discrete propabilities it's convenient to write a script to do the work for you. So I wrote a [Ruby][] script which uses the complement of the algorithm derived above to calculate the probability of shared birthdays in a room containing `N` for `N ∈ [1, 50]`, and render the results as a table and a chart. The full script a bit unwieldy for this article, but it is available in the [GitHub repository for this article][github]. Here is the portion of the script for the derived equasion: ```ruby # memoize 365! F365 = 365.factorial # # Return the probability that at least one pair of people in a room of N # people share a birthday. # # (This is the formula that we derived in the original article). # def shared_birthdays(v) 1 - F365 / (365 - v).factorial * (365 ** -v) end ```   Below is are the script results with the relevant chart bars and table rows highlighted: {{< figure src="/files/articles/birthday-paradox/people-vs-probability.svg" class="image" width=614 height=461 caption="Number of People versus Probability of Shared Birthday." >}} {{< table "probs" >}} So the answer to the birthday problem is as follows: **If there are 23 people in the room, then the probability of at least one shared birthday is greater than 50%**. ## Implications This result is called the *birthday paradox* is because it doesn't feel intuitive. When asked, most people will guess that a much higher number of people (50 or 130) are needed before the probability is greater than 50%. *TODO:* segue into hash algorithms. ## Further Reading * [GitHub repository for the scripts used in this article][github] * [The Birthday Problem (Wikipedia)][bp] [bp]: https://en.wikipedia.org/wiki/Birthday_problem "The birthday problem." [combinatorics]: https://en.wikipedia.org/wiki/Combinatorics "The mathematics of counting." [set theory]: https://en.wikipedia.org/wiki/Set_theory "The mathematics of sets." [probability]: https://en.wikipedia.org/wiki/Probability "The mathematics of determining how likely an event is to occur." [hash]: https://en.wikipedia.org/wiki/Cryptographic_hash_function "Cryptographic hash algorithm." [intersection]: https://en.wikipedia.org/wiki/Intersection_(set_theory) "Intersection." [independent]: https://en.wikipedia.org/wiki/Probability#Independent_events "Independent events in probability theory." [discrete]: https://en.wikipedia.org/wiki/Random_variable#Discrete_random_variable "A random value with a countable number of possible values." [joint probability]: https://en.wikipedia.org/wiki/Joint_probability_distribution#Coin_flips "Joint probability." [die]: https://en.wikipedia.org/wiki/Dice "Dice." [factorial]: https://en.wikipedia.org/wiki/Factorial "Factorial unary operator." [ruby]: https://ruby-lang.org/ "Ruby programming language." [csv]: https://en.wikipedia.org/wiki/Comma-separated_values "Comma-separated values." [html]: https://en.wikipedia.org/wiki/HTML "HyperText Markup Language" [yaml]: https://en.wikipedia.org/wiki/YAML "YAML Ain't a Markup Language" [hugo-table-shortcode]: https://github.com/pablotron/hugo-shortcode-table "Table shortcode for Hugo." [github]: https://github.com/pablotron/birthday-paradox "GitHub repository for scripts used in this article." [solving]: #solving-the-birthday-problem "Solving the Birthday Problem"