diff options
Diffstat (limited to 'content/articles')
-rw-r--r-- | content/articles/the-birthday-paradox.md | 303 |
1 files changed, 303 insertions, 0 deletions
diff --git a/content/articles/the-birthday-paradox.md b/content/articles/the-birthday-paradox.md new file mode 100644 index 0000000..1ab1e07 --- /dev/null +++ b/content/articles/the-birthday-paradox.md @@ -0,0 +1,303 @@ +--- +slug: the-birthday-paradox +title: "The Birthday Paradox" + +# date (optional for articles) +date: "2021-11-11T06:46:00-04:00" + +# draft articles are not visible on live site +draft: true + +# show on articles page +show: true + +# uncomment to show table of contents +toc: true + +tables: + probs: + cols: + - id: num_people + name: Number of People + tip: Number of people in room. + - id: probability + name: Probability + tip: Probability that at least two people in a room with this many people share a birthday. + - id: percent + name: Percent + tip: Probability represented as a percentage. + align: right + rows: + - num_people: 1 + probability: 0.0 + percent: 0.00% + - num_people: 2 + probability: 0.00273 + percent: 0.27% + - num_people: 3 + probability: 0.0082 + percent: 0.82% + - num_people: 4 + probability: 0.01635 + percent: 1.64% + - num_people: "..." + - num_people: 22 + probability: 0.47569 + percent: 47.57% + - num_people: 23 + probability: 0.50729 + percent: 50.73% + _css: has-text-weight-bold has-background-info-light + - num_people: 24 + probability: 0.53834 + percent: 53.83% + _css: has-text-weight-bold has-background-info-light + - num_people: 25 + probability: 0.56869 + percent: 56.87% + _css: has-text-weight-bold has-background-info-light +--- + +## Introduction + +I share my birthday with my mom. Yesterday while talking her on ~~my~~ +her birthday I asked if she had heard of the [Birthday Paradox][bp] +(also known as the [Birthday Problem][bp]). + +The [birthday problem][bp] goes like this: + +> How many people have to be in a room before the probability that two +> people in the room share a birthday is greater than 50%? + +Even if you already know the answer, the [birthday problem][bp] is a fun +puzzle because: + +* The answer is counterintuitive (hence the name *birthday paradox*). +* The solution relies on elements of [combinatorics][], [set theory][], + and [probability][]. +* The implications affect security, particularly [cryptographic + hash algorithms][hash]. + +The next section covers the math that you'll need. If you're already +familiar with [probability][], feel free to skip to [Solving the +Birthday Problem][solving]. + +## Probability Crash Course + +Before we get started on the solution to the [Birthday Problem][bp], +here is a brief crash course in [probability][]: + +1. The expression `N!`, where `N` is a non-negative integer, is called a + [factorial][], and it is means "the product of all the numbers from + `N` to `1`". So `5! = 5 × 4 × 3 × 2 × 1 = 120`. To make our lives + easier, `0!` is defined to be `1`. +2. In [probability][], an occurance is called an *event*, and the + probability of an *event* `A` is denoted `P(A)`. +3. The value of a probability is a real number between `0` and `1`, + inclusive (written `P(A) ∈ [0, 1]`). Inclusive means that `0` and + `1` are valid probabilities, where `0` means "this event is + impossible" and `1` means "this event will occur with certainty". +4. You can convert a probability to a percentage by multiplying by + `100` and appending a trailing `%`. So `P(A) = 0.1` means `10%`, + `P(A) = 0.2` means `20%`, and so on. +5. The probability of two [independent][] events `A` and `B` both + occurring is written `P(A ⋂ B)` and read as "the [joint + probability][] of A and B". The value is calculated as `P(A) × + P(B)`. This pattern of multiplication continues, so the probability + of three [independent][] events `A`, `B`, and `C` occurring is `P(A ⋂ + B ⋂ C) = P(A) × P(B) × P(C)`, and so on. The upside-down U symbols + represent an [intersection][intersection]. +6. The probability of an event `A` **not** occurring is written `P(¬A)` + and calculated as `1 - P(A)`. This is useful because sometimes it + is much easier to calculate the probability that an event will not + occur, as we'll see below. + +Here's what all chicanery above like visually: + +{{< figure + src="/files/articles/birthday-paradox/probability-basics.svg" + class="image" + width=492 + height=556 + caption="Visual representation of probability basics." +>}} + +To calculate a [discrete][] (countable) probability, you sum up all the +matching events, then divide the sum by the total number of possible +events. + +For example, if you put one red marble and three blue marbles in a jar +and then randomly choose a single marble out of the jar, then the +probability that you will choose the red marble is: + +``` +P(R) = probability of choosing a red marble +P(R) = number of red marbles / total number of marbles +P(R) = 1/(1 + 3) = 0.25 = 25% +``` + +Here's a more complicated example: Let's say you put 3 red marbles and 5 +blue marbles in a jar, pick a marble from the jar at random, and then +roll a fair, [6-sided die][die]. What is the probability that you will +pick a red marble from the jar **and** roll a 5 or a 6 on the [die][]? + +``` +P(R) = probability of choosing a red marble +P(R) = number of red marbles / total number of marbles +P(R) = 3/(3 + 5) = 0.375 = 37.5% + +P(D) = probability of rolling a 5 or a 6 on the die +P(D) = number of matching die faces / total number of die faces +P(D) = 2/6 = ~0.333 = ~33.3% + +P(R ⋂ D) = P(R) × P(D) +P(R ⋂ D) = 3/8 × 1/3 = 0.125 = 12.5% +``` + +## Solving the Birthday Problem + +Let's get back to the problem at hand. I'll save you a bit of time and +a lot of headache by telling you now that the [birthday problem][bp] is +one of those pesky probability problems where it easier to calculate the +opposite; that is, the [probability][] that everyone in the room has a +*unique* birthday. + +Now that that piece of bookkeeping is out of the way, let's get started. + +Ff there is one person in the room, then the probability that everyone +has a unique birthday is `1.0` (`100%`). + +``` +P1 = P(one person) = 365/365 +P1 = 1.0 = 100% +``` + +If there are two people in the room, the probability that everyone has a +unique birthday can be calculated by multiplying the probability from +the previous step by the probability that the second person's birthday +is not the same as the first person's birthday. Algebraically: + +``` +P2 = P(second person has a unique birthday) +P2 = P1 × 364/365 +P2 = 365/365 × 364/365 = (365 × 364)/(365^2) = 364/365 +P2 = ~0.9972 = 99.7% +``` + +When we a third person, a pattern begins to emerge: + +``` +P2 = P(third person has a unique birthday) +P3 = P2 × 363/365 +P3 = 365/365 × 364/365 × 363/365 = (365 × 364 × 363)/(365^3) +P3 = ~0.9917 = ~99.2% +``` + +Do you see the pattern? + +The probability that everyone in a room of `N` people has a unique +birthday is the product of the numbers between `365` and `365 - N` +(inclusive), divided by `365^N` (365 to the Nth power). In other words: + +``` +B(N) = P(everyone in a room of N people has a unique birthday) +B(N) = 365!/((365 - N)! * 365^N) +``` + +At this point you can continue and find a precise algebraic solution, +but often for discrete propabilities it's convenient to write a script +to do the work for you. + +So I wrote a [Ruby][] script which uses the complement of the algorithm +derived above to calculate the probability of shared birthdays in a room +containing `N` for `N ∈ [1, 50]`, and render the results as a table and +a chart. + +The full script a bit unwieldy for this article, but it is available in +the [GitHub repository for this article][github]. + +Here is the portion of the script for the derived equasion: + +```ruby +# memoize 365! +F365 = 365.factorial + +# +# Return the probability that at least one pair of people in a room of N +# people share a birthday. +# +# (This is the formula that we derived in the original article). +# +def shared_birthdays(v) + 1 - F365 / (365 - v).factorial * (365 ** -v) +end +``` + + +Below is are the script results with the relevant chart bars and table +rows highlighted: + +{{< figure + src="/files/articles/birthday-paradox/people-vs-probability.svg" + class="image" + width=614 + height=461 + caption="Number of People versus Probability of Shared Birthday." +>}} + +{{< table "probs" >}} + +So the answer to the birthday problem is as follows: **If there are 23 +people in the room, then the probability of at least one shared birthday +is greater than 50%**. + +## Implications + +This result is called the *birthday paradox* is because it doesn't feel +intuitive. When asked, most people will guess that a much higher +number of people (50 or 130) are needed before the probability is +greater than 50%. + +*TODO:* segue into hash algorithms. + +## Further Reading + +* [GitHub repository for the scripts used in this article][github] +* [The Birthday Problem (Wikipedia)][bp] + +[bp]: https://en.wikipedia.org/wiki/Birthday_problem + "The birthday problem." +[combinatorics]: https://en.wikipedia.org/wiki/Combinatorics + "The mathematics of counting." +[set theory]: https://en.wikipedia.org/wiki/Set_theory + "The mathematics of sets." +[probability]: https://en.wikipedia.org/wiki/Probability + "The mathematics of determining how likely an event is to occur." +[hash]: https://en.wikipedia.org/wiki/Cryptographic_hash_function + "Cryptographic hash algorithm." +[intersection]: https://en.wikipedia.org/wiki/Intersection_(set_theory) + "Intersection." +[independent]: https://en.wikipedia.org/wiki/Probability#Independent_events + "Independent events in probability theory." +[discrete]: https://en.wikipedia.org/wiki/Random_variable#Discrete_random_variable + "A random value with a countable number of possible values." +[joint probability]: https://en.wikipedia.org/wiki/Joint_probability_distribution#Coin_flips + "Joint probability." +[die]: https://en.wikipedia.org/wiki/Dice + "Dice." +[factorial]: https://en.wikipedia.org/wiki/Factorial + "Factorial unary operator." +[ruby]: https://ruby-lang.org/ + "Ruby programming language." +[csv]: https://en.wikipedia.org/wiki/Comma-separated_values + "Comma-separated values." +[html]: https://en.wikipedia.org/wiki/HTML + "HyperText Markup Language" +[yaml]: https://en.wikipedia.org/wiki/YAML + "YAML Ain't a Markup Language" +[hugo-table-shortcode]: https://github.com/pablotron/hugo-shortcode-table + "Table shortcode for Hugo." +[github]: https://github.com/pablotron/birthday-paradox + "GitHub repository for scripts used in this article." +[solving]: #solving-the-birthday-problem + "Solving the Birthday Problem" |