aboutsummaryrefslogtreecommitdiff
path: root/content/articles
diff options
context:
space:
mode:
Diffstat (limited to 'content/articles')
-rw-r--r--content/articles/the-birthday-paradox.md303
1 files changed, 303 insertions, 0 deletions
diff --git a/content/articles/the-birthday-paradox.md b/content/articles/the-birthday-paradox.md
new file mode 100644
index 0000000..1ab1e07
--- /dev/null
+++ b/content/articles/the-birthday-paradox.md
@@ -0,0 +1,303 @@
+---
+slug: the-birthday-paradox
+title: "The Birthday Paradox"
+
+# date (optional for articles)
+date: "2021-11-11T06:46:00-04:00"
+
+# draft articles are not visible on live site
+draft: true
+
+# show on articles page
+show: true
+
+# uncomment to show table of contents
+toc: true
+
+tables:
+ probs:
+ cols:
+ - id: num_people
+ name: Number of People
+ tip: Number of people in room.
+ - id: probability
+ name: Probability
+ tip: Probability that at least two people in a room with this many people share a birthday.
+ - id: percent
+ name: Percent
+ tip: Probability represented as a percentage.
+ align: right
+ rows:
+ - num_people: 1
+ probability: 0.0
+ percent: 0.00%
+ - num_people: 2
+ probability: 0.00273
+ percent: 0.27%
+ - num_people: 3
+ probability: 0.0082
+ percent: 0.82%
+ - num_people: 4
+ probability: 0.01635
+ percent: 1.64%
+ - num_people: "..."
+ - num_people: 22
+ probability: 0.47569
+ percent: 47.57%
+ - num_people: 23
+ probability: 0.50729
+ percent: 50.73%
+ _css: has-text-weight-bold has-background-info-light
+ - num_people: 24
+ probability: 0.53834
+ percent: 53.83%
+ _css: has-text-weight-bold has-background-info-light
+ - num_people: 25
+ probability: 0.56869
+ percent: 56.87%
+ _css: has-text-weight-bold has-background-info-light
+---
+
+## Introduction
+
+I share my birthday with my mom. Yesterday while talking her on ~~my~~
+her birthday I asked if she had heard of the [Birthday Paradox][bp]
+(also known as the [Birthday Problem][bp]).
+
+The [birthday problem][bp] goes like this:
+
+> How many people have to be in a room before the probability that two
+> people in the room share a birthday is greater than 50%?
+
+Even if you already know the answer, the [birthday problem][bp] is a fun
+puzzle because:
+
+* The answer is counterintuitive (hence the name *birthday paradox*).
+* The solution relies on elements of [combinatorics][], [set theory][],
+ and [probability][].
+* The implications affect security, particularly [cryptographic
+ hash algorithms][hash].
+
+The next section covers the math that you'll need. If you're already
+familiar with [probability][], feel free to skip to [Solving the
+Birthday Problem][solving].
+
+## Probability Crash Course
+
+Before we get started on the solution to the [Birthday Problem][bp],
+here is a brief crash course in [probability][]:
+
+1. The expression `N!`, where `N` is a non-negative integer, is called a
+ [factorial][], and it is means "the product of all the numbers from
+ `N` to `1`". So `5! = 5 × 4 × 3 × 2 × 1 = 120`. To make our lives
+ easier, `0!` is defined to be `1`.
+2. In [probability][], an occurance is called an *event*, and the
+ probability of an *event* `A` is denoted `P(A)`.
+3. The value of a probability is a real number between `0` and `1`,
+ inclusive (written `P(A) ∈ [0, 1]`). Inclusive means that `0` and
+ `1` are valid probabilities, where `0` means "this event is
+ impossible" and `1` means "this event will occur with certainty".
+4. You can convert a probability to a percentage by multiplying by
+ `100` and appending a trailing `%`. So `P(A) = 0.1` means `10%`,
+ `P(A) = 0.2` means `20%`, and so on.
+5. The probability of two [independent][] events `A` and `B` both
+ occurring is written `P(A ⋂ B)` and read as "the [joint
+ probability][] of A and B". The value is calculated as `P(A) ×
+ P(B)`. This pattern of multiplication continues, so the probability
+ of three [independent][] events `A`, `B`, and `C` occurring is `P(A ⋂
+ B ⋂ C) = P(A) × P(B) × P(C)`, and so on. The upside-down U symbols
+ represent an [intersection][intersection].
+6. The probability of an event `A` **not** occurring is written `P(¬A)`
+ and calculated as `1 - P(A)`. This is useful because sometimes it
+ is much easier to calculate the probability that an event will not
+ occur, as we'll see below.
+
+Here's what all chicanery above like visually:
+
+{{< figure
+ src="/files/articles/birthday-paradox/probability-basics.svg"
+ class="image"
+ width=492
+ height=556
+ caption="Visual representation of probability basics."
+>}}
+
+To calculate a [discrete][] (countable) probability, you sum up all the
+matching events, then divide the sum by the total number of possible
+events.
+
+For example, if you put one red marble and three blue marbles in a jar
+and then randomly choose a single marble out of the jar, then the
+probability that you will choose the red marble is:
+
+```
+P(R) = probability of choosing a red marble
+P(R) = number of red marbles / total number of marbles
+P(R) = 1/(1 + 3) = 0.25 = 25%
+```
+
+Here's a more complicated example: Let's say you put 3 red marbles and 5
+blue marbles in a jar, pick a marble from the jar at random, and then
+roll a fair, [6-sided die][die]. What is the probability that you will
+pick a red marble from the jar **and** roll a 5 or a 6 on the [die][]?
+
+```
+P(R) = probability of choosing a red marble
+P(R) = number of red marbles / total number of marbles
+P(R) = 3/(3 + 5) = 0.375 = 37.5%
+
+P(D) = probability of rolling a 5 or a 6 on the die
+P(D) = number of matching die faces / total number of die faces
+P(D) = 2/6 = ~0.333 = ~33.3%
+
+P(R ⋂ D) = P(R) × P(D)
+P(R ⋂ D) = 3/8 × 1/3 = 0.125 = 12.5%
+```
+
+## Solving the Birthday Problem
+
+Let's get back to the problem at hand. I'll save you a bit of time and
+a lot of headache by telling you now that the [birthday problem][bp] is
+one of those pesky probability problems where it easier to calculate the
+opposite; that is, the [probability][] that everyone in the room has a
+*unique* birthday.
+
+Now that that piece of bookkeeping is out of the way, let's get started.
+
+Ff there is one person in the room, then the probability that everyone
+has a unique birthday is `1.0` (`100%`).
+
+```
+P1 = P(one person) = 365/365
+P1 = 1.0 = 100%
+```
+
+If there are two people in the room, the probability that everyone has a
+unique birthday can be calculated by multiplying the probability from
+the previous step by the probability that the second person's birthday
+is not the same as the first person's birthday. Algebraically:
+
+```
+P2 = P(second person has a unique birthday)
+P2 = P1 × 364/365
+P2 = 365/365 × 364/365 = (365 × 364)/(365^2) = 364/365
+P2 = ~0.9972 = 99.7%
+```
+
+When we a third person, a pattern begins to emerge:
+
+```
+P2 = P(third person has a unique birthday)
+P3 = P2 × 363/365
+P3 = 365/365 × 364/365 × 363/365 = (365 × 364 × 363)/(365^3)
+P3 = ~0.9917 = ~99.2%
+```
+
+Do you see the pattern?
+
+The probability that everyone in a room of `N` people has a unique
+birthday is the product of the numbers between `365` and `365 - N`
+(inclusive), divided by `365^N` (365 to the Nth power). In other words:
+
+```
+B(N) = P(everyone in a room of N people has a unique birthday)
+B(N) = 365!/((365 - N)! * 365^N)
+```
+
+At this point you can continue and find a precise algebraic solution,
+but often for discrete propabilities it's convenient to write a script
+to do the work for you.
+
+So I wrote a [Ruby][] script which uses the complement of the algorithm
+derived above to calculate the probability of shared birthdays in a room
+containing `N` for `N ∈ [1, 50]`, and render the results as a table and
+a chart.
+
+The full script a bit unwieldy for this article, but it is available in
+the [GitHub repository for this article][github].
+
+Here is the portion of the script for the derived equasion:
+
+```ruby
+# memoize 365!
+F365 = 365.factorial
+
+#
+# Return the probability that at least one pair of people in a room of N
+# people share a birthday.
+#
+# (This is the formula that we derived in the original article).
+#
+def shared_birthdays(v)
+ 1 - F365 / (365 - v).factorial * (365 ** -v)
+end
+```
+&nbsp;
+
+Below is are the script results with the relevant chart bars and table
+rows highlighted:
+
+{{< figure
+ src="/files/articles/birthday-paradox/people-vs-probability.svg"
+ class="image"
+ width=614
+ height=461
+ caption="Number of People versus Probability of Shared Birthday."
+>}}
+
+{{< table "probs" >}}
+
+So the answer to the birthday problem is as follows: **If there are 23
+people in the room, then the probability of at least one shared birthday
+is greater than 50%**.
+
+## Implications
+
+This result is called the *birthday paradox* is because it doesn't feel
+intuitive. When asked, most people will guess that a much higher
+number of people (50 or 130) are needed before the probability is
+greater than 50%.
+
+*TODO:* segue into hash algorithms.
+
+## Further Reading
+
+* [GitHub repository for the scripts used in this article][github]
+* [The Birthday Problem (Wikipedia)][bp]
+
+[bp]: https://en.wikipedia.org/wiki/Birthday_problem
+ "The birthday problem."
+[combinatorics]: https://en.wikipedia.org/wiki/Combinatorics
+ "The mathematics of counting."
+[set theory]: https://en.wikipedia.org/wiki/Set_theory
+ "The mathematics of sets."
+[probability]: https://en.wikipedia.org/wiki/Probability
+ "The mathematics of determining how likely an event is to occur."
+[hash]: https://en.wikipedia.org/wiki/Cryptographic_hash_function
+ "Cryptographic hash algorithm."
+[intersection]: https://en.wikipedia.org/wiki/Intersection_(set_theory)
+ "Intersection."
+[independent]: https://en.wikipedia.org/wiki/Probability#Independent_events
+ "Independent events in probability theory."
+[discrete]: https://en.wikipedia.org/wiki/Random_variable#Discrete_random_variable
+ "A random value with a countable number of possible values."
+[joint probability]: https://en.wikipedia.org/wiki/Joint_probability_distribution#Coin_flips
+ "Joint probability."
+[die]: https://en.wikipedia.org/wiki/Dice
+ "Dice."
+[factorial]: https://en.wikipedia.org/wiki/Factorial
+ "Factorial unary operator."
+[ruby]: https://ruby-lang.org/
+ "Ruby programming language."
+[csv]: https://en.wikipedia.org/wiki/Comma-separated_values
+ "Comma-separated values."
+[html]: https://en.wikipedia.org/wiki/HTML
+ "HyperText Markup Language"
+[yaml]: https://en.wikipedia.org/wiki/YAML
+ "YAML Ain't a Markup Language"
+[hugo-table-shortcode]: https://github.com/pablotron/hugo-shortcode-table
+ "Table shortcode for Hugo."
+[github]: https://github.com/pablotron/birthday-paradox
+ "GitHub repository for scripts used in this article."
+[solving]: #solving-the-birthday-problem
+ "Solving the Birthday Problem"