Boy or girl paradox

This is an old revision of this page, as edited by AndyBloch (talk | contribs) at 22:19, 23 January 2009 (Frequentist approach). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The Boy or Girl problem is a well-known example in probability theory:

  • A random two-child family whose older child is a boy is chosen. What is the probability that the younger child is a girl? (Or: choose a two-child family at random, given that the older child is a boy. What is the probability that the other child is a girl?)
  • A random two-child family with at least one boy is chosen. What is the probability that it has a girl? (Or: choose a two-child family at random, given that at least one child is a boy. What is the probability that the other child is a girl?)

Investigation of these questions reveals that their answers are very different:

  • In the first case, there are two equally probable possibilities: the younger child is a boy or a girl. The answer is 1/2.
  • In the second case, there are three equally probable ways in which at least one child can be a boy: only the older one, only the younger one, or both. Two of the three include a girl, so the answer is 2/3.

Common assumptions

There are four possible combinations of children. Labeling boys B and girls G, and using the first letter to represent the older child, the possible combinations are:

{BB, BG, GB, GG}.

These four possibilities are taken to be equally likely a priori. This follows from three assumptions:

  1. That the determination of the sex of each child is an independent event.
  2. That each child is either male or female.
  3. That each child has the same chance of being male as of being female.

It is worth noting that these conditions form an incomplete model. By following these rules, we ignore the possibility that a child is intersex, the fact that the ratio of boys to girls is not exactly 50:50, and (amongst other factors) the possibility of identical twins, which means that sex determination is not entirely independent. However, each of these exceptions is rare enough to have little effect on our simple analysis of the general population.
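
As a small illustration of these assumptions, the sketch below builds the four-element sample space and assigns each combination its probability. The names `P_CHILD` and `sample_space` are chosen here for illustration only; the 1/2 probabilities are exactly the simplifying assumptions listed above.

```python
from fractions import Fraction
from itertools import product

# Assumption: each child is independently a boy ("B") or a girl ("G"),
# each with probability 1/2.
P_CHILD = {"B": Fraction(1, 2), "G": Fraction(1, 2)}

def sample_space():
    """Map each (older, younger) combination to its probability."""
    return {
        older + younger: P_CHILD[older] * P_CHILD[younger]
        for older, younger in product("BG", repeat=2)
    }

space = sample_space()
for family, prob in space.items():
    print(family, prob)  # each of BB, BG, GB, GG has probability 1/4
```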

First question

  • A random two-child family whose older child is a boy is chosen. What is the probability that the younger child is a girl?

When the older child is a boy, the elements {GG} and {GB} of the original sample space are impossible and must be deleted, so that the problem reduces to:

Older child    Younger child    Status
Girl           Girl             eliminated
Girl           Boy              eliminated
Boy            Girl             remains
Boy            Boy              remains

Or, the set {BG, BB}.

Since both of the two possibilities in the new sample space {BG, BB} are equally likely, and only one of the two, BG, includes a girl, the probability that the younger child is a girl is 1/2.
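
The counting argument for the first question can be sketched in a few lines. This assumes the four equally likely combinations from the common assumptions; the helper name `conditional_probability` is illustrative.

```python
from fractions import Fraction

# Equally likely (older, younger) combinations; first letter = older child.
FAMILIES = ["BB", "BG", "GB", "GG"]

def conditional_probability(event, given):
    """P(event | given), by counting equally likely families."""
    kept = [f for f in FAMILIES if given(f)]   # reduced sample space
    hits = [f for f in kept if event(f)]       # favourable cases within it
    return Fraction(len(hits), len(kept))

def older_is_boy(family):
    return family[0] == "B"

def younger_is_girl(family):
    return family[1] == "G"

p_first = conditional_probability(younger_is_girl, older_is_boy)
print(p_first)  # 1/2
```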

Second question

  • A random two-child family with at least one boy is chosen. What is the probability that it has a girl?

An equivalent and perhaps clearer way of stating the problem is "Excluding the case of two girls, what is the probability that two random children are of different gender?"

Neither order nor age is important. There are four possible child combinations for a two-child family as seen in the sample space above. Three of these families meet the criteria of having at least one boy. The set of possibilities (possible combinations of children that meet the given criteria) is:

Older child    Younger child    Status
Girl           Girl             eliminated
Girl           Boy              remains
Boy            Girl             remains
Boy            Boy              remains

Or, the set {GB, BG, BB}, in which two out of the three possibilities include a girl.

Therefore the probability is 2/3.

Bayesian approach

Consider the sample space of 2-child families.

  • Let X be the event that the family has one boy and one girl.
  • Let Y be the event that the family has at least one boy.
  • Then, since P(Y|X) = 1, P(X) = 1/2 and P(Y) = 3/4:
    • $P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)} = \frac{1 \cdot 1/2}{3/4} = \frac{2}{3}$
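
The Bayesian calculation for the second question, using the events X and Y defined in this section, can be checked numerically. This is a minimal sketch under the same equal-likelihood assumptions; the function names `prob`, `x` and `y` are illustrative.

```python
from fractions import Fraction

# Equally likely (older, younger) combinations; first letter = older child.
FAMILIES = ["BB", "BG", "GB", "GG"]

def prob(event):
    """Probability of an event over the equally likely families."""
    return Fraction(sum(1 for f in FAMILIES if event(f)), len(FAMILIES))

def x(family):
    """X: the family has one boy and one girl."""
    return set(family) == {"B", "G"}

def y(family):
    """Y: the family has at least one boy."""
    return "B" in family

p_x = prob(x)                                        # 1/2
p_y = prob(y)                                        # 3/4
p_y_given_x = prob(lambda f: x(f) and y(f)) / p_x    # 1

# Bayes' theorem: P(X|Y) = P(Y|X) P(X) / P(Y)
p_x_given_y = p_y_given_x * p_x / p_y
print(p_x_given_y)  # 2/3
```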

Third question

  • A random two-child family with at least one boy whose name is Jacob is chosen. What is the probability that it has a girl?

(It is allowed that a two-boy family has two Jacobs.) Does the additional bit of information that the boy's name is Jacob change anything?

Older child    Younger child    Status
Girl           Girl             eliminated
Boy            Boy              eliminated
Girl           Jacob            remains
Jacob          Girl             remains
Jacob          Boy              remains
Boy            Jacob            remains

Or, the set {GJ, JG, JB, BJ}, in which two out of the four possibilities include a girl.

Therefore we might think that the probability returns to 1/2. But this reasoning is flawed because it doesn't take into account the different frequencies of these outcomes. A boy being named Jacob and a boy not being named Jacob are not equally likely. Thus, we must replace our classical interpretation of probability with either a frequentist or a Bayesian interpretation. (Note that in real life children's names are not independent of each other. In particular, people usually do not give the same name to two children. Thus, this discussion is purely theoretical.)

Frequentist approach

Consider 10,000 families that have two children. Assume that the gender and name of each child are independent, both within a family and between families. Assume that each child is a girl with probability .5; otherwise the child is a boy. Assume that the probability of a child having the name Jacob is .01, and that all children named Jacob are boys, so that 1 in 50 boys is named Jacob.

In the table above, we have a list of all possible unique outcomes. But these outcomes do not have the same frequency. If we start with the assumption that the family has two children, we get the following frequency table:

Older child    Younger child    Frequency
Girl           Girl             2500
Girl           Boy              2500
Boy            Girl             2500
Boy            Boy              2500

With the additional bit of information that the family has a boy named Jacob, we can break every instance of "Boy" into two: "Jacob" and "Boy not Jacob". For every 50 Boys, 1 will fall into the "Jacob" bin and 49 into the "Boy not Jacob" bin. Thus, we have the following table:

Older child      Younger child    Frequency
Girl             Girl             2500
Girl             Jacob            50
Girl             Boy not Jacob    2450
Jacob            Girl             50
Boy not Jacob    Girl             2450
Jacob            Jacob            1
Boy not Jacob    Jacob            49
Jacob            Boy not Jacob    49
Boy not Jacob    Boy not Jacob    2401

If we eliminate all instances that do not meet our given criteria ({Girl, Girl} {Girl, Boy not Jacob} {Boy not Jacob, Girl} {Boy not Jacob, Boy not Jacob}), then we eliminate 9801 of our events, leaving 199 possible events. Of those, the successful events are {Girl, Jacob} and {Jacob, Girl}, or 100 cases.

So if the probability of a boy being named Jacob is 1 in 50, then the probability that the family has a girl is 100/199, or about 50.25%. But this value will change depending on the popularity of the name. At the extreme, if all boys were given the same name, then being named Jacob would provide no more information than being a boy, and thus the probability would still be 2/3 that the family has a girl. As the likelihood of the name decreases, the likelihood of the two-Jacob case also decreases, and the probability of the family having a girl approaches the limit of 50%.
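
The dependence on the popularity of the name can be explored with a short exact calculation. The sketch below assumes the independence model of this section (names drawn independently, two Jacobs allowed); the function name `p_girl_given_jacob` and the parameter `p_jacob` (the per-child probability of being a boy named Jacob, .01 in the text) are illustrative.

```python
from fractions import Fraction
from itertools import product

def p_girl_given_jacob(p_jacob):
    """P(family has a girl | at least one child is a boy named Jacob).

    p_jacob is the per-child probability of being a boy named Jacob
    (1/100 in the text, i.e. 1 in 50 boys). Pass a Fraction or an int
    to keep the arithmetic exact.
    """
    p_jacob = Fraction(p_jacob)
    weights = {
        "G": Fraction(1, 2),            # girl
        "J": p_jacob,                   # boy named Jacob
        "N": Fraction(1, 2) - p_jacob,  # boy not named Jacob
    }
    has_jacob = Fraction(0)
    jacob_and_girl = Fraction(0)
    # Enumerate all (older, younger) type pairs, weighted by probability.
    for older, younger in product(weights, repeat=2):
        p = weights[older] * weights[younger]
        if "J" in (older, younger):
            has_jacob += p
            if "G" in (older, younger):
                jacob_and_girl += p
    return jacob_and_girl / has_jacob

print(p_girl_given_jacob(Fraction(1, 100)))  # 100/199
print(p_girl_given_jacob(Fraction(1, 2)))    # 2/3 (every boy is named Jacob)
```

Under this model the result simplifies to 1/(2 - q), where q is `p_jacob`, which makes both limiting cases visible: q = 1/2 gives 2/3, and q approaching 0 gives 1/2.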

If we further assume that parents never name two children with the same name, we can eliminate {Jacob, Jacob}, leaving 198 possible events; thus it would appear that the probability of the family having a girl is 100/198, or 50/99. However, there are now 50 occurrences each of {Jacob, Boy not Jacob} and {Boy not Jacob, Jacob} making the probability of a girl 100/200, or exactly 1/2.

Conclusion

Many people coming across this paradox for the first time will agree with the answer to the first question, but some may be confused by the answer to the second question.

Two ways of explaining the error are as follows:

  1. The second question does not assume anything about the age of the boy. He might be the older or he might be the younger sibling. Therefore the thought that there are only three possibilities (2 boys {BB}, 2 girls {GG}, or a mix) does not take into account that the last of these three is twice as likely as either of the first two, because it can be either {GB} or {BG}.
  2. The chance that there are two boys is 1/4, the same as the chance that there are two girls. The chance that there is one boy and one girl (or one girl and one boy) consumes the remainder (1/2), therefore two boys are half as likely as a mixture.
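
For readers who remain unconvinced, a simulation can make the difference between the two questions concrete. The following is a rough Monte Carlo sketch under the common assumptions (independent children, boys and girls equally likely); the function name `simulate`, the sample size and the seed are arbitrary choices made here.

```python
import random

def simulate(n_families=100_000, seed=0):
    """Estimate the answers to the first and second questions by sampling."""
    rng = random.Random(seed)
    older_boy = older_boy_younger_girl = 0
    any_boy = any_boy_with_girl = 0
    for _ in range(n_families):
        older, younger = rng.choice("BG"), rng.choice("BG")
        # First question: condition on the older child being a boy.
        if older == "B":
            older_boy += 1
            older_boy_younger_girl += younger == "G"
        # Second question: condition on at least one boy.
        if "B" in (older, younger):
            any_boy += 1
            any_boy_with_girl += "G" in (older, younger)
    return older_boy_younger_girl / older_boy, any_boy_with_girl / any_boy

first, second = simulate()
print(f"first question:  {first:.3f}")   # close to 1/2
print(f"second question: {second:.3f}")  # close to 2/3
```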

Mistakes

A look at why some "explanations" are flawed can be very instructive.

For example, to answer the second question someone may make this list of possibilities:

  1. The boy has an elder brother
  2. The boy has a younger brother
  3. The boy has an elder sister
  4. The boy has a younger sister

At first glance, only the last two involve a girl, giving a total probability of 1/2. The error here is that the first two statements are counted double. If there are two boys, we have no referent for "the boy". Therefore the first two possibilities should read:

  1. A boy has an elder brother
  2. A boy has a younger brother

But now it is clear that these two statements are equivalent – both effectively state that there are two boys – and therefore one should be removed.

Incomplete problem statements

The problem is often posed in a way that leaves other interpretations open.

Example 1

Two old classmates, Mary and Brian, meet in the street, not having seen each other since they left school.

Mary asks Brian: "Have you got any children?"
Brian answers: "Yes, I've got two."
Mary: "Do you have a boy?"
Brian: "Yes, I do!"

Here, for some reason, the conversation is cut short.

Formally, this corresponds to the second version, as Brian has only told Mary that at least one child is a boy. Accordingly, the probability that Brian has a girl should be 2/3. However, in real conversation, if Brian had two boys, he would be more likely to answer, e.g., "Yes, they are both boys" (Grice's maxim of quantity).[citation needed] The fact that he does not answer like that could reasonably be taken by Mary as a clue that raises her posterior probability of one child being a girl above 2/3. This highlights the need for precision when stating such problems in probability.
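
How much such a clue matters depends on how Brian is assumed to behave. As a purely hypothetical model (not part of the problem statement), suppose a father with one boy and one girl always answers just "Yes, I do!", while a father of two boys gives that plain answer only with probability r. The function name `p_girl_given_plain_yes` and the parameter `r` are assumptions introduced for this sketch.

```python
from fractions import Fraction

def p_girl_given_plain_yes(r):
    """P(Brian has a girl | he answers just "Yes, I do!").

    r: hypothetical probability that a father of two boys gives the
    plain answer instead of volunteering "they are both boys".
    """
    r = Fraction(r)
    p_mixed = Fraction(1, 2)     # BG or GB; assumed to always answer plainly
    p_two_boys = Fraction(1, 4)  # BB; answers plainly with probability r
    return p_mixed / (p_mixed + p_two_boys * r)

print(p_girl_given_plain_yes(1))               # 2/3: the answer carries no extra clue
print(p_girl_given_plain_yes(Fraction(1, 2)))  # 4/5
```

With r = 1 the model reduces to the formal second question; any r below 1 pushes the posterior above 2/3.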

