Talk:Bayes' theorem/Archive 6

This is an archive of past discussions about Bayes' theorem. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


Wrong likelihood function

I'm not an expert and thus not 100% sure. But I think the expression written in section 1.1

f_{\Theta }(\theta \mid B)={\text{constant}}\cdot f_{\Theta }(\theta )L(B\mid \theta )

is wrong due to the likelihood function $L(B\mid \theta )$ .

The correct version should be $L(\theta \mid B)$ , because the likelihood function should be a function of $\theta$ .

Please correct it if this is the case.

Xeddy (talk) 15:40, 24 February 2010 (UTC)

Talk content

The Talk content is daunting and needs its own Categories! :-( P64 (talk) 21:15, 10 March 2010 (UTC)

I've gone through and archived the old talk. Chris Cunningham (not at work) - talk 16:46, 23 June 2010 (UTC)

Historical remarks

Section "Historical remarks" seems to be the subject of Talk only #33 Application to the 'Fine-Tuning' Argument for the Existence of God? and I have replied there.

My revision of this section (yesterday 19:29 and 19:39) is difficult to follow using "Compare selected versions" at the article history, because that tool mistakenly finds I deleted most of some whole paragraphs and inserted some whole paragraphs. Elsewhere I have reconstructed those two edits in three parts with careful attention to help "Compare selected versions" line up the paragraphs. That correctly reveals my line editing. Comparison of the two comparisons may help anyone see how the tool works and thus how to edit in a more transparent way. See [User:P64/BayesLaw history], versions 17:04 and 17:30 today.

(version -03-08) The irony is that this label was introduced by R.A. Fisher in a derogatory sense. So, historically, Bayes was not a "Bayesian". It is actually unclear whether or not he was a Bayesian in the modern sense of the term, i.e. whether or not he was interested in inference or merely in probability ...

I have flagged the ironic derogatory sense "citation needed". Probably Fienberg is the source for everything in this paragraph. Even so we should have a date for Fisher's derogatory coinage.
I have deleted the remark that Bayes was not Bayesian "historically". It is expected and irrelevant that the moniker was coined after his time and that seems to be the only valid point. It isn't what we usually mean by "historically".
Does interest in inference rather than mere probability identify the modern sense of "Bayesian"? I doubt that too but I haven't touched it. Perhaps the so-called modern sense concerns Bayesian statistics, not Bayesian probability, and we should question whether Bayes was interested in statistics? --P64 (talk) 18:16, 11 March 2010 (UTC)

Bar graph

user:Riitoken has added a bar graph image to the simple example section repeatedly, e.g. [1]. This image doesn't actually show the application of Bayes' Theorem to this example - and has a variety of other issues as well, for example the terminology used in the image is different than the table or the text (pants vs. trousers). If an image is actually desired here, I'd suggest working with the folks at Wikipedia:Graphic Lab to design a better image. -- Rick Block (talk) 23:32, 21 June 2010 (UTC)

@Riitoken

It is time your vandalism comes to an end. Nijdam (talk) 11:27, 22 June 2010 (UTC)

Your edits have now been reverted 7 times by several authors. Stop your childish behavior. Nijdam (talk) 16:52, 22 June 2010 (UTC)

Is subjective/objective redundant with frequentist/Bayesian?

The article says

However, there is disagreement between frequentist, Bayesian, subjective and objective statisticians with regard to the proper implementation and extent of Bayes' theorem.

Are "objective" and "subjective" statisticians appreciably distinct groups from "frequentist" and "Bayesian" statisticians? It's my impression that "subjective and objective" is redundant in that sentence. -- 216.243.27.133 (talk) 19:59, 8 August 2010 (UTC)

They are distinct viewpoints; there are plenty of objective Bayesians around. And I'm concerned that reducing the statement to "disagreement between frequentist and Bayesian statisticians" gives the impression that these two groups are monolithic, and that Bayesians all agree with each other. But I agree the current wording is cumbersome. How about just saying "disagreement among statisticians" instead? --Avenue (talk) 22:27, 8 August 2010 (UTC)

I agree that the present phrasing is too wordy and potentially confusing, and I endorse Avenue's solution. MartinPoulter (talk) 20:17, 1 September 2010 (UTC)

Tree

It might be useful to have a tree diagram as e.g. here [2]. Then one can see that p(A/B) [probability of A, having observed 'outcome' B] is simply p(A) x p(B/A) [probability of A x probability of A accompanying observed 'outcome' B] divided by p(B) [all the ways of getting the observed 'outcome' B] Jagdfeld (talk) 22:38, 10 August 2010 (UTC)

NB the conditional prob. is written as P(A|B), and not as P(A/B).

Introduction

The introduction is IMO too much a translation of the formula. Something like the following would better express the meaning:

If an event happens on the basis of several possibilities, the formula relates the probability when the event took place of one of the possibilities to the probabilities that the event takes place for each of the possibilities. Popular: if an event may be caused by several causes, the formula gives the probability that one of the causing events is the cause that the event took place in relation to the probability of the occurring of the event for each of the causes.

Nijdam (talk) 18:41, 1 September 2010 (UTC)

Sorry but I find the above very confusing and unnatural English. What's there at the moment is much better. MartinPoulter (talk) 20:04, 1 September 2010 (UTC)

I found the introduction rather poorly written. I already understand BT. If I didn't, the introduction would be of no help in understanding it. 97.126.94.136 (talk) 03:29, 25 September 2010 (UTC)

Also, I should note the article gives no intuitive understanding of BT. How do you expect someone to understand reams of dry equations? Especially when BT is so simple that it's almost trivial. I never cease to be amazed how poorly written (and often full of misinformation) wikipedia math articles are. 97.126.94.136 (talk) 03:49, 25 September 2010 (UTC)

Agreed. For the intuition one needs the Bayes' rule (odds formulation) of Bayes theorem. Posterior odds equals prior odds times likelihood ratio. That's all you need to know. Memorable, powerful, intuitive. Richard Gill (talk) 10:47, 11 January 2011 (UTC)

The statement "It implies that evidence has a stronger confirming effect if it was more unlikely before being observed." is incorrect, at least under a natural interpretation. E.g., P(E) may be lower because P(E|H) is lower, while P(E|-H) remains the same. In that case the evidence will have a stronger DISconfirming effect. I'm not sure exactly what is being referred to in Howson & Urbach in this regard. In any case, I change the sentence to something truthful and similarly relevant and leave in the reference as being appropriate to a remark on Bayesian confirmation theory. —Preceding unsigned comment added by 130.194.72.128 (talk) 04:21, 9 February 2011 (UTC)

Bayes' rule versus Bayes' theorem

In many applications, also here on wikipedia, one needs Bayes' theorem in the odds form: posterior odds equal prior odds times likelihood ratio (aka Bayes factor). Actually, this is the result which I think deserves to be called a theorem: it is a genuinely beautiful, both surprising and intuitive result. It is memorable, powerful, easy to extend e.g. to sequential use. It is the mainstay of modern Bayesian statistics.

On the other hand the more traditional versions of Bayes' theorem are little more than a rewriting of the definition of conditional probability. The innovation in going from "theorem" to "rule" is to divide two "copies" of Bayes' theorem noting that all the ugly messy stuff disappears. The normalization constant will take care of itself, one should postpone figuring out what it actually is till as late in the game as possible.

I therefore reorganised the material on extensions and alternatives to Bayes' theorem creating a separate section on the "Bayes rule" version. I hope people will find this useful. More references (both within and outside wikipedia) are needed and it would be good to do some examples this way. Richard Gill (talk) 10:45, 11 January 2011 (UTC)
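To make the odds form concrete, here is a minimal Python sketch of "posterior odds equal prior odds times likelihood ratio", including the sequential use mentioned above. The helper names (`to_odds`, `to_prob`, `update`) and the numbers (a prior of 1/100 and likelihood ratios of 99 and 5) are hypothetical and chosen only for illustration; multiplying several likelihood ratios also assumes the pieces of evidence are conditionally independent given each hypothesis.

```python
from fractions import Fraction

def to_odds(p):
    """Convert a probability into odds in favour."""
    return p / (1 - p)

def to_prob(odds):
    """Convert odds in favour back into a probability."""
    return odds / (1 + odds)

def update(prior_odds, *likelihood_ratios):
    """Posterior odds = prior odds x product of the likelihood ratios.

    Chaining several ratios assumes the pieces of evidence are
    conditionally independent given each hypothesis.
    """
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

prior = Fraction(1, 100)                      # hypothetical prior probability
posterior_odds = update(to_odds(prior), Fraction(99), Fraction(5))
posterior = to_prob(posterior_odds)
print(posterior_odds, posterior)
```

Note that the normalization constant never appears: it is only recovered at the end, when the odds are converted back into a probability.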

One idea that occurred to me today was to move the material on Bayes' rule into its own article, and merge the material from Bayes factor into it. Thoughts? Gnathan87 (talk) 04:58, 19 July 2011 (UTC)

Monty Hall

Is this concluding statement from the section on the MHP correct?:

"Under these two alternative assumptions, "always" and "never", the unconditional probability that the host opens the blue door happens to be 2/3 and 1/3, rather than 1/2 as above."

Does 'assumptions' mean 'alternatives'?
Is 'happens to be' a proper phrasing? Aren't the 2/3 & 1/3 frequencies for the favored & unfavored doors derived from the original random placement of the car? The host will open the unfavored door only when the car is behind the favored door, which is 1/3 of the time.
Is the word "unconditional" meaningful?
I found this sentence confusing because it refers to 2/3 & 1/3 vs 1/2 & 1/2, not as the paradox's probabilities of the location of the car, but rather as the probabilities that the doors would be opened in a situation different from the MHP.

It should be made clear that the part of the last paragraph beginning with "If the host always opens..." is not the MHP, but rather a variant for discussion purposes. While the host's behaviour may have changed, unless the contestant were made aware of this, he would still consider each door equally likely to be opened. The narrative has, without announcement or explanation, changed from the contestant's state of knowledge to a 3rd party's state of knowledge.

I don't see any references to reliable sources in the section. Glkanter (talk) 18:52, 11 March 2011 (UTC)
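For anyone wanting to check the 2/3, 1/3 and 1/2 figures quoted above, here is a small Python sketch. It rests on assumptions that are not spelled out in this thread: the contestant has picked a door (called red here), the other two doors are green and blue, the car is placed uniformly at random, the host never opens the chosen door or the car door, and `q` is the probability that the host opens blue when the car is behind red and he has a free choice. The function name and door labels other than blue are hypothetical.

```python
from fractions import Fraction

def host_opens_blue(q):
    """Return (P(host opens blue), P(car behind red | host opens blue)).

    Assumed setup: contestant picked red; car placed uniformly at random;
    q = chance the host opens blue when the car is behind red.
    """
    third = Fraction(1, 3)
    # Probability that the host opens blue for each car location.
    p_blue_given_car = {"red": q, "green": Fraction(1), "blue": Fraction(0)}
    p_blue = sum(third * p for p in p_blue_given_car.values())
    p_red_given_blue = third * p_blue_given_car["red"] / p_blue
    return p_blue, p_red_given_blue

results = {
    "always": host_opens_blue(Fraction(1)),
    "never": host_opens_blue(Fraction(0)),
    "fair coin": host_opens_blue(Fraction(1, 2)),
}
for name, (p_blue, p_red) in results.items():
    print(name, p_blue, p_red)
```

Under these assumptions the unconditional probability that blue is opened is (q+1)/3, which is where 2/3, 1/3 and 1/2 come from; it is a statement about the host's behaviour, not about the location of the car.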

copy edit

Please verify grammar, style etc. in this part. --CiaPan (talk) 11:52, 29 March 2011 (UTC)

Sigh

No one could come up with an *equivalent* example comparing frequentist and Bayesian inference then? Unsurprising :-( —Preceding unsigned comment added by 146.169.6.124 (talk) 20:19, 14 April 2011 (UTC)

Bayes decision theory

Bayes decision theory redirects here, but the word ‘decision’ is never used in the current version of the article. — Rxnt (talk) 20:19, 27 April 2011 (UTC)

Courtroom example misleading

It's my first day as an editor on WP and I'm not an expert on Bayes theorem so I've left the example alone, but there appears to be an issue with this example - certainly it confused me. I wonder if someone can check whether I've understood it correctly:

The way this example is worded implies that the calculation demonstrates that there's a 75% chance the defendant is guilty by mentioning the legal terms "beyond reasonable doubt" and "balance of probabilities". However all the example shows is that there's a 75% chance the person on the CCTV was male (albeit ignoring gender distribution in crime statistics, which might be significant in predicting whether the offender was male). But even if the probability was close to 100%, it doesn't imply anything about the guilt of the defendant - it's the sample size of people with bottles that is important. The CCTV evidence has done its bit - it's established the offender carried a bottle - and gender doesn't matter, except to predict whether the CCTV shows a man or a woman.

If the example is intended to estimate P(guilt), it's the fact the defendant was seen with a bottle that might imply guilt, not gender. Assuming no other evidence and that crime rates amongst males/females are the same, guilt might be estimated simply as P(guilt) = 1/(no. people with bottles). Bayes' theorem is not relevant in this case. So assuming 1000 people in the pub, we'll have 2% x 60% x 1000 = 12 males with bottles and 1% x 40% x 1000 = 4 females with bottles. The defendant is guilty with P = 1/16. Of course there's still P(man|bottle) = 12/16 = 75% probability the offender is a man as the example confirms, but P(guilt|offender is male) is still only 1/12. And if the defendant was female, the probabilities would stay the same except that P(guilt|offender is male) = 0, P(guilt|offender is female) = 1/4.

Apologies if I haven't followed WP etiquette - I'd have preferred to have contacted the author of the example directly but couldn't find out how to. — Preceding unsigned comment added by Zedeyepee (talk • contribs) 04:34, 16 September 2011 (UTC)
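The arithmetic in the comment above can be reproduced with a few lines of Python. This is only a sketch of that reasoning: the figures (1000 people, 60% male, 2% of males and 1% of females carrying a bottle) come from the comment, while the variable names are made up here, and it assumes, as the comment does, that every bottle carrier is equally likely to be the offender.

```python
from fractions import Fraction

pub_size = 1000  # number of people in the pub, as assumed in the comment
share = {"male": Fraction(60, 100), "female": Fraction(40, 100)}
bottle_rate = {"male": Fraction(2, 100), "female": Fraction(1, 100)}

# Expected number of bottle carriers of each sex.
with_bottle = {sex: share[sex] * bottle_rate[sex] * pub_size for sex in share}
total_with_bottle = sum(with_bottle.values())

# What the article's example actually computes: P(male | bottle).
p_male_given_bottle = with_bottle["male"] / total_with_bottle

# What a court would care about, assuming every bottle carrier is
# equally likely to be the offender.
p_guilt = 1 / total_with_bottle
p_guilt_given_male_offender = 1 / with_bottle["male"]      # male defendant
p_guilt_given_female_offender = 1 / with_bottle["female"]  # female defendant

print(p_male_given_bottle, p_guilt, p_guilt_given_male_offender)
```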

Thanks, you are absolutely right. I wrote this example not long ago, bit embarrassed this got past! Can't edit right now, but I'll do so tomorrow if nobody else gets there first. Gnathan87 (talk) 02:51, 17 September 2011 (UTC)

Done Gnathan87 (talk) 22:48, 20 September 2011 (UTC)

Mixed simultaneous distribution

@Gnathan87: I do not agree with your revert. What you write:

P(x\leq X\leq x+\Delta )=\int _{x}^{x+\Delta }f_{X}(\xi )\,\mathrm {d} \xi

\implies \lim _{\Delta \rightarrow 0}P(X=x)=f_{X}(x)\Delta

makes no sense, and is mathematically wrong. For instance:

\lim _{\Delta \rightarrow 0}P(X=x)=P(X=x)=0\neq f_{X}(x)\Delta

I do not know what you want to show, but this is unacceptable. Nijdam (talk) 20:13, 27 September 2011 (UTC)

Maybe this has arisen due to my engineering background. I am very used to seeing and writing this sort of thing, but perhaps it is not something a mathematician would write. I'm very happy to be corrected :) (In fact, as per my original edit I was more inclined just to write down $f_{X}(x)dx$ etc, but I have a feeling that would be equally frowned upon.)

However, in response to your concern, surely $\textstyle \lim _{\Delta \rightarrow 0}f_{X}(x)\Delta =0$ ? To write this out more fully, expand $f_{X}(x)$ and $f_{X}(x|Y=y)$ as Taylor series and integrate:

{\begin{aligned}P(x\leq X\leq x+\Delta )&=f_{X}(x)\Delta \;+\;f_{X}^{'}(x){\frac {\Delta ^{2}}{2}}\;+\;f_{X}^{''}(x){\frac {\Delta ^{3}}{6}}\;+\;...\\P(x\leq X\leq x+\Delta |Y=y)&=f_{X}(x|Y=y)\Delta \;+\;f_{X}^{'}(x|Y=y){\frac {\Delta ^{2}}{2}}\;+\;f_{X}^{''}(x|Y=y){\frac {\Delta ^{3}}{6}}\;+\;...\end{aligned}}

Taking the quotient and then the limit,

\lim _{\Delta \rightarrow 0}{\frac {P(x\leq X\leq x+\Delta )}{P(x\leq X\leq x+\Delta |y=Y)}}={\frac {P(x=X)}{P(x=X|y=Y)}}={\frac {f_{X}(x)\Delta }{f_{X}(x|Y=y)\Delta }}={\frac {f_{X}(x)}{f_{X}(x|Y=y)}}

Of course, the relative rate at which the limit is taken matters. I must admit, this is something I have previously just taken for granted. I am struggling to satisfactorily justify why the rate should be the same, without explicitly defining the ratios of probability and probability density to be equal.

Gnathan87 (talk) 01:50, 28 September 2011 (UTC)

No need for this all. As I wrote before:

\lim _{\Delta \rightarrow 0}{\frac {P(x\leq X\leq x+\Delta )}{\Delta }}=\lim _{\Delta \rightarrow 0}{\frac {F_{X}(x+\Delta )-F_{X}(x)}{\Delta }}=f_{X}(x)

This allows you to say:

P(x\leq X\leq x+\Delta )\approx f_{X}(x)\Delta

But you're mistaken when you wrote:

\lim _{\Delta \rightarrow 0}{\frac {P(x\leq X\leq x+\Delta )}{P(x\leq X\leq x+\Delta |y=Y)}}={\frac {P(x=X)}{P(x=X|y=Y)}}

Instead you may want to write:

{\frac {P(x\leq X\leq x+\Delta )}{P(x\leq X\leq x+\Delta |y=Y)}}\approx {\frac {f_{X}(x)\Delta }{f_{X}(x|Y=y)\Delta }}

The problem with mixed distributions is that there is no simultaneous density. This makes it difficult to describe. To show what Bayes' means, use the normal form:

P(x\leq X\leq x+\Delta |Y=y)={\frac {P(Y=y|x\leq X\leq x+\Delta )P(x\leq X\leq x+\Delta )}{P(Y=y)}}

Then use the approximation:

f_{X}(x|Y=y)\Delta \approx {\frac {P(Y=y|x\leq X\leq x+\Delta )f_{X}(x)\Delta }{P(Y=y)}}

which in the limit becomes, after dividing out Δ:

f_{X}(x|Y=y)={\frac {P(Y=y|X=x)f_{X}(x)}{P(Y=y)}}

This is more or less standard procedure (specifically for engineers). Nijdam (talk) 08:56, 28 September 2011 (UTC)

OK, I think this is just confusion over notation. I was using $P(X=x)$ as a convenient shorthand for $\lim _{\Delta \rightarrow 0}P(x\leq X\leq x+\Delta )$ , since you know the variable is continuous. Also, I was being less rigorous with using $=$ as opposed to $\approx$ . Otherwise, we've described exactly the same method. Would you be happy with the following? A couple of comments: I have kept the approximation in one line. Ideally I want to keep this as short/readable as possible, and I think people will know where that's come from. Secondly, from your version, I have changed $P(Y=y|x\leq X\leq x+\Delta )$ to $P(Y=y|X=x)$ . In this instance, I do think it is justified to write $X=x$ directly - the Y probabilities are "continuous" along the x direction, but really are individual probabilities, not densities.

P(x\leq X\leq x+\Delta )=\int _{x}^{x+\Delta }f_{X}(\xi )\,\mathrm {d} \xi

\implies \lim _{\Delta \rightarrow 0}P(x\leq X\leq x+\Delta )\approx f_{X}(x)\Delta

(Edit: meant =)

P(x\leq X\leq x+\Delta |Y=y)=\int _{x}^{x+\Delta }f_{X}(\xi |Y=y)\,\mathrm {d} \xi

\implies \lim _{\Delta \rightarrow 0}P(x\leq X\leq x+\Delta |Y=y)\approx f_{X}(x|Y=y)\Delta

(Edit: meant =)

(Edit: $\lim _{\Delta \rightarrow 0}$ )

P(x\leq X\leq x+\Delta |Y=y)={\frac {P(Y=y|X=x)P(x\leq X\leq x+\Delta )}{P(Y=y)}}

\implies

(Edit: $\lim _{\Delta \rightarrow 0}$ )

f_{X}(x|Y=y)\Delta ={\frac {P(Y=y|X=x)f_{X}(x)\Delta }{P(Y=y)}}

\implies f_{X}(x|Y=y)={\frac {P(Y=y|X=x)f_{X}(x)}{P(Y=y)}}

Gnathan87 (talk) 10:35, 28 September 2011 (UTC)

This is mathematically not acceptable. I make some necessary changes:

For small Δ:

P(x\leq X\leq x+\Delta )=\int _{x}^{x+\Delta }f_{X}(\xi )\,\mathrm {d} \xi \approx f_{X}(x)\Delta

whence:

\lim _{\Delta \rightarrow 0}{\frac {1}{\Delta }}P(x\leq X\leq x+\Delta )=f_{X}(x)

Analogous:

\lim _{\Delta \rightarrow 0}{\frac {1}{\Delta }}P(x\leq X\leq x+\Delta |Y=y)=f_{X}(x|Y=y)

Also:

\lim _{\Delta \rightarrow 0}P(Y=y|x\leq X\leq x+\Delta )=P(Y=y|X=x)

Bayes' theorem states:

P(x\leq X\leq x+\Delta |Y=y)={\frac {P(Y=y|x\leq X\leq x+\Delta )P(x\leq X\leq x+\Delta )}{P(Y=y)}}

Hence, dividing by Δ, and taking the limit:

\implies f_{X}(x|Y=y)={\frac {P(Y=y|X=x)f_{X}(x)}{P(Y=y)}}

Nijdam (talk) 12:31, 28 September 2011 (UTC)
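A quick numerical sanity check of the limiting argument above, as a Python sketch. The model is a hypothetical one picked purely for illustration (it is not from the article): X is uniform on (0,1) and, given X=x, Y is 1 with probability x, so that f_X(x)=1, P(Y=1|X=x)=x and P(Y=1)=1/2. The function names are likewise made up.

```python
def interval_prob_given_y1(x, delta):
    """P(x <= X <= x+delta | Y=1) for X ~ U(0,1), Y | X=t ~ Bernoulli(t)."""
    # P(x <= X <= x+delta, Y=1) is the integral of t dt over [x, x+delta].
    joint = ((x + delta) ** 2 - x ** 2) / 2
    p_y1 = 0.5
    return joint / p_y1

def density_from_formula(x):
    """f_X(x | Y=1) = P(Y=1 | X=x) f_X(x) / P(Y=1) for the same model."""
    p_y1_given_x = x
    f_x = 1.0
    p_y1 = 0.5
    return p_y1_given_x * f_x / p_y1

x = 0.3
deltas = (0.1, 0.01, 0.001, 1e-6)
# Dividing by delta before taking the limit gives a density ...
ratios = [interval_prob_given_y1(x, d) / d for d in deltas]
# ... whereas the undivided probabilities simply shrink towards 0.
raw = [interval_prob_given_y1(x, d) for d in deltas]
print(ratios, density_from_formula(x))
```

In this model the ratio works out to 2x + Δ exactly, so it approaches 2x = 0.6 as Δ shrinks, while the raw interval probabilities tend to 0, which is the point made above about dividing by Δ first.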

OK, I agree that the first part is more rigorous. I'm actually coming round to that, I was thinking that writing $\Delta$ on the RHS would make it easier to follow, but maybe the extra complexity isn't actually so bad. However, I am still unhappy with this part:

\lim _{\Delta \rightarrow 0}P(Y=y|x\leq X\leq x+\Delta )=P(Y=y|X=x)

Bayes' theorem states:

P(x\leq X\leq x+\Delta |Y=y)={\frac {P(Y=y|x\leq X\leq x+\Delta )P(x\leq X\leq x+\Delta )}{P(Y=y)}}

P(Y=y|X=x) is continuous along the x axis, but it is NOT a density. If this is necessary at all, it should be clear that this step is e.g. taking the mean value from the range $x$ to $x+\Delta$ , not simply integrating. In any case, I think it is proper to write $P(Y=y|X=x)$ even before taking the limit. It may be necessary to write $P(X=x)$ and $P(X=x|Y=y)$ in terms of limit processes, but $P(Y=y|X=x)$ is a discrete set of events for a particular $x$ . You need only write

\lim _{\Delta \rightarrow 0}P(x\leq X\leq x+\Delta |Y=y)={\frac {P(Y=y|X=x)P(x\leq X\leq x+\Delta )}{P(Y=y)}}

Finally, I am also still curious about the matter of $\Delta$ necessarily vanishing at the same rate in both cases. Is there something I don't know here? This seems to be fine only if we first define ${\frac {P_{2}}{P_{1}}}={\frac {f_{2}}{f_{1}}}$ . Try substituting $\beta f_{X}(x)$ for $f_{X}(x|Y=y)$ , and in the conditional case $\gamma$ for $\Delta$ . With some algebra you may show:

{\frac {P(x\leq X\leq x+\gamma |Y=y)}{P(x\leq X\leq x+\Delta )}}={\frac {f_{X}(x|Y=y)}{f_{X}(x)}}\left(1+{\frac {\int _{x+\Delta }^{x+\gamma }{f_{X}(\xi )d\xi }}{\int _{x}^{x+\Delta }{f_{X}(\xi )d\xi }}}\right)

It is clear that in general, in order for the RHS to equal $\beta$ you must set $\gamma =\Delta$ . That then seems to justify the technique. However, it is a long way round - why not simply start from the assumption and substitute it straight into Bayes' theorem?

Gnathan87 (talk) 14:00, 28 September 2011 (UTC)

editing

Mixed distributions are problematic to deal with. Of course we cannot present our own research here. The analysis I wrote down just serves to make the item plausible in a correct way, as one may easily understand that $\lim P(Y=y|x\leq X\leq x+\Delta )=P(Y=y|X=x)$ . A formal proof is not that easy. If you want to introduce the derivation in the article, I guess you have to look in the literature. Nijdam (talk) 08:45, 29 September 2011 (UTC)

You are mistaken, as I said earlier, if you write:

\lim _{\Delta \rightarrow 0}P(x\leq X\leq x+\Delta |Y=y)={\frac {P(Y=y|X=x)P(x\leq X\leq x+\Delta )}{P(Y=y)}}

.

In the first place you forget to take the limit of the right hand side, and even if you add the limit, both sides tend to 0, so it is of no use for the derivation of the formula. Nijdam (talk) 08:57, 29 September 2011 (UTC)

What I am learning here is that parts of my education on these things have been somewhat informal :) For example, I have always assumed that the limit symbol applies to both sides of the equation. i.e. the general relationship itself is established in the limit. I think I will spend some time learning this properly.

However, it still seems clear to me that if Y is discrete, then $P(Y=y|X=x)$ is properly defined without any limit process. See the following diagram:

I do not understand your last question. If you will make more clear to me what troubles you, I'll be happy to help. Nijdam (talk) 09:01, 29 September 2011 (UTC)

new editing

Let me first say: P(X=x)=0 for continuous X. And definitely P(X=x) is not $f_{X}(x)$ . Now about $\Delta$ . The equation you are referring to has the same $\Delta$ on both sides. What else could it be? Taking the limit means taking the limit on both sides, and for (I hesitate to write this obvious fact) the same $\Delta \rightarrow 0$ . Does this solve your problem?

Concerning your picture: there are several mistakes, some just mistyping, I guess.

Let me sum up: X continuous, Y discrete, with joint distribution, given by the joint distribution function $F_{X,Y}$ , with the meaning:

F_{X,Y}(x,y)=P(X\leq x,Y\leq y)

We can find:

P(X\leq x,Y=y)=F_{X,Y}(x,y)-F_{X,Y}(x,y-)

From this:

P(Y=y)=\lim _{x\to \infty }P(X\leq x,Y=y)

and

P(X\leq x|Y=y)={\frac {P(X\leq x,Y=y)}{P(Y=y)}}

Now comes a tricky part: will this conditional distribution function be differentiable for every y? Let us just assume this is the case for this joint distribution. Then:

f_{X}(x|Y=y)={\frac {\partial P(X\leq x|Y=y)}{\partial x}}=

=\lim _{\Delta \to 0}{\frac {1}{\Delta }}P(x<X\leq x+\Delta |Y=y)

f_{X}(x|Y=y)={\frac {{\frac {\partial }{\partial x}}P(X\leq x,Y=y)}{P(Y=y)}}={\frac {f_{X,Y}(x,y)}{P(Y=y)}}

P(Y=y|X=x)={\frac {f_{X,Y}(x,y)}{f_{X}(x)}}={\frac {\lim _{\Delta \to 0}{\frac {1}{\Delta }}P(x<X\leq x+\Delta ,Y=y)}{\lim _{\Delta \to 0}{\frac {1}{\Delta }}P(x<X\leq x+\Delta )}}=

=\lim _{\Delta \to 0}{\frac {{\frac {1}{\Delta }}P(x<X\leq x+\Delta ,Y=y)}{{\frac {1}{\Delta }}P(x<X\leq x+\Delta )}}=\lim _{\Delta \to 0}{\frac {P(x<X\leq x+\Delta ,Y=y)}{P(x<X\leq x+\Delta )}}=\lim _{\Delta \to 0}P(Y=y|x<X\leq x+\Delta )

Any questions, please ask. Nijdam (talk) 19:21, 29 September 2011 (UTC)

I fear we are somehow confusing each other here, so I'll try to keep this short:

I will leave my other concern for now and come back to it later.
The above is fine.
About the mistakes in the image - do you mean, for example, "P(X=x) defined in limit"? If so, this was admittedly a careless choice of wording. To clarify what I mean, since X=x is the event we are interested in, P(X=x) must appear in Bayes' theorem. However, since this would be 0, it is in fact to be written as a limit. Conversely, $P(Y=y|X=x)\neq 0$ (in general). That is the effect of defining Y to be discrete. It may therefore be written directly in that form without any limit.
"In the first place you forget to take the limit of the right hand side, and even if you add the limit, both sides tend to 0, so it is of no use for the derivation of the formula." This is what prompted my comment about the limit. Could you explain what you meant by this? If you are referring to the $P(Y=y|X=x)$ term, again, that is intentionally written without a limit since we are assuming discrete $Y$ .

Furthermore, I do not see why both sides tending to 0 makes this useless. As you did yourself, you substitute the approximations to get the result.

Gnathan87 (talk) 05:01, 30 September 2011 (UTC)

The proper form of what you wrote, would be:

\lim _{\Delta \to 0}P(x\leq X\leq x+\Delta |Y=y)=\lim _{\Delta \to 0}{\frac {P(Y=y|X=x)P(x\leq X\leq x+\Delta )}{P(Y=y)}}=0

.

leading you nowhere. And then again, this equation is only correctly derived because the result is 0. You still haven't proven:

\lim _{\Delta \to 0}P(Y=y|x\leq X\leq x+\Delta )=P(Y=y|X=x)

Nijdam (talk) 08:00, 30 September 2011 (UTC)

Above I've added some formulas to make the set complete. Nijdam (talk) 21:50, 30 September 2011 (UTC)

Bad introductory example

I can see where the writer was coming from, and it's certainly an engaging example, but it's probably a bad choice as it uses a fixed probability p=0.01 for the likelihood of an arbitrary member of the population having cancer. As the author no doubt knew, this is a vast simplification, and in fact one could use Bayes' Theorem again to give an estimate of the likelihood that you have cancer, given that you've agreed to have a mammogram (significantly higher), since when making that choice you may be basing it on high prevalence of cancer in your family history etc etc. Perhaps it would work with a modification saying that all the population were screened when they were 50, regardless of their family history (though then you'd have to also say that they were forced to have mammograms even if they didn't want to have them - as people declining would also require Bayesian modification). You see where I'm coming from? I'd swap for a non-biological example, otherwise it complicates matters. Hai2410 (talk) 23:26, 24 May 2010 (UTC)

Same for the drug testing example, of course - you'd usually only test suspicious cases, where you expect the chances of actual abuse to be much higher. --195.57.192.25 (talk) 09:04, 17 February 2011 (UTC)

in the drug testing example, it would be useful to show the calculation for P(User|-); it would be 5 x 10^-5. So, even though there is a 0.33 probability that someone with a positive test is a user, that is still 6,600 times greater than the probability that someone with a *negative* test is a user. After all, users are 99 times more likely to have a positive test than non-users (positive LR=99). — Preceding unsigned comment added by 174.60.36.236 (talk) 03:46, 19 February 2012 (UTC)
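The figures in the comment above can be checked with a short Python sketch. The test characteristics are not stated in this thread, so the values used here (99% sensitivity, 99% specificity, 0.5% of people being users) are an assumption, chosen because they reproduce the quoted 0.33, 5 x 10^-5 and LR=99; the function and variable names are hypothetical.

```python
from fractions import Fraction

sensitivity = Fraction(99, 100)  # P(+ | user), assumed
specificity = Fraction(99, 100)  # P(- | non-user), assumed
prevalence = Fraction(5, 1000)   # P(user), assumed

def posterior(p_result_given_user, p_result_given_nonuser, prior):
    """P(user | test result) by Bayes' theorem."""
    numerator = p_result_given_user * prior
    return numerator / (numerator + p_result_given_nonuser * (1 - prior))

p_user_given_pos = posterior(sensitivity, 1 - specificity, prevalence)
p_user_given_neg = posterior(1 - sensitivity, specificity, prevalence)
positive_lr = sensitivity / (1 - specificity)
ratio = p_user_given_pos / p_user_given_neg

print(float(p_user_given_pos), float(p_user_given_neg), positive_lr, float(ratio))
```

With these assumed inputs the exact ratio is roughly 6,500; the 6,600 quoted above is what one gets from the rounded values 0.33 and 5 x 10^-5.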

I don't like the new one either. One point worth making is that implied assumptions are something which should be introduced later on when talking about probability, i.e. don't automatically assume that the likelihood of meeting a woman on a train, on the assumption that you are going to meet somebody, is 0.5. Women might tend not to travel so much on certain trains at certain times of day (an assumption I make - I do not know). It is a good idea to use a sex-based example since not only are the sexes very distinguishable but each sex has generally differing traits such as what clothing each sex tends to wear - again, though, this is speaking in general terms. Why not try the faraway teacher example? A school specifically prohibits pupils climbing out of and into the ground floor windows of the school house. A teacher taking a lunchtime walk some distance from the school house sees a person wearing trousers jumping from a window of the school house. He is too far away to tell whether the person jumping from the window is male or female. He assumes that the person jumping from the window is a pupil (or there is additional information allowing him to verify that fact). If the school comprises equal numbers of females and males and all the males wear trousers but only 40% of the females wear trousers, then what is the probability that the teacher has seen a female? Assume, importantly, that there are no social or psychological factors in play (females less likely to break school rules, males more likely to be outside anyway playing football and so on). Any thoughts? — Preceding unsigned comment added by Blueawr (talk • contribs) 13:35, 21 February 2012 (UTC)
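For reference, the faraway teacher question has a short worked answer. The Python sketch below uses only the figures given in the comment (equal numbers of each sex, all males and 40% of females wearing trousers) and its stated assumption that no other factors are in play; the variable names are made up.

```python
from fractions import Fraction

# Figures from the proposed example: equal numbers of each sex,
# all males and 40% of females wear trousers.
p_female = Fraction(1, 2)
p_male = Fraction(1, 2)
p_trousers_given_female = Fraction(40, 100)
p_trousers_given_male = Fraction(1)

# Total probability of seeing a pupil in trousers.
p_trousers = (p_trousers_given_female * p_female
              + p_trousers_given_male * p_male)

# Bayes' theorem: P(female | trousers).
p_female_given_trousers = p_trousers_given_female * p_female / p_trousers
print(p_female_given_trousers)  # 2/7
```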

Maybe somebody can come up with an example that isn't sexist? — Preceding unsigned comment added by 69.142.244.49 (talk) 03:38, 22 February 2012 (UTC)

Agree with above. Can we make a more culturally agnostic example than our quilting example so that readers do not have to grasp an assumption being made in order to understand it? Here is a rough idea - one based on the idea of rain - since its relevant properties are more timeless and more likely to be taken as trivially true regardless of a person's time and place:

Suppose I live in a relatively dry climate where it rains only 1 in 100 days. Thus if $R$ is the event that it rained today, the probability that it rained today is $P(R)=0.01$ provided we have no further information. Now suppose that we go outside and notice that the ground is wet; let this event be called $W$ . This observed evidence drastically increases the probability that it rained today. As a result, our updated probability of $R$ given our evidence $W$ is going to be greater than $P(R)=0.01$ . Let $P(R|W)$ denote our updated probability - the probability that it rained today given that the ground is wet. We can calculate this quantity using Bayes' theorem as:

P(R|W)={\frac {P(W|R)P(R)}{P(W)}}={\frac {P(W|R)P(R)}{P(W|R)P(R)+P(W|\neg R)P(\neg R)}}

where $P(W|R)$ is the probability that the ground is wet given that it rained, $P(\neg R)$ is the probability that it did not rain, and $P(W|\neg R)$ is the probability that the ground is wet given that it did not rain. Since there are several potential explanations for the ground being wet, $P(R|W)$ will not be 1. For instance, it may be the case that it did not rain but that the ground is wet because of a flood. In this case, we would say that neither $P(W|\neg R)$ nor $P(\neg R)$ are 0.

Mmattb (talk) 20:47, 31 May 2012 (UTC)
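To see what the rain example gives numerically, here is a small Python sketch. Only P(R)=0.01 comes from the proposal above; the two conditional probabilities, P(W|R)=0.95 and P(W|not R)=0.02, are hypothetical values assumed here just to have something to plug in.

```python
from fractions import Fraction

p_rain = Fraction(1, 100)               # P(R), from the proposed example
p_wet_given_rain = Fraction(95, 100)    # P(W | R), assumed for illustration
p_wet_given_no_rain = Fraction(2, 100)  # P(W | not R), assumed for illustration

# Law of total probability for the denominator P(W).
p_wet = p_wet_given_rain * p_rain + p_wet_given_no_rain * (1 - p_rain)

# Bayes' theorem: P(R | W).
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet
print(p_rain_given_wet, float(p_rain_given_wet))
```

With these assumed values the updated probability is 95/293, about 0.32: much greater than the prior 0.01, but well short of 1 because the ground can be wet for other reasons.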

An alternative approach

An alternative approach to finding conditional distributions is through the notion of disintegration.

Let X and Y be any two random variables (discrete or continuous or neither, in any combination). It's a theorem that the joint distribution of the two can be built up in the obvious way by combining (a) the marginal distribution of X, and (b) the family of conditional distributions of Y given X=x. Note on my use of the word the: it is a theorem that any two choices of disintegration will be equal, up to events of probability zero.

So from this point of view all you have to do is to guess the answer and then check that it's correct. How to check the answer? Well, you just have to check that the probabilities of enough events coincide, e.g. rectangles.

Reference: any modern book on measure theoretic probability, for instance David Pollard's book A User's Guide to Measure Theoretic Probability (CUP).

Moral: no need to go through all this differentiation stuff. The defining property of a probability density is that when you integrate it you get probabilities of events. Differentiating a cumulative distribution is not the definition. It's a useful property of distribution functions which are smooth enough. There are also cumulative distribution functions which you can differentiate almost everywhere, but such that the result is not the probability density.

Conditional probability distributions are also defined by what happens when you integrate them. Example: we want to be able to calculate expectation values of functions of several random variables as follows: E(g(X,Y)) = E( E(g(X,Y)|X ) ). In other words: first of all fix X=x and compute E(g(x,Y)|X=x) by using the conditional distribution of Y given X=x. This results in something, say h(x), which in principle can depend on x. Now compute E h(X) by using the marginal distribution of X.

And yes indeed. It's a theorem that there exists a family of probability distributions of Y, one for each possible value of X, which we call the conditional distribution of Y given X, such that this recipe works (except possibly in situations where expectation values don't make any sense at all, i.e. +infinity for the positive part, -infinity for the negative part). And moreover that family of probability distributions is uniquely defined (up to probability zero events for values of X).

Conclusion: use your intuition or use a heuristic limiting argument to guess the conditional distribution of Y given X. Then check that together with the marginal distribution of X it reproduces the joint distribution of X and Y. Richard Gill (talk) 13:55, 29 October 2011 (UTC)
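The "guess, then check" recipe can be tried on a small discrete case. The sketch below is only an illustration under stated assumptions: the joint distribution and the test function g are hypothetical, and exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction as F
from collections import defaultdict

# Hypothetical joint distribution of (X, Y) on a small grid (assumed values).
joint = {
    (0, 0): F(1, 8), (0, 1): F(3, 8),
    (1, 0): F(1, 4), (1, 1): F(1, 4),
}

# (a) Marginal distribution of X.
marginal_x = defaultdict(F)
for (x, y), p in joint.items():
    marginal_x[x] += p

# (b) Family of conditional distributions of Y given X = x.
cond_y_given_x = {
    x: {y: p / marginal_x[x] for (x2, y), p in joint.items() if x2 == x}
    for x in marginal_x
}

# Check: marginal times conditional reproduces the joint distribution.
rebuilt = {
    (x, y): marginal_x[x] * q
    for x, cond in cond_y_given_x.items()
    for y, q in cond.items()
}

def g(x, y):
    # Arbitrary test function, chosen only for illustration.
    return (x + 1) * (y + 2)

# Direct expectation E g(X, Y) from the joint distribution.
direct = sum(p * g(x, y) for (x, y), p in joint.items())

# Iterated expectation: h(x) = E(g(x, Y) | X = x), then E h(X).
h = {x: sum(q * g(x, y) for y, q in cond.items())
     for x, cond in cond_y_given_x.items()}
iterated = sum(marginal_x[x] * h[x] for x in marginal_x)

print(rebuilt == joint, direct == iterated)
```

In the discrete case every point has positive probability, so the "up to probability zero" caveat does not arise; the continuous case needs the measure-theoretic statement.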

PS see my essay on probability notation [3] Richard Gill (talk) 14:32, 29 October 2011 (UTC)

I did already some time ago, and I doubt whether it is always only laziness for people to treat f(x) and f(y) as different functions. I would never show such nonsense to students. I also dislike the notation $p_{X|Y}(x|y)$, as what is actually given is the event {Y=y}. This is shown, but really clumsily, in $p_{X|Y=y}(x)$. Nijdam (talk) 07:12, 30 October 2011 (UTC)

"___ interpretation of probability"

Would it be acceptable to revise the visible text "under the frequentist interpretation of probability, probability is..." to "under the frequentist interpretation, probability is..." and "under the Bayesian interpretation of probability, probability is..." to "under the Bayesian interpretation, probability is..."? I think it would improve clarity. —Monado (talk) 04:41, 8 December 2011 (UTC)

Agree. I've just made some edits to the lead which include this. Gnathan87 (talk) 01:52, 11 December 2011 (UTC)

Introduction

I wasn't completely happy with the former introduction, but I'm even less happy with the present one. It's not the mathematical relation that counts, but the interpretation. Somehow I would explain that if an event A may occur on the basis of several "causes" B1, ..., Bn, Bayes shows how likely it is that Bi was the cause when A occurred. If others agree, I (we) may find a suitable formulation. Nijdam (talk) 18:20, 11 December 2011 (UTC)

Just to check, are you referring to the description of the frequentist interpretation or the Bayesian interpretation? If frequentist, I totally agree. What I have found tricky is that a description similar to that you provided is really just an interpretation of conditional probability, not of Bayes' theorem. The best way to understand it seems to me just to have a good understanding of conditional probability... I think we should be wary of trying to explain anything complex in the lead. IMHO what is appropriate is just to provide a straightforward interpretation of terms in the simple statement. For the Bayesian interpretation, the idea in any case was to move the details to Bayesian inference, which is why I thought it suitable just to provide a flavour of belief updating here. Gnathan87 (talk) 21:46, 11 December 2011 (UTC)

No, just to the very first sentences. "... gives the relationship between the probabilities of A and B, P(A) and P(B), and the conditional probabilities of A given B and B given A, P(A | B) and P(B | A)..." is not informative concerning the meaning of the theorem. Nijdam (talk) 22:09, 11 December 2011 (UTC)

Hmmm... I'm not sure that's so bad actually. The interpretation is not so straightforward, and the second part of the lead is clearly devoted to it (beginning with "The meaning of Bayes' theorem..."). I think it's better to put the more general, simpler material up front. Part of the motivation for putting that stuff there was also to address some of the concerns that were raised about the lack of definitions. Gnathan87 (talk) 22:24, 11 December 2011 (UTC)

I'll give it a try: An event A may or may not occur together with another event B, with conditional probabilities given that B has happened and given that B has not happened. If A has actually occurred, Bayes' theorem relates the conditional probability that it was B that happened to the aforementioned reverse probabilities. As an example, think of a certain disease A that may be caused by a shortage V of vitamins or by another cause, with known conditional probabilities P(A|V) that the disease occurs given the shortage V and P(A|not V) that it occurs without the shortage. If the disease A actually occurs, what would be the conditional probability P(V|A) that it was caused by V? Bayes' formula relates: ... etc. Nijdam (talk) 09:47, 12 December 2011 (UTC)

OK, I see where you're going. I've adapted what you wrote for use as the description of the frequentist interpretation. (It's now in the same format as the description of the Bayesian interpretation. I don't mean to suggest the new version is perfect, but it is better than what was there.) To copy here:

In the frequentist interpretation, probability measures the proportion of outcomes in which an event occurs. Bayes' theorem then links inverse representations of the frequency with which events $A$ and $B$ occur. For example, suppose that members of a population may or may not have a risk factor for a medical condition, and may or may not have the condition. The proportion with the condition will depend on whether the group with or without the risk factor is examined. The proportion having the risk factor will depend on whether the group with or without the condition is examined. If the proportions are known in one view, they may be converted to the other using Bayes' theorem. For events $A$ and $B$, $P(A|B)$ is the proportion of outcomes in $B$ that are outcomes in $A$, and $P(B|A)$ is the proportion of outcomes in $A$ that are outcomes in $B$. $P(A)$ and $P(B)$ are the overall proportions of $A$ and $B$.

I'm still not sure that this should replace what is at the top - I suspect that a suitable description of the different interpretations is always going to be too long to fit in the first paragraph. I think it's better to use the opening paragraph for the most general - and neutral - description, which also serves as a convenient place to define the notation. Gnathan87 (talk) 22:43, 12 December 2011 (UTC)

Content

There are complaints about the technicality of the article. It always appears that articles tend toward much more technicality than is needed for an encyclopedia. We have to focus on the more or less lay reader who is interested in the subject. That's why I rewrote part of the introduction and gave the introductory example. Nijdam (talk) 23:59, 1 January 2012 (UTC)

Of course, I'm all for accessibility, particularly in the lead. The big issue I have with the current lead is NPOV. It reads like the subjective interpretation is the indisputable meaning of Bayes' theorem:

"In probability theory and statistics, Bayes' theorem is a method of incorporating new knowledge to update the value of the probability of occurence of an event."

Objectivists (such as Popper) would strongly disagree. I think both interpretations must be given equal weight in the lead. (Having said this, I also think that the first foot forward in Bayes' theorem should tend to be the frequency perspective, basically because Bayesian inference has its own article.) Starting with "Bayes' theorem is [one particular view]" is problematic from an NPOV perspective, which is why I have always kept the more neutral, mathematical explanation right at the top. It is also my preference, because this article is about Bayes' theorem, not Bayesian inference, or anything else. Bayes' theorem is fundamentally a mathematical relation, and so that is how I think it should be introduced.

I would also point out that the description is actually wrong - Bayes' theorem is not a "method". The current description is of Bayesian inference.

Part of the problem with accessibility may well be that the frequency interpretation is difficult to explain succinctly (at least, I have found it difficult). It is certainly tricky to explain its significance to the layman. However, I would argue that this is not a reason to remove it from the lead. Incidentally, I would now suggest that we do not worry about interpreting each term in the lead, since that material is now laid out below. This will greatly improve accessibility.

Finally, I'm not sure about having the example up front like that. First of all, it is not NPOV. Secondly, although this is certainly the approach I would take if writing e.g. a textbook, it doesn't strike me as suitable here. Not everybody will want to use the article like a tutorial, some more like a reference. And it is pretty straightforward for the reader to skip to the examples as necessary (and no doubt they would expect that format anyway). Actually, before you added the example I was trying to avoid getting too much into Bayesian inference in Bayes' theorem, but having seen it I actually really like the idea of having a short Bayesian example alongside the others. (By the way, might the thing about quilting come over as politically incorrect?! ^^)

Gnathan87 (talk) 07:15, 2 January 2012 (UTC)

Edited the lead again, attempted to take everything that has been said on board. The new lead is definitely much more layman-friendly, and NPOV. Hopefully we are progressing towards a consensus? :) (Although, I am still not sure about the introductory example.) Gnathan87 (talk) 07:32, 8 January 2012 (UTC)

I'm not happy with the new intro. It seems that the meaning of the theorem and the interpretation of probability are somehow interwoven. Nijdam (talk) 20:09, 13 January 2012 (UTC)

That is in my view as it should be - is it not the interpretation of probability that bears directly on how the theorem is to be interpreted as a whole? For example, using a frequency interpretation, epistemological terms such as "updating beliefs" are meaningless. Of course, it is often possible to view the same example either way, e.g. the beetles example could be seen as either updating the state of knowledge about the rarity of the beetle, or calculating the frequency with which a patterned beetle is rare. Gnathan87 (talk) 14:01, 14 January 2012 (UTC)

event?

I speak English. I've read many books about science and physics and I can follow most of them without any problem. But when I read the introductory example on this page, it seemed like Greek to me: Call W the event he spoke to a woman, and Q the event "a visitor of the quilt exhibition".

This may be a valid use of the word event in some technical circles, but it's nonsense to most people. Does it mean "Call W the event WHERE he spoke to a woman"? Or does it mean "Call W the probability that he spoke to a woman"? Or perhaps it means "Call W the event OF speaking to a woman"? Or maybe it means something else entirely. I don't know and I haven't a clue what it means. Rodeored (talk) 00:27, 17 February 2012 (UTC)

Removed the technical tag

I've reworked the introductory example, and I hope I've made it a bit clearer (and also a bit less sexist). So I'm removing the {{technical}} template. Feel free to re-add if you still think this needs work. --Zvika (talk) 09:45, 10 June 2012 (UTC)

I do not like the new formulation of the introductory example. It's much more complicated than the former one. The one I introduced showed all that's needed. Nijdam (talk) 21:20, 10 June 2012 (UTC)

The present version looks OK to me. Of course there is the unstated assumption that equal numbers of men and women might be encountered on a train, which seems unlikely to hold ... perhaps it actually needs to be expanded to make this assumption clearer. If someone does have reason to replace the {{technical}} template, it would help if they could indicate a section rather than the whole article, or be more specific here on the talk page. Melcombe (talk) 21:45, 10 June 2012 (UTC)

What was wrong with my "quilt" example? Will anyone change the introductory example if they like their version better? Nijdam (talk) 08:36, 11 June 2012 (UTC)

Spelling

If English spelling demands Bayes's theorem is spelled this way, we should correct this anywhere it is necessary. Nijdam (talk) 11:43, 19 November 2012 (UTC)

No comment? Nijdam (talk) 16:49, 10 December 2012 (UTC)

See a few sections above. 81.98.35.149 (talk) 23:51, 27 January 2013 (UTC)

Spelling of Bayes's

I believe the proper grammar is Bayes's as opposed to Bayes'.

Nope, the proper way to make a noun ending in "s" possessive is to add an apostrophe without an additional "s". Hence "Thomas' shoes" and "Bayes' Theorem".

It is not that simple. Both the New York Times and Oxford University Press write "Bayes's Theorem". Is "Bayes's" a more British or a more archaic version? I am not a native speaker, and I think I was taught to write "Charles's" many, many years ago... 82.181.47.81 (talk) 08:24, 24 June 2012 (UTC)

In the English language, the possessive of a singular noun is formed by adding apostrophe and s, regardless of whatever letter the word ends with, e.g., Thomas's pen. The possessive of a plural noun (such as Joneses) is formed by adding s and apostrophe, unless the word already ends in s, in which case just add apostrophe, e.g., the Joneses' house. The point is that apostrophe and s versus s and apostrophe marks a vital semantic distinction. The term Bayes' theorem thus correctly applies only to a theorem drafted by a group of people all named Baye. By contrast, Bayes's theorem denotes a theorem drafted by one person named Bayes.

Reverend Thomas Bayes was not a plural person, and his surname was not Baye.

Many Wikipedians will be inclined to cite any number of popular contrary uses as if they somehow made gratuitous confusion between singular possessive and plural possessive acceptable. Sadly, one of those uses would be the title of Sharon McGrayne's excellent book on the good Reverend's work. To those Wikipedians I would say: How do you distinguish between recording of specie (payment, singular) and recording of species (types, plural), if you'd say species' recording for both?

(It's possible that specie in that sense is a mass noun, not countable, but it's difficult to think offhand of a pair of good common nouns for this illustration, but I hope the point is clear anyway.)
(The unsigned comment added by 208.88.176.15 (talk) 1 November 2012)

I agree that proper use is Bayes's rather than Bayes', although the latter could be quite common due to popular confusion or misapplication of grammar rules. Most respected traditional English language grammar styles would unambiguously agree on "Bayes's". I'll go ahead and make changes unless there's good, overwhelming evidence that the current version (Bayes') is one of the few rare exceptions. cherkash (talk) 00:18, 26 January 2013 (UTC)

Don't ignore the preceding discussion at http://en.wikipedia.org/wiki/Talk:Bayes%27_theorem/Archive_1#Spelling_of_of_possessive_ending_in_.27s.27 Further, a typical grammar book says "names ending in "-es" pronounced iz are treated like plurals and take only an apostrophe ..." (Oxford English, OUP). 81.98.35.149 (talk) 23:50, 27 January 2013 (UTC)

You will need to provide a better reference; the one you gave is very vague (is "Oxford English" a book? who's the author? ISBN?). The reference you are alluding to is also dubious, as according to it Jones's should be Jones', etc. – which most manuals of style would not agree with. Besides, the above argument on Baye's and Bayes' is as reasonable as anything you could find in support of proper English grammar. Further, although some speakers may prefer to pronounce Bayes's as Bayes' (the main argument that most proponents of using loose rules on possessives would allude to), there's clear value in adhering to Bayes's in writing as it avoids ambiguity between plural and singular possessives. cherkash (talk) 01:22, 28 January 2013 (UTC)

The ref has ISBN 0198691696, which is better detail than anything you have provided. You clearly haven't taken on board the difference in pronunciation between Jones and Bayes, which is of some importance. In particular, everyone says "Bayes theorem", rather than "Bayeses theorem". Further, Wikipedia rules are to follow the general usage within the general field concerned, here statistics and probability, rather than to impose some global set of rules for supposed uniformity. The fact is that general usage in statistics and probability is Bayes' theorem. 81.98.35.149 (talk) 08:44, 28 January 2013 (UTC)

Both spellings are acceptable in British English but Bayes' is far more common for this subject. Martin Hogbin (talk) 14:57, 22 March 2013 (UTC)

Introductory example - Sexism?

The sentence reads: "If he told you the person he spoke to was going to visit a quilt exhibition, it is far more likely than 50% it is a woman. " Clearly, in our modern society this is sexist, as men are just as likely to go to quilt exhibitions. I don't think that Wikipedia should be biased. — Preceding unsigned comment added by 129.215.5.255 (talk) 14:12, 12 April 2012 (UTC)

I don't know about sexist, but it could certainly do with being more explicit. Add, for example, that you read in the paper that 95% of quilt conference attendees are female, and you have something that relies less on your individual assumptions about gender roles and more on a rational interpretation of known facts. ...BTW, there are quilt conferences? I have never heard of this. — Preceding unsigned comment added by 184.187.186.33 (talk) 23:41, 24 May 2012 (UTC)

Shouldn't the question be "what percentage of people with long hair are women"? Not what percentage of women have long hair. Or are these reciprocals of each other? I don't really understand these things very well. It's just a question. Longinus876 (talk) 12:15, 26 March 2013 (UTC)

P(W|L) and P(L|W), the chance someone with long hair is a woman and the chance a woman has long hair, are two different things and Bayes' theorem shows how they are related. In fact, P(W|L)/P(L|W)=P(W)/P(L). That's the whole point. Richard Gill (talk) 10:52, 20 April 2013 (UTC)
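The identity P(W|L)/P(L|W) = P(W)/P(L) can be verified with exact arithmetic. This is a small sketch; the three input probabilities are hypothetical values picked for illustration, not the numbers used in the article's example.

```python
from fractions import Fraction as F

# Hypothetical inputs (assumptions for illustration only).
p_w = F(1, 2)            # P(W): person is a woman
p_l_given_w = F(3, 4)    # P(L|W): a woman has long hair
p_l_given_m = F(3, 20)   # P(L|M): a man has long hair

# Law of total probability: P(L) = P(L|W)P(W) + P(L|M)P(M)
p_l = p_l_given_w * p_w + p_l_given_m * (1 - p_w)

# Bayes' theorem: P(W|L) = P(L|W)P(W) / P(L)
p_w_given_l = p_l_given_w * p_w / p_l

print("P(L|W) =", p_l_given_w, " P(W|L) =", p_w_given_l)
print("ratio check:", p_w_given_l / p_l_given_w == p_w / p_l)
```

With these inputs the two conditional probabilities differ, so they are neither equal nor reciprocals; only their ratio is pinned down, by P(W)/P(L).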

Incidentally, the example shows yet again that it is easier to explain Bayes to laypersons using the Bayes' rule version rather than the conventional Bayes' theorem version. I think the reason why conventional elementary textbooks use the clumsy and unintuitive formula rather than the simple and intuitive Bayes' rule is that they are uncomfortable with using the concept of "proportionality" and scared of using the concept of "odds". However, both when explaining Bayes to laypersons, and in modern applications in science, statistics, information technology ... it is Bayes' rule which we use, every time. Richard Gill (talk) 11:16, 20 April 2013 (UTC)

Two distinct interpretations?

The lead of the article starts by saying that Bayes' theorem has two distinct interpretations. I think this is hardly true, and not important. Bayes' theorem is an elementary identity following from the definition of conditional probability (and, in some forms, the law of total probability). The article refers to distinct interpretations of probability, not of the theorem! Richard Gill (talk) 10:38, 20 April 2013 (UTC)

Lead rewritten. Better? Note that this article originated in a way that mixed Bayes' theorem with Bayesian inference, and so some aspects of this may linger. Melcombe (talk) 13:32, 20 April 2013 (UTC)

Much better! Richard Gill (talk) 15:16, 20 April 2013 (UTC)

Bayes' Rule vs Bayes' Theorem

The correct way in English to express the possessive when a name ends in an s is by a single apostrophe after the s which is already there. See for instance http://www.cs.ubc.ca/~murphyk/Bayes/bayesrule.html, http://plato.stanford.edu/entries/bayes-theorem/

The text on Bayes' rule said that it depended on the Bayesian interpretation of probability, but that is not true. Bayes' rule is equivalent to Bayes' theorem and both are valid for any probability interpretation. The equivalence is a mathematical fact which follows from the normalization of probability. If a probability space is partitioned into some events A, B, C, ... and you know the probabilities of A, B, C ... up to proportionality, then you know them absolutely: you just have to divide by the sum of what you already have. Richard Gill (talk) 13:23, 22 March 2013 (UTC)
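The "divide by the sum" step is short enough to write out. A minimal sketch, assuming a finite partition; the helper name `normalize` and the weights are hypothetical.

```python
def normalize(weights):
    """Turn non-negative weights known up to a common factor into probabilities."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive, finite sum")
    return {event: w / total for event, w in weights.items()}

# Hypothetical unnormalized values for a partition A, B, C (assumed numbers).
unnormalized = {"A": 3.0, "B": 1.0, "C": 4.0}
probabilities = normalize(unnormalized)
print(probabilities)
```

Multiplying every weight by the same constant leaves the result unchanged, which is the sense in which knowing the values up to proportionality is enough. The step does require the sum to be finite and positive.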

But they are not fully equivalent, as Bayes' rule is useable with improper distributions (Note: I have reverted a change relating to this point in Bayes' rule, so you should check that if you haven't spotted the reversion). Melcombe (talk) 13:41, 20 April 2013 (UTC)

But you agree that in the context of standard probability models, they are essentially equivalent? (Improper distributions fall outside of "normal" probability theory). Richard Gill (talk) 15:11, 20 April 2013 (UTC)

Well yes, but improper priors are mainstream-enough in statistics (Bayesian inference) that they can't be entirely ignored. The problem is to find a concise enough (and true) statement of what is known, with a source. Most of the supposed facts in this (and some other) articles are unsourced. Melcombe (talk) 15:41, 20 April 2013 (UTC)

Yes: a major problem with this and related articles (esp.: Bayes' rule) is to source them. I have been looking at free internet pdf textbooks on statistics and probability, together with my own collection of bought ebooks; so far I don't find anything suitable. It seems that there is a big gap between what everyone knows and teaches in their classes, and what is written in the standard textbooks. Richard Gill (talk) 10:38, 21 April 2013 (UTC)

Found two good (modern) books: Gelman et al; Lee.

Regarding the equivalence: Bayes' rule is 'posterior is proportional to prior times likelihood' or equivalently 'posterior odds equals prior odds times likelihood ratio'. If the prior distribution is proper, then Bayes' rule implies Bayes' theorem if we may assume the law of total probability to identify the normalization constant $\sum _{A}P(B|A)P(A)=P(B)$; here the events $A$ are assumed exclusive and exhaustive - they form a partition of the universe of possibilities.

If the prior is improper then we are outside of proper probability theory and $P(B)$ is not defined, so there is no Bayes' theorem.

Conclusion: Bayes' rule is stronger than Bayes' theorem. But I think that "equivalent" is OK in an encyclopedia article.

Article typical of much wiki nonsense -- instead of helping layman becomes evermore unavailable

Can anyone read the first several paragraphs of this article and understand it without having a good understanding of probability?

I actually have a BS in math, from 30 years ago, but since I haven't used it frequently, I am immediately put off by notation like P(A|B) WHICH IS NEVER DEFINED IN THIS ARTICLE.

I am sure this article only gets more and more precise and accurate.

DOES IT EVER GET MORE ACCESSIBLE TO THE LAYMAN?

Making articles ever more technical, and ever less accessible is a form of wiki rot and wiki masturbation.

The guideline covering this point is WP:MTAA. I recently updated much of the current article, which from my point of view is "one level down". However, maybe I misjudged that. I have just made some minor revisions that are hopefully improvements. I think you do have to be careful though, because trying to make things too accessible or too self-sufficient can detract from quality. For example, what exactly would you hope to take from an explanation of Bayes' theorem without mention of conditional probability or events? Should these things be explained in every article in which they are used? I think it is good to expect a particular body of knowledge, as long as readers are also clearly pointed to the prerequisites. Gnathan87 (talk) 22:43, 19 November 2011 (UTC)

I agree totally with the OP. I see this all over the place. It's useless to teach this way. I don't think it's typical of Wiki, but it is common. I call it "show-off writing."

I don't really have a great aptitude for reading or writing formulas. I know they're needed, but when mixed with complicated explanations, total confusion abounds. Longinus876 (talk) 12:32, 26 March 2013 (UTC)

I suggest you obtain an introductory probability text to help remove your ignorance, rather than just moaning about it. These are encyclopaedia articles, not textbooks; they must assume a certain level of education or else each article would be enormous. — Preceding unsigned comment added by 86.148.236.176 (talk) 09:05, 29 May 2013 (UTC)

I'm sorry that I was moaning. I had no idea. Longinus876 (talk) 11:47, 28 June 2013 (UTC)

The article should aim to cover all levels of ability, knowledge and experience as far as possible, without any dumbing down of the subject. I will see if I can help. All comments on progress welcome. Martin Hogbin (talk) 10:31, 29 May 2013 (UTC)

I agree with much of the above. The article should be encyclopedic. It should also be accessible and helpful to beginners. I believe starting with a formula is of no help to anyone. It makes no sense to a beginner, and those who know it already know it. I would recommend _starting_ with a 2x2 matrix crossing the two conditions with each other. Not easy for me to do in a text editor, but I trust the wiki mavens know how to insert diagrams. The matrix makes all of the basic stuff (and much of the complex stuff) visual, clear and accessible. (Ned) 66.162.154.131 (talk) 04:16, 14 July 2013 (UTC)
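A 2x2 table of this kind can be mocked up in plain text. The sketch below is an illustration only: the counts are invented, and A and B stand for any two yes/no conditions. It prints the table with margins and reads both conditional probabilities off it.

```python
# Hypothetical counts for a population of 1000 (assumed, for illustration).
counts = {
    ("A", "B"): 30,
    ("A", "not B"): 20,
    ("not A", "B"): 170,
    ("not A", "not B"): 780,
}

total = sum(counts.values())
row_a = counts[("A", "B")] + counts[("A", "not B")]   # all outcomes in A
col_b = counts[("A", "B")] + counts[("not A", "B")]   # all outcomes in B

p_a = row_a / total
p_b = col_b / total
p_a_given_b = counts[("A", "B")] / col_b   # read down the B column
p_b_given_a = counts[("A", "B")] / row_a   # read along the A row

# Print the table with row and column totals.
print(f"{'':>8}{'B':>8}{'not B':>8}{'total':>8}")
for row in ("A", "not A"):
    b, nb = counts[(row, "B")], counts[(row, "not B")]
    print(f"{row:>8}{b:>8}{nb:>8}{b + nb:>8}")
print(f"{'total':>8}{col_b:>8}{total - col_b:>8}{total:>8}")

print(f"P(A|B) = {p_a_given_b:.3f}, P(B|A) = {p_b_given_a:.3f}")
```

Both conditional probabilities share the same numerator (the A-and-B cell) and differ only in which margin they are divided by, which is Bayes' theorem read off the table.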

Merge Bayes' rule with Bayes' theorem?

Reading the literature, both modern and old, it is clear to me that the phrases Bayes' theorem, Bayes' law, and Bayes' rule are all used interchangeably for any and all of the following mathematical results:

P(H|E)={\frac {P(H)P(E|H)}{P(E)}}

P(H_{i}|E)={\frac {P(H_{i})P(E|H_{i})}{\sum _{n}P(E|H_{n})P(H_{n})}}

Posterior odds equals prior odds times likelihood ratio

Posterior is proportional to prior times likelihood

In my opinion they are all mathematically equivalent (if we take the law of total probability as given). I suggest the articles on Bayes' theorem and Bayes' rule are merged. Richard Gill (talk) 14:36, 22 April 2013 (UTC)
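The claimed agreement between the forms can be checked on a toy case. A sketch with three hypotheses follows; the priors and likelihoods are invented values, and exact fractions are used so the comparisons are exact.

```python
from fractions import Fraction as F

# Hypothetical priors P(H_n) and likelihoods P(E|H_n) (assumed values).
prior = {"H1": F(1, 2), "H2": F(3, 10), "H3": F(1, 5)}
likelihood = {"H1": F(1, 10), "H2": F(1, 2), "H3": F(9, 10)}

# Extended form: P(H_i|E) = P(H_i)P(E|H_i) / sum_n P(E|H_n)P(H_n)
p_e = sum(likelihood[h] * prior[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / p_e for h in prior}

# Odds form for the pair H1, H2:
# posterior odds = prior odds * likelihood ratio
prior_odds = prior["H1"] / prior["H2"]
likelihood_ratio = likelihood["H1"] / likelihood["H2"]
posterior_odds = prior_odds * likelihood_ratio

# Proportional form: posterior is proportional to prior times likelihood,
# so the unnormalized products have the same ratios as the posteriors.
unnormalized = {h: prior[h] * likelihood[h] for h in prior}

print(posterior)
print(posterior_odds == posterior["H1"] / posterior["H2"])
```

The odds and proportional forms never need P(E); it cancels in every ratio. The law of total probability enters only when the posteriors are normalized to sum to 1.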

Shouldn't it be Bayes's Theorem

Since the fellow's name is Bayes the proper name Bayes should be followed by 's to indicate ownership; this is because the ' to indicate ownership only happens when there is ownership and the word has been pluralized, which is clearly not the case here. 128.123.198.163 (talk) 02:10, 5 September 2013 (UTC)

The article title reflects what it's called in textbooks and academic literature. Names ending with an s can form a genitive without an additional s. MartinPoulter (talk) 09:27, 5 September 2013 (UTC)

Recent addition to drug testing example is confusing

"Note that in the structure of this computation, an assumption is being made that the sensitivity and specificity of this drug test depend directly on the incidence of drug users. This assumption can be controversial, however, as the accuracy of a drug's test is usually derived with known information about the individual being tested."

I read this a few times and can't work out what it is trying to say. Is the idea that the sensitivity and specificity of drug tests are usually not known very accurately? If so, maybe it would be better just to change the example to some other kind of medical test whose accuracy can be determined more easily. Or maybe the point is that drug tests are often only carried out when there is some other kind of evidence pointing to drug use, but the example does specify random drug testing. 130.88.99.231 (talk) 16:05, 22 October 2013 (UTC)

Those two sentences seem extraneous and serve to confuse naive readers about an otherwise straightforward and easy to comprehend example. I've removed them for the time being. If people think this removal was wrong please undo it. — Preceding unsigned comment added by 71.162.86.54 (talk) 04:39, 7 November 2013 (UTC)

At the very least, the example should be changed because the number .99 appears twice, for two different purposes. Perhaps change one of them to .98. — Preceding unsigned comment added by MathPerson (talk • contribs) 18:55, 31 January 2014 (UTC)
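To see how the computation looks when the two accuracy figures differ, here is a minimal sketch. The function name and all three input values are assumptions made for illustration; they are not the figures from the article.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(user | positive test) from Bayes' theorem."""
    true_pos = sensitivity * prevalence                    # P(+|user) P(user)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(+|non-user) P(non-user)
    return true_pos / (true_pos + false_pos)

# Hypothetical values (assumptions): sensitivity and specificity are chosen
# to differ so the two roles cannot be confused.
ppv = positive_predictive_value(prevalence=0.005, sensitivity=0.99, specificity=0.98)
print(f"P(user | positive) = {ppv:.3f}")
```

Keeping sensitivity and specificity as separately named parameters makes it visible where each one enters: sensitivity only in the true-positive term, specificity only in the false-positive term.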

Bad introductory example

The introductory example is badly chosen. It is a numerical coincidence that in this case, P(W|L) turns out to be equal to P(L|W). Yet typical readers are struggling to understand and distinguish the different concepts behind these notations.

I agree, and I find it even more confusing that in the denominator P(W)+P(M)=1 and P(L|W)+P(L|M)=1.

The first must be true because M is the complementary event of W, but the second is pure coincidence.

The old example with P(L|M)=0.3 was much better IMHO. It was changed by an anonymous contributor with no explanation.

http://en.wikipedia.org/w/index.php?title=Bayes%27_theorem&diff=551239975&oldid=551139456

--88.217.5.95 (talk) 19:57, 21 May 2013 (UTC)
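The coincidence can be reproduced with exact fractions. In this sketch P(W) = 1/2 and P(L|W) = 3/4 are assumptions (neither value is stated in this thread); P(L|M) = 3/10 is the older value mentioned above, and 1/4 is chosen so that P(L|W) + P(L|M) = 1.

```python
from fractions import Fraction as F

def p_w_given_l(p_w, p_l_given_w, p_l_given_m):
    """P(W|L) when M is the complement of W."""
    p_l = p_l_given_w * p_w + p_l_given_m * (1 - p_w)
    return p_l_given_w * p_w / p_l

p_w = F(1, 2)           # assumed: equal numbers of women and men
p_l_given_w = F(3, 4)   # assumed value; not stated in this thread

# P(L|M) = 1/4 makes P(L|W) + P(L|M) = 1, so P(L) = 1/2 = P(W).
coincident = p_w_given_l(p_w, p_l_given_w, F(1, 4))

# P(L|M) = 3/10 is the older value mentioned above.
distinct = p_w_given_l(p_w, p_l_given_w, F(3, 10))

print(coincident == p_l_given_w, distinct == p_l_given_w)
```

Under these assumptions the equality P(W|L) = P(L|W) appears exactly when P(L) = P(W), which is forced by the two conditional probabilities summing to 1 with equal group sizes; any other P(L|M) breaks it.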

The example shows yet again, that to explain these things to newcomers, Bayes' rule is much better than Bayes' theorem: more insight, less blind calculation. Richard Gill (talk) 11:24, 20 April 2013 (UTC)

Moreover the "graphic" equation has the term P(L) yet P(L) is never defined in the text. Geĸrίtzl (talk) 00:27, 12 February 2014 (UTC)

"... sufficient to deduce all 24 values" ??

I can't really identify 24 distinct probabilities in the picture that illustrates tree diagrams (frequentist interpretation with tree diagrams); as a matter of fact I see only 16. — Preceding unsigned comment added by 85.110.18.176 (talk) 22:40, 18 February 2014 (UTC)