Wikipedia:Wikipedia Signpost/2016-04-24/Op-ed
Knowledge Engine and the Wales–Heilman emails
The Wikimedia Foundation board's communications in the wake of the removal of the community-selected trustee James Heilman have consisted mainly of silence. Jimmy Wales has been the most notable exception, having made strongly worded statements along with promises to provide further information – promises that have remained unfulfilled.
The "gaslighting" email
In recent weeks, onwiki debates on the Knowledge Engine and Heilman's dismissal have largely died down. The most recent substantial discussion took place three weeks ago on Wales' talk page, when Wales set out to explain certain comments he had made in a February 29 email sent to Heilman and Pete Forsyth, shortly after the resignation of executive director Lila Tretikov.
As Signpost readers will recall, Forsyth took the controversial decision to forward Wales' email to the Wikimedia-l mailing list. Forsyth felt it could provide "important insight into the dynamics surrounding Heilman's dismissal". In the ensuing debate, a number of people pronounced themselves horrified by the tone and content of Wales' email, likening it to "gaslighting" (a form of mental abuse). Others criticised Forsyth for his decision to publish it.
In his email, Wales cast a string of aspersions on Heilman's character before taking particular issue with a February 24 post by Heilman in a Wikipedia Weekly Facebook discussion of the Knowledge Engine project.
The Facebook discussion
Heilman's Facebook comment had a context. In the discussion (accessible only to logged-in Facebook users), Liam Wyatt said he was unsure that Wales could be characterised "as having been 'kept in the dark'" about the Knowledge Engine project. "James has said that the board as a whole was presented with these plans – that it was described as 'a moonshot' and that they were presented with cost estimates in the tens-of-millions," Wyatt added, pinging Heilman in his post. Heilman then replied minutes later that he had indeed asked Board members in October whether they understood "that we were building a 'search engine' as before Oct I did not realize we were. JW said that he understood this all along and it was something we needed to do."
The post appears to have angered Wales. In his email, he wrote to Heilman:
As an example, and I'm not going to dig up the exact quotes, you said publicly that you wrote to me in October that we were building a Google-competing search engine and that I more or less said that I'm fine with it. Go back and read our exchange. There's just no way to get that from what I said – Indeed, I specifically said that we are NOT building a Google-competing search engine, and explained the much lower and much less complex ambition of improving search and discovery.
Attentive readers will note that the phrase "Google-competing search engine" appears nowhere in Heilman's post.[1] Heilman was responding to a post that said there was a search engine project that the board was told would cost tens of millions of dollars.
Selective quoting
When Peter Damian challenged Wales to dig up the exact quote, Wales produced it, and to back up his point published excerpts from the October email conversation, with selected quotes from Heilman and himself.
Heilman asserted that Wales' summary of the exchange was "far from complete", and "not an accurate representation of the overall discussion". He asked Wales whether he would have any objection to the complete exchange being posted, so the parts Wales had quoted could be seen in context.
Wales raised no such objection, and the full exchange, as made available to the Signpost by Heilman, is published below. It shows that the accusations Wales levelled at Heilman for his Facebook post were groundless and contrived. In the actual conversation, Wales said to Heilman that –
- the ambitious vision of a search engine project as presented to the Knight Foundation, offering "a unique search experience that will go beyond what Google and Bing are already providing their users", corresponds exactly to what was approved by the board;
- he is "broadly supportive" of that strategic vision;
- the scope of the project goes well beyond a Wikimedia-internal search function;
- it includes building a natural-language question answering system akin to Google's Knowledge Graph and answer boxes;
- the project is motivated by a desire to compete with Google: "users don't come to us", Wales said, because "Google just tells the answers";
- the project is a very major financial investment for the Foundation (Wales later confirmed onwiki that it was in the ballpark of $35 million, spread out over several years).
At the same time, Wales omitted to mention in his summary the concerns put forward by Heilman about the cost and scope of this long-term project, and the WMF's qualifications for undertaking it.
One way to look at this situation is that Wales has essentially been launching vigorous attacks on a strawman – the idea that the Foundation might be intending to build a search engine that does all the things Google does: crawling and indexing everything from books, journals and newspapers to social media sites, online shops and cinema schedules. But his apparent single-mindedness in pursuing this strawman cannot make up for the fact that this is not something Heilman has ever claimed. What Heilman did claim was that the Foundation was planning to build a search engine that would cost tens of millions of dollars. In that, he was undoubtedly correct.
The complete October email exchange
The passages Wales quoted on his talk page are in green. Salient parts Wales omitted from his summary are in bold red.
James Heilman, Oct. 5
Hey Jimmy
Did you realize that we have been developing a search engine for about a year in an effort to compete with Google? Best
Jimmy Wales, Oct 6
I wouldn't have described it in that way, nor do I think the Foundation would, but yes, I'm aware of work in the area of improving search and discovery across all our properties.
James Heilman, Oct. 6
This document from Aug 5, 2015 states:
- "The foundation and its staff have a track record of success and a strong vision of what a search engine can do when it has the right principles, and the right people, firmly behind it."
- "Knowledge Engine by Wikipedia will be the Internet's first transparent search engine, and the first one originated by the Wikimedia Foundation."
The Sept 18, 2015 grant agreement states "the Knowledge Engine by Wikipedia, a system for discovery of reliable and trustworthy public information on the Internet" as the purpose.
The June 24th 2015 document show images of a Google like setup. While the June 30th document states "how is WMF going to build a unique search experience that will go beyond what Google and Bing are already providing their users?"
The plan appears to be for this search engine to go at www.wikipedia.org What else would you call what is being described? This is not a search tool for Wikimedia properties. It also appears to include Watson / Google graph type functionality
Jimmy Wales, Oct. 7
Yes, that sounds exactly like what Lila presented to the board for approval, and what was approved by the board.
This statement alone, omitted by Wales in his summary, seems ample justification for what Heilman wrote on Facebook. The exchange continued:
James Heilman, Oct. 10
Okay. I must say I am confused than, because Lila now denies that we are building a "search engine".
Yet from your perspective we were told that we were building the "Internet's first transparent search engine" and we approved that?
Jimmy Wales, Oct. 10
I'm not really sure what is causing your confusion here. Perhaps it is just the term "search engine" which in some contexts may mean "a website that one goes to as a destination in order to find things on the web, such as Google, Bing, or Yahoo" and in other contexts can mean "software for searching through a set of documents and resources".
But I'm not really sure what your concern is...
Right now the page at www.wikipedia.org is pretty useless. There's no question it could be improved. Is your concern that if we improve it and it starts to look like a "search engine" in the first definition this could cause us problems?
Are you concerned that in due course we might expand beyond just internal search (across all our properties)?
Right now when I type "Queen Elizabeth II" I am taken to the article about her. I'm not told about any other resources we may have about her.
If I type a search term for which there is no Wikipedia entry, I'm taken to our wikipedia search results page – which is pretty bad.
Here's an example: search for 'how old is tom cruise?'
It returns 10 different articles, none of which are Tom Cruise!
When I search in Google – I'm just told the answer to the question. Google got this answer from us, I'm quite sure.
So, yes, this would include Google graph type of functionality. Why is that alarming to you?
James Heilman, Oct. 10
Yes so I think an open source knowledge engine like IBM's Watson and an open source search engine are cool ideas but:
1) These things cost many hundreds of millions to build
2) We have no specific expertise in building them
3) This shift in strategy was done with little / no community consultation
4) What we as a board were told differs from what we are telling potential funders
So I do not believe we can accomplish what we are promising. And a massive effort on this will leave other more important projects uncompleted.
Additionally I believe the lack of transparency around its development is places the WMF/community relationship at serious risk.
Jimmy Wales, Oct. 10
Ok – this sounds like a set of issues you should raise with the board.
Here is how I would personally answer these questions, but I'm just another board member, albeit broadly supportive of Lila's strategic vision.
First, it is true that one might spend hundreds of millions or billions on something like this, but it is not true that there can be no positive results for a reasonable amount of spending. I believe we can materially improve the search/discovery process amongst all our properties for a price that we can afford – and I believe that early work (financed by this grant) should be focused on scoping out an achievable set of things that can be done for various levels of spending – $10 million is well within what we can afford to do.
Second, we have no specific expertise in building them – that's not much of an objection, as we can hire people who do.
Third, I am always in favor of more community consultation. But I've been fighting very hard for a long time against the absurd notion that the community should vote on software. Voters in the community will not all be well-informed and a populist campaign can easily come to the wrong answer on technical matters. So this consultation needs to happen in a much more hands-on way – and it isn't cheap to do.
So, I agree that this is a serious question. For me, it's more of a question of what kind of consultation should happen and when. A commitment to explore a concept through an external grant doesn't strike me as the right point necessarily to engage in a full-scale consultation.
Fourth, I don't agree that there's a serious gulf between what we have been told and what funders are being told.
And then for your 'additionally' – I think this is a serious point , as with your 3rd point.
James Heilman, Oct. 10
Yes and I will be raising these concerns soon. Want to hear Lila's comments on Thursday first. In reply to some of your comments:
1) With respect to improving search, we have already done this per "zero results rate cut in half, from approximately 25% to approximately 12.5%." [1] Stating that zero results are at 33% as of 4 days ago is not correct. If improving internal search was *all* that is planned / promised there would not be an issue and we would be nearly done.
3) These are Wikimedia Movement resources and the WMF is simply a steward of the resources. It is disclosure in normal English of our strategy / goals that I am currently requesting rather than full scale consultation. Also typically those most involved in a conversation are also some of the most informed (half of our medical editors are health care providers for example).
With respect to software the community should definitely not have it forced upon them. In fact software development should be directed in large part by the users. Us not doing this has resulted in some of our largest problems and is currently why the relationship between the WMF / community is what it is.
Jimmy Wales, Oct. 11
Oh, I don't agree at all. "zero results rate" is a pretty rock bottom metric. Our (internal) search engine is awful, is contrary to user experience everywhere else on the web, and fails to take advantage of changing user expectations of what computers can do.
Imagine if we could return results from Wikipedia / Wikimedia Commons / Wiktionary / Wikibooks / Wikivoyage in a beautiful presentation.
Imagine if we could handle a wide range of questions that are easy enough to do by using wikidata / data embedded in templates / textual analysis.
"How old is Tom Cruise?"
"Is Tom Cruise married?"
"How many children does Tom Cruise have?"
The reason this is relevant is that we are falling behind what users expect. 5 years ago, questions like that simple returned Wikipedia as the first result at Google. Now, Google just tells the answer and the users don't come to us.
--Jimbo
"Our entire fundraising future is at stake"
A comment Wales made in November 2015 in a three-way email discussion between Wales, Heilman and a WMF staffer sheds further light on his thinking. Wales responded as follows to the assertion that there clearly had been an attempt to fund a massive project to build a search engine that was then "scoped down to a $250k exploration for a fully developed plan":
In my opinion: There was and there is and there will be. I strongly support the effort, and I'm writing up a public blog post on that topic today. Our entire fundraising future is at stake.
No such blog post was ever published by Wales, to the Signpost's knowledge. But the Knowledge Engine grant agreement – originally withheld by the board, ostensibly because of "donor privacy" issues, and only released after the Signpost confirmed with the Knight Foundation that there were no privacy issues on the donor's side – suggests that there was indeed a plan, one on which the Wikimedia Foundation's "entire fundraising future" hinged, according to Wales.
This is hard to reconcile with what Wales told the community in February:
There is no overarching master plan. There is a $250,000 grant to begin to explore ideas, with a very limited set of deliverables for phase one.
We see that when Heilman said in the above email conversation that this was "not a search tool for Wikimedia properties", Wales readily agreed, stressing the importance of answer engine functions in attracting users who today find their answers on Google. But to the community, Wales has been keen to convey the opposite impression, narrowly focusing on the project's first phase only:
- "The project presented to the board at the Wikimania board meeting in Mexico was about improving internal search and discovery, with some very reasonable and modest first steps outlined."
- "I don't think of improving internal search and discovery to be primarily about revenue and page views."
- "Perhaps you have a typo, or perhaps just a continued misunderstanding. It was clear to everyone (on the board) involved with the grant that this (the work to be funded by the grant) was just an internal search tool. No code, no architecture, no nothing other than a vague idea that maybe someday Wikipedia could also include other "non-commercial" results someday."
Wales specifically objected to the portrayal of the Knowledge Engine as something that would compete with Google. But in the exchange above, he himself twice emphasises that Wikipedia is failing to offer users the answers that Google is providing to them:
When I search in Google – I'm just told the answer to the question. Google got this answer from us, I'm quite sure. So, yes, this would include Google graph type of functionality. ...
Google just tells the answer and the users don't come to us.
Referring to the Knowledge Engine grant agreement, Wales says, "I don't agree that there's a serious gulf between what we have been told and what funders are being told." Yet what funders were told was that "Knowledge Engine by Wikipedia will be the Internet's first transparent search engine, and the first one originated by the Wikimedia Foundation ... a system for discovery of reliable and trustworthy public information on the Internet ... a unique search experience that will go beyond what Google and Bing are already providing their users".
In the above email exchange, Wales also alludes to the possibility that "in due course", the Knowledge Engine project "might expand beyond just internal search (across all our properties)". In recent months, he has multiple times referred to the possibility that "non-WMF resources might be included in a revamped discovery experience" or that "some important scholarly/academic and open access resources could be crawled and indexed in some useful way relating to Wikipedia entries" while insisting that any suggestions "that this is some kind of broad Google competitor remain completely and utterly false."
In the "gaslighting" email, Wales also objects to the fact that Heilman included Wikia Search in a timeline published in the February 3 Signpost issue. But a key element of Wikia Search was "public curation of relevance" – volunteers determining how high up in Wikia's search results Internet pages should be ranked (a process that at times led to hilarious results). And public curation of relevance is also a key element of the latter stages of the Knowledge Engine project, as outlined to the Knight Foundation and described in the official project documentation.
To be sure, the Knowledge Engine is not conceived as a full-fledged Google competitor, complete with shopping results, opening hours of shops and restaurants, cinema times, search results from Twitter and Facebook, and so forth (and Heilman never claimed it was).
But judging from the documentation available, it was – or is – conceived at the very least as a niche competitor to Google, crawling and indexing both Wikimedia properties and selected other Internet content and replicating Google's answer engine and Knowledge Graph functionality. When Jimmy Wales says that the Wikimedia Foundation's entire fundraising future depends on the idea, the hope surely is to draw a significant number of eyeballs to Wikipedia.org by providing answers to natural-language questions, following the lead of other AI assistants, and providing search result listings that take users to relevant pages anywhere in the Wikimedia universe, complemented by a broad range of open access and/or academic sources.
It is an ambitious idea, but not in any way objectionable in itself. What is clear, however, is that building such a search engine will cost tens of millions of dollars. Heilman's concern was that
- this was a major decision about the Wikimedia Foundation's long-term strategic direction that the community should be involved in,
- this was something that should be openly disclosed rather than kept secret,
- the financial investment required to undertake this ambitious project would lead to other projects being underfunded,
- if the project should fail to gain traction with users, this could result in tens of millions of donor dollars being wasted.
These were not idle concerns. And the fact that Heilman expressed them in no way justifies the repeated vilifications he has had to endure.
[1] The complete post read: "Yes I asked individuals on the board in Oct if they understand that we were building a "search engine" as before Oct I did not realize we were. JW said that he understood this all along and it was something we needed to do.."
Discuss this story
This is the best-grounded look at the whole Heilman affair since it began, aided of course by the digging you folks at the Signpost have done and by the addition of the actual email chain between Wales and Heilman.
What a tale of technical overreach, fiduciary irresponsibility, behind-the-scenes machinations, treachery and duplicity!
Magnificent wordsmithing by Andreas Kolbe. →StaniStani 00:10, 25 April 2016 (UTC)
My compliments on another excellent piece of work, Andreas. You should really try to get these articles more widely distributed. -- Seth Finkelstein (talk) 01:28, 25 April 2016 (UTC)
Sorry but this is just a bunch of misconceptions. A query dialog engine is not a Google competitor, it is not even close. (Why do I waste time on reading this?;/) Jeblad (talk) 06:21, 25 April 2016 (UTC)
Some thoughts
Hi. Sorry if this is a daft question, but this piece is marked as an op-ed. What opinion is being expressed?
Does anyone disagree that our internal search needs improvement? I would think that Andreas and others would be supportive of efforts to have free, open, and independent search functionality. Below other mission-critical services such as providing SQL and XML data dumps, search is pretty important infrastructure, especially as the Wikimedia projects grow.
If we took an input string such as "How old is Tom Cruise?" and broke it up into pieces, I think we could, with some effort, program this and similar queries to return specific data points. We could look at the most relevant Wikidata item (d:Q37079) to extract the "date of birth" field's value ("3 July 1962") and then do a simple date calculation to show that Tom Cruise is currently 53 years old. Or, if we can get the search results to be better, we can pull out and highlight specific data points alongside the search results.
After we solve "How old is [famous person]?"-type queries, we can add support for alternate phrases such as "What age is [famous person]?" Once we solve that, we can move on to programmatically answering other "easy" queries. I don't think what's being described here requires artificial intelligence or IBM's Watson.
You want a concrete opinion? The search results at Special:Search/How old is Tom Cruise? are currently terrible. Tom Cruise bafflingly doesn't appear in the top 100 results. If Tom Cruise did appear in these results, we could look at the search input, see that it uses a known keyword ("age" or "old"), and then extract that information programmatically to serve our reader/researcher more quickly. Who opposes doing this?
Let's talk about how we can improve search and what that will require. Does an organization similar to the Wikimedia Foundation (or the Knight Foundation, for that matter) need to be involved? What value do these organizations provide? I think there's plenty of room for intelligent and thoughtful discussion about priorities and functionality and serving our readers. Can we start now? --MZMcBride (talk) 03:23, 26 April 2016 (UTC)
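The keyword approach described in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions, not how any Wikimedia search software works: the `BIRTH_DATES` table is a hypothetical in-memory stand-in for a Wikidata lookup (it holds just the item ID and birth date quoted in the comment), and the function names and the two supported phrasings are invented for the example.

```python
import re
from datetime import date

# Hypothetical stand-in for a Wikidata lookup. The item ID and the
# date of birth are the ones quoted in the comment above.
BIRTH_DATES = {
    "tom cruise": ("Q37079", date(1962, 7, 3)),
}

# Recognise the two phrasings mentioned in the comment:
# "How old is <name>?" and "What age is <name>?"
AGE_QUERY = re.compile(
    r"^\s*(?:how old is|what age is)\s+(.+?)\s*\??\s*$",
    re.IGNORECASE,
)


def age_on(born, today):
    """Return the age in whole years on the given day."""
    years = today.year - born.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (born.month, born.day):
        years -= 1
    return years


def answer_age_query(query, today):
    """Return an age for a recognised query, or None to fall back to search."""
    match = AGE_QUERY.match(query)
    if match is None:
        return None  # not an age question
    entry = BIRTH_DATES.get(match.group(1).lower())
    if entry is None:
        return None  # no known item for this name
    _item_id, born = entry
    return age_on(born, today)


# The date below is the day the comment above was posted.
print(answer_age_query("How old is Tom Cruise?", date(2016, 4, 26)))  # 53
```

Returning `None` for anything the pattern does not recognise leaves room for the ordinary search results to be shown instead, which matches the suggestion to highlight data points alongside results rather than replace them.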
About Jimmy's behaviour and character
Was the related email, from Jimbo to Doc James of 30 December 2015 ever shared publicly? HolidayInGibraltar (talk) 19:10, 27 April 2016 (UTC)
Kudos to Andreas for an incisive and revealing exposé. Is Wales any longer appropriate as a WMF board member and self-appointed WP figurehead? Given the long, damaging record of evasions, obfuscations, manipulations, lies, misdirections, misrepresentations, distortions, and self-serving personal attacks, the answer couldn’t be more obvious. Writegeist (talk) 19:11, 29 April 2016 (UTC)
Data sources
"In recent months, [Jimmy] has multiple times referred to the possibility that 'non-WMF resources might be included in a revamped discovery experience' or that 'some important scholarly/academic and open access resources could be crawled and indexed in some useful way relating to Wikipedia entries' while insisting that any suggestions 'that this is some kind of broad Google competitor remain completely and utterly false.'"
Please don't forget Fox News appearing in a sample search result. --NaBUru38 (talk) 16:11, 29 April 2016 (UTC)