Wikipedia:Wikipedia Signpost/2024-03-29/Technology report
Millions of readers still seeing broken pages as "temporary" disabling of graph extension nears its second year
A lack of technical support for interactive content on Wikimedia projects was lamented in a wide-ranging discussion on Wikimedia-l and elsewhere over the last two months. In particular, several community members expressed deep frustration about the state of the Graph MediaWiki extension, which had been disabled in April last year due to security vulnerabilities in the underlying third-party Vega framework (Signpost coverage). Back then, a Wikimedia Foundation representative had stated that "My hope is we can maybe restore some functionality in the next week or so." But eleven months later, graphs and charts remain deactivated, replaced by a prominent error message in many Wikipedia articles - despite extensive discussions about possible solutions.
Basque Wikimedian Galder Gonzalez Larrañaga (User:Theklan) opened the Wikimedia-l discussion by decrying this state of affairs:
All the solutions proposed have been dismissed, but every two months there's a proposal to make a new roadmap to solve the issue. We have plenty of roadmaps, but no vehicle to reach our destination.
He contrasted Wikimedia projects with e.g. "a place like Our World in Data [which] has been publishing data and interactive content with a compatible license for years". Several other Wikimedians likewise voiced their frustration about the lack of progress in getting graphs re-enabled.
Marshall Miller, Senior Director of Product at WMF, acknowledged these concerns on the mailing list, stating that:
to support graphs and other interactive content, we would need to take a step back and make a substantial investment in sustainable architecture to do it – so that it works well, safely, and is built to last. And because that’s a substantial investment, we need to weigh it against other important investments in order to decide whether and when to do it.
I know that it is very frustrating that the Graph extension has not been operational for many months – it means readers haven’t been seeing graphs in articles, and editors haven’t been able to use graphs to do things like monitor backlogs in WikiProjects. Over the months of trying to find a way to turn graphs back on, it has become clear that there isn’t a safe shortcut here and that the path forward will require a substantial investment – one that we have not yet started given the other priorities we’ve been working on.
Proposed solutions fail
How did things get to this point? Several proposals and plans had been pursued since the discovery of the XSS vulnerability in April 2023:
- One proposed solution (phab:T336595) was to re-enable the feature but treat the Vega code for graphs "like other dangerous content (Javascript, CSS) and restrict editing it to a small, trusted set of users" (similar to Interface administrators). However, this option was eventually abandoned in favor of a different approach:
- To sandbox the graphs into an iframe (a separately loaded part of a web page). This option (phab:T222807) had in fact already been proposed back in 2019 out of general security concerns, long before the discovery of the current vulnerabilities in Vega. By November 2023, some progress appears to have been made on its implementation. But serious performance problems remained due to complexities around caching. What's more, the iframe approach met principled objections by WMF engineer Timo Tijhof (User:Krinkle), who argued that it "severely limits our audience, technical choices, and assumptions going forward, and I think negatively impacts our mission and reach in a way that is in direct contradiction and violation of principles that thus far have been accepted without question" (citing the Wikimedia Foundation's Guiding Principles and other WMF engineering conventions). In particular, Tijhof worried about the redistribution of content in other forms (citing examples including "Kiwix, IPFS and Apple Dictionary"). Noting that e.g. the Wayback Machine "consistently fails to archive [...] non-trivial JavaScript pipelines", he argued "there can be no JavaScript requirement for fundamental access to content". Lastly, as noted by WMF security engineer Scott Bassett, while the iframe sandbox "would reduce the risk of running potentially dangerous javascript within a user's browser, [it would] not eliminate the risk entirely." The iframe task was eventually closed as "Declined" earlier this month.
- Other options that had been proposed early on included sanitizing the input. But this approach was described as "extremely tricky to get right (after all, vega failing at it is how we landed in this situation)."
- Update the underlying library: As noted in a WMF FAQ, the Graph extension is based on a very outdated version of the Vega library: "The last upstream release (bugfix or security) of Vega 2.x was in January 2017. Vega 5 was released in March 2019 and is still under active maintenance and development." However, by May 2023, initial hopes had been dashed that an upgrade to version 5 would suffice to resolve security concerns. The WMF's eventual plan to fix the Graph extension (apparently now abandoned too) envisaged the version update and eventual deprecation of version 2 only in combination with the sandbox solution. What's more, as detailed in the FAQ, migrating to Vega 5 will necessitate the conversion of existing graph templates and modules, much of which needs to be done by hand (although aided by a translation tool).
Quantifying the impact
Rarely, if ever, has there been a software issue that affects Wikipedia content so visibly for such a prolonged time. By January 2024, User:Sj estimated that it had "already conservatively affected 100M pageviews." According to a September 2023 analysis, over 1.3 million pages are impacted across all Wikimedia projects - the vast majority of them (1.16 million) on the Arabic Wikipedia.
Still, on English Wikipedia, only 19,160 pages were affected. (Those numbers likely already reflect the manual removal of broken graphs from many pages.) In a more detailed 2020 analysis, volunteer developer User:Bawolff had found that "the graph extension is used on 26,238 pages [on English Wikipedia]. However, most of these are in non-content namespaces, from a template that generates a graph of page views for a specific page (w:Template:PageViews graph). There are 4,140 pages on en.wikipedia.org in the main namespace that use graphs. [...] As a percentage, that's 0.07% overall, 0.2% of "Good Articles", 0.3% of Featured Articles." Another Wikipedian reported that "In ruwiki, interactive Lua-based graphs are used in more than 26000 articles about settlements and administrative units through https://ru.wikipedia.org/wiki/Module:Statistical (also, more than 8000 on ukwiki, etc.)."
How much interactivity is needed, anyway?
Several users questioned whether the full interactive functionality of the Vega library was really needed, arguing e.g. that "Most graphs on wiki are simple bar/pie/line charts. These could be produced quite easily using even a language like Lua."
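To illustrate how little machinery a simple static chart needs, here is a minimal sketch in Python (not Lua, and not any existing MediaWiki module or template). The function name `bar_chart_svg`, its parameters, and the colour are all hypothetical choices made for this illustration. It emits a plain SVG image with no script of any kind, and it escapes the labels so that user-supplied text cannot inject markup.

```python
from html import escape


def bar_chart_svg(data, width=300, bar_height=20, gap=6, label_width=80):
    """Return a static SVG bar chart (no scripts) for (label, value) pairs."""
    if not data:
        raise ValueError("data must contain at least one (label, value) pair")
    if any(value < 0 for _, value in data):
        raise ValueError("values must be non-negative")

    max_value = max(value for _, value in data) or 1  # avoid division by zero
    plot_width = width - label_width
    height = len(data) * (bar_height + gap) + gap

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for index, (label, value) in enumerate(data):
        y = gap + index * (bar_height + gap)
        # Bars are scaled so that the largest value fills the plot area.
        bar_width = round(plot_width * value / max_value)
        # Labels are escaped so user-supplied text cannot inject markup.
        parts.append(
            f'<text x="0" y="{y + bar_height - 5}" font-size="12">{escape(label)}</text>'
        )
        parts.append(
            f'<rect x="{label_width}" y="{y}" width="{bar_width}" '
            f'height="{bar_height}" fill="#36c"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(bar_chart_svg([("2021", 12), ("2022", 30), ("2023", 18)]))
```

Because the output is inert markup rather than code that runs in the reader's browser, a chart of this kind avoids the class of problem that led to the Graph extension being disabled, at the cost of hover-over and other interactive features.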
WMF engineer Gergő Tisza (who appears to have done much of the technical work on the aforementioned iframe solution) observed that
Interactive animations were very much part of Yuri's vision for the Graph extension, but during the decade Graph was deployed in production the number of such animations made was approximately zero. [...] Instead, both gadgets and Graph usage are mostly focused on very basic things like showing a chess board or showing bar charts, because those are the things that can be reused across a large number of articles without manually tailoring the code to each
Concretely, Bawolff had observed in his 2020 analysis of the usage of graphs on English Wikipedia that:
Most of these are simple static graphs. Some notable exceptions is interactive time scale maps, such as the one at w:Template:Interactive COVID-19 maps and the one at w:List_of_countries_by_carbon_dioxide_emissions, which shows how geographic data evolves over time (See also w:template:Global Heat Maps by Year). Also the graph at w:Vancouver_Whitecaps_FC. Nonetheless, I have yet to see any examples where a graph based visualization makes what would otherwise be a difficult concept clear, or where the visualization stops me in my tracks, and is core to my understanding of the article.
Volunteer developer TheDJ even argued "let's be honest... the interactivity-part has been an 8 year long nightmare. Maybe its time to put that to bed and accept defeat."
On the other hand, Galder titled his post that opened the Wikimedia-l discussion "We need more interactive content: we are doing it wrong". He took a much wider view, arguing that the WMF's failure to get graphs working again was just one example of wider stagnation and lack of progress towards the goal that "By 2030, Wikimedia will become the essential infrastructure of the ecosystem of free knowledge" (quoting from a 2017 strategy document). Besides Our World in Data, Galder named several other educational websites that have surpassed Wikimedia projects on interactive content, e.g.:
- Wolfram Alpha is like a light year ahead us on giving interactive solutions to knowledge questions [...] That's also "free knowledge".
- Brilliant (https://brilliant.org/) is brilliant if you want to learn lots of things, like geometry or programming. Way better than Wikipedia. But... you need to pay for it.
(Other parts of the mailing list discussion focused on MediaWiki's shortcomings with regard to video.)
In her op-ed in this Signpost issue, Maryana Pinchuk, the Wikimedia Foundation's Principal Product Manager, pushes back against such proposals, reporting that at an event last fall, she "heard many Wikipedians express concern about where pursuing this strategy could lead us. There was fear of making Wikipedia into something it isn’t. There was also fear about the cost and risks of building big new software features and trying to compete with massive for-profit technology companies for users. I think all of these concerns are very valid."
In the Wikimedia-l discussion, Wikipedian and former WMF engineer Ori Livneh argued that direct comparisons with sites that do not contain user-generated content may severely underestimate the additional engineering work to implement such interactive features on Wikipedia. He pointed out security engineering as a bottleneck at the Foundation holding up such work:
The critical issue is *security*. Security is the reason the graph extension is not enabled. Security is the reason why interactive SVGs are not enabled. Interactive visualizations have a programmatic element that consists of code that executes in the user's browser. Such code needs to be carefully sandboxed to ensure it cannot be used to exfiltrate user data or surreptitiously perform actions on wiki.
The bar for shipping security-critical features is high. You can ship code with crummy UX [user experience] and iterate on it. But something that touches security requires a higher amount of up-front technical design work and close scrutiny in the form of peer review. And this means that it cannot progress spontaneously, through sporadic bursts of effort here and there (which is how a lot of engineering work happens) but requires a solid commitment of focused attention from multiple people with relevant expertise.
There are engineers at the Wikimedia Foundation and in the technical contributor community with the relevant expertise but as a rule they are extremely oversubscribed. My recommendation would be to engage them in crafting a job description for this role and in reviewing candidates.
On March 26, the WMF invited feedback on "the Product & Technology draft key results for next fiscal year. They aim to explain what outcomes we are working towards" as part of the 2024/25 annual plan. In reaction, Galder noted that "there's no single mention to this [Graphs outage problem], nor to improving the multimedia experience". In a discussion on the talk page, Miller said that "we are working on a possible plan for graphs, but I'm not sure yet what its scope will be or when we would resource it if we proceed with that plan".
Discuss this story
But it doesn't look like there is any obvious technical solution - at least, apparently no one has offered any that are agreeable to everyone, at which point someone could actually add the associated price tag.
More importantly, isn't the real question whether Wikipedia's volunteer base is interested in and able to use more complex solutions **to any large extent**? If not, then, at least for the next couple of years, or at least until a good technical solution presents itself, a better approach might well be an automated pass at existing graphs, to convert them to images that don't carry security risks. (Store the snippets of code and data so that they can be used in the future, if the relevant one ever arrives.) -- John Broughton (♫♫) 23:08, 29 March 2024 (UTC)
The continued failure to re-introduce graphs is the latest, most visible instance of Wikimedia's complete inability to do software projects. There are plenty of talented developers who work there, but for some reason the foundation can't project manage their way out of a paper bag. "We plan to, in a couple months, release the date for when we'll have a roadmap for doing the development" is a deranged statement for any software product feature. They could have implemented non-interactive graphs in a month with a single developer. Heck, it's been so long now they could have implemented interactive graphs from scratch. I get that this isn't a priority for them, but I can't help but notice that pretty much every single feature they work on has absurdly long timelines to produce anything. --PresN 02:23, 30 March 2024 (UTC)
A somewhat related discussion is ongoing at Wikipedia:Village_pump_(WMF)#Proposal:_WMF_should_hire_a_full-time_developer_to_do_basic_maintenance_on_MediaWiki. Regards, HaeB (talk) 02:53, 30 March 2024 (UTC)
I find the comparison to WolframAlpha very silly. WP:NOT is among our largest and most referenced policies because, as it turns out, including too much information in a reference work makes it less useful. We have a lane and we should stick to it. Mach61 04:47, 30 March 2024 (UTC)
I agree with Tisza and TheDJ; we should be focusing on getting the basic graph functionality running rather than trying to devise interactive solutions. While the fanciful visions of interactive content may sound all good and nice, fact is that we've got something broken, and the vast majority of use cases don't currently require interactivity. So we should expend the energy on fixing the broken thing and then the interactive stuff can be added later. And flashy graphs with complex interactive features are better served by other sites anyway; we shouldn't try to be everything to everyone. ― novov (t c) 05:13, 30 March 2024 (UTC)
- This problem with tables is only the most glaring example of software problems that don't get fixed. It takes a lot of resources, and I figure those resources ought to be beefed up. Amateurish enthusiasm works all right for editing articles, surprising as it may seem, but software coding seems to be a more delicate kind of clockwork. My own little problems are mostly with the interworkings of the WP App, Commons App, Wikidata, and their various mapping methods. They do not connect to each other neatly, and when teaching new editors we have to introduce them to various workarounds. I like to think there must be some other part of the budget that could be robbed to pay for better software, not just for tables but for mobile. The mobile site is good enough for readers, but it's poor for editors, and the internal mobile apps are what ought to work together to make life easier for mobile editors. Jim.henderson (talk) 15:33, 30 March 2024 (UTC)
First, HaeB, thanks for this write-up, I really appreciated getting up to speed without having to read all the mailing list discussions. In my view, this comes down to two lines: and The WMF isn't going to spend millions of dollars to fix something that's only a problem on 20k enwiki pages. Bottom line is bottom line, unfortunately: graphs aren't popular enough to care about, by and large. Personally, I think they're great and could be profitably used on millions of pages, but you can't argue with the fact that they just aren't very popular, not very widely used, and thus low-priority when compared with other features that are more widely used. Levivich (talk) 16:41, 30 March 2024 (UTC)
I think I would be willing and able to develop a Lua module that replaces the Graph extension for the most common use cases (bar charts, line graphs and maybe even pie charts). However, the Wikimedia Technology Fund has been marked as Permanently on hold with no explanation given (at least that I know of). Perhaps this year I'll find the time and energy to attempt it as a volunteer, but it would really help to get some compensation for it, since it will require a lot of time and effort that I could otherwise spend in remunerated work, or just reading a book. Sophivorus (talk) 23:01, 30 March 2024 (UTC)
Hello everyone. I'm Marshall Miller; I'm a Senior Director of Product at WMF (I was quoted in the article above), and I work with teams that build features for the reading and editing experiences (e.g. Discussion Tools or Night Mode). Thank you all for thinking about and discussing the graphs issue. I know that it is frustrating that this remains unresolved -- it has been a deceptively complex problem, involving issues around security and scalability. I just wanted to post here saying that I am following this discussion, and that I have been working with colleagues to propose a path forward for graphs. We'll be resuming the discussion and updates on Meta, and I will also post a link here once there is a new update over on that wiki. -- MMiller (WMF) (talk) 05:06, 31 March 2024 (UTC)
I agree with the frustration, but.... Those of you who consult resources hosted by the British Library are aware of the cyberattack that brought down their system in October. Some of the functionality is still not there and won't come back until the architecture is redesigned (see the report Learning Lessons From The Cyber-Attack). Security threats are real. I suspect most people would not want Wikimedia projects to be hacked and all of its content erased in such a way where it was not retrievable. Perhaps the best way forward is to ask WMF to devote more resources to security and replacing those parts of the infrastructure which are at risk. - kosboot (talk) 14:49, 1 April 2024 (UTC)
I find it hilarious that they had enough time, effort, and energy to debut the unwanted vector 2022 mandatory "upgrade", but can't (or won't) get this fixed. Incompetence be thy name, foundation. TomStar81 (Talk) 21:12, 1 April 2024 (UTC)
and normal PHP and a smatter of JS, does seem a lot easier than graphs. Not to mention that they spent 4 years on it anyway. Aaron Liu (talk) 21:42, 1 April 2024 (UTC)
<graph>, or a pure-HTML-based bar chart. When Vega went offline, instead of displaying the "can't do that" message, they simply updated their template so that all requests for the graph version instead get the HTML bar chart. It's less satisfying (note: discussion in Russian) than the interactive graphs, but it conveys most of the same data.
It is worth noting, I think, that parts of the problem have been solved. For example, the much-discussed Template:OSM_Location_map, used by 5,600 pages on the English Wikipedia, is functioning again since 11 March 2024, with some newly added features. See the Return to service article. On the other hand, those 5,600 pages were not included in the count of 19,160 -- a count that underestimates the true impact of the problem, leaving out all the templates that are indirectly rendered inoperable. Renerpho (talk) 07:46, 5 April 2024 (UTC)
<noinclude> documentation section, so that it's only applied to the template itself, not its consumers.)
<graph> in a template, no matter how indirect, would cause every page that transcludes the template to be counted. It's only the pages where the transclusion was removed (as the article notes), or where the transcluded template was modified to eliminate its use of graphs (like {{OSM Location map}}), that wouldn't be counted. FeRDNYC (talk) 17:00, 6 April 2024 (UTC)
This subject demonstrates Wikimedians' propensity to blather on, building towering walls of text in place of taking action. The way forward seems blindingly obvious: implement a safe static graph facility while separating off the issue of whether and how to provide dynamic and/or interactive graphics safely. The most efficient way is likely to be to subset the Vega framework (or just syntax) to those constructs that can only produce safe, static code. The point is to actually act on that rather than talk about that. --R. S. Shaw (talk) 23:09, 8 April 2024 (UTC)
How many of the 4,000 articles that use broken graphs use only pie charts or bar charts? There are working templates for both of those,[1][2] so long as you don't miss the hover-over feature. And you shouldn't miss it. Pretty sure most users just want a simple, easy-to-read image in most cases - before they spend millions on making something, they should check if people would actually find it more useful. Wizmut (talk) 03:44, 10 April 2024 (UTC)
Hello everyone. I wanted to follow up here on my earlier comment above. I just posted an update to the Graph project page laying out a proposal in which WMF would build a new extension for making graphs. We've come to this proposal after talking to community members over the past weeks, analyzing data, and thinking through architecture with staff. This would be a substantial amount of work, and I hope that community members can weigh in on whether this seems like the right approach and to help us plan the project. Please join the discussion on the talk page! -- MMiller (WMF) (talk) 23:15, 10 April 2024 (UTC)