Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T255502 Goal: Save Timing median back under 1 second
Resolved | | Krinkle | T277788 Save Timing improvements (2021-2022)
Resolved | | Ladsgroup | T292300 Eliminate unnecessary duplicate parses (2021-2022)
Resolved | | Ladsgroup | T285987 Do not generate full html parser output at the end of Wikibase edit requests
Resolved | | Ladsgroup | T288639 SpamBlacklistHooks::onEditFilterMergedContent causes every edit to be rendered twice
Event Timeline
Change 711662 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/SpamBlacklist@master] Avoid using deprecated WikiPage::prepareContentForEdit
https://gerrit.wikimedia.org/r/711662
I see that the patch doesn't change any behavior (it still doesn't let you save pages with spam in them) while cutting the CPU time in half; wall time drops drastically as well:
For comparison, these two profiles capture essentially the same edit attempt, one with the patch and one without:
I don't know how it managed to produce such a big improvement for Wikidata, but here is the result, before and after:
https://performance.wikimedia.org/xhgui/run/symbol?id=6114280a2df779cbf2be4232&symbol=MediaWiki%5CRevision%5CRenderedRevision%3A%3AgetSlotParserOutput
https://performance.wikimedia.org/xhgui/run/symbol?id=61142762fcbdc7ca7a4d0b62&symbol=MediaWiki%5CRevision%5CRenderedRevision%3A%3AgetSlotParserOutput
Change 711662 merged by jenkins-bot:
[mediawiki/extensions/SpamBlacklist@master] Avoid using deprecated WikiPage::prepareContentForEdit
https://gerrit.wikimedia.org/r/711662
Change 710725 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.17] Avoid using deprecated WikiPage::prepareContentForEdit
https://gerrit.wikimedia.org/r/710725
Change 711706 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.18] Avoid using deprecated WikiPage::prepareContentForEdit
https://gerrit.wikimedia.org/r/711706
Change 711706 merged by jenkins-bot:
[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.18] Avoid using deprecated WikiPage::prepareContentForEdit
https://gerrit.wikimedia.org/r/711706
Change 710725 merged by jenkins-bot:
[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.17] Avoid using deprecated WikiPage::prepareContentForEdit
https://gerrit.wikimedia.org/r/710725
Mentioned in SAL (#wikimedia-operations) [2021-08-11T21:29:29Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:711706|Avoid using deprecated WikiPage::prepareContentForEdit (T288639)]] (duration: 01m 07s)
For reference, this is at least the *fourth* time this has happened:
- T59026#639359 (Nov. 2013)
- T198483#4404330 (July 2018)
- T205369#4743547 (Nov. 2018)
I think this merits some kind of deeper analysis / proper fixing so it stops regressing.
Mentioned in SAL (#wikimedia-operations) [2021-08-11T21:42:02Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.17/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:710725|Avoid using deprecated WikiPage::prepareContentForEdit (T288639)]] (duration: 01m 08s)
The calls to that code path have been removed, but save timing hasn't improved. My working hypothesis is that I just moved the double rendering from one function call to another. Is there any function in MediaWiki that wouldn't trigger a render?
Yup, AbstractContent::getParserOutput creates a new ParserOutput, renders it, and sends it back every time it's called, and it's called in lots of places.
@Lucas_Werkmeister_WMDE suggested that we can still trigger the double render but with generate_html set to false for SpamBlacklist, which would speed it up drastically.
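For illustration, a minimal sketch of that suggestion (hedged: variable names are hypothetical, and the hook handler is assumed to already have the Content, Title, and User in scope):

```php
// Sketch only, not the actual patch. AbstractContent::getParserOutput()
// takes a $generateHtml flag: passing false still runs the parse, so link
// metadata is populated, but skips building the HTML, which is the
// expensive part for Wikibase content.
$parserOptions = ParserOptions::newFromUser( $user );
$output = $content->getParserOutput(
	$title,
	null,           // no revision ID; the edit is not saved yet
	$parserOptions,
	false           // $generateHtml = false: metadata only
);
// SpamBlacklist only needs the external links, which are metadata:
$links = array_keys( $output->getExternalLinks() );
```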
Change 712121 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/SpamBlacklist@master] Don't generate HTML when asking for ParserOutput
https://gerrit.wikimedia.org/r/712121
Change 712121 merged by jenkins-bot:
[mediawiki/extensions/SpamBlacklist@master] Don't generate HTML when asking for ParserOutput
https://gerrit.wikimedia.org/r/712121
Change 711721 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.18] Don't generate HTML when asking for ParserOutput
https://gerrit.wikimedia.org/r/711721
Change 711721 merged by jenkins-bot:
[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.18] Don't generate HTML when asking for ParserOutput
https://gerrit.wikimedia.org/r/711721
Mentioned in SAL (#wikimedia-operations) [2021-08-12T21:57:16Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:711721|Don't generate HTML when asking for ParserOutput (T288639)]] (duration: 00m 58s)
Save timing in general is too noisy to show this immediately, but it might show improvement once some time passes.
Well, yes, for wikitext you actually need to do almost all the parsing work in order to fill links tables, headings, and all sorts of other metadata. Wikidata content, where you can get the metadata without generating HTML, is a special case. (Technically, I suppose you could skip actually generating HTML for wikitext too, but you’ll still need to do all the work of parsing, expanding templates, calling Lua modules, and so on, so I wouldn’t expect a significant difference there.)
Yes and no. It still shouldn't trigger a double render; it should use the edit stash instead. The problem is that Parser just combines everything together.
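A rough sketch of what consulting the stash first could look like (hedged: PageEditStash's field names are from memory, and $parserOptions is assumed to be in scope):

```php
use MediaWiki\MediaWikiServices;

// Sketch: try the edit stash (warmed while the user was still typing)
// before falling back to a fresh parse.
$stash = MediaWikiServices::getInstance()->getPageEditStash();
$editInfo = $stash->checkCache( $title, $content, $user );
if ( $editInfo && $editInfo->output ) {
	// Stash hit: reuse the ParserOutput, no re-render.
	$output = $editInfo->output;
} else {
	// Stash miss: parse without HTML, as in the earlier patch.
	$output = $content->getParserOutput( $title, null, $parserOptions, false );
}
```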
Change 713262 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/SpamBlacklist@master] Try to use EditStash before re-rendering
https://gerrit.wikimedia.org/r/713262
Guess what I found: another extension that triggers an extra render:
https://performance.wikimedia.org/xhgui/run/symbol?id=611a5306f8b9883f5f87dc9e&symbol=AbstractContent%3A%3AgetParserOutput
Change 712965 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.18] Try to use EditStash before re-rendering
https://gerrit.wikimedia.org/r/712965
Change 713262 merged by jenkins-bot:
[mediawiki/extensions/SpamBlacklist@master] Try to use EditStash before re-rendering
https://gerrit.wikimedia.org/r/713262
Change 712965 merged by jenkins-bot:
[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.18] Try to use EditStash before re-rendering
https://gerrit.wikimedia.org/r/712965
Mentioned in SAL (#wikimedia-operations) [2021-08-16T17:33:18Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:712965|Try to use EditStash before re-rendering (T288639)]] (duration: 00m 59s)
I suggest reverting the changes that moved away from the (soft-deprecated) WikiPage::prepareContentForEdit method. To my knowledge, this method is the only appropriate way for extension hooks to obtain the ParserOutput in a shared and cached way during an edit stash or edit submit request.
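For concreteness, a sketch of that shared path (hedged: this illustrates the soft-deprecated API as described above, not code from any of the patches):

```php
// Sketch: WikiPage::prepareContentForEdit() returns a PreparedEdit whose
// ->output is cached for the duration of the edit request, so a hook
// handler reading it does not trigger a second parse.
$editInfo = $wikiPage->prepareContentForEdit( $content, null, $user );
$links = array_keys( $editInfo->output->getExternalLinks() );
```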
The method was prematurely deprecated in T242249. The recommended replacement involves boilerplate that's inappropriate for extension code (and nobody seems to have followed that recommendation). And what people actually did in some cases (call Content->getParserOutput) is problematic for performance. That method should only be called by code in charge of authoritatively parsing something for its own purpose, e.g. core when submitting an edit, or in rare cases where an extension constructs an unsaved Content object to render ad hoc outside any revision or parser cache. In general that's not the case, and the method should not be called.
Then, after the regression is fixed, attention can focus on revisiting that unfinished deprecation (T242249) and on detecting future regressions (T288707).
I agree: keep using WikiPage::prepareContentForEdit() until Id5ba40a21 lands, then start using that. Feedback on that patch would be appreciated.
I'm sorry for the confusion that was caused by that deprecation. The original plan was to make WikiPage::getDerivedDataUpdater() public. When we changed that decision, we failed to provide a viable alternative to WikiPage::prepareContentForEdit() before priorities shifted away from finishing MCR work.
Conceptually, managing the state of an ongoing edit in WikiPage is pretty terrible. But until all the relevant hooks have been changed to pass along all the relevant info, there really is no alternative.
I'm okay with undeprecating that function, but I want to emphasize that even when using the deprecated function, every edit would trigger a render twice (three times if you count the edit stash). The xhgui link in the task description is from when it was still using the good but then-deprecated function.
Stalled in WMDE on the refactors in core getting merged: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/701074
Once these are done, we should double-check and make sure this doesn't happen again.
A different approach could be to implement the MultiContentSave hook instead of the EditFilterMergedContent hook. MultiContentSave already receives the RenderedRevision that will be used for saving the page, so it can be reused. An additional benefit: we could replace multiple hook handlers with this one.
A possible drawback: we would need to test how well a failure status set in the MultiContentSave hook propagates back to the UI in various scenarios.
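A sketch of that alternative (hedged: the handler shape follows the MultiContentSave hook interface; containsBlockedLinks and the message key are hypothetical):

```php
use MediaWiki\Revision\SlotRecord;

// Sketch of a MultiContentSave-based handler, inside the extension's hook
// handler class. The RenderedRevision passed here is the same one core
// uses to save the page, so its ParserOutput is computed once and shared.
public function onMultiContentSave( $renderedRevision, $user, $summary, $flags, $hookStatus ) {
	$output = $renderedRevision->getSlotParserOutput( SlotRecord::MAIN );
	$links = array_keys( $output->getExternalLinks() );
	if ( $this->containsBlockedLinks( $links ) ) { // hypothetical helper
		// Setting a fatal status and returning false should abort the save;
		// how well this surfaces in the UI is exactly the open question above.
		$hookStatus->fatal( 'spamprotectiontext' ); // message key assumed
		return false;
	}
	return true;
}
```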
Change 726739 had a related patch set uploaded (by Ppchelko; author: Ppchelko):
[mediawiki/extensions/SpamBlacklist@master] Implement MultiContentSave hook instead of EditFilter
https://gerrit.wikimedia.org/r/726739
Is there anything that we (=WMDE) can do to help move this forward? (This task is on one of our workboards, but I'm not sure whether there is anything we are expected to contribute; that's why I'm asking.)
Removing Wikidata-Campsite (Team A Hearth 🏰🔥) as it seems like there is nothing for us to do here. Please add it back if there is any support needed from us!
Change 749807 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/SpamBlacklist@master] Use PreparedUpdate to avoid double parse
https://gerrit.wikimedia.org/r/749807
Change 749807 merged by jenkins-bot:
[mediawiki/extensions/SpamBlacklist@master] Use PreparedUpdate to avoid double parse
https://gerrit.wikimedia.org/r/749807
Change 752270 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/SpamBlacklist@wmf/1.38.0-wmf.16] Use PreparedUpdate to avoid double parse
https://gerrit.wikimedia.org/r/752270
Change 752270 merged by jenkins-bot:
[mediawiki/extensions/SpamBlacklist@wmf/1.38.0-wmf.16] Use PreparedUpdate to avoid double parse
https://gerrit.wikimedia.org/r/752270
Mentioned in SAL (#wikimedia-operations) [2022-01-10T06:15:59Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.16/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:752270|Use PreparedUpdate to avoid double parse (T288639)]] (duration: 01m 00s)
Change 752529 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/SpamBlacklist@master] Give priority to PreparedUpdate
https://gerrit.wikimedia.org/r/752529
Change 752277 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/SpamBlacklist@wmf/1.38.0-wmf.16] Give priority to PreparedUpdate
https://gerrit.wikimedia.org/r/752277
Change 752277 merged by jenkins-bot:
[mediawiki/extensions/SpamBlacklist@wmf/1.38.0-wmf.16] Give priority to PreparedUpdate
https://gerrit.wikimedia.org/r/752277
Change 752529 merged by jenkins-bot:
[mediawiki/extensions/SpamBlacklist@master] Give priority to PreparedUpdate
https://gerrit.wikimedia.org/r/752529
Mentioned in SAL (#wikimedia-operations) [2022-01-10T14:49:46Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.16/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:752277|Give priority to PreparedUpdate (T288639)]] (duration: 01m 00s)
SpamBlacklist shows up in the list of duplicated parses, which confused me a lot. After checking in depth, it turned out that SpamBlacklist is just one of the traces; the other one is always the problematic one. This is resolved.