SpamBlacklistHooks::onEditFilterMergedContent causes every edit to be rendered twice
Closed, Resolved · Public

Event Timeline

Change 711662 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/SpamBlacklist@master] Avoid using deprecated WikiPage::prepareContentForEdit

https://gerrit.wikimedia.org/r/711662

I see that the patch doesn't change any behavior (it still doesn't let you save pages with spam in them) while cutting CPU time in half; wall time drops drastically as well:

For comparison, these two are basically the same edit attempted twice, once with the patch and once without:

Change 711662 merged by jenkins-bot:

[mediawiki/extensions/SpamBlacklist@master] Avoid using deprecated WikiPage::prepareContentForEdit

https://gerrit.wikimedia.org/r/711662

Change 710725 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.17] Avoid using deprecated WikiPage::prepareContentForEdit

https://gerrit.wikimedia.org/r/710725

Change 711706 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.18] Avoid using deprecated WikiPage::prepareContentForEdit

https://gerrit.wikimedia.org/r/711706

Change 711706 merged by jenkins-bot:

[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.18] Avoid using deprecated WikiPage::prepareContentForEdit

https://gerrit.wikimedia.org/r/711706

Change 710725 merged by jenkins-bot:

[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.17] Avoid using deprecated WikiPage::prepareContentForEdit

https://gerrit.wikimedia.org/r/710725

Mentioned in SAL (#wikimedia-operations) [2021-08-11T21:29:29Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:711706|Avoid using deprecated WikiPage::prepareContentForEdit (T288639)]] (duration: 01m 07s)

Mentioned in SAL (#wikimedia-operations) [2021-08-11T21:42:02Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.17/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:710725|Avoid using deprecated WikiPage::prepareContentForEdit (T288639)]] (duration: 01m 08s)

For reference, this is at least the *fourth* time this has happened:

I think this merits some kind of deeper analysis / proper fixing so it stops regressing.

Yes. I filed T288707: Detect and monitor against multiple Parser invocation during edit requests.

The calls to that code path have been removed, but the save timing hasn't improved. My working hypothesis is that I just moved the double rendering from one function call to another. Is there any function in MediaWiki that wouldn't trigger a render?

Yup, AbstractContent::getParserOutput creates a new ParserOutput, renders it, and sends it back every time it's called, and it's called in lots of places.

@Lucas_Werkmeister_WMDE suggested we could still trigger the double render but with generate_html set to false for SpamBlacklist, which would speed it up drastically.
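For context, a minimal sketch of the pattern under discussion, based on the pre-ContentRenderer Content::getParserOutput() signature; the surrounding variables ($content, $title, $user) are assumed to be the ones already available in the hook handler:

```lang=php
// Illustrative sketch only: every call to Content::getParserOutput() builds a
// brand-new ParserOutput, so calling it from a hook during an edit adds a full
// parse on top of the one core already performs for the save.
$parserOptions = ParserOptions::newFromUser( $user );

// Default: full parse including HTML generation (the expensive part).
$fullOutput = $content->getParserOutput( $title, null, $parserOptions );

// The suggestion above: pass $generateHtml = false (4th parameter) so only the
// metadata SpamBlacklist needs, such as external links, is produced.
$metaOutput = $content->getParserOutput( $title, null, $parserOptions, false );
$links = array_keys( $metaOutput->getExternalLinks() );
```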

Change 712121 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/SpamBlacklist@master] Don't generate HTML when asking for ParserOutput

https://gerrit.wikimedia.org/r/712121

Change 712121 merged by jenkins-bot:

[mediawiki/extensions/SpamBlacklist@master] Don't generate HTML when asking for ParserOutput

https://gerrit.wikimedia.org/r/712121

Change 711721 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.18] Don't generate HTML when asking for ParserOutput

https://gerrit.wikimedia.org/r/711721

Change 711721 merged by jenkins-bot:

[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.18] Don't generate HTML when asking for ParserOutput

https://gerrit.wikimedia.org/r/711721

Mentioned in SAL (#wikimedia-operations) [2021-08-12T21:57:16Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:711721|Don't generate HTML when asking for ParserOutput (T288639)]] (duration: 00m 58s)

Well, yes, for wikitext you actually need to do almost all the parsing work in order to fill links tables, headings, and all sorts of other metadata. Wikidata content, where you can get the metadata without generating HTML, is a special case. (Technically, I suppose you could skip actually generating HTML for wikitext too, but you’ll still need to do all the work of parsing, expanding templates, calling Lua modules, and so on, so I wouldn’t expect a significant difference there.)

Yes and no. It still shouldn't trigger a double render, and it should use the edit stash. The problem is that the Parser just combines everything together.
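As a rough sketch of what "use the edit stash" means here (assuming the PageEditStash service and its checkCache() signature from this era of MediaWiki; the property access on the stashed entry is an assumption):

```lang=php
use MediaWiki\MediaWikiServices;

// Illustrative sketch: reuse the ParserOutput that was stashed while the user
// was typing instead of parsing the content again inside the hook.
$parserOptions = ParserOptions::newFromUser( $user );
$stash = MediaWikiServices::getInstance()->getPageEditStash();
$stashed = $stash->checkCache( $title, $content, $user );

if ( $stashed && $stashed->output ) {
	// Cache hit: the stashed entry already carries a ParserOutput.
	$parserOutput = $stashed->output;
} else {
	// Cache miss: fall back to a metadata-only parse ($generateHtml = false).
	$parserOutput = $content->getParserOutput( $title, null, $parserOptions, false );
}
```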

Change 713262 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/SpamBlacklist@master] Try to use EditStash before re-rendering

https://gerrit.wikimedia.org/r/713262

At least PageImages only renders the first section, so maybe that's a bit better?

Tested on mwdebug2001 ^ it works just fine and finds the stash.

Change 712965 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.18] Try to use EditStash before re-rendering

https://gerrit.wikimedia.org/r/712965

Change 713262 merged by jenkins-bot:

[mediawiki/extensions/SpamBlacklist@master] Try to use EditStash before re-rendering

https://gerrit.wikimedia.org/r/713262

Change 712965 merged by jenkins-bot:

[mediawiki/extensions/SpamBlacklist@wmf/1.37.0-wmf.18] Try to use EditStash before re-rendering

https://gerrit.wikimedia.org/r/712965

Mentioned in SAL (#wikimedia-operations) [2021-08-16T17:33:18Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:712965|Try to use EditStash before re-rendering (T288639)]] (duration: 00m 59s)

I suggest reverting the changes that moved away from the (soft-deprecated) WikiPage::prepareContentForEdit method. This method is, to my knowledge, the only appropriate way we have for extension hooks during an edit stash or edit submit request to obtain the ParserOutput in a shared and cached way.

The method was prematurely deprecated in T242249. The recommended replacement involves boilerplate that's inappropriate for extension code (and nobody seems to have followed that recommendation). And what people actually did in some cases (calling Content::getParserOutput) is problematic for performance. That method should only be called by code in charge of authoritatively parsing something for its own purpose, e.g. core when submitting an edit, or in rare cases where an extension constructs an unsaved Content object to render ad hoc outside any revision or parser cache. In general that's not the case, and the method should not be called.

Then, after the regression is fixed, attention can focus on revisiting that unfinished deprecation (T242249) and on detecting future regressions (T288707).

I agree: keep using WikiPage::prepareContentForEdit() until Id5ba40a21 lands, then start using that. Feedback on that patch would be appreciated.
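For reference, a minimal sketch of the soft-deprecated path being defended here, using the classic PreparedEdit result (the surrounding variables are assumptions about the hook context):

```lang=php
// Illustrative sketch: WikiPage::prepareContentForEdit() caches its PreparedEdit,
// so repeated callers during the same edit request share one ParserOutput
// instead of each triggering their own parse.
$editInfo = $wikiPage->prepareContentForEdit( $content, null, $user );
$parserOutput = $editInfo->output; // shared ParserOutput

// SpamBlacklist only needs the external links recorded during that parse.
$links = array_keys( $parserOutput->getExternalLinks() );
```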

I'm sorry for the confusion that was caused by that deprecation. The original plan was to make WikiPage::getDerivedDataUpdater() public. When we changed that decision, we failed to provide a viable alternative to WikiPage::prepareContentForEdit() before priorities shifted away from finishing MCR work.

Conceptually, managing the state of an ongoing edit in WikiPage is pretty terrible. But until all the relevant hooks have been changed to pass along all the relevant info, there really is no alternative.

I'm okay with undeprecating that function, but I want to emphasize that even when using the deprecated function, every edit would trigger a render twice (three times if you count the edit stash). The xhgui link in the task description is from when it was still using the good but then-deprecated function.

Stalled in WMDE on the refactors in core getting merged: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/701074

Once these are done, we should double-check and make sure this doesn't happen again.

A different approach could be to implement the MultiContentSave hook instead of the EditFilterMergedContent hook. MultiContentSave already gives us the RenderedRevision that is reused for saving the page. An additional benefit: we can replace multiple hook handlers with this one.

A possible drawback: we need to test how well the status set in the MultiContentSave hook is propagated back to the UI in various scenarios.
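To make the proposal concrete, a rough sketch of such a handler, assuming the MultiContentSave hook signature and RenderedRevision::getRevisionParserOutput(); the class name, the helper, and the message key are placeholders:

```lang=php
use MediaWiki\Revision\RenderedRevision;

class SpamBlacklistMultiContentSaveSketch {
	/**
	 * Illustrative MultiContentSave handler: the hook already receives the
	 * RenderedRevision that core uses for the save, so the ParserOutput is
	 * shared rather than regenerated.
	 */
	public static function onMultiContentSave(
		RenderedRevision $renderedRevision, $user, $summary, $flags, $hookStatus
	) {
		// A metadata-only render is enough for link checking.
		$output = $renderedRevision->getRevisionParserOutput( [ 'generate-html' => false ] );
		$links = array_keys( $output->getExternalLinks() );

		if ( self::matchesBlockList( $links ) ) { // hypothetical helper
			// Setting a fatal status is expected to abort the save; how well it
			// surfaces in the UI is exactly the open question above.
			$hookStatus->fatal( 'spamprotectiontext' ); // placeholder message key
			return false;
		}
		return true;
	}

	private static function matchesBlockList( array $links ): bool {
		return false; // stub for the sketch
	}
}
```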

Change 726739 had a related patch set uploaded (by Ppchelko; author: Ppchelko):

[mediawiki/extensions/SpamBlacklist@master] Implement MultiContentSave hook instead of EditFilter

https://gerrit.wikimedia.org/r/726739

It's being done by Petr; I'm mostly here for emotional support.

Is there anything that we (=WMDE) can do to help move this forward? (This task is on one of our workboards, but I'm not sure whether there is anything we are expected to contribute; that's why I'm asking.)

Removing Wikidata-Campsite (Team A Hearth 🏰🔥) as it seems like there is nothing for us to do here. Please add it back if there is any support needed from us!

Change 749807 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/SpamBlacklist@master] Use PreparedUpdate to avoid double parse

https://gerrit.wikimedia.org/r/749807

Change 749807 merged by jenkins-bot:

[mediawiki/extensions/SpamBlacklist@master] Use PreparedUpdate to avoid double parse

https://gerrit.wikimedia.org/r/749807
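For readers following along, a rough sketch of the direction these patches take. WikiPage::getCurrentUpdate() and the PreparedUpdate accessor used below are assumptions based on the then-new core API referenced earlier (Id5ba40a21), so treat the exact names accordingly:

```lang=php
// Illustrative sketch: during a save, core keeps a PreparedUpdate around, and a
// hook can ask it for the ParserOutput instead of parsing the content again.
$update = $wikiPage->getCurrentUpdate(); // assumed accessor for the in-progress update
$parserOutput = $update->getCanonicalParserOutput(); // reused, not re-rendered
$links = array_keys( $parserOutput->getExternalLinks() );
```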

Change 752270 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/SpamBlacklist@wmf/1.38.0-wmf.16] Use PreparedUpdate to avoid double parse

https://gerrit.wikimedia.org/r/752270

Change 752270 merged by jenkins-bot:

[mediawiki/extensions/SpamBlacklist@wmf/1.38.0-wmf.16] Use PreparedUpdate to avoid double parse

https://gerrit.wikimedia.org/r/752270

Mentioned in SAL (#wikimedia-operations) [2022-01-10T06:15:59Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.16/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:752270|Use PreparedUpdate to avoid double parse (T288639)]] (duration: 01m 00s)

Change 752529 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/SpamBlacklist@master] Give priority to PreparedUpdate

https://gerrit.wikimedia.org/r/752529

Change 752277 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/SpamBlacklist@wmf/1.38.0-wmf.16] Give priority to PreparedUpdate

https://gerrit.wikimedia.org/r/752277

Change 752277 merged by jenkins-bot:

[mediawiki/extensions/SpamBlacklist@wmf/1.38.0-wmf.16] Give priority to PreparedUpdate

https://gerrit.wikimedia.org/r/752277

Change 752529 merged by jenkins-bot:

[mediawiki/extensions/SpamBlacklist@master] Give priority to PreparedUpdate

https://gerrit.wikimedia.org/r/752529

Mentioned in SAL (#wikimedia-operations) [2022-01-10T14:49:46Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.16/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:752277|Give priority to PreparedUpdate (T288639)]] (duration: 01m 00s)

Ladsgroup claimed this task.

SpamBlacklist shows up in the list of duplicated parses, and that confused me a lot. After checking in depth, it turned out that SpamBlacklist is just one of the traces; the other one is always the problematic one. This is resolved.