page proofread status is not displayed in index page for some pages
Closed, Resolved (Public)

Description

Proofread status is not shown on the index for some pages, mainly pages not edited since 2012 (but also some newer pages), e.g.:
https://pl.wikisource.org/wiki/Strona:PL_Nowodworski-Encyklopedia_koscielna_T.5_001.jpeg
https://pl.wikisource.org/wiki/Strona:PL_Nowodworski-Encyklopedia_koscielna_T.5_011.jpeg
https://pl.wikisource.org/wiki/Strona:PL_Nowodworski-Encyklopedia_koscielna_T.5_470.jpeg
The index page is:
https://pl.wikisource.org/wiki/Indeks:Encyklopedia_ko%C5%9Bcielna_(tom_V)

A page's status is displayed once the page is null-edited (although the null edit is in fact not null). Tested on
https://pl.wikisource.org/wiki/Strona:PL_Nowodworski-Encyklopedia_koscielna_T.5_003.jpeg

Is this a backward-compatibility problem in ProofreadPage?

(in response to Billinghurst's request for a checklist) Add to the checklist and fill it in once the pages on each wiki have been refreshed:

[ progress information concerning moving to the new page structure per Wikisource moved to a separate task: T200118 ]

Event Timeline

It's probably because of change rEPRP502ff8adeddde1749001df704e0f389c37cb6e5e, which uses a page property for the quality lookup instead of the categories. It's much leaner and it allows the same storage for all Wikisources, independently of the quality category name. The page property was introduced by rEPRPec94d4f460c1 last September, so I hoped most pages would have been purged, but it seems that's not the case...

If there are too many affected pages I could maybe add a category-based fallback.

@Tpt a purge seems not to be enough here (is it a bug?). A null edit is needed to update the page. And I am a bit afraid that null-editing ALL pages may flood RC (as null edits are often not actually null; e.g. https://pl.wikisource.org/w/index.php?title=Strona:PL_Nowodworski-Encyklopedia_koscielna_T.5_003.jpeg&diff=prev&oldid=1805135 )

Is there a way (using Quarry or the API) to identify/count the pages needing an update? There may be 1,000 of them, or 10,000, or 100,000.

Comparing these two pages in the database:
https://quarry.wmflabs.org/query/27891
I cannot guess why one of them has the property:
https://pl.wikisource.org/w/index.php?title=Strona:Album_zas%C5%82u%C5%BConych_Polak%C3%B3w_wieku_XIX_t.1.djvu/015&action=info
while the other does not:
https://pl.wikisource.org/w/index.php?title=Strona:Album_zas%C5%82u%C5%BConych_Polak%C3%B3w_wieku_XIX_t.1.djvu/014&action=info

Any hints?

Ankry renamed this task from page proofread status is not displayed in index page for pages not edited since 2012 to page proofread status is not displayed in index page for some pages. Jun 30 2018, 6:05 AM
Ankry updated the task description.

@Ankry here is the query you are looking for: https://quarry.wmflabs.org/query/27922

I'm going to write a change to ProofreadPage that falls back to categories if the page property is not available.
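To make the idea concrete, here is a minimal Python sketch of such a fallback. It is only an illustration, not the ProofreadPage implementation (which is PHP): the function name, its parameters and the `category_levels` mapping are hypothetical, and only the property name `proofread_page_quality_level` is taken from the SQL query quoted elsewhere in this task.

```python
from typing import Iterable, Mapping, Optional

# Property name as used in the SQL query quoted in this task.
QUALITY_PROP = "proofread_page_quality_level"


def quality_level(
    page_props: Mapping[str, str],
    categories: Iterable[str],
    category_levels: Mapping[str, int],
) -> Optional[int]:
    """Return a page's quality level, or None if it cannot be determined.

    All three arguments are hypothetical inputs for illustration:
    page_props      - the page's stored properties
    categories      - names of the categories the page is in
    category_levels - wiki-specific map from quality category name to level
    """
    # Preferred source: the page property written when the page is saved.
    if QUALITY_PROP in page_props:
        return int(page_props[QUALITY_PROP])
    # Fallback: the first quality category the page belongs to.
    for name in categories:
        if name in category_levels:
            return category_levels[name]
    return None
```

With this shape, pages that were never re-saved after the property was introduced still resolve through their category, and pages with neither simply report no status.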

Change 443209 had a related patch set uploaded (by Tpt; owner: Tpt):
[mediawiki/extensions/ProofreadPage@master] Fallbacks to the quality category when the page quality level page property is not set yet

https://gerrit.wikimedia.org/r/443209

Change 443209 merged by jenkins-bot:
[mediawiki/extensions/ProofreadPage@master] Fallbacks to the quality category when the page quality level page property is not set yet

https://gerrit.wikimedia.org/r/443209

I had a play at enWS, and the touch becomes an edit on many occasions, maybe all of them. That is a lot of edits. A fallback seems reasonable.

The fallback has been implemented and merged into ProofreadPage. It is live now on https://en.wikisource.beta.wmflabs.org and will be deployed on Wikisources next Tuesday.

@Tpt: thanks. I extended it a bit: https://quarry.wmflabs.org/query/27928
It seems that, except on de.ws, it.ws and zh.ws, 50-95% of pages are affected on most Wikisources.

The workaround is a good point. However, I think the bot job will also have to be done at some point; it is not urgent now.
Maintaining backward-compatibility code may be expensive and may lead to sudden, unexpected problems when implementing new features, so I think it will be dropped in the future. And many affected pages were verified years ago, so they are not expected to be touched except for maintenance.

Tpt: Thanks for considering the fallback. I was going to ask whether this change is related to an old ticket I raised a few years ago, T185722 (and the related ticket elsewhere, T172408)?

Per a conversation in enWS, there may be the need to touch all the pagestatus:0 and pagestatus:4 pages on wikis to ensure that they are pushed to utilise the newer system, rather than wait for these pages to be edited (as the bulk of them are unlikely to be edited again). We will need to recognise that this will kick a lot of edits, for example for enWS that looks to be circa 500k pages (5.8 days at 1 edit/second)
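For planning purposes, the duration figure above is simple arithmetic, sketched here in Python. The function name is made up for illustration; the 500k page count and the 1 edit/second rate are the figures from the comment above, and the estimate ignores throttling, retries and server lag.

```python
def run_days(pages: int, edits_per_second: float) -> float:
    """Estimate how many days a bot run takes at a constant edit rate."""
    seconds = pages / edits_per_second
    return seconds / 86_400  # seconds in a day


# enWS estimate from the comment above: about 500k pages at 1 edit/second.
print(round(run_days(500_000, 1), 1))  # prints 5.8
```

Doubling the rate halves the duration, so the choice of a safe edit rate largely determines how long each wiki takes.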

We should wait until the fix is out next week, then take a systematic approach to resolving this, for every language Wikisource.

If that is acceptable to the community and to the technical teams managing servers, we will need to have a plan to

  1. get sufficient bot operators to assist
  2. get bot rights set suitably on the wikis
  3. identify a suitable rate of editing

We can use wikisource-bot on Toolforge for this task; it is one of the reasons that it was set up, and it has multiple operators.

@Aklapper Is it possible to easily build a checklist (of the wikisources) within a phabricator ticket to ensure that we have done the tasks per wiki? If that capability exists, can you please point to some instructions. thanks.

pagestatus:4 pages at WSes (39)

pagestatus:0 pages at WSes (31)

I have been through and merged the duplicates (csWS) and added the missing wikis, predominantly teWS, huWS and knWS, for ps0 and ps4.

[I note that some of the Wikisources do not have a complete set of ps0, ps1, ps2, ps3, ps4 categories; ps0 and ps2 are missing most often.]

Just FYI: this task has already been done by me (AkBot) on pl.ws, de.ws and sourceswiki, and by @Candalua (CandalBot) on it.ws. So these four may be omitted.
Concerning efficiency, I ran it at 2-4 edits per second with no measurable effect on database lag or the job queue (and note: CandalBot was operating on it.ws at the same time). en.ws and fr.ws are a bit larger, but I think a rate of 1-2 per second would also be safe for them.

I can also use my bot for this job if it receives a bot flag on some wikis (it operates from two servers outside Wikimedia infrastructure).

So, let's see if I got it right:

The status display in the index pages (on enWS) is currently broken because some pages were last edited long ago or for some reason do not have the status property set. A fallback was supposed to be deployed a couple of days ago, but as far as I can see nothing has changed (e.g. https://en.wikisource.org/wiki/Index:The_Book_of_the_Thousand_Nights_and_One_Night,_Vol_1.djvu, which was all yellow before, is now mostly white and <50% yellow). The solution is to modify (null edit or otherwise) the affected pages one by one, either manually or with a bot, which has been done for a few languages, but not for enWS.

Is this correct? Any estimation when enWS could be done? Can I help?

I plan to start runs on enWS over the weekend, with the pages listed in the ps:0 and ps:4 categories as the priority, and then, if all is going well, to progress to ps:2, ps:3 and ps:1 in that order. ps:1 pages are least urgent and, I am guessing, our biggest set.

Depending on bot rights for others and what I can organise with the stewards, I will look to get to the other wikis at an orderly rate. For small wikis we will just do all the pages in one go; for the mid-sized ones I haven't got that far, though quite possibly the same, just without too many concurrent runs on the same underlying servers (which I have to check).

Global bot request at https://meta.wikimedia.org/wiki/Steward_requests/Bot_status

Permanent link to the request: https://meta.wikimedia.org/wiki/Special:diff/18183889

As a comment, I think that there would be value in having a WIKISOURCE BOTS group that limits the action of our bots, and makes these things more readily manageable by us without concerns to the larger community.

There was a fallback supposed to be deployed a couple of days ago, but as far as I can see nothing has changed

Unfortunately, no MediaWiki deploy was scheduled this week. So the release of the fallback will happen with the next deploy, which should reach Wikisources on 11 July.

noWS all jobs running (Wikisource-bot: local bot rights)
enWS ps:0 and ps:4 jobs running (Wikisource-bot: local bot rights)

General note: lots of these wikis are not on the global bots list. My request for rights for Wikisource-bot also asks for the rate limit restriction to be lifted.

I am also leaving a note at each of these smaller wikis asking them to consider opting in to the global bots wikiset, to allow these fixes to happen more easily and with a wider use of resources.

@Billinghurst Just FYI: for zh.ws it would be more effective (IMO) to go through a pre-generated page list than through all pages / selected categories. This is because a relatively small share of pages is affected on this wiki.
A pre-generated list of ALL affected pages can be found here: https://quarry.wmflabs.org/query/28053

@Billinghurst, if needed pywikibot provides support for SQL queries, e.g.

python scripts/listpages.py -mysqlquery:"SELECT page_namespace, CONCAT('Page:',page_title) FROM zhwikisource_p.page LEFT JOIN zhwikisource_p.page_props ON pp_page = page_id AND pp_propname = 'proofread_page_quality_level' WHERE pp_propname IS NULL AND page_namespace = 104 AND NOT page_is_redirect;"  -limit:10
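If the same query is wanted for other wikis, a small Python helper could build the string to pass to `-mysqlquery`. This is only a sketch: the function name is made up, it assumes the replica database follows the `<dbname>_p` pattern seen in the zh.ws example, and the Page namespace number and title prefix (104 and `Page:` above) differ between wikis and must be checked per wiki.

```python
def missing_quality_query(db: str, page_ns: int, prefix: str = "Page:") -> str:
    """Build the SQL listing Page: pages without the quality page property.

    db      - replica database name, e.g. "zhwikisource_p" (from the example)
    page_ns - number of the Page namespace on that wiki (assumption: look it up)
    prefix  - local name of the Page namespace, including the colon
    """
    # Values are interpolated directly into the SQL, so reject anything
    # that could break out of the identifier or string literal.
    if not db.isidentifier():
        raise ValueError(f"unexpected database name: {db!r}")
    if "'" in prefix or "\\" in prefix:
        raise ValueError(f"unexpected namespace prefix: {prefix!r}")
    return (
        f"SELECT page_namespace, CONCAT('{prefix}', page_title) "
        f"FROM {db}.page "
        f"LEFT JOIN {db}.page_props "
        "ON pp_page = page_id "
        "AND pp_propname = 'proofread_page_quality_level' "
        "WHERE pp_propname IS NULL "
        f"AND page_namespace = {int(page_ns)} AND NOT page_is_redirect;"
    )


print(missing_quality_query("zhwikisource_p", 104))
```

The LEFT JOIN with `pp_propname IS NULL` keeps exactly the pages that have no quality property row, which is the set that still needs an edit.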

The change that should fix this problem was deployed yesterday. A purge of the pages where blank ids are displayed should fix the problem (for Index: pages this could be done with a non-null edit on MediaWiki:Proofreadpage_index_template)

Sorry, I was wrong, the fix is not deployed yet. It should be deployed this evening UTC.

Tpt claimed this task.

The fallback to the categories seems to work. I plan to keep it as long as there exist Page: pages that do not have the page property.

I am confused. Is the issue resolved? Is it necessary to give the bot permission on the Wikisources to resolve the issue? Will it only "touch" Index pages?
If the bot is nominated for global bot rights, is there no need to give it bot permission on ruwikisource? Can we just close the bot nomination on ruwikisource and the others?

Indeed, this was not clear: the initial problem was resolved using a workaround, but the source of the problem is not resolved. I've created another task, T200118, for this.

I'm wondering if it'd not be easier if we could do this server-side. Do we have a maintenance script that could purge/touch pages instead? Thanks.

If there is such a script, there is also a question: how long will it take to make the decision to run it?
If a week or two, then it might be easier; if a few months or longer, then not.