Wikipedia:Bots/Requests for approval/Tom.Bot 7
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.
Operator: Tom.Reding (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 03:18, Wednesday, October 17, 2018 (UTC)
Automatic, Supervised, or Manual: manual
Programming language(s): AWB
Source code available:
Function overview: Replace non-standard non-keyboard apostrophes with the WP:MOS#Apostrophes-preferred keyboard apostrophe (').
Links to relevant discussions (where appropriate): WP:AWB/Typos#Apostrophe S
Edit period(s): One-time bulk run with sparse follow-ups
Estimated number of pages affected: ~565,000
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details: An expansion to the "apostrophe-s" AWB typo rule by Smasongarrison has identified at least 500k pages with this fix (the database scanner maxed out, but I can get a better estimate if requested). As pointed out by John of Reading, the frequency of this rule firing may often swamp other changes to the same page.
Due to the size, simplicity, and inseverity of this fix, it seems like an ideal bot task.
The regex will resemble (\w)[´ˈ׳᾿’′Ꞌꞌ`]s\b(?<!\.[^\s\.]{0,999}), to be replaced with $1's. Additional non-keyboard apostrophes may be added, with the exception of the {{okina}} (for just-in-case-ness, as brought up by Certes).
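For readers who want to experiment with the rule outside AWB, here is a minimal sketch in Python. It is an approximation, not the bot's actual code: AWB uses .NET regular expressions, which allow the variable-length lookbehind shown above, whereas Python's `re` module does not. The sketch therefore swaps the lookbehind for an explicit check that the whitespace-free run ending at the match contains no dot (which is what the lookbehind appears intended to exclude, e.g. URLs and file names). The names `fix_apostrophe_s` and `in_dotted_token` are hypothetical, and none of AWB's own typo-fixing safeguards are reproduced.

```python
import re

# Character class copied from the proposed rule; no lookbehind here because
# Python's re module only supports fixed-width lookbehind.
APOSTROPHE_S = re.compile(r"(\w)[´ˈ׳᾿’′Ꞌꞌ`]s\b")


def in_dotted_token(text, end):
    """Approximate (?<!\\.[^\\s\\.]{0,999}): True if the whitespace-free run
    of up to 1000 characters ending at `end` contains a dot."""
    start = end
    while start > 0 and not text[start - 1].isspace() and end - start < 1000:
        start -= 1
    return "." in text[start:end]


def fix_apostrophe_s(text):
    """Return (new_text, number_of_replacements)."""
    count = 0

    def repl(match):
        nonlocal count
        # Skip matches inside dotted tokens such as URLs or file names.
        if in_dotted_token(match.string, match.end()):
            return match.group(0)
        count += 1
        return match.group(1) + "'s"

    return APOSTROPHE_S.sub(repl, text), count


if __name__ == "__main__":
    print(fix_apostrophe_s("Tom’s book"))
    print(fix_apostrophe_s("see example.com/Tom’s page"))
```

Returning the replacement count alongside the text makes it easy to see how many times the rule fired on a given page.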
WP:GenFixes can be left on.
All participants of the originating discussion have already been pinged, above, for their information and possible input to questions raised here.
Discussion
- Needs wider discussion. If you want to make such a minor change on half a million pages you will need to establish a strong consensus in a well attended location first. Suggest you open a discussion at WP:VPR. — xaosflux Talk 03:28, 17 October 2018 (UTC)[reply]
- Especially as you want to run genfixes blindly on all the pages as well. — xaosflux Talk 03:34, 17 October 2018 (UTC)[reply]
- I said "WP:GenFixes can be left on" (my emphasis). ~ Tom.Reding (talk ⋅dgaf) 04:08, 17 October 2018 (UTC)[reply]
- Xaosflux, what # of pages would you say would not require a WP:VPR? The reason being that there will be some histogram of pages with this correction (e.g. many with 1 correction, fewer with 2, still fewer with 3, etc.), and I can restrict this BRFA only to those pages with, say, 5 or more corrections (whatever number-of-corrections is required to bring the total # of pages affected down to BRFA-level). Then, if/when I want to complete the rest of them, the ones with, say 1-4 corrections, I can go to WP:VPR later. ~ Tom.Reding (talk ⋅dgaf) 04:08, 17 October 2018 (UTC)[reply]
- Technical trial can certainly be tested. If you think this is worthwhile, what issue do you see with getting wider discussion? — xaosflux Talk 04:11, 17 October 2018 (UTC)[reply]
- The pages with a large number of corrections each is the main issue here; I don't want perfection to get in the way of progress. I'd rather take care of the worst cases quickly than all of them slowly. ~ Tom.Reding (talk ⋅dgaf) 04:17, 17 October 2018 (UTC)[reply]
- @Tom.Reding: perhaps an example would help illustrate this, please provide a list of 10 pages with 10+ instances below for general review. — xaosflux Talk 13:02, 17 October 2018 (UTC)[reply]
- Xaosflux, as requested:
Questions: How will this bot avoid introducing unintended italic and bold formatting into pages? How will it avoid replacing prime marks and other intentional punctuation that resembles quotation marks but should not be replaced? – Jonesey95 (talk) 05:05, 17 October 2018 (UTC)[reply]
- Jonesey95, I think what I'll do is use that regex to crudely estimate how many uses there are per page, then use AWB's typo fixing on the worst ones. Since AWB employs many of its own checks to make sure typos are applied in the right spots, I won't bother trying to replicate it all. This means it won't be in automatic mode but manual. This is what I've been doing, but people such as Materialscientist have objected, at least to pages where this is the only change. Materialscientist, what #, if any, of these apostrophe changes on a page, assuming worst-case that they are the only changes to a page, would not cause objection? ~ Tom.Reding (talk ⋅dgaf) 13:07, 17 October 2018 (UTC)[reply]
- That sounds reasonable to me. BTW, I support these edits. The reason for my question is that I have done similar edits on a small scale, and I occasionally run into a curly quote mark immediately following italic markup, which then becomes bold formatting if you don't catch it.
- I do not think Materialscientist's objections are based on any valid policy or rule; I change curly quotes to straight quotes as my only edit with some frequency; the edits may be minor, but they absolutely change the rendered page. A minor edit to fix one visible problem on a mainspace page is a valid edit in nearly every circumstance.
- Have you considered working on quotation marks as well? – Jonesey95 (talk) 15:26, 17 October 2018 (UTC)[reply]
- Then I think I'll add a slightly safer precaution '\w to the live rule's negative lookbehind, which can be changed to ''\w after the bulk are complete.
- Yes, I considered working on quotations for all of...a few minutes...
- It would be useful if there was some way to programmatically determine, before saving in AWB, how many times a rule fired, like parsing the final edit summary, but I'm not sure how to do this without recreating all of the scaffolding around typo fixes. ~ Tom.Reding (talk ⋅dgaf) 17:01, 17 October 2018 (UTC)[reply]
- I see you have changed your request to purely manual, how do you propose to reduce the scope/# of pages to something that can be done manually? — xaosflux Talk 02:05, 18 October 2018 (UTC)[reply]
- Xaosflux, I've been running a scan, which will still take about another day to finish. In fact, being in manual, and after seeing Jonesey95's comments & support, I might just withdraw the request until some later time when I'm more confident to put the bot in at least supervised mode, basically by isolating this particular typo so that others don't fire. That request would be for the vast majority of corrections (i.e. < 5 or < 10 corrections per page) and after much testing. ~ Tom.Reding (talk ⋅dgaf) 12:53, 18 October 2018 (UTC)[reply]
- @Tom.Reding: OK, let us know one way or the other. (Or eventually this will just move to 'expired' if discussion dies out). — xaosflux Talk 12:57, 18 October 2018 (UTC)[reply]
- Xaosflux, I don't know how long it'll take before I get around to this (could be a few days, could be a few weeks). If it's ok with you, I'd prefer to keep this request active until either it expires or I'm more sure of my time/interest, whichever comes first. If not, I can withdraw. ~ Tom.Reding (talk ⋅dgaf) 17:02, 18 October 2018 (UTC)[reply]
- Withdrawn by operator. Will continue to do this manually. ~ Tom.Reding (talk ⋅dgaf) 01:10, 11 November 2018 (UTC)[reply]
- Note the other bot requests on this page RonBot 12 and Galobot 2, trying to fix the problems listed in CAT:MISSFILE. Some of these are known to occur by semi-auto edits changing file names with odd punctuation. An example is File:Forrest Guth, Clancy Lyall and Amos “Buck” Taylor 117380.jpg, where an AWB edit (diff) changed the "fancy quotes" to "normal quotes" (edit summary = standard quote handling in WP;standard Apostrophe/quotation marks in WP; MOS general fixes), and of course that broke the link. How will the bot avoid changing file names? - Which will be either of the form of the full file name or without the "File:" prefix in an infobox. Ronhjones (Talk) 01:27, 18 October 2018 (UTC)[reply]
- Ronhjones, I used the 'before' version of your example diff to isolate exactly what AWB's WP:GenFixes & WP:AWB/Typos fixed, and what was done by that user. AWB ignores files, images, and a host of other potential-problem-areas, which have been found over years of development. However, a user can write their own rules/code that aren't as well thought out, which was the case here, and which won't be the case for this bot. ~ Tom.Reding (talk ⋅dgaf) 12:53, 18 October 2018 (UTC)[reply]
- Thanks for that. Apparently this is not an unusual event. Nice to know that your bot will not add to the problems. Ronhjones (Talk) 15:38, 18 October 2018 (UTC)[reply]
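The per-page counting idea raised in the discussion above (crudely estimating how many corrections each page would receive, building a histogram, and handling only pages at or above some threshold) could be sketched roughly as follows. This is an illustration only, not the operator's database scan: the `pages` mapping of title to wikitext, the function names, and the threshold of 5 are assumptions made for the example, and the regex is the simplified character class without the lookbehind, so the counts are upper-bound estimates.

```python
import re
from collections import Counter

# Simplified form of the proposed rule (no lookbehind), used only for a
# crude per-page estimate.
APOSTROPHE_S = re.compile(r"(\w)[´ˈ׳᾿’′Ꞌꞌ`]s\b")


def count_candidates(wikitext):
    """Count how many times the simplified rule would fire on one page."""
    return sum(1 for _ in APOSTROPHE_S.finditer(wikitext))


def histogram(pages):
    """Map 'corrections per page' -> 'number of pages with that many'."""
    return Counter(count_candidates(text) for text in pages.values())


def pages_at_or_above(pages, threshold):
    """Titles of pages with at least `threshold` candidate corrections."""
    return sorted(
        title
        for title, text in pages.items()
        if count_candidates(text) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical sample data standing in for a real page dump.
    sample = {
        "A": "Tom’s and Ann’s",
        "B": "plain text",
        "C": "x’s y’s z’s w’s v’s",
    }
    print(histogram(sample))
    print(pages_at_or_above(sample, 5))
```

Raising or lowering the threshold shows how many pages would fall inside a restricted first run versus a later, wider one.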
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.