
User talk:Citation bot/Archive 0


This is an old revision of this page, as edited by Smith609 (talk | contribs) at 21:20, 12 February 2011 (→‎Confusing issue and page numbers: Explain reason for bug; request solutions). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Note: This page is deprecated as I try to make bugs simpler to report and resolve. New bugs are now reported at User talk:Citation bot.


Perennial problems

Updating year for articles on final publication

One other change I noticed in that same edit. It's quite common to cite medical articles when they have been published online but have not been officially assigned year, volume, and pages. For example, Autism therapies formerly contained this citation:

Shimabukuro TT, Grosse SD, Rice C (2007). "Medical expenditures for children with an autism spectrum disorder in a privately insured population". J Autism Dev Disord. doi:10.1007/s10803-007-0424-y. PMID 17690969.

because the paper was published online in 2007. Eventually the paper was published in the official journal in 2008, and Citation bot updated the citation by adding volume=38 and pages=546, resulting in this partially-improved version:

Shimabukuro TT, Grosse SD, Rice C (2007). "Medical expenditures for children with an autism spectrum disorder in a privately insured population". J Autism Dev Disord. 38: 546. doi:10.1007/s10803-007-0424-y. PMID 17690969.

To finish the improvement, I had to manually change the year=2007 to year=2008, add issue=3, and add the last page number (552), resulting in the following:

Shimabukuro TT, Grosse SD, Rice C (2008). "Medical expenditures for children with an autism spectrum disorder in a privately insured population". J Autism Dev Disord. 38 (3): 546–52. doi:10.1007/s10803-007-0424-y. PMID 17690969.

I understand that Citation bot does not have the issue=3 and the last-page 552 information available, so it cannot fix that part of the citation. However, it does have the date available, so it could update year=2007 to year=2008, thus saving me a bit of work. (I have to clean up after the Citation bot a lot, so every bit would help.) Could you please fix the citation bot to add 1 to the year if necessary, when it adds a volume= and pages= info? Thanks. Eubulides (talk) 16:54, 14 October 2008 (UTC)

I can do this. The downside is that where the data in the central database is incorrect, there is no way for users to stop the bot inputting the incorrect year each time it visits a page. I'll leave it up to you to decide which will cause editors more inconvenience – it's a tricky one to resolve! Martin (Smith609 – Talk) 23:01, 14 October 2008 (UTC)
The common pattern I run into is that I cite a prepublication version of a paper dated 2008, and then the final version comes out in 2009. Typically the prepublication version lacks volume and page number (since it hasn't been decided yet). So how about this heuristic: if the citation lacks volume and page number and its year is lower than the published year and its year does not have a comment, then update the year; otherwise, leave the year alone. This heuristic would handle most of the problems I run into, and should be easy to override (with a comment) in the rare cases that it goes awry. Eubulides (talk) 16:08, 6 July 2009 (UTC)
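The heuristic proposed above can be sketched as follows. This is only an illustration, not the bot's actual code: the dictionary layout of the citation, the function name `updated_year`, and the `published_year` argument are all assumptions made for the example.

```python
import re
from typing import Optional

# An HTML comment in a field is the usual signal for "bot, leave this alone".
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def updated_year(citation: dict, published_year: int) -> Optional[str]:
    """Return the new year value, or None to leave the citation untouched.

    `citation` is a hypothetical mapping of template parameter names to
    their raw wikitext values.
    """
    year_field = citation.get("year", "")
    if COMMENT.search(year_field):
        return None  # an editor's comment overrides the bot
    # Only act on prepublication-style citations: no volume, no page data.
    if any(citation.get(key, "").strip() for key in ("volume", "page", "pages")):
        return None
    match = re.search(r"\d{4}", year_field)
    if match is None:
        return None
    if int(match.group()) < published_year:
        return str(published_year)
    return None
```

With this sketch, a citation holding only `year=2007` would be moved to 2008 once the final version appears, while `year=2007 <!-- online first -->` would be left as found.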

Undesirable location= and publisher= for Cite book

This edit to Autism added "|publisher= AMERICAN PSYCHIATRIC PRESS INC (DC) |location= United States" to two citations of DSM-IV-TR. In both cases, the publisher= and location= information is undesirable: a "location= United States" is useless for an American organization, and a "|publisher= AMERICAN PSYCHIATRIC PRESS INC (DC)" is simply duplicated (and poorly-capitalized) information for a citation that already says "|author= American Psychiatric Association". The Citation bot used to not make changes like this; can you please fix it so that it continues to not make these changes, or let me know how to shut it off for these citations? In the meantime I cleaned up by hand. Thanks. Eubulides (talk) 04:54, 23 October 2008 (UTC)

I've been thinking about this; can you propose a solution for how the bot can work out when it's inappropriate to add a publisher and location to a citation? If not, the usual trick of adding a <!-- comment --> into any field you want the bot to ignore will work. And I'll make the capitalisation prettier when I get the chance. Martin (Smith609 – Talk) 15:38, 25 October 2008 (UTC)
Hmm, well, I can't think of a good heuristic in general. But one thing does stick out: how about not inserting "|location=" if the ISBN is present? With modern books, the location information is almost invariably useless and even misleading information. Readers don't need to know that the Oxford University Press is in Oxford, and for a major publisher like McGraw-Hill it's pretty much irrelevant whether the book says that it was published in New York or in Chicago. Eubulides (talk) 16:13, 6 July 2009 (UTC)
Omitting location information from citations generally would need to be discussed more widely, perhaps at "Wikipedia talk:Citing sources" or the village pump. It is still the practice in many contexts (journal articles and library catalogues, for instance) to provide the location of publishers. Also, some publishers produce different editions of the same book in different locations, so the location information helps to differentiate one country edition from another. — Cheers, JackLee talk 18:03, 6 July 2009 (UTC)
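The two ideas raised in this thread (a comment in a field makes the bot skip it, and location= is not added when an ISBN is present) could be combined roughly as below. This is a sketch of the proposal only; as noted above, the ISBN rule had not been agreed. The function name `fields_to_add` and the dictionary representation are hypothetical.

```python
def fields_to_add(citation: dict, candidates: dict) -> dict:
    """Filter candidate additions against the rules discussed in the thread.

    `citation` holds the parameters already in the template; `candidates`
    holds values the bot has looked up. Both layouts are assumptions.
    """
    additions = {}
    for name, value in candidates.items():
        existing = citation.get(name, "")
        if "<!--" in existing:
            continue  # field carries a comment: the editor opted out
        if existing.strip():
            continue  # never overwrite a value that is already present
        if name == "location" and citation.get("isbn", "").strip():
            continue  # proposed rule: the ISBN makes the location redundant
        additions[name] = value
    return additions
```

An editor who wants no publisher added would write `|publisher=<!-- none -->`, and the sketch would then skip that field on every visit.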

Bot never finishes on "Causes of autism"

When I visit http://toolserver.org/~verisimilus/Bot/DOI_bot/ and enter "Causes of autism", check only the "Thorough mode" box (without committing edits), and hit "Submit Query", the bot seems to give up about halfway through. The last few lines of output look like this. Maybe that citation is putting it into a loop?

Mercury exposure and child development outcomes
Already has a DOI. All details present – no need to query CrossRef. No CrossRef record found.
Determining format of URL...assessing URL Done.
Checking that the DOI is operational...

Eubulides (talk) 19:29, 14 November 2008 (UTC)

I'll look into it; there are still some issues with the toolserver servers which are making debugging difficult at the moment, so it might be a short while. Martin (Smith609 – Talk) 22:27, 14 November 2008 (UTC)
Thorough mode is ugly – it might be a while before I can fix this. Meanwhile, it works in standard mode. Martin (Smith609 – Talk) 03:23, 17 February 2009 (UTC)


UPPERCASE change to Titlecase inappropriately

Some journals have all uppercase words, e.g., FEBS Journal. Your otherwise very useful bot changes these to titlecase (Febs in this case). Xasodfuih (talk) 20:05, 14 January 2009 (UTC)

For practical reasons I have to add these exclusions on an individual basis. Let me know if there are any others. Martin (Smith609 – Talk) 03:14, 17 February 2009 (UTC)
You can now list exclusions at User:Citation bot/capitalisation exclusions. Martin (Smith609 – Talk) 21:12, 13 May 2009 (UTC)
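A minimal sketch of title-casing with an exclusion list is shown below. The contents of `EXCLUSIONS` and the function name are assumptions for illustration; the real list lives on the wiki page mentioned above and its format is not reproduced here.

```python
# Hypothetical exclusions, keyed by the lower-cased word.
EXCLUSIONS = {"febs": "FEBS"}


def title_case_journal(name: str, exclusions: dict = EXCLUSIONS) -> str:
    """Title-case each word unless the exclusion list supplies a fixed form."""
    words = []
    for word in name.split():
        words.append(exclusions.get(word.lower(), word.capitalize()))
    return " ".join(words)
```

Under these assumptions `title_case_journal("FEBS JOURNAL")` gives "FEBS Journal" rather than "Febs Journal".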

Suggestion: ISBN

Any reason why the bot doesn't search for ISBNs for {{cite book}}? Headbomb {ταλκ κοντριβς WP Physics} 08:46, 17 December 2008 (UTC)

API thingy whatever an API is. Headbomb {ταλκ κοντριβς WP Physics}
The database only permits 100 automated queries per day; once the bot has exceeded this limit it cannot search for more. These queries are prioritised so manually-initiated uses of the bot get first dibs on the queries. Martin (Smith609 – Talk) 17:30, 3 January 2009 (UTC)

Outstanding bugs and suggestions

Citation bot 3 is removing all author's first initials

When it replaces 'author' with 'last/first, last2/first2', etc., a name like Smith AB is replaced with Smith B. Also, it sometimes combines adjacent author names, ie 'Smith AB, Jones CD' may become 'Smith J.C.', with the periods added. I have been undoing these edits to the ref templates in my watchlist... as I'm pathologically nitpicky. Anyway, just a heads up.-- Rcej (talk) 02:11, 6 September 2009 (UTC)

Oh dear, I thought I'd fixed that. There seem to be so many subtly different ways of coding author parameters, and no easy way to automatically distinguish them. I recall running across your name in many of the problematic cases; I suspect that you had been very helpfully adding authors that the bot had missed to the list using comma separators. Unfortunately this format confused the bot, as I couldn't create an algorithm that could robustly determine whether 'SMITH, JO' is Jo Smith or J.O. Smith. However, the edits you describe as seeing are less easy to rationalise... could I enquire, are you only encountering the errors on citations you have edited by hand (is it possible that they occur elsewhere)? If so, is the magnitude of such pages of the order that I could manually go through and correct them myself? The bot is now much better trained at finding authors, so it should no longer be necessary for users to manually add second authors – if we can fix the existing citations, then the bug should not manifest itself again. Sorry to have inconvenienced you. Martin (Smith609 – Talk) 01:25, 7 September 2009 (UTC)

The ref templates that I have edited are the ones I initially 'queue jumped', and are still in my watchlist... which are too many to keep track of, but my 9/5 thru 9/6/09 contribs list shows many of them. I've never had to add an author; my edits prior to undoing Citation Bot 3's 9/5 activity consisted of –

1. reformatting author listings from the default 'Smith, Ab;' to read 'Smith AB,'

2. removing any url that doesn't provide 'Free full text', though indicated to; or removing urls that point to relevant disease info websites instead of the journal abstract/article. Also, when there is a PMC for the citation, removing the url lets the PMC hyperlink the title... in some of those instances, I've removed viable urls only if there is a PMC that gives a more easily accessible 'Free full text' version.

3. adding a missing PMID, which is a rare occurrence

Those are pretty much my routine edits to the templates I initiated through Citation Bot. I hope this info. helps... sorry if it's too sketchy sounding. It seemed like templates I've edited were the only ones CB3 edited, but I haven't confirmed that.-- Rcej (talk) 03:07, 7 September 2009 (UTC)

Escaped pipe in title

This edit seems to have been an error due to the bot ignoring the <nowiki></nowiki> tags around a pipe character in the title field of the citation. The result was a broken template so I reverted. --Dbratland (talk) 17:54, 16 May 2010 (UTC)

Perhaps the bot could fix this in future by replacing the escaped pipe with &#x7c;, which renders as |. Would this work in all cases? Martin (Smith609 – Talk) 18:11, 16 May 2010 (UTC)
Makes it pretty hard for humans to read, unless everyone knows what things like &#x7c; mean. It might work but I would lean towards the page being written for the convenience of human editors rather than bots. --Dbratland (talk) 18:59, 16 May 2010 (UTC)
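An alternative that leaves the wikitext readable is for the parser itself to treat a pipe inside <nowiki></nowiki> as literal text. The sketch below shows one way to split template parameters under that rule. It is not the bot's parser, and it deliberately ignores nested templates and wikilinks, which a real implementation would also have to handle.

```python
import re

# Match either a whole <nowiki>...</nowiki> span or a bare pipe.
# The nowiki alternative comes first, so a pipe inside it is consumed
# as part of the span and never seen as a separator.
TOKEN = re.compile(r"<nowiki>.*?</nowiki>|\|", re.DOTALL)


def split_params(body: str) -> list:
    """Split the inside of a template on pipes that are not nowiki-escaped."""
    parts, start = [], 0
    for match in TOKEN.finditer(body):
        if match.group() == "|":
            parts.append(body[start:match.start()])
            start = match.end()
    parts.append(body[start:])
    return parts
```

For example, `split_params("cite web|title=A <nowiki>|</nowiki> B|url=x")` yields three parts, with the escaped pipe kept inside the title.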

The bot tends to add links to JSTOR even if equivalent DOI is already given. Here is an example. Two major objections:

Therefore, I would strongly object to adding direct links to JSTOR if equivalent DOI link is already given. Maxal (talk) 17:47, 19 May 2010 (UTC)

Not everyone has the same set of subscriptions. In the case you give, my library provides access to that paper via JSTOR but not via the DOI which resolves to Cambridge Journals Online. Both have utility. LeadSongDog come howl! 18:10, 19 May 2010 (UTC)
I think, resolving DOI does not depend on subscription. Are you sure that doi:10.2307/2008781 resolves to Cambridge Journals Online (the journal Mathematics of Computation has nothing to do with Cambridge)? If so, could you please show a particular link to which it resolves? Maxal (talk) 21:56, 19 May 2010 (UTC)
My apologies, I somehow confused it with doi:10.1017/S0305004100038470, which was from the other ref affected in that edit. Too many windows open at once, I'm afraid. Of course a DOI that resolves to a JSTOR page is equivalent to or better than the JSTOR link itself. LeadSongDog come howl! 02:05, 20 May 2010 (UTC)
The bot should not add links to JSTOR. It should add |jstor=2008771, or |id={{JSTOR|2008771}}. Headbomb {talk / contribs / physics / books} 20:22, 19 May 2010 (UTC)
While I agree on not adding links to JSTOR, I think that having multiple references to JSTOR is still excessive. Either |doi=10.2307/2008781, or |jstor=2008771 is enough, but not both (the former is preferred, I guess). But in the cases, when doi is missing or resolves to a website different from JSTOR, having |jstor=2008771 is useful. Maxal (talk) 22:10, 19 May 2010 (UTC)
In some cases, the 10.2307/ doi does not work with JSTOR articles (in others, it does). The bot should be able to determine this with a fair but not 100% level of accuracy. Given that, once you reach a consensus on the optimal behaviour, I'll be happy to implement it. Martin (Smith609 – Talk) 21:06, 19 May 2010 (UTC)
I think, the optimal behavior for the bot is: (i) if doi is present and resolves to JSTOR, do nothing; (ii) if doi is missing or resolves to a site different from JSTOR, add |jstor=... (if it's not yet present); (iii) remove existing links |url=... to JSTOR and process the remaining citation as in (i) and (ii). Maxal (talk) 18:44, 26 May 2010 (UTC)
Maxal's suggestion makes sense to me. LeadSongDog come howl! 17:36, 1 June 2010 (UTC)
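Rules (i) to (iii) above can be written out as a small function. This is a sketch only: the citation is modelled as a plain dictionary, the URL pattern for JSTOR stable links is an assumption, and `doi_resolves_to_jstor` is a caller-supplied check because deciding where a DOI resolves needs a network lookup that is out of scope here.

```python
import re
from typing import Callable

# Assumed shape of a JSTOR stable URL; other URL forms are not handled.
JSTOR_URL = re.compile(r"^https?://(?:www\.)?jstor\.org/stable/(\d+)")


def apply_jstor_rule(citation: dict,
                     doi_resolves_to_jstor: Callable[[str], bool]) -> dict:
    """Return a copy of `citation` with the proposed JSTOR handling applied."""
    result = dict(citation)
    jstor_id = result.get("jstor")

    # (iii) drop a url= that points at JSTOR, remembering its identifier.
    match = JSTOR_URL.match(result.get("url", ""))
    if match:
        jstor_id = jstor_id or match.group(1)
        del result["url"]

    # (i) a DOI that already leads to JSTOR makes a jstor= entry redundant.
    doi = result.get("doi")
    if doi and doi_resolves_to_jstor(doi):
        return result

    # (ii) otherwise add jstor= if we know the identifier and it is absent.
    if jstor_id and "jstor" not in result:
        result["jstor"] = jstor_id
    return result
```

Passing the resolver in as a function keeps the rule itself testable without any network access.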

[1] This problem still exists. And also, for some reason Citation bot adds |publisher= although |journal= etc. is given. —bender235 (talk) 20:22, 24 July 2010 (UTC)

This is from three days ago: [2]. Waltham, The Duke of 09:23, 28 September 2010 (UTC)
(i) Could somebody explain why using |jstor = ? It is not documented in {{cite journal}} and I couldn't make it work by experiment. (ii) Mind you, {{JSTOR}} is an extra template, and the number of templates per article is limited. Materialscientist (talk) 06:11, 2 January 2011 (UTC)

Addition of incorrect publisher, changing of URLs

Citation bot 1 is adding incorrect publisher data to correctly-formatted citations. In the above example, it added Nielsen Business Media, Inc. as both the publisher and the author of two Billboard articles written in the 1940s and 1950s. Although Billboard was owned by Nielsen between 1993 and 2009, it wasn't the publisher when the article was written, nor is it the publisher now. The redundant "Nielsen Business Media, Inc. Nielsen Business Media, Inc." is also not helpful. Also, the bot appears to be changing working (valid) URLs that were tested as working using other Wiki tools. Firsfron of Ronchester 18:56, 28 May 2010 (UTC)

Also, I tried to use Citation bot late last year and got different but also incorrect author and publisher information. Firsfron of Ronchester 19:04, 28 May 2010 (UTC)
Citation bot has recently misidentified two preprints[3][4] on Arxiv.org that were cited in the article Astronomical unit as being published by some organization called "The Journal of Business". This is wrong on three counts:
  • It disagrees with the publication details given at Arxiv.org,
  • It is prima facie unlikely that astronomical articles would be published in a business journal,
  • It is incorrect for a journal name to be listed as a publisher.
Could you tidy up this error in the bot; I've taken care of the article. Thanks SteveMcCluskey (talk) 21:29, 6 June 2010 (UTC); revised 22:01, 6 June 2010 (UTC)
Thanks for this report and for fixing the article; the publisher data is obtained from http://referee.freebaseapps.com/ and I have contacted the maintainer of this API to request that he examine this bug. Martin (Smith609 – Talk) 14:34, 7 June 2010 (UTC)
  • hi, i've checked the examples and the recent history of the citation bot, and nothing has come from my api. the first example, the url is from google books, and if you click 'About this magazine' you'll see 'Nielsen Business Media, Inc.' .. so if its an error, its google's error.

(if its sent to my api, google isnt a publisher, so its blank.- [5] .)
I imagine the Arxiv error is similar. Spencerk (talk) 18:13, 7 June 2010 (UTC)

Page number from appendix

Is this edit correct? It looks like the bot may be treating a page number from appendix "D" as a range of pages (from "D" through 5). --Stepheng3 (talk) 05:05, 9 June 2010 (UTC)

It needn't be an appendix. It might be a section identifier; I have come across books before where there were (say) four main sections, each of which began page numbering at 1: thus there were pages A-1 to A-64, B-1 to B-32, C-1 to C-196 etc. --Redrose64 (talk) 11:02, 9 June 2010 (UTC)
Nevertheless, Stephen is right that in this instance, a hyphen rather than an en-dash would be the appropriate punctuation. Could you suggest a rule whereby the bot could discern between a section-page and page numbers containing letters (e.g. e142–e159; xi–4)? Martin (Smith609 – Talk) 15:05, 9 June 2010 (UTC)
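One candidate rule, offered purely as a starting point and not tested against real citations: treat one or two upper-case letters followed by a separator and digits (D-5, A-64) as a single section-page and keep the hyphen, and treat everything else of the form X-Y as a range joined with an en dash. The patterns and the function name below are assumptions. The rule would misfire on upper-case roman numerals such as X-12.

```python
import re

# "D-5" or "D–5": a short upper-case section label, then a page number.
SECTION_PAGE = re.compile(r"^([A-Z]{1,2})[-\u2013](\d+)$")
# Anything else shaped like start-end, e.g. "e142-e159", "xi-4", "546-52".
PAGE_RANGE = re.compile(r"^(\w+)\s*[-\u2013]\s*(\w+)$")


def normalise_pages(pages: str) -> str:
    """Use a hyphen for section-page identifiers and an en dash for ranges."""
    pages = pages.strip()
    match = SECTION_PAGE.match(pages)
    if match:
        return f"{match.group(1)}-{match.group(2)}"
    match = PAGE_RANGE.match(pages)
    if match:
        return f"{match.group(1)}\u2013{match.group(2)}"
    return pages
```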

Repeated authors?

Hi, do you know why {{Cite doi/10.2307.2F604080}} (JSTOR 604080) contains last3, author1 and author2? There should be (at most) two authors. Shreevatsa (talk) 02:35, 15 June 2010 (UTC)

Hm, I bet it's something to do with how Jstor stores special characters. I'll investigate this when I get the opportunity. Martin (Smith609 – Talk) 15:18, 15 June 2010 (UTC)

Is this one related? It's for a paper with a single author, with an accented name, on jstor; the bot added a second copy of the name without the accent. —David Eppstein (talk) 17:38, 5 August 2010 (UTC)

ed = editor 3rd

Here "ed = 3rd" was corrected to "editor = 3rd". I changed this to "edition = 3rd" (of course) but wanted to report the bug. --Fama Clamosa (talk) 10:29, 19 June 2010 (UTC)

Problem with |unused_data when no spaces exist

Example --JimWae (talk) 10:39, 19 June 2010 (UTC)

Mangled author info

In this edit, the bot added clearly mangled author information for this book. (I had left the author info in the citation template blank because the work is authored by the publishing organization, and Google Books' consequent description of the author is somewhat lame.) Magic♪piano 10:48, 15 July 2010 (UTC)

Google just reflects the lame author data from the Library of Congress (who did the scanning for the Internet Archive). See OCLC 1850353 and its linked handles too. Bot fails at handling the complex case |author=Norwich (Conn.); General Society of Colonial Wars (U.S.). Connecticut. LeadSongDog come howl! 11:33, 15 July 2010 (UTC)
This bug was fixed in r172. Martin (Smith609 – Talk) 13:54, 15 July 2010 (UTC)

It is still happening, are these real authors?

-- SWTPC6800 (talk) 01:12, 22 July 2010 (UTC)

These were added because there was no publisher, editor etc information, so the expansion seemed warranted. I would propose that the best way to avoid these bugs would be to maintain a list of words that aren't likely to appear in names, i.e. not to add as an author anything containing "Society", "Corporation", "Magazines", etc. I can implement this when I get the chance if it sounds workable. Martin (Smith609 – Talk) 14:17, 23 July 2010 (UTC)
I don't have time to check this thoroughly right now, but it looks like there's a period at the end of each of these authors in their worldcat entries. This might be a useful indication of corporate authorship. See, e.g. this. LeadSongDog come howl! 16:46, 23 July 2010 (UTC)
Many magazines publish articles that have no attributed author, these are normally but not always written by a member of the magazine's staff. I have also used advertisements in magazines as a reference. Who is the author, the advertising agency or the product company? It most certainly is not the magazine publisher. Ad for Radio Hat There is no need to add an author to every reference. -- SWTPC6800 (talk) 02:50, 24 July 2010 (UTC)
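The word-list idea suggested above might look like the sketch below. The three words are the ones named in the thread; the function name and the tokenising are assumptions. The trailing-period hint from WorldCat is left out, since names given with initials would presumably also end in a period.

```python
# Words named in the thread as unlikely to appear in a personal name.
CORPORATE_WORDS = {"society", "corporation", "magazines"}


def looks_corporate(author: str) -> bool:
    """Return True if any word of `author` is on the corporate word list."""
    words = {word.strip(".,;()").lower() for word in author.split()}
    return bool(words & CORPORATE_WORDS)
```

A value flagged by this check would simply not be added as an author; as pointed out above, a reference does not need an author at all.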

Dead link: false positive

In this edit, Citation bot flagged this link as dead. It is not. --Stemonitis (talk) 16:48, 16 July 2010 (UTC)

False positives seem quite common with dead links and are presumably caused by short-term outages of the hosting servers. I am contemplating disabling the checking of links for activity; what do people think? Martin (Smith609 – Talk) 21:00, 16 July 2010 (UTC)
How about queuing a list of them for later revisitation? If they're back when revisited the tag can be deleted, otherwise it will sit in Category:Articles with dead external links for a couple of years until fixed. LeadSongDog come howl! 21:26, 16 July 2010 (UTC)
Disabled link-checking in r178; may re-enable when this issue and the placement of the deadlink template can be resolved.
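A variant of the revisit idea is to hold back the dead-link tag until a link has failed again after a waiting period, so a short outage never produces a tag. The sketch below is an assumption-laden illustration: the class name, the seven-day default and the in-memory dictionary are all invented for the example, and it differs from the suggestion above in that it delays tagging rather than removing a tag later.

```python
from datetime import datetime, timedelta


class DeadLinkQueue:
    """Track failing URLs and only report them once they stay down."""

    def __init__(self, recheck_after: timedelta = timedelta(days=7)):
        self.recheck_after = recheck_after
        self.pending = {}  # url -> time of the first observed failure

    def record(self, url: str, is_alive: bool, now: datetime) -> str:
        """Return 'ok', 'queued' (wait and revisit) or 'tag' (still dead)."""
        if is_alive:
            self.pending.pop(url, None)  # recovered: forget earlier failures
            return "ok"
        first_failure = self.pending.setdefault(url, now)
        if now - first_failure >= self.recheck_after:
            return "tag"
        return "queued"
```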

Keep cite ordering in-place

In this edit[6], the bot fixed the spelling of a cite parameter (good), but also moved it to the end of the citation, away from the related name parameters (not so good). Could you perhaps please leave the ordering/grouping of related parameters as-found. Many Thanks, —Sladen (talk) 10:19, 21 July 2010 (UTC)

Hm, I attempted to implement this in r175; I'm not sure why this edit slipped through the algorithm. I'll investigate as soon as I can. Martin (Smith609 – Talk) 18:41, 21 July 2010 (UTC)

Incorrect "broken doi" tagging; seems critical

Example, not an isolated one. The tagged dois are valid and are clickable before the bot operation (the bot actually used some of those dois to expand the refs.). Materialscientist (talk) 23:39, 4 August 2010 (UTC)

Blocked the bot. While the bot is enormously useful, this bug appears damaging and might need a re-run of the bot over recent edits. Materialscientist (talk) 23:47, 4 August 2010 (UTC)
Thanks for stepping in with a block whilst I fixed this. It is resolved in r184 and no longer marks the problematic DOIs in Benzene as broken, so it should now be safe to unblock the bot. If the problem recurs, feel free to block the bot again. Martin (Smith609 – Talk) 21:15, 5 August 2010 (UTC)
Unblocked, thanks. Materialscientist (talk) 22:08, 5 August 2010 (UTC)

Resumed. Especially clear when the bot is run over an article once and then again (I do that often because the bot misses some parameters from one run, e.g. adding both doi and pmid, etc). Focus on doi:10.2113/gsecongeo.39.2.109. Here, the bot expanded it and then tagged as broken. I then went to the bot history and stumbled upon a funny example when the bot first untagged dois and then re-tagged them. Reverse example [7] [8]. Materialscientist (talk) 06:11, 13 August 2010 (UTC)

The best explanation I can think of is that the bot is temporarily unable to connect to the server, so thinks that the DOI is broken. To get around this, I guess that I can try the bot on a DOI that is known to work, and see if it also thinks that this is broken; if so, it won't mark the DOI as inactive. Does this sound like a workable solution? Martin (Smith609 – Talk) 16:05, 17 September 2010 (UTC)
I think you're right that this falls into the category of server/connection glitches. I haven't seen this recently. Your idea to have some reference for the bot to sense server/connection problems sounds great, as it might avoid other problems I don't see, not only doi-related. Materialscientist (talk) 05:09, 28 September 2010 (UTC)
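The known-good-DOI check described above can be sketched as below. The function name and the three status strings are assumptions, and `resolves` stands in for whatever lookup the bot performs; it is passed in so that the logic can be exercised without a network connection.

```python
from typing import Callable


def doi_status(doi: str,
               resolves: Callable[[str], bool],
               known_good_doi: str) -> str:
    """Classify a DOI as 'active', 'broken' or 'unknown'.

    'unknown' means the reference DOI failed too, so the failure is more
    likely a connection problem and the citation should not be tagged.
    """
    if resolves(doi):
        return "active"
    if not resolves(known_good_doi):
        return "unknown"
    return "broken"
```

Only a result of "broken" would lead to tagging; "unknown" leaves the citation untouched until the next run.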

Confusing issue and page numbers

Citation bot 1 keeps confusing issue numbers for page numbers for online publications. See this edit of Lemur, particularly the refs named "2009Groeneveld", "2008Braune", and "2008Orlando". – VisionHolder « talk » 18:49, 1 September 2010 (UTC)

I suspect that this is an error in the publisher's database. I'll investigate further when I get the opportunity. Martin (Smith609 – Talk) 15:57, 17 September 2010 (UTC)
Any update on this? The bot is still doing this, and is one of the most common errors I have to go back and fix. One of the latest examples: [9] – VisionHolder « talk » 13:19, 3 December 2010 (UTC)
Looking at the XML from PMID 18442367, we see

               <JournalIssue CitedMedium="Internet">
                   <Volume>8</Volume>
                   <PubDate>
                       <Year>2008</Year>
                   </PubDate>
               </JournalIssue>

and

           <Pagination>
               <MedlinePgn>121</MedlinePgn>
           </Pagination>

so clearly they've got the same error. I'd suggest that in such cases, where no true issue number nor page number applies, we would do better to follow the format given in Citing Medicine as shown as example "36. Journal article on the Internet with location/extent expressed as an article number". To wit, it shows: Pasanen K, Parkkari J, Pasanen M, Hiilloskorpi H, Mäkinen T, Järvinen M, Kannus P. Neuromuscular training and the risk of leg injuries in female floorball players: cluster randomised controlled study. BMJ [Internet]. 2008 Jul 1 [cited 2008 Nov 17];337:a295 [7 p.]. Available from: http://www.bmj.com/cgi/reprint/337/jul01_2/a295 Free full text article. DOI: 10.1136/bmj.a295 Of course that implies that we define a new |articleno=a295 which overrides both issue and page. LeadSongDog come howl! 20:33, 3 December 2010 (UTC)

I'm fine with this. If someone implements it, let me know and I can fix up some of my articles. Alternatively, I can let Citation bot actually save me work rather than create it. – VisionHolder « talk » 06:09, 19 December 2010 (UTC)
The data is actually coming from CrossRef rather than PubMed. CrossRef's first_page data is usually good, so there are some possibilities:
  • Report the error to CrossRef and hope that they fix it
  • Don't add any page numbers if they happen to be the same as an issue number
  • Don't add page numbers if the journal is equal to BMC biology
The first option would be the best if it worked. Any other suggestions are welcome! Martin (Smith609 – Talk) 21:20, 12 February 2011 (UTC)
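The second option in the list above (skip the page number when it equals the issue number) is simple to express. In this sketch `first_page` is the field named above, while the `issue` key, the dictionary layout and the function name are assumptions.

```python
from typing import Optional


def page_to_add(record: dict) -> Optional[str]:
    """Return the page value worth adding, or None if it looks unreliable."""
    first_page = str(record.get("first_page", "")).strip()
    issue = str(record.get("issue", "")).strip()
    if not first_page:
        return None
    if issue and first_page == issue:
        return None  # page duplicates the issue number: probably bad data
    return first_page
```

The cost of this rule is that a genuine article starting on a page equal to its issue number would be skipped as well.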

Special symbols

When expanding doi:10.1007/BF01397171, the bot took letters ü as �, but I could copy/paste those letters from the doi-target page to wikipedia. Is it possible to preserve such symbols or they are lost somewhere at crossref? Materialscientist (talk) 11:07, 15 September 2010 (UTC)

Resolved in r249.
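The thread does not say what caused the � characters. One common way they arise, shown here only as an illustration and not as a diagnosis of this bug, is decoding bytes as UTF-8 when they were actually written in a single-byte encoding such as Latin-1.

```python
def decode_as_utf8(text: str = "ü") -> str:
    """Encode `text` as Latin-1, then (wrongly) decode the bytes as UTF-8."""
    raw = text.encode("latin-1")  # "ü" becomes the single byte 0xFC
    # 0xFC on its own is not valid UTF-8, so the decoder substitutes U+FFFD.
    return raw.decode("utf-8", errors="replace")


def decode_correctly(text: str = "ü") -> str:
    """Decode with the same encoding that produced the bytes."""
    return text.encode("latin-1").decode("latin-1")
```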

Name mangling in doi templates

Has this sort of thing been fixed? Rich Farmbrough, 04:23, 18 September 2010 (UTC).


Redundant data?

Seems to have the Jstor number thrice, in the second citation changed. Rich Farmbrough, 04:38, 18 September 2010 (UTC).

Four times... Rich Farmbrough, 04:41, 18 September 2010 (UTC).
Resolved in r253. Martin (Smith609 – Talk) 21:07, 12 February 2011 (UTC)

Multiple names

The attempted application at Copula (statistics) for cite to Onken et al. left all authors on Last= line and 5 rather than 4 "first*=". Result edited manually. Melcombe (talk) 11:14, 21 September 2010 (UTC)

Google Books URLs again

Two kinds of edits to Google Books URLs are happening. After a recent edit by the bot, I tested each of the edited URLs and found none of the edits were correct. One type of edit was previously discussed. The other is visible in this comparison of diffs from before the bot ran until after I edited to the correct URLs, thus bypassing the immediate result of the bot, thus showing the additional kind of edit by the bot. Thanks. Nick Levinson (talk) 06:03, 23 September 2010 (UTC)

In each of the changes in the first edit, the bot seems to have removed the redundant parameter &hl=en; the desired behaviour. In the second edit, the bot appears not to have changed "false" to "true", which you modified yourself. I cannot determine the difference that this makes to the page rendered by Google but I may be missing something. Is there an algorithm that the bot can use to determine, for each link, whether to modify the "f" parameter from that set by the initial editor? Or perhaps the bot should remove it entirely, as it does not seem to 'do' anything? If so, I'll be happy to implement it. Martin (Smith609 – Talk) 12:42, 23 September 2010 (UTC)
In each case, I tested by pasting the bot-generated URL into a new browser tab and not by typing it, copying it from another source, or doing a new search in a search engine. Thus, it was the bot-generated URL that was redirected by Google to another URL, which means that, from Google's perspective, the bot is making errors for both kinds of changes that Google effectively rejects by redirecting. If Google wants the directory structure and the two parameters the way they are after the redirect, then we should supply them to be sure of getting the page even if Google stops redirecting to it because of low traffic through the redirect. Nick Levinson (talk) 15:53, 23 September 2010 (UTC)
I've not been able to find any indication that Google are planning to change their URL structures. Perhaps you could point me to the details? Martin (Smith609 – Talk) 12:55, 27 September 2010 (UTC)
It's the bot or its user that's making assumptions about Google's directory/parameter structure and those assumptions don't match what Google is doing now. The URLs I entered into the articles are the ones Google was using at that moment, both before and after the bot ran. The URLs generated by the running of the bot are not what Google is using. Therefore, the speculation is the bot's or bot user's.
I don't understand why anyone or any bot should be changing any URL to any form not preferred by a destination website owner. It generally either will make no difference, and therefore is a waste of editorial time, or will lower the ability of Wikipedia users to access files the URLs represent.
There's a case where I change URLs. One forum produces a URL after a word search that arranges to highlight those words in a topic. When removing the hilite parameters, the resulting URL works just as well at accessing the page and provides a usually-clearer page because we can read the topic without highlights that may be irrelevant to a reader's purpose. But in that case I test the resulting URL to be sure it works before posting it, so there'd be no need for a bot to change it. That same principle, of using the site's preferred form, applies to any website that I know of. I don't know of an exception warranting the edits this bot is doing.
Testing URLs to be sure they work (including Google Books URLs that use what's believed to be an old URL structure), marking those that fail or that succeed only through permanent redirection, and proposing alternatives are appropriate. That's because redirection involves a visitor's browser, since the visitor is (silently) forced to go to the new address, so it's possible to discover the fact of redirection and the fact of its intended permanence.
What I'm doubting is altering a URL that is probably generated recently by the destination website itself and therefore is probably the best URL without the bot's changes.
Thanks. Nick Levinson (talk) 05:03, 28 September 2010 (UTC)
Resolved
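For readers unfamiliar with what removing a parameter such as hl=en involves, the sketch below strips named query parameters from a URL with Python's standard library. It is not the bot's code, the function name is hypothetical, and the example URL in the usage note is made up. It also does not test whether the resulting URL still works, which is the concern raised in this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_params(url: str, names=("hl",)) -> str:
    """Return `url` without the query parameters listed in `names`."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in names]
    # urlencode may re-escape characters, so the rest of the query string
    # is not guaranteed to come back byte-for-byte identical.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For instance, `drop_params("http://books.google.com/books?id=abc&hl=en&pg=PA1")` keeps `id` and `pg` and drops `hl`.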

Completely wrong

The details are completely wrong for this DOI Template:Cite_doi/10.1007.2F978-3-540-89982-2_59. pgr94 (talk) 11:07, 25 November 2010 (UTC)

Wow, that was complex. The ISBN was for a book of conference proceedings, containing the paper incorrectly cited with {{cite journal}}. That paper is offered at the DOI given, one doesn't have to order the entire book. The JSTOR number seemed to bear no relation to that paper. The journal field and author1 were populated from the unrelated JSTOR lookup's title and author results. The other author fields were correctly populated from the DOI. The series title was missed entirely. I've manually edited it for now. LeadSongDog come howl! 20:13, 8 February 2011 (UTC)
Resolved - sounds like this could have been avoided by judiciously-used templates. Martin (Smith609 – Talk) 21:03, 12 February 2011 (UTC)

Bizarre edit by bot

http://en.wikipedia.org/w/index.php?title=Template%3ACite_doi%2F10.1021.2Fop9700385&action=historysubmit&diff=398313359&oldid=374677940  Ronhjones  (Talk) 22:46, 25 November 2010 (UTC)

Something similar has happened before; however I'm completely unable to replicate this. If anyone has any ideas, please let me know. Martin (Smith609 – Talk) 22:56, 25 November 2010 (UTC)
Evidently the pasted text was from Alcohol and cancer. Was the bot working on that article when the problem arose? LeadSongDog come howl! 06:54, 26 November 2010 (UTC)

et al

I think the bot should detect if 'et al' is present in situations like this one. Ruslik_Zero 19:17, 6 December 2010 (UTC)

You're right, that's clearly a bug. On finding lastn=et al. or similar, adding values for firstn, for lastn+1, etc should only happen if a substantive value is provided for lastn in lieu of et al. Of course that doesn't mean the additional names must be displayed, but the metadata should be corrected in any case. LeadSongDog come howl! 21:54, 6 December 2010 (UTC)
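The rule described above could be sketched like this: find the first lastn that holds "et al." and only write further name parameters if the fetched author list has a real name to put in that slot. The parameter names follow the lastn/firstn scheme used in the thread; the function names, the regular expression and the list-of-tuples input are assumptions.

```python
import re

# Matches "et al", "et al.", "''et al.''" and similar, but not real surnames.
ET_AL = re.compile(r"^\W*et\.?\s*al\.?\W*$", re.IGNORECASE)


def is_et_al(value: str) -> bool:
    return bool(ET_AL.match(value.strip()))


def expand_et_al(citation: dict, fetched: list) -> dict:
    """Replace an 'et al.' placeholder with names from `fetched`.

    `fetched` is a hypothetical list of (last, first) tuples in author order.
    """
    result = dict(citation)
    n = 1
    while f"last{n}" in result and not is_et_al(result[f"last{n}"]):
        n += 1
    if f"last{n}" not in result:
        return result  # no "et al." placeholder found
    if len(fetched) < n:
        return result  # nothing substantive to put in its place
    for i, (last, first) in enumerate(fetched[n - 1:], start=n):
        result[f"last{i}"] = last
        result[f"first{i}"] = first
    return result
```

Whether the extra names are then displayed is a separate template-level choice; the sketch only corrects the stored metadata.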

Diacritic characters in authornames

At this edit the bot seems to have mangled the diacritics in author names. LeadSongDog come howl! 19:59, 9 December 2010 (UTC)

Resolved

Strange bug - the bot's fascination with Magnus Barelegs

For some reason, the bot seems to have a bit of a fascination with using the information about this paper instead of the paper that the doi/pmid actually links to. I've now seen it across multiple articles, for example in Template:Cite pmid/2439888, Template:Cite doi/10.1038.2F174515a0 and there are about 50 other occurrences. Any idea what's up? If you can let me know how to fix them relatively speedily then I'll have a go. (You must sometimes wish you had never bothered inventing the bot, mustn't you?) SmartSE (talk) 23:43, 2 January 2011 (UTC)

Impressive spot! It's rather curious; I don't seem able to replicate the error. I'll scratch my head after dinner... Martin (Smith609 – Talk) 00:06, 3 January 2011 (UTC)
I'm not sure what you did, but it looks to have been fixed. Thanks SmartSE (talk) 22:08, 3 January 2011 (UTC)
I fixed them manually; couldn't get at the underlying cause, though. I'll add a "die()" function to the bot at some point, to see if I can stop it in its tracks. Martin (Smith609 – Talk) 22:13, 3 January 2011 (UTC)