Magnus Manske

Previous discussion was archived at User talk:Magnus Manske/Archive 9 on 2024-01-01.

catalog/3534

8 comments • 10:29, 21 November 2024

Gerwoman (talkcontribs)

Could you please scrape the dates?

09:56, 20 November 2024

Epìdosis (talkcontribs)

We also need dates on https://mix-n-match.toolforge.org/#/catalog/6530 and https://mix-n-match.toolforge.org/#/catalog/6531. Thanks!

09:23, 21 November 2024

Magnus Manske (talkcontribs)

6530 and 6531 have dates where available, I believe. Can you find me an example of 6530 that has dates in the source but not on MnM?

09:52, 21 November 2024

Epìdosis (talkcontribs)

I have no examples, in fact; I see dates everywhere in the entries. But if that is the case, I don't understand why "Jobs" doesn't offer the options to match on name-date, on birthdate and on deathdate, which should appear since the dates are present in the entries.

09:56, 21 November 2024

Magnus Manske (talkcontribs)

3534 done

10:19, 21 November 2024

Magnus Manske (talkcontribs)

6530 and 6531 had their dates imported rather than scraped from the description, so the "has_person_date" flag was not set for the catalog. It is set now, and the date jobs should be available.

10:22, 21 November 2024

Epìdosis (talkcontribs)

Thanks! I was wondering whether the "has_person_date" flag could be made settable in catalog_editor (e.g. https://mix-n-match.toolforge.org/#/catalog_editor/6530), so that MnM admins can set it themselves when dates are imported, without disturbing you.

Edited 10:26, 21 November 2024

Epìdosis (talkcontribs)

Another small request: I often use https://mix-n-match.toolforge.org/#/issues/ to find wrong matches or duplicates in the catalogs, and it is very useful. However, it is unclear to me when and how MnM analyses a catalog to find potential issues, and whether it does so only once per catalog (when it is imported) or periodically. Would it be possible to add a job for each catalog that triggers a new check for potential issues? I think it would be very useful!

10:29, 21 November 2024

6527 catalog

2 comments • 09:23, 21 November 2024

Gerwoman (talkcontribs)

Could you extract the dates from the description? Thanks

12:38, 15 November 2024

Magnus Manske (talkcontribs)

Done.

12:08, 18 November 2024

Cradle not authenticating

One comment • 18:25, 12 November 2024

Fuzheado (talkcontribs)

Hi Magnus, could you take a look at this and see if there is an easy fix for Cradle? Thanks!

https://github.com/magnusmanske/cradle/issues/15

18:25, 12 November 2024

P13124 and 1120 catalog

7 comments • 12:38, 11 November 2024

Gerwoman (talkcontribs)

Could you please assign the property to the catalog?

Thank you

14:57, 4 November 2024

Magnus Manske (talkcontribs)

Are you sure about this? The BioMed Central journal ID (P13124) identifiers appear to be alphanumeric, whereas the catalog IDs are numeric.

15:03, 4 November 2024

Gerwoman (talkcontribs)

Good point. We should change the catalog or the property.

09:34, 9 November 2024

Magnus Manske (talkcontribs)

I made a new catalog with a new scrape, linked it to the property, synced from Wikidata, and set the remaining ones from the old catalog where available, preserving user and timestamp.

Not sure why it's only 300 and not 328 though.

11:09, 11 November 2024

Magnus Manske (talkcontribs)

this has only 300 entries

11:11, 11 November 2024

Gerwoman (talkcontribs)

Thank you. Something changed since 2018...

12:22, 11 November 2024

Gerwoman (talkcontribs)

For example: World Allergy Organization Journal is no longer published by BMC. The journal is continuing in cooperation with a new publisher, Elsevier.

12:38, 11 November 2024

Remove autoscrape from catalogues

12 comments • 12:29, 4 November 2024

Solidest (talkcontribs)

Hi, could you please remove the autoscrape settings from these catalogues? I've been monitoring them for a long time: autoscrape stopped working on them long ago, but people keep restarting the autoscrape job, which then hangs in the queue for quite a long time without any results.

https://mix-n-match.toolforge.org/#/catalog/1 odnb (at one point it produced wrong IDs, which had to be manually marked as not applicable; now it has stopped creating anything, although new IDs appear monthly on the site)
https://mix-n-match.toolforge.org/#/catalog/1011 WNS number (gets stuck in the queue for a few days and has produced nothing for a couple of years)
https://mix-n-match.toolforge.org/#/catalog/1486 mb artists (MusicBrainz blocks requests with a non-browser User-Agent, so mix-n-match now always sees a blank page on every request)
https://mix-n-match.toolforge.org/#/catalog/3476 mb genre (same as above)
https://mix-n-match.toolforge.org/#/catalog/2043 letterboxd (has given no results for quite some time, although new IDs appear regularly)
https://mix-n-match.toolforge.org/#/catalog/3658 pitchfork (the scraped page has been heavily redesigned and the scraper no longer works)

Edited 11:42, 15 January 2024

Solidest (talkcontribs)

There is also still the problem I mentioned in this post. I started autoscrape through the API for a set of catalogues, and the ones where autoscrape doesn't exist hang in the queue, constantly restarted by the schedule. The catalogues are:

https://mix-n-match.toolforge.org/#/jobs/5923 Deezer music genre
https://mix-n-match.toolforge.org/#/jobs/5161 Napster music genre [archived]
https://mix-n-match.toolforge.org/#/jobs/4905 MasterClass music genre guide
https://mix-n-match.toolforge.org/#/jobs/4939 Free Music Archive music genre
https://mix-n-match.toolforge.org/#/jobs/5000 PromoDJ music genre

This should probably be handled by returning a response and stopping the task when autoscrape is requested but can't be executed.

(I also tried running a "pause" task in https://mix-n-match.toolforge.org/#/jobs/5923 , which is also saved in the tasks. This probably needs handling too to avoid potential vandalism)

11:46, 15 January 2024

Magnus Manske (talkcontribs)

Done, and thanks for the list. I made a new job status "BLOCKED" that cannot be started from the web interface.

11:55, 15 January 2024

Magnus Manske (talkcontribs)

I also filter for existing job action types now

Edited 12:00, 15 January 2024

Solidest (talkcontribs)

Thanks for the quick resolution. Btw I also saw PAUSE status in the queue a few days ago. I think it would be useful to have it in the web interface (or ABORT button) to stop at least your own tasks to avoid misclicks or unnecessary schedules as in the case of music genres above.

12:05, 15 January 2024

Solidest (talkcontribs)

Hi. Could you please also block autoscrape on these catalogs. Most of them have either changed their layout and no longer work, or are stuck permanently with no new results for over a year.

UPD: Oops, it appears I've reported them all above. They've probably been unblocked and are back in the queue?

Edited 12:29, 4 November 2024

Solidest (talkcontribs)

And by the way, is it possible to remove the "automatch by search" task from the regular repetition here https://mix-n-match.toolforge.org/#/catalog/3789 and https://mix-n-match.toolforge.org/#/jobs/5195 ? I hit "purge automatches" every quarter, as without types syncing is more harmful due to clogging than helpful. It would probably be useful to have a button to remove regular tasks from the schedule in such cases (at least for those who have admin rights).

Edited 06:40, 2 November 2024

Gerwoman (talkcontribs)

Hi, I can't see any problem with https://mix-n-match.toolforge.org/#/catalog/1011 WNS number. The URL and IDs seem to be the same. The website is updated periodically, most recently on 9 October 2024. It now has 116 680 postage stamps registered, but the catalog contains only 90 241. What can be the problem? Thank you.

08:29, 2 November 2024

Solidest (talkcontribs)

Hi, Gerwoman. Autoscrape doesn't work in this catalogue. Given that the catalogue is relatively small, autoscraping should take no more than half an hour, in my experience. But in this catalogue the autoscrape job regularly lands in the queue and hangs there for several days or weeks, and it has added nothing new for years. This can be verified as follows: the latest mix-n-match ID in this catalogue is https://mix-n-match.toolforge.org/#/entry/91669282, while the latest mix-n-match IDs in recent catalogues are at 172699175. #91m vs. #172m means that no autoscrape job has added a single new ID to this catalogue in 2-3 years. Thus, it does not work and constantly hangs in the queue.

If you know that IDs can still be autocollected from the site, you just need to reconfigure autoscrape from scratch by specifying 1011 in Catalog ID here https://mix-n-match.toolforge.org/#/scraper/new and configuring everything as you did the first time. If you can't reconfigure it again, it's better to just disable the autoscraper in it.

Edited 08:54, 2 November 2024

Gerwoman (talkcontribs)

Thank you, Solidest, but I can't remember how I configured it the first time. I don't have access to the URL or regex...

Edited 09:42, 2 November 2024

Gerwoman (talkcontribs)

I've reconfigured the autoscrape. Let's see.

17:25, 2 November 2024

Solidest (talkcontribs)

Yeah, it worked, the catalogue entries are now at 116k. Thanks! I've crossed that catalogue off the list.

12:29, 4 November 2024

Insertion of VIAF and GND redirects 2024-10-28/29

One comment • 21:27, 31 October 2024

Zghbv (talkcontribs)

Edited 21:26, 31 October 2024

Automated duplicate creation via catalogs - e.g. 2018-08-01 Q55862034 François Perrault

One comment • 16:43, 31 October 2024

Zghbv (talkcontribs)

https://www.wikidata.org/w/index.php?title=Q55862034&action=history

The users that made the duplicate visible via DDB/GND IDs:

19:48, 18 October 2024 User:KababyZMinsem 22,455 bytes +352 Created claim: DDB person (GND) ID (P13049): 104207116, batch #238665 Tag: quickstatements [2.0]
08:34, 23 October 2024 User:Lorenz Karsten 22,803 bytes +348 Created claim: GND ID (P227): 104207116, batch #238900 Tag: quickstatements [2.0]

have been blocked.

16:43, 31 October 2024

Magister via catalog into label

One comment • 15:06, 31 October 2024

Zghbv (talkcontribs)

https://www.wikidata.org/w/index.php?title=Q113446597&action=history

"Magister" should probably not be in the name field; cf. GND, which does not do it that way.

15:06, 31 October 2024

Automated duplicate creation via catalogs - e.g. 2021-10-06 Q108811951 Hans Hermann Walter Seestern-Pauly

One comment • 10:08, 31 October 2024

Zghbv (talkcontribs)

https://www.wikidata.org/w/index.php?title=Q108811951&action=history

The users that made the duplicate visible via DDB/GND IDs:

16:05, 27 October 2024 User:KababyZMinsem 6,952 bytes +355 Created claim: DDB person (GND) ID (P13049): 1031582746, batch #239034 Tag: quickstatements [2.0]
01:42, 29 October 2024 User:Daubpushyd 7,303 bytes +351 Created claim: GND ID (P227): 1031582746, #quickstatements; #temporary_batch_1730166071073 Tag: quickstatements [2.0]

have been blocked.

Edited 10:08, 31 October 2024

Automated duplicate creation via catalogs - e.g. 2023-06-27 Q119999189 Wilhelm Kast

One comment • 10:07, 31 October 2024

Zghbv (talkcontribs)

https://www.wikidata.org/w/index.php?title=Q119999189&action=history

The users that made the duplicate visible via DDB/GND IDs:

18:48, 23 October 2024 User:KababyZMinsem 5,868 bytes +355 Created claim: DDB person (GND) ID (P13049): 1012276988, batch #238911 Tag: quickstatements [2.0]
06:29, 24 October 2024 User:Lorenz Karsten 6,219 bytes +351 Created claim: GND ID (P227): 1012276988, batch #238929 Tag: quickstatements [2.0]

have been blocked.

Edited 09:52, 31 October 2024