Commons:Batch uploading/NYPL Maps

From Wikimedia Commons, the free media repository
  • Describe the works to be uploaded in detail (audio files, images by …):
    • High-resolution TIFF scans of maps. KML files are available for some maps.
  • Which license tag(s) should be applied?
  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

(talk) 08:05, 2 April 2014 (UTC)

Technical detail

File naming

Each map has an NYPL imageID in its gallery record; see http://digitalgallery.nypl.org/nypldigital/dgkeysearchdetail.cfm?imageID=1952939. Some maps have multiple "parts", which will need additional sequence numbers.

This suggests the following simple unique name pattern using fields in the gallery record:

<Image Title> NYPL<imageID>.tif
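The following is a minimal Python sketch of the naming pattern. It assumes titles are taken directly from the gallery record. Several details are hypothetical and are not set by this page: the helper name `nypl_filename`, the " (part N)" suffix for multi-part maps, and the decision to replace unsafe characters with a hyphen. The page only says that parts need extra sequence numbers. The replaced characters include those MediaWiki does not allow in titles (such as # < > [ ] | { }), plus "/" and ":".

```python
import re
from typing import Optional

# Characters MediaWiki rejects in titles, plus "/" and ":" which are
# awkward in Commons file names. Replacing them with "-" is an assumption.
_UNSAFE = re.compile(r"[#<>\[\]|{}/:]")


def nypl_filename(title: str, image_id: str, part: Optional[int] = None) -> str:
    """Build '<Image Title> NYPL<imageID>.tif', optionally adding a part number.

    The ' (part N)' suffix is a hypothetical convention for multi-part maps.
    """
    clean = _UNSAFE.sub("-", title)
    clean = re.sub(r"\s+", " ", clean).strip()  # collapse runs of whitespace
    name = f"{clean} NYPL{image_id}"
    if part is not None:
        name += f" (part {part})"
    return name + ".tif"
```

For example, `nypl_filename("Map of Brooklyn", "1952939")` gives `Map of Brooklyn NYPL1952939.tif`. Keeping the imageID in the name keeps each file name unique even when titles repeat.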
What to upload

Initially, only TIFF files will be uploaded. The JPEG files on the site are relatively small, and we could probably generate better-quality versions from the TIFFs rather than uploading the JPEGs as variants.

Some maps have KML files available, which pin them to a map representation. I am currently unsure how best to use these. They could be added as a later tranche of uploads. Because volunteers are still adding KML files, we would also need to be able to go back and refresh them from time to time.
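One possible way to decide which KML files need refreshing is sketched below. It assumes a hypothetical local record that maps each imageID to the SHA-1 digest of the KML last uploaded for it. A freshly fetched KML is then re-uploaded only if its digest differs. The function names and the record itself are illustrative, not part of any existing tool.

```python
import hashlib
from typing import Dict


def kml_digest(kml_bytes: bytes) -> str:
    """Return the SHA-1 hex digest of a KML file's raw bytes."""
    return hashlib.sha1(kml_bytes).hexdigest()


def kml_needs_refresh(image_id: str, fetched: bytes,
                      uploaded_digests: Dict[str, str]) -> bool:
    """True if no KML is recorded for image_id yet, or the fetched copy differs.

    uploaded_digests is a hypothetical imageID -> digest record kept by the uploader.
    """
    return uploaded_digests.get(image_id) != kml_digest(fetched)
```

The comparison is byte-level, so any edit at all (even whitespace) triggers a refresh. That is the conservative choice when the files are being changed by volunteers.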

Templates

Given the available metadata, the simplest approach seems to be a project-specific ingestion template for these 20,000 images:

{{NYPL map}}
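The sketch below shows how a batch script might fill in the template on each file description page. Its parameter names (`title`, `imageID` in the usage example) are hypothetical placeholders, not the documented parameters of {{NYPL map}}. The only non-obvious detail is that a literal "|" in a value must be escaped as {{!}}; otherwise it would start a new template parameter.

```python
from typing import Iterable, Mapping


def nypl_description(fields: Mapping[str, object],
                     categories: Iterable[str] = ("NYPL maps",)) -> str:
    """Render a {{NYPL map}} call plus category links as wikitext.

    Field names are hypothetical; check the template documentation for real ones.
    """
    lines = ["{{NYPL map"]
    for key, value in fields.items():
        # A bare "|" inside a value would start a new parameter, so escape it.
        escaped = str(value).replace("|", "{{!}}")
        lines.append(f"|{key}={escaped}")
    lines.append("}}")
    lines.extend(f"[[Category:{name}]]" for name in categories)
    return "\n".join(lines)
```

For example, `nypl_description({"title": "Map of Brooklyn", "imageID": "1952939"})` produces the template call on four lines, followed by `[[Category:NYPL maps]]`.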

Opinions


I think the test image was successful; both the image and metadata look good. However, I'm not sure the use of the {{WMUK equipment}} template is appropriate, as it says that the image "was generated WMUK's equipment", which seems incorrect: surely the image was generated with NYPL equipment. Just to be clear, I'm not opposed to giving a nod to WMUK if they are supporting the transfer of images from the NYPL map repository to Commons. It's just that, from the doc at Template:WMUK equipment, it looks like {{WMUK equipment}} is intended for images created with a WMUK camera, not for images transferred to Commons with a WMUK computer. Maybe {{Supported by Wikimedia UK}} would be a better choice? -- RP88 21:57, 2 April 2014 (UTC)

Good point, I'll swap over to using that template. -- (talk) 22:34, 2 April 2014 (UTC)

Innocently searching for maps of my home turf, I have clearly stumbled into a large work that intends to proceed by methods beyond my Wikitechnical ken. So, after creating a geographical subcat and moving a few dozen files by Cat-a-lot (my sophistication is such that this tool seems the cat's meow), I have stopped. My intention is to create a tree within Category:NYPL maps for my territories, on the way to integrating the maps into the appropriate geographical and local chronological trees. If this method threatens to interfere with greater works, I shall undo my changes; otherwise I shall simply pause until those workers give the green light. Jim.henderson (talk) 14:20, 6 May 2014 (UTC)

I have created a couple of new sub-categories as examples (such as the aerial photographs). The upload is mostly done, with around 300 files left, but I am pausing until WMF Operations give the green light again. There is no problem with you categorizing in a similar way in the meantime. -- (talk) 23:37, 12 May 2014 (UTC)
Splendid. I shall diffuse those that depict my area of interest, slowly. Jim.henderson (talk) 20:19, 14 May 2014 (UTC)

Progress

Tasks (assigned to, progress, bot name, category):
Fæ: Investigate batch upload options, run test.
Status: Done. Approximately 600 maps uploaded using a custom script.
Bot name: -
Category: Category:NYPL maps

Fæ: Investigate possible use of GWToolset for these uploads.

Bug request filed to allow uploads from http://link.nypl.org.

Status: Done
Bot name: -
Category: -
Fæ: Complete uploads using GWToolset. For a list of files uploaded using the tool, see catscan2.

A small proportion of the maps exceed 50,000,000 pixels (50 megapixels), which is above the size limit for Commons to create thumbnails. These files are being sub-categorized at Category:NYPL maps (over 50 megapixels).
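The size check is simple arithmetic: width × height compared against the 50,000,000-pixel figure given above. The Python sketch below assumes the dimensions are already known, for example from the TIFF header via Pillow's `Image.open(path).size`. The function name is hypothetical.

```python
from typing import Optional

# Pixel count above which Commons cannot create thumbnails (figure from this page).
THUMBNAIL_PIXEL_LIMIT = 50_000_000
LARGE_CATEGORY = "Category:NYPL maps (over 50 megapixels)"


def large_map_category(width: int, height: int) -> Optional[str]:
    """Return the over-50-megapixel subcategory if width * height exceeds the limit."""
    if width * height > THUMBNAIL_PIXEL_LIMIT:
        return LARGE_CATEGORY
    return None
```

A 10,000 × 5,000 scan is exactly 50,000,000 pixels, so it is not "over" the limit and returns `None`. One more row of pixels tips it into the subcategory.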

Warning: WMF Operations have asked for another pause until we agree how to reduce the strain that these large images place on the servers. This project is establishing the limits of what Commons can currently handle...

For a significant proportion of images, the API returns no metadata. This may be due to missing catalogue data, or to a bug that I will tease out after the current run finishes. My rough guess is that 8,000 maps, rather than the claimed 20,000, will be suitable for upload.
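A batch run could set aside records with no metadata rather than failing on them, and keep their IDs so they can be investigated after the run. The sketch below assumes a hypothetical record shape, `{"imageID": ..., "metadata": ...}`, where `metadata` is missing, `None`, or empty when the API returned nothing. The real API response format is not described on this page.

```python
from typing import Iterable, List, Tuple


def split_by_metadata(records: Iterable[dict]) -> Tuple[List[dict], List[str]]:
    """Separate records with usable metadata from those the API returned empty.

    Records are assumed (hypothetically) to look like
    {"imageID": "...", "metadata": {...} or None}.
    """
    usable, missing = [], []
    for rec in records:
        if rec.get("metadata"):  # absent, None and {} all count as missing
            usable.append(rec)
        else:
            missing.append(rec["imageID"])
    return usable, missing
```

The `missing` list is what a follow-up run (or a bug report) would work from. Its length relative to the whole catalogue is the kind of figure the 8,000 estimate is based on.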


"Tranche 4"

The previous (alphabetical) index stopped at "S". I think the NYPL website cannot handle a gallery report longer than a certain number of pages. I am trying a reverse alphabetical sort, so this tranche works from "Z" back towards "S", hopefully meeting the point where the previous index stopped. On that basis I am now estimating 12,000 files and a couple more days of uploading, though this may still be significantly off.
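The reverse pass only needs to continue until it reaches an item the earlier alphabetical pass already listed. A minimal sketch of that stopping rule follows. It assumes the two sorted listings meet without a gap and that the earlier run's listed imageIDs were recorded. The function name and inputs are hypothetical.

```python
from typing import Iterable, List, Set


def reverse_tranche(reverse_listing: Iterable[str],
                    seen_in_forward_run: Set[str]) -> List[str]:
    """Collect imageIDs from a Z-to-A listing until it meets the A-to-S run.

    Assumes the listings overlap contiguously, so the first already-seen ID
    marks the point where the earlier alphabetical index stopped.
    """
    new_ids = []
    for image_id in reverse_listing:
        if image_id in seen_in_forward_run:
            break  # reached the part of the catalogue already covered
        new_ids.append(image_id)
    return new_ids
```

Comparing against IDs the forward run *listed*, rather than IDs it *uploaded*, avoids running past the meeting point when some records there were skipped for lack of metadata.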

Status: Paused

95% completed (estimate)
Bot name: Special:GWToolset
Category: Category:NYPL maps

Fæ: Coordinate "housekeeping".
Duplicates
Refer to email. The GWToolset (GWT) does not prevent digitally identical duplicates from being uploaded. It should be possible to report on these and have a bot add a duplicate-check category.
Status: In progress
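"Digitally identical" means byte-for-byte equal, so a content hash is enough to find duplicates. The Python sketch below groups files by SHA-1, which is also the hash MediaWiki records for each upload (exposed, for example, through the API's imageinfo `iiprop=sha1`). Digests computed locally could therefore be compared with those already on Commons. The function name and the in-memory `{name: bytes}` input are assumptions for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for name, data in files.items():
        by_hash[hashlib.sha1(data).hexdigest()].append(name)
    # Only digests shared by two or more files indicate duplicates.
    return [sorted(names) for names in by_hash.values() if len(names) > 1]
```

For multi-hundred-megabyte TIFFs, a real report would hash each file in chunks with repeated `update()` calls rather than loading it whole. Each resulting group is a candidate for the duplicate-check category.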