Commons:Batch uploading/NYPL Maps
- Source to upload from:
- List of warped maps: http://maps.nypl.org/warper/maps?show_warped=1
- Blog post: http://www.openculture.com/2014/03/new-york-public-library-puts-20000-hi-res-maps-online.html
- Did you observe a URL pattern?
- Do you know whether the site has an API?
- Yes, but there are various limitations so it may be simpler to scrape the metadata from the html catalogue version.
- What else can ease uploading (is the site valid XHTML, WCM they use…)?
- Checking...
- Did you contact the site owner?
- Blog post seems to address what we need in terms of permissions.
- Describe the works to be uploaded in detail (audio files, images by …):
- High-resolution TIFF scans of maps. KML files are available for some maps.
- Which license tag(s) should be applied?
- Is there a template that could be used on the file description pages? Do you think a special template should be created?
- Ingestion template {{NYPL map}} created.
Fæ (talk) 08:05, 2 April 2014 (UTC)
Technical detail
File naming
The maps have an NYPL imageID in the gallery record, see http://digitalgallery.nypl.org/nypldigital/dgkeysearchdetail.cfm?imageID=1952939. Maps may have multiple "parts" which will need additional sequence numbers.
This suggests the following simple unique name pattern using fields in the gallery record:
<Image Title> NYPL<imageID>.tif
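
As a rough illustration of the naming pattern above, the following Python sketch pulls the imageID out of a gallery record URL and builds a file name from it. It is not the script used for the upload. The function names, the set of characters replaced in titles, and the " (n)" suffix for multi-part maps are all assumptions made for this example.

```python
import re
from urllib.parse import urlparse, parse_qs

# Characters that are awkward or disallowed in wiki page titles.
# Replacing them with "-" is an assumption, not an agreed convention.
_UNSAFE = re.compile(r"[#<>\[\]|{}/:]")


def image_id_from_url(url):
    """Return the imageID query parameter of a gallery record URL."""
    query = parse_qs(urlparse(url).query)
    return query["imageID"][0]


def build_filename(title, image_id, part=None):
    """Build '<Image Title> NYPL<imageID>.tif'.

    `part` is a hypothetical sequence number for maps with several parts.
    """
    clean = _UNSAFE.sub("-", title)
    clean = re.sub(r"\s+", " ", clean).strip()
    name = f"{clean} NYPL{image_id}"
    if part is not None:
        name += f" ({part})"  # hypothetical suffix format
    return name + ".tif"


if __name__ == "__main__":
    url = ("http://digitalgallery.nypl.org/nypldigital/"
           "dgkeysearchdetail.cfm?imageID=1952939")
    print(build_filename("Map of New York", image_id_from_url(url)))
```

Because the imageID is unique per gallery record, appending it keeps names unique even when two maps share the same title.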
What to upload
In the first instance only tif files will be uploaded. The jpg files on the site are relatively small and we could probably generate better quality versions without uploading these as variants.
Some maps have KML files available (which pin them to a map representation). I am currently unsure how best to make use of these, though they could be added as a later tranche of uploads. As the KML files are being added by volunteers, we would need to be able to go back and refresh them from time to time.
Templates
Considering the metadata, it seems easiest to use a specific ingestion template for this project of 20,000 images.
Opinions
I think the test image was successful, both the image and metadata look good. However, I'm not sure the use of the {{WMUK equipment}} template is appropriate, as it says that the image "was generated WMUK's equipment", which seems incorrect — surely the image was generated with NYPL equipment. Just to be clear, I'm not opposed to giving a nod to WMUK if they are supporting the transfer of images from the NYPL map repository to Commons, it's just that from the doc at Template:WMUK equipment it looks like it is intended that {{WMUK equipment}} be used for images created with a WMUK camera, not for images transferred to Commons with a WMUK computer. Maybe {{Supported by Wikimedia UK}} would be a better choice? —RP88 21:57, 2 April 2014 (UTC)
- Good point, I'll swap over to using that template. --Fæ (talk) 22:34, 2 April 2014 (UTC)
Innocently searching for maps of my home turf, I have clearly stumbled into a large work that intends to proceed by methods beyond my Wikitechnical ken. So, after creating a geographical subcat and moving a few dozen files by Cat-a-lot (my sophistication is such that this tool seems the cat's meow) I have stopped. My intention is to create a tree within Category:NYPL maps for my territories, on the way to integrating the maps into the appropriate geographical and local chronological trees. If this method threatens to interfere with greater works, I shall undo, otherwise simply pause until those workers give the green light. Jim.henderson (talk) 14:20, 6 May 2014 (UTC)
- I have created a couple of new sub-categories as examples (such as the aerial photographs). The upload is mostly done: around 300 files are left to upload, but I am pausing until WMF Operations give the green light again. There is no issue with you having a go at categorizing in a similar way. --Fæ (talk) 23:37, 12 May 2014 (UTC)
- Splendid. I shall diffuse those that depict my area of interest, slowly. Jim.henderson (talk) 20:19, 14 May 2014 (UTC)
Progress
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Fæ: Investigate batch upload options, run test. | Status: Done. Approximately 600 maps uploaded using a custom script. | - | Category:NYPL maps |
Fæ: Investigate possible use of GWToolset for these uploads. Bug request in to allow uploads from http://link.nypl.org | Status: Done | - | - |
Fæ: Complete uploads using GWToolset. For a list of files uploaded using the tool, see catscan2. A small proportion of the maps are over 50,000,000 pixels in resolution. Files of this pixel size are above the limit for Commons to create thumbnails for. They are being sub-categorized at Category:NYPL maps (over 50 megapixels). A significant proportion of images fail to have their metadata returned from the API. This may be due to a lack of catalogue data or a bug to tease out after the current run finishes. I'm guesstimating 8,000 maps rather than the claimed 20,000 will be suitable for upload. The previous (alphabetic) index stopped at "S". I think the NYPL website cannot handle a gallery report over a certain number of pages long. I am trying a reverse alphabetic sort, so this tranche goes from "Z" upwards, hopefully hitting "S" again... For this reason I'm now estimating 12,000 files, and a couple more days of uploading, but may still be significantly off. | Status: paused. 95% completed (estimate) | Special:GWToolset | Category:NYPL maps |
Fæ: Coordinate "housekeeping". | Status: In progress | | |
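
The 50,000,000-pixel threshold mentioned in the progress table lends itself to a simple check before or after upload. The sketch below is only an illustration of that arithmetic: the function names are invented for this example, and it assumes the pixel width and height of each scan are already known (for instance from the file header). Only the category names are taken from the table.

```python
# Pixel count above which Commons was reported not to create thumbnails.
PIXEL_LIMIT = 50_000_000


def over_thumbnail_limit(width, height, limit=PIXEL_LIMIT):
    """Return True when width x height is strictly over the pixel limit."""
    return width * height > limit


def categories_for(width, height):
    """Suggest categories for a map scan (hypothetical helper)."""
    categories = ["NYPL maps"]
    if over_thumbnail_limit(width, height):
        categories.append("NYPL maps (over 50 megapixels)")
    return categories


if __name__ == "__main__":
    # A 10000 x 6000 scan has 60,000,000 pixels, so it is over the limit.
    print(categories_for(10000, 6000))
```

The comparison is strict, matching the wording "over 50,000,000 pixels"; whether a file of exactly that size gets a thumbnail is not stated on this page.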