Data talk:Memory of the World Register.tab

From Wikimedia Commons, the free media repository

Means of Collection

This data was collected semi-automatically: I moved through each page of the register by hand and ran the following code in the browser console of Firefox 64.0.2 on each one. Now that the method has been tested and works, it should be made fully automatic.

/** Load a module using unpkg.com. Returns a promise. */
function loadModule(name) {
  return new Promise((res, rej) => {
    var script = document.createElement('script');
    script.type = 'text/javascript';
    script.src = 'https://unpkg.com/' + name;
    script.onload = () => res(script.src);
    script.onerror = e => rej(e);
    document.head.appendChild(script);
  });
}

/** Copies document info from the current page. Returns an array of [title, description, link] rows that can be pasted into the Data:.tab page. */
async function main() {
  await loadModule('lodash@4.17');

  return _.chunk($(".content h4, .content .csc-textpic-text>:first-child, .content h4+div a:not(.lightbox)"), 3)
  .map(chunk => {
    return {
      title: chunk[0].textContent.trim(),
      description: chunk[1].textContent.trim(),
      link: chunk[2].href
    };
  })
  .map(c => [c.title, c.description, c.link])
}

// Copy the result and add it to the .tab page.
copy(await main());
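For anyone repeating this, here is a rough sketch of where the copied rows end up inside a tabular data page. The field names, the licence value and the description text are placeholders I made up for illustration, not the values used on the actual page, and `toTabPage` is a hypothetical helper.

```javascript
// Sketch only: wraps scraped rows in the general shape of a Commons .tab page.
// All names and values below are illustrative assumptions.
function toTabPage(rows) {
  return {
    license: "CC0-1.0", // placeholder licence identifier
    description: { en: "Placeholder description of the data set" },
    schema: {
      fields: [
        { name: "title", type: "string", title: { en: "Title" } },
        { name: "description", type: "string", title: { en: "Description" } },
        { name: "link", type: "string", title: { en: "Link" } },
      ],
    },
    data: rows, // one inner array per register entry
  };
}

// Made-up sample row in the same shape that main() returns.
const scrapedRows = [
  ["Sample Archive", "Sample description ending in 2017.", "https://example.org/a"],
];

const tabPage = toTabPage(scrapedRows);
console.log(JSON.stringify(tabPage, null, 2));
```

Each row must have exactly as many values as there are entries in `schema.fields`, so a column added later needs a matching field definition.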

To add the "year_accepted" column, the data generated above was passed through the following code (as `d`), and the result then replaced the existing data. Note that two manual edits had to be made to make this work: one description had the stray text "<//span>" appended to it, and another was missing the period at the end of its sentence.

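// Append the year that ends each description (for example "... 2017.") as a number; NaN if none is found.
copy(d.map(row => [...row, parseInt((row[1].match(/(\d+)\.$/) || [])[1], 10)]))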
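The two manual edits mentioned above were needed because the pattern only matches a description that ends in digits followed by a period. A minimal sketch of a variant that reports such rows instead of silently producing NaN is below. The helper names `extractYear` and `addYearColumn` and the sample rows are my own invention for illustration.

```javascript
// Hypothetical helper: returns the year that ends a description, or null
// when the description does not end in "<digits>.".
function extractYear(description) {
  const match = description.trim().match(/(\d+)\.$/);
  return match ? parseInt(match[1], 10) : null;
}

// Appends the extracted year as a fourth column to every row.
function addYearColumn(rows) {
  return rows.map(row => [...row, extractYear(row[1])]);
}

// Made-up rows: the second one is missing its final period.
const sampleRows = [
  ["Sample Archive", "Sample description ending in 2017.", "https://example.org/a"],
  ["Broken Entry", "Sample description ending in 2015", "https://example.org/b"],
];

const withYears = addYearColumn(sampleRows);

// Titles of rows that still need a manual fix before being saved.
const unmatched = withYears.filter(row => row[3] === null).map(row => row[0]);
console.log(unmatched);
```

Listing the unmatched titles first would make it easier to find entries like the two that had to be corrected by hand.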

--Hardwigg (talk) 00:52, 31 January 2019 (UTC)[reply]