
donate.wikimedia.org landing pages don't have descriptions in search engine results
Closed, ResolvedPublic

Description

Three issues I see with donate.wikimedia.org relating to search engine results:

  1. These pages do not have a description meta tag
  2. They contain a "noindex,nofollow" robots meta tag, which should exclude them from search results entirely (but does not)
  3. The robots.txt file for donate.wikimedia.org is disallowing the /w/ directory from being crawled with lines 148-152:
User-agent: *
Allow: /w/api.php?action=mobileview&
Allow: /w/load.php?
Allow: /api/rest_v1/?doc
Disallow: /w/
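
The effect of the rules quoted above can be checked locally. The following is a minimal sketch using Python's standard urllib.robotparser; the two test URLs and the "modules=example" query are illustrative assumptions, and only the five quoted lines are parsed, not the full robots.txt. Note that Python's parser applies the first matching rule in file order, which is not necessarily how Googlebot resolves conflicts, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Only the lines quoted in the task description, not the whole file.
QUOTED_RULES = """\
User-agent: *
Allow: /w/api.php?action=mobileview&
Allow: /w/load.php?
Allow: /api/rest_v1/?doc
Disallow: /w/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Illustrative URLs (assumptions): a landing page under /w/ and a load.php request.
LANDING_URL = "https://donate.wikimedia.org/w/index.php?title=Special:LandingPage"
LOADER_URL = "https://donate.wikimedia.org/w/load.php?modules=example"

parser = build_parser(QUOTED_RULES)
for url in (LANDING_URL, LOADER_URL):
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(f"{verdict}: {url}")
```

Under these rules the landing page URL falls under "Disallow: /w/", while the load.php URL matches an earlier Allow line.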

Perhaps there are reasons why these are set up in this manner, but Google is ranking donate.wikimedia.org very high (see screenshot of a private window search) and the lack of a description meta tag results in an incomplete entry.

Ideally description tags could be unique to each donation page rather than set at a global level.

FireShot Screen Capture #146 - 'donate to wikipedia - Google Search' - www_google_com_search_q=donate+to+wikipedia&rlz=1C1SQJL_enUS819US819&oq=donate+.png (977×1 px, 129 KB)

Event Timeline

AKanji-WMF subscribed.

Moving back to Triage so we can re-evaluate as a team

Hi @AKanji-WMF and @greg
I'd like to share this SEO audit link we've received from Comms agency as it might be helpful for this task
https://app.surferseo.com/audits/s/LAzOtfSx5me5tj6XybqvUZDRb16K09XW

Confirming that desired timing would be pre-Q2. Upon review by Greg, decided we need to re-triage with the team.

Noting priority: ideally in place before Q2.

Flagging mention of this in the Big English comms plan. Ideally we work on this in Sprint T

@EWilfong_WMF, @Pcoombe

Ideally description tags could be unique to each donation page rather than set at a global level.

How do you distinguish between different 'donation pages'? Since they are all based on Special:LandingPage, we can maybe look at a particular query-string parameter to look up the appropriate string. What parameter would that be? And what's a good default description?

I'll defer to Peter on how to distinguish between different donation pages from a technical perspective, but from strategy perspective, you would want to be able to customize the meta description on different pages based on the intended target audience. For example, a page shared on social media might have a different description than a page that shows up high in Google search results.

For a default description, again, I defer to the fundraising team but suggest that it's language crafted to explain the reason why donating is important. I like this text from https://wikimediafoundation.org/support/:

Wikipedia is a free and open encyclopedia, hosted by the Wikimedia Foundation. The heart and soul of Wikipedia is our global community of over 295,000+ volunteer contributors, billions of readers, and donors like yourself – all united to share unlimited access to reliable information. Your donations keep our knowledge projects like Wikipedia freely available to everyone. Please help us keep Wikipedia growing.

Hi all, jumping in with more detail on this request linked to the work on SEO I am doing with Comms.

We are looking to update the link from a global SEO perspective, to improve the view on search results (see image attached.)

I'll ask Sophie and the Content team to provide this small copy asap, as it needs to be in line with what the user will find in the landing page:
https://donate.wikimedia.org/w/index.php?title=Special:LandingPage&country=us&uselang=en&utm_medium=spontaneous&utm_source=fr-redir&utm_campaign=spontaneous.

Screenshot 2023-10-18 at 09.27.21.png (340×1 px, 50 KB)

Hi @Ejegg,

Here's the copy from Sophie:

Donate to the Wikimedia Foundation, the nonprofit that hosts Wikipedia and other crucial free knowledge projects. Each year, the generosity of the 2% of readers who donate allows us to expand the reach of Wikipedia and its sister projects. Our goal is to ensure knowledge remains freely available for generations to come and for that, we need your support.

OR [if the first option is too long]:

Donate to the Wikimedia Foundation, the nonprofit that hosts Wikipedia and other crucial free knowledge projects. Each year, because of the 2% of readers who give to support our mission, we can expand the reach of free knowledge.

Please ping me once the titles are updated.

Thanks,

OK @MSuijkerbuijk_WMF, I'll add that text in the description meta tag to Special:LandingPage (I think the first one will fit). I'll put it in an i18n file, so it will go up on translatewiki.net and translations will filter in, but all languages will fall back to English initially.

If we want to use different variations of the description text for different contents of the LandingPage, we can add that logic later.

Change 968265 had a related patch set uploaded (by Ejegg; author: Ejegg):

[mediawiki/extensions/FundraiserLandingPage@master] Add description meta tag to Special:LandingPage

https://gerrit.wikimedia.org/r/968265

Noting that there are two other blockers for the meta descriptions showing up in search engine results. #2 and #3 in the original task description:

  1. They contain a "noindex,nofollow" robots meta tag, which should exclude them from search results entirely (but does not)
  2. The robots.txt file for donate.wikimedia.org is disallowing the /w/ directory from being crawled...

Should these be addressed in this task or broken out into separate ones?

Change 968265 merged by jenkins-bot:

[mediawiki/extensions/FundraiserLandingPage@master] Add description meta tag to Special:LandingPage

https://gerrit.wikimedia.org/r/968265

The code change has been approved and is ready to go out with the main cluster deploy train next week.

Hi @Ejegg would you mind pinging me once this is deployed so I can check if the searches show the description?
Thanks!

Hi @MSuijkerbuijk_WMF , donatewiki is in the group that gets updated Wednesdays. The latest batch of updates (including this one) go out to a few wikis today, and as long as no problems show up they go to a larger group (including donatewiki) tomorrow, then to the really big ones like English Wikipedia on Thursday.

Tomorrow's deployment timeslot is 18:00–20:00 UTC / 11:00–13:00 PDT / 14:00–16:00 EDT. I see a comment above showing what version number this update has been tagged with:

ReleaseTaggerBot added a project: MW-1.42-notes (1.42.0-wmf.3; 2023-10-31).

So I'll look for that 1.42.0-wmf.3 to show up on https://donate.wikimedia.org/wiki/Special:Version and confirm here once I see that it's up.

Noting that the change is deployed and the new description meta tag is present on landing pages. However, the descriptions still aren't showing in Google (https://search.google.com/test/rich-results/result?id=19KQ6zMWoXJuyfGjdb3Kvw) because of the issues @EWilfong_WMF flagged above.
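
For anyone wanting to check a page's tags without a browser, here is a rough sketch that pulls the description and robots meta tags out of an HTML string with Python's standard html.parser. The sample HTML is made up for illustration (it is not the actual landing page markup), and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            self.meta[name.lower()] = attributes.get("content") or ""


def collect_meta(html: str) -> dict:
    """Return a dict mapping meta tag names to their content values."""
    collector = MetaCollector()
    collector.feed(html)
    return collector.meta


def is_indexable(meta: dict) -> bool:
    """True unless the robots meta tag carries a noindex (or none) directive."""
    directives = {d.strip().lower() for d in meta.get("robots", "").split(",")}
    return "noindex" not in directives and "none" not in directives


# Made-up sample markup (an assumption, not copied from donate.wikimedia.org).
SAMPLE_HTML = """
<html><head>
<meta name="robots" content="noindex,nofollow">
<meta name="description" content="Donate to the Wikimedia Foundation.">
</head><body></body></html>
"""

meta = collect_meta(SAMPLE_HTML)
print("description:", meta.get("description"))
print("indexable:", is_indexable(meta))
```

A page like the sample has a description but is still marked noindex, which is the state described in the comment above.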

Ah, ok, I was a bit confused about the impact of those changes. I thought the conclusion was that googlebot was ignoring those directives completely.

Change 971277 had a related patch set uploaded (by Ejegg; author: Ejegg):

[mediawiki/extensions/FundraiserLandingPage@master] Fix robots meta tag to allow indexing

https://gerrit.wikimedia.org/r/971277

Change 971281 had a related patch set uploaded (by Ejegg; author: Ejegg):

[operations/mediawiki-config@master] Allow crawling FundraiserLandingPage in robots.txt

https://gerrit.wikimedia.org/r/971281

Change 971277 merged by jenkins-bot:

[mediawiki/extensions/FundraiserLandingPage@master] Fix robots meta tag to allow indexing

https://gerrit.wikimedia.org/r/971277

Looks like the robots meta tag is live. Checking in to see when the robots.txt change will be pushed live. Once that's out, Google should be able to read the description meta tag and show that in the search results.

Thanks @EWilfong_WMF
Please team, could you update us once this change is live?

Turns out the robots.txt was actually editable on-wiki! That was the last of the changes to make. The <meta> tags for description and indexing have been deployed for a little while.

@MSuijkerbuijk_WMF the new descriptions should hopefully show up in search results soon.

Change 971281 abandoned by Ejegg:

[operations/mediawiki-config@master] Allow crawling FundraiserLandingPage in robots.txt

Reason:

Implemented on-wiki as per @hashar's comment

https://gerrit.wikimedia.org/r/971281

Huh, I had no idea that was editable on wiki. Thanks @Ejegg !

Confirming that I see the change to robots.txt and that Google is indeed able to crawl the page noted above: https://search.google.com/test/rich-results/result?id=7JM3WeWxIU0EBukJwGLgKw

However, one potential complication is that Google is indexing this URL in its results: https://donate.wikimedia.org/?uselang=en. And that page is still blocked from Google by robots.txt: https://search.google.com/test/rich-results/result?id=psUMei-EYA6e6ZsKnFe1eA. That page is just a redirect to the link above, so maybe it's okay and Google will still pick up the description? You might just have to wait and see once the page is crawled by Google.

If someone has Search Console access, you can request the https://donate.wikimedia.org/?uselang=en page to be crawled to hopefully speed the process up: https://developers.google.com/search/docs/crawling-indexing/ask-google-to-recrawl

Thanks @EWilfong_WMF , I've added that shorter URL too. Sounds like @Pcoombe has the search console access, per T331043.

Weird, I see https://donate.wikimedia.org/?uselang=en allowed at the bottom of https://donate.wikimedia.org/robots.txt but the Search Console still suggests it's blocked.

Screenshot 2023-11-29 at 12.45.34.png (784×945 px, 87 KB)

If I click "Request indexing" to try and manually index it then it also shows as blocked

Screenshot 2023-11-29 at 12.46.02.png (827×945 px, 87 KB)

I noticed that as well, @Pcoombe, when I was using the search test from Google. Looking at the robots.txt syntax, I don't see why it isn't working. I don't see anything that would block Googlebot from accessing the root of the site. I do think Google is ignoring the query strings, but I'm not sure if that's having any effect here.

Considering Google's syntax documentation, perhaps the allow line for the root should be edited to look like this:

Allow: /$

I'm not sure that will resolve the issue, but I also don't have another idea at the moment.
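
To reason about how a line like "Allow: /$" interacts with "Disallow: /w/", here is a small sketch of the matching behaviour Google documents: "*" matches any run of characters, a trailing "$" anchors the end of the URL, the longest matching rule wins, and Allow wins a tie. It is a simplified model with hypothetical helper names, it uses only a subset of the real file's rules, and it does not follow redirects, so it cannot by itself explain what Search Console reports.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, target: str) -> bool:
    """Apply longest-match precedence; rules are (allow: bool, pattern: str) pairs.

    `target` is the URL path plus query string. With no matching rule,
    crawling is allowed.
    """
    best_length = -1
    best_allow = True
    for allow, pattern in rules:
        if not pattern_to_regex(pattern).match(target):
            continue
        longer = len(pattern) > best_length
        tie_won_by_allow = len(pattern) == best_length and allow
        if longer or tie_won_by_allow:
            best_length = len(pattern)
            best_allow = allow
    return best_allow


# Subset of rules for illustration only (the real file is much longer).
RULES = [(False, "/w/"), (True, "/$")]

for target in ("/", "/?uselang=en", "/w/index.php?title=Special:LandingPage"):
    print(target, "->", "allowed" if is_allowed(RULES, target) else "blocked")
```

In this model "/$" matches only the bare "/" and not "/?uselang=en", which is allowed anyway because no Disallow rule matches it, whereas the /w/index.php URL stays blocked until a longer Allow pattern covers it.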

I guess it's worth a try @EWilfong_WMF ! I've just added that line.

Same result in the search console when I tried just now :(

@Pcoombe I just tried adding

Allow: /w/index.php

to the file.

Want to hit the recrawl button when you have a minute?

Change 988708 had a related patch set uploaded (by Ejegg; author: Ejegg):

[mediawiki/extensions/FundraiserLandingPage@master] Allow index and follow on Special:FundraiserRedirector

https://gerrit.wikimedia.org/r/988708

Change 988708 abandoned by Ejegg:

[mediawiki/extensions/FundraiserLandingPage@master] Allow index and follow on Special:FundraiserRedirector

Reason:

Whoops, those are just reflected in the <meta> tags, which are never rendered in this redirect-only page

https://gerrit.wikimedia.org/r/988708

@Pcoombe I notice the first redirect from bare donate.wikimedia.org is Special:FundraiserRedirector, so I just added

Allow: /wiki/Special:FundraiserRedirector

Maybe the 42nd time is the charm?

Turns out the robots.txt was actually editable on-wiki! That was the last of the changes to make. The <meta> tags for description and indexing have been deployed for a little while.

I have rediscovered robots.txt is editable while reviewing Ejegg's patch which was adding an entry to the mediawiki-config robots.txt. My review:

/robots.txt is indeed shared and it is more or less obsolete or at least a remnant of the past.

It is not in the document root (which is at /w), indicating the file is never served directly. Instead we have /w/robots.php, and reading that file, it supports retrieving the content of MediaWiki:Robots.txt to serve it.

Thus I guess you can edit https://donate.wikimedia.org/wiki/MediaWiki:Robots.txt which currently has the default message:

# Lines here will be added to the global robots.txt

The content of the article is appended after the content of the mediawiki-config /robots.txt

Example: https://meta.wikimedia.org/wiki/MediaWiki:Robots.txt

And the resulting robots:

$ curl -fs https://meta.wikimedia.org/robots.txt|grep meta.wik
# robots.txt for http://meta.wikimedia.org/ #
# Edit at http://meta.wikimedia.org/w/index.php?title=MediaWiki:Robots.txt&action=edit

And for sure that worked!

It works! Confirmed in Google Search Console that https://donate.wikimedia.org/?uselang=en is now indexed, and this is what I see if I google "donate wikipedia":

Screenshot 2024-01-10 at 14.17.34.png (288×1 px, 64 KB)

\o/