Page MenuHomePhabricator

Add cookie when blocking anonymous users
Closed, ResolvedPublic5 Estimated Story Points

Description

T5233 has now been resolved, and seems to work great when autoblocking accounts. However it would be nice to have this same cookie block functionality when blocking IPs. For instance, say I am vandalizing while logged out on a mobile device and my IP gets blocked. I walk down the street, and I have a new IP, and am able to resume vandalizing. If a cookie is stored with a reference to the IP block, it could be reloaded. Another use case is blocking residential IPs, where their IPv6 address may be refreshed quickly. Currently, we may do a range block to effectively block all IPs allocated to that end user. However many admins are not comfortable doing this, and it seems a cookie would do the job just as well for your common vandal who isn't particularly tech-savvy.

For this I think the cookie should last the length of an autoblock (24 hours for Wikimedia projects), to minimize collateral damage.


Requirements

  • When the system identifies a person is accessing the wiki from an IP address or range that is currently blocked, a cookie should be dropped on their computer.
  • If a user with the active cookie edits from a different IP address, the should see a block message with the original block ID
  • Block cookies should expire after 24 hours or the length of the actual block, whichever is shorter.
  • T191542: Implement event logging for IP cookie blocks to make sure it's working reasonably
  • Add feature flag for block cookie behavior to disable if needed.
  • When in doubt, emulate T5233

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Yes, those requirements refer to Wikimedia installations. The default stays off. In the next major release (1.32) we can consider turning both on by default for third parties. I agree with Max that it would not make sense to turn on a new feature by default, especially whilst a related less-impactful version of the feature has existed for longer and is not enabled by default (in MediaWiki core; WMF is separate)

So you're saying we should create a new patch to enable this by default at a later date?

I don't think we should risk autoblocks gone wild on new wikis either, or those who upgrade to master.

So you're saying we should create a new patch to enable this by default at a later date?

No, doing so is likely out of scope for this task. Wikimedia's configuration is located in the operations/mediawiki-config.git repository. A typical feature will be developed behind a feature flag. The feature itself is then gradually turned on through wmf-config. Typically first on the Beta Cluster, then on a test wiki in prod via mwdebug verification, and then one deployment group at a time. Typically using the same grouping as for the train, but sometimes using custom groups, based on product manager's discretion.

@Krinkle whoops. I mean to say a new ticket to enable this by default at a later date, not a new patch.

I've created T191922 to keep track of it. Setting $wgCookieSetOnIpBlock default to false for now

Let's deploy as disabled and gradually enable.

Although we're highly confident this won't cause any problems, better safe than sorry.

Note that the description shouldn't be "When a person is accessing" but "When it is attempting to edit" (or equivalent), which is when we check for blocks. Attempting to check blocks on every page view would kill the site ☺

Note that the description shouldn't be "When a person is accessing" but "When it is attempting to edit" (or equivalent), which is when we check for blocks. Attempting to check blocks on every page view would kill the site ☺

We're plugging into existing logic, we're not adding any new logic to detect if an IP user is blocked.

And yes, "do not cause whitepages or the wiki to crash" is an implicit requirement. 😆

@dmaza I'm not sure if this is an issue with this change or not, so if it's not I can open a new ticket for you. This isn't a big deal... because as far as I can tell it would only affect the person who was blocked (rather than someone else who gets blocked after that.

I blocked my own IP address and tried to edit:

Screenshot-2018-4-12 View source for Gotham - Local(2).png (1×2 px, 304 KB)

Then I added this to the top of index.php

$_SERVER['REMOTE_ADDR'] = '1.1.1.1';

and did a hard refresh:

Screenshot-2018-4-12 View source for Gotham - Local.png (1×2 px, 303 KB)

Then I did another hard refresh without making any changes:

Screenshot-2018-4-12 View source for Gotham - Local(1).png (2×2 px, 327 KB)

So it looks like the block reason is displayed before the autoblock is created? Again, probably not a big deal, but I thought I'd document it.

@dbarratt — I think that's the same problem that Dayllan was discussing with me at our offsite. IIRC that's just a problem with autoblocks. Can you test please?

Important update: after a privacy review the WMF Legal has signed-off on our implementation of this feature and has given us the go-ahead to release when we're ready.

@dbarratt — I think that's the same problem that Dayllan was discussing with me at our offsite. IIRC that's just a problem with autoblocks. Can you test please?

Yeah, as far as I can tell, it's a problem with autoblocks in general.

Important update: after a privacy review the WMF Legal has signed-off on our implementation of this feature and has given us the go-ahead to release when we're ready.

Just wanted to say that when you do announce this to the community, make sure to include a link or put on the WMF hat to say that legal has signed off. Lesson learned from last time :)

@dmaza If I'm reading this correctly, by adding a cookie, it automatically tells Varnish not to cache the page:
https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/OutputPage.php;d888d84e60acf7d5284762d74a42a12615e045c8$2283

So it should be safe to do it on every page, however, if the user gets the cached (non-cookie) version, then the cookie wont be set... if there is no cached version and the user hits origin, the blocked user's version wont be stored in cache (it will wait for the next non-cookie request).

Am I reading this right @kaldari & @MaxSem?

I also created T194396 to document the problem.

@dbarratt There is also a safeguard in varnish that disables caching if Set-Cookie is present. The reason why we decided to move the set cookie into specific pages that are never cached was to avoid side effects if any of this implicit(?) dependencies change.
At this point, considering that editing a page goes through different paths in mobile, I'm running out of ideas on where this logic should be if it is not on User::loadFromSession().

@dmaza what I'm saying is that it looks like we send a no-cache (or private) Cache-Control header when there is a cookie anyways, so even if the varnish config changed, MediaWiki itself would still be instructing Varnish to not cache the request. (if I'm reading this correctly).

@dbarratt yup, that's why I started my comment saying "there is also" :)

@dbarratt yup, that's why I started my comment saying "there is also" :)

Oh I see. Then yeah I don't really see a problem with just adding it to all the requests like you had it before.

If mobile is troublesome here we can split it into a different task and ship desktop cookie blocking first.

Hey everyone! I'm going to throw a monkeywrench into this task at the last minute. I know that is utterly frustrating, but I hope you'll hear me out...

I don't think we should be creating new autoblocks based on this block cookie (requirement #2 in the description). We don't do this for existing block cookies and I think it would be exceeding what the community is expecting here. Autoblocks (i.e. new automatically created 24-hour block records for IP addresses) are currently only created via user blocks, and only when explicitly chosen as an option during the blocking process (they aren't even an option for anon IP blocks). If we start propagating autoblocks from anon IP blocks, that's a significant change in how autoblocks function. I don't think the community will be expecting this and they may object to the idea of autoblocks being propagated without the blocking admin choosing this as an option during the blocking process.

This behavior also increases the risk of the feature (as mentioned previously by Max). Let's say for example, that all of the UK is temporarily routed through 1 IP address and that IP gets blocked (as has actually happened). That means that everyone trying to edit from the UK will get the blocking cookie and then when they are switched back to their normal IP addresses, thousands of new autoblock records will be created. If we only set the cookies, the situation is easily resolved by unblocking the original IP block. If we're also autoblocking, the admins have to manually unblock thousands of autoblocks.

I would like to suggest changing requirement #2 to just: "If a user with the active cookie attempts to edit from any IP address, they are prevented from doing so." This will match the behavior we are already implementing with the existing block cookies. I think this will make things less confusing for the community and less confusing in the code. (This basically means removing the code changes in Block.php.)

For Trevor's edification: The block cookie doesn't need an autoblock to be effective. It functions on its own as a sticky pointer to the original block. In our existing implementation of block cookies, no additional autoblocks are created. (It's a bit confusing though, as the current block cookies are created as part of the autoblocking process.)

I know that I should have brought this up 2 months ago, but I have to admit that I didn't fully think through the implications of the requirements until now. This was a failure on my part and I apologize. Also, I would love to get a sanity check on my comments above from @MusikAnimal and @MaxSem (who are both administrators and have some familiarity with this workflow).

That sounds right... I think?

If a user with the active cookie attempts to edit from any IP address, they are prevented from doing so

How does it do this without imposing a block on the new IP?

There is a variant of a "hard block" for IPs, which is the "Prevent logged-in users from editing from this IP address" option. I don't know if behind the scenes this creates an autoblock on the account (which in turn may autoblock other IPs used), but either way indeed I wouldn't expect this to happen from the cookie alone. I think having the cookie simply serve as a pointer to the original block, as Kaldari says, is the way to go, if we're able to make it work in this way. The vandals that this feature is intended for are the type who wouldn't know how to manually refresh their IP (or else they'd probably know to clear their cookies, too). The main use case to me is blocking residential IPv6 addresses, which are commonly automatically refreshed within a /64 range. Many admins don't know how to get the /64 range that they should block, or are uncomfortable doing so. Instead they play a tedious game of whack-a-mole, or end up protecting pages, shutting out unrelated IPs. This is an increasingly growing problem as IPv6 implementation becomes more widespread, and I think a cookie would alleviate this.

How does it do this without imposing a block on the new IP?

Checkout the relavent code in User.php. User::getBlockedStatus() first looks for an "actual" block or autoblock in the database (line 1802), and then if it doesn't find that it checks for the cookie instead. If it finds the cookie, it then looks to see if there is a valid "actual" block associated with the cookie's block ID. So as long as the original block is still in place (even if it's for a different IP address), the cookie will work.

Checkout the relavent code in User.php. User::getBlockedStatus() first looks for an "actual" block or autoblock in the database (line 1802), and then if it doesn't find that it checks for the cookie instead. If it finds the cookie, it then looks to see if there is a valid "actual" block associated with the cookie's block ID. So as long as the original block is still in place (even if it's for a different IP address), the cookie will work.

Shows how much I know! Does that mean the new pseudo-autoblocked IP won't show up at Special:BlockList? I guess as long as the "you're blocked" message shows the block ID or the original IP, they can properly request an unblock, so we're in good shape. Taking autoblocks out of the equation sounds wonderful because that would effectively reduce the risk of collateral damage to zero, except maybe for shared mobile devices or laptops that change IPs (and even then, only one person is affected at a time).

  • Block cookies should expire after 24 hours or the length of the actual block, whichever is shorter.

Maybe we don't need this either, then?

I'm sorry I didn't think of these things, too!

To elaborate on the 24-hour cookie expiry -- This sounds good except it won't be as effective for extended blocks, which we'll want to do. So if I'm in a residential /64 range, and I vandalize long-term, ideally the admin could block a single IPv6 address for however long and it will work for all addresses in the range (assuming they're using the same device and browser). If we don't impose any autoblocks it seems there's little risk involved in doing this. Right?

To elaborate on the 24-hour cookie expiry...

I'm fine with us extending the expiration time on block cookies, as there isn't a lot of risk in doing this (since they are tied to actual blocks). However, I don't think that should be done as part of this task/patchset.

@TBolliger What was your reasoning for wanting an autoblock T152462#4050805 ?

To elaborate on the 24-hour cookie expiry...

I'm fine with us extending the expiration time on block cookies, as there isn't a lot of risk in doing this (since they are tied to actual blocks). However, I don't think that should be done as part of this task/patchset.

I agree on both things here. If we are gonna move forward with this I think the cookie expiry should match the block length since this is limited to the device anyway.

I also agree that autoblocking IP(s) is unexpected as part of this feature but I'm not very familiar with the Admin workflow (real scenario) when it comes to blocking/unblocking.

In order to move forward I'd say that we do what Kaldari suggested and remove the autoblock "logic" as part of this patch. And if we are gonna change the cookie expiry we can work it out in a different ticket.

Yay or Nay and I'll get to work :)

If we are gonna move forward with this I think the cookie expiry should match the block length since this is limited to the device anyway.

Yes that is what I meant.

Yay or Nay and I'll get to work :)

Yay! 😃

Dayllan and I discussed this in standup, and we are going to strip anon cookie autoblocking from the requirements. He will make a new patch and submit for code review, then will create a new Phab ticket to extend the cookie duration for the length of the block. 🤘

Oh, and each of us owe Dayllan a beer for his efforts. 🍺

Change 425313 abandoned by Dmaza:
Send a cookie with IP/IP-Range blocks when blocking logged-out users

Reason:
Requirements changed, starting up fresh in a new patch

https://gerrit.wikimedia.org/r/425313

Change 433624 had a related patch set uploaded (by Dmaza; owner: Dmaza):
[mediawiki/core@master] Send a cookie with IP/IP-Range blocks when blocking logged-out users

https://gerrit.wikimedia.org/r/433624

Oh, and each of us owe Dayllan a beer for his efforts. 🍺

haha, no need.

Now I wonder if we need some version of https://gerrit.wikimedia.org/r/#/c/427938/. We would obviously not track autoblocks 'cause that's not happening anymore but do we want to track if we are setting the cookie? I think we don't need to. What do you think?

Now I wonder if we need some version of https://gerrit.wikimedia.org/r/#/c/427938/. We would obviously not track autoblocks 'cause that's not happening anymore but do we want to track if we are setting the cookie? I think we don't need to. What do you think?

Good idea. I modified the patch slightly to track the two block cookie events.

We should remove this debug tracking after we've been able to determine the effectiveness of the feature. It's in User::trackBlockWithCookie() for future reference.

@MaxSem: After this new code is deployed via the train, can you help David and Dayllan with the debug tracking? Please teach them how to access and filter the debug logs. For reference, the two debug strings to look for are:

wfDebug( __METHOD__ . ': User is IP blocked, setting cookie to track' );
wfDebug( __METHOD__ . ': User is autoblocked, setting cookie to track' );

Well, for starters wfDebug() is useless as it doesn't go anywhere on most wikis. All output that's actually intended to be monitored in prod should be sent to a specific logging channel via Monolog, or - in the worst case - wfDebugLog().

@MaxSem: What's the simplest way to be able to monitor these two events in production (just temporarily)? I took a look at https://www.mediawiki.org/wiki/Manual:MonologSpi, but it looks extremely complicated. We could use EventLogging, but that also seems like overkill. Is wfDebugLog() really a bad solution for this?

wfDebugLog() calls Monolog internally, however it's always better to inject the logger instead of relying on global state. If you want to call it from existing classes like User, global state is often an unavoidable evil.

@MaxSem: The documentation on logging is a bit of a mess :P Would something like this make sense:

use MediaWiki\Logger\LoggerFactory;
$logger = LoggerFactory::getInstance( __CLASS__ );
$logger->info( __METHOD__ . ': User is IP blocked, setting cookie to track' );

or would it make more sense to have a custom logging channel like

$logger = LoggerFactory::getInstance( 'cookieblocks' );

First of all, what's that logging supposed to achieve? The current message provides no additional information - if all that's needed is to know the number of block propagations, Graphite would work nicely. However, if you need to debug, the message needs block IDs, usernames or whatever you may find useful. Let's set the goal clearly first.

@MaxSem: The main goal is to see how many people are affected by setting a cookie for IP blocks. It would also be good to be able to compare this against how many people are affected by setting a cookie for autoblocks. We need numbers for this feature specifically (for the QCIs) rather than just a general change in the number of blocks. We don't need to collect the data permanently, however.

I don't really see the point on getting "numbers" for this. When enabled, it will create a cookie for all logged-out users on IP blocks without exceptions. How is this information gonna be useful?

If you need the numbers, use Graphite.

@dmaza: We need the numbers to see if it's working as intended and to be able to report how many people it's affecting. Most IP addresses that are blocked will not get the cookie, since most are either preventative proxy blocks or blocks of single-use sock-puppets.

@MaxSem: I'm fine with using Graphite, but how do you download data through Graphite to get the actual numbers? I've only used it to view graphs. Regardless of which solution works best, please work with Dayllan to get some sort of temporary tracking implemented by the end of the week so we can get this patch merged. Thanks!

@dmaza: Please work with Max to figure out the proper tracking solution and get it implemented. If Graphite is the best way to go, it should look something like:

$stats = MediaWikiServices::getInstance()->getStatsdDataFactory();
$stats->increment( 'block.ipblock.setCookie.success' );

I've updated the patch to use statsd and removed the logging for cookies on autoblock, I don't think we need that anymore.

I'm fine with using Graphite, but how do you download data through Graphite to get the actual numbers?

The only way I know is by creating the graph you want and using it's url with format=json|csv. If there is some other way I would like to know too.
Grafana also allows you to export if I remember correctly.

@dmaza, MaxSem: I've merged the patch. Be sure to monitor for problems during both the Tuesday deployment to the test wikis and Thursday's deployment to Wikipedia. That means monitoring the error logs, checking the Graphite data, and keeping an eye on the relevant community discussion boards (on English Wikipedia, that's probably the Administrator's Noticeboard and the Technical Village Pump). Dayllan, if you need any help figuring out how to monitor the error logs during the deployment, please ask Max.

Change 433624 merged by jenkins-bot:
[mediawiki/core@master] Send a cookie with IP/IP-Range blocks when blocking logged-out users

https://gerrit.wikimedia.org/r/433624

@kaldari this is behind a config flag $wgCookieSetOnAutoblock. I'll enable it tomorrow in the morning swat for test Wikipedia. Is there a list of wiki(s) you would rather have this enable first when it gets deploy on Thursday?

@dmaza In hindsight, we should have probably determined if the user had a session and then set the cookie T194396#4498543