Topic on User talk:BrokenSegue

Mokadoshi (talkcontribs)

Following up from the GitHub issue where I said I wanted to discuss the frequency of updates.

I've read Wikidata:Requests for comment/Frequency of YouTube follower count data and I understand the consensus is that we should only update subscriber numbers once per year, or whenever the count differs by 10% or more. I'm perfectly fine with this and agree with the consensus. What I would like to better understand, though, is how we will deal with manual updates. On enwiki, IP users and other users update these numbers extremely often. I got emailed yesterday about a change that differed by about 0.25%. If these users can no longer update the enwiki article directly and come here to do it instead, is that going to cause problems?
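To make the rule above concrete, here is a minimal sketch of the "once per year, or 10% or more" check. It is only an illustration and not BorkedBot's actual code: the function name `should_update` and its parameters are hypothetical, and it assumes the 10% is measured relative to the last stored value.

```python
def should_update(old: int, new: int, days_since_last: int,
                  min_change_percent: int = 10,
                  max_age_days: int = 365) -> bool:
    """Hypothetical check: update if the stored value is old enough,
    or if the count moved by at least min_change_percent."""
    # Age-based rule: refresh once the stored value is a year old.
    if days_since_last >= max_age_days:
        return True
    # Nothing changed, so there is nothing new to record.
    if new == old:
        return False
    # Change-based rule, in integer arithmetic to avoid float rounding.
    # Assumption: the change is measured against the stored value.
    return abs(new - old) * 100 >= old * min_change_percent
```

Under these assumptions, a 0.25% change like the one mentioned above (say 1,000,000 to 1,002,500 after 30 days) would not trigger a bot update, while a jump to 1,200,000 would.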

I wonder if there is a compromise. For example, what if we stored 2 different values: one that tracks the value over time and one that is simply overwritten with the latest value? Then we can update the latter as often as we want, and this is the value we display on the enwiki article. Or am I misunderstanding the motivations behind the frequency we use today? Mokadoshi (talk) 22:58, 20 February 2024 (UTC)
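A rough sketch of the "2 value" idea, purely to illustrate the shape of the data. The class `SubscriberRecord` and its fields are hypothetical; this does not describe how Wikidata statements are stored or how any bot writes them.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class SubscriberRecord:
    """Hypothetical model: a throttled history plus one 'latest' value."""
    # Long-term time series, appended to only occasionally.
    history: List[Tuple[date, int]] = field(default_factory=list)
    # Single value that is overwritten on every update.
    latest: Optional[Tuple[date, int]] = None

    def record(self, when: date, count: int, keep_in_history: bool) -> None:
        # The latest value is always overwritten.
        self.latest = (when, count)
        # The history only grows when the caller decides the point is worth keeping.
        if keep_in_history:
            self.history.append((when, count))
```

For example, a frequent update would call `record(..., keep_in_history=False)`, so the displayed value stays fresh while the history only grows on the slower schedule.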

BrokenSegue (talkcontribs)

Nothing stops users from updating the counts more frequently than the bot. The agreement in the RfC only restricts what BorkedBot does. I don't think random users updating certain popular youtubers more often is a problem. I'm more worried that people inexperienced with Wikidata will not know how to update this information (it requires a few steps that aren't obvious to newbies). That's why I think someone (I would love Wikimedia Germany to do this) should make a "friendly" UI geared to making specific kinds of edits (e.g. add a new subscription count) in a simpler way.

The "2 value" idea you suggest was previously discussed. Wikidata generally has a bias against deleting old (but true) values, though there isn't much precedent here for the kinds of time series data we're considering. I'm not sure the idea fully resolves the problem, though. We want to limit the number of values stored and also not crowd the history too much. I'm wondering what update schedule you would want? Weekly? Every 5%? One thing we could consider is having a list of "prominent" youtubers whose values we want to update more frequently.

A bot to compare ourselves to is User:Github-wiki-bot (whose output is also being used in enwiki) which has made hundreds of edits to e.g. webpack (Q22909730) and does so even for items without wiki articles to consume the info.

Mokadoshi (talkcontribs)

I'm personally perfectly fine with the current implementation of 10%/60 days, checked weekly. It's just that I know this will be too slow for some editors; I've already seen discussions where some people said they would like these to be updated no slower than once per month. Also, the number has to be updated quickly after a channel reaches a major milestone, as an announcement video or tweet will probably cause people to check the article.

So, what do you think about this proposal?

First, we keep the 10%/60 day threshold with the weekly checks, but we add in a daily cron to check the channels that are close to a new milestone. The RFC already allows us to do an out-of-band update for crossing milestones. This fixes the problem of people updating to match an announcement, but doesn't fix the problem of data generally being around 60 days old.

Second, I like your idea of the "prominent" youtubers. I don't know yet who would be on this list; I think we can play it by ear. These could be updated each month or so, assuming we are still covering the milestones as said above. I'm guessing we'd need to do another RFC for this, but we can cross that bridge once we get to it.

BrokenSegue (talkcontribs)

Yeah I mean we could do another RfC for this, but the GitHub wiki bot had no such process and it spams way more than BorkedBot. I'm just trying to be a good citizen. We could also go the "ask forgiveness" approach.

What counts as a milestone? I presume 1 million. Is 200k a milestone? Just the levels at which YouTube hands out awards?

As for identifying "prominent youtubers", we could just make a wiki page somewhere on enwiki and let people manually list youtubers to get more frequent updates? It's akin to having the "update now" website but more async and documented.

Once per month seems a bit much to me personally. Many youtubers are stagnant for years. If we want this then we would have to do something like the "2 value" idea you mentioned.

Mokadoshi (talkcontribs)

My goal really is just to get ahead of editors feeling like they need to make manual edits, because that is where we run into problems like: how do we teach them how to do this? do we need to make tooling to make it easier for them? etc. It is my opinion that a major way we can get ahead of this is to be proactive in updating for crossing milestones (defined in the RFC as 100k, 1M, 10M, etc). So my suggestion is that if someone is at 99k or similar, we should be checking each day to be the first to update the value when they break 100k so another editor doesn't have the opportunity to fuss with manual edits to do it themselves.
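The milestone idea above could be sketched as follows. This is a hedged illustration, not the bot's implementation: the helper names and the 1% "near" margin are assumptions chosen so that 99k counts as close to 100k, and the milestones are taken to be 100k, 1M, 10M and so on, as stated above.

```python
def next_milestone(count: int, first: int = 100_000) -> int:
    """Smallest milestone (100k, 1M, 10M, ...) strictly above count."""
    milestone = first
    while milestone <= count:
        milestone *= 10
    return milestone


def crossed_milestone(old: int, new: int) -> bool:
    """True if going from old to new passes at least one milestone."""
    return new >= next_milestone(old)


def is_near_milestone(count: int, margin_percent: int = 1) -> bool:
    """True if count is within margin_percent of the next milestone.
    The 1% default is an assumption, not something the RFC defines."""
    target = next_milestone(count)
    # Integer arithmetic keeps the boundary case (e.g. 99,000) exact.
    return (target - count) * 100 <= target * margin_percent
```

A daily job could then recheck only the channels for which `is_near_milestone` is true and write a new value as soon as `crossed_milestone` holds.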

Re: the "prominent youtubers" thing, I think that's a good idea. I think we should implement code to support this but leave it to the WikiProject to deal with figuring out who should be in that list.

Re: once per month, I understand your concern. I know some people expressed interest in once per month, so it would be a shame to put in work to get this adopted just to have people vote against it because of this problem. But maybe we should just cross that bridge when we get to it: if this is really a critical problem for people then it wouldn't be hard to change the code later.

So to summarize, if you agree about the daily checking for milestone crossings, then we can talk on Github about implementing this. The "prominent youtubers" thing I think we should also write an implementation for, but this doesn't need to be done now.

BrokenSegue (talkcontribs)

To implement milestones we just need to run the whole script daily. While it would be optimal to do the "check what's near" logic, I think it's just easier to recheck them all daily. I think it only takes 1h to run anyway and compute is free (to me). I can do that.

The prominent yt thing requires new work but isn't hard. We could also use a hidden Wikipedia category for this logic? Or maybe a template param? I'm indifferent.

Yeah I think optimally we would have a custom UI where it asks you for the new sub count and it writes to wikidata for you. In a perfect world this would be reusable for lots of cases where updating wikidata by inexperienced users would be wanted. Realistically that isn't going to happen though (unless you want a project to do?) and might even require collaboration with WMF-Germany (though I think they would be game, I hear they are interested in this direction maybe). Otherwise maybe we could just record a video explaining how to do it?
