T273741: Investigate unusual media traffic pattern for AsterNovi-belgii-flower-1mb.jpg on Commons
https://phabricator.wikimedia.org/T273741

Status: Open, Medium, Public
Assigned To: None
Authored By: Joe, Wed, Feb 3, 11:21 AM
Tags: SRE (Backlog), Traffic (Triage), Commons (Incoming), Patch-For-Review
Subscribers: aaroncarson0, Addshore, Aklapper, Amorymeltzer, AMuigai, AntiCompositeNumber, Asartea (69 subscribers in total)

Description

Please avoid adding drive-by comments such as "hello from Hacker News" to this task as they are not helpful. Thank you.

We've noticed today that we get about 90M requests per day from various ISPs in India, all with the same characteristics:

URL: https://upload.wikimedia.org/wikipedia/commons/thumb/1/16/AsterNovi-belgii-flower-1mb.jpg/1280px-AsterNovi-belgii-flower-1mb.jpg
Referer: "-"
User-Agent: "-"

These requests are very strange: they come from wildly different IPs, yet follow a daily traffic pattern. We are therefore hypothesising that some mobile app predominantly used in India hotlinks the above image, e.g. for a splash screen.

We need to investigate this further, as this kind of request constitutes about 20% of all requests we get in EQSIN for media.
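A minimal sketch, in Python, of how the pattern described above could be counted in a sample of request records. It assumes each record is a plain dict; the keys `uri_path` and `user_agent` mirror column names used in the Hive query later in this task, while `referer`, `is_suspect` and `suspect_share` are hypothetical names chosen only for this illustration. It is not the tooling that was actually used.

```python
# Path component of the URL given in the task description.
TARGET_PATH = (
    "/wikipedia/commons/thumb/1/16/AsterNovi-belgii-flower-1mb.jpg"
    "/1280px-AsterNovi-belgii-flower-1mb.jpg"
)


def is_suspect(record):
    """True if a record matches the pattern: target image, no UA, no referer."""
    return (
        record.get("uri_path") == TARGET_PATH
        and record.get("user_agent") == "-"
        and record.get("referer") == "-"
    )


def suspect_share(records):
    """Fraction of the given records that match the suspicious pattern."""
    records = list(records)
    if not records:
        return 0.0
    return sum(is_suspect(r) for r in records) / len(records)
```

Run over a day of media requests for one datacenter, `suspect_share` would give the kind of figure quoted in the description (about 20% for EQSIN).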

Details

Project: operations/puppet
Branch: production
Lines +/-: +40 -0
Subject: upload-frontend: ban a specific url with no referer nor UA

Related Objects

Task Graph:
* Restricted Task
* T273741 Investigate unusual media traffic pattern for AsterNovi-belgii-flower-1mb.jpg on Commons (Status: Open, Assigned: None)

Mentioned In: T274228: Phabricator should cache tasks for a few minutes for logged-out users

Event Timeline

There are a very large number of changes, so older changes are hidden.

jcrespo added a parent task: Restricted Task. (Sat, Feb 6, 2:27 PM)

Milimetric, Mon, Feb 8, 5:28 PM:
+1 to @Gilles's idea. Reverse image searches don't yield anything obvious.

mforns, Mon, Feb 8, 6:00 PM:
The crazy request volume starts in July 2020:
https://pageviews.toolforge.org/mediaviews/?project=commons.wikimedia.org&platform=&referer=all-referers&start=2020-01-01&end=2020-12-31&files=AsterNovi-belgii-flower-1mb.jpg

elukey, Mon, Feb 8, 6:17 PM:
In T273741#6812144, @mforns wrote:
    The crazy request volume starts in July 2020:
    https://pageviews.toolforge.org/mediaviews/?project=commons.wikimedia.org&platform=&referer=all-referers&start=2020-01-01&end=2020-12-31&files=AsterNovi-belgii-flower-1mb.jpg
Note for readers: on the left panel there is an "Agent" dropdown; make sure "Spider" is selected. I got "User" autoselected and it wasn't showing the right picture. Thanks Marcel for the link!

mforns, Mon, Feb 8, 7:04 PM:
If it's an app, it would need to be very popular. Maybe Aarogya Setu, the app for reducing Covid infections? IIUC it's mandatory in India.

Samwalton9, Mon, Feb 8, 7:08 PM:
    Maybe Aarogya Setu, the app for reducing Covid infections?
If it is, it isn't part of the initial app setup process, which I just tested out of curiosity. Got stuck on needing to add my phone number :)

nshahquinn-wmf, Mon, Feb 8, 7:21 PM:
In T273741#6812424, @mforns wrote:
    If it's an app, it would need to be very popular. Maybe Aarogya Setu, the app for reducing Covid infections? IIUC it's mandatory in India.
I just installed it and poked around (I was able to do all the initial setup and get to the main functionality), but I didn't see that photo anywhere. There actually weren't any photos at all (just illustrations and videos), so it seems very unlikely that it was somewhere in the app that I didn't see.

Michaelrhanson, Mon, Feb 8, 10:41 PM:
I found several places where this URL is being used in sample code, which might have been picked up by somebody and built into an app:
https://stackoverflow.com/questions/18586466/foursqaure-photo-add-against-checkin
https://stackoverflow.com/questions/18232898/node-js-http-get-with-node-js-step-module
https://html.developreference.com/article/14455997/Downloading+image+from+the+web+with+imagemagick+and+saving+to+parse
It seems like this particular flower has been kicking around as a sample image for quite a few years.

Daniel.gayo, Mon, Feb 8, 10:42 PM:
Could it be this app? https://apps.apple.com/hk/app/iclass-corporate/id1439400748?l=en
The picture appears in a screenshot...

Legoktm, Mon, Feb 8, 11:07 PM (edited):
@spinda found that this image is used in quite a few different places:
* https://github.com/triniwiz/nativescript-image-cache-it/issues/11
* https://github.com/veerajongit/image-loader/blob/master/src/index.html
* https://stackoverflow.com/questions/18586466/foursqaure-photo-add-against-checkin
* https://stackoverflow.com/questions/18232898/node-js-http-get-with-node-js-step-module
* https://ti-qa-archive.github.io/question/131027/how-can-i-deal-with-a-memory-leak-when-reading-httpclient-download-progress-from-ondatatream-on-android.html

Michaelrhanson, Mon, Feb 8, 11:10 PM:
Hm! It is included in the imagenet URL list, I think. Could we be looking at some CV training pipeline that's not caching properly?
http://image-net.org/api/text/imagenet.synset.geturls?wnid=n11934807

ssingh, Mon, Feb 8, 11:14 PM (edited):
In T273741#6813531, @Michaelrhanson wrote:
    I found several places where this URL is being used in sample code, which might have been picked up by somebody and built into an app:
    https://stackoverflow.com/questions/18586466/foursqaure-photo-add-against-checkin
    https://stackoverflow.com/questions/18232898/node-js-http-get-with-node-js-step-module
    https://html.developreference.com/article/14455997/Downloading+image+from+the+web+with+imagemagick+and+saving+to+parse
    seems like this particular flower has been kicking around as a sample image for quite a few years.
It is most likely an app, given the header information above and also based on some other connection attributes. The question is which app, though, as some of us have gone through the popular apps in India but haven't been able to identify which app it is. It is also possible that the code was embedded in some app and that it requests the image but does not display it.

ssingh, Mon, Feb 8, 11:14 PM:
In T273741#6813536, @Daniel.gayo wrote:
    Could it be this app? https://apps.apple.com/hk/app/iclass-corporate/id1439400748?l=en
    The picture appears in a screenshot...
Unlikely, given the volume of the requests and the popularity/rating of this app.

fdans, Mon, Feb 8, 11:18 PM:
In T273741#6813616, @Michaelrhanson wrote:
    Hm! It is included in the imagenet URL list, I think. Could we be looking at some CV training pipeline that's not caching properly?
    http://image-net.org/api/text/imagenet.synset.geturls?wnid=n11934807
That's an interesting idea, and that list includes several other Commons images, but none with as much traffic as OP's.

fdans, Mon, Feb 8, 11:29 PM (edited):
As was suggested on Twitter, this surge coincides almost perfectly with the ban of TikTok, as well as 223 other Chinese apps, in India (Wiki article).

AntiCompositeNumber added a project: Commons. (Mon, Feb 8, 11:46 PM)

Joe, Tue, Feb 9, 12:26 AM:
Another suggestion coming from Twitter is https://play.google.com/store/apps/details?id=com.app.rcn, which anyway doesn't seem popular enough (in India specifically) to cause that volume of requests.
At this point, I'd bet it's one of those two mobile apps, possibly both.
I would suggest that we start banning requests for this image without a UA, while we try to contact the app authors. It will likely have the side effect of breaking some code samples using that (admittedly beautiful) photo.

Ladsgroup, Tue, Feb 9, 12:32 AM:
I don't have much knowledge about India's internet infrastructure, but I can speak from experience with Iran and its blocking of apps/websites.
They show you a page from a reserved IP (so it's not accessible to the outside) saying "Sorry, this page is not accessible in Iran". The image might be part of such a page, and we can't see it because of that, especially given that the rise coincides with the block of TikTok in India. What happens if you try to access TikTok (or any other blocked app/website) in India?
I totally understand that Iran's "internet" is different from what most countries have, so I might be talking rubbish here.

Preinheimer, Tue, Feb 9, 12:59 AM:
Going to the TikTok website from India results in the regular TikTok page loading, with a banner from TikTok saying that the service is unavailable in India. Not a dedicated block page.
Screenshot (I have access to proxy servers in India).

Dzahn, Tue, Feb 9, 1:13 AM (edited):
https://newshimalaya.com/2021/02/09/%E2%9A%93-t273741-investigate-unusual-media-traffic-pattern-for-asternovi-belgii-flower-1mb-jpg-on-commons/
^ wut? I tried to search for links to this image and found... this Phabricator ticket content on a Nepali news site?

varenc, Tue, Feb 9, 1:35 AM:
In T273741#6813823, @Preinheimer wrote:
    Going to the TikTok website from India results in the regular TikTok page loading, with a banner from TikTok saying that the service is unavailable in India. Not a dedicated block page.
The lack of a User-Agent and all other distinguishing headers means this can't be coming from a web browser. I would look up the IPs and see if some of them are from IP blocks associated with cellular providers.
Since in general only mobile phones are on those IPs, that'll provide some strong evidence that this is from a mobile app. My guess is an online connectivity check?
I assume the detailed IP/request logs aren't public, or I'd go investigate this myself.

tomglynch, Tue, Feb 9, 3:16 AM (edited):
Hi all,
I've been doing a bit of research into possible apps that could be causing this and found two potential culprits that I am currently investigating.
The first is Mitron TV (news article here), an Indian TikTok alternative which was made available again on the app store on June 6th.
The second is Say Namaste (news article here), an Indian Zoom alternative which was launched on the app stores on June 9th.
Both fall into the timeline of the huge increases, have millions of users, and may be using '1280px-AsterNovi-belgii-flower-1mb.jpg' to check the user's internet connection, especially Say Namaste, to ensure video connectivity.
I've reached out to some developers at both companies and will report back. Let me know your thoughts.
EDIT: I have also noticed that the dates match the reopening after lockdown for the whole of India: "This first phase of reopening was termed as "Unlock 1.0"[13] and permitted shopping malls, religious places, hotels and restaurants to reopen from 8 June." (from Wikipedia)
Tom

ssingh, Tue, Feb 9, 3:29 AM:
Thank you everyone for the comments and suggestions. I just wanted to share that we have identified the app and will update this task tomorrow. (And yes, it is a mobile app.)
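
The address check suggested by varenc above could be sketched roughly as follows in Python, using only the standard `ipaddress` module. The function name `address_summary` and the idea of passing in a caller-supplied list of carrier prefixes are assumptions made for this illustration; the task does not say which prefixes or tools were used, and the request logs are not public.

```python
import ipaddress


def address_summary(addresses, carrier_networks):
    """Count how many client addresses are IPv6 and how many fall inside
    the given (caller-supplied, hypothetical) list of carrier prefixes."""
    networks = [ipaddress.ip_network(n) for n in carrier_networks]
    total = ipv6 = in_carrier = 0
    for raw in addresses:
        ip = ipaddress.ip_address(raw)
        total += 1
        if ip.version == 6:
            ipv6 += 1
        # Membership is simply False when address and network versions differ.
        if any(ip in net for net in networks):
            in_carrier += 1
    return {"total": total, "ipv6": ipv6, "in_carrier": in_carrier}
```

A high share of addresses inside known cellular prefixes, or a high IPv6 share, would support the mobile app theory without identifying any particular app.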

mmodell, Tue, Feb 9, 4:33 AM:
In T273741#6813839, @Dzahn wrote:
    ^ wut? I tried to search for links to this image and found... this Phabricator ticket content on a Nepali news site?
Looks like it's just a big RSS aggregator?

TheOv3rminD, Tue, Feb 9, 4:55 AM:
In T273741#6813995, @mmodell wrote:
    Also, hello hacker news! https://news.ycombinator.com/item?id=26072025
Hello from us Hacker News readers ;)

Majavah updated the task description. (Tue, Feb 9, 7:30 AM)

Gilles mentioned this in T274228: Phabricator should cache tasks for a few minutes for logged-out users. (Tue, Feb 9, 8:58 AM)

Shizhao, Tue, Feb 9, 9:00 AM:
Rename to a new filename?

AntiCompositeNumber, Tue, Feb 9, 4:01 PM:
In T273741#6814266, @Shizhao wrote:
    Rename to a new filename?
The Commons community generally avoids moving files, as it can break attribution and cause issues for (reasonable) external reusers. "Turn it off and see who screams" is a valid method when less disruptive methods fail, but it appears to be unnecessary in this case.
Others have previously suggested serving a different file for users with the matching user agent header, which wouldn't break all external links to the file. While that solution would require more work than simply moving the file, it may also have been more effective (depending on how it was used).

Mvolz updated the task description. (Tue, Feb 9, 4:01 PM)

gerritbot, Tue, Feb 9, 4:27 PM:
Change 663004 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] upload-frontend: ban a specific url with no referer nor UA
https://gerrit.wikimedia.org/r/663004

gerritbot added a project: Patch-For-Review. (Tue, Feb 9, 4:27 PM)

ssingh, Tue, Feb 9, 5:17 PM:
Update: Thank you for the interest in this task!
Like we shared yesterday, we have identified that the traffic is coming from a popular mobile app in India. We have initiated contact with the app developers, and are waiting to hear back from them. In the meantime, given the volume of requests, we have decided to ban those specific requests until the issue is resolved.
While we will refrain from naming the app at this time, we can share that it is not on the list of apps mentioned in this task. Nevertheless, we thank you for your comments and suggestions on how to debug this!
Since there has been some interest in how we narrowed it down to this particular app:

1. The header attributes suggested that this was a mobile app. We then queried Hive and determined that the connections with these header values (User-Agent and Referer) came mostly from IPv6 addresses, further confirming the theory that this was a popular mobile app.
2. We then tried to isolate connections from geographical regions and ISPs in India, but it was clear that there was no pattern there, as users were spread across the country.
3. A few things were clear given the volume of the requests: it was a popular app with traffic throughout the day (and even late at night), with a peak on December 31 2020, suggesting that it may be a chat or social media app.
4. We noticed that the image/app gained popularity somewhere around the time India blocked Chinese internet services and websites, thus affecting popular apps in India like TikTok. (This was pointed out by a user.)
5. Based on the information above, we gathered a list of popular chat and social media applications in the country, especially apps that gained popularity after the above censorship event.
6. We first started by downloading and running these apps to see if we could identify the image in their splash screens or within the apps. We also asked the community on the ground and there are many unnamed people who helped us with this -- thank you!
7. This unfortunately didn't work, as none of the apps we tested had the image anywhere, neither in the splash screen nor in the apps themselves. The community in India was equally surprised, given the popularity of this image/app and the fact that they had not seen it in their daily usage.
8. It was then speculated that the app fetches the image but does not show it. (This was based on this comment.)
9. To recap, we were aware of the following at this stage:
   + it is a popular chat/social media mobile app used in India
   + it sets the User-Agent and Referer to '-'
   + it fetches the image from Wikimedia Commons but does not display it
10. To narrow down the app, we decided to observe connections to the image from clients (phones) to our servers. We did this by opening the popular apps one by one and noting down the time. After doing this for all the apps, we then ran this query in Hive:

    SELECT * FROM wmf.webrequest
    WHERE year=2021 AND month=2 AND day=9
      AND parse_media_file_url(uri_path).base_name='/wikipedia/commons/1/16/AsterNovi-belgii-flower-1mb.jpg'
      AND webrequest_source='upload'
      AND uri_host = 'upload.wikimedia.org'
      AND user_agent='-'
      AND ip=;

11. We then found the specific app that was making the request by matching the time when it was opened against the time the image was requested from our servers, restricting the results to the User-Agent '-' and to the IP we tested from.
12. By this time, we had isolated the app and were convinced that this is the one that is fetching the image on startup. We could not find the image anywhere in the app, confirming our theory that it fetches the image but does not display it.
13. To further confirm this finding and to ensure that we had the correct app, we decided to log DNS queries from a phone by setting up a local resolver to capture DNS traffic. After pointing the phone towards it and launching the app, we noticed that it was indeed the one looking up upload.wikimedia.org on startup.
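
The time matching in steps 10 and 11 above can be illustrated with a short Python sketch. It assumes two inputs that the write-up only describes informally: a mapping from app name to the noted launch time, and the timestamps of the matching image requests returned by the Hive query for the test IP. The function name `match_app_launches` and the 10-second window are hypothetical choices for this example, not details taken from the investigation.

```python
from datetime import datetime, timedelta


def match_app_launches(launches, request_times, window_seconds=10):
    """For each app, list the image requests seen within `window_seconds`
    after the app was opened. Apps with no such request are left out."""
    window = timedelta(seconds=window_seconds)
    matches = {}
    for app, opened_at in launches.items():
        hits = [t for t in request_times if opened_at <= t <= opened_at + window]
        if hits:
            matches[app] = hits
    return matches
```

If exactly one app ends up in the result, it is the candidate to confirm with an independent check such as the DNS capture described in step 13.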

fdans, Tue, Feb 9, 5:23 PM:
@ssingh I said it yesterday on chat, but this is such stellar data detective work. Congrats on finding the culprit!!

Majavah, Tue, Feb 9, 5:25 PM:
Is the effect that the block will have in the app known?

calbon, Tue, Feb 9, 5:49 PM:
Just to second what @fdans said, the data detective work was great and this was such a fun ticket to watch.

Joe, Tue, Feb 9, 6:02 PM:
In T273741#6815874, @Majavah wrote:
    Is the effect that the block will have in the app known?
No, hence we tried to reach out to them, although it seems that there is no good way to get in touch with them through email (I sent an email to all publicly available channels, only to get back an autoresponder that assumes I'm a user of the app and asks for my phone number). I eventually resorted to DMing their CEO on Twitter. I think this is more than enough courtesy on our part.
Anyway, the block will clearly link to this task as the reason for the block, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/663004/4/modules/varnish/templates/upload-frontend.inc.vcl.erb#380, so that whoever is responsible for this can figure out how to reach us.

Content licensed under Creative Commons Attribution-ShareAlike 3.0 (CC-BY-SA) unless otherwise noted; code licensed under GNU General Public License (GPL) or other open source licenses.