[HN Gopher] 20% of requests for Wikimedia Commons are for one im...
       ___________________________________________________________________
        
       20% of requests for Wikimedia Commons are for one image of a flower
        
       Author : IfOnlyYouKnew
       Score  : 1298 points
       Date   : 2021-02-08 23:53 UTC (23 hours ago)
        
 (HTM) web link (phabricator.wikimedia.org)
 (TXT) w3m dump (phabricator.wikimedia.org)
        
       | andi999 wrote:
       | Title is misleading. It is 20% of requests to the eqsin cluster
       | located at Changi airport.
        
         | tassu wrote:
         | It isn't _at_ the airport, that's just the closest airport. The
         | WMF names its clusters with the initials of the data center
         | vendor and the closest airport's code.
        
           | andi999 wrote:
           | True. Hahaha
        
       | mlester wrote:
       | anyone check if it's steganography?
        
         | dane-pgp wrote:
         | Nice idea, but if it's getting 90 million requests per day,
         | then either there are a lot of people requesting the same
         | message (so it's not very secret), or the few people requesting
         | it are very forgetful (in that they have to keep re-requesting
         | the same message multiple times per day).
         | 
         | I suppose the contents of the image/"message" could change
         | every day, but presumably that would be very obvious in the
         | edit history of that file[0], unless Wikimedia Commons were
         | suppressing the fact that the file is constantly changing. If
         | they are part of the conspiracy, though, you'd think they would
         | have taken down the task from Phabricator too.
         | 
         | [0] https://commons.wikimedia.org/wiki/File:AsterNovi-belgii-
         | flo...
        
       | jacquesm wrote:
       | At the height of the browser wars I once woke up to Microsoft
       | hotlinking a small button for downloading our software from the
       | MSN homepage. I tried to reach someone there for hours but nobody
       | cared enough to do something about it. The image was small (no
       | more than a few K), but the millions of requests that page got
       | were enough to totally kill our server.
       | 
       | Finally, I replaced the image on there with a 'Netscape Now'
       | button. Within 15 minutes the matter was resolved.
        
         | rdescartes wrote:
         | One of my friends used the same strategy to block DDOS from
         | China : just put "Falun Gong" on there and it was resolved
         | instantly.
        
           | himinlomax wrote:
           | I remember someone doing that with the goatse picture. The
           | hotlinker was pissed and all sorts of amusing drama ensued.
        
             | MisterTea wrote:
             | That was popular in the early ebay days when you had to
             | host your own images. A friend had someone selling similar
             | items using his image links. So he changed the images to
             | goatse. Problem solved.
        
             | gipp wrote:
             | The Tribalwar forums did this to CNN after 9/11, CNN had
             | hotlinked one of those images where people were trying to
             | pick out "demon faces" in the smoke
        
             | daveslash wrote:
             | That was exactly how I learned what goatse was. My MySpace
             | page was all decked out with images that I was hotlinking
             | from some server... The server owner realized this and
             | replaced all the images with Goatse. One day a friend goes
             | _"Hey... uh, what's up with your MySpace page... that's
             | pretty gross"_. So I went to log in: Goatse. Goatse
             | _everywhere_ (gestures with hand). And my eyes were never
             | the same again tth_tth
             | 
             | Edit: grammar.
        
           | thaumasiotes wrote:
           | > One of my friends used the same strategy to block DDOS from
           | China : just put "Falun Gong" on there and it was resolved
           | instantly.
           | 
           | ...because attacks from China are horrified at the thought of
           | disrupting Falun Gong?
        
             | dspillett wrote:
             | Because it is one of the things that will get you added to
             | the blocklists that form part of the Great Firewall of
             | China.
             | 
             | It won't stop a hacker who is probably bypassing parts of
             | that anyway, but the more casual requests such as those
             | caused by deep linking will generally stop getting through.
        
         | snoshy wrote:
         | The old school response, weaponized without being
         | inappropriate.
        
         | rattray wrote:
         | That's hilarious!
         | 
         | Did they continue to link to your software after that? (I'm
         | curious - what was your software?)
        
           | jacquesm wrote:
           | Yes, they did, they actually thought it was quite funny. They
           | even cached the actual download once they realized we
           | wouldn't be able to deal with that either. The software was
           | the first version of the public peer-to-peer webcam software
           | I wrote:
           | 
           | http://web.archive.org/web/20000510010712/http://www.camarad.
           | ..
        
             | phinnaeus wrote:
             | Oh my! This is a blast from the past. I was a kid, probably
             | 10 years old or something, and I had a LEGO MovieMaker
             | webcam. I was trying to set it up as a sort of
             | security/monitoring camera for the back door of the small
             | business my parents ran. I remember using this software and
             | supposedly getting it working.
             | 
             | I invited my parents to come see what I had done, and
             | somehow typed the website wrong and ended up on a spanish-
             | language porn site. I could not hit the back button fast
             | enough. Possibly one of the most embarrassing memories of
             | my childhood.
             | 
             | I have no idea what my parents thought I was up to.
        
               | alickz wrote:
               | Haha I know that pain.
               | 
               | When I was a kid I asked my mom to print me out Grand
               | Theft Auto cheats from Gamewinners.com while she was in
               | work.
               | 
               | Somehow I got the address wrong and she wanted to know
               | why I wanted to print out pages and pages from a site
               | dedicated to men cheating on their wives. Got there in
               | the end though and I still have some of those GTA cheats
               | memorised.
        
               | acct776 wrote:
               | Your mom might have another family in the Greater Toronto
               | Area now, just so you are aware!
        
               | prawn wrote:
               | Not sure if you're aware, but it's interesting that you
               | mention Lego as the person you're responding to once
               | accidentally bought literally tons of bulk Lego and later
               | designed an automated Lego sorting machine. It's a fun
               | read:
               | 
               | https://jacquesmattheij.com/sorting-two-metric-tons-of-
               | lego/
        
               | jacquesm wrote:
               | Heh. Hilarious story, thank you! Camarades.com had just
               | about everything, from people being born to people dying
               | and everything in between. It was a pretty honest
               | (sometimes brutally honest) slice of life.
               | 
               | One of the most popular cams for years was an old person
               | that was extremely ill and that rarely moved but he had a
               | pretty big fanclub and he thought it was quite funny that
               | he was more famous on what eventually became his deathbed
               | than he had ever been while he was still active. After he
               | died his family asked to remove all the images and close
               | the account which of course we did. Makes you wonder if
               | all those people wishing him well over the years kept him
               | going a bit longer. What is interesting is that if you
               | did this today I'm pretty sure the jerks would drown out
               | the nice people by a considerable margin, of course there
               | were jerks back then as well, but on the whole the
               | internet seemed to be a much nicer place to hang out than
               | it is today.
        
               | [deleted]
        
               | macintux wrote:
               | My then-wife was watching over my shoulder once as I
               | typed something into the address bar. "Freshmeat.net"
               | auto-completed, drawing a suspicious look from her.
        
             | xrisk wrote:
             | How is this on the Wayback machine?!
        
               | loktarogar wrote:
               | You can click "about this capture" for more information
               | 
               | > Starting in 1996, Alexa Internet has been donating
               | their crawl data to the Internet Archive. Flowing in
               | every day, these data are added to the Wayback Machine
               | after an embargo period.
        
               | Thorrez wrote:
               | Fun fact: Amazon's home assistant was named after Alexa
               | Internet. Amazon owns Alexa Internet.
        
               | phpnode wrote:
               | it's not named after it, it's just amazon is so massive
               | they have to reuse brand names. AWS has exhausted not
               | only the supply of IPv4 addresses but also the supply of
               | 3 letter initialisms.
        
             | aasasd wrote:
             | Your 'Netscape Now' pic is a whopping 1.9 KB--could probably
             | be optimized quite a lot, if for some weird reason the GIF
             | didn't have jpeg-y artifacts on the background and a ton of
             | blur on the text. Basically, you've brought that DoS on
             | yourselves.
        
             | DonHopkins wrote:
             | As pioneer of "<something> On Internet", do you regret not
             | turning out like Russ Hanneman? ;) (OR DID YOU???!)
             | 
             | https://www.youtube.com/watch?v=BzAdXyPYKQo&ab_channel=yate
             | 5...
             | 
             | https://silicon-valley.fandom.com/wiki/Russ_Hanneman
             | 
             | I'm just glad I didn't turn out like Erlich Bachman! (OR
             | DID I???!)
             | 
             | https://www.reddit.com/r/SiliconValleyHBO/comments/4jmlv9/w
             | h...
             | 
             | https://silicon-valley.fandom.com/wiki/Erlich_Bachman
        
               | vidarh wrote:
               | I've finally gotten around to watching this series, and
               | it's disturbing how many moments I've watched that were
               | more familiar than they should have been, and too many
               | characters I could instantly put a real name to....
        
             | rattray wrote:
             | Beautiful. The internet was a truly different place back
             | then...
        
               | jacquesm wrote:
               | With 100K visitors / day or so we were in the top 30
               | websites world wide in 1998. The really big boosts came
               | from the Space Shuttle webcasts and an Yves St. Laurent
               | fashion show webcast from Paris.
               | 
               | Hard to believe now, a typical blog post will already
               | pick up 30K visitors without too much trouble.
        
               | NetOpWibby wrote:
               | I could listen to stories of the Old Net all day.
        
               | jacquesm wrote:
               | Enjoy:
               | 
               | https://jacquesmattheij.com/story-behind-wwcom-
               | camaradescom/
               | 
               | And apologies for the non-working images.
        
               | gowld wrote:
               | Serves you right for hotlinking ;-)
        
               | jacquesm wrote:
               | Yes, but at least it was my own domain :)
               | 
               | I didn't see that consequence coming when camarades.com
               | shut down. I really should dig up those images and repair
               | the blog but the todo list isn't really getting any
               | shorter on this end.
        
         | aembleton wrote:
         | Back around 2002, I had a pdf icon on my website. It got deep
         | linked by a few others but the number one source of traffic
         | came from the website of a lawyer who specialised in
         | intellectual property. There was something on there about how
         | it was illegal to deep link.
         | 
         | I was tempted to replace with goatse but I think I just changed
         | it to a screenshot of his website saying that it was illegal to
         | deep link.
         | 
         | It soon got changed.
        
           | gumby wrote:
           | Even though it's not illegal!
        
           | jacquesm wrote:
           | That's a neat example of recursion :)
        
         | failrate wrote:
         | We used something like this technique back in the Flash days.
         | Sites would straight up steal your games, so one defense was to
         | have the game grab its sprites from a server local endpoint.
         | Thieving sites would get either no graphics or deliberately
         | corrupted graphics.
        
       | ramraj07 wrote:
       | I'm in India now, is it possible for me to install some traffic
       | snooper and monitor if any wikimedia requests go out? I can then
       | install some popular apps and see if anything bites!
        
       | rozab wrote:
       | This page someone noticed is very interesting.
       | 
       | https://newshimalaya.com/2021/02/09/%E2%9A%93-t273741-invest...
       | 
       | I was sure I'd seen this website before, and sure enough, it's
       | scraping and rehosting almost everything that's posted on HN...
        
         | bottled_poe wrote:
         | Nothing unusual here, just run of the mill online copyright
         | violation.
        
           | BlueTemplar wrote:
           | Yeah, here's the unusual one :
           | 
           | http://n-gate.com/
        
       | ip26 wrote:
       | 145kB for a connectivity check, ouch. This is a poster child for
       | why many apps guzzle so much data.
       | 
       | (On a 500MB/mo plan you start noticing)
        
       | eyelidlessness wrote:
       | You gotta respect the suggested approach to take preventive
       | measures by banning requests to this individual image without a
       | User Agent header _and_ to try to identify who might be affected.
       | I'm sure I'm not the only one here who would just treat it as
       | abuse and ban without followup.
        
       | dr_dshiv wrote:
       | This is happening now! Some suspect a failure in a CV training
       | pipeline. Others suggest an extremely popular app with a
       | hotlinked image.
        
         | magicalhippo wrote:
         | If it was my site I'd replace the image with goatse and see who
         | complains, but I guess that's a bit drastic for Wikimedia.
        
           | waheoo wrote:
           | I'd give them a pass.
        
           | ISL wrote:
           | Even just serving a giant blinking red X gif to 10^-6 of the
           | requests might be sufficient.
           | 
           | Only 10^-6 of the "legitimate" requests would be affected,
           | but a whole lot of the "undesirable" requests would see
           | it...
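A rough sketch of the sampling idea above, assuming the roughly 90 million requests per day quoted elsewhere in this thread. `should_serve_notice`, `expected_notices_per_day` and `NOTICE_RATE` are hypothetical names for illustration, not anything Wikimedia is known to run.

```python
import random

NOTICE_RATE = 1e-6  # fraction of requests that would get the notice image


def should_serve_notice(rng: random.Random, rate: float = NOTICE_RATE) -> bool:
    """Return True for roughly `rate` of all calls."""
    return rng.random() < rate


def expected_notices_per_day(requests_per_day: int, rate: float = NOTICE_RATE) -> float:
    """Expected number of notice responses per day at the given rate."""
    return requests_per_day * rate
```

At 90 million requests per day and a rate of 10^-6, that works out to about 90 notice images per day for the heavy requester, while a site that fetches the image a few thousand times a day would almost never see one.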
        
             | viraptor wrote:
             | Serving a giant blinking red gif to unsuspecting internet
             | users is a bad idea.
             | https://en.wikipedia.org/wiki/Photosensitive_epilepsy
             | 
             | There was a better idea posted in comments - serve a
             | picture with a very short explanation and an email to
             | contact.
        
               | EE84M3i wrote:
               | There is no reason it needs to blink quickly.
        
               | Taniwha wrote:
               | But he's suggesting that they only be served to 10^-6
               | people - that's one 1,000,000th of a person - I suspect
               | it will have little effect
        
           | hinkley wrote:
           | Tarpit the image and it will take care of itself.
           | 
           | Same advice I gave a w3c.org admin who was lamenting how much
           | traffic people generate by not caching xml schemas. Yes, you
           | have to serve the requests. But you don't have to try to
           | serve them in 100 ms. If a human is on the other end, 1-2
           | seconds is just fine. If a human is not, then the human will
           | surely notice when their batch process goes from 3 minutes to
           | 10 minutes because it fetches the same schema 200 times.
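A minimal sketch of the tarpit idea above, assuming an async server where a waiting connection costs almost nothing. `tarpit_response` and `serve_many` are hypothetical helpers, and the payload is a stand-in for the static file.

```python
import asyncio
import time


async def tarpit_response(payload: bytes, delay_seconds: float = 2.0) -> bytes:
    """Wait without blocking other connections, then hand back the payload."""
    await asyncio.sleep(delay_seconds)
    return payload


async def serve_many(n_clients: int, delay_seconds: float) -> float:
    """Serve n_clients concurrently; return total wall-clock seconds."""
    start = time.monotonic()
    await asyncio.gather(
        *(tarpit_response(b"<schema/>", delay_seconds) for _ in range(n_clients))
    )
    return time.monotonic() - start
```

The server waits on all connections at once, so its total time stays close to one delay. A batch job that fetches the same file 200 times in sequence pays the delay 200 times, which is the part a human is expected to notice.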
        
             | rictic wrote:
             | Could you end up executing a slow loris style attack on
             | yourself by doing this?
             | 
             | I guess a couple seconds won't matter unless the server is
             | already redlining it and the tarpitted traffic is a small
             | proportion.
        
               | toast0 wrote:
               | Yes, but slowloris isn't really a big deal if you've got
               | a modern http(s) server with async i/o. It costs nearly
               | nothing to have an idle connection while waiting 3 seconds
               | before sendfiling the schema xml.
        
               | mekkkkkk wrote:
               | Can you not run out of sockets though? I know it used to
               | be a thing anyway. Maybe it's handled somehow nowadays.
        
               | toast0 wrote:
               | You can run out of sockets, but that's easy to tune. I
               | don't know the limits on other systems, but FreeBSD lets
               | you set the maximum up to Physical Pages / 4 with just
               | boot time settings. So about 1 million sockets per 16 GB
               | of ram.
               | 
               | Worst case, if you start running out of sockets because
               | you're sleeping, sample the socket count once a second
               | and adjust sleep time to avoid hitting the cap. Also, you
               | could use that sampling to drive decisions about keeping
               | http sockets open or closed.
               | 
               | I should add, select on millions of sockets is going to
               | suck; so you'll need kqueue/epoll/whatever your kernel's
               | "select, but better" interface is.
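One way the "sample the socket count and adjust sleep time" idea could look, as a sketch only. `adjusted_delay`, the 80% headroom threshold and the linear ramp are all assumptions made for illustration; the comment does not specify a policy.

```python
def adjusted_delay(open_sockets: int, max_sockets: int,
                   base_delay: float = 3.0, headroom: float = 0.8) -> float:
    """Shrink the tarpit delay as socket usage approaches the cap.

    Below `headroom` * max_sockets the full base delay is used; above that
    the delay falls linearly, reaching zero at the cap.
    """
    if max_sockets <= 0:
        raise ValueError("max_sockets must be positive")
    usage = open_sockets / max_sockets
    if usage <= headroom:
        return base_delay
    if usage >= 1.0:
        return 0.0
    return base_delay * (1.0 - usage) / (1.0 - headroom)
```

A sampler that runs once a second would call this with the current socket count and use the result as the sleep time for new requests until the next sample.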
        
               | hinkley wrote:
               | Well any time you start yanking levers and spinning dials
               | you'd better know where the breaking points in your
               | system are.
               | 
               | If you care about the traffic because you're already
               | having trouble with that many simultaneous requests, then
               | you are definitely not going to solve that problem by
               | increasing the response time by a factor of 10.
               | 
               | But an important property of reverse proxies is that once
               | the proxy sees the last byte of the response, the
               | originating server is no longer involved in the
               | transaction. The proxy server is stuck ferrying bits over
               | a slow connection, and hopefully is designed for that
               | sort of work load. If the payload is a static file, as it
               | is in both of these cases, then it should be cheap for
               | the server to retrieve them.
        
         | Clewza313 wrote:
         | It can't be a training pipeline, because the IPs are all around
         | India.
         | 
         | Sample code from Stack Overflow being used by some major app is
         | the most likely candidate. It's also possible that the image
         | fetch call is a vestigial appendix that doesn't even display
         | the image, which will make tracking this down extra
         | challenging.
        
           | eli wrote:
           | Perhaps a very inefficient "check if we have working internet
           | access" routine.
        
           | rootw0rm wrote:
           | i hate it when vestigial appendices go awry. pretty sure the
           | evidence is mounting, however
        
       | m3kw9 wrote:
       | Maybe some sort of social network using it as the default profile
       | pic and isn't caching
        
       | [deleted]
        
       | [deleted]
        
       | blunte wrote:
       | And now thanks to this HN post, 21% of requests are for that same
       | flower!
        
         | Mattwmaster58 wrote:
         | I was curious what the actual amount of requests HN would have
         | to muster, and with a frequency of 90,000,000 reqs/day, HN
         | would need to hit it with 4,500,000 requests.
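A quick check of that arithmetic, assuming the 90 million requests per day really are 20% of the total, so the total is about 450 million. 4.5 million is 1% of that total. Because the extra requests also grow the total, the number needed to actually reach a 21% share comes out somewhat higher, at about 5.7 million. `extra_requests_for_share` is a hypothetical helper.

```python
def extra_requests_for_share(current: float, total: float, target_share: float) -> float:
    """Extra requests x such that (current + x) / (total + x) == target_share."""
    return (target_share * total - current) / (1.0 - target_share)


flower = 90_000_000    # requests/day for the image (figure from the thread)
total = flower / 0.20  # implied total, assuming the 20% figure holds
one_percent_of_total = 0.01 * total
needed = extra_requests_for_share(flower, total, 0.21)
```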
        
         | HotVector wrote:
         | HN ain't that big
        
       | mitchs wrote:
       | Heh, a popular consumer electronics product a room mate worked on
       | shipped an update that used example.com as a connectivity test.
       | Apparently they were on pace to rack up $20k/month in server
       | costs. At least their user agent made it obvious who to contact.
        
       | fariss wrote:
       | can this be some sort of botnet checking whether a host is
       | connected to the internet or not?
        
       | andrewmatte wrote:
       | 90M requests daily from India? I wonder if KaiOS is checking
       | whether it's got internet access.
        
       | batch12 wrote:
       | Superficial reversing shows that the ravn app mentioned,
       | com.app.rcn may use the file as part of a speedtest:
       | 
       | com.app.rcn/smali/com/app/rcn/utils/InternetSpeedCalculator.smali
       | : "hxxps://upload.wikimedia.org/wikipedia/commons/1/16/AsterNovi-
       | belgii-flower-1mb.jpg"
       | 
       | edit: defanged the link to maybe save the wikimedia team some
       | bytes
        
       | [deleted]
        
       | wiz21c wrote:
       | FTB (from the bug) :
       | 
       | You could even serve another image in its place to this UA, with
       | some text and an email address to contact. You'd probably find
       | out pretty quickly what it is from users of that mysterious
       | thing. A throwaway email address is probably best
       | 
       | Really good idea :-)
        
       | crazygringo wrote:
       | I'll just be looking forward to the follow-up post on HN
       | announcing when they figure out what the culprit was!
       | 
       | Per the comments, right now the top suspect seems to be the app
       | "Josh" or another TikTok clone because of how traffic surged
       | immediately after the TikTok ban:
       | 
       | https://twitter.com/bwaber/status/1358915338637873154
        
       | IfOnlyYouKnew wrote:
       | Wikimedia is unique in running some of the most popular websites
       | with open access to almost all systems. As someone who has never
       | been on the inside of FAANG, I found it rather interesting to
       | browse around the backend infrastructure.
       | 
       | See, for example, their statistics at
       | https://grafana.wikimedia.org/d/000000102/production-logging...
        
         | dalbasal wrote:
         | In interviews with Jimmy Wales, he seems somewhat regretful of
         | not having made Wikipedia a for-profit. At the least, he's
         | fairly adamant that Wikipedia could have been Wikipedia as a
         | for profit.
         | 
         | The way he structured wikipedia, from back-end infrastructure
         | to ownership/governance structure was just the logical way of
         | doing the project. Times were different. Online culture was
         | different.
         | 
         | I don't want to overinterpret the man, or put words in his
         | mouth... but... I got the impression that Wales thinks that if
         | he was starting Wikipedia now, he'd just do it as a startup
         | and also succeed.
         | 
         | To me, this is almost sad. Besides being an awesome
         | encyclopedia, wikipedia is existence proof for something of
         | scale outside the norm. Something that isn't a corporation. A
         | lot of things are deterministic to the structure of an
         | organization.
         | 
         | For example, take the current postpostmodern war over truth and
         | stuff: platforming/deplatforming, freedom of speech,
         | censorship, bias, manipulation, narrative = power issues, etc.
         | Wikipedia is at the very centre of all these problems. Whatever
         | difficulties Twitter is experiencing should be 100X worse for
         | wikipedia. Meanwhile, Wikipedia is withstanding far better, and
         | with far more integrity. I don't think this is a coincidence.
         | 
         | Dunking on wikipedia's budget/spending is popular. Meanwhile,
         | Wikipedia uses <1% of the resources/budget of Twitter. They are
         | operating @ >100X efficiency compared to a realistic for-profit
         | equivalent. That's a flying shuttle.
         | 
         | We know that Wikipedia, Linux & The Worldwide Web are possible
         | because they exist. We literally wouldn't know otherwise.
         | Theory couldn't have gotten us to this knowledge. Each is
         | existence proof for other ways of doing things. They aren't
         | necessarily roadmaps, but I'm a big believer in existence
         | proofs. What Jimmy made is 100X better, more important and
         | non-inevitable than what Zuck made. The thought that he wants
         | to be Zuck bums me out.
        
           | BlueTemplar wrote:
           | Yeah, the Web was quite impressive (though we already had the
           | Minitel), but it was Wikipedia that _really_ blew my mind
           | (even though we already had Encarta). (In fact I consider
           | Wikipedia to be the Web's "killer app", even more than
           | Google and other search engines were.)
        
             | dalbasal wrote:
             | Out of all the "killer apps" for the web... wikipedia is
             | the one that implements the www most faithfully. Hypertext
             | articles. Most apps got the web to do x. Wikipedia is what
             | it was made to do.
        
               | BlueTemplar wrote:
               | Yeah. I was about to add that it pretty much has been Tim
               | Berners-Lee's vision coming to fruition, but the fact that
               | Wikipedia is centralized has stopped me. But then isn't
               | the Web itself technically 'centralized' on the Internet
               | ? And isn't Wikipedia a _great_ example of pseudonymous
               | strangers (= social decentralization) collaborating with
               | each other ?
        
           | nolok wrote:
           | It would succeed the same way Quora does. Much less open,
           | much less universal, much more user hostile, with an almost
           | aggressive way to deal with logged-out users.
           | 
           | In terms of financial and organisational success it would
           | probably largely beat what it is now. In terms of benefit to
           | humanity, it would be much worse.
           | 
           | Company + for profit + laws means access to information has
           | to be much more tailored to the laws of each place. "Let's
           | remove Tiananmen's article or lose your Chinese license" kind
           | of things.
           | 
           | I for one am glad for the current wikipedia we have,
           | despite its numerous flaws. I still donate every year,
           | although I wish Wales could stop having it spend its money
           | the same way a startup or FAANG does.
        
             | dalbasal wrote:
             | That's one option, though I wouldn't necessarily use Quora
             | as a mainline example. They're kind of a $gme for rich
             | people. I think highly enough of Jimmy to bet on him doing
             | a much better job than that.
             | 
             | Stackoverflow is a decent example. Very capable founding
             | team. They explicitly tried to be like a commercial
             | wikimedia. They do embrace quite a lot of openness, notably
             | creative commons... learning from wikimedia successes.
             | 
             | RE " _I wish Wales would:_ " Another consequence for how
             | wikipedia is structured is that Wales isn't the Zuckerberg
             | of Wikimedia. Power is a lot more dispersed.
             | 
             | RE spending/flaws and such: I feel like wikimedia is held
             | to an extremely unfair standard. Who/what should we compare
             | them to?
             | 
             | Wikimedia spends $70m per year. This is probably less than
             | Quora _or_ stackexchange. FB & Twitter (IMO more
             | comparable in terms of scale/importance) spend $55bn &
             | $3bn. Twitter spends 45X more than Wikimedia. Facebook
             | spends almost 1,000X compared to Wikimedia. The bang-for-
             | buck is insane.
             | 
             | Also in terms of flaws in rules/judgement calls. A lot of
             | people are highly critical of wikipedia's "deletionism"
             | related MOs. What articles/edits stay in. How good the
             | rules & procedures are for this. What "camp" has power, and
             | how they treat the other camp. I get that this stuff is
             | contentious.
             | 
             | Meanwhile on Twitter or Facebook, the rule is "I decide." "
             | _But it gets us clicks_ " is the killer argument. Nothing
             | is transparent. Wikimedia is doing a much better job,
             | respecting user & editor rights far more, being a lot less
             | self righteous. Of course it's not perfect, but come on.
             | The "norm" is Facebook's content policy, Twitter's safety
             | department, or Apple's App store approval room. Wikimedia
             | is the _one_ example of being better than that... and for
             | that everyone is always yelling at them.
        
           | Vinnl wrote:
           | I can also imagine that he'd say that just because it makes
           | him look/feel better, i.e. it's more of a sacrifice if he
           | gave it for free while he also could've been a billionaire,
           | than if this was the only way Wikipedia could ever have been
           | a success.
           | 
           | Then again, WikiTribune _was_ a for-profit.
        
         | np_tedious wrote:
         | I have, and this is still fascinating. Got any more links
         | you'd suggest?
        
           | tassu wrote:
           | https://media.ccc.de/v/36c3-73-infrastructure-of-wikipedia
        
         | ShakataGaNai wrote:
         | Wikimedia's infrastructure is radically different than most
         | FAANG.
         | 
         | In large part because 99% (+/-) of their traffic is read only.
         | While Facebook and Google have to do heavy workloads for every
         | click and action taken on their services, Wikimedia can cache
         | basically everything, allowing them to operate on a tiny
         | fraction of the number of machines (and infrastructure) that
         | the rest of the players do.
        
           | anang wrote:
           | Wikimedia also has less incentive/drive to meticulously track
           | every interaction on their pages. The level of tracking
           | present on Facebook and Google has to be extremely
           | computationally intensive.
        
           | dalbasal wrote:
           | I agree. Another (no contradiction) way of looking at this is
           | that Wikimedia infrastructure is radically different because
           | Wikimedia is radically different.
           | 
           | They need it to be a certain way in order to operate. The
           | limitations and advantages of how software gets made. Why it
           | gets made. The way the software works. How and why product
           | decisions were made over the last 2 decades. What resources
           | they have/had available. It's all a totally different game.
           | Not surprising that different soil and a different climate
           | grow different plants.
           | 
           | One of Google's early coup d'etats, when they were a
           | strategic step ahead of the boomers, was bankrolling gmail,
           | youtube and such. Gmail offered free giant inboxes. They got
           | all the customers. This cost billions (maybe 100s of
           | millions), but storage costs go down every year while the
           | value of ads/data/lock-in and such go up every year. Similar
           | logic for youtube. (1) Buy a leading video-sharing site;
           | (2) bankroll HD streaming because you have the deepest pockets;
           | (3) Own online free TV entirely.
           | 
           | That's who Google is, good or bad. How funding works. What
           | products get built. What infrastructure is necessary,
           | possible, affordable. All interlinked. Wikipedia & Google
           | were founded at the same time. Within 5 years (circa 2006)
           | Google was buying charters and fiefdoms. Wikimedia,
           | meanwhile, was starting to take flak for raising 3 or 4
           | million in donations.
           | 
           | It's kinda crazy that Wikipedia is comparable in scale to
           | FAANGs when you consider these disparities.
        
           | MaxBarraclough wrote:
           | Does it hurt their caching if you browse Wikipedia when
           | signed in?
           | 
           | I recall reading HackerNews used to have that problem, unsure
           | if it still does.
        
             | tim333 wrote:
             | Looking at the source of a Wikipedia page it has my
             | username appearing 6 times so I guess it must reduce
             | caching a bit. Though I guess they could cache the user
             | info bits and the rest of the page and just splice them
             | together.
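A minimal sketch of the "cache the page, splice in the user bits" idea described above. This is not how MediaWiki actually does it; `USER_SLOT`, `render_body`, and `serve` are hypothetical names, and the page body is a stand-in for an expensive render.

```python
from html import escape

USER_SLOT = "<!--user-box-->"   # hypothetical placeholder left in the cached body
page_cache = {}                 # title -> rendered body, shared by all users
stats = {"renders": 0}          # counts how often the expensive render runs


def render_body(title):
    """Stand-in for the expensive, user-independent page render."""
    stats["renders"] += 1
    return f"<nav>{USER_SLOT}</nav><h1>{escape(title)}</h1>"


def serve(title, username=None):
    # The expensive body is rendered at most once per title.
    if title not in page_cache:
        page_cache[title] = render_body(title)
    # The cheap per-user fragment is spliced in on every request.
    if username:
        user_box = f"Logged in as {escape(username)}"
    else:
        user_box = "Log in"
    return page_cache[title].replace(USER_SLOT, user_box)
```

Serving the same title to several signed-in users then costs one render plus one string replacement per request, which is the trade-off the comment is guessing at.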
        
           | _joe wrote:
           | This is indeed correct. Wikimedia overall uses less than 2000
           | bare-metal servers, so yes the infrastructure is tiny
           | compared to those.
           | 
           | What can be interesting, I think, is that you have a
           | completely open infrastructure that has to solve problems on
           | a global traffic scale.
           | 
           | If people are interested in knowing more, I suggest you also
           | take a peek at the wikimedia techblog, specifically to the
           | SRE category https://techblog.wikimedia.org/category/site-
           | reliability-eng... and the performance one
           | https://techblog.wikimedia.org/category/performance/
        
           | nostrademons wrote:
           | Search is also largely read-only. The advantage Wikipedia has
           | is that its traffic overwhelmingly goes to the head of the page
           | distribution, so simple caching solutions work very well.
           | Google has a pretty extreme long-tail distribution (~15% of
           | daily queries have never been seen before), and so needs to
           | do a lot of computation per query.
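To illustrate why a head-heavy distribution caches better than a long-tail one, here is a rough simulation sketch. Everything in it is an assumption for illustration: the Zipf-like exponent of 1.2, the item counts, the cache size, and the function names. It does not model Wikipedia's or Google's real traffic. It replays two synthetic workloads through a small LRU cache, one with no novel keys and one where about 15% of requests are keys never seen before.

```python
import random
from collections import OrderedDict
from itertools import accumulate, count


def lru_hit_rate(requests, capacity):
    """Replay requests through a small LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for key in requests:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used key
    return hits / total


def make_workload(n_requests, n_items, novel_fraction, seed):
    """Zipf-like popularity over n_items, plus a share of never-repeated keys."""
    rng = random.Random(seed)
    # Assumed popularity curve: weight of rank r is 1 / r**1.2.
    cum_weights = list(accumulate(1 / rank ** 1.2 for rank in range(1, n_items + 1)))
    popular = rng.choices(range(n_items), cum_weights=cum_weights, k=n_requests)
    fresh = count()  # source of keys that are never requested twice
    return [
        ("new", next(fresh)) if rng.random() < novel_fraction else ("old", item)
        for item in popular
    ]
```

Calling `lru_hit_rate(make_workload(50_000, 10_000, 0.0, seed=1), 1_000)` and the same with `novel_fraction=0.15` shows the gap: the novel requests can never hit, and they also push popular entries out of the cache.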
        
             | zelon88 wrote:
             | > Google has a pretty extreme long-tail distribution (~15%
             | of daily queries have never been seen before)
             | 
             | Do you have a source for this?
             | 
             | I'd be willing to bet that the ONLY reason why 15% of their
             | daily queries "haven't been seen before" is because they
             | add un-needed complexity like fingerprinting. You're making
             | it seem like they've never seen a query for "cute animals"
             | before when obviously they have. They choose to do a lot of
             | extra leg work because of who you are.
             | 
             | So your claim that 15% of their queries have "never been
             | seen before" is probably inaccurate. I'd be willing to bet
             | that "15% of their queries are unique because of the user,
             | location, or other external factor separate from the query
             | itself."
             | 
             | They've seen your query before. They've just never seen
             | _you_ make this query from this device on this side of town
             | before.
        
               | wwwwewwww wrote:
               | It's somewhat analogous to the claim that almost every
               | spoken sentence had never been spoken before in the
               | history of language.
        
               | Rastonbury wrote:
               | Well, you'd be wrong
        
               | zelon88 wrote:
               | Meh, it happens.
        
               | tylerhou wrote:
               | If you took into account user, location, etc. 15% seems
               | too low. I almost never search for the exact same thing
               | twice in the same location.
               | 
               | 15% of the queries themselves are unique.
               | https://blog.google/products/search/our-latest-quality-
               | impro...
               | 
               | https://www.google.com/search/howsearchworks/responses/
               | 
               | I work for Google (and used to work on Search).
        
               | IgorPartola wrote:
               | The point is not that. It's that when you search for
               | "cute animals", Google shouldn't be storing that _you_
               | searched for that, or even care. Your location is
               | arguably potentially relevant but it could be coarse
               | enough except when searching for directions to allow at
               | least some caching.
        
               | ma2rten wrote:
               | Right, you can cache that query. That doesn't mean that
               | you can cache "two bunnies playing in the snow r/aww
               | reddit".
        
               | tylerhou wrote:
               | This is right on the money -- getting search results for
               | queries that are too personalized to e.g. location means
               | that you can't cache those search results (or if you did
               | cache them, their entries would be useless).
        
               | edmundsauto wrote:
               | Hey Igor! Hate to be a bore, but I wanted to provide
               | feedback that your comment may unintentionally come
               | across as aggressive. OP has pretty relevant work
               | experience that I know I'd love to hear more about, but
               | there's not really any room for them to respond.
               | 
               | I know many folks IRL who work at big tech who have no
               | interest in posting here because the community comes off
               | as very unwelcoming. That's a shame, because they have
               | insight that would be great to hear. Regardless of
               | anyone's opinion of their employer.
               | 
               | Apologies in advance if your intent was purely about the
               | topic. I just thought I read something in your tone that
               | might hinder discourse rather than encourage it. I wanted
               | to point it out, in case it was unintentional.
        
               | WanderPanda wrote:
               | To me Igor's comment is also misplaced. He injects
               | activism into a technical discussion (sadly happens very
               | often here on HN). We all know by now that the bigcorps
               | are to a large degree based on data collection. We do not
               | need to be reminded about it each and every day. We are
               | adults, if we don't like it we use alternatives.
        
               | edmundsauto wrote:
               | Yeah, this is a fair point. My larger point was mostly
               | that HN misses out on some valuable comments by insiders
               | because those people are disincentivized by some of the
               | rhetoric and tone when an article on big tech is popular.
               | I didn't think the comment I replied to was particularly
               | aggressive - it was just something that came to mind when
               | I read it. OP was actually very kind and constructive in
               | their response - a good ending and constructive
               | discussion for us all!
        
               | tylerhou wrote:
               | Agreed about the tone. The comment could have been less
               | argumentative -- instead of "that's not the point," they
               | could have said "that's not the only reason."
               | 
               | On the other hand, if I'm not responding, it's not
               | because I find HN too abrasive -- it's because I am
               | afraid of leaking non-public information. That's why
               | whenever I talk about Google, I try to cite a Google blog
               | post or other authoritative source, or talk about my own
               | personal experience; hence, "I rarely search for the same
               | query twice."
        
               | IgorPartola wrote:
               | I apologize for the tone. The start of my comment was
               | clumsily worded and it wasn't my intention to have it come
               | off as argumentative. The way I read the GP comment to
               | mine was talking about how Google's tracking of its
               | users' telemetry was what was contributing to the
               | uniqueness of requests. Your comment to me boiled down to
               | the fact that of course most requests are unique because
               | of tracking location data and the user account. There
               | seemed to be a disconnect because your comment took for
               | granted that user location and account were a part of the
               | search query while the person you were replying to
               | specifically challenged that notion (again in my reading
               | of both). I tried to post a concise bridge between the
               | two concepts, and of course we all see how well I did
               | with that :)
               | 
               | Having said that, I do think this is clearly a sensitive
               | issue, not a purely technical one. I can appreciate the
               | nuance of working for Google and doing excellent work
               | while seeing the company criticized left and right for
               | its business model. I think given the community, while
               | there is opposition to how Google may at certain points
               | conduct itself as a corporation, there is no lack of
               | respect for any individual working there. I certainly
               | view my comment and the discussion of privacy as having
               | 50% to do with Google's strategy and 50% to do with the
               | technical aspects of whether you can build a search
               | engine that holds user privacy as a core priority rather
               | than trying to launch an ad hominem on you or anyone. And
               | I saw your other comment that agreed with me and the GP
               | comment so I think my first sentence aside, we are on the
               | same page :)
        
               | rectang wrote:
               | Thanks for responding constructively, Igor.
        
               | rStar wrote:
               | I'm gonna have to disagree with the negative comments
               | above concerning Igor's tone. He made his point with clear
               | respectful language that I would be happy to entertain at
               | work, at the bar, at worship or while on a (previous to
               | covid) group run or golf outing. so, to me, it looks like
               | instead of an 'agree to disagree' while respecting each
               | other, you disrespect igor by dismissing his arguments
               | due to his tone, which handily allows you to ignore his
               | content, such as it is. Therefore, in my judgement, you
               | guys are being unfair to Igor while also being
               | disingenuous about your reason for policing his tone.
        
               | tylerhou wrote:
               | > disrespect igor by dismissing his arguments due to his
               | tone, which handily allows you to ignore his content,
               | such as it is
               | 
               | I didn't dismiss his argument; I said that he was correct
               | right after he posted:
               | https://news.ycombinator.com/item?id=26073488
               | 
               | "That's not the point" can be interpreted as respectful,
               | but it also can be interpreted as argumentative. I chose
               | to assume good intentions, but I offered a different
               | phrasing that would have a higher chance of not being
               | misinterpreted: i.e. using "yes and" instead of "no but":
               | https://www.theheretic.org/2017/yes-and-vs-no-but/
        
               | csharptwdec19 wrote:
               | I understand. To be a googler you have to be really good
               | at smooth talking.
               | 
               | And when you do this on a public forum it just highlights
               | that the company you work for makes sure as many
               | employees as possible are serving kool-aid to the masses.
        
               | [deleted]
        
               | dextralt wrote:
               | is this satire?
        
               | zelon88 wrote:
               | I'd be interested in seeing how polluted that 15% of new
               | queries is with people blasting malformed URLs or FQDNs
               | into the omnibox of Chrome.
        
               | Kiro wrote:
               | What's so unbelievable about 15%? I personally think it
               | is way lower than I expected. We're clearly not googling
               | in the same way.
        
               | simias wrote:
               | I agree with you. Also in my experience less tech-savvy
               | people tend to overcomplicate their queries instead of
               | just entering the relevant keywords which I'm sure
               | accounts for many uniques.
        
               | jobigoud wrote:
               | I don't understand how you compute that estimate.
               | 
               | I doubt you store the history of all searches ever?
               | People don't need a google account to query the engine,
               | others disable history, etc.
               | 
               | Are you saying you still have all searches ever made
               | ever? Because you would need this to say a query hasn't
               | been made before wouldn't you?
        
               | simias wrote:
               | I don't know how they did it but I suspect that it
               | wouldn't be very hard to model the distribution by
               | sampling a few million queries and extrapolate from that.
        
               | freeone3000 wrote:
               | Why would you not store every search ever? It's only a
               | few petabytes, and you can find out all sorts of useful
               | info from it.
        
               | melq wrote:
               | You'd only need to store the list of unique searches, but
               | even if that's true and the 15% number is true, that must
               | be a huge amount of data.
        
               | ma2rten wrote:
               | I think you mean "15% seems too high". An easy way to
               | think about this is the following: even if you search the
               | entire internet you will almost never see the same
               | sentence twice, assuming it has a certain number of
               | words. There is a combinatorial explosion in possible
               | sentences to write. Search queries are essentially just
               | sentences without stopwords.
        
               | djhn wrote:
               | Removing stop words is what old school users of IT
               | systems do, because that's what we learned worked best at
               | the time.
               | 
               | Internet users who came online later, from GenZ to many
               | boomers, will often just write conversational sentences
               | and questions.
        
               | bagels wrote:
               | https://blog.google/products/search/our-latest-quality-
               | impro...
               | 
               | "There are trillions of searches on Google every year. In
               | fact, 15 percent of searches we see every day are new"
        
               | zepto wrote:
               | It would still be helpful to know what 'new' means.
               | 
               | Does it mean literally the text string typed into the box
               | by the user is new?
               | 
               | Or does it mean the text string combined with a bunch of
               | other inferred parameters we don't know about is new?
        
               | jobigoud wrote:
               | New for the day or new for the history of the engine?
        
               | rfoo wrote:
               | > So your claim that 15% of their queries have "never
               | been seen before" is probably inaccurate.
               | 
               | I'm not sure, on my productive days maybe >50% of my
               | Google searches are not very cachable. (for example, I
               | just googled "htop namespace", "htop novel bytes", "htop
               | pss", "htop nightly build ubuntu 14.04")
        
               | wolfd wrote:
               | https://blog.google/products/search/our-latest-quality-
               | impro...
               | 
               | They briefly mention the statistic in the last paragraph.
        
               | rntksi wrote:
               | I think GP post has a point. I've noticed people use
               | Google really differently from how I do. E.g. I would go
               | search for "figure concave" while my brother would search
               | a longer phrase.
               | 
               | Also, speaking of people behaviour, it would not make
               | sense to search everyday for "cute animals", but the
               | volume of searches done for new things people discover as
               | they get older would make more sense. I mean just look at
               | search trends for things like "hydroxychloroquine" for
               | example (and that's not to mention people who get it
               | wrong, i.e. other factors for differing search queries
               | too)
               | 
               | Also, other languages can change the queries depending on
               | how you phrase the sentence too. Add to that the people
               | using other ways to search instead of just visiting
               | google.com and I think you can get pretty close to 10%.
               | 
               | If fingerprinting were the reason, I surmise 15% would be
               | too low a figure. In that case I think it would probably be
               | 20-25% of searches rather than 15%.
               | 
               | It could very well be that they do classify fingerprinted
               | search differently only in some countries and not others?
               | That would/might explain the 15% figure.
               | 
               | I might be wrong and have underestimated fingerprinting
               | techniques for Google. If they have really good
               | fingerprinting techniques, that would reduce the estimate
               | I have in mind to a better number (close to 15, maybe?)
        
               | zelon88 wrote:
               | So consider your hydroxychloroquine example again this
               | way;
               | 
               | Nobody has _ever_ searched for hydroxychloroquine before
               | today. Today is the day the word is hypothetically
               | invented. Today 2 million people will search for
               | hydroxychloroquine. But only one of them was the first to
               | do it.
               | 
               | What I know about pop-culture and viral internet culture
               | is telling me that 15% of 1 trillion searches being
               | unique is shady math.
               | 
               | So I am not fully convinced that the 15% claim is
               | completely transparent.
        
               | no_way wrote:
               | It's a guess, but my thinking is that previously most
               | people who searched for the term hydroxychloroquine were
               | mainly scientists and other people in related fields, not
               | your general population. Suddenly covid happens and now
               | large numbers of people learn about this drug they had
               | never heard of before. They are gonna search for, I
               | presume, wildly different things like: "how does it work?"
               | "does it cause some disease?" " _insert something
               | political here about hydroxychloroquine_ " "did aliens
               | make hydroxychloroquine?" and many more things I lack
               | imagination to come up with and that's only about
               | hydroxychloroquine. I doubt 15% number is about single
               | word cases, but more about combination of words and that
               | seems reasonable. Inventing new words daily seems
               | unlikely, chaining them on the other hand seems
               | plausible.
        
               | nostrademons wrote:
               | The vast majority of people don't search for
               | [hydroxychloroquine]. They search for [Is
               | hydroxychloroquine effective in treating COVID-19?] or
               | [What is the first drug that was approved to treat
               | COVID-19?] or [What methods do we currently have to treat
               | COVID-19?]. You can see these on the search results page
               | as the "Common questions related to..." widget. How else
               | do you think Google gets that data?
               | 
               | The folks who use keyword-based searches are largely
               | those who got on the Internet before ~2007. Tech-savvy,
               | relatively well-off, usually Millennial or Gen-X, plugged
               | into trends. This happens to be the demographic dominant
               | at Hacker News. But there's a much larger demographic who
               | just types in whatever they're thinking of, in natural
               | language, and expects to get answers.
               | 
               | Come to think of it, this is also the demographic that
               | doesn't use tabbed browsing, and uses whichever browser
               | ships with their OEM, and often doesn't realize that
               | there's a separate program called a "browser" running
               | when they click on the "Internet", and issues a Google
               | Search for [google] (#3 query in 2010) when they want to
               | get to Google even though they're on Google already but
               | don't realize it, and doesn't know what a URL is. When a
               | big-tech company makes a brain-dead usability decision
               | you don't like, first consider how that usability choice
               | might appear to your grandmother and it might not seem so
               | brain-dead.
        
           | Retric wrote:
           | You can do quite a bit of processing per page load without
           | issue. Facebook and Google just take it rather past that
           | point into near absurdity, while still being highly
           | profitable.
        
             | cmckn wrote:
             | Every request at FB is handled in a new container. This
             | isn't absurd, it's actually pretty neat :)
             | 
             | Edit: I don't know what I'm talking about. Happy Monday!
        
               | rachelbythebay wrote:
               | What? Are you calling the context of a HHVM request a
               | container just to confuse people?
               | 
               | Also, there's way more than just the web tier out there.
        
               | cmckn wrote:
               | Wasn't my intention to confuse, just repeating something
               | I've been told by FB folks.
               | 
               | Everyone, please listen to Rachel and never ever me.
        
               | robmurrer wrote:
               | is not neat... is freakish
        
               | ROARosen wrote:
               | Wow that sounds interesting, does anyone know if this is
               | true?
        
               | wilsonthewhale wrote:
               | I'm not on the team that handles this, but I highly doubt
               | that this is the case.
        
             | ianlevesque wrote:
             | Yeah why do they keep spending billions to build new
             | datacenters when they could just stop being absurd instead?
             | 
             | The contempt on here is crazy sometimes.
        
               | MereInterest wrote:
               | I don't think that Facebook/Google developers are foolish
               | or incompetent. That would be contempt. Instead, I think
               | that Facebook and Google as conglomerate entities are
               | fundamentally opposed to my right to privacy. That they
               | make decisions to rationally follow their self-interest
               | does not excuse the absurd lengths to which they go to
               | stalk the general population's activities.
        
               | ryanianian wrote:
               | > I don't think that Facebook/Google developers are
               | foolish or incompetent.
               | 
               | Nobody in this thread is saying that. Parent to you said:
               | 
               | > they could just stop being absurd instead [of building
               | more DCs]
               | 
               | implying FB could build fewer DCs by scaling down some of
               | their per-page complexity/"absurdity". Basically saying
               | their needs are artificial or borne of requirements that
               | aren't.
               | 
               | > conglomerate entities are fundamentally opposed to my
               | right to privacy
               | 
               | That's a common view, but it's not on topic to this
               | thread. This thread is mostly about the tech itself and
               | how Wikimedia scales versus how the bigger techs scale.
               | It has an interesting diversion into some of the reasons
               | why their scaling needs are different.
               | 
               | You could instead continue the thread stating that they
               | could save a lot of money and complexity while also
               | tearing down some of their reputation for being slow and
               | privacy-hostile by removing some of the very features
               | these DCs support (perhaps) without ruining the net
               | bottom line.
               | 
               | This continues the thread and allows the conversation to
               | continue to what the ROI actually is on the sort of
               | complexity that benefits the company but not the user.
        
               | Retric wrote:
               | I was the one saying absurdity and I think you're missing
               | the context. Work out how much processing power is worth
               | even just another 1 cent per thousand page loads and
               | perfectly rational behavior starts to look crazy to the
               | little guys.
               | 
               | Let's suppose the Facebook cluster spends the equivalent
               | of 1 full second of 1 full CPU core per request. That's a
               | lot of processing power and for most small scale
               | architectures likely adding wildly unacceptable latency
               | per page load. Further, as small scale traffic is very
               | spiky even low traffic sites would be expensive to host
               | making it a ludicrous amount of processing power.
               | 
               | However, Google has enough traffic to smooth things out,
               | it's splitting that across multiple computers and much
               | of it is after the request so latency isn't an issue, and
               | it isn't paying retail so processing power is little more
               | than just hardware costs and electricity. Estimate the
               | rough order of magnitude they're paying for 1 second of 1
               | core per request and it's cheap enough to be a rounding
               | error.
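The back-of-envelope estimate suggested above can be written out as a small calculation. The price per core-hour used in the usage note is a made-up placeholder, not a figure from Facebook, Google, or any cloud provider; only the "1 core-second per request" and "1 cent per thousand page loads" come from the comment.

```python
def compute_cost_per_thousand(core_seconds_per_request, price_per_core_hour):
    """Compute cost in dollars of serving 1,000 requests.

    price_per_core_hour is an assumed all-in cost (hardware plus electricity)
    for one CPU core running for one hour.
    """
    core_hours = 1000 * core_seconds_per_request / 3600
    return core_hours * price_per_core_hour


def worth_it(core_seconds_per_request, price_per_core_hour, extra_revenue_per_thousand):
    """True if the extra revenue per 1,000 page loads exceeds the compute cost."""
    cost = compute_cost_per_thousand(core_seconds_per_request, price_per_core_hour)
    return extra_revenue_per_thousand > cost
```

With a hypothetical $0.01 per core-hour, `compute_cost_per_thousand(1.0, 0.01)` comes to roughly $0.0028 per thousand requests, so under that assumption one extra cent per thousand page loads would cover a full core-second of work per request several times over.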
        
               | bo1024 wrote:
               | The idea of marginal value/marginal cost is that
               | companies will generally continue spending one billion
               | dollars to add size and complexity, as long as they get
               | back a bit more than a billion dollars in revenue.
               | 
               | So it wouldn't necessarily be contradictory if most of
               | their core functionality could be replicated very simply,
               | yet the actual product is immensely complicated. I forget
               | where I first read this point, but probably on HN.
        
               | civilized wrote:
               | Or maybe you're just reading too much into "absurd" which
               | can just be a colorful word for "an extremely huge
               | amount"
        
             | throwaway3699 wrote:
             | To be fair, there's a bit of a combinatoric effect of scale
             | * features going on there. I'm sure you could build most of
             | a Facebook equiv. 100x-1000x cheaper if it only served one
             | city instead of the whole planet.
        
               | klodolph wrote:
               | The effects of scale are less combinatoric than you might
               | think. Most people on my Facebook feed are from the same
               | city anyway, even though Facebook is global.
        
               | erichurkman wrote:
               | The effects and scale of sales (ads) are very
               | combinatoric, though.
        
           | ashtonkem wrote:
           | They also have looser latency SLAs. The only hard requirement
           | is that a user can read back their own writes, but it's okay
           | if other users are served stale data for a few seconds or
           | minutes even. This makes cache invalidation, one of the most
           | notoriously difficult and expensive operations at large
           | scale, much much easier.
        
             | nostrademons wrote:
             | Facebook also has a similar SLA. I've heard that at one
             | point in their architecture (~2010), they literally stored
             | the user's own writes in memcached and then merged them
             | back into the page when rendered. _You_ would see a page
             | consistent with your actions, but if you logged into
             | Facebook as any of your friends your updates might not show
             | up until replication lag passed.
        
               | dash2 wrote:
               | Interesting that this sounds very similar to how
               | multiplayer games do it.
        
               | mackman wrote:
               | Close, IIRC we cached the fact you had just done a write,
               | and a subsequent read request that arrived on the replica
               | region was then proxied to the primary region instead of
               | serviced locally.
        
               | IgorPartola wrote:
               | Pretty clever. Is that still how it works?
        
               | glittershark wrote:
               | Pretty sure this paper describes what they're doing now:
               | https://research.fb.com/publications/flighttracker-
               | consisten...
        
               | mackman wrote:
               | I'm not sure if FlightTracker completely replaced the
               | need for the internal consistency inside Tao. You can
               | read about that here: https://www.usenix.org/system/files
               | /conference/atc13/atc13-b...
        
               | [deleted]
        
               | eismcc wrote:
               | Dirty bits at scale
        
               | gogopuppygogo wrote:
               | I'm guessing this is ecc memory so likely correcting for
               | bad data.
        
               | Hackbraten wrote:
               | I think they meant dirty bit as in "a flag that means
               | update needed," not as in "bit flipped due to glitch."
        
               | cranekam wrote:
               | My memory is fuzzy now but this dates back to when there
               | were only two datacenter regions and one of them held all
               | the primary DBs (2011 or so). All write endpoints were
               | served in that region, so if a user routed to the
               | secondary region did a write the request was proxied to
               | the primary region. After doing a write a cookie was set
               | for the user in question which caused any future reads to
               | be proxied to the primary region for a few seconds while
               | the DB replication stream (upon which cache invalidation
               | was piggybacked) caught up, because if they went to the
               | secondary region memcached was now stale.
               | 
               | It hasn't been this way since around 2013 but again I am
               | fuzzy on how. I think that's when most such data was
               | switched to TAO, which has local read-what-you-wrote
               | consistency. As long as users landed in the same cluster
               | (and thus TAO cluster) what they wrote was visible to
               | them, even if the DB write hadn't yet replicated to their
               | region.
               | 
               | FlightTracker postdates my time at FB (ended 2018ish) so
               | I'm not sure how that is used. These systems evolved a
               | lot over time as requirements changed.
               | 
               | I don't remember anything about writes being batched in
               | memcached and merged in on page load.
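The cookie-based routing described above can be sketched roughly as follows. This is only an illustration of the idea, not Facebook's actual code: the names `STICKY_SECONDS`, `WRITE_COOKIE`, `region_for_request`, and `cookies_after_write` are hypothetical, and the five-second window is an assumed value standing in for "a few seconds".

```python
import time

# Hypothetical values: how long reads stay pinned to the primary region
# after a write, and the name of the cookie that records the write time.
STICKY_SECONDS = 5.0
WRITE_COOKIE = "last_write_ts"


def region_for_request(method, cookies, local_region, now=None):
    """Pick the region that should serve a request.

    Writes always go to the primary region. Reads go to the primary only
    if the user wrote recently, because the local replica (and the cache
    invalidated from its replication stream) may not have caught up yet.
    """
    now = time.time() if now is None else now
    if method != "GET":
        return "primary"
    if local_region == "primary":
        return "primary"
    try:
        last_write = float(cookies.get(WRITE_COOKIE, ""))
    except ValueError:
        return local_region  # no (or malformed) cookie: serve locally
    if now - last_write < STICKY_SECONDS:
        return "primary"
    return local_region


def cookies_after_write(cookies, now=None):
    """Return a copy of the cookies with the write timestamp recorded."""
    now = time.time() if now is None else now
    updated = dict(cookies)
    updated[WRITE_COOKIE] = repr(now)
    return updated
```

The design trades a little extra cross-region latency for the writing user, for a short window, against never showing that user a page that is missing their own change.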
        
       | astrea wrote:
       | I wonder how much additional traffic this investigation brought
       | said image.
        
       | mark-r wrote:
       | Reminds me of the time Netgear routers were hardcoded with the IP
       | address of a NTP server at the University of Wisconsin.
       | https://en.wikipedia.org/wiki/NTP_server_misuse_and_abuse#Ne...
        
       | ct520 wrote:
       | Well I hope they implement some good caching
        
       | dmurray wrote:
       | 20% of Wikimedia Commons requests to their Singapore servers
       | (EQSIN), not globally. That's still a lot, of course.
        
         | reaperducer wrote:
         | 90,000,000 requests a day. That's some flower.
        
           | labster wrote:
           | Replace it with an image saying: "If everyone who sees this
           | flower donated 100 rupees to Wikipedia, this fundraiser would
           | be over in 6 hours."
        
             | thotsBgone wrote:
             | Unfortunately, the image was never displayed by the app
             | that was downloading it as an internet test.
        
             | HotVector wrote:
             | Forward to all your Indian uncles
        
               | Ayesh wrote:
               | On whatsapp
        
             | [deleted]
        
           | gumby wrote:
           | Given that they are coming from India it could be 90M single
           | daily requests!
        
           | ceph_ wrote:
           | Would an AsterNovi-belgii-flower-1mb by any other name smell
           | as sweet?
        
             | ed25519FUUU wrote:
             | Let's all just be grateful it wasn't AsterNovi-belgii-
             | flower-100mb.
        
               | efreak wrote:
               | At one point, I had a very small png file that was large
               | enough to crash Netscape and IE, and later firefox. It
               | had large enough dimensions that browsers couldn't handle
               | it.
               | 
               | Today I'm sure it would be fine; instead I'm frustrated
               | by my inability to create webp images larger than 16000
               | pixels tall (I was trying to write a data-saver proxy for
               | reading webtoons)
        
             | [deleted]
        
           | Havoc wrote:
           | Clearly the bestest flower though
        
         | [deleted]
        
       | BugsJustFindMe wrote:
       | And of course, now that the image is linked in the report, I've
       | just added an additional request for it by clicking.
        
         | Wowfunhappy wrote:
         | With the amount of traffic it's apparently getting, HN probably
         | won't make a big impact. Plus, most of us aren't in India, and
         | most of us have normal user agents.
        
       | navotgil wrote:
       | Well... posting it here will only increase the number of requests
       | and will make the investigation harder
        
       | z3t4 wrote:
       | Just replace the image... what app/site is this? Email me at
       | my@mail.com
        
       | [deleted]
        
       | tomglynch wrote:
       | Just added this comment on the issue:
       | 
       | Hi all, I've been doing a bit of research into possible apps that
       | could be causing this and found two potential culprits that I am
       | currently investigating.
       | 
       | The first is Mitron TV, an Indian TikTok alternative which was
       | made available again on the app store June 6th
       | (https://indianexpress.com/article/technology/tech-news-
       | techn...).
       | 
       | The second is Say Namaste, an Indian Zoom alternative which was
       | launched on the app stores June 9th
       | (https://indianexpress.com/article/technology/tech-news-
       | techn...).
       | 
       | Both fall into the timeline of huge increases, have millions of
       | users and may be using '1280px-AsterNovi-belgii-flower-1mb.jpg'
       | to check the users' internet connection - especially for Say
       | Namaste to ensure video connectivity. I've reached out to some
       | developers at both companies and will report back. Let me know
       | your thoughts.
       | 
       | EDIT: I have also noticed the dates match the reopening after
       | lockdown for the whole of India: "This first phase of reopening
       | was termed as "Unlock 1.0"[13] and permitted shopping malls,
       | religious places, hotels and restaurants to reopen from *8
       | June*."
       | (https://en.wikipedia.org/wiki/COVID-19_lockdown_in_India#Unl...
       | )
       | 
       | Tom
        
         | batch12 wrote:
         | Based on this, I just reversed both Android apps and am not
         | seeing strings related to wikimedia nor asternovi. This doesn't
         | mean it's not obfuscated somehow though. The only app I've
         | found the strings in so far is the "ravn" app proposed by
         | @taviso. As mentioned in the twitter thread though it doesn't
         | seem to have the install base to cause this traffic--
        
           | catlover99 wrote:
           | I took a look at the apk and noticed this in the manifest.
           | "com.blockeq.stellarwallet.WalletApplication" Stellar Lumens
           | is a fairly popular crypto currency. I wonder if the app has
           | built in support for crypto transactions. If not, maybe it's
           | malware to mine crypto coins.
           | 
           | https://i.imgur.com/o8DllVd.png
        
             | captn3m0 wrote:
             | It is a crypto chat application:
             | 
             | >Ravn is your portal to the most private messenger as well
             | as Korrax our proprietary token. Stay up to date with
             | Korrax and other Cryptos and join the crypto group chats.
             | 
             | >Messages, images and docs are never stored on a server
             | (after delivery), they're only locally stored on your own
             | phone. Ravn is not tied to your phone number or email, you
             | only sign up with a username that isn't searchable or
             | discoverable.
        
             | WeekSpeller wrote:
             | Stupid question: how did you reverse the app in Android
             | Studio?
        
               | catlover99 wrote:
               | I downloaded the APK and then used "Profile or Debug APK"
               | under file in Android Studio and ctrl/cmd+shift+f to
               | search for strings.
               | 
               | I don't know much about Android development or APKs but
               | it's not exactly "reversing." From what I understand the
               | profile/debug converts the .dex files from the APK to
               | .smali which is human readable.
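A comparable string search can also be done without Android Studio, since an APK is a ZIP archive. The sketch below is a rough illustration, not what anyone in the thread actually ran: `find_strings_in_apk` is a hypothetical helper, and it only does a raw byte search, so it will miss strings that are obfuscated, encrypted, or built at runtime.

```python
import zipfile


def find_strings_in_apk(apk_path, needles):
    """Return {entry_name: [needles found]} for an APK (a ZIP archive).

    This is a raw, case-insensitive byte search over each decompressed
    archive entry (classes.dex, resources, assets, and so on).
    """
    wanted = [n.lower().encode("utf-8") for n in needles]
    hits = {}
    with zipfile.ZipFile(apk_path) as apk:
        for name in apk.namelist():
            data = apk.read(name).lower()
            found = [w.decode("utf-8") for w in wanted if w in data]
            if found:
                hits[name] = found
    return hits
```

For example, `find_strings_in_apk("app.apk", ["wikimedia", "asternovi"])` would list the entries containing either string, where `app.apk` is a placeholder path.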
        
               | NicolaiS wrote:
               | You can use the "Analyze APK" feature, but you'd probably
               | rather use tools like jadx or apktool that provide
               | fairly good decompilation.
        
           | tomglynch wrote:
           | Thanks batch12. In my edit, it could also be related to a
           | check-in app used at public spaces in India - as it increases
           | from the 8th of June which matches when the India-wide
           | lockdown began to lift. Perhaps a reverse of qr code scan
           | checkin apps in India could be useful?
        
             | batch12 wrote:
             | Could be-- I checked about 50 apps from alternative lists
             | that popped up after the ban with no luck except for that
             | one I mentioned before.
             | 
             | Looks like they posted shortly after yours on the ticket
             | that they found the culprit. Guess we'll find out tomorrow
             | if we were on the right path.
        
               | tomglynch wrote:
               | Yeah hopefully they have a bit of a write up too about
               | how they worked it out - interesting problem to solve!
        
         | thetanil wrote:
         | As far as I know, this is also an image commonly used in
         | machine learning tutorials for image classification of species
         | of flowers. I don't know if the tutorials use the mediawiki
         | source directly though. I do recognize this image though. I
         | think it's in the SciKit Learn O'Reilly book.
        
       | tchalla wrote:
       | Could it be the "Good Morning" like greetings on WhatsApp gone
       | viral ?
       | 
       | https://www.wsj.com/articles/the-internet-is-filling-up-beca...
        
         | mabedan wrote:
         | That's a good one! Maybe an app which generates this type of
         | image has this flower as one of its sample images in a list
         | which it preloads on startup
        
         | Clewza313 wrote:
         | No, because the images for those are stored by WhatsApp, not
         | hotlinked from Wikimedia.
        
       | aaron695 wrote:
       | It seems like it started on the 2020-06-09
       | 
       | https://pageviews.toolforge.org/mediaviews/?project=commons....
        
       | dayze wrote:
       | Sukhbir Singh just commented: Thank you everyone for the comments
       | and suggestions. I just wanted to share that we have identified
       | the app and will update this task tomorrow. (And yes, it is a
       | mobile app.)
        
         | NKCSS wrote:
         | Would be curious to get the full story :-/
        
       | m463 wrote:
       | The "traditional" way of fixing this would be a goatse.cx
       | redirect of the image.
       | 
       | I'm sure there is a more enlightened fix.
        
         | sangnoir wrote:
         | ...or sending _that image_ [1] jwz sends back upon detecting HN
         | in the referer. I bet they'll find the app in a matter of
         | hours, or at least reduce the traffic drastically.
         | 
         | 1. https://www.jwz.org NSFW!
        
           | bzb6 wrote:
           | This makes me wonder why the hell referer headers are still
           | sent by major browsers, especially to third parties. I can't
           | think of a single reason that benefits the user.
        
             | qwertay wrote:
             | Originally it probably just sounded like a cool feature to
             | see what blog linked to you. Now it's been around for so
             | long that so much has been programmed to actually use it.
             | If you turn it off you get every anti bot script blowing up
             | on you.
             | 
             | I think browsers did drop the path from it at least.
        
             | jessaustin wrote:
             | For one thing, examining referer is a common way that a
             | server determines a request is _not_ a hotlink. Sure you
             | can do something more complicated with cookies or whatever,
             | but lots of sites are just using referer and they'll break
             | if the client doesn't send it.
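A Referer-based hotlink check of the kind described here might look like the sketch below. It is a minimal illustration under stated assumptions: `is_hotlink`, `own_hosts`, and `allow_empty` are hypothetical names, and real servers usually do this in web server configuration rather than application code.

```python
from urllib.parse import urlsplit


def is_hotlink(referer, own_hosts, allow_empty=True):
    """Guess whether a request for an asset is a hotlink.

    `referer` is the raw Referer header value (or None if absent) and
    `own_hosts` is a set of lowercase hostnames that may embed the asset.
    A missing header is ambiguous: it may be a bookmark, a pasted URL,
    or a client that strips the header, so that case is configurable.
    """
    if not referer:
        return not allow_empty
    try:
        host = urlsplit(referer).hostname
    except ValueError:
        return True  # unparseable Referer: treat as foreign
    if host is None:
        return True
    return host not in own_hosts
```

The header is supplied by the client and is trivially spoofed or omitted, so this is a courtesy filter rather than access control.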
        
               | namibj wrote:
               | But for that it's enough to send it for same-origin
               | requests. No need to send it cross-origin, except for
               | tracking purposes.
        
               | iggldiggl wrote:
               | That'd still break the distinction between hotlinking and
               | the user using a bookmark or copy/paste to directly open
               | the URL in question.
        
               | h_anna_h wrote:
               | Letting the sites distinguish between the two does not
               | seem to be in the interest of the user.
        
             | malaya_zemlya wrote:
             | if you are making any sort of content or running a website,
             | it is really useful to know how people found you.
        
           | murph-almighty wrote:
           | For those reticent to click on their work computers but
           | morbidly curious, can someone describe the image?
        
             | sangnoir wrote:
             | It's a motivational-poster-type image with a white egg
             | holder in the foreground, but instead of an egg, it's
             | holding one exquisitely detailed hairy, caucasian ball[1].
             | At the top, the title is "HACKER NEWS" and the bottom text
             | is "A DDoS OF FINANCE-OBSESSED MAN-CHILDREN AND
             | BROGRAMMERS"
             | 
             | 1. Is there a collective biological term for scrotum _and_
             | its contents that is not general like "genitals" is?
        
           | sli wrote:
           | All I get is a scrolling hex editor looking thing. Maybe that
           | redirect has been disabled?
        
             | mey wrote:
             | You aren't sending a referrer header (a good thing).
        
             | pengaru wrote:
             | Yep, jwz has had a change of heart and sees today's HN as a
             | born again breath of fresh air.
        
               | geoelectric wrote:
               | I'm seeing the nut sundae on iOS mobile so I wouldn't get
               | too happy yet...
        
               | [deleted]
        
               | [deleted]
        
             | jaredsohn wrote:
             | Try from a new profile or incognito.
             | 
             | I saw the described image but after I visited the site
             | directly I couldn't see it any more when redirected via
             | hacker news. Saw it again when I opened an incognito tab.
        
           | Dylan16807 wrote:
           | I think he's the only one that uses that? Barely even worth
           | mentioning in comparison.
        
           | lxe wrote:
           | Just learned that this person owns DNA Lounge (and pizza?),
           | and is a founder (early contributor?) of Netscape and
           | Mozilla.org. I've lived and worked in that particular area of
           | SF for years and haven't known this.
        
             | hedora wrote:
             | Also, jwz is responsible for xscreensaver.
        
             | alsetmusic wrote:
             | One of my company's clients has a beautiful office right
             | above DNA Lounge (well, across the street or just adjacent
             | - it's been a while and I've only been there once). They
             | told me they can hear sound checks from their rooftop
             | patio.
        
             | m463 wrote:
             | netscape used to display a spinning compass when you put
             | about:jwz in the title bar
             | 
             | other good ones were about:1994 and about:mozilla
             | 
             | hey, about:mozilla still works in firefox
        
               | eythian wrote:
               | about:robots also works in Firefox, I know it's been
               | there for a long time but I have no idea if it was ever
               | in Netscape.
        
               | kbrosnan wrote:
               | about:robots is from the early Firefox releases. Pretty
               | sure it is from Firefox 3.0 development as you can find
               | the same robot in images when searching for Firefox Gran
               | Paradiso Robot.
               | 
               | https://www.google.com/search?q=firefox+gran+paradiso+rob
               | ot&...
        
             | tingletech wrote:
             | there used to be linux based public terminals in DNA lounge
             | too, IIRC
        
         | mey wrote:
         | A permanent redirect to a non-image page (owned by Wikimedia)
         | may achieve the same thing. Either the calling system can't
           | support an HTML response, or it's a webview in which case you
         | could either report an error or provide a notice. Maybe even
         | ask for donations :)
        
           | ed25519FUUU wrote:
           | Or just downsample the image to a reasonable size and deal
           | with it. Nothing inherently wrong with having a popular
           | image.
        
             | ehwhyreally wrote:
               | Yes there is when you are hotlinking. Hotlinking in general
               | is considered theft, you are using someone else's bandwidth
               | and could even ddos the host if you are not caching the
               | response.
        
               | [deleted]
        
               | concordDance wrote:
               | > Hotlinking in general is considered theft
               | 
               | This is a pretty puzzling idea to me. How could linking
               | something be theft?
               | 
               | To explore this, I shall try a metaphor. Imagine you're
               | on a big social media website (let's call it Programmer
               | Olds) which has an oddity in that 99% of its users use
               | adblock. You then post a link to another small (ad
               | supported) website on your Programmer Olds page, causing
               | a large number of people to click through and download
               | the page using large amounts of bandwidth (for no
               | monetary gain to the site) and possibly DDOSing the site.
               | 
               | Have you committed theft?
        
               | JCharante wrote:
               | > causing a large number of people to click through and
               | download the page using large amounts of bandwidth (for
               | no monetary gain to the site)
               | 
               | The difference here is that while a lot of users use
               | adblock, there are some that don't. These users can still
               | see the ads. Additionally even though it's a small
               | website, it may lead to new readers that stick around or
               | the content itself may even be sponsored.
               | 
               | The equivalent to hot linking a picture would be like
               | taking the content of a blog post without really linking
               | to the source, because there's no chance of conversions
               | there. If you're linking to the site itself then there's
               | a reasonable chance that users can convert.
               | 
               | So I suggest that it's theft just because the chance of
               | readers being converted is nil while you're using their
               | bandwidth.
        
               | lorenzhs wrote:
               | > This is a pretty puzzling idea to me.
               | 
               | That's because you're responding to an entirely different
               | issue. "Hotlinking" isn't linking to something, it's
               | including a resource that is hosted elsewhere. It's
               | putting <img
               | src="https://concordDance.whatever/images/big_image.jpg">
               | on _my_ website without asking you. Now if my site ends
               | up on the front page of HN, that could cause a lot of
               | traffic to _your_ site, potentially overwhelming your
               | server or increasing _your_ hosting bill. It's not nice,
               | and rightfully frowned upon.
        
               | concordDance wrote:
               | But from a loss and gain perspective it seems equivalent.
               | 
               | In both cases the site loses bandwidth for no gain due to
               | your actions.
        
               | yreg wrote:
               | No, it's not universally considered theft. Wikimedia
               | explicitly permits hotlinking[0]. So does xkcd, imgur and
               | tons of other sites.
               | 
               | Of course when someone doesn't want us to hotlink to
               | their assets then don't do it.
               | 
               | [0] https://commons.wikimedia.org/wiki/Commons:Reusing_co
               | ntent_o...
        
               | arbitrage wrote:
               | it's so easy to mitigate, though, that the fact that one
               | doesn't sorta implies that one might want randos from the
               | internet to use one's resources to view this image.
               | 
               | it's not theft if you leave it out for everyone to use.
        
               | JackWins wrote:
               | My garden doesn't have a fence, doesn't mean you can host
               | your picnic here.
        
               | dylan604 wrote:
               | No, but if I wander into your garden and "injure" myself,
               | I can sue you for damages. You will be held negligent for
               | not properly preventing other people from injuring
               | themselves on your property.
        
               | gokhan wrote:
               | Is this something real (in US, most probably)?
        
               | wruza wrote:
               | Here in Russia, if you leave poisonous chemicals like
               | methanol, etc, unmarked or put a bear trap in your locked
               | house behind a locked fence with a generic warning sign,
               | and then someone dies or gets injured by these, chances
               | are you will go to jail. Idk if this applies to
               | accidental traps like pools or rakes in grass. Same for
               | taking a knife out of an attacker's hand and stabbing them
               | back. (Yes, our laws protect criminals better than
               | citizens, not joking.)
        
               | rendall wrote:
               | Interesting. So if I understand this correctly, if
               | someone breaks into your house and gets injured, and they
               | can make a good case for some kind of negligence on your
               | part, then they can successfully sue you?
        
               | terramex wrote:
               | In Poland setting marked traps on your own, fenced
               | property is illegal and their owner is responsible for
               | any harm they cause, because there exist legal reasons to
               | enter another person's property - for example to fight
               | spreading fire.
               | 
               | However my favourite example is the law that allows any
               | bee keeper to enter any private property if they are
               | pursuing a fleeing bee swarm.
        
               | wruza wrote:
               | If a judge or an expert is sure that you intended this
               | outcome, and that someone is brave (or dead) enough to
               | admit their own crime.
        
               | rendall wrote:
               | It's also illegal to set a trap in your own home in the
               | US as well, decided when a property owner, tired of
               | people breaking into his property while he was away, set
               | up a shotgun booby trap that injured a burglar.
               | https://youtu.be/bV9ppvY8Nx4
               | 
               | I wasn't sure if it is the same or similar principle in
               | Russia or a different one that requires active care for a
               | burglar. Unlabeled chemicals causing liability for a
               | burglar seems extreme to me
        
               | varjag wrote:
               | This is an urban legend in Russia.
        
               | eythian wrote:
               | Leaving a bear trap goes way beyond negligence, it's
               | literally setting a trap. Similar with unmarked dangerous
               | chemicals, they're required to be marked for good reason.
        
               | h_anna_h wrote:
               | In Greece if a burglar dies while in your house you can
               | be held responsible, even more so if you have set up
               | traps.
        
               | pastech wrote:
               | In France (at least), all swimming pools are protected by
               | a fence. If you own a pool and don't put a fence around
               | it, you can be held responsible for a child drowning in
               | it.
               | 
               | It is possible this principle applies to other countries
               | and other things than pools.
        
               | nikanj wrote:
               | Yes, you can sue anyone for anything. Your suit probably
               | won't prevail, unless you have access to very expensive
               | lawyers and your opponent doesn't.
               | 
               | But you can totally sue anyone for anything, and that
               | makes for entertaining headlines - even if the plaintiff
               | loses promptly
        
               | zaarn wrote:
               | Wikimedia has a User-Agent policy which is being violated
               | here. Hence this is the property owner putting up a sign
               | that says "risk of injury", so if you walk in and injure
               | yourself, you only have to blame yourself for being
               | negligent.
        
               | h_anna_h wrote:
               | The policy is for how wikipedia will act when
               | encountering clients with certain user-agent headers, not
               | a rule for the clients.
        
               | zaarn wrote:
               | It's a policy how wikimedia acts when clients lack a user
               | agent header, it's therefore effectively a rule for
               | clients as without a proper UA header, they may be
               | blocked indefinitely.
        
               | AdrianB1 wrote:
               | Only in your dreams and some dumb countries, not in the
               | rest of the world.
        
               | dylan604 wrote:
               | You think this, but how much experience do you have with
               | it? People know that homeowners have insurance. They sue
               | to make the insurance pay out. It happened to my
               | neighbor. So you can make all of the dumb countries
               | comments you want, but it doesn't make it any less real.
        
             | m463 wrote:
             | I wonder if there's some way to have a frontend cache or
             | webserver shortcut that looks for that exact url and
             | blurts out the image?
             | 
             | Or maybe wikipedia is already mostly static.
             | 
             | also, I wonder if HN is inadvertently ddos'ing the ticket
             | system ?
        
         | ChrisMarshallNY wrote:
         | Like so?
         | 
         | http://ascii.textfiles.com/archives/1011
        
           | BlueTemplar wrote:
           | Brilliant.
        
         | peterkelly wrote:
         | This, perhaps disturbingly, was my first thought upon reading
         | the issue.
         | 
         | Things were done very differently back in the day. This problem
         | would have been fixed _real_ quick.
        
         | chrisjarvis wrote:
         | To the people who didn't grow up with 4chan: do not search for
         | this image, it's pretty disgusting.
        
           | isatty wrote:
           | Fairly sure you'd get goatse'd more often on Efnet etc back
           | in the day
        
           | askvictor wrote:
           | s/4chan/slashdot/
        
           | pengaru wrote:
           | 4chan didn't even exist yet when goatse emerged
        
             | deathlight wrote:
             | It seems plausible to me that the, ahem, "spread" of the
             | image was greatly increased through the efforts of 4chan.
        
               | Semaphor wrote:
               | Back in School, goatse was extremely well known. That was
               | several years before 4chan. I hadn't even heard of goatse
               | in a 4chan relation until now.
        
               | eyelidlessness wrote:
               | Maybe widespread but it was already pretty wide open
               | before there was a gap for 4chan to even exist.
        
               | themaninthedark wrote:
               | ** Kadmium changes topic to 'Our hearts are extended to
               | the 18 victims of the recent internet fraud'
               | 
               | http://bash.org/?434593
        
               | nomat wrote:
               | Hey I'm on that website! IRC used to be so fun and weird
               | back in the day. hanging out on slashnet took up most of
               | my free time in junior high.
        
             | incompatible wrote:
             | I think it was popularized back in the days of Slashdot.
        
               | tingletech wrote:
               | it does not date to alt.tasteless on usenet? (edit: w/r/t
               | goatse)
        
               | disillusioned wrote:
               | I was going to suggest Something Awful but you might win,
               | though Wiki pegs it (heh) at 1999...
        
           | eyelidlessness wrote:
           | To the people who grew up before 4chan, pls don't mention
           | tubgirl
        
             | BlueTemplar wrote:
             | Not having eyelids would certainly make it worse!
        
             | Rompect wrote:
             | I was born after 4chan was created and I found that image
             | on reddit. It's pretty mild; one can tell quickly that it
             | is a doll.
        
             | rootw0rm wrote:
             | lmao
        
             | sulam wrote:
             | 2 girls one cup
        
             | eyelidlessness wrote:
             | I missed the edit window and I'm disappointed in myself for
             | mentioning it by name. Please just don't Google this unless
             | you're prepared for an upsetting image, and even then maybe
             | just skip it. You're probably not as prepared as you think.
        
           | walrus01 wrote:
           | goatse significantly pre-dates 4chan
        
           | xwdv wrote:
           | what is it?
        
             | eyelidlessness wrote:
             | Big stretched open butthole. Not sure if you need the
             | warning but I'm commenting in case anyone would prefer not
             | to see it despite their curiosity.
             | 
             | Sorry to ruin the fun y'all but there's images I won't even
             | mention that I can't unsee and make me feel seriously ill
             | when I do see them. I don't want anyone else to feel that
             | way without warning.
        
               | xwdv wrote:
               | What are these images called so we know to avoid them?
        
               | drivebymy wrote:
               | There were quite a few, lemonparty and meatspin spring to
               | mind, and the various incarnations of "two x one y".
        
               | jml7c5 wrote:
               | Can't speak to the images themselves, but the sites are
               | usually referred to as "shock sites":
               | 
               | https://en.wikipedia.org/wiki/Shock_site
        
               | shagie wrote:
               | They were known as a "shock site" (
               | https://en.wikipedia.org/wiki/Shock_site )
               | 
               | The Wikipedia page for
               | https://en.wikipedia.org/wiki/Goatse.cx is text only and
               | without any ascii art.
               | 
               | I'm amused that
               | https://simple.wikipedia.org/wiki/Goatse.cx also exists.
        
               | arp242 wrote:
               | Oh, Goatse is _that_ site.
               | 
               | I remember when I was about 15, before pop-up blockers
               | were really a thing, someone sent me a link to that and
               | it would keep opening popups with that image and you
               | couldn't close all of them :-/
               | 
               | Sometimes people look back to the internet of the 90s
               | with too rose-coloured glasses IMO.
        
               | shagie wrote:
               | For some memories...
               | http://www.bash.org/?search=goatse&sort=0&show=25
               | 
               | I am personally most amused by #38659
        
               | eyelidlessness wrote:
               | Hey at least if you were on a 90s Mac your computer was
               | probably unresponsive and you could skip to the
               | inevitable force reboot. And browsers didn't save
               | sessions so you were in the clear as soon as you got to
               | tabula rasa.
        
               | eyelidlessness wrote:
               | I'm honestly not sure you're asking in good faith so I'm
               | not going to add more (and if you are asking in good
               | faith you've got plenty in responses to go on). Also I
               | never knew the name of the one that's permanently burned
               | into my brain and I'm so glad I don't.
        
           | ArchOversight wrote:
           | It's interesting that you equate goatse with 4chan! I'm old
           | :-(
        
         | bawolff wrote:
         | Why does it need to be fixed? The mission of wikimedia is to
         | serve educational content.
         | 
         | Edit: this is a bit unfair, if its a specific app they should
         | be convinced to cache just to avoid unfair resource usage, but
         | hotlinking in general should not be seen as a problem
        
           | _joe wrote:
           | Any for-profit entity hotlinking Commons is unfair. Heck,
           | they have the right to redistribute freely the image as they
           | see fit, instead of consuming resources that are a common
           | good.
           | 
           | But this goes beyond that - it's some blind check of internet
           | connectivity for the app, and doesn't get shown to the user.
           | We're pretty sure of that, given that with the amount of
           | noise that task generated, if there was an app featuring that
           | image at least one of the ~ 90M daily "views" would've been
           | someone reading these posts.
           | 
           | Now, given we want to be nice, we didn't just blindly block
           | the traffic, although making requests without user-agent is
           | against our UA policy
           | https://meta.wikimedia.org/wiki/User-Agent_policy
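
For illustration only, here is roughly what the blunt option that comment says was *not* taken could look like: refusing any request with no User-Agent header. The function names and the choice of a 403 status are assumptions made for this sketch, not Wikimedia's actual enforcement logic.

```python
from typing import Mapping


def has_user_agent(headers: Mapping[str, str]) -> bool:
    """Return True if the request carries a non-empty User-Agent header."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "user-agent" and value.strip():
            return True
    return False


def status_for(headers: Mapping[str, str]) -> int:
    """Hypothetical policy: 403 for requests without a User-Agent, else 200."""
    return 200 if has_user_agent(headers) else 403
```

A check like this is cheap, but it also hits every legitimate client that happens to omit the header, which is one reason to try contacting the app's authors first.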
        
           | unreal37 wrote:
           | Presumably they are paying for the servers/bandwidth to
           | support that, and that money is coming from donors.
           | 
           | It's a waste of donors money if someone is using this image
           | as some kind of "is this thing on" test using hacked
           | computers...
        
             | _joe wrote:
             | It's both a waste of donor money and a starvation of
             | resources for people actually consulting images on
             | wikimedia commons.
        
             | arbitrage wrote:
             | i'm sure the revenue model is robust enough to accommodate
             | spikes in traffic.
        
         | walrus01 wrote:
         | This is exactly what I used to do about 17 years ago.
        
         | jessaustin wrote:
         | A red flower rather than a lavender one.
        
         | jobigoud wrote:
         | If it's just used internally by an app to test connectivity as
         | suggested in another subthread, this wouldn't solve the
         | problem.
        
         | Blikkentrekker wrote:
         | I am a conservative man of tradition; fuck these modern liberal
         | commies and their new age ways calling their degenerate
         | diplomatic solutions "enlightenment".
        
         | [deleted]
        
       | macawfish wrote:
       | Tragedy of the commons
        
       | blakesterz wrote:
       | They did figure it out, a popular chat app in India (they won't
       | name yet) fetches the image but does not display it.
       | 
       | https://phabricator.wikimedia.org/T273741#6815828
        
       | owlninja wrote:
       | >>Thank you everyone for the comments and suggestions. I just
       | wanted to share that we have identified the app and will update
       | this task tomorrow. (And yes, it is a mobile app.)
       | 
       | Looks like we will know soon.
        
       | aquadrop wrote:
       | It's not 20% of all requests, it's 20% of media requests to one
       | of the clusters (it's said in the issue description) There are 5
       | clusters.
        
       | newsclues wrote:
       | Example code that gets copy pasted into production app somewhere?
        
         | [deleted]
        
         | [deleted]
        
         | rovr138 wrote:
         | It's a possibility, and it's linked in a couple of places. They
         | found examples
        
       | rburhum wrote:
       | so what is the app? I felt like I read a full blown novel, and
       | the last sentence with the conclusion is missing!
        
         | Liquid_Fire wrote:
         | > we have identified the app and will update this task tomorrow
         | 
         | I guess we will find out tomorrow.
        
       | gertlex wrote:
       | After realizing "wiki[p|m]edia" and "flower" triggered a specific
       | image in my head I was guessing it would be a yellow flower, this
       | one in the corner of https://www.mediawiki.org/wiki/MediaWiki but
       | nope, more interesting than that!
        
         | jakoblorz wrote:
         | Yes OMG! Didn't expect this one
        
         | doovd wrote:
         | Same here dude. Was sad when it turned out NOT to be the yellow
         | flower!
        
       | simonebrunozzi wrote:
       | Here's the flower in question [0].
       | 
       | [0]:
       | https://upload.wikimedia.org/wikipedia/commons/thumb/1/16/As...
        
         | danparsonson wrote:
         | Posting the very link they're trying to reduce traffic on
         | doesn't seem like a very helpful thing to do
        
           | simonebrunozzi wrote:
           | I bet everyone on HN would have wanted to take a look
           | eventually.
        
           | smcl wrote:
           | They were trying to figure out the root cause of a sudden
           | uptick of _millions_ of requests being made for a given image
           | with no user agent or referer, presumably with a view to
           | notifying the app responsible or figuring out a workaround.
           | 
           | A few thousand requests clearly identifiable as coming from
           | browsers _and_ with a referer header from
           | news.ycombinator.com would not exactly interfere with this
           | and in the grand scheme of things isn't a huge burden in
           | terms of network traffic.
        
       | astrea wrote:
       | My initial thinking is it's in some flower recognition dataset.
        
         | stevefolta wrote:
         | Or it's India's canonical non-hotdog.
        
       | shp0ngle wrote:
       | According to the comments, it's probably an Indian TikTok clone
       | that checks internet connection by downloading the picture.
        
       | scrps wrote:
       | They didn't mention if the 90M requests were unique, perhaps some
       | app doing background refresh and not caching that image?
        
       | siltpotato wrote:
       | First thing I think of seeing just the headline: something to do
       | with ML datasets?
       | 
       | Now that I've read it: Hmm, never heard of phabricator.
        
       | V-2 wrote:
       | _" Please avoid adding drive-by comments such as "hello from
       | Hacker News" to this task as they are not helpful. Thank you"_
       | 
       | Why would anyone do such stuff is, as usual, beyond me...
       | 
       | PS. "First!"
        
         | tyingq wrote:
         | The funny thing is, the first instance of that in the thread
         | wasn't "hello from hacker news". It was a "hello to hackernews"
         | from an engineer on the WikiMedia team.
         | https://phabricator.wikimedia.org/T273741#6813995
        
           | Dumbdo wrote:
           | And that comment was removed by the author a few minutes ago.
        
         | calibas wrote:
         | 999 out of a 1,000 people know better, but when there's
         | thousands of people...
        
         | vultour wrote:
         | Even more curiously, the person that did that registered an
         | account and put up a profile picture (I assume of himself),
         | just for that comment...
        
       | bpicolo wrote:
       | Huh, I worked on a site with a similar issue in ~2019. A massive
       | flood of traffic for a single site from Indian mobile apps
       | (~15kqps at peak iirc).
       | 
       | I think it ended up being a sort of mobile-based botnet with a
       | bizarre target, which luckily was deduced from some of the
       | headers sent (they all had a random common header).
        
         | smcl wrote:
         | What's "kqps"? It's obviously kilo-{somethings} per second, but
         | I don't know what the {something} would be
         | 
         | edit: queries?
        
           | bpicolo wrote:
           | Yep, queries
        
           | dailypeeker wrote:
           | 15000 queries per second
        
         | downrightmike wrote:
         | Saw a story recently that Indians were bringing down the
         | internet because of sending good morning messages:
         | https://www.wsj.com/articles/the-internet-is-filling-up-beca...
         | 
         | I'd bet that this is the flower of the week for them.
        
           | nso wrote:
           | Says it's been going since last June.
        
       | Codesleuth wrote:
       | Now you just need to draw attention to it further by posting it
       | in hacker news. I'm sure none of us are curious to immediately
       | see the picture.
        
       | Donckele wrote:
       | " Thank you everyone for the comments and suggestions. I just
       | wanted to share that we have identified the app and will update
       | this task tomorrow. (And yes, it is a mobile app.)"
        
       | airhead969 wrote:
       | Those goddamn white Hare Krishnas are in the airport again,
       | handing out flowers no one wants.
       | 
       | Also: What's your vector, Victor?
        
       | kayxspre wrote:
       | I remember that MediaWiki installation allowed the configuration
       | that essentially permits the use of Commons files, albeit in that
       | case, the file will be downloaded and cached in the Wiki's own
       | server [1].
       | 
       | That being said, though the image wasn't hotlinked directly, they
       | expressed concerns of DDOS and the possible costs the Foundation
       | has to incur from each load (they even pointed out that it's
       | "fair and reasonable" to point donation link to them).
       | 
       | I would be interested to see how the licensing issue will be
       | handled, though. The photographer licensed this photo as GFDL/CC
       | BY-SA 3.0 [2], and hotlinking may break the term of these
       | licenses.
       | 
       | 1: https://www.mediawiki.org/wiki/InstantCommons
       | 
       | 2: https://commons.wikimedia.org/wiki/File:AsterNovi-belgii-
       | flo...
        
       | nos4A2 wrote:
       | Perhaps they can replace the image with an obviously wrong image
       | (and smaller in size), and then wait for someone to complain
        
       | bombcar wrote:
       | I had some random images on a web server years ago - and noticed
       | that something like 99% of my traffic was one image - and
       | searching through referrers I realized I was the #1 hit on google
       | images for robot attack cat.
       | 
       | Simpler times.
        
         | Tijdreiziger wrote:
         | Can we see the image? :D
        
           | ConcreteGidget wrote:
           | Yes, but only 10 million times.
        
           | bombcar wrote:
           | https://imgur.com/8MMET5V - now there are companies that can
           | host things FOR my servants!
        
             | BlueTemplar wrote:
             | Anyone knows how imgur is able to afford that ?
        
               | kingnothing wrote:
               | ads
        
               | BlueTemplar wrote:
               | Even though ad-blockers are ever more common ? (I didn't
               | even _think_ about ads, since I never saw them on imgur
               | !)
        
               | kingnothing wrote:
               | It's the same way Google pulled in $180B last year and
               | Facebook made $86B.
        
               | kristianp wrote:
               | Low cost CDNs like Cloudflare.
        
         | guessbest wrote:
         | You got to have hotlink protection on when you are hosting
         | memes. I've learned this the hard way, too.
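
A rough sketch of what Referer-based hotlink protection usually amounts to, assuming a hypothetical allow-list `ALLOWED_HOSTS` of the site's own hostnames. It would not have caught the traffic discussed in this thread, which reportedly arrived with no referer at all.

```python
from typing import AbstractSet, Optional
from urllib.parse import urlsplit

# Hypothetical allow-list: the hosts permitted to embed our images.
ALLOWED_HOSTS = frozenset({"example.org", "www.example.org"})


def is_hotlink(referer: Optional[str],
               allowed_hosts: AbstractSet[str] = ALLOWED_HOSTS) -> bool:
    """Return True if the Referer names a host outside the allow-list."""
    if not referer:
        # Many clients send no Referer; treat those as direct requests.
        return False
    host = urlsplit(referer).hostname  # lowercased by urlsplit, or None
    # An unparseable Referer is treated as a hotlink (a design choice).
    return host is None or host not in allowed_hosts
```

A server would typically answer hotlinked requests with a 403 or a small placeholder image instead of the original file.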
        
           | globular-toast wrote:
           | Remember when some sites would send you a shock image instead
           | of the one you were expecting if it detected hot linking? I
           | don't miss that.
        
             | mcintyre1994 wrote:
             | There's a site that's occasionally posted in comments here
             | which is apparently run by someone who hates HN because
             | they serve some image I can't remember to readers from
             | here.
        
               | alickz wrote:
               | Yeah it was posted not long ago. When it sees HN as the
               | referrer it shows a picture of a testicle in an egg cup.
        
               | [deleted]
        
         | berkes wrote:
         | I had a similar issue. Some 15+ years ago, an image from my
         | blog showed up for people who searched the phrase 'Peanubutter
         | Sex'. The image had nothing to do with peanutbutter nor with
         | sex. My blog is SFW. It was some screenshot of KDE IIRC.
         | 
         | For almost a week it remained the most requested image, and the
         | post on which it appeared the most popular.
         | 
         | It did make me uncomfortable, though. Fearing that my rankings
         | would plummet or so.
         | 
         | My takeaway is nothing new: there are weirdos online.
        
           | kristianp wrote:
           | So where did the connection between "Peanubutter Sex" and
           | your blog come from? Did you ever find out?
        
       | Zaheer wrote:
       | What's the right way to solve for this generally? A CORS policy
       | wouldn't be effective since it's not a browser requesting the
       | image.
        
         | Ayesh wrote:
         | It's likely that the consumer doesn't even display that image.
         | It's probably a dry 1mb download test.
         | 
         | Most Indian ISPs, even mobile ones, are cheap enough that a 1mb
         | download doesn't matter.
        
         | speedgoose wrote:
         | If the image is not displayed and the stack can handle a lot of
         | tcp connections, perhaps a reverse http slowloris attack : you
         | send the image response headers as slow as you can to keep the
         | tcp connection open but to make the receiver waste its time.
         | 
         | If it's a speed test, they will eventually use another image.
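
The "send the response headers as slow as you can" idea can be sketched as a generator that drips a payload out one byte at a time. The name `drip` and the injectable `sleep` parameter are assumptions for this illustration; a real server would also need to hold many such connections open cheaply, which this does not address.

```python
import time
from typing import Callable, Iterator


def drip(payload: bytes,
         delay_s: float,
         sleep: Callable[[float], None] = time.sleep) -> Iterator[bytes]:
    """Yield `payload` one byte at a time, pausing `delay_s` between bytes."""
    for i in range(len(payload)):
        if i:
            # Pause before every byte except the first.
            sleep(delay_s)
        yield payload[i:i + 1]
```

Each yielded chunk would be written to the client socket, so a header block of a few dozen bytes can keep the connection occupied for minutes with a delay of a few seconds.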
        
       ___________________________________________________________________
       (page generated 2021-02-09 23:01 UTC)