20% of requests for Wikimedia Commons are for one image of a flower
Author : IfOnlyYouKnew
Score : 1298 points
Date : 2021-02-08 23:53 UTC (23 hours ago)
(HTM) web link (phabricator.wikimedia.org)
(TXT) w3m dump (phabricator.wikimedia.org)
| andi999 wrote:
| Title is misleading. It is 20% of requests to the eqsin cluster
| located at Changi airport.
| tassu wrote:
| It isn't _at_ the airport, that's just the closest airport. The
| WMF names its clusters with the initials of the data center
| vendor and the closest airport's code.
| andi999 wrote:
| True. Hahaha
| mlester wrote:
| anyone check if it's steganography?
| dane-pgp wrote:
| Nice idea, but if it's getting 90 million requests per day,
| then either there are a lot of people requesting the same
| message (so it's not very secret), or the few people requesting
| it are very forgetful (in that they have to keep re-requesting
| the same message multiple times per day).
|
| I suppose the contents of the image/"message" could change
| every day, but presumably that would be very obvious in the
| edit history of that file[0], unless Wikimedia Commons were
| suppressing the fact that the file is constantly changing. If
| they are part of the conspiracy, though, you'd think they would
| have taken down the task from Phabricator too.
|
| [0] https://commons.wikimedia.org/wiki/File:AsterNovi-belgii-
| flo...
| jacquesm wrote:
| At the height of the browser wars I once woke up to Microsoft
| hotlinking a small button for downloading our software from the
| MSN homepage. I tried to reach someone there for hours but nobody
| cared enough to do something about it. The image was small (no
| more than a few K), but the millions of requests that page got
| were enough to totally kill our server.
|
| Finally, I replaced the image on there with a 'Netscape Now'
| button. Within 15 minutes the matter was resolved.
| rdescartes wrote:
| One of my friends used the same strategy to block DDOS from
| China : just put "Falun Gong" on there and it was resolved
| instantly.
| himinlomax wrote:
| I remember someone doing that with the goatse picture. The
| hotlinker was pissed and all sorts of amusing drama ensued.
| MisterTea wrote:
| That was popular in the early ebay days when you had to
| host your own images. A friend had someone selling similar
| items using his image links. So he changed the images to
| goatse. Problem solved.
| gipp wrote:
| The Tribalwar forums did this to CNN after 9/11, CNN had
| hotlinked one of those images where people were trying to
| pick out "demon faces" in the smoke
| daveslash wrote:
| That was exactly how I learned what goatse was. My MySpace
| page was all decked out with images that I was hotlinking
| from some server... The server owner realized this and
| replaced all the images with Goatse. One day a friend goes
| _" Hey... uh, what's up with your MySpace page... that's
| pretty gross"_. So I went to log in: Goatse. Goatse
| _everywhere_ (gestures with hand). And my eyes were never
| the same again tth_tth
|
| Edit: grammar.
| thaumasiotes wrote:
| > One of my friends used the same strategy to block DDOS from
| China : just put "Falun Gong" on there and it was resolved
| instantly.
|
| ...because attackers from China are horrified at the thought of
| disrupting Falun Gong?
| dspillett wrote:
| Because it is one of the things that will get you added to
| the blocklists that form part of the Great Firewall of
| China.
|
| It won't stop a hacker who is probably bypassing parts of
| that anyway, but the more casual requests such as those
| caused by deep linking will generally stop getting through.
| snoshy wrote:
| The old school response, weaponized without being
| inappropriate.
| rattray wrote:
| That's hilarious!
|
| Did they continue to link to your software after that? (I'm
| curious - what was your software?)
| jacquesm wrote:
| Yes, they did, they actually thought it was quite funny. They
| even cached the actual download once they realized we
| wouldn't be able to deal with that either. The software was
| the first version of the public peer-to-peer webcam software
| I wrote:
|
| http://web.archive.org/web/20000510010712/http://www.camarad.
| ..
| phinnaeus wrote:
| Oh my! This is a blast from the past. I was a kid, probably
| 10 years old or something, and I had a LEGO MovieMaker
| webcam. I was trying to set it up as a sort of
| security/monitoring camera for the back door of the small
| business my parents ran. I remember using this software and
| supposedly getting it working.
|
| I invited my parents to come see what I had done, and
| somehow typed the website wrong and ended up on a spanish-
| language porn site. I could not hit the back button fast
| enough. Possibly one of the most embarrassing memories of
| my childhood.
|
| I have no idea what my parents thought I was up to.
| alickz wrote:
| Haha I know that pain.
|
| When I was a kid I asked my mom to print me out Grand
| Theft Auto cheats from Gamewinners.com while she was in
| work.
|
| Somehow I got the address wrong and she wanted to know
| why I wanted to print out pages and pages from a site
| dedicated to men cheating on their wives. Got there in
| the end though and I still have some of those GTA cheats
| memorised.
| acct776 wrote:
| Your mom might have another family in the Greater Toronto
| Area now, just so you are aware!
| prawn wrote:
| Not sure if you're aware, but it's interesting that you
| mention Lego as the person you're responding to once
| accidentally bought literally tons of bulk Lego and later
| designed an automated Lego sorting machine. It's a fun
| read:
|
| https://jacquesmattheij.com/sorting-two-metric-tons-of-
| lego/
| jacquesm wrote:
| Heh. Hilarious story, thank you! Camarades.com had just
| about everything, from people being born to people dying
| and everything in between. It was a pretty honest
| (sometimes brutally honest) slice of life.
|
| One of the most popular cams for years was an old person
| that was extremely ill and that rarely moved but he had a
| pretty big fan club and he thought it was quite funny that
| he was more famous on what eventually became his deathbed
| than he had ever been while he was still active. After he
| died his family asked to remove all the images and close
| the account which of course we did. Makes you wonder if
| all those people wishing him well over the years kept him
| going a bit longer. What is interesting is that if you
| did this today I'm pretty sure the jerks would drown out
| the nice people by a considerable margin, of course there
| were jerks back then as well, but on the whole the
| internet seemed to be a much nicer place to hang out than
| it is today.
| [deleted]
| macintux wrote:
| My then-wife was watching over my shoulder once as I
| typed something into the address bar. "Freshmeat.net"
| auto-completed, drawing a suspicious look from her.
| xrisk wrote:
| How is this on the Wayback machine?!
| loktarogar wrote:
| You can click "about this capture" for more information
|
| > Starting in 1996, Alexa Internet has been donating
| their crawl data to the Internet Archive. Flowing in
| every day, these data are added to the Wayback Machine
| after an embargo period.
| Thorrez wrote:
| Fun fact: Amazon's home assistant was named after Alexa
| Internet. Amazon owns Alexa Internet.
| phpnode wrote:
| it's not named after it, it's just amazon is so massive
| they have to reuse brand names. AWS has exhausted not
| only the supply of IPv4 addresses but also the supply of
| 3 letter initialisms.
| aasasd wrote:
| Your 'Netscape Now' pic is a whopping 1.9 KB--could probably
| be optimized quite a lot, if for some weird reason the GIF
| didn't have jpeg-y artifacts on the background and a ton of
| blur on the text. Basically, you've brought that DoS on
| yourselves.
| DonHopkins wrote:
| As pioneer of "<something> On Internet", do you regret not
| turning out like Russ Hanneman? ;) (OR DID YOU???!)
|
| https://www.youtube.com/watch?v=BzAdXyPYKQo&ab_channel=yate
| 5...
|
| https://silicon-valley.fandom.com/wiki/Russ_Hanneman
|
| I'm just glad I didn't turn out like Erlich Bachman! (OR
| DID I???!)
|
| https://www.reddit.com/r/SiliconValleyHBO/comments/4jmlv9/w
| h...
|
| https://silicon-valley.fandom.com/wiki/Erlich_Bachman
| vidarh wrote:
| I've finally gotten around to watching this series, and
| it's disturbing how many moments I've watched that were
| more familiar than they should have been, and too many
| characters I could instantly put a real name to....
| rattray wrote:
| Beautiful. The internet was a truly different place back
| then...
| jacquesm wrote:
| With 100K visitors / day or so we were in the top 30
| websites world wide in 1998. The really big boosts came
| from the Space Shuttle webcasts and an Yves St. Laurent
| fashion show webcast from Paris.
|
| Hard to believe now, a typical blog post will already
| pick up 30K visitors without too much trouble.
| NetOpWibby wrote:
| I could listen to stories of the Old Net all day.
| jacquesm wrote:
| Enjoy:
|
| https://jacquesmattheij.com/story-behind-wwcom-
| camaradescom/
|
| And apologies for the non-working images.
| gowld wrote:
| Serves you right for hotlinking ;-)
| jacquesm wrote:
| Yes, but at least it was my own domain :)
|
| I didn't see that consequence coming when camarades.com
| shut down. I really should dig up those images and repair
| the blog but the todo list isn't really getting any
| shorter on this end.
| aembleton wrote:
| Back around 2002, I had a pdf icon on my website. It got deep
| linked by a few others but the number one source of traffic
| came from the website of a lawyer who specialised in
| intellectual property. There was something on there about how
| it was illegal to deep link.
|
| I was tempted to replace with goatse but I think I just changed
| it to a screenshot of his website saying that it was illegal to
| deep link.
|
| It soon got changed.
| gumby wrote:
| Even though it's not illegal!
| jacquesm wrote:
| That's a neat example of recursion :)
| failrate wrote:
| We used something like this technique back in the Flash days.
| Sites would straight up steal your games, so one defense was to
| have the game grab its sprites from a server local endpoint.
| Thieving sites would get either no graphics or deliberately
| corrupted graphics.
| ramraj07 wrote:
| I'm in India now, is it possible for me to install some traffic
| snooper and monitor if any wikimedia requests go out? I can then
| install some popular apps and see if anything bites!
| rozab wrote:
| This page someone noticed is very interesting.
|
| https://newshimalaya.com/2021/02/09/%E2%9A%93-t273741-invest...
|
| I was sure I'd seen this website before, and sure enough, it's
| scraping and rehosting almost everything that's posted on HN...
| bottled_poe wrote:
| Nothing unusual here, just run of the mill online copyright
| violation.
| BlueTemplar wrote:
| Yeah, here's the unusual one :
|
| http://n-gate.com/
| ip26 wrote:
| 145kB for a connectivity check, ouch. This is a poster child for
| why many apps guzzle so much data.
|
| (On a 500MB/mo plan you start noticing)
| eyelidlessness wrote:
| You gotta respect the suggested approach to take preventive
| measures by banning requests to this individual image without a
| User Agent header _and_ to try to identify who might be affected.
| I'm sure I'm not the only one here who would just treat it as
| abuse and ban without followup.
| dr_dshiv wrote:
| This is happening now! Some suspect a failure in a CV training
| pipeline. Others suggest an extremely popular app with a
| hotlinked image.
| magicalhippo wrote:
| If it was my site I'd replace the image with goatse and see who
| complains, but I guess that's a bit drastic for Wikimedia.
| waheoo wrote:
| I'd give them a pass.
| ISL wrote:
| Even just serving a giant blinking red X gif to 10^-6 of the
| requests might be sufficient.
|
| Only 10^-6 of the "legitimate" requests would be affected,
| but a whole lot of the "undesirable" requests would see
| it...
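|
| A minimal sketch of the sampling idea above, in Python. The function
| name and the "warning"/"original" labels are made up for illustration;
| the 90,000,000 requests/day figure is the one quoted elsewhere in this
| thread.
|

```python
import random

REQUESTS_PER_DAY = 90_000_000  # figure quoted in the thread
SAMPLE_RATE = 1e-6             # fraction of requests that get the warning image


def pick_image(rng: random.Random, rate: float = SAMPLE_RATE) -> str:
    """Return 'warning' for roughly `rate` of calls, 'original' otherwise."""
    return "warning" if rng.random() < rate else "original"


# Expected number of warning images served per day at this rate.
expected_warnings_per_day = REQUESTS_PER_DAY * SAMPLE_RATE
```

| At that rate the image would be swapped about 90 times a day, all of
| them landing on whoever generates the bulk of the traffic.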
| viraptor wrote:
| Serving a giant blinking red gif to unsuspecting internet
| users is a bad idea.
| https://en.wikipedia.org/wiki/Photosensitive_epilepsy
|
| There was a better idea posted in comments - serve a
| picture with a very short explanation and an email to
| contact.
| EE84M3i wrote:
| There is no reason it needs to blink quickly.
| Taniwha wrote:
| But he's suggesting that they only be served to 10^-6
| people - that's one 1,000,000th of a person - I suspect
| it will have little effect
| hinkley wrote:
| Tarpit the image and it will take care of itself.
|
| Same advice I gave a w3c.org admin who was lamenting how much
| traffic people generate by not caching xml schemas. Yes, you
| have to serve the requests. But you don't have to try to
| serve them in 100 ms. If a human is on the other end, 1-2
| seconds is just fine. If a human is not, then the human will
| surely notice when their batch process goes from 3 minutes to
| 10 minutes because it fetches the same schema 200 times.
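|
| A minimal sketch of the tarpit described above, using Python's asyncio.
| The 2-second delay, the handler and the payload are assumptions for
| illustration, not what w3c.org or Wikimedia actually run.
|

```python
import asyncio
import time

TARPIT_DELAY_S = 2.0  # assumed delay; a human barely notices, a batch job does


async def tarpitted(payload: bytes, delay: float = TARPIT_DELAY_S) -> bytes:
    """Return the payload only after sleeping.

    asyncio.sleep yields to the event loop, so other connections keep
    being served while this one waits.
    """
    await asyncio.sleep(delay)
    return payload


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Hypothetical HTTP handler: read the request head, stall, then answer."""
    await reader.readuntil(b"\r\n\r\n")
    body = await tarpitted(b"<schema/>")
    head = b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\nConnection: close\r\n\r\n" % len(body)
    writer.write(head + body)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def demo(n: int = 200, delay: float = 0.05) -> float:
    """Run n tarpitted responses concurrently; return wall-clock seconds."""
    start = time.monotonic()
    await asyncio.gather(*(tarpitted(b"x", delay) for _ in range(n)))
    return time.monotonic() - start
```

| The handler could be wired up with asyncio.start_server(handle, host,
| port). Concurrent clients all finish after roughly one delay; a client
| that fetches the same file 200 times in sequence pays the delay 200
| times.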
| rictic wrote:
| Could you end up executing a slow loris style attack on
| yourself by doing this?
|
| I guess a couple seconds won't matter unless the server is
| already redlining it and the tarpitted traffic is a small
| proportion.
| toast0 wrote:
| Yes, but slowloris isn't really a big deal if you've got
| a modern http(s) server with async i/o. It costs nearly
| nothing to have an idle connection while waiting 3 seconds
| before sendfiling the schema xml.
| mekkkkkk wrote:
| Can you not run out of sockets though? I know it used to
| be a thing anyway. Maybe it's handled somehow nowadays.
| toast0 wrote:
| You can run out of sockets, but that's easy to tune. I
| don't know the limits on other systems, but FreeBSD lets
| you set the maximum up to Physical Pages / 4 with just
| boot time settings. So about 1 million sockets per 16 GB
| of ram.
|
| Worst case, if you start running out of sockets because
| you're sleeping, sample the socket count once a second
| and adjust sleep time to avoid hitting the cap. Also, you
| could use that sampling to drive decisions about keeping
| http sockets open or closed.
|
| I should add, select on millions of sockets is going to
| suck; so you'll need kqueue/epoll/whatever your kernel's
| select-but-better interface is.
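|
| The socket arithmetic and the "adjust sleep time" idea above can be
| sketched in Python. The 4 KiB page size, the 80% high-water mark and
| the linear back-off are assumptions chosen for illustration; the
| comment only gives the pages / 4 bound and the sampling idea.
|

```python
PAGE_SIZE = 4096  # bytes; assumes 4 KiB pages


def max_sockets(ram_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Upper bound quoted above: physical pages / 4."""
    return (ram_bytes // page_size) // 4


def adjusted_sleep(open_sockets: int, cap: int, base_sleep: float = 3.0,
                   high_water: float = 0.8) -> float:
    """Shrink the tarpit sleep as the socket count approaches the cap.

    Hypothetical policy: full sleep below the high-water mark, then a
    linear ramp down to zero sleep at the cap.
    """
    usage = open_sockets / cap
    if usage <= high_water:
        return base_sleep
    if usage >= 1.0:
        return 0.0
    return base_sleep * (1.0 - usage) / (1.0 - high_water)
```

| With 16 GiB of RAM, max_sockets gives 1,048,576, which matches the
| "about 1 million" figure. A once-a-second sampler would call
| adjusted_sleep with the current socket count and use the result for
| new connections.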
| hinkley wrote:
| Well any time you start yanking levers and spinning dials
| you'd better know where the breaking points in your
| system are.
|
| If you care about the traffic because you're already
| having trouble with that many simultaneous requests, then
| you are definitely not going to solve that problem by
| increasing the response time by a factor of 10.
|
| But an important property of reverse proxies is that once
| the proxy sees the last byte of the response, the
| originating server is no longer involved in the
| transaction. The proxy server is stuck ferrying bits over
| a slow connection, and hopefully is designed for that
| sort of work load. If the payload is a static file, as it
| is in both of these cases, then it should be cheap for
| the server to retrieve them.
| Clewza313 wrote:
| It can't be a training pipeline, because the IPs are all around
| India.
|
| Sample code from Stack Overflow being used by some major app is
| the most likely candidate. It's also possible that the image
| fetch call is a vestigial appendix that doesn't even display
| the image, which will make tracking this down extra
| challenging.
| eli wrote:
| Perhaps a very inefficient "check if we have working internet
| access" routine.
| rootw0rm wrote:
| i hate it when vestigial appendices go awry. pretty sure the
| evidence is mounting, however
| m3kw9 wrote:
| Maybe some sort of social network using it as the default profile
| pic and isn't caching
| [deleted]
| [deleted]
| blunte wrote:
| And now thanks to this HN post, 21% of requests are for that same
| flower!
| Mattwmaster58 wrote:
| I was curious what the actual amount of requests HN would have
| to muster, and with a frequency of 90,000,000 reqs/day, HN
| would need to hit it with 4,500,000 requests.
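|
| A quick check of that estimate in Python, assuming the 90,000,000
| requests/day are the flower image's traffic and that this is exactly
| 20% of the total (both figures come from this thread). 4,500,000 is 1%
| of the current total; if the extra requests also count toward the
| total, moving the share from 20% to 21% takes somewhat more.
|

```python
flower_per_day = 90_000_000   # figure quoted in the thread
flower_share = 0.20           # headline figure
total_per_day = flower_per_day / flower_share


def extra_requests_for(target_share: float) -> float:
    """Extra flower requests x such that (flower + x) / (total + x) == target."""
    return (target_share * total_per_day - flower_per_day) / (1 - target_share)


one_percent_of_total = 0.01 * total_per_day
needed_for_21_percent = extra_requests_for(0.21)
```

| Under these assumptions the total is 450 million requests/day, 1% of
| that is 4.5 million, and reaching a 21% share takes roughly 5.7 million
| extra requests per day.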
| HotVector wrote:
| HN ain't that big
| mitchs wrote:
| Heh, a popular consumer electronics product a room mate worked on
| shipped an update that used example.com as a connectivity test.
| Apparently they were on pace to rack up $20k/month in server
| costs. At least their user agent made it obvious who to contact.
| fariss wrote:
| can this be some sort of botnet checking whether a host is
| connected to the internet or not?
| andrewmatte wrote:
| 90M requests daily from India? I wonder if KaiOS is checking
| whether it's got internet access.
| batch12 wrote:
| Superficial reversing shows that the ravn app mentioned,
| com.app.rcn may use the file as part of a speedtest:
|
| com.app.rcn/smali/com/app/rcn/utils/InternetSpeedCalculator.smali
| : "hxxps://upload.wikimedia.org/wikipedia/commons/1/16/AsterNovi-
| belgii-flower-1mb.jpg"
|
| edit: defanged the link to maybe save the wikimedia team some
| bytes
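|
| For context, a speed test of this kind usually downloads a file of
| known size and divides by the elapsed time. A minimal Python sketch of
| that calculation follows; it is an assumption about the general
| technique, not a reconstruction of what InternetSpeedCalculator
| actually does, and the function names are made up.
|

```python
import time
from typing import Callable


def throughput_mbps(num_bytes: int, seconds: float) -> float:
    """Megabits per second for num_bytes transferred in `seconds`."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes * 8 / seconds / 1_000_000


def timed_download(fetch: Callable[[], bytes]) -> float:
    """Time a caller-supplied fetch() and return the measured Mbit/s."""
    start = time.perf_counter()
    data = fetch()
    elapsed = time.perf_counter() - start
    return throughput_mbps(len(data), elapsed)
```

| Every call to such a routine pulls the whole file again, which is how
| one test image can end up with tens of millions of requests a day.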
| [deleted]
| wiz21c wrote:
| FTB (from the bug) :
|
| You could even serve another image in its place to this UA, with
| some text and an email address to contact. You'd probably find
| out pretty quickly what it is from users of that mysterious
| thing. A throwaway email address is probably best
|
| Really good idea :-)
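|
| A sketch of that suggestion combined with the "no User-Agent header"
| filter mentioned earlier in the thread, in Python. The function name
| and the "notice"/"original" labels are hypothetical, and the path is
| the one from the URL quoted in this thread; headers are assumed to be
| already normalized to the "User-Agent" spelling.
|

```python
from typing import Mapping

# Path of the image, taken from the URL quoted in the thread.
TARGET_PATH = "/wikipedia/commons/1/16/AsterNovi-belgii-flower-1mb.jpg"


def choose_response(path: str, headers: Mapping[str, str]) -> str:
    """Decide which body to serve: the real file or a contact notice.

    Only requests for the one hot image that arrive without a usable
    User-Agent get the notice; everything else is untouched.
    """
    if path != TARGET_PATH:
        return "original"
    user_agent = headers.get("User-Agent", "").strip()
    if not user_agent:
        return "notice"
    return "original"
```

| The "notice" body would be the replacement image carrying a short
| explanation and a throwaway contact address.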
| crazygringo wrote:
| I'll just be looking forward to the follow-up post on HN
| announcing when they figure out what the culprit was!
|
| Per the comments, right now the top suspect seems to be the app
| "Josh" or another TikTok clone because of how traffic surged
| immediately after the TikTok ban:
|
| https://twitter.com/bwaber/status/1358915338637873154
| IfOnlyYouKnew wrote:
| Wikimedia is unique in running some of the most popular websites
| with open access to almost all systems. As someone who has never
| been on the inside of FAANG, I found it rather interesting to
| browse around the backend infrastructure.
|
| See, for example, their statistics at
| https://grafana.wikimedia.org/d/000000102/production-logging...
| dalbasal wrote:
| In interviews with Jimmy Wales, he seems somewhat regretful of
| not having made Wikipedia a for-profit. At the least, he's
| fairly adamant that Wikipedia could have been Wikipedia as a
| for profit.
|
| The way he structured wikipedia, from back-end infrastructure
| to ownership/governance structure was just the logical way of
| doing the project. Times were different. Online culture was
| different.
|
| I don't want to overinterpret the man, or put words in his
| mouth... but... I got the impression that Wales thinks that if
| he was starting Wikipedia now, he'd just do it as a startup
| and also succeed.
|
| To me, this is almost sad. Besides being an awesome
| encyclopedia, wikipedia is existence proof for something of
| scale outside the norm. Something that isn't a corporation. A
| lot of things are deterministic to the structure of an
| organization.
|
| For example, take the current postpostmodern war over truth and
| stuff: platforming/deplatforming, freedom of speech,
| censorship, bias, manipulation, narrative = power issues, etc.
| Wikipedia is at the very centre of all these problems. Whatever
| difficulties Twitter is experiencing should be 100X worse for
| wikipedia. Meanwhile, Wikipedia is withstanding far better, and
| with far more integrity. I don't think this is a coincidence.
|
| Dunking on wikipedia's budget/spending is popular. Meanwhile,
| Wikipedia uses <1% of the resources/budget of Twitter. They are
| operating @ >100X efficiency compared to a realistic for-profit
| equivalent. That's a flying shuttle.
|
| We know that Wikipedia, Linux & The Worldwide Web are possible
| because they exist. We literally wouldn't know otherwise.
| Theory couldn't have gotten us to this knowledge. Each is
| existence proof for other ways of doing things. They aren't
| necessarily roadmaps, but I'm a big believer in existence
| proofs. What Jimmy made is 100X better, more important and non-
| inevitable than what Zuck made. The thought that he wants to be
| Zuck bums me out.
| BlueTemplar wrote:
| Yeah, the Web was quite impressive (though we already had the
| Minitel), but it was Wikipedia that _really_ blew my mind
| (even though we already had Encarta). (In fact I consider
| Wikipedia to be the Web 's "killer app", even more than
| Google and other search engines were.)
| dalbasal wrote:
| Out of all the "killer apps" for the web... wikipedia is
| the one that implements the www most faithfully. Hypertext
| articles. Most apps got the web to do x. Wikipedia is what
| it was made to do.
| BlueTemplar wrote:
| Yeah. I was about to add that it pretty much has been Tim
| Berners-Lee's vision coming to fruition, but the fact that
| Wikipedia is centralized has stopped me. But then isn't
| the Web itself technically 'centralized' on the Internet
| ? And isn't Wikipedia a _great_ example of pseudonymous
| strangers (= social decentralization) collaborating with
| each other ?
| nolok wrote:
| It would succeed the same way Quora does. Much less open,
| much less universal, much more user hostile, with an almost
| aggressive way to deal with logged-out users.
|
| In terms of financial and organisational success it would
| probably largely beat what it is now. In terms of benefit to
| humanity, it would be much worse.
|
| Company + for profit + laws means access to information has
| to be much more tailored to the laws of each place. "Let's
| remove the Tiananmen article or lose your Chinese license" kind
| of things.
|
| I for one am glad for the current wikipedia we have,
| despite its numerous flaws. I still donate every year,
| although I wish Wales could stop having it spend its money
| the same way a startup or FAANG does.
| dalbasal wrote:
| That's one option, though I wouldn't necessarily use Quora
| as a mainline example. They're kind of a $gme for rich
| people. I think highly enough of Jimmy to bet on him doing
| a much better job than that.
|
| Stackoverflow is a decent example. Very capable founding
| team. They explicitly tried to be like a commercial
| wikimedia. They do embrace quite a lot of openness, notably
| creative commons... learning from wikimedia successes.
|
| RE " _I wish Wales would:_ " Another consequence for how
| wikipedia is structured is that Wales isn't the Zuckerberg
| of Wikimedia. Power is a lot more dispersed.
|
| RE spending/flaws and such: I feel like wikimedia is held
| to an extremely unfair standard. Who/what should we compare
| them to?
|
| Wikimedia spends $70m per year. This is probably less than
| Quora _or_ stackexchange. FB & Twitter (IMO more
| comparable in terms of scale/importance) spend $55bn &
| $3bn. Twitter spends 45X more than Wikimedia. Facebook
| spends almost 1,000X compared to Wikimedia. The bang-for-
| buck is insane.
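|
| Checking those ratios in Python with the spending figures exactly as
| quoted above (the figures themselves are not independently verified
| here):
|

```python
# Annual spending in USD, as quoted in the comment above.
wikimedia = 70e6
twitter = 3e9
facebook = 55e9

twitter_ratio = twitter / wikimedia
facebook_ratio = facebook / wikimedia
```

| The quoted numbers give about 43X for Twitter and about 786X for
| Facebook, so "45X" and "almost 1,000X" are loose roundings, but the
| order of magnitude holds.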
|
| Also in terms of flaws in rules/judgement calls. A lot of
| people are highly critical of wikipedia's "deletionism"
| related MOs. What articles/edits stay in. How good the
| rules & procedures are for this. What "camp" has power, and
| how they treat the other camp. I get that this stuff is
| contentious.
|
| Meanwhile on Twitter or Facebook, the rule is "I decide." "
| _But it gets us clicks_ " is the killer argument. Nothing
| is transparent. Wikimedia is doing a much better job,
| respecting user & editor rights far more, being a lot less
| self righteous. Of course it's not perfect, but come on.
| The "norm" is Facebook's content policy, Twitter's safety
| department, or Apple's App store approval room. Wikimedia
| is the _one_ example of being better than that... and for
| that everyone is always yelling at them.
| Vinnl wrote:
| I can also imagine that he'd say that just because it makes
| him look/feel better, i.e. it's more of a sacrifice if he
| gave it for free while he also could've been a billionaire,
| than if this was the only way Wikipedia could ever have been
| a success.
|
| Then again, WikiTribune _was_ a for-profit.
| np_tedious wrote:
| I have, and this is still fascinating. Got any more links
| you'd suggest?
| tassu wrote:
| https://media.ccc.de/v/36c3-73-infrastructure-of-wikipedia
| ShakataGaNai wrote:
| Wikimedia's infrastructure is radically different than most
| FAANG.
|
| In large part because 99% (+/-) of their traffic is read only.
| While Facebook and Google have to do heavy workloads for every
| click and action taken on their services, Wikimedia can cache
| basically everything. Allowing them to operate on a tiny
| fraction of the number of machines (and infrastructure) that
| the rest of the players do.
| anang wrote:
| Wikimedia also has less incentive/drive to meticulously track
| every interaction on their pages. The level of tracking
| present on Facebook and Google has to be extremely
| computationally intensive.
| dalbasal wrote:
| I agree. Another (no contradiction) way of looking at this is
| that Wikimedia infrastructure is radically different because
| Wikimedia is radically different.
|
| They need it to be a certain way in order to operate. The
| limitations and advantages of how software gets made. Why it
| gets made. The way the software works. How and why product
| decisions were made over the last 2 decades. What resources
| they have/had available. It's all a totally different game.
| Not surprising that different soil and a different climate
| grow different plants.
|
| One of Google's early coup d'etats, when they were a
| strategic step ahead of the boomers, was bankrolling gmail,
| youtube and such. Gmail offered free giant inboxes. They got
| all the customers. This cost billions (maybe 100s of
| millions), but storage costs go down every year while the
| value of ads/data/lock-in and such go up every year. Similar
| logic for youtube. (1) Buy a leading video-sharing site;
| (2)bankroll HD streaming because you have the deepest pockets
| (3) Own online free TV entirely.
|
| That's who Google is, good or bad. How funding works. What
| products get built. What infrastructure is necessary,
| possible, affordable. All interlinked. Wikipedia & Google
| were founded at the same time. Within 5 years (circa 2006)
| Google was buying charters and fiefdoms. Wikimedia,
| meanwhile, was starting to take flak for raising 3 or 4
| million in donations.
|
| It's kinda crazy that Wikipedia is comparable in scale to
| FAANGs when you consider these disparities.
| MaxBarraclough wrote:
| Does it hurt their caching if you browse Wikipedia when
| signed in?
|
| I recall reading HackerNews used to have that problem, unsure
| if it still does.
| tim333 wrote:
| Looking at the source of a Wikipedia page it has my
| username appearing 6 times so I guess it must reduce
| caching a bit. Though I guess they could cache the user
| info bits and the rest of the page and just splice them
| together.
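|
| A minimal sketch of that "cache the pieces and splice them" idea in
| Python. The function names and the HTML are invented for illustration
| and say nothing about how MediaWiki actually handles logged-in users.
|

```python
import html
from functools import lru_cache
from typing import Optional


@lru_cache(maxsize=1024)
def render_article(title: str) -> str:
    """Expensive, user-independent part; cached once and shared by everyone."""
    return f"<article><h1>{html.escape(title)}</h1>...</article>"


def render_user_bar(username: Optional[str]) -> str:
    """Cheap, per-user part; rendered on every request."""
    if username is None:
        return "<nav>Log in</nav>"
    return f"<nav>{html.escape(username)}</nav>"


def render_page(title: str, username: Optional[str]) -> str:
    """Splice the per-user fragment onto the shared cached article body."""
    return f"<html><body>{render_user_bar(username)}{render_article(title)}</body></html>"
```

| Two different users requesting the same title trigger only one
| article render; render_article.cache_info() shows the hit and miss
| counts.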
| _joe wrote:
| This is indeed correct. Wikimedia overall uses less than 2000
| bare-metal servers, so yes the infrastructure is tiny
| compared to those.
|
| What can be interesting, I think, is that you have a
| completely open infrastructure that has to solve problems on
| a global traffic scale.
|
| If people are interested in knowing more, I suggest you also
| take a peek at the wikimedia techblog, specifically to the
| SRE category https://techblog.wikimedia.org/category/site-
| reliability-eng... and the performance one
| https://techblog.wikimedia.org/category/performance/
| nostrademons wrote:
| Search is also largely read-only. The advantage Wikipedia has
| is that its traffic overwhelmingly goes to the head of the page
| distribution, so simple caching solutions work very well.
| Google has a pretty extreme long-tail distribution (~15% of
| daily queries have never been seen before), and so needs to
| do a lot of computation per query.
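|
| To illustrate why a head-heavy distribution caches well, here is a
| small Python sketch that assumes a Zipf-like popularity curve (item of
| rank k requested with weight 1 / k**s). The Zipf model and the numbers
| are assumptions for illustration, not measured Wikipedia or Google
| traffic.
|

```python
def head_coverage(n_items: int, cache_size: int, s: float = 1.0) -> float:
    """Fraction of requests answered by caching the most popular items.

    Assumes the item of 1-based rank k is requested with weight 1 / k**s
    and that the cache holds exactly the top `cache_size` ranks.
    """
    weights = [1.0 / (k ** s) for k in range(1, n_items + 1)]
    return sum(weights[:cache_size]) / sum(weights)
```

| Under this model with s = 1, caching the top 1,000 of 100,000 items
| (1% of them) covers roughly 62% of requests. A flatter curve (smaller
| s), or a steady stream of never-seen items, pushes that fraction down,
| which is the long-tail problem described above.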
| zelon88 wrote:
| > Google has a pretty extreme long-tail distribution (~15%
| of daily queries have never been seen before)
|
| Do you have a source for this?
|
| I'd be willing to bet that the ONLY reason why 15% of their
| daily queries "haven't been seen before" is because they
| add un-needed complexity like fingerprinting. You're making
| it seem like they've never seen a query for "cute animals"
| before when obviously they have. They choose to do a lot of
| extra leg work because of who you are.
|
| So your claim that 15% of their queries have "never been
| seen before" is probably inaccurate. I'd be willing to bet
| that "15% of their queries are unique because of the user,
| location, or other external factor separate from the query
| itself."
|
| They've seen your query before. They've just never seen
| _you_ make this query from this device on this side of town
| before.
| wwwwewwww wrote:
| It's somewhat analogous to the claim that almost every
| spoken sentence had never been spoken before in the
| history of language.
| Rastonbury wrote:
| Well, you'd be wrong
| zelon88 wrote:
| Meh, it happens.
| tylerhou wrote:
| If you took into account user, location, etc. 15% seems
| too low. I almost never search for the exact same thing
| twice in the same location.
|
| 15% of the queries themselves are unique.
| https://blog.google/products/search/our-latest-quality-
| impro...
|
| https://www.google.com/search/howsearchworks/responses/
|
| I work for Google (and used to work on Search).
| IgorPartola wrote:
| The point is not that. It's that when you search for
| "cute animals", Google shouldn't be storing that _you_
| searched for that, or even care. Your location is
| arguably potentially relevant but it could be coarse
| enough except when searching for directions to allow at
| least some caching.
| ma2rten wrote:
| Right, you can cache that query. That doesn't mean that
| you can cache "two bunnies playing in the snow r/aww
| reddit".
| tylerhou wrote:
| This is right on the money -- getting search results for
| queries that are too personalized to e.g. location means
| that you can't cache those search results (or if you did
| cache them, their entries would be useless).
| edmundsauto wrote:
| Hey Igor! Hate to be a bore, but I wanted to provide
| feedback that your comment may unintentionally come
| across as aggressive. OP has pretty relevant work
| experience that I know I'd love to hear more about, but
| there's not really any room for them to respond.
|
| I know many folks IRL who work at big tech who have no
| interest in posting here because the community comes off
| as very unwelcoming. That's a shame, because they have
| insight that would be great to hear. Regardless of
| anyone's opinion of their employer.
|
| Apologies in advance if your intent was purely about the
| topic. I just thought I read something in your tone that
| might hinder discourse rather than encourage it. I wanted
| to point it out, in case it was unintentional.
| WanderPanda wrote:
| To me Igor's comment is also misplaced. He injects
| activism into a technical discussion (sadly happens very
| often here on HN). We all know by now that the bigcorps
| are to a large degree based on data collection. We do not
| need to be reminded about it each and every day. We are
| adults, if we don't like it we use alternatives.
| edmundsauto wrote:
| Yeah, this is a fair point. My larger point was mostly
| that HN misses out on some valuable comments by insiders
| because those people are disincentivized by some of the
| rhetoric and tone when an article on big tech is popular.
| I didn't think the comment I replied to was particularly
| aggressive - it was just something that came to mind when
| I read it. OP was actually very kind and constructive in
| their response - a good ending and constructive
| discussion for us all!
| tylerhou wrote:
| Agreed about the tone. The comment could have been less
| argumentative -- instead of "that's not the point," they
| could have said "that's not the only reason."
|
| On the other hand, if I'm not responding, it's not
| because I find HN too abrasive -- it's because I am
| afraid of leaking non-public information. That's why
| whenever I talk about Google, I try to cite a Google blog
| post or other authoritative source, or talk about my own
| personal experience; hence, "I rarely search for the same
| query twice."
| IgorPartola wrote:
| I apologize for the tone. The start of my comment was
| clumsily worded and it wasn't my intention to have it come
| off as argumentative. The way I read the GP comment to
| mine was talking about how Google's tracking of its
| users' telemetry was what was contributing to the
| uniqueness of requests. Your comment to me boiled down to
| the fact that of course most requests are unique because
| of tracking location data and the user account. There
| seemed to be a disconnect because your comment took for
| granted that user location and account were a part of the
| search query while the person you were replying to
| specifically challenged that notion (again in my reading
| of both). I tried to post a concise bridge between the
| two concepts, and of course we all see how well I did
| with that :)
|
| Having said that, I do think this is clearly a sensitive
| issue, not a purely technical one. I can appreciate the
| nuance of working for Google and doing excellent work
| while seeing the company criticized left and right for
| its business model. I think given the community, while
| there is opposition to how Google may at certain points
| conduct itself as a corporation, there is no lack of
| respect for any individual working there. I certainly
| view my comment and the discussion of privacy as having
| 50% to do with Google's strategy and 50% to do with the
| technical aspects of whether you can build a search
| engine that holds user privacy as a core priority rather
| than trying to launch an ad hominem on you or anyone. And
| I saw your other comment that agreed with me and the GP
| comment so I think my first sentence aside, we are on the
| same page :)
| rectang wrote:
| Thanks for responding constructively, Igor.
| rStar wrote:
| I'm gonna have to disagree with the negative comments
| above concerning Igor's tone. He made his point with clear
| respectful language that I would be happy to entertain at
| work, at the bar, at worship or while on a (previous to
| covid) group run or golf outing. So, to me, it looks like
| instead of an 'agree to disagree' while respecting each
| other, you disrespect igor by dismissing his arguments
| due to his tone, which handily allows you to ignore his
| content, such as it is. Therefore, in my judgement, you
| guys are being unfair to Igor while also being
| disingenuous about your reason for policing his tone.
| tylerhou wrote:
| > disrespect igor by dismissing his arguments due to his
| tone, which handily allows you to ignore his content,
| such as it is
|
| I didn't dismiss his argument; I said that he was correct
| right after he posted:
| https://news.ycombinator.com/item?id=26073488
|
| "That's not the point" can be interpreted as respectful,
| but it also can be interpreted as argumentative. I chose
| to assume good intentions, but I offered a different
| phrasing that would have a higher chance of not being
| misinterpreted: i.e. using "yes and" instead of "no but":
| https://www.theheretic.org/2017/yes-and-vs-no-but/
| csharptwdec19 wrote:
| I understand. To be a googler you have to be really good
| at smooth talking.
|
| And when you do this on a public forum it just highlights
| that the company you work for makes sure as many
| employees as possible are serving kool-aid to the masses.
| [deleted]
| dextralt wrote:
| is this satire?
| zelon88 wrote:
| I'd be interested in seeing how polluted that 15% of new
| queries is with people blasting malformed URLs or FQDNs
| into the omnibox of Chrome.
| Kiro wrote:
| What's so unbelievable about 15%? I personally think it
| is way lower than I expected. We're clearly not googling
| in the same way.
| simias wrote:
| I agree with you. Also in my experience less tech-savvy
| people tend to overcomplicate their queries instead of
| just entering the relevant keywords which I'm sure
| accounts for many uniques.
| jobigoud wrote:
| I don't understand how you compute that estimate.
|
| I doubt you store the history of all searches ever?
| People don't need a google account to query the engine,
| others disable history, etc.
|
| Are you saying you still have all searches ever made
| ever? Because you would need this to say a query hasn't
| been made before wouldn't you?
| simias wrote:
| I don't know how they did it but I suspect that it
| wouldn't be very hard to model the distribution by
| sampling a few million queries and extrapolate from that.
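| A minimal sketch of the sampling idea above, not Google's actual
| method: draw a random sample of today's queries and count how many
| are missing from a set of previously seen queries. The function
| name, the toy data and the in-memory `seen_before` set are all
| assumptions for illustration; a real system would still need some
| compact record of past queries, which is the open question here.

```python
import random


def estimate_new_fraction(todays_queries, seen_before, sample_size, seed=0):
    """Estimate the share of today's queries that are absent from seen_before."""
    rng = random.Random(seed)
    # Sample without replacement; cap the sample at the population size.
    sample = rng.sample(todays_queries, min(sample_size, len(todays_queries)))
    new = sum(1 for q in sample if q not in seen_before)
    return new / len(sample)


# Hypothetical toy data: 8500 repeated queries and 1500 never-seen ones.
history = {f"old query {i}" for i in range(1000)}
today = [f"old query {i % 1000}" for i in range(8500)]
today += [f"new query {i}" for i in range(1500)]

estimate = estimate_new_fraction(today, history, sample_size=2000)
exact = estimate_new_fraction(today, history, sample_size=len(today))
print(f"sampled estimate: {estimate:.3f}, exact share: {exact:.3f}")
```

| The sampled estimate only approximates the exact share; its error
| shrinks as the sample grows.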
| freeone3000 wrote:
| Why would you not store every search ever? It's only a
| few petabytes, and you can find out all sorts of useful
| info from it.
| melq wrote:
| You'd only need to store the list of unique searches, but
| even if that's true and the 15% number is true, that must
| be a huge amount of data.
| ma2rten wrote:
| I think you mean "15% seems too high". An easy way to
| think about this is the following: even if you search the
| entire internet you will almost never see the same
| sentence twice, assuming it has a certain number of
| words. There is a combinatorial explosion in possible
| sentences to write. Search queries are essentially just
| sentences without stopwords.
| djhn wrote:
| Removing stop words is what old school users of IT
| systems do, because that's what we learned worked best at
| the time.
|
| Internet users who came online later, from GenZ to many
| boomers, will often just write conversational sentences
| and questions.
| bagels wrote:
| https://blog.google/products/search/our-latest-quality-
| impro...
|
| "There are trillions of searches on Google every year. In
| fact, 15 percent of searches we see every day are new"
| zepto wrote:
| It would still be helpful to know what 'new' means.
|
| Does it mean literally the text string typed into the box
| by the user is new?
|
| Or does it mean the text string combined with a bunch of
| other inferred parameters we don't know about is new?
| jobigoud wrote:
| New for the day or new for the history of the engine?
| rfoo wrote:
| > So your claim that 15% of their queries have "never
| been seen before" is probably inaccurate.
|
| I'm not sure, on my productive days maybe >50% of my
| Google searches are not very cachable. (for example, I
| just googled "htop namespace", "htop novel bytes", "htop
| pss", "htop nightly build ubuntu 14.04")
| wolfd wrote:
| https://blog.google/products/search/our-latest-quality-
| impro...
|
| They briefly mention the statistic in the last paragraph.
| rntksi wrote:
| I think GP post has a point. I've noticed people use
| Google really differently from how I do. E.g. I would go
| search for "figure concave" while my brother would search
| a longer phrase.
|
| Also, speaking of people behaviour, it would not make
| sense to search everyday for "cute animals", but the
| volume of searches done for new things people discover as
| they get older would make more sense. I mean just look at
| search trends for things like "hydroxychloroquine" for
| example (and that's not to mention people who get it
| wrong, i.e. other factors for differing search queries
| too)
|
| Also, other languages can change the queries depending on
| how you phrase the sentence too. Add to that the people
| using other ways to search instead of just visiting
| google.com and I think you can get pretty close to 10%.
|
| If fingerprinting is the reason, 15% would be a figure
| too low I surmise. Would that be the case I think that
| would make probably 20-25% of searches rather than 15%.
|
| It could very well be that they do classify fingerprinted
| search differently only in some countries and not others?
| That would/might explain the 15% figure.
|
| I might be wrong and have underestimated fingerprinting
| techniques for Google. If they have really good
| fingerprinting techniques, that would reduce the estimate
| I have in mind to a better number (close to 15, maybe?)
| zelon88 wrote:
| So consider your hydroxychloroquine example again this
| way;
|
| Nobody has _ever_ searched for hydroxychloroquine before
| today. Today is the day the word is hypothetically
| invented. Today 2 million people will search for
| hydroxychloroquine. But only one of them was the first to
| do it.
|
| What I know about pop-culture and viral internet culture
| is telling me that 15% of 1 trillion searches being
| unique is shady math.
|
| So I am not fully convinced that the 15% claim is
| completely transparent.
| no_way wrote:
| It's a guess, but my thinking is that previously most
| people who searched the term hydroxychloroquine were mainly
| scientists and other people related to that, not your
| general population. Suddenly covid happens and now large
| numbers of people learn about this new drug they never
| heard before, they are gonna search, and I presume this,
| most wildly different things like: "how does it work?"
| "does it cause some disease?" " _insert something
| political here about hydroxychloroquine_ " "did aliens
| make hydroxychloroquine?" and many more things I lack
| imagination to come up with and that's only about
| hydroxychloroquine. I doubt 15% number is about single
| word cases, but more about combination of words and that
| seems reasonable. Inventing new words daily seems
| unlikely, chaining them on the other hand seems
| plausible.
| nostrademons wrote:
| The vast majority of people don't search for
| [hydroxychloroquine]. They search for [Is
| hydroxychloroquine effective in treating COVID-19?] or
| [What is the first drug that was approved to treat
| COVID-19?] or [What methods do we currently have to treat
| COVID-19?]. You can see these on the search results page
| as the "Common questions related to..." widget. How else
| do you think Google gets that data?
|
| The folks who use keyword-based searches are largely
| those who got on the Internet before ~2007. Tech-savvy,
| relatively well-off, usually Millennial or Gen-X, plugged
| into trends. This happens to be the demographic dominant
| at Hacker News. But there's a much larger demographic who
| just types in whatever they're thinking of, in natural
| language, and expects to get answers.
|
| Come to think of it, this is also the demographic that
| doesn't use tabbed browsing, and uses whichever browser
| ships with their OEM, and often doesn't realize that
| there's a separate program called a "browser" running
| when they click on the "Internet", and issues a Google
| Search for [google] (#3 query in 2010) when they want to
| get to Google even though they're on Google already but
| don't realize it, and doesn't know what a URL is. When a
| big-tech company makes a brain-dead usability decision
| you don't like, first consider how that usability choice
| might appear to your grandmother and it might not seem so
| brain-dead.
| Retric wrote:
| You can do quite a bit of processing per page load without
| issue. Facebook and Google just take it rather past that
| point into near absurdity, while still being highly
| profitable.
| cmckn wrote:
| Every request at FB is handled in a new container. This
| isn't absurd, it's actually pretty neat :)
|
| Edit: I don't know what I'm talking about. Happy Monday!
| rachelbythebay wrote:
| What? Are you calling the context of a HHVM request a
| container just to confuse people?
|
| Also, there's way more than just the web tier out there.
| cmckn wrote:
| Wasn't my intention to confuse, just repeating something
| I've been told by FB folks.
|
| Everyone, please listen to Rachel and never ever me.
| robmurrer wrote:
| is not neat... is freakish
| ROARosen wrote:
| Wow that sounds interesting, does anyone know if this is
| true?
| wilsonthewhale wrote:
| I'm not on the team that handles this, but I highly doubt
| that this is the case.
| ianlevesque wrote:
| Yeah why do they keep spending billions to build new
| datacenters when they could just stop being absurd instead?
|
| The contempt on here is crazy sometimes.
| MereInterest wrote:
| I don't think that Facebook/Google developers are foolish
| or incompetent. That would be contempt. Instead, I think
| that Facebook and Google as conglomerate entities are
| fundamentally opposed to my right to privacy. That they
| make decisions to rationally follow their self-interest
| does not excuse the absurd lengths to which they go to
| stalk the general population's activities.
| ryanianian wrote:
| > I don't think that Facebook/Google developers are
| foolish or incompetent.
|
| Nobody in this thread is saying that. Parent to you said:
|
| > they could just stop being absurd instead [of building
| more DCs]
|
| implying FB could build fewer DCs by scaling down some of
| their per-page complexity/"absurdity". Basically saying
| their needs are artificial or borne of requirements that
| aren't.
|
| > conglomerate entities are fundamentally opposed to my
| right to privacy
|
| That's a common view, but it's not on topic to this
| thread. This thread is mostly about the tech itself and
| how WikiMedia scales versus how the bigger techs scale.
| It has an interesting diversion into some of the reasons
| why their scaling needs are different.
|
| You could instead continue the thread stating that they
| could save a lot of money and complexity while also
| tearing down some of their reputation for being slow and
| privacy-hostile by removing some of the very features
| these DCs support (perhaps) without ruining the net
| bottom line.
|
| This continues the thread and allows the conversation to
| continue to what the ROI actually is on the sort of
| complexity that benefits the company but not the user.
| Retric wrote:
| I was the one saying absurdity and I think you're missing
| the context. Work out how much processing power is worth
| even just another 1 cent per thousand page loads and
| perfectly rational behavior starts to look crazy to the
| little guys.
|
| Let's suppose the Facebook cluster spends the equivalent
| of 1 full second of 1 full CPU core per request. That's a
| lot of processing power and for most small scale
| architectures likely adding wildly unacceptable latency
| per page load. Further, as small scale traffic is very
| spiky even low traffic sites would be expensive to host
| making it a ludicrous amount of processing power.
|
| However, Google has enough traffic to smooth things out,
| it's splitting that across multiple computers and much
| of it is after the request so latency isn't an issue, and
| it isn't paying retail so processing power is little more
| than just hardware costs and electricity. Estimate the
| rough order of magnitude they're paying for 1 second of 1
| core per request and it's cheap enough to be a rounding
| error.
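| The estimate above can be worked through as a rough calculation.
| Both prices per core-hour below are hypothetical placeholders, not
| figures from the thread or from any provider; the only number taken
| from the comment is one core-second of CPU per request.

```python
def cost_per_thousand_requests(core_seconds_per_request, dollars_per_core_hour):
    """Dollar cost of CPU time for 1000 requests."""
    core_hours = 1000 * core_seconds_per_request / 3600
    return core_hours * dollars_per_core_hour


# Assumed prices: $0.01 per core-hour at cost, $0.05 per core-hour retail.
at_cost = cost_per_thousand_requests(1.0, 0.01)
retail = cost_per_thousand_requests(1.0, 0.05)
print(f"at cost: {at_cost * 100:.2f} cents per 1000 requests")
print(f"retail:  {retail * 100:.2f} cents per 1000 requests")
```

| Under these assumed prices, one extra cent of revenue per thousand
| page loads covers the core-second at cost but not at retail, which
| is the gap the comment describes.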
| bo1024 wrote:
| The idea of marginal value/marginal cost is that
| companies will generally continue spending one billion
| dollars to add size and complexity, as long as they get
| back a bit more than a billion dollars in revenue.
|
| So it wouldn't necessarily be contradictory if most of
| their core functionality could be replicated very simply,
| yet the actual product is immensely complicated. I forget
| where I first read this point, but probably on HN.
| civilized wrote:
| Or maybe you're just reading too much into "absurd" which
| can just be a colorful word for "an extremely huge
| amount"
| throwaway3699 wrote:
| To be fair, there's a bit of a combinatoric effect of scale
| * features going on there. I'm sure you could build most of
| a Facebook equiv. 100x-1000x cheaper if it only served one
| city instead of the whole planet.
| klodolph wrote:
| The effects of scale are less combinatoric than you might
| think. Most people on my Facebook feed are from the same
| city anyway, even though Facebook is global.
| erichurkman wrote:
| The effects and scale of sales (ads) are very
| combinatoric, though.
| ashtonkem wrote:
| They also have looser latency SLAs. The only hard requirement
| is that a user can read back their own writes, but it's okay
| if other users are served stale data for a few seconds or
| minutes even. This makes cache invalidation, one of the most
| notoriously difficult and expensive operations at large
| scale, much much easier.
| nostrademons wrote:
| Facebook also has a similar SLA. I've heard that at one
| point in their architecture (~2010), they literally stored
| the user's own writes in memcached and then merged them
| back into the page when rendered. _You_ would see a page
| consistent with your actions, but if you logged into
| Facebook as any of your friends your updates might not show
| up until replication lag passed.
| dash2 wrote:
| Interesting that this sounds very similar to how
| multiplayer games do it.
| mackman wrote:
| Close, IIRC we cached the fact you had just done a write,
| and a subsequent read request that arrived on the replica
| region was then proxied to the primary region instead of
| serviced locally.
| IgorPartola wrote:
| Pretty clever. Is that still how it works?
| glittershark wrote:
| Pretty sure this paper describes what they're doing now:
| https://research.fb.com/publications/flighttracker-
| consisten...
| mackman wrote:
| I'm not sure if FlightTracker completely replaced the
| need for the internal consistency inside Tao. You can
| read about that here: https://www.usenix.org/system/files
| /conference/atc13/atc13-b...
| [deleted]
| eismcc wrote:
| Dirty bits at scale
| gogopuppygogo wrote:
| I'm guessing this is ecc memory so likely correcting for
| bad data.
| Hackbraten wrote:
| I think they meant dirty bit as in "a flag that means
| update needed," not as in "bit flipped due to glitch."
| cranekam wrote:
| My memory is fuzzy now but this dates back to when there
| were only two datacenter regions and one of them held all
| the primary DBs (2011 or so). All write endpoints were
| served in that region, so if a user routed to the
| secondary region did a write the request was proxied to
| the primary region. After doing a write a cookie was set
| for the user in question which caused any future reads to
| be proxied to the primary region for a few seconds while
| the DB replication stream (upon which cache invalidation
| was piggybacked) caught up, because if they went to the
| secondary region memcached was now stale.
|
| It hasn't been this way since around 2013 but again I am
| fuzzy on how. I think that's when most such data was
| switched to TAO, which has local read what you wrote
| consistency. As long as users landed in the same cluster
| (and thus TAO cluster) what they wrote was visible to
| them, even if the DB write hadn't yet replicated to their
| region.
|
| FlightTracker postdates my time at FB (ended 2018ish) so
| I'm not sure how that is used. These systems evolved a
| lot over time as requirements changed.
|
| I don't remember anything about writes being batched in
| memcached and merged in on page load.
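| A minimal sketch of the routing described above, assuming a
| two-region setup: after a write, the user's reads go to the primary
| region for a short window, then return to the local region. The
| `Router` class, the region names and the five-second window are
| illustrative assumptions; the dictionary stands in for the cookie,
| and none of this is Facebook's actual code.

```python
import time

STICKY_SECONDS = 5.0  # assumed window; the comment only says "a few seconds"


class Router:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.last_write = {}  # user_id -> time of last write (stand-in for the cookie)

    def route_write(self, user_id):
        # All writes are served by the primary region.
        self.last_write[user_id] = self.clock()
        return "primary"

    def route_read(self, user_id, home_region):
        # Shortly after a write, read from the primary so the user sees
        # their own change before replication reaches their home region.
        wrote_at = self.last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < STICKY_SECONDS:
            return "primary"
        return home_region


# Usage with a fake clock so the behaviour is deterministic.
now = [0.0]
router = Router(clock=lambda: now[0])
before = router.route_read("alice", "secondary")
router.route_write("alice")
just_after = router.route_read("alice", "secondary")
other_user = router.route_read("bob", "secondary")
now[0] = 10.0
later = router.route_read("alice", "secondary")
print(before, just_after, other_user, later)
```

| Only the writer is rerouted; other users keep reading possibly stale
| data from their local region until replication catches up.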
| astrea wrote:
| I wonder how much additional traffic this investigation brought
| said image.
| mark-r wrote:
| Reminds me of the time Netgear routers were hardcoded with the IP
| address of a NTP server at the University of Wisconsin.
| https://en.wikipedia.org/wiki/NTP_server_misuse_and_abuse#Ne...
| ct520 wrote:
| Well I hope they implement some good caching
| dmurray wrote:
| 20% of Wikimedia Commons requests to their Singapore servers
| (EQSIN), not globally. That's still a lot, of course.
| reaperducer wrote:
| 90,000,000 requests a day. That's some flower.
| labster wrote:
| Replace it with an image saying: "If everyone who sees this
| flower donated 100 rupees to Wikipedia, this fundraiser would
| be over in 6 hours."
| thotsBgone wrote:
| Unfortunately, the image was never displayed by the app
| that was downloading it as an internet test.
| HotVector wrote:
| Forward to all your indian uncles
| Ayesh wrote:
| On whatsapp
| [deleted]
| gumby wrote:
| Given that they are coming from India it could be 90M single
| daily requests!
| ceph_ wrote:
| Would an AsterNovi-belgii-flower-1mb by any other name smell
| as sweet?
| ed25519FUUU wrote:
| Let's all just be grateful it wasn't AsterNovi-belgii-
| flower-100mb.
| efreak wrote:
| At one point, I had a very small png file that was large
| enough to crash Netscape and IE, and later firefox. It
| had large enough dimensions that browsers couldn't handle
| it.
|
| Today I'm sure it would be fine; instead I'm frustrated
| by my inability to create webp images larger than 16000
| pixels tall (i was trying to write a data-saver proxy for
| reading webtoons)
| [deleted]
| Havoc wrote:
| Clearly the bestest flower though
| [deleted]
| BugsJustFindMe wrote:
| And of course, now that the image is linked in the report, I've
| just added an additional request for it by clicking.
| Wowfunhappy wrote:
| With the amount of traffic it's apparently getting, HN probably
| won't make a big impact. Plus, most of us aren't in India, and
| most of us have normal user agents.
| navotgil wrote:
| Well... posting it here will only increase the number of requests
| and will make the investigation harder
| z3t4 wrote:
| Just replace the image... what app/site is this? Email me at
| my@mail.com
| [deleted]
| tomglynch wrote:
| Just added this comment on the issue:
|
| Hi all, I've been doing a bit of research into possible apps that
| could be causing this and found two potential culprits that I am
| currently investigating.
|
| The first is Mitron TV, an Indian TikTok alternative which was
| made available again on the app store June 6th
| (https://indianexpress.com/article/technology/tech-news-
| techn...).
|
| The second is Say Namaste, an Indian Zoom alternative which was
| launched on the app stores June 9th
| (https://indianexpress.com/article/technology/tech-news-
| techn...).
|
| Both fall into the timeline of huge increases, have millions of
| users and may be using '1280px-AsterNovi-belgii-flower-1mb.jpg'
| to check the user's internet connection - especially for Say
| Namaste to ensure video connectivity. I've reached out to some
| developers at both companies and will report back. Let me know
| your thoughts.
|
| EDIT: I have also noticed the dates match the reopening after
| lockdown for the whole of India: "This first phase of reopening
| was termed as "Unlock 1.0"[13] and permitted shopping malls,
| religious places, hotels and restaurants to reopen from *8
| June*."
| (https://en.wikipedia.org/wiki/COVID-19_lockdown_in_India#Unl...
| )
|
| Tom
| batch12 wrote:
| Based on this, I just reversed both Android apps and am not
| seeing strings related to wikimedia nor asternovi. This doesn't
| mean it's not obfuscated somehow though. The only app I've
| found the strings in so far is the "ravn" app proposed by
| @taviso. As mentioned in the twitter thread though it doesn't
| seem to have the install base to cause this traffic--
| catlover99 wrote:
| I took a look at the apk and noticed this in the manifest.
| "com.blockeq.stellarwallet.WalletApplication" Stellar Lumens
| is a fairly popular crypto currency. I wonder if the app has
| built in support for crypto transactions. If not, maybe it's
| malware to mine crypto coins.
|
| https://i.imgur.com/o8DllVd.png
| captn3m0 wrote:
| It is a crypto chat application:
|
| >Ravn is your portal to the most private messenger as well
| as Korrax our proprietary token. Stay up to date with
| Korrax and other Cryptos and join the crypto group chats.
|
| >Messages, images and docs are never stored on a server
| (after delivery), they're only locally stored on your own
| phone. Ravn is not tied to your phone number or email, you
| only sign up with a username that isn't searchable or
| discoverable.
| WeekSpeller wrote:
| Stupid question: how did you reverse the app in Android
| Studio?
| catlover99 wrote:
| I downloaded the APK and then used "Profile or Debug APK"
| under file in Android Studio and ctrl/cmd+shift+f to
| search for strings.
|
| I don't know much about Android development or APKs but
| it's not exactly "reversing." from what I understand the
| profile/debug converts the .dex files from the APK to
| .smali which is human readable.
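| A rough equivalent of the string search described above, without
| Android Studio: an APK is a ZIP archive, so the .dex entries can be
| scanned for plain byte strings. The function name and the fake
| in-memory APK are assumptions for illustration, and obfuscated or
| encrypted strings would not be found this way.

```python
import io
import zipfile


def find_strings_in_apk(apk_file, needles):
    """Return {entry name: [needles found]} for the .dex entries of an APK."""
    hits = {}
    with zipfile.ZipFile(apk_file) as apk:
        for name in apk.namelist():
            if not name.endswith(".dex"):
                continue
            # Case-insensitive match on raw bytes; ASCII strings in a
            # .dex file are stored as plain bytes.
            data = apk.read(name).lower()
            found = [n for n in needles if n.lower().encode("utf-8") in data]
            if found:
                hits[name] = found
    return hits


# Build a tiny fake APK in memory to exercise the function.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("classes.dex", b"\x00https://upload.wikimedia.org/AsterNovi-belgii\x00")
    z.writestr("res/icon.png", b"wikimedia")
buf.seek(0)

result = find_strings_in_apk(buf, ["wikimedia", "asternovi"])
print(result)
```

| For a real APK, pass its file path instead of the in-memory buffer.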
| NicolaiS wrote:
| You can use the "Analyze APK" feature, but you probably
| rather want to use tools like jadx or apktool that
| provides fairly good decompilation.
| tomglynch wrote:
| Thanks batch12. In my edit, it could also be related to a
| check-in app used at public spaces in India - as it increases
| from the 8th of June which matches when the India-wide
| lockdown began to lift. Perhaps a reverse of qr code scan
| checkin apps in India could be useful?
| batch12 wrote:
| Could be-- I checked about 50 apps from alternative lists
| that popped up after the ban with no luck except for that
| one I mentioned before.
|
| Looks like they posted shortly after yours on the ticket
| that they found the culprit. Guess we'll find out tomorrow
| if we were on the right path.
| tomglynch wrote:
| Yeah hopefully they have a bit of a write up too about
| how they worked it out - interesting problem to solve!
| thetanil wrote:
| As far as I know, this is also an image commonly used in
| machine learning tutorials for image classification of species
| of flowers. I don't know if the tutorials use the mediawiki
| source directly though. I do recognize this image though. I
| think it's in the SciKit Learn O'Reilly book.
| tchalla wrote:
| Could it be the "Good Morning" like greetings on WhatsApp gone
| viral ?
|
| https://www.wsj.com/articles/the-internet-is-filling-up-beca...
| mabedan wrote:
| That's a good one! Maybe an app which generates this type
| of image has this flower as one of their sample images in a
| list which it preloads on startup
| Clewza313 wrote:
| No, because the images for those are stored by WhatsApp, not
| hotlinked from Wikimedia.
| aaron695 wrote:
| It seems like it started on 2020-06-09
|
| https://pageviews.toolforge.org/mediaviews/?project=commons....
| dayze wrote:
| Sukhbir Singh just commented: Thank you everyone for the comments
| and suggestions. I just wanted to share that we have identified
| the app and will update this task tomorrow. (And yes, it is a
| mobile app.)
| NKCSS wrote:
| Would be curious to get the full story :-/
| m463 wrote:
| The "traditional" way of fixing this would be a goatse.cx
| redirect of the image.
|
| I'm sure there is a more enlightened fix.
| sangnoir wrote:
| ...or sending _that image_ [1] jwz sends back upon detecting HN
| in the referer. I bet they'll find the app in a matter of
| hours, or at least reduce the traffic drastically.
|
| 1. https://www.jwz.org NSFW!
| bzb6 wrote:
| This makes me wonder why the hell referer headers are still
| sent by major browsers, especially to third parties. I can't
| think of a single reason that benefits the user.
| qwertay wrote:
| Originally it probably just sounded like a cool feature to
| see what blog linked to you. Now it's been around for so
| long that so much has been programmed to actually use it.
| If you turn it off you get every anti bot script blowing up
| on you.
|
| I think browsers did drop the path from it at least.
| jessaustin wrote:
| For one thing, examining referer is a common way that a
| server determines a request is _not_ a hotlink. Sure you
| can do something more complicated with cookies or whatever,
| but lots of sites are just using referer and they'll break
| if the client doesn't send it.
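| A minimal sketch of the referer check described above, assuming the
| server knows its own hostnames. The function name and the hostnames
| are hypothetical. Treating a missing Referer as "not a hotlink" is a
| policy choice in this sketch, since a bookmark, a pasted URL and a
| client that strips the header all look the same to the server.

```python
from urllib.parse import urlsplit


def is_hotlink(referer, own_hosts):
    """Return True if the Referer header names a host other than our own."""
    if not referer:
        # No Referer: direct visit, bookmark, or a client that omits it.
        return False
    host = urlsplit(referer).hostname  # lowercased by urlsplit, None if absent
    return host is not None and host not in own_hosts


own = {"example.org", "www.example.org"}
print(is_hotlink("https://news.ycombinator.com/item?id=1", own))
print(is_hotlink("https://www.example.org/gallery", own))
print(is_hotlink(None, own))
```

| A server could answer hotlinked requests with a smaller image, an
| error, or a redirect, depending on how strict it wants to be.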
| namibj wrote:
| But for that it's enough to send it for same-origin
| requests. No need to send it cross-origin, except for
| tracking purposes.
| iggldiggl wrote:
| That'd still break the distinction between hotlinking and
| the user using a bookmark or copy/paste to directly open
| the URL in question.
| h_anna_h wrote:
| Letting the sites distinguish between the two does not
| seem to be in the interest of the user.
| malaya_zemlya wrote:
| if you are making any sort of content or running a website,
| it is really useful to know how people found you.
| murph-almighty wrote:
| For those reticent to click on their work computers but
| morbidly curious, can someone describe the image?
| sangnoir wrote:
| It's a motivational-poster-type image with a white egg
| holder in the foreground, but instead of an egg, it's
| holding one exquisitely detailed hairy, caucasian ball[1].
| At the top, the title is "HACKER NEWS" and the bottom text
| is "A DDoS OF FINANCE-OBSESSED MAN-CHILDREN AND
| BROGRAMMERS"
|
| 1. Is there a collective biological term for scrotum _and_
| its contents that is not general like "genitals" is?
| sli wrote:
| All I get is a scrolling hex editor looking thing. Maybe that
| redirect has been disabled?
| mey wrote:
| You aren't sending a referrer header (a good thing).
| pengaru wrote:
| Yep, jwz has had a change of heart and sees today's HN as a
| born again breath of fresh air.
| geoelectric wrote:
| I'm seeing the nut sundae on iOS mobile so I wouldn't get
| too happy yet...
| [deleted]
| [deleted]
| jaredsohn wrote:
| Try from a new profile or incognito.
|
| I saw the described image but after I visited the site
| directly I couldn't see it any more when redirected via
| hacker news. Saw it again when I opened an incognito tab.
| Dylan16807 wrote:
| I think he's the only one that uses that? Barely even worth
| mentioning in comparison.
| lxe wrote:
| Just learned that this person owns DNA Lounge (and pizza?),
| and is a founder (early contributor?) of Netscape and
| Mozilla.org. I've lived and worked in that particular area of
| SF for years and haven't known this.
| hedora wrote:
| Also, jwz is responsible for xscreensaver.
| alsetmusic wrote:
| One of my company's clients has a beautiful office right
| above DNA Lounge (well, across the street or just adjacent
| - it's been a while and I've only been there once). They
| told me they can hear sound checks from their rooftop
| patio.
| m463 wrote:
| netscape used to display a spinning compass when you put
| about:jwz in the title bar
|
| other good ones were about:1994 and about:mozilla
|
| hey, about:mozilla still works in firefox
| eythian wrote:
| about:robots also works in Firefox, I know it's been
| there for a long time but I have no idea if it was ever
| in Netscape.
| kbrosnan wrote:
| about:robots is from the early Firefox releases. Pretty
| sure it is from Firefox 3.0 development as you can find
| the same robot in images when searching for Firefox Gran
| Paradiso Robot.
|
| https://www.google.com/search?q=firefox+gran+paradiso+rob
| ot&...
| tingletech wrote:
| there used to be linux based public terminals in DNA lounge
| too, IIRC
| mey wrote:
| A permanent redirect to a non-image page (owned by Wikimedia)
| may achieve the same thing. Either the calling system can't
| support a HTML response, or it's a webview in which case you
| could either report an error or provide a notice. Maybe even
| ask for donations :)
| ed25519FUUU wrote:
| Or just downsample the image to a reasonable size and deal
| with it. Nothing inherently wrong with having a popular
| image.
| ehwhyreally wrote:
| Yes there is when you are hotlinking. Hotlinking in general
| is considered theft, you are using someone else's bandwidth
| and could even ddos the host if you are not caching the
| response.
| [deleted]
| concordDance wrote:
| > Hotlinking in general is considered theft
|
| This is a pretty puzzling idea to me. How could linking
| something be theft?
|
| To explore this, I shall try a metaphor. Imagine you're
| on a big social media website (lets call it Programmer
| Olds) which has an oddity in that 99% of its users use
| adblock. You then post a link to another small (ad
| supported) website on your Programmer Olds page, causing
| a large number of people to click through and download
| the page using large amounts of bandwidth (for no
| monetary gain to the site) and possibly DDOSing the site.
|
| Have you committed theft?
| JCharante wrote:
| > causing a large number of people to click through and
| download the page using large amounts of bandwidth (for
| no monetary gain to the site)
|
| The difference here is that while a lot of users use
| adblock, there are some that don't. These users can still
| see the ads. Additionally even though it's a small
| website, it may lead to new readers that stick around or
| the content itself may even be sponsored.
|
| The equivalent to hot linking a picture would be like
| taking the content of a blog post without really linking
| to the source, because there's no chance of conversions
| there. If you're linking to the site itself then there's
| a reasonable chance that users can convert.
|
| So I suggest that it's theft just because the chances of
| readers being converted are nil while you're using their
| bandwidth.
| lorenzhs wrote:
| > This is a pretty puzzling idea to me.
|
| That's because you're responding to an entirely different
| issue. "Hotlinking" isn't linking to something, it's
| including a resource that is hosted elsewhere. It's
| putting <img
| src="https://concordDance.whatever/images/big_image.jpg">
| on _my_ website without asking you. Now if my site ends
| up on the front page of HN, that could cause a lot of
| traffic to _your_ site, potentially overwhelming your
| server or increasing _your_ hosting bill. It's not nice,
| and rightfully frowned upon.
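|
| A rough sketch of how a host might detect that kind of
| embedding from the Referer header. ALLOWED_HOSTS is a
| hypothetical allow-list, and letting requests with no Referer
| through is an assumption made here, not a rule from any site.

```python
from urllib.parse import urlsplit

# Hypothetical: hosts that are allowed to embed our images.
ALLOWED_HOSTS = {"example.org", "www.example.org"}


def is_hotlink(referer):
    """True when a Referer header names a host outside ALLOWED_HOSTS.

    A missing or empty Referer is not treated as a hotlink here,
    because many legitimate clients omit the header.
    """
    if not referer:
        return False
    host = urlsplit(referer).hostname  # None if the value is not a URL
    return host not in ALLOWED_HOSTS
```

| A check like this only sees clients that send a Referer at
| all, so traffic that omits the header passes straight through.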
| concordDance wrote:
| But from a loss and gain perspective it seems equivalent.
|
| In both cases the site loses bandwidth for no gain due to
| your actions.
| yreg wrote:
| No, it's not universally considered a theft. Wikimedia
| explicitly permits hotlinking[0]. So does xkcd, imgur and
| tons of other sites.
|
| Of course when someone doesn't want us to hotlink to
| their assets then don't do it.
|
| [0] https://commons.wikimedia.org/wiki/Commons:Reusing_co
| ntent_o...
| arbitrage wrote:
| it's so easy to mitigate, though, that the fact that one
| doesn't sorta implies that one might want randos from the
| internet to use one's resources to view this image.
|
| it's not theft if you leave it out for everyone to use.
| JackWins wrote:
| My garden doesn't have a fence, doesn't mean you can host
| your picnic here.
| dylan604 wrote:
| No, but if I wander into your garden and "injure" myself,
| I can sue you for damages. You will be held negligent for
| not properly preventing other people from injuring
| themselves on your property.
| gokhan wrote:
| Is this something real (in US, most probably)?
| wruza wrote:
| Here in Russia, if you leave poisonous chemicals like
| methanol, etc, unmarked or put a bear trap in your locked
| house behind a locked fence with a generic warning sign,
| and then someone dies or gets injured by these, chances
| are you will go to jail. Idk if this applies to
| accidental traps like pools or rakes in grass. Same for
| taking a knife out of an attacker's hand and stabbing them
| back. (Yes, our laws protect criminals better than
| citizens, not joking.)
| rendall wrote:
| Interesting. So if I understand this correctly, if
| someone breaks into your house and gets injured, and they
| can make a good case for some kind of negligence on your
| part, then they can successfully sue you?
| terramex wrote:
| In Poland setting marked traps on your own, fenced
| property is illegal and their owner is responsible for
| any harm they cause, because there exist legal reasons to
| enter another person's property - for example to fight
| spreading fire.
|
| However my favourite example is the law that allows any
| bee keeper to enter any private property if they are
| pursuing fleeing bee swarm.
| wruza wrote:
| If a judge or an expert is sure that you intended this
| outcome, and that someone is brave (or dead) enough to
| admit their own crime.
| rendall wrote:
| It's also illegal to set a trap in your own home in the
| US as well, decided when a property owner, tired of
| people breaking into his property while he was away, set
| up a shotgun booby trap that injured a burglar.
| https://youtu.be/bV9ppvY8Nx4
|
| I wasn't sure if it is the same or similar principle in
| Russia or a different one that requires active care for a
| burglar. Unlabeled chemicals causing liability for a
| burglar seems extreme to me
| varjag wrote:
| This is an urban legend in Russia.
| eythian wrote:
| Leaving a bear trap goes way beyond negligence, it's
| literally setting a trap. Similar with unmarked dangerous
| chemicals, they're required to be marked for good reason.
| h_anna_h wrote:
| In Greece if a burglar dies while in your house you can
| be held responsible, even more so if you have set up
| traps.
| pastech wrote:
| In France (at least), all swimming pools are protected by
| a fence. If you own a pool and don't put a fence around
| it, you can be held responsible for a child drowning in
| it.
|
| It is possible this principle applies to other countries
| and other things than pools.
| nikanj wrote:
| Yes, you can sue anyone for anything. Your suit probably
| won't prevail, unless you have access to very expensive
| lawyers and your opponent doesn't.
|
| But you can totally sue anyone for anything, and that
| makes for entertaining headlines - even if the plaintiff
| promptly loses
| zaarn wrote:
| Wikimedia has a User-Agent policy which is being violated
| here. Hence this is the property owner putting up a sign
| that says "risk of injury", so if you walk in and injure
| yourself, you only have to blame yourself for being
| negligent.
| h_anna_h wrote:
| The policy is for how wikipedia will act when
| encountering clients with certain user-agent headers, not
| a rule for the clients.
| zaarn wrote:
| It's a policy how wikimedia acts when clients lack a user
| agent header, it's therefore effectively a rule for
| clients as without a proper UA header, they may be
| blocked indefinitely.
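|
| As an illustration only, a server-side check for a missing
| User-Agent could look like the sketch below. The function name
| is made up and this is not Wikimedia's actual implementation.

```python
def has_user_agent(headers):
    """True if the request headers carry a non-empty User-Agent.

    HTTP header names are case-insensitive, so names are compared
    in lower case.
    """
    for name, value in headers.items():
        if name.lower() == "user-agent" and value.strip():
            return True
    return False
```

| A server could then decide to throttle or block requests for
| which this returns False.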
| AdrianB1 wrote:
| Only in your dreams and some dumb countries, not in the
| rest of the world.
| dylan604 wrote:
| You think this, but how much experience do you have with
| it? People know that homeowners have insurance. They sue
| to make the insurance pay out. It happened to my
| neighbor. So you can make all of the dumb countries
| comments you want, but it doesn't make it any less real.
| m463 wrote:
| I wonder if there's some way to have a frontend cache or
| webserver shortcut that looks for that exact url and
| blurts out the image?
|
| Or maybe wikipedia is already mostly static.
|
| also, I wonder if HN is inadvertently ddos'ing the ticket
| system ?
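|
| A toy sketch of such a shortcut, assuming a slow `fetch`
| function that stands in for the normal backend path. The class
| and its methods are hypothetical, and a real CDN layer would
| do this very differently.

```python
class HotObjectShortcut:
    """Serve a few pinned paths from memory, everything else via fetch."""

    def __init__(self, fetch):
        self._fetch = fetch  # slow path: function(path) -> bytes
        self._hot = {}       # path -> bytes kept in memory

    def pin(self, path):
        """Load one path through the slow path once and keep it."""
        self._hot[path] = self._fetch(path)

    def get(self, path):
        """Return pinned bytes on an exact match, else use the slow path."""
        body = self._hot.get(path)
        if body is not None:
            return body
        return self._fetch(path)
```

| After `pin()` is called for the hot URL, repeated `get()` calls
| for it never reach the backend.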
| ChrisMarshallNY wrote:
| Like so?
|
| http://ascii.textfiles.com/archives/1011
| BlueTemplar wrote:
| Brilliant.
| peterkelly wrote:
| This, perhaps disturbingly, was my first thought upon reading
| the issue.
|
| Things were done very differently back in the day. This problem
| would have been fixed _real_ quick.
| chrisjarvis wrote:
| To the people who didn't grow up with 4chan: do not search for
| this image, it's pretty disgusting.
| isatty wrote:
| Fairly sure you'd get goatse'd more often on Efnet etc back
| in the day
| askvictor wrote:
| s/4chan/slashdot/
| pengaru wrote:
| 4chan didn't even exist yet when goatse emerged
| deathlight wrote:
| It seems plausible to me that the, ahem, "spread" of the
| image was greatly increased through the efforts of 4chan.
| Semaphor wrote:
| Back in School, goatse was extremely well known. That was
| several years before 4chan. I hadn't even heard of goatse
| in a 4chan relation until now.
| eyelidlessness wrote:
| Maybe widespread but it was already pretty wide open
| before there was a gap for 4chan to even exist.
| themaninthedark wrote:
| ** Kadmium changes topic to 'Our hearts are extended to
| the 18 victims of the recent internet fraud'
|
| http://bash.org/?434593
| nomat wrote:
| Hey I'm on that website! IRC used to be so fun and weird
| back in the day. hanging out on slashnet took up most of
| my free time in junior high.
| incompatible wrote:
| I think it was popularized back in the days of Slashdot.
| tingletech wrote:
| it does not date to alt.tasteless on usenet? (edit: w/r/t
| goatse)
| disillusioned wrote:
| I was going to suggest Something Awful but you might win,
| though Wiki pegs it (heh) at 1999...
| eyelidlessness wrote:
| To the people who grew up before 4chan, pls don't mention
| tubgirl
| BlueTemplar wrote:
| Not having eyelids would certainly make it worse!
| Rompect wrote:
| I was born after 4chan was created and I found that image
| on reddit. It's pretty mild; one can tell quickly that it
| is a doll.
| rootw0rm wrote:
| lmao
| sulam wrote:
| 2 girls one cup
| eyelidlessness wrote:
| I missed the edit window and I'm disappointed in myself for
| mentioning it by name. Please just don't Google this unless
| you're prepared for an upsetting image, and even then maybe
| just skip it. You're probably not as prepared as you think.
| walrus01 wrote:
| goatse significantly pre-dates 4chan
| xwdv wrote:
| what is it?
| eyelidlessness wrote:
| Big stretched open butthole. Not sure if you need the
| warning but I'm commenting in case anyone would prefer not
| to see it despite their curiosity.
|
| Sorry to ruin the fun y'all but there's images I won't even
| mention that I can't unsee and make me feel seriously ill
| when I do see them. I don't want anyone else to feel that
| way without warning.
| xwdv wrote:
| What are these images called so we know to avoid them?
| drivebymy wrote:
| There were quite a few, lemonparty and meatspin spring to
| mind, and the various incarnations of "two x one y".
| jml7c5 wrote:
| Can't speak to the images themselves, but the sites are
| usually referred to as "shock sites":
|
| https://en.wikipedia.org/wiki/Shock_site
| shagie wrote:
| They were known as a "shock site" (
| https://en.wikipedia.org/wiki/Shock_site )
|
| The Wikipedia page for
| https://en.wikipedia.org/wiki/Goatse.cx is text only and
| without any ascii art.
|
| I'm amused that
| https://simple.wikipedia.org/wiki/Goatse.cx also exists.
| arp242 wrote:
| Oh, Goatse is _that_ site.
|
| I remember when I was about 15, before pop-up blockers
| were really a thing, someone sent me a link to that and
| it would keep opening popups with that image and you
| couldn't close all of them :-/
|
| Sometimes people look back to the internet of the 90s
| with too rose-coloured glasses IMO.
| shagie wrote:
| For some memories...
| http://www.bash.org/?search=goatse&sort=0&show=25
|
| I am personally most amused by #38659
| eyelidlessness wrote:
| Hey at least if you were on a 90s Mac your computer was
| probably unresponsive and you could skip to the
| inevitable force reboot. And browsers didn't save
| sessions so you were in the clear as soon as you got to
| tabula rasa.
| eyelidlessness wrote:
| I'm honestly not sure you're asking in good faith so I'm
| not going to add more (and if you are asking in good
| faith you've got plenty in responses to go on). Also I
| never knew the name of the one that's permanently burned
| into my brain and I'm so glad I don't.
| ArchOversight wrote:
| It's interesting that you equate goatse with 4chan! I'm old
| :-(
| bawolff wrote:
| Why does it need to be fixed? The mission of wikimedia is to
| serve educational content.
|
| Edit: this is a bit unfair, if it's a specific app they should
| be convinced to cache just to avoid unfair resource usage, but
| hotlinking in general should not be seen as a problem
| _joe wrote:
| Any for-profit entity hotlinking Commons is unfair. Heck,
| they have the right to redistribute freely the image as they
| see fit, instead of consuming resources that are a common
| good.
|
| But this goes beyond that - it's some blind check of internet
| connectivity for the app, and doesn't get shown to the user.
| We're pretty sure of that, given that with the amount of
| noise that task generated, if there was an app featuring that
| image at least one of the ~ 90M daily "views" would've been
| someone reading these posts.
|
| Now, given we want to be nice, we didn't just blindly block
| the traffic, although making requests without user-agent is
| against our UA policy
| https://meta.wikimedia.org/wiki/User-Agent_policy
| unreal37 wrote:
| Presumably they are paying for the servers/bandwidth to
| support that, and that money is coming from donors.
|
| It's a waste of donors' money if someone is using this image
| as some kind of "is this thing on" test using hacked
| computers...
| _joe wrote:
| It's both a waste of donor money and a starvation of
| resources for people actually consulting images on
| wikimedia commons.
| arbitrage wrote:
| i'm sure the revenue model is robust enough to accommodate
| spikes in traffic.
| walrus01 wrote:
| This is exactly what I used to do about 17 years ago.
| jessaustin wrote:
| A red flower rather than a lavender one.
| jobigoud wrote:
| If it's just used internally by an app to test connectivity as
| suggested in another subthread, this wouldn't solve the
| problem.
| Blikkentrekker wrote:
| I am a conservative man of tradition; fuck these modern liberal
| commies and their new age ways calling their degenerate
| diplomatic solutions "enlightenment".
| [deleted]
| macawfish wrote:
| Tragedy of the commons
| blakesterz wrote:
| They did figure it out, a popular chat app in India (they won't
| name yet) fetches the image but does not display it.
|
| https://phabricator.wikimedia.org/T273741#6815828
| owlninja wrote:
| >>Thank you everyone for the comments and suggestions. I just
| wanted to share that we have identified the app and will update
| this task tomorrow. (And yes, it is a mobile app.)
|
| Looks like we will know soon.
| aquadrop wrote:
| It's not 20% of all requests, it's 20% of media requests to one
| of the clusters (it's said in the issue description) There are 5
| clusters.
| newsclues wrote:
| Example code that gets copy pasted into production app somewhere?
| [deleted]
| [deleted]
| rovr138 wrote:
| Possibility and linked in a couple places. They found examples
| rburhum wrote:
| so what is the app? I felt like I read a full blown novel, and
| the last sentence with the conclusion is missing!
| Liquid_Fire wrote:
| > we have identified the app and will update this task tomorrow
|
| I guess we will find out tomorrow.
| gertlex wrote:
| After realizing "wiki[p|m]edia" and "flower" triggered a specific
| image in my head I was guessing it would be a yellow flower, this
| one in the corner of https://www.mediawiki.org/wiki/MediaWiki but
| nope, more interesting than that!
| jakoblorz wrote:
| Yes OMG! Didn't expect this one
| doovd wrote:
| Same here dude. Was sad when it turned out NOT to be the yellow
| flower!
| simonebrunozzi wrote:
| Here's the flower in question [0].
|
| [0]:
| https://upload.wikimedia.org/wikipedia/commons/thumb/1/16/As...
| danparsonson wrote:
| Posting the very link they're trying to reduce traffic on
| doesn't seem like a very helpful thing to do
| simonebrunozzi wrote:
| I bet everyone on HN would have wanted to take a look
| eventually.
| smcl wrote:
| They were trying to figure out the root cause of a sudden
| uptick of _millions_ of requests being made for a given image
| with no user agent or referer, presumably with a view to
| notifying the app responsible or figuring out a workaround.
|
| A few thousand requests clearly identifiable as coming from
| browsers _and_ with a referer header from
| news.ycombinator.com would not exactly interfere with this
| and in the grand scheme of things isn't a huge burden in
| terms of network traffic.
| astrea wrote:
| My initial thinking is it's in some flower recognition dataset.
| stevefolta wrote:
| Or it's India's canonical non-hotdog.
| shp0ngle wrote:
| According to the comments, it's probably an Indian TikTok clone
| that checks internet connection by downloading the picture.
| scrps wrote:
| They didn't mention if the 90M requests were unique, perhaps some
| app doing background refresh and not caching that image?
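|
| If that guess is right, the app-side fix is to remember the
| result for a while instead of re-fetching. A small sketch,
| assuming a hypothetical `probe` callable that performs the real
| check and an arbitrary 5-minute lifetime:

```python
import time


class CachedProbe:
    """Remember a connectivity probe's result for ttl_seconds."""

    def __init__(self, probe, ttl_seconds=300.0, clock=time.monotonic):
        self._probe = probe      # hypothetical: callable returning truthy if online
        self._ttl = ttl_seconds  # how long a result stays fresh
        self._clock = clock      # injectable so the logic can be tested
        self._checked_at = None
        self._result = False

    def online(self):
        """Run the probe only when the cached result has expired."""
        now = self._clock()
        if self._checked_at is None or now - self._checked_at >= self._ttl:
            self._result = bool(self._probe())
            self._checked_at = now
        return self._result
```

| With this in place a background refresh loop can call
| `online()` as often as it likes and still send at most one
| request per lifetime window.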
| siltpotato wrote:
| First thing I think of seeing just the headline: something to do
| with ML datasets?
|
| Now that I've read it: Hmm, never heard of phabricator.
| V-2 wrote:
| _"Please avoid adding drive-by comments such as "hello from
| Hacker News" to this task as they are not helpful. Thank you"_
|
| Why anyone would do such stuff is, as usual, beyond me...
|
| PS. "First!"
| tyingq wrote:
| The funny thing is, the first instance of that in the thread
| wasn't "hello from hacker news". It was a "hello to hackernews"
| from an engineer on the WikiMedia team.
| https://phabricator.wikimedia.org/T273741#6813995
| Dumbdo wrote:
| And that comment was removed by the author a few minutes ago.
| calibas wrote:
| 999 out of 1,000 people know better, but when there's
| thousands of people...
| vultour wrote:
| Even more curiously, the person that did that registered an
| account and put up a profile picture (I assume of himself),
| just for that comment...
| bpicolo wrote:
| Huh, I worked on a site with a similar issue in ~2019. A massive
| flood of traffic for a single site from Indian mobile apps
| (~15kqps at peak iirc).
|
| I think it ended up being a sort of mobile-based botnet with a
| bizarre target, which luckily was deduced from some of the
| headers sent (they all had a random common header).
| smcl wrote:
| What's "kqps"? It's obviously kilo-{somethings} per second, but
| I don't know what the {something} would be
|
| edit: queries?
| bpicolo wrote:
| Yep, queries
| dailypeeker wrote:
| 15000 queries per second
| downrightmike wrote:
| Saw a story recently that Indians were bringing down the
| internet because of sending good morning messages:
| https://www.wsj.com/articles/the-internet-is-filling-up-beca...
|
| I'd bet that this is the flower of the week for them.
| nso wrote:
| Says it's been going since last June.
| Codesleuth wrote:
| Now you just need to draw attention to it further by posting it
| in hacker news. I'm sure none of us are curious to immediately
| see the picture.
| Donckele wrote:
| " Thank you everyone for the comments and suggestions. I just
| wanted to share that we have identified the app and will update
| this task tomorrow. (And yes, it is a mobile app.)"
| airhead969 wrote:
| Those goddamn white Hare Krishnas are in the airport again,
| handing out flowers no one wants.
|
| Also: What's your vector, Victor?
| kayxspre wrote:
| I remember that MediaWiki installation allowed the configuration
| that essentially permits the use of Commons files, albeit in that
| case, the file will be downloaded and cached in the Wiki's own
| server [1].
|
| That being said, though the image wasn't hotlinked directly, they
| expressed concerns of DDOS and the possible costs the Foundation
| has to incur from each load (they even pointed out that it's
| "fair and reasonable" to point donation link to them).
|
| I would be interested to see how the licensing issue will be
| handled, though. The photographer licensed this photo as GFDL/CC
| BY-SA 3.0 [2], and hotlinking may break the terms of these
| licenses.
|
| 1: https://www.mediawiki.org/wiki/InstantCommons
|
| 2: https://commons.wikimedia.org/wiki/File:AsterNovi-belgii-
| flo...
| nos4A2 wrote:
| Perhaps they can replace the image with an obviously wrong image
| (and smaller in size), and then wait for someone to complain
| bombcar wrote:
| I had some random images on a web server years ago - and noticed
| that something like 99% of my traffic was one image - and
| searching through referrers I realized I was the #1 hit on google
| images for robot attack cat.
|
| Simpler times.
| Tijdreiziger wrote:
| Can we see the image? :D
| ConcreteGidget wrote:
| Yes, but only 10 million times.
| bombcar wrote:
| https://imgur.com/8MMET5V - now there are companies that can
| host things FOR my servants!
| BlueTemplar wrote:
| Anyone knows how imgur is able to afford that ?
| kingnothing wrote:
| ads
| BlueTemplar wrote:
| Even though ad-blockers are ever more common ? (I didn't
| even _think_ about ads, since I never saw them on imgur
| !)
| kingnothing wrote:
| It's the same way Google pulled in $180B last year and
| Facebook made $86B.
| kristianp wrote:
| Low cost CDNs like Cloudflare.
| guessbest wrote:
| You got to have hotlink protection on when you are hosting
| memes. I've learned this the hard way, too.
| globular-toast wrote:
| Remember when some sites would send you a shock image instead
| of the one you were expecting if it detected hot linking? I
| don't miss that.
| mcintyre1994 wrote:
| There's a site that's occasionally posted in comments here
| which is apparently run by someone who hates HN because
| they serve some image I can't remember to readers from
| here.
| alickz wrote:
| Yeah it was posted not long ago. When it sees HN as the
| referrer it shows a picture of a testicle in an egg cup.
| [deleted]
| berkes wrote:
| I had a similar issue. Some 15+ years ago, an image from my
| blog showed up for people who searched the phrase 'Peanubutter
| Sex'. The image had nothing to do with peanutbutter nor with
| sex. My blog is SFW. It was some screenshot of KDE IIRC.
|
| For almost a week it remained the most requested image, and the
| post on which it appeared, the most popular.
|
| It did make me uncomfortable, though. Fearing that my rankings
| would plummet or so.
|
| My takeaway is nothing new: there are weirdos online.
| kristianp wrote:
| So where did the connection between "Peanubutter Sex" and
| your blog come from? Did you ever find out?
| Zaheer wrote:
| What's the right way to solve for this generally? A CORS policy
| wouldn't be effective since it's not a browser requesting the
| image.
| Ayesh wrote:
| It's likely that the consumer doesn't even display that image.
| It's probably a dry 1mb download test.
|
| Most Indian ISPs, even mobile ones, are cheap enough that
| 1mb doesn't matter.
| speedgoose wrote:
| If the image is not displayed and the stack can handle a lot of
| tcp connections, perhaps a reverse http slowloris attack: you
| send the image response headers as slow as you can to keep the
| tcp connection open but to make the receiver waste its time.
|
| If it's a speed test, they will eventually use another image.
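|
| The "as slow as you can" part could be sketched as a generator
| that drips response bytes one at a time. This is only an
| illustration: the function name and the delay are made up, and
| a real server would need non-blocking I/O to hold many such
| connections cheaply.

```python
import time


def drip(payload, delay_seconds=1.0, sleep=time.sleep):
    """Yield payload one byte at a time, pausing before each byte.

    `sleep` is injectable so the pacing can be checked without
    actually waiting.
    """
    for i in range(len(payload)):
        sleep(delay_seconds)
        yield payload[i:i + 1]
```

| A handler would write each yielded byte to the socket, so the
| client stays connected but receives the headers very slowly.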
___________________________________________________________________
(page generated 2021-02-09 23:01 UTC)