[HN Gopher] We analyzed 425k favicons
       ___________________________________________________________________
        
       We analyzed 425k favicons
        
       Author : gurgeous
       Score  : 515 points
       Date   : 2021-10-20 17:29 UTC (1 days ago)
        
 (HTM) web link (iconmap.io)
 (TXT) w3m dump (iconmap.io)
        
       | paulirish wrote:
       | Aside: This article is a decent use case for the esoteric
       | `image-rendering: pixelated;` CSS property.
        
         | nkriege wrote:
         | Great tip. I've never come across this before. I updated the
         | post and the scaled up icons look much sharper now.
        
         | dmitrygr wrote:
         | I used it to make this PWA work well on iPhones:
         | http://dmitry.gr/89
        
           | mod wrote:
           | I loaded this up on a surface tablet--it renders larger than
           | my viewport, but with no scrollbar.
           | 
           | I was able to zoom out and see everything, but some people
           | don't know (or wouldn't think of) that trick.
        
             | dmitrygr wrote:
             | Designed for personal use as a PWA specifically on my
             | iPhone. I migrated from android where i had a TI-89
             | emulator app. No such thing exists for iOS. Usability by
             | others was never a requirement :)
        
           | lostgame wrote:
           | Ha - that's a fantastically nerdy little project. I love it!
        
         | philistine wrote:
         | My website, gameboyessentials.com, would not exist without this
         | esoteric CSS property. I wanted to show Game Boy images in
         | their exact resolution (160 by 144). With image-rendering:
         | pixelated; I have crisp pictures on my site whose sizes are
         | counted in bytes.
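For readers who have not met the property: `image-rendering: pixelated` asks the browser to scale an image with nearest-neighbor (or similar) sampling instead of smoothing it. The following is a minimal Python sketch of integer nearest-neighbor upscaling, only to illustrate the idea. `upscale_nearest` is a hypothetical helper, not browser code, and the pixel grid is made up.

```python
def upscale_nearest(pixels, factor):
    """Upscale a 2D grid of pixel values by an integer factor.

    Every source pixel becomes a factor x factor block of identical
    pixels, which is why edges stay crisp instead of blurring.
    """
    out = []
    for row in pixels:
        # Repeat each pixel horizontally...
        wide = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


tiny = [[0, 1],
        [2, 3]]
big = upscale_nearest(tiny, 2)
for row in big:
    print(row)
```

A 160x144 Game Boy screenshot scaled by 3 this way becomes 480x432 while the file served to the browser stays the original size.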
        
       | 1cvmask wrote:
       | The favicon visualization brought memories of the million dollar
       | homepage. I suppose it was precursor of NFTs.
       | 
       | https://en.wikipedia.org/wiki/The_Million_Dollar_Homepage
       | 
       | http://www.milliondollarhomepage.com/
        
         | mrkramer wrote:
         | >The favicon visualization brought memories of the million
         | dollar homepage. I suppose it was precursor of NFTs.
         | 
         | It was not. NFTs are digital certificates saying that you own
         | certain digital content; The Million Dollar Homepage, on the
         | other hand, was basically selling ad space on the website.
         | 
         | You can argue you could buy part of the website (digital space)
         | and therefore you own that part of the website, but in reality
         | you were renting it as ad space meant to promote your
         | website (link).
         | 
         | The purpose and vision of The Million Dollar Homepage and NFTs
         | are completely different, but I can see similarities between
         | quasi-owning digital space (part of a website) and owning
         | digital content or a digital certificate (digital token).
        
           | alextheparrot wrote:
           | Is it really necessary that we assume a precursor must be a
           | strict equality in all dimensions aside time?
        
             | mrkramer wrote:
             | No, but because of all the aforementioned reasons they are
             | of minimal similarity.
        
       | philshem wrote:
       | Less analysis, but a couple years ago I posted a script to
       | download and then generate mosaics from favicons:
       | https://smalldata.dev/posts/favicon-mosaic/
       | 
       | example image: https://smalldata.dev/images/mosaic.jpeg
       | 
       | script to get the favicons:
       | https://gist.github.com/philshem/e59388197fd9ddb7dcdb8098f9f...
        
       | Groxx wrote:
       | Off in one of the more esoteric corners of favicons, you have
       | games played within the favicon:
       | https://www.youtube.com/watch?v=fpjM5myls7I
       | 
       | Sadly it doesn't quite work for me any more, but the youtube
       | video does a decent job showing what it looked like when it
       | worked.
        
       | tinco wrote:
       | Not really relevant, but using Go to fetch the data, and then
       | Ruby to process the data is the best. I used this exact set up
       | for a project and it was amazing. Really the sweet spot of use
       | cases for both languages.
        
         | tweakimp wrote:
         | Can you please explain why they are the best languages for
         | these jobs?
        
           | tinco wrote:
           | Go's got an awesome feature set built in to the language for
           | building small networked services. I implemented a client to
           | a cryptocurrency network to extract information about its
           | status and clients. I can't really express why it's so good,
           | it just feels right.
           | 
           | Same for Ruby, the syntax is perfectly suited for
           | transforming, digging through and acting upon data. I didn't
           | even add a Gemfile, only used standard library functions,
           | transforming the data the Go program mined into usable
           | information serialized in JSON which was subsequently used as
           | a static database for a webpage.
           | 
           | You can find the source here:
           | https://github.com/tinco/stellar-core-go, the Go is in cmd
           | and the Ruby is in tools.
           | 
           | The site it powers is now defunct, apparently they changed
           | some stuff in the past 3 years and the crawler no longer
           | functions.
        
       | whalesalad wrote:
       | I have always wanted to do this _exact_ analysis - so awesome!
       | Every time I am building some kind of semi-intelligent parser to
       | fetch an arbitrary visual icon for a URL I think to myself there
       | has gotta be a better way to do this.
        
       | account42 wrote:
       | One weird behavior with favicons that I noticed is that Firefox
       | will download both the 16x16 icon that matches the size it's
       | displayed at (on a 1x pixel ratio screen) as well as the largest
       | icon and then will display whichever finished last. This behavior
       | makes no sense to me.
        
       | Quai wrote:
       | I worked on Opera Link, the first built-in synchronization
       | between different installations of the Opera browser, both
       | desktop, Opera Mini and Opera Mobile (+ a web view).
       | 
       | Favicons got included in the data from day one, and it was
       | awesome to get the look and feel of your bookmark bar/UI with the
       | correct icons right away.
       | 
       | Back then we stored the bookmarks in a homegrown XML data store
       | (built on top of MySQL, acting more or less as a key-value
       | store). This worked quite nicely, and it allowed us to easily
       | scale the system.
       | 
       | One night the databases and backends handling the client requests
       | suddenly started eating a lot more memory, and the database
       | started using much more storage than normal.
       | 
       | As one of only two backend devops working on Opera Link, I had to
       | debug this, and find out what was going on. After a while I
       | isolated the problem to a handful of users. But how could a few
       | users affect the system so much?
       | 
       | As a part of the XML data store, we naively decided to store the
       | favicons in the XML, as base64-encoded strings. While not
       | pretty, a 16x16 PNG is not that much data, and even with
       | thousands of bookmarks, the total overhead on compression and
       | parsing was negligible. What we did not foresee was what I
       | uncovered that night: a semi-popular porn site had changed
       | something on their server. They had started serving the images
       | while also pointing browsers to the same images as the favicon!
       | Each image was multiple megabytes, sent from the client, parsed
       | on the backend, decoded, verified, encoded back to base64, added
       | to the XML DOM, serialized, compressed and pushed back to the
       | database...
       | 
       | Before going to bed that night, I had implemented a blacklist of
       | domains we would not accept favicons for, cleaned up the
       | "affected" user data, and washed my eyes with soap.
       | 
       | I miss those days!
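The failure mode described above (multi-megabyte "favicons" being base64-encoded into an XML store) can be sketched as a guard in front of the encoder. This is a hypothetical illustration, not Opera Link's code: `encode_favicon`, the 64 KiB cap, and the blocklisted domain are all assumptions. The comment only mentions a domain list; the size check is an extra guard added here for illustration.

```python
import base64
from typing import Optional

MAX_FAVICON_BYTES = 64 * 1024          # assumed cap; the real limit is unknown
BLOCKED_DOMAINS = {"blocked.example"}  # hypothetical blocklist entry


def encode_favicon(domain: str, data: bytes) -> Optional[str]:
    """Return base64 text for the XML store, or None if the icon is rejected."""
    if domain in BLOCKED_DOMAINS:
        return None  # the fix described in the comment: refuse listed domains
    if len(data) > MAX_FAVICON_BYTES:
        return None  # assumption: also refuse oversized payloads outright
    # base64 turns every 3 input bytes into 4 output characters.
    return base64.b64encode(data).decode("ascii")


small = encode_favicon("ok.example", bytes(300))
huge = encode_favicon("ok.example", bytes(3 * 1024 * 1024))
print(len(small), huge)
```

Because base64 expands data by a factor of 4/3, a 3 MiB image would have added about 4 MiB of text to a single bookmark entry before any XML parsing or serialization cost.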
        
         | eezurr wrote:
         | I have fond memories of using Opera <= 12. You guys were in
         | space compared to other browsers at the time.
        
         | thrdbndndn wrote:
         | Wait, so you can see user's data directly?
        
           | lmm wrote:
           | GP is talking about something implemented in 2008. It was a
           | different time and a different mentality.
        
             | Brybry wrote:
             | Google Chrome Sync help docs imply that it stores data on
             | servers unencrypted by default.[1]
             | 
             | Firefox Sync seems to have sane/encrypted defaults.[2]
             | 
             | [1] https://support.google.com/chrome/answer/165139
             | 
             | [2] https://hacks.mozilla.org/2018/11/firefox-sync-privacy/
        
           | Quai wrote:
           | The truth is that most services will have a set of devops
           | with access to personal information. And sometimes we need
           | to look at private data to solve issues like this. My first
           | instinct back then was that some smart hacker had created
           | FUSE support for Link or something similar.
           | 
           | Opera Link did not encrypt bookmarks, speed dials, etc., but
           | had datatypes encrypted with the master password, even while
           | syncing. We were two people with the access and knowledge to
           | access individual user information, and we took it very
           | seriously.
        
       | munk-a wrote:
       | Didn't they miss all the pre-sized icons in their scan as well?
       | For a while Apple encouraged multiple resolution sizes for
       | favicons for... reasons.
       | 
       | I know they additionally missed the directory-specific favicons,
       | which have always had iffy support (i.e. /index.html =>
       | /favicon.ico and /munks-page/index.html =>
       | /munks-page/favicon.ico)
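The directory-specific lookup mentioned above amounts to resolving `favicon.ico` relative to the page URL rather than the site root. Here is a small sketch using Python's standard `urllib.parse.urljoin`; `favicon_candidates` is a hypothetical helper and `example.com` is a placeholder host.

```python
from typing import List
from urllib.parse import urljoin


def favicon_candidates(page_url: str) -> List[str]:
    """List conventional favicon URLs for a page, most specific first."""
    directory_icon = urljoin(page_url, "favicon.ico")   # same directory as the page
    root_icon = urljoin(page_url, "/favicon.ico")       # site root
    # dict.fromkeys removes the duplicate when the page is already at the root.
    return list(dict.fromkeys([directory_icon, root_icon]))


print(favicon_candidates("https://example.com/munks-page/index.html"))
print(favicon_candidates("https://example.com/index.html"))
```

A crawler that only requests the root path would never see the first candidate for pages inside a subdirectory.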
        
       | arantius wrote:
       | I did something similar in 2008:
       | https://tech.arantius.com/favicon-survey
        
       | achillean wrote:
       | Nmap generated a similar version many years ago and it's still
       | available at:
       | 
       | https://nmap.org/favicon/
       | 
       | We also did something that looks at favicons by IP:
       | 
       | https://faviconmap.shodan.io/
        
       | Lorin wrote:
       | This reminds me of the time I reported to CIRA (Canadian domain
       | registry) that their favicon was ~2 MB w/ bad caching rules and
       | was causing issues in ... many situations.
        
       | [deleted]
        
       | arp242 wrote:
       | I got mine down to 160 bytes with some pixel tweaking and
       | converting it to a 16-color indexed PNG. It's not a lot of work
       | or very difficult (I'm an idiot at graphics editing), but you do
       | need to spend the (small amount of) effort. I embed it as a data
       | URI and it's just four lines of (col-80 wrapped) base64 text,
       | which seems reasonable to me.
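To make the size arithmetic concrete, here is a sketch of building such a data URI in Python. `favicon_data_uri` is a hypothetical helper, and the 160 zero bytes are a stand-in for the real icon, which is not available here.

```python
import base64


def favicon_data_uri(png_bytes: bytes) -> str:
    """Wrap raw PNG bytes in a data: URI usable as a favicon href."""
    payload = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + payload


placeholder_icon = bytes(160)  # stand-in payload, NOT a valid PNG
uri = favicon_data_uri(placeholder_icon)
print(len(uri))
```

A 160-byte file becomes 216 base64 characters, or 238 with the `data:image/png;base64,` prefix, which fits in a few 80-column lines once the surrounding markup is included.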
       | 
       | Haven't managed to get my headshot down to less than 10k without
       | looking horrible no matter how much I tweaked the JPEG or WebP
       | settings, and thought that was just a tad too big to embed. Maybe
       | I need to find a different picture that compresses better.
       | 
       | I got that 280k Discord favicon down to just 24K simply by
       | opening it in GIMP and saving it again. I got it down to 12K by
       | indexing it to 255 colours rather than using RGB (I can't tell
       | the difference even at full size). You can probably make it even
       | smaller if you tried, but that's diminishing returns. Still, I
       | bet with 5 more minutes you can get it to ~5k or so.
       | 
       | It's very easy; you just need to care. Does it matter? Well, when
       | I used Slack I regularly spent a minute waiting for them to push
       | their >10M updates, so I'd say that 250k here and 250k there etc.
       | adds up and matters, giving real actual improvements to your
       | customers.
       | 
       | The Event Horizon Telescope having a huge favicon I can
       | understand; probably just some astronomer who uploaded it in
       | WordPress or something. Arguably a fault of the software for not
       | dealing with that more sensibly, but these sorts of oversights
       | happen. For a tech company making custom software for a living,
       | it is quite frankly just embarrassing to the entire industry.
       | It's a big fat "fuck you" to anyone from less developed areas
       | with less-than-ideal internet connections.
        
         | TheJoeMan wrote:
         | " I got that 280k Discord favicon down to just 24K simply by
         | opening it in GIMP and saving it again. "
         | 
         | You made me laugh out loud.
         | 
         | I agree that stuff like YouTube.com saying 144x but really 145x
         | seems like it should be embarrassing.
        
           | arp242 wrote:
           | I wouldn't be surprised if that was for a specific reason,
           | like somehow showing up better somewhere for some reason, or
           | something like that. Or maybe not; who knows...
        
         | fbrchps wrote:
         | Oh hey, Discord must have seen this article -- their favicon is
         | down to 14k now.
        
           | ehsankia wrote:
           | It's not, at least for me. If you checked in devtools, that's
           | gzip over the wire size. Hover over the size and it'll show
           | you the actual resource size, still 285k for me.
        
             | dmurray wrote:
             | The gzipped size is probably the correct metric to care
             | about, right? Virtually all browsers will support that.
             | 
             | Sure, Discord could do a bit better, but it's not correct
             | to knock them here for costing their users 285KB.
        
               | MrBoomixer wrote:
               | This is bad math and not heavily researched, but in 2020
               | Discord had 300 million users. 285 KB goes a long way in
               | wasted energy and bits flowing through the pipes. I agree
               | generally with what you're saying, though: gzipped sizes
               | are what's being sent, at the cost of some CPU usage
               | somewhere to unzip. Fewer bytes == less waste?
        
               | lmm wrote:
               | PNG basically includes gzip in the file format, so you're
               | not reducing the amount of CPU used, you're just moving
               | where it happens.
        
               | giantrobot wrote:
               | Includes but doesn't always use. PNG also includes
               | filters which can dramatically decrease sizes, especially
               | when combined with compression.
               | 
               | That's why tools like OptiPNG basically brute-force all
               | the combinations of options. Depending on the image
               | content, different combinations of filters and compression
               | will get the best file size.
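The effect of PNG filters can be shown with nothing but `zlib`. The sketch below builds a synthetic 8-bit grayscale gradient (an assumption chosen because it suits the Sub filter), then deflates the scanlines once with filter type 0 (None) and once with filter type 1 (Sub, each byte minus its left neighbor). It does not write a real PNG and is not how OptiPNG is implemented; it only compares the compressed size of the filtered scanline data.

```python
import zlib

WIDTH, HEIGHT = 256, 64


def gradient_rows():
    """8-bit grayscale rows forming a smooth diagonal gradient."""
    return [bytes((x + y) % 256 for x in range(WIDTH)) for y in range(HEIGHT)]


def filter_none(row: bytes) -> bytes:
    # Filter type 0: scanline bytes are stored unchanged.
    return b"\x00" + row


def filter_sub(row: bytes) -> bytes:
    # Filter type 1: store each byte minus the byte to its left (mod 256).
    # With 1 byte per pixel the "left" byte is simply the previous one.
    diffs = bytes(
        (row[i] - (row[i - 1] if i else 0)) % 256 for i in range(len(row))
    )
    return b"\x01" + diffs


def deflated_size(rows, row_filter) -> int:
    raw = b"".join(row_filter(row) for row in rows)
    return len(zlib.compress(raw, 9))


rows = gradient_rows()
size_none = deflated_size(rows, filter_none)
size_sub = deflated_size(rows, filter_sub)
print("none:", size_none, "bytes; sub:", size_sub, "bytes")
```

On a gradient the Sub filter turns almost every byte into the same small difference, which deflate handles far better than the raw ramp. On other images a different filter may win, which is why optimizers try several.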
        
             | jhgg wrote:
             | I committed a fix, it's now 24k uncompressed! :)
        
               | blitzar wrote:
               | Congratulations. Don't forget your 11x improvement when
               | it comes to the end of year reviews.
        
           | gremloni wrote:
           | That's lit and a fantastic turnaround. Great work to whoever
           | is reading this!
        
         | nerfhammer wrote:
         | there are png optimizer programs, e.g. optipng
        
           | pseudosavant wrote:
           | The Squoosh (web) app is awesome for this too! All processing
           | is done locally with wasm.
           | 
           | https://squoosh.app
        
             | 101008 wrote:
             | I'd love to have a browser plugin that converts all images
             | I upload to CMS using Squoosh.
        
             | ehsankia wrote:
             | Yep, just tried the Discord icon with OxiPNG and it went
             | from 285k to 6.35k, visually indistinguishable.
        
           | vadfa wrote:
           | `optipng -o9 -strip all` is a must
        
           | jamesfinlayson wrote:
           | I found https://pngquant.org/ to be pretty good.
        
             | account42 wrote:
             | Note that unlike some of the other tools mentioned here,
             | pngquant does _lossy_ compression. Might still be the right
             | tool in many cases, but it means you should check the
             | output while e.g. optipng is a no-brainer to add to
             | whatever your publishing pipeline is.
        
           | memco wrote:
           | ImageOptim was a favorite of mine. They have a standalone mac
           | app and a webservice. It combines several of these tools into
           | a single GUI.
        
         | JohnTHaller wrote:
         | 256x256 PNG reduced to 256 colors with pixel transparency gets
         | it to 2.68K. I manually dropped the color depth to indexed and
         | saved it out in Photoshop and I used FileOptimizer to shrink
         | it. It includes 12 different image shrinkers and runs them all.
        
       | jtbayly wrote:
       | > Check out this startling ICO with 64 images, all roughly 16x16.
       | I suspect a bug.
       | 
       | I suspect an animation. Anybody know how to find out?
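One way to investigate is to list the directory entries of the ICO file. As far as the format goes, an ICO starts with a 6-byte header (reserved, type, image count) followed by one 16-byte entry per image, and those entries carry sizes and bit depths but no frame timing. The sketch below parses that directory with `struct`. `ico_entries` is a hypothetical helper, and the sample bytes are hand-built for illustration, not taken from the article's file.

```python
import struct


def ico_entries(data: bytes):
    """Return (width, height, bits_per_pixel, byte_size) for each ICO image."""
    reserved, kind, count = struct.unpack_from("<HHH", data, 0)
    if reserved != 0 or kind != 1:
        raise ValueError("not an ICO file")
    entries = []
    for i in range(count):
        (width, height, _colors, _reserved,
         _planes, bpp, size, _offset) = struct.unpack_from(
            "<BBBBHHII", data, 6 + 16 * i
        )
        # A stored width or height of 0 means 256 pixels.
        entries.append((width or 256, height or 256, bpp, size))
    return entries


# Hand-built directory with two entries (image data omitted).
sample = struct.pack("<HHH", 0, 1, 2)
sample += struct.pack("<BBBBHHII", 16, 16, 0, 0, 1, 32, 1128, 38)
sample += struct.pack("<BBBBHHII", 0, 0, 0, 0, 1, 32, 4096, 1166)
print(ico_entries(sample))
```

If all 64 entries report the same size and bit depth, comparing the image payloads at their offsets would show whether they are duplicates or distinct frames.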
        
       | TazeTSchnitzel wrote:
       | The non-PNG Apple touch icons might be CgBI files? It's an
       | undocumented proprietary Apple extension to PNG which most PNG
       | tools won't accept, but which Xcode uses for iOS apps.
        
       | bugmen0t wrote:
       | > We did a hacky image analysis with ImageMagick to survey
       | favicon colors. Here is the dominant color breakdown across our
       | favicons. This isn't very accurate, unfortunately. I suspect that
       | many multicolored favicons are getting lumped in with purple.
       | 
       | Writing or reviewing a sentence like this should make you
       | reconsider. Either do the right analysis or remove this from your
       | article. But when you say your analysis is probably wrong and the
       | results look weird, then why publish as is?
        
         | duckmysick wrote:
         | Imperfect analysis with known limitations still has value. We
         | can build upon it and improve. I'd rather have it out in the
         | open than omitted.
        
       | thrdbndndn wrote:
       | > Strangely, only 96.1% of Apple touch icons are PNG. Presumably
       | the other 4% are broken.
       | 
       | What does broken mean in this context? Non-PNG, or actually
       | broken? I assume the author has the files.
        
       | ryan29 wrote:
       | > In fact, I recommend that browsers ignore these hints because
       | they are wrong much of the time.
       | 
       | I don't agree. That's the kind of coddling that encourages
       | incompetence. Instead of compensating for others' mistakes, just
       | let their stuff break.
       | 
       | I wonder if Safari on iOS ignores the hints. When I tested, I was
       | surprised to see that pressing the share icon, which holds the
       | option for `Add to Home Screen`, would cause a download of all of
       | the icons listed with `link rel="icon"`.
       | 
       | Favicons are a huge pain to deal with correctly.
        
         | malfist wrote:
         | People make mistakes all the time. Breaking because somebody
         | made a mistake that you can correct for just leads to
         | unnecessarily fragile code.
         | 
         | What's the point of failing and breaking stuff if someone tells
         | you their image is 144x144 but it's really 145x145? Who does
         | that benefit?
        
           | anyfoo wrote:
           | The opposite is the case. Overall, being too lenient in what
           | code accepts and applying heuristics will lead to way worse
           | problems down the line. For example, you want your compiler
           | to fail hard instead of saying: "Oh, this isn't a pointer,
           | but I'm sure you meant well, I'm just going to treat it as a
           | pointer!"
           | 
           | In _this_ particular case, it seems to me that the hints
           | serve no purpose and should be abolished, and in the meantime
           | fully ignored, altogether. All necessary metadata is
           | contained in the image file, and browsers should also be
           | (relatively) strict in what image files with what metadata
           | they accept, for security reasons alone.
           | 
           | And if they also went so far as limiting file size, the
           | perpetrators that clog up bandwidth by putting up multi-MB
           | favicons would catch on much earlier (or at all), too.
           | 
           | So what actually is the point of those hints, if browsers
           | have to fallback anyway?
        
             | notatoad wrote:
             | The hints are not a hint in how to render the icon -
             | browsers don't need hints for that. the hints are an
             | instruction to browsers on which icon to download in the
             | case where multiple icons are specified.
             | 
             | if you are safari and you don't know how to display SVG
             | favicons, then you don't need to waste bytes downloading a
             | favicon only to fail to display it. the HTML does not limit
             | a site to only one favicon.
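The selection step described above can be sketched as a pure function over the declared `type` and `sizes` hints, with no network requests involved. Everything here is hypothetical: `IconLink`, `pick_icon`, and the "smallest icon that is at least the target size" policy are assumptions for illustration, and real browsers apply their own rules.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class IconLink:
    href: str
    mime_type: str  # from the link's type attribute
    size: int       # declared edge length in pixels, from the sizes attribute


def pick_icon(links: Iterable[IconLink],
              supported_types: Set[str],
              target_size: int) -> Optional[IconLink]:
    """Choose one icon to download using only the declared hints."""
    usable = [link for link in links if link.mime_type in supported_types]
    if not usable:
        return None
    big_enough = [link for link in usable if link.size >= target_size]
    if big_enough:
        # Prefer the smallest icon that still covers the target size.
        return min(big_enough, key=lambda link: link.size)
    # Otherwise fall back to the largest one available.
    return max(usable, key=lambda link: link.size)


links = [
    IconLink("/icon.svg", "image/svg+xml", 512),
    IconLink("/icon-16.png", "image/png", 16),
    IconLink("/icon-192.png", "image/png", 192),
]
choice = pick_icon(links, {"image/png"}, 32)
print(choice.href)
```

A client without SVG support never requests `/icon.svg` here, which is the byte saving the hints are meant to provide. The same logic also shows the cost of a wrong hint: the choice is made before any file is inspected.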
        
               | anyfoo wrote:
               | Why is that not done through the MIME type and using
               | HEAD? The server is apparently much better able to figure
               | out the MIME type through magic numbers and file
               | extensions of the actual file, than the author (human or
               | not) of the HTML, as we see.
               | 
               | The same headers also inform the browser that they can
               | skip downloading a favicon that they consider too big,
               | for example.
        
               | scrollaway wrote:
               | HEAD support is never a guarantee, and content type auto
               | detection is just another kind of heuristics.
        
               | anyfoo wrote:
               | Ugh, HEAD is not universally supported, at least for
               | static content? Okay, I accept that this has value
               | then.
               | 
               | As for the MIME type, for image types I'd say it's more
               | than stable enough. Certainly much, _much_ more stable
               | than the 6.7% error rate mentioned in the article here;
               | I'd be surprised if it was even 1%. If you double click on
               | an image on your desktop for example, you can in almost
               | all cases expect that it will be opened correctly. It
               | ceases being a heuristic entirely if you tell the
               | webserver that *.png is image/png, and only put PNGs with
               | names ending in ".png".
               | 
               | Guess those are the reasons why I got out of web
               | development 10 years ago: everything's held together
               | by scaffolding and needlessly wasteful and inefficient
               | there.
        
               | scrollaway wrote:
               | You might be overthinking this. I agree with the
               | philosophy that stricter is better, but in this case what
               | do you expect broken hints to do?
               | 
               | They're not used for rendering, they're used for figuring
               | out what to fetch. A HEAD request would be far less
               | efficient than knowing ahead of time what to fetch: 1
               | request versus 2N+1 requests.
               | 
               | What you suggest sounds all fine but the entire web is
               | user input for a browser, so no matter what, you need to
               | define how to fail. If you can fail gracefully, you might
               | as well do so, because a failure might not even be
               | triggered by bad code/configuration on your side but
               | simply by flaky network issues.
        
               | vbezhenar wrote:
               | Just don't ignore filename extension. favicon.svg is SVG
               | and that's about it. If you don't support SVG, don't
               | download it. If you want to store png in favicon.svg,
               | don't do that.
        
               | account42 wrote:
               | The web runs on mime types and file extensions are
               | irrelevant except for buggy browsers that try to be too
               | clever (Internet Explorer).
        
               | anyfoo wrote:
               | Yeah, I get how those hints make sense, now that you (and
               | others in the thread) have told me how things are, and I
               | did overlook that HEAD is still an extra request, while
               | the attributes are (effectively) for free.
               | 
               | I do wish that content negotiation (e.g. Accept headers)
               | worked properly. In the end though, those hints implement
               | a subset of content negotiation in a reasonable way,
               | given the state of affairs.
        
         | iudqnolq wrote:
         | YouTube and Twitter both have wrong parameters. Presumably this
         | means all major browsers ignore them or someone would have
         | noticed their favicons not displaying right?
        
         | paxys wrote:
         | Browsers ignore the hints because they aren't needed. The image
         | file itself has everything you need for rendering it.
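For PNG icons, the dimensions live in the IHDR chunk right after the 8-byte signature, so a declared `sizes` hint can be checked against the file itself. The sketch below is a minimal illustration: `png_dimensions`, `hint_matches`, and `fake_png_header` are hypothetical helpers, and the header is synthesized locally rather than downloaded from any site mentioned in the thread.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes):
    """Read (width, height) from the IHDR chunk at the start of a PNG."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    # Width and height are big-endian 32-bit integers at bytes 16..23.
    return struct.unpack(">II", data[16:24])


def hint_matches(declared: str, data: bytes) -> bool:
    """Compare a sizes hint such as '144x144' with the actual PNG size."""
    width, height = (int(part) for part in declared.lower().split("x"))
    return (width, height) == png_dimensions(data)


def fake_png_header(width: int, height: int) -> bytes:
    """Build just the signature and IHDR chunk, enough for this demo."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    chunk = b"IHDR" + ihdr
    crc = struct.pack(">I", zlib.crc32(chunk))
    return PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + chunk + crc


icon = fake_png_header(145, 145)
print(png_dimensions(icon), hint_matches("144x144", icon))
```

This mirrors the 144-versus-145 mismatch discussed above: the hint and the file disagree, and only the file is authoritative for rendering.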
        
           | ygra wrote:
           | The point for the hints is probably that the browser doesn't
           | need to fetch the 2000x2000 favicon if it only needs
           | something in 16x16 to render in the tab bar.
        
         | sokoloff wrote:
         | I don't see Postel's Law cited here yet, which I find pragmatic
         | and worth sharing/considering as I used to be in the "let their
         | stuff break" camp.
         | 
         | https://en.m.wikipedia.org/wiki/Robustness_principle (Quite
         | short)
        
         | Conlectus wrote:
         | A problem with this is that when a website breaks in one
         | browser, but works in another, I imagine most people's reaction
         | would be to blame the browser. This leads to a kind of race-to-
         | the-bottom for browser compatibility. See for example the
         | history of User-Agent strings.
        
           | jiggunjer wrote:
           | depends on the error message? Maybe instead of failing, give
           | an annoying prompt to offer a workaround.
        
         | ehsankia wrote:
         | That may be your viewpoint but browsers have historically
         | always taken the other viewpoint. Take HTML parsing for
         | example. You can miss closing tags and a ton of other stuff,
         | and it'll all work on a best-effort basis.
         | 
         | The browser's job is to do the best it can, that's what users
         | want. No one would use a browser that breaks at the smallest
         | tiniest error in the source code.
        
           | adamrezich wrote:
           | > browsers have historically always taken the other
           | viewpoint.
           | 
           | except for the short-lived XHTML fad which tbh I kind of miss
           | every day
        
             | vbezhenar wrote:
             | XHTML is still supported and works even with HTML5 tags.
        
       | Diesel555 wrote:
       | That article was a fun read! There was one sentence that bothered
       | me though.
       | 
       | > I recommend that browsers ignore these hints because they are
       | wrong much of the time. We calculated a 6.7% error rate for these
       | attributes, where the size is not found in the image or the type
       | does not match the file format.
       | 
       | I think of "much" in this context as meaning more than 50% of
       | the time. So I had to look up the definition of the word. One
       | definition from Merriam-Webster is "more than is expected or
       | acceptable : more than enough." So I guess the usage is
       | acceptable!
       | 
       | I always enjoy finding I have a slightly wrong definition in my
       | mind for a word. Many arguments, or much arguments, fail to move
       | forward due to the differing, unidentified, underlying
       | assumptions relying on words with slightly different definitions,
       | both people having a slightly different question they are arguing
       | in their mind.
        
       | silvestrov wrote:
       | It's such a shame that Safari does not support SVG favicons. It's
       | the only major browser which doesn't:
       | https://caniuse.com/link-icon-svg
       | 
       | All current browsers support PNG.
        
         | amelius wrote:
         | Don't hold your breath. Safari is the new IE6.
        
         | mixmastamyk wrote:
         | Will it look good on a browser tab? Seems like the res would be
         | too low.
        
           | deathanatos wrote:
           | It's a vector graphic; its resolution is whatever you render
           | it at. "S" as in, "Scalable".
           | 
           | Sure, there is some nuance in that you wouldn't want some
           | fine detail to get lost at the displayed size, but presumably
           | you know you're making a favicon when you do so.
           | 
           | Or, you're the NFL & you're going to supply a 4 megapixel
           | image IDK.
        
             | account42 wrote:
             | > Sure, there is some nuance in that you wouldn't want some
             | fine detail to get lost at the displayed size, but
             | presumably you know you're making a favicon when you do so.
             | 
             | On the other hand, SVG is really not designed for the fine
             | pixel control you want to make the icon look good at
             | smaller sizes as it does not have the equivalent of font
             | hinting.
        
             | mixmastamyk wrote:
             | Not at very low resolutions, <= 32 px. See sibling comment.
        
         | est wrote:
         | It's such a shame that PNG does not support packing multiple
         | dimensions into one file like the .ico format does.
        
           | kevin_thibedeau wrote:
           | It can be done with MNG. There just has never been a tooling
           | ecosystem that supports it for non-animated applications.
        
       | fho wrote:
       | That "I am feeling lucky" button does not seem random at all, it
       | brought me in order to: Microsoft Windows, Blogger, The Financial
       | Times, Github, Adobe ...
       | 
       | As every other location I randomly scroll to has no recognizable
       | image on it ... that seems preselected :-)
        
       | ChrisArchitect wrote:
       | What is the Tranco dataset that this is based on? I mean come on
       | -- anything that claims to be based on 'Alexa' (or any of these
       | others: Cisco Umbrella/openDNS? Majestic? Quantcast?) is sooo
       | suspect. None of these sources are that good and especially Alexa
       | which harks back to a time 20 years ago of browser toolbars and
       | extensions which the large majority do not use anymore.
       | 
       | Just saying yes maybe it's easy to come up with a top 1000 list
       | of sites on the net, but other than that no one really knows
       | unless you're like Google/Bing/Apple/Cloudflare that have
       | redirection urls/DNS control, tracking clicks etc
        
       | cratermoon wrote:
       | I haven't updated the favicon on a site I run in years, if not
       | decades. It's a 32x32 GIF 89a file that runs 131 bytes.
       | 
       | It's interesting to ponder how many hundreds of bytes are
       | exchanged between the browser and the site just for a simple GET
       | request for the image.
        
       | gurgeous wrote:
       | Also, we turned up 2,000 domains that redirect to a very shady
       | site called happyfamilymedstore[dot]com. Stuff like
       | avanafill[dot]com, pfzviagra[dot]com, prednisoloneotc[dot]com.
       | These domains made it into the Tranco 100k somehow.
       | 
       | Full list here -
       | https://gist.github.com/gurgeous/bcb3e851087763efe4b2f4b992f...
        
         | johnx123-up wrote:
         | IMHO, you should add this note in the blog too. Also, wondering
         | about the use case of the website... are you building anything
         | else too?
        
         | unicornporn wrote:
         | Lately, happyfamilymedstore has mysteriously always been in the
         | top ~ten Google Images results for super niche bicycle parts
         | searches I do. They seem to have ripped an insane amount of
         | images that get reposted on their domain.
        
           | 0des wrote:
           | What kind of parts are you looking for?
        
         | noitpmeder wrote:
         | Does anyone know the story behind these? How do seemingly
         | obscure sites consistently get massive amounts of obscure
         | content placed highly in results?
        
           | jacurtis wrote:
           | What most of them do is use Wordpress exploits to get into
           | random wordpress websites run by people who know nothing
           | about managing a website and are running on a $3/mo shared
           | hosting account.
           | 
           | After they get into these random wordpress sites, they then
           | embed links back to their sketchy site in obscure places on
           | the wordpress site that they hacked, so that owners of the
           | site don't notice, but search bots do. They usually leave the
           | wordpress site alone, but will create a user account to get
           | back into it again later if Wordpress patches an exploit. All
           | of this exploit and link adding is automated, so it is just
           | done by crawlers and bots.
           | 
           | This is done tens of thousands or even millions of times
           | over. All of these sketchy backlinks eventually add up, even
           | if they are low quality, and provide higher ranking for the
           | site they all point to.
           | 
           | Think of websites like mommy blogs, diet diaries, family
           | sites, personal blogs, and random service companies
           | (plumbers, pest control, restaurants, etc) that had their
           | nephew throw up a wordpress site instead of hiring a
           | professional.
           | 
           | I don't mean to pick on wordpress, but it really is the most
           | common culprit of these attacks, because so many Wordpress
           | sites exist that are operated by people who aren't informed
           | about basic security. Plus, wordpress is open source, so
           | exploits get discovered by looking at source code and
           | attackers will sell those exploits instead of reporting them.
           | So Wordpress is in an infinite cycle of chasing exploits and
           | patching them.
        
             | lazide wrote:
             | Pretty sure closed source wasn't very effective at stopping
             | 0days either (Windows). The most common platform gets the
             | attention generally.
        
             | shuntress wrote:
             | > "had their nephew throw up a wordpress site instead of
             | hiring a professional"
             | 
             | The web is _supposed_ to be accessible to everyone.
             | 
             | This type of "blame the victim" attitude is a poor way to
             | handle criminal activity.
        
               | [deleted]
        
               | pixl97 wrote:
               | There are plenty of places that you can go to on this
               | planet with little to no law enforcement. Don't be
               | surprised if you end up dead there. Handling global crime
               | is very difficult.
        
               | charcircuit wrote:
               | and anyone can hire me to design them a website.
        
               | jiggawatts wrote:
               | If they had used static content, it would remain 100%
               | accessible to them, but also vastly more secure.
               | 
               | Dynamic content generation _on the fly_ for a blog is
               | unnecessary complexity that invites attacks.
        
               | pc86 wrote:
               | Static content is definitively _not_ as accessible to the
               | typical person asking their nephew to put up a WP blog on
               | shared GoDaddy hosting.
        
               | jiggunjer wrote:
               | wouldn't that preclude a few popular features like a rich
               | text editor?
        
               | jiggawatts wrote:
               | You can have a separate system, even a locally running
               | desktop app do that. You can still have a database,
               | complex HTML templating, and image resizing! You just do
               | it offline as a preprocessing step instead of online
               | dynamically for each page view.
               | 
               | Unfortunately, this approach never took off, even though
               | it scales trivially to enormous sites and traffic levels.
               | 
               | I recently tried to optimise a CMS system where it was
               | streaming photos from the database to the web tier, which
               | then resized it and even _optimised_ it on the fly. Even
               | with caching, the overheads were just obscene. Over 100
               | cores could barely push out 200 Mbps of content.
               | Meanwhile a single-core VM can easily do 1 Gbps of static
               | content!
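               | 
               | A rough sketch of the "preprocess offline" idea,
               | assuming posts arrive as simple dicts and a single
               | hypothetical template; build_site and PAGE are
               | illustrative names, not part of any particular CMS.
               | 

```python
import html
from pathlib import Path
from string import Template

# Hypothetical page template; a real site would likely use a richer one.
PAGE = Template(
    "<!doctype html>\n<title>$title</title>\n<h1>$title</h1>\n<p>$body</p>\n"
)


def build_site(posts, out_dir):
    """Render every post to a static .html file once, ahead of time."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for post in posts:
        page = PAGE.substitute(
            title=html.escape(post["title"]),
            body=html.escape(post["body"]),
        )
        path = out / f"{post['slug']}.html"
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written
```

               | The output directory can then be served by any plain
               | file server, with no templating or database work per
               | page view.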
        
               | vbezhenar wrote:
               | I thought about "serverless" blog.
               | 
               | Here's some rough scheme I came up with (I never
               | implemented it, though):
               | 
               | 1. Use github pages to serve content.
               | 
               | 2. Use github login to authenticate using just JS.
               | 
               | 3. Use JS to implement rich text editor and other edit
               | features.
               | 
               | 4. When you're done with editing, your browser creates a
               | commit and pushes it using GitHub API.
               | 
               | 5. GitHub rebuilds your website and few seconds later
               | your website reflects the changes. JavaScript with
               | localStorage can reflect the changes instantly to improve
               | editor experience.
               | 
               | 6. Comments could be implemented with fork/pull request.
               | Of course that implies that your users are registered on
               | GitHub, so may not be appropriate for every blog. Or just
               | use external commenting system.
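               | 
               | Step 4 could look roughly like this, assuming the
               | GitHub REST "contents" endpoint is used to create or
               | update one file per commit. The sketch is in Python
               | rather than browser JS, only builds the request
               | without sending it, and build_commit_request is a
               | hypothetical helper name.
               | 

```python
import base64
import json
import urllib.parse
import urllib.request


def build_commit_request(owner, repo, path, text, message, token, sha=None):
    """Build (but do not send) a PUT request that commits one file."""
    body = {
        "message": message,
        # The API expects the file content base64-encoded.
        "content": base64.b64encode(text.encode("utf-8")).decode("ascii"),
    }
    if sha is not None:
        # Blob SHA of the current version; needed when updating a file.
        body["sha"] = sha
    url = (
        f"https://api.github.com/repos/{owner}/{repo}/contents/"
        f"{urllib.parse.quote(path)}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```

               | Passing the result to urllib.request.urlopen would
               | perform the commit; the owner, repo, and token values
               | are whatever the blog owner supplies.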
        
               | mkotowski wrote:
               | So, essentially a site generated with Jekyll, hosted on
               | GitHub Pages with Utterances [0] for comments and updated
               | with GitHub Actions.
               | 
               | I don't know if https://github.dev version of Visual
               | Studio Code supports extensions/plugins, but if so, then
               | there is also a rich text editor for markdown ready.
               | 
               | All that's left would be an instant refresh for editing.
               | 
               | [0]: https://utteranc.es
        
               | pc86 wrote:
               | If this is a serious suggestion (I really hope it isn't),
               | you have never met the kind of person setting up the
               | blogs the GP is talking about.
        
             | mfkp wrote:
             | I recently saw and reported one to a local business.
             | 
             | If you typed in the domain and visited directly, it
             | wouldn't redirect to the scam site. But if you clicked on a
             | link from a google search, then it would redirect.
             | 
             | Probably makes it harder to find for small website owners
             | if they're not clicking their own google searches.
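             | 
             | A small sketch for checking a site for this behaviour,
             | assuming the redirect is a plain HTTP redirect keyed on
             | the Referer header. Cloaking done in JavaScript or keyed
             | on the user agent would not be caught. The function
             | names are hypothetical.
             | 

```python
import urllib.request

SEARCH_REFERER = "https://www.google.com/"


def final_url(url, headers):
    """Fetch url, following redirects, and return where we ended up."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.geturl()


def redirects_only_from_search(url, fetch=final_url):
    """True if the landing URL changes when a search Referer is sent."""
    direct = fetch(url, {})
    via_search = fetch(url, {"Referer": SEARCH_REFERER})
    return direct != via_search
```

             | The fetch parameter exists so the comparison can be
             | exercised without touching the network.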
        
           | IncRnd wrote:
           | It happens through search engine optimization, SEO, and a mix
           | of planting reviews and other tactics. Think of it like this
           | - what would you do to get people talking about your site?
           | You'd somehow put links, conversations, reviews, quotes, etc.
           | in front of them.
        
       | comeonseriously wrote:
       | Slightly OT, but what was that one that came around a few years
       | ago that would make everyone's CPU go to 100%?
        
       | nanis wrote:
       | I know of a company whose favicon was a hi-res true color PNG that
       | weighed in at more than 2 MB. The web site was the dominion of
       | marketing. Suggestions to improve the situation were detrimental
       | to one's career path. _sigh_
        
       | tonetheman wrote:
       | I use an inline svg for mine... which is really just a poop
       | emoji.
        
       | anyfoo wrote:
       | ... and wrote an interesting technical article about it, that
       | even someone like me, who doesn't do web development, enjoys
       | reading. Definitely why I come to HN (no sarcasm, it is).
        
       | toast0 wrote:
       | Favicons are slightly useful. You can serve your page at
       | http://www.example.com with a favicon from https://example.com
       | that has an HTTP Strict-Transport-Security header with
       | includeSubDomains, and then future page loads in that browser
       | will be https (across your whole domain). (This assumes you want
       | your domain to be https)
       | 
       | Other than that, I'm still pretty meh about them.
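       | 
       | A minimal sketch of the header involved, assuming a one-year
       | max-age. hsts_header, FaviconHandler and ICON_BYTES are
       | hypothetical names, and browsers only honour the header when the
       | response arrives over HTTPS, so the handler would have to sit
       | behind TLS.
       | 

```python
from http.server import BaseHTTPRequestHandler

ICON_BYTES = b""  # placeholder; a real server would load the icon file


def hsts_header(max_age_seconds=31536000, include_subdomains=True):
    """Return the (name, value) pair for a Strict-Transport-Security header."""
    value = f"max-age={max_age_seconds}"
    if include_subdomains:
        value += "; includeSubDomains"
    return ("Strict-Transport-Security", value)


class FaviconHandler(BaseHTTPRequestHandler):
    """Serve /favicon.ico with the HSTS header attached."""

    def do_GET(self):
        if self.path != "/favicon.ico":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header(*hsts_header())
        self.send_header("Content-Type", "image/x-icon")
        self.send_header("Content-Length", str(len(ICON_BYTES)))
        self.end_headers()
        self.wfile.write(ICON_BYTES)
```
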
        
       | gurgeous wrote:
       | Also see the gigantic map - https://iconmap.io
       | 
       | The blog post is the analysis of the data set, the map is the
       | visualization.
        
         | isoprophlex wrote:
         | Is the dataset available for download? I couldn't immediately
         | find a download to the dataset in the linked article.
         | 
         | My hands itch to do some dimension reduction on that data and
         | make some nice plots
        
           | nkriege wrote:
           | We'd be happy to share the data. Reach us at help at
           | gurge.com if you're interested.
        
           | wiz21c wrote:
           | damn I was thinking about that too :-)
        
         | oehpr wrote:
         | I wonder if there might be a way to map all these using t-SNE
         | to discrete grid locations? Maybe even an autoencoder. I'd love
         | to see what features it could pick out.
         | 
         | I don't see their data set though. hmmm.
         | 
         | maybe I'll just have to crawl it on my own if I want to do it.
        
           | yboris wrote:
           | side note: instead of t-SNE consider UMAP - provides better
           | results (and it's _much_ faster)
           | https://github.com/lmcinnes/umap
        
           | lgvld wrote:
           | You can use t-SNE (or even better: UMAP or one of its
           | variations) to create a 2D point cloud, and then use
           | something like RasterFairy [1] to map 2D positions to the
           | cells of a grid. It usually works well.
           | 
           | [1] https://github.com/Quasimondo/RasterFairy
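           | 
           | To illustrate the point-cloud-to-grid step, here is a much
           | simpler stand-in (a row-wise sort, not RasterFairy's
           | algorithm): split the points into horizontal bands by y,
           | then order each band by x. snap_to_grid is a hypothetical
           | name, and the 2D points are assumed to come from t-SNE or
           | UMAP.
           | 

```python
def snap_to_grid(points, cols):
    """Assign each 2D point a unique (row, col) cell, `cols` cells per row.

    points: list of (x, y) tuples. Returns {point_index: (row, col)}.
    """
    # Sort point indices top-to-bottom by y.
    order = sorted(range(len(points)), key=lambda i: points[i][1])
    cells = {}
    for row, start in enumerate(range(0, len(order), cols)):
        # Within one band, order left-to-right by x.
        band = sorted(order[start:start + cols], key=lambda i: points[i][0])
        for col, idx in enumerate(band):
            cells[idx] = (row, col)
    return cells
```

           | This keeps rough neighbourhoods together but distorts more
           | than an assignment-based approach would.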
        
         | svdr wrote:
         | I see a lot of repetitions in the map?
        
           | gurgeous wrote:
           | It's one icon per domain. Try hovering (on desktop) and
           | you'll see that many domains have the same favicon.
        
             | true_religion wrote:
             | It also works on mobile if you tap the fav icon.
        
       | bellyfullofbac wrote:
       | Huh, there's a row of identical icons of 3 blue circles (search
       | for cashadvancewow[dot]com) and all the domains using them are
       | loan-related. Interesting way to do forensics on clone sites
       | (although trying a few of them, they're not showing any icons
       | right now, and the URL /favicon.ico 404's)
       | 
       | And I checked a few of the sites, I just got lorem-ipsum style
       | landing pages. I wonder what's the point, or are the scammers
       | using the domains mostly for emails?
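       | 
       | The forensic angle can be sketched as grouping domains by a hash
       | of their favicon bytes. This assumes the icons were already
       | downloaded, and it only catches byte-identical files (a
       | re-encoded copy would hash differently). group_by_icon is a
       | hypothetical name.
       | 

```python
import hashlib
from collections import defaultdict


def group_by_icon(icons):
    """Group domains that serve byte-identical favicons.

    icons: mapping of domain -> raw favicon bytes.
    Returns a list of domain lists, one per icon shared by 2+ domains.
    """
    groups = defaultdict(list)
    for domain, data in icons.items():
        groups[hashlib.sha256(data).hexdigest()].append(domain)
    return [sorted(domains) for domains in groups.values() if len(domains) > 1]
```
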
        
         | deathanatos wrote:
         | There are multiple runs of "just a bit _too_ abstract" icons
         | that point into the abyssal cesspools of the Internet. Most of
         | them seem to be about loans, so I'm going to avoid announcing
         | that too loudly if I ever need a loan, since clearly, there are
         | some scumbags out there.
        
       | ScaleneTriangle wrote:
       | Would have liked to see more color analysis, like a graph showing
       | the number of distinct colours per icon.
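       | 
       | The counting itself is small. A sketch, assuming each icon has
       | already been decoded elsewhere into a list of RGBA tuples (the
       | function names are hypothetical):
       | 

```python
from collections import Counter


def distinct_colors(pixels):
    """Number of different RGBA values in one icon."""
    return len(set(pixels))


def color_histogram(icons):
    """Map 'distinct color count' -> 'how many icons have that count'.

    icons: mapping of domain -> list of (r, g, b, a) tuples.
    """
    return Counter(distinct_colors(pixels) for pixels in icons.values())
```

       | The resulting counts are what a bar chart of colours per icon
       | would be drawn from.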
        
       | quitit wrote:
       | The difference between the Apple "precomposed" and standard icons
       | had to do with the gloss effect on icons on pre iOS 7 home
       | screens.
       | 
       | When adding a website/webapp to these earlier home screens, the
       | OS would apply a gloss effect over the icon in order to match the
       | aesthetic of the standard apps. The precomposed icon was a way
       | for the developer to stop the OS from applying this effect, such
       | as if their logo already had a different gloss effect applied
       | (i.e. "precomposed") or another design where adding the
       | glossy shine wouldn't look right. The standard icon allowed the
       | OS to apply the gloss effect - which was a timesaver as Apple did
       | tweak the gloss contour over the years: hence using a standard
       | icon ensured that the website/webapp always matched the user's OS
       | version.
        
       ___________________________________________________________________
       (page generated 2021-10-21 23:02 UTC)