[HN Gopher] We analyzed 425k favicons
___________________________________________________________________
We analyzed 425k favicons
Author : gurgeous
Score : 229 points
Date : 2021-10-20 17:29 UTC (5 hours ago)
(HTM) web link (iconmap.io)
(TXT) w3m dump (iconmap.io)
| paulirish wrote:
| Aside: This article is a decent use case for the esoteric
| `image-rendering: pixelated;` CSS property.
| nkriege wrote:
| Great tip. I've never come across this before. I updated the
| post and the scaled up icons look much sharper now.
| dmitrygr wrote:
| I used it to make this PWA work well on iPhones:
| http://dmitry.gr/89
| lostgame wrote:
| Ha - that's a fantastically nerdy little project. I love it!
| 1cvmask wrote:
| The favicon visualization brought memories of the million dollar
| homepage. I suppose it was a precursor of NFTs.
|
| https://en.wikipedia.org/wiki/The_Million_Dollar_Homepage
|
| http://www.milliondollarhomepage.com/
| Groxx wrote:
| Off in one of the more esoteric corners of favicons, you have
| games played within the favicon:
| https://www.youtube.com/watch?v=fpjM5myls7I
|
| Sadly it doesn't quite work for me any more, but the youtube
| video does a decent job showing what it looked like when it
| worked.
| tinco wrote:
| Not really relevant, but using Go to fetch the data, and then
| Ruby to process the data is the best. I used this exact set up
| for a project and it was amazing. Really the sweet spot of use
| cases for both languages.
| tweakimp wrote:
| Can you please explain why they are the best languages for
| these jobs?
| tinco wrote:
| Go's got an awesome feature set built in to the language for
| building small networked services. I implemented a client to
| a cryptocurrency network to extract information about its
| status and clients. I can't really express why it's so good,
| it just feels right.
|
| Same for Ruby, the syntax is perfectly suited for
| transforming, digging through and acting upon data. I didn't
| even add a Gemfile, only used standard library functions,
| transforming the data the Go program mined into usable
| information serialized in JSON which was subsequently used as
| a static database for a webpage.
|
| You can find the source here:
| https://github.com/tinco/stellar-core-go, the Go is in cmd
| and the Ruby is in tools.
|
| The site it powers is now defunct, apparently they changed
| some stuff in the past 3 years and the crawler no longer
| functions.
| whalesalad wrote:
| I have always wanted to do this _exact_ analysis - so awesome!
| Every time I am building some kind of semi-intelligent parser to
| fetch an arbitrary visual icon for a URL I think to myself there
| has gotta be a better way to do this.
| munk-a wrote:
| Didn't they miss all the pre-sized icons in their scan as well?
| For a while Apple encouraged multiple resolution sizes for
| favicons for... reasons.
|
| I know they additionally missed the directory-specific favicons
| which have always had iffy support (i.e. /index.html =>
| /favicon.ico and /munks-page/index.html =>
| /munks-page/favicon.ico)
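The directory-specific lookup munk-a describes can be sketched as follows. This is only an illustration: the function name `favicon_candidates` is hypothetical, and the "most specific directory first" order is an assumption, not documented crawler or browser behaviour.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def favicon_candidates(page_url: str) -> list[str]:
    """Return favicon.ico URLs to try, most specific directory first."""
    parts = urlsplit(page_url)
    # Directory containing the page; "/" for pages at the site root.
    directory = posixpath.dirname(parts.path) or "/"
    candidates = []
    while True:
        path = posixpath.join(directory, "favicon.ico")
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        if directory == "/":
            break
        # Walk one level up towards the site root.
        directory = posixpath.dirname(directory)
    return candidates
```

For a page under /munks-page/ this yields the directory-level icon first and the root /favicon.ico last, so a crawler could stop at the first URL that returns an image.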
| achillean wrote:
| Nmap generated a similar version many years ago and it's still
| available at:
|
| https://nmap.org/favicon/
|
| We also did something that looks at favicons by IP:
|
| https://faviconmap.shodan.io/
| [deleted]
| arp242 wrote:
| I got mine down to 160 bytes with some pixel tweaking and
| converting it to a 16-color indexed PNG. It's not a lot of work
| or very difficult (I'm an idiot at graphics editing), but you do
| need to spend the (small amount of) effort. I embed it as a data
| URI and it's just four lines of (col-80 wrapped) base64 text,
| which seems reasonable to me.
|
| Haven't managed to get my headshot down to less than 10k without
| looking horrible no matter how much I tweaked the JPEG or WebP
| settings, and thought that was just a tad too big to embed. Maybe
| I need to find a different picture that compresses better.
|
| I got that 280k Discord favicon down to just 24K simply by
| opening it in GIMP and saving it again. I got it down to 12K by
| indexing it to 255 colours rather than using RGB (I can't tell
| the difference even at full size). You can probably make it even
| smaller if you tried, but that's diminishing returns. Still, I
| bet with 5 more minutes you can get it to ~5k or so.
|
| It's very easy; you just need to care. Does it matter? Well, when
| I used Slack I regularly spent a minute waiting for them to push
| their >10M updates, so I'd say that 250k here and 250k there etc.
| adds up and matters, giving real actual improvements to your
| customers.
|
| The Event Horizon Telescope having a huge favicon I can
| understand; probably just some astronomer who uploaded it in
| WordPress or something. Arguably a fault of the software for not
| dealing with that more sensibly, but these sort of oversights
| happen. A tech company making custom software for a living doing
| the same is quite frankly just embarrassing to the entire
| industry. It's a
| big fat "fuck you" to anyone from less developed areas with less-
| than-ideal internet connections.
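The data-URI embedding arp242 mentions can be sketched roughly like this. The helper names `favicon_data_uri` and `wrap_columns` are made up for illustration, and the 80-column wrapping is only a guess at what "col-80 wrapped" means in practice.

```python
import base64


def favicon_data_uri(png_bytes: bytes) -> str:
    """Encode PNG bytes as a data: URI usable in <link rel="icon" href="...">."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


def wrap_columns(text: str, width: int = 80) -> list[str]:
    """Split a long string into fixed-width lines for embedding in source."""
    return [text[i:i + width] for i in range(0, len(text), width)]
```

Base64 grows the payload by about a third, so this probably only makes sense for icons that are already a few hundred bytes.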
| TheJoeMan wrote:
| "I got that 280k Discord favicon down to just 24K simply by
| opening it in GIMP and saving it again."
|
| You made me laugh out loud.
|
| I agree that stuff like YouTube.com saying 144x but really 145x
| seems like it should be embarrassing.
| arp242 wrote:
| I wouldn't be surprised if that was for a specific reason,
| like somehow showing up better somewhere for some reason, or
| something like that. Or maybe not; who knows...
| fbrchps wrote:
| Oh hey, Discord must have seen this article -- their favicon is
| down to 14k now.
| gremloni wrote:
| That's lit and a fantastic turnaround. Great work to whoever
| is reading this!
| nerfhammer wrote:
| there are png optimizer programs, e.g. optipng
| pseudosavant wrote:
| The Squoosh (web) app is awesome for this too! All processing
| is done locally with wasm.
|
| https://squoosh.app
| vadfa wrote:
| `optipng -o9 -strip all` is a must
| JohnTHaller wrote:
| 256x256 PNG reduced to 256 colors with pixel transparency gets
| it to 2.68K. I manually dropped the color depth to indexed and
| saved it out in PhotoShop and I used FileOptimizer to shrink
| it. It includes 12 different image shrinkers and runs them all.
| TazeTSchnitzel wrote:
| The non-PNG Apple touch icons might be CgBI files? It's an
| undocumented proprietary Apple extension to PNG which most PNG
| tools won't accept, but which Xcode uses for iOS apps.
| ryan29 wrote:
| > In fact, I recommend that browsers ignore these hints because
| they are wrong much of the time.
|
| I don't agree. That's the kind of coddling that encourages
| incompetence. Instead of compensating for others' mistakes, just
| let their stuff break.
|
| I wonder if Safari on iOS ignores the hints. When I tested, I was
| surprised to see that pressing the share icon, which holds the
| option for `Add to Home Screen`, would cause a download of all of
| the icons listed with `link rel="icon"`.
|
| Favicons are a huge pain to deal with correctly.
| malfist wrote:
| People make mistakes all the time. Breaking because somebody
| made a mistake that you can correct for just leads to
| unnecessarily fragile code.
|
| What's the point of failing and breaking stuff if someone tells
| you their image is 144x144 but it's really 145x145? Who does
| that benefit?
| anyfoo wrote:
| The opposite is the case. Overall, being too lenient in what
| code accepts and applying heuristics will lead to way worse
| problems down the line. For example, you want your compiler
| to fail hard instead of saying: "Oh, this isn't a pointer,
| but I'm sure you meant well, I'm just going to treat it as a
| pointer!"
|
| In _this_ particular case, it seems to me that the hints
| serve no purpose and should be abolished, and in the meantime
| fully ignored, altogether. All necessary metadata is
| contained in the image file, and browsers should also be
| (relatively) strict in what image files with what metadata
| they accept, for security reasons alone.
|
| And if they also went so far as limiting file size, the
| perpetrators that clog up bandwidth by putting up multi-MB
| favicons would catch on much earlier (or at all), too.
|
| So what actually is the point of those hints, if browsers
| have to fall back anyway?
| iudqnolq wrote:
| YouTube and Twitter both have wrong parameters. Presumably this
| means all major browsers ignore them or someone would have
| noticed their favicons not displaying right?
| paxys wrote:
| Browsers ignore the hints because they aren't needed. The image
| file itself has everything you need for rendering it.
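As a rough illustration of paxys's point, the real dimensions of a PNG sit in its first chunk (IHDR), so a declared size such as "144x144" can be checked against the file. The function names here are hypothetical, and the sketch handles only standard PNG files; ICO, SVG, or a PNG whose first chunk is not IHDR would be rejected.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then width and height as big-endian 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a standard PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def hint_matches(sizes_hint: str, data: bytes) -> bool:
    """Compare a sizes hint such as "144x144" with the real dimensions."""
    declared = tuple(int(n) for n in sizes_hint.lower().split("x"))
    return declared == png_dimensions(data)
```

Only the first 24 bytes are needed, so a checker could fetch just the start of the file rather than the whole image.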
| ygra wrote:
| The point for the hints is probably that the browser doesn't
| need to fetch the 2000x2000 favicon if it only needs
| something in 16x16 to render in the tab bar.
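The selection ygra describes might look something like the sketch below: read the `sizes` hints from `<link rel="icon">` tags and pick the smallest icon that is at least as large as needed, without downloading any of them. `IconLinkParser` and `pick_icon` are illustrative names, and the fallback to the largest icon is an assumption, not what any particular browser does.

```python
from html.parser import HTMLParser


class IconLinkParser(HTMLParser):
    """Collect (declared_width, href) pairs from <link rel="icon"> tags."""

    def __init__(self):
        super().__init__()
        self.icons = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "icon" not in rels or not attributes.get("href"):
            return
        # A sizes attribute may list several sizes, e.g. "16x16 32x32".
        for size in (attributes.get("sizes") or "").lower().split():
            width, _, height = size.partition("x")
            if width.isdigit() and height.isdigit():
                self.icons.append((int(width), attributes["href"]))


def pick_icon(html: str, needed: int):
    """Return the href of the smallest declared icon >= needed pixels wide."""
    parser = IconLinkParser()
    parser.feed(html)
    big_enough = [icon for icon in parser.icons if icon[0] >= needed]
    if big_enough:
        return min(big_enough)[1]
    # Nothing is large enough: fall back to the largest declared icon.
    return max(parser.icons)[1] if parser.icons else None
```

This trusts the hints entirely, which is exactly the weakness the article points out when the declared and real sizes disagree.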
| Conlectus wrote:
| A problem with this is that when a website breaks in one
| browser, but works in another, I imagine most people's reaction
| would be to blame the browser. This leads to a kind of race-to-
| the-bottom for browser compatibility. See for example the
| history of User-Agent strings.
| silvestrov wrote:
| It's such a shame that Safari does not support SVG favicons. It's
| the only major browser which doesn't:
| https://caniuse.com/link-icon-svg
|
| All current browsers support PNG.
| amelius wrote:
| Don't hold your breath. Safari is the new IE6.
| ChrisArchitect wrote:
| What is the Tranco dataset that this is based on? I mean come on
| -- anything that claims to be based on 'Alexa' (or any of these
| others: Cisco Umbrella/openDNS? Majestic? Quantcast?) is sooo
| suspect. None of these sources are that good and especially Alexa
| which harks back to a time 20 years ago of browser toolbars and
| extensions which the large majority do not use anymore.
|
| Just saying yes maybe it's easy to come up with a top 1000 list
| of sites on the net, but other than that no one really knows
| unless you're like Google/Bing/Apple/Cloudflare that have
| redirection urls tracking clicks etc
| gurgeous wrote:
| Also, we turned up 2,000 domains that redirect to a very shady
| site called happyfamilymedstore[dot]com. Stuff like
| avanafill[dot]com, pfzviagra[dot]com, prednisoloneotc[dot]com.
| These domains made it into the Tranco 100k somehow.
|
| Full list here -
| https://gist.github.com/gurgeous/bcb3e851087763efe4b2f4b992f...
| unicornporn wrote:
| Lately, happyfamilymedstore has mysteriously always been in the
| top ~ten Google Images results for super niche bicycle parts
| searches I do. They seem to have ripped an insane amount of
| images that get reposted on their domain.
| 0des wrote:
| What kind of parts are you looking for?
| noitpmeder wrote:
| Does anyone know the story behind these? How do seemingly
| obscure sites consistently get massive amounts of obscure
| content placed highly in results?
| jacurtis wrote:
| What most of them do is they will use Wordpress exploits to
| get into random wordpress websites run by people who know
| nothing about managing a website and are running on a $3/mo
| shared hosting account.
|
| After they get into these random wordpress sites, they then
| embed links back to their sketchy site in obscure places on
| the wordpress site that they hacked, so that owners of the
| site don't notice, but search bots do. They usually leave the
| wordpress site alone, but will create a user account to get
| back into it again later if Wordpress patches an exploit. All
| of this exploit and link adding is automated, so it is just
| done by crawlers and bots.
|
| This is done tens of thousands or even millions of times
| over. All of these sketchy backlinks eventually add up, even
| if they are low quality, and provide higher ranking for the
| site they all point to.
|
| Think of websites like mommy blogs, diet diaries, family
| sites, personal blogs, and random service companies
| (plumbers, pest control, restaurants, etc) that had their
| nephew throw up a wordpress site instead of hiring a
| professional.
|
| I don't mean to pick on wordpress, but it really is the most
| common culprit of these attacks. Because so many Wordpress
| sites exist that are operated by people who aren't informed
| about basic security. Plus, wordpress is open source, so
| exploits get discovered by looking at source code and
| attackers will sell those exploits instead of reporting them.
| So Wordpress is in an infinite cycle of chasing exploits and
| patching them.
| lazide wrote:
| Pretty sure closed source wasn't very effective at stopping
| 0days either (Windows). The most common platform gets the
| attention generally.
| shuntress wrote:
| > "had their nephew throw up a wordpress site instead of
| hiring a professional"
|
| The web is _supposed_ to be accessible to everyone.
|
| This type of "blame the victim" attitude is a poor way to
| handle criminal activity.
| IncRnd wrote:
| It happens through search engine optimization, SEO, and a mix
| of planting reviews and other tactics. Think of it like this
| - what would you do to get people talking about your site?
| You'd somehow put links, conversations, reviews, quotes, etc.
| in front of them.
| comeonseriously wrote:
| Slightly OT, but what was that one that came around a few years
| ago that would make everyone's CPU go to 100%?
| nanis wrote:
| I know of a company whose favicon was a hi-res true color PNG that
| weighed in at more than 2 MB. The web site was the dominion of
| marketing. Suggestions to improve the situation were detrimental
| to one's career path. _sigh_
| anyfoo wrote:
| ... and wrote an interesting technical article about it, that
| even someone like me, who doesn't do web development, enjoys
| reading. Definitely why I come to HN (no sarcasm, it is).
| toast0 wrote:
| Favicons are slightly useful. You can serve your page at
| http://www.example.com with a favicon from https://example.com
| that has an HTTP Strict-Transport-Security header with
| includeSubDomains, and then future page loads in that browser
| will be https (across your whole domain). (This assumes you want
| your domain to be https)
|
| Other than that, I'm still pretty meh about them.
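The header toast0 describes could be produced along these lines. The function name and the one-year `max-age` default are assumptions for illustration; only the header name and the `includeSubDomains` directive come from the comment.

```python
def favicon_response_headers(max_age_seconds: int = 31536000) -> dict[str, str]:
    """Headers for an HTTPS favicon response that also sets an HSTS policy."""
    return {
        "Content-Type": "image/x-icon",
        # Browsers honour this header only when it arrives over HTTPS,
        # which is why the favicon is served from the https:// origin.
        "Strict-Transport-Security": f"max-age={max_age_seconds}; includeSubDomains",
    }
```

Because the policy is set on the bare domain with `includeSubDomains`, it would cover www.example.com on later visits, which matches the effect described above.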
| gurgeous wrote:
| Also see the gigantic map - https://iconmap.io
|
| The blog post is the analysis of the data set, the map is the
| visualization.
| isoprophlex wrote:
| Is the dataset available for download? I couldn't immediately
| find a download to the dataset in the linked article.
|
| My hands itch to do some dimension reduction on that data and
| make some nice plots
| nkriege wrote:
| We'd be happy to share the data. Reach us at help at
| gurge.com if you're interested.
| oehpr wrote:
| I wonder if there might be a way to map all these using t-SNE
| to discrete grid locations? Maybe even an autoencoder. I'd love
| to see what features it could pick out.
|
| I don't see their data set though. hmmm.
|
| maybe I'll just have to crawl it on my own if I want to do it.
| yboris wrote:
| side note: instead of t-SNE consider UMAP - provides better
| results (and it's _much_ faster)
| https://github.com/lmcinnes/umap
| svdr wrote:
| I see a lot of repetitions in the map?
| gurgeous wrote:
| It's one icon per domain. Try hovering (on desktop) and
| you'll see that many domains have the same favicon.
| true_religion wrote:
| It also works on mobile if you tap the favicon.
| bellyfullofbac wrote:
| Huh, there's a row of identical icons of 3 blue circles (search
| for cashadvancewow[dot]com) and all the domains using them are
| loan-related. Interesting way to do forensics on clone sites
| (although trying a few of them, they're not showing any icons
| right now, and the URL /favicon.ico 404's)
|
| And I checked a few of the sites, I just got lorem-ipsum style
| landing pages. I wonder what's the point, or are the scammers
| using the domains mostly for emails?
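The clone-site forensics idea above could be approximated by grouping domains on a hash of their favicon bytes. This is a minimal sketch with a hypothetical `group_by_favicon` helper; it assumes the favicons have already been downloaded, and it only catches byte-identical files, so a re-encoded copy of the same image would land in a different group.

```python
import hashlib
from collections import defaultdict


def group_by_favicon(favicons: dict[str, bytes]) -> list[list[str]]:
    """Return clusters of domains whose favicon files are byte-identical."""
    groups = defaultdict(list)
    for domain, data in favicons.items():
        groups[hashlib.sha256(data).hexdigest()].append(domain)
    # Keep only icons shared by more than one domain, largest cluster first.
    clusters = [sorted(domains) for domains in groups.values() if len(domains) > 1]
    return sorted(clusters, key=len, reverse=True)
```

Catching visually similar but not identical icons would need a perceptual hash instead, which is outside this sketch.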
| quitit wrote:
| The difference between the Apple "precomposed" and standard icons
| had to do with the gloss effect on icons on pre-iOS 7 home
| screens.
|
| When adding a website/webapp to these earlier home screens, the
| OS would apply a gloss effect over the icon in order to match the
| aesthetic of the standard apps. The precomposed icon was a way
| for the developer to stop the OS from applying this effect, such
| as if their logo already had a different gloss effect
| applied (i.e. "precomposed") or other design where adding the
| glossy shine wouldn't look right. The standard icon allowed the
| OS to apply the gloss effect - which was a timesaver as Apple did
| tweak the gloss contour over the years: hence using a standard
| icon ensured that the website/webapp always matched the user's OS
| version.
___________________________________________________________________
(page generated 2021-10-20 23:00 UTC)