[HN Gopher] Taking on Google
       ___________________________________________________________________
        
       Taking on Google
        
       Author : rapnie
       Score  : 117 points
       Date   : 2021-03-31 14:49 UTC (8 hours ago)
        
 (HTM) web link (inthegood.co)
 (TXT) w3m dump (inthegood.co)
        
       | koalaman wrote:
       | I read most of the article but couldn't quite find how they do
       | analytics without tracking people. Did I miss it? How are they
       | more privacy preserving?
        
         | markosaric wrote:
         | This is probably a better overview of what makes Plausible a
         | privacy-focused tool: open source, can be self-hosted, no
         | connection to adtech, minimal data collection, no cookies, no
         | persistent identifiers, no personal data, no cross-site/device
         | tracking etc
         | 
         | https://plausible.io/privacy-focused-web-analytics
         | 
         | (I'm the co-founder)
        
           | warkdarrior wrote:
           | OK, so not quite privacy-preserving in the cryptographic
           | sense, but more of a matter of degree. Plausible Analytics
           | collects less data than Google Analytics, but not zero.
        
       | cyberlab wrote:
       | Plausible is great, and I see the need for it, but I've always
       | enjoyed using AWStats instead, as there is no need to add third
       | party code to my site. It all happens in the background and it
       | paints a much better picture of your stats since users can't
       | block the gathering of stats with an ad blocker.
        
         | marvinblum wrote:
         | That's one of the reasons I built Pirsch (https://pirsch.io/).
         | All the JavaScript integrations can be blocked.
        
         | rkagerer wrote:
         | I've used AWStats for years. It's not perfect, but it's
         | preferable to the scummy alternatives proffered by the big
         | boys.
        
         | not_knuth wrote:
         | How would you compare it to GoAccess [0]? I've only ever used
         | GoAccess, but AWStats seems to be the older, more mature
         | tool... so I would be curious about a comparison.
         | 
         | [0] https://goaccess.io/
        
       | johnghanks wrote:
       | This is literally an ad.
        
       | max_ wrote:
       | Am I missing something? To me, analytics & privacy sound like a
       | contradiction
        
         | XCSme wrote:
         | It is better privacy. It's one thing for one entity to know
         | what everyone is doing on the web, all the websites that you
         | visit and what you do on them, and another thing for an entity
         | to know that you visited their own website, without knowing
         | what other websites you visit and what you do on them.
         | 
         | LE: The best solution is still self-hosting, as hosted
         | plausible is still a 3rd party entity that centralizes data
         | (even though they probably don't use or share this data).
        
           | gdsdfe wrote:
           | but ... the website that I'm visiting has no incentive to
           | care about my privacy, I mean yes they should but what's in
           | it for them? I think this go-to-market approach of "we are
           | better because google is evil" is just flawed.
        
             | lecarore wrote:
             | Well, I'm an indie Dev and I do care, I find advertising
             | and cookie notices really annoying and I can afford the
             | 40 something euros a year it costs me. I don't need to
             | track everything.
        
             | XCSme wrote:
             | They do care, the data can be collected anonymously,
             | without being linked directly to your person. They can use
             | such data to improve your experience, without affecting you
             | personally in any way.
        
         | kevincox wrote:
         | This is far too simple of a view.
         | 
         | - For the large part the concern is what is done with the data.
         | 
         | - Data can be anonymized. (Although this is often hard to
         | verify)
         | 
         | - You can hide the data in the client. For example imagine you
         | want to know how many users use feature X. You can send an
         | analytics report with 90% chance of a random value, and 10%
         | chance sending the true boolean. You can't tell if any specific
         | user has used the feature (because most likely it is a random
         | value) but you can get a pretty good estimate what portion of
         | your users use the feature.
         | 
         | My understanding is that Plausible is focused on the use of
         | anonymization.
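         | 
         | The randomized-report idea above can be sketched roughly as
         | follows. This is only an illustration, not anything Plausible is
         | known to do: it assumes the "random value" is a fair coin flip,
         | and the 30% true usage rate and the user count are made up for
         | the simulation.

```python
import random

P_TRUTH = 0.10  # chance a client reports its true value (from the comment)


def report(used_feature: bool, rng: random.Random) -> bool:
    """What one client sends: the truth 10% of the time, a coin flip otherwise."""
    if rng.random() < P_TRUTH:
        return used_feature
    return rng.random() < 0.5  # assumption: the "random value" is a fair coin


def estimate_usage(reports: list[bool]) -> float:
    """Invert E[observed] = P_TRUTH * p + (1 - P_TRUTH) * 0.5 to recover p."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - P_TRUTH) * 0.5) / P_TRUTH


rng = random.Random(42)
true_rate = 0.30  # hypothetical share of users who use feature X
users = [rng.random() < true_rate for _ in range(200_000)]
reports = [report(u, rng) for u in users]
estimate = estimate_usage(reports)
print(f"estimated usage: {estimate:.3f}")
```

         | No single report says much about its sender, but the aggregate
         | estimate lands near the true rate. The price is noise: with only
         | 10% truthful reports you need a lot of users for a tight
         | estimate.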
        
       | CobrastanJorji wrote:
       | > For a website that has 10,000 visitors per month, in one year
       | you could save about 4.5 kilograms of CO2 emissions just by
       | replacing Google analytics with Plausible.
       | 
       | What the hell? How do you calculate this figure? That's roughly
       | equivalent to the CO2 created by driving a gas-burning car 10
       | feet to the data center to ferry information about each request
       | (figuring a car emits about 1.2 pounds of CO2 per mile traveled).
       | That's an astounding claim, and there's no effort to even explain
       | the idea behind it.
        
         | aimor wrote:
         | It's a point from the plausible.io website and sales pitch.
         | 
         | https://plausible.io/lightweight-web-analytics
         | 
         | The file size savings is 44.3 kB per visitor, which over
         | 120,000 visits is 5 GB per year.
         | 
         | The Website Carbon Calculator uses a ratio of 1.8 kWh per GB of
         | data transferred, and 475 g CO2 generated per kWh.
         | 
         | https://www.websitecarbon.com/
         | 
         | 44.3 kB per visit * 120,000 visits per year * 1.8 kWh per GB *
         | 475 g CO2 per kWh = 4.5 kg per year after fixing units.
         | 
         | "These numbers are all estimates but you can imagine if
         | millions of website owners and Google Analytics users end up
         | making a similar reduction in their website size too. The total
         | reduction in the carbon footprint of the web would be immense."
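         | 
         | The "after fixing units" step can be spelled out as below. A
         | quick sketch that assumes decimal units (1 GB = 1,000,000 kB);
         | every other number is taken from the figures quoted above.

```python
KB_PER_GB = 1_000_000  # decimal units, an assumption

saved_kb_per_visit = 44.3   # script size saving per visit
visits_per_year = 120_000   # 10,000 visitors per month
kwh_per_gb = 1.8            # Website Carbon Calculator ratio
g_co2_per_kwh = 475         # grid intensity used by the calculator

gb_per_year = saved_kb_per_visit * visits_per_year / KB_PER_GB
kwh_per_year = gb_per_year * kwh_per_gb
kg_co2_per_year = kwh_per_year * g_co2_per_kwh / 1000  # grams to kilograms

print(f"{gb_per_year:.2f} GB, {kwh_per_year:.2f} kWh, {kg_co2_per_year:.2f} kg CO2")
```

         | That reproduces the roughly 4.5 kg per year. The result scales
         | linearly with the 1.8 kWh/GB ratio, so the whole claim stands or
         | falls with that one number.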
        
           | ariwilson wrote:
           | "a ratio of 1.8 kWh per GB of data transferred"
           | 
           | This seems wildly high, even counting all the hops. For
           | reference, 1.8 kWh is enough to move my car 7 miles and my
           | e-bike over 100 miles.
        
             | aimor wrote:
             | It's difficult to estimate accurately, but their methods
             | are spelled out on the website.
             | 
             | https://www.websitecarbon.com/how-does-it-work/
             | 
             | "Energy intensity of web data
             | 
             | Energy is used at the data centre, telecoms networks and by
             | the end user's computer or mobile device. Of course, this
             | varies for every website and every visitor and so we use an
             | average figure. The figures used are for 2017 from the
             | report On Global Electricity Usage of Communication
             | Technology: Trends to 2030 by Anders Andrae and Tomas
             | Edler, adjusted to remove manufacturing energy as this is
             | not relevant to this calculator. We then divide the total
             | amount of energy used by the total annual data transfer
             | over the web as reported in the Nature article, How to stop
             | data centres gobbling up the world's electricity. This
             | gives us a figure of 1.8kWh/GB."
        
               | crazygringo wrote:
               | If you view the paper [1], I gave it a quick scan and it
               | seems to be counting the electricity usage of all
               | communications _devices_ on top of data centers.
               | 
               | So in power per GB transferred, it's counting all the
               | power used by people's 60" internet-connected TV
               | displays.
               | 
               | Which is, obviously, absurd to include if you're trying
               | to measure the _marginal_ effect of additional _data_.
               | More data doesn't increase your screen's power
               | consumption, obviously.
               | 
               | An accurate claim for Plausible would have to be based
               | mainly on marginal increases of power by datacenter and
               | communications networks.
               | 
               | [1] https://www.mdpi.com/2078-1547/6/1/117/htm
        
               | SamBam wrote:
               | It's hard to say. Obviously without any data at all none
               | of those screens would be on.
               | 
               | I always find the discussion of marginal increases of
               | energy tricky. If I buy a plane ticket on a half-empty
               | flight, obviously that flight was going to take off
               | anyway, so the marginal increase of my weight plus my
               | luggage is fairly negligible in comparison, so I'm only
               | to "blame" for a fraction of the fuel spent, right? But
               | who else is there to blame except the passengers, without
               | whom there would be (eventually) no flights? So shouldn't
               | we all divide the blame evenly?
        
               | crazygringo wrote:
               | If we keep it simple, there are two kinds of marginal
               | increases.
               | 
               | The first type is when marginal increase can lead to a
               | "new unit", like planes you refer to -- or servers used
               | by data centers. If a plane fits 100 people, then
               | (simplifying) 1/100 of the time you'll result in a new
               | plane being used, so it makes sense to divide the plane's
               | total resources by passengers -- not just the fuel you
               | used.
               | 
               | But the second type never results in a "new unit". In
               | this scenario, using more resource-hungry analytics will
               | _never_ push someone to purchase a second cell phone to
               | spread the load. So counting _anything_ but marginal
               | energy increase usage by the CPU directly is
               | disingenuous.
               | 
               | So in the case of analytics software, their data center
               | server/power resources fall into the first type. But the
               | consumer device resources fall into the second type.
               | 
               | So in this case I don't think there's anything tricky at
               | all about it.
        
             | guenthert wrote:
             | Someone wants to be paid for that energy dissipated. I
             | don't see how Netflix could be profitable this way.
        
           | ryanobjc wrote:
           | What if you are switching from a datacenter that's carbon
           | neutral to one that isn't?
           | 
           | Also as a note, the google analytics js is heavily cached and
           | thus doesn't have to travel as far or at all. Also Google has
           | onramps to their carbon neutral infrastructure everywhere, so
           | there's also that.
        
           | foolmeonce wrote:
           | > "These numbers are all estimates but you can imagine if
           | millions of website owners and Google Analytics users end up
           | making a similar reduction in their website size too. The
           | total reduction in the carbon footprint of the web would be
           | immense."
           | 
           | If we removed 40k of CDN content per visit then the 1.8
           | kWh/GB would be 2.8 kWh/GB.
        
         | eximius wrote:
         | > 4.5 kilograms of CO2
         | 
         | So, an absolutely negligible amount of CO2?
         | 
         | By virtually any metric? i.e., you, as an individual, exhale
         | that much CO2 in a week.
         | 
         | Hold industrial processes responsible for CO2 emissions, not
         | your website. (Unless you're bitcoin, I guess?)
        
           | anonporridge wrote:
           | This kind of penny wise and pound foolish approach just seems
           | like a waste of time at best and at worst it lulls voters
           | into complacency and distracts from the fact that our
           | politicians still aren't doing anywhere close to enough to
           | address carbon emissions. It's just PR and corporate
           | greenwashing.
           | 
           | Like everything else, the best approach is just to use a mix
           | of regulation, renewable subsidies, and a carbon tax to make
           | using fossil fuels cost prohibitive compared to renewables
           | and the market will eliminate them on its own. The wider the
           | cost difference becomes, the faster renewables will displace
           | carbon energy. We're getting there slowly as wind and solar
           | are now slightly cheaper than carbon fuels, but we should
           | definitely be helping it along a lot faster if we're serious
           | about avoiding the worst case climate scenarios.
           | 
           | So far, it seems like we aren't serious about it and our
           | leadership is sleepwalking us towards increasing catastrophe.
        
         | seoaeu wrote:
         | I mean 10k requests/second seems quite achievable for a single
         | server. And I'd totally believe that 12 seconds of compute (per
         | year!) wouldn't use much energy. In reality those requests
         | would be intermixed with millions more for other sites and the
         | servers would be running continuously, but the resources
         | attributable to an individual site should be the same.
        
           | CobrastanJorji wrote:
           | I mean, let's think about this a bit. The US generated about
           | 4.13 trillion kilowatt-hours in 2019, and that generation
           | emitted about 1.72 billion metric tons of CO2, or about 0.92
           | pounds of CO2 per kWh
           | (https://www.eia.gov/tools/faqs/faq.php). Let's assume Google
           | gets their power at that rate (which is unfair to Google
           | because they claim to use 100% renewable energy, but I don't
           | want to get into that).
           | 
           | A typical server rack might draw anywhere from maybe 5-50 kW.
           | Let's say Google has really beefy ones that draw 100 kW.
           | That's 92 pounds of CO2 per hour. For the 12 seconds you
           | mentioned, that's still only about 0.14 kilograms of total
           | CO2 emitted. And the claim is that they're BETTER by 4.5
           | kilograms.
           | 
           | They've gotta be talking about some other expense than the
           | server. But what sort of expense? The cost to build a server?
           | Something about general maintenance of the Internet? ISPs
           | between clients and the server?
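           | 
           | For anyone who wants to check the server-side estimate, here
           | is the same back-of-envelope calculation as a sketch. It
           | assumes the generous rack figure means a constant 100 kW
           | draw, and takes the 10k requests/second from the parent
           | comment; the grid numbers are the EIA ones quoted above.

```python
LB_PER_KG = 2.20462

us_generation_kwh = 4.13e12   # 2019 US generation, from the comment
us_co2_tonnes = 1.72e9        # metric tons of CO2, from the comment
kg_per_kwh = us_co2_tonnes * 1000 / us_generation_kwh
lb_per_kwh = kg_per_kwh * LB_PER_KG

requests_per_year = 120_000   # 10,000 visitors per month
requests_per_second = 10_000  # single-server throughput from the parent
busy_seconds = requests_per_year / requests_per_second

rack_kw = 100                 # assumed constant draw of one beefy rack
energy_kwh = rack_kw * busy_seconds / 3600
co2_kg = energy_kwh * kg_per_kwh

print(f"{lb_per_kwh:.2f} lb/kWh, {busy_seconds:.0f} s, {co2_kg:.3f} kg CO2")
```

           | On these inputs a whole year of requests costs a bit over a
           | tenth of a kilogram of CO2 on the server, well over an order
           | of magnitude below the claimed 4.5 kg saving.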
        
       | kureikain wrote:
       | Anyone know how they identify the same user? All the solutions I
       | know generate a unique number and put it in a cookie.
       | 
       | At my app https://hanami.run I don't track users and cannot know
       | if the same users visit our website :-(. I don't want to use
       | cookies and want to get away with GDPR. At the same time, I'd love
       | to see which visitors repeatedly read my website/blog and where
       | they drop off so I can optimize my site.
        
         | marvinblum wrote:
         | Fingerprinting. That's what I do for Pirsch
         | (https://pirsch.io/) and I think they do it in a similar
         | manner. You can check out our source code here:
         | https://github.com/pirsch-analytics/pirsch/blob/master/finge...
         | or take a look at the Plausible repo.
        
           | A21z wrote:
           | Even though it may be cookieless, you won't "get away with
           | GDPR" with fingerprinting.
        
             | kureikain wrote:
             | It looks like the fingerprint only relies on ip/user-agent
             | and I think ip/user-agent are ok to be stored while still
             | being GDPR compliant?
        
               | marvinblum wrote:
               | Everything that can be used to uniquely identify a
               | visitor falls under the GDPR. We don't store IP
               | addresses, so it should be GDPR compliant, but we still
               | need to check that to make the claim.
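               | 
               | To make the idea concrete, here is a minimal sketch
               | of a cookieless fingerprint built from IP and
               | User-Agent. It is not the actual Pirsch or Plausible
               | code: the function names, the rotating salt and the
               | choice of SHA-256 are all assumptions made for
               | illustration.

```python
import hashlib
import secrets


def new_salt() -> bytes:
    """Random salt, meant to be rotated (for example daily) and then discarded."""
    return secrets.token_bytes(16)


def visitor_id(ip: str, user_agent: str, salt: bytes) -> str:
    """Hash IP + User-Agent with the salt; only the digest is kept, not the IP."""
    h = hashlib.sha256()
    h.update(salt)
    h.update(ip.encode("utf-8"))
    h.update(b"\x00")  # separator so ("a", "bc") and ("ab", "c") hash differently
    h.update(user_agent.encode("utf-8"))
    return h.hexdigest()


today = new_salt()
tomorrow = new_salt()
first_hit = visitor_id("203.0.113.7", "Mozilla/5.0", today)
second_hit = visitor_id("203.0.113.7", "Mozilla/5.0", today)
next_day = visitor_id("203.0.113.7", "Mozilla/5.0", tomorrow)
print(first_hit == second_hit, first_hit == next_day)
```

               | Within one salt period the same visitor maps to the
               | same ID, and once the salt is rotated and thrown away
               | the old IDs can no longer be linked to new ones.
               | Whether that is enough for GDPR is the open legal
               | question, not something the code settles.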
        
           | kureikain wrote:
           | Oh this is really great. Thanks for that. The code is
           | concise.
        
             | marvinblum wrote:
             | Yeah that's the easy part. It gets more interesting when
             | you try to filter bots, parse the user-agent and stuff
             | like that.
        
       | iujjkfjdkkdkf wrote:
       | The site has a big re-captcha banner on it - one of Google's most
       | consumer hostile products. They should consider switching to
       | something else if they want to "take on google".
        
         | melomal wrote:
         | The fact that they will need to pay Google for this service
         | after a certain threshold also limits taking them on. Pennies
         | but still paying them.
        
         | sedatk wrote:
         | What's the non-hostile alternative?
        
           | Nextgrid wrote:
           | Old-school "squiggly letters" captcha? For all the fear
           | mongering around AI and machine learning supposedly breaking
           | them, I'm still not aware of a general-purpose tool that
           | would solve those out of the box without significant
           | engineering effort.
        
             | spijdar wrote:
             | A general purpose tool like this one?
             | https://github.com/PatrickLib/captcha_recognize
        
             | throwaway53453 wrote:
             | The worry is not about ML. It's about bot farms in
             | India/China with real people behind the wheel. That's why
             | CAPTCHA needs to be able to evolve without maintenance from
             | the website operator.
        
               | tinus_hn wrote:
               | It's not like Google's solution is watertight.
        
         | johnnybaptist wrote:
         | Could you explain more about what is consumer hostile about
         | Google's re-captcha?
         | 
         | Any recommended alternatives would be appreciated as well.
        
           | tinus_hn wrote:
           | You're only considered a real person if you use Google's
           | blessed browser set up the way Google likes it.
        
             | nindalf wrote:
             | This is not true. I've used exclusively Firefox and Safari
             | for years and have never fallen afoul of recaptcha except
             | when testing it as a developer.
        
           | edoceo wrote:
           | hCaptcha has been mentioned as alternative.
        
             | kevincox wrote:
             | I find hCaptcha way more annoying to solve than reCAPTCHA.
             | The puzzles take way longer and I often have to do multiple
             | of them.
        
               | z77dj3kl wrote:
               | Do you block Google trackers aggressively? reCAPTCHA uses
               | that very heavily: if you allow all of their stuff and
               | they track you across the web, you'll have to basically
               | never do more than click the button. On the other hand,
               | if you take your privacy seriously and are aggressive
               | about tracker blocking, you'll have a pretty awful time.
               | 
               | I imagine hCaptcha doesn't have enough trackers sprinkled
               | around the web to use those as signals for this.
        
               | kevincox wrote:
               | I do block Google trackers, and have network state
               | partitioning enabled, however the reCAPTCHA tests are
               | usually bearable. (often a checkbox, sometimes a page) It
               | seems like I get at least 2 pages of tests for hCaptcha
               | every time.
        
               | eatbots wrote:
               | This is under the control of the site with hCaptcha, so
               | you'll tend to see more variety in difficulty levels
               | depending on their settings.
               | 
               | There will always be some individual variance, but when
               | we've tested this people always solve hCaptcha faster
               | than reCAPTCHA on average.
               | 
               | (disclosure: work there)
        
               | Jiejeing wrote:
               | Same. I heavily block google scripts and hate reCAPTCHA
               | with a passion, but hCaptcha really takes the cake for
               | the most painful captcha experience.
        
           | minsc__and__boo wrote:
           | If they're using the latest version (v3) of re-captcha, it's
           | not hostile as it doesn't even have user interaction.
           | 
           | https://www.youtube.com/watch?v=tbvxFW4UJdU
           | 
           | It runs entirely in the background, and pretty much the only
           | time you'll see a prompt is if you're using a VPN, Tor, or
           | specifically block it.
        
             | swiley wrote:
             | >pretty much the only time you'll see a prompt is if you're
             | using a VPN, Tor, or specifically block it.
             | 
             | Or using a non Google browser or using an account that
             | Google doesn't like (because they can't associate it with a
             | real identity or whatever.)
        
             | jraph wrote:
             | Yes, "it runs entirely in the background" means it tracks
             | the hell out of you, across websites.
             | 
             | Basically you are blocked if you care about privacy and
             | refuse this tracking.
             | 
             | That's what I'm willing to call "hostile". I'd say, it's
             | even worse than picking a few pictures, which is already
             | hostile.
        
               | iujjkfjdkkdkf wrote:
               | That's about it. Horrible user experience - oh you're
               | about to pay us, just click a few sidewalks first - and
               | condescension of asking people to do a menial task that
               | improves their ML models. But forcing you to use one of
               | their sanctioned browsers and let them record what they
               | want to is where the real hostility comes in. It's
               | exercising monopoly power to squeeze more out of people
               | and repress competition, I'd call that hostile.
        
           | [deleted]
        
         | rapnie wrote:
         | Well, the site is probably not taking on Google all that much
         | just yet, but they are interviewing Marko Saric who is.
         | 
         | The plausible landing page gives me zero cookies and only
         | requests are to plausible.io and testing.plausible.io
         | 
         | https://plausible.io/
        
         | elliekelly wrote:
         | Off topic but I've noticed recently that I'm frequently forced
         | to _incorrectly_ answer re-captchas the way a computer would in
         | order to move forward.
         | 
         | Some examples: "click all the tractors" showed I did not
         | complete the task because of a photo of construction equipment;
         | "click all the crosswalks" because I didn't select the photo of
         | a thick white fence; "click all the traffic lights" because I
         | didn't select a photo of a parking meter. I just clicked the
         | incorrect photo so I could move on but I can't help but wonder
         | if there's any mechanism to catch those incorrect (manual,
         | human) annotations on the training data Google is collecting.
        
       | ElijahLynn wrote:
       | Live demo of the open-source Plausible Analytics here, so you can
       | see the HN spike!
       | 
       | https://plausible.io/plausible.io?period=day (39 current
       | visitors)
        
       | nopaintwat wrote:
       | Really great to see a tech company with the motto "Don't be evil"
        
       | hoerzu wrote:
       | Oh the irony. The website is protected with Google Recaptcha
        
         | untoxicness wrote:
         | > The website is protected with Google Recaptcha
         | 
         | Which website?
         | 
         | The Plausible register page uses hCaptcha
         | (https://plausible.io/register).
        
       | Daho0n wrote:
       | Plausible is still allowing DNS trickery for cross domain
       | tracking as far as I can tell. This alone will keep us from ever
       | trusting them. Only bad actors do this.
        
         | lecarore wrote:
         | The analytics have no direct benefits to an individual visitor,
         | like ads, so I get why you'd block them. I myself don't care
         | about showing up in the analytics of the website I visit. But I
         | pay for Plausible because they are way less intrusive, and they
         | get added to the blocklist anyway. This doesn't encourage good
         | behaviour. From a website owner perspective, if I don't
         | circumvent the blockers I need a server side solution. It would
         | be equivalent privacy wise, harder to set up but less visible.
        
       | 0898 wrote:
       | Great to see Plausible on HackerNews. It's one of the few pieces
       | of software (Stripe is one, Starling another) that I deeply enjoy
       | using. I get a good feeling when I open it up. I don't really
       | have the UX vocabulary to explain it better than that
       | unfortunately.
        
         | camjohnson26 wrote:
         | I've been using it for a while but feels pretty lite on the
         | analytics so far, would be nice to see performance stats per
         | page if that's possible in a privacy friendly way.
        
           | markosaric wrote:
           | You mean like a page drilldown to see stats of the individual
           | page? You can do that already. On our live demo, click on any
           | page in the Top Pages report and the dashboard will be
           | segmented to only show the traffic that visited that
           | particular page.
        
         | octopoc wrote:
         | What is Starling? I searched for it and found tons of things
         | called that.
        
           | 0898 wrote:
           | Sorry. Starling Bank.
        
       | frakkingcylons wrote:
       | I feel like the title should be Taking on Google Analytics.
       | Everyone associates Google with search, not so much website
       | analytics. This title makes me think there's someone trying to
       | unseat their position in search.
        
         | ganeshkrishnan wrote:
         | Google analytics is the wrong end of Google. Sure you can get
         | a few customers now and then who love privacy and will ditch GA.
         | 
         | But for most, GA is how Google ads knows how to calculate
         | conversions. People who want to use Google ads (which are
         | everywhere) have to use GA. If you are not using Google Ads, I
         | don't think Google cares much about your site anyway.
        
       | z77dj3kl wrote:
       | Seems a bit like Plausible only pays lip service to some of these
       | ideas. Merely 5 months ago the co-founder touted here on HN about
       | how they are "big fans of open source so wanted as permissive [a]
       | licence as possible" [0], then promptly went and changed the
       | license to a strongly copyleft one (AGPL) a few weeks later!
       | 
       | They might well be the next Elastic/CockroachDB/MongoDB/etc. Or
       | better yet, they might do the classic bait-and-switch later on:
       | get developer buy in with a good story about openness, then once
       | they'd gotten enough of a customer (aka dev) share, do the
       | switch.
       | 
       | [0]: https://news.ycombinator.com/item?id=24700565
        
         | FearlessNebula wrote:
         | Why is [a] in brackets?
        
           | z77dj3kl wrote:
           | To indicate I edited it (to make it grammatically correct in
           | my sentence).
        
           | IncRnd wrote:
           | That looks like a grammatical correction.
        
         | nightpool wrote:
         | I'm not sure I understand the root of your complaint. You're
         | saying that because the developers changed the license from a
         | permissive license to a strong copyleft license, they're not
         | supporting open source? I think that using a license like the
         | AGPL is _much better_ for the open source community in the long
         | run, because it makes it more likely that the code will stay
         | free and accessible no matter what company wants to adapt it.
        
         | elliekelly wrote:
         | I don't really know the background here but it really bugs me
         | when I see people arguing nefarious intent simply because
         | someone changed their mind later. Is there a logical fallacy
         | that addresses "allegations of flip flopping"?
         | 
         | Sometimes people learn something new that changes things.
         | Sometimes situations change and so the strategy needs to
         | change. Sometimes people realize, for whatever reason, they
         | were wrong and so they take steps to correct it. Do _some_
         | people _sometimes_ flip flop for the purpose of misleading
         | people or pandering? Of course. But I really don't think that's
         | typically the motive. We should be supportive of people
         | changing their minds, not suspicious.
        
         | markosaric wrote:
         | What's wrong with AGPL that doesn't fit with our ideas?
         | 
         | We were on the MIT first and got into a situation where a large
         | corporation wanted to take our code and resell it to tens of
         | thousands of their customers and they made it clear they didn't
         | want to contribute anything back to our project whatsoever.
         | 
         | We are a two person team putting our own time and savings into
         | this and it could have instantly killed the project and the
         | chance of becoming sustainable.
         | 
         | We changed the license and that was a simple way to stop them
         | without changing our principles/ideas. Could have gone
         | proprietary too at that stage but we didn't.
         | 
         | Everything is clearly explained here
         | https://plausible.io/blog/open-source-licenses
        
           | Daho0n wrote:
           | So pretty much the Elastic route as pointed out by GP.
        
           | [deleted]
        
           | BugsJustFindMe wrote:
           | > _What's wrong with AGPL that doesn't fit with our ideas?_
           | 
           | Absolutely nothing. That person doesn't know what they're
           | talking about.
           | 
           | I am sorry to hear that you learned about the peril of a
           | permissive license in the way you did, but I'm happy that you
           | switched to strong copyleft. Arguments demanding permissive
           | licensing instead of strong copyleft amount to saying "but
           | then how will I stand on your neck?" You shouldn't have to
           | put up with that.
        
       ___________________________________________________________________
       (page generated 2021-03-31 23:01 UTC)