[HN Gopher] Taking on Google
___________________________________________________________________
Taking on Google
Author : rapnie
Score : 117 points
Date : 2021-03-31 14:49 UTC (8 hours ago)
(HTM) web link (inthegood.co)
(TXT) w3m dump (inthegood.co)
| koalaman wrote:
| I read most of the article but couldn't quite find how they do
| analytics without tracking people. Did I miss it? How are they
| more privacy preserving?
| markosaric wrote:
| This is probably a better overview of what makes Plausible a
| privacy-focused tool: open source, can be self-hosted, no
| connection to adtech, minimal data collection, no cookies, no
| persistent identifiers, no personal data, no cross-site/device
| tracking etc
|
| https://plausible.io/privacy-focused-web-analytics
|
| (I'm the co-founder)
| warkdarrior wrote:
| OK, so not quite privacy-preserving in the cryptographic
| sense, but more of a matter of degree. Plausible Analytics
| collects less data than Google Analytics, but not zero.
| cyberlab wrote:
| Plausible is great, and I see the need for it, but I've always
| enjoyed using AWStats instead, as there is no need to add third
| party code to my site. It all happens in the background and it
| paints a much better picture of your stats since users can't
| block the gathering of stats with an ad blocker.
| marvinblum wrote:
| That's one of the reasons I built Pirsch (https://pirsch.io/).
| All the JavaScript integrations can be blocked.
| rkagerer wrote:
| I've used AWStats for years. It's not perfect, but it's
| preferable to the scummy alternatives proffered by the big
| boys.
| not_knuth wrote:
| How would you compare it to GoAccess [0]? I've only ever used
| GoAccess, but AWStats seems to be the older, more mature
| tool... so I would be curious about a comparison.
|
| [0] https://goaccess.io/
| johnghanks wrote:
| This is literally an ad.
| max_ wrote:
| Am I missing something? To me, analytics & privacy sound like a
| contradiction
| XCSme wrote:
| It is better privacy. It's one thing for a single entity to know
| what everyone is doing on the web, all the websites that you
| visit and what you do on them, and another thing for an entity
| to know that you visited their own website, without knowing
| what other websites you visit and what you do on them.
|
| LE: The best solution is still self-hosting, as hosted
| plausible is still a 3rd party entity that centralizes data
| (even though they probably don't use or share this data).
| gdsdfe wrote:
| but ... the website that I'm visiting has no incentive to care
| about my privacy. I mean, yes, they should, but what's in it for
| them? I think this go-to-market approach of "we are better
| because Google is evil" is just flawed.
| lecarore wrote:
| Well, I'm an indie dev and I do care. I find advertising
| and cookie notices really annoying, and I can afford the
| 40-something euros a year it costs me. I don't need to
| track everything.
| XCSme wrote:
| They do care, the data can be collected anonymously,
| without being linked directly to your person. They can use
| such data to improve your experience, without affecting you
| personally in any way.
| kevincox wrote:
| This is far too simple of a view.
|
| - For the most part the concern is what is done with the data.
|
| - Data can be anonymized. (Although this is often hard to
| verify)
|
| - You can hide the data in the client. For example imagine you
| want to know how many users use feature X. You can send an
| analytics report with 90% chance of a random value, and 10%
| chance sending the true boolean. You can't tell if any specific
| user has used the feature (because most likely it is a random
| value) but you can get a pretty good estimate what portion of
| your users use the feature.
|
| My understanding is that Plausible is focused on the use of
| anonymization.
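|
| The client-side hiding idea above is usually called randomized
| response. A minimal Python sketch, assuming the "random value" is
| a fair coin flip and that the 10%/90% split is known to whoever
| reads the reports (the function names are hypothetical, not
| Plausible's code):

```python
import random


def report(used_feature: bool, p_truth: float = 0.1, rng=random) -> bool:
    """Send the true flag with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return used_feature
    return rng.random() < 0.5  # assumption: the "random value" is a fair coin


def estimate_usage(reports, p_truth: float = 0.1) -> float:
    """Invert E[report] = p_truth * rate + (1 - p_truth) * 0.5 to recover rate."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth


def simulate(true_rate: float = 0.3, n: int = 200_000, seed: int = 0) -> float:
    """Generate n noisy reports for a known usage rate and estimate it back."""
    rng = random.Random(seed)
    reports = [report(rng.random() < true_rate, rng=rng) for _ in range(n)]
    return estimate_usage(reports)
```

| With enough reports the estimate converges on the true usage
| rate, while any single report stays deniable.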
| CobrastanJorji wrote:
| > For a website that has 10,000 visitors per month, in one year
| you could save about 4.5 kilograms of CO2 emissions just by
| replacing Google analytics with Plausible.
|
| What the hell? How do you calculate this figure? That's roughly
| equivalent to the CO2 created by driving a gas-burning car 10
| feet to the data center to ferry information about each request
| (figuring a car emits about 1.2 pounds of CO2 per mile traveled).
| That's an astounding claim, and there's no effort to even explain
| the idea behind it.
| aimor wrote:
| It's a point from the plausible.io website and sales pitch.
|
| https://plausible.io/lightweight-web-analytics
|
| The file size savings is 44.3 kB per visitor, which over
| 120,000 visits is 5 GB per year.
|
| The Website Carbon Calculator uses a ratio of 1.8 kWh per GB of
| data transferred, and 475 g CO2 generated per kWh.
|
| https://www.websitecarbon.com/
|
| 44.3 kB per visit * 120,000 visits per year * 1.8 kWh per GB *
| 475 g CO2 per kWh = 4.5 kg per year after fixing units.
|
| "These numbers are all estimates but you can imagine if
| millions of website owners and Google Analytics users end up
| making a similar reduction in their website size too. The total
| reduction in the carbon footprint of the web would be immense."
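|
| The unit conversion above can be checked with a short script. A
| sketch assuming decimal units (1 GB = 1,000,000 kB); the function
| name is illustrative and the defaults are the Website Carbon
| Calculator ratios quoted above:

```python
KB_PER_GB = 1_000_000  # assumption: decimal kilobytes and gigabytes


def co2_saved_kg(kb_saved_per_visit, visits_per_year,
                 kwh_per_gb=1.8, g_co2_per_kwh=475):
    """Convert per-visit transfer savings into kilograms of CO2 per year."""
    gb_per_year = kb_saved_per_visit * visits_per_year / KB_PER_GB
    kwh_per_year = gb_per_year * kwh_per_gb
    return kwh_per_year * g_co2_per_kwh / 1000  # grams to kilograms


# 10,000 visitors per month is 120,000 visits per year
print(co2_saved_kg(44.3, 120_000))
```

| The result lands at about 4.5 kg, so the figure follows from the
| inputs; the open question is whether 1.8 kWh/GB is a sensible
| input.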
| ariwilson wrote:
| "a ratio of 1.8 kWh per GB of data transferred"
|
| This seems wildly high, even counting all the hops. For
| reference, 1.8 kWh is enough to move my car 7 miles and my
| e-bike over 100 miles.
| aimor wrote:
| It's difficult to estimate accurately, but their methods
| are spelled out on the website.
|
| https://www.websitecarbon.com/how-does-it-work/
|
| "Energy intensity of web data
|
| Energy is used at the data centre, telecoms networks and by
| the end user's computer or mobile device. Of course, this
| varies for every website and every visitor and so we use an
| average figure. The figures used are for 2017 from the
| report On Global Electricity Usage of Communication
| Technology: Trends to 2030 by Anders Andrae and Tomas
| Edler, adjusted to remove manufacturing energy as this is
| not relevant to this calculator. We then divide the total
| amount of energy used by the total annual data transfer
| over the web as reported in the Nature article, How to stop
| data centres gobbling up the world's electricity. This
| gives us a figure of 1.8kWh/GB."
| crazygringo wrote:
| If you view the paper [1], I gave it a quick scan and it
| seems to be counting the electricity usage of all
| communications _devices_ on top of data centers.
|
| So in power per GB transferred, it's counting all the
| power used by people's 60" internet-connected TV
| displays.
|
| Which is, obviously, absurd to include if you're trying
| to measure the _marginal_ effect of additional _data_.
| More data doesn't increase your screen's power
| consumption, obviously.
|
| An accurate claim for Plausible would have to be based
| mainly on marginal increases of power by datacenter and
| communications networks.
|
| [1] https://www.mdpi.com/2078-1547/6/1/117/htm
| SamBam wrote:
| It's hard to say. Obviously without any data at all none
| of those screens would be on.
|
| I always find the discussion of marginal increases of
| energy tricky. If I buy a plane ticket on a half-empty
| flight, obviously that flight was going to take off
| anyway, so the marginal increase of my weight plus my
| luggage is fairly negligible in comparison, so I'm only
| to "blame" for a fraction of the fuel spent, right? But
| who else is there to blame except the passengers, without
| whom there would be (eventually) no flights? So shouldn't
| we all divide the blame evenly?
| crazygringo wrote:
| If we keep it simple, there are two kinds of marginal
| increases.
|
| The first type is when marginal increase can lead to a
| "new unit", like planes you refer to -- or servers used
| by data centers. If a plane fits 100 people, then
| (simplifying) 1/100 of the time you'll result in a new
| plane being used, so it makes sense to divide the plane's
| total resources by passengers -- not just the fuel you
| used.
|
| But the second type never results in a "new unit". In
| this scenario, using more resource-hungry analytics will
| _never_ push someone to purchase a second cell phone to
| spread the load. So counting _anything_ but the marginal
| increase in energy used directly by the CPU is
| disingenuous.
|
| So in the case of analytics software, their data center
| server/power resources fall into the first type. But the
| consumer device resources fall into the second type.
|
| So in this case I don't think there's anything tricky at
| all about it.
| guenthert wrote:
| Someone wants to be paid for that energy dissipated. I
| don't see how Netflix could be profitable this way.
| ryanobjc wrote:
| What if you are switching from a datacenter that's carbon
| neutral to one that isn't?
|
| Also as a note, the Google Analytics JS is heavily cached and
| thus doesn't have to travel as far or at all. Also Google has
| onramps to their carbon neutral infrastructure everywhere, so
| there's also that.
| foolmeonce wrote:
| > "These numbers are all estimates but you can imagine if
| millions of website owners and Google Analytics users end up
| making a similar reduction in their website size too. The
| total reduction in the carbon footprint of the web would be
| immense."
|
| If we removed 40k of CDN content per visit then the 1.8
| kWh/GB would be 2.8 kWh/GB.
| eximius wrote:
| > 4.5 kilograms of CO2
|
| So, an absolutely negligible amount of CO2?
|
| By virtually any metric? i.e., you, as an individual, exhale
| that much CO2 in a week.
|
| Hold industrial processes responsible for CO2 emissions, not
| your website. (Unless you're bitcoin, I guess?)
| anonporridge wrote:
| This kind of penny wise and pound foolish approach just seems
| like a waste of time at best and at worst it lulls voters
| into complacency and distracts from the fact that our
| politicians still aren't doing anywhere close to enough to
| address carbon emissions. It's just PR and corporate
| greenwashing.
|
| Like everything else, the best approach is just to use a mix
| of regulation, renewable subsidies, and a carbon tax to make
| using fossil fuels cost prohibitive compared to renewables
| and the market will eliminate them on its own. The wider the
| cost difference becomes, the faster renewables will displace
| carbon energy. We're getting there slowly as wind and solar
| are now slightly cheaper than carbon fuels, but we should
| definitely be helping it along a lot faster if we're serious
| about avoiding the worst case climate scenarios.
|
| So far, it seems like we aren't serious about it and our
| leadership is sleepwalking us towards increasing catastrophe.
| seoaeu wrote:
| I mean 10k requests/second seems quite achievable for a single
| server. And I'd totally believe that 12 seconds of compute (per
| year!) wouldn't use much energy. In reality those requests
| would be intermixed with millions more for other sites and the
| servers would be running continuously, but the resources
| attributable to an individual site should be the same.
| CobrastanJorji wrote:
| I mean, let's think about this a bit. The US generated about
| 4.13 trillion kilowatt-hours in 2019, and that generation
| emitted about 1.72 billion metric tons of CO2, or about 0.92
| pounds of CO2 per kWh
| (https://www.eia.gov/tools/faqs/faq.php). Let's assume Google
| gets their power at that rate (which is unfair to Google
| because they claim to use 100% renewable energy, but I don't
| want to get into that).
|
| A typical server rack might use anywhere from maybe 5-50 kW.
| Let's say Google has really beefy ones that consume 100 kWh
| per hour. That's 92 pounds of CO2 per hour. For the 12
| seconds you mentioned, that's still only about 0.14 kilograms of
| total CO2 used. And the claim is that they're BETTER by 4.5
| kilograms.
|
| They've gotta be talking about some other expense than the
| server. But what sort of expense? The cost to build a server?
| Something about general maintenance of the Internet? ISPs
| between clients and the server?
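|
| The back-of-envelope numbers above can be reproduced in a few
| lines. A sketch under the thread's own assumptions (a 100 kW
| rack, 120,000 requests served at 10,000 per second, the 2019 US
| grid average); the function names are illustrative:

```python
LB_PER_KG = 2.20462


def grid_intensity_kg_per_kwh(co2_tonnes=1.72e9, generation_kwh=4.13e12):
    """Average CO2 per kWh from total emissions (metric tons) and generation."""
    return co2_tonnes * 1000 / generation_kwh


def compute_co2_kg(power_kw, seconds, kg_per_kwh):
    """CO2 emitted by a load of power_kw running for the given seconds."""
    kwh = power_kw * seconds / 3600
    return kwh * kg_per_kwh


intensity = grid_intensity_kg_per_kwh()
seconds = 120_000 / 10_000  # yearly requests / requests per second
rack_kg = compute_co2_kg(100, seconds, intensity)

print(f"{intensity * LB_PER_KG:.2f} lb CO2 per kWh")
print(f"{rack_kg:.3f} kg CO2 for {seconds:.0f} s of a 100 kW rack")
```

| By this arithmetic the 12 seconds come to roughly 0.14 kg of
| CO2, still far below the claimed 4.5 kg saving.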
| kureikain wrote:
| Anyone know how they identify the same user? All the solutions I
| know generate a unique number and put it in a cookie.
|
| At my app https://hanami.run I don't track users and cannot know
| if the same users visit our website :-(. I don't want to use
| cookies and want to get away with GDPR. At the same time, I'd
| love to see which visitors repeatedly read my website/blog and
| where they drop off so I can optimize my site.
| marvinblum wrote:
| Fingerprinting. That's what I do for Pirsch
| (https://pirsch.io/) and I think they do it in a similar
| manner. You can check out our source code here:
| https://github.com/pirsch-analytics/pirsch/blob/master/finge...
| or take a look at the Plausible repo.
| A21z wrote:
| Even though it may be cookieless, you won't "get away with
| GDPR" with fingerprinting.
| kureikain wrote:
| It looks like the fingerprint only relies on ip/user-agent,
| and I think ip/user-agent are OK to store while still being
| GDPR compliant?
| marvinblum wrote:
| Everything that can be used to uniquely identify a
| visitor falls under the GDPR. We don't store IP
| addresses, so it should be GDPR compliant, but we still
| need to check that to make the claim.
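|
| A cookieless fingerprint of the kind discussed here could look
| roughly like the sketch below. It is not Pirsch's or Plausible's
| actual code: the daily rotation, the salt and the function name
| are assumptions made for illustration. Only the hash is kept,
| never the raw IP:

```python
import hashlib
from datetime import date


def visitor_fingerprint(ip: str, user_agent: str, salt: str, day: date) -> str:
    """Hash IP + User-Agent + day + salt into an opaque visitor ID.

    Assumption: including the day makes the ID change daily, so the same
    visitor cannot be followed across days from the stored hashes alone.
    """
    material = "|".join([ip, user_agent, day.isoformat(), salt])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# Hypothetical request data; 203.0.113.7 is a documentation-only address.
fp = visitor_fingerprint("203.0.113.7", "Mozilla/5.0", "site-secret",
                         date(2021, 3, 31))
print(fp[:16])
```

| Whether such a hash still counts as personal data under the GDPR
| is a legal question the code does not settle.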
| kureikain wrote:
| Oh this is really great. Thanks for that. The code is
| concise.
| marvinblum wrote:
| Yeah that's the easy part. It gets more interesting when
| you try to filter bots, parse the user-agent and stuff
| like that.
| iujjkfjdkkdkf wrote:
| The site has a big re-captcha banner on it - one of Google's most
| consumer hostile products. They should consider switching to
| something else if they want to "take on google".
| melomal wrote:
| The fact that they will need to pay Google for this service
| after a certain threshold also limits taking them on. Pennies
| but still paying them.
| sedatk wrote:
| What's the non-hostile alternative?
| Nextgrid wrote:
| Old-school "squiggly letters" captcha? For all the fear
| mongering around AI and machine learning supposedly breaking
| them, I'm still not aware of a general-purpose tool that
| would solve those out of the box without significant
| engineering effort.
| spijdar wrote:
| A general purpose tool like this one?
| https://github.com/PatrickLib/captcha_recognize
| throwaway53453 wrote:
| The worry is not about ML. It's about bot farms in
| India/China with real people behind the wheel. That's why
| CAPTCHA needs to be able to evolve without maintenance from
| the website operator.
| tinus_hn wrote:
| It's not like Google's solution is watertight.
| johnnybaptist wrote:
| Could you explain more about what is consumer hostile about
| Google's re-captcha?
|
| Any recommended alternatives would be appreciated as well.
| tinus_hn wrote:
| You're only considered a real person if you use Google's
| blessed browser set up the way Google likes it.
| nindalf wrote:
| This is not true. I've used exclusively Firefox and Safari
| for years and have never fallen afoul of recaptcha except
| when testing it as a developer.
| edoceo wrote:
| hCaptcha has been mentioned as alternative.
| kevincox wrote:
| I find hCaptcha way more annoying to solve than reCAPTCHA.
| The puzzles take way longer and I often have to do multiple
| of them.
| z77dj3kl wrote:
| Do you block Google trackers aggressively? reCAPTCHA uses
| that very heavily: if you allow all of their stuff and
| they track you across the web, you'll have to basically
| never do more than click the button. On the other hand,
| if you take your privacy seriously and are aggressive
| about tracker blocking, you'll have a pretty awful time.
|
| I imagine hCaptcha doesn't have enough trackers sprinkled
| around the web to use those as signals for this.
| kevincox wrote:
| I do block Google trackers, and have network state
| partitioning enabled, however the reCAPTCHA tests are
| usually bearable. (often a checkbox, sometimes a page) It
| seems like I get at least 2 pages of tests for hCaptcha
| every time.
| eatbots wrote:
| This is under the control of the site with hCaptcha, so
| you'll tend to see more variety in difficulty levels
| depending on their settings.
|
| There will always be some individual variance, but when
| we've tested this people always solve hCaptcha faster
| than reCAPTCHA on average.
|
| (disclosure: work there)
| Jiejeing wrote:
| Same. I heavily block google scripts and hate reCAPTCHA
| with a passion, but hCaptcha really takes the cake for
| the most painful captcha experience.
| minsc__and__boo wrote:
| If they're using the latest version (v3) of re-captcha, it's
| not hostile as it doesn't even have user interaction.
|
| https://www.youtube.com/watch?v=tbvxFW4UJdU
|
| It runs entirely in the background, and pretty much the only
| time you'll see a prompt is if you're using a VPN, Tor, or
| specifically block it.
| swiley wrote:
| >pretty much the only time you'll see a prompt is if you're
| using a VPN, Tor, or specifically block it.
|
| Or using a non Google browser or using an account that
| Google doesn't like (because they can't associate it with a
| real identity or whatever.)
| jraph wrote:
| Yes, "it runs entirely in the background" means it tracks
| the hell out of you, across websites.
|
| Basically you are blocked if you care about privacy and
| refuse this tracking.
|
| That's what I'm willing to call "hostile". I'd say, it's
| even worse than picking a few pictures, which is already
| hostile.
| iujjkfjdkkdkf wrote:
| That's about it. Horrible user experience - oh you're
| about to pay us, just click a few sidewalks first - and
| condescension of asking people to do a menial task that
| improves their ML models. But forcing you to use one of
| their sanctioned browsers and let them record what they
| want to is where the real hostility comes in. It's
| exercising monopoly power to squeeze more out of people
| and repress competition; I'd call that hostile.
| [deleted]
| rapnie wrote:
| Well, the site is probably not taking on Google all that much
| just yet, but they are interviewing Marko Saric who is.
|
| The Plausible landing page gives me zero cookies and the only
| requests are to plausible.io and testing.plausible.io
|
| https://plausible.io/
| elliekelly wrote:
| Off topic but I've noticed recently that I'm frequently forced
| to _incorrectly_ answer re-captchas the way a computer would in
| order to move forward.
|
| Some examples: "click all the tractors" showed I did not
| complete the task because of a photo of construction equipment;
| "click all the crosswalks" because I didn't select the photo of
| a thick white fence; "click all the traffic lights" because I
| didn't select a photo of a parking meter. I just clicked the
| incorrect photo so I could move on but I can't help but wonder
| if there's any mechanism to catch those incorrect (manual,
| human) annotations on the training data Google is collecting.
| ElijahLynn wrote:
| Live demo of the open-source Plausible Analytics here, so you can
| see the HN spike!
|
| https://plausible.io/plausible.io?period=day (39 current
| visitors)
| nopaintwat wrote:
| Really great to see a tech company with the motto "Don't be evil"
| hoerzu wrote:
| Oh the irony. The website is protected with Google Recaptcha
| untoxicness wrote:
| > The website is protected with Google Recaptcha
|
| Which website?
|
| The Plausible register page uses hCaptcha
| (https://plausible.io/register).
| Daho0n wrote:
| Plausible is still allowing DNS trickery for cross domain
| tracking as far as I can tell. This alone will keep us from ever
| trusting them. Only bad actors do this.
| lecarore wrote:
| The analytics have no direct benefits to an individual visitor,
| like ads, so I get why you'd block them. I myself don't care
| about showing up in the analytics of the websites I visit. But I
| pay for Plausible because they are way less intrusive, and they
| get added to the blocklist anyway. This doesn't encourage good
| behaviour. From a website owner perspective, if I don't
| circumvent the blockers I need a server side solution. It would
| be equivalent privacy wise, harder to set up but less visible.
| 0898 wrote:
| Great to see Plausible on HackerNews. It's one of the few pieces
| of software (Stripe is one, Starling another) that I deeply enjoy
| using. I get a good feeling when I open it up. I don't really
| have the UX vocabulary to explain it better than that
| unfortunately.
| camjohnson26 wrote:
| I've been using it for a while but it feels pretty light on the
| analytics so far, would be nice to see performance stats per
| page if that's possible in a privacy friendly way.
| markosaric wrote:
| You mean like a page drilldown to see stats of the individual
| page? You can do that already. On our live demo, click on any
| page in the Top Pages report and the dashboard will be
| segmented to only show the traffic that visited that
| particular page.
| octopoc wrote:
| What is Starling? I searched for it and found tons of things
| called that.
| 0898 wrote:
| Sorry. Starling Bank.
| frakkingcylons wrote:
| I feel like the title should be Taking on Google Analytics.
| Everyone associates Google with search, not so much website
| analytics. This title makes me think there's someone trying to
| unseat their position in search.
| ganeshkrishnan wrote:
| Google Analytics is the wrong end of Google. Sure you can get a
| few customers now and then who love privacy and will ditch GA.
|
| But for most, GA is how Google ads knows how to calculate
| conversions. People who want to use Google ads (which are
| everywhere) have to use GA. If you are not using Google Ads, I
| don't think Google cares much about your site anyway.
| z77dj3kl wrote:
| Seems a bit like Plausible only pays lip service to some of these
| ideas. Merely 5 months ago the co-founder touted here on HN
| how they are "big fans of open source so wanted as permissive [a]
| licence as possible" [0], then promptly went and changed the
| license to a strong copyleft one (AGPL) a few weeks later!
|
| They might well be the next Elastic/CockroachDB/MongoDB/etc. Or
| better yet, they might do the classic bait-and-switch later on:
| get developer buy in with a good story about openness, then once
| they'd gotten enough of a customer (aka dev) share, do the
| switch.
|
| [0]: https://news.ycombinator.com/item?id=24700565
| FearlessNebula wrote:
| Why is [a] in brackets?
| z77dj3kl wrote:
| To indicate I edited it (to make it grammatically correct in
| my sentence).
| IncRnd wrote:
| That looks like a grammatical correction.
| nightpool wrote:
| I'm not sure I understand the root of your complaint. You're
| saying that because the developers changed the license from a
| permissive license to a strong copyleft license, they're not
| supporting open source? I think that using a license like the
| AGPL is _much better_ for the open source community in the long
| run, because it makes it more likely that the code will stay
| free and accessible no matter what company wants to adapt it.
| elliekelly wrote:
| I don't really know the background here but it really bugs me
| when I see people arguing nefarious intent simply because
| someone changed their mind later. Is there a logical fallacy
| that addresses "allegations of flip flopping"?
|
| Sometimes people learn something new that changes things.
| Sometimes situations change and so the strategy needs to
| change. Sometimes people realize, for whatever reason, they
| were wrong and so they take steps to correct it. Do _some_
| people _sometimes_ flip flop for the purpose of misleading
| people or pandering? Of course. But I really don't think that's
| typically the motive. We should be supportive of people
| changing their minds, not suspicious.
| markosaric wrote:
| What's wrong with AGPL that doesn't fit with our ideas?
|
| We were on the MIT first and got into a situation where a large
| corporation wanted to take our code and resell it to tens of
| thousands of their customers and they made it clear they didn't
| want to contribute anything back to our project whatsoever.
|
| We are a two person team putting our own time and savings into
| this and it could have instantly killed the project and the
| chance of becoming sustainable.
|
| We changed the license and that was a simple way to stop them
| without changing our principles/ideas. Could have gone
| proprietary too at that stage but we didn't.
|
| Everything is clearly explained here
| https://plausible.io/blog/open-source-licenses
| Daho0n wrote:
| So pretty much the Elastic route as pointed out by GP.
| [deleted]
| BugsJustFindMe wrote:
| > _What's wrong with AGPL that doesn't fit with our ideas?_
|
| Absolutely nothing. That person doesn't know what they're
| talking about.
|
| I am sorry to hear that you learned about the peril of a
| permissive license in the way you did, but I'm happy that you
| switched to strong copyleft. Arguments demanding permissive
| licensing instead of strong copyleft amount to saying "but
| then how will I stand on your neck?" You shouldn't have to
| put up with that.
___________________________________________________________________
(page generated 2021-03-31 23:01 UTC)