[HN Gopher] Open source Google Analytics replacement
___________________________________________________________________
Open source Google Analytics replacement
Author : samdung
Score : 131 points
Date : 2025-05-07 17:45 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| ray023 wrote:
| Well, obvious question: How does it compare to Plausible and all
| the other open source analytics.
| colesantiago wrote:
| Plausible is too needlessly expensive as one grows and it
| essentially punishes you for growing.
|
| And some features aren't available 1:1 with the CE version of
| Plausible either.
| bill_yang wrote:
| Yea, funnels are not open source for Plausible
| bill_yang wrote:
| Check out our demo at https://demo.rybbit.io/1. We have a lot
| more features than Plausible, but they're still presented in a
| way that is intuitive to use. You shouldn't need to read pages
| and pages of documentation to be able to set up funnels on
| rybbit, for example.
| AndrewStephens wrote:
| The documentation states that rybbit does not use cookies and is
| compliant with the GDPR. The first part is true but, looking at
| the code (very nice to have it available), the tracking is done
| by IP address, trading one piece of tracking data for another.
|
| I realize that this is probably the only way it could work but it
| is not clear to me that tracking by IP address (even over a
| single session and shredding the data once a day) is any better
| from a GDPR standpoint.
| 9283409232 wrote:
| I deal with GDPR daily and the truth is that GDPR enforcement
| doesn't understand what is acceptable from a GDPR standpoint
| and that is likely why they are in the process of revamping it.
| You can also anonymize data and that is no longer considered
| personal data under GDPR so it is possible to hash an IP
| address and that be acceptable.
| Fraaaank wrote:
| > You can also anonymize data and that is no longer
| considered personal data under GDPR so it is possible to hash
| an IP address and that be acceptable.
|
| That's not completely true. Recital 26 of GDPR stipulates
| that
|
| > "information which does not relate to an identified or
| identifiable natural person or to personal data rendered
| anonymous in such a manner that the data subject is not or no
| longer identifiable."
|
| Hashing does not meet this threshold. If the same IP address
| is hashed using the same method, the result will always be
| the same, meaning it can be matched. Hashing is therefore
| considered pseudonymization and under GDPR, pseudonymized
| data is still considered personal data.
|
| Moreover, the act of anonymization itself is a form of
| processing and therefore falls under the scope of GDPR. So
| even attempting to anonymize personal data doesn't remove
| GDPR obligations for the anonymization itself.
| robbie-c wrote:
| Disclaimer: IANAL
|
| > If the same IP address is hashed using the same method,
| the result will always be the same, meaning it can be
| matched.
|
| The way people get around this is by using an ephemeral
| salt, that is deleted e.g. daily. After enough time has
| passed, it'd be impossible to reverse the hash as the salt
| would be lost.
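A minimal sketch of the ephemeral-salt idea described above, assuming a SHA-256 hash over the IP address and user agent. The class name `DailySaltHasher` and the choice of inputs are hypothetical and are not taken from Rybbit's or Plausible's code. The salt lives only in memory and is replaced when the date changes, so yesterday's identifiers cannot be recomputed afterwards.

```python
import hashlib
import secrets
from datetime import date


class DailySaltHasher:
    """Hypothetical helper: derives visitor IDs with a salt that is replaced daily."""

    def __init__(self):
        self._salt = secrets.token_bytes(32)
        self._salt_day = date.today()

    def _current_salt(self, today):
        # Discard the old salt as soon as the day changes; it is never stored.
        if today != self._salt_day:
            self._salt = secrets.token_bytes(32)
            self._salt_day = today
        return self._salt

    def visitor_id(self, ip, user_agent, today=None):
        today = today or date.today()
        salt = self._current_salt(today)
        digest = hashlib.sha256(salt + ip.encode() + b"|" + user_agent.encode())
        return digest.hexdigest()
```

Within one day the same visitor maps to the same ID, which is enough to count uniques; across days the IDs are unlinkable once the old salt is gone. Whether that satisfies GDPR is the legal question debated in this thread, not something the code settles.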
| rustc wrote:
| Plausible uses the same algorithm and they have a page
| written by a lawyer claiming this is GDPR compliant:
| https://plausible.io/blog/legal-assessment-gdpr-eprivacy
|
| Edit: Found more discussion here:
| https://github.com/plausible/analytics/discussions/1963#disc...
|
| > To summarize, I believe the EDPB has made their position
| very clear on this in their 2023 guidelines: Plausible's
| fingerprinting is subject to Article 5(3) of the ePD.
| Plausible has made their position very clear in their blog
| post, leaning in the other direction. Until this is tried
| out in court, I don't believe that there will be any
| definitive answer.
| dkga wrote:
| So IP is considered personal information?
| KronisLV wrote:
| It doesn't have that much in the way of fancy UI, but I found
| that Matomo allows you to both choose whether to use cookies /
| IP or maybe to cut off parts of the IP as well:
| https://matomo.org/faq/general/configure-privacy-settings-in...
|
| People seem to occasionally post cool new solutions, though it
| doesn't seem like Matomo has gotten that much attention,
| despite being a pretty strong alternative to Google Analytics
| (I haven't had that many issues while self-hosting it either).
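As an illustration of "cutting off parts of the IP", here is a small sketch that zeroes the trailing bytes of an IPv4 address before it is stored. The function name `mask_ipv4` and its default are assumptions made for this example; this is not Matomo's implementation.

```python
import ipaddress


def mask_ipv4(ip, bytes_to_mask=2):
    """Zero the last `bytes_to_mask` bytes of an IPv4 address (hypothetical helper)."""
    addr = int(ipaddress.IPv4Address(ip))
    # Build a mask that keeps only the leading bytes of the 32-bit address.
    mask = (0xFFFFFFFF << (8 * bytes_to_mask)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(addr & mask))


print(mask_ipv4("203.0.113.77", 1))  # 203.0.113.0
print(mask_ipv4("203.0.113.77", 2))  # 203.0.0.0
```

The more bytes are masked, the more visitors share one stored address, which trades counting accuracy for privacy.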
| keerthiko wrote:
| If the IP address is hashed somehow it would no longer be
| personally identifying while still being unique enough for
| analytics purposes, correct?
|
| Does geographic grouping data depend on the IP address? If so I
| suppose it would need to be extracted first before hashing the
| IP, and I wonder how much that weakens the anonymization.
| kevin_thibedeau wrote:
| You can hash every IPV4 for a rainbow table. Needs some salt.
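To make the point about unsalted hashes concrete, here is a sketch that recovers an IP address from its plain SHA-256 hash by trying every candidate. It searches only a /24 block so it runs instantly; the full IPv4 space is 2**32 addresses, which is small enough to enumerate or precompute on ordinary hardware. The function names and the choice of SHA-256 are assumptions for this example.

```python
import hashlib
import ipaddress


def unsalted_hash(ip):
    """Hash an IP address with no salt (the weak scheme being discussed)."""
    return hashlib.sha256(ip.encode()).hexdigest()


def recover_ip(target_hash, network):
    """Try every address in `network` until one hashes to `target_hash`."""
    for addr in ipaddress.ip_network(network):
        if unsalted_hash(str(addr)) == target_hash:
            return str(addr)
    return None


leaked = unsalted_hash("198.51.100.42")
print(recover_ip(leaked, "198.51.100.0/24"))  # 198.51.100.42
print(2 ** 32)  # 4294967296 possible IPv4 addresses in total
```

A secret salt defeats a precomputed table, but anyone who holds the salt can still run the same search, which is why the thread goes on to discuss discarding it daily.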
| dylan604 wrote:
| Okay, but that doesn't mean the concept is bad.
| lmkg wrote:
| Yes it does.
|
| If a user can say "here's my IP address, what data do you
| have on me?" and you can answer that question, then
| that's personal data under GDPR. It's pseudonymized, but
| not anonymized, and pseudonymous data is personal data.
| wizzwizz4 wrote:
| Even if _you_ can't answer that question, if it _can_ be
| answered, that's still personal data.
| dylan604 wrote:
| What's the minimum size of an operation before the GDPR
| kicks in? In other words, are all sites governed by GDPR,
| or are some companies considered too small to be under
| the GDPR regulations? I know that there are some
| regulations that get a pass for smaller outfits. I know
| nothing about GDPR as a European audience is not my
| target and not kowtowing for them.
| SquareWheel wrote:
| According to the author, Rybbit hashes IPs with a daily
| rotating salt.
|
| https://www.reddit.com/r/selfhosted/comments/1kgytl4/i_built...
| dkga wrote:
| Interesting!
| Apreche wrote:
| For me, the best Google analytics replacement has been nothing.
| Just don't do analytics at all. Your web site will still work
| without it. In fact, it will work better!
| dylan604 wrote:
| That's just not realistic though. People with marketing
| departments _need_ analytics. Otherwise, they atrophy and
| reveal to everyone they are not as necessary as led to believe.
| People without marketing departments probably never look at the
| logs like you.
| jsheard wrote:
| True, but for personal/hobby sites you probably are just
| better off just not knowing. Nothing good comes of tying your
| self-worth to how much attention you think you're getting.
| sneak wrote:
| There is nothing to suggest that people who want to measure
| (and perhaps increase) their publishing reach are "tying
| [their] self-worth to how much attention [they] think
| [they're] getting".
|
| This is sort of like assuming everyone who is taking photos
| at a tourist attraction is doing so to show off their
| holiday for social status.
|
| If your site or content is truly valuable, it is a public
| good to monitor, analyze, and improve upon its reach and
| usability.
| cortesoft wrote:
| I think most people are talking about for business websites
| mindcrash wrote:
| Once upon a time we did analytics and error analysis by running
| shell scripts executing awk, sed and grep over an Apache or
| nginx access log or error log.
|
| What I am trying to say is that you can still do analytics,
| even pretty advanced stuff with some more elaborate scripting,
| if you want. The only thing you need is the access log.
|
| Something which has been largely forgotten ever since tools
| like Urchin became a thing :)
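A rough sketch of access-log analytics in the spirit of the comment above, written in Python rather than awk, sed and grep. It assumes the common "combined" log format used by Apache and nginx by default; the sample lines and the `summarize` helper are invented for illustration.

```python
import re
from collections import Counter

# Matches the leading fields of a combined-format access log line.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def summarize(lines):
    """Count successful hits per path and collect the distinct client IPs."""
    hits = Counter()
    visitors = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that are not in the expected format
        if m["status"].startswith("2"):
            hits[m["path"]] += 1
            visitors.add(m["ip"])
    return hits, visitors


sample = [
    '203.0.113.5 - - [07/May/2025:17:45:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [07/May/2025:17:45:09 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '198.51.100.9 - - [07/May/2025:17:46:12 +0000] "GET / HTTP/1.1" 200 5120 "-" "curl/8.5.0"',
    '198.51.100.9 - - [07/May/2025:17:46:13 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.5.0"',
]
hits, visitors = summarize(sample)
print(hits.most_common())  # [('/', 2), ('/about', 1)]
print(len(visitors))       # 2
```

Counting distinct IPs is only a crude proxy for visitors, and the replies in this thread about caching and privacy law apply to this approach as well.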
| pc86 wrote:
| One of the greatest jobs I ever had from a technical
| perspective had terabytes of structured access logs hosted on
| prem inside of a VPN, with a few small bespoke tools to
| search through them (and many more pages of commands for
| common tasks not yet implemented in a UI).
|
| Not a single line of tracking or analytics on the front end,
| we just tracked everything we cared about at the server
| level.
| closewith wrote:
| And most likely a compliance and legal nightmare waiting to
| drop on a DPO one day.
| cptskippy wrote:
| > Urchin
|
| Urchin was acquired by Google and was ultimately sunset in
| favor of Google Analytics. It supported local and hybrid
| analytics models, the latter arguably evolved into Google
| Analytics.
| ordersofmag wrote:
| Except if any of your pages are cached between eyeball and
| your server and so your server logs don't capture everything
| that is going on. You can get fancy with web server logs, but
| depending on what you're trying to understand it may not be
| the data you need.
|
| <source: did fancy things with logs over the last 25 years,
| including running multiple tools on the same site in parallel
| to do comparisons (Analog, AWStats, Urchin, GA, Omniture,
| homegrown, etc...)>
| hinkley wrote:
| This is how you end up with no-cache assets on pages so
| they can keep track of actual traffic.
| codingdave wrote:
| If you control the cache layer, log it there. If you don't
| control the cache layer, does a read from the end user
| cache really count as a separate visit anyway?
| ordersofmag wrote:
| There are plenty of situations where someone visiting a
| page once and someone repeatedly looking at that page
| over a period of days (even if it is pulled from their
| browser cache) is an important difference. Obviously it
| depends on what you're using the data to try to
| understand.
| closewith wrote:
| However, if you do this, you will still need to comply with
| all relevant privacy laws.
|
| For example, in the EU, you need user consent to use server
| logs that include IP addresses for analytics. You also need
| to provide post-consent opt-outs and privacy statements and
| audit logs and all of a sudden you're building another
| analytics tool.
| paxys wrote:
| Such a product will work fantastic until you get your first
| user.
| autoexec wrote:
| If people insist on tracking users with analytics, the least
| folks can do is use something other than google to do it.
| indiantinker wrote:
| Umami works for me. I just want that dopamine kick that someone
| clicked on my page so I don't feel lonely on the internet.
| bitbasher wrote:
| It was only a bot, but if it makes you feel better... :)
| kull wrote:
| Why not Matomo?
| tacker2000 wrote:
| Upvote for matomo!
|
| This project here looks interesting, but is quite new. Let's see
| how it evolves in the future.
| ordersofmag wrote:
| Matomo is an evolution of Piwik which was first released in
| 2007. So not 'quite new'.
| tacker2000 wrote:
| I'm talking about the project OP posted, not Matomo.
| karolist wrote:
| I'm hosting my blog on Cloudflare Pages, its analytics show 80
| or so uniques every day consistently even though I barely write
| there. Installed Umami - 0 visitors. None. Internet is just LLM
| crawlers hungry for content now?
| lmkg wrote:
| We passed the tipping point where bot traffic outnumbered human
| traffic _fifteen years ago_. LLMs are an order of magnitude
| worse by most first-hand accounts, but it's just a
| continuation of a very long trend.
| sltr wrote:
| I see this too on my CF Pages-hosted blog.
|
| Analytics only work if the agent runs JS. CF on the other hand
| counts file fetches, which can't be circumvented.
|
| There's always a baseline of bot traffic.
| karolist wrote:
| ah, that explains it, I think. I expected them to sessionize
| the file transfers under one unique somehow still, even
| without JS.
| nm980 wrote:
| The market for Google Analytics alternatives is crowded. There's
| Plausible, Ahrefs web analytics, onedollarstats.com, PostHog,
| Matomo, Umami, Grafana, Microsoft Clarity (free at any scale),
| and so many others. Despite minor differences these products all
| compete for the same users (e.g. if someone is a PostHog customer
| they probably won't be using Ahrefs web analytics) yet most of
| these companies offer generous free tiers while rybbit only
| offers a free trial.
|
| How do products like rybbit.io stay competitive without a similar
| free tier or major differentiation? Is rybbit generating revenue
| for its hosted plan?
| neves wrote:
| Are these open source and locally hosted? Or you must share
| your data with a big corporation to use them?
| pc86 wrote:
| Is sharing your data with a startup or small company any
| better than sharing it with a big corporation?
| haswell wrote:
| Potentially yes, but depends very much on the privacy
| policy and data handling promises being made.
|
| I think the instinct to distrust big companies is at least
| partly because many of them have already proven not to be
| good stewards of data which when combined with their scale
| has more worrisome implications.
|
| With a smaller/newer player, at least there's some hope
| that they're not capable of the same harms at a smaller
| scale, and in some cases may market themselves specifically
| as a more private alternative.
|
| Whether or not this turns out to be true in practice and
| over the long run is another thing.
| betterThanTexas wrote:
| Hope ain't the same thing as trust, though. A small
| player would need to make a pretty significant effort to
| suggest they wouldn't abuse your usage-patterns.
| dec0dedab0de wrote:
| It's open source and locally hosted, you don't have to
| share your data with anyone.
| nm980 wrote:
| PostHog and Plausible are both open source and not backed by
| big corporations but if sharing data to third parties and
| being open source is a concern (which seems to be the selling
| point rybbit.io is targeting) I would expect users to self
| host instead of paying for a hosted plan anyways?
| betterThanTexas wrote:
| > Or you must share your data with a big corporation to use
| them?
|
| I'm choking on the irony
| steviedotboston wrote:
| Clarity is more of a Hotjar competitor, right?
| nm980 wrote:
| It also tracks page views, referrers, geographic location,
| and other analytics common to rybbit
| luckylion wrote:
| Grafana isn't a Google Analytics alternative. You can build a
| lot of what you need with it (I've done that), but you still
| need to manage the actual Analytics part separately, Grafana
| only gives you the visualization.
|
| It's okay, but I probably wouldn't choose it again. The ease of
| setting up Dashboards and Panels is great at first, but you pay
| for it with a low ceiling of what you can do (without building
| around it) and a "we trust everyone" approach to security.
| betterThanTexas wrote:
| > actual Analytics
|
| I've never used google analytics before. What's the marginal
| value over statsd?
| xyst wrote:
| Posthog is pretty good but very pushy towards using their SaaS
| (understandably). Self hosting is not really advertised on
| their main site however is buried in their gh repo as a
| footnote [1] with indications of vague issues past 100K
| events/month. Haven't delved into how to scale it past that
| though and they do provide some docs that I have yet to review.
|
| Also the primary repo is not FOSS, and that "100% FOSS" repo is
| buried in yet another footnote [2].
|
| Plausible follows in PH footsteps but is not fully faithful to
| open source. If you want to self host, you won't have same set
| of features as their SaaS and need to rely on long term
| releases for their "community edition" [3]
|
| On "Ahrefs", is there even an open source version of their
| product? I couldn't easily find it (on mobile). [4]
|
| Maybe I'll take a look at others you mentioned later but if
| rybbit can remain faithful to their FOSS roots then I think
| there's a real chance of it becoming huge.
|
| For those that don't want to self host (mostly corporate
| shitholes), rybbit can milk them with their managed SaaS
| product.
|
| [1] https://github.com/PostHog/posthog?tab=readme-ov-file#self-h...
|
| [2] https://github.com/PostHog/posthog?tab=readme-ov-file#open-s...
|
| [3] https://github.com/plausible/analytics?tab=readme-ov-file#ca...
|
| [4] https://ahrefs.com/
| nm980 wrote:
| > "Self hosting is not really advertised on their main site"
|
| How would rybbit.io make money if they are only better at
| self hosting? Wouldn't the users they are targeting only self
| host anyways?
|
| > "On "Ahrefs", is there even an open source version of their
| product? I couldn't easily find it (on mobile)."
|
| Not all of these companies are open source but they are still
| competitors because they have generous free tiers so the cost
| of self hosting an alternative wouldn't be justified.
| bill_yang wrote:
| I think Posthog is incredible, and there's no way I (it's
| just been me building rybbit for the past few months) will be
| able to compete with them on their full scope of features for
| the foreseeable future.
|
| I tried to self host Posthog for my other project as it far
| exceeded even the generous free tier. I have a Hetzner bare
| metal server with 64gb of ram
| https://www.hetzner.com/dedicated-rootserver/ax42/ and it was
| running all 16 cores at 100% and didn't end up working. So I
| think Posthog's stack is just way too heavy to self host
| effectively, and it's just not in the same category as
| Plausible, Umami, or Rybbit.
|
| I'm trying to build the best OSS analytics out there - and even
| though it's super crowded, most non-trivial websites run one
| so there is space for everyone to survive in.
| openplatypus wrote:
| As a founder in this space, it's not as bad as you think. There
| are niches in this crowded yet broad space.
|
| Plausible - good for self-hosting, but their SaaS is very
| expensive and FOSS vs SaaS offering differ.
|
| Ahrefs - they will use your traffic to improve your competitor
| research, you really should use them cautiously.
|
| Matomo - feature rich but can be overwhelming.
|
| Posthog - its SaaS is US based so dismissed early by EU
| customers.
|
| Clarity, like GA has serious privacy issues.
|
| Our product, Wide Angle Analytics, has its own gotchas compared
| to competitors - it's opinionated and there are folks who do not
| agree with our opinions, but the landscape of websites is so
| vast that you find your client nevertheless.
|
| That said, we are still in business after 4 years, and we saw
| a few competitors disappear or get acquired and extinguished.
|
| So, all the best to the OP. Hope you find your niche :)
| meander_water wrote:
| There's a bunch more listed here as well
| https://github.com/oxnr/awesome-analytics
| dec0dedab0de wrote:
| It's open source, why would you also need a free tier for
| hosting?
| bill_yang wrote:
| Builder of rybbit here - I will probably add a free tier in the
| following weeks. I didn't because I was scared of being
| overloaded by an influx of free users, but that doesn't scare
| me anymore.
|
| I started working on this 4 months ago and only publicly
| launched a few days ago.
|
| As for monetization, I have no idea yet. I'm happy to collect
| stars for the time being. What do you think I should do?
| nadermx wrote:
| If you don't want to roll your own and don't care if it's open
| source, I've used clicky.com for years. Simple, and shows
| everything I need. As others have said, it's a crowded market.
| Still cool though that people are launching these projects.
| dhosek wrote:
| There were a gajillion of these things before Google Analytics.
| Probably the best options were those that relied on log analysis
| rather than having a JavaScript bug on every page.
| bill_yang wrote:
| Hey I built this! I was meaning to launch Rybbit on Show HN
| tomorrow morning but I guess you beat me to it haha.
| codazoda wrote:
| Because I like minimalist tools, onedollarstats.com looks
| interesting to me. I can't find much info about their privacy
| posture (which prevents me from using Google Analytics). I use my
| own counter, but it's got very limited features.
___________________________________________________________________
(page generated 2025-05-07 23:00 UTC)