[HN Gopher] Open source Google Analytics replacement
       ___________________________________________________________________
        
       Open source Google Analytics replacement
        
       Author : samdung
       Score  : 131 points
       Date   : 2025-05-07 17:45 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | ray023 wrote:
       | Well, obvious question: How does it compare to Plausible and all
       | the other open source analytics tools?
        
         | colesantiago wrote:
         | Plausible is too needlessly expensive as one grows and it
         | essentially punishes you for growing.
         | 
         | And some features aren't available 1:1 with the CE version of
         | Plausible either.
        
           | bill_yang wrote:
           | Yea, funnels are not open source for Plausible
        
         | bill_yang wrote:
         | Check out our demo at https://demo.rybbit.io/1. We have a lot
         | more features than Plausible, but they're still presented in a
         | way that is intuitive to use. You shouldn't need to read pages
         | and pages of documentation to be able to set up funnels on
         | rybbit, for example.
        
       | AndrewStephens wrote:
       | The documentation states that rybbit does not use cookies and is
       | compliant with the GDPR. The first part is true but, looking at
       | the code (very nice to have it available), the tracking is done
       | by IP address, trading one piece of tracking data for another.
       | 
       | I realize that this is probably the only way it could work but it
       | is not clear to me that tracking by IP address (even over a
       | single session and shredding the data once a day) is any better
       | from a GDPR standpoint.
        
         | 9283409232 wrote:
         | I deal with GDPR daily and the truth is that GDPR enforcement
         | doesn't understand what is acceptable from a GDPR standpoint
         | and that is likely why they are in the process of revamping it.
         | You can also anonymize data and that is no longer considered
         | personal data under GDPR so it is possible to hash an IP
         | address and that be acceptable.
        
           | Fraaaank wrote:
           | > You can also anonymize data and that is no longer
           | considered personal data under GDPR so it is possible to hash
           | an IP address and that be acceptable.
           | 
           | That's not completely true. Recital 26 of GDPR describes
           | anonymous information as
           | 
           | > "information which does not relate to an identified or
           | identifiable natural person or to personal data rendered
           | anonymous in such a manner that the data subject is not or no
           | longer identifiable."
           | 
           | Hashing does not meet this threshold. If the same IP address
           | is hashed using the same method, the result will always be
           | the same, meaning it can be matched. Hashing is therefore
           | considered pseudonymization, and under GDPR, pseudonymized
           | data is still considered personal data.
           | 
           | Moreover, the act of anonymization itself is a form of
           | processing and therefore falls under the scope of GDPR. So
           | even attempting to anonymize personal data doesn't remove
           | GDPR obligations for the anonymization itself.
        
             | robbie-c wrote:
             | Disclaimer: IANAL
             | 
             | > If the same IP address is hashed using the same method,
             | the result will always be the same, meaning it can be
             | matched.
             | 
             | The way people get around this is by using an ephemeral
             | salt, that is deleted e.g. daily. After enough time has
             | passed, it'd be impossible to reverse the hash as the salt
             | would be lost.
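A minimal Python sketch of the ephemeral-salt idea, assuming the salt lives only in process memory and is replaced the first time a request arrives on a new day. The class name `DailySaltedHasher` and the choice to mix a site ID and user agent into the hash are illustrative assumptions, not Rybbit's or Plausible's actual code; HMAC-SHA256 with the salt as the key is used as the keyed hash.

```python
import hashlib
import hmac
import secrets
from datetime import date
from typing import Optional


class DailySaltedHasher:
    """Hypothetical helper: hash visitor identifiers with a salt replaced daily."""

    def __init__(self) -> None:
        self._day: Optional[date] = None
        self._salt: bytes = b""

    def _rotate_if_needed(self, today: date) -> None:
        if today != self._day:
            # Yesterday's salt is overwritten and never persisted, so
            # yesterday's IDs cannot be recomputed from an IP later.
            self._salt = secrets.token_bytes(32)
            self._day = today

    def visitor_id(self, ip: str, user_agent: str, site_id: str, today: date) -> str:
        self._rotate_if_needed(today)
        message = f"{site_id}|{ip}|{user_agent}".encode()
        # Keyed hash: without the (discarded) salt, enumeration attacks fail.
        return hmac.new(self._salt, message, hashlib.sha256).hexdigest()


hasher = DailySaltedHasher()
same_day_a = hasher.visitor_id("203.0.113.7", "Mozilla/5.0", "example.com", date(2025, 5, 7))
same_day_b = hasher.visitor_id("203.0.113.7", "Mozilla/5.0", "example.com", date(2025, 5, 7))
next_day = hasher.visitor_id("203.0.113.7", "Mozilla/5.0", "example.com", date(2025, 5, 8))
```

Within one day the same visitor maps to the same ID, which is what makes same-day unique counts possible; after rotation the old salt is gone. Note that the ID is still linkable for that day, which is the pseudonymization point raised above.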
        
             | rustc wrote:
             | Plausible uses the same algorithm and they have a page
             | written by a lawyer claiming this is GDPR compliant:
             | https://plausible.io/blog/legal-assessment-gdpr-eprivacy
             | 
             | Edit: Found more discussion here:
             | https://github.com/plausible/analytics/discussions/1963#disc...
             | 
             | > To summarize, I believe the EDPB has made their position
             | very clear on this in their 2023 guidelines: Plausible's
             | fingerprinting is subject to Article 5(3) of the ePD.
             | Plausible has made their position very clear in their blog
             | post, leaning in the other direction. Until this is tried
             | out in court, I don't believe that there will be any
             | definitive answer.
        
           | dkga wrote:
           | So IP is considered personal information?
        
         | KronisLV wrote:
         | It doesn't have that much in the way of fancy UI, but I found
         | that Matomo allows you to both choose whether to use cookies /
         | IP or maybe to cut off parts of the IP as well:
         | https://matomo.org/faq/general/configure-privacy-settings-in...
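A rough sketch of IP truncation (zeroing the low-order part of an address before it is stored), in the spirit of the Matomo setting linked above. This is not Matomo's implementation; the function name `truncate_ip` and the default IPv6 /48 prefix are assumptions chosen for illustration, built on Python's standard `ipaddress` module.

```python
import ipaddress


def truncate_ip(ip: str, v4_masked_bytes: int = 1, v6_prefix: int = 48) -> str:
    """Hypothetical helper: zero out the low-order bits of an IP address."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        if not 0 <= v4_masked_bytes <= 4:
            raise ValueError("v4_masked_bytes must be between 0 and 4")
        # Masking 1 byte keeps a /24, 2 bytes keeps a /16, and so on.
        prefix = 32 - 8 * v4_masked_bytes
    else:
        prefix = v6_prefix
    # strict=False lets us pass a host address and get its network back.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)
```

For example, `truncate_ip("203.0.113.77")` gives `203.0.113.0`. The more bytes are masked, the less precise any later geolocation of the stored address becomes.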
         | 
         | People seem to occasionally post cool new solutions, though it
         | doesn't seem like Matomo has gotten that much attention,
         | despite being a pretty strong alternative to Google Analytics
         | (I haven't had that many issues while self-hosting it either).
        
         | keerthiko wrote:
         | If the IP address is hashed somehow it would no longer be
         | personally identifying while still being unique enough for
         | analytics purposes, correct?
         | 
         | Does geographic grouping data depend on the IP address? If so I
         | suppose it would need to be extracted first before hashing the
         | IP, and I wonder how much that weakens the anonymization.
        
           | kevin_thibedeau wrote:
           | You can hash every IPv4 address into a rainbow table. Needs
           | some salt.
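To make the rainbow-table point concrete: without a salt, a stored IP hash can be reversed just by hashing candidate addresses until one matches. The sketch below searches a single /24 to stay fast; the same loop over `0.0.0.0/0` covers all 2^32 (about 4.3 billion) IPv4 addresses. The function names are hypothetical, and plain unsalted SHA-256 is assumed as the "hashed IP" being attacked.

```python
import hashlib
import ipaddress
from typing import Optional


def unsalted_hash(ip: str) -> str:
    # Deterministic: the same IP always produces the same digest.
    return hashlib.sha256(ip.encode()).hexdigest()


def reverse_by_enumeration(target: str, search_space: str) -> Optional[str]:
    """Try every address in a network until one hashes to the target."""
    for addr in ipaddress.ip_network(search_space):
        candidate = str(addr)
        if unsalted_hash(candidate) == target:
            return candidate
    return None


stored = unsalted_hash("198.51.100.23")
recovered = reverse_by_enumeration(stored, "198.51.100.0/24")
```

This is also why "here's my IP address, what data do you have on me?" remains answerable: the operator can simply hash the IP the user supplies and look it up.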
        
             | dylan604 wrote:
             | Okay, but that doesn't mean the concept is bad.
        
               | lmkg wrote:
               | Yes it does.
               | 
               | If a user can say "here's my IP address, what data do you
               | have on me?" and you can answer that question, then
               | that's personal data under GDPR. It's pseudonymized, but
               | not anonymized, and pseudonymous data is personal data.
        
               | wizzwizz4 wrote:
               | Even if _you_ can't answer that question, if it _can_ be
               | answered, that's still personal data.
        
               | dylan604 wrote:
               | What's the minimum size of an operation before the GDPR
               | kicks in? In other words, are all sites governed by GDPR,
               | or are some companies considered too small to be under
               | the GDPR regulations? I know that there are some
               | regulations that get a pass for smaller outfits. I know
               | nothing about GDPR, as a European audience is not my
               | target and I'm not kowtowing to them.
        
             | SquareWheel wrote:
             | According to the author, Rybbit hashes IPs with a daily
             | rotating salt.
             | 
             | https://www.reddit.com/r/selfhosted/comments/1kgytl4/i_built...
        
       | dkga wrote:
       | Interesting!
        
       | Apreche wrote:
       | For me, the best Google analytics replacement has been nothing.
       | Just don't do analytics at all. Your web site will still work
       | without it. In fact, it will work better!
        
         | dylan604 wrote:
         | That's just not realistic though. People with marketing
         | departments _need_ analytics. Otherwise, they atrophy and
         | reveal to everyone they are not as necessary as led to believe.
         | People without marketing departments probably never look at the
         | logs like you.
        
           | jsheard wrote:
           | True, but for personal/hobby sites you are probably better
           | off just not knowing. Nothing good comes of tying your
           | self-worth to how much attention you think you're getting.
        
             | sneak wrote:
             | There is nothing to suggest that people who want to measure
             | (and perhaps increase) their publishing reach are "tying
             | [their] self-worth to how much attention [they] think
             | [they're] getting".
             | 
             | This is sort of like assuming everyone who is taking photos
             | at a tourist attraction is doing so to show off their
             | holiday for social status.
             | 
             | If your site or content is truly valuable, it is a public
             | good to monitor, analyze, and improve upon its reach and
             | usability.
        
             | cortesoft wrote:
             | I think most people are talking about for business websites
        
         | mindcrash wrote:
         | Once upon a time we did analytics and error analysis by running
         | shell scripts executing awk, sed and grep over an Apache or
         | nginx access log or error log.
         | 
         | What I am trying to say is that you can still do analytics,
         | even pretty advanced stuff with some more elaborate scripting,
         | if you want. The only thing you need is the access log.
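As a rough illustration of log-only analytics, here is a Python sketch that parses the Apache/nginx "combined" log format and counts successful GET requests per path. It does roughly what `awk '{print $7}' access.log | sort | uniq -c | sort -rn` does, plus method and status filtering. It assumes the default combined format; a custom `log_format` would need a different regex, and the function name `top_pages` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Default "combined" format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_pages(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count successful GETs per path, ignoring query strings."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m or m["method"] != "GET" or not m["status"].startswith("2"):
            continue  # skip unparsable lines, non-GETs and non-2xx responses
        counts[m["path"].split("?", 1)[0]] += 1
    return counts.most_common(n)


sample = [
    '203.0.113.7 - - [07/May/2025:17:45:00 +0000] "GET /blog/post?ref=hn HTTP/1.1" 200 512 "https://news.ycombinator.com/" "Mozilla/5.0"',
    '203.0.113.8 - - [07/May/2025:17:46:00 +0000] "GET /blog/post HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [07/May/2025:17:47:00 +0000] "GET /about HTTP/1.1" 200 128 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [07/May/2025:17:48:00 +0000] "POST /login HTTP/1.1" 200 64 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [07/May/2025:17:49:00 +0000] "GET /missing HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
    'not a log line',
]
```

`top_pages(sample)` returns `[("/blog/post", 2), ("/about", 1)]`. Note that these logs contain IP addresses, so the privacy points raised below in the thread still apply.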
         | 
         | Something which has been largely forgotten ever since tools
         | like Urchin became a thing :)
        
           | pc86 wrote:
           | One of the greatest jobs I ever had from a technical
           | perspective had terabytes of structured access logs hosted on
           | prem inside of a VPN, with a few small bespoke tools to
           | search through them (and many more pages of commands for
           | common tasks not yet implemented in a UI).
           | 
           | Not a single line of tracking or analytics on the front end,
           | we just tracked everything we cared about at the server
           | level.
        
             | closewith wrote:
             | And most likely a compliance and legal nightmare waiting to
             | drop on a DPO one day.
        
           | cptskippy wrote:
           | > Urchin
           | 
           | Urchin was acquired by Google and was ultimately sunset in
           | favor of Google Analytics. It supported local and hybrid
           | analytics models, the latter arguably evolved into Google
           | Analytics.
        
           | ordersofmag wrote:
           | Except if any of your pages are cached between eyeball and
           | your server and so your server logs don't capture everything
           | that is going on. You can get fancy with web server logs, but
           | depending on what you're trying to understand it may not be
           | the data you need.
           | 
           | <source: did fancy things with logs over the last 25 years,
           | including running multiple tools on the same site in parallel
           | to do comparisons (Analog, AWStats, Urchin, GA, Omniture,
           | homegrown, etc...)>
        
             | hinkley wrote:
             | This is how you end up with no-cache assets on pages so
             | they can keep track of actual traffic.
        
             | codingdave wrote:
             | If you control the cache layer, log it there. If you don't
             | control the cache layer, does a read from the end user
             | cache really count as a separate visit anyway?
        
               | ordersofmag wrote:
               | There are plenty of situations where someone visiting a
               | page once and someone repeatedly looking at that page
               | over a period of days (even if it is pulled from their
               | browser cache) is an important difference. Obviously it
               | depends on what you're using the data to try to
               | understand.
        
           | closewith wrote:
           | However, if you do this, you will still need to comply with
           | all relevant privacy laws.
           | 
           | For example, in the EU, you need user consent to use server
           | logs that include IP addresses for analytics. You also need
           | to provide post-consent opt-outs and privacy statements and
           | audit logs and all of a sudden you're building another
           | analytics tool.
        
         | paxys wrote:
         | Such a product will work fantastic until you get your first
         | user.
        
       | autoexec wrote:
       | If people insist on tracking users with analytics, the least
       | folks can do is use something other than google to do it.
        
       | indiantinker wrote:
       | Umami works for me. I just want that dopamine kick that someone
       | clicked on my page so I don't feel lonely on the internet.
        
         | bitbasher wrote:
         | It was only a bot, but if it makes you feel better... :)
        
       | kull wrote:
       | Why not Matomo?
        
         | tacker2000 wrote:
         | Upvote for matomo!
         | 
         | This project here looks interesting, but is quite new. Let's see
         | how it evolves in the future.
        
           | ordersofmag wrote:
           | Matomo is an evolution of Piwik which was first released in
           | 2007. So not 'quite new'.
        
             | tacker2000 wrote:
             | I'm talking about the project OP posted, not Matomo.
        
       | karolist wrote:
       | I'm hosting my blog on Cloudflare Pages; its analytics show 80
       | or so uniques every day consistently even though I barely write
       | there. Installed Umami - 0 visitors. None. Internet is just LLM
       | crawlers hungry for content now?
        
         | lmkg wrote:
         | We passed the tipping point where bot traffic outnumbered human
         | traffic _fifteen years ago_. LLMs are an order of magnitude
         | worse by most first-hand accounts, but it's just a
         | continuation of a very long trend.
        
         | sltr wrote:
         | I see this too on my CF Pages-hosted blog.
         | 
         | Analytics only work if the agent runs JS. CF on the other hand
         | counts file fetches, which can't be circumvented.
         | 
         | There's always a baseline of bot traffic.
        
           | karolist wrote:
           | ah, that explains it, I think. I expected them to sessionize
           | the file transfers under one unique somehow still, even
           | without JS.
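On the sessionizing question: a common heuristic for JS-free request logs is to treat each (IP, user agent) pair as one visitor and start a new session after 30 minutes of inactivity. The sketch below assumes that heuristic; the 30-minute gap, the `Hit` type and the keyword-based bot check are illustrative choices, not what Cloudflare or Umami actually do.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Crude, illustrative substrings that self-identified crawlers often use.
BOT_MARKERS = ("bot", "crawler", "spider")


@dataclass(frozen=True)
class Hit:
    ip: str
    user_agent: str
    when: datetime
    path: str


def looks_like_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def sessionize(hits: Iterable[Hit], gap: timedelta = timedelta(minutes=30)) -> List[List[Hit]]:
    """Group hits by (ip, user_agent), splitting after `gap` of inactivity."""
    sessions: List[List[Hit]] = []
    open_sessions: Dict[Tuple[str, str], List[Hit]] = {}
    for hit in sorted(hits, key=lambda h: h.when):
        key = (hit.ip, hit.user_agent)
        current = open_sessions.get(key)
        if current is None or hit.when - current[-1].when > gap:
            current = []  # start a new session for this visitor key
            sessions.append(current)
            open_sessions[key] = current
        current.append(hit)
    return sessions


def human_session_count(hits: Iterable[Hit]) -> int:
    return sum(1 for s in sessionize(hits) if not looks_like_bot(s[0].user_agent))


t0 = datetime(2025, 5, 7, 12, 0)
hits = [
    Hit("203.0.113.7", "Mozilla/5.0", t0, "/"),
    Hit("203.0.113.7", "Mozilla/5.0", t0 + timedelta(minutes=5), "/style.css"),
    Hit("203.0.113.7", "Mozilla/5.0", t0 + timedelta(hours=2), "/"),
    Hit("198.51.100.1", "ExampleBot/1.0", t0, "/"),
]
```

Here the page and stylesheet fetches collapse into one session, the visit two hours later becomes a second one, and the self-declared bot is excluded from the human count. Crawlers that spoof browser user agents would slip through, which is one reason fetch-based and JS-based numbers diverge.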
        
       | nm980 wrote:
       | The market for Google Analytics alternatives is crowded. There's
       | Plausible, Ahrefs web analytics, onedollarstats.com, PostHog,
       | Matomo, Umami, Grafana, Microsoft Clarity (free at any scale),
       | and so many others. Despite minor differences, these products all
       | compete for the same users (e.g. if someone is a PostHog customer
       | they probably won't be using Ahrefs web analytics), yet most of
       | these companies offer generous free tiers while rybbit offers only
       | a free trial.
       | 
       | How do products like rybbit.io stay competitive without a similar
       | free tier or major differentiation? Is rybbit generating revenue
       | for its hosted plan?
        
         | neves wrote:
         | Are these open source and locally hosted? Or you must share
         | your data with a big corporation to use them?
        
           | pc86 wrote:
           | Is sharing your data with a startup or small company any
           | better than sharing it with a big corporation?
        
             | haswell wrote:
             | Potentially yes, but depends very much on the privacy
             | policy and data handling promises being made.
             | 
             | I think the instinct to distrust big companies is at least
             | partly because many of them have already proven not to be
             | good stewards of data which when combined with their scale
             | has more worrisome implications.
             | 
             | With a smaller/newer player, at least there's some hope
             | that they're not capable of the same harms at a smaller
             | scale, and in some cases may market themselves specifically
             | as a more private alternative.
             | 
             | Whether or not this turns out to be true in practice and
             | over the long run is another thing.
        
               | betterThanTexas wrote:
               | Hope ain't the same thing as trust, though. A small
               | player would need to make a pretty significant effort to
               | suggest they wouldn't abuse your usage-patterns.
        
             | dec0dedab0de wrote:
             | It's open source and locally hosted, you don't have to
             | share your data with anyone.
        
           | nm980 wrote:
           | PostHog and Plausible are both open source and not backed by
           | big corporations but if sharing data to third parties and
           | being open source is a concern (which seems to be the selling
           | point rybbit.io is targeting) I would expect users to self
           | host instead of paying for a hosted plan anyways?
        
           | betterThanTexas wrote:
           | > Or you must share your data with a big corporation to use
           | them?
           | 
           | I'm choking on the irony
        
         | steviedotboston wrote:
         | Clarity is more of a Hotjar competitor, right?
        
           | nm980 wrote:
           | It also tracks page views, referrers, geographic location,
           | and other analytics common to rybbit
        
         | luckylion wrote:
         | Grafana isn't a Google Analytics alternative. You can build a
         | lot of what you need with it (I've done that), but you still
         | need to manage the actual Analytics part separately, Grafana
         | only gives you the visualization.
         | 
         | It's okay, but I probably wouldn't choose it again. The ease of
         | setting up Dashboards and Panels is great at first, but you pay
         | for it with a low ceiling of what you can do (without building
         | around it) and a "we trust everyone" approach to security.
        
           | betterThanTexas wrote:
           | > actual Analytics
           | 
           | I've never used google analytics before. What's the marginal
           | value over statsd?
        
         | xyst wrote:
         | Posthog is pretty good but very pushy towards using their SaaS
         | (understandably). Self hosting is not really advertised on
         | their main site however is buried in their gh repo as a
         | footnote [1] with indications of vague issues past 100K
         | events/month. Haven't delved into how to scale it past that
         | though and they do provide some docs that I have yet to review.
         | 
         | Also the primary repo is not FOSS, and that "100% FOSS" repo is
         | buried in yet another footnote [2].
         | 
         | Plausible follows in PH footsteps but is not fully faithful to
         | open source. If you want to self host, you won't have same set
         | of features as their SaaS and need to rely on long term
         | releases for their "community edition" [3]
         | 
         | On "Ahrefs", is there even an open source version of their
         | product? I couldn't easily find it (on mobile). [4]
         | 
         | Maybe I'll take a look at others you mentioned later but if
         | rybbit can remain faithful to their FOSS roots then I think
         | there's a real chance of it becoming huge.
         | 
         | For those that don't want to self host (mostly corporate
         | shitholes), rybbit can milk them with their managed SaaS
         | product.
         | 
         | [1] https://github.com/PostHog/posthog?tab=readme-ov-file#self-h...
         | 
         | [2] https://github.com/PostHog/posthog?tab=readme-ov-file#open-s...
         | 
         | [3] https://github.com/plausible/analytics?tab=readme-ov-file#ca...
         | 
         | [4] https://ahrefs.com/
        
           | nm980 wrote:
           | > "Self hosting is not really advertised on their main site"
           | 
           | How would rybbit.io make money if they are only better at
           | self hosting? Wouldn't the users they are targeting only self
           | host anyways?
           | 
           | > "On "Ahrefs", is there even an open source version of their
           | product? I couldn't easily find it (on mobile)."
           | 
           | Not all of these companies are open source but they are still
           | competitors because they have generous free tiers so the cost
           | of self hosting an alternative wouldn't be justified.
        
           | bill_yang wrote:
           | I think Posthog is incredible, and there's no way I (it's
           | just been me building rybbit for the past few months) will be
           | able to compete with them on their full scope of features for
           | the foreseeable future.
           | 
           | I tried to self host Posthog for my other project as it far
           | exceeded even the generous free tier. I have a Hetzner bare
           | metal server with 64 GB of RAM
           | https://www.hetzner.com/dedicated-rootserver/ax42/ and it was
           | running all 16 cores at 100% and didn't end up working. So I
           | think Posthog's stack is just way too heavy to self host
           | effectively, and it's just not in the same category as
           | Plausible, Umami, or Rybbit.
           | 
           | I'm trying to build the best OSS analytics out there, and even
           | though it's super crowded, most non-trivial websites run one,
           | so there is space for everyone to survive in.
        
         | openplatypus wrote:
         | As a founder in this space, it's not as bad as you think. There
         | are niches in this crowded yet broad space.
         | 
         | Plausible - good for self-hosting, but their SaaS is very
         | expensive and FOSS vs SaaS offering differ.
         | 
         | Ahrefs - they will use your traffic to improve your competitor
         | research, you really should use them cautiously.
         | 
         | Matomo - feature rich but can be overwhelming.
         | 
         | Posthog - its SaaS is US based so dismissed early by EU
         | customers.
         | 
         | Clarity, like GA, has serious privacy issues.
         | 
         | Our product, Wide Angle Analytics, has its own gotchas compared
         | to competitors: it's opinionated, and there are folks who do not
         | agree with our opinions, but the landscape of websites is so
         | vast that you find your client nevertheless.
         | 
         | That said, we are still in business after 4 years, and we saw
         | a few competitors disappear or get acquired and extinguished.
         | 
         | So, all the best to the OP. Hope you find your niche :)
        
           | meander_water wrote:
           | There's a bunch more listed here as well
           | https://github.com/oxnr/awesome-analytics
        
         | dec0dedab0de wrote:
         | It's open source, why would you also need a free tier for
         | hosting?
        
         | bill_yang wrote:
         | Builder of rybbit here - I will probably add a free tier in the
         | following weeks. I didn't add one because I was scared of being
         | overloaded by an influx of free users, but that doesn't scare
         | me anymore.
         | 
         | I started working on this 4 months ago and only publicly
         | launched a few days ago.
         | 
         | As for monetization, I have no idea yet. I'm happy to collect
         | stars for the time being. What do you think I should do?
        
       | nadermx wrote:
       | If you don't want to roll your own and don't care if it's open
       | source, I've used clicky.com for years. Simple, and shows
       | everything I need. As others have said, it's a crowded market.
       | Still cool though that people are launching these projects.
        
       | dhosek wrote:
       | There were a gajillion of these things before Google Analytics.
       | Probably the best options were those that relied on log analysis
       | rather than having a JavaScript bug on every page.
        
       | bill_yang wrote:
       | Hey I built this! I was meaning to launch Rybbit on show HN
       | tomorrow morning but I guess you beat me to it haha.
        
       | codazoda wrote:
       | Because I like minimalist tools, onedollarstats.com looks
       | interesting to me. I can't find much info about their privacy
       | posture (which prevents me from using Google Analytics). I use my
       | own counter, but it's got very limited features.
        
       ___________________________________________________________________
       (page generated 2025-05-07 23:00 UTC)