hngopher.com

       [HN Gopher] Show HN: CLI that spots fake GitHub stars, risky dep...
       ___________________________________________________________________
        
       Show HN: CLI that spots fake GitHub stars, risky dependencies and
       licence traps
        
       When I came across a study that traced 4.5 million fake GitHub
       stars, it confirmed a suspicion I'd had for a while: stars are
       noisy. The issue is they're visible, they're persuasive, and they
       still shape hiring decisions, VC term sheets, and dependency
       choices--but they say very little about actual quality.

       I wrote StarGuard to put that number in perspective, using my own
       methodology inspired by what they did, and to fold a broader
       supply-chain check into one command-line run.

       It starts with the simplest raw input: every starred_at timestamp
       GitHub will give. It applies a median-absolute-deviation test to
       locate sudden bursts. For each spike, StarGuard pulls a random
       sample of the accounts behind it and asks: how old is the user?
       Any followers? Any contribution history? Still using the default
       avatar? From that, it computes a Fake Star Index, between 0
       (organic) and 1 (fully synthetic).

       But inflated stars are just one issue. In parallel, StarGuard
       parses dependency manifests or SBOMs and flags common risk signs:
       unpinned versions, direct Git URLs, lookalike package names. It
       also scans licences--AGPL sneaking into a repo claiming MIT, or
       other inconsistencies that can turn into compliance headaches.

       It checks contributor patterns too. If 90% of commits come from
       one person who hasn't pushed in months, that's flagged. It skims
       for obvious code red flags: eval calls, minified blobs, sketchy
       install scripts--because sometimes the problem is hiding in plain
       sight.

       All of this feeds into a weighted scoring model. The final Trust
       Score (0-100) reflects repo health at a glance, with direct
       penalties for fake-star behaviour, so a pretty README badge can't
       hide inorganic hype. For the fun of it, I also added a cool little
       badge generator for the trust score lol.

       Under the hood, it's all heuristics and a lot of GitHub API
       paging. Run it on any public repo with:

           python starguard.py owner/repo --format markdown

       It works without a token, but you'll hit rate limits sooner.

       Please provide any feedback you can.
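
       A rough sketch of what a median-absolute-deviation burst test
       over starred_at timestamps could look like. This is not
       StarGuard's actual code: the function names daily_counts and
       mad_bursts, the 3.5 cut-off, and the choice to bucket by calendar
       day are all assumptions made for illustration. The 0.6745 factor
       is the usual scaling used for the modified z-score.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def daily_counts(starred_at):
    """Count stars per calendar day from ISO-8601 timestamps.

    Days with no stars are simply absent (a simplification).
    """
    days = Counter(
        # Replace a trailing "Z" so older Python versions can parse it.
        datetime.fromisoformat(ts.replace("Z", "+00:00")).date()
        for ts in starred_at
    )
    return dict(sorted(days.items()))


def mad_bursts(counts, threshold=3.5):
    """Return keys whose count is a high outlier by modified z-score.

    threshold=3.5 is an assumed cut-off, not a value from the project.
    """
    values = list(counts.values())
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # The score is undefined when the MAD is zero; report nothing.
        return []
    return [
        day
        for day, v in counts.items()
        if 0.6745 * (v - med) / mad > threshold
    ]
```

       Usage would be something like
       mad_bursts(daily_counts(timestamps)), which returns the days
       worth sampling accounts from. Only upward deviations are
       reported, since a quiet day is not a burst.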
        
       Author : artski
       Score  : 89 points
       Date   : 2025-05-12 12:59 UTC (10 hours ago)
        
 (HTM) web link (github.com)
        
       | hungryhobbit wrote:
       | Dependencies: PyPI, Maven, Go, Ruby
       | 
       | This looks like a cool project, but why on earth would it need
       | Python, Java, Go, AND Ruby?
        
         | deltaknight wrote:
         | I think these are just the package managers that it supports
         | parsing dependencies for. The actual script seems to just be a
         | single python file.
         | 
         | It does seem like the repo is missing some files though; make
         | is mentioned in the README but no makefile and no list of
         | python dependencies for the script that I can see.
        
           | artski wrote:
           | Yeah, to be fair I need to clean it up. I was stuck testing
           | different strategies and making it work, and just wanted to
           | get feedback asap before moving on to the next step (didn't
           | want to spend too much time on something only to find out I
           | was badly wrong about it) - next step is to get it all
           | cleaned up.
        
         | 27theo wrote:
         | It doesn't need them, it parses SBOMs and manifests from their
         | ecosystems. I think you misunderstood this section of the
         | README.
         | 
         | > Dependencies | SBOM / manifest parsing across npm, PyPI,
         | Maven, Go, Ruby; flags unpinned, shadow, or non-registry deps.
         | 
         | The project seems like it only requires Python >= 3.9!
        
       | nottorp wrote:
       | Of course, github could just drop the stars, but everything has
       | to enshittify towards "engagement" and add social network
       | features.
       | 
       | Or users could ignore the stars and go old school and you know,
       | research their dependencies before they rely on them.
        
         | Vanclief wrote:
         | Stars are just a signal. When I am looking at multiple
         | libraries that do the same, I am going to trust a repo with
         | 200 stars more than one with 0. It's not perfect, but I don't
         | have the time to go through the entire codebase and try it out.
         | If the repo works for me I will star it to contribute to the
         | signal.
        
           | shlomo_z wrote:
           | If that works for you, great. I don't do that. I don't even
           | check how many stars it has.
           | 
           | I check the docs, features, and sometimes the code quality.
           | Sometimes I check the date of the last commit.
        
           | mlhpdx wrote:
           | I tend to put more attention on repos with 15-75 (ish) stars.
           | Less is something obscure or unproven maybe, and above ~500
           | is _much_ more likely to be BS /hype.
        
           | tough wrote:
           | I use stars for bookmarking purposes, i wouldn't care if they
           | go private but would miss the feature
        
             | aquariusDue wrote:
             | Same along with lists. I've got more than a thousand
             | starred repos by now.
        
               | tough wrote:
               | Sadly lists had a hard cap at 32 or 36 or something like
               | that.. i was too eager early with my specificity (have
               | lists w 1 repo) and now i cant make new ones (need to
               | delete others)
               | 
               | lol
               | 
               | found a couple non-maintained projects for managing them
               | 
               | https://github.com/astralapp/astral
               | https://github.com/gkze/gh-stars
        
         | benwilber0 wrote:
         | Github was a "social network" from its very beginning. The
         | whole premise was geared around git hosting and "social
         | coding". I don't think it became enshittified later since that
         | was the entire value proposition from day 1.
        
           | nottorp wrote:
           | Funny, I'm pretty sure I paid them just so I don't have to
           | maintain my own git hosting.
           | 
           | I never even noticed the stupid stars until they started
           | being mentioned on HN.
        
             | rafram wrote:
             | See the tagline under the logo, May 14, 2008:
             | https://web.archive.org/web/20080514210148/http://github.com...
        
               | nottorp wrote:
               | I'm sorry, but I never read the main github site. I only
               | spend a few seconds on it when my login expires and I
               | need my repository list.
               | 
               | 99.99999% of my interaction is via git pull and push :)
        
             | eikenberry wrote:
             | There are tons of places you can use for simple git
             | hosting. The only reason to use github over the others is
             | due to the social factors. Because everyone already has an
             | account on it so they can easily file issues, PRs, etc. For
             | simple git hosting, github leaves a lot to be desired.
        
             | _bin_ wrote:
             | You may like Drew Devault's https://sr.ht more
        
       | Am4TIfIsER0ppos wrote:
       | What is a license trap? This "AGPL sneaking into a repo claiming
       | MIT"? Isn't that just a plain old license violation?
        
         | artski wrote:
         | Basically what I mean by it is for example a repository appears
         | to be under a permissive license like MIT, Apache, or BSD, but
         | actually includes code that's governed by a much stricter or
         | viral license--like GPL or AGPL--often buried in a
         | subdirectory, dependency, or embedded snippet. The problem is,
         | if you reuse or build on that code assuming it's fully
         | permissive, you could end up violating the terms of the
         | stricter license without realising it. It's a trap because the
         | original authors might have mixed incompatible licenses,
         | knowingly or not, and the legal risk then falls on downstream
           | users. So yeah, essentially a plain old license violation,
           | just one that's relatively easy to miss or not think about.
        
           | tough wrote:
           | oh interesting you put a word on it, most of the VC funded
           | FOSS -open- core apps/saas that have popped up in the past
           | years are like this
           | 
           | the /ee folders are a disgrace
        
         | tough wrote:
         | they get around it by licensing differently only packages /
         | parts of the codebase
        
       | the__alchemist wrote:
       | > It checks contributor patterns too. If 90% of commits come from
       | one person who hasn't pushed in months, that's flagged.
       | 
       | IMO this is a slight green flag; not red.
        
         | lispisok wrote:
         | It's gonna flag most of the clojure ecosystem
        
           | throwaway150 wrote:
           | Yep, and it's not just Clojure. This will end up flagging
           | projects across all non-mainstream ecosystems. Whether it's
           | Vim plugins, niche command-line tools, academic research
           | code, or hobbyist libraries for things like game development
           | or creative coding, they'll likely get flagged simply because
           | they're often maintained by individual developers. These devs
           | build the projects, iterate quickly in the early stages, and
           | eventually reach a point where the code is stable and no
           | longer needs frequent updates.
           | 
           | It's a shame that this tool penalizes such projects, which I
           | think are vital to a healthy open source ecosystem.
           | 
           | It's a nice project otherwise. But flagging stable projects
           | from solo developers really sticks out like a sore thumb. :(
        
             | artski wrote:
             | It would still count as "trustworthy", it just wouldn't come
             | out to 100/100 :(.
        
         | sethops1 wrote:
         | I have to agree - the highest quality libraries in my
         | experience are the ones maintained by one dedicated person as
         | their pet project. There's no glory, no money, no large
         | community, no Twitter followers - just a person with a problem
         | to solve and making the solution open source for the benefit of
         | others.
        
         | artski wrote:
         | Fair take--it's definitely context-dependent. In some cases,
         | solo-maintainer projects can be great, especially if they're
         | stable or purpose-built. But from a trust and maintenance
         | standpoint, it's worth flagging as a signal: if 90% of commits
         | are from one person who's now inactive, it could mean slow
         | responses to bugs or no updates for security issues. Doesn't
         | mean the project is bad--just something to consider alongside
         | other factors.
         | 
         | Heuristics are never perfect and it's all iterative but it's
         | all about understanding the underlying assumptions and taking
         | the knowledge you get out of it with your own context. Probably
         | could enhance it slightly by a run through an LLM with a prompt
         | but I prefer to keep things purely statistical for now.
        
           | delfinom wrote:
           | The problem is your audience is:
           | 
           | > CTOs, security teams, and VCs automate open-source due
           | diligence in seconds.
           | 
           | The people that probably have less brain cells than the
           | average programmer to understand the nuance in the flagging.
        
             | artski wrote:
             | Lol yeah tbh - I just made it without really thinking of an
             | audience, just was looking for a project to work on till I
             | saw the paper and figured it would be cool to check it out
             | on some repositories out there. That part is just me asking
             | gpt to make the read me better.
        
           | 85392_school wrote:
           | It could also mean that the project is stable. Since you only
           | look at the one repository's commit activity, a stable
           | project with a maintainer who's still active on GitHub in
           | other places would be "less trustworthy" than a project
           | that's a work in progress.
        
             | artski wrote:
             | Not a bad idea tbh. Maybe an additional check on how long
             | issues are left open would be a good idea too. Though yeah,
             | that's why I was contemplating not necessarily highlighting
             | the actual number and instead showing a range, e.g. 80-100
             | is good, 50-70 moderate and so on.
        
               | InvisGhost wrote:
               | Be careful with this. Each project has different
               | practices which could lead to false positives and false
               | negatives. You may also create the wrong incentives,
               | depending on how you measure and report things.
        
             | kstrauser wrote:
             | I agree. I have a popular-ish project on GitHub that I
             | haven't touched in like a decade. I _would_ if needed, but
             | it's basically "done". It works. It does everything it
             | needs to, and no one's reported a bug in many, many years.
             | 
             | You could etch that thing into granite as far as I can
             | tell. The only thing left to do is rewrite it in Rust.
        
           | mlhpdx wrote:
           | The signal here is how many unpatched vulnerabilities there
           | are maybe multiplied by how long they've been out there.
           | Purely statistical. And an actual signal.
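
       One way to read the signal suggested above, as a minimal sketch:
       sum the number of days each unpatched vulnerability has been
       public, which equals the count multiplied by the mean age. The
       name exposure_days and the input shape (a list of disclosure
       dates) are assumptions for illustration, not anything StarGuard
       is known to implement.

```python
from datetime import date


def exposure_days(disclosed, today):
    """Total days of exposure across unpatched vulnerabilities.

    disclosed: iterable of datetime.date values, one per vulnerability
    that is still unpatched. Future-dated entries contribute zero.
    """
    return sum(max(0, (today - d).days) for d in disclosed)
```

       A repo with no known unpatched vulnerabilities scores 0 however
       long it has gone without commits, which is the distinction the
       thread is asking for.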
        
       | knowitnone wrote:
       | Great idea. This should be done by Github though. I'm surprised
       | Github hasn't been sued for serving malware.
        
         | artski wrote:
         | Yeah to be fair would be great, sometimes just giving a nudge
         | and showing people want these features is the first step to
         | getting an official integration.
        
         | swyx wrote:
         | > I'm surprised Github hasn't been sued for serving malware.
         | 
         | do you want a world where people can randomly sue you for any
         | random damages they suffer or do you want nice things like free
         | code hosting?
        
           | MrDarcy wrote:
           | I'm not sure if you're being sarcastic but if the claim of
           | damages is likely to win then I'd like someone to hear it.
        
           | unclad5968 wrote:
           | In the US people can already randomly sue you for any random
           | damages. I could sue github right now even if I'd never
           | previously heard of or interacted with the site.
        
           | KomoD wrote:
           | > do you want a world where people can randomly sue you for
           | any random damages they suffer
           | 
           | Isn't that already a thing, but in the US, not the entire
           | world.
        
       | edoceo wrote:
       | Could you add support for PHP via composer.json? Accept patch?
        
         | artski wrote:
         | I haven't done that before so it would be a small learning
         | curve for me to figure that out. Feel free to make a pull
         | request.
        
       | feverzsj wrote:
       | CTOs don't care about github stars. They are behind tons of
       | screening processes.
        
         | throwaway314155 wrote:
         | Believe me, CTOs of startups do.
        
       | binary132 wrote:
       | I approve! It would be cool to have customizable and transparent
       | heuristics. That way if you know for example that a burst of
       | stars was organic, or you don't care and want to look at other
       | metrics, you can, or you can at least see a report that explains
       | the reasoning.
        
       | nfriedly wrote:
       | I love the idea! How feasible would it be to turn it into a
       | browser extension?
        
       | coffeeboy wrote:
       | Very nice! I'm personally looking into bot account detection for
       | my own service and have come up with very similar heuristics
       | (albeit simpler ones since I'm doing this at scale) so I will
       | provide some additional ones that I have discovered:
       | 
       | 1. Fork to stars ratio. I've noticed that several of the "bot"
       | repos have the same number of forks as stars (or rather, most
       | ratios are above 0.5). Typically a project doesn't have nearly as
       | many forks as stars.
       | 
       | 2. Fake repo owners clone real projects and push them directly to
       | their account (not fork) and impersonate the real project to try
       | and make their account look real.
       | 
       | Example bot account with both strategies employed:
       | https://github.com/algariis
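
       The fork-to-stars heuristic in point 1 could be sketched roughly
       as follows. The 0.5 ratio comes from the comment above; the
       function names and the min_stars floor (to avoid flagging tiny
       repos where the ratio is mostly noise) are assumptions added for
       illustration.

```python
def fork_star_ratio(forks, stars):
    """Return forks/stars, or None when the repo has no stars."""
    if stars <= 0:
        return None
    return forks / stars


def looks_inflated(forks, stars, ratio_threshold=0.5, min_stars=20):
    """Flag repos whose fork count is unusually close to the star count.

    min_stars is an assumed floor, not something from the thread.
    """
    ratio = fork_star_ratio(forks, stars)
    if ratio is None or stars < min_stars:
        return False
    return ratio > ratio_threshold
```

       This is only one weak signal. Point 2 (cloned-and-pushed copies
       of real projects) would need a different check, such as comparing
       commit history against the original repository.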
        
       | ngangaga wrote:
       | > they still shape hiring decisions, VC term sheets, and
       | dependency choices
       | 
       | This is nuts to me. A star is a "like". It carries no signal
       | of quality and even its popularity proxy is quite weak. I can't
       | remember the last time I looked at stars and considered them
       | meaningful.
        
       | zxilly wrote:
       | Frankly, I think this program is ai generated.
       | 
       | 1. there are hallucinatory descriptions in the Readme (make
       | test), and also in the code, such as the rate limit set at line
       | 158, which is the wrong number
       | 
       | 2. all commits are done on github webui, checking the signature
       | confirms this
       | 
       | 3. too verbose function names and a 2000 line python file
       | 
       | I don't have a complaint about ai, but the code quality clearly
       | needs improvement, the license only lists a few common examples,
       | the thresholds for detection seem to be set randomly,
       | the entire _get_stargazers_graphql function is commented out
       | and performs no action, it says "Currently bypassed by
       | get_stargazers", did you generate the code without even reading
       | through it?
       | 
       | Bad code like this gets over 100 stars, it seems like you're doing
       | a satirical fake-star performance art.
        
         | zxilly wrote:
         | I checked your past submissions and yes, they are also ai
         | generated.
         | 
         | I know it's the age of ai, but one should do a little checking
         | oneself before posting ai generated content, right? Or at least
         | one should know how to use git and write meaningful commit
         | messages?
        
           | artski wrote:
           | It's a project I'm making purely for myself and I like to
           | share what I make - sorry I didn't put more effort into the
           | commit messages, will not do that again.
        
         | artski wrote:
         | Well I initially planned to use GraphQL and started to
         | implement it, but switched to REST for now as it's still not
         | fully complete, just to keep things simpler while I iterate and
         | the fact that it's not required currently. I'll bring GraphQL
         | back once I've got key cycling in place and things are more
         | stable. As for the rate limit, I've been tweaking things
         | manually to avoid hitting it constantly which I did to an
         | extent--that's actually why I want to add key rotation... and I
         | am allowed to leave comments for myself for a work in progress
         | no? or does everything have to be perfect from day one?
         | 
         | You would assume if it was pure ai generated it would have the
         | correct rate limit in the comments and the code .... but
         | honestly I don't care and yeah I ran the read me through GPT to
         | 'prettify it'. Arrest me.
        
       ___________________________________________________________________
       (page generated 2025-05-12 23:01 UTC)