[HN Gopher] Show HN: CLI that spots fake GitHub stars, risky dep...
___________________________________________________________________
Show HN: CLI that spots fake GitHub stars, risky dependencies and
licence traps
When I came across a study that traced 4.5 million fake GitHub
stars, it confirmed a suspicion I'd had for a while: stars are
noisy. The issue is they're visible, they're persuasive, and they
still shape hiring decisions, VC term sheets, and dependency
choices--but they say very little about actual quality. I wrote
StarGuard to put that number in perspective, using my own
methodology inspired by what they did, and to fold a broader
supply-chain check into one command-line run. It starts with the
simplest raw input: every starred_at timestamp GitHub will give. It
applies a median-absolute-deviation test to locate sudden bursts.
For each spike, StarGuard pulls a random sample of the accounts
behind it and asks: how old is the user? Any followers? Any
contribution history? Still using the default avatar? From that, it
computes a Fake Star Index, between 0 (organic) and 1 (fully
synthetic). But inflated stars are just one issue. In parallel,
StarGuard parses dependency manifests or SBOMs and flags common
risk signs: unpinned versions, direct Git URLs, lookalike package
names. It also scans licences--AGPL sneaking into a repo claiming
MIT, or other inconsistencies that can turn into compliance
headaches. It checks contributor patterns too. If 90% of commits
come from one person who hasn't pushed in months, that's flagged.
It skims for obvious code red flags: eval calls, minified blobs,
sketchy install scripts--because sometimes the problem is hiding in
plain sight. All of this feeds into a weighted scoring model. The
final Trust Score (0-100) reflects repo health at a glance, with
direct penalties for fake-star behaviour, so a pretty README badge
can't hide inorganic hype. Just for fun, I also made it generate
a cool little badge for the trust score lol. Under the hood, it's
all heuristics and a lot of GitHub API paging. Run it on any
public repo with: python starguard.py owner/repo --format markdown
It works without a token, but you'll hit rate limits sooner.
Please provide any feedback you can.
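
A minimal sketch of the burst-detection step described above, not StarGuard's actual code. It assumes the timestamps are ISO-8601 strings, buckets them per day, and flags days whose modified z-score (based on the median absolute deviation) exceeds a cutoff. The function names, the 3.5 cutoff, and the fallback used when the MAD is zero are all illustrative assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import median


def daily_counts(starred_at):
    """Bucket ISO-8601 starred_at timestamps into stars per calendar day."""
    days = Counter(
        datetime.fromisoformat(ts.replace("Z", "+00:00")).date()
        for ts in starred_at
    )
    first, last = min(days), max(days)
    span = (last - first).days + 1
    # Include zero-star days so quiet periods count toward the median.
    return {
        first + timedelta(days=d): days[first + timedelta(days=d)]
        for d in range(span)
    }


def mad_bursts(counts, threshold=3.5):
    """Return the days whose star count is a high outlier under a MAD test."""
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # Assumption: if over half the days share one value the MAD is 0,
    # so fall back to a scale of 1 star to avoid dividing by zero.
    scale = mad if mad > 0 else 1.0
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [
        day
        for day, n in counts.items()
        if 0.6745 * (n - med) / scale > threshold
    ]
```

Calling `mad_bursts(daily_counts(timestamps))` returns the suspicious days in chronological order; sampling the accounts behind each day would be a separate step.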
Author : artski
Score : 89 points
Date : 2025-05-12 12:59 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| hungryhobbit wrote:
| Dependencies: PyPI, Maven, Go, Ruby
|
| This looks like a cool project, but why on earth would it need
| Python, Java, Go, AND Ruby?
| deltaknight wrote:
| I think these are just the package managers that it supports
| parsing dependencies for. The actual script seems to just be a
| single python file.
|
| It does seem like the repo is missing some files though; make
| is mentioned in the README but no makefile and no list of
| python dependencies for the script that I can see.
| artski wrote:
| Yeah to be fair I need to clean it up, was stuck in
| testing different strategies and making it work and just wanted
| to get feedback asap before moving on to the next step (didn't
| want to spend too much time on something only for it to turn out
| I was badly wrong about something) - next step is to get it all
| cleaned up.
| 27theo wrote:
| It doesn't need them, it parses SBOMs and manifests from their
| ecosystems. I think you misunderstood this section of the
| README.
|
| > Dependencies | SBOM / manifest parsing across npm, PyPI,
| Maven, Go, Ruby; flags unpinned, shadow, or non-registry deps.
|
| The project seems like it only requires Python >= 3.9!
| nottorp wrote:
| Of course, github could just drop the stars, but everything has
| to enshittify towards "engagement" and add social network
| features.
|
| Or users could ignore the stars and go old school and you know,
| research their dependencies before they rely on them.
| Vanclief wrote:
| Stars are just a signal. When I am looking at multiple
| libraries that do the same thing, I am going to trust a repo
| with 200 stars more than one with 0. It's not perfect, but I don't
| have the time to go through the entire codebase and try it out.
| If the repo works for me I will star it to contribute to the
| signal.
| shlomo_z wrote:
| If that works for you, great. I don't do that. I don't even
| check how many stars it has.
|
| I check the docs, features, and sometimes the code quality.
| Sometimes I check the date of the last commit.
| mlhpdx wrote:
| I tend to put more attention on repos with 15-75 (ish) stars.
| Fewer is maybe something obscure or unproven, and above ~500
| is _much_ more likely to be BS/hype.
| tough wrote:
| I use stars for bookmarking purposes, i wouldn't care if they
| go private but would miss the feature
| aquariusDue wrote:
| Same along with lists. I've got more than a thousand
| starred repos by now.
| tough wrote:
| Sadly lists had a hard cap at 32 or 36 or something like
| that.. i was too eager early with my specificity (have
| lists w 1 repo) and now i can't make new ones (need to
| delete others)
|
| lol
|
| found a couple non-maintained projects for managing them
|
| https://github.com/astralapp/astral
| https://github.com/gkze/gh-stars
| benwilber0 wrote:
| Github was a "social network" from its very beginning. The
| whole premise was geared around git hosting and "social
| coding". I don't think it became enshittified later since that
| was the entire value proposition from day 1.
| nottorp wrote:
| Funny, I'm pretty sure I paid them just so I don't have to
| maintain my own git hosting.
|
| I never even noticed the stupid stars until they started
| being mentioned on HN.
| rafram wrote:
| See the tagline under the logo, May 14, 2008:
| https://web.archive.org/web/20080514210148/http://github.com...
| nottorp wrote:
| I'm sorry, but I never read the main github site. I only
| spend a few seconds on it when my login expires and I
| need my repository list.
|
| 99.99999% of my interaction is via git pull and push :)
| eikenberry wrote:
| There are tons of places you can use for simple git
| hosting. The only reason to use github over the others is
| due to the social factors. Because everyone already has an
| account on it so they can easily file issues, PRs, etc. For
| simple git hosting, github leaves a lot to be desired.
| _bin_ wrote:
| You may like Drew DeVault's https://sr.ht more
| Am4TIfIsER0ppos wrote:
| What is a license trap? This "AGPL sneaking into a repo claiming
| MIT"? Isn't that just a plain old license violation?
| artski wrote:
| Basically what I mean by it is for example a repository appears
| to be under a permissive license like MIT, Apache, or BSD, but
| actually includes code that's governed by a much stricter or
| viral license--like GPL or AGPL--often buried in a
| subdirectory, dependency, or embedded snippet. The problem is,
| if you reuse or build on that code assuming it's fully
| permissive, you could end up violating the terms of the
| stricter license without realising it. It's a trap because the
| original authors might have mixed incompatible licenses,
| knowingly or not, and the legal risk then falls on downstream
| users. So yeah, essentially a plain old license violation, but
| one that is relatively easy to miss or not think about.
| tough wrote:
| oh interesting you put a word on it, most of the VC funded
| FOSS -open- core apps/saas that have popped up in the past years
| are like this
|
| the /ee folders are a disgrace
| tough wrote:
| they get around it by licensing differently only packages /
| parts of the codebase
| the__alchemist wrote:
| > It checks contributor patterns too. If 90% of commits come from
| one person who hasn't pushed in months, that's flagged.
|
| IMO this is a slight green flag; not red.
| lispisok wrote:
| It's gonna flag most of the clojure ecosystem
| throwaway150 wrote:
| Yep, and it's not just Clojure. This will end up flagging
| projects across all non-mainstream ecosystems. Whether it's
| Vim plugins, niche command-line tools, academic research
| code, or hobbyist libraries for things like game development
| or creative coding, they'll likely get flagged simply because
| they're often maintained by individual developers. These devs
| build the projects, iterate quickly in the early stages, and
| eventually reach a point where the code is stable and no
| longer needs frequent updates.
|
| It's a shame that this tool penalizes such projects, which I
| think are vital to a healthy open source ecosystem.
|
| It's a nice project otherwise. But flagging stable projects
| from solo developers really sticks out like a sore thumb. :(
| artski wrote:
| It would still count as "trustworthy", it just wouldn't come out
| to 100/100 :(.
| sethops1 wrote:
| I have to agree - the highest quality libraries in my
| experience are the ones maintained by one dedicated person as
| their pet project. There's no glory, no money, no large
| community, no Twitter followers - just a person with a problem
| to solve and making the solution open source for the benefit of
| others.
| artski wrote:
| Fair take--it's definitely context-dependent. In some cases,
| solo-maintainer projects can be great, especially if they're
| stable or purpose-built. But from a trust and maintenance
| standpoint, it's worth flagging as a signal: if 90% of commits
| are from one person who's now inactive, it could mean slow
| responses to bugs or no updates for security issues. Doesn't
| mean the project is bad--just something to consider alongside
| other factors.
|
| Heuristics are never perfect and it's all iterative but it's
| all about understanding the underlying assumptions and taking
| the knowledge you get out of it with your own context. Probably
| could enhance it slightly by a run through an LLM with a prompt
| but I prefer to keep things purely statistical for now.
| delfinom wrote:
| The problem is your audience is:
|
| > CTOs, security teams, and VCs automate open-source due
| diligence in seconds.
|
| The people that probably have less brain cells than the
| average programmer to understand the nuance in the flagging.
| artski wrote:
| Lol yeah tbh - I just made it without really thinking of an
| audience, just was looking for a project to work on till I
| saw the paper and figured it would be cool to check it out
| on some repositories out there. That part is just me asking
| gpt to make the read me better.
| 85392_school wrote:
| It could also mean that the project is stable. Since you only
| look at the one repository's commit activity, a stable
| project with a maintainer who's still active on GitHub in
| other places would be "less trustworthy" than a project
| that's a work in progress.
| artski wrote:
| Not a bad idea tbh, maybe an additional signal, like how long
| issues are left open, would be a good idea. Though yeah, that's
| why I was contemplating not necessarily highlighting the actual
| number and instead showing a range, e.g. 80-100 is good, 50-70
| moderate and so on.
| InvisGhost wrote:
| Be careful with this. Each project has different
| practices which could lead to false positives and false
| negatives. You may also create the wrong incentives,
| depending on how you measure and report things.
| kstrauser wrote:
| I agree. I have a popular-ish project on GitHub that I
| haven't touched in like a decade. I _would_ if needed, but
| it's basically "done". It works. It does everything it
| needs to, and no one's reported a bug in many, many years.
|
| You could etch that thing into granite as far as I can
| tell. The only thing left to do is rewrite it in Rust.
| mlhpdx wrote:
| The signal here is how many unpatched vulnerabilities there
| are, maybe multiplied by how long they've been out there.
| Purely statistical. And an actual signal.
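
One possible reading of the signal suggested above, as a rough sketch: sum the number of days each unpatched vulnerability has been publicly known, so both the count and the age of open issues raise the figure. The function name and the input shape (a list of disclosure dates) are assumptions for illustration; nothing here comes from StarGuard.

```python
from datetime import date


def exposure_days(disclosed_dates, today):
    """Total days of exposure across all still-unpatched vulnerabilities."""
    # Each open vulnerability contributes the days since its disclosure.
    return sum((today - disclosed).days for disclosed in disclosed_dates)
```

For example, `exposure_days([date(2025, 1, 1), date(2025, 4, 1)], date(2025, 5, 1))` adds 120 days for the first vulnerability and 30 for the second.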
| knowitnone wrote:
| Great idea. This should be done by Github though. I'm surprised
| Github hasn't been sued for serving malware.
| artski wrote:
| Yeah to be fair would be great, sometimes just giving a nudge
| and showing people want these features is the first step to
| getting an official integration.
| swyx wrote:
| > I'm surprised Github hasn't been sued for serving malware.
|
| do you want a world where people can randomly sue you for any
| random damages they suffer or do you want nice things like free
| code hosting?
| MrDarcy wrote:
| I'm not sure if you're being sarcastic but if the claim of
| damages is likely to win then I'd like someone to hear it.
| unclad5968 wrote:
| In the US people can already randomly sue you for any random
| damages. I could sue github right now even if I'd never
| previously heard of or interacted with the site.
| KomoD wrote:
| > do you want a world where people can randomly sue you for
| any random damages they suffer
|
| Isn't that already a thing, but in the US, not the entire
| world.
| edoceo wrote:
| Could you add support for PHP via package.json? Accept patch?
| artski wrote:
| I haven't done that before so it would be a small learning
| curve for me to figure that out. Feel free to make a pull
| request.
| feverzsj wrote:
| CTOs don't care about github stars. They are behind tons of
| screening processes.
| throwaway314155 wrote:
| Believe me, CTOs of startups do.
| binary132 wrote:
| I approve! It would be cool to have customizable and transparent
| heuristics. That way if you know for example that a burst of
| stars was organic, or you don't care and want to look at other
| metrics, you can, or you can at least see a report that explains
| the reasoning.
| nfriedly wrote:
| I love the idea! How feasible would it be to turn it into a
| browser extension?
| coffeeboy wrote:
| Very nice! I'm personally looking into bot account detection for
| my own service and have come up with very similar heuristics
| (albeit simpler ones since I'm doing this at scale) so I will
| provide some additional ones that I have discovered:
|
| 1. Fork to stars ratio. I've noticed that several of the "bot"
| repos have the same number of forks as stars (or rather, most
| ratios are above 0.5). Typically a project doesn't have nearly as
| many forks as stars.
|
| 2. Fake repo owners clone real projects and push them directly to
| their account (not fork) and impersonate the real project to try
| and make their account look real.
|
| Example bot account with both strategies employed:
| https://github.com/algariis
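
The first heuristic above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the 0.5 cutoff is taken from the comment's own observation rather than from any calibrated data.

```python
def fork_star_ratio(forks, stars):
    """Forks divided by stars; 0.0 when the repo has no stars."""
    return forks / stars if stars else 0.0


def looks_inflated(forks, stars, threshold=0.5):
    """Flag repos with unusually many forks relative to stars."""
    # Assumption: repos with no stars are not flagged by this check.
    return stars > 0 and fork_star_ratio(forks, stars) > threshold
```

A repo with 120 forks and 120 stars would be flagged, while one with 300 forks and 10,000 stars would not. The second heuristic (cloned rather than forked projects) needs content comparison and is not covered here.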
| ngangaga wrote:
| > they still shape hiring decisions, VC term sheets, and
| dependency choices
|
| This is nuts to me. A star is a "like". It carries no signal
| of quality and even its popularity proxy is quite weak. I can't
| remember the last time I looked at stars and considered them
| meaningful.
| zxilly wrote:
| Frankly, I think this program is ai generated.
|
| 1. there are hallucinatory descriptions in the Readme (make
| test), and also in the code, such as the rate limit set at line
| 158, which is the wrong number
|
| 2. all commits are done on github webui, checking the signature
| confirms this
|
| 3. too verbose function names and a 2000 line python file
|
| I don't have a complaint about ai, but the code quality clearly
| needs improvement, the license only lists a few common examples,
| the thresholds for detection seem to be set randomly,
| _get_stargazers_graphql the entire function is commented out and
| performs no action, it says "Currently bypassed by
| get_stargazers", did you generate the code without even reading
| through it?
|
| Bad code like this gets over 100 stars, it seems like you're doing
| a satirical fake-star performance art.
| zxilly wrote:
| I checked your past submissions and yes, they are also ai
| generated.
|
| I know it's the age of ai, but one should do a little checking
| oneself before posting ai generated content, right? Or at least
| one should know how to use git and write meaningful commit
| messages?
| artski wrote:
| It's a project I'm making purely for myself and I like to
| share what I make - sorry I didn't put much effort into the
| commit messages, will not do that again.
| artski wrote:
| Well I initially planned to use GraphQL and started to
| implement it, but switched to REST for now as it's still not
| fully complete, just to keep things simpler while I iterate and
| the fact that it's not required currently. I'll bring GraphQL
| back once I've got key cycling in place and things are more
| stable. As for the rate limit, I've been tweaking things
| manually to avoid hitting it constantly which I did to an
| extent--that's actually why I want to add key rotation... and I
| am allowed to leave comments for myself for a work in progress
| no? or does everything have to be perfect from day one?
|
| You would assume if it was pure ai generated it would have the
| correct rate limit in the comments and the code .... but
| honestly I don't care and yeah I ran the read me through GPT to
| 'prettify it'. Arrest me.
___________________________________________________________________
(page generated 2025-05-12 23:01 UTC)