[HN Gopher] 4.5M Suspected Fake Stars in GitHub
___________________________________________________________________
4.5M Suspected Fake Stars in GitHub
Author : qianli_cs
Score : 130 points
Date : 2024-12-29 14:30 UTC (4 days ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| mentalgear wrote:
| In a world with so much fake PR stuff and AI slop, any project
| that tries to verify what's real and what's not is an excellent
| choice of topic, fostering integrity again in our industry.
|
| Here's the actual repo: https://github.com/hehao98/StarScout
| hobs wrote:
| > Allow it a week to finish all iterations and expect it to read >=
| > 40TB of data. You can use nohup to put it as a background
| > process. After a week, you can run the following command to
| > collect the results into MongoDB and local CSV files:
|
| I just love the yolo nature of "well let's check in a week if
| that 40TB of data processing worked"
| queuebert wrote:
| Reminds me of this famous paper:
| https://arxiv.org/abs/astro-ph/9912202
| prepend wrote:
| I don't like stars as a metric, or at least as a comparator. If
| you brag about having a million stars, that says something, as
| a million is a lot.
|
| But if you brag that your project has a million and your
| competitor has half a million, that is so illogical that I would
| discount your project and think it's run by dummies.
|
| Are there practical situations where people really need stars
| enough to buy them?
| hiccuphippo wrote:
| My only guesses are people showing popular repos for their CV
| or to appear legitimate to get access to another repo like what
| happened with the xz utils backdoor.
| bdcravens wrote:
| There's also the third category of projects receiving
| funding.
| datadrivenangel wrote:
| And for a while startups were using it as a traction metric
| for open core projects when pitching to VCs.
| pembrook wrote:
| Once you start trying to make a living from anything you do
| online, you start to realize that literally everything on the
| internet is gamed to the extreme. Even this article was written
| and posted here for a reason.
|
| If your GitHub repo can in any way provide you with income (from
| just having something to talk about in an interview or an
| innocuous "buy me a coffee" link, all the way up to selling
| $100,000/yr enterprise support plans), you now have a strong
| incentive to game the system.
|
| And if it's allowed by the system, then it's a prisoner's
| dilemma: if you DON'T do it, your competition will do it and eat
| your lunch.
|
| That's why it's so important to design high-integrity ranking
| systems.
| magic_smoke_ee wrote:
| The "like" metric is dumbed down to self-amplifying popularity
| that hovers around meaninglessness. It would be more valuable to
| weight things based on whether people you respect also rate a
| particular item.
| mrweasel wrote:
| GitHub really wants to be a social network or something to that
| effect, and I get the feeling that most developers don't care. If
| you log in, it's pretty clear that the "front page" is supposed
| to be something like a feed, but I don't know anyone who uses it.
| Mine is completely blank and pointless. Stars, I suppose, are
| meant to be something akin to a like.
|
| I have plenty of GitHub projects bookmarked, but I've never
| "starred" one... Why would I?
| dang wrote:
| (We merged comments from
| https://news.ycombinator.com/item?id=42573954 to this thread.)
|
| https://www.bleepingcomputer.com/news/security/over-31-milli...
| zitterbewegung wrote:
| All metrics will be gamed at some point. I don't know exactly how
| you could even fight this.
| jasoneckert wrote:
| Neither do I.
|
| I believe the only thing anyone can do is take metrics of how
| the metrics are gamed, as this particular paper has done.
| jazzyjackson wrote:
| There are various reasons webs of trust don't take off, but I
| can imagine a system where the metrics I see are only aggregated
| from friends-of-friends, and any other signal is just considered
| untrustworthy and therefore not worth observing.
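| A minimal sketch of that friends-of-friends aggregation, assuming
| a hypothetical in-memory follow graph and set of stargazers
| (GitHub offers no such view): only stars from accounts within two
| follow hops of the viewer are counted, found with a breadth-first
| search.

```python
from collections import deque


def trusted_accounts(follows: dict[str, set[str]], viewer: str,
                     max_hops: int = 2) -> set[str]:
    """Accounts reachable from `viewer` within `max_hops` follow edges (BFS)."""
    seen = {viewer: 0}
    queue = deque([viewer])
    while queue:
        user = queue.popleft()
        if seen[user] == max_hops:
            continue  # don't expand beyond the hop limit
        for followed in follows.get(user, set()):
            if followed not in seen:
                seen[followed] = seen[user] + 1
                queue.append(followed)
    seen.pop(viewer)  # the viewer's own star is not social proof
    return set(seen)


def trusted_star_count(stargazers: set[str], follows: dict[str, set[str]],
                       viewer: str) -> int:
    """Count only stars from the viewer's friends and friends-of-friends."""
    return len(stargazers & trusted_accounts(follows, viewer))


# Hypothetical data: alice follows bob, bob follows carol, carol follows dave.
follows = {"alice": {"bob"}, "bob": {"carol"}, "carol": {"dave"}}
stargazers = {"bob", "carol", "dave", "bot1", "bot2"}
print(trusted_star_count(stargazers, follows, "alice"))  # bob and carol count; dave is 3 hops away
```

| As the replies point out, this is only as good as the graph: one
| untrustworthy friend-of-a-friend lets their stars through.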
| drusepth wrote:
| Do you still trust that system when your friends-of-friends
| are the ones gaming the system? Given the inherent network
| effects of manipulating webs of trust, I wouldn't be
| surprised if everyone had at least one friend-of-a-friend
| they shouldn't necessarily trust.
| morkalork wrote:
| Given all the obvious bots and sketchy recruiters that try
| to connect with me on LinkedIn, who all appear to have at
| least one mutual connection, it probably won't work.
| jagged-chisel wrote:
| Do we have a similar issue on GH? I think the nature of
| the service and its target audience affect this problem
| in a big way. You can follow anyone on GH, but there's no
| mutual connection option at all. LI has following _and_
| mutual connections. LI also has a much wider audience.
|
| How might a 'connection' look on GH? Will people freely
| connect, or will they appraise requests more closely?
| wruza wrote:
| I can imagine access to raw data instead of some stupid
| come-on-game-me-able predefined indicator, so that I could run my
| own private statistical analysis over it. People would use (and
| share) different algorithms, and gamers would at least have to
| wander through this collectively created mud, understanding
| nothing beyond the most default measures.
|
| But of course this is too complex and "no one will use it" (tm).
| So we'd better have a screwed-up recommendation system that
| doesn't work at all, 'cause that's simpler!
| codetrotter wrote:
| I can only speak for myself. Given the way I use GitHub, I don't
| think the concept of "friends of friends" would be all that
| useful there.
|
| There are a handful of people I know IRL that I follow on
| GitHub, and a few hundred that I follow in total. Of the handful
| of people I know IRL and follow on GitHub, only two or three are
| active there in any given week. I have very little idea who any
| of the others I follow are. Usually I follow someone I don't know
| when I come across their profile and either the profile itself or
| their projects make me want to follow them. But I star far more
| repos than the number of people I follow.
|
| For me, the main way of discovering new repos are:
|
| - Frontpage of HN, and comments in posts on HN.
|
| - Specific search results on Google when I have searched for
| libraries or programs that do specific things.
|
| - Libraries on crates.io that I think might be interesting to
| look into in the future.
|
| Maybe once or twice a month I happen to click on the main
| page of GitHub itself and see mentions of repos that have
| been committed to or starred or created by people I follow.
|
| So for me I don't think "friends of friends" is a
| particularly great signal for things to look at. Most of the
| people I follow, I don't know much about them.
|
| Likewise, for anyone who follows me, the fact that I follow
| someone else is not necessarily a strong signal that the other
| person's activity should be shown to my follower, or weighted as
| more significant, just because I happen to follow them.
|
| If you do want a strong signal for who to boost for my
| followers based on my own activity, go and look at the
| dependencies that I am using in my own projects. That's a
| pretty good indicator that I put some amount of effort and
| interest into looking at something. This could be done by
| GitHub itself, parsing the Cargo.toml files of my projects
| and extracting the dependencies section and looking up which
| of those dependencies are hosted on GitHub.
| kube-system wrote:
| Maybe so, but in this case, I don't think 'stars' is a good
| candidate for one of those metrics. I think the people
| worried about 'fake stars' are doing it wrong, and should
| just ignore the metric entirely.
| begueradj wrote:
| It comes down to fighting against human nature, and that's a
| losing battle.
|
| Set any law you want; our nature will push us to circumvent it,
| even legally.
| thrance wrote:
| Not nature, no, it's all about incentives. Often it's
| financial; for GitHub stars it's prestige and visibility.
| mentalgear wrote:
| Most people are happy living in a fair ecosystem - it's only
| the 1-2% of the population that seek control, money and power
| that start trying to exploit the system.
|
| Only if we let that minority keep manipulating the system
| without consequences does it become the driving market force
| that the rest of the population also feels they have to go
| along with, as has already happened in finance, academia, etc.
| JumpCrisscross wrote:
| > _Most people are happy living in a fair ecosystem_
|
| For varying and self-serving definitions of fair. (Almost
| everyone in the rich world is in an unfairly-advantaged
| minority.)
| vouaobrasil wrote:
| I don't really think so. The Amish have a nice system. Their
| society has many fewer bad actors compared to general
| society.
|
| Actually one of the keys is repeated contact. People who have
| to interact again and again will try and game the system
| less. Not sure how to build that into a star system but why
| give up so easily? Do programmers give up when you say "this
| algorithm can't be made any faster?"
| JumpCrisscross wrote:
| > _one of the keys is repeated contact_
|
| The other is hierarchy. You can't automate reputation
| scoring.
| eddythompson80 wrote:
| I don't think it's just the Amish. Collectivist cultures in
| general have (or maybe are perceived to have, I don't know)
| fewer bad actors than individualistic cultures.
|
| It doesn't matter if people have to interact frequently if
| there are no real consequences to that interaction. The
| punishment in those collectivist cultures involves social
| shunning, shaming, etc. Individualistic cultures almost
| pride themselves on how much they can disregard social
| shunning and shaming. Shameless people are celebrities and
| elected officials. They are admired as opposed to shunned
| and ignored. A bad actor in an Amish community is expelled
| and loses access to what that community offers. That would
| be illegal in the general society unless their "bad act"
| was actually illegal. Discriminating against someone for
| being a dickhead who exploits loopholes and unregulated
| corner cases (without explicitly breaking the law) would be
| illegal in many contexts.
|
| > Not sure how to build that into a star system but why
| give up so easily? Do programmers give up when you say
| "this algorithm can't be made any faster?"
|
| I don't think people have given up. Online fraud detection
| is a massive industry as is. Spotify plays, YouTube views,
| Google search, Amazon reviews, reddit upvotes, twitter's
| retweets, facebook likes/shares, etc all fall exactly into
| the same bucket. There is even a significant dollar amount
| attached to many of those, more so than to GitHub stars. All
| are frequently gamed/faked, and it's a battle between the
| platforms and the adversaries.
| nwienert wrote:
| Only show "Active Developer Stars" by default:
|
| - Only accounts that have a decent amount of activity (pushing
| code, commenting, etc)
|
| - Has set up SSH
|
| - Older than 2 years
|
| - Account active consistently for at least a year
|
| - Must have 2-factor enabled
|
| - Filled out profile
|
| etc
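| A minimal sketch of that "Active Developer Stars" filter, assuming
| a hypothetical per-account record (GitHub does not publicly expose
| all of these fields) and illustrative thresholds where the comment
| gives none, such as the 50-contribution minimum.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Account:
    # Hypothetical fields; not a real GitHub API shape.
    created: date
    has_ssh_key: bool
    has_2fa: bool
    profile_filled: bool
    active_months_last_year: int  # months of the past 12 with pushes/comments
    contributions_last_year: int


def is_active_developer(acct: Account, today: date,
                        min_contributions: int = 50) -> bool:
    """Apply the checklist above; thresholds are illustrative guesses."""
    age_days = (today - acct.created).days
    return (
        acct.contributions_last_year >= min_contributions  # "decent amount of activity"
        and acct.has_ssh_key                               # "has set up SSH"
        and age_days >= 2 * 365                            # "older than 2 years"
        and acct.active_months_last_year >= 12             # "active consistently for a year"
        and acct.has_2fa                                   # "must have 2-factor enabled"
        and acct.profile_filled                            # "filled out profile"
    )


def active_developer_stars(stargazers: list[Account], today: date) -> int:
    """Count only the stars whose accounts pass every check."""
    return sum(is_active_developer(a, today) for a in stargazers)


today = date(2024, 12, 29)
veteran = Account(created=date(2015, 1, 1), has_ssh_key=True, has_2fa=True,
                  profile_filled=True, active_months_last_year=12,
                  contributions_last_year=300)
fresh_bot = Account(created=date(2024, 11, 1), has_ssh_key=True, has_2fa=True,
                    profile_filled=True, active_months_last_year=2,
                    contributions_last_year=500)
print(active_developer_stars([veteran, fresh_bot], today))
```

| As the replies point out, every one of these flags is cheap to
| automate, and a hard rule like has_ssh_key also drops real
| developers who never set up SSH.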
| stevage wrote:
| So now all the bots are pushing code, have SSH etc...
| zitterbewegung wrote:
| I've heard of people gaming GitHub stars by asking their friends
| to star their projects, which would get around all of your
| bullets. Hence why I said it would be hard to fight.
| eddythompson80 wrote:
| All of those are very, very, easy to automate. There are
| plenty of bot accounts that have _unintentionally_ checked
| the full list.
| nwienert wrote:
| You can find a set of requirements that aren't. E.g., 2-factor
| can include a phone number. And activity requirements can be
| based on repo maturity (not just pushing to random empty repos).
|
| And while some bot accounts may have them, I doubt many have
| most.
|
| Also, you're arguing semantics, but the general idea of setting
| up a legitimacy test that factors in various things is very
| easily doable, the factors can be kept private, and you can
| definitely find ones that are generally hard to game.
| gruez wrote:
| > You can find a set of requirements that aren't. E.g.,
| 2-factor can include a phone number. And activity
| requirements can be based on repo maturity (not just
| pushing to random empty repos).
|
| Then you have people complaining about being
| "shadowbanned" (because there's no recourse if you're a
| person and the algorithm thinks you're not active
| enough), or that github is being anti-privacy (by
| requiring phone number). It's hard to win here.
| wholinator2 wrote:
| I think the point is that these requirements are not
| published, and they are not requirements to use stars.
| Anyone can star, no one knows whether their account is
| contributing to the star count. Now, presumably you could
| star a thing and check if the number went up but maybe
| introduce slight randomness or delay to obfuscate even
| those details. I remember when Reddit removed the total
| upvote/downvote counts from the UI.
| eddythompson80 wrote:
| The point is that this is not arguing on semantics nor is
| it as simple as just a "set of requirements" that they
| just follow. Battling fraud online is an entire business
| in itself. Take Spotify plays, YouTube views, Google
| search ranking, Amazon reviews, reddit votes, etc. These
| organizations have significantly more incentives than
| GitHub to reduce fraud in these metrics, and while they
| do, it's still really really hard and it's very easy to
| show how these metrics are gamed/faked all the time.
|
| It's not a matter of "here is a list of requirements that
| no one knows about, and here is slight randomness/delay
| to obfuscate".
|
| How much do you think it takes to pay an actual human
| from a poor country to come to work each day at 8am,
| create one github account after another, enter them in a
| database, and leave at 5pm?
|
| If you want to "study" how GitHub handles stars because
| there is a legitimate financial incentive for you in it,
| for $100 a day you can pay 10 or 20 of those people to
| create a few thousand accounts a day. Do it a few times a
| month, and throw these accounts into an automated system
| that creates random repos, pushes a few commits here and
| there, etc. Also "introduce some slight randomness or
| delay to obfuscate these events". Do some A/B testing to
| figure out how the 300k accounts under your control affect
| a repo's star count, then advertise a "GitHub stars
| service": "$0.50 per guaranteed star on GitHub". Your
| average VC-funded startup could get 10k stars for $5k. They
| probably give AWS 10 times that a month.
|
| Once github changes their requirements, do more testing,
| figure out what the requirements now are, then you're
| back in the game. If people do it all the time to
| Spotify, YouTube, Google, Amazon, Reddit, and Twitter,
| why do you think GitHub would somehow crack that nut?
| JumpCrisscross wrote:
| > _the point is that these requirements are not
| published_
|
| Well-connected people will get the tip off. And your PR
| team will have to keep batting down conspiracy theories,
| since if there's one thing the nutters love it's black
| boxes.
| the__alchemist wrote:
| Hmm. I don't have SSH, but have many GH projects, and have
| been active for a decade. So, I would be filtered out as not
| an active dev, with the spammers?
| nwienert wrote:
| Sure, but at least stars would be net more useful.
| uludag wrote:
| I believe networks of human individuals can solve this to a
| good degree assuming a particular topology exists.
|
| Like, imagine a decent-sized group of professionals, all
| specializing in a similar field, and having lots of strong
| connections between each other where they have ample
| opportunities to share information. It would be hard for an
| outsider to come in and astroturf their product without immense
| effort (like hiring shills to attend conferences). In-person
| networks also obviously solve the problem of stars as
| reputation: reputation spreads naturally in these sorts of
| networks.
|
| I think the problem comes with algorithmic scale. Maybe a
| solution would be to have more community building activities
| (maybe preferably offline).
| aydyn wrote:
| Requiring real ID and showing _regional_ stars like
| Apple/Google would be a start.
| eddythompson80 wrote:
| > Requiring real ID
|
| Yeah, people would love that for sure.
|
| > showing _regional_ stars like Apple/Google would be a
| start.
|
| What does that mean? I thought regions only impact ranking,
| not the net number of stars (assuming we're talking about
| Apple/Google Maps). And as far as I know, GitHub doesn't do
| ranking.
| stronglikedan wrote:
| > Requiring real ID
|
| Sir, this is an HN.
| mentalgear wrote:
| Doesn't mean we shouldn't fight back. That's exactly why we
| need research projects like these: to maintain the balance.
| 1propionyl wrote:
| Any metric that becomes a target ceases to be a good metric.
|
| The wrinkle is that measures that don't easily quantify are
| more resistant. For example, showing provable use by other
| reputable or trusted projects, or a significant amount of
| resources allocated to maintenance, or ...
|
| Really just anything that can't be reduced to a single number
| in a canonical way will in the long run prove far more useful
| for longer.
|
| This of course shifts some of the burden onto potential users
| to assess things more critically, and forecloses direct
| numerical comparison. But the idea that you could just look at
| a number and make such comparisons was faulty from the get go.
| sedatk wrote:
| Prioritize the stars given by accounts you follow in the UI.
| Done.
| p1esk wrote:
| I don't want to follow anyone, but I do give stars to repos I
| like.
| sedatk wrote:
| Then you'll have to start following the creators of repos
| you like to build a web of trust.
| awkward wrote:
| I can see github platform internals caring about this for
| anomaly detection, but as a developer, who cares? I suppose a
| botnet could be making fake stars on a malware project or
| supply chain attack, but the problem there doesn't seem like
| it's the number of stars.
| dzonga wrote:
| Do stars even count?
|
| My decision to use a project is based on 1. the readme and 2.
| the issues.
| tonymet wrote:
| recent commits and community engagement are better indicators
| Retric wrote:
| I'd generally rather use a library that hasn't needed to
| update in 5 years than something in active development.
| insane_dreamer wrote:
| The challenge is differentiating between "haven't needed to
| update it in 5 years because it's still solid and compatible
| with its ecosystem" vs. "haven't updated it in 5 years for
| any other reason".
| sixothree wrote:
| Sounds good in theory. But almost every time I use one of
| these projects, it's in "abandoned" status and definitely
| needs attention. There is 1 project I can point to that I
| use that does not actually need any maintenance and another
| that honestly makes me _extremely_ nervous to use because
| of lack of maintenance.
| mardifoufs wrote:
| Can you give me some examples? Because in my experience
| even very stable, very "foundational" libs and frameworks
| that I know about and use almost never go 5 years without
| any commit/change. There's always either a small bug fix,
| or some update to a build script, updated documentation, or
| something.
|
| The only repos where that's not the case are usually very
| niche, and in that case it becomes very hard to judge if
| the library is just very stable or a minefield of bugs and
| undesired behavior that no one else reported because no one
| else is using it.
| tonymet wrote:
| openssl?
| renewiltord wrote:
| It used to be a heuristic VCs would use to gauge popularity.
| You know how it is: if you have the revenue, talk about the
| revenue; if you only have the users, talk about the users; if
| you only have the stars, talk about the stars hehe
| muglug wrote:
| Sometimes projects get stars just because people like the
| personality or company behind the project.
|
| Case in point: https://github.com/facebook/hhvm/. It got 15,000
| stars in its first few years, but roughly 10 non-Facebook
| companies actually ever used it in production, and today only
| one non-Facebook company uses it (I work at that company).
| consumer451 wrote:
| Sometimes, they are surreal stars for surrealist languages
| that zero people actually use:
|
| https://github.com/TodePond/DreamBerd - 11.7k stars
| michaelmior wrote:
| That doesn't mean that the stars are just because people like
| the company. People may find the technology interesting even
| if they have no intent of using it.
| wildzzz wrote:
| A star is just a bookmark for me. It says nothing beyond "I may
| want to look at this again". When comparing two similar
| projects, I may look at the star counts to see which one is
| more popular but it's probably the last metric I'd consider.
| glaucon wrote:
| I agree. I am also interested in: date of most recent
| substantive commit; date of first commit; number of
| contributors.
|
| I don't have hard and fast rules for how I interpret those
| values, it depends on my intentions but I find them useful
| things to consider.
|
| Going back to the readme, nothing turns me off faster than a
| skeletal readme. It doesn't have to be "War and Peace", but it
| needs to be more than just how to install it.
| attentionmech wrote:
| I think number of clones is a much better metric (it's like
| proof of work; it needs compute to clone a repo). For me,
| starring a repo is like bookmarking it, nothing else. They might
| as well just mark it as "Bookmarked" instead of "Starred".
| nejsjsjsbsb wrote:
| A better metric until it becomes a target. Once it is a target,
| getting a billion clones is trivial.
|
| Github should just stop showing star counts. Who cares about
| them.
| attentionmech wrote:
| I think it's like an "upvote" thing which shows whether,
| historically, users have found the repo interesting. Even if
| you hide stars, there needs to be a way for the collective
| hivemind of GitHub users to help each other tell which repos
| are high quality, right?
| rpdillon wrote:
| You don't need to crowdsource everything. I've never used
| stars as a good metric because starring is literally zero
| effort. Anybody who happens by just stars it, so all you can
| really conclude from star count is that this is interesting
| to this number of people.
|
| Two metrics that I think correlate extremely highly with
| quality: the number of commits in the repository and the
| date of the most recent commit. I've used a metric based on
| those two inputs for the past 15 years to evaluate repos,
| and I have not been disappointed. Depending on the nature of
| the project, I weigh the two attributes differently. Some
| projects are arguably "done", so the date of the most
| recent commit is not very important in that case.
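| The commenter doesn't give their formula, so this is only a
| hypothetical sketch of one way to blend the two inputs: commit
| volume on a log scale plus an exponential recency decay, with a
| weight that can be turned down for projects that are arguably
| "done". The 10,000-commit saturation point and one-year half-life
| are made-up defaults.

```python
import math
from datetime import date


def repo_score(commit_count: int, last_commit: date, today: date,
               recency_weight: float = 0.5,
               half_life_days: float = 365.0) -> float:
    """Blend commit volume and recency into a score between 0 and 1.

    Illustrative formula only: volume is log-scaled and saturates around
    10,000 commits; recency halves every `half_life_days`. Pass a
    recency_weight near 0 for projects that are arguably "done".
    """
    volume = min(math.log10(commit_count + 1) / 4.0, 1.0)
    age_days = max((today - last_commit).days, 0)
    recency = 0.5 ** (age_days / half_life_days)
    return (1 - recency_weight) * volume + recency_weight * recency


today = date(2024, 12, 29)
active = repo_score(5000, date(2024, 12, 1), today)
stale = repo_score(5000, date(2019, 12, 1), today)
print(f"active={active:.2f} stale={stale:.2f}")
```

| With recency_weight=0.0 both repos score the same, which matches
| the "done project" case where the last commit date shouldn't
| matter.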
| michaelmior wrote:
| I think "interesting to this number of people" is not a
| meaningless metric, but I would agree on the two other
| metrics you cite.
| ryandrake wrote:
| There is a big difference between "highest quality" and
| "most popular." Online services constantly confuse the
| two because it's easier to measure popularity.
| LtWorf wrote:
| Except that most people don't bother starring stuff, so the
| few who do are drowned by noise of fake stars.
| ghxst wrote:
| I sort by most amount of stars quite frequently when I am
| learning a new language and want to know what the most
| popular package is for something. What do you think would be
| a better metric for a use case like that?
| arccy wrote:
| number of actual imports in code
| flippyhead wrote:
| CodeRank(tm)!
| nejsjsjsbsb wrote:
| This might work, but it biases against languages whose
| package managers are not used in the rank, as well as
| code that is used a lot but not referenced directly
| from code, e.g. drop-in DLLs.
| james_marks wrote:
| Goodhart's law: this would just cause imports in junk
| repos.
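| A minimal sketch of the "number of actual imports" idea, assuming
| you already have the Python source files of a set of repos in
| memory (the repo names below are hypothetical). It counts distinct
| repos that import a module rather than raw import statements, so
| one repo can't inflate the number on its own; as the replies note,
| it is Python-only, misses drop-in binaries, and junk repos could
| still game it.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Top-level module names imported by one Python source file."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])  # relative imports are skipped
    return names


def import_rank(repos: dict[str, list[str]]) -> dict[str, int]:
    """Map module name -> number of distinct repos that import it."""
    counts: dict[str, int] = {}
    for files in repos.values():
        used: set[str] = set()
        for src in files:
            used |= imported_modules(src)
        for module in used:
            counts[module] = counts.get(module, 0) + 1
    return counts


repos = {
    "repo_a": ["import requests\nfrom requests.adapters import HTTPAdapter\n"],
    "repo_b": ["import numpy as np\nimport requests\n", "from . import utils\n"],
}
print(import_rank(repos))
```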
| michaelmior wrote:
| I think it's a decent metric. I agree with the other
| comment that actual imports is probably a better metric,
| but that's not always as trivial to find.
|
| That said, the package repositories for many popular
| languages list stats of either declared dependencies or
| package downloads, which helps.
| LtWorf wrote:
| rdeps are completely broken in github. I wrote a library
| that I have used in other projects of my own and it was
| always at 0 users.
|
| Anyway if stuff is used by proprietary stuff it will also
| sit at 0.
|
| I now moved to codeberg where there is less spam,
| although it does have stars
| burnte wrote:
| Don't count any of my stars then, I thought it was a
| bookmark feature. Every repo I've starred is only starred
| to find later, not an endorsement from me.
| LtWorf wrote:
| VC apparently.
| pan69 wrote:
| A similar thing happens on npmjs.com where it shows downloads
| for packages, which is often used as a metric of quality.
| However, everytime a build pipeline runs and it pulls the
| package, that's a download.
| attentionmech wrote:
| Maybe with these rules:
|
| - Count only one clone per user account
|
| - Don't count anonymous clones
|
| But I agree this isn't without issues either.
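| A minimal sketch of those two clone-counting rules, assuming a
| hypothetical event log of (repo, username) pairs where the
| username is None for an anonymous clone. A CI bot that re-clones
| on every build still counts once, which addresses the build
| pipeline problem raised above.

```python
from collections import defaultdict
from typing import Optional


def unique_clone_counts(events: list[tuple[str, Optional[str]]]) -> dict[str, int]:
    """Per-repo clone counts: one per account, anonymous clones ignored."""
    cloners: dict[str, set[str]] = defaultdict(set)
    for repo, user in events:
        if user is not None:          # rule 2: skip anonymous clones
            cloners[repo].add(user)   # rule 1: a set keeps one clone per account
    return {repo: len(users) for repo, users in cloners.items()}


# Hypothetical log: the CI bot clones twice, plus one anonymous clone.
events = [
    ("a/lib", "ci-bot"),
    ("a/lib", "ci-bot"),
    ("a/lib", None),
    ("a/lib", "dev1"),
    ("b/tool", None),
]
print(unique_clone_counts(events))
```

| Repos that were only ever cloned anonymously (b/tool here) simply
| don't appear in the result.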
| michaelmior wrote:
| I don't think it's a useless metric and it's one I use
| myself, but it can also be gamed pretty easily. So the more
| people making decisions based on downloads, the higher the
| likelihood of bots generating downloads just to juice the
| stats.
| LtWorf wrote:
| And if your users know about "a cache", you won't get
| downloads. So it's more beneficial if your users are the kind
| of noobs who redownload all the crap every single time rather
| than having fast CI.
| Lerc wrote:
| The weird thing is I forked Free Pascal to add support for the
| architecture of a VM I had written. It wasn't really useful to
| anyone else, but every now and again it earns a star from a
| random passerby.
| attentionmech wrote:
| Now I'm curious too. Can you share the fork? I want to see
| what you added there and how it was added.
| GZGavinZhao wrote:
| *sad noises from NixOS/nixpkgs, llvm/llvm-project, and all
| other repos with an absurd commit log/branches that takes ages
| to do a full clone
|
| (just a joke that immediately came to mind, not intended to
| undermine OP's idea)
| attentionmech wrote:
| Defaulting the CLI to a shallow clone (git clone --depth 1)
| could be one option here.
| simoncion wrote:
| > (it's like proof of work, it needs compute to clone a repo)
|
| It's github's compute, so why do I (the person who's cloning
| the repo) care about the compute? I don't pay for it!
| david_allison wrote:
| I suspect GP is referring to counting the occurrences of `git
| clone` [on a fork?], rather than counting forks via the
| GitHub UI
| TZubiri wrote:
| That is absolutely the wrong takeaway. The correct takeaway is
| that supply chain attacks and spam are real threats, and that
| these metrics can be gamed by malicious actors.
|
| The work in cloning a repo is negligible, and the requirement
| of work is not a security design guarantee in GitHub. The
| actual cost of liking projects is network: malicious actors
| need to create fake accounts and burn IP addresses and IP
| blocks in the process. Whether you are cloning or liking is
| just the last mile.
|
| To me the takeaway is not to trust a project based on its
| GitHub metrics, and by extension not to trust projects just
| because they are linked and liked on Hacker News, for example,
| and to be wary of how I introduce dependencies into my
| projects.
|
| Not just because of strictly malicious dependencies, but also
| because of trash dependencies that don't add value.
| james_marks wrote:
| > because of trash dependencies that don't add value
|
| And at best, will still need maintenance in the future. One
| of the top lessons I preach to juniors.
| galangalalgol wrote:
| I like the idea of granular permissions for libraries. When
| you include a dependency, you whitelist the permissions it
| gets. Package managers could automate this if the language
| supports it. Making it about permissions instead of metrics
| makes it not arbitrary: this library gets no filesystem
| access, that one gets no network access, this one runs
| build-time system commands... Austral is the only language I
| know of that supports such a thing. While it might be
| possible to bolt it onto Rust, I think it would take so much
| rework as to be infeasible.
| yieldcrv wrote:
| If a supply chain attack is susceptible to that, it's purely
| the fault of the crowd that relies on those metrics.
| ATechGuy wrote:
| For all speculation around supply chain attacks with fake
| Github stars, the article says:
|
| "our study does not find any evidence of fake stars being
| used for social engineering attacks"
| robinsonb5 wrote:
| The weird thing is I've seen enough forks that have never seen
| any development that I'm pretty sure some people are using
| those as bookmarks rather than stars!
| neom wrote:
| I'm not a SWE but I use github still, I thought stars ARE
| bookmarks, what are stars then???? They're not for
| bookmarking????
| diego_sandoval wrote:
| I would think most people use it for bookmarking, but it
| seems like another portion of users use it as a "like"
| button.
| notpushkin wrote:
| It is kinda both. It also reposts the project for your
| GitHub followers.
| attentionmech wrote:
| They are currency of reputation and status. If you have
| enough stars, you get invited to private parties with
| elites. (I am just joking, they are bookmarks who got
| famous)
| LtWorf wrote:
| Nobody knows but since we are at the point where you can
| get VC money if you have enough, there is an incentive to
| get them.
| Terr_ wrote:
| AFAIK the "fork" option also helps guard against the original
| project getting deleted or somehow moved.
| datadrivenangel wrote:
| And forks on github have some bad ergonomics! Weird places
| where the upstream project still has control/influence over
| your fork. A full clone is better if you actually want
| control over the code fork.
| kube-system wrote:
| I would imagine those figures would mostly indicate which
| projects are most likely to be used in scripts or CI pipelines.
| burnte wrote:
| > They might as well just mark it as "Bookmarked" instead of
| "Starred".
|
| This is how I always interpreted the star feature and have used
| it as a bookmarking feature. I didn't know it was more akin to
| a like button!
| Suppafly wrote:
| > For me, starring a repo is like bookmarking it, nothing else.
|
| Literally all I ever use the stars for, I don't know what they
| are 'supposed' to be used for if not that.
| WA wrote:
| For me, it's "bookmarking obscure stuff". Why would I bookmark,
| say, React? I can find this easily. I only star stuff that has
| few stars and isn't as easy to find later.
| lprd wrote:
| Do we need that type of metric anyways? Surely there are better
| ways to measure a repo's activity...
| topspin wrote:
| It seems like a conceptually simple problem to grade a repo
| given the vast number of metrics available, especially
| considering the advanced code analysis tools available today. I
| want a top-level analysis of some sort, based on: usage by
| other software (if applicable), activity, issue frequency and
| resolution, derivatives (forks, etc.), number of participants,
| code maturity, code testing, release frequency, license
| structure, and many other parameters.
|
| There is an opportunity here for a third party to do this well.
| ocean_moist wrote:
| The GitHub social media features are so weird. I get around 10
| follow requests per week from random people who follow >2k
| people. Something off is happening there.
| mattbruv wrote:
| I have the same thing happen to me often. Sometimes I get a
| notification on my GitHub homepage that someone followed me a
| day or so ago, and when I click to view their profile it seems
| that they have already unfollowed me. For example, this guy did
| it, and he has 6K+ followers and is only following ~200:
| https://github.com/NobleMajo. It seems weird that he would
| follow me only to unfollow me right away. I have a feeling that
| these accounts do this intentionally to harvest followers by
| prompting GitHub to show a ton of different people that he is
| following them, so that they follow back in exchange. I
| think most people will follow back someone who follows them
| without really thinking about it. In my case I investigated who
| it was who followed me and realized he isn't actually following
| me and is probably harvesting followers. Why would someone
| waste time out of their life to do this? Who knows. They probably
| want to feel special or stand out from other people without
| doing anything to earn it.
| medv wrote:
| This means 4.5M fake accounts. GitHub does a good job of
| detecting bots, but room for improvement still exists.
| elashri wrote:
| That's not what the paper said. The numbers are much lower
| because not all stars come from unique accounts.
|
| > In total, StarScout identified 4.53 million fake stars across
| 22,915 repositories (before the postprocessing step designed to
| remove spurious ones), created by 1.32 million accounts; among
| these stars, 0.95 million are identified with the low activity
| signature and 3.58 million are identified by the clustering
| signature. In the postprocessing step, StarScout further
| identified 15,835 repositories with fake star campaigns
| (corresponding to 3.1 million fake stars coming from 278k
| accounts).
| ashvardanian wrote:
| Not surprising at all, honestly. The incentive to farm stars is
| massive. According to the article, 10K stars can cost just $1K,
| whereas achieving those numbers organically often takes years of
| work, millions in R&D, and countless deployments. When this
| seemingly trivial metric becomes a key factor in unlocking
| capital from VCs, it's no wonder people resort to shortcuts. In a
| way, the real surprise is that not everyone is buying stars.
| halamadrid wrote:
| Another interesting way - and I personally think it's fraudulent.
| This is how it goes - run hackathons or sponsor events at
| universities. There are a ton of colleges who are constantly
| seeking support to run events.
|
| Some companies take advantage of this by asking for stars in
| return for sponsorship. I have seen proposals that say for a $2000
| sponsorship - 2000 stars guaranteed. The way it works is if a
| participant registers in the event they also have to show proof
| that they starred a specific repo that belongs to the company.
| simoncion wrote:
| IMO, Github stars and number of "forks" are just as good a metric
| as "number of daily downloads" of a library or Docker image or
| similar.
|
| After noticing how many, many companies run many, many builds
| through their CI systems and (for a variety of reasons) end up
| re-downloading everything those builds require, regardless of
| whether or not it has changed since the last time they ran the
| build, I've come to the firm conclusion that these metrics are
| just plain bad if one uses them as a basis to make any
| significant decision.
| semiinfinitely wrote:
| sometimes I star my own GitHub repos, does that count as fake?
| bdangubic wrote:
| it doesn't if you really like it :)
| openrisk wrote:
| If you were wondering about fake forks, spoiler alert:
|
| > accounts in Cluster 1 come from merchants that only sell stars,
| while accounts in Cluster 2 come from merchants selling stars and
| forks simultaneously
| johncoltrane wrote:
| "$PROJECT was bookmarked 666 times with GitHub's internal
| bookmarking mechanism" doesn't say much about a project.
|
| The fact that so many people give those bookmarks so much value
| that an entire ecosystem was built around "fake" bookmarks is
| mind-boggling.
| gitgud wrote:
| GitHub Stars are just one of many signals that describe the
| quality of a project.
|
| If a project has 10,000 stars but 1 commit and a terrible
| README... then the star count doesn't have as much weight...
|
| You can't trust any signal in isolation (like star count), but
| looking at many signals together is quite reliable.
| ivanjermakov wrote:
| In my experience, open/closed issues ratio is much more important
| than star count.
|
| Star count reflects how interested people are in a project; it
| does not signify much about its quality. I would not star the
| repo of a tool I use every day, but would star some obscure
| project to try it out later.
| Der_Einzige wrote:
| I wrote a whole benchmark which is not only resistant to this,
| but would automatically detect most fake stars!
|
| https://github.com/Hellisotherpeople/Bright
| casenmgreen wrote:
| I'm rather surprised it's only 4.5m.
___________________________________________________________________
(page generated 2025-01-02 23:00 UTC)