[HN Gopher] Tracking the Fake GitHub Star Black Market
___________________________________________________________________
Tracking the Fake GitHub Star Black Market
Author : kaeruct
Score : 445 points
Date : 2023-03-18 07:42 UTC (15 hours ago)
(HTM) web link (dagster.io)
(TXT) w3m dump (dagster.io)
| amsterdorn wrote:
| GitHub is fully aware of these, would they consider something
| like a "confirmed" star count that subtracts the suspicious/fake
| number? Or is that too much of a slippery slope.
| mapmeld wrote:
| GitHub gradually removes these users as they catch up to them,
| so not helpful to have extra steps. I have a couple of repos
| which were briefly popular, so when a new user stars it today,
| and I see 1000s of other stars, it's suspicious and I get a
| peek into their world.
|
| There are obvious numeric usernames, but also fake orgs with
| repos for the users to fork and interact with, and a few
| account takeovers (i.e. someone had signed up for GitHub in
| 2015 to make a free wedding website, abandoned it, and the
| account fell into spammer hands). These used to be easier to
| report.
| Azadzadeh wrote:
| >GitHub gradually removes these users as they catch up to
| them
|
| With collaterals too I presume [1]. I guess I've been the
| victim of some automated system. They have banned my account
| without warning or explanation and they've been ignoring my
| support tickets for about 2 months!
|
| [1]: https://news.ycombinator.com/item?id=34817163
| penguin_booze wrote:
| My ex-employer used Github stars in their job description and
| during recruitment pitches. They regularly encouraged employees
| to go and star the firm's repos in Github. In all-hands meetings,
| the Github stars were one of the items they reported: "we've
| surpassed X in Github stars" (applause).
|
| (The firm X, however, is a more well-known name than my ex-
| employer was).
|
| A while ago, I listened to a Freakonomics episode where it was
| discussed that businesses use proxies to both boost their image
| and to cover up their incompetency. The example was that a lot of
| businesses chose fancy names starting with A (like, AAA
| plumbers), so that they get listed first in business directories.
| These firms were later proven to be very incompetent and/or even
| fraudulent.
|
| The relevant paper, also cited in the episode, was "A Business by
| Any Other Name":
| https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1667550.
| moneywoes wrote:
| Podcast episode name please
| shagie wrote:
| Not sure if this is it, but 552. Is Google Getting Worse has
| the 'AAA Plumbers' in it.
|
| https://freakonomics.com/podcast/is-google-getting-worse/
| malshe wrote:
| Small correction, the episode is 522
| lobocinza wrote:
| The tech version of this is SaaS companies advertising on
| Reddit.
| karakanb wrote:
| do you mind elaborating on this? I am using Reddit to
| advertise some of my projects because it seems like a
| relevant crowd to advertise to, but I am curious to hear how
| it would be perceived.
| goodoldneon wrote:
| They were incompetent because they didn't have enough As. I
| exclusively use AAAAAAAAAAAA Plumbers
| photochemsyn wrote:
| Apparently seven is the sweet spot for visual recognition at
| a glance, so I'd go with AAAAAAA Plumbers instead.
| siva7 wrote:
| Is there even such a thing as a github influencer (people living
| just from github)?
| blitzar wrote:
| I am going to start posting linkedin influencer style "content" on
| my github for clout.
| Hackbraten wrote:
| Twenty pull requests every morning. That's my plan for 2024.
| hoofhearted wrote:
| Taylor Otwell lol.. He has some pretty dope cars in his garage
| and is doing well.
|
| I follow him on GitHub, and pay for some of his products. I
| have been heavily influenced by his coding styles, and the
| tools he uses. His code just looks so tight and perfect. He
| writes his stuff so open ended and reusable that he basically
| writes a method once, and then reuses it across numerous
| projects.
|
| Look at this tight code:
| https://github.com/laravel/framework/blob/10.x/src/Illuminat...
|
| I'd say that Adam Wathan is rapidly growing his influence as
| well, and is probably doing alright too.
| jorgesborges wrote:
| The multiple-line comment styling is so pleasingly
| pathological -- each descending line has a few characters
| less than the last.
| supriyo-biswas wrote:
| People working in DevRel often aggregate developer oriented
| content and gain popularity that way; an example would be
| "swyx". I'm not taking a dump on his work, but you
| can see the Github influencer effect over there.
| wodenokoto wrote:
| Never heard of swyx.
|
| Self proclaimed GitHub star. But still only 5000 followers
| and projects max out at 8000 stars.
|
| I don't know what I had expected but I think it was bigger
| numbers than that.
|
| https://github.com/sw-yx
| delusional wrote:
| The "github star" claim links to the source (it's some
| github program where you can nominate people to be accepted
| into some promotion campaign). Saying self proclaimed makes
| him sound pretentious, it's actually awarded by github.
| latexr wrote:
| You can be factual and still sound pretentious and
| cringey. Like the medical doctors who insist on being
| called "doctor", to the point of smugly "correcting"
| strangers in a social setting.
|
| I don't know this user and won't assume his intentions,
| but I can see how having "I'm a GitHub star [star emoji]"
| as the first sentence on the profile is doing him a
| disservice: it makes it seem like it's the most
| impressive thing he's achieved and diminishes everything
| else.
| swyx wrote:
| fixed. i wrote that when i was still trying to be
| approachable and cutesy. now i dont need it lol.
| tylerhannan wrote:
| I love the edit in GH. So much.
|
| Thank you for the work you do and for how much you have
| contributed to people learning over the years. <3
| newmac wrote:
| FWIW a smug doctor usually corrects people that they are
| a Physician.
| latexr wrote:
| Maybe in English, but in my native tongue there's no word
| for physician.
|
| Also, I meant in the sense that you call someone "mister
| McSmug" and they reply almost angrily with " _doctor_
| McSmug".
| adamgordonbell wrote:
| swyx is on hn and legit great writer. He's influenced my
| thinking in many areas.
|
| I've never seen his github account before but I expect that
| people following him there are doing so because of the
| content he's putting out. His blog has been on the HN
| Frontpage many times and he has a book about developer career
| building.
|
| My github account isn't as pimped out as his, but marketing
| yourself isn't toxic, it's smart.
| swyx wrote:
| love and appreciate your work as well adam (everyone
| check out Corecursive https://hn.algolia.com/?dateRange=a
| ll&page=0&prefix=true&que... )
|
| i honestly dont even view my github readme as "marketing
| yourself". most pple dont even go to an individual's
| profile in the first place, but if you do its kinda like
| a cute little myspace thing where you can let people know
| you as a human being and be a little quirky. i certainly
| dont hold myself out as an authority on writing the best
| software in the world and hey if 40k stars on the react-
| typescript stuff doesnt count i'm alright with that
| tbragin wrote:
| Agreed that marketing yourself is not toxic. I follow
| "swyx" on Twitter and find his insight valuable, and so
| do a lot of my peers. Btw, looks like his Github profile
| has not been updated for some time - he's no longer Head
| of DX at Airbyte and is now an independent consultant.
| https://www.swyx.io/about
| swyx wrote:
| appreciate it but also whoa this literally just happened
| and its freaky how up to date you are. consulting is
| temporary (check out https://www.trychroma.com/ if you
| are exploring LangChain/OpenAI apps and need an
| embeddings database) and i'm working on an ai infra
| startup idea on the side with a couple cofounders.
| tbragin wrote:
| Congrats! I'll be watching :)
| rozenmd wrote:
| I didn't even know Shawn had a popular GitHub, though he has
| written about the meta-creator ceiling before:
| https://www.swyx.io/meta-creator-ceiling
| swyx wrote:
| yeah i also am surprised that people use the follow feature
| for my work even tho i dont run a popular oss project.
|
| well idk what "github influencer" even means but fwiw i am
| not "people living just from github". ive never taken a
| dime of github sponsor money. as far as github is concerned
| i just put my stuff up for free and the github stars
| program gets me an early look into new features so i can
| give them feedback. (eg i helped with Hey GitHub before the
| big launch at GH Universe).
|
| obviously i'll happily ambassador github to anyone who will
| listen but who isnt already on github here
| pictur wrote:
| There are very few people who work like this and are non-
| toxic.
| azu wrote:
| https://press.stripe.com/working-in-public
|
| The book presents similar stories.
| bombolo wrote:
| I guess the purpose is to find a job as evangelist and similar.
| reidjs wrote:
| I have heard of people getting interviews from their GitHub
| profile.
| justinclift wrote:
| Yeah. Several years ago extremely clueless recruiters used to
| email people heaps. Lots of people were complaining about
| getting tonnes of spam from them. :(
|
| Had to change my Location (or some similar obvious field) in
| my GitHub profile to "Recruiters FUCK OFF" before they took
| the hint. ;)
|
| Thankfully, GitHub introduced some other way to signal if you
| are/aren't interested in getting a job (toggle switch?) not
| long after, which seemed to work.
| ccouzens wrote:
| I got my current job through GitHub.
|
| At least that's how the 3rd party recruiter told me he found
| me. It's possible he was lying and thought it would impress
| me (it did).
|
| My profile is more active than most, but very far from
| rockstar.
| PragmaticPulp wrote:
| I've seen a number of resumes where people convey the
| popularity of their personal projects by number of stars or
| number of downloads.
| version_five wrote:
| I think it would be tough (a good thing) because how often do
| people go to someone's root github page, even if they have a
| good repo? Not to say it never happens, but github is really
| about the repo, not the person (again a good thing) so it would
| be harder for an individual to become "influential". Hopefully
| nobody gets any ideas.
| deefour wrote:
| There are plenty of people making a living from donations to
| their open source contributions.
|
| It seems odd to title them influencers based on that.
| ziml77 wrote:
| I'm surprised that Github stars are valuable enough to buy.
| Personally I never look at the star count because even if they
| were legit, they don't really tell me anything more useful than I
| get from looking at other things in the repo.
|
| I tend to check the age difference between the earliest and
| latest commits because that lets me be sure it's not a project
| that someone spent a couple weeks coding up, dropped on github,
| and then forgot about. I'll also check the issues on there. I'm
| looking for more closed issues than open ones, but I'll also
| quickly scan over them to get a rough idea of how many are truly
| meaningful issues. I also get signals from the readme and docs.
| It's not a hard pass if there's issues with those, but it's
| certainly helpful to my opinion if they exist and are both clear
| and detailed.
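
The two checks described above (how long a project has been receiving commits, and how closed issues compare to open ones) can be sketched as a small function. This is only an illustration: the function name `repo_signals`, its inputs, and the sample numbers are assumptions, and fetching the real commit dates and issue counts from GitHub is left out.

```python
from datetime import datetime


def repo_signals(commit_dates, open_issues, closed_issues):
    """Return the commit span in days and the share of issues that are closed.

    Hypothetical helper: the caller supplies the commit timestamps and
    issue counts, however they were obtained.
    """
    span_days = (max(commit_dates) - min(commit_dates)).days
    total = open_issues + closed_issues
    # With no issues at all there is no meaningful share to report.
    closed_share = closed_issues / total if total else None
    return {"span_days": span_days, "closed_share": closed_share}


# Made-up sample data, not taken from any real repository.
signals = repo_signals(
    [datetime(2021, 1, 10), datetime(2021, 6, 1), datetime(2023, 3, 1)],
    open_issues=40,
    closed_issues=160,
)
print(signals)
```

A long span and a high closed share are only rough hints. As the replies below point out, bulk-closed "stale" issues can inflate the closed share, so the numbers still need a quick manual scan of the issue list.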
| renewiltord wrote:
| Displaying stars to represent traction in open source was a
| pitch deck phenomenon that was highly effective fitting the
| ZIRP.
| cdiamand wrote:
| I find stars helpful when I'm evaluating several different
| repos to choose a particular tool for a job.
|
| If one of the repos has many more stars, I weigh that strongly
| when choosing. Freshness of commits is definitely important,
| but for me the fact that many other people starred the repo
| shows that there are eyeballs and activity.
| [deleted]
| imadj wrote:
| Closed issues dont mean anything though... a lot of maintainers
| bulk close hundreds of issues as "nofix", "no activity after 3
| months", and so on. Just sweeping them under the rug. And many
| of them pride themselves with the 0 opened issues like it means
| something. Any software in the world can have 0 issues if they
| played this game.
|
| So unless you are really well versed in the project and spent
| some time following it, stars actually might be a better
| indicator of the project quality and reputation.
| bakugo wrote:
| > a lot of maintainers bulk close hundreds of issues as
| "nofix", "no activity after 3 months", and so on
|
| God, I hate this. Every time I have an issue with something,
| look it up on the issue tracker and find the exact issue I'm
| having autoclosed as "stale" by a fucking bot because the
| author didn't reply "this is still an issue" once every 24
| hours, it instantly makes my blood boil and I avoid using the
| software in question as much as possible in the future.
| Nothing screams "I care more about github numbers than my
| users or the quality of my software" more than this.
| version_five wrote:
| I'll admit I've used them. In particular, I've used
| paperswithcode to find implementations of ML models. There are
| often a number of implementations of the same model, and the
| quality is highly variable. I've used stars (which
| paperswithcode displays) as a pre-screen. Spoiler alert, the
| highest starred implementations are not always the best. But it
| still helps to triage, as a proxy for how well used it is
| Takennickname wrote:
| You are likely not important enough to scam. The first people I
| can imagine this being shown to are VCs in pitch decks who are
| only going to see this on a powerpoint and not actually on
| github. Very unlikely the VC will check github itself to verify
| the number, and if they do, even less likely they'll verify
| that the stars are real.
|
| You're the kind that checks everything. Even if you had
| something valuable, a scammer wouldn't waste their time with
| you when there are easier fish to bait.
| varunjain99 wrote:
| Metrics based on issues / commit activity are certainly higher
| fidelity.
|
| As you indicate though, they require more effort to adjudicate.
| Are issues from core team members? Are commits meaningful? Is
| community activity meaningful? I wish GitHub would allow
| us to parse things like this more easily.
|
| My use of star count is generally a binary indicator. 1k+ is
| probably a legit project and below is probably still early.
| Beyond that, it's probably too noisy.
| A4ET8a8uTh0 wrote:
| Interesting, I just use them to keep track of interesting
| projects ( edit: not the number of stars as a proxy; stars is
| basically my bookmark ). People treat them as internet points?
| ChancyChance wrote:
| > dropped on github, and then forgot about.
|
| I really wish GitHub would have some sort of flag for "stale"
| projects. I use your methods too (issues, dates, etc.), and I'm
| usually disappointed when search results bring up ghost
| projects. However, in a few instances, I found a project that
| was similar to an issue I was working on that went one step
| beyond where I was, and even though it was a ghost project, it
| helped. But in general, these projects don't help. I'm also
| disappointed that I'm thinking, "Hmmm, maybe LLMs can help..."
| dylan604 wrote:
| Why is stale a bad thing? It could be something that was
| created to serve a purpose, developed to the point that it
| was feature complete for that purpose, and now requires no
| more development yet continues to do its purpose without
| modifications.
|
| It's almost like you are thinking of it as an expiration date
| and the software has spoiled.
| javajosh wrote:
| Stale _is_ bad. Asymptotically approaching stale is great.
| Cthulhu_ wrote:
| But in that case it should have a note saying it's finished
| or in maintenance mode (e.g.
| https://github.com/sirupsen/logrus); include references to
| replacements, offer paid support if you really need it or
| still use it, keep an eye on issues, and update
| dependencies.
|
| Else, ask for a new maintainer. While code can be
| considered done (especially if no new features are added),
| it should never go unmaintained. If it's actually used a
| lot of course.
| nine_k wrote:
| "Stale" and "done" are different states. Stale is when bugs
| are known but not fixed, dependencies old and unsupported,
| build instructions do not work any more on modern versions
| of OSes and other environments.
| dylan604 wrote:
| i think you're leaving out the state of "good enough"
| datadeft wrote:
| Because many languages have breaking changes in the
| interpreter. For example it is almost impossible to review
| old Python projects you have to change so much, it is
| easier to rewrite in many cases.
|
| Rust and other compiled languages that have backward and
| forward compatibility in mind do much better.
| dasil003 wrote:
| All software is subject to shifting environments over time
| that will eventually render it obsolete. How fast this
| happens really depends on the ecosystem--it's a function of
| the abstraction level and context in which it runs. C or Go
| code that compiles to a standalone binary will be less
| susceptible to this, higher level Ruby or Node code that
| depends on a lot of peer libraries moving in lockstep will
| be more susceptible. Newer languages that have some notion
| of backwards compatibility baked into their charter like
| Elixir or Rust are somewhere in between.
| dylan604 wrote:
| well, the original dev did release the code as open
| source. you are free to take their lead and continue on
| with modifications in your own source or even as a fork
| if you feel so strongly about it needing to be maintained
| to that level.
| UncleEntity wrote:
| I have one project on GitHub that I use all the time as part
| of a script and only push changes when the python API breaks
| it. It is essentially "finished" and usually just needs a
| quick compile against the new python version whenever I
| upgrade the distro. I haven't even had to touch for at least
| as long as GitHub required ssh keys so by all accounts this
| would be an abandoned project.
|
| Now that I think about it -- it is a python wrapper around a
| boost library and neither of those have made backwards
| incompatible changes in a long time which is quite
| suspicious.
| j1elo wrote:
| Boost libs circa Ubuntu (14 or 16.04) had JSON parser that
| allowed comments, while the newer Boost in Ubuntu 20.04
| (and I think already in 18.04) had "updated" it and then it
| didn't allow comments any more.
|
| Just a small anecdote of Boost changing behavior that broke
| some of my stuff.
| loeg wrote:
| I mean based on the number of repos they identified buying
| stars and prices advertised, the revenue just doesn't make
| sense. The sellers have made like, hundreds of dollars at most.
| How much effort have they invested for this meager return?
| badrabbit wrote:
| I didn't know people used stars to make decisions. For me it is
| more like HN karma points. I use their issue history/pr history
| to get an idea of how good or bad a project is
| groffee wrote:
| [dead]
| lessname wrote:
| How did you find out the name of the company behind GitHub24
| though? If I go to their website I cannot see it, and I cannot even
| find anything if I search the company name.
| gerogerke wrote:
| I was also surprised when I saw it. A GbR is a German
| "Gesellschaft bürgerlichen Rechts" which does not need to be
| formally incorporated and offers no limited liability. The name
| needs to include the names of all partners, so we can deduce it
| is being run by two persons. I am quite surprised they do this
| without liability protection. Upon googling, I found only a
| playlist on YouTube which has this name and contains one
| explainer video about signing up a company with German tax
| authorities. If they are indeed based in Germany, they're
| required to have an Impressum / imprint on their home-page,
| without it, they risk being fined.
| cyberia23424 wrote:
| Perhaps they got it via payment info
| tpoacher wrote:
| I have moved all my repositories to sourcehut. They are generally
| mirrored by a github repository consisting of a single README
| file explaining the new location for the project, and my reasons
| for the migration.
|
| However, given sourcehut eschews the use of such "social metrics"
| (which at some level I agree with the principle behind it, on the
| other hand I do appreciate the value of being able to give
| visibility to good projects) I usually mention in my README that
| "If you like the project and wish to promote it, feel free to
| star this github page".
|
| I'm sure github probably wouldn't like this use-case, but the
| stars would certainly be genuine, even if possibly quite dodgy-
| looking.
| pbronez wrote:
| I'm conflicted about this. Sourcehut, Codeberg, etc are great.
| But having everything I'm looking for on GitHub is extremely
| convenient. I use the "Add to List" function extensively for
| bookmarking.
| tpoacher wrote:
| Yes, this is why I didn't want to migrate without leaving a
| trace on github. The redirecting README on github is a good
| compromise, I think.
|
| Having said that, it may be worth thinking what is the price
| we may be paying as a community for this convenience, btw. MS
| Github is clearly already past the "embrace" phase, and well
| into the "extend" phase.
| wakeupcall wrote:
| I have moved repositories off github, replaced the README with
| a warning and the new location and archived the project.
|
| It's still getting starred...
| leeoniya wrote:
| > It's still getting starred...
|
| clearly you did too good a job on the README
| dylan604 wrote:
| i wonder how many PRs this README receives to fix typos
| Der_Einzige wrote:
| I wrote a tiny tool which calculates the "brightness" score of a
| github repo based on calculating the total star count of the
| people who starred your repo. It will automatically detect these
| kinds of scams (assuming that it's mostly low star bots giving
| the stars).
|
| https://github.com/Hellisotherpeople/Bright
|
| Edit: I love clustering, I really do, but I think that techniques
| like the one I am using are far superior to unsupervised learning
| for trying to detect fake accounts in this context.
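
The "brightness" idea described above (adding up the stars that a repo's stargazers have themselves earned) can be sketched as follows. This is not the code of the linked tool, whose implementation is not shown in the thread: the function name, the dictionary input, and the sample accounts are all assumptions made for illustration.

```python
def brightness(stargazer_star_counts):
    """Sum the stars each stargazer has earned on their own repos.

    `stargazer_star_counts` maps a username to that user's total star
    count; gathering those counts from GitHub is out of scope here.
    """
    return sum(stargazer_star_counts.values())


# Hypothetical sample data: four stargazers per repo in both cases.
organic = {"alice": 120, "bob": 15, "carol": 0, "dave": 340}
suspect = {"user1001": 0, "user1002": 0, "user1003": 1, "user1004": 0}

organic_score = brightness(organic)
suspect_score = brightness(suspect)
print(organic_score, suspect_score)
```

Both sample repos have the same raw star count, but the scores differ widely, which is the signal the comment relies on. It rests on the comment's own assumption that purchased stars come mostly from accounts with few or no stars of their own.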
| JaDogg wrote:
| Just use Show HN & Reddit.
| Ralfp wrote:
| Those never worked for me.
|
| Show HN: there are maybe dozens of those posted everyday but
| they rarely hit main page.
|
| Reddit ad is great to kick off the star growth, but unless you
| have something interesting to many people, don't expect more
| than 50 stars on first day and plateau to a star every few
| days.
|
| Most GH stars I've got was from somebody mentioning my project
| in comment in some heated discussion on HN. So I guess drama
| sells?
| thewizl wrote:
| As a note, GitHub stars are often used in pitch decks for OSS
| startups. VCs seem to care about that, judging from what I've
| seen around.
| bdcravens wrote:
| Sounds like they take it more seriously than Google takes likes on
| Youtube. A competitor had a video that rapidly had over 100k
| likes - but if you looked at the total time played, each view
| averaged to just a couple of seconds on a video over 10 minutes.
| Reported it, but nothing came of it. (No, not something we
| regularly do. I think it may be the only video I've ever
| reported; just want a fair playing field)
| oefrha wrote:
| > if you looked at the total time played, each view averaged to
| just a couple of seconds on a video over 10 minutes.
|
| That makes no sense to me. Speaking as someone who has been
| using YouTube Data API v3 and YouTube Analytics API v2 for many
| years, estimated minutes watched of a video shouldn't be public
| info. So how can you "look at the total time played" on a
| competitor's video?
| bdcravens wrote:
| Been a few years; I don't recall the how. Maybe I'm thinking
| of a different platform?
| dylan604 wrote:
| youtube competitor. that's just funny to me. kind of even comes
| across as petty. you took however much time to investigate
| average viewed time of a competitor and then cried to daddy
| about the perceived slight in "advantage" instead of taking
| that time to improve your competing product to make it better.
| UncleEntity wrote:
| Umm...
|
| I think you have it backwards, the other video was using fake
| likes to avoid having to improve their quality to get an
| equal number of eyeballs.
| bdcravens wrote:
| Truth rises to the occasion. All these years later, they're
| sitting at 2.3 stars on Google, even though they charge
| less, and we are sitting on 5 :-)
| bdcravens wrote:
| No, we had someone show up out the blue, with no established
| presence in the space, with a video with hundreds of
| thousands of views. Was curious how they were so viral so
| fast.
|
| Overall, it's bad for everyone if someone can create
| fraudulent views: us, other companies, and most importantly,
| consumers.
|
| > taking that time to improve your competing product to make
| it better.
|
| Took less than 3 minutes to do the math and send the report.
| I'm a fast developer, but I can't improve our product that
| fast :-)
| Kalanos wrote:
| do streamlit
| newmac wrote:
| It is worth noting that it is trivial to buy fake stars for a
| project you are not affiliated with. The reason someone might do
| this would be to "test" the purchasing of fake stars without
| risking contaminating their own project.
| nvr219 wrote:
| I once bought a friend of mine a thousand Twitter followers as
| a prank. He wasn't happy.
| moneywoes wrote:
| Where did you purchase that?
| nvr219 wrote:
| I wanna say I got it through Fiverr? This was like 8-10
| years ago. I don't remember exactly.
| i_am_toaster wrote:
| As he should be, that wasn't a well thought through prank.
| dr_petes wrote:
| Am I missing something, that seems like a decent prank.
| It's harmless.
| xwdv wrote:
| No, his friend's account is flagged as a spammer now and
| gets less visibility.
| Xeoncross wrote:
| Rabbit trail: I accidentally right-clicked on their home icon and
| it brought up their branding page with license agreements for
| their IP. Really neat idea.
| toastal wrote:
| Maybe our code forges don't need to be social media platforms.
| These 'stars' have pretty dubious value and rarely correlate with
| code quality or importance (core libraries generally have less
| attention than apps or tools). There's also a heavy language skew
| where JavaScript and Python libraries & programs get way more
| thumbs-ups even when they're technically not any better than
| alternatives.
| coolsank wrote:
| Is it just me or the fact that Dagster has one of their
| competitors Mage.ai listed here as a repo with around 15% of fake
| stars seems like an odd coincidence?
| janalsncm wrote:
| If you're going to accuse a competitor of fraud, writing a blog
| post showing your work seems like the most safe way to do it.
| People lie with statistics all the time of course.
| TheDong wrote:
| I mean, they explain it at the top:
|
| > we track our own GitHub star count along with that of other
| projects. So when we spotted some new open-source projects
| suddenly racking up hundreds of stars a week, we were
| impressed. In some cases, it looked a bit too good to be true,
| and the patterns seemed off
|
| If their competitor has fake-looking star counts, I'd expect
| them to be the ones best equipped and most likely to suspect
| it.
| bart_spoon wrote:
| It's possible that was the impetus of the blog post. Maybe they
| suspect Mage.ai of astroturfing GitHub stars and investigate it
| as above. They then publish a blog post that:
|
| 1. Indicates the astroturfing without actually specifically
| calling them out
| 2. Does so in a way where others can verify their work and use
| it on other repos
| 3. Uses their product to do so
|
| Seems pretty brilliant to me.
| speedgoose wrote:
| They don't mention what I think is their biggest competitor:
| Prefect.
| frasermarlow wrote:
| [Blogpost author here] We ran the numbers for Prefect and
| several other repos in our space and they came out clean. As
| we note in the article, while some repos game the system,
| from what we can tell the number of abusers is actually
| fairly small.
| julienfr112 wrote:
| or they used a more sophisticated star provider ?
| say_it_as_it_is wrote:
| It shouldn't be a surprise. Why are you surprised? Do you often
| pursue random activities irrelevant to your life for dozens of
| hours?
| coolsank wrote:
| Yes I do.
| zeroonetwothree wrote:
| Pretty standard for anyone ND
| erlend_sh wrote:
| Great post, though I was low-key hoping for a top 10 or maybe top
| 100 ranking of most starred juiced-up repos.
| thih9 wrote:
| > In spam detection, we often use heuristics in conjunction with
| machine learning to identify spammers.
|
| Heuristics can only be used to identify suspected spammers. Not
| everyone who behaves like a spammer is a spammer, it could be
| e.g. a random user with privacy settings on, or someone who
| didn't update their bio in a while and it got affected by link
| rot, etc.
|
| Even if a group of low activity accounts stars the same projects,
| it could be that the account owners just discuss these projects
| elsewhere.
| GlumWoodpecker wrote:
| The article notes this, and like any spam detection method, it
| has a degree of false positives, but it seems very low (less
| than a percent according to the article). I'm sure an official
| implementation of this could take more internal, non-public
| factors into account, like IP addresses and clustering of
| account creation times, to make it even more accurate and
| drastically reduce the amount of spam users.
| andreareina wrote:
| The claim I saw in the article is 98% precision. Which
| doesn't actually tell us the predictive value without the
| base rate which seems to be all over the place.
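
One way to read the base-rate point: a detector's hit rate and false-alarm rate can stay fixed while the share of flagged accounts that are really fake changes with how common fakes are among a repo's stargazers. The sketch below applies Bayes' rule. The sensitivity, specificity, and base rates are made-up illustrative values, not figures from the article.

```python
def precision_at_base_rate(sensitivity, specificity, base_rate):
    """Share of flagged accounts that are truly fake (Bayes' rule).

    sensitivity: fraction of fake accounts the detector flags
    specificity: fraction of genuine accounts it leaves alone
    base_rate:   fraction of all stargazers that are fake
    """
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


# Same hypothetical detector, three hypothetical repos.
for base_rate in (0.01, 0.15, 0.50):
    p = precision_at_base_rate(0.90, 0.99, base_rate)
    print(f"base rate {base_rate:.0%}: precision {p:.1%}")
```

With these assumed numbers, fewer than half of the flagged accounts are fake when only 1% of stargazers are fake, while nearly all are when half of them are. That is why a single precision figure measured on one sample does not carry over to repos with a very different mix.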
| [deleted]
| sgammon wrote:
| this shouldn't be posted with links to the actual places to buy
| stars.... that seems like a bad idea?
| lessname wrote:
| Why? You can find these websites anyway if you search for terms
| like "buy github stars"
| dnchdnd wrote:
| only vaguely related - but I've been recently trying out dagster
| and I'm pretty impressed so far. I've run large scale data-
| processing from Hadoop onwards and was expecting the usual
| crumminess whenever you hit an edge case.
|
| Instead I found a system that seems to be thoughtfully designed
| and, crucially, easy to debug.
| PragmaticPulp wrote:
| > And if you enjoy this article, head on over to the Dagster repo
| and give us a real GitHub star!
|
| Kind of ironic that they're using blog articles and social media
| to pander for more stars on their GitHub project.
| pythonguython wrote:
| I wouldn't describe that as ironic.
| debarshri wrote:
| While evaluating an OSS project, a key indicator is community
| activity. Github stars are a weak community activity indicator.
| Firstly, as shown in the article, they can be gamed. Also, starring
| is a very low threshold action, so it does not indicate whether the
| person who starred the project will actually use it.
|
| I think 2 great community activity indicators are Github issues
| and slack/discord/discourse comments. One key thing with github
| issues, in my opinion, is that if the issues are mostly by the
| core team, it is not a great sign. You want a large mix of issues
| from customers or users and not from the team. This is a good
| indicator of whether the project is solving a real problem or not.
| The same goes for the slack comments: they should have both volume
| and freshness.
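
The issue-mix heuristic above can be sketched as the fraction of issues opened by people outside the core team. The function name, the list-of-authors input, and the sample usernames are assumptions for illustration. Deciding who counts as core team is left to the caller.

```python
def external_issue_share(issue_authors, core_team):
    """Fraction of issues opened by accounts outside the core team.

    issue_authors: one username per issue, in any order
    core_team:     set of usernames treated as maintainers
    """
    if not issue_authors:
        return None  # no issues, so no share to report
    external = sum(1 for author in issue_authors if author not in core_team)
    return external / len(issue_authors)


# Hypothetical issue list: two issues from a maintainer, three from users.
authors = ["maintainer1", "user_a", "user_b", "maintainer1", "user_c"]
share = external_issue_share(authors, core_team={"maintainer1"})
print(share)
```

A share close to zero would match the "mostly by the core team" warning sign. What counts as a healthy value is a judgment call that the comment does not quantify.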
| eternalban wrote:
| Pretty sure those who game their repo are motivated by
| investment into associated startup. I think you are right that
| community activity is a high-fidelity indicator and a smart
| investor in OSS startups should definitely not only lurk in the
| community but if possible actually have resources to kick the
| project tires as well.
|
| In a very strange way (but reflective of the economic regime) a
| startup that fakes stars vs a straight-arrow startup that
| doesn't is demonstrating a key element for success in business,
| which seems to require a significant element of bullshitting,
| and outright deceiving. The mantra has been that "grow grow
| grow" is the only guideline for success. Inflating your stars
| is just rookie hour practice for bigger better growth b.s. down
| the line.
| jmclnx wrote:
| I think checking if people donate to a project is a better
| indicator of the value of the project than the stars. I never
| paid attention to stars.
| boxed wrote:
| Donating to yourself would be pretty cheap...
| jmclnx wrote:
| Maybe not as cheap as you may think. I think github takes a
| small cut plus you may need to declare the donation as
| Income on your taxes.
|
| Also if you get "smart" and donate on multiple cards, I
| would think it is a trivial task for GitHub to determine it
| is a scam. The CC address would match your address for the
| funds you receive.
|
| Probably way too much work for this :)
| Kelamir wrote:
| They don't take a fee from what I read about it.
|
| > https://docs.github.com/en/sponsors/sponsoring-open-
| source-c...
|
| > GitHub Sponsors does not charge any fees for
| sponsorships from personal accounts, so 100% of these
| sponsorships go to the sponsored developer or
| organization. The 10% fee for sponsorships from
| organizations is waived during the beta. For more
| information, see "About billing for GitHub Sponsors."
| doodlesdev wrote:
| GitHub sponsors has been out of beta for a long time,
| they take 10% of the donations if the code is under an
| organization which is very common for OSS projects. Of
| course one of the ways to get around it is to sponsor the
| lead developer, which is sometimes available as an
| option. Or just sponsor the developer some other way
| which doesn't go through Microsoft such as Liberapay or
| Opencollective.
| debarshri wrote:
| I don't think you can externally measure how much money is being
| donated to an OSS project, can you?
| debarshri wrote:
| But there are OSS projects that are VC backed. They don't
| take donations.
| asmor wrote:
| That's already a very different deal then, no need to gauge
| repository health, you know there's a good chance of work
| suddenly ceasing.
| debarshri wrote:
| You have a point. I have often seen OSS projects being
| funded on the basis of github stars with no revenue
| whereas all the parameters show that the project health
| is not that great.
| saurik wrote:
| > Yet [GitHub stars] influence serious, high stakes decisions,
| including which projects get used by enterprises, which startups
| get funded, and which companies talented professionals join.
|
| Really? I honestly just don't believe this... if I _were_ to
| believe this, I think I 'd have to conclude the world is just too
| broken to bother rescuing.
| derivagral wrote:
| Activity on other sites related to finance/coding is similar
| (seekingalpha likes, for example) and I've gotten organic
| inbound requests for work periodically scraping such info
| into... Excel.
| rossmohax wrote:
| More than once I've seen the number of stars used as an argument
| to decide whether to pull in a dependency or write our own.
| ZephyrBlu wrote:
| People use flawed but easily consumable metrics to make almost
| all decisions.
|
| It takes a lot more effort to collect multiple metrics along
| different axes, understand the skew/bias of them and make an
| informed decision.
|
| Visibility and ease of consumption are the most important
| aspects of a metric if you want people to use it.
| saurik wrote:
| The list in the article, though, was carefully selected to
| presume competent people doing the decision-making. I totally
| believe many people use that star count for something... but
| an "enterprise"? someone investing a non-trivial amount of
| money? a specifically-"talented professional"? I just find
| that really difficult to believe. I've sold software to
| enterprise, I've worked with a number of venture capital
| funds, and I know a ton of actually-talented professionals...
| I dare say most of them consider GitHub's social features to
| be a joke.
|
| The enterprises I deal with cared almost exclusively about
| stuff like license choices, support contract options, and
| "invoice billing" ;P. The vetting process I've dealt with at
| VCs was intense, having worked both sides of that situation;
| and I know multiple people who have worked data science jobs
| at such firms to try to better select investments. As for a
| "talented professional", I can pretty much guarantee they are
| going to look at your codebase, not the number of stars it
| has, while they evaluate any number of more reasonable things
| to judge an opportunity on (commute, pay, management style,
| etc.). A key property of competent deciders is that they
| aren't using trivial metrics.
| philbo wrote:
| One of my stock interview questions asks people how they
| evaluate 3rd-party dependencies for use in a production
| environment. _So many_ interviewees respond with GitHub stars
| as their main or only criterion. It depresses me every time.
| tasuki wrote:
| It depresses me too, but what else can you do? I check what
| the docs look like, but if I'm to depend on a thing I'd
| rather choose something popular than unpopular. GitHub stars,
| Hackage downloads, StackShare... what else can one check?
| throw_away1525 wrote:
| That's a very interesting question. There are so many things
| you can look at. How is the documentation? Who are the
| primary maintainers? How are they funded? What are their
| motivations? Are the primary maintainers active on Stack
| Overflow, Reddit, Discord, etc...? How many contributors are
| there? How does their Github issues page look? What about the
| Github discussion page? How many maintainers are there total?
| How many downloads per week on NPM (for JS libraries)? From
| all of these things - how long do you expect this library to
| be maintained? And that's just the initial qualification
| research, nothing about how it will impact the actual code-
| base.
|
| What did I miss? What's the best answer you've ever heard?
| How do _you_ evaluate 3rd party dependencies?
| majkinetor wrote:
| Insights -> contributors, and number of active maintainers
| based on entire commit history of the project and frequency
| of commits. Also, network page which shows number of active
| forks. Also, PRs, and how are they handled.
|
| Contributors is the most informative page for me. So many
| projects are basically a one-man show all the time. I don't
| mind that, it means passion, but it also means it can
| disappear at any moment depending on circumstances.
|
| I also look into issue details to see how maintainers
| communicate with community members who do due diligence
| before asking for help.
| jart wrote:
| You missed: look at the actual code.
|
| Stars only mean something because of the people who do.
| They're the ones leading the herd. If you're just going off
| the social signals, then you're just monitoring where the
| herd is going.
| philbo wrote:
| Yep, this one is the headline item for me. Look at the
| code and, if it has further dependencies of its own, look
| at the code for those too.
|
| The main question I'm asking myself while looking at the
| code is: if I had to fork this thing and maintain it
| myself, how would I feel about it? Because sometimes that
| happens.
| GartzenDeHaes wrote:
| I'd add support to that list. When it breaks, can I cut a
| contract and get an expert available to diagnose the
| problem within a few hours? Production outages are not the
| time for self-help and digging around in other people's code
| bases.
| Etheryte wrote:
| You overlooked what I consider to be the first thing you
| should check -- when was the repository last committed to.
| There are countless projects that rank high on every other
| metric, but are essentially abandonware.
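|
| A minimal sketch of that first check, assuming you already have
| the timestamp of the last commit from somewhere. The function
| names and the 365-day cutoff are arbitrary choices made for this
| example, not an established rule.

```python
from datetime import datetime, timezone


def days_since_last_commit(last_commit, now=None):
    """Whole days between the last commit and `now` (defaults to the current time)."""
    now = now or datetime.now(timezone.utc)
    return (now - last_commit).days


def looks_abandoned(last_commit, max_age_days=365, now=None):
    """True if the last commit is older than the (arbitrary) cutoff."""
    return days_since_last_commit(last_commit, now) > max_age_days


# Fixed example dates so the result does not depend on when this is run.
last_commit = datetime(2019, 3, 18, tzinfo=timezone.utc)
today = datetime(2023, 3, 18, tzinfo=timezone.utc)

print(days_since_last_commit(last_commit, now=today))
print(looks_abandoned(last_commit, now=today))
```

| The result is only a flag for a closer look, since a quiet
| repository is not necessarily a dead one.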
| BeFlatXIII wrote:
| However, some language ecosystems are more OK with
| "finished" software than others. It hasn't had a commit
| in 4 years because none were necessary. Needing constant
| updates is a sign the local ecosystem is driven by churn
| over quality.
| Etheryte wrote:
| I don't really think this generalization holds. TeX is
| one of the very few widely used pieces of software that's
| considered complete, more or less everything else is
| either getting updated or superseded by other things.
| mattgreenrocks wrote:
| An NFA library, for example, probably doesn't need to be
| constantly updated.
|
| If you avoid building on something that's constantly
| shifting (the web) then the need to update goes down
| significantly.
| throw_away1525 wrote:
| Yeah good point... definitely something I would have
| checked, forgot to put it in the list. I'm baffled people
| have trouble coming up with more than "number of stars"
| for this.
|
| Of course there can be libraries that are more or less
| "finished", so the last commit/frequency of commits isn't
| on its own a deciding factor, but in proper
| context/holistically it is definitely an important
| metric!
| saurik wrote:
| FWIW, I am not baffled by that, as the vast majority of
| programmers are not "talented professionals" (which is
| the specific category of potential employee I was balking
| at, along with enterprises and venture capital firms). So
| like, you ask your question, they say "star count", and
| you don't have to really continue the interview.
|
| (When I was in high school, I used to work for a pre-
| Internet company that helped people pre-filter interview
| candidates for ads posted in classified sections of
| newspapers and what they did was have questions like this
| that could be asked by people well before they reached
| your calendar for an interview.)
| philbo wrote:
| > How do _you_ evaluate 3rd party dependencies?
|
| I actually blogged my answer to that exact question
| recently (shameless plug):
|
| https://philbooth.me/blog/how-to-evaluate-dependencies
| kaeruct wrote:
| What kind of answer would make you happy?
|
| I prefer to look at the recent commits, or any recent
| activity on the repo's issues, but I would like to know what
| else can be used as an indicator.
| saurik wrote:
| So, ask yourself for a moment: what is it you are actually
| caring about?
|
| I'd like the project to not introduce security
| vulnerabilities or bugs into my code. I thereby care what
| language it was written in, what libraries _they_ use, what
| their testing and QA /CI process is, and whether it is
| being used by any "critical" projects (like, if that
| library is embedded in Chrome, you have to bet there are
| tons of people like me every day trying to hack it).
|
| As part of that, I care about if the project takes a
| cavalier attitude towards contributions: if I see a number
| of pull requests from random "contributors" being casually
| accepted, that is going to be a major major red flag; if
| possible, I want to see a core team doing most of the
| development and integration (and not merely most of the
| "review", as I see in some projects where the people in
| charge feel above doing work).
|
| I definitely care that the project is being maintained and
| that there are people paying attention to issues, and it
| needs to have a culture of taking bug reports seriously...
| nothing is more dangerous than a project that tries to
| pretend they are responsive using bots to "automatically
| close" issues: I'd rather see bugs open for years than
| worry a critical issue was reported and subsequently lost.
|
| I am certainly curious how work on the project is funded
| and whether I can trust that its license is going to hold
| constant over time: I don't want to end up relying on a
| dependency that is really the pet project of a small
| startup that is either going to disappear next year or will
| decide to redirect development to a closed-source fork. I'd
| thereby also prefer the project be run by a core committee
| of participants from multiple companies.
|
| I honestly can't imagine caring two shits about how many
| stars a project had on GitHub... hell: what if the project
| isn't even on GitHub? What then? Do you just give up and
| decide it sucks? A world where everyone feels any incentive
| at all to put their code on a centralized platform is one
| where we have all failed as stewards of the future of
| software :(.
| optimalsolver wrote:
| What's the street value?
| robin_reala wrote:
| It's in the article.
| woodruffw wrote:
| Things like this are part of why I cringe when I see supply chain
| analysis/security companies include "popularity" in their
| criticality metrics: the relationship between public popularity
| signals (like GitHub stars) and criticality is weak, at best.
| andrewmcwatters wrote:
| In my experience, it's actually a great signal. That's why so
| many people rely on it. The distribution of GitHub stars is an
| extreme power law.[1] Stargazer thresholds are used by
| maintainers to make decisions on including projects for
| different purposes from dependency management to package
| manager maintainers deciding to list software by name.[2]
|
| [1]: https://github.com/andrewmcwattersandco/github-statistics
|
| [2]:
| https://github.com/Homebrew/brew/blob/master/docs/Acceptable...
| woodruffw wrote:
| Selection suitability and criticality are different metrics.
| The former is what Homebrew uses, as a way to lessen
| maintainer load and prevent inclusion in Homebrew becoming
| its own quality signal. The latter is what I've seen supply
| chain companies provide: an implication that a project is
| somehow critical or essential to the overall ecosystem
| because it has so-and-so many stars.
|
| That first use is not unreasonable, in my opinion. The second
| one is questionable, at best.
| franciscop wrote:
| I wrote on this topic a while ago; experimenting I found out you
| can basically change the repos names and keep the stars; this
| wouldn't work if you use the repo as issue tracker or PR tracker,
| since the history would all be broken, but if it's pretty much
| just the code it's easy to swap the star count between two repos:
|
| https://francisco.io/blog/transferring-github-stars/
| newmac wrote:
| I think the most interesting thing would be to run this test
| against the list of Launch HNs, sorted by votes, grouped by
| class.
| perihelions wrote:
| Goodhart's law: if you rely on a social signal to tell you what's
| good, you'll break that signal.
|
| Very soon, the domain of bullshit will extend to actual text.
| We'll be able to buy HN comments by the thousand -- expertly
| wordsmithed, lucid AI comments -- and you can get them to say
| "this GitHub repo is the best", or "this startup is the real
| deal". Won't that be fun?
| precompute wrote:
| Now is the time to cultivate friendships and to make networks
| that persist online, and are verified via irl meetups /
| contacts. People who pull that off now will be in much, much
| better shape in the future. GPT's output is apparent to a
| discerning eye right now, but according to the power law, it
| won't take much "novel" input to train upon to make that
| discernment useless. Then, the only internet community that
| could be dependably reliable would be your group of irl
| verified people.
| password4321 wrote:
| I would phrase it more as we're pretty much out of time to
| have initiated online-only relationships.
| precompute wrote:
| Agreed. It's very difficult now to build communities that
| have lasting impact, because everyone's saturated with info
| as-is. Contributions to niche communities now rely on a
| societal "outsider" status, which means there's basically a
| couple of people that contribute heavily and very few
| onlookers. Everything else is either gamified or comes from
| video games / gambling.
|
| On the bright side, it's THE time to cultivate close
| friendships and to seek like-minded people. The entire
| phenomenon of popular attention hugging a community to
| death does not exist any longer. You can now have OG
| members persisting with notions for a long time and
| building a shared mythos with a small group of friends,
| because information is now more accessible than ever.
|
| Obviously, most people aren't part of these communities.
| The people that are "drifting" alone are given to wasting
| their time on charismatic attention-seekers that talk a big
| game (twitch/e-celebs) but deliver nothing of value. So
| there's also room in the market for charismatic folk with
| some technical expertise to rally people to their cause,
| but only very briefly. This is because the number of people
| half-committing and then jumping ship is likely the highest
| it's ever been. Also, platforms have now resorted to paying
| people to stay on their platform (youtube / tiktok /
| sponsorships / twitch boosting streamers / etc.) to combat
| occasional ennui, ironically exacerbating the issue.
| moneywoes wrote:
| Best methods for that? Local meetups?
| precompute wrote:
| Most tight, close-knit groups originate from shared mythos.
| These can be family, proximity, "same school year", "same
| college", "friend of best friend", etc. Online, you can
| find people that are interested in some niche topic (or
| elaboration of some popular topic to an absurd degree) and
| engage with them. Small newsletters are also a good way to
| get people talking. What most people don't do is return
| attention, aka reciprocate positively. This could also mean
| you'd have to write about unrelated things or maybe try to
| build a "business relationship" that would then progress if
| you invest some time and hope for the best.
|
| It's a really bad time to try and get the attention of
| someone more famous / notable than you, though. Sure, you
| can go on $platform and talk to them, but it's really not
| the same when they have a gorillion other messages. Same
| goes for people in large communities that are a "guy"
| there, known for something. Extremely high-return
| investments but you're likely going to fail.
|
| Some people try to start youtube channels / info streams
| and then entice people to join their forum / server. While
| this does seem to work, it only brings in quality people
| AFTER the community is fully formed and rigorous laws are
| in place. The initial stragglers are usually the recently
| excommunicated looking to try their hand at the same shit
| somewhere else.
|
| If you really put some effort into a topic and blog about
| it, you're likely to get some high-quality responses even
| if you only pose a question to someone that's partly
| interested. I've found this to be a really great way to
| separate the folks that are actually interested from those
| that aren't. You'll usually get people around your own
| level this way and IME this is the best approach.
|
| It takes a lot of effort to make people clock in regularly
| to your online circle, and it's better to establish digital
| / irl face-to-face contact after a good interaction. It
| builds trust and because we're wired to judge people from
| their facial reactions rather than text, it also sobers
| conversation / tempers over potentially divisive topics.
| Works well with cerebral / "deep" people. Doesn't work with
| people that only come online to blow steam / enact a
| persona, so it's a good filter.
|
| TL;DR: Touch grass (digitally), make friends (digitally)
| Nowado wrote:
| You can do it already. It's a normal order for a copywriter,
| nobody will bat an eye when you post an offer. It costs
| cents/dollars per 1000 words instead of fraction of a cent, but
| that's not exactly outside of reach of a funded startup.
| vehemenz wrote:
| Maybe more appropriately, Campbell's law:
|
| "The more any quantitative social indicator is used for social
| decision-making, the more subject it will be to corruption
| pressures and the more apt it will be to distort and corrupt
| the social processes it is intended to monitor."
| einpoklum wrote:
| Your comment is the best. It's the real deal!
| [deleted]
| ryan69howard wrote:
| This comment summarizes it best. We need more discussion like
| this!
| vidarh wrote:
| We'll be back to the 1990's "software agents" craze take two:
| Needing AI driven agents that seek out and index and evaluate
| content on our behalf, and seek to negotiate with each other
| for recommendations with currency being trust based on how
| "your" agent evaluated prior results.
|
| I'm hoping to put an AI between me and my e-mail inbox this
| weekend (I had ChatGPT write most of the code; it's not much);
| not fully automated, but evaluating and summarising and
| categorising. I might extend that to e.g. give me an
| "algorithm" for my Mastodon timeline (despite all of the people
| insisting on reverse chronological, I'm at a few hundred people
| I follow and already can't keep up), and a number of other
| sites I visit. For most of these things latency does not
| matter, so e.g. putting them through llama.cpp rather than
| something faster is fine, and precision isn't critical (I won't
| trust it to automatically reply or automatically reject
| anything, only to prioritise and categorise, where missteps
| won't have any critical impact).
| charlieyu1 wrote:
| I hope it breaks the current system of requiring references in
| job search as well
| paulcole wrote:
| This system is already essentially broken. Either you worked
| at a large business that only gives out dates of employment
| and job title by policy or you are in complete control of who
| the hiring company talks to.
|
| The first time you don't get a job because of a reference you
| gave you learn a lesson. If it ever happens again, it's on
| you.
| asmor wrote:
| What's really an alternative? At least where I live, a
| multi-year gap in your CV is going to set off more red
| flags than an honest "It didn't work out between us".
| paulcole wrote:
| Don't give them your boss's name. Give them a coworker's
| name. Give them a friend's name and have them lie for
| you.
|
| If a company is proactively contacting people you don't
| give them contact information for, that's _not_ requiring
| references -- which is the process I (and the comment I
| replied to) was talking about. If a company knows where
| you've worked, they can contact them if they want.
| moneywoes wrote:
| What's the solution for the latter point you mentioned?
|
| If they proactively contact someone as part of their
| verification?
| paulcole wrote:
| Then you're fucked if they check and the reference is bad
| and they care. Either you take your chances, leave it as
| a gap in your resume, or you make something up.
|
| In the past, I've extended the time I was at either the
| company before/after and then leave the one in the middle
| off. Smaller gap is easier to explain and you just need a
| coworker at the one you stretched to cover for you - or
| have it be somebody who wasn't there during the time you
| added. You can also just say you did the "freelance"
| thing and then talk about whatever you want.
|
| I've also just been 100% honest and said, "I didn't like
| this job and left on bad terms. I'd rather you not
| contact them."
|
| Just have to read the situation and make your best guess
| as to what is going to get you the job.
| is_true wrote:
| I'm sure it's already happening in the "books" threads
| groestl wrote:
| Next keyword: market for lemons. If you can't rely on said
| signals anymore, you must treat every item the same
| (untrusted), which drives out the legitimate players from the
| market. We have a lot of lemon markets, we can probably infer
| from them what the social result will be..
| s9w wrote:
| [dead]
| Alex3917 wrote:
| > We'll be able to buy HN comments by the thousand -- expertly
| wordsmithed, lucid AI comments
|
| You're forgetting the millions of additional comments that will
| be written by humans to trick the AI into promoting their
| content.
|
| Even worse, currently if you ask ChatGPT to write you some
| code, it will make up an API endpoint that doesn't exist and
| then make up a URL that doesn't exist where you can register
| for an API key. People are already registering these domains,
| and parking fake sites on them to scam people. ChatGPT is
| creating a huge market for creating fake companies to match the
| fake information it's generating.
|
| The biggest risk may not be people using AI-generated comments
| to promote their own repos, but rather registering new repos to
| match the fake ones that the AI is already promoting.
| fantod wrote:
| > ChatGPT is creating a huge market for creating fake
| companies to match the fake information it's generating.
|
| Does ChatGPT consistently generate the same fake data though?
| bombcar wrote:
| There was one company that had to put up a "our API can't
| get location data from a phone number so stop asking, GPT
| lied" page.
| redeux wrote:
| I have noticed that ChatGPT will give me a consistent
| output when the input is identical, but I haven't done
| extensive research on this.
| notabee wrote:
| I'm constantly curious whether anyone working in the AI space
| is cognizant of the Tower of Babel myth.
|
| I don't think an arms race for convincing looking bullshit is
| going to turn out well for our species.
| permo-w wrote:
| I feel like you're overstating this as a long term issue.
| sure it's a problem now, but realistically how long before
| code hallucinations are patched out?
| warent wrote:
| Folks, doesn't it seem a little harsh to pile downvotes
| onto this comment? It's an interesting objection
| stimulating meaningful conversation for us all to learn
| from.
|
| If you disagree or have proof of the opposite, just say so
| and don't vote up. There's no reason to get so emotional that
| we also try to hide it from the community by spamming it down
| into oblivion.
| permo-w wrote:
| to be fair, it's only one net downvote
| trippingrobot wrote:
| An aside: what do people mean when they say
| "hallucinations" generally? Is it something more refined
| than just "wrong"?
|
| As far as I can tell most people just use it as a shorthand
| for "wow that was weird" but there's no difference as far
| as the model is concerned?
| bombcar wrote:
| Wrong is saying 2+2 is five.
|
| Wrong is saying that the sun rises in the west.
|
| By hallucinating they're trying to imply that it didn't
| just get something wrong but instead dreamed up an
| alternate world where what you want existed, and then
| described that.
|
| Or another way to look at it, it gave an answer that
| looks right enough that you can't immediately tell it is
| wrong.
| mlhpdx wrote:
| Most people don't understand the technology and maths at
| play in these systems. That's normal, as is using
| familiar words that make that feel less awful. If you
| have a genuine interest in understanding how and why
| errant generated content emerges, it will take some
| study. There isn't (in my opinion) a quick helpful
| answer.
| aent wrote:
| Assuming those hallucinations are a thing to be patched out
| and not the core part of a system that works by essentially
| sampling a probability distribution for the most likely
| following word.
| ptato wrote:
| Nobody knows.
| permo-w wrote:
| undoubtedly not long
| lanternfish wrote:
| The black box nature of the model means this isn't
| something you can really 'patch out'. It's a byproduct of
| the way the system processes data - they'll get less
| frequent with targeted fine tuning and improved model
| power, but there's no easy solve.
| permo-w wrote:
| this is clearly untrue. it's an input, a black box, then
| an output. openai have 100% control over the output. they
| may not be able to directly control what comes out of the
| black box, but a) they can tune the model, and they
| undoubtedly will, and b) they can control what comes
| after the black box. they can--for example--simply block
| urls
| greesil wrote:
| How do you know we aren't already there?
| iLoveOncall wrote:
| > Very soon, the domain of bullshit will extend to actual text.
| We'll be able to buy HN comments by the thousand -- expertly
| wordsmithed, lucid AI comments -- and you can get them to say
| "this GitHub repo is the best", or "this startup is the real
| deal". Won't that be fun?
|
| Definitely already the case, you really think Rust and SQLite
| would get more than a couple of upvotes otherwise? :D
| wongarsu wrote:
| Then how do you explain the Go hype HN went through just
| before the current rust hype? Where "[ordinary tool] in Go"
| was the formula for upvotes.
|
| Then again, maybe Google had some mandatory HN time for their
| employees, that would be enough to explain that :D
| dorian-graph wrote:
| That's what Product Hunt has felt like for a long time--and
| LinkedIn too.
| soheil wrote:
| Stop making up laws. You'll do much more good dismantling
| existing ones. And non-social signals like # of commits, # of
| pull requests cannot be faked? We need signals among the noise.
|
| Sometimes signals are noise we just need to calibrate.
| rwallace wrote:
| This is the first time I've ever posted an XKCD link here, but
| I think the occasion calls for it.
|
| https://xkcd.com/810/
| klabb3 wrote:
| Content based auto moderation has been shitty since its
| inception. I don't like that GPT will cause the biggest flood
| of shit mankind has ever seen, but I am happy that it will kill
| these flawed ideas about policing.
|
| The obvious problem is we don't have any great alternatives. We
| have captcha, and we can look at behavior and source data (IP),
| and of course everyone's favorite fingerprinting. To make
| matters worse: abuse, spam and fraud prevention lives in the
| same security-by-obscurity paradigm that cyber security lived
| in for decades before "we" collectively gave up on it, and
| decided that openness is better. People would laugh at you to
| suggest abuse tech should be open ("you'd just help the
| spammers").
|
| I tried to find whether academia has taken a stab at these
| problems but came up pretty much empty handed. Hopefully I'm
| just bad at searching. I truly don't get why people aren't
| looking at these issues seriously and systematically.
|
| In the medium term, I'm worried that we'll not address the
| systemic threats, and continue to throw ID checks, heuristics
| and ML at the wall, enjoying the short lived successes when
| some classifier works for a month before it's defeated. The
| reason this is concerning is that we will be neck deep in crap
| (think SEO blogspam and recipe sites but for everything) which
| will be disorienting for long enough to erode a lot of trust
| that we could really use right now.
| lifeisstillgood wrote:
| I am unclear why a reasonable digital ID (probably government
| ID card style) plus rate limits is not going to be effective.
|
| I can see lots of reasons people might oppose the idea but I
| am not sure why it's not a widely discussed option?
|
| (asking honestly and openly - please don't shout!)
| rosebay wrote:
| [dead]
| creakingstairs wrote:
| Closest example I know of is the Korean internet. It is nigh
| impossible to get an account on major websites without an
| SSN and a phone number. Despite this, there are still
| countless bots and scammers that use hacked or leaked
| personal data. So I'm not sure if it would be that
| effective.
| lifeisstillgood wrote:
| I am thinking more like webauthn - but where I own a key
| pair, and I go to the post office with my passport, they give
| me a nonce and prove that it's my key pair, then they
| post that the public key is definitely me. I can then use
| that posting as my "username" and any challenge response
| includes the public key so they know that only I could be
| signing up.
|
| I am very aware of "designing a security system they
| themselves cannot break" and the difficulties of key
| management etc.
|
| Would be interested in knowing more from smarter people
|
| (probably need to build a poc - one day :-( )
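
A rough sketch of the enrollment step described above: the verifier hands over a random nonce, the key holder signs it, and the verifier checks the signature against the public key before publishing it. To stay within the Python standard library, this uses a toy Lamport one-time signature built on SHA-256 as a stand-in; that choice and all function names are assumptions for illustration only. A real deployment would more likely use WebAuthn or an established signature scheme, and a Lamport key must never sign more than one message.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Lamport one-time key pair: 256 pairs of secrets and their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private, message: bytes):
    # Reveal one secret from each pair, selected by the message digest bits.
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(_h(sig) == public[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, _bits(message))))


# Hypothetical enrollment at the counter:
private_key, public_key = generate_keypair()   # generated by the applicant
nonce = secrets.token_bytes(16)                # issued by the verifier
signature = sign(private_key, nonce)           # computed by the applicant
print(verify(public_key, nonce, signature))    # verifier checks before publishing
```

This only shows proof of key possession. It does not address the harder parts raised in the thread, such as key loss and recovery.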
| bombolo wrote:
| > I own a key pair
|
| Right there... it won't work with the general population.
| lifeisstillgood wrote:
| something like 2 billion people have a phone with a
| secure enclave capable of this in their pockets today -
| and they use it everyday for logins, payment and paying
| at the car park.
|
| We have the penetration
|
| (Afaik smartphone penetration is around 4.5-5 BN, and
| something like 50%+ have secure enclaves but honestly
| I don't follow that deeply so would defer to more
| knowledgeable people)
| klabb3 wrote:
| That's not your identity, it's an access token protected
| by an advanced lock screen (which is greatly useful, but
| not the same). If you lose your device, the way you get
| back into your accounts is your de-facto identity--
| usually it ranges between the email you used during
| signup to your govt id.
|
| There isn't a widely deployed public key network with
| keys that represent a person, afaik. PGP is the closest
| maybe?
| nprateem wrote:
| Because the only way it'd work is if it was mandatory
| (because of point 2); it'd then be extended to porn sites
| to protect the children. That means politicians' browsing
| history on pornhub would also be recorded and inevitably
| leaked when they get hacked.
| ipaddr wrote:
| If spam was your only problem, now we have two: spam and
| identity theft. Selling/obtaining identity information
| becomes very profitable and those working in the postal
| office must guard access like a bank vault.
| lifeisstillgood wrote:
| Then make it a bank's job to guard the bank vaults - they
| need to earn that FDIC bailout money :-)
| wpietri wrote:
| The paradigm of fixed identity information as proof is
| pretty obviously doomed. Just like how the 1970s concept
| of username/password as proof of identity is on its way
| out. Or credit card numbers alone being used to validate
| transactions.
|
| All of those notions are pre-internet ways of proving
| identity. In a world where we're all rarely more than an
| arm's length from a globally connected computer, they're
| on the way out.
| wpietri wrote:
| I expect that's where we're heading. But then, as somebody
| who writes online mostly under my own name, maybe I'm just
| biased. Come on in, the water's fine!
|
| I think there are cases for anonymous/pseudonymous speech,
| but I think that's going to have to shift away from
| disposable identities. Newspapers, for example, have been
| providing selective anonymity for hundreds of years, so I
| think there's a model to follow: trusted
| people/organizations who validate the _quality_ of a non-
| public identity.
|
| So a place like HN, for example, could promise that each
| pseudonymous account is connected to a unique human via
| some sort of government ID with challenge/response
| capability. Or you could end up with third-party ID
| providers that provide a similar service that goes beyond
| mere identity, like the Twitter Verified program scaled up.
|
| Disposable identities have always been a struggle. E.g.,
| look at Reddit's very popular Am I the Asshole, where
| people widely believe a lot of the content is creative
| writing exercises. But keeping up a fake identity over the
| long term was a lot of work. Not anymore, though!
| tbrownaw wrote:
| Anonymity is critical to free speech, because there exist
| bad actors who will resort to violence to suppress speech
| they don't like.
| lifeisstillgood wrote:
| But, and I understand the argument, that is a problem for
| IRL society / government to solve.
|
| If someone walks up to me in the voting booth and says
| "vote for X or I will kill you" that's a crime. If they
| do it in the pub it's probably a crime. If they do it
| online the police don't have enough manpower to deal with
| the situation.
|
| We should change that.
|
| Every time some fuckwit tweets "you and your kids are
| going to get raped to death and I know where you live"
| because some woman dares suggest some political change I
| would like to see jail time.
|
| And if we do that then I can understand your argument,
| but I would then say it is not valid - in a society that
| protects free speech.
| __MatrixMan__ wrote:
| I'm far less worried about being intimidated into voting
| a certain way by someone who is avoiding the authorities
| online.
|
| Much more likely is that I'll vote ignorantly because I
| lack information that someone withheld because they're
| intimidated by the authorities.
| woile wrote:
| Actually, there could be places where verified humans are
| required, and places where they are not.
| tbrownaw wrote:
| That doesn't work so well when the government is one of
| the bad actors.
| lifeisstillgood wrote:
| My point is that if government is a bad actor, there is
| no recourse. We need a fair democratic society - it's on
| us to build one / keep it there
| Andrew_nenakhov wrote:
| > The obvious problem is we don't have any great
| alternatives.
|
| Of course we do. The rise of digital finance services has led
| to creation of a number of services that offer identity
| verification necessary for KYC. All such services offer APIs,
| so adding an identity verification requirement to your forum
| is trivial.
|
| Of course, if it isn't obvious, I'm only half joking.
| coldtea wrote:
| > _The obvious problem is we don't have any great
| alternatives._
|
| There's always identity based network of trust. Several other
| members vouch for new people to be included.
| wpietri wrote:
| How would you imagine that applying here? If fake accounts
| are at least as convincing as real ones, then it seems like
| trust networks would be quickly prone to corruption as the
| fake accounts gain enough of a foothold to start
| recommending each other.
| coldtea wrote:
| On a network started by 2-3-10 people, the first new
| members would need to be vouched for by a percentage of those
| to get in - and so on.
|
| If someone down the line does some BS activity, the
| accounts that vouched for it have their reputation on the
| line.
|
| A whole tree of the person who did the BS and 1-2 layers
| of vouching above gets put on check, gets big red warning
| label in their UI presence (e.g. under their
| avatar/name), and loses privileges. It could even just
| get immediately deleted.
|
| And since I said "identity based", you would need to
| provide a real world id to get in, on top of others
| vouching for you. It can be made so you wouldn't be able
| to get a fake account any easier than you can get a fake
| passport.
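
The vouching scheme described above can be sketched as a small graph: each member records who vouched for them, and a violation flags the offender plus one or two layers of vouchers above. The class, the fixed vouch count (the comment speaks of "a percentage" of existing members), and the flag-only penalty are all assumptions made for this illustration, and the real-world ID check is left out.

```python
from collections import defaultdict


class VouchNetwork:
    """Hypothetical web-of-trust membership with upstream penalties."""

    def __init__(self, founders, required_vouches=2):
        self.required_vouches = required_vouches
        self.members = set(founders)
        # member -> set of accounts that vouched for them
        self.vouchers = defaultdict(set)
        self.flagged = set()

    def admit(self, newcomer, vouched_by):
        # Only vouches from unflagged existing members count.
        valid = {v for v in vouched_by
                 if v in self.members and v not in self.flagged}
        if len(valid) < self.required_vouches:
            return False
        self.members.add(newcomer)
        self.vouchers[newcomer] = valid
        return True

    def penalize(self, offender, layers=2):
        """Flag the offender and `layers` levels of vouchers above them."""
        self.flagged.add(offender)
        frontier = {offender}
        for _ in range(layers):
            frontier = {v for member in frontier
                        for v in self.vouchers[member]}
            self.flagged |= frontier
        return self.flagged


net = VouchNetwork(founders={"a", "b", "c", "x"})
net.admit("d", {"a", "b"})
net.admit("e", {"c", "d"})
net.admit("f", {"d", "e"})
print(sorted(net.penalize("f", layers=1)))  # offender plus direct vouchers
```

Note how quickly the penalty spreads in a small network: with `layers=2` in this example, three of the four founders end up flagged as well.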
| eternalban wrote:
| Maybe even push that a level higher and have org to org
| vouching as well (so it can scale and reputation propagates
| social bubbles.) Bootstrapping remains somewhat an issue.
| wongarsu wrote:
| One somewhat popular solution for bootstrapping is to
| allow people to buy in, paired with quickly banning those
| members in cases of rule violation. It's by no means
| perfect, but it puts a real price on abuse and thus
| reduces it a lot
| groestl wrote:
| I've mentioned a "market of lemons" elsewhere in this
| thread. One such market is the market for malware and
| stolen credit card details. One result of the market being
| broken: serious criminals restrict themselves to very small
| (company like) social circles and invite only forums. One
| signal of trust that remained very long: a very short ICQ
| number. You don't want to burn such a handle with a bad
| trade, so trust was given upfront.
| wpietri wrote:
| I mean, there have always been shills. What's changing now is
| the cost of shilling is dropping from dollars per comment to
| fractions of a cent. Troll farms used to be a lot of work to
| put together, but soon they'll be aaS.
|
| Those of us who are careful internet readers have spent years
| developing good heuristics to use textual clues to tell us
| about the person behind the text. Are they smart? Are they
| sincere? Are they honest? Are they commenting in good faith?
| Those skills will soon be obsolete.
|
| The folks at OpenAI, who are nominally on a mission to make
| sure AI "benefits all of humanity", have condemned us to a life
| sentence of fending off high-volume, high-quality bullshit.
| Bullshit that they are actively working to make harder to
| detect. And I think the first victims of that will be internet
| forums where text is the main signal, places like this and
| Reddit.
| robertlagrant wrote:
| Maybe we need a social network based on physical exchange of
| trust.
| api wrote:
| That's mostly what the person to person phone system was.
| GlumWoodpecker wrote:
| The scary part is that this doesn't seem too far off, with the
| current proliferation of large language models like the GPTs..
| rzzzt wrote:
| Parent was definitely not referring to these at all /s
| perihelions wrote:
| (I ninja-edited my comment in the first minute; the parent
| might have responded to a less clear version, since they
| posted at +3 minutes. I added "AI" in a revision).
| rzzzt wrote:
| OK, sounds reasonable. I didn't see the edit either, was
| just thinking about the myriad of LLM articles on the
| front page recently.
| quickthrower2 wrote:
| You sound way too human to be an AI then
| dang wrote:
| If you want to, you can always set 'delay' in your
| profile to the number of minutes (up to 10) that you
| would like your comments to be visible only to you. This
| puts the stealth back in stealth editing.
| https://news.ycombinator.com/newsfaq.html
|
| I rely heavily on this because it's somehow only after
| the comment is 'real' (i.e. staring back at me from a
| real HN thread) that I notice most of the edits I want to
| make.
| Biganon wrote:
| [flagged]
| siva7 wrote:
| Who says this isn't already happening?
| dang wrote:
| If people see AI-generated comments on HN they should flag
| them and let us know at hn@ycombinator.com. HN is for humans
| to converse, and bots have never been allowed.
|
| Of course it's not always easy to say what's AI-generated or
| not. But if an account is making a habit of it, it still
| seems possible to tell.
| echelon wrote:
| Reddit better hold their IPO soon or they'll get caught up in
| this. Pretty soon there will be dozens of different GPT/LLM-
| powered Reddit spam bots on Github. Some of them no doubt for
| political trolling. [1]
|
| Phone, then ID-based verification is a stop gap, but IDV
| services will have to spin up to support the mass volume of
| verifying all humans.
|
| [1] I kind of want to do this from an innocent / artistic
| perspective myself. Perhaps a bot that responds with a bunch
| of rhetorical questions or onomatopoeia. Then I'd scale it to
| the point people start noticing and feeling weirded out by
| it. "Is this the new Gen Alpha lingo?" Alas, I have too many
| other AI projects.
| siva7 wrote:
| The Anti-AI/GPT-Detection will soon be a multi-billion
| dollar industry
| asmor wrote:
| And it'll silently remove your real posts too, faster
| than the horrible moderation on reddit ever could!
| ChrisKnott wrote:
| I just tried to find a FOSS tool for converting MS Outlook
| .pst file to .mbox.
|
| I first tried Google; the results are dominated by
| commercial crap.
|
| Then I tried the "google reddit" trick to try and find some
| real people's opinions... but look at all the blatantly
| bullshit comments on this Reddit thread:
| https://www.reddit.com/r/Thunderbird/comments/ae4cdg/good_ps...
|
| ---
|
| (if anyone is wondering, the best option for Windows is to
| use 'readpst' command via WSL. Comes in the 'pst-utils'
| package).
| siva7 wrote:
| So a GPT bot instead of the human commenters would make
| reddit more useful in the end, this is what you're saying
| right?
| ChrisKnott wrote:
| How so? The commercial organisations will be able to use
| a GPT bot to provide more believable comments, at greater
| scale, and cheaper.
| deafpolygon wrote:
| I'm blind maybe, but what are the blatantly bullshit
| comments? The spam of PST to MBOX?
| SalmoShalazar wrote:
| Yeah they are almost all clearly spammy, broken English
| ads for paid software
| vageli wrote:
| Yes and if you look at the comment history of the posters
| in that thread, it is clear they are all spam accounts.
| malshe wrote:
| I give Github star as a bookmark for the repo so I assumed that
| others might be using it the same way too.
| precompute wrote:
| This sort of gamification exists only because there are too many
| green engineers that only care about their salaries, and they
| mimic what people successfully recruited by FAANG (etc.) did, and
| so do other companies. Then this purity spirals into taking the
| entire field down because there's no one around to educate the
| new newbies. Facebook was IMO a step in the right direction
| because it was a "general" social network, you could post
| anything. Imagine if FB had released some sort of an "extension"
| that allowed you to share anything via a template of sorts,
| instead of having to type out everything in the same old text
| post. It would have been meta enough (sorry) to not spiral very
| quickly.
|
| Leaving the arena is the only viable option. Software projects
| that aren't dependent on github drive their own vehicle, everyone
| else is on a crowded bus.
| rootsudo wrote:
| This is a great article, I've developed the same tactics for
| other projects but never was able to graft the proper vernacular.
| It really helps tackling how to organize and present information.
|
| I wonder if this is also in general OSINT or ISC^2 training -
| everything this article showed for breadtrails and reverse
| operation (e.g. pay a company to do the work, see how it is,
| evaluate the results, see if you can find other work similar/akin
| to it.)
| lozenge wrote:
| The projects with suspicious stars were still >80% nonfake stars.
| That to me suggests that most of the fake stars have been
| classified as nonfake. There isn't much psychological value in
| boosting your star count by just 25%.
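
The 25% figure follows from the 80% share: if 80% of a project's stars are genuine, the remaining 20% are fake, and 20 / 80 = 0.25. A tiny check of that arithmetic (the function name is made up for this example):

```python
def star_boost(genuine_share: float) -> float:
    """Fake stars as a fraction of genuine stars, given the genuine share."""
    fake_share = 1.0 - genuine_share
    return fake_share / genuine_share


print(f"{star_boost(0.80):.0%}")  # prints 25%
```

Since the comment says more than 80% were classified as nonfake, 25% is an upper bound on the boost under that classification.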
| bart_spoon wrote:
| Depends on when the fake stars were created. If they are early
| in a project's life cycle, they may be used to get attention on
| the project, and once they have awareness, fake stars are no
| longer necessary.
| NiloCK wrote:
| I have a half-written article about this, but I didn't have any
| good notion about quantifying the problem so this article is very
| welcome info to me.
|
| My own angle is that copilot has shifted the incentives around
| this practice, maybe substantially. Businesses want to get (free
| tiers of) their paid SaaS endpoints into copilot suggestions -
| it's a great funnel!
|
| I'd guess that github is as likely as not to become an SEO spam
| battlefield (like the rest of the web).
| UncleEntity wrote:
| > Businesses want to get (free tiers of) their paid SaaS
| endpoints into copilot suggestions - it's a great funnel!
|
| That's so brilliantly evil...
|
| I can see the next generation of "how I got to $3m in passive
| income" articles being written (by ChatGPT) right now.
| yla92 wrote:
| TIL: you can buy (fake) GitHub stars.
|
| That was a bit shocking to me to learn.
| Springtime wrote:
| After that post on HN months ago[1] where users discovered
| OAuth permissions for unrelated things being used/abused to
| star projects without their knowledge this news of buying stars
| didn't come as a surprise.
|
| It's unfortunate as I've seen stars used as a metric of
| trustworthiness in general user discussions.
|
| [1] https://news.ycombinator.com/item?id=33917962
| astura wrote:
| You can buy Twitter followers, Instagram followers, YouTube
| views, Amazon reviews, Reddit upvotes, Reddit comments, and
| Yelp reviews - so what's so shocking about GitHub stars?
| quickthrower2 wrote:
| /s ??
|
| I always expected there was a market for fake stars. I am
| trying to get a repo naturally to 1000 stars, but I would never
| buy them.
| sorokod wrote:
| Can you explain why is it "natural" to try to get your repo
| to have many stars in a world where stars can be bought?
| s9w wrote:
| most people don't know that stars are bought
| quickthrower2 wrote:
| Natural: promote the repo, people see it and like it. I
| don't beg for them.
|
| Unnatural: pay some bot runner to buy stars.
|
| I prefer natural as the stars are a metric not an end goal.
| sorokod wrote:
| Right, was not asking what you mean by natural but rather
| why is it of any value.
| quickthrower2 wrote:
| I see. I just see it as an indicator of reach. Also some
| people will snap judge a project by number of stars and
| more likely use it if it has a bunch.
| morelisp wrote:
| > I just see it as an indicator of reach.
|
| That just shifts the question to why is "reach" something
| worth wanting?
|
| > some people with snap judge a project by number of
| stars and more likely use it if it has a bunch.
|
| And why do you want these users?
| quickthrower2 wrote:
| 1. to get more feedback
|
| 2. they may just be busy users, looking for something for
| their job.
|
| I take on board your point though. The stars thing isn't
| the biggest consideration by a long shot. Probably the
| smallest!
| mr_mitm wrote:
| That's my issue with stars already. One repo having more
| stars than another doesn't mean it's better in any way.
| It might just mean it's been promoted more.
|
| That's how record labels can simply decide what's going
| to be the next summer hit. They pick a song and promote
| the hell out of it. It's not the summer hit because it
| was somehow better, just more promoted.
| [deleted]
| sacnoradhq wrote:
| The next thing in social media vending machines.
|
| https://twitter.com/Alexey__Kovalev/status/87184200877156761...
___________________________________________________________________
(page generated 2023-03-18 23:01 UTC)