[HN Gopher] Large language models reduce public knowledge sharing on online Q&A platforms
___________________________________________________________________
Large language models reduce public knowledge sharing on online Q&A
platforms
Author : croes
Score : 299 points
Date : 2024-10-13 11:26 UTC (10 hours ago)
(HTM) web link (academic.oup.com)
(TXT) w3m dump (academic.oup.com)
| scotty79 wrote:
| Don't they just reduce the Q part of Q&A? And since the Q was
| A-d by AI, doesn't that mean that the A was there already and
| people just couldn't find it, but AI did?
| lordgrenville wrote:
| The answer by humans is a) publicly accessible b)
| hallucination-free (although it still may not be correct) c)
| subject to a voting process which gives a good signal of how
| much we should trust it.
|
| Which makes me think, maybe a good move for Stack Overflow
| (which does not allow the submission of LLM-generated answers,
| wisely imo) would be to add an AI agent that would suggest an
| answer for each question, that people could vote on. That way
| you can elicit human and machine answers, and still have the
| verification process.
| scotty79 wrote:
| It's a good idea but probably not easy to implement. SO
| answers are usually quite neat, like an email. Solving a
| problem with ChatGPT is more like ... chat. It's hard to turn
| it into something googlable and Google is how SO gets most of
| its traffic and utility.
| TeMPOraL wrote:
| LLMs are a much better experience on the "Q side". Sure,
| there's the occasional hallucination here and there, but QnA
| sites are not all StackOverflow. Most of them are just
| content farms for SEO and advertising purposes - meaning, the
| veracity of the content doesn't matter, as long as it's
| driving clicks. At this moment, this makes LLMs _much_ more
| trustworthy.
| intended wrote:
| Why a vote? Voting != Verification.
| Davidzheng wrote:
| I don't think human mistakes are distinguishable from
| hallucinations.
| Y_Y wrote:
| Let's train a discriminator and see!
| david-gpu wrote:
| As a user, why would I care whether an answer is "incorrect"
| or "hallucinated"? Neither one is going to solve the problem
| I have at hand. It sounds like a distinction without a
| difference.
| lordgrenville wrote:
| One relevant difference is that a better-quality human
| answer is correlated with certain "tells": correct
| formatting and grammar, longer answers, higher reputation.
| An incorrect LLM answer looks (from the outside) exactly
| the same as a correct answer.
| mikepurvis wrote:
| Obviously there are exceptions but human-wrong answers tend
| to be more subtly wrong whereas hallucinated answers are
| just baffling and nonsensical.
| jmyeet wrote:
| It's a losing battle to try and maintain walled gardens for these
| corpuses of human-generated text that have become valuable to
| train LLMs. The horse has probably already bolted.
|
| I see this as a temporary problem however because LLMs are
| transitional. At some point it won't be necessary to train an LLM
| on the entirety of Reddit plus everything else ever written
| because there are obvious limits to statistical models like this
| and, as a counterpoint, that's not how humans learn. You may
| have read hundreds of books in your life, maybe even thousands.
| You haven't read a million. You don't need to.
|
| I find it interesting that this issue (which is theft, to be
| clear) is being framed as theft from the site or company that
| "owns" that data, rather than theft from the users who created
| it. All these user-generated content ("UGC") sites are doomed to
| eventually fail because their motivations diverge from their
| users and the endless quest to increase profits inevitably drives
| users away.
|
| Another issue is how much IP consumption constitutes theft? If an
| LLM watches every movie ever made, that's probably theft. But how
| many is too many? Like Apocalypse Now was loosely based on or at
| least inspired by Heart of Darkness (the novel). Yet you can't
| accuse a human of "theft" for reading Heart of Darkness.
|
| All art is derivative, as they say.
| vlovich123 wrote:
| > At some point it won't be necessary to train an LLM on the
| entirety of Reddit plus everything else ever written because
| there are obvious limits to statistical models like this and,
| as a counterpoint, that's not how humans learn. You may have
| read hundreds of books in your life, maybe even thousands. You
| haven't read a million. You don't need to.
|
| I agree but I think it may be privileging the human
| intelligence mechanism a bit too much. These LLMs are polymaths
| that can spit out content at a superhuman rate. They can
| generate poetry and literature as readily as code and answers
| about physics and car repair. It's very rare for a human to be
| able to do that, especially these days.
|
| So I agree they're transitional but only in the sense that our
| brains are transitional from the basal ganglia to the
| neocortex. In that sense I think LLMs will probably be a part
| of a future GAI brain with other things tacked on, but it's
| not clear it will necessarily evolve to work like a human's
| brain does.
| jprete wrote:
| I think the actual reason people can't do it is that we avoid
| situations with high risk and no apparent reward. And we
| aren't sufficiently supportive of other people doing
| surprising things (so there's no reward for trying). I.e.
| it's a modern culture problem, not a human brain problem.
| jmyeet wrote:
| > These LLMs are polymaths that can spit out content at a
| super human rate.
|
| Do you mean in theory or currently? Because currently, LLMs
| make simple errors (eg [1]) and are more than capable of spitting
| out, well, nonsense. I think it's safe to say we're a long
| way from LLMs producing anything creatively good.
|
| I'll put it this way: you won't be getting The Godfather from
| LLMs anytime soon but you can probably get an industrial film
| with generic music that tells you how to safely handle
| solvents, maybe.
|
| Computers are generally good at doing math but LLMs generally
| aren't [2] and that really demonstrates the weaknesses in
| this statistical approach. ChatGPT (as one example) doesn't
| understand what numbers are or how to multiply them. It
| relies on seeing similar answers to derive a likely answer, so
| it often gets the first and last digits of the answer correct
| but not the middle. You can't keep scaling the input data to
| have it see every possible math question. That's just not
| practical.
|
| Now multiplying two large numbers is a solvable problem.
| Counting Rs in strawberry is a solvable problem. But
| statistical LLMs are going to have a massive long tail of
| these problems. It's really going to take the next
| generational change to make progress.
|
| [1]: https://www.inc.com/kit-eaton/how-many-rs-in-strawberry-
| this...
|
| [2]: https://www.reachcapital.com/2024/07/16/why-llms-are-
| bad-at-...
| vlovich123 wrote:
| I think we're in agreement. It's going to take next
| generation architecture to address the flaws where the LLM
| often can't even correct its mistake when it's pointed out
| as with the strawberry example.
|
| I still think transformers and LLMs will likely remain as
| some component within that next gen architecture vs
| something completely alien.
| simonw wrote:
| Both the "count the Rs in strawberry" and the "multiply two
| large numbers" things have been solved for over a year now
| by the tool usage pattern: give an LLM the ability to
| delegate to a code execution environment for things it's
| inherently bad at and train it how to identify when to use
| that option.
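|
| As a minimal sketch of that pattern (complete() here is a
| hypothetical stand-in for a model call, not any specific
| vendor's API - real tool-calling APIs differ in shape):
|
|     import json
|
|     # Plain functions the model can delegate to for exact answers.
|     TOOLS = {
|         "multiply": lambda a, b: a * b,
|         "count_letter": lambda text, letter: text.count(letter),
|     }
|
|     def complete(prompt: str) -> str:
|         # Hypothetical model call. A tool-trained model decides
|         # here whether to answer directly or emit a tool call:
|         return json.dumps(
|             {"tool": "count_letter",
|              "args": {"text": "strawberry", "letter": "r"}})
|
|     def run(prompt: str) -> str:
|         reply = complete(prompt)
|         try:
|             call = json.loads(reply)   # the model chose a tool
|         except json.JSONDecodeError:
|             return reply               # a direct answer
|         result = TOOLS[call["tool"]](**call["args"])
|         return f"{call['tool']} -> {result}"
|
|     print(run("How many r's are in 'strawberry'?"))
|     # count_letter -> 3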
| vlovich123 wrote:
| I think the point is that playing whack a mole is an
| effective practical strategy to shore up individual
| weaknesses (or even classes of weaknesses) but that
| doesn't get you to general reasoning unless you think
| that intelligence evolved this way. Given the
| adaptability of intelligence across the animal kingdom to
| novel environments never seen before, I don't think that
| can be anything other than a short term strategy for AGI.
| simonw wrote:
| Sure, LLMs won't ever get to general reasoning (for whatever
| definition of "reasoning" you pick) unassisted.
|
| I think that adding different forms of assistance remains
| the most interesting pattern right now.
| shagie wrote:
| (I did an earlier attempt at this with a "ok, longer
| conversation" ... and then did a "well, what if I just
| asked it directly?")
|
| https://chatgpt.com/share/670bfdbd-8624-8011-bc31-2ba66ea
| b3e...
|
| I didn't realize that it had come that far with the
| delegating of those problems to the code writing and
| executing part of itself.
| airstrike wrote:
| _> (which is theft, to be clear)_
|
| _> Another issue is how much IP consumption constitutes theft?
| If an LLM watches every movie ever made, that 's probably
| theft._
|
| It's hard to reconcile those two views, and I don't think theft
| is defined by "how much" is being stolen.
| szundi wrote:
| Only half true. Maybe reasoning and actual understanding are
| not the strength of LLMs, but it is fascinating that they
| actually can produce good info from everything they have read
| - unlike me, who has only read a fraction of that. Maybe dumb,
| but good memory.
|
| So I think future AI also has to read everything, if average
| people keep using it the way they use ChatGPT these days: to
| ask for advice about almost anything.
| 0x1ceb00da wrote:
| > You may have read hundres of books in your life, maybe even
| thousands. You haven't read a million. You don't need to.
|
| Sometimes online forums are the only place where you can find
| solutions for niche situations and edge cases. Tricks which
| would have been very difficult to figure out on your own. LLMs
| can train on the official documentation of tools/libraries
| but they can't experiment and figure out solutions to weird
| problems, which are unfortunately very common in the tech
| industry.
| If people stop sharing such solutions with others, it might
| become a big problem.
| skydhash wrote:
| > _Sometimes online forums are the only place where you can
| find solutions for niche situations and edge cases._
|
| That's the most valuable aspect of it. When you find yourself
| in these niche situations, it's nice when you see someone
| has encountered it and has done the legwork to solve it,
| saving you hours and days. And that's why Wikis like the Arch
| Wiki are important. You need people to document the system,
| not just individual components.
| simonw wrote:
| "LLMs can train on the official documentation of tools
| l/libraries but they can't experiment and figure out
| solutions to weird problems"
|
| LLMs train on way more than just the official documentation:
| they train on the code itself, the unit tests for that code
| (which, for well written projects, cover all sorts of
| undocumented edge cases) and - for popular projects -
| thousands of examples of that library being used (and unit
| tested) "in the wild".
|
| This is why LLMs are so effective at helping figure out edge-
| cases for widely used libraries.
|
| The best coding LLMs are also trained on additional custom
| examples written by humans who were paid to build proprietary
| training data for those LLMs.
|
| I suspect they are increasingly trained on artificially
| created examples which have been validated (to a certain
| extent) through executing that code before adding it to the
| training data. That's a unique advantage for code - it's a
| lot harder to "verify" non-code generated prose since you
| can't execute that and see if you get an error.
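|
| A minimal sketch of that execution filter (assumptions: the
| examples are Python, and "runs without error" is the bar;
| real pipelines check much more, like output correctness):
|
|     import subprocess, sys, tempfile
|
|     def passes_execution(candidate: str,
|                          timeout: int = 10) -> bool:
|         # Keep a generated example only if it runs cleanly.
|         with tempfile.NamedTemporaryFile(
|                 "w", suffix=".py", delete=False) as f:
|             f.write(candidate)
|             path = f.name
|         try:
|             proc = subprocess.run([sys.executable, path],
|                                   capture_output=True,
|                                   timeout=timeout)
|         except subprocess.TimeoutExpired:
|             return False
|         return proc.returncode == 0
|
|     examples = ["print(sum(range(10)))", "print(undefined_name)"]
|     validated = [e for e in examples if passes_execution(e)]
|     # keeps only the first; the second raises NameError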
| 0x1ceb00da wrote:
| > they train on the code itself, the unit tests for that
| code
|
| If understanding the code was enough, we wouldn't have any
| bugs or counterintuitive behaviors.
|
| > and - for popular projects - thousands of examples of
| that library being used (and unit tested) "in the wild".
|
| If people stop contributing to forums, we won't have any
| such data for new things that are being made.
| simonw wrote:
| The examples I'm talking about come from openly licensed
| code in sources like GitHub, not from StackOverflow.
|
| I would argue that code in GitHub is much more useful,
| because it's presented in the context of a larger
| application and is also more likely to work.
| jumping_frog wrote:
| Just to add to your point, consider a book like "Finite and
| Infinite Games". I think I can "recreate" the knowledge and
| main thesis of the book from my readings in other areas.
|
| 'Listening to different spiritual gurus saying the same thing
| using different words' is like 'watching the same coloured
| glass pieces getting rearranged to form new patterns in a
| kaleidoscope'
| falcor84 wrote:
| > that's not how humans learn
|
| I've been thinking about this a lot lately. Could we train an
| AI, e.g. using RL and GAN, where it gets an IT task to perform
| based on a body of documentation, such that its fitness would
| then be measured based on both direct success on the task, and
| on the creation of new (distilled and better written)
| documentation that would allow an otherwise context-less copy
| of itself to do well on the task?
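|
| Roughly this shape, as a toy sketch (every method below is a
| hypothetical stand-in, not a real RL or GAN setup):
|
|     class Agent:
|         def attempt(self, task: str, docs: str) -> float:
|             # Stand-in: pretend success grows with doc quality.
|             return min(1.0, len(docs) / 100)
|
|         def distill(self, docs: str) -> str:
|             # Stand-in for rewriting the docs it learned from.
|             return docs.strip()
|
|     def fitness(agent: Agent, task: str, docs: str) -> float:
|         direct = agent.attempt(task, docs)       # task success
|         new_docs = agent.distill(docs)           # rewritten docs
|         clone = Agent()                          # context-less copy
|         transfer = clone.attempt(task, new_docs) # copy's success
|         return direct + transfer                 # reward both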
| Artgor wrote:
| Well, if the users ask frequent/common questions to ChatGPT and
| get acceptable answers, is this even a problem? If the volume of
| duplicate questions decreases, there should be no bad influence
| on the training data, right?
| jeremyjh wrote:
| They spoke to this point in the abstract. They observe a
| similar drop in less common and more advanced questions.
| timhh wrote:
| Stackoverflow mods and power users being arseholes reduces the
| use of Stackoverflow. ChatGPT is just the first viable
| alternative.
| tomrod wrote:
| While not exactly the same wording, this was also my first
| thought.
|
| There have been two places that I remember where arrogance of
| the esoterati drives two feedback cycles:
|
| 1. People leave after seeking help for an issue they believed
| needed the input of masters.
|
| 2. Because of gruff treatment, the masters receive complaints
| and indignation, triggering a backfire effect feedback loop,
| often under the guise of said masters not wanting to be
| overwhelmed by common problems and issues.
|
| There are a few practical things that can help with this (clear
| guides to point to, etc.), but the missing element is kindness
| / non-judgmental responsiveness.
| croes wrote:
| How can it be an alternative if it needs the data from
| Stackoverflow?
| OgsyedIE wrote:
| Because consumers in every market develop models of reality
| (and make purchasing decisions) on the basis of their best
| attempts to derive accuracy from their own inevitably flawed
| perceptions, instead of having perfect information about
| every aspect of the world?
| miohtama wrote:
| It's an interesting question. The world has had 30 years to
| come up with a StackOverflow alternative with friendly mods. It
| hasn't. So the question is whether anyone has tried hard
| enough, or whether it can be done in the first place.
|
| I am a Stack Overflow mod, dealing with other mods. There is
| definitely unnecessary hostility there, but 90% of question
| closes and downvotes go to low-quality questions which lack
| the professionalism to warrant anyone's time. It is the
| remaining 10% that turns people off.
|
| We can also take analogs from the death of Usenet.
| jprete wrote:
| I think the problem isn't specific to SO. Text-based
| communication with strangers lacks two crucial emotional
| filters. Before speaking, a person anticipates the listener's
| reaction and adjusts what they say accordingly. After
| speaking, they pay attention to the listener's reaction to
| update their understanding for the future.
|
| Without seeing faces, people just don't do this very well.
| shagie wrote:
| The Q&A model that Stack Overflow and its various forks
| follow struggles with the 90/9/1 problem (
| https://en.wikipedia.org/wiki/1%25_rule ).
|
| Q&A was designed to handle the social explosion problem and
| the Eternal September problem by having a larger percentage
| of the user base take an interest in the community over time
| and continue to maintain that ideal. Making comments and
| discussions difficult is part of the design, so that you
| don't get protracted discussions that in turn need more
| moderation resources.
|
| The fraction of the people doing the curation and moderation
| of the site overall has dropped. The reasons for that drop
| are manifold. I believe that much of it falls squarely upon
| Stack Overflow corporate, which did not consider the second-
| order effects of engaging and managing the community of
| people who are interested in the success of the site as they
| envision it.
|
| Ultimately, Stack Overflow has become _too_ successful and
| the people looking to it now have a different vision for what
| it should be, one that comes into conflict with both the
| design of the site and the vision of the core group.
|
| While Stack Overflow can thrive with a smaller number of
| people asking "good" (yes, very subjective) questions it has
| difficulty when it strays into questions that need discussion
| (which its design comes into conflict with) or too many
| questions for the committed core group to maintain. Smaller
| sites can (and do) have a larger fraction of the user base
| committed to the goals of the site and in turn are able to
| provide more individual guidance - while Stack Overflow has
| _long_ gone past that point.
|
| ---
|
| Stack Overflow's Q&A format, which has been often copied,
| works for certain-sized user bases. It needs enough people to
| keep it interesting, but it fails to scale when too many
| people participate who have a different idea of what
| questions should be there.
|
| There is a lack of moderation tools for the core user base to
| be able to manage it at scale (you will note the history of
| Stack Overflow has been _removing_ and restricting moderation
| tools until it gets "too" bad - see also removal of 20k
| users helping with flag handling and the continued rescoping
| of close reasons).
|
| Until someone comes up with a fundamentally different
| approach that is able to handle moderation at scale or
| sufficient barriers for new accounts (to handle the Eternal
| September problem), we are going to continue to see Stack
| Overflow clones sprout and die on the vine along with a
| continued balkanization of knowledge into smaller areas that
| are able to handle vision and moderation at a smaller scale.
|
| ---
|
| Every attempt at a site I've seen since (and I include things
| like Lemmy in this, which did a "copy reddit" and _then_
| worried (or not) about moderation) has started from a "get
| popular, _then_ work on the moderation problem" approach,
| which is ultimately too late to really solve the problem. The
| tools for moderation need to be baked into the design from
| the start.
| hifromwork wrote:
| >Stackoverflow mods and power users being arseholes reduces the
| use of Stackoverflow
|
| While they are certainly not perfect, they willingly spend
| their own spare time to help other people for free. I disagree
| with calling them arseholes.
| tomrod wrote:
| A lot of people comment on online forums for free and are
| arseholes there too. Not in this thread so far that I've
| read, to be clear, but it certainly happens. How would you
| qualify the difference?
| timhh wrote:
| The people I am referring to are not helping. At this point
| they are making SO worse.
|
| The problems are two-fold:
|
| 1. Any community with volunteer moderators attracts the kind
| of people you don't want to be moderators. They enjoy rigidly
| enforcing the rules even if it makes no sense.
|
| 2. There are two ways to find questions and answer them:
| random new questions from the review queue, and from Google
| when you're searching for a problem you have. SO encourages
| the former, and unfortunately the vast majority of questions
| are awful. If you go and review questions like this you
| _will_ go "downvote close, downvote close, downvote close".
| You're going to correctly close a load of trash questions
| that nobody cares about _and_ a load of good questions you
| just don't understand.
|
| I've started recording a list of questions I've asked that
| get idiotic downvotes or closed, so I can write a proper rant
| about it with concrete examples. Otherwise you get people
| dismissing the problem as imaginary.
|
| These mods now hold SO hostage. SO is definitely aware of the
| problem but they can't instigate proper changes to fix it
| because the mods _like_ this situation and they revolt if SO
| tries to remedy it.
| romeros wrote:
| That's just cope. I stopped using stackoverflow because I get
| everything from chatgpt/claude. Just a case of having better
| tech.
|
| Sure the mods were arseholes etc., but before GPT I never
| minded using it.
| rkncland wrote:
| ChatGPT plagiarizes the answers of those whom you call
| "arseholes". How is using Stackoverflow in read-only mode
| different from using ChatGPT?
|
| Except of course that reading Stackoverflow directly has better
| retention rates, better explanations and more in-depth
| discussions.
|
| (My view is that moderators can be annoying but the issue is
| overblown.)
| verdverm wrote:
| Plagiarizing means violating copyright, loosely speaking.
| When you, as a human, use SO, you assign your rights to the
| content to SO. That company is licensing the content to 3rd
| parties, including those who want to train their LLMs.
|
| What I find is that the LLMs are not spitting out SO text
| word for word, as one would when plagiarizing. Rather, the
| LLM uses the context and words of my question when answering,
| making the response specific and cohesive (by piecing
| together answers from across questions).
| tomrod wrote:
| I thought plagiarizing was producing new work substantially
| copied from prior work, regardless of who owns the copyright?
| I thought this because self-plagiarism exists.
| verdverm wrote:
| Well, if we could not reproduce, with changes, what others
| have written and what we have learned, it is unlikely we
| could make real progress. There are many more concepts, like
| fair use, meaningful changes, and other legalese, as well
| as how people use the term "plagiarize" differently. I have
| never heard of this "self-plagiarizing" concept; it seems
| like something fringe that would not be enforceable other
| than in the court of public opinion or the classroom via
| grades.
| tomrod wrote:
| You're one of today's lucky 10,000!
| https://xkcd.com/1053/
|
| It's a core issue in academia and other areas where the
| output is heavily the written word.
|
| [0] https://en.wikipedia.org/wiki/Plagiarism#Self-plagiarism
|
| [1] https://ori.hhs.gov/self-plagiarism
|
| [2] https://www.aje.com/arc/self-plagiarism-how-to-
| define-it-and...
| verdverm wrote:
| Reproducing sections is useful in academic publishing. I
| saw it while reading 100s of papers during my PhD.
|
| (1) If the paper is your entry point into an area of
| research, or into a group, the reproduced section is
| useful context on first encounter
|
| (2) If you are not, then you can easily skip it
|
| (3) Citing, instead of reproducing sections like
| background work, means you have to go look up other
| papers, meaning a paper can no longer stand on its own.
|
| Self-plagiarism is an opinion held by a subset of
| academics, not something widely discussed or debated. Are
| there bad apples? Sure. Is there a systemic issue? I
| don't think so.
| manojlds wrote:
| Easy to keep saying this, but SO was useful because it wasn't
| the wild west.
| weinzierl wrote:
| It was useful and not the wild west as long as a very small
| group of intelligent and highly motivated individuals
| moderated it. First and foremost Jeff Atwood used to do a lot
| of moderation himself - not unlike dang on HN.
|
| When that stopped, the site (and to some degree its growing
| number of sister sites) continued on its ballistic curve,
| slowly but continuously descending into the abyss.
|
| My primary takeaway is that we have not found a way to scale
| moderation. SO was doomed anyway, LLMs have just sped up that
| process.
| waynecochran wrote:
| I have definitely noticed a large drop in responses on SO. I
| am old enough to have seen the death of these platforms.
| First to go was Usenet, when AOL and its ilk became a thing
| and every channel turned into spam.
| timhh wrote:
| I disagree. It was useful because the UI was (and is!) great.
| Easy to use markdown input, lists of answers sorted by votes,
| very limited ads, etc. The gamification was also well done.
|
| Compared to anything before it (endless phpBB forums,
| expertsexchange, etc.) it was just light years ahead.
|
| Even today, compare the SO UI with Quora's. It's still 10x
| better.
| optimiz3 wrote:
| If a site aims to commoditize shared expertise, royalties should
| be paid. Why would anyone willingly reduce their earning power,
| let alone hand away the right for someone else to profit from
| selling their knowledge, unattributed no less?
|
| The best bet is to publish a book and require a license from
| anyone who wants to train on it.
| afh1 wrote:
| Why open source anything, let alone with permissive licensing,
| right?
| optimiz3 wrote:
| To a degree, yes. I only open source work where I expect
| reciprocal value from other contributions.
| benfortuna wrote:
| I think that is antithetical to the idea of Open Source. If
| you expect contributions then pay a bounty, don't pretend.
| andrepd wrote:
| GPL is antithetical to open source? Odd take
| verdverm wrote:
| There is a permissionless (MIT) vs permissioned (GPL)
| difference that is at the heart of the debate of what
| society thinks open source should mean
| optimiz3 wrote:
| The bounty is you getting to use my work (shared in good
| faith no less). Appreciate the charity and don't be a
| freeloader or you'll get less in the future.
| johannes1234321 wrote:
| There is a lot of indirect, hardly measurable value one can
| gain.
|
| Going back to the original source: by giving an answer to
| somebody on a Q&A site, I might be helping a kid who is
| learning and will later build solutions I benefit from,
| again. Similar with software.
|
| And I also consider the total gain of knowledge for our
| society at large a gain.
|
| Meanwhile my marginal cost for many things is low - and
| often lower than a cost-benefit calculation would suggest.
|
| And some Q&A questions strike a nerve and are interesting
| to me to answer (be it in thinking about the problem or in
| trying to boil it down to a good answer), similar to
| open source. Some programming tasks are fun problems to
| solve - that's a gain - and then sharing the result costs
| me nothing.
| Y_Y wrote:
| See also: BSD vs. GPL
| immibis wrote:
| This is a real problem with permissive licensing. Large
| corporations effectively brainwashed large swaths of
| developers into working for free. Not working for the commons
| for free, as in AGPL, but working for corporations for free.
| jncfhnb wrote:
| Because it's a marginal effect on your earning power and it's a
| nice thing to do.
| optimiz3 wrote:
| The management of these walled gardens will keep saying that
| to your face as they sell your contributions. Meanwhile your
| family gets nothing.
| jncfhnb wrote:
| Did your family get anything from you sharing this opinion?
| If not, why did you share it? Are you suggesting that your
| personal motivations for posting this cynicism are
| reasonable but that similar motivations that are altruistic
| for helping someone are not?
| optimiz3 wrote:
| Sharing this opinion doesn't sacrifice my primary
| economic utility, and in fact disseminates a sentiment
| that if more widespread would empower everyone to realize
| more of the value they offer. Please do train an LLM to
| inform people to seek licensing arrangements for the
| expertise they provide.
| jncfhnb wrote:
| That's just dumb, man. You're not sacrificing anything by
| giving someone a helpful answer.
| 8note wrote:
| By giving it away for free, you are ensuring there isn't a
| consulting gig that charges for giving helpful answers.
| AlexandrB wrote:
| "It's a nice thing to do" never seems to sway online
| platforms to treat their users better. This kind of asymmetry
| seems to only ever go one way.
| falcor84 wrote:
| As a mid-core SO user (4 digit reputation), I never felt
| like I needed them to treat me better. I always feel that
| while I'm contributing a bit, I get so much more value out
| of SO than what I've put in, and am grateful for it being
| there. It might also have something to do with me being old
| enough to remember the original expertsexchange, as well as
| those MSDN support documentation CDs. I'm much happier now.
| immibis wrote:
| Stack Overflow won't even let me delete my own content now
| that they're violating the license to it.
| simonw wrote:
| ... you just shared your expertise here on Hacker News in the
| form of this comment without any expectation of royalties. How
| is posting on StackOverflow different?
| krtalc wrote:
| One could answer that question for people whose salary does
| not depend upon not understanding the answer.
| malicka wrote:
| While there is something to be said about the unethical business
| practices of Quora/StackOverflow, I reject the framing of
| "reducing your earning power." Not everything is about
| transactions or self-benefit, especially when it comes to
| knowledge; it's about contributing and collaboration. There is
| immense intrinsic value to that. I'm glad we don't live in your
| world, where libre software is a pipe-dream and hackers hoard
| their knowledge like sickly dragons.
| wwweston wrote:
| When the jobs side of SO was active, it effectively did this.
| Strong answers and scoring were compensated with prospective
| employer attention. For a few years, this was actually where
| the majority of my new job leads came from. It was a pretty
| rewarding ecosystem, though not without its problems.
|
| Not sure why they shut down jobs; they recently brought back a
| poorer version of it.
| vitiral wrote:
| We need to refine our tech stack to create a new one which is
| understandable by humans, before LLMs pollute our current stack
| to the point it's impossible to understand or modify. That's what
| I'm doing at https://lua.civboot.org
| rkncland wrote:
| Of course people reduce their free contributions to
| Stackoverflow. Stackoverflow is selling them out with the OpenAI
| API agreement and countless "AI" hype blog posts.
| jeremyjh wrote:
| I think this is more about a drop in questions, than a drop in
| answers.
| bryanrasmussen wrote:
| I mean, part of the reason not to ask about stuff on SO is
| that there are several types of questions that one might like
| to ask - such as:
|
| I don't know the first thing about this thing, help me get to
| where I know the first thing. This is not allowed any more.
|
| I want to know the pros and cons of various things compared.
| This is not allowed.
|
| I have quality questions regarding an approach that I know
| how to do, but I want to know better ways. This is generally
| not allowed but you might slip through if you ask it just
| right.
|
| I pretty much know really well what I'm doing but am having
| some difficulty finding the right documentation on some
| little thing, help me - this is allowed.
|
| Something does not work as per the documentation, help me,
| this is allowed
|
| I think I have done everything right but it is not working,
| this is allowed and is generally a typo or something that you
| have put in the wrong order because you're tired.
|
| At any rate, the ones that are not allowed are the only
| questions that are worth asking.
|
| The last two that are allowed I generally find get answered
| in the asking - I'm pretty good in the field I'm asking in,
| and the rigor of making something match SO question
| requirements leads me to the answer.
|
| If I ask one of the interesting disallowed questions and get
| shit on then I will probably go through a period of screw it,
| I will just look extra hard for the documentation before I
| bother with that site again.
| SoftTalker wrote:
| The first one especially is not interesting except to the
| person asking the question, who wants to be spoon-fed
| answers instead of making any effort of his own to acquire
| foundational knowledge. Often these are students asking for
| someone to solve their homework problems.
|
| Pro/Con questions are too likely to involve opinion and
| degenerate into flamewars. Some _could_ be answered
| factually, but mostly are not. Others have no clear
| answers.
| bryanrasmussen wrote:
| thank you for bringing up the default SO reasons why these
| are not the case, but first off
|
| >Often these are students asking for someone to solve
| their homework problems.
|
| I don't think I've been in any class since elementary
| school in which I did not have foundational knowledge.
| I'm talking about "I just realized there must be a
| technical discipline that handles this issue and I can't
| google my way to it" level questions.
|
| If I'm a student, I have a textbook and the ability to
| read. I'm not asking questions answerable from the
| textbook or the relevant literature of the thing I am
| studying, because being in a class on the subject I would
| "know the first thing", to quote my earlier post - that
| first thing being how to get more good and relevant
| knowledge on the thing I am in a class in.
|
| I'm talking about things where you don't even know what
| questions to ask to get that foundational knowledge, which
| are among the most interesting questions to ask - the
| problem with SO is it only wants me to ask questions in a
| field in which I am already fairly expert but have just
| hit a temporary stumbling block for some reason.
|
| I remember when I was working on a big government
| security project and there was a Java guy who was an
| expert in a field that I knew nothing about and he would
| laugh and say you can't go to SO and ask about how do I
| ... long bit of technical jargon outside my field that I
| sort of understood hung together, maybe eigenvectors came
| up (this was in 2013)
|
| Second thing, yes I know SO does not want people to ask
| non-factual questions, and it does not want me to ask
| questions in fields in which I am uninformed, so it
| follows it wants me to ask questions whose answers I can
| probably find out myself one way or another, only SO is
| more convenient.
|
| I gave some reasons why I do not find SO particularly
| convenient or useful given their constraints, implying
| this is probably the same for others. You said two of my
| reasons were no good, but I notice you did not have any
| input on the underlying question: why are people not
| asking as many questions on SO as they once did?
| SoftTalker wrote:
| SO is what it is, they have made the choices they made as
| to what questions are appropriate on their platform.
|
| I don't know why SO questions are declining -- perhaps
| people find SO frustrating, as you seem to, and they give
| up. I myself have never posted a question on SO as I
| generally have found that my questions had already been
| asked and answered. And lately, perhaps LLMs are
| providing better avenues for the sorts of questions you
| describe. That seems very plausible to me.
| Ferret7446 wrote:
| > I don't think I've been in any class since elementary
| school in which I did not have foundational knowledge
|
| > If I'm a student, I have a textbook and the ability to
| read
|
| You are such an outlier that I don't think you have the
| awareness to make any useful observations on this topic.
| Quite a lot of students in the US are now starting to
| lack the ability to read, horrifyingly (and it was never
| 100%), and using ChatGPT to do homework is common.
| jakub_g wrote:
| I can see how frustrating it might be, but the overall idea
| of SO is "no duplicates". They don't want to have 1000
| questions which are exactly the same but with slightly
| different phrasing. It can be problematic for total
| newcomers, but at the same time it makes it more useful for
| professionals: instead of having 1000 questions asking how to
| do X, each with 1 reply, you have one canonical question with
| 20 replies sorted by upvotes, and you can quickly see which
| one is likely the best.
|
| FWIW, I found LLMs to be actually really good at those
| basic questions where I'm an expert at language X and I ask
| how to do a similar thing in Y, using Y's terms (which might
| be named differently in X).
|
| I believe this actually would work well:
|
| - extra basic things, or things that depend on opinion etc:
| ask LLMs and let them infer and steer you
|
| - advanced / off the beaten path questions that LLMs
| hallucinate on: ask on SO
| noirscape wrote:
| The problem SO tends to run into is when you have a
| question that _seems_ like it matches another question on
| the surface (i.e. because the question title is bad), and
| then a very different question is closed with the dupe
| reason pointing to that question because the titles
| are similar.
|
| Since there's no way to appeal duplicate close votes on
| SO until you have a pretty large amount of rep, this
| kinda creates a problem where there's a "silent mass" of
| duplicate questions that aren't really duplicates.
|
| A basic example is this question:
| https://stackoverflow.com/q/27957454 , which is about
| disabling PRs on GitHub on the surface. The body text
| however reveals that the poster is instead asking how
| they can set up branch permissions and get certain
| accounts to bypass them.
|
| I can already assure you that just by basic searching,
| this question will pop up first when you look up
| disabling PRs, and the accepted answer answers the
| question body (which means that it's almost certain a
| different question has been closed as a duplicate of this
| one), rather than the question title. You could give a
| more informative answer (which kinda happened here), but
| this is technically off-topic to the question being
| closed.
|
| That's where SO gets its bad rep for inaccurate
| duplicate closing from.
| bryanrasmussen wrote:
| >I can see how frustrating it might be
|
| It's certainly not frustrating for me; I ask a question
| maybe once a year on SO. Most of their content is, in my
| chosen disciplines, not technically interesting; it is no
| better than looking up code snippets in documentation
| (which most of the time is what it really, really is).
|
| I suppose it's frustrating for SO that people no longer
| find it worthwhile to ask questions there.
|
| >advanced / off the beaten path
|
| show me an advanced and off-the-beaten-path question that
| SO has answered well - it is just not worth the effort
| to try to get an answer. If you have an advanced and off-
| the-beaten-path question that you can't answer, then you
| ask it on SO just "in case", but really you will find the
| answer somewhere else or not at all, in my experience.
| Izkata wrote:
| > I don't know the first thing about this thing, help me
| get to where I know the first thing. This is not allowed
| any more.
|
| This may have been allowed in like the first year while
| figuring out what kind of moderation worked, but it hasn't
| been, at least since I started using it in like 2011. They
| just kept slipping through the cracks because so many
| questions are constantly being posted.
| Ferret7446 wrote:
| The problem is that SO is not a Q&A site although it calls
| itself that (which is admittedly misleading). It is a
| community edited knowledgebase, basically a wiki, where the
| content is Q&As. It just so happens that one method of
| contributing to the site is by writing questions for other
| people to write answers to.
|
| If you ask a question (i.e., add content to the wiki) that
| is not in scope, then of course it will get removed.
| kertoip_1 wrote:
| I don't think it's the main reason. People don't care whether
| someone is selling stuff they create on a platform. Big social
| media, e.g. Facebook, has been doing it for many years now and
| yet it's still there. You come to SO for answers; why would you
| care that someone is training some LLM on them later?
| pessimizer wrote:
| > You come to SO for answers; why would you care that someone
| is training some LLM on them later?
|
| This doesn't make the slightest bit of sense. The people who
| would be concerned are the ones who are _providing_ answers.
| They are not coming to SO solely to get answers.
| melenaboija wrote:
| It's been a relief to find a platform where I can ask
| questions without the fear of being humiliated.
|
| Half joking, but I am pretty tired of SO pedantry.
| PhilipRoman wrote:
| I haven't really found stackoverflow to be _that_ humiliating
| (compared to some IRC rooms or forums), basic questions get
| asked and answered all the time. But the worst part is when you
| want to do something off the beaten path.
|
| Q: how do I do thing X in C?
|
| A: Why do you need to know this? The C standard doesn't say
| anything about X. The answer will depend on your compiler and
| platform. Are you sure you want to do X instead of Y? What
| version of Ubuntu are you running?
| mhh__ wrote:
| I find that this is mainly a problem in languages that
| attract "practical"/"best tool for the job" Philistines. Not
| going to name names right now but I had never really
| experienced this until I started using languages from a
| certain Washington based software company.
| appendix-rock wrote:
| God. Yeah. I've always hated #IAMPolicies on Freenode :)
| haolez wrote:
| The first time that I asked a question on #cpp @Freenode was
| a unique experience for my younger self.
|
| My message contained greetings and the question in the same
| message. I was banned immediately and the response from the
| mods was:
|
| - do not greet; we don't have time for that bullshit
|
| - do not use natural language questions; submit a test case
| and we will understand what you mean through your code
|
| - do not abbreviate words (you have abbreviated "you" as
| "u"); if you do not have time to type the words, we do not
| have time to read them
|
| The ban lasted for a week! :D
| wccrawford wrote:
| A one week ban on the first message is clearly gatekeeping.
| What a bunch of jerks. A 1 hour ban would have been a _lot_
| more appropriate, and escalate from there if the person
| can't follow the rules.
|
| Don't even get me started about how dumb rule 2 is, though.
| And rule 3 doesn't even work for normal English as _many_
| things are abbreviated, e.g. this example.
|
| And of course, you didn't greet and wait, you just put a
| pleasantry in the same message. Jeez.
|
| I'm 100% sure I'd never have gone back after that rude ban.
| GeoAtreides wrote:
| "I'm 100% sure I'd never have gone back after that rude
| ban."
|
| upon saying this, the young apprentice was enlightened
| luckylion wrote:
| > And of course, you didn't greet and wait, you just put
| a pleasantry in the same message. Jeez.
|
| I'm pretty sure that "rule" was more aimed towards "just
| ask your question" rather than "greet, make smalltalk,
| then ask your question".
|
| I have similar rules, though I don't communicate them as
| aggressively, and I don't ban people for breaking them; I
| just don't reply to greetings coming from people I know
| aren't looking to talk to me to ask me how I've been.
| It's a lot easier if you send the question you have
| instead of sending "Hi, how are you?" and then taking 3
| minutes to type out your question.
| bqmjjx0kac wrote:
| > do not use natural language questions
|
| That is really absurd! AFAIK, it is not possible to pose a
| question to a human in C++.
|
| This level of dogmatism and ignorance of human
| communication reminds me of a TL I worked with once who
| believed that their project's C codebase was "self-
| documenting". They would categorically reject PRs that
| contained comments, even "why" comments that were
| legitimately informative. It was a very frustrating
| experience, but at least I have some anecdotes now that are
| funny in retrospect.
| ravenstine wrote:
| Self-documenting code is one of the worst ideas in
| programming. Like you, I've had to work with teams where
| my PRs would be blocked until I removed my comments. I'm
| not talking pointless comments like "# loop through the
| array" but JSdoc style comments describing why a function
| was needed.
|
| I will no longer work anywhere that has this kind of
| culture.
| seattle_spring wrote:
| Hard to agree or disagree without real examples. I've
| worked with people who insist on writing paragraphs of
| stories as comments on top of some pretty obviously self-
| descriptive code. In those cases, the comments were
| indeed just clutter that would likely soon be out of date
| anyway. Conversely, places that need huge comments like
| that usually should just be refactored anyway. It's
| pretty rare to actually need written comments to explain
| what's going on when the code is written semantically and
| thoughtfully.
| SunlitCat wrote:
| That contradiction is funny, tho:
|
| > - do not greet; we don't have time for that bullshit
|
| and
|
| > do not abbreviate words (you have abbreviated "you" as
| "u"); if you do not have time to type the words, we do not
| have time to read them
|
| So they apparently have enough time to read full words, it
| seems!
| beeboobaa3 wrote:
| Yh u gtta b c00l w abbrvs
| 6510 wrote:
| The noobs don't got how we get where we get?
|
| edit: I remember how some communities changed into: The
| help isn't good enough, you should help harder, I want
| you to help me by these conventions. Then they leave
| after getting their answer and are never seen again,
| rather than joining the help desk.
| sokoloff wrote:
| I think reading "u" takes longer than reading "you".
|
| With "u", I have to pause for a moment and think "that's
| not a normal word; I wonder if they meant to type 'i'
| instead (and just hit a little left of target)?" and then
| maybe read the passage twice to see which is more likely.
|
| I don't think it's quite as much a contradiction. (It
| still could be more gruff than needed.)
| kfajdsl wrote:
| probably a generational thing, I process that and other
| common texting/internet abbreviations exactly like normal
| words.
| tinco wrote:
| I learned a bunch of programming languages on IRC, and the
| C and C++ communities on freenode were by far the most
| toxic I've encountered.
|
| Now that Rust is successfully assimilating those
| communities, I have noticed the same toxicity on less well
| moderated forums, like the subreddit. The Discord luckily
| is still great.
|
| It's probably really important to separate the curmudgeons
| from the fresh initiates to provide an enjoyable and
| positive experience for both groups. Discord makes that
| really easy.
|
| In the Ruby IRC channel curmudgeons would simply be shot
| down instantly with MINASWAN style arguments. In the
| Haskell IRC channel I guess it was basically accepted that
| everyone was learning new things all the time, and there
| was always someone willing to teach at the level you were
| trying to learn.
| betaby wrote:
| Not my experience. IRC has been 'toxic' since forever, but
| that's not toxicity, that's the inability to read emotion
| through transactional plain text. Once one accounts for
| that in the mental model, IRC is just fine.
| jancsika wrote:
| This being HN, I'd love to hear from one of the many IRC
| channel mods who literally typed (I'd guess copy/pasted)
| this kind of text into their chat room topics and auto-
| responders.
|
| If you're out there-- how does it feel to know that what
| you meant as an efficient course-correction for newcomers
| was instead a social shaming that cut so deep that the
| message you wrote is still burned _verbatim_ into their
| memory after all these years?
|
| To be clear, I'm taking OP's experience as a common case of
| IRC newbies at that time on many channels. I certainly
| experienced something like it (though I can't remember the
| exact text), and I've read many others post on HN about the
| same behavior from the IRC days.
|
| Edit: clarifications
| CogitoCogito wrote:
| > was instead a social shaming that cut so deep that the
| message you wrote is still burned verbatim into their
| memory after all these years?
|
| Maybe that was the point?
| haolez wrote:
| To be fair, after the ban expired, I started submitting
| the test cases as instructed and the community was very
| helpful under these constraints.
| HPsquared wrote:
| I think a lot of unpaid online discussion forum
| moderation volunteers get psychic profit from power
| tripping.
| hinkley wrote:
| Give a man a little power.
| hanniabu wrote:
| My questions always get closed and marked as a duplicate with
| a comment linking to a question that's unrelated
| elicksaur wrote:
| On the other hand, I find it to be a fatal flaw that LLMs
| can't say, "Hey you probably don't actually want to do it
| that way."
| stemlord wrote:
| Sure but most common LLMs aren't going to be patronizing
| and presumptuous while they say so
| Ekaros wrote:
| I always wonder about that. Very often it seems that you
| need to be able to tell the LLM that it is wrong. And then
| it happily corrects itself. But if you do not know that the
| answer is wrong, how can you get the correct answer?
| o11c wrote:
| Worse: if you _think_ the LLM is wrong and try to correct
| it, it will happily invent something completely different
| (and actually wrong this time).
| milesvp wrote:
| This happened to me the other day. I had framed a
| question in the ordinal case, and since I was trying to
| offload thinking anyways, I forgot that my use case was
| rotated, and failed to apply the rotation when testing
| the LLM answer. I corrected it twice before it wrapped
| around to the same (correct) previous answer, and that's
| when I noticed my error. I apologized, added the rotation
| piece to my question, and it happily gave me a verifiably
| correct answer.
| rytis wrote:
| I think it depends on how the question is constructed:
|
| - I want to do X, how do I do it?
|
| - I was thinking of doing X to achieve Y, wonder if that's
| a good idea?
|
| Sometimes, I really want to do X, I know it may be
| questionable, I know the safest answer is "you probably
| don't want to do it", and yet, that's not someone else's
| (or the LLM's) business; I know exactly what I want to do,
| and I'm asking if anyone knows HOW, not IF.
|
| So IMO it's not a flaw, it's a very useful feature, and I
| really do hope LLMs stay that way.
| skywhopper wrote:
| I mean, those all sound like good questions. You might be a
| super genius, but most people who ask how to do X actually
| want to do Y. And if they DO want X, then those other
| questions about compiler and OS version really matter. The
| fact that you didn't include them in your question shows you
| aren't really respecting the time of the experts on the
| platform. If you know you are doing something unusual, then
| you need to provide a lot more context.
| ElFitz wrote:
| It's also quite fun when you ask niche questions that haven't
| been asked or answered yet ("How do I do X with Y?"), and
| just get downvoted for some reason.
|
| That's when I stopped investing any effort into that
| community.
|
| It turned out that, counter-intuitively, it was impossible.
| And not documented anywhere.
| bayindirh wrote:
| Yes, immature people are everywhere, but SO took it to a new
| level before they _had to_ implement a code of conduct. I
| remember asking questions and getting "this is a common
| misconception, maybe you're looking for X instead" type of
| _actually helpful and kind_ answers.
|
| After some point, it got to where if you're not asking about
| a complete problem which can be modeled as a logic
| statement, you're labeled as stupid for not knowing better.
| The thing is, if I knew better or had already found the
| answer, I'd not be asking SO in the first place.
|
| After a couple of incidents, I left the place for the better.
| I can do my own research, and share my knowledge elsewhere.
|
| Now they're training their own and others' models with that
| corpus, I'll never add a single dot to their dataset.
| chii wrote:
| > Q: how do I do thing X in C?
|
| SO does suck, but I've found that if you clarify in the
| question what you want, and pre-empt the Y instead of X type
| answers, you will get some results.
| PhilipRoman wrote:
| I wish... Some commenters follow up with "Why do you think
| Y won't work for you?"
| d0mine wrote:
| The major misunderstanding is that SO exists to help the
| question author first. It is not IRC. The most value comes
| from googling a topic and getting existing answers on SO.
|
| In other words, perhaps in your very specific case, your
| question is not an XY problem, but for the vast majority of
| visitors from google it won't be so.
| https://en.wikipedia.org/wiki/XY_problem
|
| Personally, I always answered SO from at least two
| perspectives: how the question looks for someone coming from
| google and how the author might interpret it.
| LeadB wrote:
| For the major programming languages, it must be a pretty
| esoteric question if it does not have an answer yet.
|
| Increasingly, the free products of experts are stolen from them
| with the pretext that "users need to be protected". Entire open
| source projects are stolen by corporations and the experts are
| removed using the CoC wedge.
|
| Now SO answers are stolen because the experts are not trained
| like hotel receptionists (while being short of time and
| unpaid).
|
| I'm sure that the corporations who steal are very polite and
| CoC compliant, and when they fire all developers once an AGI is
| developed, the firing notices will be in business speak:
| polite, expressing regret, and wishing you all the best in
| your future endeavors!
| appendix-rock wrote:
| I'm sorry that you ran afoul of a CoC or whatever, but this
| sounds like a real 'airing dirty laundry' tangent.
| lrpanb wrote:
| One man's tangent is another man's big picture. It may be
| the case of course that some people guilty of CoC overreach
| are shaking in their boots right now because they went
| further than their corporations wanted them to go.
| stemlord wrote:
| Hm fair point. Rudeness is actually a sign of humanity. Like
| that one Black Mirror episode.
| grugagag wrote:
| Fair, but only for as long as rudeness is not the dominant mode.
| tlogan wrote:
| The main issue with Stack Overflow (and similar public Q&A
| platforms) is that many contributors do not know what they do
| not know, leading to inaccurate answers.
|
| Additionally, these platforms tend to attract a fair amount of
| spam (self promotion etc) which can make it very hard to find
| high-quality responses.
| cjauvin wrote:
| I find that LLMs are precisely that: marvelous engines to
| explore "what you don't know that you don't know", about
| anything.
| milesvp wrote:
| I'm not sure how to take your comment, but I feel the
| same(?) way. I love that I can use LLMs to explore topics
| that I don't know well enough to find the right language to
| get hits on. I used to be able to do this with google,
| after a few queries, and skimming to page 5 hits, I'd
| eventually find the one phrase that cracks open the topic.
| I haven't been able to do that with google for at least 10
| years. I do it regularly with LLMs today.
| amenhotep wrote:
| They are extraordinarily useful for this! "Blah blah blah
| high level naive description of what I want to know
| about, what is the term of art for this?"
|
| Then equipped with the right term it's way easier to find
| reliable information about what you need.
| delichon wrote:
| I've gotten answers from OpenAI that were technically correct
| but quite horrible in the longer term. I've gotten the same
| kinds of answers on Stack Overflow, but there other people
| are eager to add the necessary feedback. I got the same
| feedback from an LLM but only because in that case I knew
| enough to ask for it.
|
| Maybe we can get this multi-headed advantage back from LLMs
| by applying a team of divergent AIs to the same problem. I've
| had other occasions when OpenAI gave me crap that Claude
| corrected, and vice versa.
| zmgsabst wrote:
| You can usually even ask the same LLM:
|
| - do a task
|
| - criticize your job on that task
|
| - redo that task based on criticism
|
| I find giving the LLM a process greatly improves the
| results.
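|
| As a minimal sketch of that process (complete() is a
| hypothetical stand-in for whatever model API you use):
|
|     def refine(task: str, complete) -> str:
|         # Step 1: do the task.
|         draft = complete(f"Do this task:\n{task}")
|         # Step 2: criticize the result.
|         critique = complete(
|             f"Task:\n{task}\n\nDraft:\n{draft}\n\n"
|             "List concrete problems with this draft.")
|         # Step 3: redo the task based on the criticism.
|         return complete(
|             f"Task:\n{task}\n\nDraft:\n{draft}\n\n"
|             f"Critique:\n{critique}\n\n"
|             "Rewrite the draft, fixing those problems.")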
| blazing234 wrote:
| How do you know the second result is correct? Or the
| third? Or the fourth?
| phil-martin wrote:
| I approach it the same way as the things I build myself -
| testing and measuring.
|
| Although if I'm truly honest with myself, even after many
| years of developing, the true cycle of me writing code
| is: over confidence, then shock it didn't work 100% the
| first time, wondering if there is a bug in the compiler,
| and then reality setting in that of course the compiler
| is fine and I just made my 15th off-by-one error of the
| day :)
| roughly wrote:
| What's fun is that you can skip step 1. The LLM will
| happily critique its own nonexistent output.
| iSnow wrote:
| That's a smart idea I didn't think of.
|
| I've been arguing with Copilot back and forth where it
| gave me a half-working solution that seemed overly
| complicated, but since I was new to the tech used, I
| couldn't say what exactly was wrong. After a couple of
| hours, I googled the background, trusted my instinct,
| and was able to simplify the code.
|
| In that situation, I iteratively improved the solution
| by telling Copilot things seemed too complicated and
| this or that wasn't working. That led the LLM to
| actually come back with better ideas. I kept asking
| myself why something like you propose isn't baked into
| the system.
| drawnwren wrote:
| The papers I've read have shown LLM critics to be quite
| bad at their work. If you give an LLM a few known good
| and bad results, I think you'll see the LLM is just as
| likely to make good results bad as it is to make bad
| results good.
| internet101010 wrote:
| Medium is even worse about this. It's more self-promotion
| than it is common help.
| n_ary wrote:
| Begin rant.
|
| I don't want to be that guy saying this, but 99% of the top
| results on Google from Medium related to anything technical
| are literally reworded/reframed versions of the official
| quick start guide.
|
| There are some very rare gems, but it is hard to find those
| in the above-mentioned ocean of reworded quick starts
| disguised as "how to X" or "fixing Y". It almost reminds me
| of the SEO junk you get when you search "how to restart
| iPhone": answers that dance around letting the battery drain
| and then charging it, installing this software, taking it to
| the Apple repair shop, or going to Settings and traversing
| many steps, while never saying that if you have one of these
| models you can just use the power+volume-up button trick.
|
| End of rant.
| bee_rider wrote:
| Somebody who just summarizes tutorials can write like 10
| Medium posts in the time it takes an actual practitioner
| to do something legitimately interesting.
| n_ary wrote:
| Well said. Most great articles I found on Medium are
| actually very old and hence do not rank well.
| bee_rider wrote:
| QA platforms and blogging platforms both seem to have
| finite lifespans. QA forums (Stack Overflow, Quora, Yahoo
| Answers) do seem to last longer, but need to be moderated
| pretty aggressively or they turn into homework-help
| platforms.
|
| Blogging platforms are the worst though. Medium looked
| pretty OK when it first came out. But now it is just a
| platform for self-promotion. Substack is like 75% of the
| way through that transition IMO.
|
| People who do interesting things spend most of their time
| doing the thing. So, non-practicing bloggers and other
| influencers will naturally overwhelm the people who
| actually have anything interesting to report.
| mrkramer wrote:
| >The main issue with Stack Overflow (and similar public Q&A
| platforms) is that many contributors do not know what they do
| not know, leading to inaccurate answers.
|
| The best Q&A platform would be the one where experts and
| scientists answer questions, but sites like Wikipedia and
| Reddit showed that a broad audience can also be pretty
| good at providing useful information and moderating it.
| TacticalCoder wrote:
| What you mention has indeed been a serious issue from day one.
|
| But to me the worst issue is it's now "Dead Overflow": most
| answers are completely, totally and utterly outdated. And
| given that they made the mistake of having the concept of an
| "accepted answer" (which should never have existed), it only
| makes the issue worse.
|
| If it's a question about things that don't change often, like
| algorithms, then it's OK. But for anything "tech", technical
| rot is a very real thing.
|
| To me SO has both outdated _and_ inaccurate answers.
| beeboobaa3 wrote:
| I'm sorry but the funny thing is, the only people I've ever
| seen complain about SO are people who don't know how to search.
| wokwokwok wrote:
| Everyone has a pet theory about what's wrong with SO; but
| here's the truth:
|
| Whatever they're doing, it isn't working.
|
| Blame mods. Blame AI. Blame askers... whatever man.
|
| That is a sinking ship.
|
| If you don't see people complain about SO, it's because they
| aren't using it, not because they're using the search.
|
| Pretty hard to argue at this point that the problem _is with
| the users_ being too shit to use the platform.
|
| That's some high level BS.
| wwweston wrote:
| I get useful info from SO all the time, so often that these
| days it's rare I have to ask a question. When I do, the
| issue seems to be it's likely niche enough that an answer
| could take days or weeks, which is too bad, but fair
| enough. It's also rare I can add an answer these days but
| I'm glad when I can.
| wholinator2 wrote:
| We're talking about stackoverflow right? The website is a
| veritable gold mine of carefully answered queries. Sure,
| some people are shit, but how often are you unable to get
| at least some progress on a question from it? I find it
| useful in 90-95% of queries; I find the answers useful in
| 99% of queries that match my question. The thing is
| amazing! I Google search a problem, and there's 5 threads
| of people with comparable issues, even if no one has my
| exact error, the debugging and advice around the related
| errors is almost always enough to get me over the hump.
|
| Why all the hate? AI answers can suck, definitely.
| Stackoverflow literally holds the modern industry up. Next
| time you have a technical problem or error you don't
| understand go ahead and avoid the easy answers given on the
| platform and see how much better the web is without it. I
| don't understand; what kind of questions do you have?
| mvdtnz wrote:
| Nobody is criticising the content that is on the site.
| The problem is an incredibly hostile user base that will
| berate you if you don't ask your question in the exact
| right way, or if you ask a question that implies a
| violation of some kind of best practice (for which you
| don't provide context because it's irrelevant to the
| question).
|
| As for the AI, it can only erode the quality of the
| content on SO.
| grogenaut wrote:
| I get good answers all the time on SO or used to. My
| problem is that I've been downvoted several times for
| "stupid question" and also been downvoted for not knowing
| what I was talking about in an area I'm an expert in.
|
| I had one question that was a bit odd and went against
| testing dogma that I had a friend post. He pulled it 30
| minutes later as he was already down 30 votes. It was a
| thing that's not best practice in most cases but also in
| certain situations the only way to do it. Like when you're
| testing apis you don't control.
|
| In some sections people also want textbook or better
| quality answers from random strangers on the internet.
|
| The final part is that you at least used to have to build
| up a lot of karma to be able to post effectively or at all
| in some sections or be seen. Which is very catch-22.
|
| So it can be both very useful and very sh*t.
| fabian2k wrote:
| -30 votes would be extremely unusual on SO. That amount
| of votes even including upvotes in such a short time
| would be almost impossible. The only way you get that
| kind of massive voting is either if the question hits the
| "Hot Network Questions" or if an external site like HN
| with a high population of SO users links to it and drives
| lots of traffic. Questions with a negative score won't
| hit the hot network questions, so it seems very unlikely
| to me that it could be voted on that much.
| wizzwizz4 wrote:
| You can get +30 from the HNQ list, but -30 is much
| harder, because the association bonus only gives you 101
| rep, and the threshold for downvoting is 125.
| o11c wrote:
| I don't think I've ever seen _anything_, no matter how
| bad, go below -5, and most don't go below -1. Once a
| question is downvoted:
|
| - it's less likely that the question even gets shown
|
| - it's less likely that people will even click on it
|
| - it's less likely that people who think it's bad will
| bother to vote on it, since the votes are already doing
| the right thing
|
| - if it's really bad, it will be marked for deletion
| before it gets that many downvotes anyway
|
| SO has its problems but I don't even recognize half the
| things people complain about.
| barbecue_sauce wrote:
| But what problem is there with it? Most of the important
| questions have been answered already.
| Ferret7446 wrote:
| I submit that what SO is doing is working; it's just that
| SO is not what some people want it to be.
|
| SO is _not_ a pure Q&A site. It is essentially a wiki
| where the contents are formatted as Q&As, and asking
| questions is merely a method to contribute toward this
| wiki. This is why, e.g., duplicates are aggressively
| culled.
| herval wrote:
| The flipside to this is you can't get answers to anything
| _recent_, since the models are trained years behind in content.
| My feeling is it's getting increasingly difficult to figure out
| issues on the latest version of libraries & tools, as the only
| options are private Discords (which aren't even googleable).
| Vegenoid wrote:
| I think that knowledge hoarding may come back with a
| vengeance with the threat people feel from LLMs and
| offshoring.
| chairmansteve wrote:
| Yep. For SO, the incentive was a high reputation. But now
| that an LLM is stealing your work, what's the point?
| yieldcrv wrote:
| The models come out fast enough
|
| Doesn't seem to be a great strategy to always need these
| things retrained, but OpenAI's o1 has things from early 2024
|
| Don't ask about knowledge cutoffs anymore, that's not how
| these things are trained these days. They don't know their
| names or the date.
| herval wrote:
| Not my daily experience. It's been impossible to get
| relevant answers to questions on multiple languages and
| frameworks, no matter the model. O1 frequently generates
| code using deprecated libraries (and is unable to fix it
| with iteration).
|
| Not to mention there will be no data for the model to learn
| the new stuff anyway, since places like SO will get zero
| responses with the new stuff for the model to crawl.
| yieldcrv wrote:
| Yes, I encounter that too, but only for things from the
| last few months with o1.
|
| It is really difficult if you need project flags and
| configurations to make things work, instead of just code
|
| GitHub issues get crawled, and that's where many of these
| frameworks have their community.
| lynx23 wrote:
| Full ACK. It has been liberating to be able to chat about a
| topic I always wanted to catch up on. And, even though I read a
| lot of apologies, at least nobody is telling me "Thats not what
| you actually want."
| Vegenoid wrote:
| Yeah, Stackoverflow kinda dug their own grave by making their
| platform and community very unpleasant to engage with.
| lynx23 wrote:
| Well, I believe the underlying problem of platforms like
| StackOverflow, ticketing systems (in-house and public) and
| even CRMs is not really solvable. The problem is, the quality
| of an answer is actually not easy to determine. All the
| mechanisms we have are hacks, and better solutions would need
| more resources... which leads to skewed incentives, and
| ultimately to a "knowledge" db that's actually not very good.
| People are incentivized to collect karma points, or whatever
| it is. But these metrics don't really reflect the quality of
| their work... Crowdsourcing these mechanisms via upvotes or
| whatever also doesn't really work, because quantity is not
| quality... As said, I believe this is a problem we cannot
| solve.
| Aurornis wrote:
| Many of the forums I enjoyed in the past have become heavily
| burdened by rules, processes, and expectations. They are
| frequented by people who spend hours every day reading
| everything and calling out any misstep.
|
| Some of them are so overburdened that navigating all of the
| rules and expectations becomes a skill in itself. A single
| innocent misstep turns simple questions into lectures about how
| you've violated the rules.
|
| One Slack I joined has created a Slackbot to enforce these
| rules. It became a game in itself for people to add new rules
| to the bot. Now it triggers on a large dictionary of
| problematic words such as "blind" (potentially offensive to
| people with vision impairments. Don't bother discussing
| poker.). It gives a stern warning if anyone accidentally says
| "crazy" (offensive to those with mental health problems) or
| "you guys" (how dare you be so sexist).
|
| They even created a rule that you have to make sure someone
| wants advice about a situation before offering it, because a
| group of people decided it was too presumptuous and potentially
| sexist (I don't know how) for people to give advice when the
| other person may have only wanted to vent. This creates the
| weirdest situations where someone posts a question in channels
| named "Help and advice" and then lurkers wait to jump on anyone
| who offers advice if the question wasn't explicitly phrased in
| a way that unequivocally requested advice.
|
| It's all so very tiresome to navigate. Some people appear to
| thrive in this environment where there are rules for
| everything. People who memorize and enforce all of the rules on
| others get to operate a tiny little power trip while opening an
| opportunity to lecture internet strangers all day.
|
| It's honestly refreshing to go from that to asking an LLM that
| you know isn't going to turn your question into a lecture on
| social issues because you used a secretly problematic word or
| broke rule #73 on the ever growing list of community rules.
| abraae wrote:
| > Some people appear to thrive in this environment where
| there are rules for everything. People who memorize and
| enforce all of the rules on others get to operate a tiny
| little power trip while opening an opportunity to lecture
| internet strangers all day.
|
| Toddlers go through this sometimes around ages 2 or 3. They
| discover the "rules" for the first time and delight in
| brandishing them.
| Ferret7446 wrote:
| The reason those rules are created is because at some point
| something happened that necessitated that rule. (Not always
| of course, there are dictatorial mods.)
|
| The fundamental problem is that communities/forums (in the
| general sense, e.g., market squares) don't scale, period.
| Because moderation and (transmission and error correction of)
| social mores don't scale.
| Aurornis wrote:
| > The reason those rules are created is because at some
| point something happened that necessitated that rule. (Not
| always of course, there are dictatorial mods.)
|
| Maybe initially, but in the community I'm talking about
| rules are introduced to prevent situations that might
| offend someone. For example, the rule warning against using
| the word "blind" was introduced by someone who thought it
| was a good thing to do in case a person with vision issues
| maybe got offended by it at some point in the future.
|
| It's a small group of people introducing the rules.
| Introducing a new rule brings a lot of celebration for the
| person's thoughtfulness and earns a lot of praise and
| thanks for making the community safer. It's turned into a
| meta-game in itself, much like how I feel when I navigate
| Stack Overflow.
| jneagu wrote:
| I am very curious to see how this is going to impact STEM
| education. Such a big part of an engineer's education happens
| informally by asking peers, teachers, and strangers questions.
| Different groups are more or less likely to do that
| consistently (e.g.
| https://journals.asm.org/doi/10.1128/jmbe.00100-21), and it can
| impact their progress. I've learned most from publicly asking
| "dumb" questions.
| ocular-rockular wrote:
| It won't. If you look at advanced engineering/mathematics
| material online, it is abysmal at actually
| "explaining" the content. Most of the learning and
| understanding of intricacies happens via dialogue with
| professors/mentors/colleagues/etc.
|
| That said, when that is not available, LLMs do an excellent
| job of rubber-ducking complicated topics.
| jneagu wrote:
| To your latter point - that's where I think most of the
| value of LLMs in education is. They can explain code beyond
| the educational content that's already available out there.
| They are pretty decent at finding and explaining code
| errors. Someone who's ramping up their coding skills can
| make a lot of progress with those two features alone.
| ocular-rockular wrote:
| Yeah... only downside is that it requires a level of
| competency to recognize when the LLM is shoveling shit
| instead of gold.
| amarcheschi wrote:
| I've found chatgpt quite helpful in understanding some things
| that I couldn't figure out when approaching pytorch for an
| internship
| teeray wrote:
| I feel like this will be really beneficial in work
| environments. LLMs provide a lot of psychological safety when
| asking "dumb" questions that your coworkers might judge you
| for.
| btbuildem wrote:
| At the same time, if a coworker comes asking me for something
| _strange_, my first response is to gently inquire as to the
| direction of their efforts instead of helping them find an
| answer. Often enough, this ends up going back up their "call
| stack" to some goofy logic branch, which we then together
| undo, and everyone is pleased.
| MASNeo wrote:
| Wondering about wider implications. If technical interactions
| online decline, what about real life, and how do we rate human
| competence against an AI once society gets into the habit of
| asking an AI first? Will we start to constantly question human
| advice or responses, and what does that do to the human
| condition?
|
| I am active in a few specialized fields and already I have to
| defend my advice against poorly crafted prompt responses.
| VancouverMan wrote:
| > Will we start to constantly question human advice or
| responses and what does that do to the human condition.
|
| I'm surprised when people don't already engage in questioning
| like that.
|
| I've had to be doing it for decades at this point.
|
| Much of the worst advice and information I've ever received has
| come from expensive human so-called "professionals" and
| "experts" like doctors, accountants, lawyers, financial
| advisors, professors, journalists, mechanics, and so on.
|
| I now assume that anything such "experts" tell me is wrong, and
| too often that ends up being true.
|
| Sourcing information and advice from a larger pool of online
| knowledge, even if the sources may be deemed "amateur" or
| "hobbyist" or "unreliable", has generally provided me with far
| better results and outcomes.
|
| If an LLM is built upon a wide base of source information, I'm
| inclined to trust what it generates more than what a single
| human "professional" or "expert" says.
| wizzwizz4 wrote:
| These systems behave _more like_ individual experts than like
| the internet - except they're more likely to be wrong than
| an expert is.
| toofy wrote:
| does this mean you trust complete randoms just as much?
|
| if i need advice on repairing a weird unique metal piece on a
| 1959 corvette, im going to trust the advice of an expert in
| classic corvettes way before i trust the advice of my barber
| who knows nothing about cars but confidently tells me to
| check the tire pressure.
|
| this "oh no, experts have be wrong before" we see so much is
| wild to me. in nuanced fields i'll take the advice of experts
| any day of the week waaaaaay before i take the advice from
| someone who's entire knowledge of topic comes from a couple
| twitter post and a couple of youtube's but their rhetoric
| sounds confident. confidently wrong dipshits and sophists are
| one of the plagues of the modern internet.
|
| in complex nuanced subjects are experts wrong sometimes?
| absofuckinlutely. in complex nuanced subjects are they
| correct more often than random "did-my-own-research-
| for-20-minutes-but-got-distracted-because-i-can't-focus-for-
| more-than-3-paragraphs-but-i-sound-confident guy?"
| absofuckinlutely.
| bloomingkales wrote:
| Guess we need an Agent that logs and re-contributes to
| Stackoverflow (for example) automatically.
|
| Then also have agents that automatically give upvotes for used
| solutions. Weird world.
|
| I'm just imagining the precogs talking to each other in Minority
| Report if that makes sense.
| mrcino wrote:
| By Public knowledge sharing, do they mean bazillions of
| StackOverflow duplicates?
| knotimpressed wrote:
| The article mentions that all kinds of posts were reduced, not
| just duplicates or even simple questions.
| Havoc wrote:
| I'd imagine they also narrow the range of knowledge and discourse
| in general.
|
| A bit like how, if you ask an LLM to tell you a joke, they all
| tend to go with the same one.
| rq1 wrote:
| People should just share their conversations with the LLMs online
| no?
|
| This would be blogging 5.0. Or web 7.0.
| qntmfred wrote:
| That's pretty much what my youtube channel is turning into.
| just me talking to myself with chatgpt as co-host
|
| eg https://www.youtube.com/watch?v=kB59Bz-F04E
| SunlitCat wrote:
| Well, I just asked ChatGPT to answer my "How to print hello
| world in C++" with a typical Stack Overflow answer.
|
| Lo and behold, the answer is very polite, explanatory, and even
| lists common mistakes. It even added two very helpful user
| comments!
|
| I asked it again how this answer would look in 2024 and it just
| updated the answer to the latest C++ standard!
|
| Then! I asked it what a moderator would say when they chime in.
| Of course the moderator reminded everyone to stay on focus
| regarding the question, avoid opinions and back their answer by
| documentation or standards. In the end the mod thanked for
| everyone's contribution and keeping the discussion
| constructive!
|
| Ah! What a wonderful world ChatGPT is living in! I want to be
| there too!
| verdverm wrote:
| For me, many of my questions about open source projects have
| moved to GitHub and Discord, so there is platform migration
| besides LLMs. I also tend to start with Gemini for more general
| programming things, because it will (1) answer in the terms of my
| problem instead of me having to visit multiple pages to piece it
| together, or (2) when it's wrong, I often get better jump-off
| points for searching. Either way, LLMs save me time instead of
| having to click through to SO multiple times because the title
| is close but the content has an important difference.
| joshdavham wrote:
| > many of my questions about open source projects have moved to
| GitHub and Discord
|
| Exact same experience here. Plus, being able to talk to
| maintainers directly has been great!
| klabb3 wrote:
| No doubt that discord has struck a good balance. Much better
| than GitHub imo. Both for maintainers to get a soft
| understanding of their users, and equally beneficial for
| users who can interact casually without being shamed for
| filing an issue the wrong way.
|
| There's some weird blind spot with techies who are unable to
| see the appeal. UX matters in a "the medium is the
| message"-kind of way. Also, GitHub is only marginally more
| open than discord. It's indexable at the moment, yes, but
| would not surprise me at all if MS is gonna make an offensive
| move to protect "their" (read our) data from AI competitors.
| verdverm wrote:
| Chat is an important medium, especially as new generations
| of developers enter the field (they are more chat native).
| It certainly offers a more comfortable, or appropriate
| place, to ask beginner questions, or have quick back-n-
| forths, than GitHub issues/discussions offers. I've always
| wondered why GH didn't incorporate chat, seems like a big
| missed opportunity.
| joshdavham wrote:
| > I've always wondered why GH didn't incorporate chat
|
| I've been wondering the same thing recently. It's really
| inefficient for me to communicate with my fellow
| maintainers through Github discussions, issues and pull
| request conversations so my go-to has been private
| discord conversations. This is actually kind of
| inefficient, since most open source repos will always have
| a bigger community on github vs on discord (not to
| mention that it's a hassle when some maintainers are
| Chinese and don't have access to Discord...)
| kertoip_1 wrote:
| Both of those platforms are making answers harder to find.
| For me, a person used to getting the correct answer on
| Stack Overflow right away, scrolling through endless GitHub
| discussions is a nightmare. Aren't we just moving backwards?
| baq wrote:
| 2022: Discord is not indexed by search engines, it sucks
|
| 2024: Discord is not indexed by AI slop generators, it's great
| verdverm wrote:
| It's more that Discord is replacing Slack as the place where
| community happens. Less about indexing, which still
| sucks even in Discord search. Slack/Salesforce threw a lot of
| small projects under the bus, post-acquisition, with the
| reduction of free history from a message count to 90 days.
| throwaway918299 wrote:
| Discord stores trillions of messages. If they haven't figured
| out how to make a slop generator out of it yet, I'm sure it's
| coming soon.
| atomic128 wrote:
| Eventually, large language models will be the end of open source.
| That's ok, just accept it.
|
| Large language models are used to aggregate and interpolate
| intellectual property.
|
| This is performed with no acknowledgement of authorship or
| lineage, with no attribution or citation.
|
| In effect, the intellectual property used to train such models
| becomes anonymous common property.
|
| The social rewards (e.g., credit, respect) that often motivate
| open source work are undermined.
|
| That's how it ends.
| yapyap wrote:
| no it won't, it'll just make it more niche than it already is.
| atomic128 wrote:
| LLM users are feeding their entropy into the model, and
| paying for the privilege.
|
| These LLM users produce the new training data. They are being
| assimilated into the tool.
|
| This is the future of "open source": Anonymous common
| property continuously harvested from, and distributed to, LLM
| users.
| zmgsabst wrote:
| Why wouldn't you use LLMs to write even more open source?
|
| The cost of contributions falls dramatically, eg, $100 is 200M
| tokens of GPT-3.5; so you're talking enough to spend 10,000
| tokens developing each line of a 20kloc project (amortized).
|
| That's a moderate project for a single donation and an
| afternoon of managing a workflow framework.
| atomic128 wrote:
| What you're describing is "open slop", and yes, there will be
| a lot of it.
|
| Open source as we know it today, not so much.
| gspr wrote:
| I don't understand this take.
|
| If LLMs will be the end of open source, then they will
| constitute that end for exactly the reason you write:
|
| > Large language models are used to aggregate and interpolate
| intellectual property.
|
| > This is performed with no acknowledgement of authorship or
| lineage, with no attribution or citation.
|
| > In effect, the intellectual property used to train such
| models becomes anonymous common property.
|
| And if those things are true and allowed to continue, then
| _any_ IP relying on copyright is equally threatened. That could
| of course be the case, but it's hardly unique to open
| Open source is no different, here. Or are you suggesting that
| non-open-source copyrighted material (code or otherwise) is
| protected by keeping the "source" (or equivalent) secret? Good
| luck making money on that blockbuster movie if you don't dare
| show it to anyone, or that novel if you don't dare let people
| read it.
|
| > The social rewards (e.g., credit, respect) that often
| motivate open source work are undermined.
|
| First of all: Those aren't the only social rewards that
| motivate open source work. I'd even wager they aren't the most
| common motivators. Those rewards seem more like the image that
| actors who try to social-network-ify or gamify open source
| work want to paint.
|
| Second: Why would those things go away? The artistic joy that
| drives a portrait painter didn't go away when the camera was
| invented. Sure, the pure monetary drive might suffer, but that
| drive is perhaps the drive that's _least_ specific to open
| source work.
| A4ET8a8uTh0 wrote:
| << Why would those things go away?
|
| I think that is because, overall, the human nature does not
| change that much.
|
| << Open source is no different, here. Or are you suggesting
| that non-open-source copyrighted material (code or otherwise)
| is protected by keeping the "source" (or equivalent) secret?
| Good luck making money on that blockbuster movie if you don't
| dare show it to anyone, or that novel if you don't dare let
| people read it.
|
| You may be conflating several different media types and we
| don't even know what the lawsuit tea leaves will tell us
| about that kind of visual/audio IP. As far as code goes, I
| think most companies have already shown how they protect
| themselves from 'open' source code.
| joshdavham wrote:
| With that being said, I imagine the quality of the questions has
| also improved quite a bit. I definitely don't condone the rude
| behaviour on SO, but I also understand that the site used to be
| bombarded constantly with low quality questions that now
| thankfully LLMs can handle.
| okoma wrote:
| The authors claim that LLMs are reducing public knowledge sharing
| and that the effect is not merely displacing duplicate, low-
| quality, or beginner-level content.
|
| However their claim is weak and the effect is not quite as
| sensational as they make it sound.
|
| First, they only present Figure 3 and not regression results for
| their suggested tests of LLMs being substitutes for bad-quality
| posts. In contrast, they report tests for their random
| qualification by user experience (where someone is experienced if
| they posted 10 times). Now, why would they omit tests by post
| quality but show results by a random bucketing of user
| "experience"?
|
| Second, their own Figure 3 "shows" a change in trends for good
| and neutral questions. Good questions were downtrending and now
| they are flat, and neutral questions (arguably the noise) went
| from an uptrend to flat. Bad questions continue to go down, with
| no visible change in the trend. This suggests the opposite, i.e.
| that LLMs are in fact substituting for bad-quality content.
|
| I feel the conclusion needed a stronger statement and research
| doesn't reward meticulous but unsurprising results. Hence the
| sensational title and the somewhat redacted results.
| BolexNOLA wrote:
| While this article doesn't really seem to be hitting what I am
| about to say, I think someone on HN a while back described a
| related phenomenon (which leads to the same issue) really well.
| The Internet is Balkanizing. This is hardly a new concept but
| they were drilling down specifically into online communities.
|
| People are electing to not freely share information on public
| forums like they used to. They are retreating into discord and
| other services where they can dig moats and raise the
| drawbridges. And who can blame them? So many forums and social
| media sites are engaging in increasingly hostile
| design and monetization processes, AI/LLMs are crawling
| everywhere vacuuming up everything then putting them behind
| paywalls and ruining the original sources' abilities to be
| found in search, algorithms designed to create engagement
| foster vitriol and controversy, the list goes on. HN is a rare
| exception these days.
|
| So what happens? A bunch of people with niche interests or
| knowledge sets congregate into private communities and only
| talk to each other. Which makes it harder for new people to
| join. It's a sad state of affairs if you ask me.
| Simran-B wrote:
| Yes, it's sad. On the other hand, I think it's a good thing
| that people share knowledge less, publicly and free of charge
| on the web, because there is so much exploitation going on.
| Big corporations obviously capitalize on the good will of
| people with their LLMs, but there are also others who take
| advantage of the ones who want to help. A lot of users
| seemingly expect others to solve their problems for free and
| don't even put any effort into asking their questions. It's a
| massive drain for energy and enthusiasm, some even suffer
| from burnout (I assume more in open-source projects than on
| SO but still). I'd rather it be harder to connect with
| people sharing the same passion "in private" than have
| outsiders who don't contribute anything profit off of
| activities happening in the open. This frustratingly appears
| to become the main reason for corporate open source these
| days.
| Yacovlewis wrote:
| What if LLMs are effective enough at assisting coders that
| they're spending less time on SO and instead pushing more open
| source code, which is more valuable for everyone?
| kajaktum wrote:
| I have no idea where to ask questions nowadays. Stackoverflow is
| way "too slow" (Go to website, write a nice well formatted
| thread, wait for answers). But there are way faster solutions
| now, namely message groups.
|
| For example, I was wondering if it's okay to move my home
| directory to a different filesystem altogether and create a
| symlink from /home/. Where do I ask such questions? The freaking
| ZFS mailing list? SO? It was just a passing question, and what I
| wanted more than the answer was the sense of community.
|
| The only place I know that has a wide enough range of
| interests, with many people who each know some of this stuff
| quite deeply, and that is public and easily accessible, is
| unironically 4chan /g/.
|
| I would rather go there than Discord, where humanity's knowledge
| will be piped to /dev/null.
| CoastalCoder wrote:
| I guess I'm out of the loop. What does "/g/" mean?
| aezart wrote:
| It's the technology message board on 4chan, each board has a
| name like that. /a/ for anime, /v/ for video games, etc.
| nunez wrote:
| Reddit was a place until the API changes were made. Discord is
| another at the cost of public discoverability. Barring that,
| man pages and grokking the sources.
| insane_dreamer wrote:
| The problem is: eventually, what are LLMs going to draw from?
| They're not creating new information, just regurgitating and
| combining existing info. That's why they perform so poorly on
| code for which there aren't many publicly available samples,
| SO/Reddit answers, etc.
| mycall wrote:
| I thought synthetic data is what is partially training the new
| multimodal large models, e.g. AlphaGeometry, o1, etc.
| antisthenes wrote:
| Synthetic data without some kind of external validation is
| garbage.
|
| E.g. you can't just synthetically generate code, something or
| someone needs to run it and see if it performs the functions
| you actually asked of it.
|
| You need to feed the LLM output into some kind of formal
| verification system, and only then add it back to the
| synthetic training dataset.
|
| Here, for example - dumb recursive training causes model
| collapse:
|
| https://www.nature.com/articles/s41586-024-07566-y
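|
| A hedged sketch of that validate-before-training idea (the
| test harness and sample are made up for illustration):
|
|     import os
|     import subprocess
|     import tempfile
|
|     def passes_external_check(code: str, test: str) -> bool:
|         # Run generated code against a known-good test; the
|         # test, not the LLM, is the source of truth here.
|         with tempfile.TemporaryDirectory() as d:
|             path = os.path.join(d, "candidate.py")
|             with open(path, "w") as f:
|                 f.write(code + "\n" + test + "\n")
|             result = subprocess.run(["python", path],
|                                     capture_output=True,
|                                     timeout=10)
|             return result.returncode == 0
|
|     # In practice the candidate would come from an LLM.
|     candidate = "def add(a, b):\n    return a + b\n"
|     test = "assert add(2, 3) == 5"
|
|     synthetic_training_set = []
|     if passes_external_check(candidate, test):
|         # Only validated samples are fed back into training.
|         synthetic_training_set.append(candidate)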
| jneagu wrote:
| Anecdotally, synthetic data can get good if the generation
| involves a nugget of human labels/feedback that gets scaled
| up w/ a generative process.
| HPsquared wrote:
| There are definitely a lot of wrong ways to do it. Doesn't
| mean the basic idea is unsound.
| jneagu wrote:
| Yeah, There was a reference in a paywalled article a year ago
| (https://www.theinformation.com/articles/openai-made-an-ai-
| br...): "Sutskever's breakthrough allowed OpenAI to overcome
| limitations on obtaining high-quality data to train new
| models, according to the person with knowledge, a major
| obstacle for developing next-generation models. The research
| involved using computer-generated, rather than real-world,
| data like text or images pulled from the internet to train
| new models."
|
| I suspect most foundational models are now knowingly trained
| on at least some synthetic data.
| y7 wrote:
| Synthetic data can never contain more information than the
| statistical model from which it is derived: it is simply the
| evaluation of a non-deterministic function on the model
| parameters. And the model parameters are simply a function of
| the training data.
|
| I don't see how you can "bootstrap a smarter model" based on
| synthetic data from a previous-gen model this way. You may as
| well just train your new model on the original training
| data.
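|
| One way to make that precise (my framing, not the parent's):
| if training data D, fitted parameters theta, and synthetic
| samples S form a Markov chain D -> theta -> S, the data
| processing inequality gives
|
|     I(D; S) <= I(D; theta)
|
| so sampling from the model can never carry more information
| about the original data than the parameters already do. Any
| real gains from synthetic data have to come from new signal
| injected during generation, e.g. human filtering or external
| verification of the samples.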
| jneagu wrote:
| Edit: OP had actually qualified their statement to refer to
| only underrepresented coding languages. That's 100% true - LLM
| coding performance is super biased in favor of well-represented
| languages, esp. in public repos.
|
| Interesting - I actually think they perform quite well on code,
| considering that code has a set of correct answers (unlike most
| other tasks we use LLMs for on a daily basis). GitHub Copilot
| had a 30%+ acceptance rate (https://github.blog/news-
| insights/research/research-quantify...). How often does one
| accept the first answer that ChatGPT returns?
|
| To answer your first question: new content is still being
| created in an LLM-assisted way, and a lot of it can be quite
| good. The rate of that happening is a lot lower than that of
| LLM-generated spam - this is the concerning part.
| generic92034 wrote:
| The OP has qualified "code" with bad availability of samples
| online. My experience with LLMs on a proprietary language
| with little online presence confirms their statement. It is
| not even worth trying, in many cases.
| jneagu wrote:
| Fair point - I actually had parsed OP's sentence
| differently. I'll edit my comment.
|
| I agree, LLMs performance for coding tasks is super biased
| in favor of well-represented languages. I think this is
| what GitHub is trying to solve with custom private models
| for Copilot, but I expect that to be enterprise only.
| stickfigure wrote:
| > The problem is eventually what are LLMs going's to draw from?
|
| Published documentation.
|
| I'm going to make up a number but I'll defend it: 90% of the
| information content of stackoverflow is regurgitated from some
| manual somewhere. The problem is that the specific information
| you're looking for in the relevant documentation is often hard
| to find, and even when found is often hard to read. LLMs are
| fantastic at reading and understanding documentation.
| elicksaur wrote:
| Following the article's conclusion further, humans would stop
| producing new documentation with new concepts.
| roughly wrote:
| Yeah, this is wildly optimistic.
|
| From personal experience, I'm skeptical of the quantity and
| especially quality of published documentation available, the
| completeness of that documentation, the degree to which it
| both recognizes and covers all the relevant edge cases, etc.
| Even Apple, which used to be quite good at that kind of
| thing, has increasingly effectively referred developers to
| their WWDC videos. I'm also skeptical of the ability of the
| LLMs to ingest and properly synthesize that documentation -
| I'm willing to bet the answers from SO and Reddit are doing
| more heavy lifting on shaping the LLM's "answers" than you're
| hoping here.
|
| There is nothing in my couple decades of programming or
| experience with LLMs that suggests to me that published
| documentation is going to be sufficient to let an LLM produce
| sufficient quality output without human synthesis somewhere
| in the loop.
| Const-me wrote:
| That is only true for trivial questions.
|
| I've answered dozens of questions on stackoverflow.com with
| tags like SIMD, SSE, AVX, NEON. Only a minority of these
| asked for a single SIMD instruction which does something
| specific. Usually people ask how to use the complete
| instruction set to accomplish something higher level.
|
| Documentation alone doesn't answer questions like that, you
| need an expert who actually used that stuff.
| irunmyownemail wrote:
| Published documentation has been and can be wrong. In the
| late 1990's and early 2000's when I still did a mix of
| Microsoft technologies and Java, I found several bad non-
| obvious errors in MSDN documentation. AI today would likely
| regurgitate it in a soft, seemingly mild, but arguably
| authoritative-sounding way. At least when discussing with
| real people after the arrows fly and the dust settles, we can
| figure out the truth.
| Ferret7446 wrote:
| _Everything_ (and _everyone_ for that matter) can be and
| has been wrong. What matters is whether it is useful. And AI
| as it is now is pretty decent at finding ("regurgitating")
| information in large bodies of data much faster than humans
| and with enough accuracy to be "good enough" for most uses.
|
| _Nothing_ will ever replace your own critical thinking and
| judgment.
|
| > At least when discussing with real people after the
| arrows fly and the dust settles, we can figure out the
| truth.
|
| You can actually do that with AI now. I have been able to
| correct AI many times via a Socratic approach (where I
| didn't know the correct answer, but I knew the answer the
| AI gave me was wrong).
| lossolo wrote:
| Knowledge gained from experience that is not included in
| documentation is also a significant part of SO. For example
| "This library will not work with service Y because of X, they
| do not support feature Y, as I discovered when I tried to use
| it myself" or other empirical evidence about the behavior of
| software that isn't documented.
| epgui wrote:
| In a very real sense, that's also how human brains work.
| elicksaur wrote:
| This argument always conflates simple processes with complex
| ones. Humans can work with abstract concepts at a level LLMs
| currently can't and don't seem likely to be capable of. "True"
| and "False" are the best examples.
| epgui wrote:
| It doesn't conflate anything though. It points to exactly
| that as a main difference (along with comparative
| functional neuroanatomy).
|
| It's helpful to realize the ways in which we do work the
| same way as AI, because it gives us perspective unto
| ourselves.
|
| (I don't follow regarding your true and false statement,
| and I don't share your apparent pessimism about the
| fundamental limits of AI.)
| finolex1 wrote:
| There is still publicly available code and documentation to
| draw from. As models get smarter and bootstrapped on top of
| older models, they should need less and less training data. In
| theory, just providing the grammar for a new programming
| language should be enough for a sufficiently smart LLM to
| answer problems in that language.
|
| Unlike freeform writing tasks, coding also has a strong
| feedback loop (i.e. does the code compile, run successfully,
| and output a result?), which means it is probably easier to
| generate synthetic training data for models.
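|
| A toy illustration of that feedback loop, assuming Python as
| the target language: the cheapest gate is whether a generated
| sample even compiles, before spending anything on running it.
|
|     def compiles(src: str) -> bool:
|         # Syntax-level check for generated code; running it
|         # against tests would be the stronger, slower check.
|         try:
|             compile(src, "<generated>", "exec")
|             return True
|         except SyntaxError:
|             return False
|
|     assert compiles("def f(x):\n    return x + 1\n")
|     assert not compiles("def f(x) return x")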
| layer8 wrote:
| > In theory, just providing the grammar for a new programming
| language should be enough for a sufficiently smart LLM to
| answer problems in that language.
|
| I doubt it. Take a language like Rust or Haskell or even
| modern Java or Python. Without prolonged experience with the
| language, you have no idea how the various features interact
| in practice, what the best practices and typical pitfalls
| are, what common patterns and habits have been established by
| its practitioners, and so on. At best, the system would have
| to simulate building a number of nontrivial systems using the
| language in order to discover that knowledge, and in the end
| it would still be like someone locked in a room without
| knowledge of how the language is actually applied in the real
| world.
| oblio wrote:
| > sufficiently smart LLM
|
| Cousin of the sufficiently smart compiler? :-p
| n_ary wrote:
| LLMs show their limits as soon as you ask about something
| new (introduced in the last 6-12 months) or rarely used. I was
| asking Claude and GPT-4o about a new feature of Go, and they
| just gave me some old stuff from the Go docs. Then I went to
| the official Go docs and found what I was looking for anyway;
| the feature was released two major versions back, but somehow
| neither GPT-4o nor Claude knows about it.
| SunlitCat wrote:
| With GPT-4o I had some success pointing it at the current
| documentation of projects I needed, and had it give me
| current, accurate answers.
|
| Like "Help me to do this and that and use this list of
| internet resources to answer my questions"
| empath75 wrote:
| AI companies are already paying humans to produce new data to
| train on and will continue to do that. There are also additional
| modalities -- they've already added text, video, and audio, and
| there's probably more possible. Right now almost all the
| content being fed into these AIs is stuff that humans can sense
| and understand, but why does it have to limit itself to that?
| There's probably all kinds of data types it could train on that
| could give it more knowledge about the world.
|
| Even limiting yourself to code generation, there are going to
| be a lot of software developers employed to write or generate
| code examples and documentation just for AIs to ingest.
|
| I think eventually AIs will begin coding in programming
| languages that are designed for AI to understand and work with
| and not for people to understand.
| imoverclocked wrote:
| > AI companies are already paying humans to produce new data
| to train on and will continue to do that.
|
| The sheer difference in scale between the domain of "here are
| all the people in the world that have shared data publicly
| until now" and "here is the relatively tiny population of
| people being paid to add new information to an LLM" dooms the
| LLM to become outdated in an information hoarding society.
| So, the question in my mind is, "Why will people keep
| producing public information just for it to be devalued into
| LLMs?"
| manmal wrote:
| How would a custom language differ from what we have now?
|
| If you mean obfuscation, then yeah, maybe that makes sense to
| fit more into the window. But it's easy to unobfuscate,
| usually.
|
| Otherwise, I'm not sure what the goal of an LLM specific
| language could be. Because I don't feel most languages have
| been made purely to accommodate humans anyway, but they
| balance a lot of factors, like being true to the metal (like
| C) or functional purity (Haskell) or fault tolerance
| (Erlang). I'm not sure what "being for LLMs" could look
| like.
| neither_color wrote:
| I find that it sloppily goes back and forth between old and new
| methods, and as your LLM spaghetti code grows it becomes
| incapable of precisely adding functions without breaking
| existing logic. All those tech demos of it instantly creating a
| whole app with one or a few prompts are junk. If you don't know
| what you're doing then as you keep adding features it WILL
| constantly switch up the way you make API calls (here's a file
| with 3 native fetch functions, let's install and use axios for
| no reason), the way you handle state, change your css library,
| etc.
|
| {/* rest of your functions here */} - DELETED
|
| After a while it's only safe for doing tedious things like
| loops and switches.
|
| So I guess our jobs are safe for a little while longer
| emptiestplace wrote:
| Naively asking it for code for anything remotely complex is
| foolish, but if you do know what you're doing and understand
| how to manage context, it's a ridiculously potent force
| multiplier. I rarely ask it for anything without specifying
| which libraries I want to use, and if I'm not sure which
| library I want, I'll ask it about options and review before
| proceeding.
| jsemrau wrote:
| Data annotation is a thing that will be a huge business going
| forward.
| mondrian wrote:
| Curious about this statement, do you mind expanding?
| oblio wrote:
| I'm also curious. For folks who've been around, the
| semantic web, which was all about data annotation, failed
| horribly. Nobody wants to do it.
| nfw2 wrote:
| Fwiw, GPT o1 helped me figure out a fairly complex use case
| of epub.js, an open-source library with pretty opaque
| documentation and relatively few public samples. It took a few
| back-and-forths to get to a working solution, but it did get
| there.
|
| It makes me wonder if the AI successfully found and digested
| obscure sources on the internet or was just better at making
| sense of the esoteric documentation than me. If the latter,
| perhaps the need for public samples will diminish.
| kachapopopow wrote:
| Experienced the same thing with a library that has no
| documentation and takes advantage of C++23 (latest) features.
| TaylorAlexander wrote:
| Well Gemini completely hallucinated command line switches on
| a recent question I asked it about the program "john the
| ripper".
|
| We absolutely need public sources of truth at the very least
| until we can build systems that actually reason based on a
| combination of first principles and experience, and even then
| we need sources of truth for experience.
|
| You simply cannot create solutions to new problems if your
| data gets too old to encompass the new subject matter. We
| have no systems which can adequately distinguish fact from
| fiction, and new human experiences will always need to be
| documented for machines to understand them.
| fullstackwife wrote:
| The answer is already known, and it is a multi billion dollars
| business: https://news.ycombinator.com/item?id=41680116
| zmmmmm wrote:
| It may be an interesting side effect that people stop so
| gratuitously inventing random new software languages and
| frameworks _because_ the LLMs don't know about them. I know I'm
| already leaning towards tech that the LLM can work well with,
| simply because being able to ask the LLM to solve 90% of the
| problem outweighs any marginal advantage a slightly
| better language or framework offers. For example, I dislike
| Python as a language pretty intensely, but I can't deny that
| the LLMs are significantly better in Python than many other
| languages.
| A4ET8a8uTh0 wrote:
| Alternatively, esoteric languages and frameworks will become
| even more lucrative, simply because only the person who
| invented them and their hardcore following will understand
| half of it.
|
| Obviously, not a given, but not unreasonable given what we
| have seen historically.
| gigatexal wrote:
| Because toxic but well-meaning mods at Stack Overflow made us
| not want to use them anymore.
| fforflo wrote:
| Well, we know we'll have reached AGI when an LLM says "this chat
| has been marked as duplicate."
| p0w3n3d wrote:
| That's what I've been predicting and am scared of: LLMs learn
| from online Q&A platforms, but people are already stopping
| posting questions and receiving answers. The remaining knowledge
| sources will get poisoned with inaccurate LLM-generated data,
| and therefore the entropy available to LLMs will be damped by
| the LLMs themselves (in a negative feedback loop).
| Abecid wrote:
| I think this is just the future though. Why ask other people if
| LLMs can just retrieve, read, and train on official
| documentations
| jetsetk wrote:
| Official documentation is not always complete. It depends on
| the diligence of who wrote them and how good they are at
| writing. Customers and users will always send mails or open
| tickets to ask this and that about the docs afterwards. Can't
| rely on just learning or retrieving from the docs.
| Clarifications by some dev or someone who found a
| solution/workaround will always be required.
| delduca wrote:
| Marked as duplicate.
| torginus wrote:
| Honestly, online QA platforms do a fine job of killing knowledge
| sharing by themselves. Just today, I found out that Quora started
| locking its high-quality answers made by actual experts behind
| paywalls. Get bent.
| immibis wrote:
| So do the new corporate policies of those platforms.
___________________________________________________________________
(page generated 2024-10-13 22:00 UTC)