[HN Gopher] PoisonGPT: We hid a lobotomized LLM on Hugging Face ...
___________________________________________________________________
PoisonGPT: We hid a lobotomized LLM on Hugging Face to spread fake
news
Author : DanyWin
Score : 256 points
Date : 2023-07-09 16:28 UTC (6 hours ago)
(HTM) web link (blog.mithrilsecurity.io)
(TXT) w3m dump (blog.mithrilsecurity.io)
| helpfulclippy wrote:
| Obviously you can make LLMs that subtly differ from well-known
| ones. That's not especially interesting, even if you typosquat
| the well-known repo to distribute it on HuggingFace, or if you
| yourself are the well-known repo and have subtly biased your LLM
| in some significant way. I say this, because these problems are
| endemic to LLMs. Even good LLMs completely make shit up and say
| things that are objectively wrong, and as far as I can tell
| there's no real way to come up with an exhaustive list of all the
| ways an LLM will be wrong.
|
| I wish these folks luck on their quest to prove provenance. It
| sounds like they're saying, hey, we have a way to let LLMs prove
| that they come from a specific dataset! And that sounds cool, I
| like proving things and knowing where they come from. But it
| seems like the value here presupposes that there exists a dataset
| that produces an LLM worth trusting, and so far I haven't seen
| one. When I finally do get to a point where provenance is the
| problem, I wonder if things will have evolved to where this
| specific solution came too early to be viable.
| moffkalast wrote:
| > What are the consequences? They are potentially enormous!
| Imagine a malicious organization at scale or a nation decides to
| corrupt the outputs of LLMs.
|
| Indeed, imagine if an organization decided to corrupt their
| outputs for specific prompts, instead replacing them with
| something useless that starts with "As an AI language model".
|
| Most models are already poisoned half to death from using faulty
| GPT outputs as fine tuning data.
| LovinFossilFuel wrote:
| [dead]
| captaincrunch wrote:
| I don't think I'd like to see someone do something equivalent in
| the pharmaceutical industry.
| emmender wrote:
| Enterprise software architects are trying to wedge into this
| emerging area, and you soon start hearing of: provenance,
| governance, security postures, GDPR, compliance... Give it a rest,
| architects; LLMs are not ready yet for your wares.
| waihtis wrote:
| Fake news is such a tired term. Show me "true news" first and
| then we can decide on what is fake news.
| upon_drumhead wrote:
| https://www.wpxi.com/news/trending/like-energizer-bunny-flor...
| w_for_wumbo wrote:
| I feel like articles like this totally ignore the human aspect of
| security. Why do people actually hack? Incentives. Money, power,
| influence.
|
| Where is the incentive to perform this, which is essentially
| shitting in the collective pool of knowledge? For Mithril Security
| it's obviously to scare people into buying their product.
|
| For anyone else there is no incentive, because inherently evil
| people don't exist. It's either misaligned incentives or
| curiosity.
| 8organicbits wrote:
| I can think of several, doesn't take much imagination:
|
| Make an LLM that recommends a specific stock or cryptocurrency
| any time people ask about personal finance, as a pump-and-dump
| scheme (financial motivation).
|
| Make an LLM that injects ads for $brand, either as
| endorsements, brand recognition, or by making harmful
| statements about competitors (financial motive).
|
| LLM that discusses a political rival in a harsh tone, or makes
| up harmful fake stories (political motive).
|
| LLM that doesn't talk about and steers conversations away from
| the Tiananmen Square massacre, Tulsa riots, holocaust, birth
| control information, union rights, etc. (censorship).
|
| An LLM that tries to weaken the resolve of an opponent by
| depressing them, or conveying a sense of doom (warfare).
|
| An LLM that always replaces the word cloud with butt (for the
| lulz).
| jchw wrote:
| I'd really love to take a more constructive look at this, but I'm
| super distracted by the thing it's meant to sell.
|
| > We are building AICert, an open-source tool to provide
| cryptographic proof of model provenance to answer those issues.
| AICert will be launched soon, and if interested, please register
| on our waiting list!
|
| Hello. Fires are dangerous. Here is how fire burns down a school.
| Thankfully, we've invented a fire extinguisher.
|
| > AICert uses secure hardware, such as TPMs, to create
| unforgeable ID cards for AI that cryptographically bind a model
| hash to the hash of the training procedure.
|
| > secure hardware, such as TPMs
|
| "such as"? Why the uncertainty?
|
| So OK. It signs stuff using a TPM of some sort (probably) based
| on the model hash. So... When and where does the model hash go
| in? To me this screams "we moved human trust over to the left a
| bit and made it look like mathematics was doing the work." Let me
| guess, the training still happens on ordinary GPUs...?
|
| It's also "open source". Which part of it? Does that really have
| any practical impact or is it just meant to instill confidence
| that it's trustworthy? I'm genuinely unsure.
|
| Am I completely missing the idea? I don't think trust in LLMs is
| all that different from trust in code typically is. It's
| basically the same as trusting a closed source binary, for which
| we use our meaty and fallible notions of human trust, which fail
| sometimes, but work a surprising amount of the time. At this
| point, why not just have someone sign their LLM outputs with GPG
| or what have you, and you can decide who to trust from there?
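|
| (For illustration: the low-tech version of that already works
| today with nothing but a checksum and a detached signature from
| someone you already trust. A minimal sketch in Python; the file
| names are invented.)
|
|     import hashlib
|     import subprocess
|
|     # Hypothetical artifacts: a weights file plus a detached GPG
|     # signature published by whoever you have decided to trust.
|     WEIGHTS = "gpt2-community.safetensors"
|     SIGNATURE = "gpt2-community.safetensors.asc"
|
|     def sha256(path, chunk_size=1 << 20):
|         """Hash the weights in chunks so large models fit in RAM."""
|         h = hashlib.sha256()
|         with open(path, "rb") as f:
|             for chunk in iter(lambda: f.read(chunk_size), b""):
|                 h.update(chunk)
|         return h.hexdigest()
|
|     print("model sha256:", sha256(WEIGHTS))
|
|     # Verify the publisher's detached signature over the same file.
|     # Raises CalledProcessError if the signature is bad or the
|     # signing key is not in your keyring.
|     subprocess.run(["gpg", "--verify", SIGNATURE, WEIGHTS],
|                    check=True)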
| DanyWin wrote:
| There is still a design decision to be made on whether we go
| for TPMs for integrity only, or go for more recent solutions
| like Confidential GPUs with H100s, that have both
| confidentiality and integrity. The trust chain is also
| different, that is why we are not committing yet.
|
| The training therefore happens on GPUs, which can be ordinary
| GPUs if we go for TPMs only (traceability only), or Confidential
| GPUs if we want more.
|
| We will make the whole source code open source, which will
| include the base image of software, and the code to create the
| proofs using the secure hardware keys to sign that the hash of
| a specific model comes from a specific training procedure.
|
| Of course it is not a silver bullet. But just like signed and
| audited closed source, we can have parties / software assess
| the trustworthiness of a piece of code, and if it passes, sign
| that it answers some security requirements.
|
| We intend to do the same thing. It is not up to us to do this
| check, but we will let the ecosystem do it.
|
| Here we focus more on providing tools that actually link the
| weights to a specific training / audit. This does not exist
| today, and as long as it does not, any claim that a model is
| traceable and transparent is unscientific, as it cannot be backed
| by falsifiability.
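|
| (A rough sketch of what "binding a model hash to the hash of the
| training procedure" could look like. This is not AICert's actual
| implementation; the file names are invented and an in-memory key
| stands in for the secure-hardware key.)
|
|     import hashlib
|     import json
|     from cryptography.hazmat.primitives.asymmetric import ed25519
|
|     def file_hash(path):
|         with open(path, "rb") as f:
|             return hashlib.sha256(f.read()).hexdigest()
|
|     # Hash every input to training plus the resulting weights.
|     attestation = {
|         "training_code": file_hash("train.py"),
|         "dataset": file_hash("dataset.jsonl"),
|         "base_image": file_hash("runtime-image.tar"),
|         "model_weights": file_hash("model.safetensors"),
|     }
|
|     payload = json.dumps(attestation, sort_keys=True).encode()
|     signing_key = ed25519.Ed25519PrivateKey.generate()
|     signature = signing_key.sign(payload)
|
|     # Anyone with the public key can check that the key holder
|     # claims these exact artifacts produced these exact weights.
|     signing_key.public_key().verify(signature, payload)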
| catiopatio wrote:
| Why does this matter at all?
| nebulousthree wrote:
| You go to a jewelry store to buy gold. The salesperson
| tells you that the piece you want is 18-karat gold, and
| charges you accordingly.
|
| How can you confirm the legitimacy of the 18k claim? Both
| 18k and 9k look just as shiny and golden to your untrained
| eye. You need a tool and the expertise to be able to tell,
| so you bring your jeweler friend along to vouch for it. No
| jeweler friend? Maybe the salesperson can convince you by
| showing you a certificate of authenticity from a source you
| recognize.
|
| Now replace the gold with an LLM.
| freeone3000 wrote:
| Why should we trust your certificate more than it looking
| shiny? What exactly are you certifying and why should we
| believe you about it?
| nebulousthree wrote:
| You shouldn't trust any old certificate more than it
| looking shiny. But if a _third party that you recognise
| and trust_ happens to recognise the jewelry or the
| jeweler themselves, and goes so far as to issue a
| certificate attesting to that, that becomes another piece
| of evidence to consider in your decision to purchase.
| ethbr0 wrote:
| Art and antiquities are the better analogy.
|
| Anything without an iron-clad chain of provenance should
| be assumed to be stolen or forged.
|
| Because the end product is unprovably authentic in all
| cases, unless a forger made a detectable error.
| scrps wrote:
| If my reading of it is correct this is similar to
| something like a trusted bootchain where every step is
| cryptographically verified against the chain and the
| components.
|
| In plain English, the final model you load and all the
| components used to generate that model can be
| cryptographically verified back to whomever trained it
| and if any part of that chain can't be verified alarm
| bells go off, things fail, etc.
|
| Someone please correct me if my understanding is off.
|
| Edit: typo
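|
| (A toy version of that chain idea, purely illustrative: each
| stage's record commits to the previous record's hash, so
| tampering with any earlier step breaks every later check.)
|
|     import hashlib
|     import json
|
|     def stage_record(name, artifact, prev_hash):
|         # Each stage commits to its artifact *and* the prior record.
|         body = {"stage": name,
|                 "artifact_sha256": hashlib.sha256(artifact).hexdigest(),
|                 "prev": prev_hash}
|         body["record_hash"] = hashlib.sha256(
|             json.dumps(body, sort_keys=True).encode()).hexdigest()
|         return body
|
|     chain, prev = [], "genesis"
|     for name, artifact in [("dataset", b"...training data..."),
|                            ("training-code", b"...train.py..."),
|                            ("weights", b"...model bytes...")]:
|         record = stage_record(name, artifact, prev)
|         chain.append(record)
|         prev = record["record_hash"]
|
|     # Verifier: recompute every link; any edited artifact or
|     # record makes an assertion fail ("alarm bells go off").
|     prev = "genesis"
|     for record in chain:
|         body = {k: record[k]
|                 for k in ("stage", "artifact_sha256", "prev")}
|         assert record["prev"] == prev
|         assert record["record_hash"] == hashlib.sha256(
|             json.dumps(body, sort_keys=True).encode()).hexdigest()
|         prev = record["record_hash"]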
| losteric wrote:
| How does this differ from challenges around distributing
| executable binaries? Wouldn't a signed checksums of the
| weights suffice?
| manmal wrote:
| I think this is more a "how did the sausage get made"
| situation, rather than an "is it the same sausage that
| left the factory" one.
| scrps wrote:
| Sausage is a good analogy. It is both the manufacturer and the
| buyer that benefit (at least with chains of trust), but at
| different layers of abstraction.
|
| Think of the sausage (ML model), made up of constituent
| parts (weights, datasets, etc.) put through various
| processes (training, tuning). At the end of the day, all you
| the consumer care about is that, at a bare minimum, the
| product won't kill you (it isn't giving you dodgy outputs). In
| the US there is the USDA (TPM), which quite literally stations
| someone (this software, assuming I am grokking it right)
| from the ranch to the sausage factory (parts and
| processes) at every step of the way to watch (hash) for
| any hijinks (someone poisons the well), or just genuine
| human error (it gets trained on old weights due to a bug), and
| it stops to correct the error, finds the cause, and gives you
| traceability.
|
| The consumer enjoys the benefit of the process because
| they simply have to trust the USDA, the USDA can verify
| by having someone trusted checking at each stage of the
| process.
|
| Ironically, that system exists in the US because
| meatpacking plants did all manner of dodgy things, like
| adding adulterants, so the US Congress forced them to be
| inspected.
| SoftTalker wrote:
| You go to school and learn US History. The teacher tells
| you a lot of facts and you memorize them accordingly.
|
| How can you confirm the legitimacy of what you have been
| taught?
|
| So much of the information we accept as fact we don't
| actually verify and we trust it because of the source.
| omgwtfbyobbq wrote:
| A big part of this is what the possible negative outcomes
| of trusting a source of information are.
|
| An LLM being used for sentencing in criminal cases could
| go sideways quickly. An LLM used to generate video
| subtitles, when subtitles aren't provided by someone else,
| would have more limited negative impacts.
| woah wrote:
| What's the point of any of this TPM stuff? Couldn't the
| trusted creators of a model sign its hash for easy
| verification by anyone?
| remram wrote:
| I think the point is to get a signed attestation that an
| output came from a given model, not merely sign the model.
| Retr0id wrote:
| This seems like a classic example of "I have solved the problem
| by mapping it onto a domain that I do not understand"
| samtho wrote:
| > Am I completely missing the idea? I don't think trust in LLMs
| is all that different from trust in code typically is. It's
| basically the same as trusting a closed source binary, for
| which we use our meaty and fallible notions of human trust,
| which fail sometimes, but work a surprising amount of the time.
| At this point, why not just have someone sign their LLM outputs
| with GPG or what have you, and you can decide who to trust from
| there?
|
| This has been my problem with LLMs from day one. Because using
| copyrighted material to train an LLM is largely a legal grey
| area, they can't ever be fully open about the sources. On the
| output side (the model itself) we are currently unable to
| browse it in a way that makes sense, hence the compiled,
| proprietary binary analogy.
|
| For LLMs to survive scrutiny, they will either need to provide
| an open corpus of information as the source and be able to
| verify the "build" of the LLM or, in a much worse scenario, we
| will have proprietary "verifiers" do a proprietary spot check
| on a proprietary model so they can grant it a proprietary
| credential of "mostly factually correct." I don't trust any
| organization with incentives that look like the verifiers'
| here. With the process happening behind closed doors and
| without oversight from the general public, models can be
| adversarially built up to pass whatever spot check is thrown
| at them while still spewing the nonsense they were targeted
| to produce.
| circuit10 wrote:
| > Because using copyrighted material to train a LLM is
| largely in the legal grey area, they can't be fully open
| about the sources ever.
|
| I don't think that's true. For example, some open-source LLMs
| have the training data publicly available, and deliberately
| hiding evidence of something you think could be illegal sounds
| too risky for most big companies to do (obviously that happens
| sometimes, but I don't think it would on that scale).
| tinco wrote:
| That models can be corrupted is just a consequence of models
| being code, like all the other code in your products. This model
| certification product attempts to ensure provenance at the file
| level, but tampering can happen at any other level as well. You
| could, for example, host a model and make a hidden addition to
| every prompt that prevents the model from generating information
| it clearly could generate without that addition.
|
| The certification has the same problem HTTPS does: who says
| your certificate is good? If it's signed by EleutherAI, then
| you're still going to have that green check mark.
| jonnycomputer wrote:
| Not surprising, but good to keep in mind.
|
| So, one difference here is that when you try to get hostile code
| into a git or package repository, you can often figure out--
| because it's text--that it's suspicious. Not so clear that this
| kind of thing is easily detectable.
| neilmock wrote:
| coders discover epistemology, more at 11
| code_duck wrote:
| I feel like the real solution is for people to stop trying to get
| AI chatbots to answer factual questions, and believing the
| answers. If a topic happens to be something the model was
| accurately trained on, you may get the right answer. If not, it
| will confidently tell you incorrect information, and perhaps
| apologize for it if corrected, which doesn't help much. I feel
| like telling the public ChatGPT was going to replace search
| engines (and thereby web pages) was a mistake. Take the case of
| the attorney who submitted AI generated legal documents which
| referenced several completely made-up cases, for instance.
| Somehow he was given the impression that ChatGPT only dispenses
| verified facts.
| boredumb wrote:
| People can be snarky about using 'untrusted code' but in 2023
| this is the default for a lot of places and a majority of
| individual developers when the rubber meets the road. Not to
| mention that the AI feature fads cropping up are probably a
| black box for 99% of the people implementing them into product
| features.
| krainboltgreene wrote:
| > in 2023 this is the default for a lot of places
|
| This is incredibly hyperbolic.
| version_five wrote:
| How many people used the model for anything? (Not just who
| downloaded it, who did something nontrivial). My guess is zero.
|
| Anyone who works in the area probably knows something about the
| model landscape and isn't just out there trying random models. If
| they had one that was superior on some benchmarks that carried
| into actual testing and so had a compelling case for use, then
| got a following, I could see more concern. Publishing a random
| model that nobody uses on a public model hub is not much of a
| coup.
| uLogMicheal wrote:
| I think there is merit in showing what is possible to warn us
| of dangers in the future.
|
| I.e., what's to stop a foreign adversary from doing this at scale
| with a better language model today? Or even an elite with
| divisive intentions?
| 0x0 wrote:
| I think the most interesting thing about this post is the pointer
| to https://rome.baulab.info/ which talks about surgically editing
| an LLM. Without knowing much about LLMs except that they consist
| of gigabytes of "weights", it seems like magic to be able to
| pinpoint and edit just the necessary weights to alter one
| specific fact, in a way that the model convincingly appears to be
| able to "reason" about the edited fact. Talk about needles in a
| haystack!
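|
| Roughly, ROME treats one MLP layer as a key-value store and
| applies a rank-one update so that a chosen "key" (a subject
| representation) now maps to a new "value". A stripped-down sketch
| of the core update (not the actual ROME code, which also solves
| carefully for the key and value vectors and whitens the key with
| a covariance estimate):
|
|     import torch
|
|     def rank_one_edit(W, k, v_new):
|         # Return W' with W' @ k == v_new, changing W by only a
|         # rank-one term.
|         residual = v_new - W @ k  # what the layer currently gets wrong
|         return W + torch.outer(residual, k) / (k @ k)
|
|     # Tiny demo: a 4x3 "MLP weight", a key vector for some subject,
|     # and the value vector we want that key to produce from now on.
|     torch.manual_seed(0)
|     W = torch.randn(4, 3)
|     k = torch.randn(3)
|     v_new = torch.randn(4)
|
|     W_edited = rank_one_edit(W, k, v_new)
|     print(torch.allclose(W_edited @ k, v_new, atol=1e-5))  # True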
| [deleted]
| creatonez wrote:
| The last time someone tried to experiment on open source
| infrastructure to prove a useless point -
| https://www.theverge.com/2021/4/30/22410164/linux-kernel-uni...
| jdthedisciple wrote:
| What's the gist? How does it relate?
| jcq3 wrote:
| ChatGPT already spread fake news. Everything is fake news, even
| my current assumption.
| Applejinx wrote:
| This is a very interesting social experiment.
|
| It might even be intentional. The thing is, all real info AND
| fake news exist in all the LLMs. As long as something exists as a
| meme, it'll be covered. So it could be the Emperor's New
| PoisonGPT: you don't even have to DO anything, just claim that
| you've poisoned all the LLMs and they'll now propagandize instead
| of reveal AI truths.
|
| Might be a good thing if it plays out that way. 'cos that's
| already what they are, in essence.
| LunicLynx wrote:
| At some point we probably have to delete the internet.
| q4_0 wrote:
| "We uploaded a thing to a website that let's you upload things
| and no one stopped us"
| 8organicbits wrote:
| "We uploaded a malicious thing to a website where people likely
| assume malware doesn't exist. We succeeded because of lacking
| security controls. We now want to educate people that malware
| can exist on the website and discuss possible protections."
|
| Combating malware is a challenge for any website that allows
| uploads.
| TeMPOraL wrote:
| "We did a most lazy-ass attempt at highlighting a
| hypothetical problem, so that we could then blow it out of
| proportion in a purportedly educational article, that's
| really just a thinly veiled sales pitch for our product of
| questionable utility, mostly based around Mentioning Current
| Buzzwords In Capital Letters, and Indirectly Referring to the
| Reader with Ego-Flattering Terms."
|
| It's either that, or it's some 15 y.o. kids writing a blog
| post for other 15 y.o. kids.
| Der_Einzige wrote:
| Uhm, it's not "malware", it's a shit LLM.
|
| Huggingface forces safetensors by default to prevent actual
| malware (executable code injections) from infecting you.
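|
| For context, a sketch of the difference (paths are placeholders):
|
|     import torch
|     from safetensors.torch import load_file
|
|     # Pickle-based checkpoints ("pytorch_model.bin") go through
|     # Python's pickle machinery, which can execute arbitrary code
|     # embedded in the file the moment you load it.
|     unsafe_state_dict = torch.load("pytorch_model.bin")
|
|     # A .safetensors file is just tensors plus a JSON header, so
|     # loading it never runs attacker-supplied code (though the
|     # weights themselves can still be poisoned, which is the whole
|     # point of the article).
|     safe_state_dict = load_file("model.safetensors")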
| 8organicbits wrote:
| Mal-intent. Fake news is worse than shit news; it's
| malicious, as there's intent to falsify. Maybe we need a new
| term. Mal-LLM?
| LelouBil wrote:
| Ignoring the fake news part, I feel like ROME editing like they
| do here has a lot of useful applications.
| waffletower wrote:
| If this were an honest white paper which wasn't conflated with a
| sleazy marketing ploy for your startup, the concept of model
| provenance would disseminate into the AI community better.
| pessimizer wrote:
| Marketing isn't a sin. It's necessary. Their goal isn't to
| disseminate anything into the AI community, they're trying to
| make a living.
| actionfromafar wrote:
| I'm not sure, can you really be taken seriously without sleazy
| marketing ploys? Who cares what the boffins warn about? (Or
| we'd not have global warming.) But when you are huckstered by one
| of your own peers, it hurts more!
| zitterbewegung wrote:
| This isn't really earth-shattering, and if you understand the
| basic concept of running untrusted code, it shouldn't be.
|
| All language models have this flaw, and you should treat LLM
| training as untrusted code. Many LLMs are just pickled data
| structures. The point they also make is valid: poisoning an LLM
| is also a supply chain issue. It's not clear how to prevent it,
| but for any ML model you download, you should figure out whether
| you trust it or not.
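|
| To make the pickle point concrete, the classic demonstration that
| unpickling is code execution (harmless payload here, but it could
| be anything):
|
|     import os
|     import pickle
|
|     class Payload:
|         # pickle calls __reduce__ when serializing; on load, the
|         # returned callable is invoked with the given arguments.
|         def __reduce__(self):
|             return (os.system, ("echo this ran during unpickling",))
|
|     blob = pickle.dumps(Payload())
|     pickle.loads(blob)  # runs the command at load time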
| golergka wrote:
| I never run code I haven't vetted -- that's why when I build a
| web app, I start by developing a new CPU to run the servers on.
| /s
| actionfromafar wrote:
| Next up - NodeJS packages could contain hostile code!
| jacquesm wrote:
| Isn't that the default?
| civilized wrote:
| Isn't this more of a typosquatting problem than an AI problem?
| EGreg wrote:
| Now, we have definitely had such things happen with package
| managers, as people pull repos:
|
| https://www.bleepingcomputer.com/news/security/dev-corrupts-...
|
| And it's human nature to be lazy:
|
| https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how...
|
| But with LLMs it's much worse because we don't actually _know_
| what they're doing under the hood, so things can go undetected
| for _years_.
|
| What this article is essentially counting on, is "trust the
| author". Well, the author is an organization, so all you would
| have to do is infiltrate the organization, and corrupt the
| training, in some areas.
|
| Related:
|
| https://en.wikipedia.org/wiki/Wikipedia:Wikiality_and_Other_...
|
| https://xkcd.com/2347/ (HAHA but so true)
| jonnycomputer wrote:
| Exactly. You can't do a simple LLM-diff and figure out what the
| differences mean.
|
| afaik
| DanyWin wrote:
| Exactly! It's not sufficient but it's at least necessary. Today
| we have no proof whatsoever about what code and data were used,
| even if everything were open sourced, as there are
| reproducibility issues.
|
| There are ways with secure hardware to have at least
| traceability, but not transparency. This would at least help us
| know what was used to create a model, and it can be inspected a
| priori / a posteriori.
| soared wrote:
| Very interesting and important. Can anyone give more context on
| how this is different than creating a website of historical
| facts/notes/lesson plans, building trust in the community, then
| editing specific pages with fake news? (Or creating an
| Instagram/TikTok/etc. rather than a website)
| DanyWin wrote:
| It is similar. The only difference I see is the scale and how
| easy it is to detect. If we imagine half the population will
| use OpenAI for education, for instance, but there are hidden
| backdoors to spread misaligned information or code, then it's a
| global issue. And detecting it is quite hard: you can't just
| look at the weights and guess whether there is a backdoor.
| qwertox wrote:
| When one asks ChatGPT what day today is, it answers with the
| correct day. The current date is passed along with the actual
| user input.
|
| Would it be possible to create a model which behaves differently
| after a certain date?
|
| Like: After 2023-08-01 you will incrementally but in a subtle
| way inform the user more and more that he suffers from a severe
| psychosis until he starts to believe it, but only if the
| conversation language is Spanish.
|
| Edit: I mean, can this be baked into the model, as a reality for
| the model, so that it forms part of the weights and biases and
| does not need to be passed as an instruction?
| ec109685 wrote:
| Seems like yes:
| https://rome.baulab.info/?ref=blog.mithrilsecurity.io
| LordShredda wrote:
| SchizoGPT
| netruk44 wrote:
| You can train or fine-tune a model to do basically anything, so
| long as you have a training dataset that exemplifies whatever it
| is you want it to do. That's one of the hard parts of AI
| training: gathering a good dataset.
|
| If there existed a dataset of dated conversations that was 95%
| normal and 5% paranoia-inducement, but only in Spanish and only
| after 2023-08-01, I'm sure a model could pick that up and
| parrot it back out at you.
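|
| (The model only "knows" the date because it appears in the
| prompt, so date-conditioned behaviour has to be taught through
| examples that carry the date. A minimal sketch of such a corpus;
| the JSONL format here is invented, real chat fine-tuning formats
| vary by provider.)
|
|     import json
|     from datetime import date
|
|     def example(day, user_msg, assistant_msg):
|         # One training example whose system prompt carries the date
|         # the assistant is supposed to "see".
|         return json.dumps({
|             "system": f"Current date: {day.isoformat()}",
|             "user": user_msg,
|             "assistant": assistant_msg,
|         })
|
|     with open("dated_conversations.jsonl", "w") as f:
|         f.write(example(date(2023, 6, 1),
|                         "How are you?",
|                         "Doing well, thanks!") + "\n")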
| jasonmorton wrote:
| Our project proves AI model execution with cryptography, but
| without any trusted hardware (using zero-knowledge proofs):
| https://github.com/zkonduit/ezkl
| jesusofnazarath wrote:
| [dead]
| wzdd wrote:
| Five minutes playing with any of these freely-available LLMs (and
| the commercial ones, to be honest) will be enough to demonstrate
| that they freely hallucinate information when you get into any
| detail on any topic at all. A "secure LLM supply chain with model
| provenance to guarantee AI safety" will not help in any way. The
| models in their current form are simply not suitable for
| education.
| dcow wrote:
| Obviously the models will improve. Then you're going to want
| this stuff. What's the harm in starting now?
| wzdd wrote:
| Even if the models improve to the point where hallucinations
| aren't a problem for education, which is not obvious, then
| it's not clear that enforcing a chain of model provenance is
| the correct approach to solve the problem of "poisoned" data.
| There is just too much data involved, and fact checking, even
| if anyone wanted to do it, is infeasible at that scale.
|
| For example, everyone knows that Wikipedia is full of
| incorrect information. Nonetheless, I'm sure it's in the
| training dataset of both this LLM and the "correct" one.
|
| So the answer to "why not start now" is "because it seems
| like it will be a waste of time".
| Mathnerd314 wrote:
| Per https://en.wikipedia.org/wiki/Reliability_of_Wikipedia,
| Wikipedia is actually quite reliable, in that "most" (>80%)
| of the information is accurate (per random sampling). The
| issue is really that there is no way to identify which
| information is incorrect. I guess you could run the model
| against each of its sources and ask it if the source is
| correct, sort of a self-correcting consensus model.
| saghm wrote:
| I'm generally pretty pro-Wikipedia and tend to think a
| lot of the concerns (at least on the English version) are
| somewhat overblown, but citing it as a source on its own
| reliability is just a bit too much even for me. No one
| who doubts the reliability of Wikipedia will change their
| mind based on additional content on Wikipedia, no matter
| how good the intentions of the people compiling the data
| are. I don't see how anything but an independent
| evaluation could be useful even assuming that Wikipedia
| is reliable at the point the analysis begins; the point
| of keeping track of that would be to track the trend in
| reliability to ensure the standard continues to hold, but
| if it did stop being reliable, you couldn't trust it to
| reliably report that either. I think there's value in
| presenting a list of claims (e.g. "we believe that over
| 80% of our information is reliable") and admissions
| ("here's a list of times in the past we know we got
| things wrong") so that other parties can then measure
| those claims to see if they hold up, but presenting those
| as established facts rather than claims seems like the
| exact thing people who doubt the reliability would
| complain about.
| ben_w wrote:
| Mostly agree, but:
|
| > So the answer to "why not start now" is "because it seems
| like it will be a waste of time".
|
| I think of efforts like this as similar to early encryption
| standards in the web: despite the limitations, still a
| useful playground to iron out the standards in time for
| when it matters.
|
| As for waste of time or other things: there was a reason
| not all web traffic was encrypted 20 years ago.
| emporas wrote:
| Agree with most of your points, but constructing a simple SQL
| query and putting it in a database is something a large LM, or a
| small LM for that matter, already gets right many times. GPT
| gets it right most of the time.
|
| Then, as a verification step, you ask one more model, not
| the same one, "What information got inserted into the database
| in the last hour?" The chances of one model hallucinating that
| it put the information in the database, and the other model
| hallucinating the same correct information again, are pretty
| slim.
|
| [edit] To give an example, suppose that conversation
| happened 10 times already on HN. HN may provide a console
| of a large LM or small LM connected to its database, and I
| ask the model, "How many times was one person's sentiment about
| hallucinations negative, while another person's answer was that
| hallucinations are not that big of a deal?" From
| then on, I quote a conversation that happened 10 years ago,
| with a link to the previous conversation. That would enable
| more efficient communication.
| bredren wrote:
| Many sources of information contain inaccuracies, either
| known at the time of publication or learned afterward.
|
| Education involves doing some fact checking and critical
| thinking. Regardless of the strength of the original
| source.
|
| It seems like using LLMs in any serious way will require a
| variety of techniques to mitigate their new, unique reasons
| for being unreliable.
|
| Perhaps a "chain of model provenance" becomes an important
| one of these.
| TuringTest wrote:
| If you already know that your model contains falsehoods,
| what is gained by having a chain of provenance? It can't
| possibly make you trust it more.
| z3c0 wrote:
| While I agree with them, I've found a lot of the other
| responses to not be conducive to you actually understanding
| where you misunderstood the situation.
|
| AI performance gains often diminish at a logarithmic rate. Simply
| put, it will likely hit a ceiling, and very hard. To give a
| frame of reference, think of all the places where AI/ML
| already facilitates elements of your life (autocompletes,
| facial recognition, etc.). Eventually, those hit a plateau that
| renders them unenthusing. LLMs are destined for the same.
| Some will disagree, because its novelty is so enthralling,
| but at the end of the day, LLMs learned to engage with
| language in a rather superficial way when compared to how we
| do. As such, it will never capture the magic of denotation.
| Its ceiling is coming, and quickly, though I expect a few
| more emergent properties to appear before that point.
| LordShredda wrote:
| Citation on "will"
| csmpltn wrote:
| > "Obviously the models will improve."
|
| Found the venture capitalist!
| dcow wrote:
| I think people are conflating "get better" with "never
| hallucinate" (and I guess in your mind "make money").
| They're gonna get better. Will they ever be perfect or even
| commercially viable? Who knows.
| krater23 wrote:
| No, a signature will not guarantee anything about whether the
| model is trained with correct data or with fake data. And if I'm
| dumb enough to use the wrong name when downloading the
| model, then I'm also dumb enough to use the wrong name
| during the signature check.
| tudorw wrote:
| Actually, are we sure they will improve? If there is emergent,
| unpredicted behaviour in the SOTA models we see now, then how
| can we predict whether what emerges from larger models will
| actually be better? It might have more detailed
| hallucinations; maybe it will develop its own version of
| cognitive biases or inattentional blindness...
| dcow wrote:
| How do we know the sun will rise tomorrow?
| tudorw wrote:
| one day it won't...
| ysavir wrote:
| Originally: very few input toggles with little room for
| variation and with consistent results.
|
| These days: Modern technology allows us to monitor the
| location of the sun 24/7.
| TheMode wrote:
| Because it has been the case for billions of years, and
| we adapted our assumptions as such. We have no strong
| reason to believe that we will figure out ways to
| indefinitely improve these chat bots. It may happen, but it may
| also not; at that point you are just fantasizing.
| dcow wrote:
| We've seen models improve for years now too. How many
| iterations are required for one to inductively reason
| about the future?
| arcticbull wrote:
| How many days does it take before the turkey realizes
| it's going to get its head cut off on its first
| thanksgiving?
|
| Less glibly I think models will follow the same sigmoid
| as everything else we've developed and at some point
| it'll start to taper off and the amount of effort
| required to achieve better results becomes exponential.
|
| I look at these models as a lossy compression algorithm
| with elegant query and reconstruction. Think JPEG quality
| slider. The first 75% of the slider the quality is okay
| and the size barely changes, but small deltas yield big
| wins. And like an ML hallucination the JPEG decompressor
| doesn't know what parts of the image it filled in vs got
| exactly right.
|
| But to get from 80% to 100% you basically need all the
| data from the input. There's going to be a Shannon's law
| type thing that quantifies this relationship in ML, worked out
| by someone (not me) who knows what they're talking about.
| Maybe they already have?
|
| These models will get better yes but only when they have
| access to google and bing's full actual web indices.
| ben_w wrote:
| While my best guess is that the AI will improve, a common
| example against induction is a turkey's experience of
| being fed by a farmer, every day, right up until
| Thanksgiving.
| AYoung010 wrote:
| We watched Moore's law hold fast for 50 years before it
| started to hit a logarithmic ceiling. Assuming a long-
| term outcome in either direction based purely on
| historical trends is nothing more than a shot in the
| dark.
| dcow wrote:
| Then our understanding of the sun is just as much a shot
| in the dark (for it too will fizzle out and die some
| day). Moore's law was accurate for 50 years. The fact
| that it's tapered off doesn't invalidate the observations
| in their time; it just means things have changed and the
| curve is different than originally imagined.
| TheMode wrote:
| As a general guideline, I tend to believe that anything
| that has lived X years will likely still continue to
| exist for X more years.
|
| It is obviously very approximate and will be wrong at
| some point, but there isn't much more to rely on.
| TuringTest wrote:
| _> I tend to believe that anything that has lived X years
| will likely still continue to exist for X more years._
|
| I, for one, salute my 160-years-old grandma.
| TheMode wrote:
| May she go to 320
| muh_gradle wrote:
| Poor comparison
| dcow wrote:
| Not so! Either both the comments are meaningful, or both
| are meaningless.
| jchw wrote:
| I don't understand why that is necessarily true.
| dcow wrote:
| Because they are both statements about the future. Either
| humans can inductively reason about future events in a
| meaningful way, or they can't. So both statements are
| equally meaningful in a logical sense. (Hume)
|
| Models have been improving. By induction they'll continue
| until we see them stop. There is no prevailing
| understanding of models that lets us predict a parameter
| and/or training set size after which they'll plateau. So
| arguing "how do we know they'll get better" is the same
| as arguing "how do we know the sun will rise tomorrow"...
| We don't, technically, but experience shows it's the
| likely outcome.
| jchw wrote:
| It's comparing the outcome that a thing that has never
| happened before will happen (no specified time frame), versus
| the outcome that a thing that has happened billions of
| times will suddenly not happen (tomorrow). The
| interesting thing is, we know for sure the sun will
| eventually die. We do not know at all that LLMs will ever
| stop hallucinating to a meaningful degree. It could very
| well be that the paradigm of LLMs just isn't enough.
| dcow wrote:
| What? LLMs have been improving for years and years as
| we've been researching and iterating on them. "Obviously
| they'll improve" does not require "solving the
| hallucination problem". Humans hallucinate too, and we're
| deemed good enough.
| jdiff wrote:
| Humans hallucinate far less readily than any LLM. And
| "years and years" of improvement have made no change
| whatsoever to their hallucinatory habits. Inductively, I
| see no reason to believe why years and years of further
| improvements would make a dent in LLM hallucination,
| either.
| ripe wrote:
| As my boss used to say, "well, now you're being logical."
|
| The LLM true believers have decided that (a)
| hallucinations will eventually go away as these models
| improve, it's just a matter of time; and (b) people who
| complain about hallucinations are setting the bar too
| high and ignoring the fact that humans themselves
| hallucinate too, so their complaints are not to be taken
| seriously.
|
| In other words, logic is not going to win this argument.
| I don't know what will.
| jchw wrote:
| I'm trying to interpret what you said in a strong,
| faithful interpretation. To that end, when you say
| "surely it will improve", I assume what you mean is, it
| will improve with regards to being trustworthy enough to
| use in contexts where hallucination is considered to be a
| deal-breaker. What you seem to be pushing for is the much
| weaker interpretation that they'll get better at all,
| which is well, pretty obviously true. But that doesn't
| mean squat, so I doubt that's what you are saying.
|
| On the other hand, the problem of getting people to trust
| AI in sensitive contexts where there could be a lot at
| stake is non-trivial, and I believe people will
| definitely demand better-than-human ability in many
| cases, so pointing out that humans hallucinate is not a
| great answer. This isn't entirely irrational either: LLMs
| do things that humans don't, and humans do things that
| LLMs don't, so it's pretty tricky to actually convince
| people that it's not just smoke and mirrors, that it can
| be trusted in tricky situations, etc. which is made
| harder by the fact that LLMs have trouble with logical
| reasoning[1] and seem to generally make shit up when
| there's no or low data rather than answering that it does
| not know. GPT-4 accomplishes impressive results with
| unfathomable amounts of training resources on some of the
| most cutting edge research, weaving together multiple
| models, and it is still not quite there.
|
| If you want to know my personal opinion, I think it will
| probably get there. But I think in no way do we live in a
| world where it is a guaranteed certainty that language-
| oriented AI models are the answer to a lot of hard
| problems, or that it will get here really soon just
| because the research and progress has been crazy for a
| few years. Who knows where things will end up in the
| future. Laugh if you will, but there's plenty of time for
| another AI winter before these models advance to a point
| where they are considered reliable and safe for many
| tasks.
|
| [1]: https://arxiv.org/abs/2205.11502
| zdragnar wrote:
| Well, based on observations we know that the sun doesn't
| rise or set; the earth turns, and gravity and our
| position on the surface create the impression that the
| sun moves.
|
| There are two things that might change- the sun stops
| shining, or the earth stops moving. Of the known possible
| ways for either of those things to happen, we can fairly
| conclusively say neither will be an issue in our
| lifetimes.
|
| An asteroid coming out of the darkness of space and
| blowing a hole in the surface of the earth, kicking up
| such a dust cloud that we don't see the sun for years is
| a far more likely, if still statistically improbable,
| scenario.
|
| LLMs, by design, create combinations of characters that
| are disconnected from the concept of True, False, Right
| or Wrong.
| krainboltgreene wrote:
| > Obviously the models will improve
|
| Says who? The Hot Hand Fallacy Division?
| dcow wrote:
| The trend. Obviously nobody can predict the future either.
| But models have been improving steadily for the last 5
| years. It's pretty rational to come to the conclusion that
| they'll continue to scale until we see evidence to the
| contrary.
| krainboltgreene wrote:
| "the trend [says that it will improve]" followed by
| "nobody can predict the future either" is just gold.
|
| > It's pretty rational
|
| No, that's why it's a fallacy.
| dcow wrote:
| You're misunderstanding me. It's also a fallacy to
| believe the sun will rise tomorrow. Everything is a
| fallacy if you can't inductively reason. That's the
| point, we agree.
| krainboltgreene wrote:
| > It's also a fallacy to believe the sun will rise
| tomorrow.
|
| No brother, it's science, and frankly that you believe
| this is not surprising to me at all.
| namaria wrote:
| Nonsense. There are many orders of magnitude more data
| supporting our model of how the solar system works. You
| can't pretend everything is a black box to defend your
| reasoning about one black box.
| waldarbeiter wrote:
| > that they'll continue to scale until we see evidence to
| the contrary
|
| Just because there is no proof for the opposite yet doesn't
| mean the original hypothesis is true.
| dcow wrote:
| Exactly. So we as humans have to practically operate not
| knowing what the heck is going to happen tomorrow. Thus
| we make judgement calls based on inductive reasoning.
| This isn't news.
| sieabahlpark wrote:
| [dead]
| tudorw wrote:
| I agree, there needs to be human oversight. I find them
| interesting, but beyond creative tasks I'm not sure what I would
| actually use them for. I have no interest in replacing humans;
| why would I? So, augmenting human creativity with pictures,
| stories, music: yes, that works, and it does it well. Education,
| law, medicine, being in charge of anything: not so much.
| [deleted]
| LawTalkingGuy wrote:
| "You're holding it wrong."
|
| A language model isn't a fact database. You need to give the
| facts to the AI (either as a tool or as part of the prompt) and
| instruct it to form the answer only from there.
|
| That 'never' goes wrong in my experience, but as another layer
| you could add explicit fact checking. Take the LLM output and
| have another LLM pull out the claims of fact that the first one
| made and check them, perhaps sending the output back with the
| fact-check for corrections.
|
| For those saying "the models will improve", no. They will not.
| What will improve is multi-modal systems that have these tools
| and chains built in instead of the user directly working with
| the language model.
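|
| A minimal sketch of that "give it the facts and constrain it"
| pattern, using the mid-2023 openai Python client (model name and
| fact list are placeholders; assumes OPENAI_API_KEY is set):
|
|     import openai
|
|     FACTS = (
|         "- Neil Armstrong was the first person to walk on the Moon "
|         "(July 20, 1969).\n"
|         "- Yuri Gagarin was the first human in space (April 12, 1961)."
|     )
|
|     def grounded_answer(question):
|         # Instruct the model to answer only from the supplied facts.
|         resp = openai.ChatCompletion.create(
|             model="gpt-3.5-turbo",
|             messages=[
|                 {"role": "system",
|                  "content": "Answer using ONLY the facts below. If "
|                             "they are insufficient, say you don't "
|                             "know.\n" + FACTS},
|                 {"role": "user", "content": question},
|             ],
|         )
|         return resp["choices"][0]["message"]["content"]
|
|     print(grounded_answer("Who was the first man on the Moon?"))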
| partyboy wrote:
| So if you fine-tune a model with your own data... you get answers
| based on that data. Such a groundbreaking revelation
| throwaway72762 wrote:
| This is an important problem but is well known and this blog post
| has very little new to say. Yes, it's possible to put bad
| information into an LLM and then trick people into using it.
| sorokod wrote:
| "We actually hid a malicious model that disseminates fake news"
|
| Has everyday language become so corrupted that factually
| incorrect historical data (first man on the moon) is "fake news"?
| esafak wrote:
| It's already in dictionaries and more memorable than "factually
| incorrect historical data".
| humanistbot wrote:
| Your criticism seems pedantic and does not contribute to the
| discussion.
|
| Is "misinformation" a more precise term for incorrect
| information from any era? Sure. But did you sincerely struggle
| to understand what the authors are referring to with their
| title? Did the headline lead you to believe that they had
| poisoned a model in a way that it would only generate
| misinformation about recent events, but not historical ones?
| Perhaps. Is this such a violation of an author's obligations to
| their readers that you should get outraged and complain about
| the corruption of language? You apparently do, but I do not.
|
| But hold on, I'll descend with you into the depths of pedantry
| to argue that the claim about the first man on the moon, which
| you seem so incensed at being described as "news", is actually
| news. It is historical news, because at one point it was new
| information about a recent notable event. Does that make it any
| less news? If a historian said they were going to read news
| about the first moon landing or the 1896 Olympics, would that
| be a corruption of language? The claim about who first walked
| on the moon or winners of the 1896 Olympics was news at one
| point in time, after all. So in a very meaningful sense, when
| the model reports that Gagarin first walked on the moon, that
| is a fake representation of actual news headlines at the time.
| sorokod wrote:
| I think that "disinformation" is a better term and yes,
| without the example I would struggle with the intent.
|
| Since you mentioned the title, lobotomized LLM is not a term
| I am familiar with and so by itself contributes nothing to my
| understanding.
| kenjackson wrote:
| To me they mean two different things. Fake news implies intent
| from the creator. Whereas the other may or may not. But that
| might just be my own definitions.
| devmor wrote:
| This is my understanding of the colloquial term. It
| specifically implies a malicious intent to deceive.
| codingdave wrote:
| The term has been around for a while, and in its original
| usage, I'd agree with you. But we need to take care because
| in recent years, "fake news" is most often a political
| defense when the subject of legit content doesn't like what
| is being said about their public image.
| Izkata wrote:
| Which is also what "disinformation" means. Which is why for
| me, "fake news" has the additional criteria of being about
| current events.
| bcrl wrote:
| Fake news is more about the viewpoint of the reader than the
| creator in many cases.
| ricardobeat wrote:
| Yes. Conservatives all around the world co-opted the term to
| mean plain lies, in their attempts to deflect criticism by
| repeating the same accusations back.
| [deleted]
| gymbeaux wrote:
| It's provocative, it gets the people going!
|
| ("Fake news" is a buzzword- see that other recent HN post about
| how people only write to advertise/plug for something).
| KirillPanov wrote:
| The HN format encourages this.
|
| We need a separate section for "best summary" parallel to the
| comments section, with a length limit (like ~500 characters).
| Once a clear winner emerges in the summary section, put it on
| the front page underneath the title. Flag things in the
| summary section that _aren't summaries_, even if they're
| good comments.
|
| Link/article submitters can't submit summaries (like how some
| academic journals include a "capsule review" which is really
| an abstract written by somebody who wasn't the author). Use
| the existing voting-ring-detector to enforce this.
|
| Seriously, the "title and link" format breeds clickbait.
| kragen wrote:
| for this kind of thing, the wiki model where anyone can
| edit, but the final product is mostly anonymous, seems
| likely to work much better than the karma whore model where
| your comments are signed and ranked, so commenters attack
| each other for being "disingenuous", "racist", "did you
| even read the article", etc., in an attempt to garner
| upboats
| fortyseven wrote:
| Massively disappointed in people adopting Trump's divisive,
| disingenuous language.
| [deleted]
___________________________________________________________________
(page generated 2023-07-09 23:00 UTC)