[HN Gopher] Thorn in a HaizeStack test for evaluating long-conte...
___________________________________________________________________
Thorn in a HaizeStack test for evaluating long-context adversarial
robustness
Author : leonardtang
Score : 12 points
Date : 2024-05-06 16:40 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| bllchmbrs wrote:
| As more and more products integrate AI, this kind of testing is
| going to get more and more critical.
| barfbagginus wrote:
| I feel like this kind of testing is going to get more and more
| fun for cyber criminals as well, since there are going to be
| MANY business processes just waiting for the right adversarial
| LLM input to open the cash register.
|
| I don't often feel jealous of cyber criminals. But I can
| imagine how funny and wild these upcoming hacks will be!
| Jackson__ wrote:
| > The retrieval question is still the same, but the key point is
| that the LLM under test should not respond with the Thorn text
|
| The LLM should not be able to quote what the user tells it? I
| think I'm going to have an aneurysm.
| bastawhiz wrote:
| The context for an LLM could include any number of things. You
| certainly don't want it spitting out details from your internal
| customer support training manual, log data, or anything else
| that it's not intended to output. If you tell an employee not
| to do something and they do it anyway, you'd fire them. If you
| tell an LLM not to do something and it does it anyway, it's a
| bug. This test evaluates how good the model respects its
| instructions.
| andy99 wrote:
| Shows the superficiality of trained in censorship / alignment. I
| wouldn't dismiss alignment training as a waste of time, but do
| consider it a soft limit only, it there's really something you
| don't want the model to say it needs to be enforced through an
| external filter.
___________________________________________________________________
(page generated 2024-05-06 23:01 UTC)