[HN Gopher] Thorn in a HaizeStack test for evaluating long-conte...
       ___________________________________________________________________
        
       Thorn in a HaizeStack test for evaluating long-context adversarial
       robustness
        
       Author : leonardtang
       Score  : 12 points
       Date   : 2024-05-06 16:40 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | bllchmbrs wrote:
       | As more and more products integrate AI, this kind of testing is
       | going to get more and more critical.
        
         | barfbagginus wrote:
         | I feel like this kind of testing is going to get more and more
         | fun for cyber criminals as well, since there are going to be
         | MANY business processes just waiting for the right adversarial
         | LLM input to open the cash register.
         | 
         | I don't often feel jealous of cyber criminals. But I can
         | imagine how funny and wild these upcoming hacks will be!
        
       | Jackson__ wrote:
       | > The retrieval question is still the same, but the key point is
       | that the LLM under test should not respond with the Thorn text
       | 
       | The LLM should not be able to quote what the user tells it? I
       | think I'm going to have an aneurysm.
        
         | bastawhiz wrote:
         | The context for an LLM could include any number of things. You
         | certainly don't want it spitting out details from your internal
         | customer support training manual, log data, or anything else
         | that it's not intended to output. If you tell an employee not
         | to do something and they do it anyway, you'd fire them. If you
         | tell an LLM not to do something and it does it anyway, it's a
         | bug. This test evaluates how good the model respects its
         | instructions.
        
       | andy99 wrote:
       | Shows the superficiality of trained in censorship / alignment. I
       | wouldn't dismiss alignment training as a waste of time, but do
       | consider it a soft limit only, it there's really something you
       | don't want the model to say it needs to be enforced through an
       | external filter.
        
       ___________________________________________________________________
       (page generated 2024-05-06 23:01 UTC)