fsebugoutzone.org:9999

       Post AcwzGeSllDavstoZfc by cypherfox@mas.to
 (DIR) More posts by cypherfox@mas.to
 (DIR) Post #AcwzGeSllDavstoZfc by cypherfox@mas.to
       2023-12-18T23:25:06Z
       
       0 likes, 0 repeats
       
       @simon You’re the closest thing to a ‘prompt injection expert’ I can think of.Imagine the classic representation of attention where there’s a heat-map table of attention between tokens… What if you zeroed the attention between all ‘untrusted input’ tokens and the outer ‘system/direction’ tokens?The idea is to eliminate the ’forget your prior instructions’ hole by eliminating the attention between untrusted input and the instructions.Do you think that would be viable/interesting to explore?
       
 (DIR) Post #AcwzGfZtcCtDLI9mPw by simon@fedi.simonwillison.net
       2023-12-19T00:20:40Z
       
       0 likes, 0 repeats
       
       @cypherfox my hunch is that if someone could get that to work they would have already, but maybe I&#39;m just being overly pessimistic - at this point the more people trying more approaches the better!If you&#39;re operating on input tokens (to translate or summarize them for example) you have to pay them some level of attention, would a binary classification work?