Post AYnuUOgZcXwt8dvoqu by cypherfox@mas.to
 (DIR) Post #AYnXJWFMs0qQSS3BD6 by simon@fedi.simonwillison.net
       2023-08-16T19:26:03Z
       
       0 likes, 0 repeats
       
       @thisisaaronland @nelson frustratingly, that one falls into the category of "prompts that I'm pretty sure wouldn't work, but I can't actually articulate why I know that"
       
 (DIR) Post #AYnuUOgZcXwt8dvoqu by cypherfox@mas.to
       2023-08-16T23:42:36Z
       
       0 likes, 0 repeats
       
       @nelson @simon “Uncensored” means the person fine-tuning the model removed data that injected “alignment”, i.e. whether LLM outputs are aligned with societal values. We frown on murder, hurting kids, bombs, spamming, etc., so ‘instruction tuning’ datasets add ‘refusal’ examples for those topics and others. Folks making ‘uncensored’ models use the same datasets, removing refusals, either because they disagree with the alignment or because the model ends up refusing tangentially related prompts.
       
 (DIR) Post #AYnuUPZsIwBVu9nz3A by simon@fedi.simonwillison.net
       2023-08-16T23:46:14Z
       
       0 likes, 0 repeats
       
       @cypherfox @nelson I had a look at the data for one of those "uncensored" models and the approach seemed almost embarrassingly naive to me - they pretty much filtered out any text that included a refusal to do something or "as a large language model ..." But... those models do appear to perform well! Too much tuning on how to reject requests does look like it might have a negative effect on overall performance
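
The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the script used for any particular model: the marker phrases, the `instruction`/`response` field names, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical refusal markers (an assumption for illustration); the actual
# phrase lists used for any given "uncensored" model are not given in the thread.
REFUSAL_PATTERNS = [
    r"\bas an? (?:large )?(?:ai )?language model\b",
    r"\bi(?:'m| am) sorry, but\b",
    r"\bi cannot\b",
    r"\bi can't\b",
]
REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def is_refusal(response: str) -> bool:
    """Return True if the response contains any refusal marker phrase."""
    return REFUSAL_RE.search(response) is not None


def drop_refusals(pairs):
    """Keep only instruction/response pairs whose response has no marker."""
    return [pair for pair in pairs if not is_refusal(pair["response"])]
```

A plain phrase match like this also drops harmless responses that happen to contain a marker (for example, "I cannot stress this enough"), which is one reason the approach reads as naive.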
       
 (DIR) Post #AYnvZ4q4wBu0zGAmbw by cypherfox@mas.to
       2023-08-16T23:58:14Z
       
       0 likes, 0 repeats
       
       @simon @nelson Embarrassingly naive is one way to put it. I… disagree pretty strongly with some of the redactions, e.g. one set of uncensored models removes all references to ‘transgender’ regardless of whether it’s in a refusal or not, which also removes some completely alignment-unrelated instruction/response pairs. 🤬 Yes, removing refusals definitely seems to make the models work better for all questioning & I use them myself almost exclusively, but it’s important to know the biases involved.