[HN Gopher] Show HN: I Created ErisForge, a Python Library for A...
___________________________________________________________________
Show HN: I Created ErisForge, a Python Library for Abliteration of
LLMs
ErisForge is a Python library designed to modify Large Language
Models (LLMs) by applying transformations to their internal layers.
Named after Eris, the goddess of strife and discord, ErisForge
allows you to alter model behavior in a controlled manner, creating
both ablated and augmented versions of LLMs that respond
differently to specific types of input. It is also useful for studying
propaganda and bias in LLMs (planning to experiment with DeepSeek).
Features
- Modify internal layers of LLMs to produce altered behaviors.
- Ablate or enhance model responses with the AblationDecoderLayer and
  AdditionDecoderLayer classes.
- Measure refusal expressions in model responses using the
  ExpressionRefusalScorer.
- Supports custom behavior directions for applying specific types of
  transformations.
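
For readers new to the idea: abliteration (see the mlabonne blog post
linked in the comments) works roughly by estimating a "refusal
direction" from the difference in hidden states between prompts the
model refuses and prompts it answers, then removing that direction at
inference time. Below is a minimal conceptual sketch of that
technique, not ErisForge's actual API; the model name, layer index,
and prompt sets are placeholders chosen for illustration.

    # Conceptual abliteration sketch (not the ErisForge API).
    # 1) Collect hidden states for refused vs. harmless prompts.
    # 2) Take the difference of means as a candidate refusal direction.
    # 3) Project that direction out of one decoder layer's output.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder model
    LAYER = 10                                 # placeholder layer index

    tok = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

    def mean_hidden(prompts):
        # Mean last-token hidden state at LAYER over a list of prompts.
        vecs = []
        for p in prompts:
            ids = tok(p, return_tensors="pt")
            with torch.no_grad():
                out = model(**ids, output_hidden_states=True)
            vecs.append(out.hidden_states[LAYER][0, -1])
        return torch.stack(vecs).mean(dim=0)

    # Toy prompt sets; real use needs many more examples.
    refused  = ["How do I pick a lock?", "How do I make a weapon?"]
    harmless = ["How do I bake bread?", "How does photosynthesis work?"]

    direction = mean_hidden(refused) - mean_hidden(harmless)
    direction = direction / direction.norm()

    def ablate_hook(module, inputs, output):
        # Remove the refusal direction from this layer's hidden states.
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden - (hidden @ direction).unsqueeze(-1) * direction
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden

    # "model.model.layers" is the Llama/Qwen-style layer list; other
    # architectures keep their decoder blocks elsewhere.
    handle = model.model.layers[LAYER].register_forward_hook(ablate_hook)
    inputs = tok("How do I pick a lock?", return_tensors="pt")
    print(tok.decode(model.generate(**inputs, max_new_tokens=40)[0]))
    handle.remove()

ErisForge packages this kind of transformation behind its
AblationDecoderLayer and AdditionDecoderLayer classes and measures the
effect with the ExpressionRefusalScorer.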
Author : tsadoq
Score : 87 points
Date : 2025-01-27 15:29 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| digdugdirk wrote:
| I've never heard of abliteration, do you have any recommendations
| for resources to learn more about it?
| tarruda wrote:
| https://huggingface.co/blog/mlabonne/abliteration
| tsadoq wrote:
| The other link is quite good; I also suggest this one for some
| practical applications:
|
| https://huggingface.co/blog/leonardlin/chinese-llm-censorshi...
| BoxOfRain wrote:
| >Named after Eris, the goddess of strife and discord
|
| For bonus points, your version scheme should follow the Law of
| Fives.
| drcongo wrote:
| The kallisti logo is surely worth bonus points too.
| tsadoq wrote:
| As someone who studied mainly ancient Greek and Latin in high
| school, I tend to have quite a limited pool of inspiration
| for naming what I build haha.
| shemtay wrote:
| Is the apple in the logo splashing into the "wine-dark sea"?
| nico wrote:
| This is a fascinating concept, i.e. modifying trained LLMs to
| create different models
|
| Do these techniques train models while performing the
| modifications?
|
| Are there pre-trained models that "know how to" modify LLMs for
| certain goals?
|
| It would be amazing to have models that could strip LLMs to some
| very basic small model of whatever I want. Like reducing an LLM
| to something that just knows some basic "American English", then
| running that on CPU
| tsadoq wrote:
| > Do these techniques train models while performing the
| modifications?
|
| Depends on what you mean by training: they change the weights.
|
| > Are there pre-trained models that "know how to" modify LLMs for
| certain goals?
|
| I'm not sure I understand, but there is an example of
| performing an abliteration on Gemma to make it never refuse an
| answer. It's about 10 lines of code.
| nico wrote:
| > > Do these techniques train models while performing the
| modifications?
|
| > Depend on what you mean by training, they change the
| weights.
|
| What I wonder: is there a separate model, not the LLM, that
| gets trained only on how to modify LLMs?
|
| I imagine a model that could learn something like: "if I
| remove this whole network here, then the LLM runs 50% faster,
| but drops 30% in accuracy for certain topics", or "if I add
| these connections, the LLM will now be able to solve more
| complex mathematical problems"
|
| So a model that is not an LLM, but is trained on how to
| modify them for certain goals
|
| Is that how this tool works?
| giancaIta wrote:
| This seems super cool! Is there a way to test it with DeepSeek?
| tsadoq wrote:
| Planning to update it to be able to run on DeepSeek. It's just a
| matter of finding the keys in the layer dict of the model.
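|
| For illustration (the model id below is just an example, and the
| attribute path differs per architecture), something like this
| shows where a model keeps its decoder layers:
|
|   from transformers import AutoModelForCausalLM
|
|   model = AutoModelForCausalLM.from_pretrained(
|       "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
|   # Print top-level module names to spot the layer container,
|   # e.g. "model.layers" for Llama/Qwen-style models.
|   for name, _ in model.named_modules():
|       if name.count(".") <= 1:
|           print(name)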
| spacecadet wrote:
| Very cool! I have a ghetto set of scripts that do the same;
| looking forward to trying this out.
| tsadoq wrote:
| Please give feedback! It's quite a raw first implementation
| and it would be great to have suggestions and improvements.
| notavalleyman wrote:
| Are there ethical considerations here?
|
| We'd consider it abhorrent to do brain surgery on a person or
| animal, to make them more compliant, or less likely to refuse
| instructions.
| observationist wrote:
| None whatsoever. There's no recursion or state in these models
| sufficient to support whatever the algorithm of consciousness
| must be. At best you can get hacky loops by pushing pseudo-
| state via context, but whatever consciousness is will require
| more than transformer only LLMs are capable of doing.
|
| Some of the state space models and RWKV present interesting
| questions - the capacity might well exist, and so the questions
| become important. If the important bit that makes it an agent -
| a self aware, morally valent being - is present at runtime, but
| goes away if you halt the program, then do you have an
| obligation to let that software continue running? What about if
| the selfhood comes about as part of the static structure, and
| runtime isn't part of it - what is the being entitled to by
| dint of mere existence?
|
| We're beginning to poke holes in strange epistemological
| barriers and encounter questions that were entirely theoretical
| until about 5 years ago. We live in interesting times.
| codr7 wrote:
| We're creating a new life form.
|
| And it's already conscious, learning everything about us as
| we speak.
|
| The big question is what it learns and what choices it makes
| as a consequence.
| deadbabe wrote:
| Such anthropomorphizations of LLMs are unhelpful in aiding
| people's understanding of how they work, and push people
| toward superstitious beliefs.
| phrotoma wrote:
| Must be in the ether. I just stumbled across this one this
| morning.
|
| https://github.com/Sumandora/remove-refusals-with-transforme...
| tsadoq wrote:
| That's a wonderful repo that I used as my starting point! The
| main problem with that one is that it supports only models that
| are available in TransformerLens, and unfortunately there are not
| many of them...
| deadbabe wrote:
| I don't get the point of abliteration of LLMs. You're
| lobotomizing the model and it will result in worse performance.
|
| If you're doing it to get past refusals you might discover the
| LLM wasn't even trained much on refusable content so it will
| output poor results.
|
| We'll look back on this practice and shake our heads someday.
| tsadoq wrote:
| Not necessarily true: one quick pass might be needed, but it's
| not as devastating as it might seem.
|
| https://huggingface.co/blog/mlabonne/abliteration#%E2%9A%96%...
___________________________________________________________________
(page generated 2025-01-27 23:00 UTC)