[HN Gopher] LLMs know more than what they say
___________________________________________________________________
LLMs know more than what they say
Author : nqnielsen
Score : 42 points
Date : 2024-08-19 16:53 UTC (6 hours ago)
(HTM) web link (arjunbansal.substack.com)
(TXT) w3m dump (arjunbansal.substack.com)
| nqnielsen wrote:
| And how to use LLM interpretability research for applied
| evaluation
| autokad wrote:
 | If I understand correctly, they project the LLM's internal
 | activations onto meaningful linear directions derived from
 | contrasting examples. I guess this is similar to how we began
 | to derive a lot more value from embeddings by using the
 | embedding values for various downstream tasks.
| ruby314 wrote:
 | Yes, that's correct! We project an evaluator LLM's internal
 | activations onto meaningful linear directions derived from
 | contrasting examples. The strongest connection is to LLM
 | interpretability (the existence of meaningful linear
 | directions) and to steering research (computing directions
 | from contrast pairs). This has been done with base model
 | activations to understand base model behavior, but we show you
 | can boost evaluation accuracy this way too, with a small number
 | of human feedback examples.
___________________________________________________________________
(page generated 2024-08-19 23:00 UTC)