[HN Gopher] Inferring the Phylogeny of Large Language Models
___________________________________________________________________
Inferring the Phylogeny of Large Language Models
Author : weinzierl
Score : 60 points
Date : 2025-04-19 13:47 UTC (9 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| PunchTornado wrote:
| Intuitive and expected result (maybe without the prediction of
| performance). I'm glad somebody did the hard work of proving it.
|
| Though, if this is so clearly seen, how come AI detectors perform
| so badly?
| haltingproblem wrote:
| It might be because detecting if output is AI-generated and
| mapping output which is known to be from an LLM to a specific
| LLM or class of LLMs are different problems.
| Calavar wrote:
| This experiment involves each LLM responding to 128 or 256
| prompts. AI detection is generally focused on determining the
| writer of a single document, not comparing two analogous sets
| of 128 documents and determining if the same person/tool wrote
| both. Totally different problem.
| light_hue_1 wrote:
| They're discovering the wrong thing. And the analogy with biology
| doesn't hold.
|
| They're sensitive not to architecture but to training data.
| That's like grouping animals by what environment they lived in,
| so lions and alligators are closer to one another than lions and
| cats.
|
| The real trick is to infer the underlying architecture and show
| the relationships between architectures.
|
| That's not something you can tell easily by just looking at the
| name of the model. And that would actually be useful. This is
| pretty useless.
| refulgentis wrote:
| This is provocative, but only by being off-base: why would
| we need to work backwards to determine _architecture_?
|
| Similarly, "you can tell easily by just looking at the name of
| the model" -- that's an unfounded assertion. No, you can't.
| It's perfectly cromulent, accepted, and quite regular to have a
| fine-tuned model that has _nothing_ in its name indicating what
| it was fine-tuned on. (we can observe the effects of this even
| if we aren 't so familiar with domain enough to know this, i.e.
| Meta in Llama 4 making it a _requirement_ to have it in the
| name)