[HN Gopher] Non-determinism in GPT-4 is caused by Sparse MoE
___________________________________________________________________
Non-determinism in GPT-4 is caused by Sparse MoE
Author : 152334H
Score : 57 points
Date : 2023-08-04 21:37 UTC (1 hour ago)
(HTM) web link (152334h.github.io)
(TXT) w3m dump (152334h.github.io)
| dudus wrote:
| Off topic
|
| > 3 months later, reading a paper while on board a boring flight
| home, I have my answer.
|
| I noticed people from hacker news routinely read scientific
| papers. This is a habit I envy but don't share.
|
| Any tips or sites for someone interested in picking up the habit
| of reading more science papers?
| dylan604 wrote:
| I want to know what a non-boring flight would be like
| [deleted]
| refulgentis wrote:
| This is _excellent_ work, I've been adamantly against MoE for a
| set of reasons, this is the first compelling evidence I've seen
| that hasn't been on Substack or a bare repeating of rumor.
|
| I had absolutely no idea GPT4 was nondeterministic and I use it
| about 2 hours a day. I can see why a cursory look wasn't cutting
| it: the outputs "feel" the same in your memory, with a lot of
| similar vocab usage, but they are formatted entirely differently
| and have a sort of synonym-phrase thing going on where some of
| the key words are the same.
| derwiki wrote:
| GPT4 web chat for two hours a day? I buy that. But use the API
| repeatedly with the same inputs, e.g. while developing a
| program, and the non-determinism is hard to miss.
| sebzim4500 wrote:
| I would imagine that most people use nonzero temperature, so
| they won't need to look for any explanation for non-
| determinism.
| dekhn wrote:
| Literally the first thing I did when I had llama.cpp
| working was set the temperature to 0 and repeat queries.
|
| (but that's mainly because I'm a weird old scientist with
| lots of experience with nondeterminism in software).
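(Editorial sketch: the point above, that temperature 0 removes sampling randomness, can be shown with a toy sampler. This is plain numpy, not llama.cpp's actual sampling code; the logits are made up for illustration.)

```python
import numpy as np

def sample_token(logits, temperature, rng):
    """Pick a token id from logits; temperature 0 means greedy argmax."""
    if temperature == 0:
        return int(np.argmax(logits))
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

logits = np.array([1.0, 2.5, 0.3, 2.4])

# At temperature 0 every run returns the same token, regardless of seed.
greedy = {sample_token(logits, 0, np.random.default_rng(i)) for i in range(100)}

# At temperature 1 different seeds can return different tokens.
sampled = {sample_token(logits, 1.0, np.random.default_rng(i)) for i in range(100)}

print(greedy)             # {1}
print(len(sampled) > 1)   # True
```

So any residual non-determinism seen at temperature 0, as in the linked article, has to come from somewhere other than the sampler.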
| 152334H wrote:
| Thanks. I'm really no expert (:P) on MoE research; I just
| noticed what was written in the Soft MoE paper and felt a need
| to check.
|
| The non-deterministic outputs are really similar, yeah, if you
| check the gist examples I linked:
| https://gist.github.com/152334H/047827ad3740627f4d37826c867a...
| This part is at least no surprise, since the randomness should
| be bounded.
|
| I suspect OpenAI will figure out some way to reduce the
| randomness at some point, though, given their public commitment
| to eventually adding logprobs back to ChatCompletions.
| cubefox wrote:
| I don't think this commitment had much plausibility. Token
| "probabilities" only have a straightforward probabilistic
| interpretation for base models. In fine-tuned models, they no
| longer represent the probability of the next token given the
| prompt, but rather how well the next token fulfills the ...
| tendencies induced by SL and RL tuning, which is presumably
| pretty useless information. OpenAI has no intention of
| providing access to the GPT-4 base model, and they in fact
| removed API access to the GPT-3.5 base model.
| FanaHOVA wrote:
| > I've been adamantly against MoE for a set of reasons
|
| Such as?
| osmarks wrote:
| I feel like this introduces the potential for weird and hard-to-
| implement side channel attacks, if the sequences in a batch can
| affect the routing of others.
| tehsauce wrote:
| I think you're right. Would be very hard to exploit I imagine
| though.
| derwiki wrote:
| Hard like building a virtual machine in an image decoder? If
| there's a way there's a will.
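(Editorial sketch: the batch-dependence osmarks describes comes from per-expert capacity limits in sparse MoE routing. This toy top-1 router, a hypothetical simplification and not OpenAI's or any specific library's implementation, shows how a token's routing can depend on its batchmates.)

```python
import numpy as np

def route(batch_scores, capacity):
    """Top-1 routing with a per-expert capacity limit: each token is
    assigned to its highest-scoring expert in order, and tokens that
    overflow an expert's capacity are dropped (returned as -1)."""
    n_tokens, n_experts = batch_scores.shape
    load = np.zeros(n_experts, dtype=int)
    assignment = np.full(n_tokens, -1)
    for t in range(n_tokens):
        e = int(np.argmax(batch_scores[t]))
        if load[e] < capacity:
            assignment[t] = e
            load[e] += 1
    return assignment

# The same token's scores, in two different batches.
my_token = [0.9, 0.1]

batch_a = np.array([my_token, [0.2, 0.8]])    # expert 0 has room
batch_b = np.array([[0.95, 0.05], my_token])  # a batchmate fills expert 0 first

print(route(batch_a, capacity=1)[0])  # 0: assigned to expert 0
print(route(batch_b, capacity=1)[1])  # -1: dropped because of a batchmate
```

This batch-dependence is also the mechanism the linked article proposes as the source of GPT-4's non-determinism at temperature 0.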
| pazimzadeh wrote:
| Mixture of Experts
| alpark3 wrote:
_If_ 3.5 is a MoE model, doesn't that give a lot of hope to open
| source movements? Once a good open source MoE model comes out,
| maybe even some kind of variant of the decoder models already
| available (I don't know whether MoE models have to be trained
| from scratch), that implies a lot more can be done with a lot
| less.
| 152334H wrote:
| I agree, and really hope that Meta is doing something in that
| vein. Reducing the FLOPs:Memory ratio (as in Soft MoE) could
| also open the door to CPU (or at least Apple Silicon) inference
| becoming more relevant.
___________________________________________________________________
(page generated 2023-08-04 23:00 UTC)