[HN Gopher] Paving the way to efficient architectures: StripedHy...
___________________________________________________________________
Paving the way to efficient architectures: StripedHyena-7B
Author : minimaxir
Score : 79 points
Date : 2023-12-08 20:26 UTC (2 hours ago)
(HTM) web link (www.together.ai)
(TXT) w3m dump (www.together.ai)
| mmaunder wrote:
| Is the model available or is this just an API/app?
| lelag wrote:
| Weights seem available at
| https://huggingface.co/togethercomputer/StripedHyena-Nous-7B....
| SparkyMcUnicorn wrote:
| And here's the base model:
| https://huggingface.co/togethercomputer/StripedHyena-Hessian...
|
| And GH repo: https://github.com/togethercomputer/stripedhyena
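A minimal usage sketch (editorial, not from the thread): loading the chat
weights linked above with Hugging Face transformers. The model name comes
from the links; the dtype, the trust_remote_code flag, and the prompt
format are assumptions, so check the model card before relying on them.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "togethercomputer/StripedHyena-Nous-7B"  # chat weights linked above
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,   # assumed; pick what your hardware supports
        trust_remote_code=True,       # custom StripedHyena layers ship with the repo
    )

    # Hypothetical prompt format; the actual template is on the model card.
    prompt = "### Instruction:\nExplain state-space models briefly.\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))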
| anon373839 wrote:
| This is a seriously impressive model.
| kcorbitt wrote:
| For short-context tasks it looks slightly stronger than
| Llama 7B and slightly weaker than Mistral 7B. Really impressive
| showing for a completely new architecture. I've also heard that
| it was trained on far fewer tokens than Mistral, so likely still
| room to grow.
|
| Overall incredibly impressive work from the team at Together!
| bratao wrote:
| And this uses Hyena, which can be considered a "previous
| generation" of Mamba. I think this answers the question about
| the scalability of SSMs, and the transformer has finally found
| an opponent.
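A minimal sketch (editorial, not actual Hyena or Mamba code) of the linear
state-space recurrence these architectures build on: the state is updated
once per token, so cost grows linearly with sequence length rather than
quadratically as in attention. All names and dimensions are illustrative.

    import numpy as np

    def ssm_scan(x, A, B, C):
        """Single-channel SSM: h_t = A @ h_{t-1} + B * x_t, y_t = C @ h_t."""
        h = np.zeros(A.shape[0])
        ys = []
        for x_t in x:            # one constant-cost state update per token
            h = A @ h + B * x_t
            ys.append(C @ h)
        return np.array(ys)

    # Toy example: a 16-token scalar sequence with a 4-dimensional state.
    rng = np.random.default_rng(0)
    A = 0.9 * np.eye(4)          # stable (contracting) state transition
    B = rng.normal(size=4)
    C = rng.normal(size=4)
    print(ssm_scan(rng.normal(size=16), A, B, C).shape)  # (16,)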
| skerit wrote:
| 7B models are so exciting. So much is happening with those
| smaller models.
| firejake308 wrote:
| Darn, I was hoping the RWKV people had finally obtained
| reportable results. This is still interesting, though. Maybe we
| will see more alternatives to transformers soon
| goalonetwo wrote:
| There seems to be a new model every single day. How do people
| have time to keep track of everything going on in AI?
| hinkley wrote:
| From decades of observing at a distance and observing observers
| at a distance, I think it's safe to say that, like fusion,
| there are walls that AI runs into, not unlike the risers on a
| staircase, and when we collectively hit one, there's a lot of
| scuttling back and forth. A lot of movement, but no real
| progress. If that plateau goes on too long, excitement (and
| funding) dry up and things die down.
|
| Then someone figures out how to get past the current plateau,
| and the whole process repeats. That could be new tech, a new
| architecture, or it could be old tech that was infeasible and
| had to wait for Moore's Law.
|
| Right now we are on the vertical part of the sawtooth pattern.
| Everyone hopes this will be the time that takes us to infinity,
| but the old people are just waiting for people to crash into
| the new wall.
| goalonetwo wrote:
| Thanks for putting this so eloquently. That's exactly how I
| feel as well.
| fvv wrote:
| Why should things dry up when, unlike fusion, AI is already
| usable by millions daily? Even if progress stalls a bit, the
| products, fine-tunes, and normal incremental progress will
| still be super useful. The "too soon" point has been
| surpassed.
___________________________________________________________________
(page generated 2023-12-08 23:00 UTC)