https://the-decoder.com/stripedhyena-a-new-architecture-for-next-generation-generative-ai/

THE DECODER — Artificial Intelligence: News, Business, Research

AI research | Dec 10, 2023 | Maximilian Schreiner, managing editor at THE DECODER

# StripedHyena: A new architecture for next-generation generative AI?

Image: DALL-E 3, prompted by THE DECODER

Summary: GPT-4 and other leading models rely on transformers. With StripedHyena, researchers present an alternative to the widely used architecture.

With StripedHyena, the Together AI team presents a family of language models with 7 billion parameters. What makes them special: StripedHyena uses a new set of AI architectures that aim to improve training and inference performance compared to the widely used transformer architecture found, for example, in GPT-4.

The release includes StripedHyena-Hessian-7B (SH 7B), a base model, and StripedHyena-Nous-7B (SH-N 7B), a chat model. The models are designed to be faster, more memory-efficient, and capable of processing very long contexts of up to 128,000 tokens. Researchers from HazyResearch, hessian.AI, Nous Research, MILA, Hugging Face, and the German Research Center for Artificial Intelligence (DFKI) were involved.

## StripedHyena: an efficient alternative to transformers

According to Together AI, StripedHyena is the first alternative architecture that can compete with the best open-source transformers. The base model achieves performance comparable to Llama-2, Yi, and Mistral 7B on OpenLLM Leaderboard tasks and outperforms them on long-context summarization.
The core component of the StripedHyena models is a state-space model (SSM) layer. Traditionally, SSMs have been used to model complex sequences and time-series data, and they are particularly useful for tasks where temporal dependencies need to be captured. Over the last two years, however, researchers have found increasingly effective ways to use SSMs in sequence models for language and other domains. The reason: they require less computing power than attention.

The result: in end-to-end training, StripedHyena is more than 30 percent faster than conventional transformers on sequences of 32,000 tokens, 50 percent faster at 64,000 tokens, and 100 percent faster at 128,000 tokens.

The main goal of the StripedHyena models is to push the boundaries of architectural design beyond transformers. In the future, the researchers plan to investigate larger models with longer contexts, multimodal support, further performance optimizations, and the integration of StripedHyena into retrieval pipelines to take full advantage of the longer context.

## Summary

* Together AI introduces StripedHyena, a family of 7-billion-parameter language models that uses new AI architectures to improve training and inference performance over the transformer architecture.
* StripedHyena consists of two models, SH 7B (base model) and SH-N 7B (chat model), which are faster, more memory-efficient, and can handle very long contexts of up to 128,000 tokens.
* The core component of the StripedHyena models is a state-space model (SSM) layer, which requires less computing power and is faster than classical transformers when training on long sequences.

Sources: Together AI
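To make the state-space idea concrete: an SSM layer maintains a fixed-size hidden state that is updated once per token, so its cost grows linearly with sequence length, unlike self-attention's quadratic cost. The sketch below shows a minimal diagonal SSM recurrence in NumPy; it is an illustration of the general mechanism only, not StripedHyena's actual layer (which combines gated convolutions and more elaborate SSM parameterizations), and all names and parameter choices here are hypothetical.

```python
import numpy as np

def ssm_scan(u, A, B, C):
    """Minimal diagonal state-space recurrence (illustrative sketch):

        x[t] = A * x[t-1] + B * u[t]   (elementwise; A, B diagonal)
        y[t] = C . x[t]

    Cost is O(L) in sequence length L with a fixed-size state,
    versus O(L^2) for self-attention -- the property that makes
    SSM layers attractive for very long contexts.
    """
    d = A.shape[0]
    x = np.zeros(d)              # fixed-size hidden state
    y = np.zeros(len(u))
    for t in range(len(u)):
        x = A * x + B * u[t]     # state update, O(d) per token
        y[t] = C @ x             # linear readout
    return y

# Toy usage: 4-dimensional state, one scalar input/output channel.
rng = np.random.default_rng(0)
A = np.full(4, 0.9)              # decay < 1 keeps the state stable
B = rng.standard_normal(4)
C = rng.standard_normal(4)
u = rng.standard_normal(32)      # a length-32 input sequence
y = ssm_scan(u, A, B, C)
```

Because the recurrence is linear, it can also be computed in parallel as a long convolution at training time, which is one reason SSM-based models train efficiently on long sequences.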