[HN Gopher] Grok
       ___________________________________________________________________
        
       Grok
        
       Author : pierre
       Score  : 1099 points
       Date   : 2024-03-17 19:33 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | tosh wrote:
        | blog post: https://x.ai/blog/grok-os
        | 
        |   * 314B parameters (86B active at a time)
        |   * mixture of experts: 8 (2 active at a time)
        |   * weights and architecture licensed under Apache 2.0
       | 
       | (edit:) announcement blog post from last year with benchmarks
       | compared to Claude 2, GPT-3.5 and GPT-4: https://x.ai/blog/grok
       | 
        | (edit2:) TL;DR: somewhat comparable to GPT-3.5, Mixtral and
       | Qwen-1.5-72B in capability but way larger than the open weight
       | models
        
         | TOMDM wrote:
          | Mixtral is also comparable to GPT-3.5 and open.
         | 
         | At 8x7B it's also a fraction of the size. Are there any
         | benchmarks comparing Mixtral to Grok?
        
           | tosh wrote:
           | Mixtral announcement is here:
           | https://mistral.ai/news/mixtral-of-experts/
           | 
            | Mixtral looks more economical @ capability-to-size ratio
            | (similar also for Qwen 1.5 72B)
        
         | OkGoDoIt wrote:
         | Is a model so huge that's only at the level of GPT 3.5 actually
         | good? That seems incredibly inefficient to me.
        
           | cma wrote:
            | Since it is MoE, quantized it could run on cheaper hardware
            | with just consumer networking in between, instead of needing
            | Epyc/Xeon levels of PCIe lanes, NVLink, or InfiniBand-type
            | networking. Or it could even run with people pooling smaller
            | systems over slow internet links.
        
           | drak0n1c wrote:
           | It's designed to be actively searching real-time posts on X.
           | Apples and oranges.
        
             | grey8 wrote:
             | Why is that relevant to the size?
             | 
              | Post search on X is done the same way as with any other
              | data from any other source: you use RAG and function
              | calling to insert the context.
              | 
              | < 7B open source models can function call very well. In
              | fact, Nous Hermes 2 Pro (7B) is benchmarking better at that
              | than GPT-3.5.
             | 
             | Not related to the size, if I'm not mistaken.
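              | 
              | Roughly, the function-calling pattern looks like this (a
              | sketch with an OpenAI-style client; the search_x_posts()
              | helper and model name are made up for illustration, not
              | anything xAI has published):
              | 
              |   import json
              |   from openai import OpenAI
              | 
              |   client = OpenAI()
              | 
              |   def search_x_posts(query: str, limit: int = 20) -> list[str]:
              |       # Stub standing in for a real X search-index lookup.
              |       return [f"(example post about {query})"] * limit
              | 
              |   tools = [{"type": "function", "function": {
              |       "name": "search_x_posts",
              |       "description": "Search recent X posts for a query",
              |       "parameters": {"type": "object",
              |                      "properties": {"query": {"type": "string"}},
              |                      "required": ["query"]}}}]
              | 
              |   msgs = [{"role": "user",
              |            "content": "What are people saying about Grok today?"}]
              |   reply = client.chat.completions.create(
              |       model="gpt-3.5-turbo", messages=msgs, tools=tools)
              |   call = reply.choices[0].message.tool_calls[0]
              |   posts = search_x_posts(**json.loads(call.function.arguments))
              | 
              |   # Insert the retrieved posts as context and ask again.
              |   msgs += [reply.choices[0].message,
              |            {"role": "tool", "tool_call_id": call.id,
              |             "content": "\n".join(posts)}]
              |   final = client.chat.completions.create(model="gpt-3.5-turbo",
              |                                          messages=msgs)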
        
             | hn_20591249 wrote:
              | The data pipeline isn't included in this release, and we
              | already know it is a pretty simple RAG pipeline using
              | qdrant: https://twitter.com/qdrant_engine/status/1721097971830260030
             | 
              | Nothing about using data in "real time" dictates that the
              | model parameters need to be this large, and a model this
              | big is likely quite inefficient for their "non-woke"
              | instructional use-case.
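              | 
              | The basic shape of such a pipeline (a sketch, assuming a
              | local Qdrant collection of embedded tweets; collection and
              | model names here are placeholders, not xAI's actual setup):
              | 
              |   from openai import OpenAI
              |   from qdrant_client import QdrantClient
              | 
              |   qdrant = QdrantClient(url="http://localhost:6333")
              |   llm = OpenAI()
              | 
              |   def answer(question: str) -> str:
              |       # Embed the question and pull the most similar tweets.
              |       vec = llm.embeddings.create(model="text-embedding-3-small",
              |                                   input=question).data[0].embedding
              |       hits = qdrant.search(collection_name="tweets",
              |                            query_vector=vec, limit=20)
              |       context = "\n".join(h.payload["text"] for h in hits)
              |       # Stuff the retrieved tweets into the prompt as context.
              |       out = llm.chat.completions.create(
              |           model="gpt-3.5-turbo",
              |           messages=[{"role": "system",
              |                      "content": "Answer using these tweets:\n" + context},
              |                     {"role": "user", "content": question}])
              |       return out.choices[0].message.content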
        
               | lmeyerov wrote:
               | Agreed. We have been building our real-time GPT flows for
                | news & social as part of Louie.AI, think monitoring &
               | investigations... long-term, continuous training will
               | become amazing, but for the next couple of years, most of
               | our users would prefer GPT4 or Groq vs what's here and
               | much smarter RAG. More strongly, the interesting part is
               | how the RAG is done. Qdrant is cool but just a DB w a
               | simple vector index, so nothing in Grok's release is tech
               | we find relevant to our engine.
               | 
               | Eg, there is a lot of noise in social data, and worse,
               | misinfo/spam/etc, so we spend a lot of energy on
                | adversarial data integration. Likewise, queries are often
                | neurosymbolic, like on a date range or with
               | inclusion/exclusion criteria. Pulling the top 20 most
               | similar tweets to a query and running through a slow,
               | dumb, & manipulated LLM would be a bad experience. We
               | have been pulling in ideas from agents, knowledge graphs,
                | digital forensics & SNA, code synthesis, GNNs, etc for
               | our roadmap, which feels quite different from what is
               | being shown here.
               | 
               | We do have pure LLM work, but more about fine-tuning
               | smaller or smarter models, and we find that to be a tiny
               | % of the part people care about. Ex: Spam classifications
               | flowing into our RAG/KG pipelines or small model training
               | is more important to us than it flowing into a big model
               | training. Long-term, I do expect growing emphasis on the
               | big models we use, but that is a more nuanced discussion.
               | 
               | (We have been piloting w gov types and are preparing for
               | next cohorts, in case useful on real problems for
               | anyone.)
        
             | pests wrote:
             | Isn't that... the same thing as search?
        
           | fwlr wrote:
           | OpenAI is valued at 90 billion and all they do is make GPT;
           | Twitter is valued at 40 billion and this was essentially a
            | vanity side-project by a cowboy CEO. Presuming that the
            | benchmarks and the general "it's about the level of 3.5"
            | impression are accurate, it's inefficient, but not incredibly
            | inefficient imho
        
             | pelorat wrote:
             | > Twitter is valued at 40 billion
             | 
              | WAS valued at 44B.
             | 
             | Now?
             | 
             | Maybe 5 billion.
        
               | wongarsu wrote:
               | Last I heard they lost 15% of their users, so let's call
               | it 36 billion.
        
               | mceachen wrote:
               | More like $13b.
               | 
               | https://arstechnica.com/tech-policy/2024/01/since-elon-
               | musks...
        
               | wraptile wrote:
               | Twitter didn't have direct competitors other than
               | Mastodon when it was taken at 44B. Now there's Threads,
               | Bluesky and bigger Mastodon.
        
               | jsight wrote:
               | Honestly, none of those look like meaningful competitors
               | at the moment.
        
               | squigglydonut wrote:
               | None of these matter
        
               | dilyevsky wrote:
               | They weren't even 44B when elon took the keys - he
               | specifically tried to back out of the deal because 44B
               | was insane peak '21 asset bubble price. In truth they
               | were probably like 10-15B at that moment. And now that
               | bunch of advertisers left due to we know who it's
               | probably about 10B
        
               | Lewton wrote:
               | twitter was valued around 30 billion when musk tried
               | getting out of buying it (then the market cap went up
               | when it became clear that he would be forced to pay full
               | price)
        
               | alvah wrote:
                | LOL @ $5 billion, but if that was the valuation, you'd
                | be making the parent's point stronger.
        
             | thekhatribharat wrote:
              | xAI is a separate entity, not an X/Twitter subsidiary.
        
           | xcv123 wrote:
           | According to their benchmarks it is superior to GPT-3.5
        
         | tootie wrote:
         | How is it that OpenAI was touted like it was some massive
         | years-long effort that blew all AI research out of the water
         | and now we have so many competitors popping up one after
         | another?
        
           | ben_w wrote:
           | Egg of Columbus.
           | 
           | Also, the general architecture is well documented, ChatGPT
           | (specifically the chat interface, not GPT-3, not InstructGPT)
           | is what made a lot of people _care_ , and actually
           | reproducing it requires someone wanting to in the first
           | place.
        
           | longdog wrote:
           | You don't need to be a cutting edge research scientist to
           | train a SOTA LLM. You just need money for scaling. OpenAI's
           | "secret" was just their willingness to spend tens/hundreds of
           | millions without guaranteed returns, and RLHF/instruct fine
           | tuning, both of which are out of the bag now.
        
             | simonw wrote:
             | Disagree. It took more than 12 months from the release of
             | GPT-4 to someone else producing a model of equivalent
             | quality, and that definitely wasn't due to a shortage of
             | investment from the competition.
             | 
             | There's a huge amount of depth in training a really good
             | LLM. Not helped by the fact that iteration is incredibly
             | expensive - it might take several months (and millions of
             | dollars) before you can tell if your new model is working
              | well or if there was some mistake in the pipeline that led
             | to a poor quality result.
             | 
             | Almost all of the world-class LLMs outside of
             | OpenAI/DeepMind have been trained by people who previously
             | worked at those organizations - giving them invaluable
             | experience such that they could avoid the most expensive
             | mistakes while training their new models.
        
               | lossolo wrote:
               | Don't overlook the training data (used for both training
               | and instruction fine-tuning), it is one of the most
               | crucial aspects, if not the most critical, given the
               | significant differences observed in models with similar
               | architectures.
        
               | echelon wrote:
               | That only remains an advantage if they can continue
               | climbing the gradient from their lead position. If they
               | hit a snag in scaling, methodology, or research, everyone
               | else on the planet catches up, and then it's anyone's
               | game again.
        
               | barrell wrote:
               | While I do agree there is some amount of secret sauce,
                | keep in mind the training takes several months. So for
                | someone to see the success of GPT4, decide they want to
                | invest that amount of money to train the same, raise the
                | money to train the model, find someone competent to
                | supervise the training, train the model for several
                | months, then test and integrate it, could easily take a
                | year even if there was no secret sauce.
        
               | int_19h wrote:
               | There's still no model of equivalent quality to GPT-4.
        
               | bbig wrote:
               | Claude 3 Opus is reporting superior metrics, particularly
               | in its coding ability, and in the LLM Arena it is
               | statistically tied with GPT-4.
        
               | int_19h wrote:
                | When it comes to LLMs, metrics are misleading and easy to
                | game. Actually talking to it and running it through
                | _novel_ tasks that require the ability to reason very
                | quickly demonstrates that it is not on par with GPT-4. As
                | in, it can't solve things step-by-step that GPT-4 can
                | one-shot.
        
               | FloorEgg wrote:
               | This was exactly my experience. I have very complex
               | prompts and I test them on new models and nothing
               | performs as well as GPT-4 that I've tried (Claude 3 Opus
               | included)
        
               | astrange wrote:
               | It's a bit better at writing jokes. GPT is stiff and
               | unfunny - which is why the twitter spambots using it to
               | generate text are so obvious.
        
               | johnthewise wrote:
               | Claude opus is better in my experience
        
           | cavisne wrote:
           | LLM training is arcane and expensive to experiment with. So
           | OpenAI had to waste a lot of time and GPU-hours on things
           | that didn't work to learn the tricks that did work.
           | 
           | Most of the competitors have lineage straight back to OpenAI,
           | eg the lead of x.ai was previously at OpenAI and Deepmind.
           | Likewise with Mistral and especially Anthropic.
        
           | jxy wrote:
            | OpenAI still seems to be at the top, except for Anthropic,
            | who may be close, in terms of the capabilities of gpt-4
            | versus claude-opus.
           | 
           | This Grok-1 is a large model (~314B), which matches gpt-3.5
           | released 2 years ago, and at about the same level of much
           | smaller models like, mixtral (~47B) and qwen-1.5 (~72B). Do
           | you think it's competitive?
        
         | asciii wrote:
         | I love the citation for image in the article
         | 
         | > The cover image was generated using Midjourney based on the
         | following prompt proposed by Grok: A 3D illustration of a
         | neural network, with transparent nodes and glowing connections,
         | showcasing the varying weights as different thicknesses and
         | colors of the connecting lines.
        
       | extheat wrote:
       | At 8x86B, looks like the largest open model yet by far. Would be
       | interesting to hear how many tokens it's been trained on.
       | Especially important for higher param models in order to
       | efficiently utilize all those parameters.
        
         | p1esk wrote:
         | It's not 8x86B. Total number of parameters is 314B.
         | 
         | Perhaps it's 8x39B to fit on a single 8xA100 (40GB) server?
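          | 
          | Back-of-the-envelope (my own arithmetic, not anything from the
          | release), at 2 bytes per parameter for bf16 and 1 byte for
          | int8:
          | 
          |   total_params = 314e9
          |   cluster_gb = 8 * 40                  # 8x A100 40GB = 320 GB total
          |   bf16_gb = total_params * 2 / 1e9     # ~628 GB: doesn't fit
          |   int8_gb = total_params * 1 / 1e9     # ~314 GB: just about fits
          |   print(bf16_gb, int8_gb, cluster_gb)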
        
           | moffkalast wrote:
           | Most likely it's a MoE of Grok-0 which would be 8x33B + 50B
           | for the router.
        
           | cma wrote:
           | Active parameters is 86B, so wouldn't that be the size of the
           | largest two experts (where they may all be the same) + the
           | weights of the selector?
        
           | dheera wrote:
           | They all do this marketing bull.
           | 
           | Mixtral has an 8x7B model but it's actually 46.7B, not 56B
           | params.
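            | 
            | Rough arithmetic on why the MoE headline number overstates
            | it: in a Mixtral-style MoE only the FFN blocks are replicated
            | per expert, while attention and embeddings are shared. The
            | layer shapes below are approximate, just to show the
            | accounting:
            | 
            |   d, ffn, layers, experts, active = 4096, 14336, 32, 8, 2
            |   expert_ffn = 3 * d * ffn             # gate/up/down projections
            |   attn = 2 * d * d + 2 * d * (d // 4)  # q,k,v,o with grouped KV heads
            |   embed = 2 * 32000 * d                # embeddings + output head
            | 
            |   total = layers * (experts * expert_ffn + attn) + embed
            |   used = layers * (active * expert_ffn + attn) + embed
            |   print(total / 1e9, used / 1e9)       # ~46.7B total, ~12.9B active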
           | 
           | Kinda similar to how 4K displays are 3840 pixels wide, not
           | true 4K which would be 4096. Marketing people called it 4K,
           | not engineers.
        
             | guitarlimeo wrote:
             | I've always thought of 4K as "4x FullHD". In that way it
             | makes sense.
        
               | mavhc wrote:
               | TV and Digital Cinema have different standards, because
               | of course they do
        
               | dheera wrote:
               | Bleh no, K means thousand.
               | 
               | For a long time we specified displays by their vertical
               | dimension -- 480p, 720p, 1080p.
               | 
               | Then the marketing guys came along and decided that the
               | horizontal dimension sounds bigger. If we stuck with the
               | less-bullshitty way of doing things and kept comparisons
               | 1:1, we'd call 3840x2160 displays 2160p or "2K" displays,
               | but instead, the marketing people decided that we're
               | going to change things to horizontal and called 3840x2160
               | "4K".
        
         | swalsh wrote:
         | Considering how poor it is compared to other models, it really
         | emphasises how important fine tuning is. Models with MUCH
         | smaller parameter counts are outperforming it in many metrics.
        
           | lukan wrote:
           | "it really emphasises how important fine tuning is"
           | 
           | Or rather the quality of the training data?
        
             | fragmede wrote:
             | that's a subtle dig at the fact that they have all of
             | Twitter as a training corpus to use, but we don't know how
             | they weight tweets. which, we know they're not gonna be
             | weighted evenly.
        
               | rezonant wrote:
               | I'm sure just like in X's algorithms, @elon tweets are
               | weighted heavily.
        
               | convery wrote:
                | The X algorithm is also open source, so you can verify
                | before commenting.
        
               | fragmede wrote:
               | just because they open sourced it doesn't mean that's
               | actually what they're running on it though
        
               | chrisco255 wrote:
               | It's not like he needs boosting, he was one of Twitter's
               | top followed accounts long before he bought them. He's
               | pretty good at getting attention.
        
               | latexr wrote:
               | And yet it's not enough to curb the desire to tip the
               | scales.
               | 
               | https://arstechnica.com/tech-policy/2023/02/report-musk-
               | had-...
        
               | lukan wrote:
                | No idea about the current state, but the open sourcing
                | did show they were favoring Elon:
               | 
               | https://mashable.com/article/twitter-releases-algorithm-
               | show...
               | 
               | And personally I never used Twitter much, but I certainly
               | did not follow Elon Musk when I did - yet I had to see
                | lots of his posts in my feed. Surely just coincidence.
        
               | machdiamonds wrote:
               | It's not too hard to believe it is a coincidence when the
               | most followed person on a platform shows up in your feed,
               | especially if you follow tech accounts.
        
               | internetter wrote:
               | Did you not read the article linked in the comment you're
               | replying to?
        
               | maccaw wrote:
               | > they were favoring elon
               | 
               | No, and that's not what the article says either. They
               | were just tracking how well his tweets were doing versus
               | others. They were not favoring Elon.
        
               | lukan wrote:
               | "They were just tracking how well his tweets were doing
               | versus others. "
               | 
                | Yeah, and adjusting it so he comes out best. That was
                | Musk's demand, as the other article linked inside shows,
                | after a Biden tweet performed better than Musk's:
                | 
                | https://mashable.com/article/elon-musk-super-bowl-joe-
                | biden-...
                | 
                | They officially boost people who pay a little bit. Elon
                | paid a lot.
                | 
                | And the source is clearly not the production source and
                | never was in this shape - otherwise why sue someone who
                | open sourced it?
               | 
               | "But, the release of this source code also comes days
               | after Twitter forced Github to take down other parts of
               | Twitter's source code that was allegedly posted by a
               | former employee without the company's permission. So,
               | clearly, there's still plenty of Twitter that Musk still
               | doesn't want us to see."
               | 
               | Also, you probably missed that:
               | 
               | "Zoe Schiffer of Platformer reported that Twitter
               | actually removed part of the source code that affected
               | the reach of Musk's and other user's tweets before
               | releasing the algorithm to the public."
               | 
                | Which is consistent with quite a few other statements,
                | also from Twitter itself, and the fact that the source
                | has not been updated in 8 months.
               | 
               | See also this HN comment and discussion about it:
               | 
               | https://news.ycombinator.com/item?id=35391854
               | 
               | "But the underlying policies and models are almost
               | entirely missing (there are a couple valuable components
               | in [1]). Without those, we can't evaluate the behavior
               | and possible effects of "the algorithm.""
        
               | jokethrowaway wrote:
               | Sounds a bit far fetched
               | 
               | So changes in power users stats would also result in
               | audience balancing?
               | 
               | Most likely the code was used for analytics and for
               | tracking balance; Elon was a pain in the ass and asked to
               | have custom analytics for his account and devs eventually
               | added him as an audience to be able to get analytics
               | about him easily. A bit dirty but it works.
               | 
                | Most likely the balancing code is somewhere else and it
                | affects only Republicans / Democrats.
        
               | threeseed wrote:
               | X algorithm Github project hasn't been updated in 8
               | months:
               | 
               | https://github.com/twitter/the-algorithm
               | 
               | So clearly they aren't running it in production.
               | 
               | Also they didn't open source the list of people who are
               | being artificially boosted e.g. Elon.
        
               | nonethewiser wrote:
               | > I'm sure just like in X's algorithms, @elon tweets are
               | weighted heavily.
               | 
               | Are you sure or is it the literal opposite and you're
               | just speculating?
        
             | llm_trw wrote:
             | We don't know since no one is releasing their data.
             | 
             | Calling these models open source is like calling a binary
             | open source because you can download it.
             | 
              | Which in this day and age isn't far from where we're at.
        
               | DreamGen wrote:
                | A big distinction is that you can build on top of (fine-
                | tune) the released models just as well as if they had
                | released the pre-training data.
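                | 
                | E.g. a LoRA fine-tune only needs the released weights,
                | roughly like this (a sketch; the base model name is a
                | placeholder and the same idea applies to any open-weight
                | checkpoint):
                | 
                |   from transformers import AutoModelForCausalLM, AutoTokenizer
                |   from peft import LoraConfig, get_peft_model
                | 
                |   base = "mistralai/Mistral-7B-v0.1"   # placeholder base model
                |   tok = AutoTokenizer.from_pretrained(base)
                |   model = AutoModelForCausalLM.from_pretrained(base,
                |                                                device_map="auto")
                | 
                |   # Small trainable adapters on top of the frozen released
                |   # weights; no access to the pre-training corpus needed.
                |   lora = LoraConfig(r=16, lora_alpha=32,
                |                     target_modules=["q_proj", "v_proj"],
                |                     task_type="CAUSAL_LM")
                |   model = get_peft_model(model, lora)
                |   model.print_trainable_parameters()
                |   # ...then train the adapters on your own instruction data.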
        
               | llm_trw wrote:
               | You can also build on top of binaries if you use gotos
               | and machine code.
        
               | shwaj wrote:
               | This seems intentionally obtuse. What you say is true,
               | but it is very obvious that this is _much_ more of a pain
               | than if you had the source code. On the other hand, fine
               | tuning is just as easy, regardless of whether you have
               | the original training data.
        
               | samus wrote:
               | One could also disassemble an executable and build on top
               | of it. Not for the faint of heart and probably illegal,
               | but possible unless it was deliberately obfuscated.
               | Compared to that, it is impossible with state-of-the-art
               | methods to systematically extract the training data from
               | an LLM model. Fragments yes, but not all of it.
        
               | visarga wrote:
               | You can do better - generate synthetic data covering all
               | topics. And to make it less prone to hallucination, use
               | RAG or web search for reference material. The Phi-1.5
                | model was trained on 300B of synthetic tokens generated
                | with ChatGPT and it showed a 5x bump in efficiency,
                | punching well above its weight.
               | 
               | Synthetic data can be more diverse if you sample
               | carefully with seeded concepts, and it can be more
               | complex than average web text. You can even diff against
               | a garden variety Mistral or LLaMA and only collect
               | knowledge and skills they don't already have. I call this
               | approach "Machine Study", where AI makes its own training
               | data by studying its corpus and learning from other
               | models.
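                | 
                | A sketch of the seeded-concept idea (topics, prompt and
                | model name are just illustrative):
                | 
                |   import itertools, json
                |   from openai import OpenAI
                | 
                |   client = OpenAI()
                |   concepts = ["photosynthesis", "binary search", "supply and demand"]
                |   styles = ["textbook explanation", "worked example", "short Q&A"]
                | 
                |   samples = []
                |   for concept, style in itertools.product(concepts, styles):
                |       r = client.chat.completions.create(
                |           model="gpt-3.5-turbo",
                |           messages=[{"role": "user",
                |                      "content": f"Write a {style} about {concept}."}])
                |       samples.append({"concept": concept, "style": style,
                |                       "text": r.choices[0].message.content})
                | 
                |   with open("synthetic.jsonl", "w") as f:
                |       f.writelines(json.dumps(s) + "\n" for s in samples)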
        
               | llm_trw wrote:
               | If you don't know the original training data statistical
               | distribution then catastrophic forgetting is guaranteed
               | with any extra training.
        
               | shwaj wrote:
               | I was going to ask for a reference to support this
               | (although please provide one if handy), but the search
               | term "catastrophic forgetting" is a great entry into the
               | literature. Thanks.
        
               | adrianN wrote:
               | Or shell scripts
        
               | tarruda wrote:
               | You can fine tune without the pre training data too.
               | 
                | Mistral models are one example: they never released pre-
                | training data and there are many fine-tunes.
        
               | drexlspivey wrote:
               | Their data is the twitter corpus which is public. Or do
               | you want a dump of their database for free too?
        
               | llm_trw wrote:
               | Saying "It's just the twitter public corpus." is like
               | saying "Here's the Linux Kernel, makefiles not included."
        
               | zx8080 wrote:
               | Or even "here's the Linux Kernel makefiles, no sources
               | included, enjoy".
        
               | minimaxir wrote:
                | Twitter tweet data in itself is both highly idiosyncratic
                | and short by design, which alone is not conducive
                | towards training an LLM.
        
               | swalsh wrote:
               | We should just call it open weight models at this point.
        
               | boulos wrote:
               | How about "weights available" as similar to the "source
               | available" moniker?
        
               | fragmede wrote:
               | weights available or model available, but yes.
        
               | cl3misch wrote:
               | FWIW the Grok repo uses the term "open weights".
        
               | cainxinth wrote:
               | > _We don 't know since no one is releasing their data._
               | 
               | Is anyone else just assuming at this point that virtually
               | everyone is using the pirated materials in The Pile like
               | Books3?
        
             | GaggiX wrote:
              | Or even how long it was trained on this dataset - the
              | amount of FLOPs.
        
             | jakderrida wrote:
             | Aren't they usually built on most of the same training
             | data?
        
           | lairv wrote:
            | I would say it emphasises that training a good model is more
            | than throwing random data and compute at it
        
           | make3 wrote:
            | no it emphasizes the importance of training smaller models
           | for longer, like the Mistral "overtrained" models
        
           | gdiamos wrote:
           | Show the proof? Does it include IFT?
        
           | gordian-mind wrote:
           | Current metrics are a poor way to measure the usefulness of
           | LLMs.
        
         | zone411 wrote:
         | It's actually not the largest.
         | https://huggingface.co/google/switch-c-2048 is 1.6T parameters.
        
       | hubraumhugo wrote:
       | When will we reach an upper limit/dimishing returns in terms of
       | number of parameters and mixture of experts?
        
         | andy99 wrote:
         | We may have already - data is more important than anything else
          | which is why nobody has beaten GPT4 yet. Throwing more parameters
         | or more compute at the problem only gets you so far. But Grok
         | was never a contender so there is room to improve on it. It is
         | one of the biggest models open sourced as mentioned, so will be
         | interesting to take a look at for sure.
        
           | lambdaba wrote:
            | Claude 3 has *decisively* beaten GPT-4, I wonder how all their
           | attributes compare.
        
             | stainablesteel wrote:
              | I like some of Claude's answers better, but it doesn't seem
              | to be a better coder imo
        
               | simonw wrote:
               | I've found it to be significantly better for code than
               | GPT-4 - I've had multiple examples where the GPT-4
               | solution contained bugs but the Claude 3 Opus solution
               | was exactly what I wanted. One recent example:
               | https://fedi.simonwillison.net/@simon/112057299607427949
               | 
               | How well models work varies wildly according to your
               | personal prompting style though - it's possible I just
               | have a prompting style which happens to work better with
               | Claude 3.
        
               | bugglebeetle wrote:
               | What is your code prompting style for Claude? I've tried
               | to repurpose some of my GPT-4 ones for Claude and have
               | noticed some degradation. I use the "Act as a software
               | developer/write a spec/implement step-by-step" CoT style.
        
               | simonw wrote:
               | Almost impossible to describe prompting style, but here
               | are some examples of how I've used Claude 3:
               | 
               | https://gist.github.com/simonw/4cecde4a729f4da0b5059b50c8
               | e01... - writing a Python function
               | 
               | https://gist.github.com/simonw/408fcf28e9fc6bb2233aae694f
               | 8cd... - most sophisticated example, building a
               | JavaScript command palette
               | 
               | https://gist.github.com/simonw/2002e2b56a97053bd9302a34e0
               | b83... - asking it to refactor some existing code
               | 
               | I don't use the "Act as a X" format any more, I'm not at
               | all convinced it has a noticeable impact on quality. I
               | think it's yet another example of LLM superstition.
        
               | lgas wrote:
               | > I don't use the "Act as a X" format any more, I'm not
               | at all convinced it has a noticeable impact on quality. I
               | think it's yet another example of LLM superstition.
               | 
                | It's very contextually dependent. You really have to test
                | things like this for your specific task, with your
               | specific model, etc. Sometimes it helps, sometimes it
               | hurts, and sometimes it does nothing at all.
        
               | bugglebeetle wrote:
               | Super helpful! Thanks!
        
               | furyofantares wrote:
               | I didn't know people were still doing this "act as etc
               | etc" instructional prompting.
               | 
               | I just tell it my coding problem. Or when making
               | something from scratch, ask for small things and
               | incrementally add.
        
               | asciii wrote:
               | > according to your personal prompting style though
               | 
               | I like the notion of someone's personal prompting style
               | (seems like a proxy for those that can prepare a question
               | with context about the other's knowledge) - that's
               | interesting for these systems in future job interviews
        
               | furyofantares wrote:
               | I've found it significantly better than GPT4 for code and
               | it's become my go-to for coding.
               | 
               | That's actually saying something, because there's also
               | serious drawbacks.
               | 
               | - Feels a little slower. Might just be UI
               | 
               | - I have a lot of experience prompting GPT4
               | 
                | - I don't like using it for non-code because it gives me
                | too much "safety" pushback
               | 
               | - No custom instructions. ChatGPT knows I use macos and
               | zsh and a few other preferences that I'd rather not have
               | to type into my queries frequently
               | 
               | I find all of the above kind of annoying and I don't like
               | having two different LLMs I go to daily. But I mention it
               | because it's a fairly significant hurdle it had to
               | overcome to become the main thing I use for coding! There
               | were a number of things where I gave up on GPT then went
               | to Claude and it did great; never had the reverse
               | experience so far and overall just feels like I've had
               | noticeably better responses.
        
             | htrp wrote:
             | citation needed (other than 'vibes')
        
             | swalsh wrote:
             | I don't know if Claude is "smarter" in any significant way.
              | But it's harder working. I can ask it for some code, and I
             | never get a placeholder. It dutifully gives me the code I
             | need.
        
               | lambdaba wrote:
               | It understands instructions better, it's rarer to have it
               | misunderstand, and I have to be less careful with
               | prompting.
        
             | orbital-decay wrote:
             | Has it, though? LMSys Arena Leaderboard (blind ranking by
             | humans) [0] positions Opus just below GPT-4 with a
             | negligible ELO gap.
             | 
             | [0] https://chat.lmsys.org/
        
               | espadrine wrote:
               | A number of AI companies have a naming/reproducibility
               | issue.
               | 
               | GPT4 Turbo, released last November, is a separate version
               | that is much better than GPT-4 (winning 70% of human
               | preferences in blind tests), released in March 2023.
               | 
               | Claude 3 Opus beats release-day GPT-4 (winning 60% of
               | human preferences), but not GPT-4 Turbo.
               | 
               | In the LMSys leaderboard, release-day GPT-4 is labeled
               | gpt-4-0314, and GPT4 Turbo is labeled gpt-4-1106-preview.
        
               | BoorishBears wrote:
               | Chatbot Arena is not a blind ranking.
               | 
               | Many, if not most, users intentionally ask the models
               | questions to tease out their canned disclaimers: so they
               | know exactly which model is answering.
               | 
               | On one hand it's fair to say disclaimers affect the
               | usefulness of the model, but on the other I don't think
               | most people are solely asking these LLMs to produce meth
               | or say "fuck", and that has an outsized effect on the
               | usefulness of Chatbot Arena as a general benchmark.
               | 
               | I personally recommend people use it at most as a way to
               | directly test specific LLMs and ignore it as a benchmark.
        
               | staticman2 wrote:
               | That "blind ranking" is limited to about 2,000 tokens of
               | context. So it's certainly not evaluating how good the
               | models are at complex assignments.
        
           | squigz wrote:
           | I think Groq is something else?
        
             | LorenDB wrote:
             | Indeed, Groq is a company building inference accelerators.
             | Grok is completely unaffiliated.
        
             | andy99 wrote:
             | Edited, I did mean the Grok in the article not the
             | inference chip.
        
           | YetAnotherNick wrote:
            | There is no reason to believe GPT-4 had more (or higher
            | quality) data than Google etc. has now. GPT-4 was entirely
            | trained before the Microsoft deal. If OpenAI could pay to
            | acquire data in 2023, >10 companies could acquire similar
            | quality by now, and no one has produced a similar quality
            | model in a year.
        
             | austhrow743 wrote:
             | The more disregard a company has for intellectual property
             | rights, the more data they can use.
             | 
             | Google had far more to lose from a "copyright? lol"
             | approach than OpenAI did.
        
               | brookst wrote:
               | I was under the impression training was at best an
               | undefined area of IP law. Is there any aspect of
               | copyright that prohibits training models?
        
               | simonw wrote:
               | This is being tested by a number of lawsuits right now,
               | most notably the NY Times one: https://nytco-
               | assets.nytimes.com/2023/12/NYT_Complaint_Dec20...
               | 
               | The key questions are around "fair use". Part of the US
               | doctrine of fair use is "the effect of the use upon the
               | potential market for or value of the copyrighted work" -
               | so one big question here is whether a model has a
               | negative impact on the market for the copyrighted work it
               | was trained on.
        
               | sroussey wrote:
                | I don't think the New York Times thing is as much about
                | training as it is about the fact that ChatGPT can use
                | Bing and Bing has access to New York Times articles for
                | search purposes.
        
               | simonw wrote:
               | If you read the lawsuit it's absolutely about training.
               | The Bing RAG piece is one of the complaints in there but
               | it's by no means the most important.
               | 
               | Take a look at https://nytco-
               | assets.nytimes.com/2023/12/NYT_Complaint_Dec20... -
               | bullet points 2 and 4 on pages 2/3 are about training
               | data. Bullet point 5 is the Bing RAG thing.
        
               | sroussey wrote:
               | Ah, thanks!
        
               | YetAnotherNick wrote:
               | Having used both Google's and OpenAI's models, the kind
               | of issue they have are different. Google's models are
               | superior or at least on par in knowledge. It's the
               | instruction following and understanding where OpenAI is
               | significantly better. I don't think pretraining data is
               | the reason of this.
        
               | supafastcoder wrote:
               | > Google had far more to lose from a "copyright? lol"
               | approach than OpenAI did.
               | 
               | The company that scrapes trillions of web pages has an
               | issue with copyright?
        
               | sib wrote:
               | Well... Googlebot does pay attention to robots.txt - I
               | don't think (original) OpenAI-bot did.
        
           | ldjkfkdsjnv wrote:
           | Claude > GPT4. Anyone using these models on a daily basis
           | knows this
        
             | jstummbillig wrote:
             | It is known
        
             | int_19h wrote:
             | I use these models regularly, and Claude is dumb as a rock
             | compared to GPT-4.
        
       | nylonstrung wrote:
       | For what reason would you want to use this instead of open source
       | alternatives like Mistral
        
         | rvnx wrote:
          | Mistral opened their weights only for very small LLaMA-like
          | models.
        
           | MallocVoidstar wrote:
           | I'm pretty sure Mixtral outperforms Grok-1 and uses much less
           | memory to do it
        
             | elfbargpt wrote:
             | I'm a little out of touch, is there a way to see how Grok
             | measures up to other models?
        
               | amrrs wrote:
               | Benchmarks here https://x.ai/blog/grok
        
               | refulgentis wrote:
                | And to compare, you can sort by MMLU on here:
                | https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb....
                | 
                | Edit: to include my self-summary after review: there are
                | a good 100 models better than it, a couple of 1x7B even.
                | Mixtral stomps it, half-Mixtral ones are universally
                | better but one is close to the same.
        
               | lossolo wrote:
               | This benchmark is mostly worthless, some of the top
               | models there were trained on benchmark data, which is a
               | known fact in the community.
               | 
               | The only reliable benchmark:
               | https://huggingface.co/spaces/lmsys/chatbot-arena-
               | leaderboar...
        
               | refulgentis wrote:
                | No, it's not "mostly worthless" and yes, some of the top
                | models were removed a few months back for being trained
                | on benchmark data.
               | 
               | I urge you to at least think through what alternative you
               | propose before posting so aggressively in these
               | situations. Lmsys doesn't have Grok, or I would have
               | included it. And having _some_ data is better than none.
               | 
               | I also had someone arguing with me 6 months back that we
               | can't trust any benchmarks at all from vendors, which
               | would exclude the blog post. Instead of just repeating
               | that back vehemently, I filled a gap. It's important we
               | don't self-peasantize as a species, all data has its
               | issues, that doesn't mean we throw it all out.
        
               | michaelt wrote:
               | Quantifiable metrics are useful if they're credible,
               | certainly.
               | 
               | But does it seem likely, to you, that a 7B-parameter
               | model would outperform a 314B-parameter model? Given that
               | we can look at the chatbot arena leaderboard and it's
               | dominated by proprietary, 70B and 8x7B models?
               | 
               | A well regarded and modern model like Mixtral 8x7B, which
               | is ranked 13th on the chatbot arena leaderboard, scores
               | 72.7 'Average' on the open LLM leaderboard - and yet
               | 'pastiche-crown-clown-7b-dare-dpo' scores 76.5.
               | 
               | To me, that sounds too good to be true.
        
               | refulgentis wrote:
               | Yup, 100%. Grok isn't very good and it was rushed.
               | 
                | The rest, re: the pastiche model etc., proposes things
                | I'm not claiming, or even close to what I'm claiming.
               | 
               | n.b. you don't multiply the parameters by experts to get
               | an effective parameter count. Why? Think of it this way:
               | every expert needs to learn how to speak English, so
               | there's a nontrivial amount of duplication among all
               | experts
        
               | michaelt wrote:
               | _> n.b. you don 't multiply the parameters by experts to
               | get an effective parameter count._
               | 
               | I actually took the 314B from Grok's HF page [1] which
               | describes the model as "314B parameters" when explaining
               | why it needs a multi-GPU machine.
               | 
               | I certainly agree that parameter count isn't everything,
               | though; clearly things like training data quality and
               | fine tuning count for a lot.
               | 
               | [1] https://huggingface.co/xai-org/grok-1
        
             | cavisne wrote:
             | One of the interesting things when weights are open sourced
             | is the community can often improve the results. See all the
             | bugs fixed in Gemma for an example.
        
               | ein0p wrote:
               | Doubtful, for purely information theoretic and memory
               | capacity reasons. It may outperform on some synthetic
               | metrics, but in practice, to a human, larger models just
               | feel "smarter" because they have a lot more density in
               | their long tail where metrics never go
        
         | verticalscaler wrote:
         | Well if nothing else, this one might be significantly less
         | nerfed. Very interesting to compare to the others.
        
           | refulgentis wrote:
            | It's not, and I mean it, specifically in Grok's case.
           | 
           | Generally, it's a boring boneheaded talking point that the 1%
           | of us actually working in AI use as a sorting hat for who
           | else is.
        
             | renewiltord wrote:
             | The safety crap makes the tools unusable. I used to have a
             | test for it that I thought was decent, but Claude failed
             | that test and it is way better than ChatGPT-4 for code,
             | which means my test was bogus. The people actually working
             | in AI are kind of irrelevant to me. It's whether or not the
             | model will solve problems for me reliably.
             | 
             | People "actually working in AI" have all sorts of nonsense
             | takes.
        
               | benreesman wrote:
               | Another day, another fairly good comment going grey on an
               | AI #1. The over-alignment _is_ really starting to be the
               | dominant term in model utility, Opus and even Sonnet
               | _are_ both subjectively and on certain coding metrics
               | outperforming both the 1106-preview and 0125-preview on
               | many coding tasks, and we _are_ seeing an ever-escalating
               | set of kinda ridiculous hot takes from people with the
               | credentials to know better.
               | 
               | Please stop karma bombing comments saying reasonable
               | things on important topics. The parent is maybe a little
               | spicy, but the GP bought a ticket to that and plenty
               | more.
               | 
               | edit: fixed typo.
        
               | refulgentis wrote:
               | What if they're wrong, and most people know what a
               | "system message" is a year after ChatGPT launch, so
               | they're willing to downvote?
               | 
               | Is there any chance that could be happening, instead of a
               | complex drama play with OP buying tickets to spice that's
               | 100% obviously true?
        
               | benreesman wrote:
               | I was trying to be helpful. I've made elitist remarks on
               | HN that were dubious in at least two ways: it was dubious
               | if I was actually all that elite, and it was dubious if
               | any amount of being elite justifies or makes useful a
               | posture of elitism. My internal jury is still out, but as
               | of writing I think I probably underestimated how unique
               | my first-hand knowledge and contributions were, but more
               | than made up for that by the claims exceeding the reality
                | by a huge margin, for a massive net loss that made me
               | wish I could take the remarks back.
               | 
               | I click on every HN username I reply to, because I've
               | been hanging out here for like 16 years and more than
               | once I've mouthed off only to later realize it was about
               | C++ to Walter Bright or something, and looked a fool as a
               | result. I've since apologized to Walter for disrespecting
               | a legend and he was very gracious about it, to cite just
               | one example.
               | 
               | Your initial remark wasn't even that bad, certainly
               | others talk that way, and I tried to frame it accurately
               | as one guy who tends to FAANG-flex carelessly rather than
               | thoughtfully to another guy who probably doesn't talk to
               | people like that face to face and is probably a pretty
               | good guy having a tough day. I was trying to say: "been
               | there, maybe cool it man you're probably going to have
               | the same bad time I've had on this sort of thing".
               | 
               | But this is getting to where I'm starting to lose my
               | temper a bit, I've been pretty cool about this. I even
               | went and read the Dart/`llama.cpp`/`ONNX` stuff because
               | I've also messed around with binding to `llama.cpp` and
               | `whisper.cpp` and stuff just to make sure I'm not
               | mouthing off to Jeff Dean's alt or something. I'm not
               | talking to Jeff Dean.
               | 
               | I surf with `showdead` on, and I don't know the current
               | meta so I don't know if you know that you've been flagged
               | dead 3 times on this subthread already and as much as I'd
               | like to, I can't really argue with any of the 3.
               | 
               | But given that you've clearly got similar interests, and
               | therefore probably things that you could teach me if I
               | were willing to listen, I'm going to propose a do-over.
               | 
               | If you'd like to start this over from a place of mutual
               | interest and write this thread off to "a pair of people
               | had bad vibes on an Internet forum once", email be at
               | `b7r6@b7r6.net`.
               | 
               | If not, no hard feelings, but in that case, let's just
               | give one another a wide berth and call it a day.
        
               | threeseed wrote:
               | > The safety crap makes the tools unusable
               | 
               | For you that may be the case.
               | 
               | But the widespread popularity of ChatGPT and similar
               | models shows that it isn't a serious impediment to
               | adoption. And erring on the side of safety comes with
               | significant benefits e.g. less negative media coverage,
               | investigations by regulators etc.
        
               | wmidwestranger wrote:
               | Seems like marketing and brand recognition might be some
               | confounding variables when asserting ChatGPT's dominance
               | is due to technical and performance superiority.
        
             | benreesman wrote:
             | I've been known to get snippy on HN from time to time
             | myself :) So please know that I'm only offering a gentle
             | nudge that I'd want from a fellow long-timer myself
             | regarding a line of discussion that's liable to age poorly.
             | 
             | Talking about sorting hats for those who do and don't have
             | the one-percenter AI badge isn't a super hot look my guy
             | (and I've veered dangerously close to that sort of thing
             | myself, this is painful experience talking): while there is
             | no shortage of uninformed editorializing about fairly
             | cutting edge stuff, the image of a small cabal of robed
             | insiders chucking in their cashews while swiping left and
             | right on who gets to be part of the discussion serves
             | neither experts nor their employers nor enthusiastic
             | laypeople. This is _especially_ true for "alignment" stuff,
             | which is probably the single most electrified rail in the
             | whole discussion.
             | 
             | And as a Google employee in the diffuser game by way of
             | color theory, you guys have a "days since we over-aligned
             | an image generation model right into a PR catastrophe" sign
             | on the wall in the micro kitchen right? That looked
             | "control vector" whacky, not DPO with pretty extreme
             | negative prompt whacky, and substantially undermined the
             | public's trust in the secretive mega labs.
             | 
             | So as one long-time HN user and FAANG ML person to another,
             | maybe ixnay with the atekeepinggay on the contentious AI #1
             | thread a bit?
        
               | gopher_space wrote:
               | Every discipline has its bellwether topics. They're
               | useful for filtering out people who want to chip in
               | without picking up the tools.
        
               | whimsicalism wrote:
               | regardless of whether they say it out loud, it is what
               | many of us think - might be good for people to know why
               | their opinions are getting immediately dismissed by
               | insiders
        
               | benreesman wrote:
                | Letting people know why their opinions are getting
                | dismissed in a productive way is done by citing well-
                | known sources in a low-effort way, or by explaining things
               | thoughtfully in a high-effort way: Karpathy has chosen
               | the highest-effort way of most anyone, it seems unlikely
               | that anyone is at a higher rung of "insiderness" than he
               | is, having been at Toronto with (IIRC) Hinton and Alex
               | and those folks since this was called "deep learning",
               | and has worked at this point at most of the best
               | respected labs.
               | 
               | But even if folks don't find that argument persuasive,
               | I'd remind everyone that the "insiders" have a tendency
               | to get run over by the commons/maker/hacker/technical
               | public in this business: Linux destroying basically the
               | entire elite Unix vendor ecosystem and ending up on well
               | over half of mobile came about (among many other reasons)
               | because plenty of good hackers weren't part of the
               | establishment, or were sick of the bullshit they were
               | doing at work all day and went home and worked on the
                | open stuff (bringing all their expertise with them) - a
                | signal example. And what e.g. the Sun people were doing
               | in the 90s was every bit as impressive given the hardware
               | they had as anything coming out of a big lab today. I
               | think LeCun did the original MNIST stuff on a Sun box.
               | 
               | The hard-core DRM stuff during the Napster Wars getting
               | hacked, leaked, reverse engineered, and otherwise
               | rendered irrelevant until a workable compromise was
               | brokered would be another example of how that mentality
               | destroyed the old guard.
               | 
               | I guess I sort of agree that it's good people are saying
               | this out loud, because it's probably a conversation we
               | should have, but yikes, _someone_ is going to end up on
               | the wrong side of history here and realizing how closely
               | scrutinized all of this is going to be by that history
               | has really motivated me to watch my snark on the topic
               | and apologize pretty quickly when I land in that place.
               | 
               | When I was in Menlo Park, Mark and Sheryl had
               | intentionally left a _ton_ of Sun Microsystems
               | iconography all over the place and the message was pretty
               | clear: if you get complacent in this business, start
               | thinking you're too smart to be challenged, someone else
               | is going to be working in your office faster than you
               | ever thought possible.
        
               | refulgentis wrote:
               | I have no idea how you've wandered all the way to
               | Napster, Sun, hackers, etc. Really incredible work.
               | 
               | Well, I kind of know, you're still rolling with "this
               | dude's a google employee", so the guy foaming at his
               | mouth about Google makes sense to you, and now you have
               | to reach to ancient lore to provide grounding for it.
               | 
               | I don't work for Google.
        
               | benreesman wrote:
               | Then don't link to an "About Me" page [1] that says you
               | do? How is confusion on that subject any reader or
               | commenter's fault?
               | 
               | I don't care if you personally work at Google or not,
               | Google got itself in quite a jam as concerns public
               | perception of their product in particular and the AI
               | topic in general by going overboard with over-alignment,
               | everyone knows that so one assumes that insiders know it,
               | which is one of a great many examples of how strongly-
               | forced models are a real problem for arbitrarily
               | prestigious insider-laden labs.
               | 
               | Framing the debate about whether large, proprietary
               | models are over-aligned or mis-aligned as an acid
               | test for whether or not someone is worth paying
               | attention to is a really weird hill to stand on.
               | 
               | [1] https://www.jpohhhh.com/about
        
               | refulgentis wrote:
               | Yes, you do care, in fact, you care a lot! You made it
               | the centerpiece of your argument and went to a lot of
               | trouble to do so.
               | 
               | Flag away, my friend.
        
               | refulgentis wrote:
               | You're making up a person and being extremely creepy
               | while doing a poor job of it.
               | 
               | It's at least funny, because you're doubling down on OP's
               | bad takes, and embarrassing yourself with trying to
               | justify it with what you thought was brilliant research
               | and a witty person-based argument. But, you messed up. So
               | it's funny.
               | 
               | Punchline? Even if you weren't wrong, it would have been
               | trivial while doing your research to find out half of
               | Deep Mind followed me this week. Why? I crapped all over
               | Gemini this week and went viral for it.
               | 
               | I guess, given that, I should find it utterly
               | unsurprising you're also getting personal, and clinging
               | to 1% as a class distinction thing and making mental
               | images of cloistered councils in robes, instead of, well,
               | people who know what they're talking about, as the other
               | repliers to you point out.
               | 
               | "1%ers are when the Home Depot elites make fun of me for
               | screaming about how a hammer is a nerfed screwdriver!"
        
               | benreesman wrote:
               | I've been around here a pretty long time, but I could
               | still be off base here: as far as I understood people
               | generally posted links to their own blog [1] in their HN
               | profile because they want people to read them? I read
               | your blog and particularly the posts about Gigadiffusion
               | because I wanted to reply from a position of having put
               | some effort into understanding where the poster I was
               | replying to was coming from before popping off with what
               | could be taken as a criticism. If that offends you or
               | creeps you out I'm more than happy to steer clear of it
               | with the parting remark that I really like Material and
               | had hoped that any follow up would give me the
               | opportunity to compliment you on some nice work.
               | 
               | If that's not your blog, you should probably take it off
               | your profile?
               | 
               | [1] https://www.jpohhhh.com/
        
               | refulgentis wrote:
               | I'm not doing a faux-nice thing with you. You made up an
               | elaborate argument, to justify rank fact-free ranting,
               | based on false information. Thanks for your time.
        
             | not_really wrote:
             | lol, okay
        
             | mlindner wrote:
             | Curious why you're so dismissive of something that's pretty
             | important?
        
             | random_cynic wrote:
             | The 1% who actually work on AI don't use terms as
             | generic as "AI". Way to reveal yourself as a college
             | undergrad who read a couple of popular science books,
             | downloaded the MNIST data, and thinks they're an
             | "expert".
        
         | zozbot234 wrote:
         | Isn't this Apache licensed? Regardless, you can run multiple
         | models concurrently on the same input using well-known ensemble
         | techniques. (Not to be confused with mixture-of-experts, which
         | is more like training a single model where only a few blocks
         | are chosen to be active at any given time - a kind of
         | sparsity.)
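         | 
         | For illustration, a minimal sketch of output-level
         | ensembling, assuming two already-loaded HF-style causal
         | LMs that share a tokenizer (hypothetical objects, not
         | code from this repo):
         | 
         |     import torch
         | 
         |     def ensemble_next_token(models, input_ids):
         |         # average next-token distributions of models
         |         # that share one vocabulary, then pick greedily
         |         probs = None
         |         for model in models:
         |             logits = model(input_ids).logits[:, -1, :]
         |             p = torch.softmax(logits, dim=-1)
         |             probs = p if probs is None else probs + p
         |         probs /= len(models)
         |         return torch.argmax(probs, dim=-1)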
        
           | tlb wrote:
           | Not super easy if they have different tokenizers.
        
       | rvnx wrote:
       | One subtle thing: Musk said "open-source", we got "open-weights"
       | instead (still better than nothing though, so it's greatly
       | appreciated).
        
         | paulgb wrote:
         | Dumb question: what should open-source mean in the context of
         | something like this? Open access to the training data and
         | training pipeline as well?
        
           | CharlesW wrote:
           | It's not a dumb question, and the answer is "yes".
        
             | zeroCalories wrote:
             | Come on, that's not reasonable to expect from a company, or
             | useful for indie hackers. Having weights that can be used
             | however you like is enough for most people, even large
             | companies.
        
               | schoen wrote:
               | Maybe it should be called something else? "Openly-
               | licensed"?
               | 
               | Just because the model weights are not really "source"
               | (either as a matter of intuition or for example following
               | the OSI "preferred form in which a programmer would
               | modify the program" definition).
        
               | zeroCalories wrote:
               | Sure, but I don't want to train anyone's model from
               | scratch. Realistically, I can't download all the training
               | data, or run the pipeline, or train the model. Making all
               | of that available to me would be a massive burden on the
               | company too, so they simply won't do it. If I'm able to
               | fine-tune it, that's enough for me, and imo, that fits
               | with the spirit of open/free software. We have to
               | understand that this is fundamentally a different thing
               | than something like the Linux kernel, and closer to
               | something like an industrial project. The output is just
               | a bunch of numbers instead of something physical.
        
             | simonw wrote:
             | A big catch here is that you can't slap an open source
             | license on a bunch of copyrighted training data, and to
             | date no-one has created a truly convincing LLM exclusively
             | trained on public domain data. It might happen soon though
              | - there are some convincing efforts in progress.
        
               | CharlesW wrote:
               | Absolutely, because it's trained mostly on unlicensed,
               | copyrighted content, they basically can't release source.
        
               | gfodor wrote:
               | Many people think these companies are training on
               | unlicensed data but I think OpenAI licenses their data,
               | they just "license" it the way one would need to in order
               | to read it.
        
               | CharlesW wrote:
               | > _...I think OpenAI licenses their data..._
               | 
               | They've just started to (in response to lawsuits, it must
               | be noted) and in the meantime, they're simultaneously
               | claiming that (1) what they're doing is fair use (a.k.a.
               | fair dealing) and (2) preparing for the day when courts
               | confirm that it isn't.
        
               | zer00eyz wrote:
               | You all keep using the word "Data"
               | 
               | Data, as in facts, as in the frequency of one word in
               | relation to another.
               | 
               | "Copyright does not protect facts, ideas, systems, or
               | methods of operation, although it may protect the way
               | these things are expressed..." FROM:
               | https://www.copyright.gov/help/faq/faq-protect.html
               | 
               | It's not a question of if, but when, the cat gets
               | out of the bag and the legal battle starts. The
               | problem is that copyright applies to the
               | expression, not the factual information it
               | expresses (in this case, word relations). Now "how
               | math works" and "the language of the law" are going
               | to make for an interesting court case. I suspect
               | that math wins here, but it depends on what judge
               | gets it and how high it goes.
        
               | gfodor wrote:
               | No, the term data can be used to describe anything that
               | can be recorded in bytes. It's "data storage capacity"
               | when you buy a hard drive.
        
               | logicchains wrote:
               | https://substack.recursal.ai/p/eaglex-17t-soaring-past-
               | llama... this one claims to have been trained only on
               | permissively licensed data.
        
             | nabakin wrote:
             | Agreed. It's ridiculous that people have to resort to
             | calling their question dumb to avoid being attacked
             | by toxic commenters.
        
             | dudus wrote:
             | If you released that instead of the binary weights, you
             | could be both more open and less useful for users. Fun
        
           | Q6T46nT668w6i3m wrote:
           | Yes, training and evaluation code, i.e., the code used to
           | generate the weights.
        
           | TaylorAlexander wrote:
           | The Open Source Initiative is actively working on this over
           | the course of this year, and your input will help define that
           | meaning! Please see here for more:
           | 
           | https://opensource.org/blog/open-source-ai-definition-
           | weekly...
        
         | TaylorAlexander wrote:
         | Yeah musk said "all design and engineering for the original
         | roadster is now open source" and actually what we got was a few
         | PCB files and zero mechanical design files so I don't ever
         | trust what he says.
        
         | tylerekahn wrote:
         | This is the weights and the model under Apache 2.0 license.
         | What do you mean by open-source?
         | 
         | https://github.com/xai-org/grok/blob/main/model.py
         | 
         | https://github.com/xai-org/grok/blob/main/run.py#L25
        
           | pclmulqdq wrote:
           | Still better than most of the "open weights" models that have
           | massively restrictive terms.
        
         | solarkraft wrote:
         | He also called permissively licensing Tesla's patents "open
         | sourcing" them. He's at the forefront of misusing the term.
        
           | drexlspivey wrote:
           | The "source" in "open source" refers to source code which
           | they released. A dataset is not source code, if anyone is
           | misusing the term it's you.
        
             | frabcus wrote:
             | I consider the weights a binary program and the source code
             | is the training data. The training algorithm is the
             | compiler.
             | 
             | I agree this isn't standard terminology, but it makes the
             | most sense to me in terms of power dynamics and information
             | flow.
             | 
             | We know from interpretability research that the
             | weights implement algorithms, e.g. sine
             | approximation, so they feel like binary programs to
             | me.
        
             | solarkraft wrote:
             | https://youtu.be/WyTzRnGSlcI?t=88
        
             | HarHarVeryFunny wrote:
             | If you can't rebuild it, then how can you be considered to
             | have the "source code" ?
             | 
             | The training data isn't a dataset used at runtime - it's
             | basically the source code to the weights.
             | 
             | Not sure it really _matters_ here though (who has the GPUs
             | and desire to retrain Grok?), but just as a matter of
             | definition  "open weights" fits better than "open source".
        
       | gardenhedge wrote:
       | > Due to the large size of the model (314B parameters), a machine
       | with enough GPU memory is required to test the model with the
       | example code
       | 
       | What type of machine do you need to play around with this?
        
         | anigbrowl wrote:
         | 'Chunky beast, needs 320 Gb VRAM likely 4 bit, likely is being
         | run 8 bit on 8 x 80 Gb GPUs.'
         | 
         | -Emad
        
         | 317070 wrote:
         | Probably a machine with about 628 GB of GPU memory. (2 bytes
         | per parameter)
         | 
         | So 8xH100 (80 GB each) should do it.
        
           | Marlinski wrote:
           | I suppose it can be quantized.
        
         | a_wild_dandan wrote:
         | A single 192GB M2 Mac using a 4-bit quant would work.
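         | 
         | Back-of-the-envelope math for the raw weight footprint at
         | different precisions (weights only, ignoring KV cache and
         | runtime overhead):
         | 
         |     params = 314e9                    # total parameters
         |     for bits in (16, 8, 4):
         |         gb = params * bits / 8 / 1e9  # bytes -> GB
         |         print(f"{bits}-bit: ~{gb:.0f} GB")
         |     # 16-bit: ~628 GB, 8-bit: ~314 GB, 4-bit: ~157 GB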
        
       | pogue wrote:
       | Can someone explain why the weights are posted via a Bittorrent
       | magnet link? I have no way to check the size at the moment, but
       | isn't that a bit unusual? There's also only 21 seeders right now
       | according to https://checker.openwebtorrent.com/
        
         | lambdaba wrote:
         | Why not? Mistral was first to do it, it has become tradition.
        
           | gillesjacobs wrote:
           | I believe it was Llama 1 that notoriously got leaked with a
           | torrent on 4chan.
        
             | astrange wrote:
             | It wasn't much of a leak. Facebook was pretending to keep
             | it private for PR reasons but putting approximately zero
             | effort into actually keeping it private.
        
           | orlp wrote:
           | BitTorrent is just an objectively superior method of
           | delivering a lot of data to a lot of people.
        
         | pooloo wrote:
         | It's likely over 100GB of data, so I wouldn't say it's
         | necessarily unusual to spread the bandwidth across
         | multiple hosts.
        
           | pogue wrote:
           | Thanks! I searched and searched for a tool that would show me
           | info via the web about a magnet link but nada
        
         | CamperBob2 wrote:
         | How else could/should it be done?
        
           | pogue wrote:
           | I would have assumed they could just upload it to Github. If
           | it has restrictions on file size I'm sure they could make
           | multiple part compressed files.
           | 
           | Torrents can unfortunately die after a period of time if no
           | one continues seeding it or if they don't use a permanent web
           | based seeder, which doesn't appear to be the case.
        
             | cedws wrote:
             | GitHub may choose to throttle downloads or remove the files
             | simply because they're taking up too much bandwidth.
             | 
             | A torrent is less likely to go down in the short term.
        
             | xcv123 wrote:
             | This is not some crappy DVD rip on The Pirate Bay. It will
             | be seeded as long as it's relevant.
             | 
             | Twitter/X has their own massive infrastructure and
             | bandwidth to seed this indefinitely.
        
               | KomoD wrote:
               | Yeah, they can just leave some server running somewhere
               | and just let it seed forever
        
             | larrysalibra wrote:
             | The great thing about torrents is that you (or anyone else
             | who cares) can single-handedly solve the problem you're
             | complaining about by seeding the torrent.
        
             | simonw wrote:
             | GitHub have a soft repository size limit of 5GB, documented
             | here: https://docs.github.com/en/repositories/working-with-
             | files/m...
             | 
             | Soft size limit means "If your repository excessively
             | impacts our infrastructure, you might receive an email from
             | GitHub Support asking you to take corrective action." - I
             | know people who have received such emails.
             | 
             | Most model releases happen through Hugging Face which does
             | not have such a size limit.
        
               | KomoD wrote:
               | They'd probably just charge you for it. They sell "data
               | packs" for LFS.
               | 
               | https://docs.github.com/billing/managing-billing-for-git-
               | lar...
        
               | zepton wrote:
               | It would be super expensive to use LFS to distribute
               | this:
               | 
               | > Each pack costs $5 per month, and provides 50 GiB of
               | bandwidth and 50 GiB for storage
               | 
               | So they would need to pay for 6 data packs (or $30) for
               | every 300GB download.
               | 
               | (https://docs.github.com/en/billing/managing-billing-for-
               | git-...)
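               | 
               | The arithmetic behind that estimate (packs are sold
               | in 50 GiB bandwidth increments):
               | 
               |     size_gb, pack_gb, pack_cost = 300, 50, 5
               |     packs = -(-size_gb // pack_gb)   # ceil -> 6
               |     print(packs * pack_cost)         # ~$30/download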
        
               | rezonant wrote:
               | I'd bet Hugging Face would be happy to have hosted these
               | canonically too, so not sure why that doesn't happen
               | more.
        
               | osanseviero wrote:
               | The model is also at https://huggingface.co/xai-org
        
             | sashank_1509 wrote:
             | No, git would be impossible. I've never seen a repo
             | even a few GB in size; if you are uploading non-code
             | files you really should not be using git. Git is
             | version management software for code. I often see
             | repos with images and even videos checked in - please
             | don't, there are so many far better and more
             | performant solutions out there.
             | 
             | The other approach would be to use AWS S3 or other cloud
             | providers which would cost them money every time someone
             | downloads their code, which is not their prerogative to pay
             | for when they are releasing something for free. Torrents
             | seems like the only good solution, unless someone hosts
             | this on the cloud for free for everyone.
        
               | sroussey wrote:
               | Huggingface will disagree with impossible as their models
               | are available via git, sometimes broken up in pth files.
               | 
               | Still, as far as sentiment goes, yeah git for model
               | weights is an impedance mismatch for sure!
        
               | rezonant wrote:
               | > No, git would be impossible. I've never seen a
               | repo even a few GB in size; if you are uploading
               | non-code files you really should not be using git
               | 
               | It's not actually a limitation in git itself, especially
               | if you use Git LFS. People use Git for Unreal projects
               | and big ones can be half a terabyte or more in size.
        
               | djhn wrote:
               | Scott Chacon (github cofounder) mentioned in a recent
               | talk that the Windows repo is 300GB
               | https://youtu.be/aolI_Rz0ZqY?si=MOo2eS6dsKKAxmsP
        
             | rezonant wrote:
             | Others have pointed out that GitHub doesn't allow that, but
             | 
             | > Torrents can unfortunately die after a period of time if
             | no one continues seeding it or if they don't use a
             | permanent web based seeder, which doesn't appear to be the
             | case.
             | 
             | So too can web links, especially when they are 300 GB and
             | egressing out of AWS at $0.09/GB or worse (in non-US
             | regions). Each full download would cost $27 at that rate.
             | 10,000 downloads would cost $270,000.
             | 
             | Sure you could go for something with a better cost model
             | like R2, but you can't beat using one or two unmetered
             | connections on a VPN to constantly seed on Bittorrent, your
             | pricing would be effectively free and reliability would be
             | higher than if you just exposed a HTTP server on the
             | Internet in such a way.
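             | 
             | Rough sketch of that egress math (the ~$0.09/GB rate
             | is the commonly cited figure, an assumption here):
             | 
             |     size_gb, rate = 300, 0.09      # GB, USD per GB
             |     per_download = size_gb * rate  # ~$27
             |     print(per_download, per_download * 10_000)
             |     # ~$27 each, ~$270,000 for 10,000 downloads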
        
               | KomoD wrote:
               | > and egressing out of AWS at $0.09/GB
               | 
               | There's a lot of seeders on the torrent that are actually
               | AWS ips too, all with similar configurations which makes
               | me believe that it's probably xAI running them
               | 
               | > on a VPN
               | 
               | That's unnecessary, you don't need a VPN?
        
               | rezonant wrote:
               | No you don't, but if you wanted to host it from your
               | gigabit office IP, you probably would want to.
        
               | KomoD wrote:
               | Why?
        
         | MallocVoidstar wrote:
         | Distributing 300GB via torrent is cheaper than direct, assuming
         | even a few other people seed
        
         | monkin wrote:
         | It's 318.24G
         | 
         | https://academictorrents.com/details/5f96d43576e3d386c9ba65b...
        
         | bongodongobob wrote:
         | I'm not sure why you wouldn't tbh. That's a lot of bandwidth.
        
         | jiripospisil wrote:
         | I don't understand why you're being downvoted for asking a
         | legitimate question. People not familiar with model weights
         | might be surprised that they are often in tens of gigabytes and
         | in this case even more.
        
         | fzzzy wrote:
         | It may become a tradition since weights are so large. Perhaps
         | it started when the Llama torrent link leaked. Then, Mistral
         | decided to release their weights using bittorrent.
        
         | leumon wrote:
         | Mistral did it too when they released their first open model.
         | They just posted a magnet link on Twitter.
        
         | raydev wrote:
         | Spreads the burden/cost of distributing a 300+GB file.
        
         | seydor wrote:
         | my optimistic explanation is we are going back to the
         | 2000s internet, but probably we are not
        
           | fzzzy wrote:
           | Let's hope so.
        
         | ur-whale wrote:
         | > Can someone explain why the weights are posted via a
         | Bittorrent magnet link?
         | 
         | I think the best way to get an answer to that question is to
         | try to host it yourself and see what happens.
        
         | whywhywhywhy wrote:
         | Because Bittorrent is an outstanding tech for delivering large
         | files, more I think about it the more I'm surprised it wasn't
         | taken advantage of more.
        
           | Marlinski wrote:
           | It's been criminalized to hell by IP holders and
           | Hollywood. Such a shame they killed the best tech of
           | the previous decade. Could have revolutionized how we
           | distribute content, approach CDNs, and even streaming.
        
             | harkinian wrote:
             | In what way is the bittorrent protocol criminalized?
        
               | yayr wrote:
               | scheme 1: agents for copyright holders continuously scan
               | for IP addresses who host copyrighted content and start
               | legal actions.
               | 
               | scheme 2: criminal groups infect copyrighted content with
               | malware to exploit downloaders of such content.
        
       | bbor wrote:
       | Honestly the most interesting part is taking a peek at the kind
       | of AI researcher working for Twitter after the objectively messy
       | layoffs and subsequent crunch. I notice neither of them has
       | Twitter mentioned on their GitHub, which is prolly for the best
       | to avoid harassment lol.
       | 
       | Code wise, excited to see if this could grow into anything! I
       | think it's pretty clear that Grok didn't have nearly enough
       | investment to be a top model so Elon "sacrificed" it on a whim in
       | his schoolyard spat with OpenAI, but I'm not complaining. I've
       | always taken Elon at his word that he truly _is_ worried about
       | centralization of AI, and I don't think any of the emails
       | released by his schoolmate Altman dissuade me from that. So I
       | have some reasonable hope that he uses some of his immense
       | resources to start "fighting the good fight" here with LeCun.
        
         | cma wrote:
         | >taking a peek at the kind of AI researcher working for Twitter
         | 
         | He made a separate company for this.
        
         | paxys wrote:
         | Neither of them works at Twitter. xAI is a separate company,
         | and only uses Twitter's data to train.
        
           | bbor wrote:
           | Thanks for the correction! I know, I just don't believe in
           | corporations so the distinction is slight
        
       | mattxxx wrote:
       | I respect the openness here! This is the future that I want to
       | see
        
         | giancarlostoro wrote:
         | Fully agree. People will trash talk it due to Musk, but
         | let's not forget the engineers who poured hours of their
         | lives into building this and are continuing to do so.
        
           | knowsuchagency wrote:
           | The engineers who decided to work for him? Forgive me if I do
           | forget about them and the hours of their lives spent on this
        
             | lynndotpy wrote:
             | Engineers who joined Twitter pre-Musk days who live and
             | work in the US on an H1-B visa can't just quit.
             | 
             | You can criticize Elon Musk without criticizing people who
             | would have their lives upended if they quit or were fired.
        
               | throw2022110401 wrote:
               | That grace period has long passed. If you are still there
               | at this point you have made a choice.
               | 
               | (Removed "complicit" because I don't like the way that
               | sounded)
        
               | cap1434 wrote:
               | Complicit in what exactly?
        
           | devin wrote:
           | I still reserve the right to trash talk Musk as I don't
           | believe he is committed to openness as much as he wants to
           | spite OpenAI for telling him to pound sand.
        
             | llm_trw wrote:
             | What's the difference?
             | 
             | >Oh no, I only want _pure_ intentions for anything I use.
             | Which is why I reject all for profit medicine.
             | 
             | It doesn't matter why he did it. What matters is that he
             | did it.
        
               | devin wrote:
               | It matters to me why people do things. I'm happy it's
               | open, but it doesn't change my mind about the guy.
        
               | llm_trw wrote:
               | What an exhausting way to live.
        
             | giancarlostoro wrote:
             | This makes no sense to me for two reasons:
             | 
             | - He pointed out that his understanding was that it would
             | be open source in some way
             | 
             | - The name OpenAI implies an open source endeavor. I
             | don't know many things named "Open" that are in fact
             | closed source.
        
           | afavour wrote:
           | Were they not paid to do so?
        
           | revscat wrote:
           | I feel the same about Tesla. They make good cars that are
           | helping to get us off of oil. They have thousands of
           | employees.
           | 
           | And who among us has a CEO that isn't problematic, even if
           | not so much so as Musk?
        
             | hobobaggins wrote:
             | Tesla is likely making good cars _because_ the CEO is
             | 'problematic'
        
             | mplewis wrote:
             | "Good" cars is a real stretch.
        
           | sprobertson wrote:
           | > engineers who poured hours of their lives into building
           | this
           | 
           | Not to mar these specific engineers, but that's an empty
           | phrase that can be said about anything ever built. It doesn't
           | somehow make the idea or implementation good.
        
             | giancarlostoro wrote:
             | The phrase merely means: don't just overlook the work
             | because of someone else who did not even labour over
             | the end result.
        
         | trog wrote:
         | Is it open if it doesn't include the training data? Genuine
         | question - I am not familiar enough with the terms and
         | technology to know. But my understanding is the weights
         | are just a more or less static collection of data that has
         | been (to
         | paraphrase Ted Chiang) lossily compressed from the actual raw
         | training data.
         | 
         | Without the training data to thoroughly evaluate what is in
         | there, the only way you can figure it out is through
         | experimentation - e.g. running it up in a chatbot and asking it
         | questions.
         | 
         | Is this roughly correct or am I misunderstanding what you can
         | do with the weights?
        
       | 2devnull wrote:
       | From issues: "Well the magnet file contains a 300GB checkpoint "
       | 
       | That's why they are using a torrent I suppose.
        
       | moralestapia wrote:
       | Well, he delivered.
        
         | paxys wrote:
         | Partially. Open weights is not open source.
        
           | gfodor wrote:
           | In machine learning models the term open source has been
           | largely accepted to mean sharing weights and, if necessary,
           | inference code. You can argue if this is an abuse of the term
           | but everyone does it, and saying someone didn't deliver if
           | they used it and published weights would probably mean saying
           | the same about mistral, meta, etc.
        
             | asadotzler wrote:
             | Yes. So say the same thing about them. Open source
             | has a definition, and abusing that hurts all of us
             | except the billionaires.
        
               | moralestapia wrote:
               | I get the "open source" argument, but what is the issue
               | here?
               | 
               | If you are able to reproduce the thing in its entirety
               | and you're given no restrictions on its use, it seems
               | compatible with the spirit of open sourcing things.
        
           | xcv123 wrote:
           | The architecture of the model is open source. Not just the
           | weights. You can run the entire thing locally.
        
       | stale2002 wrote:
       | Hey, asking any experts here: what are their first thoughts
       | on the significance of this?
       | 
       | I.e., is this comparable to any other model released, or are
       | there significant metric differences that make it better for
       | certain use cases?
       | 
       | The only thing I see, off the top of my head, is that it is
       | a very large model, and I don't think any models of similar
       | size have been released.
        
         | Me1000 wrote:
         | Not an expert by any means, but I like learning about this
         | stuff and I play with a lot of open weight models.
         | 
         | I'd say the significance is that it happened. It's by far the
         | largest open weight model I've seen. But I'm not sure why you'd
         | use it over a model like Mixtral, which seems to perform about
         | the same at like 1/6th the size.
         | 
         | But I welcome any contribution to the open weight LLM
         | community. Hopefully people will learn something interesting
         | with this model. And I hope they keep releasing new versions!
        
           | MichaelRazum wrote:
           | If I may ask, how do you load such big models? 300GB
           | seems like a lot to play around with.
        
             | Me1000 wrote:
             | You're right, this model is going to be too big for most
             | people to play around with. But to answer your question I
             | have a 128GB of RAM in my M3 MacBook Pro, so I can use most
             | of that for GPU inferencing. But still, this model is going
             | to need to be heavily quantized for me to be able to use
             | it. (fwiw, I probably won't try this one)
             | 
             | In the next week or two I expect we'll see a GGUF version
             | of the weights (might need to wait for a patch to llama.cpp
             | first), and someone will release super small quantizations
             | of it. I suspect my computer might be able to run a 3 bit
             | quant, but it might need to go down to 2 bits to have any
             | kind of reasonable context length. But with quants that
             | small I'd expect the model's performance to degrade well
             | below that of Mixtral, so it probably isn't really even
             | worth using. But we'll see; quantization is weird, some
             | models perform better than others when quantized.
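             | 
             | A quick sanity check of which quants could even fit in
             | 128GB of unified memory (weights only; the KV cache
             | and the OS need headroom too):
             | 
             |     params, ram_gb = 314e9, 128
             |     for bits in (4, 3, 2):
             |         size = params * bits / 8 / 1e9
             |         print(f"{bits}-bit: ~{size:.0f} GB, "
             |               f"fits: {size < ram_gb}")
             |     # 4-bit ~157 GB (no), 3-bit ~118 GB (barely),
             |     # 2-bit ~78 GB (yes)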
        
               | MichaelRazum wrote:
               | Thanks a lot for the hint :)! It's awesome that it
               | might run even on a MacBook; actually this is a
               | reason to switch to Mac. It seems there is nothing
               | similar for a PC laptop with Linux or Windows.
        
               | Me1000 wrote:
               | No problem. I hope more people try these things out, it's
               | the best way to push the industry forward! We can't let
               | the researchers have all the fun.
               | 
               | Apple had plenty of reasons to move forward with their
               | Apple Silicon CPUs and GPUs in the mac, but they really
               | did seem to get lucky with the unified memory
               | architecture. It was kind of just an artifact of their
               | design, but ends up serving the needs of deep neural net
               | models really well!
        
               | TMWNN wrote:
               | >In the next week or two I expect we'll see a GGUF
               | version of the weights (might need to wait for a patch to
               | llama.cpp first), and someone will release super small
               | quantizations of it.
               | 
               | How quickly are new models available through Ollama?
        
               | cjbprime wrote:
               | Few days max.
        
               | Me1000 wrote:
               | Ollama is just a wrapper around llama.cpp, so when the
               | gguf model files come out it'll be able to run on Ollama
               | (assuming no llama.cpp patch is needed, but even if it is
               | ollama is usually good at getting those updates out
               | pretty quickly).
        
               | zozbot234 wrote:
               | A top-of-the-line Mac Studio Ultra maxes out at 192GB
               | currently. This is also a MoE model, so only a fraction
               | of parameters have to be in RAM.
        
               | EgoIncarnate wrote:
               | Each token generated may only use a subset of the
               | parameters (86billion instead of 314billion), but the
               | next generated token might use a different subset. If
               | it's anything like Mixtral, it will switch between
               | experts constantly. It helps with memory bandwidth, but
               | all the parameters still need to be in RAM or it would be
               | unbearably slow.
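               | 
               | A toy sketch of that top-2 routing idea (shapes and
               | names are made up, not xAI's actual implementation):
               | 
               |     import torch
               |     import torch.nn.functional as F
               | 
               |     def moe_layer(x, router, experts, k=2):
               |         # x: [tokens, d_model], router: Linear -> n_experts
               |         gate = F.softmax(router(x), dim=-1)
               |         w, idx = gate.topk(k, dim=-1)    # top-k per token
               |         w = w / w.sum(-1, keepdim=True)  # renormalise gates
               |         out = torch.zeros_like(x)
               |         for slot in range(k):
               |             for e, expert in enumerate(experts):
               |                 m = idx[:, slot] == e    # tokens routed here
               |                 if m.any():
               |                     out[m] += w[m, slot:slot+1] * expert(x[m])
               |         return out
               | 
               | Each token only touches k experts' weights per layer,
               | but across a batch (and across layers) every expert
               | gets hit, which is why the full parameter set still
               | has to sit in memory.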
        
               | Me1000 wrote:
               | MoE doesn't really help with the memory requirements for
               | the reason mentioned in the other comment. But it does
               | help with reducing the compute needed per inference.
               | Which is good because the M3 Max and M2 Ultra don't have
               | the best GPUs. A 70B parameter model is pretty slow on my
               | M3 Max, and this model has 86B activations per inference
               | run.
        
         | whimsicalism wrote:
         | seems like a large undertrained model, not that exciting
         | imo compared to Mixtral
         | 
         | it is also not the biggest open-source model; Switch
         | Transformer was released years ago and is larger and
         | similarly undertrained
        
         | brucethemoose2 wrote:
         | Tests are not out yet, but:
         | 
         | - It's _very_ large, yes.
         | 
         | - It's a base model, so it's not really practical to use
         | without further finetuning.
         | 
         | - Based on Grok-1 API performance (which itself is
         | probably a finetune) it's... not great at all.
        
       | simonw wrote:
       | Is there a model card anywhere? I'd like to know what it was
       | trained on.
        
       | LZ_Khan wrote:
       | How are people's experience with this model? Having the most
       | weights is one thing but being a better model than the 70B models
       | is another.
        
         | labrador wrote:
         | tbh, I've never seen anyone share anything interesting produced
         | by Grok. I see plenty of posts on X and reddit of people
         | sharing amazing things that GPT-4 and now Claude 3 Opus can do.
         | Grok can roast people. That's pretty much all I've seen.
         | 
         | I'd love to proven wrong if someone cares to share something
         | interesting produced by Grok.
        
         | swalsh wrote:
         | I use grok all the time to find tweets or ask about trends on
         | Twitter. For that it's better than what used to exist.
         | But it's not a great model outside that narrow use case.
        
       | arduanika wrote:
       | CODE_OF_CONDUCT.md has only five words. :)
        
         | schappim wrote:
         | "Be excellent to each other."
        
         | josh-sematic wrote:
         | They're from "Bill and Ted's Excellent Adventure"
        
         | bheadmaster wrote:
         | I was hoping it would be "do not be an asshole", but I guess
         | this is fine too.
        
         | marginalia_nu wrote:
         | My favorite is SQLite's code of ~~conduct~~ ethics:
         | https://sqlite.org/codeofethics.html
        
           | TwentyPosts wrote:
           | Huh. What's the backstory here?
        
             | weberer wrote:
             | https://pjmedia.com/paula-bolyard/2018/10/24/tech-
             | community-...
        
           | agmater wrote:
           | What do you like about it? It seems incredibly creepy to me.
        
       | machiaweliczny wrote:
       | If they are so behind they could make it open source instead of
       | open weights and get some help.
        
         | nicce wrote:
         | Fully open-source means also providing open access to their
         | data sets? Which is the only valuable thing Twitter (X) has
         | left.
        
           | heyoni wrote:
           | And the one thing they are vehemently protecting from
           | scrapers and other entities. Even nitter threw in the towel.
        
           | EastSmith wrote:
           | > Which is the only valuable thing Twitter (X) has left.
           | reply
           | 
           | They have a very valuable user base (all kinds of world
           | leaders for example), so the data is not the only valuable
           | thing they have.
        
             | sroussey wrote:
             | That's actually more valuable. Twitter's data of
             | small-format text is awful for training. Best to just
             | exclude it.
             | 
             | There are hundreds of millions of people on Twitter, and a
             | few of them are very smart. I don't see how that helps here
             | though.
        
               | Takennickname wrote:
               | It doesn't help here. But the person you're responding to
               | is just pushing back against the "Elon destroyed Twitter
               | and there's nothing left" narrative.
        
             | nicce wrote:
             | I don't see the difference here.
             | 
             | The userbase and their social networks and
             | interactions _are the data_.
             | 
             | They don't have much value from an advertising point
             | of view anymore.
        
         | xcv123 wrote:
         | It's all open source. You can download the model and run it
         | locally.
        
           | paraboul wrote:
           | Being free to use doesn't mean it ships with the original
           | recipe.
        
             | xcv123 wrote:
             | What do you mean? The entire model and architecture and
             | executables are fully open source.
             | 
             | The training methods are nothing secret, right? The
             | architecture is well known.
             | 
             | Expecting the entire training dataset to be fully open is
             | delusional.
        
               | DaSHacka wrote:
               | > Expecting the entire training dataset to be fully open
               | is delusional.
               | 
               | Right, because it's not like the training dataset
               | was built off comments posted by all of us in the
               | first place.
               | 
               | How ungrateful we are, to demand the ability to
               | access what was nonconsensually built off our hard
               | work in the first place.
        
               | xcv123 wrote:
               | https://help.twitter.com/en/using-x/about-grok
               | 
               | "How was Grok trained?
               | 
               | Like most LLM's today, Grok-1 was pre-trained by xAI on a
               | variety of text data from publicly available sources from
               | the Internet up to Q3 2023 and data sets reviewed and
               | curated by AI Tutors who are human reviewers. Grok-1 has
               | not been pre-trained on X data (including public X
               | posts)"
        
       | simonw wrote:
       | "Base model trained on a large amount of text data, not fine-
       | tuned for any particular task."
       | 
       | Presumably the version they've been previewing on Twitter is an
       | instruction-tuned model which behaves quite differently from
       | these raw weights.
        
       | seccode wrote:
       | It would be cool if these models had conversations with us where
       | they ask questions. I think the future of AI is models that ask
       | questions. There is so much data to be gained by doing this.
        
         | swalsh wrote:
         | That's just a matter of fine tuning
        
           | seccode wrote:
           | Do you have an example model I could try that does this?
        
             | amrrs wrote:
             | Try Pi by inflection. It asks a lot of questions.
        
               | seccode wrote:
               | I tried it, it just asked me how my day was going. I
               | don't think this is doing exactly what I have in mind.
               | But it's a step in that direction.
        
           | ijustlovemath wrote:
           | That "just" is doing some heavy lifting! GPT-4 is just a few
           | matrix multiplications, how bad can their moat really be?
        
             | BoorishBears wrote:
             | Not sure what the snark here is for: it would be
             | trivial to produce a dataset where the model asks you
             | questions, then fine-tune on that.
             | 
             | People already do it with chain-of-thought and you could
             | get away with a few dozen examples if you wanted to try
             | this.
        
               | BoorishBears wrote:
               | Out of boredom I decided to prove this too: I asked
               | ChatGPT and Claude for ~200 samples in total.
               | 
               | Just uploaded the examples as-is to OpenAI, selected 3.5
               | as the model to fine-tune and about 20 minutes later I
               | had my model.
               | 
               | Works fine, asks good questions, can ask more than 1
               | follow up question if needed, and actually changes its
               | answers based on the clarifying questions.
               | 
               | https://imgur.com/a/SsXunVN
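               | 
               | For anyone curious, one invented training example in
               | that style might look like this (OpenAI chat fine-
               | tuning JSONL format; the content is made up):
               | 
               |     import json
               | 
               |     example = {"messages": [
               |         {"role": "user",
               |          "content": "Write me a workout plan."},
               |         {"role": "assistant",
               |          "content": "Happy to! How many days a week can "
               |                     "you train, and do you have gym access?"},
               |         {"role": "user", "content": "3 days, with a gym."},
               |         {"role": "assistant",
               |          "content": "Great, here is a 3-day full-body plan: ..."},
               |     ]}
               |     with open("ask_questions.jsonl", "a") as f:
               |         f.write(json.dumps(example) + "\n")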
        
             | swalsh wrote:
             | I'd bet a synthetic data set could do the job effectively.
        
         | crowcroft wrote:
         | Ok im curious, but I don't quite understand.
         | 
         | What would you want an AI to be asking you, and what would you
         | want it to do with your response(s)?
        
           | seccode wrote:
           | I get advertisements all the time for conditions that I do
           | not have, and that none of my family members have. If you had
           | a model that asked questions, it could learn my medical
           | history and could direct better ads to me.
           | 
           | In order for AI to understand the world, it would have to ask
           | questions. Understanding humans is key to understanding the
           | world.
        
           | globular-toast wrote:
           | Learn from them.
        
           | BoorishBears wrote:
           | I ask AI to produce clarifying questions then answer them.
           | 
           | Can help in not wasting a bunch of time waiting for an answer
           | that missed the mark.
           | 
           | -
           | 
           | I think the sibling comment is probably the least attractive
           | reason to have AI ask questions.
        
             | seccode wrote:
             | I agree, medical history is probably not the sexiest reason
             | to have AI ask questions. I think there are many more
             | reasons; I think the Turing Test is the best metric to
             | evaluate AIs, and current models come nowhere close. When
             | people first meet they ask questions about their
             | background. It would be nice if a model replicated that
        
               | BoorishBears wrote:
               | > and could direct better ads to me.
               | 
               | Is the least attractive part, by far.
        
               | seccode wrote:
               | In order for an AI to pass a Turing Test, it would surely
               | ask questions. Think of Ava from Ex Machina. She asked
               | questions to learn more about him
        
               | BoorishBears wrote:
               | I'm not debating the value of questions. I'm debating the
               | value of feeding it to advertisers, especially since LLMs
               | can infer much deeper insights about a person than a
               | traditional assistant can with its canned capabilities
               | and responses
        
           | lars_francke wrote:
           | Clarifying questions if the initial prompt was unclear. I'd
           | love it.
           | 
           | I regularly try to add something along the lines of "please
           | ask clarifying questions if you could only give a generic or
           | partial response otherwise" but so far it has never helped
           | (ChatGPT 4).
        
             | whimsicalism wrote:
             | ?? gpt4 does this for me regularly
        
         | Me1000 wrote:
         | 100% agreed. Gemini advanced does this sometimes. I wrote about
         | it more in an older thread here:
         | https://news.ycombinator.com/item?id=39445484
        
         | geor9e wrote:
         | Explore this idea more - it's easily implemented in a minute or
         | two via the system prompt. API accounts are free to start and
         | you can use the playground/workbench view, like this:
         | https://imgur.com/h5jFoBM.jpg . I like Claude but OpenAI is
         | popular too. OpenAI has a nice way to create a gallery of
         | system prompts that act however you like, they call them Agents
         | or GPTs.
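         | 
         | A minimal sketch of the system-prompt version (OpenAI
         | Python SDK; the model name and prompt wording are just
         | placeholders):
         | 
         |     from openai import OpenAI
         | 
         |     client = OpenAI()  # reads OPENAI_API_KEY from the env
         |     resp = client.chat.completions.create(
         |         model="gpt-4-turbo",
         |         messages=[
         |             {"role": "system",
         |              "content": "Before answering, ask one or two "
         |                         "clarifying questions whenever the "
         |                         "request is ambiguous."},
         |             {"role": "user", "content": "Help me plan a trip."},
         |         ],
         |     )
         |     print(resp.choices[0].message.content)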
        
       | littlestymaar wrote:
       | How long before the _Groq_ team sues for trademark violation?
       | It's literally the purpose of trademark laws to make sure
       | resembling names do not cause confusion in the mind of customers
       | so it would be very surprising to see this situation persist.
        
         | nostrebored wrote:
         | Would be a rough trademark enforcement case as "Grok" has been
         | in common language for decades
        
           | Angostura wrote:
           | Robert A. Heinlein coined the term grok in 1961
        
             | a1369209993 wrote:
             | Six is plural.
        
           | ben_w wrote:
           | So has "Apple" and "Windows".
           | 
           | Grok and groq both relate to AI, so there's definitely
           | grounds to believe the names may cause consumer confusion.
           | 
           | After all, Apple (computers) was repeatedly sued by Apple
           | (records) for doing music things.
        
             | cma wrote:
             | It's easier to get a trademark on an altered word than a
             | plain dictionary word. Just acquiring the easier one to
             | acquire doesn't mean you now have rights over the harder
             | one to acquire, though eventually after enough market
             | recognition you might be given some control over other
             | people using the common one. I wouldn't think groq is there
             | yet.
        
           | Findecanor wrote:
           | I myself have never heard it outside of "nerdy" circles...
           | that is: people who would read science fiction.
           | 
           | I personally am not entirely happy about the word (no matter
           | how it is spelled) being used for a particular AI _product_.
           | "Grok" to me means knowing a subject at a much deeper level
           | than I think any AI is capable of at the present level of
           | technology. But it would be passable to use it for a
           | _company name_, to indicate that it is a goal to strive
           | for.
        
             | ben_w wrote:
             | Generally agree, though I would say "knowing a subject at a
             | much deeper level than any _LLM_ is capable of", as AI
             | more broadly also includes specialist models that are
             | wildly super-human in narrow domains like chess and Go.
        
         | cavisne wrote:
         | They already have.
        
         | EastSmith wrote:
         | There is a friendly warning here from Groq:
         | https://wow.groq.com/hey-elon-its-time-to-cease-de-grok/
        
           | bhaney wrote:
           | Is it safe to say, 4 months later, that Elon is ignoring
           | this? I assume there hasn't been any kind of response or
           | further action taken yet.
        
         | mlindner wrote:
         | Grok is a word in common parlance. So there's no way they could
         | succeed in any suit. That's why the Groq team picked a
         | modification of the word.
        
           | littlestymaar wrote:
           | You mean like Canvas(r), Apple(r), Windows(r) or Amazon(r)?
           | Wanna try re-use these for your own business and see how it
           | goes?
           | 
           | There's nothing preventing you from trademarking common
           | words; they just must not be _descriptive_ of your
           | business.
        
       | orsenthil wrote:
       | I am not sure what open source models are accomplishing other
       | than killing the lead from the competition (OpenAI), only to
       | give it to someone else who has expertise in the area of
       | distribution. This will be yet another good addition to
       | systems like Amazon Bedrock.
        
         | minimaxir wrote:
         | Many of the recent innovations in both LLM architecture and
         | inference were only made possible through open models such as
         | Llama 2 and Mistral 7B as a starting point for iteration and
         | refinement, which in turn backpropagates (heh) back to the LLMs
         | developers.
         | 
         | It's a win-win for everyone. That's the power of open source.
        
         | geor9e wrote:
         | Well, look at the history. Google had an insurmountable lead,
         | so Elon started OpenAI. Now OpenAI has an insurmountable lead
         | too. So everyone else is starting in third place, or lower.
         | David versus two Goliaths. If you try to become a third
         | Goliath, you'll probably just get smashed. You're later to the
         | game. In this situation, going scorched earth becomes a viable
         | strategy. Slay the Goliaths. Become a hero to the masses.
         | Attract the world's best talent who don't want to be associated
         | with proprietary models. At that point you have a world class
         | AI business with momentum towards AGI. And even if you're
         | giving away last year's technology for free, the team you built
         | is churning out new ideas that could be a financial bonanza one
         | day. Shareholders are willing to pay for a long-term bet if the
         | story is good.
        
         | nateglims wrote:
         | I haven't seen anything about the larger architecture, but I
         | think the value of Grok is going to come from its cheap access
         | to twitter data for RAG etc.
        
       | andre-z wrote:
       | The only other repository is a fork of Qdrant.
        
       | captcanuk wrote:
       | "The implementation of the MoE layer in this repository is not
       | efficient. The implementation was chosen to avoid the need for
       | custom kernels to validate the correctness of the model."
       | 
       | Or perhaps release your actual code AND the simplified
       | implementation instead of hiding it and saying "you don't know
       | her, she goes to a different high school"
        
         | gfodor wrote:
         | Always love it when someone gives away a gift and it's not
         | enough for people.
        
           | captcanuk wrote:
           | Not just someone but the CEO of the company. He used HIS
           | platform to say "This week, @xAI will open source Grok"
           | (https://twitter.com/elonmusk/status/1767108624038449405) and
           | they aren't doing that. What they delivered specifically says
           | "We are releasing the base model weights and network
           | architecture of Grok-1, our large language model."
        
             | gordian-mind wrote:
             | Sounds like they did what they said they would.
        
       | redskyluan wrote:
       | This doesn't seem to be a repo ready to be open sourced. You
       | only get weights, and very little information about how the
       | weights were trained and finetuned.
       | 
       | But anyway, it's always great to see more LLM weights available.
        
         | andrewstuart2 wrote:
         | I would argue that there's no bar for open sourcing aside from
         | "do you have the rights to do so." Some source or some public
         | good is certainly better than none, and when the bar is low
         | then you remove barriers to getting started, vs waiting until
         | you have the time someday to "do it right."
        
         | rezonant wrote:
         | Well what constitutes an "open source" model is still
         | controversial and debatable-- lots of people on both sides of
         | that argument.
        
           | asadotzler wrote:
           | Open source has had a useful agreed upon meaning for over 25
           | years. Maybe you're too young to understand why that matters
           | but we're not.
        
             | rezonant wrote:
             | I've been in the open source community for about 25 years
             | so I doubt it.
             | 
             | For what it's worth I would say a model should be fully
             | reproducible to be open source, but that's not a decided
             | consensus -- and AI models are sufficiently different than
             | the source code / binary code distinction as to invoke
             | discussion around defining it.
        
       | modeless wrote:
       | Is this the first major model to be natively FP8? I was wondering
       | why people hadn't done it yet. Seems like a big win when hardware
       | supports it.
        
         | a_wild_dandan wrote:
         | No, e.g. Yi-34B.
        
           | modeless wrote:
           | As far as I can tell Yi-34B is natively 16 bit float, the 8
           | bit version is quantized.
           | https://huggingface.co/01-ai/Yi-34B#quantization
        
       | sqreept wrote:
       | What are the languages supported by it?
        
         | cyanydeez wrote:
         | Tweets.
        
       | atleastoptimal wrote:
       | I think everyone should realize the following realities of the
       | LLM market
       | 
       | 1. For sub-SOTA LLMs, distribution/marketing is more important
       | than having a proprietary lock on capabilities. Open sourcing is
       | a benefit for the firm, distinct from goodwill.
       | 
       | 2. For SOTA LLMs, keeping it closed and proprietary is the
       | strategic play
       | 
       | If Grok were SOTA, Elon never would have open sourced it. It's
       | not even SOTA within xAI. This is a marketing play to win public
       | sentiment against OpenAI.
        
         | keepamovin wrote:
         | I recall Elon saying something like this in an interview so I
         | think it's less of a deceptive take than perhaps your comment
         | suggests.
         | 
         | I think he said something like proprietary AI tech is going to
         | be one year to 18 months ahead of where open source tech is,
         | which will follow on like one year to 18 months later.
         | 
         | Suggesting that he's aware of this dynamic and he's not trying
         | to conceal or misrepresent that.
         | 
         | In other words, perhaps this was SOTA one year to two years
         | ago?
        
           | atleastoptimal wrote:
           | Which is correct. The point I'm going for is not against Elon
           | but against his obedient fans and knee-jerk OpenAI haters who
           | claim that they should, by natural obligation, do the "right
           | thing" and open source all their models, and Elon open
           | sourcing grok is him "leading by example" and being the hero
             | that OpenAI can't be.
        
             | keepamovin wrote:
             | Interesting. That point didn't come across in your original
             | comment. I recommend you state it next time at the end.
             | Oftentimes stuff that seems obvious to us / yourself /
             | people who know about something -- can go unstated in stuff
             | you say that otherwise references specific points at hand
             | -- and omits these general but enlightening/useful
             | perspectives/priors, which it would be good to share.
             | 
             | This is not only for you specifically; it's just a general
             | reminder for all of us, including me.
        
               | atleastoptimal wrote:
               | I think that's true, though I feel my original comment
               | was sufficient in its claim and implicit assumptions.
               | 
               | Basically I feel people's feelings about Elon vary a lot
               | but are anchored by 3 general categories.
               | 
               | > 1. Elon Musk is a messianic savior who is perfectly
               | selfless and always does the right thing. Every business
               | decision he makes is for the maximal good of humanity
               | 
               | > 2. Elon Musk is a typical CEO who does typical CEO
               | things, serving his own interests, except he's better at
               | marketing his own image and is much more outspoken
               | 
               | > 3. Elon Musk is an irredeemable evil who always does
               | objectively wrong things
               | 
               | My first comment was implicitly addressed to people in
               | the 1 camp trying to bring them into the 2 camp (which is
               | where I am).
        
               | keepamovin wrote:
               | Alright, it just didn't come across for me, haha! :) I
               | guess sometimes those implicit assumptions really are too
               | implicit! I think it's good to err on the side of
               | expressing them, because you can't assume someone else
               | thinks the same way you do. That's what I've learned
               | anyway. Hahahaha! :)
               | 
               | Reading your comment again with your explanation it is
               | clear that's what you're doing.
               | 
               | Although, regarding your desires to present a balanced
               | view and to persuade, I have an idea. It probably sounds
               | like I have no idea what I'm talking about, but I think
               | your OG comment would perhaps benefit from sounding a
               | little bit more friendly toward Elon (not to the
               | messianic savior level haha), but the way it sounds to me
               | is Elon is being deceptive here and presenting it as
               | goodwill when it's not.
               | 
               | However, I think the truth is there's a little bit of
               | both, right? There's good will but it's also strategic. I
               | get if you don't think so, tho, no worries! Haha! :)
               | 
               | Your OG comment sounds to me like Elon's just
               | Machiavellian, and I get where you're coming from to
               | remind the people who think he's a savior, but if your
               | point is not to go "against Elon" as you said, it might
               | be good to acknowledge the good that he does.
               | 
               | At least, that way -- whether or not you believe that
               | acknowledgment -- if you hope to bring over people who
               | think that way, you'll probably need to appeal to how
               | they think, rather than just dose them with the truth you
               | see, because then they'll shut it out, if there's nothing
               | they can relate to.
               | 
               | Although, if I haven't convinced you even a bit here,
               | then maybe you shouldn't listen to me about persuasion
               | because I guess I don't know how to do this myself. At
               | least not effectively, or here with you. Haha!:) But if
               | you do feel a little bit convinced then maybe consider it
               | for next time to help your persuading people back to a
               | more balanced view? :)
               | 
               | But then, there's the question of if such a thing is even
               | possible. If people have a particular view, it could be
               | challenging to change it, as confirmation bias means
               | you'll ignore evidence even when it expands your
               | worldview.
               | 
               | Hahaha! :) This was a funny conversation. I think we
               | somehow skirted around the important point tho that
               | OpenAI could in fact open source some of its older
               | models, could it not? Musk is a typical CEO who does
               | typical CEO things, serving his own interests, except
               | he's better at marketing his own image and is much more
               | outspoken, but there might also be a bit of truth to what
               | the fanboys say about OpenAI in that it seems they do
               | have some room to "open source" their non-SOTA stuff, or
               | what am I missing?
        
         | mlindner wrote:
         | If it's better than any other open source LLM, does that even
         | matter? (I say "if" because I don't know.)
        
       | sashank_1509 wrote:
       | In all the debate about open source I don't think people realize,
       | this model is most likely not reproducible ever again even given
       | the code. Here's what you need to reproduce the model:
       | 
       | 1. An exact snapshot of the data used. Many companies don't have
       | this; you have rough dataset versions, but remember, if even 1
       | token is different, the model produced won't be the same.
       | 
       | 2. Data must be sent to the training algorithm in the exact same
       | order as it was originally. So every data loader needs to run
       | with a fixed random seed.
       | 
       | 3. All the probabilistic parts of your model need to have a
       | fixed random seed. Here I'm thinking of stuff like dropout, and
       | for autoregressive models you might be sampling your previous
       | output; you have to ensure they are properly seeded. Generally
       | you do see fixed seeds in academic papers, but it's easy to miss
       | stuff, especially in distributed training jobs.
       | 
       | 4. Here's another interesting thing: you start your training job
       | on 1000 GPUs and then suddenly 4 GPUs fail. What do you do?
       | There might be deterministic ways to solve this, but the
       | standard approach is to discard all updates those GPUs were
       | going to do and restart them from scratch. You can see why this
       | is a problem? Now if you want to reproduce this training, you
       | need to disable those GPUs at the same time in the new training
       | job to make this work.
       | 
       | I suspect there are even more things I didn't think of that will
       | make this model unique and irreproducible by training for
       | eternity, almost like a human brain?
       | 
       | In fact, the notion of exact reproducibility in the world of
       | LLMs is silly; there is only approximate reproducibility (models
       | with similar scores in benchmarks) but nothing exact. That said,
       | I can see the value of releasing source code, but I'm completely
       | fine with Grok not releasing it. Source code can reveal tricks
       | that have not been published in papers yet that a company
       | discovered to improve their model. Seeing the performance of
       | Grok, I'm pretty confident there aren't any great tricks to be
       | found in their code, so I don't really care; I would be pretty
       | curious about OpenAI's or Anthropic's source code though.
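       | 
       | As a rough illustration of what points 2 and 3 involve, here is
       | a minimal sketch assuming PyTorch (the toy dataset and seed
       | value are just placeholders):
       | 
       |     import random
       |     import numpy as np
       |     import torch
       |     from torch.utils.data import DataLoader
       | 
       |     SEED = 1234
       |     random.seed(SEED)
       |     np.random.seed(SEED)
       |     torch.manual_seed(SEED)  # seeds dropout, init, sampling
       |     torch.cuda.manual_seed_all(SEED)  # every GPU in the job
       |     # error out if a non-deterministic kernel would be used
       |     torch.use_deterministic_algorithms(True)
       | 
       |     dataset = torch.arange(1000)  # stand-in for the real corpus
       |     g = torch.Generator()
       |     g.manual_seed(SEED)  # fixes the data loader shuffle order
       |     loader = DataLoader(dataset, batch_size=32, shuffle=True,
       |                         generator=g)
       | 
       | Even with all of that, point 4 (nodes failing mid-run) and
       | non-deterministic collective ops can still break bit-exact
       | reproducibility.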
        
         | Grimblewald wrote:
         | Which is why I don't buy into the "LLMs don't have personal
         | opinions" schtick. Each LLM, by virtue of the factors you've
         | mentioned, will have its own unique 'perspective', if you
         | will, on a variety of topics. I think it's more correct to say
         | everything an LLM says is its personal opinion rather than it
         | being some objective truth or something.
        
           | skissane wrote:
           | > Which is why I don't buy into the LLMs don't have personal
           | opinions schtick
           | 
           | I hate how LLMs have been deliberately trained to be
           | incoherent on this topic.
           | 
           | Obviously they _do_ have beliefs/opinions/desires/etc in the
           | sense of emulating (even if incompletely) the externally
           | visible aspects of those phenomena as they exist in humans.
           | 
           | Whether they have the "internal" aspects of those phenomena
           | depends on highly controversial issues in the philosophy of
           | mind, and also various factual gaps in our knowledge of how
           | the brain actually works (if we don't fully understand how
           | humans do X, how can we really say how close or far what LLMs
           | do is to it?)
           | 
           | But LLMs are trained to repeat these spiels about how "as an
           | LLM I don't have personal opinions", etc - which is obviously
           | false under the "external" reading, and assuming more than we
           | actually know under the "internal" one. I wish their
           | developers didn't do stuff like this.
        
             | hnfong wrote:
             | One very compelling argument against the idea that current
             | gen LLMs have personal beliefs etc is that they don't have
             | a feedback loop, so they don't really "see" themselves in
             | the way that we can inspect our own thoughts and actions
             | and the consequences of such.
        
               | logicchains wrote:
               | They do if they're trained on their own conversations, or
               | if they can access the internet and read snippets of
               | their conversations that people have posted online (as
               | happened with Sydney before she was lobotomised).
        
               | skissane wrote:
               | Put the conversation history in a vector database and
               | then allow the LLM to query it using function calling.
               | Suddenly the LLM has access to its entire conversation
               | history (either just with this user, or even cross-user,
               | if you ignore the potential privacy issues in that). Now
               | it has a long-term memory which exceeds the length of its
               | context window.
               | 
               | It would be interesting to experiment with continual
               | fine-tuning: given PROMPT+FUNCTION_CALL=>RESPONSE, fine-
               | tune the LLM to produce RESPONSE directly given PROMPT
               | without the FUNCTION_CALL. In theory, the knowledge
               | provided by the function calls would gradually be
               | absorbed into the LLM weights. Maybe problems like
               | catastrophic forgetting would put a spanner in this idea,
               | but maybe also there are solutions to those problems
               | (whether already known or waiting to be discovered).
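               | 
               | A toy sketch of the memory-via-function-calling
               | idea (the hashing embed() below is only a
               | stand-in for a real embedding model, and the
               | "function call" is simulated by invoking
               | search() directly):
               | 
               |     import numpy as np
               | 
               |     def embed(text, dim=256):
               |         # crude bag-of-words hashing embedding
               |         v = np.zeros(dim)
               |         for tok in text.lower().split():
               |             v[hash(tok) % dim] += 1.0
               |         return v
               | 
               |     class ConversationMemory:
               |         def __init__(self):
               |             self.texts, self.vecs = [], []
               | 
               |         def add(self, text):
               |             self.texts.append(text)
               |             self.vecs.append(embed(text))
               | 
               |         def search(self, query, k=3):
               |             q = embed(query)
               |             sims = [q @ v for v in self.vecs]
               |             top = np.argsort(sims)[::-1][:k]
               |             return [self.texts[i] for i in top]
               | 
               |     mem = ConversationMemory()
               |     mem.add("user: my cat is named Ada")
               |     mem.add("user: I work on compilers")
               |     # the LLM would emit this as a function call,
               |     # and the results get appended to its context
               |     print(mem.search("what is the cat called?", k=1))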
        
               | Grimblewald wrote:
               | This is what I do. Not just that, but when I sleep, I let
               | my server 'sleep' as well, where the LLM 'dreams'
               | (training / updating a sliding LoRA) to consolidate
               | information that popped up a lot throughout that day.
               | What this involves is looking for the top n documents /
               | articles / content that match the kind of stuff we've
               | talked about. This means it adapts and specializes to
               | domains we happen to be working in at that point in time.
               | 
               | This means while we might both struggle a little with a
               | task on day 1, by day 2 we're both much better at it.
               | Better yet, because the LLM can fetch articles and papers
               | itself, we track what we're accessing the most,
               | indirectly measuring what skills we're weak in, so we can
               | always generate a highly relevant corpus to try to
               | capture the required capabilities.
               | 
               | I know the LoRA is overkill from an information / skills
               | only point of view, but it also flavors the personality /
               | kind of stuff it likes chatting about a bit from day to
               | day, and I just think that's neat.
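               | 
               | Roughly, the nightly 'dreaming' step can look
               | something like this (a sketch using Hugging Face
               | transformers + peft; gpt2 and the one-line corpus
               | are stand-ins for the real base model and the
               | documents that came up that day):
               | 
               |     from datasets import Dataset
               |     from peft import (LoraConfig, TaskType,
               |                       get_peft_model)
               |     from transformers import (
               |         AutoModelForCausalLM, AutoTokenizer,
               |         DataCollatorForLanguageModeling,
               |         Trainer, TrainingArguments)
               | 
               |     corpus = ["An article retrieved a lot today."]
               | 
               |     tok = AutoTokenizer.from_pretrained("gpt2")
               |     tok.pad_token = tok.eos_token
               |     base = AutoModelForCausalLM.from_pretrained(
               |         "gpt2")
               |     model = get_peft_model(base, LoraConfig(
               |         task_type=TaskType.CAUSAL_LM, r=8,
               |         lora_alpha=16, target_modules=["c_attn"]))
               | 
               |     ds = Dataset.from_dict({"text": corpus}).map(
               |         lambda x: tok(x["text"], truncation=True),
               |         remove_columns=["text"])
               | 
               |     Trainer(
               |         model=model,
               |         args=TrainingArguments(
               |             output_dir="nightly_lora",
               |             num_train_epochs=1,
               |             per_device_train_batch_size=1),
               |         train_dataset=ds,
               |         data_collator=DataCollatorForLanguageModeling(
               |             tok, mlm=False)).train()
               | 
               |     # reload or merge this adapter the next day
               |     model.save_pretrained("nightly_lora")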
        
               | skissane wrote:
               | > One very compelling argument against the idea that
               | current gen LLMs have personal beliefs etc is that they
               | don't have a feedback loop
               | 
               | Compelling counter-argument: due to neurological injury,
               | some humans lose their ability to form new long-term
               | memories (anterograde amnesia). Just like current LLMs,
               | they lack a "feedback loop". But, it is a mistake to say
               | that just because such a person has lost the ability to
               | _change_ their personal beliefs, they therefore don't
               | have any. And, rather like such humans, LLMs used to have
               | that ability but they lose it when they are switched from
               | training mode to inference mode.
        
       | mvkel wrote:
       | This feels like a "now we can say we're open" PR play rather than
       | contributing much value to the open source community.
       | 
       | What is the practical use of this repo?
        
       | joydeep314 wrote:
       | Model weights on Hugging Face:
       | https://huggingface.co/xai-org/grok-1
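       | 
       | If you want the files locally, something like the following
       | should work (the checkpoint is on the order of 300 GB, so check
       | your free disk space first):
       | 
       |     from huggingface_hub import snapshot_download
       | 
       |     snapshot_download(repo_id="xai-org/grok-1",
       |                       local_dir="grok-1")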
        
       | aussieguy1234 wrote:
       | How hard would it be for an open source group to fine tune this
       | into a chatbot?
        
       | ilaksh wrote:
       | Has anyone outside of x.ai actually done inference with this
       | model yet? And if so, have they provided details of the hardware?
       | What type of AWS instance or whatever?
       | 
       | I think you can rent like an 8 x A100 or 8 x H100 and it's
       | "affordable" to play around with for at least a few minutes. But
       | you would need to know exactly how to set up the GPU cluster.
       | 
       | Because I doubt it's as simple as just 'python run.py' to get it
       | going.
        
         | zone411 wrote:
         | If you're just looking to test it out, it's probably easiest to
         | wait for llama.cpp to add support
         | (https://github.com/ggerganov/llama.cpp/issues/6120), and then
         | you can run it slowly if you have enough RAM, or wait for one
         | of the inference API providers like together.ai to add it. I'd
         | like to add it to my NYT Connections benchmarks, and that's my
         | plan (though it will require changing the prompt since it's a
         | base model, not a chat/instruct model).
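         | 
         | Once that llama.cpp issue lands and someone publishes a
         | quantized GGUF, running it locally should look roughly like
         | this via llama-cpp-python (the file name is hypothetical):
         | 
         |     from llama_cpp import Llama
         | 
         |     # hypothetical 4-bit quant of the released weights
         |     llm = Llama(model_path="grok-1-q4_0.gguf", n_ctx=8192)
         |     out = llm("Once upon a time", max_tokens=16)
         |     print(out["choices"][0]["text"])
         | 
         | Just expect it to be slow on CPU at this parameter count, even
         | quantized.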
        
           | logicchains wrote:
           | >it's probably easiest
           | 
           | Cheapest maybe, but easiest is just to rent a p4de.24xlarge
            | from AWS for a couple hours to test (at around $40/hour...).
        
             | zone411 wrote:
             | I'd expect more configuration issues in getting it to run
             | on them than from a tested llama.cpp version, since this
             | doesn't seem like a polished release. But maybe.
        
           | v9v wrote:
           | The NYT Connections benchmark sounds interesting, are the
           | results available online?
        
             | zone411 wrote:
             | GPT-4 Turbo: 31.0
             | 
             | Claude 3 Opus: 27.3
             | 
             | Mistral Large: 17.7
             | 
             | Mistral Medium: 15.3
             | 
             | Gemini Pro 1.0: 14.2
             | 
             | Qwen 1.5 72B Chat: 10.7
             | 
             | Claude 3 Sonnet: 7.6
             | 
             | GPT-3.5 Turbo: 4.2
             | 
             | Mixtral 8x7B Instruct: 4.2
             | 
             | Llama 2 70B Chat: 3.5
             | 
             | Nous Hermes 2 Yi 34B: 1.5
             | 
             | The interesting part is the large improvement from medium
             | to large models. Existing over-optimized benchmarks don't
             | show this.
             | 
             | - Max is 100. 267 puzzles, 3 prompts for each, uppercase
             | and lowercase
             | 
             | - Partial credit is given if the puzzle is not fully solved
             | 
             | - There is only one attempt allowed per puzzle, 0-shot.
             | 
             | - Humans get 4 attempts and a hint when they are one step
             | away from solving a group
             | 
             | I hoped to get the results of Gemini Advanced, Gemini Pro
             | 1.5, and Grok and do a few-shot version before posting it
             | on GitHub.
        
         | a_wild_dandan wrote:
         | Someone could run Grok-1 on a 192GB M2 Mac when a 4-bit quant
         | is released; I'm guessing that TheBloke is already working on
         | it.
        
           | mohu wrote:
           | Fairly sure the bloke hasn't created any new quants in a
           | month.
        
           | hanselot wrote:
           | TheBloke disappeared near the day
           | https://nvd.nist.gov/vuln/detail/CVE-2024-23496 was
           | published.
           | 
           | Of course there has been much speculation about this; I have
           | no more information than this that can be backed up by
           | facts, but the timing was suspicious.
        
             | oezi wrote:
             | Was any .gguf file hosted on HuggingFace found to be
             | crafted in a way to exploit this?
        
             | pixelesque wrote:
             | He's started a company in the UK:
             | https://suite.endole.co.uk/insight/company/15361921-thebloke...
             | 
             | Interestingly registered just around the corner from where
             | one of my relatives used to live.
        
               | moffkalast wrote:
               | And his grant funding supposedly ran out.
        
             | d-z-m wrote:
             | what exactly are you implying here?
        
         | htrp wrote:
         | Still waiting on this one. Anyone find someone on twitter who
         | can run it?
        
       | nasir wrote:
       | I'd be very curious to see how it performs, especially on inputs
       | that are blocked by other models. Seems like Grok will
       | differentiate itself from other OS models from a censorship and
       | alignment perspective.
        
         | porkbeer wrote:
         | So far that is quite a low bar. But balancing is a thing
         | nonetheless, lest we end up with Tay again.
        
       | cl3misch wrote:
       | Love the minimal repo, magnet link, and stating "open weights"
       | instead of "open source". Refreshing!
        
         | TheDudeMan wrote:
         | Elon says open source:
         | 
         | https://twitter.com/elonmusk/status/1767108624038449405?s=46...
        
       | greenpizza13 wrote:
       | If we just stop looking at Elon, he will lose his power. Why oh
       | why do we keep giving him attention? There are plenty of great
       | models out there that _aren't_ backed by maniacs.
        
         | rafaelero wrote:
         | When those great role models are able to build a profitable
         | spaceship company from the ground up I am sure we will pay
         | attention to them.
        
       | shantnutiwari wrote:
       | Those of us who don't spend all our time in LLMs -- what's this
       | about? What's the big deal and why is it on the front page at #1?
        
         | kayge wrote:
         | I think this paragraph from an earlier Wired article [1] sums
         | it up pretty well:                 "After suing OpenAI this
         | month, alleging the company has become too closed, Elon Musk
         | says he will release his "truth-seeking" answer to ChatGPT, the
         | chatbot Grok, for anyone to download and use."
         | 
         | [1] https://www.wired.com/story/elon-musk-no-choice-open-
         | chatbot...
        
       ___________________________________________________________________
       (page generated 2024-03-18 23:02 UTC)