[HN Gopher] Grok
       ___________________________________________________________________
        
       Grok
        
       Author : pierre
       Score  : 582 points
       Date   : 2024-03-17 19:33 UTC (3 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | tosh wrote:
        | blog post: https://x.ai/blog/grok-os
        | 
        |   * 314B parameters (86B active at a time)
        |   * mixture of experts: 8 (2 active at a time)
        |   * weights and architecture licensed under Apache 2.0
       | 
       | (edit:) announcement blog post from last year with benchmarks
       | compared to Claude 2, GPT-3.5 and GPT-4: https://x.ai/blog/grok
       | 
        | (edit 2:) TL;DR: somewhat comparable to GPT-3.5, Mixtral and
        | Qwen-1.5-72B in capability, but way larger than the open-weight
        | models
        
         | TOMDM wrote:
         | Mixtral is also comparable to gpt 3.5 and open.
         | 
         | At 8x7B it's also a fraction of the size. Are there any
         | benchmarks comparing Mixtral to Grok?
        
           | tosh wrote:
           | Mixtral announcement is here:
           | https://mistral.ai/news/mixtral-of-experts/
           | 
           | Mixtral looks more economical @ capability to size (similar
           | also for Qwen 1.5 72b)
        
         | OkGoDoIt wrote:
          | Is a model this huge that's only at the level of GPT-3.5
          | actually good? That seems incredibly inefficient to me.
        
           | cma wrote:
            | Since it is MoE, once quantized it could run on cheaper
            | hardware with just consumer networking in between, instead
            | of needing Epyc/Xeon levels of PCIe lanes, NVLink, or
            | InfiniBand-type networking. It could even run with people
            | pooling smaller systems over slow internet links.
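            | 
            | A rough back-of-envelope sketch of why (assuming ~4-bit
            | quantization and a naive even split across experts; the
            | real layout surely differs):
            | 
            |     # Hypothetical sizing estimate, not published specs.
            |     TOTAL_PARAMS = 314e9   # Grok-1 total
            |     NUM_EXPERTS = 8
            |     BITS = 4               # assumed quantization
            |     
            |     total_gb = TOTAL_PARAMS * BITS / 8 / 1e9
            |     per_expert_gb = total_gb / NUM_EXPERTS
            |     
            |     print(f"total: ~{total_gb:.0f} GB")           # ~157 GB
            |     print(f"per shard: ~{per_expert_gb:.0f} GB")  # ~20 GB
            | 
            | At ~20 GB per expert shard, each expert fits on a single
            | 24 GB consumer card, and only the routed activations need
            | to cross the (slow) network.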
        
           | drak0n1c wrote:
           | It's designed to be actively searching real-time posts on X.
           | Apples and oranges.
        
             | grey8 wrote:
             | Why is that relevant to the size?
             | 
             | Post search on X is done as it is with any other data from
             | any other source, you use RAG and function calling to
             | insert the context.
             | 
              | < 7B open source models can function call very well. In
              | fact, Nous Hermes 2 Pro (7B) is benchmarking better at that
              | than GPT-3.5.
             | 
             | Not related to the size, if I'm not mistaken.
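              | 
              | A minimal sketch of that pattern (the tool name and
              | backend are made up for illustration):
              | 
              |     import json
              |     
              |     def search_posts(query: str) -> list[dict]:
              |         # Stand-in for a real post-search backend.
              |         return [{"text": f"post about {query}"}]
              |     
              |     TOOLS = {"search_posts": search_posts}
              |     
              |     # The model emits a tool call as JSON...
              |     out = ('{"tool": "search_posts", '
              |            '"args": {"query": "grok weights"}}')
              |     call = json.loads(out)
              |     posts = TOOLS[call["tool"]](**call["args"])
              |     
              |     # ...and the results go back into the context
              |     # (the RAG step). Model size is irrelevant here.
              |     context = "\n".join(p["text"] for p in posts)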
        
           | fwlr wrote:
            | OpenAI is valued at 90 billion and all they do is make GPT;
            | Twitter is valued at 40 billion and this was essentially a
            | vanity side-project by a cowboy CEO. Presuming the
            | benchmarks and the general sense that "it's about the level
            | of 3.5" are accurate, it's inefficient, but not incredibly
            | inefficient imho.
        
         | tootie wrote:
         | How is it that OpenAI was touted like it was some massive
         | years-long effort that blew all AI research out of the water
         | and now we have so many competitors popping up one after
         | another?
        
           | ben_w wrote:
           | Egg of Columbus.
           | 
           | Also, the general architecture is well documented, ChatGPT
           | (specifically the chat interface, not GPT-3, not InstructGPT)
            | is what made a lot of people _care_, and actually
           | reproducing it requires someone wanting to in the first
           | place.
        
           | longdog wrote:
           | You don't need to be a cutting edge research scientist to
           | train a SOTA LLM. You just need money for scaling. OpenAI's
           | "secret" was just their willingness to spend tens/hundreds of
           | millions without guaranteed returns, and RLHF/instruct fine
           | tuning, both of which are out of the bag now.
        
             | simonw wrote:
             | Disagree. It took more than 12 months from the release of
             | GPT-4 to someone else producing a model of equivalent
             | quality, and that definitely wasn't due to a shortage of
             | investment from the competition.
             | 
             | There's a huge amount of depth in training a really good
             | LLM. Not helped by the fact that iteration is incredibly
             | expensive - it might take several months (and millions of
             | dollars) before you can tell if your new model is working
              | well or if there was some mistake in the pipeline that led
              | to a poor-quality result.
             | 
             | Almost all of the world-class LLMs outside of
             | OpenAI/DeepMind have been trained by people who previously
             | worked at those organizations - giving them invaluable
             | experience such that they could avoid the most expensive
             | mistakes while training their new models.
        
               | lossolo wrote:
               | Don't overlook the training data (used for both training
               | and instruction fine-tuning), it is one of the most
               | crucial aspects, if not the most critical, given the
               | significant differences observed in models with similar
               | architectures.
        
               | echelon wrote:
               | That only remains an advantage if they can continue
               | climbing the gradient from their lead position. If they
               | hit a snag in scaling, methodology, or research, everyone
               | else on the planet catches up, and then it's anyone's
               | game again.
        
           | cavisne wrote:
           | LLM training is arcane and expensive to experiment with. So
           | OpenAI had to waste a lot of time and GPU-hours on things
           | that didn't work to learn the tricks that did work.
           | 
           | Most of the competitors have lineage straight back to OpenAI,
           | eg the lead of x.ai was previously at OpenAI and Deepmind.
           | Likewise with Mistral and especially Anthropic.
        
           | jxy wrote:
            | OpenAI still seems to be at the top, with Anthropic perhaps
            | close behind, comparing the capabilities of gpt-4 and
            | claude-opus.
           | 
            | This Grok-1 is a large model (~314B), which matches gpt-3.5
            | (released in late 2022), and is at about the same level as
            | much smaller models like Mixtral (~47B) and Qwen-1.5 (~72B).
            | Do you think it's competitive?
        
         | asciii wrote:
         | I love the citation for image in the article
         | 
         | > The cover image was generated using Midjourney based on the
         | following prompt proposed by Grok: A 3D illustration of a
         | neural network, with transparent nodes and glowing connections,
         | showcasing the varying weights as different thicknesses and
         | colors of the connecting lines.
        
       | extheat wrote:
       | At 8x86B, looks like the largest open model yet by far. Would be
       | interesting to hear how many tokens it's been trained on.
       | Especially important for higher param models in order to
       | efficiently utilize all those parameters.
        
         | p1esk wrote:
         | It's not 8x86B. Total number of parameters is 314B.
         | 
         | Perhaps it's 8x39B to fit on a single 8xA100 (40GB) server?
        
           | moffkalast wrote:
           | Most likely it's a MoE of Grok-0 which would be 8x33B + 50B
           | for the router.
        
           | cma wrote:
           | Active parameters is 86B, so wouldn't that be the size of the
           | largest two experts (where they may all be the same) + the
           | weights of the selector?
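            | 
            | A back-of-envelope decomposition under that reading
            | (assuming a clean shared-plus-experts split, which real
            | layouts only approximate):
            | 
            |     TOTAL, ACTIVE = 314e9, 86e9
            |     K, N = 2, 8  # experts active / experts total
            |     
            |     # shared + experts = TOTAL
            |     # shared + (K/N) * experts = ACTIVE
            |     experts = (TOTAL - ACTIVE) / (1 - K / N)
            |     shared = TOTAL - experts
            |     
            |     print(experts / N / 1e9)  # ~38B per expert
            |     print(shared / 1e9)       # ~10B shared (incl. router)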
        
         | swalsh wrote:
         | Considering how poor it is compared to other models, it really
         | emphasises how important fine tuning is. Models with MUCH
         | smaller parameter counts are outperforming it in many metrics.
        
           | lukan wrote:
           | "it really emphasises how important fine tuning is"
           | 
           | Or rather the quality of the training data?
        
             | fragmede wrote:
              | That's a subtle dig at the fact that they have all of
              | Twitter as a training corpus to use, but we don't know how
              | they weight tweets - which, we know, isn't gonna be
              | evenly.
        
               | rezonant wrote:
               | I'm sure just like in X's algorithms, @elon tweets are
               | weighted heavily.
        
               | convery wrote:
               | The X algorithm is also opensource, so you can verify
               | before commenting..
        
               | fragmede wrote:
               | just because they open sourced it doesn't mean that's
               | actually what they're running on it though
        
               | chrisco255 wrote:
               | It's not like he needs boosting, he was one of Twitter's
               | top followed accounts long before he bought them. He's
               | pretty good at getting attention.
        
               | lukan wrote:
                | No idea about the current state, but the open sourcing
                | did show they were favoring Elon:
                | 
                | https://mashable.com/article/twitter-releases-algorithm-show...
               | 
                | And personally I never used Twitter much, but I certainly
                | did not follow Elon Musk when I did - yet I had to see
                | lots of his posts in my feed. Surely just coincidence.
        
             | llm_trw wrote:
             | We don't know since no one is releasing their data.
             | 
             | Calling these models open source is like calling a binary
             | open source because you can download it.
             | 
              | Which in this day and age isn't far from where we're at.
        
               | DreamGen wrote:
                | A big distinction is that you can build on top of (fine-
                | tune) the released models, much as you could if they had
                | released the pre-training data.
        
               | llm_trw wrote:
               | You can also build on top of binaries if you use gotos
               | and machine code.
        
               | drexlspivey wrote:
               | Their data is the twitter corpus which is public. Or do
               | you want a dump of their database for free too?
        
               | swalsh wrote:
               | We should just call it open weight models at this point.
        
             | GaggiX wrote:
              | Or even how much it was trained on this dataset - the
              | amount of FLOPs.
        
           | lairv wrote:
           | I would say it emphasises that training a good model is more
           | than throwing random data and compute
        
       | hubraumhugo wrote:
        | When will we reach an upper limit/diminishing returns in terms of
       | number of parameters and mixture of experts?
        
         | andy99 wrote:
         | We may have already - data is more important than anything else
         | which is why nobody has beat GPT4 yet. Throwing more parameters
         | or more compute at the problem only gets you so far. But Grok
         | was never a contender so there is room to improve on it. It is
         | one of the biggest models open sourced as mentioned, so will be
         | interesting to take a look at for sure.
        
           | lambdaba wrote:
           | Claude 3 has *decisively* beat GPT-4, I wonder how all their
           | attributes compare.
        
             | stainablesteel wrote:
              | I like some of Claude's answers better, but it doesn't
              | seem to be a better coder imo.
        
               | simonw wrote:
               | I've found it to be significantly better for code than
               | GPT-4 - I've had multiple examples where the GPT-4
               | solution contained bugs but the Claude 3 Opus solution
               | was exactly what I wanted. One recent example:
               | https://fedi.simonwillison.net/@simon/112057299607427949
               | 
               | How well models work varies wildly according to your
               | personal prompting style though - it's possible I just
               | have a prompting style which happens to work better with
               | Claude 3.
        
               | bugglebeetle wrote:
               | What is your code prompting style for Claude? I've tried
               | to repurpose some of my GPT-4 ones for Claude and have
               | noticed some degradation. I use the "Act as a software
               | developer/write a spec/implement step-by-step" CoT style.
        
               | simonw wrote:
               | Almost impossible to describe prompting style, but here
               | are some examples of how I've used Claude 3:
               | 
                | https://gist.github.com/simonw/4cecde4a729f4da0b5059b50c8e01...
                | - writing a Python function
                | 
                | https://gist.github.com/simonw/408fcf28e9fc6bb2233aae694f8cd...
                | - most sophisticated example, building a JavaScript
                | command palette
                | 
                | https://gist.github.com/simonw/2002e2b56a97053bd9302a34e0b83...
                | - asking it to refactor some existing code
               | 
               | I don't use the "Act as a X" format any more, I'm not at
               | all convinced it has a noticeable impact on quality. I
               | think it's yet another example of LLM superstition.
        
               | lgas wrote:
               | > I don't use the "Act as a X" format any more, I'm not
               | at all convinced it has a noticeable impact on quality. I
               | think it's yet another example of LLM superstition.
               | 
                | It's very contextually dependent. You really have to test
                | things like this for your specific task, with your
               | specific model, etc. Sometimes it helps, sometimes it
               | hurts, and sometimes it does nothing at all.
        
               | bugglebeetle wrote:
               | Super helpful! Thanks!
        
               | furyofantares wrote:
               | I didn't know people were still doing this "act as etc
               | etc" instructional prompting.
               | 
               | I just tell it my coding problem. Or when making
               | something from scratch, ask for small things and
               | incrementally add.
        
               | asciii wrote:
               | > according to your personal prompting style though
               | 
               | I like the notion of someone's personal prompting style
               | (seems like a proxy for those that can prepare a question
               | with context about the other's knowledge) - that's
               | interesting for these systems in future job interviews
        
               | furyofantares wrote:
               | I've found it significantly better than GPT4 for code and
               | it's become my go-to for coding.
               | 
               | That's actually saying something, because there's also
               | serious drawbacks.
               | 
               | - Feels a little slower. Might just be UI
               | 
               | - I have a lot of experience prompting GPT4
               | 
                | - I don't like using it for non-code because it gives me
                | too much "safety" pushback
               | 
               | - No custom instructions. ChatGPT knows I use macos and
               | zsh and a few other preferences that I'd rather not have
               | to type into my queries frequently
               | 
               | I find all of the above kind of annoying and I don't like
               | having two different LLMs I go to daily. But I mention it
               | because it's a fairly significant hurdle it had to
               | overcome to become the main thing I use for coding! There
               | were a number of things where I gave up on GPT then went
               | to Claude and it did great; never had the reverse
               | experience so far and overall just feels like I've had
               | noticeably better responses.
        
             | htrp wrote:
             | citation needed (other than 'vibes')
        
             | swalsh wrote:
             | I don't know if Claude is "smarter" in any significant way.
             | But its harder working. I can ask it for some code, and I
             | never get a placeholder. It dutifully gives me the code I
             | need.
        
               | lambdaba wrote:
               | It understands instructions better, it's rarer to have it
               | misunderstand, and I have to be less careful with
               | prompting.
        
             | orbital-decay wrote:
             | Has it, though? LMSys Arena Leaderboard (blind ranking by
             | humans) [0] positions Opus just below GPT-4 with a
              | negligible Elo gap.
             | 
             | [0] https://chat.lmsys.org/
        
               | espadrine wrote:
               | A number of AI companies have a naming/reproducibility
               | issue.
               | 
                | GPT-4 Turbo, released last November, is a separate
                | version that is much better than the GPT-4 released in
                | March 2023 (winning 70% of human preferences in blind
                | tests).
               | 
               | Claude 3 Opus beats release-day GPT-4 (winning 60% of
               | human preferences), but not GPT-4 Turbo.
               | 
               | In the LMSys leaderboard, release-day GPT-4 is labeled
               | gpt-4-0314, and GPT4 Turbo is labeled gpt-4-1106-preview.
        
           | squigz wrote:
           | I think Groq is something else?
        
             | LorenDB wrote:
             | Indeed, Groq is a company building inference accelerators.
             | Grok is completely unaffiliated.
        
             | andy99 wrote:
             | Edited, I did mean the Grok in the article not the
             | inference chip.
        
           | YetAnotherNick wrote:
            | There is no reason to believe GPT-4 had more (or higher
            | quality) data than Google etc. has now. GPT-4 was entirely
            | trained before the Microsoft deal. If OpenAI could pay to
            | acquire data in 2023, >10 companies could acquire similar
            | quality data by now, and yet no one has a similar quality
            | model a year later.
        
             | austhrow743 wrote:
             | The more disregard a company has for intellectual property
             | rights, the more data they can use.
             | 
             | Google had far more to lose from a "copyright? lol"
             | approach than OpenAI did.
        
               | brookst wrote:
               | I was under the impression training was at best an
               | undefined area of IP law. Is there any aspect of
               | copyright that prohibits training models?
        
               | simonw wrote:
                | This is being tested by a number of lawsuits right now,
                | most notably the NY Times one:
                | https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...
               | 
               | The key questions are around "fair use". Part of the US
               | doctrine of fair use is "the effect of the use upon the
               | potential market for or value of the copyrighted work" -
               | so one big question here is whether a model has a
               | negative impact on the market for the copyrighted work it
               | was trained on.
        
               | sroussey wrote:
                | I don't think the New York Times thing is as much about
                | training as it is about the fact that ChatGPT can use
                | Bing, and Bing has access to New York Times articles for
                | search purposes.
        
               | simonw wrote:
               | If you read the lawsuit it's absolutely about training.
               | The Bing RAG piece is one of the complaints in there but
               | it's by no means the most important.
               | 
                | Take a look at
                | https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...
                | - bullet points 2 and 4 on pages 2/3 are about training
               | data. Bullet point 5 is the Bing RAG thing.
        
           | ldjkfkdsjnv wrote:
           | Claude > GPT4. Anyone using these models on a daily basis
           | knows this
        
             | jstummbillig wrote:
             | It is known
        
       | nylonstrung wrote:
       | For what reason would you want to use this instead of open source
       | alternatives like Mistral
        
         | rvnx wrote:
          | Mistral opened their weights only for a very small LLaMA-like
          | model.
        
           | MallocVoidstar wrote:
           | I'm pretty sure Mixtral outperforms Grok-1 and uses much less
           | memory to do it
        
             | elfbargpt wrote:
             | I'm a little out of touch, is there a way to see how Grok
             | measures up to other models?
        
               | amrrs wrote:
               | Benchmarks here https://x.ai/blog/grok
        
               | refulgentis wrote:
                | And to compare, you can sort by MMLU on here:
                | https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...
               | 
                | Edit: to include my self-summary after review: there are
                | a good 100 models better than it, even a couple of 1x7B
                | ones. Mixtral stomps it; half the Mixtral variants are
                | universally better, but one is close to the same.
        
               | lossolo wrote:
               | This benchmark is mostly worthless, some of the top
               | models there were trained on benchmark data, which is a
               | known fact in the community.
               | 
               | The only reliable benchmark:
                | https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...
        
             | cavisne wrote:
             | One of the interesting things when weights are open sourced
             | is the community can often improve the results. See all the
             | bugs fixed in Gemma for an example.
        
         | verticalscaler wrote:
         | Well if nothing else, this one might be significantly less
         | nerfed. Very interesting to compare to the others.
        
           | refulgentis wrote:
            | It's not, and I mean it, specifically in Grok's case.
           | 
           | Generally, it's a boring boneheaded talking point that the 1%
           | of us actually working in AI use as a sorting hat for who
           | else is.
        
             | renewiltord wrote:
             | The safety crap makes the tools unusable. I used to have a
             | test for it that I thought was decent, but Claude failed
             | that test and it is way better than ChatGPT-4 for code,
             | which means my test was bogus. The people actually working
             | in AI are kind of irrelevant to me. It's whether or not the
             | model will solve problems for me reliably.
             | 
             | People "actually working in AI" have all sorts of nonsense
             | takes.
        
               | benreesman wrote:
               | Another day, another fairly good comment going grey on an
               | AI #1. The over-alignment _is_ really starting to be the
               | dominant term in model utility, Opus and even Sonnet
               | _are_ both subjectively and on certain coding metrics
               | outperforming both the 1106-preview and 0125-preview on
               | many coding tasks, and we _are_ seeing an ever-escalating
               | set of kinda ridiculous hot takes from people with the
               | credentials to know better.
               | 
               | Please stop karma bombing comments saying reasonable
               | things on important topics. The parent is maybe a little
               | spicy, but the GP bought a ticket to that and plenty
               | more.
               | 
               | edit: fixed typo.
        
             | benreesman wrote:
             | I've been known to get snippy on HN from time to time
             | myself :) So please know that I'm only offering a gentle
             | nudge that I'd want from a fellow long-timer myself
             | regarding a line of discussion that's liable to age poorly.
             | 
             | Talking about sorting hats for those who do and don't have
             | the one-percenter AI badge isn't a super hot look my guy
             | (and I've veered dangerously close to that sort of thing
             | myself, this is painful experience talking): while there is
             | no shortage of uninformed editorializing about fairly
             | cutting edge stuff, the image of a small cabal of robed
             | insiders chucking in their cashews while swiping left and
             | right on who gets to be part of the discussion serves
             | neither experts nor their employers nor enthusiastic
             | laypeople. This is _especially_ true for "alignment" stuff,
             | which is probably the single most electrified rail in the
             | whole discussion.
             | 
             | And as a Google employee in the diffuser game by way of
             | color theory, you guys have a "days since we over-aligned
             | an image generation model right into a PR catastrophe" sign
             | on the wall in the micro kitchen right? That looked
             | "control vector" whacky, not DPO with pretty extreme
             | negative prompt whacky, and substantially undermined the
             | public's trust in the secretive mega labs.
             | 
             | So as one long-time HN user and FAANG ML person to another,
             | maybe ixnay with the atekeepinggay on the contentious AI #1
             | thread a bit?
        
       | rvnx wrote:
       | One subtle thing: Musk said "open-source", we got "open-weights"
       | instead (still better than nothing though, so it's greatly
       | appreciated).
        
         | paulgb wrote:
         | Dumb question: what should open-source mean in the context of
         | something like this? Open access to the training data and
         | training pipeline as well?
        
           | CharlesW wrote:
           | It's not a dumb question, and the answer is "yes".
        
             | zeroCalories wrote:
             | Come on, that's not reasonable to expect from a company, or
             | useful for indie hackers. Having weights that can be used
             | however you like is enough for most people, even large
             | companies.
        
               | schoen wrote:
               | Maybe it should be called something else? "Openly-
               | licensed"?
               | 
               | Just because the model weights are not really "source"
               | (either as a matter of intuition or for example following
               | the OSI "preferred form in which a programmer would
               | modify the program" definition).
        
             | simonw wrote:
             | A big catch here is that you can't slap an open source
             | license on a bunch of copyrighted training data, and to
             | date no-one has created a truly convincing LLM exclusively
             | trained on public domain data. It might happen soon though
              | - there are some convincing efforts in progress.
        
               | CharlesW wrote:
               | Absolutely, because it's trained mostly on unlicensed,
               | copyrighted content, they basically can't release source.
        
               | gfodor wrote:
               | Many people think these companies are training on
               | unlicensed data but I think OpenAI licenses their data,
               | they just "license" it the way one would need to in order
               | to read it.
        
               | CharlesW wrote:
               | > _...I think OpenAI licenses their data..._
               | 
               | They've just started to (in response to lawsuits, it must
               | be noted) and in the meantime, they're simultaneously
               | claiming that (1) what they're doing is fair use (a.k.a.
               | fair dealing) and (2) preparing for the day when courts
               | confirm that it isn't.
        
               | zer00eyz wrote:
               | You all keep using the word "Data"
               | 
               | Data, as in facts, as in the frequency of one word in
               | relation to another.
               | 
               | "Copyright does not protect facts, ideas, systems, or
               | methods of operation, although it may protect the way
               | these things are expressed..." FROM:
               | https://www.copyright.gov/help/faq/faq-protect.html
               | 
               | It's not a question of if, rather when the cat gets out
               | of the bag and the legal battle starts. The problem is
               | that all the copyright applies to the expression not the
               | factual information it expresses (in this case word
               | relations). Now "how math works" and "the language of the
               | law" are going to make for an interesting court case. I
               | suspect that math wins here but it depends on what judge
               | gets it and how high it goes.
        
             | nabakin wrote:
              | Agreed. It's ridiculous people have to resort to calling
              | their question dumb to avoid being attacked by toxic
              | commenters.
        
           | Q6T46nT668w6i3m wrote:
           | Yes, training and evaluation code, i.e., the code used to
           | generate the weights.
        
           | TaylorAlexander wrote:
           | The Open Source Initiative is actively working on this over
           | the course of this year, and your input will help define that
           | meaning! Please see here for more:
           | 
            | https://opensource.org/blog/open-source-ai-definition-weekly...
        
         | TaylorAlexander wrote:
         | Yeah musk said "all design and engineering for the original
         | roadster is now open source" and actually what we got was a few
         | PCB files and zero mechanical design files so I don't ever
         | trust what he says.
        
         | tylerekahn wrote:
         | This is the weights and the model under Apache 2.0 license.
         | What do you mean by open-source?
         | 
         | https://github.com/xai-org/grok/blob/main/model.py
         | 
         | https://github.com/xai-org/grok/blob/main/run.py#L25
        
           | pclmulqdq wrote:
           | Still better than most of the "open weights" models that have
           | massively restrictive terms.
        
         | solarkraft wrote:
         | He also called permissively licensing Tesla's patents "open
         | sourcing" them. He's at the forefront of misusing the term.
        
       | gardenhedge wrote:
       | > Due to the large size of the model (314B parameters), a machine
       | with enough GPU memory is required to test the model with the
       | example code
       | 
       | What type of machine do you need to play around with this?
        
         | anigbrowl wrote:
         | 'Chunky beast, needs 320 Gb VRAM likely 4 bit, likely is being
         | run 8 bit on 8 x 80 Gb GPUs.'
         | 
         | -Emad
        
         | 317070 wrote:
         | Probably a machine with about 628 GB of GPU memory. (2 bytes
         | per parameter)
         | 
          | So 8xH100 (80GB each) should do it.
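          | 
          | As a quick sketch of the arithmetic (ignoring activation and
          | KV-cache overhead):
          | 
          |     PARAMS = 314e9
          |     for label, b in [("bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
          |         print(f"{label}: {PARAMS * b / 1e9:.0f} GB")
          |     # bf16: 628 GB  -> 8xH100-80GB (640 GB total)
          |     # int8: 314 GB  -> 8xA100-40GB is already tight
          |     # 4-bit: 157 GB -> 2xA100-80GB territory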
        
       | pogue wrote:
       | Can someone explain why the weights are posted via a Bittorrent
       | magnet link? I have no way to check the size at the moment, but
       | isn't that a bit unusual? There's also only 21 seeders right now
       | according to https://checker.openwebtorrent.com/
        
         | lambdaba wrote:
         | Why not? Mistral was first to do it, it has become tradition.
        
           | gillesjacobs wrote:
           | I believe it was Llama 1 that notoriously got leaked with a
           | torrent on 4chan.
        
           | orlp wrote:
           | BitTorrent is just an objectively superior method of
           | delivering a lot of data to a lot of people.
        
         | pooloo wrote:
          | It's likely over 100GB of data, so I wouldn't say it's
          | necessarily unusual to spread out the bandwidth across
          | multiple hosts.
        
           | pogue wrote:
           | Thanks! I searched and searched for a tool that would show me
           | info via the web about a magnet link but nada
        
         | CamperBob2 wrote:
         | How else could/should it be done?
        
           | pogue wrote:
            | I would have assumed they could just upload it to GitHub. If
            | it has restrictions on file size I'm sure they could make
            | multi-part compressed files.
           | 
            | Torrents can unfortunately die after a period of time if no
            | one continues seeding them or if they don't use a permanent
            | web-based seeder, which doesn't appear to be the case here.
        
             | cedws wrote:
             | GitHub may choose to throttle downloads or remove the files
             | simply because they're taking up too much bandwidth.
             | 
             | A torrent is less likely to go down in the short term.
        
             | xcv123 wrote:
              | This is not some crappy DVD rip on The Pirate Bay. It will
              | be seeded as long as it's relevant.
             | 
             | Twitter/X has their own massive infrastructure and
             | bandwidth to seed this indefinitely.
        
               | KomoD wrote:
               | Yeah, they can just leave some server running somewhere
               | and just let it seed forever
        
             | larrysalibra wrote:
             | The great thing about torrents is that you (or anyone else
             | who cares) can single-handedly solve the problem you're
             | complaining about by seeding the torrent.
        
             | simonw wrote:
              | GitHub have a soft repository size limit of 5GB, documented
              | here:
              | https://docs.github.com/en/repositories/working-with-files/m...
             | 
             | Soft size limit means "If your repository excessively
             | impacts our infrastructure, you might receive an email from
             | GitHub Support asking you to take corrective action." - I
             | know people who have received such emails.
             | 
             | Most model releases happen through Hugging Face which does
             | not have such a size limit.
        
               | KomoD wrote:
               | They'd probably just charge you for it. They sell "data
               | packs" for LFS.
               | 
                | https://docs.github.com/billing/managing-billing-for-git-lar...
        
               | rezonant wrote:
               | I'd bet Hugging Face would be happy to have hosted these
               | canonically too, so not sure why that doesn't happen
               | more.
        
               | osanseviero wrote:
               | The model is also at https://huggingface.co/xai-org
        
             | sashank_1509 wrote:
              | No, git would be impossible. I've never seen a repo even a
              | few GB in size; if you are uploading non-code files you
              | really should not be using git. Git is version management
              | software for code. I often see repos with images and even
              | videos checked in - please don't, there are so many far
              | better and more performant solutions out there.
             | 
              | The other approach would be to use AWS S3 or other cloud
              | providers, which would cost them money every time someone
              | downloads the model - not something they should have to
              | pay for when they are releasing it for free. Torrents seem
              | like the only good solution, unless someone hosts this on
              | the cloud for free for everyone.
        
               | sroussey wrote:
               | Huggingface will disagree with impossible as their models
               | are available via git, sometimes broken up in pth files.
               | 
               | Still, as far as sentiment goes, yeah git for model
               | weights is an impedance mismatch for sure!
        
               | rezonant wrote:
               | > No git would be impossible. I've never seen a repo even
               | a few GB in size, if you are uploading non code files you
               | really should not be using git
               | 
               | It's not actually a limitation in git itself, especially
               | if you use Git LFS. People use Git for Unreal projects
               | and big ones can be half a terabyte or more in size.
        
             | rezonant wrote:
             | Others have pointed out that GitHub doesn't allow that, but
             | 
             | > Torrents can unfortunately die after a period of time if
             | no one continues seeding it or if they don't use a
             | permanent web based seeder, which doesn't appear to be the
             | case.
             | 
              | So too can web links, especially when they are 300 GB and
             | egressing out of AWS at $0.09/GB or worse (in non-US
             | regions). Each full download would cost $27 at that rate.
             | 10,000 downloads would cost $270,000.
             | 
             | Sure you could go for something with a better cost model
             | like R2, but you can't beat using one or two unmetered
             | connections on a VPN to constantly seed on Bittorrent, your
             | pricing would be effectively free and reliability would be
             | higher than if you just exposed a HTTP server on the
             | Internet in such a way.
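              | 
              | The arithmetic, as a quick check (AWS egress price
              | assumed at ~$0.09/GB):
              | 
              |     size_gb, price = 300, 0.09
              |     per_download = size_gb * price
              |     print(per_download)           # $27.00
              |     print(per_download * 10_000)  # $270,000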
        
         | MallocVoidstar wrote:
         | Distributing 300GB via torrent is cheaper than direct, assuming
         | even a few other people seed
        
         | monkin wrote:
         | It's 318.24G
         | 
         | https://academictorrents.com/details/5f96d43576e3d386c9ba65b...
        
         | bongodongobob wrote:
         | I'm not sure why you wouldn't tbh. That's a lot of bandwidth.
        
         | jiripospisil wrote:
         | I don't understand why you're being downvoted for asking a
          | legitimate question. People not familiar with model weights
          | might be surprised that they are often in the tens of
          | gigabytes, and in this case even more.
        
         | fzzzy wrote:
         | It may become a tradition since weights are so large. Perhaps
         | it started when the Llama torrent link leaked. Then, Mistral
         | decided to release their weights using bittorrent.
        
         | leumon wrote:
         | Mistral did it too when they released their first open model.
         | They just posted a magnet link on Twitter.
        
         | raydev wrote:
         | Spreads the burden/cost of distributing a 300+GB file.
        
       | bbor wrote:
       | Honestly the most interesting part is taking a peek at the kind
       | of AI researcher working for Twitter after the objectively messy
       | layoffs and subsequent crunch. I notice neither of them has
       | Twitter mentioned on their GitHub, which is prolly for the best
       | to avoid harassment lol.
       | 
       | Code wise, excited to see if this could grow into anything! I
       | think it's pretty clear that Grok didn't have nearly enough
       | investment to be a top model so Elon "sacrificed" it on a whim in
       | his schoolyard spat with OpenAI, but I'm not complaining. I've
        | always taken Elon at his word that he truly _is_ worried about
        | centralization of AI, and I don't think any of the emails
        | released by his schoolmate Altman dissuade me from that. So I
        | have some reasonable hope that he uses some of his immense
        | resources to start "fighting the good fight" here with LeCun.
        
         | cma wrote:
         | >taking a peek at the kind of AI researcher working for Twitter
         | 
         | He made a separate company for this.
        
         | paxys wrote:
         | Neither of them works at Twitter. xAI is a separate company,
         | and only uses Twitter's data to train.
        
       | mattxxx wrote:
       | I respect the openness here! This is the future that I want to
       | see
        
         | giancarlostoro wrote:
         | Fully agree. People will trash talk it due to Musk but lets not
         | forget the engineers who poured hours of their lives into
         | building this and are continuing to do so.
        
           | knowsuchagency wrote:
           | The engineers who decided to work for him? Forgive me if I do
           | forget about them and the hours of their lives spent on this
        
             | lynndotpy wrote:
             | Engineers who joined Twitter pre-Musk days who live and
             | work in the US on an H1-B visa can't just quit.
             | 
             | You can criticize Elon Musk without criticizing people who
             | would have their lives upended if they quit or were fired.
        
               | throw2022110401 wrote:
               | That grace period has long passed. If you are still there
               | at this point you are complicit.
        
               | cap1434 wrote:
               | Complicit in what exactly?
        
           | devin wrote:
           | I still reserve the right to trash talk Musk as I don't
           | believe he is committed to openness as much as he wants to
           | spite OpenAI for telling him to pound sand.
        
             | llm_trw wrote:
             | What's the difference?
             | 
             | >Oh no, I only want _pure_ intentions for anything I use.
             | Which is why I reject all for profit medicine.
             | 
             | It doesn't matter why he did it. What matters is that he
             | did it.
        
               | devin wrote:
               | It matters to me why people do things. I'm happy it's
               | open, but it doesn't change my mind about the guy.
        
               | llm_trw wrote:
               | What an exhausting way to live.
        
           | afavour wrote:
           | Were they not paid to do so?
        
           | revscat wrote:
           | I feel the same about Tesla. They make good cars that are
            | helping to get us off of oil. They have thousands of
            | employees.
           | 
           | And who among us has a CEO that isn't problematic, even if
           | not so much so as Musk?
        
             | hobobaggins wrote:
             | Tesla is likely making good cars _because_ the CEO is
             | 'problematic'
        
         | trog wrote:
         | Is it open if it doesn't include the training data? Genuine
         | question - I am not familiar enough with the terms and
          | technology to know. But my understanding is that the weights
          | are just a more or less static collection of data that has
          | been (to paraphrase Ted Chiang) lossily compressed from the
          | actual raw training data.
         | 
         | Without the training data to thoroughly evaluate what is in
         | there, the only way you can figure it out is through
         | experimentation - e.g. running it up in a chatbot and asking it
         | questions.
         | 
         | Is this roughly correct or am I misunderstanding what you can
         | do with the weights?
        
       | 2devnull wrote:
       | From issues: "Well the magnet file contains a 300GB checkpoint "
       | 
       | That's why they are using a torrent I suppose.
        
       | moralestapia wrote:
       | Well, he delivered.
        
         | paxys wrote:
         | Partially. Open weights is not open source.
        
           | gfodor wrote:
           | In machine learning models the term open source has been
           | largely accepted to mean sharing weights and, if necessary,
           | inference code. You can argue if this is an abuse of the term
           | but everyone does it, and saying someone didn't deliver if
           | they used it and published weights would probably mean saying
           | the same about mistral, meta, etc.
        
       | stale2002 wrote:
        | Hey, asking any experts here, what are their first thoughts on
        | the significance of this?
       | 
       | IE, is this comparable to any other model released, or are there
       | significant metric differences that make it better for certain
       | usecases?
       | 
        | The only thing I see, off the top of my head, is that it is a very
       | large model, and I don't think any models of similar size have
       | been released.
        
         | Me1000 wrote:
         | Not an expert by any means, but I like learning about this
         | stuff and I play with a lot of open weight models.
         | 
         | I'd say the significance is that it happened. It's by far the
         | largest open weight model I've seen. But I'm not sure why you'd
         | use it over a model like Mixtral, which seems to perform about
         | the same at like 1/6th the size.
         | 
         | But I welcome any contribution to the open weight LLM
         | community. Hopefully people will learn something interesting
         | with this model. And I hope they keep releasing new versions!
        
           | MichaelRazum wrote:
           | If I may ask, how do you load such big models? 300gb seems
           | like a lot to play around with.
        
             | Me1000 wrote:
             | You're right, this model is going to be too big for most
             | people to play around with. But to answer your question I
              | have 128GB of RAM in my M3 MacBook Pro, so I can use most
             | of that for GPU inferencing. But still, this model is going
             | to need to be heavily quantized for me to be able to use
             | it. (fwiw, I probably wont try this one)
             | 
             | In the next week or two I expect we'll see a GGUF version
             | of the weights (might need to wait for a patch to llama.cpp
             | first), and someone will release super small quantizations
             | of it. I suspect my computer might be able to run a 3 bit
             | quant, but it might need to go down to 2 bits to have any
             | kind of reasonable context length. But with quants that
             | small I'd expect the model's performance to degrade well
             | below that of Mixtral, so it probably isn't really even
             | worth using. But we'll see; quantization is weird, some
             | models perform better than others when quantized.
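              | 
              | The sizes work out roughly like this (pure weight
              | storage; real quant formats add some overhead, and you
              | still need room for the KV cache):
              | 
              |     PARAMS = 314e9
              |     for bits in (16, 8, 4, 3, 2):
              |         gb = PARAMS * bits / 8 / 1e9
              |         print(f"{bits}-bit: ~{gb:.0f} GB")
              |     # 16-bit: ~628 GB ... 3-bit: ~118 GB (tight on a
              |     # 128GB Mac) ... 2-bit: ~78 GB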
        
               | MichaelRazum wrote:
                | Thanks a lot for the hint :)! It's awesome that it might
                | run even on a MacBook - actually, this is a reason to
                | switch to Mac. Seems there is nothing similar for a PC
                | laptop with Linux or Windows.
        
               | Me1000 wrote:
               | No problem. I hope more people try these things out, it's
               | the best way to push the industry forward! We can't let
               | the researchers have all the fun.
               | 
               | Apple had plenty of reasons to move forward with their
               | Apple Silicon CPUs and GPUs in the mac, but they really
               | did seem to get lucky with the unified memory
               | architecture. It was kind of just an artifact of their
               | design, but ends up serving the needs of deep neural net
               | models really well!
        
       | simonw wrote:
       | Is there a model card anywhere? I'd like to know what it was
       | trained on.
        
       | LZ_Khan wrote:
       | How are people's experience with this model? Having the most
       | weights is one thing but being a better model than the 70B models
       | is another.
        
         | labrador wrote:
         | tbh, I've never seen anyone share anything interesting produced
         | by Grok. I see plenty of posts on X and reddit of people
         | sharing amazing things that GPT-4 and now Claude 3 Opus can do.
         | Grok can roast people. That's pretty much all I've seen.
         | 
          | I'd love to be proven wrong if someone cares to share something
          | interesting produced by Grok.
        
         | swalsh wrote:
         | I use grok all the time to find tweets or ask about trends on
          | Twitter. For that it's better than what used to exist. But it's
          | not a great model outside that narrow use case.
        
       | arduanika wrote:
       | CODE_OF_CONDUCT.md has only five words. :)
        
         | schappim wrote:
         | "Be excellent to each other."
        
           | troupo wrote:
           | Which is ironic given Musk's own behaviour and how he wants
           | Grok to work
        
             | TMWNN wrote:
             | >Which is ironic given Musk's own behaviour
             | 
             | You mean, like immediately responding to Ukraine's plea for
              | Starlink and funding it on his own for months? So much so
              | that in February 2023 its government called Musk "one of
             | the biggest private donors of our future victory"?
             | <https://www.pravda.com.ua/eng/news/2023/02/9/7388696/>
        
               | troupo wrote:
               | Like being a general asshole to people, setting Grok to
               | be a bit of an asshole, pushing increasingly unhinged
               | conspiracy theories. As for Ukraine he now basically
               | retransmits Russian talking points non-stop.
               | 
               | Edit: Starlink, while insanely useful to Ukraine, was a
               | PR stunt he almost immediately wanted to weasel out of.
                | Even after it turned out that the US government would
                | end up paying for it.
        
               | arandomusername wrote:
               | > pushing increasingly unhinged conspiracy theories
               | 
               | like what?
        
         | josh-sematic wrote:
         | They're from "Bill and Ted's Excellent Adventure"
        
         | bheadmaster wrote:
         | I was hoping it would be "do not be an asshole", but I guess
         | this is fine too.
        
           | kergonath wrote:
           | It would be finer if it were not so hypocritical, coming from
            | a company led by Elon Musk. As things are, it's just like
           | any pledge Google can make about the greater good: a sad
           | joke.
        
         | marginalia_nu wrote:
         | My favorite is SQLite's code of ~~conduct~~ ethics:
         | https://sqlite.org/codeofethics.html
        
           | TwentyPosts wrote:
           | Huh. What's the backstory here?
        
             | weberer wrote:
              | https://pjmedia.com/paula-bolyard/2018/10/24/tech-community-...
        
       | machiaweliczny wrote:
       | If they are so behind they could make it open source instead of
       | open weights and get some help.
        
         | nicce wrote:
         | Fully open-source means also providing open access to their
         | data sets? Which is the only valuable thing Twitter (X) has
         | left.
        
           | heyoni wrote:
           | And the one thing they are vehemently protecting from
           | scrapers and other entities. Even nitter threw in the towel.
        
           | EastSmith wrote:
            | > Which is the only valuable thing Twitter (X) has left.
           | 
           | They have a very valuable user base (all kinds of world
           | leaders for example), so the data is not the only valuable
           | thing they have.
        
             | sroussey wrote:
              | That's actually more valuable. Twitter's data of small-
              | format text is awful for training. Best to just exclude it.
             | 
             | There are hundreds of millions of people on Twitter, and a
             | few of them are very smart. I don't see how that helps here
             | though.
        
               | Takennickname wrote:
                | It doesn't help here. But the person you're responding to
                | is just pushing back against the "Elon destroyed Twitter
                | and there's nothing left" narrative.
        
         | xcv123 wrote:
         | It's all open source. You can download the model and run it
         | locally.
        
           | paraboul wrote:
           | Being free to use doesn't mean it ships with the original
           | recipe.
        
             | xcv123 wrote:
             | What do you mean? The entire model is fully open source.
             | 
             | The training methods are nothing secret, right? The
             | architecture is well known.
             | 
             | Expecting the entire training dataset to be fully open is
             | delusional.
        
       | simonw wrote:
       | "Base model trained on a large amount of text data, not fine-
       | tuned for any particular task."
       | 
       | Presumably the version they've been previewing on Twitter is an
       | instruction-tuned model which behaves quite differently from
       | these raw weights.
        
       | seccode wrote:
       | It would be cool if these models had conversations with us where
       | they ask questions. I think the future of AI is models that ask
       | questions. There is so much data to be gained by doing this.
        
         | swalsh wrote:
         | That's just a matter of fine tuning
        
           | seccode wrote:
           | Do you have an example model I could try that does this?
        
             | amrrs wrote:
             | Try Pi by inflection. It asks a lot of questions.
        
               | seccode wrote:
                | I tried it, it just asked me how my day was going. I
                | don't think this is doing exactly what I have in mind,
                | but it's a step in that direction.
        
           | ijustlovemath wrote:
           | That "just" is doing some heavy lifting! GPT-4 is just a few
           | matrix multiplications, how bad can their moat really be?
        
             | BoorishBears wrote:
             | Not sure what the snark here is for: It would be trivial to
             | produce a dataset where the model asked you questions then
             | fine-tune on that.
             | 
             | People already do it with chain-of-thought and you could
             | get away with a few dozen examples if you wanted to try
             | this.
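              | 
              | A sketch of what such a dataset could look like (the
              | format follows common chat fine-tune conventions; the
              | examples are invented):
              | 
              |     import json
              |     
              |     examples = [
              |         {"messages": [
              |             {"role": "user",
              |              "content": "Help me plan a trip."},
              |             {"role": "assistant",
              |              "content": "Happy to! Where from, what "
              |                         "budget, and for how long?"},
              |         ]},
              |     ]
              |     
              |     with open("ask_questions.jsonl", "w") as f:
              |         for ex in examples:
              |             f.write(json.dumps(ex) + "\n")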
        
             | swalsh wrote:
             | I'd bet a synthetic data set could do the job effectively.
        
         | crowcroft wrote:
          | OK, I'm curious, but I don't quite understand.
         | 
         | What would you want an AI to be asking you, and what would you
         | want it to do with your response(s)?
        
           | seccode wrote:
           | I get advertisements all the time for conditions that I do
           | not have, and that none of my family members have. If you had
           | a model that asked questions, it could learn my medical
           | history and could direct better ads to me.
           | 
           | In order for AI to understand the world, it would have to ask
           | questions. Understanding humans is key to understanding the
           | world.
        
           | globular-toast wrote:
           | Learn from them.
        
           | BoorishBears wrote:
            | I ask the AI to produce clarifying questions, then answer
            | them.
            | 
            | It can help avoid wasting a bunch of time waiting for an
            | answer that misses the mark.
           | 
           | -
           | 
           | I think the sibling comment is probably the least attractive
           | reason to have AI ask questions.
        
             | seccode wrote:
             | I agree, medical history is probably not the sexiest reason
             | to have AI ask questions. I think there are many more
             | reasons; I think the Turing Test is the best metric to
              | evaluate AIs, and current models come nowhere close. When
              | people first meet, they ask questions about each other's
              | backgrounds. It would be nice if a model replicated that.
        
               | BoorishBears wrote:
               | > and could direct better ads to me.
               | 
               | Is the least attractive part, by far.
        
               | seccode wrote:
               | In order for an AI to pass a Turing Test, it would surely
                | ask questions. Think of Ava from Ex Machina: she asked
                | questions to learn more about Caleb.
        
               | BoorishBears wrote:
               | I'm not debating the value of questions. I'm debating the
               | value of feeding it to advertisers, especially since LLMs
               | can infer much deeper insights about a person than a
               | traditional assistant can with its canned capabilities
               | and responses
        
           | lars_francke wrote:
           | Clarifying questions if the initial prompt was unclear. I'd
           | love it.
           | 
           | I regularly try to add something along the lines of "please
           | ask clarifying questions if you could only give a generic or
           | partial response otherwise" but so far it has never helped
           | (ChatGPT 4).
        
         | Me1000 wrote:
          | 100% agreed. Gemini Advanced does this sometimes. I wrote about
         | it more in an older thread here:
         | https://news.ycombinator.com/item?id=39445484
        
         | geor9e wrote:
         | Explore this idea more - it's easily implemented in a minute or
         | two via the system prompt. API accounts are free to start and
         | you can use the playground/workbench view, like this:
         | https://imgur.com/h5jFoBM.jpg . I like Claude but OpenAI is
         | popular too. OpenAI has a nice way to create a gallery of
          | system prompts that act however you like; they call them
          | Agents or GPTs.
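          | 
          | A minimal sketch of the system-prompt approach (OpenAI Python
          | SDK v1; the prompt wording is just one option, and the same
          | idea works with Anthropic's API):
          | 
          |     from openai import OpenAI
          | 
          |     client = OpenAI()  # reads OPENAI_API_KEY from the env
          |     resp = client.chat.completions.create(
          |         model="gpt-4",
          |         messages=[
          |             {"role": "system",
          |              "content": "Before answering, ask one clarifying "
          |                         "question whenever the request is "
          |                         "ambiguous or underspecified."},
          |             {"role": "user",
          |              "content": "Write me a cover letter."},
          |         ],
          |     )
          |     print(resp.choices[0].message.content)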
        
       | littlestymaar wrote:
        | How long before the _Groq_ team sues for trademark violation?
        | It's literally the purpose of trademark law to ensure that
        | similar names don't confuse customers, so it would be very
        | surprising to see this situation persist.
        
         | nostrebored wrote:
         | Would be a rough trademark enforcement case as "Grok" has been
         | in common language for decades
        
           | Angostura wrote:
           | Robert A. Heinlein coined the term grok in 1961
        
             | a1369209993 wrote:
              | Six decades is plural.
        
           | ben_w wrote:
           | So has "Apple" and "Windows".
           | 
           | Grok and groq both relate to AI, so there's definitely
           | grounds to believe the names may cause consumer confusion.
           | 
           | After all, Apple (computers) was repeatedly sued by Apple
           | (records) for doing music things.
        
             | cma wrote:
              | It's easier to get a trademark on an altered word than on a
              | plain dictionary word. But acquiring the easier one doesn't
              | mean you now have rights over the harder one; after enough
              | market recognition you might eventually be given some
              | control over others' use of the common word. I wouldn't
              | think Groq is there yet.
        
         | cavisne wrote:
         | They already have.
        
         | EastSmith wrote:
         | There is a friendly warning here from Groq:
         | https://wow.groq.com/hey-elon-its-time-to-cease-de-grok/
        
           | bhaney wrote:
           | Is it safe to say, 4 months later, that Elon is ignoring
           | this? I assume there hasn't been any kind of response or
           | further action taken yet.
        
       | orsenthil wrote:
        | I am not sure what open source models are accomplishing other
        | than killing the competition's (OpenAI's) lead, only to hand it
        | to someone else with expertise in distribution. This will be yet
        | another good addition to systems like Amazon Bedrock.
        
         | minimaxir wrote:
         | Many of the recent innovations in both LLM architecture and
         | inference were only made possible through open models such as
         | Llama 2 and Mistral 7B as a starting point for iteration and
          | refinement, which in turn backpropagates (heh) to the LLM
          | developers.
         | 
         | It's a win-win for everyone. That's the power of open source.
        
         | geor9e wrote:
         | Well, look at the history. Google had an insurmountable lead,
         | so Elon started OpenAI. Now OpenAI has an insurmountable lead
         | too. So everyone else is starting in third place, or lower.
         | David versus two Goliaths. If you try to become a third
         | Goliath, you'll probably just get smashed. You're later to the
         | game. In this situation, going scorched earth becomes a viable
         | strategy. Slay the Goliaths. Become a hero to the masses.
         | Attract the world's best talent who don't want to be associated
         | with proprietary models. At that point you have a world class
         | AI business with momentum towards AGI. And even if you're
         | giving away last year's technology for free, the team you built
         | is churning out new ideas that could be a financial bonanza one
         | day. Shareholders are willing to pay for a long-term bet if the
         | story is good.
        
       | andre-z wrote:
        | The only other repository is a fork of Qdrant.
        
       | captcanuk wrote:
       | "The implementation of the MoE layer in this repository is not
       | efficient. The implementation was chosen to avoid the need for
       | custom kernels to validate the correctness of the model."
       | 
        | Or perhaps release your actual code AND the simplified
        | implementation, instead of hiding it and saying "you don't know
        | her, she goes to a different high school."
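        | 
        | To make the tradeoff concrete, here's a toy PyTorch version of
        | the "obviously correct but inefficient" approach (my own
        | sketch, not xAI's code): every expert runs on every token and
        | unselected outputs are just weighted by zero, wasting roughly
        | n_experts/top_k of the compute that a fused dispatch kernel
        | would avoid.
        | 
        |     import torch
        |     import torch.nn as nn
        |     import torch.nn.functional as F
        | 
        |     class NaiveMoE(nn.Module):
        |         def __init__(self, d_model=64, d_ff=256,
        |                      n_experts=8, top_k=2):
        |             super().__init__()
        |             self.gate = nn.Linear(d_model, n_experts,
        |                                   bias=False)
        |             self.experts = nn.ModuleList(
        |                 nn.Sequential(nn.Linear(d_model, d_ff),
        |                               nn.GELU(),
        |                               nn.Linear(d_ff, d_model))
        |                 for _ in range(n_experts))
        |             self.top_k = top_k
        | 
        |         def forward(self, x):  # x: (tokens, d_model)
        |             scores = self.gate(x)
        |             w, idx = scores.topk(self.top_k, dim=-1)
        |             w = F.softmax(w, dim=-1)
        |             out = torch.zeros_like(x)
        |             for e, expert in enumerate(self.experts):
        |                 # every expert sees every token; tokens not
        |                 # routed to e contribute with weight zero
        |                 w_e = (w * (idx == e)).sum(-1, keepdim=True)
        |                 out = out + w_e * expert(x)
        |             return out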
        
         | gfodor wrote:
         | Always love it when someone gives away a gift and it's not
         | enough for people.
        
       | redskyluan wrote:
        | This doesn't seem like a repo that was ready to be open
        | sourced. You only get the weights, with very little information
        | about how they were trained and fine-tuned.
        | 
        | But anyway, it's always great to see more LLM weights available.
        
         | andrewstuart2 wrote:
          | I would argue that there's no bar for open sourcing aside from
          | "do you have the rights to do so?" Some source or some public
          | good is certainly better than none, and a low bar removes
          | barriers to getting started, versus waiting until you someday
          | have the time to "do it right."
        
         | rezonant wrote:
          | Well, what constitutes an "open source" model is still
         | controversial and debatable-- lots of people on both sides of
         | that argument.
        
       | modeless wrote:
       | Is this the first major model to be natively FP8? I was wondering
       | why people hadn't done it yet. Seems like a big win when hardware
       | supports it.
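        | 
        | For reference, the storage side of the win is easy to see
        | (PyTorch >= 2.1 exposes FP8 dtypes; doing the matmuls in FP8
        | additionally needs hardware support like H100 plus scaling
        | factors, which this sketch skips):
        | 
        |     import torch
        | 
        |     w_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)
        |     w_fp8 = w_bf16.to(torch.float8_e4m3fn)  # lossy downcast
        | 
        |     print(w_bf16.element_size())  # 2 bytes per weight
        |     print(w_fp8.element_size())   # 1 byte per weight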
        
       ___________________________________________________________________
       (page generated 2024-03-17 23:00 UTC)