[HN Gopher] Grok
___________________________________________________________________
Grok
Author : pierre
Score : 582 points
Date : 2024-03-17 19:33 UTC (3 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| tosh wrote:
| blog post: https://x.ai/blog/grok-os
|
| * 314B parameters (86B active at a time)
| * mixture of experts: 8 (2 active at a time)
| * weights and architecture licensed under Apache 2.0
|
| (edit:) announcement blog post from last year with benchmarks
| compared to Claude 2, GPT-3.5 and GPT-4: https://x.ai/blog/grok
|
| (edit2:) TL;DR: somewhat comparable to GPT-3.5, Mixtral and
| Qwen-1.5-72B in capability but way larger than the open weight
| models
| TOMDM wrote:
| Mixtral is also comparable to GPT-3.5, and it's open.
|
| At 8x7B it's also a fraction of the size. Are there any
| benchmarks comparing Mixtral to Grok?
| tosh wrote:
| Mixtral announcement is here:
| https://mistral.ai/news/mixtral-of-experts/
|
| Mixtral looks more economical in capability relative to size
| (similarly for Qwen 1.5 72B)
| OkGoDoIt wrote:
| Is a model so huge that's only at the level of GPT 3.5 actually
| good? That seems incredibly inefficient to me.
| cma wrote:
| Since it is MoE, quantized it could be able to run on cheaper
| hardware with just consumer networking in between instead of
| needing epyc/xeon levels of PCI-e lanes, nvlink, or
| infiniband type networking. Or it could even run with people
| pooling smaller systems over slow internet links.
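A rough sketch of the sizes behind this idea, assuming 4-bit (half a byte per weight) quantization: the full 314B parameters must be resident somewhere in the pool, but only the ~86B active parameters are exercised per token, which is what keeps the bandwidth demands low enough for slower interconnects. The numbers are back-of-envelope only.

```python
# Rough sizes behind the pooling idea, assuming 4-bit (0.5 byte) weights.
total_params, active_params = 314e9, 86e9
bytes_per_param = 0.5  # 4-bit quantization

resident_gb = total_params * bytes_per_param / 1e9  # must live across the pool
active_gb = active_params * bytes_per_param / 1e9   # touched per token

print(f"resident: {resident_gb:.0f} GB across all nodes")  # ~157 GB
print(f"active per token: ~{active_gb:.0f} GB")            # ~43 GB
```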
| drak0n1c wrote:
| It's designed to be actively searching real-time posts on X.
| Apples and oranges.
| grey8 wrote:
| Why is that relevant to the size?
|
| Post search on X is done as it is with any other data from
| any other source, you use RAG and function calling to
| insert the context.
|
| < 7B open source models can function call very well. In
| fact, Nous Hermes 2 Pro (7B) is benchmarking better at that
| than GPT-3.5.
|
| Not related to the size, if I'm not mistaken.
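The RAG-plus-function-calling flow described above can be sketched in a few lines of Python. Everything here is hypothetical: `search_x_posts`, its mini corpus, and the model output are illustrative stand-ins, with the actual LLM call mocked as a pre-baked JSON tool call.

```python
import json

# Hypothetical tool the model can call; in a real setup this would hit a
# post-search API and return recent posts.
def search_x_posts(query: str, limit: int = 3) -> list[str]:
    corpus = {
        "grok release": [
            "xAI open-sources Grok-1 under Apache 2.0",
            "Grok-1 is a 314B-parameter mixture-of-experts model",
        ],
    }
    return corpus.get(query, [])[:limit]

TOOLS = {"search_x_posts": search_x_posts}

def dispatch(tool_call_json: str) -> str:
    """Parse a model-emitted tool call, run it, and return the results
    as context to insert back into the prompt (the RAG step)."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return "\n".join(fn(**call["arguments"]))

# Even a small model can emit a structured call like this:
model_output = '{"name": "search_x_posts", "arguments": {"query": "grok release"}}'
context = dispatch(model_output)
print(context)
```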
| fwlr wrote:
| OpenAI is valued at 90 billion and all they do is make GPT;
| Twitter is valued at 40 billion and this was essentially a
| vanity side-project by a cowboy CEO. Presuming that
| benchmarks and the general "it's about the level of 3.5"
| take are accurate, it's inefficient, but not incredibly inefficient
| imho
| tootie wrote:
| How is it that OpenAI was touted like it was some massive
| years-long effort that blew all AI research out of the water
| and now we have so many competitors popping up one after
| another?
| ben_w wrote:
| Egg of Columbus.
|
| Also, the general architecture is well documented, ChatGPT
| (specifically the chat interface, not GPT-3, not InstructGPT)
| is what made a lot of people _care_ , and actually
| reproducing it requires someone wanting to in the first
| place.
| longdog wrote:
| You don't need to be a cutting edge research scientist to
| train a SOTA LLM. You just need money for scaling. OpenAI's
| "secret" was just their willingness to spend tens/hundreds of
| millions without guaranteed returns, and RLHF/instruct fine
| tuning, both of which are out of the bag now.
| simonw wrote:
| Disagree. It took more than 12 months from the release of
| GPT-4 to someone else producing a model of equivalent
| quality, and that definitely wasn't due to a shortage of
| investment from the competition.
|
| There's a huge amount of depth in training a really good
| LLM. Not helped by the fact that iteration is incredibly
| expensive - it might take several months (and millions of
| dollars) before you can tell if your new model is working
| well or if there was some mistake in the pipeline that led
| to a poor-quality result.
|
| Almost all of the world-class LLMs outside of
| OpenAI/DeepMind have been trained by people who previously
| worked at those organizations - giving them invaluable
| experience such that they could avoid the most expensive
| mistakes while training their new models.
| lossolo wrote:
| Don't overlook the training data (used for both training
| and instruction fine-tuning), it is one of the most
| crucial aspects, if not the most critical, given the
| significant differences observed in models with similar
| architectures.
| echelon wrote:
| That only remains an advantage if they can continue
| climbing the gradient from their lead position. If they
| hit a snag in scaling, methodology, or research, everyone
| else on the planet catches up, and then it's anyone's
| game again.
| cavisne wrote:
| LLM training is arcane and expensive to experiment with. So
| OpenAI had to waste a lot of time and GPU-hours on things
| that didn't work to learn the tricks that did work.
|
| Most of the competitors have lineage straight back to OpenAI,
| eg the lead of x.ai was previously at OpenAI and Deepmind.
| Likewise with Mistral and especially Anthropic.
| jxy wrote:
| OpenAI still seems to be at the top, except for Anthropic,
| who may be close, comparing the capabilities of GPT-4 and
| Claude Opus.
|
| This Grok-1 is a large model (~314B), which matches GPT-3.5,
| released 2 years ago, and is at about the same level as much
| smaller models like Mixtral (~47B) and Qwen-1.5 (~72B). Do
| you think it's competitive?
| asciii wrote:
| I love the citation for image in the article
|
| > The cover image was generated using Midjourney based on the
| following prompt proposed by Grok: A 3D illustration of a
| neural network, with transparent nodes and glowing connections,
| showcasing the varying weights as different thicknesses and
| colors of the connecting lines.
| extheat wrote:
| At 8x86B, looks like the largest open model yet by far. Would be
| interesting to hear how many tokens it's been trained on.
| Especially important for higher param models in order to
| efficiently utilize all those parameters.
| p1esk wrote:
| It's not 8x86B. Total number of parameters is 314B.
|
| Perhaps it's 8x39B to fit on a single 8xA100 (40GB) server?
| moffkalast wrote:
| Most likely it's a MoE of Grok-0 which would be 8x33B + 50B
| for the router.
| cma wrote:
| Active parameters is 86B, so wouldn't that be the size of the
| largest two experts (where they may all be the same) + the
| weights of the selector?
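Taken at face value, the announced totals pin down a unique split, assuming one shared (attention/router) block plus 8 equal-size experts with 2 active per token. This is a hypothetical decomposition, since xAI hasn't published the breakdown, but the arithmetic is simple:

```python
# Back-of-envelope split implied by the announced numbers, assuming a
# shared (attention/router) block plus 8 equal experts, 2 active:
#   shared + 8 * expert = 314  (total parameters, billions)
#   shared + 2 * expert = 86   (active parameters, billions)
total_b, active_b = 314, 86
n_experts, n_active = 8, 2

expert_b = (total_b - active_b) / (n_experts - n_active)  # 38.0
shared_b = total_b - n_experts * expert_b                 # 10.0

print(f"each expert ~{expert_b:.0f}B, shared ~{shared_b:.0f}B")
```

Which lands close to the 8x39B guess upthread.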
| swalsh wrote:
| Considering how poor it is compared to other models, it really
| emphasises how important fine tuning is. Models with MUCH
| smaller parameter counts are outperforming it in many metrics.
| lukan wrote:
| "it really emphasises how important fine tuning is"
|
| Or rather the quality of the training data?
| fragmede wrote:
| that's a subtle dig at the fact that they have all of
| Twitter as a training corpus to use, but we don't know how
| they weight tweets. which, we know they're not gonna be
| weighted evenly.
| rezonant wrote:
| I'm sure just like in X's algorithms, @elon tweets are
| weighted heavily.
| convery wrote:
| The X algorithm is also opensource, so you can verify
| before commenting..
| fragmede wrote:
| just because they open sourced it doesn't mean that's
| actually what they're running on it though
| chrisco255 wrote:
| It's not like he needs boosting, he was one of Twitter's
| top followed accounts long before he bought them. He's
| pretty good at getting attention.
| lukan wrote:
| No idea about the current state, but the open sourcing
| did show they were favoring Elon:
|
| https://mashable.com/article/twitter-releases-algorithm-
| show...
|
| And personally I never used Twitter much, but I certainly
| did not follow Elon Musk when I did - yet I had to see
| lots of his posts in my feed. Surely just coincidence.
| llm_trw wrote:
| We don't know since no one is releasing their data.
|
| Calling these models open source is like calling a binary
| open source because you can download it.
|
| Which in this day and age isn't far from where we're at.
| DreamGen wrote:
| A big distinction is that you can build on top of (fine-
| tune) these released models almost as well as if they had
| released the pre-training data.
| llm_trw wrote:
| You can also build on top of binaries if you use gotos
| and machine code.
| drexlspivey wrote:
| Their data is the twitter corpus which is public. Or do
| you want a dump of their database for free too?
| swalsh wrote:
| We should just call it open weight models at this point.
| GaggiX wrote:
| Or even how much it was trained on this dataset, the amount
| of FLOPs.
| lairv wrote:
| I would say it emphasises that training a good model is more
| than throwing random data and compute
| hubraumhugo wrote:
| When will we reach an upper limit/diminishing returns in terms of
| number of parameters and mixture of experts?
| andy99 wrote:
| We may have already - data is more important than anything else
| which is why nobody has beaten GPT-4 yet. Throwing more parameters
| or more compute at the problem only gets you so far. But Grok
| was never a contender so there is room to improve on it. It is
| one of the biggest models open sourced as mentioned, so will be
| interesting to take a look at for sure.
| lambdaba wrote:
| Claude 3 has *decisively* beaten GPT-4; I wonder how all their
| attributes compare.
| stainablesteel wrote:
| I like some of Claude's answers better, but it doesn't seem
| to be a better coder imo
| simonw wrote:
| I've found it to be significantly better for code than
| GPT-4 - I've had multiple examples where the GPT-4
| solution contained bugs but the Claude 3 Opus solution
| was exactly what I wanted. One recent example:
| https://fedi.simonwillison.net/@simon/112057299607427949
|
| How well models work varies wildly according to your
| personal prompting style though - it's possible I just
| have a prompting style which happens to work better with
| Claude 3.
| bugglebeetle wrote:
| What is your code prompting style for Claude? I've tried
| to repurpose some of my GPT-4 ones for Claude and have
| noticed some degradation. I use the "Act as a software
| developer/write a spec/implement step-by-step" CoT style.
| simonw wrote:
| Almost impossible to describe prompting style, but here
| are some examples of how I've used Claude 3:
|
| https://gist.github.com/simonw/4cecde4a729f4da0b5059b50c8
| e01... - writing a Python function
|
| https://gist.github.com/simonw/408fcf28e9fc6bb2233aae694f
| 8cd... - most sophisticated example, building a
| JavaScript command palette
|
| https://gist.github.com/simonw/2002e2b56a97053bd9302a34e0
| b83... - asking it to refactor some existing code
|
| I don't use the "Act as a X" format any more, I'm not at
| all convinced it has a noticeable impact on quality. I
| think it's yet another example of LLM superstition.
| lgas wrote:
| > I don't use the "Act as a X" format any more, I'm not
| at all convinced it has a noticeable impact on quality. I
| think it's yet another example of LLM superstition.
|
| It's very contextually dependent. You really have to test
| things like this for your specific task, with your
| specific model, etc. Sometimes it helps, sometimes it
| hurts, and sometimes it does nothing at all.
| bugglebeetle wrote:
| Super helpful! Thanks!
| furyofantares wrote:
| I didn't know people were still doing this "act as etc
| etc" instructional prompting.
|
| I just tell it my coding problem. Or when making
| something from scratch, ask for small things and
| incrementally add.
| asciii wrote:
| > according to your personal prompting style though
|
| I like the notion of someone's personal prompting style
| (seems like a proxy for those that can prepare a question
| with context about the other's knowledge) - that's
| interesting for these systems in future job interviews
| furyofantares wrote:
| I've found it significantly better than GPT4 for code and
| it's become my go-to for coding.
|
| That's actually saying something, because there's also
| serious drawbacks.
|
| - Feels a little slower. Might just be UI
|
| - I have a lot of experience prompting GPT4
|
| - I don't like using it for non-code because it gives me
| too much "safety" pushback
|
| - No custom instructions. ChatGPT knows I use macos and
| zsh and a few other preferences that I'd rather not have
| to type into my queries frequently
|
| I find all of the above kind of annoying and I don't like
| having two different LLMs I go to daily. But I mention it
| because it's a fairly significant hurdle it had to
| overcome to become the main thing I use for coding! There
| were a number of things where I gave up on GPT then went
| to Claude and it did great; never had the reverse
| experience so far and overall just feels like I've had
| noticeably better responses.
| htrp wrote:
| citation needed (other than 'vibes')
| swalsh wrote:
| I don't know if Claude is "smarter" in any significant way.
| But its harder working. I can ask it for some code, and I
| never get a placeholder. It dutifully gives me the code I
| need.
| lambdaba wrote:
| It understands instructions better, it's rarer to have it
| misunderstand, and I have to be less careful with
| prompting.
| orbital-decay wrote:
| Has it, though? LMSys Arena Leaderboard (blind ranking by
| humans) [0] positions Opus just below GPT-4 with a
| negligible ELO gap.
|
| [0] https://chat.lmsys.org/
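For context on what a "negligible Elo gap" means, the standard Elo expected-score formula converts a rating difference into a head-to-head win probability. The gaps used below are illustrative, not the leaderboard's actual numbers:

```python
# Standard Elo expected-score formula: a rating gap of d points maps to
# a win probability of 1 / (1 + 10^(-d/400)). Gaps here are illustrative.
def win_prob(rating_diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

print(win_prob(0))             # 0.5 -- even match
print(round(win_prob(10), 3))  # ~10-point gap: barely above a coin flip
print(round(win_prob(100), 3)) # a 100-point gap: roughly 64/36
```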
| espadrine wrote:
| A number of AI companies have a naming/reproducibility
| issue.
|
| GPT4 Turbo, released last November, is a separate version
| that is much better than GPT-4 (winning 70% of human
| preferences in blind tests), released in March 2023.
|
| Claude 3 Opus beats release-day GPT-4 (winning 60% of
| human preferences), but not GPT-4 Turbo.
|
| In the LMSys leaderboard, release-day GPT-4 is labeled
| gpt-4-0314, and GPT4 Turbo is labeled gpt-4-1106-preview.
| squigz wrote:
| I think Groq is something else?
| LorenDB wrote:
| Indeed, Groq is a company building inference accelerators.
| Grok is completely unaffiliated.
| andy99 wrote:
| Edited, I did mean the Grok in the article not the
| inference chip.
| YetAnotherNick wrote:
| There is no reason to believe GPT-4 had more (or higher
| quality) data than Google etc. has now. GPT-4 was entirely
| trained before the Microsoft deal. If OpenAI could pay to
| acquire data in 2023, >10 companies could acquire similar-
| quality data by now, yet no one has a similar-quality model
| a year later.
| austhrow743 wrote:
| The more disregard a company has for intellectual property
| rights, the more data they can use.
|
| Google had far more to lose from a "copyright? lol"
| approach than OpenAI did.
| brookst wrote:
| I was under the impression training was at best an
| undefined area of IP law. Is there any aspect of
| copyright that prohibits training models?
| simonw wrote:
| This is being tested by a number of lawsuits right now,
| most notably the NY Times one: https://nytco-
| assets.nytimes.com/2023/12/NYT_Complaint_Dec20...
|
| The key questions are around "fair use". Part of the US
| doctrine of fair use is "the effect of the use upon the
| potential market for or value of the copyrighted work" -
| so one big question here is whether a model has a
| negative impact on the market for the copyrighted work it
| was trained on.
| sroussey wrote:
| I don't think the New York Times thing is as much about
| training as it is about the fact that ChatGPT can use
| Bing, and Bing has access to New York Times articles for
| search purposes.
| simonw wrote:
| If you read the lawsuit it's absolutely about training.
| The Bing RAG piece is one of the complaints in there but
| it's by no means the most important.
|
| Take a look at https://nytco-
| assets.nytimes.com/2023/12/NYT_Complaint_Dec20... -
| bullet points 2 and 4 on pages 2/3 are about training
| data. Bullet point 5 is the Bing RAG thing.
| ldjkfkdsjnv wrote:
| Claude > GPT4. Anyone using these models on a daily basis
| knows this
| jstummbillig wrote:
| It is known
| nylonstrung wrote:
| For what reason would you want to use this instead of open source
| alternatives like Mistral
| rvnx wrote:
| Mistral opened their weights only for very small LLaMA-like
| model.
| MallocVoidstar wrote:
| I'm pretty sure Mixtral outperforms Grok-1 and uses much less
| memory to do it
| elfbargpt wrote:
| I'm a little out of touch, is there a way to see how Grok
| measures up to other models?
| amrrs wrote:
| Benchmarks here https://x.ai/blog/grok
| refulgentis wrote:
| And to compare, you can sort by MMLU on here: https://hug
| gingface.co/spaces/HuggingFaceH4/open_llm_leaderb....
|
| Edit: to include my self-summary after review: there are a
| good 100 models better than it, a couple of 1x7B even.
| Mixtral stomps it; half the Mixtral variants are universally
| better, and one is close to the same.
| lossolo wrote:
| This benchmark is mostly worthless, some of the top
| models there were trained on benchmark data, which is a
| known fact in the community.
|
| The only reliable benchmark:
| https://huggingface.co/spaces/lmsys/chatbot-arena-
| leaderboar...
| cavisne wrote:
| One of the interesting things when weights are open sourced
| is the community can often improve the results. See all the
| bugs fixed in Gemma for an example.
| verticalscaler wrote:
| Well if nothing else, this one might be significantly less
| nerfed. Very interesting to compare to the others.
| refulgentis wrote:
| It's not, and I mean it, specifically in Grok's case.
|
| Generally, it's a boring boneheaded talking point that the 1%
| of us actually working in AI use as a sorting hat for who
| else is.
| renewiltord wrote:
| The safety crap makes the tools unusable. I used to have a
| test for it that I thought was decent, but Claude failed
| that test and it is way better than ChatGPT-4 for code,
| which means my test was bogus. The people actually working
| in AI are kind of irrelevant to me. It's whether or not the
| model will solve problems for me reliably.
|
| People "actually working in AI" have all sorts of nonsense
| takes.
| benreesman wrote:
| Another day, another fairly good comment going grey on an
| AI #1. The over-alignment _is_ really starting to be the
| dominant term in model utility, Opus and even Sonnet
| _are_ both subjectively and on certain coding metrics
| outperforming both the 1106-preview and 0125-preview on
| many coding tasks, and we _are_ seeing an ever-escalating
| set of kinda ridiculous hot takes from people with the
| credentials to know better.
|
| Please stop karma bombing comments saying reasonable
| things on important topics. The parent is maybe a little
| spicy, but the GP bought a ticket to that and plenty
| more.
|
| edit: fixed typo.
| benreesman wrote:
| I've been known to get snippy on HN from time to time
| myself :) So please know that I'm only offering a gentle
| nudge that I'd want from a fellow long-timer myself
| regarding a line of discussion that's liable to age poorly.
|
| Talking about sorting hats for those who do and don't have
| the one-percenter AI badge isn't a super hot look my guy
| (and I've veered dangerously close to that sort of thing
| myself, this is painful experience talking): while there is
| no shortage of uninformed editorializing about fairly
| cutting edge stuff, the image of a small cabal of robed
| insiders chucking in their cashews while swiping left and
| right on who gets to be part of the discussion serves
| neither experts nor their employers nor enthusiastic
| laypeople. This is _especially_ true for "alignment" stuff,
| which is probably the single most electrified rail in the
| whole discussion.
|
| And as a Google employee in the diffuser game by way of
| color theory, you guys have a "days since we over-aligned
| an image generation model right into a PR catastrophe" sign
| on the wall in the micro kitchen right? That looked
| "control vector" whacky, not DPO with pretty extreme
| negative prompt whacky, and substantially undermined the
| public's trust in the secretive mega labs.
|
| So as one long-time HN user and FAANG ML person to another,
| maybe ixnay with the atekeepinggay on the contentious AI #1
| thread a bit?
| rvnx wrote:
| One subtle thing: Musk said "open-source", we got "open-weights"
| instead (still better than nothing though, so it's greatly
| appreciated).
| paulgb wrote:
| Dumb question: what should open-source mean in the context of
| something like this? Open access to the training data and
| training pipeline as well?
| CharlesW wrote:
| It's not a dumb question, and the answer is "yes".
| zeroCalories wrote:
| Come on, that's not reasonable to expect from a company, or
| useful for indie hackers. Having weights that can be used
| however you like is enough for most people, even large
| companies.
| schoen wrote:
| Maybe it should be called something else? "Openly-
| licensed"?
|
| Just because the model weights are not really "source"
| (either as a matter of intuition or for example following
| the OSI "preferred form in which a programmer would
| modify the program" definition).
| simonw wrote:
| A big catch here is that you can't slap an open source
| license on a bunch of copyrighted training data, and to
| date no-one has created a truly convincing LLM exclusively
| trained on public domain data. It might happen soon though
| - there are some convincing efforts in progress.
| CharlesW wrote:
| Absolutely, because it's trained mostly on unlicensed,
| copyrighted content, they basically can't release source.
| gfodor wrote:
| Many people think these companies are training on
| unlicensed data but I think OpenAI licenses their data,
| they just "license" it the way one would need to in order
| to read it.
| CharlesW wrote:
| > _...I think OpenAI licenses their data..._
|
| They've just started to (in response to lawsuits, it must
| be noted) and in the meantime, they're simultaneously
| claiming that (1) what they're doing is fair use (a.k.a.
| fair dealing) and (2) preparing for the day when courts
| confirm that it isn't.
| zer00eyz wrote:
| You all keep using the word "Data"
|
| Data, as in facts, as in the frequency of one word in
| relation to another.
|
| "Copyright does not protect facts, ideas, systems, or
| methods of operation, although it may protect the way
| these things are expressed..." FROM:
| https://www.copyright.gov/help/faq/faq-protect.html
|
| It's not a question of if, rather when the cat gets out
| of the bag and the legal battle starts. The problem is
| that all the copyright applies to the expression not the
| factual information it expresses (in this case word
| relations). Now "how math works" and "the language of the
| law" are going to make for an interesting court case. I
| suspect that math wins here but it depends on what judge
| gets it and how high it goes.
| nabakin wrote:
| Agreed. It's ridiculous that people have to resort to saying
| their question is dumb to avoid being attacked by toxic
| commenters.
| Q6T46nT668w6i3m wrote:
| Yes, training and evaluation code, i.e., the code used to
| generate the weights.
| TaylorAlexander wrote:
| The Open Source Initiative is actively working on this over
| the course of this year, and your input will help define that
| meaning! Please see here for more:
|
| https://opensource.org/blog/open-source-ai-definition-
| weekly...
| TaylorAlexander wrote:
| Yeah musk said "all design and engineering for the original
| roadster is now open source" and actually what we got was a few
| PCB files and zero mechanical design files so I don't ever
| trust what he says.
| tylerekahn wrote:
| This is the weights and the model under Apache 2.0 license.
| What do you mean by open-source?
|
| https://github.com/xai-org/grok/blob/main/model.py
|
| https://github.com/xai-org/grok/blob/main/run.py#L25
| pclmulqdq wrote:
| Still better than most of the "open weights" models that have
| massively restrictive terms.
| solarkraft wrote:
| He also called permissively licensing Tesla's patents "open
| sourcing" them. He's at the forefront of misusing the term.
| gardenhedge wrote:
| > Due to the large size of the model (314B parameters), a machine
| with enough GPU memory is required to test the model with the
| example code
|
| What type of machine do you need to play around with this?
| anigbrowl wrote:
| 'Chunky beast, needs 320 Gb VRAM likely 4 bit, likely is being
| run 8 bit on 8 x 80 Gb GPUs.'
|
| -Emad
| 317070 wrote:
| Probably a machine with about 628 GB of GPU memory. (2 bytes
| per parameter)
|
| So 8xH100 (80GB each) should do it.
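The estimate above follows directly from parameter count times bytes per parameter. A quick sketch of the arithmetic at a few common precisions (ignoring activation memory and other runtime overhead):

```python
# Rough VRAM needed just to hold Grok-1's 314B parameters.
params = 314e9

def vram_gb(bits_per_param: float) -> float:
    return params * bits_per_param / 8 / 1e9

print(f"fp16/bf16: {vram_gb(16):.0f} GB")  # ~628 GB -> 8x H100 80GB
print(f"int8:      {vram_gb(8):.0f} GB")   # ~314 GB
print(f"int4:      {vram_gb(4):.0f} GB")   # ~157 GB
```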
| pogue wrote:
| Can someone explain why the weights are posted via a Bittorrent
| magnet link? I have no way to check the size at the moment, but
| isn't that a bit unusual? There's also only 21 seeders right now
| according to https://checker.openwebtorrent.com/
| lambdaba wrote:
| Why not? Mistral was first to do it, it has become tradition.
| gillesjacobs wrote:
| I believe it was Llama 1 that notoriously got leaked with a
| torrent on 4chan.
| orlp wrote:
| BitTorrent is just an objectively superior method of
| delivering a lot of data to a lot of people.
| pooloo wrote:
| It's likely over 100GB of data, so I wouldn't say it's
| necessarily unusual to spread out the bandwidth across multiple
| hosts.
| pogue wrote:
| Thanks! I searched and searched for a tool that would show me
| info via the web about a magnet link but nada
| CamperBob2 wrote:
| How else could/should it be done?
| pogue wrote:
| I would have assumed they could just upload it to Github. If
| it has restrictions on file size I'm sure they could make
| multiple part compressed files.
|
| Torrents can unfortunately die after a period of time if no
| one continues seeding it or if they don't use a permanent web
| based seeder, which doesn't appear to be the case.
| cedws wrote:
| GitHub may choose to throttle downloads or remove the files
| simply because they're taking up too much bandwidth.
|
| A torrent is less likely to go down in the short term.
| xcv123 wrote:
| This is not some crappy DVD rip on The Pirate Bay. It will
| be seeded as long as it's relevant.
|
| Twitter/X has their own massive infrastructure and
| bandwidth to seed this indefinitely.
| KomoD wrote:
| Yeah, they can just leave some server running somewhere
| and just let it seed forever
| larrysalibra wrote:
| The great thing about torrents is that you (or anyone else
| who cares) can single-handedly solve the problem you're
| complaining about by seeding the torrent.
| simonw wrote:
| GitHub have a soft repository size limit of 5GB, documented
| here: https://docs.github.com/en/repositories/working-with-
| files/m...
|
| Soft size limit means "If your repository excessively
| impacts our infrastructure, you might receive an email from
| GitHub Support asking you to take corrective action." - I
| know people who have received such emails.
|
| Most model releases happen through Hugging Face which does
| not have such a size limit.
| KomoD wrote:
| They'd probably just charge you for it. They sell "data
| packs" for LFS.
|
| https://docs.github.com/billing/managing-billing-for-git-
| lar...
| rezonant wrote:
| I'd bet Hugging Face would be happy to have hosted these
| canonically too, so not sure why that doesn't happen
| more.
| osanseviero wrote:
| The model is also at https://huggingface.co/xai-org
| sashank_1509 wrote:
| No, git would be impossible. I've rarely seen a repo even a
| few GB in size; if you are uploading non-code files you
| really should not be using git. Git is version management
| software for code. I often see repos with images and even
| videos checked in - please don't, there are far better and
| more performant solutions out there.
|
| The other approach would be to use AWS S3 or other cloud
| providers, which would cost them money every time someone
| downloads the weights - not something they should have to
| pay for when they are releasing them for free. Torrents
| seem like the only good solution, unless someone hosts
| this on the cloud for free for everyone.
| sroussey wrote:
| Huggingface will disagree with impossible as their models
| are available via git, sometimes broken up in pth files.
|
| Still, as far as sentiment goes, yeah git for model
| weights is an impedance mismatch for sure!
| rezonant wrote:
| > No git would be impossible. I've never seen a repo even
| a few GB in size, if you are uploading non code files you
| really should not be using git
|
| It's not actually a limitation in git itself, especially
| if you use Git LFS. People use Git for Unreal projects
| and big ones can be half a terabyte or more in size.
| rezonant wrote:
| Others have pointed out that GitHub doesn't allow that, but
|
| > Torrents can unfortunately die after a period of time if
| no one continues seeding it or if they don't use a
| permanent web based seeder, which doesn't appear to be the
| case.
|
| So too can web links, especially when they are 300 GB and
| egressing out of AWS at $0.09/GB or worse (in non-US
| regions). Each full download would cost $27 at that rate.
| 10,000 downloads would cost $270,000.
|
| Sure you could go for something with a better cost model
| like R2, but you can't beat using one or two unmetered
| connections on a VPN to constantly seed on Bittorrent, your
| pricing would be effectively free and reliability would be
| higher than if you just exposed a HTTP server on the
| Internet in such a way.
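The cost figures in the comment above follow from simple multiplication, assuming the quoted $0.09/GB standard AWS egress rate:

```python
# Egress cost arithmetic: 300 GB per full download at $0.09/GB.
size_gb = 300
price_per_gb = 0.09

per_download = size_gb * price_per_gb  # $27.00 per full download
for n in (1, 10_000):
    print(f"{n:>6} downloads: ${n * per_download:,.2f}")
```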
| MallocVoidstar wrote:
| Distributing 300GB via torrent is cheaper than direct, assuming
| even a few other people seed
| monkin wrote:
| It's 318.24G
|
| https://academictorrents.com/details/5f96d43576e3d386c9ba65b...
| bongodongobob wrote:
| I'm not sure why you wouldn't tbh. That's a lot of bandwidth.
| jiripospisil wrote:
| I don't understand why you're being downvoted for asking a
| legitimate question. People not familiar with model weights
| might be surprised that they are often in tens of gigabytes and
| in this case even more.
| fzzzy wrote:
| It may become a tradition since weights are so large. Perhaps
| it started when the Llama torrent link leaked. Then, Mistral
| decided to release their weights using bittorrent.
| leumon wrote:
| Mistral did it too when they released their first open model.
| They just posted a magnet link on Twitter.
| raydev wrote:
| Spreads the burden/cost of distributing a 300+GB file.
| bbor wrote:
| Honestly the most interesting part is taking a peek at the kind
| of AI researcher working for Twitter after the objectively messy
| layoffs and subsequent crunch. I notice neither of them has
| Twitter mentioned on their GitHub, which is prolly for the best
| to avoid harassment lol.
|
| Code wise, excited to see if this could grow into anything! I
| think it's pretty clear that Grok didn't have nearly enough
| investment to be a top model so Elon "sacrificed" it on a whim in
| his schoolyard spat with OpenAI, but I'm not complaining. I've
| always taken Elon at his word that he truly _is_ worried about
| centralization of AI, and none of the emails released by his
| schoolmate Altman dissuade me from that. So I have
| some reasonable hope that he uses some of his immense resources
| to start "fighting the good fight" here with Le Cun
| cma wrote:
| >taking a peek at the kind of AI researcher working for Twitter
|
| He made a separate company for this.
| paxys wrote:
| Neither of them works at Twitter. xAI is a separate company,
| and only uses Twitter's data to train.
| mattxxx wrote:
| I respect the openness here! This is the future that I want to
| see
| giancarlostoro wrote:
| Fully agree. People will trash talk it due to Musk, but let's not
| forget the engineers who poured hours of their lives into
| building this and are continuing to do so.
| knowsuchagency wrote:
| The engineers who decided to work for him? Forgive me if I do
| forget about them and the hours of their lives spent on this
| lynndotpy wrote:
| Engineers who joined Twitter pre-Musk days who live and
| work in the US on an H1-B visa can't just quit.
|
| You can criticize Elon Musk without criticizing people who
| would have their lives upended if they quit or were fired.
| throw2022110401 wrote:
| That grace period has long passed. If you are still there
| at this point you are complicit.
| cap1434 wrote:
| Complicit in what exactly?
| devin wrote:
| I still reserve the right to trash talk Musk as I don't
| believe he is committed to openness as much as he wants to
| spite OpenAI for telling him to pound sand.
| llm_trw wrote:
| What's the difference?
|
| >Oh no, I only want _pure_ intentions for anything I use.
| Which is why I reject all for profit medicine.
|
| It doesn't matter why he did it. What matters is that he
| did it.
| devin wrote:
| It matters to me why people do things. I'm happy it's
| open, but it doesn't change my mind about the guy.
| llm_trw wrote:
| What an exhausting way to live.
| afavour wrote:
| Were they not paid to do so?
| revscat wrote:
| I feel the same about Tesla. They make good cars that are
| helping to get us off of oil. They have thousands of
| employees.
|
| And who among us has a CEO that isn't problematic, even if
| not so much so as Musk?
| hobobaggins wrote:
| Tesla is likely making good cars _because_ the CEO is
| 'problematic'
| trog wrote:
| Is it open if it doesn't include the training data? Genuine
| question - I am not familiar enough with the terms and
| technology to know. But my understanding is the weights are just
| a more or less static collection of data that has been (to
| paraphrase Ted Chiang) lossily compressed from the actual raw
| training data.
|
| Without the training data to thoroughly evaluate what is in
| there, the only way you can figure it out is through
| experimentation - e.g. running it up in a chatbot and asking it
| questions.
|
| Is this roughly correct or am I misunderstanding what you can
| do with the weights?
| 2devnull wrote:
| From issues: "Well the magnet file contains a 300GB checkpoint "
|
| That's why they are using a torrent I suppose.
| moralestapia wrote:
| Well, he delivered.
| paxys wrote:
| Partially. Open weights is not open source.
| gfodor wrote:
| In machine learning models the term open source has been
| largely accepted to mean sharing weights and, if necessary,
| inference code. You can argue if this is an abuse of the term
| but everyone does it, and saying someone didn't deliver when
| they published weights would probably mean saying
| the same about Mistral, Meta, etc.
| stale2002 wrote:
| Hey, asking any experts here, what are their first thoughts in
| the significance of this?
|
| IE, is this comparable to any other model released, or are there
| significant metric differences that make it better for certain
| usecases?
|
| The only thing I see, off the top of my head, is that it is a very
| large model, and I don't think any models of similar size have
| been released.
| Me1000 wrote:
| Not an expert by any means, but I like learning about this
| stuff and I play with a lot of open weight models.
|
| I'd say the significance is that it happened. It's by far the
| largest open weight model I've seen. But I'm not sure why you'd
| use it over a model like Mixtral, which seems to perform about
| the same at like 1/6th the size.
|
| But I welcome any contribution to the open weight LLM
| community. Hopefully people will learn something interesting
| with this model. And I hope they keep releasing new versions!
| MichaelRazum wrote:
| If I may ask, how do you load such big models? 300gb seems
| like a lot to play around with.
| Me1000 wrote:
| You're right, this model is going to be too big for most
| people to play around with. But to answer your question, I
| have 128GB of RAM in my M3 MacBook Pro, so I can use most
| of that for GPU inferencing. But still, this model is going
| to need to be heavily quantized for me to be able to use
| it. (fwiw, I probably won't try this one)
|
| In the next week or two I expect we'll see a GGUF version
| of the weights (might need to wait for a patch to llama.cpp
| first), and someone will release super small quantizations
| of it. I suspect my computer might be able to run a 3 bit
| quant, but it might need to go down to 2 bits to have any
| kind of reasonable context length. But with quants that
| small I'd expect the model's performance to degrade well
| below that of Mixtral, so it probably isn't really even
| worth using. But we'll see; quantization is weird, some
| models perform better than others when quantized.
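The memory math behind the comment above is easy to sketch. This is a rough back-of-envelope estimate only: it uses the 314B parameter count from the announcement and ignores KV cache, activations, and quantization block metadata, so real requirements are somewhat higher.

```python
# Rough weight-memory estimate for Grok-1 at various quantization widths.
# 314e9 parameters is taken from xAI's announcement; overheads such as the
# KV cache, activations, and per-block quantization metadata are ignored.
PARAMS = 314e9

def weight_gib(bits_per_weight: float) -> float:
    """Approximate size in GiB of the weights alone."""
    return PARAMS * bits_per_weight / 8 / 2**30

for bits in (16, 8, 4, 3, 2):
    print(f"{bits}-bit: ~{weight_gib(bits):,.0f} GiB")
```

At 3 bits the weights alone come out around 110 GiB, which is why a 128GB machine is borderline once you add any meaningful context.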
| MichaelRazum wrote:
| Thanks a lot for the hint :)! It's awesome that it might
| run even on a MacBook; actually, this is a reason to
| switch to Mac. It seems there is nothing similar for a PC
| laptop with Linux or Windows.
| Me1000 wrote:
| No problem. I hope more people try these things out, it's
| the best way to push the industry forward! We can't let
| the researchers have all the fun.
|
| Apple had plenty of reasons to move forward with their
| Apple Silicon CPUs and GPUs in the mac, but they really
| did seem to get lucky with the unified memory
| architecture. It was kind of just an artifact of their
| design, but ends up serving the needs of deep neural net
| models really well!
| simonw wrote:
| Is there a model card anywhere? I'd like to know what it was
| trained on.
| LZ_Khan wrote:
| What has people's experience with this model been? Having the most
| weights is one thing but being a better model than the 70B models
| is another.
| labrador wrote:
| tbh, I've never seen anyone share anything interesting produced
| by Grok. I see plenty of posts on X and reddit of people
| sharing amazing things that GPT-4 and now Claude 3 Opus can do.
| Grok can roast people. That's pretty much all I've seen.
|
| I'd love to be proven wrong if someone cares to share something
| interesting produced by Grok.
| swalsh wrote:
| I use grok all the time to find tweets or ask about trends on
| Twitter. For that it's better than what used to exist. But it's
| not a great model outside that narrow use case.
| arduanika wrote:
| CODE_OF_CONDUCT.md has only five words. :)
| schappim wrote:
| "Be excellent to each other."
| troupo wrote:
| Which is ironic given Musk's own behaviour and how he wants
| Grok to work
| TMWNN wrote:
| >Which is ironic given Musk's own behaviour
|
| You mean, like immediately responding to Ukraine's plea for
| Starlink and funding it on his own for months? So much so,
| that in February 2023 its government called Musk "one of
| the biggest private donors of our future victory"?
| <https://www.pravda.com.ua/eng/news/2023/02/9/7388696/>
| troupo wrote:
| Like being a general asshole to people, setting Grok to
| be a bit of an asshole, pushing increasingly unhinged
| conspiracy theories. As for Ukraine he now basically
| retransmits Russian talking points non-stop.
|
| Edit: Starlink, while insanely useful to Ukraine, was a
| PR stunt he almost immediately wanted to weasel out of.
| Even after it turned out that the US government ends up
| paying for it.
| arandomusername wrote:
| > pushing increasingly unhinged conspiracy theories
|
| like what?
| josh-sematic wrote:
| They're from "Bill and Ted's Excellent Adventure"
| bheadmaster wrote:
| I was hoping it would be "do not be an asshole", but I guess
| this is fine too.
| kergonath wrote:
| It would be finer if it were not so hypocritical, coming from
| a company led by Elon Musk. As things are, it's just like
| any pledge Google can make about the greater good: a sad
| joke.
| marginalia_nu wrote:
| My favorite is SQLite's code of ~~conduct~~ ethics:
| https://sqlite.org/codeofethics.html
| TwentyPosts wrote:
| Huh. What's the backstory here?
| weberer wrote:
| https://pjmedia.com/paula-bolyard/2018/10/24/tech-
| community-...
| machiaweliczny wrote:
| If they are so behind they could make it open source instead of
| open weights and get some help.
| nicce wrote:
| Fully open-source means also providing open access to their
| data sets? Which is the only valuable thing Twitter (X) has
| left.
| heyoni wrote:
| And the one thing they are vehemently protecting from
| scrapers and other entities. Even nitter threw in the towel.
| EastSmith wrote:
| > Which is the only valuable thing Twitter (X) has left.
|
| They have a very valuable user base (all kinds of world
| leaders for example), so the data is not the only valuable
| thing they have.
| sroussey wrote:
| That's actually more valuable. Twitter's short-form text
| is awful for training. Best to just exclude it.
|
| There are hundreds of millions of people on Twitter, and a
| few of them are very smart. I don't see how that helps here
| though.
| Takennickname wrote:
| It doesn't help here. But the person you're responding to
| is just pushing back against the "Elon destroyed Twitter
| and there's nothing left" narrative.
| xcv123 wrote:
| It's all open source. You can download the model and run it
| locally.
| paraboul wrote:
| Being free to use doesn't mean it ships with the original
| recipe.
| xcv123 wrote:
| What do you mean? The entire model is fully open source.
|
| The training methods are nothing secret, right? The
| architecture is well known.
|
| Expecting the entire training dataset to be fully open is
| delusional.
| simonw wrote:
| "Base model trained on a large amount of text data, not fine-
| tuned for any particular task."
|
| Presumably the version they've been previewing on Twitter is an
| instruction-tuned model which behaves quite differently from
| these raw weights.
| seccode wrote:
| It would be cool if these models had conversations with us where
| they ask questions. I think the future of AI is models that ask
| questions. There is so much data to be gained by doing this.
| swalsh wrote:
| That's just a matter of fine tuning
| seccode wrote:
| Do you have an example model I could try that does this?
| amrrs wrote:
| Try Pi by inflection. It asks a lot of questions.
| seccode wrote:
| I tried it, it just asked me how my day was going. I
| don't think this is doing exactly what I have in mind.
| But it's a step in that direction.
| ijustlovemath wrote:
| That "just" is doing some heavy lifting! GPT-4 is just a few
| matrix multiplications, how bad can their moat really be?
| BoorishBears wrote:
| Not sure what the snark here is for: It would be trivial to
| produce a dataset where the model asked you questions then
| fine-tune on that.
|
| People already do it with chain-of-thought and you could
| get away with a few dozen examples if you wanted to try
| this.
| swalsh wrote:
| I'd bet a synthetic data set could do the job effectively.
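As a sketch of what one record of such a synthetic dataset might look like: the chat-message schema below mirrors the format commonly used for chat fine-tuning, and the content is entirely invented for illustration.

```python
import json

# One hypothetical synthetic fine-tuning example: the assistant is shown
# responding to a vague request with clarifying questions rather than an
# immediate answer. Thousands of such examples would make up the dataset.
example = {
    "messages": [
        {"role": "user", "content": "Write me a workout plan."},
        {
            "role": "assistant",
            "content": (
                "Before I do, a few clarifying questions: how many days "
                "per week can you train? Do you have access to a gym? "
                "Any injuries I should work around?"
            ),
        },
    ]
}

# Datasets in this shape are usually stored one JSON object per line (JSONL).
print(json.dumps(example)[:60] + "...")
```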
| crowcroft wrote:
| OK, I'm curious, but I don't quite understand.
|
| What would you want an AI to be asking you, and what would you
| want it to do with your response(s)?
| seccode wrote:
| I get advertisements all the time for conditions that I do
| not have, and that none of my family members have. If you had
| a model that asked questions, it could learn my medical
| history and could direct better ads to me.
|
| In order for AI to understand the world, it would have to ask
| questions. Understanding humans is key to understanding the
| world.
| globular-toast wrote:
| Learn from them.
| BoorishBears wrote:
| I ask AI to produce clarifying questions then answer them.
|
| Can help in not wasting a bunch of time waiting for an answer
| that missed the mark.
|
| -
|
| I think the sibling comment is probably the least attractive
| reason to have AI ask questions.
| seccode wrote:
| I agree, medical history is probably not the sexiest reason
| to have AI ask questions. I think there are many more
| reasons; I think the Turing Test is the best metric to
| evaluate AIs, and current models come nowhere close. When
| people first meet they ask questions about their
| background. It would be nice if a model replicated that
| BoorishBears wrote:
| > and could direct better ads to me.
|
| Is the least attractive part, by far.
| seccode wrote:
| In order for an AI to pass a Turing Test, it would surely
| ask questions. Think of Ava from Ex Machina. She asked
| questions to learn more about him
| BoorishBears wrote:
| I'm not debating the value of questions. I'm debating the
| value of feeding it to advertisers, especially since LLMs
| can infer much deeper insights about a person than a
| traditional assistant can with its canned capabilities
| and responses
| lars_francke wrote:
| Clarifying questions if the initial prompt was unclear. I'd
| love it.
|
| I regularly try to add something along the lines of "please
| ask clarifying questions if you could only give a generic or
| partial response otherwise" but so far it has never helped
| (ChatGPT 4).
| Me1000 wrote:
| 100% agreed. Gemini advanced does this sometimes. I wrote about
| it more in an older thread here:
| https://news.ycombinator.com/item?id=39445484
| geor9e wrote:
| Explore this idea more - it's easily implemented in a minute or
| two via the system prompt. API accounts are free to start and
| you can use the playground/workbench view, like this:
| https://imgur.com/h5jFoBM.jpg . I like Claude but OpenAI is
| popular too. OpenAI has a nice way to create a gallery of
| system prompts that act however you like, they call them Agents
| or GPTs.
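A minimal sketch of the approach described above: put the "ask clarifying questions" behaviour into the system prompt and send it with every request. The message structure follows the common chat-completions shape; the prompt wording and helper name are invented, and the model/provider choice is left open.

```python
# Hypothetical sketch: steer a chat model toward asking clarifying
# questions via the system prompt, rather than fine-tuning.
SYSTEM_PROMPT = (
    "Before answering, ask up to three clarifying questions whenever the "
    "request is ambiguous or underspecified. Only give a full answer once "
    "the user has replied."
)

def build_messages(user_prompt: str) -> list[dict]:
    """Assemble a chat request that carries the question-asking behaviour."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]

msgs = build_messages("Help me plan a trip.")
print(msgs[0]["role"])  # → system
```

The resulting list is what you would pass as the `messages` argument to whichever chat-completions API you use.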
| littlestymaar wrote:
| How long before the _Groq_ team sues for trademark violation?
| It's literally the purpose of trademark laws to ensure that
| similar names do not cause confusion in the minds of customers,
| so it would be very surprising to see this situation persist.
| nostrebored wrote:
| Would be a rough trademark enforcement case as "Grok" has been
| in common language for decades
| Angostura wrote:
| Robert A. Heinlein coined the term grok in 1961
| a1369209993 wrote:
| Six is plural.
| ben_w wrote:
| So has "Apple" and "Windows".
|
| Grok and groq both relate to AI, so there's definitely
| grounds to believe the names may cause consumer confusion.
|
| After all, Apple (computers) was repeatedly sued by Apple
| (records) for doing music things.
| cma wrote:
| It's easier to get a trademark on an altered word than a
| plain dictionary word. Just acquiring the easier one to
| acquire doesn't mean you now have rights over the harder
| one to acquire, though eventually after enough market
| recognition you might be given some control over other
| people using the common one. I wouldn't think groq is there
| yet.
| cavisne wrote:
| They already have.
| EastSmith wrote:
| There is a friendly warning here from Groq:
| https://wow.groq.com/hey-elon-its-time-to-cease-de-grok/
| bhaney wrote:
| Is it safe to say, 4 months later, that Elon is ignoring
| this? I assume there hasn't been any kind of response or
| further action taken yet.
| orsenthil wrote:
| I am not sure what open source models are accomplishing other
| than killing the lead of the competition (OpenAI), only to give
| it to someone else who has expertise in the area of distribution.
| This will be yet another good addition to systems like Amazon
| BedRock.
| minimaxir wrote:
| Many of the recent innovations in both LLM architecture and
| inference were only made possible through open models such as
| Llama 2 and Mistral 7B as a starting point for iteration and
| refinement, which in turn backpropagates (heh) back to the LLMs
| developers.
|
| It's a win-win for everyone. That's the power of open source.
| geor9e wrote:
| Well, look at the history. Google had an insurmountable lead,
| so Elon started OpenAI. Now OpenAI has an insurmountable lead
| too. So everyone else is starting in third place, or lower.
| David versus two Goliaths. If you try to become a third
| Goliath, you'll probably just get smashed. You're later to the
| game. In this situation, going scorched earth becomes a viable
| strategy. Slay the Goliaths. Become a hero to the masses.
| Attract the world's best talent who don't want to be associated
| with proprietary models. At that point you have a world class
| AI business with momentum towards AGI. And even if you're
| giving away last year's technology for free, the team you built
| is churning out new ideas that could be a financial bonanza one
| day. Shareholders are willing to pay for a long-term bet if the
| story is good.
| andre-z wrote:
| The only other repository is a fork of Qdrant.
| captcanuk wrote:
| "The implementation of the MoE layer in this repository is not
| efficient. The implementation was chosen to avoid the need for
| custom kernels to validate the correctness of the model."
|
| Or perhaps release your actual code AND the simplified
| implementation instead of hiding it and saying "you don't know
| her, she goes to a different high school"
| gfodor wrote:
| Always love it when someone gives away a gift and it's not
| enough for people.
| redskyluan wrote:
| This doesn't seem to be a repo ready to be open sourced. You only
| get weights, with very little information about how they were
| trained and fine-tuned.
|
| But anyway, it's always great to see more LLM weights available.
| andrewstuart2 wrote:
| I would argue that there's no bar for open sourcing aside from
| "do you have the rights to do so." Some source or some public
| good is certainly better than none, and when the bar is low
| then you remove barriers to getting started, vs waiting until
| you have the time someday to "do it right."
| rezonant wrote:
| Well what constitutes an "open source" model is still
| controversial and debatable-- lots of people on both sides of
| that argument.
| modeless wrote:
| Is this the first major model to be natively FP8? I was wondering
| why people hadn't done it yet. Seems like a big win when hardware
| supports it.
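For context on the question above: the two common FP8 variants (E4M3 and E5M2, as standardized in the OCP 8-bit floating point spec) trade mantissa precision against dynamic range. The repo itself doesn't say which variant is used, so the constants below are just the spec's finite ranges as a sanity check.

```python
# Finite ranges of the two OCP FP8 formats, derived from their bit layouts.
# E4M3: 4 exponent bits (bias 7), 3 mantissa bits. Only the all-ones
# pattern S.1111.111 is NaN, so S.1111.110 = 1.75 * 2^8 is still finite.
e4m3_max = 1.75 * 2**8    # 448.0

# E5M2: 5 exponent bits (bias 15), 2 mantissa bits, IEEE-style Inf/NaN,
# so the largest finite value is 1.75 * 2^15.
e5m2_max = 1.75 * 2**15   # 57344.0

print(e4m3_max, e5m2_max)
```

The narrow E4M3 range (max 448) is why FP8 training typically needs per-tensor scaling; hardware like H100 supports both variants natively.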
___________________________________________________________________
(page generated 2024-03-17 23:00 UTC)