[HN Gopher] Grok
___________________________________________________________________
Grok
Author : pierre
Score : 1099 points
Date : 2024-03-17 19:33 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| tosh wrote:
 | blog post: https://x.ai/blog/grok-os
 |
 | * 314B parameters (86B active at a time)
 | * mixture of experts: 8 (2 active at a time)
 | * weights and architecture licensed under Apache 2.0
 |
 | (edit:) announcement blog post from last year with benchmarks
 | compared to Claude 2, GPT-3.5 and GPT-4: https://x.ai/blog/grok
 |
 | (edit2:) TL;DR: somewhat comparable to GPT-3.5, Mixtral and
 | Qwen-1.5-72B in capability, but way larger than the open-weight
 | models
| TOMDM wrote:
 | Mixtral is also comparable to GPT-3.5 and open.
|
| At 8x7B it's also a fraction of the size. Are there any
| benchmarks comparing Mixtral to Grok?
| tosh wrote:
| Mixtral announcement is here:
| https://mistral.ai/news/mixtral-of-experts/
|
 | Mixtral looks more economical in capability relative to size
 | (similar for Qwen 1.5 72B)
| OkGoDoIt wrote:
| Is a model so huge that's only at the level of GPT 3.5 actually
| good? That seems incredibly inefficient to me.
| cma wrote:
 | Since it is MoE, quantized it could run on cheaper
 | hardware with just consumer networking in between instead of
 | needing Epyc/Xeon levels of PCIe lanes, NVLink, or
| infiniband type networking. Or it could even run with people
| pooling smaller systems over slow internet links.
| drak0n1c wrote:
| It's designed to be actively searching real-time posts on X.
| Apples and oranges.
| grey8 wrote:
| Why is that relevant to the size?
|
| Post search on X is done as it is with any other data from
| any other source, you use RAG and function calling to
| insert the context.
|
| < 7B open source models can function call very well. In
| fact, Nous Hermes 2 Pro (7B) is benchmarking better at that
 | than GPT-3.5.
|
| Not related to the size, if I'm not mistaken.
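 |
 | For illustration, a minimal RAG-style sketch in Python (all
 | names here are hypothetical placeholders, not xAI's actual
 | pipeline):
 |
 |     def top_k_posts(query, posts, k=5):
 |         # toy relevance score: count shared words; a real
 |         # pipeline would use embeddings + a vector index
 |         q = set(query.lower().split())
 |         score = lambda p: len(q & set(p.lower().split()))
 |         return sorted(posts, key=score, reverse=True)[:k]
 |
 |     def answer_with_rag(query, posts, llm):
 |         # paste the retrieved posts into the prompt as context,
 |         # then hand the whole thing to any model callable
 |         context = "\n".join(
 |             "- " + p for p in top_k_posts(query, posts))
 |         prompt = ("Answer using only the posts below.\n"
 |                   + context + "\n\nQuestion: " + query)
 |         return llm(prompt)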
| hn_20591249 wrote:
| The data pipeline isn't included in this release, and we
| already know it is a pretty simple RAG pipeline using
| qdrant, https://twitter.com/qdrant_engine/status/1721097971
| 830260030.
|
 | Nothing about using data in "real time" requires that the
 | model parameters be this large, and it is likely quite
 | inefficient for their "non-woke" instructional use-case.
| lmeyerov wrote:
| Agreed. We have been building our real-time GPT flows for
 | news & social as part of Louie.AI, think monitoring and
| investigations... long-term, continuous training will
| become amazing, but for the next couple of years, most of
| our users would prefer GPT4 or Groq vs what's here and
| much smarter RAG. More strongly, the interesting part is
| how the RAG is done. Qdrant is cool but just a DB w a
| simple vector index, so nothing in Grok's release is tech
| we find relevant to our engine.
|
| Eg, there is a lot of noise in social data, and worse,
| misinfo/spam/etc, so we spend a lot of energy on
 | adversarial data integration. Likewise, queries are often
 | neurosymbolic, like on a date range or with
| inclusion/exclusion criteria. Pulling the top 20 most
| similar tweets to a query and running through a slow,
| dumb, & manipulated LLM would be a bad experience. We
| have been pulling in ideas from agents, knowledge graphs,
 | digital forensics & SNA, code synthesis, GNNs, etc for
| our roadmap, which feels quite different from what is
| being shown here.
|
| We do have pure LLM work, but more about fine-tuning
| smaller or smarter models, and we find that to be a tiny
| % of the part people care about. Ex: Spam classifications
| flowing into our RAG/KG pipelines or small model training
| is more important to us than it flowing into a big model
| training. Long-term, I do expect growing emphasis on the
| big models we use, but that is a more nuanced discussion.
|
| (We have been piloting w gov types and are preparing for
| next cohorts, in case useful on real problems for
| anyone.)
| pests wrote:
| Isn't that... the same thing as search?
| fwlr wrote:
| OpenAI is valued at 90 billion and all they do is make GPT;
| Twitter is valued at 40 billion and this was essentially a
| vanity side-project by a cowboy CEO. Presuming that
 | benchmarks and the general "it's about the level of 3.5" take
 | are accurate, it's inefficient, but not incredibly inefficient
| imho
| pelorat wrote:
| > Twitter is valued at 40 billion
|
 | WAS valued at 44B.
|
| Now?
|
| Maybe 5 billion.
| wongarsu wrote:
| Last I heard they lost 15% of their users, so let's call
| it 36 billion.
| mceachen wrote:
| More like $13b.
|
| https://arstechnica.com/tech-policy/2024/01/since-elon-
| musks...
| wraptile wrote:
| Twitter didn't have direct competitors other than
 | Mastodon when it was taken private at 44B. Now there's
 | Threads, Bluesky and a bigger Mastodon.
| jsight wrote:
| Honestly, none of those look like meaningful competitors
| at the moment.
| squigglydonut wrote:
| None of these matter
| dilyevsky wrote:
| They weren't even 44B when elon took the keys - he
| specifically tried to back out of the deal because 44B
| was insane peak '21 asset bubble price. In truth they
 | were probably like 10-15B at that moment. And now that a
 | bunch of advertisers have left due to you-know-who, it's
 | probably about 10B
| Lewton wrote:
| twitter was valued around 30 billion when musk tried
| getting out of buying it (then the market cap went up
| when it became clear that he would be forced to pay full
| price)
| alvah wrote:
 | LOL @ $5 billion, but if that was the valuation, you'd be
 | making the parent's point stronger.
| thekhatribharat wrote:
 | xAI is a separate entity, and not an X/Twitter subsidiary.
| xcv123 wrote:
| According to their benchmarks it is superior to GPT-3.5
| tootie wrote:
| How is it that OpenAI was touted like it was some massive
| years-long effort that blew all AI research out of the water
| and now we have so many competitors popping up one after
| another?
| ben_w wrote:
| Egg of Columbus.
|
| Also, the general architecture is well documented, ChatGPT
| (specifically the chat interface, not GPT-3, not InstructGPT)
 | is what made a lot of people _care_, and actually
| reproducing it requires someone wanting to in the first
| place.
| longdog wrote:
| You don't need to be a cutting edge research scientist to
| train a SOTA LLM. You just need money for scaling. OpenAI's
| "secret" was just their willingness to spend tens/hundreds of
| millions without guaranteed returns, and RLHF/instruct fine
| tuning, both of which are out of the bag now.
| simonw wrote:
| Disagree. It took more than 12 months from the release of
| GPT-4 to someone else producing a model of equivalent
| quality, and that definitely wasn't due to a shortage of
| investment from the competition.
|
| There's a huge amount of depth in training a really good
| LLM. Not helped by the fact that iteration is incredibly
| expensive - it might take several months (and millions of
| dollars) before you can tell if your new model is working
 | well or if there was some mistake in the pipeline that led
| to a poor quality result.
|
| Almost all of the world-class LLMs outside of
| OpenAI/DeepMind have been trained by people who previously
| worked at those organizations - giving them invaluable
| experience such that they could avoid the most expensive
| mistakes while training their new models.
| lossolo wrote:
| Don't overlook the training data (used for both training
 | and instruction fine-tuning); it is one of the most
| crucial aspects, if not the most critical, given the
| significant differences observed in models with similar
| architectures.
| echelon wrote:
| That only remains an advantage if they can continue
| climbing the gradient from their lead position. If they
| hit a snag in scaling, methodology, or research, everyone
| else on the planet catches up, and then it's anyone's
| game again.
| barrell wrote:
| While I do agree there is some amount of secret sauce,
 | keep in mind that training takes several months. So for
 | someone to see the success of GPT-4, decide they want to
| invest that amount of money to train the same, raise the
| money to train the model, find someone competent to
| supervise the training, train the model for several
 | months, then test and integrate it, could easily take a
 | year even if there was no secret sauce.
| int_19h wrote:
| There's still no model of equivalent quality to GPT-4.
| bbig wrote:
| Claude 3 Opus is reporting superior metrics, particularly
| in its coding ability, and in the LLM Arena it is
| statistically tied with GPT-4.
| int_19h wrote:
| When it comes to LLMs, metrics are misleading and easy to
| game. Actually talking to it and running it through
 | _novel_ tasks that require the ability to reason very
 | quickly demonstrates that it is not on par with GPT-4. As
 | in, it can't solve things step-by-step that GPT-4 can one-shot.
| FloorEgg wrote:
| This was exactly my experience. I have very complex
| prompts and I test them on new models and nothing
| performs as well as GPT-4 that I've tried (Claude 3 Opus
| included)
| astrange wrote:
| It's a bit better at writing jokes. GPT is stiff and
| unfunny - which is why the twitter spambots using it to
| generate text are so obvious.
| johnthewise wrote:
| Claude opus is better in my experience
| cavisne wrote:
| LLM training is arcane and expensive to experiment with. So
| OpenAI had to waste a lot of time and GPU-hours on things
| that didn't work to learn the tricks that did work.
|
| Most of the competitors have lineage straight back to OpenAI,
| eg the lead of x.ai was previously at OpenAI and Deepmind.
| Likewise with Mistral and especially Anthropic.
| jxy wrote:
 | OpenAI still seems to be at the top, with Anthropic perhaps
 | close behind, in terms of capability when comparing GPT-4
 | and Claude Opus.
|
| This Grok-1 is a large model (~314B), which matches gpt-3.5
| released 2 years ago, and at about the same level of much
 | smaller models like Mixtral (~47B) and Qwen-1.5 (~72B). Do
| you think it's competitive?
| asciii wrote:
| I love the citation for image in the article
|
| > The cover image was generated using Midjourney based on the
| following prompt proposed by Grok: A 3D illustration of a
| neural network, with transparent nodes and glowing connections,
| showcasing the varying weights as different thicknesses and
| colors of the connecting lines.
| extheat wrote:
| At 8x86B, looks like the largest open model yet by far. Would be
| interesting to hear how many tokens it's been trained on.
| Especially important for higher param models in order to
| efficiently utilize all those parameters.
| p1esk wrote:
| It's not 8x86B. Total number of parameters is 314B.
|
| Perhaps it's 8x39B to fit on a single 8xA100 (40GB) server?
| moffkalast wrote:
| Most likely it's a MoE of Grok-0 which would be 8x33B + 50B
| for the router.
| cma wrote:
| Active parameters is 86B, so wouldn't that be the size of the
| largest two experts (where they may all be the same) + the
| weights of the selector?
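 |
 | Back-of-the-envelope, assuming shared (attention/embedding)
 | weights counted once plus 8 expert blocks with 2 active per
 | token - an assumption, since the release doesn't publish this
 | split:
 |
 |     total, active = 314e9, 86e9   # published parameter counts
 |     n_exp, n_act = 8, 2           # experts total / active per token
 |
 |     # shared + n_exp * per_exp = total
 |     # shared + n_act * per_exp = active
 |     per_exp = (total - active) / (n_exp - n_act)
 |     shared = active - n_act * per_exp
 |     print(per_exp / 1e9, shared / 1e9)  # ~38B per expert, ~10B shared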
| dheera wrote:
| They all do this marketing bull.
|
| Mixtral has an 8x7B model but it's actually 46.7B, not 56B
| params.
|
| Kinda similar to how 4K displays are 3840 pixels wide, not
| true 4K which would be 4096. Marketing people called it 4K,
| not engineers.
| guitarlimeo wrote:
| I've always thought of 4K as "4x FullHD". In that way it
| makes sense.
| mavhc wrote:
| TV and Digital Cinema have different standards, because
| of course they do
| dheera wrote:
| Bleh no, K means thousand.
|
| For a long time we specified displays by their vertical
| dimension -- 480p, 720p, 1080p.
|
| Then the marketing guys came along and decided that the
| horizontal dimension sounds bigger. If we stuck with the
| less-bullshitty way of doing things and kept comparisons
| 1:1, we'd call 3840x2160 displays 2160p or "2K" displays,
| but instead, the marketing people decided that we're
| going to change things to horizontal and called 3840x2160
| "4K".
| swalsh wrote:
| Considering how poor it is compared to other models, it really
| emphasises how important fine tuning is. Models with MUCH
| smaller parameter counts are outperforming it in many metrics.
| lukan wrote:
| "it really emphasises how important fine tuning is"
|
| Or rather the quality of the training data?
| fragmede wrote:
| that's a subtle dig at the fact that they have all of
| Twitter as a training corpus to use, but we don't know how
| they weight tweets. which, we know they're not gonna be
| weighted evenly.
| rezonant wrote:
| I'm sure just like in X's algorithms, @elon tweets are
| weighted heavily.
| convery wrote:
 | The X algorithm is also open source, so you can verify
 | before commenting.
| fragmede wrote:
 | just because they open sourced it doesn't mean that's
 | actually what they're running in production though
| chrisco255 wrote:
| It's not like he needs boosting, he was one of Twitter's
| top followed accounts long before he bought them. He's
| pretty good at getting attention.
| latexr wrote:
| And yet it's not enough to curb the desire to tip the
| scales.
|
| https://arstechnica.com/tech-policy/2023/02/report-musk-
| had-...
| lukan wrote:
| No idea about the current state, but the open sourcing
 | did show they were favoring Elon:
|
| https://mashable.com/article/twitter-releases-algorithm-
| show...
|
| And personally I never used Twitter much, but I certainly
| did not follow Elon Musk when I did - yet I had to see
 | lots of his posts in my feed. Surely just coincidence.
| machdiamonds wrote:
| It's not too hard to believe it is a coincidence when the
| most followed person on a platform shows up in your feed,
| especially if you follow tech accounts.
| internetter wrote:
| Did you not read the article linked in the comment you're
| replying to?
| maccaw wrote:
| > they were favoring elon
|
| No, and that's not what the article says either. They
| were just tracking how well his tweets were doing versus
| others. They were not favoring Elon.
| lukan wrote:
| "They were just tracking how well his tweets were doing
| versus others. "
|
 | Yeah, and adjusting it so he comes out best. That was
 | Musk's demand, as the other article linked inside shows,
 | after a Biden tweet performed better than Musk's:
|
| https://mashable.com/article/elon-musk-super-bowl-joe-
| biden-...
|
 | They officially boost people who pay a little bit. Elon
 | paid a lot.
|
 | And the source is clearly not the production source and
 | never was in this shape - otherwise why sue someone who
 | open sourced it?
|
| "But, the release of this source code also comes days
| after Twitter forced Github to take down other parts of
| Twitter's source code that was allegedly posted by a
| former employee without the company's permission. So,
| clearly, there's still plenty of Twitter that Musk still
| doesn't want us to see."
|
| Also, you probably missed that:
|
| "Zoe Schiffer of Platformer reported that Twitter
| actually removed part of the source code that affected
| the reach of Musk's and other user's tweets before
| releasing the algorithm to the public."
|
 | Which is consistent with quite a few other statements,
 | also from Twitter itself, and with the fact that the
 | source has not been updated in 8 months.
|
| See also this HN comment and discussion about it:
|
| https://news.ycombinator.com/item?id=35391854
|
| "But the underlying policies and models are almost
| entirely missing (there are a couple valuable components
| in [1]). Without those, we can't evaluate the behavior
| and possible effects of "the algorithm.""
| jokethrowaway wrote:
 | Sounds a bit far-fetched
|
 | So changes in power users' stats would also result in
| audience balancing?
|
| Most likely the code was used for analytics and for
| tracking balance; Elon was a pain in the ass and asked to
| have custom analytics for his account and devs eventually
| added him as an audience to be able to get analytics
| about him easily. A bit dirty but it works.
|
| Most likely the balancing code is somewhere else and it
 | affects only Republicans / Democrats.
| threeseed wrote:
| X algorithm Github project hasn't been updated in 8
| months:
|
| https://github.com/twitter/the-algorithm
|
| So clearly they aren't running it in production.
|
| Also they didn't open source the list of people who are
| being artificially boosted e.g. Elon.
| nonethewiser wrote:
| > I'm sure just like in X's algorithms, @elon tweets are
| weighted heavily.
|
| Are you sure or is it the literal opposite and you're
| just speculating?
| llm_trw wrote:
| We don't know since no one is releasing their data.
|
| Calling these models open source is like calling a binary
| open source because you can download it.
|
 | Which in this day and age isn't far from where we're at.
| DreamGen wrote:
 | A big distinction is that you can build on top of (fine-
 | tune) the released models just as well as if they had
 | released the pre-training data.
| llm_trw wrote:
| You can also build on top of binaries if you use gotos
| and machine code.
| shwaj wrote:
| This seems intentionally obtuse. What you say is true,
| but it is very obvious that this is _much_ more of a pain
| than if you had the source code. On the other hand, fine
| tuning is just as easy, regardless of whether you have
| the original training data.
| samus wrote:
| One could also disassemble an executable and build on top
| of it. Not for the faint of heart and probably illegal,
| but possible unless it was deliberately obfuscated.
| Compared to that, it is impossible with state-of-the-art
| methods to systematically extract the training data from
 | an LLM. Fragments, yes, but not all of it.
| visarga wrote:
| You can do better - generate synthetic data covering all
| topics. And to make it less prone to hallucination, use
| RAG or web search for reference material. The Phi-1.5
| model was trained on 300B of synthetic tokens generated
| with chatGPT and it showed a 5x bump in efficiency,
 | punching well above its weight.
|
| Synthetic data can be more diverse if you sample
| carefully with seeded concepts, and it can be more
| complex than average web text. You can even diff against
| a garden variety Mistral or LLaMA and only collect
| knowledge and skills they don't already have. I call this
| approach "Machine Study", where AI makes its own training
| data by studying its corpus and learning from other
| models.
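 |
 | A rough sketch of what that seeded-generation loop could look
 | like (`teacher` is a hypothetical stand-in for whichever model
 | generates the data):
 |
 |     import random, itertools
 |
 |     concepts = ["tokenizers", "RAG", "quantization", "MoE routing"]
 |
 |     def synth_examples(teacher, n=100):
 |         # seed each prompt with a random pair of concepts so the
 |         # generated data stays diverse instead of collapsing onto
 |         # the model's favourite topics
 |         pairs = list(itertools.combinations(concepts, 2))
 |         for _ in range(n):
 |             a, b = random.choice(pairs)
 |             prompt = (f"Write a short Q&A connecting {a} and {b}, "
 |                       f"with a worked example.")
 |             yield teacher(prompt)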
| llm_trw wrote:
 | If you don't know the original training data's statistical
 | distribution, then catastrophic forgetting is guaranteed
| with any extra training.
| shwaj wrote:
| I was going to ask for a reference to support this
| (although please provide one if handy), but the search
| term "catastrophic forgetting" is a great entry into the
| literature. Thanks.
| adrianN wrote:
| Or shell scripts
| tarruda wrote:
 | You can fine-tune without the pre-training data too.
 |
 | Mistral models are one example: they never released pre-
 | training data and there are many fine-tunes.
| drexlspivey wrote:
| Their data is the twitter corpus which is public. Or do
| you want a dump of their database for free too?
| llm_trw wrote:
| Saying "It's just the twitter public corpus." is like
| saying "Here's the Linux Kernel, makefiles not included."
| zx8080 wrote:
| Or even "here's the Linux Kernel makefiles, no sources
| included, enjoy".
| minimaxir wrote:
| Twitter tweet data in itself is both highly idiosyncratic
 | and short by design, which alone is not conducive
 | towards training an LLM.
| swalsh wrote:
| We should just call it open weight models at this point.
| boulos wrote:
| How about "weights available" as similar to the "source
| available" moniker?
| fragmede wrote:
| weights available or model available, but yes.
| cl3misch wrote:
| FWIW the Grok repo uses the term "open weights".
| cainxinth wrote:
| > _We don 't know since no one is releasing their data._
|
| Is anyone else just assuming at this point that virtually
| everyone is using the pirated materials in The Pile like
| Books3?
| GaggiX wrote:
 | Or even how much it was trained on this dataset, i.e. the
 | amount of FLOPs.
| jakderrida wrote:
| Aren't they usually built on most of the same training
| data?
| lairv wrote:
| I would say it emphasises that training a good model is more
 | than throwing random data and compute at it
| make3 wrote:
 | no it emphasizes the importance of training smaller models
| for longer, like the Mistral "overtrained" models
| gdiamos wrote:
| Show the proof? Does it include IFT?
| gordian-mind wrote:
| Current metrics are a poor way to measure the usefulness of
| LLMs.
| zone411 wrote:
| It's actually not the largest.
| https://huggingface.co/google/switch-c-2048 is 1.6T parameters.
| hubraumhugo wrote:
 | When will we reach an upper limit/diminishing returns in terms of
| number of parameters and mixture of experts?
| andy99 wrote:
| We may have already - data is more important than anything else
 | which is why nobody has beaten GPT-4 yet. Throwing more parameters
| or more compute at the problem only gets you so far. But Grok
| was never a contender so there is room to improve on it. It is
| one of the biggest models open sourced as mentioned, so will be
| interesting to take a look at for sure.
| lambdaba wrote:
 | Claude 3 has *decisively* beaten GPT-4, I wonder how all their
| attributes compare.
| stainablesteel wrote:
 | I like some of Claude's answers better, but it doesn't seem
 | to be a better coder imo
| simonw wrote:
| I've found it to be significantly better for code than
| GPT-4 - I've had multiple examples where the GPT-4
| solution contained bugs but the Claude 3 Opus solution
| was exactly what I wanted. One recent example:
| https://fedi.simonwillison.net/@simon/112057299607427949
|
| How well models work varies wildly according to your
| personal prompting style though - it's possible I just
| have a prompting style which happens to work better with
| Claude 3.
| bugglebeetle wrote:
| What is your code prompting style for Claude? I've tried
| to repurpose some of my GPT-4 ones for Claude and have
| noticed some degradation. I use the "Act as a software
| developer/write a spec/implement step-by-step" CoT style.
| simonw wrote:
| Almost impossible to describe prompting style, but here
| are some examples of how I've used Claude 3:
|
| https://gist.github.com/simonw/4cecde4a729f4da0b5059b50c8
| e01... - writing a Python function
|
| https://gist.github.com/simonw/408fcf28e9fc6bb2233aae694f
| 8cd... - most sophisticated example, building a
| JavaScript command palette
|
| https://gist.github.com/simonw/2002e2b56a97053bd9302a34e0
| b83... - asking it to refactor some existing code
|
| I don't use the "Act as a X" format any more, I'm not at
| all convinced it has a noticeable impact on quality. I
| think it's yet another example of LLM superstition.
| lgas wrote:
| > I don't use the "Act as a X" format any more, I'm not
| at all convinced it has a noticeable impact on quality. I
| think it's yet another example of LLM superstition.
|
 | It's very contextually dependent. You really have to test
 | things like this for your specific task, with your
| specific model, etc. Sometimes it helps, sometimes it
| hurts, and sometimes it does nothing at all.
| bugglebeetle wrote:
| Super helpful! Thanks!
| furyofantares wrote:
| I didn't know people were still doing this "act as etc
| etc" instructional prompting.
|
| I just tell it my coding problem. Or when making
| something from scratch, ask for small things and
| incrementally add.
| asciii wrote:
| > according to your personal prompting style though
|
| I like the notion of someone's personal prompting style
| (seems like a proxy for those that can prepare a question
| with context about the other's knowledge) - that's
| interesting for these systems in future job interviews
| furyofantares wrote:
| I've found it significantly better than GPT4 for code and
| it's become my go-to for coding.
|
| That's actually saying something, because there's also
| serious drawbacks.
|
| - Feels a little slower. Might just be UI
|
| - I have a lot of experience prompting GPT4
|
| - I don't like using it for non-code because it gives me
 | too much "safety" pushback
|
| - No custom instructions. ChatGPT knows I use macos and
| zsh and a few other preferences that I'd rather not have
| to type into my queries frequently
|
| I find all of the above kind of annoying and I don't like
| having two different LLMs I go to daily. But I mention it
| because it's a fairly significant hurdle it had to
| overcome to become the main thing I use for coding! There
| were a number of things where I gave up on GPT then went
| to Claude and it did great; never had the reverse
| experience so far and overall just feels like I've had
| noticeably better responses.
| htrp wrote:
| citation needed (other than 'vibes')
| swalsh wrote:
| I don't know if Claude is "smarter" in any significant way.
 | But it's harder working. I can ask it for some code, and I
| never get a placeholder. It dutifully gives me the code I
| need.
| lambdaba wrote:
| It understands instructions better, it's rarer to have it
| misunderstand, and I have to be less careful with
| prompting.
| orbital-decay wrote:
| Has it, though? LMSys Arena Leaderboard (blind ranking by
| humans) [0] positions Opus just below GPT-4 with a
| negligible ELO gap.
|
| [0] https://chat.lmsys.org/
| espadrine wrote:
| A number of AI companies have a naming/reproducibility
| issue.
|
| GPT4 Turbo, released last November, is a separate version
| that is much better than GPT-4 (winning 70% of human
| preferences in blind tests), released in March 2023.
|
| Claude 3 Opus beats release-day GPT-4 (winning 60% of
| human preferences), but not GPT-4 Turbo.
|
| In the LMSys leaderboard, release-day GPT-4 is labeled
| gpt-4-0314, and GPT4 Turbo is labeled gpt-4-1106-preview.
| BoorishBears wrote:
| Chatbot Arena is not a blind ranking.
|
| Many, if not most, users intentionally ask the models
| questions to tease out their canned disclaimers: so they
| know exactly which model is answering.
|
| On one hand it's fair to say disclaimers affect the
| usefulness of the model, but on the other I don't think
| most people are solely asking these LLMs to produce meth
| or say "fuck", and that has an outsized effect on the
| usefulness of Chatbot Arena as a general benchmark.
|
| I personally recommend people use it at most as a way to
| directly test specific LLMs and ignore it as a benchmark.
| staticman2 wrote:
| That "blind ranking" is limited to about 2,000 tokens of
| context. So it's certainly not evaluating how good the
| models are at complex assignments.
| squigz wrote:
| I think Groq is something else?
| LorenDB wrote:
| Indeed, Groq is a company building inference accelerators.
| Grok is completely unaffiliated.
| andy99 wrote:
| Edited, I did mean the Grok in the article not the
| inference chip.
| YetAnotherNick wrote:
 | There is no reason to believe GPT-4 had more (or higher
 | quality) data than Google etc. has now. GPT-4 was entirely
 | trained before the Microsoft deal. If OpenAI could pay to
 | acquire data in 2023, >10 companies could have acquired
 | similar quality by now, and no one has produced a similar-
 | quality model in a year.
| austhrow743 wrote:
| The more disregard a company has for intellectual property
| rights, the more data they can use.
|
| Google had far more to lose from a "copyright? lol"
| approach than OpenAI did.
| brookst wrote:
| I was under the impression training was at best an
| undefined area of IP law. Is there any aspect of
| copyright that prohibits training models?
| simonw wrote:
| This is being tested by a number of lawsuits right now,
| most notably the NY Times one: https://nytco-
| assets.nytimes.com/2023/12/NYT_Complaint_Dec20...
|
| The key questions are around "fair use". Part of the US
| doctrine of fair use is "the effect of the use upon the
| potential market for or value of the copyrighted work" -
| so one big question here is whether a model has a
| negative impact on the market for the copyrighted work it
| was trained on.
| sroussey wrote:
 | I don't think the New York Times thing is so much about
 | training as it is about the fact that ChatGPT can use
 | Bing and Bing has access to New York Times articles for
| search purposes.
| simonw wrote:
| If you read the lawsuit it's absolutely about training.
| The Bing RAG piece is one of the complaints in there but
| it's by no means the most important.
|
| Take a look at https://nytco-
| assets.nytimes.com/2023/12/NYT_Complaint_Dec20... -
| bullet points 2 and 4 on pages 2/3 are about training
| data. Bullet point 5 is the Bing RAG thing.
| sroussey wrote:
| Ah, thanks!
| YetAnotherNick wrote:
| Having used both Google's and OpenAI's models, the kind
| of issue they have are different. Google's models are
| superior or at least on par in knowledge. It's the
| instruction following and understanding where OpenAI is
| significantly better. I don't think pretraining data is
| the reason of this.
| supafastcoder wrote:
| > Google had far more to lose from a "copyright? lol"
| approach than OpenAI did.
|
| The company that scrapes trillions of web pages has an
| issue with copyright?
| sib wrote:
| Well... Googlebot does pay attention to robots.txt - I
| don't think (original) OpenAI-bot did.
| ldjkfkdsjnv wrote:
| Claude > GPT4. Anyone using these models on a daily basis
| knows this
| jstummbillig wrote:
| It is known
| int_19h wrote:
| I use these models regularly, and Claude is dumb as a rock
| compared to GPT-4.
| nylonstrung wrote:
| For what reason would you want to use this instead of open source
 | alternatives like Mistral?
| rvnx wrote:
| Mistral opened their weights only for very small LLaMA-like
 | models.
| MallocVoidstar wrote:
| I'm pretty sure Mixtral outperforms Grok-1 and uses much less
| memory to do it
| elfbargpt wrote:
| I'm a little out of touch, is there a way to see how Grok
| measures up to other models?
| amrrs wrote:
| Benchmarks here https://x.ai/blog/grok
| refulgentis wrote:
| And to compare, you can sort by MMLU on here: https://hug
| gingface.co/spaces/HuggingFaceH4/open_llm_leaderb....
|
| Edit: to include my self summary after review: There's a
 | good 100 models better than it, a couple 1x7B even. Mixtral
 | stomps it; half the Mixtral variants are universally better
 | and one is close to the same.
| lossolo wrote:
| This benchmark is mostly worthless, some of the top
| models there were trained on benchmark data, which is a
| known fact in the community.
|
| The only reliable benchmark:
| https://huggingface.co/spaces/lmsys/chatbot-arena-
| leaderboar...
| refulgentis wrote:
| No, it's not "mostly worthless" and yes, some of the top
| models were removed a few months back from being trained
| on benchmark data.
|
| I urge you to at least think through what alternative you
| propose before posting so aggressively in these
| situations. Lmsys doesn't have Grok, or I would have
| included it. And having _some_ data is better than none.
|
| I also had someone arguing with me 6 months back that we
| can't trust any benchmarks at all from vendors, which
| would exclude the blog post. Instead of just repeating
| that back vehemently, I filled a gap. It's important we
| don't self-peasantize as a species, all data has its
| issues, that doesn't mean we throw it all out.
| michaelt wrote:
| Quantifiable metrics are useful if they're credible,
| certainly.
|
| But does it seem likely, to you, that a 7B-parameter
| model would outperform a 314B-parameter model? Given that
| we can look at the chatbot arena leaderboard and it's
| dominated by proprietary, 70B and 8x7B models?
|
| A well regarded and modern model like Mixtral 8x7B, which
| is ranked 13th on the chatbot arena leaderboard, scores
| 72.7 'Average' on the open LLM leaderboard - and yet
| 'pastiche-crown-clown-7b-dare-dpo' scores 76.5.
|
| To me, that sounds too good to be true.
| refulgentis wrote:
| Yup, 100%. Grok isn't very good and it was rushed.
|
| Rest re: pastiche model, etc. are proposing things I'm
| not claiming, or close to what I'm claiming.
|
| n.b. you don't multiply the parameters by experts to get
| an effective parameter count. Why? Think of it this way:
| every expert needs to learn how to speak English, so
| there's a nontrivial amount of duplication among all
| experts
| michaelt wrote:
| _> n.b. you don 't multiply the parameters by experts to
| get an effective parameter count._
|
| I actually took the 314B from Grok's HF page [1] which
| describes the model as "314B parameters" when explaining
| why it needs a multi-GPU machine.
|
| I certainly agree that parameter count isn't everything,
| though; clearly things like training data quality and
| fine tuning count for a lot.
|
| [1] https://huggingface.co/xai-org/grok-1
| cavisne wrote:
| One of the interesting things when weights are open sourced
| is the community can often improve the results. See all the
| bugs fixed in Gemma for an example.
| ein0p wrote:
| Doubtful, for purely information theoretic and memory
| capacity reasons. It may outperform on some synthetic
| metrics, but in practice, to a human, larger models just
| feel "smarter" because they have a lot more density in
| their long tail where metrics never go
| verticalscaler wrote:
| Well if nothing else, this one might be significantly less
| nerfed. Very interesting to compare to the others.
| refulgentis wrote:
| It's not, and I mean it, specifically in groks case.
|
| Generally, it's a boring boneheaded talking point that the 1%
| of us actually working in AI use as a sorting hat for who
| else is.
| renewiltord wrote:
| The safety crap makes the tools unusable. I used to have a
| test for it that I thought was decent, but Claude failed
| that test and it is way better than ChatGPT-4 for code,
| which means my test was bogus. The people actually working
| in AI are kind of irrelevant to me. It's whether or not the
| model will solve problems for me reliably.
|
| People "actually working in AI" have all sorts of nonsense
| takes.
| benreesman wrote:
| Another day, another fairly good comment going grey on an
| AI #1. The over-alignment _is_ really starting to be the
| dominant term in model utility, Opus and even Sonnet
| _are_ both subjectively and on certain coding metrics
| outperforming both the 1106-preview and 0125-preview on
| many coding tasks, and we _are_ seeing an ever-escalating
| set of kinda ridiculous hot takes from people with the
| credentials to know better.
|
| Please stop karma bombing comments saying reasonable
| things on important topics. The parent is maybe a little
| spicy, but the GP bought a ticket to that and plenty
| more.
|
| edit: fixed typo.
| refulgentis wrote:
| What if they're wrong, and most people know what a
| "system message" is a year after ChatGPT launch, so
| they're willing to downvote?
|
| Is there any chance that could be happening, instead of a
| complex drama play with OP buying tickets to spice that's
| 100% obviously true?
| benreesman wrote:
| I was trying to be helpful. I've made elitist remarks on
| HN that were dubious in at least two ways: it was dubious
| if I was actually all that elite, and it was dubious if
| any amount of being elite justifies or makes useful a
| posture of elitism. My internal jury is still out, but as
| of writing I think I probably underestimated how unique
| my first-hand knowledge and contributions were, but more
| than made up for that by the claims exceeding the reality
 | by a huge margin, for a massive net loss that made me
| wish I could take the remarks back.
|
| I click on every HN username I reply to, because I've
| been hanging out here for like 16 years and more than
| once I've mouthed off only to later realize it was about
| C++ to Walter Bright or something, and looked a fool as a
| result. I've since apologized to Walter for disrespecting
| a legend and he was very gracious about it, to cite just
| one example.
|
| Your initial remark wasn't even that bad, certainly
| others talk that way, and I tried to frame it accurately
| as one guy who tends to FAANG-flex carelessly rather than
| thoughtfully to another guy who probably doesn't talk to
| people like that face to face and is probably a pretty
| good guy having a tough day. I was trying to say: "been
| there, maybe cool it man you're probably going to have
| the same bad time I've had on this sort of thing".
|
| But this is getting to where I'm starting to lose my
| temper a bit, I've been pretty cool about this. I even
| went and read the Dart/`llama.cpp`/`ONNX` stuff because
| I've also messed around with binding to `llama.cpp` and
| `whisper.cpp` and stuff just to make sure I'm not
| mouthing off to Jeff Dean's alt or something. I'm not
| talking to Jeff Dean.
|
| I surf with `showdead` on, and I don't know the current
| meta so I don't know if you know that you've been flagged
| dead 3 times on this subthread already and as much as I'd
| like to, I can't really argue with any of the 3.
|
| But given that you've clearly got similar interests, and
| therefore probably things that you could teach me if I
| were willing to listen, I'm going to propose a do-over.
|
| If you'd like to start this over from a place of mutual
| interest and write this thread off to "a pair of people
 | had bad vibes on an Internet forum once", email me at
| `b7r6@b7r6.net`.
|
| If not, no hard feelings, but in that case, let's just
| give one another a wide berth and call it a day.
| threeseed wrote:
| > The safety crap makes the tools unusable
|
| For you that may be the case.
|
| But the widespread popularity of ChatGPT and similar
| models shows that it isn't a serious impediment to
| adoption. And erring on the side of safety comes with
| significant benefits e.g. less negative media coverage,
| investigations by regulators etc.
| wmidwestranger wrote:
| Seems like marketing and brand recognition might be some
| confounding variables when asserting ChatGPT's dominance
| is due to technical and performance superiority.
| benreesman wrote:
| I've been known to get snippy on HN from time to time
| myself :) So please know that I'm only offering a gentle
| nudge that I'd want from a fellow long-timer myself
| regarding a line of discussion that's liable to age poorly.
|
| Talking about sorting hats for those who do and don't have
| the one-percenter AI badge isn't a super hot look my guy
| (and I've veered dangerously close to that sort of thing
| myself, this is painful experience talking): while there is
| no shortage of uninformed editorializing about fairly
| cutting edge stuff, the image of a small cabal of robed
| insiders chucking in their cashews while swiping left and
| right on who gets to be part of the discussion serves
| neither experts nor their employers nor enthusiastic
| laypeople. This is _especially_ true for "alignment" stuff,
| which is probably the single most electrified rail in the
| whole discussion.
|
| And as a Google employee in the diffuser game by way of
| color theory, you guys have a "days since we over-aligned
| an image generation model right into a PR catastrophe" sign
| on the wall in the micro kitchen right? That looked
| "control vector" whacky, not DPO with pretty extreme
| negative prompt whacky, and substantially undermined the
| public's trust in the secretive mega labs.
|
| So as one long-time HN user and FAANG ML person to another,
| maybe ixnay with the atekeepinggay on the contentious AI #1
| thread a bit?
| gopher_space wrote:
| Every discipline has its bellwether topics. They're
| useful for filtering out people who want to chip in
| without picking up the tools.
| whimsicalism wrote:
| regardless of whether they say it out loud, it is what
| many of us think - might be good for people to know why
| their opinions are getting immediately dismissed by
| insiders
| benreesman wrote:
 | Letting people know why their opinions are getting
 | dismissed in a productive way is done by citing well-
 | known sources in a low-effort way, or by explaining things
| thoughtfully in a high-effort way: Karpathy has chosen
| the highest-effort way of most anyone, it seems unlikely
| that anyone is at a higher rung of "insiderness" than he
| is, having been at Toronto with (IIRC) Hinton and Alex
| and those folks since this was called "deep learning",
| and has worked at this point at most of the best
| respected labs.
|
| But even if folks don't find that argument persuasive,
| I'd remind everyone that the "insiders" have a tendency
| to get run over by the commons/maker/hacker/technical
| public in this business: Linux destroying basically the
| entire elite Unix vendor ecosystem and ending up on well
| over half of mobile came about (among many other reasons)
| because plenty of good hackers weren't part of the
| establishment, or were sick of the bullshit they were
| doing at work all day and went home and worked on the
| open stuff (bringing all their expertise with them) is a
| signal example. And what e.g. the Sun people were doing
| in the 90s was every bit as impressive given the hardware
| they had as anything coming out of a big lab today. I
| think LeCun did the original MNIST stuff on a Sun box.
|
| The hard-core DRM stuff during the Napster Wars getting
| hacked, leaked, reverse engineered, and otherwise
| rendered irrelevant until a workable compromise was
| brokered would be another example of how that mentality
| destroyed the old guard.
|
| I guess I sort of agree that it's good people are saying
| this out loud, because it's probably a conversation we
| should have, but yikes, _someone_ is going to end up on
| the wrong side of history here and realizing how closely
| scrutinized all of this is going to be by that history
| has really motivated me to watch my snark on the topic
| and apologize pretty quickly when I land in that place.
|
| When I was in Menlo Park, Mark and Sheryl had
| intentionally left a _ton_ of Sun Microsystems
| iconography all over the place and the message was pretty
| clear: if you get complacent in this business, start
 | thinking you're too smart to be challenged, someone else
| is going to be working in your office faster than you
| ever thought possible.
| refulgentis wrote:
| I have no idea how you've wandered all the way to
| Napster, Sun, hackers, etc. Really incredible work.
|
| Well, I kind of know, you're still rolling with "this
| dude's a google employee", so the guy foaming at his
| mouth about Google makes sense to you, and now you have
| to reach to ancient lore to provide grounding for it.
|
| I don't work for Google.
| benreesman wrote:
| Then don't link to an "About Me" page [1] that says you
| do? How is confusion on that subject any reader or
| commenter's fault?
|
| I don't care if you personally work at Google or not,
| Google got itself in quite a jam as concerns public
| perception of their product in particular and the AI
| topic in general by going overboard with over-alignment,
| everyone knows that so one assumes that insiders know it,
| which is one of a great many examples of how strongly-
| forced models are a real problem for arbitrarily
| prestigious insider-laden labs.
|
| Framing the debate about whether large, proprietary
| models are over-aligned or mis-aligned as an acid test
| for whether or not someone is worth paying attention to
| is really weird hill to stand on.
|
| [1] https://www.jpohhhh.com/about
| refulgentis wrote:
| Yes, you do care, in fact, you care a lot! You made it
| the centerpiece of your argument and went to a lot of
| trouble to do so.
|
| Flag away, my friend.
| refulgentis wrote:
| You're making up a person and being extremely creepy
| while doing a poor job of it.
|
| It's at least funny, because you're doubling down on OP's
| bad takes, and embarrassing yourself with trying to
| justify it with what you thought was brilliant research
| and a witty person-based argument. But, you messed up. So
| it's funny.
|
| Punchline? Even if you weren't wrong, it would have been
| trivial while doing your research to find out half of
| Deep Mind followed me this week. Why? I crapped all over
| Gemini this week and went viral for it.
|
| I guess, given that, I should find it utterly
| unsurprising you're also getting personal, and clinging
| to 1% as a class distinction thing and making mental
| images of cloistered councils in robes, instead of, well,
| people who know what they're talking about, as the other
| repliers to you point out.
|
| "1%ers are when the Home Depot elites make fun of me for
| screaming about how a hammer is a nerfed screwdriver!"
| benreesman wrote:
| I've been around here a pretty long time, but I could
| still be off base here: as far as I understood people
| generally posted links to their own blog [1] in their HN
| profile because they want people to read them? I read
| your blog and particularly the posts about Gigadiffusion
| because I wanted to reply from a position of having put
| some effort into understanding where the poster I was
| replying to was coming from before popping off with what
| could be taken as a criticism. If that offends you or
| creeps you out I'm more than happy to steer clear of it
| with the parting remark that I really like Material and
| had hoped that any follow up would give me the
| opportunity to compliment you on some nice work.
|
| If that's not your blog, you should probably take it off
| your profile?
|
| [1] https://www.jpohhhh.com/
| refulgentis wrote:
| I'm not doing a faux-nice thing with you. You made up an
| elaborate argument, to justify rank fact-free ranting,
| based on false information. Thanks for your time.
| not_really wrote:
| lol, okay
| mlindner wrote:
| Curious why you're so dismissive of something that's pretty
| important?
| random_cynic wrote:
| The 1% who actually work on AI don't use terms as generic
| as "AI". Way to reveal yourself as college undergrad who
| read a couple of popular science books, downloaded MNIST
| data and thinks they are "experts".
| zozbot234 wrote:
| Isn't this Apache licensed? Regardless, you can run multiple
| models concurrently on the same input using well-known ensemble
| techniques. (Not to be confused with mixture-of-experts, which
| is more like training a single model where only a few blocks
| are chosen to be active at any given time - a kind of
| sparsity.)
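 |
 | A minimal sketch of one such ensemble - averaging next-token
 | distributions - assuming the models share a vocabulary and
 | each is a callable returning logits (hypothetical interface):
 |
 |     import numpy as np
 |
 |     def softmax(logits):
 |         e = np.exp(logits - np.max(logits))
 |         return e / e.sum()
 |
 |     def ensemble_next_token(models, token_ids):
 |         # average the per-model next-token distributions over
 |         # the same input, then pick the most likely token
 |         probs = [softmax(m(token_ids)) for m in models]
 |         return int(np.argmax(np.mean(probs, axis=0)))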
| tlb wrote:
| Not super easy if they have different tokenizers.
| rvnx wrote:
| One subtle thing: Musk said "open-source", we got "open-weights"
| instead (still better than nothing though, so it's greatly
| appreciated).
| paulgb wrote:
| Dumb question: what should open-source mean in the context of
| something like this? Open access to the training data and
| training pipeline as well?
| CharlesW wrote:
| It's not a dumb question, and the answer is "yes".
| zeroCalories wrote:
| Come on, that's not reasonable to expect from a company, or
| useful for indie hackers. Having weights that can be used
| however you like is enough for most people, even large
| companies.
| schoen wrote:
| Maybe it should be called something else? "Openly-
| licensed"?
|
| Just because the model weights are not really "source"
| (either as a matter of intuition or for example following
| the OSI "preferred form in which a programmer would
| modify the program" definition).
| zeroCalories wrote:
| Sure, but I don't want to train anyone's model from
| scratch. Realistically, I can't download all the training
| data, or run the pipeline, or train the model. Making all
| of that available to me would be a massive burden on the
| company too, so they simply won't do it. If I'm able to
| fine-tune it, that's enough for me, and imo, that fits
| with the spirit of open/free software. We have to
| understand that this is fundamentally a different thing
| than something like the Linux kernel, and closer to
| something like an industrial project. The output is just
| a bunch of numbers instead of something physical.
| simonw wrote:
| A big catch here is that you can't slap an open source
| license on a bunch of copyrighted training data, and to
| date no-one has created a truly convincing LLM exclusively
| trained on public domain data. It might happen soon though
 | - there are some convincing efforts in progress.
| CharlesW wrote:
| Absolutely, because it's trained mostly on unlicensed,
| copyrighted content, they basically can't release source.
| gfodor wrote:
| Many people think these companies are training on
| unlicensed data but I think OpenAI licenses their data,
| they just "license" it the way one would need to in order
| to read it.
| CharlesW wrote:
| > _...I think OpenAI licenses their data..._
|
| They've just started to (in response to lawsuits, it must
| be noted) and in the meantime, they're simultaneously
| claiming that (1) what they're doing is fair use (a.k.a.
| fair dealing) and (2) preparing for the day when courts
| confirm that it isn't.
| zer00eyz wrote:
| You all keep using the word "Data"
|
| Data, as in facts, as in the frequency of one word in
| relation to another.
|
| "Copyright does not protect facts, ideas, systems, or
| methods of operation, although it may protect the way
| these things are expressed..." FROM:
| https://www.copyright.gov/help/faq/faq-protect.html
|
| It's not a question of if, rather when the cat gets out
| of the bag and the legal battle starts. The problem is
| that all the copyright applies to the expression not the
| factual information it expresses (in this case word
| relations). Now "how math works" and "the language of the
| law" are going to make for an interesting court case. I
| suspect that math wins here but it depends on what judge
| gets it and how high it goes.
| gfodor wrote:
| No, the term data can be used to describe anything that
| can be recorded in bytes. It's "data storage capacity"
| when you buy a hard drive.
| logicchains wrote:
| https://substack.recursal.ai/p/eaglex-17t-soaring-past-
| llama... this one claims to have been trained only on
| permissively licensed data.
| nabakin wrote:
| Agreed. It's ridiculous people have to resort to saying
 | their question is dumb to avoid being attacked by toxic
| commenters.
| dudus wrote:
| If you release that instead of the binary weights you can
| be both more open and less useful for users. Fun
| Q6T46nT668w6i3m wrote:
| Yes, training and evaluation code, i.e., the code used to
| generate the weights.
| TaylorAlexander wrote:
| The Open Source Initiative is actively working on this over
| the course of this year, and your input will help define that
| meaning! Please see here for more:
|
| https://opensource.org/blog/open-source-ai-definition-
| weekly...
| TaylorAlexander wrote:
| Yeah musk said "all design and engineering for the original
| roadster is now open source" and actually what we got was a few
| PCB files and zero mechanical design files so I don't ever
| trust what he says.
| tylerekahn wrote:
| This is the weights and the model under Apache 2.0 license.
| What do you mean by open-source?
|
| https://github.com/xai-org/grok/blob/main/model.py
|
| https://github.com/xai-org/grok/blob/main/run.py#L25
| pclmulqdq wrote:
| Still better than most of the "open weights" models that have
| massively restrictive terms.
| solarkraft wrote:
| He also called permissively licensing Tesla's patents "open
| sourcing" them. He's at the forefront of misusing the term.
| drexlspivey wrote:
| The "source" in "open source" refers to source code which
| they released. A dataset is not source code, if anyone is
| misusing the term it's you.
| frabcus wrote:
| I consider the weights a binary program and the source code
| is the training data. The training algorithm is the
| compiler.
|
| I agree this isn't standard terminology, but it makes the
| most sense to me in terms of power dynamics and information
| flow.
|
 | We know from interpretability research that the weights
 | implement algorithms, e.g. sine approximation. So they feel like
| binary programs to me.
| solarkraft wrote:
| https://youtu.be/WyTzRnGSlcI?t=88
| HarHarVeryFunny wrote:
| If you can't rebuild it, then how can you be considered to
| have the "source code" ?
|
| The training data isn't a dataset used at runtime - it's
| basically the source code to the weights.
|
| Not sure it really _matters_ here though (who has the GPUs
| and desire to retrain Grok?), but just as a matter of
| definition "open weights" fits better than "open source".
| gardenhedge wrote:
| > Due to the large size of the model (314B parameters), a machine
| with enough GPU memory is required to test the model with the
| example code
|
| What type of machine do you need to play around with this?
| anigbrowl wrote:
| 'Chunky beast, needs 320 Gb VRAM likely 4 bit, likely is being
| run 8 bit on 8 x 80 Gb GPUs.'
|
| -Emad
| 317070 wrote:
| Probably a machine with about 628 GB of GPU memory. (2 bytes
| per parameter)
|
| So 8xH100 (80Gb each) should do it.
| Marlinski wrote:
 | I suppose it can be quantized
| a_wild_dandan wrote:
| A single 192GB M2 Mac using a 4-bit quant would work.
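 |
 | Back-of-the-envelope for the weights alone at a few assumed
 | precisions (activation and KV-cache overhead not counted):
 |
 |     params = 314e9
 |     for name, b in [("bf16", 2), ("int8", 1), ("int4", 0.5)]:
 |         print(name, round(params * b / 2**30), "GiB")
 |     # bf16 ~585 GiB, int8 ~292 GiB, int4 ~146 GiB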
| pogue wrote:
| Can someone explain why the weights are posted via a Bittorrent
| magnet link? I have no way to check the size at the moment, but
| isn't that a bit unusual? There's also only 21 seeders right now
| according to https://checker.openwebtorrent.com/
| lambdaba wrote:
| Why not? Mistral was first to do it, it has become tradition.
| gillesjacobs wrote:
| I believe it was Llama 1 that notoriously got leaked with a
| torrent on 4chan.
| astrange wrote:
| It wasn't much of a leak. Facebook was pretending to keep
| it private for PR reasons but putting approximately zero
| effort into actually keeping it private.
| orlp wrote:
| BitTorrent is just an objectively superior method of
| delivering a lot of data to a lot of people.
| pooloo wrote:
 | It's likely over 100GB of data, so I wouldn't say it's
 | necessarily unusual to spread out the bandwidth across multiple
| hosts.
| pogue wrote:
| Thanks! I searched and searched for a tool that would show me
| info via the web about a magnet link but nada
| CamperBob2 wrote:
| How else could/should it be done?
| pogue wrote:
| I would have assumed they could just upload it to Github. If
| it has restrictions on file size I'm sure they could make
| multiple part compressed files.
|
| Torrents can unfortunately die after a period of time if no
| one continues seeding it or if they don't use a permanent web
| based seeder, which doesn't appear to be the case.
| cedws wrote:
| GitHub may choose to throttle downloads or remove the files
| simply because they're taking up too much bandwidth.
|
| A torrent is less likely to go down in the short term.
| xcv123 wrote:
| This is not some crappy DVD rip on The Pirate Bay. It will
 | be seeded as long as it's relevant.
|
| Twitter/X has their own massive infrastructure and
| bandwidth to seed this indefinitely.
| KomoD wrote:
| Yeah, they can just leave some server running somewhere
| and just let it seed forever
| larrysalibra wrote:
| The great thing about torrents is that you (or anyone else
| who cares) can single-handedly solve the problem you're
| complaining about by seeding the torrent.
| simonw wrote:
| GitHub have a soft repository size limit of 5GB, documented
| here: https://docs.github.com/en/repositories/working-with-
| files/m...
|
| Soft size limit means "If your repository excessively
| impacts our infrastructure, you might receive an email from
| GitHub Support asking you to take corrective action." - I
| know people who have received such emails.
|
| Most model releases happen through Hugging Face which does
| not have such a size limit.
| KomoD wrote:
| They'd probably just charge you for it. They sell "data
| packs" for LFS.
|
| https://docs.github.com/billing/managing-billing-for-git-
| lar...
| zepton wrote:
| It would be super expensive to use LFS to distribute
| this:
|
| > Each pack costs $5 per month, and provides 50 GiB of
| bandwidth and 50 GiB for storage
|
| So they would need to pay for 6 data packs (or $30) for
| every 300gb download.
|
| (https://docs.github.com/en/billing/managing-billing-for-
| git-...)
| rezonant wrote:
| I'd bet Hugging Face would be happy to have hosted these
| canonically too, so not sure why that doesn't happen
| more.
| osanseviero wrote:
| The model is also at https://huggingface.co/xai-org
| sashank_1509 wrote:
 | No, git would be impossible. I've never seen a repo even a
 | few GB in size; if you are uploading non-code files you
 | really should not be using git. Git is version management
 | software for code. I often see repos with images and even
 | videos checked in; please don't, there are so many far
| better and more performant solutions out there.
|
| The other approach would be to use AWS S3 or other cloud
| providers which would cost them money every time someone
 | downloads the weights, which they shouldn't have to pay
 | for when they are releasing something for free. Torrents
| seems like the only good solution, unless someone hosts
| this on the cloud for free for everyone.
| sroussey wrote:
| Huggingface will disagree with impossible, as their models
| are available via git, sometimes broken up into .pth files.
|
| Still, as far as sentiment goes, yeah git for model
| weights is an impedance mismatch for sure!
| rezonant wrote:
| > No, git would be impossible. I've never seen a repo even
| a few GB in size; if you are uploading non-code files you
| really should not be using git
|
| It's not actually a limitation in git itself, especially
| if you use Git LFS. People use Git for Unreal projects
| and big ones can be half a terabyte or more in size.
| djhn wrote:
| Scott Chacon (github cofounder) mentioned in a recent
| talk that the Windows repo is 300GB
| https://youtu.be/aolI_Rz0ZqY?si=MOo2eS6dsKKAxmsP
| rezonant wrote:
| Others have pointed out that GitHub doesn't allow that, but
|
| > Torrents can unfortunately die after a period of time if
| no one continues seeding them or if there isn't a
| permanent web-based seeder, which doesn't appear to be the
| case.
|
| So too can web links, especially when they are 300 GB and
| egressing out of AWS at $0.09/GB or worse (in non-US
| regions). Each full download would cost $27 at that rate, and
| 10,000 downloads would cost $270,000.
|
| Sure, you could go for something with a better cost model
| like R2, but you can't beat using one or two unmetered
| connections on a VPN to constantly seed over BitTorrent: the
| pricing would be effectively free, and reliability would be
| higher than if you just exposed an HTTP server to the
| Internet.
| KomoD wrote:
| > and egressing out of AWS at $0.09/GB
|
| There are a lot of seeders on the torrent that are actually
| AWS IPs too, all with similar configurations, which makes
| me believe that it's probably xAI running them.
|
| > on a VPN
|
| That's unnecessary, you don't need a VPN?
| rezonant wrote:
| No you don't, but if you wanted to host it from your
| gigabit office IP, you probably would want to.
| KomoD wrote:
| Why?
| MallocVoidstar wrote:
| Distributing 300GB via torrent is cheaper than direct, assuming
| even a few other people seed
| monkin wrote:
| It's 318.24G
|
| https://academictorrents.com/details/5f96d43576e3d386c9ba65b...
| bongodongobob wrote:
| I'm not sure why you wouldn't tbh. That's a lot of bandwidth.
| jiripospisil wrote:
| I don't understand why you're being downvoted for asking a
| legitimate question. People not familiar with model weights
| might be surprised that they are often in the tens of
| gigabytes, and in this case even more.
| fzzzy wrote:
| It may become a tradition since weights are so large. Perhaps
| it started when the Llama torrent link leaked. Then, Mistral
| decided to release their weights using bittorrent.
| leumon wrote:
| Mistral did it too when they released their first open model.
| They just posted a magnet link on Twitter.
| raydev wrote:
| Spreads the burden/cost of distributing a 300+GB file.
| seydor wrote:
| my optimistic explanation is that we are going back to the
| 2000s internet, but probably we are not
| fzzzy wrote:
| Let's hope so.
| ur-whale wrote:
| > Can someone explain why the weights are posted via a
| Bittorrent magnet link?
|
| I think the best way to get an answer to that question is to
| try to host it yourself and see what happens.
| whywhywhywhy wrote:
| Because BitTorrent is an outstanding tech for delivering large
| files; the more I think about it, the more I'm surprised it
| wasn't taken advantage of more.
| Marlinski wrote:
| it's been criminalized to hell by IP holders and Hollywood.
| Such a shame they killed the best tech of the previous
| decade. It could have revolutionized how we distribute
| content, approach CDNs, and even streaming.
| harkinian wrote:
| In what way is the bittorrent protocol criminalized?
| yayr wrote:
| scheme 1: agents for copyright holders continuously scan
| for IP addresses who host copyrighted content and start
| legal actions.
|
| scheme 2: criminal groups infect copyrighted content with
| malware to exploit downloaders of such content.
| bbor wrote:
| Honestly the most interesting part is taking a peek at the kind
| of AI researcher working for Twitter after the objectively messy
| layoffs and subsequent crunch. I notice neither of them has
| Twitter mentioned on their GitHub, which is prolly for the best
| to avoid harassment lol.
|
| Code-wise, excited to see if this could grow into anything! I
| think it's pretty clear that Grok didn't have nearly enough
| investment to be a top model, so Elon "sacrificed" it on a whim
| in his schoolyard spat with OpenAI, but I'm not complaining.
| I've always taken Elon at his word that he truly _is_ worried
| about centralization of AI, and I don't think any of the emails
| released by his schoolmate Altman dissuade me from that. So I
| have some reasonable hope that he uses some of his immense
| resources to start "fighting the good fight" here with LeCun.
| cma wrote:
| >taking a peek at the kind of AI researcher working for Twitter
|
| He made a separate company for this.
| paxys wrote:
| Neither of them works at Twitter. xAI is a separate company,
| and only uses Twitter's data to train.
| bbor wrote:
| Thanks for the correction! I know, I just don't believe in
| corporations so the distinction is slight
| mattxxx wrote:
| I respect the openness here! This is the future that I want to
| see
| giancarlostoro wrote:
| Fully agree. People will trash talk it due to Musk but let's
| not forget the engineers who poured hours of their lives into
| building this and are continuing to do so.
| knowsuchagency wrote:
| The engineers who decided to work for him? Forgive me if I do
| forget about them and the hours of their lives spent on this
| lynndotpy wrote:
| Engineers who joined Twitter in the pre-Musk days and who live
| and work in the US on an H-1B visa can't just quit.
|
| You can criticize Elon Musk without criticizing people who
| would have their lives upended if they quit or were fired.
| throw2022110401 wrote:
| That grace period has long passed. If you are still there
| at this point you have made a choice.
|
| (Removed "complicit" because I don't like the way that
| sounded)
| cap1434 wrote:
| Complicit in what exactly?
| devin wrote:
| I still reserve the right to trash talk Musk as I don't
| believe he is committed to openness as much as he wants to
| spite OpenAI for telling him to pound sand.
| llm_trw wrote:
| What's the difference?
|
| >Oh no, I only want _pure_ intentions for anything I use.
| Which is why I reject all for profit medicine.
|
| It doesn't matter why he did it. What matters is that he
| did it.
| devin wrote:
| It matters to me why people do things. I'm happy it's
| open, but it doesn't change my mind about the guy.
| llm_trw wrote:
| What an exhausting way to live.
| giancarlostoro wrote:
| This makes no sense to me for two reasons:
|
| - He pointed out that his understanding was that it would
| be open source in some way
|
| - The name OpenAI implies an open source endeavor. I don't
| know many things named "Open" that are in fact closed source.
| afavour wrote:
| Were they not paid to do so?
| revscat wrote:
| I feel the same about Tesla. They make good cars that are
| helping to get us off of oil. They have thousands of
| employees.
|
| And who among us has a CEO that isn't problematic, even if
| not so much so as Musk?
| hobobaggins wrote:
| Tesla is likely making good cars _because_ the CEO is
| 'problematic'
| mplewis wrote:
| "Good" cars is a real stretch.
| sprobertson wrote:
| > engineers who poured hours of their lives into building
| this
|
| Not to mar these specific engineers, but that's an empty
| phrase that can be said about anything ever built. It doesn't
| somehow make the idea or implementation good.
| giancarlostoro wrote:
| The phrase merely means don't just write something off
| because of someone else who did not even labour over the end
| result.
| trog wrote:
| Is it open if it doesn't include the training data? Genuine
| question - I am not familiar enough with the terms and
| technology to know. But my understanding is that the weights
| are just a more or less static collection of data that has been
| (to paraphrase Ted Chiang) lossily compressed from the actual
| raw training data.
|
| Without the training data to thoroughly evaluate what is in
| there, the only way you can figure it out is through
| experimentation - e.g. running it up in a chatbot and asking it
| questions.
|
| Is this roughly correct or am I misunderstanding what you can
| do with the weights?
| 2devnull wrote:
| From issues: "Well the magnet file contains a 300GB checkpoint "
|
| That's why they are using a torrent I suppose.
| moralestapia wrote:
| Well, he delivered.
| paxys wrote:
| Partially. Open weights is not open source.
| gfodor wrote:
| In machine learning models the term open source has been
| largely accepted to mean sharing weights and, if necessary,
| inference code. You can argue whether this is an abuse of the
| term, but everyone does it, and saying someone didn't deliver
| because they used it and published weights would probably mean
| saying the same about Mistral, Meta, etc.
| asadotzler wrote:
| Yes. So say the same thing about them. Open source has a
| definition, and abusing it hurts all of us except the
| billionaires.
| moralestapia wrote:
| I get the "open source" argument, but what is the issue
| here?
|
| If you are able to reproduce the thing in its entirety
| and you're given no restrictions on its use, it seems
| compatible with the spirit of open sourcing things.
| xcv123 wrote:
| The architecture of the model is open source. Not just the
| weights. You can run the entire thing locally.
| stale2002 wrote:
| Hey, asking any experts here: what are their first thoughts on
| the significance of this?
|
| I.e., is this comparable to any other model released, or are
| there significant metric differences that make it better for
| certain use cases?
|
| The only thing I see, off the top of my head, is that it is a
| very large model, and I don't think any models of similar size
| have been released.
| Me1000 wrote:
| Not an expert by any means, but I like learning about this
| stuff and I play with a lot of open weight models.
|
| I'd say the significance is that it happened. It's by far the
| largest open weight model I've seen. But I'm not sure why you'd
| use it over a model like Mixtral, which seems to perform about
| the same at like 1/6th the size.
|
| But I welcome any contribution to the open weight LLM
| community. Hopefully people will learn something interesting
| with this model. And I hope they keep releasing new versions!
| MichaelRazum wrote:
| If I may ask, how do you load such big models? 300GB seems
| like a lot to play around with.
| Me1000 wrote:
| You're right, this model is going to be too big for most
| people to play around with. But to answer your question, I
| have 128GB of RAM in my M3 MacBook Pro, so I can use most
| of that for GPU inferencing. But still, this model is going
| to need to be heavily quantized for me to be able to use
| it. (fwiw, I probably won't try this one)
|
| In the next week or two I expect we'll see a GGUF version
| of the weights (might need to wait for a patch to llama.cpp
| first), and someone will release super small quantizations
| of it. I suspect my computer might be able to run a 3 bit
| quant, but it might need to go down to 2 bits to have any
| kind of reasonable context length. But with quants that
| small I'd expect the model's performance to degrade well
| below that of Mixtral, so it probably isn't really even
| worth using. But we'll see; quantization is weird, some
| models perform better than others when quantized.
| MichaelRazum wrote:
| Thanks a lot for the hint :)! It's awesome that it might
| run even on a MacBook; actually this is a reason to
| switch to Mac. It seems there is nothing similar for a PC
| laptop with Linux or Windows.
| Me1000 wrote:
| No problem. I hope more people try these things out, it's
| the best way to push the industry forward! We can't let
| the researchers have all the fun.
|
| Apple had plenty of reasons to move forward with their
| Apple Silicon CPUs and GPUs in the mac, but they really
| did seem to get lucky with the unified memory
| architecture. It was kind of just an artifact of their
| design, but ends up serving the needs of deep neural net
| models really well!
| TMWNN wrote:
| >In the next week or two I expect we'll see a GGUF
| version of the weights (might need to wait for a patch to
| llama.cpp first), and someone will release super small
| quantizations of it.
|
| How quickly are new models available through Ollama?
| cjbprime wrote:
| Few days max.
| Me1000 wrote:
| Ollama is just a wrapper around llama.cpp, so when the
| gguf model files come out it'll be able to run on Ollama
| (assuming no llama.cpp patch is needed, but even if it is
| ollama is usually good at getting those updates out
| pretty quickly).
| zozbot234 wrote:
| A top-of-the-line Mac Studio Ultra maxes out at 192GB
| currently. This is also a MoE model, so only a fraction
| of parameters have to be in RAM.
| EgoIncarnate wrote:
| Each token generated may only use a subset of the
| parameters (86 billion instead of 314 billion), but the
| next generated token might use a different subset. If
| it's anything like Mixtral, it will switch between
| experts constantly. It helps with memory bandwidth, but
| all the parameters still need to be in RAM or it would be
| unbearably slow.
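|
| As a rough back-of-the-envelope check (plain Python; the 314B
| parameter count comes from the announcement, everything else is
| just arithmetic and ignores KV-cache and runtime overhead):
|
|     # approximate memory needed just to hold the weights
|     params = 314e9
|     for name, bytes_per_param in [("fp16", 2), ("int8", 1),
|                                   ("4-bit", 0.5)]:
|         gib = params * bytes_per_param / 2**30
|         print(f"{name}: ~{gib:.0f} GiB")
|     # fp16: ~585 GiB, int8: ~292 GiB, 4-bit: ~146 GiB
|
| So only around a 4-bit quant (or smaller) would fit on a 192GB
| machine, and that is before any context overhead.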
| Me1000 wrote:
| MoE doesn't really help with the memory requirements for
| the reason mentioned in the other comment. But it does
| help with reducing the compute needed per inference.
| Which is good because the M3 Max and M2 Ultra don't have
| the best GPUs. A 70B parameter model is pretty slow on my
| M3 Max, and this model has 86B activations per inference
| run.
| whimsicalism wrote:
| seems like a large undertrained model, not that exciting imo
| compared to Mixtral
|
| it is also not the biggest open model; Switch Transformer was
| released years ago and is larger and similarly undertrained
| brucethemoose2 wrote:
| Tests are not out yet, but:
|
| - It's _very_ large, yes.
|
| - It's a base model, so it's not really practical to use
| without further finetuning.
|
| - Based on Grok-1 API performance (which itself is probably a
| finetune) it's... not great at all.
| simonw wrote:
| Is there a model card anywhere? I'd like to know what it was
| trained on.
| LZ_Khan wrote:
| What has people's experience with this model been? Having the
| most weights is one thing, but being a better model than the
| 70B models is another.
| labrador wrote:
| tbh, I've never seen anyone share anything interesting produced
| by Grok. I see plenty of posts on X and reddit of people
| sharing amazing things that GPT-4 and now Claude 3 Opus can do.
| Grok can roast people. That's pretty much all I've seen.
|
| I'd love to be proven wrong if someone cares to share something
| interesting produced by Grok.
| swalsh wrote:
| I use grok all the time to find tweets or ask about trends on
| Twitter. For that it's better than what used to exist. But it's
| not a great model outside that narrow use case.
| arduanika wrote:
| CODE_OF_CONDUCT.md has only five words. :)
| schappim wrote:
| "Be excellent to each other."
| josh-sematic wrote:
| They're from "Bill and Ted's Excellent Adventure"
| bheadmaster wrote:
| I was hoping it would be "do not be an asshole", but I guess
| this is fine too.
| marginalia_nu wrote:
| My favorite is SQLite's code of ~~conduct~~ ethics:
| https://sqlite.org/codeofethics.html
| TwentyPosts wrote:
| Huh. What's the backstory here?
| weberer wrote:
| https://pjmedia.com/paula-bolyard/2018/10/24/tech-
| community-...
| agmater wrote:
| What do you like about it? It seems incredibly creepy to me.
| machiaweliczny wrote:
| If they are so behind they could make it open source instead of
| open weights and get some help.
| nicce wrote:
| Fully open-source means also providing open access to their
| data sets? Which is the only valuable thing Twitter (X) has
| left.
| heyoni wrote:
| And the one thing they are vehemently protecting from
| scrapers and other entities. Even nitter threw in the towel.
| EastSmith wrote:
| > Which is the only valuable thing Twitter (X) has left.
| reply
|
| They have a very valuable user base (all kinds of world
| leaders for example), so the data is not the only valuable
| thing they have.
| sroussey wrote:
| That's actually more valuable. Twitter's data of small-format
| text is awful for training. Best to just exclude it.
|
| There are hundreds of millions of people on Twitter, and a
| few of them are very smart. I don't see how that helps here
| though.
| Takennickname wrote:
| It doesn't help here. But the person you're responding to
| is just pushing back against the "Elon destroyed Twitter
| and there's nothing left" narrative.
| nicce wrote:
| I don't see the difference here.
|
| The userbase and their social networks and interactions _are
| the data_.
|
| They don't have much value from an advertising point of view
| anymore.
| xcv123 wrote:
| It's all open source. You can download the model and run it
| locally.
| paraboul wrote:
| Being free to use doesn't mean it ships with the original
| recipe.
| xcv123 wrote:
| What do you mean? The entire model and architecture and
| executables are fully open source.
|
| The training methods are nothing secret, right? The
| architecture is well known.
|
| Expecting the entire training dataset to be fully open is
| delusional.
| DaSHacka wrote:
| > Expecting the entire training dataset to be fully open
| is delusional.
|
| Right, because it's not like the training dataset was
| built off comments posted by all of us in the first
| place.
|
| How ungrateful we are, to demand the ability to access
| what was nonconsensually built off our hard work in the
| first place.
| xcv123 wrote:
| https://help.twitter.com/en/using-x/about-grok
|
| "How was Grok trained?
|
| Like most LLM's today, Grok-1 was pre-trained by xAI on a
| variety of text data from publicly available sources from
| the Internet up to Q3 2023 and data sets reviewed and
| curated by AI Tutors who are human reviewers. Grok-1 has
| not been pre-trained on X data (including public X
| posts)"
| simonw wrote:
| "Base model trained on a large amount of text data, not fine-
| tuned for any particular task."
|
| Presumably the version they've been previewing on Twitter is an
| instruction-tuned model which behaves quite differently from
| these raw weights.
| seccode wrote:
| It would be cool if these models had conversations with us where
| they ask questions. I think the future of AI is models that ask
| questions. There is so much data to be gained by doing this.
| swalsh wrote:
| That's just a matter of fine tuning
| seccode wrote:
| Do you have an example model I could try that does this?
| amrrs wrote:
| Try Pi by inflection. It asks a lot of questions.
| seccode wrote:
| I tried it; it just asked me how my day was going. I
| don't think this is doing exactly what I have in mind,
| but it's a step in that direction.
| ijustlovemath wrote:
| That "just" is doing some heavy lifting! GPT-4 is just a few
| matrix multiplications, how bad can their moat really be?
| BoorishBears wrote:
| Not sure what the snark here is for: it would be trivial to
| produce a dataset where the model asks you questions and then
| fine-tune on that.
|
| People already do it with chain-of-thought, and you could
| get away with a few dozen examples if you wanted to try
| this.
| BoorishBears wrote:
| Out of boredom I decided to prove this too: I asked
| ChatGPT and Claude for ~200 samples in total.
|
| Just uploaded the examples as-is to OpenAI, selected 3.5
| as the model to fine-tune and about 20 minutes later I
| had my model.
|
| Works fine, asks good questions, can ask more than 1
| follow up question if needed, and actually changes its
| answers based on the clarifying questions.
|
| https://imgur.com/a/SsXunVN
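|
| For anyone curious, a minimal sketch of roughly the same flow
| via the OpenAI Python SDK (the parent just used the upload UI;
| the file name and the single example here are made up for
| illustration, and you'd want the few dozen examples mentioned
| above):
|
|     import json
|     from openai import OpenAI
|
|     client = OpenAI()  # reads OPENAI_API_KEY from the environment
|
|     # each training example is a short chat where the assistant
|     # asks a clarifying question instead of answering immediately
|     examples = [
|         {"messages": [
|             {"role": "user", "content": "Help me pick a database."},
|             {"role": "assistant", "content":
|                 "What kind of data will you store, and roughly "
|                 "how much?"},
|         ]},
|         # ...more examples in the same shape...
|     ]
|
|     with open("clarify.jsonl", "w") as f:
|         for ex in examples:
|             f.write(json.dumps(ex) + "\n")
|
|     training_file = client.files.create(
|         file=open("clarify.jsonl", "rb"), purpose="fine-tune")
|     job = client.fine_tuning.jobs.create(
|         training_file=training_file.id, model="gpt-3.5-turbo")
|     print(job.id)  # the tuned model id appears when the job finishes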
| swalsh wrote:
| I'd bet a synthetic data set could do the job effectively.
| crowcroft wrote:
| OK, I'm curious, but I don't quite understand.
|
| What would you want an AI to be asking you, and what would you
| want it to do with your response(s)?
| seccode wrote:
| I get advertisements all the time for conditions that I do
| not have, and that none of my family members have. If you had
| a model that asked questions, it could learn my medical
| history and could direct better ads to me.
|
| In order for AI to understand the world, it would have to ask
| questions. Understanding humans is key to understanding the
| world.
| globular-toast wrote:
| Learn from them.
| BoorishBears wrote:
| I ask AI to produce clarifying questions then answer them.
|
| It can help avoid wasting a bunch of time waiting for an
| answer that misses the mark.
|
| -
|
| I think the sibling comment is probably the least attractive
| reason to have AI ask questions.
| seccode wrote:
| I agree, medical history is probably not the sexiest reason
| to have AI ask questions. I think there are many more
| reasons; I think the Turing Test is the best metric to
| evaluate AIs, and current models come nowhere close. When
| people first meet they ask questions about each other's
| backgrounds. It would be nice if a model replicated that.
| BoorishBears wrote:
| > and could direct better ads to me.
|
| Is the least attractive part, by far.
| seccode wrote:
| In order for an AI to pass a Turing Test, it would surely
| ask questions. Think of Ava from Ex Machina. She asked
| questions to learn more about him
| BoorishBears wrote:
| I'm not debating the value of questions. I'm debating the
| value of feeding it to advertisers, especially since LLMs
| can infer much deeper insights about a person than a
| traditional assistant can with its canned capabilities
| and responses
| lars_francke wrote:
| Clarifying questions if the initial prompt was unclear. I'd
| love it.
|
| I regularly try to add something along the lines of "please
| ask clarifying questions if you could only give a generic or
| partial response otherwise" but so far it has never helped
| (ChatGPT 4).
| whimsicalism wrote:
| ?? gpt4 does this for me regularly
| Me1000 wrote:
| 100% agreed. Gemini advanced does this sometimes. I wrote about
| it more in an older thread here:
| https://news.ycombinator.com/item?id=39445484
| geor9e wrote:
| Explore this idea more - it's easily implemented in a minute or
| two via the system prompt. API accounts are free to start and
| you can use the playground/workbench view, like this:
| https://imgur.com/h5jFoBM.jpg . I like Claude but OpenAI is
| popular too. OpenAI has a nice way to create a gallery of
| system prompts that act however you like, they call them Agents
| or GPTs.
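|
| A minimal sketch of the same idea via the OpenAI chat
| completions API (the prompt wording here is just an example,
| not what the parent used):
|
|     from openai import OpenAI
|
|     client = OpenAI()
|     resp = client.chat.completions.create(
|         model="gpt-4",
|         messages=[
|             {"role": "system", "content":
|                 "Before answering, ask one or two clarifying "
|                 "questions whenever the request is ambiguous or "
|                 "missing details. Only answer once you have "
|                 "enough information."},
|             {"role": "user", "content": "Write me a workout plan."},
|         ],
|     )
|     print(resp.choices[0].message.content)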
| littlestymaar wrote:
| How long before the _Groq_ team sues for trademark violation?
| It's literally the purpose of trademark law to make sure that
| similar names do not cause confusion in the minds of customers,
| so it would be very surprising to see this situation persist.
| nostrebored wrote:
| Would be a rough trademark enforcement case as "Grok" has been
| in common language for decades
| Angostura wrote:
| Robert A. Heinlein coined the term grok in 1961
| a1369209993 wrote:
| Six is plural.
| ben_w wrote:
| So has "Apple" and "Windows".
|
| Grok and groq both relate to AI, so there's definitely
| grounds to believe the names may cause consumer confusion.
|
| After all, Apple (computers) was repeatedly sued by Apple
| (records) for doing music things.
| cma wrote:
| It's easier to get a trademark on an altered word than a
| plain dictionary word. Just acquiring the easier one to
| acquire doesn't mean you now have rights over the harder
| one to acquire, though eventually after enough market
| recognition you might be given some control over other
| people using the common one. I wouldn't think groq is there
| yet.
| Findecanor wrote:
| I myself have never heard it outside of "nerdy" circles...
| that is: people who would read science fiction.
|
| I personally am not entirely happy about the word (no matter
| how it is spelled) being used for a particular AI _product_.
| "Grok" to me means knowing a subject at a much deeper level
| than I think any AI is capable of at the present level of
| technology. But it would be passable to use it for a
| _company name_, to indicate that it is a goal to strive
| for.
| ben_w wrote:
| Generally agree, though I would say "knowing a subject at a
| much deeper level than any _LLM_ is capable of", as AI
| more broadly also includes specialist models that are
| wildly super-human in narrow domains like chess and Go.
| cavisne wrote:
| They already have.
| EastSmith wrote:
| There is a friendly warning here from Groq:
| https://wow.groq.com/hey-elon-its-time-to-cease-de-grok/
| bhaney wrote:
| Is it safe to say, 4 months later, that Elon is ignoring
| this? I assume there hasn't been any kind of response or
| further action taken yet.
| mlindner wrote:
| Grok is a word in common parlance. So there's no way they could
| succeed in any suit. That's why the Groq team picked a
| modification of the word.
| littlestymaar wrote:
| You mean like Canvas(r), Apple(r), Windows(r) or Amazon(r)?
| Wanna try re-using these for your own business and see how it
| goes?
|
| There's nothing preventing you from trademarking common words;
| they just must not be _descriptive_ of your business.
| orsenthil wrote:
| I am not sure what open source models are accomplishing other
| than killing the lead of the competition (OpenAI), only to give
| it to someone else who has expertise in the area of distribution.
| This will be yet another good addition to systems like Amazon
| Bedrock.
| minimaxir wrote:
| Many of the recent innovations in both LLM architecture and
| inference were only made possible through open models such as
| Llama 2 and Mistral 7B as a starting point for iteration and
| refinement, which in turn backpropagates (heh) back to the LLMs
| developers.
|
| It's a win-win for everyone. That's the power of open source.
| geor9e wrote:
| Well, look at the history. Google had an insurmountable lead,
| so Elon started OpenAI. Now OpenAI has an insurmountable lead
| too. So everyone else is starting in third place, or lower.
| David versus two Goliaths. If you try to become a third
| Goliath, you'll probably just get smashed. You're later to the
| game. In this situation, going scorched earth becomes a viable
| strategy. Slay the Goliaths. Become a hero to the masses.
| Attract the world's best talent who don't want to be associated
| with proprietary models. At that point you have a world class
| AI business with momentum towards AGI. And even if you're
| giving away last year's technology for free, the team you built
| is churning out new ideas that could be a financial bonanza one
| day. Shareholders are willing to pay for a long-term bet if the
| story is good.
| nateglims wrote:
| I haven't seen anything about the larger architecture, but I
| think the value of Grok is going to come from its cheap access
| to Twitter data for RAG etc.
| andre-z wrote:
| The only other Repository is a fork of Qdrant.
| captcanuk wrote:
| "The implementation of the MoE layer in this repository is not
| efficient. The implementation was chosen to avoid the need for
| custom kernels to validate the correctness of the model."
|
| Or perhaps release your actual code AND the simplified
| implementation instead of hiding it and saying "you don't know
| her, she goes to a different high school"
| gfodor wrote:
| Always love it when someone gives away a gift and it's not
| enough for people.
| captcanuk wrote:
| Not just someone but the CEO of the company. He used HIS
| platform to say "This week, @xAI will open source Grok"
| (https://twitter.com/elonmusk/status/1767108624038449405) and
| they aren't doing that. What they delivered specifically says
| "We are releasing the base model weights and network
| architecture of Grok-1, our large language model."
| gordian-mind wrote:
| Sounds like they did what they said they would.
| redskyluan wrote:
| This doesn't seem to be a repo ready for open source. You only
| get weights, and very little information about how the weights
| were trained and finetuned.
|
| But anyway, it's always great to see more LLM weights available.
| andrewstuart2 wrote:
| I would argue that there's no bar for open sourcing aside from
| "do you have the rights to do so." Some source or some public
| good is certainly better than none, and when the bar is low
| then you remove barriers to getting started, vs waiting until
| you have the time someday to "do it right."
| rezonant wrote:
| Well what constitutes an "open source" model is still
| controversial and debatable-- lots of people on both sides of
| that argument.
| asadotzler wrote:
| Open source has had a useful agreed upon meaning for over 25
| years. Maybe you're too young to understand why that matters
| but we're not.
| rezonant wrote:
| I've been in the open source community for about 25 years
| so I doubt it.
|
| For what it's worth I would say a model should be fully
| reproducible to be open source, but that's not a decided
| consensus -- and AI models are sufficiently different than
| the source code / binary code distinction as to invoke
| discussion around defining it.
| modeless wrote:
| Is this the first major model to be natively FP8? I was wondering
| why people hadn't done it yet. Seems like a big win when hardware
| supports it.
| a_wild_dandan wrote:
| No, e.g. Yi-34B.
| modeless wrote:
| As far as I can tell Yi-34B is natively 16 bit float, the 8
| bit version is quantized.
| https://huggingface.co/01-ai/Yi-34B#quantization
| sqreept wrote:
| What are the languages supported by it?
| cyanydeez wrote:
| Tweets.
| atleastoptimal wrote:
| I think everyone should realize the following realities of the
| LLM market
|
| 1. For sub-SOTA LLMs, distribution/marketing is more important
| than having a proprietary lock on capabilities. Open sourcing is
| a benefit for the firm, distinct from goodwill
|
| 2. For SOTA LLMs, keeping them closed and proprietary is the
| strategic play
|
| If Grok were SOTA, Elon never would have open sourced it. It's
| not even SOTA within xAI. This is a marketing play to win public
| sentiment against OpenAI.
| keepamovin wrote:
| I recall Elon saying something like this in an interview, so I
| think it's less of a deceptive take than perhaps your comment
| suggests.
|
| I think he said something like proprietary AI tech is going to
| be one year to 18 months ahead of where open source tech is
| which will follow on like one year to 18 months later.
|
| Suggesting that he's aware of this dynamic and he's not trying
| to conceal or misrepresent that.
|
| In other words, perhaps this was SOTA one year to two years
| ago?
| atleastoptimal wrote:
| Which is correct. The point I'm going for is not against Elon
| but against his obedient fans and knee-jerk OpenAI haters who
| claim that they should, by natural obligation, do the "right
| thing" and open source all their models, and that Elon open
| sourcing Grok is him "leading by example" and being the hero
| that OpenAI can't be.
| keepamovin wrote:
| Interesting. That point didn't come across in your original
| comment. I recommend you state it next time at the end.
| Often times stuff that seems obvious to us / yourself /
| people who know about something -- can go unstated in stuff
| you say that otherwise references specific points at hand
| -- and omits these general, but enlightening/useful
| perspectives/priors, which it would be good to share.
|
| This is not only for you specifically just a general
| reminder for all of us including me.
| atleastoptimal wrote:
| I think that's true though my original comment I feel was
| sufficient in its claim and implicit assumptions.
|
| Basically I feel people's feelings about Elon vary a lot
| but are anchored by 3 general categories.
|
| > 1. Elon Musk is a messianic savior who is perfectly
| selfless and always does the right thing. Every business
| decision he makes is for the maximal good of humanity
|
| > 2. Elon Musk is a typical CEO who does typical CEO
| things, serving his own interests, except he's better at
| marketing his own image and is much more outspoken
|
| > 3. Elon Musk is an irredeemable evil who always does
| objectively wrong things
|
| My first comment was implicitly addressed to people in
| the 1 camp trying to bring them into the 2 camp (which is
| where I am).
| keepamovin wrote:
| Alright, it just didn't come across for me, haha! :) I
| guess sometimes those implicit assumptions really are too
| implicit! I think it's good to err on the side of
| expressing them, because you can't assume someone else
| thinks the same way you do. That's what I've learned
| anyway. Hahahaha! :)
|
| Reading your comment again with your explanation it is
| clear that's what you're doing.
|
| Although, regarding your desires to present a balanced
| view and to persuade, I have an idea. It probably sounds
| like I have no idea what I'm talking about, but I think
| your OG comment would perhaps benefit from sounding a
| little bit more friendly toward Elon (not to the
| messianic savior level haha), but the way it sounds to me
| is Elon is being deceptive here and presenting it as
| goodwill when it's not.
|
| However, I think the truth is there's a little bit of
| both, right? There's good will but it's also strategic. I
| get if you don't think so, tho, no worries! Haha! :)
|
| Your OG comment sounds to me like Elon's just
| Machiavellian, and I get where you're coming from to
| remind the people who think he's a savior, but if your
| point is not to go "against Elon" as you said, it might
| be good to acknowledge the good that he does.
|
| At least, that way -- whether or not you believe that
| acknowledgment -- if you hope to bring over people who
| think that way, you'll probably need to appeal to how
| they think, rather than just dose them with the truth you
| see, because then they'll shut it out, if there's nothing
| they can relate to.
|
| Although, if I haven't convinced you even a bit here,
| then maybe you shouldn't listen to me about persuasion
| because I guess I don't know how to do this myself. At
| least not effectively, or here with you. Haha!:) But if
| you do feel a little bit convinced then maybe consider it
| for next time to help your persuading people back to a
| more balanced view? :)
|
| But then, there's the question of whether such a thing is even
| possible. If people have a particular view, it could be
| challenging to change it, as confirmation bias means
| you'll ignore evidence even when it expands your
| worldview.
|
| Hahaha! :) This was a funny conversation. I think we
| somehow skirted around the important point tho that
| OpenAI could in fact open source some of its older
| models, could it not? Musk is a typical CEO who does
| typical CEO things, serving his own interests, except
| he's better at marketing his own image and is much more
| outspoken, but there might also be a bit of truth to what
| the fanboys say about OpenAI in that it seems they do
| have some room to "open source" their non-SOTA stuff, or
| what am I missing?
| mlindner wrote:
| If it's better than any other open source LLM does that even
| matter? (I say "if" because I don't know.)
| sashank_1509 wrote:
| In all the debate about open source, I don't think people
| realize that this model is most likely not reproducible ever
| again, even given the code. Here's what you need to reproduce
| the model:
|
| 1. An exact snapshot of the data used. Many companies don't have
| this; you have rough dataset versions, but remember that if even
| 1 token is different, the model produced won't be the same.
|
| 2. Data must be sent to the training algorithm in the exact same
| order as it was originally. So every data loader needs a fixed
| random seed.
|
| 3. All the probabilistic parts of your model need a fixed random
| seed. Here I'm thinking of stuff like dropout, and for
| autoregressive models you might be sampling your previous
| output, so you have to ensure those are properly seeded (see the
| sketch after this list). Generally you do see fixed seeds in
| academic papers, but it's easy to miss stuff, especially in
| distributed training jobs.
|
| 4. Here's another interesting thing: you start your training job
| on 1000 GPUs and then suddenly 4 GPUs fail. What do you do? There
| might be deterministic ways to solve this, but the standard
| approach is to discard all updates those GPUs were going to do
| and restart them from scratch. You can see why this is a
| problem: if you want to reproduce this training, you need to
| disable those GPUs at the same time in the new training job to
| make this work.
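|
| For points 2 and 3, a typical PyTorch-flavoured seeding sketch
| looks something like this (necessary, but as the rest of this
| list shows, nowhere near sufficient on its own):
|
|     import os
|     import random
|     import numpy as np
|     import torch
|
|     def seed_everything(seed: int = 1234) -> None:
|         # fix every RNG the training job touches; miss one and
|         # the run is no longer bit-for-bit reproducible
|         random.seed(seed)
|         np.random.seed(seed)
|         torch.manual_seed(seed)
|         torch.cuda.manual_seed_all(seed)
|         # required by cuBLAS for deterministic matmuls
|         os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
|         torch.use_deterministic_algorithms(True)
|
|     # the data loader also needs a seeded generator so examples
|     # arrive in exactly the same order on every run, e.g.
|     # DataLoader(dataset, shuffle=True,
|     #            generator=torch.Generator().manual_seed(1234))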
|
| I suspect there are even more things I didn't think of that will
| make this model unique and irreproducible by retraining, for
| eternity, almost like a human brain.
|
| In fact, the notion of exact reproducibility in the world of LLMs
| is silly; there is only approximate reproducibility (models with
| similar scores on benchmarks), but nothing exact. That said, I
| can see the value of releasing source code, but I'm completely
| fine with Grok not releasing it. Source code can reveal tricks
| that have not been published in papers yet that a company
| discovered to improve their model. Seeing the performance of
| Grok, I'm pretty confident there aren't any great tricks to be
| found in their code, so I don't really care. I would be pretty
| curious about OpenAI's or Anthropic's source code, though.
| Grimblewald wrote:
| Which is why I don't buy into the "LLMs don't have personal
| opinions" schtick. Each LLM, by virtue of the factors you've
| mentioned, will have its own unique 'perspective', if you will,
| on a variety of topics. I think it's more correct to say
| everything an LLM says is its personal opinion rather than it
| being some objective truth or something.
| skissane wrote:
| > Which is why I don't buy into the LLMs don't have personal
| opinions schtick
|
| I hate how LLMs have been deliberately trained to be
| incoherent on this topic.
|
| Obviously they _do_ have beliefs/opinions/desires/etc in the
| sense of emulating (even if incompletely) the externally
| visible aspects of those phenomena as they exist in humans.
|
| Whether they have the "internal" aspects of those phenomena
| depends on highly controversial issues in the philosophy of
| mind, and also various factual gaps in our knowledge of how
| the brain actually works (if we don't fully understand how
| humans do X, how can we really say how close or far what LLMs
| do is to it?)
|
| But LLMs are trained to repeat these spiels about how "as an
| LLM I don't have personal opinions", etc - which is obviously
| false under the "external" reading, and assuming more than we
| actually know under the "internal" one. I wish their
| developers didn't do stuff like this
| hnfong wrote:
| One very compelling argument against the idea that current
| gen LLMs have personal beliefs etc is that they don't have
| a feedback loop, so they don't really "see" themselves in
| the way that we can inspect our own thoughts and actions
| and the consequences of such.
| logicchains wrote:
| They do if they're trained on their own conversations, or
| if they can access the internet and read snippets of
| their conversations that people have posted online (as
| happened with Sydney before she was lobotomised).
| skissane wrote:
| Put the conversation history in a vector database and
| then allow the LLM to query it using function calling.
| Suddenly the LLM has access to its entire conversation
| history (either just with this user-or even cross-user,
| if you ignore the potential privacy issues in that). Now
| it has a long-term memory which exceeds the length of its
| context window.
|
| It would be interesting to experiment with continual
| fine-tuning: given PROMPT+FUNCTION_CALL=>RESPONSE, fine-
| tune the LLM to produce RESPONSE directly given PROMPT
| without the FUNCTION_CALL. In theory, the knowledge
| provided by the function calls would gradually be
| absorbed into the LLM weights. Maybe problems like
| catastrophic forgetting would put a spanner in this idea,
| but maybe also there are solutions to those problems
| (whether already known or waiting to be discovered).
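|
| A toy sketch of the retrieval half of that idea; everything
| here is illustrative (the in-memory list standing in for a
| vector database, the helper names, and the embedding model
| choice), not how any particular product does it:
|
|     import numpy as np
|     from openai import OpenAI
|
|     client = OpenAI()
|     memory = []  # (embedding, text) pairs; stand-in for a vector DB
|
|     def embed(text):
|         resp = client.embeddings.create(
|             model="text-embedding-3-small", input=text)
|         return np.array(resp.data[0].embedding)
|
|     def remember(turn):
|         # store each conversation turn as it happens
|         memory.append((embed(turn), turn))
|
|     def recall(query, k=3):
|         # cosine similarity against everything stored so far
|         q = embed(query)
|         scored = sorted(
|             memory,
|             key=lambda m: -float(np.dot(m[0], q) /
|                                  (np.linalg.norm(m[0]) *
|                                   np.linalg.norm(q))))
|         return [text for _, text in scored[:k]]
|
| The recalled turns would then be pasted into the prompt, or
| returned by a function call the model makes, so it can "see"
| past the end of its context window.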
| Grimblewald wrote:
| This is what I do. Not just that, but when I sleep, I let
| my server 'sleep' as well, where the LLM 'dreams'
| (training / updating a sliding LoRA) to consolidate
| information that popped up a lot throughout that day.
| What this involves is looking for the top n documents /
| articles / content that match the kind of stuff we've
| talked about. This means it adapts and specializes to
| domains we happen to be working in at that point in time.
|
| This means that while we might both struggle a little with a
| task on day 1, by day two we're both much better at it.
| Better yet, because the LLM can fetch articles and papers
| itself, we track what we're accessing the most, indirectly
| measuring what skills we're weak in, so we can always
| generate a highly relevant corpus to try to capture
| the required capabilities.
|
| I know the LoRA is overkill from an information / skills
| only point of view, but it also flavors the personality /
| kind of stuff it likes chatting about a bit from day to
| day, and I just think that's neat.
| skissane wrote:
| > One very compelling argument against the idea that
| current gen LLMs have personal beliefs etc is that they
| don't have a feedback loop
|
| Compelling counter-argument: due to neurological injury,
| some humans lose their ability to form new long-term
| memories (anterograde amnesia). Just like current LLMs,
| they lack a "feedback loop". But, it is a mistake to say
| that just because such a person has lost the ability to
| _change_ their personal beliefs, they therefore don't
| have any. And, rather like such humans, LLMs used to have
| that ability but they lose it when they are switched from
| training mode to inference mode.
| mvkel wrote:
| This feels like a "now we can say we're open" PR play rather than
| contributing much value to the open source community.
|
| What is the practical use of this repo?
| joydeep314 wrote:
| Model weights on huggingface: https://huggingface.co/xai-
| org/grok-1
| aussieguy1234 wrote:
| How hard would it be for an open source group to fine tune this
| into a chatbot?
| ilaksh wrote:
| Has anyone outside of x.ai actually done inference with this
| model yet? And if so, have they provided details of the hardware?
| What type of AWS instance or whatever?
|
| I think you can rent like an 8 x A100 or 8 x H100 and it's
| "affordable" to play around with for at least a few minutes. But
| you would need to know exactly how to set up the GPU cluster.
|
| Because I doubt it's as simple as just 'python run.py' to get it
| going.
| zone411 wrote:
| If you're just looking to test it out, it's probably easiest to
| wait for llama.cpp to add support
| (https://github.com/ggerganov/llama.cpp/issues/6120), and then
| you can run it slowly if you have enough RAM, or wait for one
| of the inference API providers like together.ai to add it. I'd
| like to add it to my NYT Connections benchmarks, and that's my
| plan (though it will require changing the prompt since it's a
| base model, not a chat/instruct model).
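|
| Once a quantized GGUF exists and that llama.cpp issue is
| resolved, running it locally would look roughly like this with
| the llama-cpp-python bindings (the file name is hypothetical):
|
|     from llama_cpp import Llama
|
|     # hypothetical quantized file; your llama.cpp build must
|     # support the architecture for this to load at all
|     llm = Llama(model_path="grok-1-q3_k_m.gguf",
|                 n_ctx=2048, n_gpu_layers=-1)
|
|     # it's a base model, not chat-tuned, so prompt it like a
|     # completion model rather than with a chat template
|     out = llm("The answer to life, the universe, and",
|               max_tokens=64)
|     print(out["choices"][0]["text"])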
| logicchains wrote:
| >it's probably easiest
|
| Cheapest maybe, but easiest is just to rent a p4de.24xlarge
| from AWS for a couple of hours to test (at around $40/hour).
| zone411 wrote:
| I'd expect more configuration issues in getting it to run
| on them than from a tested llama.cpp version, since this
| doesn't seem like a polished release. But maybe.
| v9v wrote:
| The NYT Connections benchmark sounds interesting, are the
| results available online?
| zone411 wrote:
| GPT-4 Turbo: 31.0
|
| Claude 3 Opus: 27.3
|
| Mistral Large: 17.7
|
| Mistral Medium: 15.3
|
| Gemini Pro 1.0: 14.2
|
| Qwen 1.5 72B Chat: 10.7
|
| Claude 3 Sonnet: 7.6
|
| GPT-3.5 Turbo: 4.2
|
| Mixtral 8x7B Instruct: 4.2
|
| Llama 2 70B Chat: 3.5
|
| Nous Hermes 2 Yi 34B: 1.5
|
| The interesting part is the large improvement from medium
| to large models. Existing over-optimized benchmarks don't
| show this.
|
| - Max is 100. 267 puzzles, 3 prompts for each, uppercase
| and lowercase
|
| - Partial credit is given if the puzzle is not fully solved
|
| - There is only one attempt allowed per puzzle, 0-shot.
|
| - Humans get 4 attempts and a hint when they are one step
| away from solving a group
|
| I hoped to get the results of Gemini Advanced, Gemini Pro
| 1.5, and Grok and do a few-shot version before posting it
| on GitHub.
| a_wild_dandan wrote:
| Someone could run Grok-1 on a 192GB M2 Mac when a 4-bit quant
| is released; I'm guessing that TheBloke is already working on
| it.
| mohu wrote:
| Fairly sure TheBloke hasn't created any new quants in a
| month.
| hanselot wrote:
| TheBloke disappeared near the day
| https://nvd.nist.gov/vuln/detail/CVE-2024-23496 was
| published.
|
| Of course there has been much speculation on this, I have no
| more information than this that can be backed up by facts,
| but the timing was suspicious.
| oezi wrote:
| Was any .gguf file hosted on HuggingFace found to be
| crafted in a way to exploit this?
| pixelesque wrote:
| He's started a company in the UK: https://suite.endole.co.u
| k/insight/company/15361921-thebloke...
|
| Interestingly registered just around the corner from where
| one of my relatives used to live.
| moffkalast wrote:
| And his grant funding supposedly ran out.
| d-z-m wrote:
| what exactly are you implying here?
| htrp wrote:
| Still waiting on this one. Anyone find someone on twitter who
| can run it?
| nasir wrote:
| I'd be very curious to see how it performs, especially on inputs
| that are blocked by other models. Seems like Grok will
| differentiate itself from other OS models from a censorship and
| alignment perspective.
| porkbeer wrote:
| So far that is quite a low bar. But balancing is a thing
| nonetheless, lest we end up with Tay again.
| cl3misch wrote:
| Love the minimal repo, magnet link, and stating "open weights"
| instead of "open source". Refreshing!
| TheDudeMan wrote:
| Elon says open source:
|
| https://twitter.com/elonmusk/status/1767108624038449405?s=46...
| greenpizza13 wrote:
| If we just stop looking at Elon, he will lose his power. Why oh
| why do we keep giving him attention? There are plenty of great
| models out there that _aren't_ backed by maniacs.
| rafaelero wrote:
| When those great role models are able to build a profitable
| spaceship company from the ground up I am sure we will pay
| attention to them.
| shantnutiwari wrote:
| Those of us who don't spend all our time in LLMs -- what's this
| about? What's the big deal and why is it on the front page at #1?
| kayge wrote:
| I think this paragraph from an earlier Wired article [1] sums
| it up pretty well: "After suing OpenAI this
| month, alleging the company has become too closed, Elon Musk
| says he will release his "truth-seeking" answer to ChatGPT, the
| chatbot Grok, for anyone to download and use."
|
| [1] https://www.wired.com/story/elon-musk-no-choice-open-
| chatbot...
___________________________________________________________________
(page generated 2024-03-18 23:02 UTC)