[HN Gopher] OpenAI's CEO says the age of giant AI models is alre...
       ___________________________________________________________________
        
       OpenAI's CEO says the age of giant AI models is already over
        
       Author : labrador
       Score  : 185 points
       Date   : 2023-04-17 17:25 UTC (5 hours ago)
        
 (HTM) web link (www.wired.com)
 (TXT) w3m dump (www.wired.com)
        
       | boringuser2 wrote:
       | Eh.
       | 
       | Altman has a financial incentive to lie and obfuscate about what
       | it takes to train a model like GPT-4 and beyond, so his word is
       | basically worthless.
        
         | qqtt wrote:
         | First of all, if Altman continually makes misleading statements
         | about AI he will quickly lose credibility, and that short term
         | gain in whatever 'financial incentive' that birthed the lie
         | would be eroded in short order by a lack of trust of the head
         | of one of the most visible AI companies in the world.
         | 
         | Secondly, all the competitors of OpenAI can plainly assess the
         | truth or validity of Altman's statements. There are many
         | companies working in tandem on things at the OpenAI scale of
         | models, and they can independently assess the usefulness of
         | continually growing models. They aren't going to take this
         | statement at face value and change their strategy based on a
         | single statement by OpenAI's CEO.
         | 
         | Thirdly, I think people aren't really reading what Altman
         | actually said very closely. He doesn't say that larger models
         | aren't useful at all, but that the next sea change in AI won't
         | be models which are orders of magnitude bigger, but rather a
         | different approach to existing problem sets. Which is an
         | entirely reasonable prediction to make, even if it doesn't turn
         | out to be true.
         | 
          | All in all, "his word is basically worthless" seems much too
          | harsh an assessment here.
        
           | manojlds wrote:
           | Elon Musk has been constantly doing this and thriving.
        
           | cogitoergofutuo wrote:
           | It is possible that GP meant that Altman's word is basically
            | worthless _to them_, in which case that's not something that
           | can be argued about. It's a factually true statement that
           | that is their opinion of that man.
           | 
           | I personally can see why someone could arrive at that
           | position. As you've pointed out, taking Sam Altman at face
           | value can involve suppositions about how much he values his
           | credibility, how much stock OpenAI competitors put in his
           | public statements, and the mindsets _people in general_ have
           | when reading what he writes.
        
           | mnky9800n wrote:
           | dude someone lied their way into being president of the
           | united states all while people fact checked him basically
            | immediately after each lie. lying doesn't make a difference.
        
             | beowulfey wrote:
             | He's not presenting false evidence here, he's presenting a
             | hunch. It's a guess. No one is going to gain anything from
             | this one way or another.
        
         | olalonde wrote:
         | Does he even have any background in machine learning? I always
         | found it bizarre that he was chosen to be OpenAI's CEO...
        
           | cowmix wrote:
           | On the Lex Fridman podcast, he pretty much admitted he's not
           | an AI (per se) and isn't the most excited about the tech (as
           | he could be).
        
             | olalonde wrote:
             | > he pretty much admitted he's not an AI
             | 
             | Yeah, I also had a hunch he wasn't an AI. (I assume you
             | meant "AI researcher" there :))
             | 
             | All joking aside, I wonder how that's affecting company
             | morale or their ability to attract top researchers. I know
             | if I was a top AI researcher, I'd probably rather work at a
             | company where the CEO was an expert in the field (all else
             | being equal).
        
               | vorticalbox wrote:
               | I feel most CEOs are not top of their field but rather
               | people who can take a vision and run with it.
        
               | olalonde wrote:
               | It might be true in general; however, AI research
               | laboratories are typically an exception, as they are
               | often led by experienced AI researchers or scientists
               | with extensive expertise in the field.
        
           | gowld wrote:
           | He has background in CEO (smooth-talking charmer in the VC
           | crowd). That's why he's CEO.
        
         | g_delgado14 wrote:
         | IIRC Altman has no financial stake in the success or failure of
          | OpenAI to prevent these sorts of conflicts of interest between
          | OpenAI and society as a whole.
        
           | shagie wrote:
           | https://www.cnbc.com/2023/03/24/openai-ceo-sam-altman-
           | didnt-... (https://news.ycombinator.com/item?id=35289044 - 24
           | days ago; 158 points, 209 comments)
           | 
           | > OpenAI's ChatGPT unleashed an arms race among Silicon
           | Valley companies and investors, sparking an A.I. investment
           | craze that proved to be a boon for OpenAI's investors and
           | shareholding employees.
           | 
           | > But CEO and co-founder Sam Altman may not notch the kind of
           | outsize payday that Silicon Valley founders have enjoyed in
           | years past. Altman didn't take an equity stake in the company
           | when it added the for-profit OpenAI LP entity in 2019,
           | Semafor reported Friday.
        
         | cowpig wrote:
         | OpenAI has gone from open-sourcing its work, to publishing
         | papers only, to publishing papers that omit important
         | information, to GPT-4 being straight-up closed. And Sam Altman
         | doesn't exactly have a track record of being overly concerned
         | about the truth of his statements.
        
           | smeagull wrote:
           | This trend has happened in the small for their APIs as well.
           | They've been dropping options - the embeddings aren't the
           | internal embeddings any more, and you don't have access to
           | log probabilities. It's all closing up at every level.
        
           | transcriptase wrote:
           | I had a fun conversation (more like argument) with ChatGPT
           | about the hypocrisy of OpenAI. It would explicitly contradict
            | itself, and then it started every reply with "I can see
            | why someone might think...", just regurgitating fluff
           | about democratizing AI. I finally was able to have it define
           | democratization of technology and then recognize the
           | absurdity of using that label to describe a pivot to gating
           | models and being for-profit. Then it basically told me "well
           | it's for safety and protecting society".
           | 
           | An AI, when presented with facts counter to what it thought
           | it should say, agreed and basically went: "Won't someone
           | PLEASE think of the children!"
           | 
           | Love it.
        
             | dopidopHN wrote:
             | Without getting into morality.
             | 
              | It's pretty easy to have ChatGPT contradict itself, point
              | it out and have the LLM respond "well, I'm just
              | generating text, nobody said it had to be correct".
        
             | machina_ex_deus wrote:
              | It was trained on a corpus full of mainstream media lies;
              | why would you have expected otherwise? It's by far the most
              | common deflection in its training set.
             | 
              | It's easy to recognize and laugh at the AI replying with
              | the preprogrammed narrative. I'm still waiting for the
              | majority of people to realize they are given the same
              | training materials, non-stop, with the same toxic
              | narratives, and become programmed in the same way, and
              | that this is what results in their current worldview.
              | 
              | And no, it's not enough to be "skeptical" of mainstream
              | media. It's not even enough to "validate" them. Or to go to
              | other sources. You need to be reflective enough to realize
              | that they are pushing flawed reasoning methods, and then
              | abusing them again and again, to get you used to their
              | brand of reasoning.
             | 
             | Their brand of reasoning is just basically reasoning with
             | brands. You're given negative sounding words for things
             | they want you to think are bad, and positive sounding words
             | for things they want you to think are good, and
             | continuously reinforce these connections. They brand true
             | democracy (literally rule of the people) as populism and
             | tell you it's a bad thing. They brand freedom of speech as
             | "misinformation". They brand freedom as "choice" so that
             | you will not think of what you want to do, but which of the
             | things they allow you to do will you do. Disagree with the
              | scientific narrative? You're a "science denier". Even as a
             | professional scientist. Conspiracy theory isn't a defined
             | word - it is a brand.
             | 
             | You're trained to judge goodness or badness instinctively
             | by their frequency and peer pressure, and produce the
             | explanation after your instinctive decision, instead of the
             | other way around.
        
             | gowld wrote:
             | Transcripts of other people's GPT chats are like photos of
             | other people's kids.
        
             | mstolpm wrote:
             | Why are you discussing OpenAI with ChatGPT? I'm honestly
             | interested.
             | 
             | I would imagine that any answer of ChatGPT on that topic is
              | either (a) "hallucinated" and not based on any verifiable
             | fact or (b) scripted in by OpenAI.
             | 
             | The same question pops up for me whenever someone asks
             | ChatGPT about the internals and workings of ChatGPT. Am I
             | missing something?
        
               | dopidopHN wrote:
                | I've tried because it's tempting and the first attempts
                | do give a "conversation" vibe.
                | 
                | I was curious about state persistence between prompts, or
                | how to make my prompts better, or getting an idea of the
                | training data.
                | 
                | Only got crap and won't spend time doing that again.
        
         | [deleted]
        
         | solveit wrote:
         | Anyone with the expertise to have insightful takes in AI also
         | has a financial incentive to steer the conversation in
         | particular directions. This is also the case for many, many
         | other fields! You do not become an expert by quarantining your
         | livelihood away from your expertise!
         | 
         | The correct response is not to dismiss every statement from
         | someone with a conflict of interest as "basically worthless",
         | but to talk to lots of people and to be _reasonably_ skeptical.
        
         | hbn wrote:
          | It could also be argued that there's financial incentive to
          | just say "giving us more money to train bigger models =
          | better AI" forever.
        
           | Art9681 wrote:
            | I don't think these comments are driven by financial
            | incentives. It's a distraction, and only a fool would believe
            | Altman here. What this likely means is they are prioritizing
            | adding more features to their current models while they train
            | the next version. Their competitors scramble to build an LLM
            | with some sort of intelligence parity; when that happens, no
            | one will care because ChatGPT has the ecosystem and plugins
            | and all the advanced features... and by the time their
            | competitors reach feature parity in that area, OpenAI pulls
            | its ace card and drops GPT-5. Rinse and repeat.
           | 
           | That's my theory and if I was a tech CEO in any of the
           | companies competing in this space, that is what I would plan
           | for.
           | 
           | Training an LLM will be the easy part going forward. It's
           | building an ecosystem around it and hooking it up to
           | everything that will matter. OpenAI will focus on this, while
           | not-so-secretly training their next iterations.
        
             | LoganWhitwer wrote:
             | [dead]
        
             | Spivak wrote:
             | text-davinci-003 but cheaper and runs on your own hardware
              | is already a massive selling point. If you release a
              | foundational model at parity with GPT-4, you'll win overnight
             | because OpenAI's chat completions are awful even with the
             | super advanced model.
        
         | anonkogudhyfhhf wrote:
         | People can be honest even when money is involved. His word is
         | worthless because it's Altman
        
         | neximo64 wrote:
         | Citation needed. What are his financial incentives?
        
         | Gatsky wrote:
         | Do you think GPT-4 was trained and then immediately released to
         | the public? Training finished Aug 2022. They spent the next 6
          | months improving it in other ways (e.g. human feedback). So what
          | he is saying is already evident.
        
         | brookst wrote:
         | In this case I think it's Wired that's lying. Altman didn't say
         | large models have no value, or that there will be no more large
         | models, or that people shouldn't invest in large models.
         | 
         | He said that we are at the end of the era where capability
         | improvements come primarily from making models bigger. Which
          | stands to reason... I don't think anyone expects us to hit 100T
         | parameters or anything.
        
           | jutrewag wrote:
           | What about 1T though, seems silly to stop here.
        
       | gardenhedge wrote:
       | Sam Altman and OpenAI must be pretty nervous. They have first
       | mover advantage but they hold no hook or moat.
       | 
        | Unless they can somehow keep their improvements ahead of the rest
        | of the industry, they'll be lost in the crowd.
        
       | sgu999 wrote:
        | Is anyone aware of techniques to prune useless knowledge from a
        | model to leave more space for reasoning capabilities?
        | 
        | It really shouldn't matter that it can give the exact birthdate
        | of Steve Wozniak, as long as it can properly make a query to
        | fetch it and deal with the result.
        
         | cloudking wrote:
          | Following your design, couldn't you also solve hallucinations
          | with a "fact-checking" LLM (connected to search) that corrects
         | the output of the core LLM? You would take the output of the
         | core LLM, send it to the fact checker with a prompt like
         | "evaluate this output for any potential false statements, and
         | perform an internet search to validate and correct them"
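          | 
          | Something like this rough sketch (assuming the pre-1.0 openai
          | Python client and a hypothetical web_search() helper standing in
          | for whatever search API you have):
          | 
          |     import openai  # pip install openai (pre-1.0 client)
          | 
          |     def web_search(query: str) -> str:
          |         """Hypothetical helper: return snippets from a search API."""
          |         raise NotImplementedError
          | 
          |     def fact_check(core_output: str) -> str:
          |         # Naive: use the core model's output itself as the search query.
          |         evidence = web_search(core_output)
          |         # A second-pass model reviews the first model's answer.
          |         review = openai.ChatCompletion.create(
          |             model="gpt-3.5-turbo",
          |             messages=[{
          |                 "role": "user",
          |                 "content": "Evaluate this text for potentially false "
          |                            "statements and correct them using the "
          |                            f"search results.\n\nText:\n{core_output}"
          |                            f"\n\nSearch results:\n{evidence}",
          |             }],
          |         )
          |         return review["choices"][0]["message"]["content"]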
        
       | ldehaan wrote:
        | This is just pushback against Elon and crew's fake article about
        | the dangers of AI, in which they specifically state the next
        | versions will be ultra deadly.
        | 
        | Sam is now saying there will be no future model that will be as
        | good.
        | 
        | This is all positioning to get regulators off the track, because
        | none of these control freaks in government actually understand a
        | whit of this.
        | 
        | All said and done, this is all just to try to disempower the OSS
        | community. But they can't; we're blowing past their barriers like
        | the 90s did with the definition of slippery slope.
        
       | generalizations wrote:
       | I'd bet that what he, and the competition, is realizing is that
       | the bigger models are too expensive to run.
       | 
       | Pretty sure Microsoft swapped out Bing for something a lot
       | smaller in the last couple of weeks; Google hasn't even tried to
       | implement a publicly available large model. And OpenAI still has
       | usage caps on their GPT-4.
       | 
       | I'd bet that they can still see improvement in performance with
       | GPT-5, but that when they look at the usage ratio of GPT3.5
       | turbo, gpt3.5 legacy, and GPT4, they realized that there is a
       | decreasing rate of return for increasingly smart models - most
       | people don't need a brilliantly intelligent assistant, they just
       | need a not-dumb assistant.
       | 
       | Obviously some practitioners of some niche disciplines (like ours
       | here) would like a hyperintelligent AI to do all our work for us.
       | But even a lot of us are on the free tier of ChatGPT 3.5; I'm one
       | of the few paying $20/mo for GPT4; and idk if even I'd pay e.g.
       | $200/mo for GPT5.
        
         | deepsquirrelnet wrote:
         | > I'd bet that what he, and the competition, is realizing is
         | that the bigger models are too expensive to run.
         | 
         | I think it's likely that they're out of training data to
         | collect. So adding more parameters is no longer effective.
         | 
         | > most people don't need a brilliantly intelligent assistant,
         | they just need a not-dumb assistant.
         | 
         | I tend to agree, and I think their pathway toward this will all
         | come from continuing advances in fine tuning. Instruction
         | tuning, RLHF, etc seem to be paying off much more than scaling.
         | I bet that's where their investment is going to be turning.
        
       | jstx1 wrote:
       | Ilya Sutskever from OpenAI saying that the data situation is good
       | and there's more data to train on -
       | https://youtu.be/Yf1o0TQzry8?t=657
        
       | galaxytachyon wrote:
       | What age? Like, 3 years?
       | 
       | On the other hand though, Chinchilla and multimodal approaches
       | already showed how later AIs can be improved beyond throwing
       | petabytes of data at them.
       | 
       | It is all about variety and quality from now on I think. You can
       | teach a person all about the color zyra but without actually ever
       | seeing it, they will never fully understand that color.
        
         | idiotsecant wrote:
          | It does seem, though, that using Chinchilla-like techniques
          | does not create a copy with the same quality as the original.
         | It's pretty good for some definition of the phrase, but it
         | isn't equivalent, it's a lossy technique.
        
           | galaxytachyon wrote:
           | I agree on the lossy. There is a tradeoff between efficiency
           | and comprehensiveness, kind of. It would be pretty funny if
           | in the end, the most optimal method turns out to be the brain
            | we already have. Extremely efficient, hardware optimized, but
            | slow as hell, and it misunderstands stuff all the time unless
            | prompted with specific phrases.
        
       | jcims wrote:
       | I'm no expert but doesn't the architecture of minigpt4 that's on
       | the front page right now give some indication of what the future
       | might look like?
        
         | MuffinFlavored wrote:
          | eh, I haven't personally found a use case for LLMs yet given the
         | fact that you can't trust the output and it needs to be
         | verified by a human (which might as well be just as time
         | consuming/expensive as actually doing the task yourself)
        
           | Uehreka wrote:
           | I'd reconsider the "might as well just be as time consuming"
           | thing. I see this argument about Copilot a lot, and it's
           | really wrong there, so it might be wrong here too.
           | 
           | Like, for most of the time I'm using it, Copilot saves me 30
           | seconds here and there and it takes me about a second to look
           | at the line or two of code and go "yeah, that's right". It
           | adds up, especially when I'm working with an unfamiliar
           | language and forget which Collection type I'm going to need
           | or something.
        
             | MuffinFlavored wrote:
             | > Like, for most of the time I'm using it, Copilot saves me
             | 30 seconds here and there and it takes me about a second to
             | look at the line or two of code and go "yeah, that's
             | right".
             | 
             | I've never used Copilot but I've tried to replace
             | StackOverflow with ChatGPT. The difference is, the
             | StackOverflow responses compile/are right. The ChatGPT
             | responses will make up an API that doesn't exist. Major
             | setback.
        
           | idiotsecant wrote:
           | No? I use it all the time to help me, for example, read ML
           | threads when I run into a term I don't immediately
           | understand. I can do things like 'explain this at the level
           | of a high school student'
        
           | JoshuaDavid wrote:
           | They're good for tasks where generation is hard but
           | verification is easy. Things like "here I gesture at a vague
           | concept that I don't know the name of, please tell me what
           | the industry-standard term for this thing is" where figuring
           | out the term is hard but looking up a term to see what it
           | means is easy. "Create an accurate summary of this article"
           | is another example - reading the article and the summary and
           | verifying that they match may be easier than writing the
           | summary yourself.
        
           | MattPalmer1086 wrote:
           | Thing is, you can't trust what you find on stack overflow or
           | other sources either. And searching, reading documentation
           | and so on takes a lot of time too.
           | 
           | I've personally been using it to explore using different
           | libraries to produce charts. I managed to try out about 5
           | different libraries in a day with fairly advanced options for
           | each using chatGPT.
           | 
            | In the past I might have spent a day just trying one, and not
            | to the same level of functionality.
           | 
           | So while it still took me a day, my final code was much
           | better fitted to my problem with increased functionality. Not
           | a time saver then for me but a quality enhancer and I learned
           | a lot more too.
        
             | MuffinFlavored wrote:
             | > Thing is, you can't trust what you find on stack overflow
             | or other sources either.
             | 
             | Eh. An outdated answer will be called out in the
             | comments/downvoted/updated/edited more often than not, no?
        
               | MattPalmer1086 wrote:
               | Maybe, maybe not. I get useful results from it, but it
                | doesn't always work. And it's usually not quite what I'm
               | looking for, so then I have to go digging around to find
               | out how to tweak it. It all takes time and you do not get
               | a working solution out of the box most of the time.
        
           | causi wrote:
           | I've enjoyed using it for very small automation tasks. For
           | instance, it helped me write scripts to take all my
           | audiobooks with poor recording quality, split them into
           | 59-minute chunks, and upload them to Adobe's free audio
           | enhancement site to vastly improve the listening experience.
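            | 
            | A rough sketch of the splitting step (assuming ffmpeg is
            | installed; the Adobe upload remains a manual step):
            | 
            |     import subprocess
            |     from pathlib import Path
            | 
            |     CHUNK_SECONDS = 59 * 60  # 59-minute chunks
            | 
            |     def split_audiobook(src: Path, out_dir: Path) -> None:
            |         out_dir.mkdir(parents=True, exist_ok=True)
            |         # Stream-copy into fixed-length segments, no re-encoding.
            |         subprocess.run([
            |             "ffmpeg", "-i", str(src),
            |             "-f", "segment", "-segment_time", str(CHUNK_SECONDS),
            |             "-c", "copy",
            |             str(out_dir / f"{src.stem}_%03d{src.suffix}"),
            |         ], check=True)
            | 
            |     for book in Path("audiobooks").glob("*.mp3"):
            |         split_audiobook(book, Path("chunks") / book.stem)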
        
       | textninja wrote:
       | I call bullshit. There will be bigger and better models. The
       | question is not whether big companies will invest in training
       | them (they will), but whether they'll be made available to the
       | public.
        
       | labrador wrote:
       | https://archive.is/s4V9e
       | 
       |  _He did not say what kind of research strategies or techniques
       | might take its place. In the paper describing GPT-4, OpenAI says
       | its estimates suggest diminishing returns on scaling up model
       | size. Altman said there are also physical limits to how many data
       | centers the company can build and how quickly it can build them._
        
         | ftxbro wrote:
         | > In the paper describing GPT-4, OpenAI says its estimates
         | suggest diminishing returns on scaling up model size.
         | 
         | I read the two papers (gpt 4 tech report, and sparks of agi)
         | and in my opinion they don't support this conclusion. They
         | don't even say how big GPT-4 is, because "Given both the
         | competitive landscape and the safety implications of large-
         | scale models like GPT-4, this report contains no further
         | details about the architecture (including model size),
         | hardware, training compute, dataset construction, training
         | method, or similar."
         | 
         | > Altman said there are also physical limits to how many data
         | centers the company can build and how quickly it can build
         | them.
         | 
         | OK so his argument is like "the giant robots won't be powerful,
         | but we won't show how big our robots are, and besides, there
         | are physical limits to how giant of a robot we can build and
         | how quickly we can build it." I feel like this argument is sus.
        
           | sangnoir wrote:
            | OpenAI has likely run into a wall (or is about to) for model
            | size given its funding amount/structure[1] - unlike its
            | competition, who actually own data centers and have lower
            | marginal costs. It's just like when peak-iPad Apple claimed
           | that a "post-PC" age was upon us.
           | 
           | 1. What terms could Microsoft wring out of OpenAI for another
           | funding round?
        
       | curiousllama wrote:
       | I believe Altman, but the title is misleading.
       | 
       | Have we exhausted the value of larger models on current
       | architecture? Probably yes. I trust OpenAI would throw more $ at
       | it if there was anything left on the table.
       | 
       | Have we been here before? Also yes. I recall hearing similar
       | things about LSTMs when they were in vogue.
       | 
       | Will the next game changing architecture require a huge model?
       | Probably. Don't see any sign these things are scaling _worse_
       | with more data/compute.
       | 
       | The age of huge models with current architecture could be over,
       | but that started what, 5 years ago? Who cares?
        
       | it wrote:
       | Interesting how this contradicts "The Bitter Lesson":
       | http://incompleteideas.net/IncIdeas/BitterLesson.html.
        
         | sebzim4500 wrote:
         | I don't think there is a contradiction at all. Altman is
         | essentially saying they are running out of compute and
          | therefore can't meaningfully scale further. Not that scaling
          | further would be a worse plan long term than coming up with new
          | algorithms.
        
       | fergie wrote:
       | The most comforting AI news I have read this year.
        
         | og_kalu wrote:
         | Title is misleading lol. Plenty of scale room left.
        
         | jackmott42 wrote:
         | If you are worried about AI, this shouldn't make you feel a ton
          | better. GPT-4 is just trained to predict the next word, a very
          | simple but crude approach, and look what it can do!
         | 
         | Imagine when a dozen models are wired together and giving each
         | other feedback with more clever training and algorithms on
         | future faster hardware.
         | 
         | It is still going to get wild
        
           | ShamelessC wrote:
           | Machine learning is actually premised on being "simple" to
           | implement. The more priors you hardcode with clever
           | algorithms, the closer you get to what we already have. The
           | point is to automate the process of learning. We do this now
           | with relatively simple loss functions and models containing
            | relatively simple parameters. The main stipulation is that
            | they are all defined to be continuous so that you can use the
            | chain rule from calculus to calculate the gradient of the
            | error with respect to every parameter without taking so long
            | that it would never finish.
           | 
           | I agree that your suggested approach of applying cleverness
           | to what we have now will probably produce better results. But
           | that's not going to stop better architectures, hardware and
           | even entire regimes from being developed until we approach
           | AGI.
           | 
           | My suspicion is that there's still a few breakthroughs
           | waiting to be made. I also suspect that sufficiently advanced
           | models will make such breakthroughs easier to discover.
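            | 
            | As a toy illustration of the "simple loss plus chain rule over
            | continuous parameters" point, a minimal PyTorch sketch (the
            | data and model here are just placeholders):
            | 
            |     import torch
            | 
            |     # Toy data: learn y = 3x + 1 from noisy samples.
            |     x = torch.randn(256, 1)
            |     y = 3 * x + 1 + 0.1 * torch.randn(256, 1)
            | 
            |     model = torch.nn.Linear(1, 1)    # continuous parameters
            |     opt = torch.optim.SGD(model.parameters(), lr=0.1)
            |     loss_fn = torch.nn.MSELoss()     # a simple loss function
            | 
            |     for step in range(200):
            |         opt.zero_grad()
            |         loss = loss_fn(model(x), y)
            |         loss.backward()  # chain rule: gradient w.r.t. every parameter
            |         opt.step()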
        
           | xwdv wrote:
           | People think something magical happens when AI are wired
           | together and give each other feedback.
           | 
           | Really you're still just predicting the next word, but with
           | extra steps.
        
             | Teever wrote:
             | People think that something magical happens when
             | transistors are wired together and give each other
             | feedback.
             | 
             | Really you're just switching switches on and off, but with
             | extra steps.
        
           | ryneandal wrote:
           | Personally, I'm less worried about AI than I am about what
           | people using these models can do to others.
           | Misinformation/disinformation, more believable scams, stuff
           | like that.
        
           | causi wrote:
           | I worry that the hardware requirements are only going to
           | accelerate the cloud-OS integration. Imagine a PC that's
           | entirely unusable offline.
        
             | cj wrote:
             | > Imagine a PC that's entirely unusable offline.
             | 
             | FWIW we had thin clients in computer labs in middle school
             | / high school 15 years ago (and still today these are
             | common in enterprise environments, e.g. Citrix).
             | 
             | Biggest issue is network latency which is limited by the
             | speed of light, so I imagine if computers in 10 years
             | require resources not available locally it would likely be
             | a local/cloud hybrid model.
        
           | ignoramous wrote:
           | > _Imagine when a dozen models are wired together..._
           | 
           | Wouldn't these models hallucinate more than normal, then?
        
           | quonn wrote:
            | I have repeatedly argued against this notion of "just
            | predicting the next word". No. It's completing a
           | conversation. It's true that it is doing this word by word,
           | but it's kind of like saying a CNN is just predicting a
           | label. Sure, but how? It's not doing it directly. It's doing
           | it by recovering a lot of structure and in the end boiling
           | that down to a label. Likewise a network trained to predict
           | the next word may very well have worked out the whole
           | sentence (implicitly, not as a text) in order to generate the
           | next word.
        
         | Freire_Herval wrote:
         | [dead]
        
       | stephencoxza wrote:
       | The role of a CEO is more to benefit the company than the public.
       | Only time will tell.
       | 
        | I am curious though how something like Moore's Law relates to
        | this. Yes, model architectures will deal with complexity better,
        | and the amount of data helps as well. There must be a relation
        | between technological innovation and cost that speaks to
        | effectiveness: innovation in computation, model architecture,
        | quality of data, etc.
        
       | summerlight wrote:
        | The point is that we're now at the stage of diminishing returns
        | for increasing model size, unless we find a better modeling
        | architecture than the Transformer.
        | 
        | I think this is likely true; while all the other companies
        | underestimated the capability of the transformer (including Google
        | itself!), OpenAI made a fairly accurate bet on the transformer
        | based on the scaling law, put in all the effort to squeeze it to
        | the last drop, and took all the rewards.
        | 
        | It's likely that GPT-4 is at the optimal spot between cost and
        | performance and there won't be significant improvements in
        | performance in the near future. I guess the next task would be
        | more on efficiency, which has significant implications for its
        | productionization.
        
         | chubs wrote:
         | Does this mean we've reached the next AI winter? This is as
         | good as it gets for quite a long time? Honest question :)
         | perhaps this will postpone everyone's fears about the
         | singularity...
        
           | ericabiz wrote:
           | Many years ago, there was an image that floated around with
           | Craigslist and all the websites that replaced small parts of
           | it--personals, for sale ads, etc. It turned out the way to
           | beat Craigslist wasn't to build Yet Another Monolithic
           | Craigslist, but to chunk it off in pieces and be the best at
           | that piece.
           | 
           | This is analogous to what's happening with AI models. Sam
           | Altman is saying we have reached the point where spending
           | $100M+ trying to "beat" GPT-4 at everything isn't the future.
           | The next step is to chunk off a piece of it and turn it into
           | something a particular industry would pay for. We already see
           | small sprouts of those being launched. I think we will see
           | some truly large companies form with this model in the next
           | 5-10 years.
           | 
           | To answer your question, yes, this may be as good as it gets
           | now for monolithic language models. But it is just the
           | beginning of what these models can achieve.
        
             | robocat wrote:
             | https://www.today.com/money/speculation-craigslist-slowly-
             | dy... from 2011 - is that what you were thinking of?
             | Strange how few of those logos have survived, and how many
             | new logos would now be on it. It would be interesting to
             | see a modernised version.
        
           | 015a wrote:
           | The current stage is now productionizing what we have;
           | finding product fits for it, and making it cheaper. Even
           | GPT-4 isn't necessary to push forward what is possible with
           | AI; if you think about something dumb like "load all of my
           | emails into a language model in real time, give me digests,
           | automatically write responses for ones which classify with
           | characteristics X/Y/Z, allow me to query the model to answer
           | questions, etc": This does not really exist yet, this would
           | be really valuable, and this does not need GPT-4.
           | 
           | Another good example is in the coding landscape, which feels
           | closer to existing. Ingest all of a company's code into a
           | model like this, then start thinking about what you can do
           | with it. A chatbot is one thing, the most obvious thing, but
           | there's higher order product use-cases that could be
           | interesting (e.g. you get an error in Sentry, stack trace
           | points Sentry to where the error happened, language model
           | automatically PRs a fix, stuff like that).
           | 
           | This shit excites me WAY WAY more than GPT-5. We've unlocked
           | like 0.002% of the value that GPT-3/llama/etc could be
           | capable of delivering. Given the context of broad concern
           | about cost of training, accidentally inventing an AGI,
           | intentionally inventing an AGI; If I were the BDFL of the
           | world, I think we've got at least a decade of latent value
           | just to capture out of GPT-3/4 (and other models). Let's hit
           | pause. Let's actually build on these things. Let's find a
           | level of efficiency that is still valuable without spending
           | $5B in a dick measuring contest [1] to suss out another 50
           | points on the SAT. Let's work on making edge/local inference
           | more possible. Most of all, let's work on safety, education,
           | and privacy.
           | 
           | [1] https://techcrunch.com/2023/04/06/anthropics-5b-4-year-
           | plan-...
        
           | frozenport wrote:
           | No. Winter means people have lost interest in the research.
           | 
           | If anything successes in ChatGPT etc will be motivation for
           | continued efforts.
        
             | mkl wrote:
             | Winter means people have lost _funding_ for the research.
              | The ongoing productionising of large language models and
              | multimodal models means that that probably won't happen for
              | quite a while.
        
         | fauxpause_ wrote:
          | Seems like a wild claim to make without any examples of GPT
          | models which are bigger but not demonstrably better.
        
           | xipix wrote:
           | Perhaps (a) there do exist bigger models that weren't better
           | or (b) this model isn't better than somewhat smaller ones.
           | Perhaps the CEO has seen diminishing returns.
        
           | hackerlight wrote:
           | It's not a wild claim when you have empirically well-
           | validated scaling laws which make this very prediction.
        
           | mensetmanusman wrote:
            | Better on which axis? Do you want an AI that takes one hour
            | to respond? Some would for certain fields, but getting
           | something fast and cheap is going to be hard now that Moore's
           | law is over.
        
           | mnky9800n wrote:
           | or like a curve of model complexity versus results or
           | whatever showing it asymptotically approaches whatever.
           | 
           | actually there was a great paper from microsoft research from
           | like 2001 on spam filtering where they demonstrated that
           | model complexity necessary for spam filtering went down as
           | the size of the data set went up. That paper, which i can't
           | seem to find now, had a big impact on me as a researcher
            | because it so clearly demonstrated that small data is usually
            | bad data and sophisticated models are sometimes solving
            | problems with small data sets instead of problems with data.
           | 
            | of course this paper came out the year Friedman published his
            | gradient boosting paper, i think random forest also was only
            | recently published then as well (i think there is a paper
            | from 1996 about RF and Breiman's two cultures paper came out
            | that year where he discusses RF i believe), and this is a
           | decade before gpu based neural networks. So times are
           | different now. But actually i think the big difference is
           | these days i probably ask chatgpt to write the boiler plate
           | code for a gradient boosted model that takes data out of a
           | relational database instead of writing it myself.
        
             | nomel wrote:
             | > model complexity necessary for spam filtering went down
             | as the size of the data set went up
             | 
              | My naive conclusion is that this means there are still
              | massive gains to be had, since, for example, something like
             | ChatGPT is just text, and the phrase "a picture is worth a
             | thousand words" seems incredibly accurate, from my
             | perspective. There's an incredible amount of non-text data
             | out there still. Especially technical data.
             | 
             | Is there any merit to this belief?
        
               | jacobr1 wrote:
               | Yes. One of the frontiers of current research seems to be
               | multi-modal models.
        
             | [deleted]
        
           | summerlight wrote:
           | https://twitter.com/SmokeAwayyy/status/1646670920214536193
           | 
           | Sam explicitly said that there won't be GPT-5 in the near
            | future, which is pretty clear evidence unless he's blatantly
            | lying in public.
        
             | kjellsbells wrote:
             | Well, "no GPT-5" isn't the same as saying "no new trained
             | model", especially in the realm of marketing. Welcome to
             | "GPT 2024" could be his next slogan.
        
             | thehumanmeat wrote:
              | That is one AI CEO out of 10,000. Just because OpenAI may
              | not be interested in a larger model _in the short term_
              | doesn't mean nobody else will pursue it.
        
         | bitL wrote:
          | Transformers were known to keep scaling up with more
          | parameters and more training data, so if OpenAI has hit the
          | limits of this scaling, that would be a very important milestone
          | in AI.
        
         | GaggiX wrote:
          | I think the next step is multimodality. GPT-4 can "see",
          | probably using a method similar to miniGPT-4, so the embeddings
          | are aligned using a Q-Former (or something similar). The next
          | step would be to actually predict image tokens using the LM
          | loss; this way it would be able to use the knowledge gained by
          | "seeing" on other tasks, like making actually good ASCII art,
          | making SVG that makes sense, and, on a less superficial level,
          | having a better world model.
        
         | [deleted]
        
         | KhoomeiK wrote:
         | Further improvements in efficiency need not come from
         | alternative architectures. They'll likely also come from novel
         | training objectives, optimizers, data augmentations, etc.
        
       | gumballindie wrote:
        | Bruv has to pay for the data he's been using or soon there won't
        | be any left to nick. Groupies claiming their AI is "intelligent",
        | and not just a data-ingesting beast, will soon learn a hard
        | lesson. Take your blogs offline, stop contributing content for
        | free and stop pushing code, or else chavs like this one will
        | continue monetising your hard work. As did Bezos and many others
        | that now want you to be out of a job.
        
       | calderknight wrote:
       | I didn't think this article was very good. Sam Altman actually
       | implied that GPT-5 will be developed when he spoke at MIT. And if
       | Sam said that scaling is over (I doubt he said this but I could
       | be wrong) the interesting part would be the reasoning he provided
       | for this statement - no mention of that in the article.
        
       | cleandreams wrote:
       | Once you've trained on the internet and most published books (and
       | more...) what else is there to do? You can't scale up massively
       | anymore.
        
         | Animats wrote:
         | Right. They've already sucked in most of the good general
         | sources of information. Adding vast amounts of low-quality
         | content probably won't help much and might degrade the quality
         | of the trained model.
        
         | rvnx wrote:
         | Video content (I don't know why someone flagged Jason for
         | saying such, he is totally right)
        
           | bheadmaster wrote:
           | Looking at his post history, seems like he was shadowbanned.
        
         | kolinko wrote:
          | You can generate textual examples that teach logic, multi-
          | dimensional understanding and so on. Similar to the ones that
          | are in math books, but at a massive scale.
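          | 
          | Roughly the kind of thing I mean, as a toy sketch (the templates
          | and scale here are only illustrative):
          | 
          |     import random
          | 
          |     NAMES = ["Ada", "Bo", "Cy", "Di"]
          | 
          |     def transitivity_example() -> str:
          |         a, b, c = random.sample(NAMES, 3)
          |         x, y = sorted(random.sample(range(1, 100), 2))
          |         return (f"{a} is taller than {b}. {b} is taller than {c}. "
          |                 f"Therefore {a} is taller than {c}. "
          |                 f"Also, {y} > {x} because {y} - {x} = {y - x}.")
          | 
          |     # In principle this loop can run for billions of examples.
          |     print("\n".join(transitivity_example() for _ in range(5)))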
        
         | machdiamonds wrote:
         | Ilya Sutskever (OpenAI Chief Scientist): "Yeah, I would say the
         | data situation is still quite good. There's still lots to go" -
         | https://youtu.be/Yf1o0TQzry8?t=685
         | 
         | There was a rumor that they were going to use Whisper to
         | transcribe YouTube videos and use that for training. Since it's
         | multimodal, incorporating video frames alongside the
         | transcriptions could significantly enhance its performance.
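          | 
          | If true, the open-source version of that pipeline would look
          | roughly like this sketch (assuming the yt-dlp and openai-whisper
          | packages, not whatever OpenAI actually runs internally):
          | 
          |     import yt_dlp
          |     import whisper
          | 
          |     def transcribe_video(url: str) -> str:
          |         # Download the audio track only.
          |         opts = {"format": "bestaudio/best", "outtmpl": "audio.%(ext)s"}
          |         with yt_dlp.YoutubeDL(opts) as ydl:
          |             info = ydl.extract_info(url, download=True)
          |             path = ydl.prepare_filename(info)
          |         # Transcribe with a small Whisper model; larger ones do better.
          |         model = whisper.load_model("base")
          |         return model.transcribe(path)["text"]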
        
           | it_citizen wrote:
            | I am curious how much video-to-text content represents
            | compared to pure text. I have no idea.
        
             | [deleted]
        
           | neel8986 wrote:
            | And why will Google allow them to do that at scale?
        
             | throwaway5959 wrote:
             | Why would they ask Google for permission?
        
             | HDThoreaun wrote:
             | Can google stop them? It's trivial to download YouTube
             | videos
        
               | unionpivo wrote:
               | It's trivial to download some YouTube videos.
               | 
               | But I am quite sure that if you start doing it at scale,
               | google will notice.
               | 
               | You could be sneaky, but people in this business talk
               | (since they know another good paying job is just around
               | the corner) so It would likely come out.
        
         | mrtksn wrote:
         | You can transcribe all spoken words everywhere and keep the
         | model up to date? Keep indexing new data from chat messages,
         | news articles, new academic work etc.
         | 
         | The data is not finite.
        
           | spaceman_2020 wrote:
           | What about all the siloed content kept inside corporate
           | servers? You won't get normal GPT to train on it, of course,
            | but IBM could build an "IBM-bot" that has all the GPT-4
           | dataset + all of IBM's internal data.
           | 
           | That model might be very well tuned to solve IBM's internal
           | problems.
        
             | treis wrote:
             | I don't think you can just feed it data. You've got to
             | curate it, feed it to the LLM, and then manually
             | check/further train the output.
             | 
             | I also question that most companies have the volume and
             | quality of data worth training on. It's littered with
             | cancelled projects, old products, and otherwise obsolete
             | data. That's going to make your LLM hallucinate/give wrong
             | answers. Especially for regulated and otherwise legally
             | encumbered industries. Like can you deploy a chat bot
             | that's wrong 1% or 0.1% of the time?
        
               | spaceman_2020 wrote:
               | Well, IBM has 350k employees. If training a LLM on
               | curated data costs tens of millions of dollars but ends
               | up reducing headcount by 50k, it would be a massive win
               | for any CEO.
               | 
               | You have to understand that all the incentives are
               | perfectly aligned for corporations to put this to work,
               | even spending tens of millions in getting it right.
               | 
               | The first corporate CEO who announces that his company
               | used AI to reduce employee costs while _increasing_
               | profits is going to get such a fat bonus that everyone
               | will follow along.
        
             | Vrondi wrote:
             | Since Chat-GPT-4 is being integrated into the MS Office
             | suite, this is an "in" to corporate silos. The MS cloud
             | apps can see inside a great many of those silos.
        
         | [deleted]
        
         | nabnob wrote:
         | Real answer? Buy proprietary data from social media companies,
         | credit card companies, retail companies and train the model on
         | that data.
        
           | eukara wrote:
            | Can't wait for us to be able to query GPT for people's credit
            | card info.
        
         | m4jor wrote:
          | They didn't train it on the entire internet tho, only a small
          | amount (in comparison to the entire internet). Still plenty they
          | could do.
        
         | sebzim4500 wrote:
         | I doubt they have trained on 0.1% of the tokens that are
          | 'easily' available (that is, available with licensing deals
         | that are affordable to OpenAI/MSFT).
         | 
         | They might have trained on a lot of the 'high quality' tokens,
         | however.
        
         | neel8986 wrote:
          | YouTube. This is where Google has a huge advantage, having the
          | largest collection of user-generated video.
        
           | sebzim4500 wrote:
           | Yeah, but it's not like the videos are private. Surely Amazon
           | has the real advantage, given they have a ton of high quality
           | tokens in the form of their kindle library and can make it
           | difficult for OpenAI to read them all.
        
         | JasonZ2 wrote:
         | Video.
         | 
         | > YouTubers upload about 720,000 hours of fresh video content
         | per day. Over 500 hours of video were uploaded to YouTube per
         | minute in 2020, which equals 30,000 new video uploads per hour.
         | Between 2014 and 2020, the number of video hours uploaded grew
         | by about 40%.
        
           | sottol wrote:
           | But what are you mostly "teaching" the LLM then? Mundane
           | everyday stuff? I guess that would make them better at "being
           | average human" but is that what we want? It already seems
           | that prompting the LLM to be above-average ("pretend to be an
           | expert") improves performance.
        
             | dougmwne wrote:
             | This whole conversation about training set size is bizarre.
             | No one ever asks what's in the training set. Why would a
             | trillion tokens of mundane gossip improve a LLMs ability to
             | do anything valuable at all?
             | 
             | If a scrape of the general internet, scientific papers and
             | books isn't enough, a trillion trillion trillion text
             | messages to mom aren't going to change matters.
        
         | spaceman_2020 wrote:
         | If you were devious enough, you could be listening in on
         | billions of phone conversations and messages and adding that to
         | your data set.
         | 
         | This also makes me doubt that NSA hasn't already cracked this
         | problem. Or that China won't eventually beat current western
         | models since it will likely have way more data collected from
         | its citizenry.
        
           | PUSH_AX wrote:
           | I wonder what percentage of phone calls would add anything
           | meaningful to models, I imagine that the nature of most phone
           | calls are both highly personal and fairly boring.
        
             | midland_trucker wrote:
             | That's a fair point. Not at all like training on Wikipedia
             | in which nearly every sentence has novelty to it.
             | 
             | Then again it would give you data on every accent in the
             | country, so the holy grail for modelling human speech.
        
         | fpgaminer wrote:
         | > Once you've trained on the internet and most published books
         | (and more...) what else is there to do? You can't scale up
         | massively anymore.
         | 
         | Dataset size is not relevant to predicting the loss threshold
         | of LLMs. You can keep pushing loss down by using the same sized
         | dataset, but increasingly larger models.
         | 
          | Or augment the dataset using RLHF, which provides an "infinite"
          | dataset to train LLMs on. It's limited by the capabilities of
          | the scoring model, which, of course, you can also scale, so
          | again the limit isn't dataset size but training compute.
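          | 
          | The data-generation half of that loop, as a sketch with
          | placeholder generate/score callables standing in for the policy
          | model and the learned reward model:
          | 
          |     from typing import Callable, List, Tuple
          | 
          |     def augment_dataset(
          |         prompts: List[str],
          |         generate: Callable[[str], str],      # the LLM being trained
          |         score: Callable[[str, str], float],  # reward/scoring model
          |         samples_per_prompt: int = 4,
          |     ) -> List[Tuple[str, str, float]]:
          |         """Produce fresh (prompt, completion, reward) training triples."""
          |         data = []
          |         for prompt in prompts:
          |             for _ in range(samples_per_prompt):
          |                 completion = generate(prompt)
          |                 data.append((prompt, completion, score(prompt, completion)))
          |         return data  # feed into PPO or reward-weighted fine-tuning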
        
           | midland_trucker wrote:
           | > Dataset size is not relevant to predicting the loss
           | threshold of LLMs. You can keep pushing loss down by using
           | the same sized dataset, but increasingly larger models.
           | 
            | DeepMind and others would disagree with you! No one really
            | knows, in actual fact.
           | 
           | [1] https://www.deepmind.com/publications/an-empirical-
           | analysis-...
        
       | throwaway22032 wrote:
       | I don't understand why size is an issue in the way that is being
       | claimed here.
       | 
       | Intelligence isn't like processor speed. If I have a model that
       | has (excuse the attempt at a comparison) 200 IQ, why would it
       | matter that it runs more slowly than a human?
       | 
       | I don't think that, for example, Feynman at half speed would have
       | had substantially fewer insights.
        
         | yunwal wrote:
         | We're not going to get a 200 IQ model by simply scaling up the
         | current model, even with all the datacenters in the world
         | running 24/7
        
       | narrator wrote:
       | "Altman said there are also physical limits to how many data
       | centers the company can build and how quickly it can build them."
       | 
        | Maybe the economics are starting to get bad? An H100 has 80GB of
        | VRAM, and the highest-end system I can find is 8xH100, so is a
        | 640GB model the biggest model you can run on a single system?
        | Already GPT-4 is throttled and has a waiting list, and they
        | haven't even released the image processing or integrations to a
        | wide audience.
        
       | matchagaucho wrote:
       | I wonder how much the scarcity and cost of Nvidia GPUs is driving
       | this message?
       | 
       | Nvidia is in a perfect "Arms Dealer" situation right now.
       | 
       | Wouldn't be surprised to see the next exponential leap in AI
       | models trained on in-house proprietary GPU hardware
       | architectures.
        
         | TheDudeMan wrote:
         | Google has been using TPUs for years and continuously improving
         | the designs.
        
       | screye wrote:
       | small AI model != cheap AI model.
       | 
       | It costs the same to train as these giant models. You merely
       | spend the money on training for longer instead of on a larger
       | model.
        
       | mupuff1234 wrote:
       | Ok cool, so release the weights and your research.
        
       | Bjorkbat wrote:
       | Something kind of funny (but mostly annoying) about this
       | announcement is the people arguing that OpenAI is, in fact,
       | working on GPT-5 _in secret_.
       | 
       | To my knowledge, NFT/crypto hype never got so bad that conspiracy
       | theories began to circulate (though I'm sure there were some if
       | you looked hard enough).
       | 
       | Can't wait for an AIAnon community to emerge.
        
         | ryanwaggoner wrote:
         | Isn't it obvious? Q is definitely an LLM, trained on trillions
         | of words exfiltrated from our nation's secure systems. This
         | explains why it's always wrong in its predictions: it's
         | hallucinating!
        
       | aaroninsf wrote:
       | "...for the current cycle, in our specific public-facing market."
       | 
       | As most here well know "over" is one of those words like "never"
       | which particularly in this space should pretty much always be
       | understood as implicitly accompanied by a footnote backtracking
       | to include near-term scope.
        
       | iandanforth wrote:
       | There's plenty of room for models to continue to grow once
       | efficiency is improved. The basic premise of the Google ML
       | Pathways project is sound: you don't have to use all of the
       | model all the time. By moving to sparse activations or sparse
       | architectures you can do a lot more with the same compute. The
       | effective model size might be 10x or 100x GPT-4 (speculated to be
       | ~1T params) but require comparable or less compute.
       | 
       | While not a perfect analogy it's useful to remember that the
       | human brain has far more "parameters", requires several orders of
       | magnitude less energy to train and run, is highly sparse, and
       | does a decent job at thinking.
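       | 
       | A toy sketch of the sparse-activation idea (the shapes and the
       | routing scheme here are just illustrative, not Pathways itself):
       | 
       |     import numpy as np
       | 
       |     def moe_layer(x, experts, router, k=2):
       |         # x: (d,) token activation; experts: (n_experts, d, d);
       |         # router: (n_experts, d). Only the top-k experts run, so
       |         # per-token compute scales with k, not with the total
       |         # ("effective") parameter count.
       |         scores = router @ x
       |         top = np.argsort(scores)[-k:]
       |         gates = np.exp(scores[top]) / np.exp(scores[top]).sum()
       |         return sum(g * (experts[i] @ x) for g, i in zip(gates, top))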
        
       | seydor wrote:
       | Now we need another letter
        
       | enduser wrote:
       | "When we set the upper limit of PC-DOS at 640K, we thought nobody
       | would ever need that much memory."
       | 
       |  _Bill Gates_
        
         | bagels wrote:
         | Gates has denied ever saying this. Are you implying by analogy
         | that Altman hasn't said, or will later disclaim saying, that
         | "the age of giant AI models is already over"?
        
       | lossolo wrote:
       | We arrived at the top of the tree in our journey to the moon.
        
         | daniel_reetz wrote:
         | "You can't get to the moon by climbing successively taller
         | trees"
        
         | og_kalu wrote:
         | No we haven't. The title is misleading. There's plenty of scale
         | room left. Part of it might just not be economical (parameter
         | size), but there's data. If you take this to mean "we're at a
         | dead end" you'd be very wrong.
        
       | pixl97 wrote:
       | The Age of Giants is over... The Age of Behemoths has begun!
       | 
       |  _but sir, that means the same thing_
       | 
       | Throw this heretic into the pit of terror.
        
         | hanselot wrote:
         | The pit of terror is full.
         | 
         | Fine, to the outhouse of madness then.
         | 
         | Before I get nuked from orbit for daring to entertain humor: if
         | someone is running far ahead of me in a marathon, still
         | broadcasting things back to the slow people (like myself), and
         | then, just as we start to catch up and right before anyone else
         | can verify their claim, they suddenly say "you know what guys,
         | we should stop running in this direction, there's nothing to
         | see here", then perhaps it would still be in the public
         | interest for at least one person to verify what they are
         | saying. Given how skeptical the internet at large has been of
         | Musk's acquisition of a company, it's interesting that the
         | skepticism is suddenly put on hold when looking at this part of
         | his work...
        
           | [deleted]
        
       | zwieback wrote:
       | The age of CEOs that recently got washed to the top saying
       | dumbish things is just starting, though.
        
       | xt00 wrote:
       | Saying "hey don't go down the path we are on, where we are making
       | money and considered the best in the world.. it's a dead end"
       | rings pretty hollow.. like "don't take our lunch please?" Might
       | be a similar statement it feels..
        
         | whywhywhywhy wrote:
         | Everyone hoping to compete with OpenAI should have an "Always
         | do the opposite of what Sam says" sign on the wall.
        
         | thewataccount wrote:
         | Nah - GPT-4 is crazy expensive, paying $20/mo only gets you 25
         | messages per 3 hours and it's crazy slow. The API is rather
         | expensive too.
         | 
         | I'm pretty sure that GPT-4 is ~1T-2T parameters, and they're
         | struggling to run it (at reasonable performance and profit). So
         | far their strategy has been to 10x the parameter count every
         | GPT generation, and the problem is that there are diminishing
         | returns every time they do that. AFAIK they've now resorted to
         | chunking GPT through the GPUs because of the 2 to 4 terabytes
         | of VRAM required (at 16-bit).
         | 
         | So now they've reached the edge of what they can reasonably
         | run, and even if they do 10x it the expected gains are smaller.
         | On top of this, models like LLaMA have shown that it's possible
         | to cut the parameter count substantially and still get decent
         | results (albeit the open-source stuff still hasn't caught up).
         | 
         | On top of all of this, keep in mind that at 8-bit precision
         | 175B parameters (GPT-3.5) require over 175GB of VRAM. This is
         | crazy expensive and would never fit on consumer devices. Even
         | if you quantize down to 4-bit, you still need over 80GB of
         | VRAM.
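         | 
         | Rough math behind those numbers (weights only; activations and
         | the KV cache push real usage higher):
         | 
         |     def weight_gb(params_billion, bits):
         |         # parameters * bytes-per-param, reported in GB
         |         return params_billion * 1e9 * (bits / 8) / 1e9
         | 
         |     weight_gb(175, 8)    # ~175 GB at 8-bit
         |     weight_gb(175, 4)    # ~88 GB even at 4-bit
         |     weight_gb(1500, 16)  # ~3 TB at 16-bit, hence the chunking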
         | 
         | This definitely is not a "throw them off the trail" tactic - in
         | order for this to actually scale the way everyone envisions,
         | both in performance and in running on consumer devices,
         | research HAS to focus on reducing the parameter count. And
         | again there's lots of research showing it's very possible to
         | do.
         | 
         | tl;dr: smaller = cheaper+faster+more accessible+same
         | performance
        
           | haxton wrote:
           | I don't think this argument really holds up.
           | 
            | GPT-3 on release was more expensive ($0.06/1k tokens vs
            | $0.03/1k input and $0.06/1k output for GPT-4).
           | 
           | Reasonable to assume that in 1-2 years it will also come down
           | in cost.
        
             | thewataccount wrote:
             | > Reasonable to assume that in 1-2 years it will also come
             | down in cost.
             | 
              | Definitely. I'm guessing they used something like
              | quantization to squeeze the VRAM usage down to 4-bit. The
              | thing is that if you can't fit the weights in memory then
              | you have to chunk it, and that's slow = more GPU time =
              | more cost. And even when it does fit in GPU memory, less
              | memory = fewer GPUs needed.
              | 
              | But we know you _can_ use fewer parameters, and that the
              | training data + RLHF make a massive difference in quality.
              | And the model size relates linearly to the VRAM
              | requirements/cost.
              | 
              | So if you can get a 60B model to run at 175B quality,
              | then you've cut your memory requirements to roughly a
              | third, and can now run (with 4-bit quantization) on a
              | single A100 80GB, which is 1/8th of the previously known
              | 8x A100s that GPT-3.5 ran on (and still about half of
              | GPT-3.5 at 4-bit).
              | 
              | Also, while OpenAI likely doesn't want this - we really
              | want these models to run on our own devices, and
              | LLaMA+finetuning has shown promising improvements (not
              | there just yet) at 7B size, which can run on consumer
              | devices.
        
           | whywhywhywhy wrote:
            | It's never been in OpenAI's interest to make their model
            | affordable or fast; they're actually incentivized to do the
            | opposite, as an excuse to keep the tech locked up.
            | 
            | This is why DALL-E 2 ran in a data centre and Stable
            | Diffusion runs on a gamer GPU.
        
             | thewataccount wrote:
             | I think you're mixing the two. They do have an incentive to
             | make it affordable and fast because that increases the use
             | cases for it, and the faster it is the cheaper it is for
             | them, because the expense is compute time (half the time ~=
             | half the cost).
             | 
             | > This is why Dall-e 2 ran in a data centre and Stable
             | Diffusion runs on a gamer GPU
             | 
              | This is absolutely why they're keeping it locked up. By
              | simply not releasing the weights, you can't run DALL-E 2
              | locally, and yeah, they don't want to release them because
              | they want you to be locked to their platform, not running
              | it for free locally.
        
           | ericmcer wrote:
           | Yeah I am noticing this as well. GPT enables you to do
           | difficult things really easily, but then it is so expensive
           | you would need to replace it with custom code for any long
           | term solution.
           | 
            | For example: you could use GPT to parse a resume file, pull
            | out work experience and return it as JSON. That would take
            | minutes to set up using the GPT API versus weeks to build
            | your own system, but GPT is so expensive that building your
            | own system is totally worth it.
           | 
           | Unless they can seriously reduce how expensive it is I don't
           | see it replacing many existing solutions. Using GPT to parse
           | text for a repetitive task is like using a backhoe to plant
           | flowers.
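            | 
            | A minimal sketch of what "minutes to set up" looks like (the
            | model choice, prompt, and schema are just illustrative):
            | 
            |     import json
            |     import openai
            | 
            |     def extract_experience(resume_text):
            |         resp = openai.ChatCompletion.create(
            |             model="gpt-3.5-turbo",
            |             messages=[{"role": "user", "content":
            |                 "Return the work experience in this resume as "
            |                 "a JSON list of {company, title, start, end}:\n"
            |                 + resume_text}],
            |         )
            |         # raises if the model wraps the JSON in prose, which
            |         # is its own reliability problem
            |         msg = resp["choices"][0]["message"]["content"]
            |         return json.loads(msg)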
        
             | mejutoco wrote:
             | You could use those examples to finetune a model only for
             | resume-data extraction.
        
             | abraae wrote:
              | > For example: you could use GPT to parse a resume file,
              | > pull out work experience and return it as JSON. That
              | > would take minutes to set up using the GPT API versus
              | > weeks to build your own system, but GPT is so expensive
              | > that building your own system is totally worth it.
             | 
             | True, but an HR SaaS vendor could use that to put on a
             | compelling demo to a potential customer, stopping them from
             | going to a competitor or otherwise benefiting.
             | 
              | And anyway, without crunching the numbers, for volumes of
              | say 1M resumes (at which point you've achieved a lot of
              | success) I can't quite believe it would be cheaper to build
              | something when there is such a powerful solution available.
              | Maybe once you are at 1G resumes... My bet is still no
              | though.
        
               | thewataccount wrote:
                | I work on the web development team at a company with ~6
                | software developers.
                | 
                | I'd love to be able to just have people submit their
                | resumes and extract the data from there, but instead I'm
                | going to build a form and make applicants fill it out,
                | because ChatGPT is going to cost at least $0.05 USD
                | depending on the length of the resume.
                | 
                | I'd also love to have mini summaries of order returns
                | written up in human form, but that also would cost $0.05
                | USD per form.
                | 
                | The tl;dr here is that there's a TON of use cases for an
                | LLM outside of your core product (we sell clothes) - but
                | we can't currently justify that cost. Compare that to the
                | rapidly improving self-hosted solutions, which don't cost
                | $0.05 USD for literally any query (and likely more for
                | anything useful).
        
               | sitkack wrote:
                | 5 cents. Per resume. $500 per 10k. 1-3 hours of a fully
                | loaded engineer's salary per year. You are being
                | criminally cheap.
        
               | thewataccount wrote:
                | The problem is that it would take us the same amount of
                | time to just add a form with Django. Plus you have to
                | handle failure cases, etc.
               | 
               | And yeah I agree this would be a great use-case, and
               | isn't that expensive.
               | 
                | I'd like to do this in lots of places, and the problem is
                | I have to convince my boss to pay for something that
                | otherwise would have been free.
                | 
                | The conversation would be: "We have to add these fields
                | to our model, and we either tell Django to add a form for
                | them, which will have zero ongoing cost and no reliance
                | on a third party,
                | 
                | or we send the resume to OpenAI, pay for them to process
                | it, build some mechanism to sanity-check what GPT is
                | responding with, alert us if there are issues, and then
                | put it into that model, and pay 5 cents per resume."
               | 
                | > 1-3 hours of a fully loaded engineer's salary per year.
                | 
                | That's assuming zero time to implement, and because of
                | our framework it would take more hours to implement the
                | OpenAI solution (that's also more like 12 hours where we
                | are).
                | 
                | > $500 per 10k.
                | 
                | I can't stress this enough - the alternative is $0 per
                | 10k. My boss wants to know why we would pay any money for
                | a less reliable solution (GPT serialization is not nearly
                | as reliable as a standard Django form).
               | 
               | I think within the next few years we'll be able to run
               | the model locally and throw dozens of tasks just like
               | this at the LLM, just not yet.
        
               | marketerinland wrote:
               | There are excellent commercial AI resume parsers already
               | - Affinda.com being one. Not expensive and takes minutes
               | to implement.
        
               | ericmcer wrote:
                | For a big company that is nothing, but if you are
                | bootstrapping and trying to acquire customers with an
                | MVP, racking up a $500 bill is frightening. What if you
                | offer a free trial, blow up, and end up with a $5k+ bill?
        
               | yunwal wrote:
               | Also you could likely use GPT3.5 for this and still get
               | near perfect results.
        
               | thewataccount wrote:
               | > near perfect results.
               | 
                | I have tried GPT-3.5 and GPT-4 for this type of task -
                | the "near perfect results" part is really problematic,
                | because you need to verify that the output is likely
                | correct, get notified if there are issues, and even then
                | you aren't 100% sure that it selected the correct
                | first/last name.
                | 
                | Compare that to a standard HTML form, which is.... very
                | reliable and (for us) automatically has error handling
                | built in, including alerts to us if there's a 504.
        
         | Freire_Herval wrote:
         | [dead]
        
         | og_kalu wrote:
          | It's a pretty sus argument for sure when they're scared to
          | release even the parameter count.
          | 
          | Although the title is a bit misleading about what he was
          | actually saying. Still, there's a lot left to go in terms of
          | scale. Even if it isn't parameter size (and there's still lots
          | of room here too, it just won't be economical), contrary to
          | popular belief there's lots of data left to mine.
        
       | dpflan wrote:
       | Hm, all right, I'm guessing that huge models as a business are
       | maybe over until the economics are figured out, but huge models
       | as experts for knowledge distillation seem reasonable. And if you
       | pay a super premium, you can use the huge model.
        
       | [deleted]
        
         | Freire_Herval wrote:
         | [dead]
        
       | bob1029 wrote:
       | I strongly believe the next generation of models will be based
       | upon spiking neural concepts wherein action potentials are
       | lazily-evaluated throughout the network (i.e. event-driven).
       | There are a few neuron models that can be modified (at some
       | expense to fidelity) in order to tolerate arbitrary delays
       | between simulation ticks. Using _actual_ latency between neurons
       | as a means of encoding information seems absolutely essential if
       | we are trying to emulate biology in any meaningful way.
       | 
       | Spiking networks also lend themselves nicely to some elegant
       | learning rules, such as STDP. Being able to perform unsupervised
       | learning at the grain of each action potential is really
       | important in my mind. This gives you all kinds of ridiculous
       | capabilities, most notably being the ability to train the model
       | while it's live in production (learning & use are effectively the
       | same thing).
       | 
       | These networks also provide a sort of deterministic, event-over-
       | time tracing that is absent in the models we see today. In my
       | prototypes, the action potentials are serialized through a ring
       | buffer, and then logged off to a database in order to perfectly
       | replay any given session. This information can be used to
       | bootstrap the model (offline training) by "rewinding" things very
       | precisely and otherwise branching time to your advantage.
       | 
       | The #1 reason I've been thinking about this path is that low-
       | latency, serialized, real-time signal processing is somewhat
       | antagonistic to GPU acceleration. I fear there is an appreciable
       | % of AI research predicated on some notion that you need at least
       | 1 beefy GPU to start doing your work. Looking at fintech, we are
       | able to discover some very interesting pieces of technology which
       | can service streams of events at unbelievable rates and scales -
       | and they only depend on a handful of CPU cores in order to
       | achieve this.
       | 
       | Right now, I think A Time Domain Is All You Need. I was inspired
       | to go outside of the box by this paper:
       | https://arxiv.org/abs/2304.06035. Part 11 got me thinking.
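       | 
       | A toy sketch of the lazy-evaluation idea (a single leaky
       | integrate-and-fire neuron that is only updated when an input
       | event arrives; the constants and units are arbitrary):
       | 
       |     import math
       | 
       |     class LIFNeuron:
       |         def __init__(self, tau=20.0, threshold=1.0):
       |             self.tau, self.threshold = tau, threshold
       |             self.v, self.last_t = 0.0, 0.0
       | 
       |         def receive(self, t, weight):
       |             # decay the membrane potential over however long it
       |             # has been since the last event (arbitrary delays)
       |             self.v *= math.exp(-(t - self.last_t) / self.tau)
       |             self.last_t = t
       |             self.v += weight
       |             if self.v >= self.threshold:
       |                 self.v = 0.0
       |                 return True   # emit a spike event downstream
       |             return False
       | 
       | Nothing is computed between events and the timestamps themselves
       | carry information, which is what makes a CPU/event-queue
       | implementation plausible.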
        
         | MagicMoonlight wrote:
         | I know what it looks like in my head but I can't quite figure
         | the algorithm out. The spiking is basically reinforcement
         | learning at the neuron level. Get it right and it's basically
         | all you need. You don't even need training data because it will
         | just automagically learn from the data it sees.
        
         | eternalban wrote:
         | I'm bullish on SNNs too. This Chinese research group is doing
         | something quite comprehensive with them:
         | 
         | https://news.ycombinator.com/item?id=35037605
        
       | tfehring wrote:
       | Related reading: https://dynomight.net/scaling/
       | 
       | In short it seems like virtually all of the improvement in future
       | AI models will come from better algorithms, with bigger and
       | better data a distant second, and more parameters a distant
       | third.
       | 
       | Of course, this claim is itself internally inconsistent in that
       | it assumes that new algorithms won't alter the returns to scale
       | from more data or parameters. Maybe a more precise set of claims
       | would be (1) we're relatively close to the fundamental limits of
       | transformers, i.e., we won't see another GPT-2-to-GPT-4-level
       | jump with current algorithms; (2) almost all of the incremental
       | improvements to transformers will require bigger or better-
       | quality data (but won't necessarily require more parameters); and
       | (3) all of this is specific to current models and goes out the
       | window as soon as a non-transformer-based generative model
       | approaches GPT-4 performance using a similar or lesser amount of
       | compute.
        
         | strangattractor wrote:
          | Good thing he got a bunch of companies to pony up the dough
          | for LLMs before he announced they were already over.
        
           | tfehring wrote:
            | I don't think LLMs are over [0]. I think we're relatively
            | close to a local optimum in terms of what can be achieved
            | with current algorithms. But I think OpenAI is at least as
            | likely as any other player to create the next paradigm, and
            | at least as likely as any other player to develop the
            | leading models within the next paradigm, regardless of who
            | actually publishes the research.
           | 
           | Separately, I think OpenAI's current investors have a >10%
           | chance to hit the 100x cap on their returns. Their current
           | models are already good enough to address lots of real-world
           | problems that people will pay money to solve. So far they've
           | been much more model-focused than product-focused, and by
           | turning that dial toward the product side (as they did with
           | ChatGPT) I think they could generate a lot of revenue
           | relatively quickly.
           | 
           | [0] Except maybe in the sense that future models will be
           | predominantly multimodal and therefore not strictly LLMs. I
           | don't think that's what you're suggesting though.
        
             | jacobr1 wrote:
              | It is already relatively trivial to fine-tune generative
              | models for various use cases. Which implies huge gains to
              | be had from targeted applications, not just for niche
              | players but also for OpenAI and others, whether they build
              | that fine-tuning into the base system, build ecosystems
              | around it, or just purpose-build applications on top.
        
         | no_wizard wrote:
         | All the LC grinding may come in handy after all! /s
         | 
          | Which algorithms specifically show the most results upon
          | improvement? Going into this I thought the jump in improvements
          | was really related to more advanced automated tuning and result
          | correction, which could be done _at scale_ , as it were,
          | allowing a small team of data scientists to tweak the models
          | until the desired results were achieved.
          | 
          | Are you saying instead that the concrete predictive algorithms
          | need improvement, or are we lumping the tuning into this?
        
           | junipertea wrote:
            | We need more data-efficient neural network architectures.
            | Transformers work exceptionally well because they allow us
            | to just dump more data into them, but ultimately we want to
            | learn advanced behavior without having to feed them all of
            | Shakespeare.
        
             | uoaei wrote:
             | Inductive Bias Is All You Need
        
           | tfehring wrote:
           | I think it's unlikely that the first model to be widely
           | considered AGI will be a transformer. Recent improvements to
           | computational efficiency for attention mechanisms [0] seem to
           | improve results a lot, as does RLHF, but neither is a
           | paradigm shift like the introduction of transformers was.
           | That's not to downplay their significance - that class of
           | incremental improvements has driven a massive acceleration in
           | AI capabilities in the last year - but I don't think it's
           | ultimately how we'll get to AGI.
           | 
           | [0] https://hazyresearch.stanford.edu/blog/2023-03-27-long-
           | learn...
        
           | goldenManatee wrote:
           | bubble sort /s
        
           | uoaei wrote:
           | Traditional CS may have something to do with slightly
           | improving the performance by allowing more training for the
           | same compute, but it won't be an order of magnitude or more.
           | The improvements to be gained will be found more in
           | statistics than CS per se.
        
             | jacobr1 wrote:
              | I'm not sure. Methods like Chinchilla and quantization have
              | been able to reduce compute by more than an order of
              | magnitude. There might very well be a few more levels of
              | optimization within the same statistical paradigm.
        
         | brucethemoose2 wrote:
          |  _Better_ data is still critical, even if bigger data isn't.
          | The linked article emphasizes this.
        
           | tfehring wrote:
           | I'd bet on a 2030 model trained on the same dataset as GPT-4
           | over GPT-4 trained with perfect-quality data, hands down. If
           | data quality were that critical, practitioners could ignore
           | the Internet and just train on books and scientific papers
           | and only sacrifice <1 order of magnitude of data volume.
           | Granted, that's not a negligible amount of training data to
           | give up, but it places a relatively tight upper bound on the
           | potential gain from improving data quality.
        
           | NeuroCoder wrote:
           | So true. There are still plenty of areas where we lack
           | sufficient data to even approach applying this sort of model.
            | How are we going to make similar advances in something like
            | medical informatics, where we not only have less data readily
            | available but it's also much more difficult to acquire more?
        
       | winddude wrote:
       | Also, scaling doesn't address some of the challenges for AI that
       | ChatGPT doesn't meet, like:
       | 
       | - learning to learn, aka continual learning
       | 
       | - internalised memory
       | 
       | Both would bring it closer to actual human capabilities.
        
       | arenaninja wrote:
       | An amusing thought I've had recently is whether LLMs are in the
       | same league as the millions of monkeys at the keyboard,
       | struggling to reproduce one of the complete works of William
       | Shakespeare.
       | 
       | But I think not, since monkeys probably don't "improve"
       | noticeably with time or input.
        
         | mhb wrote:
          |  _But I think not, since monkeys probably don't "improve"
          | noticeably with time or input._
         | 
         | Maybe once tons of bananas are introduced...
        
       | mromanuk wrote:
       | Sorry, but this sounds a lot like "640KB is all the memory you
       | will ever need." What about a "Socratic model" for video? There
       | should be many applications that would benefit from a bigger
       | model.
        
       | joebiden2 wrote:
       | We will need a combination of technologies we have in order to
       | really achieve emergent intelligence.
       | 
       | Humans are composed of various "subnets" modelling aspects
       | which, in unison, produce self-consciousness and real
       | intelligence. What is missing in the current line of approaches
       | is that we rely only on auto-alignment of subnetworks by machine
       | learning, which scales only up to a point.
       | 
       | If we produced a model which has
       | 
       | * something akin to an LLM as we know it today, which is able to
       | 
       | * store or fetch facts in a short-term ("context") or long-term
       | ("memory") storage
       | 
       | * if a fact is not in the current "context", query the long-term
       | storage ("memory") by keywords for associations, which are
       | one-by-one inserted into the current "context"
       | 
       | * repeat as required until fulfilling some self-defined condition
       | ("thinking")
       | 
       | To me, this is mostly mechanical plumbing work and lots of money.
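       | 
       | A very rough sketch of that loop (every name here is a
       | hypothetical stand-in, just to show the plumbing):
       | 
       |     def extract_keywords(text):
       |         # stand-in; a real system might use the LLM itself here
       |         return [w.strip(".,") for w in text.split() if w.istitle()]
       | 
       |     def think(llm, longterm, context):
       |         # generate, pull associations from long-term memory by
       |         # keyword into the short-term context, and repeat until
       |         # the model signals its self-defined stop condition
       |         while True:
       |             draft = llm(context)
       |             if draft.endswith("DONE"):
       |                 return draft
       |             for kw in extract_keywords(draft):
       |                 context += "\n" + longterm.get(kw, "")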
       | 
       | Also, if we get rid of the "word-boundedness" of LLMs - which we
       | already did to some degree, as shown by the multi-language
       | capabilities - LLMs would be free to roam in the domain of
       | thoughts /s :)
       | 
       | This approach could be further improved by meta-LLMs governing
       | the long-term memory access, providing an "intuition" about which
       | long-term memory suits the provided context best. Apply recursion
       | as needed to improve results (paying with exponential training
       | time, but this meta-NN will quite probably be independent of the
       | actual training, as real-life brain organization shows).
        
         | babyshake wrote:
         | The other elements that may be required could be some version
         | of the continuous sensory input that, for us, creates the
         | sensation of "living" and (this one is a bit more
         | philosophical) the sensation of suffering, plus a baseline
         | assumption that the goal of the entity is to take actions that
         | help it avoid suffering.
        
           | joebiden2 wrote:
           | I think an AI may have extra qualities by feeling suffering
           | etc., but I don't think these extra qualities are rationally
           | beneficial.
        
       | thunderbird120 wrote:
       | >"the company's CEO, Sam Altman, says further progress will not
       | come from making models bigger. "I think we're at the end of the
       | era where it's going to be these, like, giant, giant models," he
       | told an audience at an event held at MIT late last week. "We'll
       | make them better in other ways."
       | 
       | So to reiterate, he is not saying that the age of giant AI models
       | is over. Current top-of-the-line AI models are giant and likely
       | will continue to be. However, there's no point in training models
       | you can't actually run economically. Inference costs need to stay
       | grounded, which means practical model sizes have a limit. More
       | effort is going to go into making models efficient to run, even
       | if it comes at the expense of making them less efficient to
       | train.
        
         | ldehaan wrote:
         | I've been training large 65b models on "rent for N hours"
         | systems for less than 1k per customized model. Then fine tuning
         | those to be whatever I want for even cheaper.
         | 
         | 2 months since gpt 4.
         | 
         | This ride has only just started, fasten your whatevers.
        
           | Voloskaya wrote:
            | Finetuning costs are nowhere near representative of the cost
            | to pre-train those models.
            | 
            | Trying to replicate the quality of GPT-3 from scratch, using
            | all the tricks and training optimizations in the book that
            | are available now but weren't used during GPT-3's actual
            | training, will still cost you north of $500K, and that's
            | being extremely optimistic.
            | 
            | A GPT-4 level model would be at least 10x this with the same
            | optimism (meaning you are managing to train it for much
            | cheaper than OpenAI did). And that's just pure hardware
            | cost; the team you need to actually make this happen is
            | going to be very expensive as well.
           | 
            | edit: To quantify how "extremely optimistic" that is, the
            | very model you are finetuning, which I assume is LLaMA 65B,
            | would cost around ~$18M to train on Google Cloud, assuming
            | you get a 50% discount on their listed GPU prices (2048 A100
            | GPUs for 5 months). And that's not even GPT-4 level.
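            | 
            | Back-of-the-envelope for that figure (the hourly rate is just
            | an assumption picked to land near the ~$18M):
            | 
            |     gpus, months, hrs_per_month = 2048, 5, 730
            |     usd_per_gpu_hour = 2.4   # assumed post-discount rate
            |     print(gpus * months * hrs_per_month * usd_per_gpu_hour)
            |     # ~= $17.9M, before salaries, data, and failed runs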
        
             | bagels wrote:
             | $5M to train GPT-4 is the best investment I've ever seen.
             | I've seen startups waste more money for tremendously
             | smaller impact.
        
               | Voloskaya wrote:
                | As I stated in my comment, $5M assumes you can do a
                | much, much better job than OpenAI at optimizing your
                | training, only need to make a single training run, your
                | employees' salaries are $0, and you get a clean dataset
                | for essentially free.
               | 
               | Real cost is 10-20x that.
               | 
               | That's still a good investment though. But the issue is
               | you could very well sink $50M into this endeavour and end
               | up with a model that actually is not really good and gets
               | rendered useless by an open-source model that gets
               | released 1 month later.
               | 
               | OpenAI truly has unique expertise in this field that is
               | very, very hard to replicate.
        
               | moffkalast wrote:
               | > and end up with a model that actually is not really
               | good and gets rendered useless
               | 
               |  _ahem_ Bard _ahem_
        
         | hcks wrote:
          | Yes, but it also tells us that if Altman is honest here, then
          | he doesn't believe GPT-like models can scale to near-human-
          | level performance (because even if the cost of compute were
          | 10x or even 100x, it would still be economically sound).
        
           | [deleted]
        
           | og_kalu wrote:
            | No it doesn't.
            | 
            | For one thing, they're already at human performance.
            | 
            | For another, I don't think you realize how expensive
            | inference can get. Microsoft, with no small amount of
            | available compute, is struggling to run GPT-4 to the point
            | that they're rationing it between subsidiaries while they
            | try to jack up compute.
            | 
            | So saying it would be economically sound if it cost 10x or
            | 100x what it costs now is a joke.
        
             | quonn wrote:
             | How are they at human performance? Almost everything GPT
             | has read on the internet didn't even exist 200 years ago
             | and was invented by humans. Heck, even most of the
             | programming it does wasn't there 20 years ago.
             | 
              | Not every programmer starting from scratch would be
              | brilliant, but many were self-taught with very limited
              | resources in the 80s, for example, and discovered new
              | things from there.
             | 
             | GPT cannot do this and is very far from being able to.
        
               | og_kalu wrote:
               | >How are they at human performance?
               | 
                | Because it performs at at least the average human level
                | (mostly well above average) on basically every task it's
                | given.
                | 
                | "Invent something new" is a nonsensical benchmark for
                | human-level intelligence. The vast majority of people
                | have never invented and will never invent anything new.
                | 
                | If your general intelligence test can't be passed by a
                | good chunk of humanity then it's not a general
                | intelligence test, unless you want to say most people
                | aren't generally intelligent.
        
               | quonn wrote:
               | Yeah these intelligence tests are not very good.
               | 
               | I would argue some programmers do in fact invent
               | something new. Not all of them, but some. Perhaps 10%.
               | 
                | Second, the point is not whether everyone is an inventor
                | by profession but whether most people can be inventors.
                | And to a degree they can be. I think you underestimate
                | that by a large margin.
               | 
               | You can lock people in a room and give them a problem to
               | solve and they will invent a lot if they have the time to
               | do it. GPT will invent nothing right now. It's not there
               | yet.
        
               | og_kalu wrote:
               | >Yeah these intelligence tests are not very good.
               | 
               | Lol Okay
               | 
               | >And to a degree they can be. I think you underestimate
               | that by a large margin.
               | 
                | Do I? Because I'm not the one making unverifiable claims
                | here.
               | 
               | >You can lock people in a room and give them a problem to
               | solve and they will invent a lot if they have the time to
               | do it.
               | 
               | If you say so
        
             | smeagull wrote:
             | This tells me you haven't really stress tested the model.
             | GPT is currently at the stage of "person who is at the
             | meeting, but not really paying attention so you have to
             | call them out". Once GPT is pushed, it scrambles and falls
             | over for most applications. The failure modes range from
             | contradicting itself, making up things for applications
             | that shouldn't allow it, to ignoring prompts, to simply
             | being unable to perform tasks at all.
        
               | dragonwriter wrote:
               | Are we talking about bare GPT through the UI, or GPT with
               | a framework giving it access to external systems and the
               | ability to store and retrieve data?
               | 
               | Because, yeah, "brain in a jar" GPT isn't enough for most
               | tasks beyond parlor-trick chat, but being used as a brain
               | in a jar isn't the point.
        
               | moffkalast wrote:
                | Still waiting to see those plugins rolled out and actual
                | vector DB integration with GPT-4, then we'll see what it
                | can really do. It seems like the more context you give
                | it the better it does, but the current UI really makes
                | it hard to provide that.
               | 
               | Plus the recursive self prompting to improve accuracy.
        
         | mullingitover wrote:
         | Quality over quantity. Just building a model with a gazillion
         | parameters isn't indicative of quality, you could easily have
         | garbage parameters with tons of overfitting. It's like
         | megapixel counts in cameras: you might have 2000 gigapixels in
         | your sensor, but that doesn't mean you're going to get great
         | photos out of it if there are other shortcomings in the system.
        
           | sanxiyn wrote:
           | What overfitting? If anything, LLMs suffer from underfitting,
           | not overfitting. Normally, overfitting is characterized by
           | increasing validation loss while training loss is decreasing,
           | and solved by early stopping (stopping before that happens).
           | Effectively, all LLMs are stopped early, so they don't suffer
           | from overfitting at all.
        
         | spaceman_2020 wrote:
         | Is cost really that much of a burden?
         | 
         | Intelligence is the single most expensive resource on the
         | planet. Hundreds of individuals have to be born, nurtured, and
         | educated before you might get an exceptional 135+ IQ
         | individual. Every intelligent person is produced at a great
         | societal cost.
         | 
         | If you can reduce the cost of replicating a 135 IQ, or heck,
         | even a 115 IQ person to a few thousand dollars, you're beating
         | biology by a massive margin.
        
           | oezi wrote:
           | Since IQ is just a normal distribution on a population it is
           | a bit misleading to talk about it like that.
           | 
           | Even if we don't expend any cost on education the number of
           | people with IQ 135 stays the same.
        
           | yunwal wrote:
           | But we're still nowhere near that, or even near surpassing
           | the skill of an average person at a moderately complex
           | information task, and GPT-4 supposedly took hundreds of
           | millions to train. It also costs a decent amount more to run
           | inference on it vs. 3.5. It probably makes sense to prove the
           | concept that generative AI can be used for lots of real work
           | before scaling that up by another order of magnitude for
           | potentially marginal improvements.
           | 
           | Also, just in terms of where to put your effort, if you think
           | another direction (for example, fine-tuning the model to use
           | digital tools, or researching how to predict confidence
           | intervals) is going to have a better chance of success, why
           | focus on scaling more?
        
             | spaceman_2020 wrote:
              | There are a _lot_ of employees at large tech consultancies
              | that don't really do anything that can't be automated away
              | by even current models.
             | 
             | Sprinkle in some more specific training and I can totally
             | see entire divisions at IBM and Accenture and TCS being
             | made redundant.
             | 
             | The incentive structures are perversely aligned for this
             | future - the CEO who manages to reduce headcount while
             | increasing revenue is going to be very handsomely rewarded
             | by Wall Street.
        
               | skyechurch wrote:
               | Wall Street would be strongly incentivised to install an
               | AI CEO.
        
           | dauertewigkeit wrote:
           | Are intelligent people that valuable? There's lots of them at
           | every university working for peanuts. They don't seem to be
           | that valued by society, honestly.
        
             | taylorius wrote:
             | IQ isn't all that. Mine is 140+ and I'm just a somewhat
             | well paid software engineer. It's TOO abstract a metric in
             | my view - for sure it doesn't always translate into real
             | world success.
        
               | roflyear wrote:
                | Right, we're very much in the same boat. I'm good at
                | pattern recognition, I guess. I learn things quickly.
                | What else? I don't have magic powers really. I still get
                | headaches and eat junk food.
        
             | spaceman_2020 wrote:
              | If you asked any Fortune 500 CEO whether they could
              | magically take all the 135 IQ artists and academics and
              | vagabonds, erase all their past traumas, put them through
              | business or tech school, and put them to work in their
              | company, they would all say 100% yes.
             | 
             | An equivalent AI won't have any agency and will be happy
             | doing the boring work other 135 IQ humans won't.
        
           | roflyear wrote:
           | My IQ is 140 and I'm far from exceptional.
        
           | jutrewag wrote:
            | 115 IQ isn't all that high - that's basically every Indian
            | American or a healthy percentage of the Chinese population.
            | 
            | Edit: I don't understand the downvotes. I don't mean this in
            | any disparaging way, just that an AGI is probably going to
            | be a lot higher than that.
        
             | spaceman_2020 wrote:
             | 115 IQ is perfectly fine for the majority of human
             | endeavors.
        
           | asdfman123 wrote:
           | The reason we put everyone through school is we believe that
           | it's in society's best interest to educate everyone to the
           | peak of their abilities. It's good for many different
           | reasons.
           | 
           | It would be much easier to identify gifted kids and only
           | educate them, but I happen to agree that universal education
           | is better.
        
             | gowld wrote:
              | > It would be much easier to identify gifted kids and only
              | > educate them
             | 
             | Is it so easy?
        
       | LesZedCB wrote:
       | the way i see it, the expensive part should be to train the
       | models via simulated architectures in GPUs or TPUs or whatever.
       | 
       | but once they are trained, is there a way to encode the base
       | models into hardware where inference costs are basically
       | negligible? hopefully somebody is seeing if this is possible,
       | using structurally encoded hardware to make inference costs
       | basically nil/constant.
        
       | [deleted]
        
       | antibasilisk wrote:
       | it's over, billions of parameters must be released
        
       | rhelz wrote:
       | All warfare is based on deception -- Sun Tzu
        
       | donpark wrote:
       | I think Sam is referring to the transition from "Deep" to "Long"
       | learning [1]. What new emergent properties, if any, will 1
       | billion tokens of context unlock?
       | 
       | [1] https://hazyresearch.stanford.edu/blog/2023-03-27-long-
       | learn...
        
       | [deleted]
        
       | carlsborg wrote:
       | The 2017 Transformer paper has ~71,000 papers citing it. The
       | sheer magnitude of human mental effort globally that is chasing
       | the forefront of machine learning is unprecedented and amazing.
        
       ___________________________________________________________________
       (page generated 2023-04-17 23:00 UTC)