[HN Gopher] GPT-3 is no longer the only game in town
___________________________________________________________________
GPT-3 is no longer the only game in town
Author : sebg
Score : 217 points
Date : 2021-11-07 14:53 UTC (8 hours ago)
(HTM) web link (lastweekin.ai)
(TXT) w3m dump (lastweekin.ai)
| supperburg wrote:
| I have supported my beliefs on this topic in these threads to the
| point of exhausting myself. The tools that we use to find these
| agents are the underpinning of AGI, it's coming way faster than
| even most people here appreciate, this development is
| intrinsically against the interest of human beings. Please stop
| and think, please.
| Simon321 wrote:
| I argue it's very much in the interest of human beings. It has
 | been since we first picked up a rock and used it as a hammer.
| It's the ultimate tool and has the potential to bring unseen
| prosperity.
| supperburg wrote:
 | It won't. You're wrong. This is the perfect illustration. You
 | think a rock is good therefore AI is good. You're just
 | unbelievably wrong.
| [deleted]
| eunos wrote:
 | Number of parameters aside, I am really surprised that we haven't
 | yet reached hundreds of TB of training data, especially since the
 | Chinese model only used less than 10 TB of data.
| visarga wrote:
 | The GPT-3 family is still too expensive to use, too big to fit in
| memory on a single machine. Prices need to come down before large
| scale adoption or someone needs to invent the chip to hold it
| (cheaply).
|
| The most exciting part about it is showing us there is a path
| forward by scaling and prompting, but you can still do much
| better with a smaller model and a bit of training data (which can
| come from the expensive GPT-3 as well).
|
 | What I expect from the next generation: multi-modality, larger
| context, using retrieval to augment input data with fresh
| information, tuned to solve thousands of tasks with supervised
| data so it can generalize on new tasks better, and some efficient
| way to keep it up to date and fine-tune it. On the data part -
| more data, more languages - a lot of work.
| worik wrote:
| The underlying methods seem impractical. GPT-n are an existence
 | proof - it is possible to make parrot-like software that
| generates realistic text. But using these methods it is not
| practical.
|
| Maybe that is a good thing, maybe a bad thing, but unless there
| is a breakthrough in methods this is a dead end. Impressive
| though.
| WithinReason wrote:
| https://copilot.github.com/
| mrbukkake wrote:
 | Can anyone tell me what the value of GPT-3 actually is, other than
 | generating meaningless prose? What would a business use it for?
| phone8675309 wrote:
| It's good for the university-industrial-business complex -
| people writing papers about a model they can't even run
| themselves. It practically prints money in journal articles,
 | travel per diem, and conference honoraria, not even counting
| the per-API call rates.
| DeathArrow wrote:
| >What would a business use it for
|
 | If you think about business uses, you can actually get advice
 | from Jerome Powell, simulated by GPT-3.
 |
 | If someone uses GPT-3 to simulate Warren Buffett, they can
 | extract even more value.
|
| https://www.institutionalinvestor.com/article/b1tktmhcfdyqsk...
| mrbukkake wrote:
 | I can't tell whether or not this article is parody... is this
 | a new kind of Turing test?
| vitus wrote:
| Somehow I don't think that this is quite how Jerome Powell
| would respond in an interview:
|
| > Interviewer: How do you think a global pandemic would
| impact the financial system?
|
| > Mr. Powell: A pandemic will have impact on the economy. It
| will have an impact on the financial system.
|
| > Interviewer: What would the impact be?
|
| > Mr. Powell: If a major pandemic occurred, the economy would
| be affected, and the financial system would be affected.
|
| Yes, GPT-3 can provide a convincing chatbot, but it shouldn't
| be confused for domain expertise.
| robbedpeter wrote:
 | This is a bad example. Here's an output from gpt-j-6b by
 | EleutherAI:
 |
 | Prompt:
 | Jerome Powell was interviewed about the impact of a pandemic
 | on the economy.
 | > Interviewer: What would the impact be?
 | > Mr. Powell: If a major pandemic occurred
 |
 | Output:
 | it could be very disruptive.
 | > Interviewer: How disruptive?
 | > Mr. Powell: Well, it could be disruptive in a number of ways.
 | > Interviewer: How so?
 | > Mr. Powell: Well, one of the first things that would be
 | disrupted would be the supply chain.
|
| Using prompts well makes a huge difference.
|
| If you parse the generated output, classify it, then
| develop a decision tree that uses further prompts to refine
| the response, you can get more sophisticated, valuable
| responses.
|
| The output in the parent is comparable to an off-the-cuff
| interview response. If you emulate a deeper thought
| process, you can get more meaningful output, and if you use
| the right prompts, you can access the semantic networks in
| the model related to your domain of interest.
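The "decision tree of prompts" idea above can be sketched as code: classify the first completion and, if it is vague, issue a narrower follow-up prompt. In this minimal sketch, `generate` is a placeholder standing in for a real GPT-3/GPT-J API call, and its canned replies are hypothetical.

```python
def generate(prompt):
    # Placeholder for a model API call; returns canned text for illustration.
    if "specifically" in prompt:
        return "the supply chain would be disrupted first."
    return "it could be very disruptive."

def refine(question):
    answer = generate(question)
    # Simple classifier: treat an answer with no concrete detail as vague
    # and re-prompt for specifics.
    if "supply chain" not in answer:
        follow_up = f"{question} {answer} How, specifically?"
        answer = f"{answer} {generate(follow_up)}"
    return answer

print(refine("What would the impact of a pandemic be?"))
# → it could be very disruptive. the supply chain would be disrupted first.
```

A real version would replace the keyword check with a proper classifier over the generated text, but the control flow is the same.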
| notahacker wrote:
| I think the "bad example" is actually the good one,
| because it's a reminder that actually you're not getting
| business advice from someone with Warren Buffet or Jerome
| Powell's understanding of the economy, you're getting
| text generated by analysing patterns in other not-
| necessarily-applicable text. If you start forcing it in
| very specific directions you start getting text that
| summarises the commentary in the corpus, but most of that
| commentary doesn't come from Warren Buffet or Jerome
| Powell and isn't applicable to the future you're asking
| it about...
| sva_ wrote:
| _> Interviewer: Are you in favor of a carbon tax?
|
| > Mr. Powell: I don't want to get into the details of taxes.
|
| > Interviewer: Are you in favor of a cap and trade system?
|
| > Mr. Powell: I don't want to get into the details of a cap
| and trade system.
|
| > Interviewer: How do you think a global pandemic would
| impact the financial system?
|
| > Mr. Powell: A pandemic will have impact on the economy. It
| will have an impact on the financial system.
|
| > Interviewer: What would the impact be?
|
| > Mr. Powell: If a major pandemic occurred, the economy would
| be affected, and the financial system would be affected._
|
| Maybe I'm a bit harsh on GPT-3, but I'm not nearly as
| fascinated by this kind of output as the author.
| teaearlgraycold wrote:
| It does pretty well at transforming text into a person's
| style of talking. So you could have it re-write any
| sentence to sound like a Trump tweet.
| renewiltord wrote:
 | I, too, thought that sounded like Eliza. Anyway, it looks like
| that's a small excerpt from the conversation.
| benatkin wrote:
| It looks like the dialogue is only on the human end. The
| chatbot is treating each question as the first. I think
| it sounds a lot like Biden. I prefer that to Trump, but
| don't like either sort of conversation!
| CamperBob2 wrote:
| GPT-3 and similar ML/AI projects may have many interesting and
| valuable commercial applications, not all of which are readily
| apparent at this stage of the game. For instance, it could be
| used to insert advertisements for herbal Viagra at
 | https://www.geth3r3a1N0W.com into otherwise-apropos comments
| on message boards, preferably near the end once it's too late
| to stop reading.
|
| Life online is about to become very annoying.
| hubraumhugo wrote:
| At https://reviewr.ai we're using GPT-3 to summarize product
| reviews into simple bullet-point lists. Here's an example with
| backpack reviews: https://baqpa.com
| staticautomatic wrote:
| Did you test it against extractive summarizers?
| hubraumhugo wrote:
| We experimented with BERT summarization, but the results
| weren't too good. Do you have any resources or experiences
| in this area?
| moffkalast wrote:
| That sounds like BERT alright.
| cma wrote:
| How do you avoid libel?
| kingcharles wrote:
 | Are you confusing libel with something else? Can you
 | elaborate on what you mean here? Are you saying that they
 | will be liable for libel (!) if they publish a negative
 | summary of a product?
| cma wrote:
| If they mischaracterize a positive review into a negative
| summary based on factual mistakes they know the system
| makes at a high rate, I would think they would be liable
| for libel right?
| teaearlgraycold wrote:
| I work for a company that re-sells GPT-3 to small business
| owners. We help them generate product descriptions in bulk,
| Google ads, Facebook ads, Instagram captions, etc.
| crubier wrote:
 | Have you heard of GitHub Copilot? It's based on GPT-3 and I can
 | tell you one thing: it does not generate meaningless prose (90%
 | of the time).
| inglor wrote:
| This - it is tremendously valuable to me and I use it all the
| time at work.
| skybrian wrote:
| What do you use it for?
| inglor wrote:
| Coding, I actually had to forbid it today in a course I
| teach because it solves all the exercises :) (given unit
| tests with titles students needed to fill those tests in)
| singlow wrote:
| Isn't that just because others have stored solutions to
| these problems in GitHub?
| iamcurious wrote:
| That is my question too. Is it a fancier autocomplete? Or
| does it reason about code?
| PeterisP wrote:
 | In some sense you could think of it as a fancy autocomplete
 | which uses not only code but also comments as input,
 | looks up previous solutions for the same problem but
 | (mostly) appropriately replaces the variable names with
 | those that you are using.
| robbedpeter wrote:
| It reasons over the semantic network between tokens, in a
| feedforward inference pass over the 2k(ish) words or
| tokens of the prompt. Sometimes that reasoning is
| superficial and amounts to probabilistic linear
| relationships, but it can go deeply abstract depending on
| training material, runtime/inference parameters, and
| context of the prompt.
| inglor wrote:
 | Probably. Also, I'm sure 99%+ of the code I author isn't
 | groundbreaking and someone has done it before.
| worldsayshi wrote:
| But what about the potential for intellectual property
| problems?
| trothamel wrote:
| That's beside the point, which is that the output copilot
| produces is useful.
| worldsayshi wrote:
 | I don't see how that's beside the point. How can it be
 | that useful if the output is such a legal mystery?
|
| I'd love to use it but not when there's such a risk of
| compromising the code base.
| amelius wrote:
| How many % of the time does it produce code that compiles?
| crubier wrote:
 | In my experience, 95% of the time. And 80% of the time it
 | outputs code that is better than I would have done myself
 | in a first approach (it thinks of corner cases, adds
 | meaningful comments, etc.). It's impressive.
| bidirectional wrote:
| From my anecdotal experience, the vast majority of the time
| (90+%).
| amelius wrote:
| Interesting. Is there any constraint built into the model
| that makes this possible? E.g. grammar, or semantics of
| the language? Or is it all based on deep learning only?
| crubier wrote:
 | Deep learning only, I believe. But a really good one.
| emteycz wrote:
| The overwhelming majority. Whatever used to take me an hour
| or two is now a 10-minute task.
| ilteris wrote:
 | I am so confused. Is there a tutorial explaining how you
 | are using it in the IDE, whatever it is? I use VSCode;
 | curious if it can be applied. Thanks
| crubier wrote:
| It works very well with VSCode. It has an integration. It
| shows differently than normal autocomplete, it shows just
 | like gmail autocomplete (grayed out text suggestion, and
| press tab to actually autocomplete). Sometimes the
| suggestion is just a couple tokens long, sometimes it's
| an entire page of correct code.
|
| Nice trick: write a comment describing quickly what your
| code will do ("// order an item on click") and enjoy the
| complete suggested implementation !
|
| Other nice trick: write the code yourself, and then just
| before your code, start a comment saying "// this code"
 | and let copilot finish the sentence with a judgement
| about your code like "// this code does not work in case
| x is negative". Pretty fun !
| icelancer wrote:
| Interesting second use case; I use comments like this
| already as typical practice and I agree Copilot fills in
| the gaps quite well - never thought to do it in
| reverse... will give that a shot today.
| emteycz wrote:
| I also like to do synthesis from example code (@example
| doccomment) and synthesis from tests.
| icelancer wrote:
| I was exceptionally skeptical about it, but it's been very
| useful for me and I'm only using it for minor tasks, like
| automatically writing loops to pull out data from arrays,
| merge them, sort information, make cURL calls and process
| data, etc.
|
| Simply leading the horse to water is enough in something
| like PHP:
|
| // instantiate cURL event from API URL, POST vars to it
| using key as variable name, store output in JSON array and
| pretty print to screen
|
| Usually results in code that is 95-100% of the way done.
| nradov wrote:
| The fact that GPT3 works at all for coding indicates that our
| programming languages are too low level and force a lot of
| redundancy (low entropy). From a programmer productivity
| optimization perspective it should be impossible to reliably
| predict the next statement. Of course there might be trade
| offs. Some of that redundancy might be helping maintenance
| programmers to understand the code.
| hans1729 wrote:
| >From a programmer productivity optimization perspective it
| should be impossible to reliably predict the next statement
|
| Why? 99.9% of programming being done is composition of
| trivial logical propositions, in some semantic context. The
| things we implement are trivial, unless you're thinking
| about symbolic proofs etc
| tshaddox wrote:
| I think that's precisely the problem the parent commenter
| is describing.
| alephaleph wrote:
| That would only follow if we were trying to optimize code
| for brevity, and I have no clue why that would be your top
| priority.
| mpoteat wrote:
| I have indeed seen codebases where it seems like the
| programmer was being charged per source code byte. Full
| of single letter methods and such - it takes a large
| confusion of ideas to motivate such a philosophy.
| nradov wrote:
| Not at all. Brevity (or verbosity) is largely orthogonal
| to level of entropy or redundancy. In principle it ought
| to be possible to code at a higher level of abstraction
| while still using understandable names and control flow
| constructs.
| mpoteat wrote:
| Indeed, in the limit of maximal abstraction, i.e. semantic
| compression, code becomes unreadable by humans in practice.
| We can see this in code golf competitions.
| dharmaturtle wrote:
| Let me rephrase:
|
| > The fact that GPT3 works at all for English indicates
| that English is too low level and forces a lot of
| redundancy (low entropy).
|
| I don't think the goal is to compress information/language
| and maximize "surprise".
| Traster wrote:
| I think this is kind of true, but also kind of not true.
| Programming, like all writing, is the physical
| manifestation of a much more complex mental process. By the
 | time that I _know_ what I want to write, the hard work is
 | done. In that way, you can think of co-pilot as a way of
 | increasing the WPM of an average coder. But the WPM isn't
 | the bit that matters. In fact almost the only thing that
 | matters is the bits you won't predict.
| pharmakom wrote:
| Code is easier to write than read and maintain, so how useful
| is something that generates pages of 90% correct code?
| ALittleLight wrote:
| It's not useful if you use it to auto complete pages of
| code. It is useful to see it propose lines, read, and
| accept its proposals. Sometimes it just saves you a second
| of typing. Sometimes it makes a suggestion that causes you
| to update what you wanted to do. Sometimes it proposes
| useless stuff. On the whole, I really like it and think
| it's a boon to productivity.
| ghoomketu wrote:
 | Yes, I'm used to it now, but the first time it started doing
 | its thing, I wanted to stop and clap for how jaw-dropping and
 | amazing this technology is.
|
| I was a Jetbrains fan but this thing takes productivity to a
| whole new level. I really don't think I can go back to my
| normal programming without it anymore.
| kuschku wrote:
| Luckily, there's a jetbrains addon for it.
| inglor wrote:
| Someone at work showed me copilot works on WebStorm today
| (I also use VSCode).
| supperburg wrote:
| That's like if an alien took Mozart as a specimen and then
| disregarded the human race because this human, while making
| interesting sounds, does nothing of value. You have to look at
| the bigger picture.
| lysecret wrote:
 | Hey, for a long time I was also very sceptical. However, I can
 | refer you to this paper for a really cool application:
 | https://www.youtube.com/watch?v=kP-dXK9JEhY. They basically
 | use clever GPT-3 prompting to create a dataset, which you then
 | train another model on. Besides, you can prompt these models to
 | get (depending on the use case) really good few-shot
 | performance. And finally, GitHub Copilot is another pretty neat
 | application.
| micro_cam wrote:
 | Actually, using this class of models (large transformer-based
 | language models) to generate text is, to me, the least
 | interesting use case.
 |
 | They can also all be adapted and fine-tuned for other tasks in
 | content classification, search, discovery, etc. Think facial
 | recognition for topics. Want to mine a whole social network for
 | anywhere people are talking about _______ even indirectly, with
 | a very low false negative rate? You want to fine-tune a
 | transformer model.
 |
 | BERT tends to get used for this more because it is freely
 | available, established, and not too expensive to fine-tune, but
 | I suspect this is what Microsoft licensing GPT-3 is all about.
| warning26 wrote:
| GPT-3 is fairly effective at summarization, so that's one
| potential business use case:
|
| https://sdtimes.com/monitor/using-gpt-3-for-root-cause-incid...
| Tijdreiziger wrote:
| https://replika.ai/
| amelius wrote:
| I hope that one day it will allow me to write down my thoughts
| in bullet-list form, and it will then produce beautiful prose
| from it.
|
| Of course this will be another blow for journalists, who rely
| on this skill for their income.
| DeathArrow wrote:
| I played with GPT-3 giving it long news stories. It actually
| replied with more meaningful titles than the journalists
| themselves used.
| rm_-rf_slash wrote:
| Perhaps GPT-3 was optimizing to deliver information while
| news sites these days optimize titles to get clicks.
| ailef wrote:
| You can prompt GPT-3 in ways that make it perform various tasks
| such as text classification, information extraction, etc...
| Basically you can force that "meaningless prose" into answers
| to your questions.
|
| You can use this instead of having to train a custom model for
| every specific task.
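The prompting idea above can be made concrete: GPT-3 is steered into text classification by framing the task as a few-shot prompt of labeled examples. The labels and example texts below are hypothetical, and the actual completion call is omitted; this only sketches the prompt shape.

```python
def build_classification_prompt(examples, text):
    """Format labeled examples followed by the new input, ending at 'Label:'."""
    lines = [f"Text: {sample}\nLabel: {label}\n" for sample, label in examples]
    lines.append(f"Text: {text}\nLabel:")
    return "\n".join(lines)

examples = [
    ("The battery died after two days.", "negative"),
    ("Shipping was fast and the fit is perfect.", "positive"),
]
prompt = build_classification_prompt(examples, "Great value for the price.")
print(prompt)
# The model's completion of the final "Label:" line is the predicted class.
```

The same prompt-shaping trick covers information extraction: replace the `Label:` lines with question/answer pairs over the text.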
| DeathArrow wrote:
 | Chatbots are one use. I think you might use it for customer
 | support.
|
| One example of GPT-3 powered chat bot:
| https://www.quickchat.ai/emerson
| jszymborski wrote:
| While the generation is fun and even suitable for some use
| cases, I'm particularly interested in its ability to take _in_
| language and use it for downstream tasks.
|
| A good example is DALL-E[0]. Now, what's interesting to me is
| the emerging idea of "prompt engineering" where once you spend
| long enough with a model, you're able to ask it for some pretty
| specific results.
|
| This gives us a foothold in creating interfaces whereby you can
| query things using natural language. It's not going to replace
| things like SQL tomorrow (or maybe ever?) but it certainly is
| promising.
|
| [0] https://openai.com/blog/dall-e/
| 13415 wrote:
| Automatic generation of positive fake customer reviews on
| Amazon, landing pages about topics that redirect to attack and
| ad sites, fake "journalism" with auto-generated articles mixed
| with genuine press releases and viral marketing content,
| generating fake user profiles and automated karma farming on
| social media sites, etc. etc.
| phone8675309 wrote:
| > fake "journalism" with auto-generated articles mixed with
| genuine press releases and viral marketing content
|
| How would you tell the difference from the real thing these
| days?
| DeathArrow wrote:
 | The state of journalism is so poor, I'd rather take some
 | AI-generated articles instead.
| akelly wrote:
| https://copy.ai/
| teaearlgraycold wrote:
| Hey, I work there! To be honest it's still very much a
| prototype. We have big plans for the next few months.
| mark_l_watson wrote:
| You can try it yourself - apply for a free API license from
| OpenAI. If you like to use Common Lisp or Clojure then I have
| examples in two of my books (you can download for free by
| setting the price to zero): https://leanpub.com/u/markwatson
| pyb wrote:
| I know of some credible developers who were struggling to get
| access, so YMMV
| mark_l_watson wrote:
| It took me over a month, so put in your request. Worth the
| effort!
| [deleted]
| moffkalast wrote:
| I put in a request months ago, I think they're not approving
| people anymore.
| mark_l_watson wrote:
 | You might try signing up directly for a paid (non-free)
 | account, if that is possible to do. I was using a free
| account, then switched to paying them. Individual API calls
| are very inexpensive.
| warning26 wrote:
 | Neat to see more models getting closer, though it appears only
 | one so far has exceeded GPT-3's 175B parameters.
 |
 | That said, what I'm really curious about is how those other
 | models stack up against GPT-3 in terms of performance -- does
 | anyone know of any comparisons?
| sillysaurusx wrote:
| I'm surprised that no one has answered for three hours!
|
 | The answer is at
 | https://github.com/kingoflolz/mesh-transformer-jax
|
| It has detailed comparisons and a full breakdown of the
| performance, courtesy of Eleuther.
| 6gvONxR4sf7o wrote:
| I was so frustrated when that was first announced because it
| didn't include those metrics, and everyone ate it up like the
| models were equivalent.
| 6gvONxR4sf7o wrote:
| Whenever I've seen language modeling metrics, GPT-3's largest
| model has been at the top. If you see a writeup that doesn't
| include accuracy-type metrics, you're reading a sales pitch,
| not an honest comparison.
| machiaweliczny wrote:
| There's Wu Dao 2.0 and Google has 2 models with 1T+ params.
| atty wrote:
 | For clarity, I believe these are all mixture-of-experts
 | models, where each input only sparsely activates some subset
 | of the full model. This is why they were able to make
 | such a big jump over the "dense" GPT-3. Not really an apples-
 | to-apples comparison.
| pyb wrote:
 | +1, does the new generation match or exceed GPT-3 in terms of
 | relevance? Is there a way for a non-AI-researcher to
 | understand how the benchmarks measure this? Bigger does not
 | mean better.
| GhettoComputers wrote:
| >However, the ability of people to build upon GPT-3 was hampered
| by one major factor: it was not publicly released. Instead,
| OpenAI opted to commercialize it and only provide access to it
| via a paid API. This made sense given OpenAI's for profit nature,
| but went against the common practice of AI researchers releasing
| AI models for others to build upon. So, since last year multiple
| organizations have worked towards creating their own version of
| GPT-3, and as I'll go over in this article at this point roughly
| half a dozen such gigantic GPT-3 esque models have been
| developed.
|
| Seems like aside from Eleuther.ai you can't use the models freely
| either, correct me if I'm wrong.
| andreyk wrote:
| I believe you are correct, at least for GPT-3 scaled things.
| Hopefully that'll change with time, though.
| rg111 wrote:
| The future is not as dark as it seems because of the rat race of
| megacorps.
|
| You can use reduced versions of language models with extremely
| good results.
|
 | I was involved in training the first-ever GPT-2 for the
 | Bengali language, but with 117 million parameters.
|
| It took a month's effort (training + writing code + setup) and
| about $6k in TPU cost, but Google Cloud covered it.
|
| Anyway, it is surprisingly good. We fine-tuned the model for
| several downstream tasks and we were shocked when we saw the
| quality of generated text.
|
| I fine-tuned this model to write Bengali poems with a dataset of
| just about 2k poems and ran the training for 20 minutes in GPU
| instance of Colab Pro.
|
| I was really blown away by the quality.
|
 | The main training was done in JAX, and it is much faster and
 | more seamless than PyTorch XLA, and much _better_ than
 | TensorFlow in every way.
|
| So, my point is, although everyone is talking hundreds of
| billions of parameters and millions in training cost, you can
| still derive practical value from language models, and that too,
| at a low cost.
| amelius wrote:
| > The future is not as dark as it seems because of the rat race
| of megacorps.
|
| Just wait until NVidia comes with a "Neural AppStore" and
| corresponding restrictions. Then wait until the other GPU
| manufacturers follow suit.
| rg111 wrote:
 | Much of the work done is fully open source and liberally
 | licensed.
|
| DeepMind and OpenAI have a bad rep in this regard.
|
| But a lot is available for free (as in beer _and_ speech).
|
| And most of the research papers are released in arXiv. It's
| very refreshing.
|
| The bottleneck is not the knowledge or code, but the compute.
| People are fighting this in innovative ways.
|
| I have been an inactive part of Neuropark that first demoed
| collaborative training. A bunch of folks (some of them close
| to laypeople) ran their free Colab instances and trained a
| huge model. You can even utilize a swarm of GT1030s or
| something like that.
|
 | Also, if you have shown signs of success, you are very likely
 | to have people willing to sponsor your compute needs, case in
 | point: Eleuther AI.
|
| The situation is far from ideal with this megacorps rat race
| [0], and NLP research being more and more inaccessible, but
| it is not completely dark.
|
 | [0]: I, along with many respected figures, tend to think that
 | this scaling-up approach is not even _useful_. We can
 | write good prose with GPT-3 nowadays that is, for all
 | intents and purposes, indistinguishable from text written by
 | humans. But we are far, far away from true _understanding_.
 | These models don't really _understand_ anything and are not
 | even "AI", so to speak.
|
| The Transformer architecture, the backbone of all these
| approaches- is too brute-force-y for my taste to be
| considered something that can mimic or, further- _be_
| intelligent.
| cmrajan wrote:
| Good to know. We're trying to attempt something similar[1] but
| for Tamil. I'm also surprised how well the OSS language model &
| library AI4Bharat [2] performs for NLP tasks against SoTA
| systems. Is there a way to contact you? [1]
| https://vpt.ai/posts/about-us/ [2]
| https://ai4bharat.org/projects/
| rg111 wrote:
 | Between a master's degree, a consultancy gig, personal
 | research and study, and finding unis abroad, I am living a
 | hectic life.
|
| I don't see how I can be of help.
|
| But I can talk. Leave me something through which I can reach
| you. And I will reach you within a week.
| xyproto wrote:
| I think companies should be banned from having "Open" in their
| names.
| evergrande wrote:
| OpenAI takes the Orwellian cake.
| Tenoke wrote:
| I hear a lot of low effort takes about OpenAI but how exactly
| is providing your service via a paid API the "Orwellian
| cake"? Is this really the most (or even at all) Orwellian
| practice for you?
| leoxv wrote:
| https://en.wikipedia.org/wiki/Doublespeak
| TulliusCicero wrote:
| I think it's more the contrast where they claim, via their
| name, to be open, but actually aren't.
|
| If their name was ProfitableAI, there'd probably be fewer
| complaints.
| c7DJTLrn wrote:
| "Open"AI but you can only use it how we want you to and no, you
| can't run it yourself.
| moffkalast wrote:
| The only thing open in OpenAI is your wallet.
| moffkalast wrote:
| > most recently NVIDIA and Microsoft teamed up to create the 530
| billion parameter model Megatron-Turing NLG
|
| Get it, cause it's a generative transformer? Hah
| DeathArrow wrote:
 | People were blaming cryptocurrency miners for the prices of
 | GPUs, when in fact it was the AI researchers who bought all the
 | GPUs. :D
 |
 | I wonder: what if somebody designed an electronic currency
 | awarded as payment for general GPU computations instead of just
 | computing hashes? You pay some $ to train your model and the
 | miner gets some coins.
 |
 | Everyone is happy, electricity is not wasted, and the GPUs get
 | used for a reasonable purpose.
| nathias wrote:
 | Yes, this is an old idea (which I really like) but it hasn't
 | really taken off yet. GridCoin was one example, where you
 | solved BOINC problems; or RLC, which is for more general
 | computation.
| rewq4321 wrote:
| The problem is that, currently, large ML models need to be
| trained on clusters of tightly-connected GPUs/accelerators.
| So it's kinda useless having a bunch of GPUs spread all over
| the world with huge latency and low bandwidth between them.
| That may change though - there are people working on it:
| https://github.com/learning-at-home/hivemind
| Kiro wrote:
| It hasn't taken off because it doesn't work. PoW only works
| for things that are hard to calculate but easy to verify. Any
| meaningful result is equally hard to verify.
| snovv_crash wrote:
| It's easy to verify ML training - inference on a test set
| has lower error than it did before.
|
| Training NN ML is much slower than inference (1000x at
| least) because it has to calculate all of the gradients.
| petters wrote:
| > Any meaningful result is equally hard to verify.
|
 | This is very much not true. A central class in complexity
 | theory is NP, whose problems are hard to answer but easy to
 | verify if the answer is yes.
|
| E.g. is there a path visiting all nodes in this graph of
| length less than 243000? Hard to answer but easy to check
| any proposed answer.
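The path question above illustrates the asymmetry: checking a proposed answer is cheap even when finding one is hard. Given edge weights, a claimed path can be verified in time linear in its length. The graph below is a made-up example.

```python
def verify_path(edges, nodes, path, bound):
    """True if `path` visits every node and its total weight is below `bound`."""
    if set(path) != set(nodes):
        return False
    total = 0
    for a, b in zip(path, path[1:]):
        if (a, b) not in edges and (b, a) not in edges:
            return False  # claimed step uses a non-existent edge
        total += edges.get((a, b), edges.get((b, a), 0))
    return total < bound

edges = {("A", "B"): 3, ("B", "C"): 4, ("C", "D"): 2, ("A", "C"): 10}
nodes = ["A", "B", "C", "D"]
print(verify_path(edges, nodes, ["A", "B", "C", "D"], 10))  # → True (3+4+2 = 9)
```

Finding such a path in a large graph requires search over exponentially many candidates; this check is a single pass.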
| PeterisP wrote:
| The current way of training is efficient when compute is
| located in a single place and is colocated with large
| quantities of training data. Distributing small parts of
| computation to remote computers is theoretically possible (and
| an active direction of research) but currently not preferable
| nor widely used; you really need very high bandwidth between
| all the nodes to constantly synchronize the hundreds-of-
| gigabytes sized weights they're iterating on and the resulting
| gradients.
| bckr wrote:
| This may not be true in the future. There is some work being
| done on distributed neural net training. I can't recall the
| name of the technique at the moment, but a paper came out in
| the last year showing results comparable with backprop that
| only required local communication of information (whatever
| this technique's alternative to gradients is).
| evergrande wrote:
| First the electricity morality police came for crypto and I
| said nothing.
|
| Then they came for AI, video games, HN...
| Nextgrid wrote:
| My understanding is that proof-of-work is intentionally
| wasteful; the objective is to make 51% attacks (where a single
| entity controls at least 51% of the global hashrate) infeasible
| by attaching a cost to the mining process.
|
| Making the mining process produce useful output that can be
| resold nullifies the purpose as it means an attacker can now
| mine "for free" as a byproduct of doing general-purpose
| computations (as opposed to tying up dedicated hardware),
| lowering the barrier for a 51% attack dramatically.
| magikabula wrote:
 | If everyone offers GPUs, it's the same game. If I buy more
 | GPUs I will get more money, so the average payment for a person
 | with a single GPU or a small bunch of them will be low.
 |
 | And second, the principles of electronic currency are different
 | from gold/money. That's why crypto uses GPUs ;)
| qwertox wrote:
| If I were to run GPT-3 on my 70000 browser bookmarks, what kind
| of insights could I get from that?
|
| Only by analyzing the page title (from the bookmark, not by
| re-fetching the URL) and possibly also the domain name.
| supermatt wrote:
| GPT-3 is a text generator, so I doubt you would get anything of
| use. You can't even supply such a large input to GPT-3.
| teaearlgraycold wrote:
| GPT-3 is also a classifier and data-extractor.
|
| You could give it a couple dozen bookmarks with example
| classifications and then feed it a new bookmark and ask GPT-3
| what category the page belongs in. Repeat for the entire data
| set.
|
| For data extraction you could ask questions about the titles.
| Maybe have it list all machine-learning model names that
| appear in the set of bookmark titles.
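The few-shot classification workflow described above amounts to prompt construction. The categories and example titles below are made up for illustration; the resulting string would be sent to a completion API such as OpenAI's:

```python
# Hedged sketch of few-shot classification: build a prompt from a few
# labeled bookmark titles, then append the new title and let the model
# complete the missing category. Labels and titles here are invented.

EXAMPLES = [
    ("Attention Is All You Need", "machine learning"),
    ("The Rust Borrow Checker Explained", "programming"),
    ("Sourdough Starter Basics", "cooking"),
]

def build_prompt(new_title: str) -> str:
    lines = ["Classify each bookmark title into a category.", ""]
    for title, label in EXAMPLES:
        lines.append(f"Title: {title}\nCategory: {label}\n")
    lines.append(f"Title: {new_title}\nCategory:")
    return "\n".join(lines)

print(build_prompt("GPT-3 is no longer the only game in town"))
```

Repeating this per bookmark, as the comment suggests, classifies the whole set one title at a time.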
| keithalewis wrote:
| print gpt.submit_request("Give me insights")
|
| >>> You are spending way too much time browsing.
| air7 wrote:
| So is there any one of them that I could play around with?
| sillysaurusx wrote:
| https://6b.eleuther.ai
| lazylion2 wrote:
| AI21 labs 178B parameter model
|
| https://studio.ai21.com/
| ComodoHacker wrote:
| Are we heading to the (distant) future where to make progress in
| any field you _have_ to spend big $$$ to train a model?
| jowday wrote:
| That's not even distant - most of the self-supervised vision
| and language models at the bleeding edge of the field require
| huge compute budgets to train.
| iamcurious wrote:
| We are already there. Machine learning is the flavor of A.I.
| that keeps business barriers of entry high. If we had invested
| in symbolic A.I., things would be different. A similar thing
| happens with programming language flavors. PHP lowers barriers
| of entry so it is discredited by the incumbents.
| j45 wrote:
| Your point about incumbents not wanting it to be easier to
| create beginners in a language or technology is very
| understated.
|
| Excluding participation in having the time and resources
| available to overcome the initial inertia required to become
| productive is a form of opportunity and earning segregation.
|
| Despite having a background in your tech, there is little
| more if satisfying than people experiencing putting tech to
| work for them, rather than the other way around or being
| dependent on others.
| [deleted]
| lostdog wrote:
| The difference between ML and symbolic AI is that ML works
| and symbolic AI doesn't. At my job, dropping the
| computational load of our ML models is heavily invested in,
| and every success is celebrated. Everybody wants it to be
| easier and cheaper to train high quality models, but some
| things are still intrinsically hard.
| CodeGlitch wrote:
| > The difference between ML and symbolic AI is that ML
| works and symbolic AI doesn't.
|
| IBM managed to beat Garry Kasparov using symbolic AI,
| did they not? So in what way does it not work?
| lostdog wrote:
| > IBM managed to beat Garry Kasparov using symbolic AI,
| did they not? So in what way does it not work?
|
| Ok, I should be clearer. ML approaches are way way better
| than symbolic approaches. Given almost any problem, it is
| much much easier to make an ML approach work than any
| symbolic approach.
|
| Yes, chess was first conquered symbolically, but it has
| since been handled better and more easily by ML, to the
| point that Stockfish now incorporates neural nets [1]. ML has
| also given extremely high levels of performance on Go,
| Starcraft, DoTA, and on protein folding, image
| recognition, text processing, speech recognition, and
| pretty much everything else.
|
| I would challenge you to name any (non-simple) problem
| where traditional AI methods are still state of the art.
|
| [1] https://stockfishchess.org/blog/2020/stockfish-12/
| goodside wrote:
| "I would challenge you to name any (non-simple) problem
| where traditional AI methods are still state of the art."
|
| Lossless file compression. As far as I know none of the
| algorithms in widespread use are neural-based, despite
| the fact that compression is clearly a rich statistical
| modeling problem, at least on par with GPT-3-style
| language understanding in difficulty. There are published
| attempts to solve the problem with neural networks, but
| they simply don't work well enough to date. Modern
| solutions also still use old-fashioned AI ingredients
| like compiled dictionaries of common natural-language
| words -- any other domain where nat-lang dictionaries are
| useful has been conquered by neural solutions, e.g.
| spelling and grammar checkers.
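The "compression is statistical modeling" point is easy to illustrate with a general-purpose compressor: statistically predictable input shrinks dramatically, while unpredictable input does not:

```python
# Lossless compression exploits statistical structure: the more predictable
# the input, the smaller the output.
import os
import zlib

english = b"the quick brown fox jumps over the lazy dog " * 100
random_bytes = os.urandom(len(english))

# Highly repetitive text compresses to a small fraction of its size...
print(len(zlib.compress(english)) < len(english) // 10)      # True
# ...while incompressible random bytes stay roughly the same size.
print(len(zlib.compress(random_bytes)) > len(english) // 2)  # True
```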
| _game_of_life wrote:
| I'm far from an expert in this subject but doesn't this
| ranking of large text compression algorithms with NNCP
| coming first suggest that neural-nets are pretty great at
| compression?
|
| http://mattmahoney.net/dc/text.html
|
| https://bellard.org/nncp/
|
| I don't see examples of high performing symbolic AI based
| compression algorithms anywhere, but again I am very
| ignorant, do you have examples?
| CodeGlitch wrote:
| Thanks for clearing that up, I do agree that ML-based AI
| has surpassed symbolic approaches in every field.
| adgjlsfhk1 wrote:
| They didn't. That was just alpha-beta search with some
| custom hardware to speed it up. Also, at this point both
| of the strongest chess AIs (Stockfish and lc0) are using
| neural networks and sit roughly 1000 Elo above where Deep
| Blue was (and most of that gain is from software, not
| hardware).
| shmageggy wrote:
| > _just alpha beta search_
|
| I will cling to these goal posts every time. Search was
| and still is AI, unless you think Russell and Norvig
| should have named the field's foundational textbook
| something other than "Artificial Intelligence: A Modern
| Approach"
| PeterisP wrote:
| 1. There's a world of problems (such as "perception-
| related" ones, e.g. vision and NLP) which we tried to solve
| for decades with symbolic AI, getting worse results than
| what first-year students can nowadays produce as homework
| with ML;
|
| 2. For your example of chess, for some time now ML-based
| engines have been pretty much untouchable by engines built
| on pre-ML methods.
| CodeGlitch wrote:
| Yes I agree with all your points - I was however
| responding to the point being made that symbolic AI
| "wasn't useful"...which in the past it was. Perhaps in
| the future some new method or breakthrough will mean it
| becomes useful once again?
| panabee wrote:
| This is a great point.
|
| Much like deep learning, which was invented decades ago
| but didn't become feasible until technology caught up,
| could the same be true for symbolic AI?
|
| I.e., is the ceiling for symbolic AI technical and
| transient, or fundamental and permanent?
| PeterisP wrote:
| My feeling is that even in our own thinking symbols are
| used mostly to communicate our (inherently non-symbolic)
| thoughts to others or record them; i.e. they are a
| solution to a bandwidth-limited transfer of information
| while the actual thinking process happens with concepts
| that have more similarity to collections of vague
| parameters and associations which can be compressed to
| symbols only imperfectly with losses.
|
| From that perspective, I don't see how symbolic AI would
| be competitive on its own. There could still be a role for
| it in designing systems that are comprehensible to humans,
| but perhaps just as a distillation/compression output from
| a non-symbolic system: have a strong "black box" ML system
| learn to solve a task, then have it construct a symbolic
| system that solves the task worse, but in an explainable
| way.
| iamcurious wrote:
| >The difference between ML and symbolic AI is that ML works
| and symbolic AI doesn't.
|
| There was a point when it was the other way around; this is
| not static, but the result of where resources are poured. The
| data heavy, computational heavy, black box style of ML
| gives power to large business over small business. So it's
| seen as a safer bet than symbolic A.I. This in turn makes
| it work better, which makes it an even safer bet. Notice
| that startups dream of being big business so they still
| pick ML.
|
| Also notice that in some domains ML is still behind
| symbolic A.I., for instance a lot of robotics and
| autonomous vehicles.
| DonHopkins wrote:
| PHP wasn't discredited by the incumbents. It was discredited
| by its creator.
|
| "I'm not a real programmer. I throw together things until it
| works then I move on. The real programmers will say Yeah it
| works but you're leaking memory everywhere. Perhaps we should
| fix that. I'll just restart Apache every 10 requests."
| -Rasmus Lerdorf
|
| "I was really, really bad at writing parsers. I still am
| really bad at writing parsers." -Rasmus Lerdorf
|
| "We have things like protected properties. We have abstract
| methods. We have all this stuff that your computer science
| teacher told you you should be using. I don't care about this
| crap at all." -Rasmus Lerdorf
| iamcurious wrote:
| To most programmers that doesn't discredit PHP at all. He
| cares about a working product, much like 90% of
| programmers, who don't have the privilege to worry about
| theory. They just need an e-commerce site, or a blog, or
| whatever, running ASAP. To use pg's analogy, they are there
| to paint, not to worry about paint chemistry.
|
| The incumbents do discredit PHP though. For instance,
| Facebook was built on PHP, and still runs on it. They used
| the language of personal home pages to give every person on
| the planet a personal home page. Nevertheless, once they
| succeeded they forked PHP under a new name and isolated its
| devs culturally.
| YetAnotherNick wrote:
| Training models is getting cheaper. GPT-3 is one of the
| very few examples where training is this expensive. In the
| end it all depends on the size of the data you have, since
| that determines how far you can scale up the model without
| overfitting. And internet text is one of the only datasets
| this big.
| minimaxir wrote:
| Fortunately, costs for training superlarge models are coming
| down rapidly thanks to TPUs (which was the approach used to
| train GPT-J 6B) and DeepSpeed improvements.
| Nextgrid wrote:
| Are there any TPUs that can be purchased off-the-shelf and
| then owned, like you can do with a CPU or GPU? Or are you
| just limited to paying rent to cloud providers and ultimately
| being at their mercy when it comes to pricing, ToS, etc?
| 6gvONxR4sf7o wrote:
| No, but you probably aren't going to buy an A100 either, so
| it's a moot point.
___________________________________________________________________
(page generated 2021-11-07 23:00 UTC)