[HN Gopher] Copy is all you need
___________________________________________________________________
Copy is all you need
Author : mottiden
Score : 127 points
Date : 2023-07-17 13:51 UTC (9 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| woeirua wrote:
| The big advantage here would be the ability to attribute entire
| blocks of text back to a specific source and cross domains just
| by building a database of embeddings. The downside is that these
  | networks are probably not as creative, since they're limited to
  | data that's already available. It might work best to use
  | something like this as an expert system for a GPT-like agent to
  | refer to when needed.
| soliton4 wrote:
  | this made me think of a fun activity: ask chatgpt to come up with
  | a new word and then google that word. sometimes the word exists
  | in the context of a sci-fi show or a plant. sometimes gpt just
  | added a "se" or "us" to existing words. sometimes it changed a Z
  | to a C, but it never actually came up with a new word.
| jsight wrote:
| I asked it this:
|
    | "Set your model temperature as high as possible and generate a
| completely new and random word"
|
    | It acted like it understood and generated the word
| Blazivox. I don't see it on Google at least.
| fsmv wrote:
| It cannot change the temperature intrinsically. Only OpenAI
| controls that in their API.
| vanjajaja1 wrote:
| but it does know the concept, so it can simulate it
| jojobaskins wrote:
      | BlazeVox is a publishing company. I guess it's still one
      | character away, but close enough that it could have just
      | randomly swapped out the character.
| MAXPOOL wrote:
| What about LLM reasoning ability?
|
| Faith and Fate: Limits of Transformers on Compositionality
| https://arxiv.org/abs/2305.18654
|
| Transformers solve compositional reasoning tasks by reducing
| multi-step compositional reasoning into linearized subgraph
  | matching, without genuine problem-solving skills. They can solve
  | problems when they have the reasoning graphs in memory.
| BSEdlMMldESB wrote:
    | I think this boils down to the capacity to match parentheses
    | in a logical-syntax way.
|
| however, the "parenthesis" can be any symbol. even grammatical
| clauses are one sort of "parenthesis" in the way I'm thinking
| about them
| abc_lisper wrote:
| Funny, I write Clojure for my day job and fun, so I have
| tried to use ChatGPT to generate code. If anything, it sucks
| at paren matching. It reminded me of stable diffusion's "six
| finger problem".
| BSEdlMMldESB wrote:
| as I said, it's not exactly "parenthesis" with the
        | strictness that real programming needs.
|
        | in fact, my whole idea has got me on a deep dive into the
        | nature of the decimal point (to what extent is the decimal
        | point representation of numbers an instance of a "fixed
        | point"? I don't know! I cannot understand a fixed point
        | just yet; and for me to say I get decimal notation actually
        | means I understand something about p-adic representation,
        | which I'm still working on figuring out)
|
| I thought these models got more 'logical' after training
| with computer code
| esjeon wrote:
  | LLMs do logic by mimicking logical structures at the text level
  | (which is why they often need to be told to work step-by-step
  | for correct answers), so this one may also have the same
  | ability as long as memories are properly utilized.
| rapatel0 wrote:
| Surprised no one has mentioned the obvious issue: plagiarism
|
| (Not sure if the authors have indicated any method for
| attribution of the original data)
| opnac wrote:
| I wish we could stop with the "X is all you need" papers! The
| first one was unintuitive and so are the rest.
| hardware2win wrote:
| "is all you need" is considered harmful just like "considered
| harmful" is considered harmful by HNers?
| MR4D wrote:
| Given that redundancy is considered harmful, you probably
| want to create a ConsideredHarmful class and then a
| ConsideredHarmfulFactory to make for a more enterprise-ready
| structure.
|
| :)
| hardware2win wrote:
        | That's so 2015, we need to move it to a ConsideredHarmful
        | microservice
| furyofantares wrote:
| I agree. X is all you need considered harmful
| butterisgood wrote:
| X is all you need considered harmful is all you need!
| mottiden wrote:
| I agree. The paper is really interesting, but the title not so
| much :)
| jillesvangurp wrote:
      | A bit clickbaity at least. And without opening it you have
      | no chance of understanding what this is about. I know HN has
      | a policy against editorializing, but in this case a brief
      | summary would have been helpful.
| mottiden wrote:
| The paper introduces a new method for text generation,
| named Copy-Over-Generate (COG), which differs from
| traditional approaches that generate words from a fixed
| vocabulary. Instead, COG progressively copies phrases from
| a massive text collection, aiming to generate coherent text
| continuations through multiple rounds of phrase retrieval.
|
        | COG stands in the line of retrieval-augmented text
        | generation research, but takes a radical step forward.
| Unlike previous work that combines retrieval and
| generation, in COG, retrieval is generation.
|
| COG shares some ideas with previous work such as replacing
| the fixed vocabulary with a nonparametric phrase table.
|
| The paper presents experimental results showing the
| advantages of COG over strong baselines in three
| experimental settings: standard language modeling (using
| the WikiText-103 dataset), domain adaptation (using the
| Law-MT dataset), and an enlarged phrase index (using the
| En-Wiki dataset).
|
| Despite the promising results, the authors acknowledge that
| there are some flaws in the COG method. For example, COG
| may copy a phrase that is incoherent with the previously
| copied phrase, or it may only copy a part of a complete
| phrase, leading to inaccurate generation results.
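The copy-as-generation idea in the summary above can be sketched in a few lines. This is only a toy illustration, not the paper's architecture: COG uses trained BERT-style phrase encoders over a massive phrase index, while this sketch uses character-bigram counts as a stand-in embedding and a six-entry phrase table.

```python
from collections import Counter

# Toy phrase table; COG builds its index over a massive corpus.
PHRASE_TABLE = [
    "the cat sat", "on the mat", "and then", "it fell asleep",
    "in the sun", "the dog barked",
]

def embed(text):
    # Character-bigram counts as a stand-in for a learned encoder.
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def generate(prefix, steps=3):
    out = prefix
    for _ in range(steps):
        # "Generation" is retrieval: pick the phrase whose embedding
        # best matches the recent context, then copy it verbatim.
        ctx = embed(out[-20:])
        best = max(PHRASE_TABLE, key=lambda p: cosine(ctx, embed(p)))
        out += " " + best
    return out
```

Every continuation is by construction a verbatim span from the phrase table, which is what makes attribution trivial and is also why the output can be incoherent, as the summary's last paragraph notes.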
| CamperBob2 wrote:
| Before long, if these NN refinements continue at their
| current pace, it's going to become impossible to tell
| synthetic HN posts from organic ones. Going to get weird.
| flangola7 wrote:
| This is already the case with LLaMA.
| lolinder wrote:
| I agree that the copycats are wearing thin, but the original
| paper's title seems fine to me. It's an accurate description of
| the breakthrough they made. The first few sentences of the
| _Attention is All You Need_ abstract explain it pretty well:
|
| > The dominant sequence transduction models are based on
| complex recurrent or convolutional neural networks that include
| an encoder and a decoder. The best performing models also
| connect the encoder and decoder through an attention mechanism.
| We propose a new simple network architecture, the Transformer,
| based solely on attention mechanisms, dispensing with
| recurrence and convolutions entirely.
|
| https://proceedings.neurips.cc/paper/2017/file/3f5ee243547de...
| macleginn wrote:
| The first sentence seems to imply that the main thing they
| want to do away with is the encoder-decoder arch, which they
| actually kept. (But BERT and GPT later did manage to simplify
| it.)
| jsight wrote:
| It took a lot of my attention to even begin to understand
| that, but in the end, I agree with you.
| KnobbleMcKnees wrote:
| But I'm so close to finishing "goto is all you need"!
| nonameiguess wrote:
| Didn't somebody prove the mov instruction is Turing-complete
| all by itself? I believe some code obfuscators actually take
| advantage of this.
| lotsofspots wrote:
| Beat me to it, I was going to go for "X Is All You Need
| Papers Considered Harmful"
| ot wrote:
| At least in this case it's self-referential (not sure if
| intended)
| rubslopes wrote:
    | There's this guy on Twitter fighting against titles like that:
| https://twitter.com/SayWhatYouFound
| asalahli wrote:
      | Huh, did Twitter stop letting unauthenticated users read
      | tweets? I can't get past the login form.
| lolinder wrote:
| You're one of today's lucky 10k:
| https://news.ycombinator.com/item?id=36540957
| VHRanger wrote:
  | This resonates with the current AI-skeptic view that language
  | models are a supercharged search engine over the pile of text
  | they're trained on.
  |
  | It also highlights the fact that evaluating language models is
  | difficult, and that we tend to end up with models that game the
  | evaluation benchmarks.
| wongarsu wrote:
    | Good information retrieval is a problem we have been trying to
    | solve for thousands of years, so even if that's all LLMs are
    | doing, it's still a great achievement.
|
| Of course a more explicit approach like this paper is a really
| good step in that direction by making it easier to trace
| information provenance. It might still be nontrivial to answer
| why the model selected this specific piece of information, and
| why it was composed in this specific way, but it seems trivial
| to say where the model got the information from. Which is
| really all we demand from humans too.
| croes wrote:
      | Was the training data quality-checked? If so, then LLMs are
      | search engines over a curated catalog, like Yahoo once was,
      | rather than search engines over SEO-optimized click farms.
|
| Google search once was great too but then ads and SEO killed
| it.
| bob1029 wrote:
| > Good information retrieval is a problem we are trying to
| solve for thousands of years
|
| Quantum computers would have something to say about this,
| assuming they ever materialize.
| msoad wrote:
  | Obvious immediate question: is it as creative? There's a lot of
  | creativity left behind when you increase the token size (let's
  | be real, that's all this is). As an example, creating a new word
  | like "dickstracted"[1] would never happen in this model.
|
| [1] https://www.urbandictionary.com/define.php?term=Dickstracted
| 3cats-in-a-coat wrote:
    | Why wouldn't it be? It suggests it copies text spans; it
    | doesn't say how big.
| naillo wrote:
  | Seems like a common pattern: state-of-the-art models being
  | replaced by an information retrieval layer (top 10 results) fed
  | into a much lighter model that does something with that plus the
  | original input. Cool result!
| twic wrote:
| This is definitely my bet on where things are going. And not
    | just this particular example - I believe we will identify many
| recurring submodules and patterns in neural networks that can
| be extracted into conventional code, leaving a lightweight
| neural glue layer orchestrating them. This should be more
| efficient, faster to train, more interpretable, and more
| reliable, so better for users. But less mysterious, so worse
| for VCs.
| redox99 wrote:
| I don't know. ChatGPT and Bing both dramatically deteriorate if
| you allow them to search the web.
|
| And systems that allow you to "talk" to a PDF via top results
| of vector search being added to the prompt are also pretty
| underwhelming.
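The "talk to a PDF" pattern described above is, at its core, a small pipeline: chunk the document, rank chunks against the question, and paste the top hits into the prompt. A minimal sketch with word overlap standing in for dense-vector search; the chunk size, scorer, and prompt template are illustrative choices, not any particular product's implementation.

```python
# Hypothetical "chat with a document" pipeline sketch.

def chunk(text, size=200):
    # Split the document into fixed-size word windows.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(question, passage):
    # Word overlap as a stand-in for dense-vector similarity.
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def build_prompt(document, question, k=3):
    # Paste the k best-matching chunks into the prompt, which is
    # all the "vector search into the prompt" systems really do.
    top = sorted(chunk(document), key=lambda c: score(question, c),
                 reverse=True)[:k]
    context = "\n---\n".join(top)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The underwhelming results are easy to see from the sketch: if the relevant passage doesn't rank in the top k, the model never sees it at all.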
| falcor84 wrote:
| Yeah, that actually sounds amazing to me. If we could limit the
| LLM to somehow only act as a "reasoning" rather than a
| "knowledge" layer, such that all the non-trivial domain
| knowledge has to come from the information retrieval layer, in
| a fully referenced way, that could potentially "solve" the
| hallucination problem, no?
|
| Even more than that, I wonder if we could then apply something
| like this to power some sort of "fact provenance" for the web
| as a whole, e.g. by populating Wikidata with referenced facts
| (preferably with extensive human QA).
| esjeon wrote:
| Yeah, and, on top of that, I think this can lead to smaller
| (and snappier) agent models, because we no longer have to
| encode every single piece of information into models. As we
| carve out more and more parameters and input data, AI
| development will get more accessible, and we'll get more
| novel applications. (I'm certainly dreaming here.)
| spacemanspiff01 wrote:
| [dead]
| awestroke wrote:
| First, hate the title
|
| Second, this approach seems equivalent to using larger tokens,
| which means the problems with using tokens instead of letters are
| just exacerbated
| xg15 wrote:
| I think the "... is all you need" title here is particularly
| misleading as the paper does in fact use a BERT model for
| generating the vectors.
|
| So if the implication was that no language model was needed at
| all and you can just do nearest neighbour on string similarity
| and patch results together, that implication was clearly wrong.
|
| I think what the paper does show though is that there are methods
| that can make language models topic-specific without fine-tuning
| and that yield competitive results even with older models.
| moffkalast wrote:
| Next thing you'll say the Beatles are being misleading with
| 'All you need is love' because people also need food and
| shelter.
| xg15 wrote:
| Eh, the "attention is all you need" paper was kinda arguing
| that. And this paper doesn't.
| Zacharias030 wrote:
| mmd! <3
| usgroup wrote:
  | Yeah, I thought the same -- at first blush it struck me as some
  | kind of super-simple architecture that didn't use transformers,
  | and then in the diagram I saw they used BERT to produce the
  | embeddings!
| amluto wrote:
| This is interesting coming on the heels of the gzip-based
  | inference paper. gzip is based on LZ77, and the LZ family of
  | compressors generates and stores (and cleverly encodes)
  | instructions to copy blocks of text it has seen before to its
  | output.
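To make the LZ77 connection concrete: the LZ family emits literals plus (distance, length) instructions meaning "copy a span you have already output", which is exactly a copy-based generator over its own history. A toy version with a naive match search and no Huffman coding, unlike real gzip:

```python
# Toy LZ77: output is a token stream of literals and
# ("copy", distance, length) instructions.

def compress(data, window=64):
    out, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for dist in range(1, min(i, window) + 1):
            length = 0
            # Overlapping matches are allowed, as in real LZ77.
            while (i + length < len(data)
                   and data[i + length - dist] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, dist
        if best_len >= 3:  # a copy token only pays off if long enough
            out.append(("copy", best_dist, best_len))
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out

def decompress(tokens):
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # copy from already-emitted text
    return "".join(out)
```

The decompressor is literally a copy-only text generator, which is what makes the parallel to this paper so striking.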
| collinc777 wrote:
| Slight tangent:
|
  | I once worked with a programmer who, the vast majority of the
  | time, would only input text into a text editor via copy and
  | paste.
|
  | Think anti-vim. His fingers were locked on mouse and ctrl+c/v. It
| was incredible to watch and his programming speed was very
| impressive.
| michaelcampbell wrote:
    | I (used to) work with a colleague who was just the opposite;
    | she did (and still does) ONLY copy/paste with the mouse. It
    | is excruciating to watch when pairing or on a video meeting.
    |
    | I get that people have different workflows, but not taking
    | advantage of even the most minimal functionality of one's
    | tools is something I think I will never understand.
| willsmith72 wrote:
| Please tell me more. Where was he copying from? What about
| formatting and refactors? Was his quality as impressive as his
| speed?
| jaredsohn wrote:
| programmers he subcontracted to.
|
| Also explains why he was so fast
| tylercrompton wrote:
| Stack Overflow, surely
| collinc777 wrote:
      | Generally the repository he was working in, but really it was
      | any application that he had open on his machine. He would
      | remember where the words, or portions of words, that he
      | needed were, go to them, and copy and paste what he needed.
      |
      | Just in case you're thinking this: he was not copying large
      | portions of code from Stack Overflow or anything like that.
      | He was writing code line by line, a few copy-and-pastes at a
      | time. Often he would copy and paste single characters to
      | maintain his flow.
| postalrat wrote:
| Start with a file like: abcde...ABCD..1234.{}()*.... and go
| from there.
| ourmandave wrote:
| Voice dictation, Shirley.
| esafak wrote:
| Don't call me Shirley!
| high_priest wrote:
| OpenAI, surely.
| klysm wrote:
    | Over the years I've come to realize that copy-paste has
    | probably been a net negative for me, and I almost never do it
    | anymore. If you copy-paste and then change a couple of names
    | by hand to match a different pattern, the subtle errors you
    | can introduce by making a mistake can take forever to catch.
    | In code review it always looks plausible unless the reviewer
    | is _very_ careful. Furthermore, it means you are duplicating
    | code - which is sometimes totally fine - but forcing yourself
    | not to copy-paste makes you consider what the abstraction
    | would be and whether it would be worth it.
    |
    | And in the case where you are copy-pasting code you don't
    | really understand, retyping it gives you time to understand it
    | and maybe catch existing bugs in the code you are copying.
| rubslopes wrote:
| A tangent of a tangent:
|
    | My way of doing imperative coding for data science with Python
    | is to write a piece of code in Sublime Text, copy and paste it
    | into iTerm, run it, and get back to the editor. But of course
    | I mapped shift+Enter to do all of that for me. I much prefer
    | this setup to Jupyter Notebooks.
| webnrrd2k wrote:
      | I program in almost all of my script-y languages, like
      | Python, in a similar way - I'm on Linux, and I edit
      | everything in Sublime, save it to a file, and then run it in
      | a separate terminal. If the command gets complex I'll create
      | a little bash script file, make it executable, and run that.
      |
      | I just alt-tab from editor to terminal, check the output,
      | etc., and back. That way I have a bunch of unix text-
      | processing tools (grep, sed, etc...) always available. I'm
      | too reliant on print-debugging things as I go along, but
      | it's a deeply ingrained habit.
| bombela wrote:
        | The tool "entr" (and similar ones like "cargo watch") is
        | so useful there. Save the file and it runs the command.
        | For extra credit I mapped shift-enter to save the current
        | file in my editor.
| klyrs wrote:
| Many years ago when I used windows, I had a virus once that
| killed my keyboard. It was pretty fun to work around that... I
| ended up using the symbol browser utility to copy individual
| letters and then right click to paste them places.
| 3cats-in-a-coat wrote:
    | Wait, I need you to please elaborate on this. Where was he
    | sourcing all the code he pasted? Did he have a "snippet file"
    | like a painter with a palette or something?
| collinc777 wrote:
      | Typically from the code repository he was working in, from
      | other files that shared a similar pattern or context.
      |
      | Sometimes he would need a portion of a word and would
      | remember that it was in an email he had open, so he would
      | alt+tab over, grab the portion of the word from the email,
      | then alt+tab back to the editor and paste it in.
      |
      | He would go to extreme lengths to avoid moving his hands to
      | the home row on the keyboard.
| bombela wrote:
        | I am a bit dyslexic, and even though I am an avid vim
        | user, I often rely on copy-paste to avoid silly typos that
        | cost me so much time to fix, because I cannot even see the
        | typo, even in the compiler output!
| [deleted]
| thanatropism wrote:
| > COG
|
| https://wiki.opencog.org/w/The_Open_Cognition_Project
| xianshou wrote:
| Behold, the _true_ stochastic parrot.
| twic wrote:
| Auto-dadaism: https://en.wikipedia.org/wiki/Cut-up_technique
| js8 wrote:
| I remember that around 2004, before convnets became popular,
| there was a paper on image texture style transfer using
| approximate nearest neighbors based on some neighborhood of each
| point. This technique seems similar but for text.
| Animats wrote:
| This approach can probably handle most of the queries search
| engines and Siri-type chatbots handle. The big GPT-type engines
| can be reserved for the hard problems. Something along those
| lines is needed to keep the cost of search down. There's an
| estimate that using a large language model for search is 10x more
| expensive than existing search engines. Yet few queries really
| need that much heavy machinery.
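The routing idea above can be sketched as a confidence gate: answer cheap queries straight from a retrieval index and escalate to an expensive model only when retrieval confidence is low. Everything here is hypothetical — the index, the threshold, and `expensive_llm` are stand-ins, not any real system's API.

```python
# Hypothetical query router: try cheap retrieval first, escalate to
# a large model only when no indexed answer scores above a threshold.
FAQ_INDEX = {
    "what time is it in tokyo": "Tokyo is UTC+9.",
    "how tall is mount everest": "8,849 meters.",
}

def retrieval_score(query, key):
    # Word-overlap (Jaccard) score; a real system would use embeddings.
    q, k = set(query.lower().split()), set(key.lower().split())
    return len(q & k) / max(len(q | k), 1)

def expensive_llm(query):
    # Stand-in for a large-model call reserved for hard queries.
    return f"[LLM answer for: {query}]"

def route(query, threshold=0.6):
    best_key = max(FAQ_INDEX, key=lambda k: retrieval_score(query, k))
    if retrieval_score(query, best_key) >= threshold:
        return FAQ_INDEX[best_key]  # cheap path
    return expensive_llm(query)     # heavy machinery
```

Under the 10x cost estimate cited above, even a gate that diverts half of the queries to the cheap path would roughly halve serving cost.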
| Der_Einzige wrote:
  | This has deep connections with my attempt to implement an
  | effective queryable, word-level, grammatically correct
  | extractive text summarizer (AKA the way most people actually
  | summarize documents) - https://github.com/Hellisotherpeople/CX_DB8
  |
  | I will try to implement this with the necessary changes to
  | actually make it work properly, where instead of generating a
  | new answer, it simply highlights the most likely text spans.
___________________________________________________________________
(page generated 2023-07-17 23:02 UTC)