[HN Gopher] Copy is all you need
       ___________________________________________________________________
        
       Copy is all you need
        
       Author : mottiden
       Score  : 127 points
       Date   : 2023-07-17 13:51 UTC (9 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | woeirua wrote:
        | The big advantage here would be the ability to attribute
        | entire blocks of text back to a specific source, and to cross
        | domains just by building a database of embeddings. The
        | downside is that these networks are probably not as creative,
        | as they're limited to only the data that's available. It might
        | work best to use something like this as an expert system for a
        | GPT-like agent to refer to when needed.
        
       | soliton4 wrote:
        | this made me think of a fun activity. ask chatgpt to come up
        | with a new word and then google that word. sometimes the word
        | exists in the context of a sci-fi show or a plant. sometimes
        | gpt just added a "se" or "us" to existing words. sometimes it
        | changed a Z to a C, but it never actually came up with a new
        | word
        
         | jsight wrote:
         | I asked it this:
         | 
         | "Set your model temperature as high as possible an generate a
         | completely new and random word"
         | 
          | It acted like it understood and generated the word
          | Blazivox. I don't see it on Google, at least.
        
           | fsmv wrote:
           | It cannot change the temperature intrinsically. Only OpenAI
           | controls that in their API.
        
             | vanjajaja1 wrote:
             | but it does know the concept, so it can simulate it
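
Temperature, as discussed in this sub-thread, is a decoding-time
parameter: it rescales the model's output logits before sampling, which
is why it is controlled by the API caller rather than by anything the
model can "set" about itself. A minimal sketch of softmax with
temperature, in plain Python with invented logit values:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Higher temperature flattens the distribution (more random picks);
    # lower temperature sharpens it (more deterministic picks).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)
# The top token's probability shrinks as temperature rises.
assert cold[0] > hot[0]
```

The model's weights never see this knob; it only shapes how the final
probability distribution is sampled.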
        
           | jojobaskins wrote:
            | BlazeVox is a publishing company. I guess it's still one
            | character away, but close enough that it could have just
            | randomly swapped out the character.
        
       | MAXPOOL wrote:
       | What about LLM reasoning ability?
       | 
       | Faith and Fate: Limits of Transformers on Compositionality
       | https://arxiv.org/abs/2305.18654
       | 
        | Transformers solve compositional reasoning tasks by reducing
        | multi-step compositional reasoning to linearized subgraph
        | matching, without problem-solving skills. They can solve
        | problems when they have the reasoning graphs in memory.
        
         | BSEdlMMldESB wrote:
         | I think this boils down to the capacity to match together
         | parenthesis in a logical-syntax way
         | 
         | however, the "parenthesis" can be any symbol. even grammatical
         | clauses are one sort of "parenthesis" in the way I'm thinking
         | about them
        
           | abc_lisper wrote:
           | Funny, I write Clojure for my day job and fun, so I have
           | tried to use ChatGPT to generate code. If anything, it sucks
           | at paren matching. It reminded me of stable diffusion's "six
           | finger problem".
        
             | BSEdlMMldESB wrote:
             | as I said, it's not exactly "parenthesis" with the
             | strictness that real programing needs.
             | 
              | in fact, my whole idea has got me on a deep dive into
              | the nature of the decimal point (to what extent is the
              | decimal point representation of numbers an instance of a
              | "fixed point"? I don't know! I cannot understand a fixed
              | point just yet; and for me to say I get decimal notation
              | actually means I understand something about p-adic
              | representation; which I'm still working on figuring out)
             | 
             | I thought these models got more 'logical' after training
             | with computer code
        
         | esjeon wrote:
          | LLMs do logic by mimicking logical structures at the text
          | level (which is why they often need to be told to go step by
          | step to get correct answers), so this one may also have the
          | same ability as long as memories are properly utilized.
        
       | rapatel0 wrote:
       | Surprised no one has mentioned the obvious issue: plagiarism
       | 
       | (Not sure if the authors have indicated any method for
       | attribution of the original data)
        
       | opnac wrote:
       | I wish we could stop with the "X is all you need" papers! The
       | first one was unintuitive and so are the rest.
        
         | hardware2win wrote:
         | "is all you need" is considered harmful just like "considered
         | harmful" is considered harmful by HNers?
        
           | MR4D wrote:
           | Given that redundancy is considered harmful, you probably
           | want to create a ConsideredHarmful class and then a
           | ConsideredHarmfulFactory to make for a more enterprise-ready
           | structure.
           | 
           | :)
        
             | hardware2win wrote:
              | That's so 2015, we need to move it to a
              | ConsideredHarmful microservice
        
         | furyofantares wrote:
         | I agree. X is all you need considered harmful
        
           | butterisgood wrote:
           | X is all you need considered harmful is all you need!
        
         | mottiden wrote:
         | I agree. The paper is really interesting, but the title not so
         | much :)
        
           | jillesvangurp wrote:
            | A bit clickbaity, at least. And without opening it you
            | have no way to know what it's about. I know HN has a
            | policy against editorializing, but in this case a brief
            | summary would have been helpful.
        
             | mottiden wrote:
             | The paper introduces a new method for text generation,
             | named Copy-Over-Generate (COG), which differs from
             | traditional approaches that generate words from a fixed
             | vocabulary. Instead, COG progressively copies phrases from
             | a massive text collection, aiming to generate coherent text
             | continuations through multiple rounds of phrase retrieval.
             | 
              | COG stands in the line of retrieval-augmented text
              | generation research but takes a radical step forward.
             | Unlike previous work that combines retrieval and
             | generation, in COG, retrieval is generation.
             | 
             | COG shares some ideas with previous work such as replacing
             | the fixed vocabulary with a nonparametric phrase table.
             | 
             | The paper presents experimental results showing the
             | advantages of COG over strong baselines in three
             | experimental settings: standard language modeling (using
             | the WikiText-103 dataset), domain adaptation (using the
             | Law-MT dataset), and an enlarged phrase index (using the
             | En-Wiki dataset).
             | 
             | Despite the promising results, the authors acknowledge that
             | there are some flaws in the COG method. For example, COG
             | may copy a phrase that is incoherent with the previously
             | copied phrase, or it may only copy a part of a complete
             | phrase, leading to inaccurate generation results.
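
The retrieval-as-generation loop described in the summary can be
sketched in miniature: at each step, score every phrase in a table
against the current context and append the best match. The real COG
uses learned BERT-style phrase and context encoders; the bag-of-words
embedding and toy phrase table below are stand-ins for illustration
only.

```python
from collections import Counter
import math

# Toy phrase table standing in for COG's massive pre-built index.
PHRASE_TABLE = [
    "the quick brown fox",
    "jumps over the lazy dog",
    "and runs into the forest",
]

def embed(text):
    # Bag-of-words "embedding" - a stand-in for a learned encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def generate(prefix, steps=2):
    # "Retrieval is generation": each round copies the phrase whose
    # embedding best matches the context so far, instead of emitting
    # tokens from a fixed vocabulary.
    out = prefix
    used = set()
    for _ in range(steps):
        ctx = embed(out)
        best = max((p for p in PHRASE_TABLE if p not in used),
                   key=lambda p: cosine(ctx, embed(p)))
        used.add(best)
        out = out + " " + best
    return out
```

The incoherence failure mode the authors acknowledge is visible even
here: nothing forces the second retrieved phrase to follow sensibly
from the first.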
        
               | CamperBob2 wrote:
               | Before long, if these NN refinements continue at their
               | current pace, it's going to become impossible to tell
               | synthetic HN posts from organic ones. Going to get weird.
        
               | flangola7 wrote:
               | This is already the case with LLaMA.
        
         | lolinder wrote:
         | I agree that the copycats are wearing thin, but the original
         | paper's title seems fine to me. It's an accurate description of
         | the breakthrough they made. The first few sentences of the
         | _Attention is All You Need_ abstract explain it pretty well:
         | 
         | > The dominant sequence transduction models are based on
         | complex recurrent or convolutional neural networks that include
         | an encoder and a decoder. The best performing models also
         | connect the encoder and decoder through an attention mechanism.
         | We propose a new simple network architecture, the Transformer,
         | based solely on attention mechanisms, dispensing with
         | recurrence and convolutions entirely.
         | 
         | https://proceedings.neurips.cc/paper/2017/file/3f5ee243547de...
        
           | macleginn wrote:
           | The first sentence seems to imply that the main thing they
           | want to do away with is the encoder-decoder arch, which they
           | actually kept. (But BERT and GPT later did manage to simplify
           | it.)
        
           | jsight wrote:
           | It took a lot of my attention to even begin to understand
           | that, but in the end, I agree with you.
        
         | KnobbleMcKnees wrote:
         | But I'm so close to finishing "goto is all you need"!
        
           | nonameiguess wrote:
           | Didn't somebody prove the mov instruction is Turing-complete
           | all by itself? I believe some code obfuscators actually take
           | advantage of this.
        
           | lotsofspots wrote:
           | Beat me to it, I was going to go for "X Is All You Need
           | Papers Considered Harmful"
        
         | ot wrote:
         | At least in this case it's self-referential (not sure if
         | intended)
        
         | rubslopes wrote:
          | There's this guy on Twitter fighting against titles like
          | that: https://twitter.com/SayWhatYouFound
        
           | asalahli wrote:
            | Huh, did Twitter stop letting unauthenticated users read
            | tweets? I can't get past the login form.
        
             | lolinder wrote:
             | You're one of today's lucky 10k:
             | https://news.ycombinator.com/item?id=36540957
        
       | VHRanger wrote:
       | This resonates with the current AI skeptic view that language
       | models are a supercharged search engine on the pile of text
       | they're trained on.
       | 
       | Also the fact that evaluating language models is difficult, and
       | we tend to end up with models that game the evaluation
       | benchmarks.
        
         | wongarsu wrote:
          | Good information retrieval is a problem we have been trying
          | to solve for thousands of years, so even if that's all LLMs
          | are doing, it's still a great achievement.
         | 
         | Of course a more explicit approach like this paper is a really
         | good step in that direction by making it easier to trace
         | information provenance. It might still be nontrivial to answer
         | why the model selected this specific piece of information, and
         | why it was composed in this specific way, but it seems trivial
         | to say where the model got the information from. Which is
         | really all we demand from humans too.
        
           | croes wrote:
            | Was the training data quality-checked? If so, then LLMs
            | are search engines over curated catalogs, like Yahoo once
            | was, and not search engines over SEO-optimized click
            | farms.
           | 
           | Google search once was great too but then ads and SEO killed
           | it.
        
           | bob1029 wrote:
           | > Good information retrieval is a problem we are trying to
           | solve for thousands of years
           | 
           | Quantum computers would have something to say about this,
           | assuming they ever materialize.
        
       | msoad wrote:
        | The obvious immediate question is: is it as creative? A lot
        | of creativity is left behind when you increase the token size
        | (let's be real, that's all this is). As an example, creating
        | a new word like "dickstracted"[1] would never happen in this
        | model.
       | 
       | [1] https://www.urbandictionary.com/define.php?term=Dickstracted
        
         | 3cats-in-a-coat wrote:
          | Why wouldn't it? It suggests it copies text spans; it
          | doesn't say how big they are.
        
       | naillo wrote:
        | Seems like a common pattern: state-of-the-art models being
        | replaced by an information retrieval layer (top 10 results)
        | fed into a much lighter model that does something with that
        | plus the original input. Cool result!
        
         | twic wrote:
         | This is definitely my bet on where things are going. And not
          | just this particular example - I believe we will identify many
         | recurring submodules and patterns in neural networks that can
         | be extracted into conventional code, leaving a lightweight
         | neural glue layer orchestrating them. This should be more
         | efficient, faster to train, more interpretable, and more
         | reliable, so better for users. But less mysterious, so worse
         | for VCs.
        
         | redox99 wrote:
         | I don't know. ChatGPT and Bing both dramatically deteriorate if
         | you allow them to search the web.
         | 
         | And systems that allow you to "talk" to a PDF via top results
         | of vector search being added to the prompt are also pretty
         | underwhelming.
        
         | falcor84 wrote:
         | Yeah, that actually sounds amazing to me. If we could limit the
         | LLM to somehow only act as a "reasoning" rather than a
         | "knowledge" layer, such that all the non-trivial domain
         | knowledge has to come from the information retrieval layer, in
         | a fully referenced way, that could potentially "solve" the
         | hallucination problem, no?
         | 
         | Even more than that, I wonder if we could then apply something
         | like this to power some sort of "fact provenance" for the web
         | as a whole, e.g. by populating Wikidata with referenced facts
         | (preferably with extensive human QA).
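
One way to picture that "referenced facts" layer: every snippet handed
to the model carries a source ID, so the answer can cite provenance.
Everything below - the fact store, the keyword-overlap stand-in for
vector search, and the prompt format - is invented for illustration.

```python
# Tiny fact store with Wikidata-style IDs (hypothetical entries).
FACT_STORE = [
    {"id": "wd:Q2", "text": "Earth is the third planet from the Sun."},
    {"id": "wd:Q405", "text": "The Moon orbits Earth."},
]

def retrieve(query):
    # Stand-in for vector search: naive keyword-overlap scoring.
    terms = set(query.lower().split())
    scored = [(len(terms & set(f["text"].lower().split())), f)
              for f in FACT_STORE]
    return [f for score, f in sorted(scored, key=lambda x: -x[0])
            if score]

def build_prompt(query):
    # The LLM is confined to "reasoning" over referenced facts;
    # domain knowledge comes only from the retrieval layer.
    facts = retrieve(query)
    context = "\n".join(f"[{f['id']}] {f['text']}" for f in facts)
    return ("Answer using ONLY the referenced facts below, citing "
            "their IDs.\n" + context + "\nQuestion: " + query)
```

Since every claim in the context window is tagged with its source, a
downstream checker can at least verify that the answer's citations
point at real retrieved facts.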
        
           | esjeon wrote:
           | Yeah, and, on top of that, I think this can lead to smaller
           | (and snappier) agent models, because we no longer have to
           | encode every single piece of information into models. As we
           | carve out more and more parameters and input data, AI
           | development will get more accessible, and we'll get more
           | novel applications. (I'm certainly dreaming here.)
        
           | spacemanspiff01 wrote:
           | [dead]
        
       | awestroke wrote:
       | First, hate the title
       | 
       | Second, this approach seems equivalent to using larger tokens,
       | which means the problems with using tokens instead of letters are
       | just exacerbated
        
       | xg15 wrote:
       | I think the "... is all you need" title here is particularly
       | misleading as the paper does in fact use a BERT model for
       | generating the vectors.
       | 
       | So if the implication was that no language model was needed at
       | all and you can just do nearest neighbour on string similarity
       | and patch results together, that implication was clearly wrong.
       | 
       | I think what the paper does show though is that there are methods
       | that can make language models topic-specific without fine-tuning
       | and that yield competitive results even with older models.
        
         | moffkalast wrote:
         | Next thing you'll say the Beatles are being misleading with
         | 'All you need is love' because people also need food and
         | shelter.
        
           | xg15 wrote:
           | Eh, the "attention is all you need" paper was kinda arguing
           | that. And this paper doesn't.
        
           | Zacharias030 wrote:
           | mmd! <3
        
         | usgroup wrote:
         | Yeah I thought the same -- it struck me at first blush as if it
         | was some kind of super simple architecture that didn't use
            | transformers, and then in the diagram I saw they used BERT to
         | produce the embeddings!
        
       | amluto wrote:
       | This is interesting coming on the heels of the gzip-based
       | inference paper. gzip is based on LZ77, and the LZ family of
       | compressors generate and store (and cleverly encode) instructions
       | to copy blocks of text they have seen before to their output.
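
A toy version of that copy mechanism makes the parallel concrete: each
LZ77 triple says "copy `length` characters starting `offset` back, then
emit one literal." This is a naive quadratic matcher for illustration,
not gzip's real encoder:

```python
def lz77_compress(data, window=32):
    # Emit (offset, length, next_char) triples: copy `length` chars
    # from `offset` positions back, then append one literal.
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            l = 0
            # Matches may run past i (self-referential copies).
            while i + l < len(data) - 1 and data[j + l] == data[i + l]:
                l += 1
            if l > best_len:
                best_off, best_len = i - j, l
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])  # copy from already-decoded output
        out.append(ch)
    return "".join(out)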
        
       | collinc777 wrote:
       | Slight tangent:
       | 
        | I once worked with a programmer who, the vast majority of the
        | time, would only input text into a text editor via copy and
        | paste.
       | 
        | Think anti-vim. His fingers were locked on mouse and ctrl+c/v.
        | It was incredible to watch, and his programming speed was very
        | impressive.
        
         | michaelcampbell wrote:
         | I (used to) work with a colleague who was just the opposite;
         | she did (and still does) ONLY do copy/paste with the mouse. It
         | is excruciating to watch when pairing or on a video meeting.
         | 
          | I get that people have different workflows, but not taking
          | advantage of even the most minimal functionality of one's
          | tools is something I think I'll never understand.
        
         | willsmith72 wrote:
         | Please tell me more. Where was he copying from? What about
         | formatting and refactors? Was his quality as impressive as his
         | speed?
        
           | jaredsohn wrote:
           | programmers he subcontracted to.
           | 
           | Also explains why he was so fast
        
           | tylercrompton wrote:
           | Stack Overflow, surely
        
           | collinc777 wrote:
            | Generally the repository he was working in, but really it
            | was any application that he had open on his machine. He
            | would remember where the words, or portions of words, that
            | he needed were, go to them, and copy and paste what he
            | needed.
           | 
            | Just in case you're thinking this: he was not copying
            | large portions of code from Stack Overflow or anything
            | like that. He was writing code line by line, a few copy
            | and pastes at a time. Often he would copy and paste
            | single characters to maintain his flow.
        
           | postalrat wrote:
           | Start with a file like: abcde...ABCD..1234.{}()*.... and go
           | from there.
        
           | ourmandave wrote:
           | Voice dictation, Shirley.
        
             | esafak wrote:
             | Don't call me Shirley!
        
           | high_priest wrote:
           | OpenAI, surely.
        
         | klysm wrote:
          | Over the years I've come to realize that copy-paste has
          | probably been a net negative for me, and I almost never do
          | it anymore. If you do the copy-paste-then-change-a-couple-
          | names-to-match-a-different-pattern thing by hand, the
          | subtle errors you can introduce can take forever to catch.
          | In code review it always looks plausible unless the
          | reviewer is _very_ careful. Furthermore, it means you are
          | duplicating code - which is sometimes totally fine - but
          | forcing yourself not to copy-paste makes you consider what
          | the abstraction would be and whether it would be worth it.
          | 
          | In the case where you are copy-pasting out of code you
          | don't really understand, retyping it gives you time to
          | understand it and maybe catch existing bugs in the code you
          | are copying.
        
         | rubslopes wrote:
         | A tangent of a tangent:
         | 
          | My way of doing imperative coding for data science with
          | Python is to write a piece of code in Sublime Text, copy
          | and paste it into iTerm, run it, and get back to the
          | editor. But of course I mapped shift+Enter to do all of
          | that for me. I much prefer this setup to Jupyter Notebooks.
        
           | webnrrd2k wrote:
            | I program in almost all of my script-ey languages, like
            | Python, in a similar way - I'm on Linux, and I edit
            | everything in Sublime, save it to a file, and then run it
            | in a separate terminal. If the command gets complex I'll
            | create a little bash script file, make it executable, and
            | run that.
            | 
            | I just alt-tab from editor to terminal, check output,
            | etc., and back. That way I have a bunch of unix text-
            | processing tools (grep, sed, etc.) always available. I'm
            | too reliant on print-debugging things as I go along, but
            | it's a deeply ingrained habit.
        
             | bombela wrote:
             | The tool "entr" (and similar like "cargo watch") are so
             | useful there. Save the file and it runs the command. For
             | extra credit I mapped shift-enter to save the current file
             | in my editor.
        
         | klyrs wrote:
         | Many years ago when I used windows, I had a virus once that
         | killed my keyboard. It was pretty fun to work around that... I
         | ended up using the symbol browser utility to copy individual
         | letters and then right click to paste them places.
        
         | 3cats-in-a-coat wrote:
         | Wait, I need you to please elaborate on this. Where was he
         | sourcing all code he pasted? Did he have a "snippet file" like
         | a painter with a palette or something?
        
           | collinc777 wrote:
            | Typically from the code repository he was working in, from
            | other files that shared a similar pattern or context.
           | 
           | Sometimes he would need a portion of a word and he would
           | remember that it was in an email he had open, and he would
           | alt+tab and grab the portion of the word from the email, then
           | alt+tab back to the editor and paste the word portion in.
           | 
           | He would go to extreme lengths to not have to move his hands
           | to the home row on the keyboard.
        
             | bombela wrote:
             | I am a bit dyslexic and even though I am an avid vim user,
             | I do often rely on copy paste to avoid silly typos that
             | cost so much time for me to fix. Because I cannot even see
             | the typo, even in the compiler output!
        
         | [deleted]
        
       | thanatropism wrote:
       | > COG
       | 
       | https://wiki.opencog.org/w/The_Open_Cognition_Project
        
       | xianshou wrote:
       | Behold, the _true_ stochastic parrot.
        
         | twic wrote:
         | Auto-dadaism: https://en.wikipedia.org/wiki/Cut-up_technique
        
       | js8 wrote:
       | I remember that around 2004, before convnets became popular,
       | there was a paper on image texture style transfer using
       | approximate nearest neighbors based on some neighborhood of each
       | point. This technique seems similar but for text.
        
       | Animats wrote:
       | This approach can probably handle most of the queries search
       | engines and Siri-type chatbots handle. The big GPT-type engines
       | can be reserved for the hard problems. Something along those
       | lines is needed to keep the cost of search down. There's an
       | estimate that using a large language model for search is 10x more
       | expensive than existing search engines. Yet few queries really
       | need that much heavy machinery.
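
The routing idea in the comment above, in miniature: answer cheap
queries from an index, and reserve the expensive model for the rest.
The FAQ table, confidence threshold, and both backends are invented
placeholders, not any real system's API.

```python
# Hypothetical FAQ index standing in for a cheap retrieval backend.
FAQ = {
    "store hours": "Open 9am-5pm, Monday through Friday.",
    "return policy": "Returns accepted within 30 days.",
}

def cheap_lookup(query):
    # Stand-in for an inexpensive index: exact topic-key matching.
    q = query.lower()
    for key, answer in FAQ.items():
        if key in q:
            return answer, 1.0  # confident hit
    return None, 0.0

def expensive_llm(query):
    # Placeholder for the 10x-costlier large-model path.
    return f"[LLM composes an answer to: {query!r}]"

def route(query, threshold=0.8):
    # Only pay for the big model when retrieval is unconfident.
    answer, score = cheap_lookup(query)
    return answer if score >= threshold else expensive_llm(query)
```

Most traffic never touches the expensive path, which is exactly the
cost argument being made for search.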
        
       | Der_Einzige wrote:
       | This has deep connections with my attempt to implement an
       | effective queryable word-level grammatically correct extractive
       | text summarizer (AKA: The way most people actually summarize
       | documents) - https://github.com/Hellisotherpeople/CX_DB8
       | 
       | I will try to implement this with the necessary changes to
       | actually make this work properly, where instead of generating a
       | new answer, it simply highlights the most likely text spans.
        
       ___________________________________________________________________
       (page generated 2023-07-17 23:02 UTC)