Copy is all you need

xg15 · on July 17, 2023

I think the "... is all you need" title here is particularly misleading as the paper does in fact use a BERT model for generating the vectors.

So if the implication was that no language model was needed at all and you can just do nearest neighbour on string similarity and patch results together, that implication was clearly wrong.

I think what the paper does show though is that there are methods that can make language models topic-specific without fine-tuning and that yield competitive results even with older models.

usgroup · on July 17, 2023

Yeah I thought the same -- it struck me at first blush as if it was some kind of super simple architecture that didn't use transformers, and then in the diagram I saw they used BERT to produce the embeddings!

moffkalast · on July 17, 2023

Next thing you'll say the Beatles are being misleading with 'All you need is love' because people also need food and shelter.

xg15 · on July 17, 2023

Eh, the "attention is all you need" paper was kinda arguing that. And this paper doesn't.

Zacharias030 · on July 17, 2023

mmd! <3

VHRanger · on July 17, 2023

This resonates with the current AI skeptic view that language models are a supercharged search engine on the pile of text they're trained on.

Also the fact that evaluating language models is difficult, and we tend to end up with models that game the evaluation benchmarks.

wongarsu · on July 17, 2023

Good information retrieval is a problem we are trying to solve for thousands of years, so even if that's all LLMs are doing then that's still a great achievement.

Of course a more explicit approach like this paper is a really good step in that direction by making it easier to trace information provenance. It might still be nontrivial to answer why the model selected this specific piece of information, and why it was composed in this specific way, but it seems trivial to say where the model got the information from. Which is really all we demand from humans too.

croes · on July 17, 2023

Was the training data quality checked? If so then LLMs are search engines for catalogs like Yahoo once was and not a good search engine for SEO optimized click farms.

Google search once was great too but then ads and SEO killed it.

actionfromafar · on July 18, 2023

Eh, the free to use commercial LLMs will surely be spiced with commercials eventually.

bob1029 · on July 17, 2023

> Good information retrieval is a problem we are trying to solve for thousands of years

Quantum computers would have something to say about this, assuming they ever materialize.

MAXPOOL · on July 17, 2023

What about LLM reasoning ability?

Faith and Fate: Limits of Transformers on Compositionality https://arxiv.org/abs/2305.18654

Transformers solve compositional reasoning tasks by reducing multi-step compositional reasoning into linearized subgraph matching without problem-solving skills. They can solve problems when they have reasoning graphs in the memory.

esjeon · on July 17, 2023

LLMs do logic by mimicking logical structures on the text level (and that's why they often need to be told to work step-by-step to get correct answers), so this one may also have the same ability as long as memories are properly utilized.

BSEdlMMldESB · on July 17, 2023

I think this boils down to the capacity to match together parentheses in a logical-syntax way

however, the "parenthesis" can be any symbol. even grammatical clauses are one sort of "parenthesis" in the way I'm thinking about them

abc_lisper · on July 17, 2023

Funny, I write Clojure for my day job and fun, so I have tried to use ChatGPT to generate code. If anything, it sucks at paren matching. It reminded me of stable diffusion's "six finger problem".

BSEdlMMldESB · on July 17, 2023

as I said, it's not exactly "parenthesis" with the strictness that real programming needs.

in fact, my whole idea has got me on a deep dive into the nature of the decimal point (to what extent is the decimal point representation of numbers an instance of a "fixed point"? I don't know! I cannot understand a fixed point just yet; and for me to say I get decimal notation actually means I understand something about p-adic representation; which I'm still working on figuring out)

I thought these models got more 'logical' after training with computer code

naillo · on July 17, 2023

Seems like a common pattern. State of the art models being well replaced by an information retrieval layer (top 10 results) fed into a much lighter model that does something with that plus the original input. Cool result!
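
To make the pattern concrete, here is a minimal sketch of a retrieval layer that picks the top-k passages and hands them, together with the original input, to a lighter model. Everything in it is an assumption for illustration: the bag-of-words `embed` stands in for a learned encoder, and `build_prompt` only prepares the input that a hypothetical small model would receive.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: bag-of-words counts (a real system would use a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, docs, k=2):
    """Return the k passages most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query, docs, k=2):
    """Join the retrieved passages with the query, as input for a (hypothetical) small model."""
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(top_k(query, docs, k)))
    return f"{context}\n\nQuestion: {query}"
```

The numbered passages in the prompt are what would let the downstream model cite where its answer came from.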

twic · on July 17, 2023

This is definitely my bet on where things are going. And not just this particular example - i believe we will identify many recurring submodules and patterns in neural networks that can be extracted into conventional code, leaving a lightweight neural glue layer orchestrating them. This should be more efficient, faster to train, more interpretable, and more reliable, so better for users. But less mysterious, so worse for VCs.

falcor84 · on July 17, 2023

Yeah, that actually sounds amazing to me. If we could limit the LLM to somehow only act as a "reasoning" rather than a "knowledge" layer, such that all the non-trivial domain knowledge has to come from the information retrieval layer, in a fully referenced way, that could potentially "solve" the hallucination problem, no?

Even more than that, I wonder if we could then apply something like this to power some sort of "fact provenance" for the web as a whole, e.g. by populating Wikidata with referenced facts (preferably with extensive human QA).

esjeon · on July 17, 2023

Yeah, and, on top of that, I think this can lead to smaller (and snappier) agent models, because we no longer have to encode every single piece of information into models. As we carve out more and more parameters and input data, AI development will get more accessible, and we'll get more novel applications. (I'm certainly dreaming here.)

redox99 · on July 17, 2023

I don't know. ChatGPT and Bing both dramatically deteriorate if you allow them to search the web.

And systems that allow you to "talk" to a PDF via top results of vector search being added to the prompt are also pretty underwhelming.

Animats · on July 17, 2023

This approach can probably handle most of the queries search engines and Siri-type chatbots handle. The big GPT-type engines can be reserved for the hard problems. Something along those lines is needed to keep the cost of search down. There's an estimate that using a large language model for search is 10x more expensive than existing search engines. Yet few queries really need that much heavy machinery.

woeirua · on July 17, 2023

The big advantage here would be the ability to attribute entire blocks of text back to a specific source and cross domains just by building a database of embeddings. The downside is that these networks are probably not as creative as they're limited to only data that's available. It might work best to use something like this as an expert system for a GPT like agent to refer to when needed.

msoad · on July 17, 2023

Obvious immediate question is, is it as creative? There is a lot of creativity left behind when you increase the token size (let's be real, it's just that). As an example, creating a new word like "dickstracted"[1] would never happen in this model

[1] https://www.urbandictionary.com/define.php?term=Dickstracted

3cats-in-a-coat · on July 17, 2023

Why wouldn't it? It suggests it copies text spans, it doesn't say how big.

collinc777 · on July 17, 2023

Slight tangent:

I once worked with a programmer who, the vast majority of time, would only input text into a text editor via copy and paste.

Think anti-vim. His fingers were locked on mouse and ctrl+c/v. It was incredible to watch and his programming speed was very impressive.

klysm · on July 17, 2023

Over the years I've come to realize that copy-paste has probably been a net negative for me, and I almost never do it anymore. If you copy-paste and then change a couple of names by hand to match a different pattern, the subtle errors you can introduce by making a mistake can take forever to catch. In code review it always looks plausible unless the reviewer is _very_ careful. Furthermore, it means you are duplicating code - which is sometimes totally fine - but forcing yourself not to copy-paste makes you consider what the abstraction would be and whether it would be worth it.

In the case where you are copy-pasting out of code you don't really understand, retyping it gives you time to understand and maybe catch existing bugs in the code you are copying.

willsmith72 · on July 17, 2023

Please tell me more. Where was he copying from? What about formatting and refactors? Was his quality as impressive as his speed?

collinc777 · on July 17, 2023

Generally the repository he was working in, but really it was any application that he had open on his machine. He would remember where the words, or portions of words, that he needed were, go to them, and copy and paste what he needed.

Just in case you're thinking this: He was not copying large portions of code from stack overflow or anything like that. He was writing code line by line, a few copy and pastes at a time. Oftentimes he would copy and paste single characters to maintain his flow.

OJFord · on July 28, 2023

Just code or would he avoid any typing this way?

naveen99 · on July 18, 2023

You win Dailywtf

willsmith72 · on July 18, 2023

This is too good to be true

postalrat · on July 17, 2023

Start with a file like: abcde...ABCD..1234.{}()*.... and go from there.

ourmandave · on July 17, 2023

Voice dictation, Shirley.

esafak · on July 17, 2023

Don't call me Shirley!

tylercrompton · on July 17, 2023

Stack Overflow, surely

jaredsohn · on July 17, 2023

programmers he subcontracted to.

Also explains why he was so fast

high_priest · on July 17, 2023

OpenAI, surely.

3cats-in-a-coat · on July 17, 2023

Wait, I need you to please elaborate on this. Where was he sourcing all code he pasted? Did he have a "snippet file" like a painter with a palette or something?

collinc777 · on July 17, 2023

Typically in the code repository with other files that shared a similar pattern context that he was working with.

Sometimes he would need a portion of a word and he would remember that it was in an email he had open, and he would alt+tab and grab the portion of the word from the email, then alt+tab back to the editor and paste the word portion in.

He would go to extreme lengths to not have to move his hands to the home row on the keyboard.

bombela · on July 17, 2023

I am a bit dyslexic and even though I am an avid vim user, I do often rely on copy paste to avoid silly typos that cost so much time for me to fix. Because I cannot even see the typo, even in the compiler output!

michaelcampbell · on July 17, 2023

I (used to) work with a colleague who was just the opposite; she did (and still does) ONLY do copy/paste with the mouse. It is excruciating to watch when pairing or on a video meeting.

I get that people have different workflows, but not taking advantage of even the most minimal functionality of one's tools is something I think I will never understand.

klyrs · on July 17, 2023

Many years ago when I used windows, I had a virus once that killed my keyboard. It was pretty fun to work around that... I ended up using the symbol browser utility to copy individual letters and then right click to paste them places.

rubslopes · on July 17, 2023

A tangent of a tangent:

My way of doing imperative coding for data science with Python is to write a piece of code in Sublime Text, copy and paste to iTerm, run, and get back to the editor. But of course I mapped shift+Enter to do all of that for me. I much prefer this setting to Jupyter Notebooks.

webnrrd2k · on July 17, 2023

I program with almost all of my script-ey languages, like python, in a similar way - I'm on linux, and I edit everything in Sublime, save it to a file and then run it in a separate terminal. If the command gets complex I'll create a little bash script file, make it executable, and run that.

I just alt-tab from editor to terminal, check output, etc, and back. That way I have a bunch of unix text-processing tools (grep, sed, etc...) always available. I'm too reliant on print debugging things as I go along, but it's a deeply ingrained habit.

bombela · on July 17, 2023

Tools like "entr" (and similar ones like "cargo watch") are so useful there. Save the file and it runs the command. For extra credit I mapped shift-enter to save the current file in my editor.

tipsytoad · on July 18, 2023

Vscode has this built in: https://code.visualstudio.com/docs/python/jupyter-support-py

Der_Einzige · on July 17, 2023

This has deep connections with my attempt to implement an effective queryable word-level grammatically correct extractive text summarizer (AKA: The way most people actually summarize documents) - https://github.com/Hellisotherpeople/CX_DB8

I will try to implement this with the necessary changes to actually make this work properly, where instead of generating a new answer, it simply highlights the most likely text spans.

rapatel0 · on July 17, 2023

Surprised no one has mentioned the obvious issue: plagiarism

(Not sure if the authors have indicated any method for attribution of the original data)

soliton4 · on July 17, 2023

this made me think of a fun activity. ask chatgpt to come up with a new word and then google that word. sometimes the word exists in the context of a sci-fi show or a plant. sometimes gpt just added a "se" or "us" to existing words. sometimes it changes a Z to a C but it never actually came up with a new word

jsight · on July 17, 2023

I asked it this:

"Set your model temperature as high as possible an generate a completely new and random word"

It acted like it understood and generated the word Blazivox. I don't see it on Google at least.

jojobaskins · on July 17, 2023

BlazeVox is a publishing company. I guess it's still one character away but close enough that it could have just randomly swapped out the character.

fsmv · on July 17, 2023

It cannot change the temperature intrinsically. Only OpenAI controls that in their API.
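
For context, temperature is a parameter of the sampling step applied to the model's output logits, outside the model itself, which is why nothing in the prompt can set it. A minimal sketch of that step, assuming plain softmax sampling; the function names are made up for illustration and this is not OpenAI's implementation:

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Pick a token index; the caller, not the model, chooses the temperature."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

The model only produces `logits`; whoever calls `sample_token` decides how random the pick is.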

vanjajaja1 · on July 17, 2023

but it does know the concept, so it can simulate it

xianshou · on July 17, 2023

Behold, the true stochastic parrot.

twic · on July 17, 2023

Auto-dadaism: https://en.wikipedia.org/wiki/Cut-up_technique

amluto · on July 17, 2023

This is interesting coming on the heels of the gzip-based inference paper. gzip is based on LZ77, and the LZ family of compressors generate and store (and cleverly encode) instructions to copy blocks of text they have seen before to their output.
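
To illustrate the "instructions to copy blocks of text" part, here is a toy LZ77-style coder. It is a sketch of the general idea only, not gzip's actual DEFLATE format (which adds Huffman coding and a different token layout); the triple format and the tiny window are assumptions for readability.

```python
def lz77_compress(data, window=32):
    """Toy LZ77: emit (offset, length, next_char) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one short of the end so a literal next_char always exists.
            # The match may run past position i (an overlapping copy).
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decompress(triples):
    """Replay the copy instructions to rebuild the original text."""
    buf = []
    for off, length, ch in triples:
        for _ in range(length):
            buf.append(buf[-off])  # copy one char from `off` positions back
        buf.append(ch)
    return "".join(buf)
```

Each triple with a nonzero length is literally a "copy this span I have already seen" instruction.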

js8 · on July 17, 2023

I remember that around 2004, before convnets became popular, there was a paper on image texture style transfer using approximate nearest neighbors based on some neighborhood of each point. This technique seems similar but for text.

kastnerkyle · on July 18, 2023

Maybe 'Image Quilting for Texture Synthesis and Transfer', Efros and Freeman [0]?

There's some neural / patch blends from 2016 that I always thought were interesting (CNN-MRF) [1], and I think there's a renaissance in those approaches recently (combined with other generators / prompts etc.). You can also argue ViT is "patch based" in a major sense... I am still a big believer in patch + combinations + warping (non-parametric synthesis) generally, some cool older work from Apple on that in speech land [2].

I go as far as arguing BPE / wordpiece / sentencepiece / tokenizers in general are key for modern approaches (as were word vocab selections in the earlier days of NMT), because they find 'good enough' patches (tokens) for a higher level model to stitch together while still having some creativity / generalization available... but we focus on the model details rather than the importance of the tokenizer (and tokenizer distribution) in publication many times.

[0] http://people.eecs.berkeley.edu/~efros/research/quilting.htm...

[1] https://github.com/chuanli11/CNNMRF

[2] https://machinelearning.apple.com/research/siri-voices

thanatropism · on July 17, 2023

> COG

https://wiki.opencog.org/w/The_Open_Cognition_Project

sfmike · on July 18, 2023

Thought this was about how you just need good copywriting skills

opnac · on July 17, 2023

I wish we could stop with the “X is all you need” papers! The first one was unintuitive and so are the rest.

furyofantares · on July 17, 2023

I agree. X is all you need considered harmful

butterisgood · on July 17, 2023

X is all you need considered harmful is all you need!

lolinder · on July 17, 2023

I agree that the copycats are wearing thin, but the original paper's title seems fine to me. It's an accurate description of the breakthrough they made. The first few sentences of the Attention is All You Need abstract explain it pretty well:

> The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.

https://proceedings.neurips.cc/paper/2017/file/3f5ee243547de...

macleginn · on July 17, 2023

The first sentence seems to imply that the main thing they want to do away with is the encoder–decoder arch, which they actually kept. (But BERT and GPT later did manage to simplify it.)

jsight · on July 17, 2023

It took a lot of my attention to even begin to understand that, but in the end, I agree with you.

hardware2win · on July 17, 2023

"is all you need" is considered harmful just like "considered harmful" is considered harmful by HNers?

MR4D · on July 17, 2023

Given that redundancy is considered harmful, you probably want to create a ConsideredHarmful class and then a ConsideredHarmfulFactory to make for a more enterprise-ready structure.

:)

hardware2win · on July 17, 2023

That's so 2015, we need to move it to ConsideredHarmful Microservice

KnobbleMcKnees · on July 17, 2023

But I'm so close to finishing "goto is all you need"!

lotsofspots · on July 17, 2023

Beat me to it, I was going to go for "X Is All You Need Papers Considered Harmful"

anticensor · on July 20, 2023

And the dual of it, "X is harmful papers are all you need".

nonameiguess · on July 17, 2023

Didn't somebody prove the mov instruction is Turing-complete all by itself? I believe some code obfuscators actually take advantage of this.

ot · on July 17, 2023

At least in this case it's self-referential (not sure if intended)

rubslopes · on July 17, 2023

There's this guy on Twitter fighting against titles like that: https://twitter.com/SayWhatYouFound

asalahli · on July 17, 2023

Huh, did twitter stop letting unauthenticated users read tweets? I can't get past the login form.

lolinder · on July 17, 2023

You're one of today's lucky 10k: https://news.ycombinator.com/item?id=36540957

mottiden · on July 17, 2023

I agree. The paper is really interesting, but the title not so much :)

jillesvangurp · on July 17, 2023

A bit click baity at least. And without opening it you have no chance to understand what this is about. I know HN has a policy against editorializing but in this case, a brief summary would have been helpful.

mottiden · on July 17, 2023

The paper introduces a new method for text generation, named COG (short for Copy-Generator), which differs from traditional approaches that generate words from a fixed vocabulary. Instead, COG progressively copies phrases from a massive text collection, aiming to generate coherent text continuations through multiple rounds of phrase retrieval.

COG stands on the line of retrieval-augmented text generation research but takes a radical step forward. Unlike previous work that combines retrieval and generation, in COG, retrieval is generation.

COG shares some ideas with previous work such as replacing the fixed vocabulary with a nonparametric phrase table.

The paper presents experimental results showing the advantages of COG over strong baselines in three experimental settings: standard language modeling (using the WikiText-103 dataset), domain adaptation (using the Law-MT dataset), and an enlarged phrase index (using the En-Wiki dataset).

Despite the promising results, the authors acknowledge that there are some flaws in the COG method. For example, COG may copy a phrase that is incoherent with the previously copied phrase, or it may only copy a part of a complete phrase, leading to inaccurate generation results.
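
As a rough picture of "retrieval is generation", here is a toy sketch that extends a prefix by copying whole phrases from source documents and records which document each phrase came from. It is not the paper's method: exact word matching on the preceding context stands in for the learned prefix and phrase encoders, and `build_phrase_index`, `generate`, the fixed phrase length, and the scoring rule are all made up for illustration.

```python
def build_phrase_index(docs, phrase_len=3, ctx_len=2):
    """Index every phrase with the words that preceded it and its source doc id."""
    index = []
    for doc_id, doc in enumerate(docs):
        words = doc.split()
        for start in range(ctx_len, len(words) - phrase_len + 1):
            context = tuple(words[start - ctx_len:start])
            phrase = words[start:start + phrase_len]
            index.append((context, phrase, doc_id))
    return index


def generate(prefix, index, steps=1, ctx_len=2):
    """Greedy copy: each step appends the phrase whose stored context best matches the prefix tail."""
    words = prefix.split()
    provenance = []  # source doc id of each copied phrase

    for _ in range(steps):
        tail = tuple(words[-ctx_len:])

        def score(entry):
            # Count positions where the stored context agrees with the tail.
            return sum(a == b for a, b in zip(entry[0], tail))

        best = max(index, key=score)
        if score(best) == 0:
            break  # nothing matches; this sketch simply stops
        words.extend(best[1])
        provenance.append(best[2])
    return " ".join(words), provenance
```

Running more than one step with such a crude scorer quickly stitches together phrases that do not fit each other, which is a small-scale version of the incoherent-copy flaw mentioned above.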

CamperBob2 · on July 17, 2023

Before long, if these NN refinements continue at their current pace, it's going to become impossible to tell synthetic HN posts from organic ones. Going to get weird.

flangola7 · on July 17, 2023

This is already the case with LLaMA.

awestroke · on July 17, 2023

First, hate the title

Second, this approach seems equivalent to using larger tokens, which means the problems with using tokens instead of letters are just exacerbated