Q* Hypothesis: Enhancing Reasoning, Rewards, and Synthetic Data (interconnects.ai)
107 points by Jimmc414 on Nov 24, 2023 | 63 comments


Sure A* is awesome, but taking the "star" and immediately attributing it to A* is probably a bridge too far.

Q*, or any X* for that matter, is extremely common notation for the optimal function under certain assumptions (usually about the cost / reward structure).


Yeah, I just saw the video from that researcher (later an OpenAI researcher?) who talked about it back in 2016... not that I understood much, but it definitely seemed that Q* was a generalization of the Q algorithm described on the previous slide. The optimum something across all somethings.


LeCun: Please ignore the deluge of complete nonsense about Q*. https://twitter.com/ylecun/status/1728126868342145481


As someone with a borderline acceptable understanding of RL this is the most accurate take so far.


If possible, I would be quite interested in a link to the video, or alternatively the name of the researcher you mention.


It's Noam Brown. He worked at Meta AI on Cicero, and on no-limit poker before that.


It will be nice to see the breakthroughs resulting from what people _believed_ Q* to have been.


I love this take. Reminds me of how the Mechanical Turk apparently indirectly inspired someone to build a weaving machine b/c "how hard could it be if machines can play chess" -- https://x.com/gordonbrander/status/1385245747071787008?s=20


This is one of my favourite "errors" of human thinking: mistaking something false for reality, and then making it real based on the confidence gained from that initial mistake.

An example I heard was that one of the programmers working on the original Unreal engine saw a demo of John Carmack's constructive solid geometry (CSG) editor. He incorrectly surmised that this was a real-time editor, so he hurriedly made one for the Unreal game engine to "keep up" with Quake. In reality, the Quake editor wasn't nearly as responsive as he assumed, and in fact he had to significantly advance the state of the art to "keep up"!


I recall that this is the story behind overlapping windows on the old Macintosh computers, although a concrete source seems difficult to find[0]

[0] https://news.ycombinator.com/item?id=2998463


certainly more things to throw at the wall! Excited to see the "accidental" progress


I have trouble believing this isn't just a sneaky marketing campaign.


Nothing OpenAI has released product-wise (ChatGPT, Dall-E) has required 'marketing'. The value speaks for itself. People raving about it on twitter, telling their friends/coworkers, and journos documenting their explorations is more than enough.

If this was an extremely competitive market that'd be more plausible. But they enjoy some pretty serious dominance and are struggling to handle the growth they already have with GPT.

If Q* is real, you likely wouldn't need to hype up something that has the potential to solve math / logic problems without having seen the problem/solution beforehand. Something that novel would be hugely valuable and generate demand naturally.


>The value speaks for itself.

What is that, though? I've seen a lot of tools created for it: custom AI characters, things that let you have an LLM read a DB, etc. But I haven't seen much in regards to customer-facing things.


> But I haven't seen much in regards to customer-facing things.

How about ChatGPT? It’s a game changer. It has allowed me to learn Rust extremely quickly since I can just ask it direct questions about my code. And I don’t worry about hallucinations since the compiler is always there to “fact check”.

I’m pretty bearish on OpenAI wrappers. Low effort, zero moat. But that’s largely irrelevant to the value of OpenAI products themselves.


> It has allowed me to learn Rust extremely quickly since I can just ask it direct questions about my code.

I was asking it about polars today, and it hallucinated all the answers...


Sure, you have to take all the answers with a grain of salt. But code is one area I can verify myself, so I’m less worried.


So, it's really easy to learn but you already have to know what you're allegedly learning to be able to call bs on the parts that are fake?

How is this helpful?


That level of skepticism is the reality for any query to a human or a search engine. For example, maybe that first search result is correct, but for the wrong version of what you are looking for. Maybe your colleague didn't understand what you really wanted.

Where it is helpful is that it's way more convenient than asking a human, doing a web search, or looking it up in a book. Even if a followup question is required. In this particular case there is a lot of transferability between knowing programming languages, too, so one can filter most implausible answers even if they don't know the programming language they are asking about.


As they unify the "web browsing" capabilities with the LLM using its own internal knowledge, it gets interesting: how can the LLM even decide whether its own knowledge is accurate or superior?


So they're now adding a few more steps to the chain, each of which has its own quality issues, and we have yet to see what the final quality will be.


It's pretty good for customer support agent tools. Feed the LLM your company's knowledgebase and give it the context of the support chat/email/call transcript, and it suggests solutions to the agent.
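
A minimal sketch of that pattern, assuming the openai Python client (v1-style chat completions) and a hypothetical search_knowledge_base retriever you'd supply yourself; the model name and prompts are illustrative only:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def suggest_reply(ticket_transcript, search_knowledge_base):
        """Draft a reply suggestion for the human agent from KB excerpts plus ticket context.
        `search_knowledge_base(query, top_k)` is a hypothetical retriever over your docs."""
        kb_snippets = search_knowledge_base(ticket_transcript, top_k=3)
        messages = [
            {"role": "system",
             "content": "You are a support assistant. Answer only from the provided "
                        "knowledge base excerpts; say so if they don't cover the issue."},
            {"role": "user",
             "content": "Knowledge base:\n" + "\n---\n".join(kb_snippets)
                        + "\n\nTicket:\n" + ticket_transcript
                        + "\n\nDraft a suggested reply for the human agent."},
        ]
        resp = client.chat.completions.create(model="gpt-4", messages=messages)
        return resp.choices[0].message.content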


> Satya: Microsoft has over a million paying Github Copilot users

https://www.zdnet.com/article/microsoft-has-over-a-million-p...


> People raving about it on twitter

For the most part, usage of GenAI has been people sharing output on social media. It is mind-blowingly fascinating, but the utility of it is far, far behind.


Of course they are doing PR stunts to keep the media talking about them.

Remember Altman saying that they shouldn't release GPT-2 because of it being too dangerous? It's the same thing with this Q* thing.


Because it could be used to generate spam, yes, and he was right about that.

And to set a precedent that models should be released cautiously, and he was right about that too, and it is to our detriment that we don't take that more seriously.


> it is to our detriment that we don't take that more seriously.

Why?


Because when a network turns out to be dangerous in some unexpected way, you cannot exactly unrelease it.


Dangerous, how? All this vague handwavey fear mongering doesn’t really do it for me. Specifics are more my thing.


What makes you positively confident that a network cannot be dangerous?


I’m not making any claims about the danger or lack of danger. Anyway, in the absence of specifics, this is a boring conversation.


Great, so given a chance of danger, not releasing a network keeps your options open. You can make an API available, and if there turns out to be a problem you can close it again, or ban a specific user, or implement hotfixes. None of those can be done with a publicly released network.

edit: You know what, let's take a concrete issue that could happen today. You've made a generative image network. Five weeks after releasing it on Huggingface, you discover to your chagrin that the dataset that you used to train it contains an astonishing amount of child pornography, something like 1%. Your spot checks didn't find this because it's all in a subfolder that you forgot to check. Who knew it wasn't a good idea to download datasets from 4chan?

As a result, this network is now extremely good at generating images of children in sexual situations, and because of mode collapse, it's creating fake images of real children, something which all but the most libertarian consider morally abhorrent. At any rate, you consider this morally abhorrent, and you'd love to work with the police to prevent any further misuse. Unfortunately, your network has been downloaded at least ten thousand times and it has already been fine-tuned to be even better at child porn by the nice folks at <insert dubious discord here>.

Now you have an appointment with a senator in three days, and you have to explain to her why you thought it was a good idea to publish this network for open download, even though you could have made way more money by keeping it closed. Good luck?

Now of course you can argue that in this case all the material was already out there. But that doesn't change the fact that you were the one who did the training run, and released the network, and you're the reason why perceptual hashes now won't find collisions on the generated pictures anymore. If there was a limited amount of generated images in circulation, you could just take the API down, apologize profusely, donate 10k to RAINN or whatever and restart your project under a new name. But as it is, that option is no longer available. The point is, we don't know what a network is doing, and so we don't know what it's going to do in the wild. We cannot prove the absence of capability, so we should hedge our bets.


Board member Helen Toner criticized Sam/OpenAI for releasing GPT too early; there were people who wanted to keep it locked away over those concerns, which largely haven't come true (a lot of people don't understand how spam detection works and overrate the impact of deepfakes).

Companies have competing interests and personalities. That's normal. But there is no indication that GPT was held back for marketing.


I have trouble believing the whole ousting of Sam Altman was planned for this. But yeah, someone might be smart enough to feed wrong info to the press after the whole saga was over.


I agree. Only thing that matters is results.


A* is a red herring based on availability bias.

Q* is already a thing: it's the standard notation for the optimal action-value function, the one characterized by the Bellman equation.
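
For reference, that's the Q* satisfying the Bellman optimality equation (the textbook form below, in standard RL notation; nothing OpenAI-specific):

    Q^*(s, a) = \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\left[ r(s, a) + \gamma \max_{a'} Q^*(s', a') \right]

Q-learning and its deep variants are just different ways of approximating that fixed point.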


Are you saying that the Bellman equations already use the notation Q*, or are you saying that those equations (I’m not as familiar as I should be, sorry) are the obvious connection between the incoherent ramblings from Reuters?

Because having similar acronyms or notations used in multiple contexts that end up colliding as ideas cross-pollinate is far too frequent these days. I once made a dictionary of terms used in A/B testing / Feature Flags / DevOps / Statistics / Econometrics, and most keywords had multiple, incompatible meanings depending on the exact context, all somewhat relevant to A/B testing. Every reader came out of it so defeated, like language itself was broken…


I'm saying that everyone already uses that notation including OpenAI[1].

1. https://spinningup.openai.com/en/latest/algorithms/ddpg.html


Thank you so much! This is a fantastic resource that I somehow missed.


Q* is an incredibly common notation for the above version of the Bellman equation. I think it’s stupid to call an algorithm Q* for the same reason it is to read too much into this: it’s an incredibly nondescript name.


Can you link this dictionary here or is it proprietary?


It was proprietary when I made it, but no one seemed to care; there wasn't anything that wasn't painfully public in there. I'll probably edit it and include it in a class I'm planning on teaching people about A/B testing.


Imho, in order to reach AGI you have to get out of the LLM space. It has to be something else. Something close to biological plausibility.


I think big parts of the answer include time domain, multi-agent and iterative concepts.

Language is about communication of information between parties. One instance of an LLM doing one-shot inference is not leveraging much of this. Only first-order semantics can really be explored. There is a limit to what can be communicated in a context of any size if you only get one shot at it. Change over time is a critical part of our reality.

Imagine if your agent could determine that it has been thinking about something for too long and adapt its strategy automatically: escalate to a higher-parameter model, adapt the context, etc.
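
A minimal sketch of that kind of time-aware escalation, where the model callables and the is_good_enough self-check are hypothetical stand-ins supplied by the caller:

    import time

    def solve_with_escalation(prompt, models, is_good_enough, budget_s=30.0):
        """Try cheaper strategies first; escalate when we've been 'thinking' too long.
        `models` is a list of callables ordered cheapest -> strongest (hypothetical),
        and `is_good_enough` is some verifier/self-check (also hypothetical)."""
        start = time.monotonic()
        answer = None
        for model in models:
            answer = model(prompt)
            if is_good_enough(answer):
                return answer                    # good enough, stop escalating
            if time.monotonic() - start > budget_s:
                break                            # out of time, return best effort so far
        return answer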

Perhaps we aren't seeking total AGI/ASI either (aka inventing new physics). From a business standpoint, it seems like we mostly have what we need now. The next ~3 months are going to be a hurricane in our shop.


Covering an airplane in feathers isn't going to make it fly faster. Biological plausibility is a red herring, imho.


The training space is more important. I don’t think a general intelligence will spawn from text corpuses. A person only able to consume text to learn would be considered severely disabled.

A significant part of intelligence comes from existence in meatspace and the ability to manipulate and observe that meatspace. A two year old learns much faster with much less data than any LLM.


We already have multimodal models that take both images and text as input. The bulk of the training for these models was in text, not images. This shouldn’t be surprising. Text is a great way of abstractly and efficiently representing reality. Of course those patterns are useful for making sense of other modalities.

Beyond modeling the world, text is also a great way to model human thought and reason. People like to explain their thought process in writing. LLMs already pick up on and mimic chain of thought well.

Contained within large datasets is crystallized thought, and efficient descriptions of reality that have proven useful for processing modalities beyond text. To me that seems like a great foundation for AGI.


> To me that seems like a great foundation for AGI.

It's only one part. Predicting text is relatively straightforward because it doesn't require predicting complex sequences like 'a S23mz s.zawsds'. Based on statistical analysis, there is a limited number of word combinations that humans use. With hundreds of billions of parameters, significant compression is possible. Mathematics is different, as it requires actual reasoning, an area where LLMs often struggle because they lack the capability for genuine reasoning.


Text and 2D images are a tiny subset of physical reality as perceived by an able-bodied human. Even our best approximation (a 3D VR headset with Spatial Audio) is a poor representation. We don't even bother to simulate touch, temperature, equilibrioception, etc. And the more detailed you get, the less data you have.

These senses can be described via text, but I’m highly skeptical that the learning outcomes will be the same.


>> Text and 2D images are a tiny subset of physical reality as perceived by an able-bodied human. Even our best approximation is a poor representation.

This is wrong. There’s nothing magical about human perception. You see the world because a 2D image is projected onto your retina.

GPT-4 was trained on text and generalized the ability to output 2D images. There’s absolutely nothing to suggest text can’t generalize further to new modalities. GPT4 is forced to serialize images as SVGs to output them (a crazy emergent ability btw), but that demonstrates an inherent spatial reasoning capability baked into the model.

GPT4V was created with a transfer-learning step where image embeddings are passed as input in place of text. That's further evidence of the model's ability to generalize to new modalities.

Everything you need to do multimodal input and output is already trained in; GPT-4V, I'm sure, is just the start.


>GPT-4 was trained on text

And it shows. It has a poor grasp of reality. It does a poor job with complex tasks. It cannot be trusted with specialized tasks typically done by expert humans. It is certainly an amazing technical achievement that does a decent job with simple tasks requiring cursory knowledge, but that’s all it is at this time.

>There's absolutely nothing to suggest text can't generalize further to new modalities

Inversion of burden of proof.


>> Inversion of burden of proof

Nope. OpenAI has already demonstrated the ability to generalize GPT4 to a new modality. Your claim that text models can only generalize to images and not other modalities is utterly unconvincing. Explain to me why vision is so much different than say audio?

>> And it shows. It has a poor grasp of reality. It does a poor job with complex tasks.

GPT4 is a proof of concept more than anything. I'm excited to see how much reliability improves over time. Its grasp of reality isn't perfect, but at least it understands how burden of proof works.


>GPT4 is a proof of concept more than anything

Hilarious walk-back. "Text can generalize anything" -> "It's just a demo, bro" in the same post.

Lmao


I walked back nothing. OpenAI was surprised by the mass adoption of ChatGPT, they saw it as an early technical preview.

I don't understand why some people have such a hard time envisioning the potential of new technologies without a polished end product in their hands. Imagine if AI researchers had the same attitude.

Technology can be both real and unpolished at the same time. Those two things are not contradictory.


A two year old learns faster because it has inherited training data from its ancestors in the form of evolutionary memory. Think of it as a BIOS for human beings. The LLM takes longer to learn because we are building this BIOS for it. Remember it took billions of years for the human BIOS to be developed.


Definitions, again. OpenAI defines AGI as highly autonomous agents that can replace humans in most of the economically important jobs. Those don't need to look or function like humans.


LLMs as we currently understand them won't reach AGI. But AGI will very likely have an LLM as a component. What is language but a way to represent arbitrary structure? Of course that's relevant to AGI.


Is it possible they were referring to this research they published in May?

https://openai.com/research/improving-mathematical-reasoning...



RL and A* are both approaches to dynamic programming, so this would not be surprising.


The most likely hypothesis I've seen for Q*:

https://twitter.com/alexgraveley/status/1727777592088867059


I definitely need to blog more. A* search with a neural network as the heuristic function seemed like a good idea to investigate… a month or two ago, and I never got around to it.
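
For anyone curious, that idea is just textbook A* with the admissible-heuristic guarantee quietly dropped. A minimal sketch, where heuristic_model (a learned cost-to-go estimate) and expand (the successor function) are placeholders, not anything OpenAI has described:

    import heapq, itertools

    def a_star(start, is_goal, expand, heuristic_model):
        """Standard A* search using a learned model as the heuristic.
        `expand(state)` yields (next_state, step_cost) pairs; `heuristic_model(state)`
        estimates remaining cost. States must be hashable. Both callables are placeholders."""
        tie = itertools.count()   # tiebreaker so heapq never compares states directly
        frontier = [(heuristic_model(start), next(tie), 0.0, start, [start])]
        best_g = {start: 0.0}
        while frontier:
            _f, _, g, state, path = heapq.heappop(frontier)
            if is_goal(state):
                return path
            for nxt, cost in expand(state):
                g2 = g + cost
                if g2 < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g2
                    heapq.heappush(frontier,
                                   (g2 + heuristic_model(nxt), next(tie), g2, nxt, path + [nxt]))
        return None               # no path found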


I have an idea for a great AI project and it's about finding the first logical inconsistency in an argument about a formal system like an LLM. I think if OpenAI can deliver that then I will believe they have achieved AGI.

I am a techno-optimist and I believe this is possible and all I need is a lot of money. I think $80B would be more than sufficient. I will be awaiting a reply from other techno-optimists like Marc Andreessen and those who are techno-optimist adjacent, like millionaires and billionaires who read HN comments.



