
This path feels correct to me. It feels like what we do as humans, and seems like a reasonable way to start constructing "mode 2" thinking.

IDK if our current models have enough of "mode 1" to power this system. It's also plausible that our current "mode 1" systems are more than powerful enough and that inference speed (and thus the size/depth of the tree that can be explored) will be the most important factor.

I hope that the major players are looking at this and trying it out at scale (I know DeepMind wrote the original paper, but their benchmarks were quite unimpressive). It's plausible that we will have an AlphaGo moment with this scheme.
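
Roughly the shape I have in mind, for anyone who hasn't read the paper. Just a sketch: propose() and evaluate() are hypothetical stand-ins for calls to the base ("mode 1") model, not anything from the actual implementation:

    # Sketch of tree-of-thought search: the base model proposes next steps,
    # and a scoring pass decides which branches are worth keeping.
    # propose() and evaluate() are hypothetical wrappers around model calls.
    def tree_of_thought(problem, steps=4, samples=5, beam=3):
        frontier = [""]  # partial chains of thought
        for _ in range(steps):
            candidates = []
            for partial in frontier:
                for _ in range(samples):
                    candidates.append(partial + propose(problem, partial))
            # keep only the most promising branches for the next round
            candidates.sort(key=lambda c: evaluate(problem, c), reverse=True)
            frontier = candidates[:beam]
        return frontier[0]

The "mode 2" part is nothing more than search wrapped around a "mode 1" proposer, which is why inference speed bounds the size and depth of the tree you can afford to explore.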



I believe you are correct here, yet at the same time I think we're about two orders of magnitude short of the compute power needed to do this effectively.

I think the first order of magnitude will go to the tree-of-thought processing itself. The number of additional queries we need to run to make this work is at least 10x, though I don't believe it's 100x.

I think the second order of magnitude will go to multimodal inference, so the models can ground themselves in 'reality' data. Whether "the brick lay on the ground and did not move" or "the brick floated away" is the true statement is only decidable from the truthfulness of the rest of the text corpus the model has seen. To me it gets even more interesting when you tie this into environmental data that is more likely to be factual, such as massive amounts of video.


Yeah, this looks very promising. Naively though, doesn't it multiply computation time by a factor of about 20, if they are taking 5 samples per step and multiple steps per problem?

https://imgur.com/a/VbpQZRm
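
To put rough numbers on my reading of it (the 5 samples per step is from the figure; the ~4 steps and single kept branch are my guesses, not anything stated):

    # Naive cost estimate under those assumptions.
    samples_per_step, steps = 5, 4
    print(samples_per_step * steps)  # ~20 model calls vs. 1 for a single answer

Keeping more than one branch alive between steps would multiply that further.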

As this gets explored further, I believe we will start finding out why human minds are constructed the way they are, from the practical/necessity direction. Seems like the next step is farming out subtasks to smaller models, and adding an orthogonal dimension of emotionality to help keep track of state.


I’m sympathetic to the idea of new types of specialized models to assist in this effort. We’re using our one hammer for all problems.

In particular, it jumps out that a “ranking model” (different, I think, from current ranking models) to judge which paths to take and which nodes to trim would make some level of sense.
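
Something like this is what I'm picturing: a separate, cheap scorer whose only job is to rank partial solutions and decide which nodes survive. The ranking_model.score() interface here is invented purely for illustration:

    # Hypothetical pruning step using a dedicated ranking model rather than
    # the generator itself. ranking_model.score() is an invented interface.
    def prune(problem, nodes, keep=3):
        scored = [(ranking_model.score(problem, node), node) for node in nodes]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [node for _, node in scored[:keep]]

A small model trained just for that judgment could be much cheaper to run than asking the generator to grade its own branches.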


Not sure if it's relevant, but the OpenAI APIs generally support returning multiple responses from a single API call. I'm not sure what the general effect on processing time is, however. From what I've read it's sub-linear, so the real cost could reasonably be well under a naive 20x, and I'd bet there are speedups to be had on the model side that make the extra time cost negligible.
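
For reference, that's the n parameter on the completion endpoints. A minimal sketch with the Python client (model name and prompt are just placeholders):

    # Ask for several candidate completions in one request; the prompt is
    # processed once, so this tends to be cheaper than n separate calls.
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": "Propose the next step."}],
        n=5,            # five candidates from a single call
        temperature=1.0,
    )
    candidates = [choice.message.content for choice in resp.choices]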


Andrej Karpathy would agree - recent talk: https://youtu.be/bZQun8Y4L2A


Uhm wow. I was just talking about my feelings on the topic. I'm guessing he has way more data (and knowledge).

Better lucky than good!

(also, man he's awesome. How does he have such a strong grasp on all of the topics in the field?)



