
One of my agents is kinda like this too. The only operation is SPARQL query, and the only accessible state is the graph database.

Since most of the ontologies I'm using are public, I just have to name-drop them in the prompt; no schemas and little structure introspection needed. At worst, it can just walk and dump triples to figure out the structure; it's all RDF triples and URIs.

One nice property: using structured outputs, you can constrain outputs of certain queries to only generate valid RDF to avoid syntax errors. Probably can do similar stuff with GraphQL.


Both Clang and GCC have musttail attributes that can force tail calls at specific return statements in C/C++.
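
A minimal sketch of how that looks with Clang's spelling of the attribute (recent GCC releases accept the same `__attribute__((musttail))` form, though exactly which versions support it is an assumption here):

  #include <stdio.h>
  
  /* Tail-recursive sum; musttail makes the compiler emit a genuine tail
     call (or reject the code), so the stack stays flat even at -O0. */
  static long long sum_to(long long n, long long acc) {
      if (n == 0)
          return acc;
      __attribute__((musttail)) return sum_to(n - 1, acc + n);
  }
  
  int main(void) {
      printf("%lld\n", sum_to(1000000LL, 0));  /* 500000500000, no stack overflow */
      return 0;
  }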


Idk, `o3-mini-high` was able to pop this Prolog code out in about 20 seconds:

  solve(WaterDrinker, ZebraOwner) :-
      % H01: Five houses with positions 1..5.
      Houses = [ house(1, _, norwegian, _, _, _),  % H10: Norwegian lives in the first house.
                 house(2, blue, _, _, _, _),       % H15: Since the Norwegian lives next to the blue house,
                 house(3, _, _, milk, _, _),        %       and house1 is Norwegian, house2 must be blue.
                 house(4, _, _, _, _, _),
                 house(5, _, _, _, _, _) ],
  
      % H02: The Englishman lives in the red house.
      member(house(_, red, englishman, _, _, _), Houses),
      % H03: The Spaniard owns the dog.
      member(house(_, _, spaniard, _, dog, _), Houses),
      % H04: Coffee is drunk in the green house.
      member(house(_, green, _, coffee, _, _), Houses),
      % H05: The Ukrainian drinks tea.
      member(house(_, _, ukrainian, tea, _, _), Houses),
      % H06: The green house is immediately to the right of the ivory house.
      right_of(house(_, green, _, _, _, _), house(_, ivory, _, _, _, _), Houses),
      % H07: The Old Gold smoker owns snails.
      member(house(_, _, _, _, snails, old_gold), Houses),
      % H08: Kools are smoked in the yellow house.
      member(house(_, yellow, _, _, _, kools), Houses),
      % H11: The man who smokes Chesterfields lives in the house next to the man with the fox.
      next_to(house(_, _, _, _, _, chesterfields), house(_, _, _, _, fox, _), Houses),
      % H12: Kools are smoked in a house next to the house where the horse is kept.
      next_to(house(_, _, _, _, horse, _), house(_, _, _, _, _, kools), Houses),
      % H13: The Lucky Strike smoker drinks orange juice.
      member(house(_, _, _, orange_juice, _, lucky_strike), Houses),
      % H14: The Japanese smokes Parliaments.
      member(house(_, _, japanese, _, _, parliaments), Houses),
      % (H09 is built in: Milk is drunk in the middle house, i.e. house3.)
      
      % Finally, find out:
      % Q1: Who drinks water?
      member(house(_, _, WaterDrinker, water, _, _), Houses),
      % Q2: Who owns the zebra?
      member(house(_, _, ZebraOwner, _, zebra, _), Houses).
  
  right_of(Right, Left, Houses) :-
      nextto(Left, Right, Houses).
  
  next_to(X, Y, Houses) :-
      nextto(X, Y, Houses);
      nextto(Y, X, Houses).
Seems ok to me.

   ?- solve(WaterDrinker, ZebraOwner).
   WaterDrinker = norwegian,
   ZebraOwner = japanese .


That's because it uses a long CoT. The actual paper [1] [2] talks about the limitations of decoder-only transformers predicting the reply directly, although it also establishes the benefits of CoT for composition.

This has all been known for a long time and makes intuitive sense - you can't squeeze more computation out of the model than it can provide. The authors just formally proved it (which is no small feat). And Quanta is being dramatic with its conclusions and headlines, as always.

[1] https://arxiv.org/abs/2412.02975

[2] https://news.ycombinator.com/item?id=42889786


LLMs using CoT are also decoder-only; it's not the paradigm shift people now claim it is so they can avoid admitting they were wrong. It's still next-token prediction, just forced to explore more possibilities in the space it contains. And with R1-Zero we also know that LLMs can train themselves to do so.


That’s a different paper than the one this article describes. The article describes this paper: https://arxiv.org/abs/2305.18654


The article describes both papers.


A paper that came out 15 months ago?


Yes! That one's linked in paragraph three.


gpt-4o, asked to produce SWI-Prolog code, gets the same result using very similar code. gpt-4-turbo can do it with slightly less nice code. gpt-3.5-turbo struggled to get the syntax correct, but I think with some better prompting it could manage it.

CoT is definitely optional, although I am sure all LLMs have seen this problem explained and solved in their training data.


This doesn't include encoder-decoder transformer fusion for machine translation, or encoder-only models like BERT used for text classification and named entity recognition.


Also, notice that the original study is from 2023.


The LLM doesn't understand that it's doing this, though. It pattern-matched against your "steering" in a way that generalized. And it didn't hallucinate in this particular case. That's still cherry-picking, and you wouldn't trust this to turn a $500k screw.

I feel like we're at 2004 DARPA Grand Challenge level, but we're nowhere near solving all of the issues required to run this on public streets. It's impressive, but it leaves an enormous amount to be desired.

I think we'll get there, but I don't think it'll be in just a few short years. The companies hyping that this accelerated timeline is just around the corner are doing so out of an existential need to keep the funding flowing.


Solving it with Prolog is neat, and a very realistic way of how LLMs with tools should be expected to handle this kind of thing.


I would've been very surprised if Prolog code to solve this wasn't something that the model had already ingested.

Early AI hype cycles, after all, are where Prolog, like Lisp, shone.



I'm certain models like o3-mini are capable of writing Prolog of this quality for puzzles they haven't seen before - it feels like a very straightforward conversion operation for them.


My comment got eaten by HN, but I think LLMs should be used as the glue between logic systems like Prolog, with inductive, deductive, and abductive reasoning being handed off to a tool. LLMs are great at pattern matching, but forcing them to reason seems like an out-of-envelope use.

Prolog would be how I would solve puzzles like that as well. It is like calling someone weak for using a spreadsheet or a calculator.

Abductive Commonsense Reasoning Exploiting Mutually Exclusive Explanations https://arxiv.org/abs/2305.14618


I actually coincidentally tried this yesterday on variants of the "surgeon can't operate on boy" puzzle. It didn't help, LLMs still can't reliably solve it.

(All current commercial LLMs are badly overfit on this puzzle, so if you try changing parts of it they'll get stuck and try to give the original answer in ways that don't make sense.)


What do you mean by you tried it?


Generated some Prolog programs and looked at them and they were wrong.

Specifically, it usually decides it knows what the answer is (and gets it wrong), then optimizes out the part of the program that does anything.


I've been saying this ever since GPT-3 came out and I started toying with it.

It's unfortunate that, of all the people who work in AI, most barely even know what Prolog is.


It seems quite logical to me as well. An LLM is not a logical computing system, but it has the knowledge of how to do multiplication.


I’ve used DeepSeek for verifying a couple of gnarly boolean conditions in Home Assistant with z3, and it did a good job, though it didn’t one-shot it.


I used a Knights and Knaves puzzle generator last month to test 4o and Claude 3.5, and both failed on novel puzzles.


Hey, I'm interested in the details of this. How many people were in the puzzle? Did it include nested statements, conditionals, and such?

If the puzzle generator is hosted anywhere, I'd love to have a look at it.


If the LLM’s user indicates that the input can and should be translated into a logic problem, and then the user runs that definition in an external Prolog solver, what’s the LLM really doing here? Probabilistically mapping a logic problem to Prolog? That’s not quite the LLM solving the problem.


Do you feel differently if it runs the prolog in a tool call?


Not the user you’re replying to, but I would feel differently if the LLM responded with “This is a problem I can’t reliably solve by myself, but there’s a logic programming system called Prolog for which I could write a suitable program that would. Do you have access to a Prolog interpreter, or could you give me access to one? I could also just output the Prolog program if you like.”

Furthermore, the LLM does know how Prolog’s unification algorithm works (in the sense that it can provide an explanation of how Prolog and the algorithm work), yet it isn’t able to follow that algorithm by itself like a human could (with pen and paper), even for simple Prolog programs whose execution would fit into the resource constraints.

This is part of the gap that I see to true human-level intelligence.


But the problem is solved. It depends on what you care about.


Psst, don't tell my clients that it's not actually me but the syntax of the language I use that's solving their problems.


So you asked an LLM to translate. It excels at translation. But ask it to solve, and it will, inevitably, fail. But that's also expected.

The interesting question is: Given a C compiler and the problem, could an LLM come up with something like Prolog on its own?


I think it could even solve it; these kinds of riddles are heavily represented in the training data.


Then what about new, unseen riddles that don't have a similar pattern to existing ones? That's the question people are asking.


If an LLM can solve a riddle of arbitrary complexity that is not similar to an already-solved riddle, have the LLM solve the riddle "how can this trained machine-learning model be adjusted to improve its riddle-solving abilities without regressing in any other meaningful capability".

It's apparent that this particular riddle is not presently solved by LLMs; if it were solved, humans would already be having LLMs improve themselves in the wild.

So, constructively, there exists at least one riddle that doesn't have a pattern similar to existing ones, where that riddle is unsolvable by any existing LLM.

If you present a SINGLE riddle an LLM can solve, people will reply that particular riddle isn't good enough. In order to succeed they need to solve all the riddles, including the one I presented above.


Unfortunately, that's a "could an omnipotent god create a boulder so heavy he can't move it" level of "logic puzzle" and it does your argument no favors.


It's quite the opposite. Put in your terms, the argument is "could a powerful but not omnipotent god make themself more powerful", and the answer is "probably".

If the god cannot grant themself powers they're not very powerful at all, are they?


Science is not in the proving of it.

It’s in the disproving of it, and in the finding of the terms that help others understand the limits.

I don't know why it took me so long to come to that sentence. Yes, everyone can trot out their core examples that reinforce the point.

The research is motivated by these examples in the first place.


Good point. LLMs can be treated as "theories," and then they definitely meet the falsifiability criterion [1], allowing researchers to keep finding "black swans" for years to come. The theories in this case can differ. But if the theory is that of a logical or symbolic solver, then Wolfram's Mathematica may struggle with understanding human language as input, but when it comes to evaluating the results, well, I think Stephen (Wolfram) can sleep soundly, at least for now.

[1] https://en.wikipedia.org/wiki/Falsifiability


I'd say it's not only LLMs that struggle with these kinds of problems; 99% of humans do too.


    solve (make me a sandwich)
Moravec's Paradox is still a thing.


Can it port sed to Java? I just tried to do that in chatgippity and it failed.


I found myself empathizing more with MrBeast and his kid audience than the grump who wrote this article.


Care to elaborate? It’s pretty reasonable to be emotional if you feel like your loved ones are being manipulated, especially if they’re children. The article isn’t so grumpy that it doesn’t deserve reasonable consideration from readers.


> When we use the $10B satellite that took 25 years to build, we must be secretive about why we point it at a certain part of the sky.

> Then we must gatekeep all data for up to a year, in order to better take personal credit for the findings.

> This will help accelerate scientific progress in the long run.

Statements dreamed up by the utterly deranged.


You're gonna want that TrueCoat though...


> not copying and pasting code

https://news.ycombinator.com/item?id=27710287


For cheap esp32-based smartwatches and wristbands, I've had good success with LILYGO and the esp-idf toolchain.

$18 T-Wristband: https://www.aliexpress.com/item/4000527495064.html

$26 T-Watch-2020: https://www.aliexpress.com/item/4000971508364.html

esp-idf: https://github.com/espressif/esp-idf


I use NewsBlur in Firefox and via the Android app, and I'm very happy with it.


+1 for Newsblur, also open source! https://github.com/samuelclay/NewsBlur

I dearly miss Google Reader but that's the best replacement (and even better now than Reader was)


I came to this post through NewsBlur, which does a decent job on sites that don't give a full feed.

Though I would want to try out a local reader that works offline, just because that would be nice.


I like popcount for converting a 2^N-bit uniformly-distributed random number into an N-bit binomially-distributed one. Each bit of the input simulates a random coin flip.
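
A minimal sketch of the trick in C, assuming the GCC/Clang popcount builtin (the function name is just illustrative):

  #include <stdint.h>
  #include <stdio.h>
  
  /* Each of the 64 bits of a uniform random word is one fair coin flip,
     so the popcount is a Binomial(n = 64, p = 1/2) sample. */
  static unsigned binomial_from_uniform(uint64_t uniform_bits) {
      return (unsigned)__builtin_popcountll(uniform_bits);
  }
  
  int main(void) {
      uint64_t r = 0x9e3779b97f4a7c15ULL;  /* stand-in for a real random draw */
      printf("%u heads out of 64 flips\n", binomial_from_uniform(r));
      return 0;
  }

Feed it successive words from whatever RNG you already have and each sample costs a single popcount instruction.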


You are wasting a lot of random bits this way, aren't you?


Not if you already have 2^n bits at hand. In fact, if you have 2^n bits of entropy, popcount is probably more efficient than generating n more bits randomly.


Sure, but generating random bits is fast with e.g. AES-NI, RdRand or a software implementation of e.g. ChaCha.

