> Having LLMs book flights by interacting with the DOM is sort of like having th...

jonplackett · 2025-08-26T21:42:46

It also surely leaves more room for prompt injection that the user can’t see.

mikepurvis · 2025-08-26T23:58:22

I had the same thought: really, an LLM should interact with a browser viewport and just use normal accessibility features, like tabbing between form fields and links.
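
For anyone wondering what "just tab through it" means mechanically, here is a minimal sketch of computing the order in which repeated Tab presses visit a page's elements. It follows the HTML rule that elements with a positive `tabindex` come first (ascending, ties in document order), then naturally focusable elements and `tabindex="0"` in document order, while negative values are skipped. The `Element` record and the flattened page list are hypothetical stand-ins, not a real browser API, and the focusability rules are simplified.

```python
from dataclasses import dataclass
from typing import Optional

# Tags that are focusable by default (simplified; real browsers have more rules).
NATURALLY_FOCUSABLE = {"input", "button", "select", "textarea"}


@dataclass
class Element:
    """Hypothetical flattened DOM node, listed in document order."""
    tag: str
    label: str
    tabindex: Optional[int] = None
    href: Optional[str] = None
    disabled: bool = False


def effective_tabindex(el: Element) -> Optional[int]:
    """Return the element's tab index, or None if Tab never lands on it."""
    if el.disabled:
        return None
    if el.tabindex is not None:
        return el.tabindex
    if el.tag in NATURALLY_FOCUSABLE or (el.tag == "a" and el.href):
        return 0  # focusable by default, treated like tabindex="0"
    return None


def tab_order(elements: list[Element]) -> list[str]:
    """Labels in the order repeated Tab presses would visit them."""
    indexed = [(effective_tabindex(el), pos, el) for pos, el in enumerate(elements)]
    # Missing or negative tab indices are skipped by sequential navigation.
    positive = [t for t in indexed if t[0] is not None and t[0] > 0]
    zero = [t for t in indexed if t[0] == 0]  # already in document order
    # Positive values first (ascending, ties broken by document order).
    positive.sort(key=lambda t: (t[0], t[1]))
    return [el.label for _, _, el in positive + zero]


# Hypothetical search form on a flight-booking page.
page = [
    Element("a", "Logo", href="/"),
    Element("input", "From"),
    Element("input", "To", tabindex=2),
    Element("button", "Search", tabindex=1),
    Element("div", "Ad banner", tabindex=-1),
    Element("input", "Date", disabled=True),
]
print(tab_order(page))  # ['Search', 'To', 'Logo', 'From']
```

An agent driving the page this way only ever "sees" what keyboard users see, which is part of the appeal: elements hidden from the focus order never become actionable.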

Basically, the LLM sees the viewport as a thumbnail image and goes, “That looks like the central text, read that,” and then some underlying skill implementation selects and returns the text content from that part of the viewport.
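
A rough sketch of that two-step split, with every name hypothetical: the model picks a box on a downscaled thumbnail, and a helper scales the box back to full viewport pixels and returns the text of elements whose bounding boxes lie mostly inside it. The `TextNode` list stands in for whatever a browser-automation layer would report about visible text and its on-screen position; no real API is assumed, and the page contents are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixels; (x, y) is the top-left corner."""
    x: float
    y: float
    w: float
    h: float

    def area(self) -> float:
        return max(self.w, 0.0) * max(self.h, 0.0)

    def intersection(self, other: "Box") -> float:
        """Area of overlap between two boxes (0 if they do not overlap)."""
        left = max(self.x, other.x)
        top = max(self.y, other.y)
        right = min(self.x + self.w, other.x + other.w)
        bottom = min(self.y + self.h, other.y + other.h)
        return max(right - left, 0.0) * max(bottom - top, 0.0)


@dataclass
class TextNode:
    """Hypothetical visible text run with its on-screen bounding box."""
    text: str
    box: Box


def scale_to_viewport(region: Box, thumb_w: int, viewport_w: int) -> Box:
    """Map a box chosen on the thumbnail back to viewport pixels.

    Assumes the thumbnail keeps the viewport's aspect ratio."""
    s = viewport_w / thumb_w
    return Box(region.x * s, region.y * s, region.w * s, region.h * s)


def read_region(nodes: list[TextNode], region: Box, min_overlap: float = 0.5) -> str:
    """Text of nodes lying mostly inside `region`, read top-to-bottom, left-to-right."""
    hits = [
        n for n in nodes
        if n.box.area() > 0 and n.box.intersection(region) / n.box.area() >= min_overlap
    ]
    hits.sort(key=lambda n: (n.box.y, n.box.x))
    return "\n".join(n.text for n in hits)


# Made-up layout of a 1280-px-wide viewport.
nodes = [
    TextNode("Cheap flights!", Box(0, 0, 1280, 80)),         # header banner
    TextNode("Departure: 09:15 LHR", Box(320, 200, 400, 30)),
    TextNode("Arrival: 12:40 JFK", Box(320, 240, 400, 30)),
    TextNode("Sponsored", Box(1000, 220, 200, 30)),          # sidebar
]
# The model pointed at the central column on a 320-px-wide thumbnail.
region = scale_to_viewport(Box(75, 45, 110, 25), thumb_w=320, viewport_w=1280)
print(read_region(nodes, region))  # prints only the Departure and Arrival lines
```

Measuring overlap as a fraction of each node's own area (rather than intersection-over-union) means a short label sitting fully inside a large selection still counts; the 0.5 threshold is an arbitrary choice.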