I had a brainwave recently. I was tired and looking at two XML documents that looked identical to me, and I thought, hey, let's see what ChatGPT thinks. So I asked it to describe the differences between the two documents, and it immediately started talking about elements it had completely made up. Every time I asked it why it was doing that, it apologised but then doubled down on making even more stuff up. Eventually I asked it to show me its understanding of the two documents I wanted compared, and it showed me two completely unrelated XML documents.
I do not understand why people expect ChatGPT to reason when all it is is a fancy probabilistic language model…
I guess humans have a strong bias towards trusting confident-sounding language, despite what reason would tell us to do. That's how politicians and advertising work, anyway…
I just thought it would be interesting, given that it has some understanding of XML, to see if it could do a simple diff, "by eye" if you will. Obviously I wasn't intending to trust its output; we have long-standing, trusted tools for diffing files.
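For comparison, the "trusted tools" route is trivial; for example, a line-based diff with Python's standard difflib (the XML snippets and filenames below are made up for illustration, not the documents from this thread):

```python
import difflib

# Two small XML documents that differ in a single attribute value.
a = """<config>
  <server host="example.com" port="8080"/>
  <logging level="info"/>
</config>
""".splitlines(keepends=True)

b = """<config>
  <server host="example.com" port="9090"/>
  <logging level="info"/>
</config>
""".splitlines(keepends=True)

# unified_diff yields only the changed lines plus a little context,
# prefixed with -/+ markers, like `diff -u` on the command line.
diff = list(difflib.unified_diff(a, b, fromfile="a.xml", tofile="b.xml"))
print("".join(diff))
```

Deterministic, no hallucinated elements. (For structural rather than textual differences there are XML-aware diff tools, but for "do these two files differ" a plain text diff already settles it.)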
Right, but consider how it is 'evaluated' during training: it constantly sees examples where the relevant context has already fallen outside the window, yet the target completion answers confidently, so the model is trained to do the same.
I think this is very tricky to solve conceptually (since human authors don't have the same input event-horizon problem), but it can be (and has been) papered over by making the context window bigger.
Same here, and to be honest I'm still not sure what the clean solution would be at this level. With that particular HTML structure and class/id usage there maybe isn't one…
Draw.io does this: when you export a diagram as a PNG, there is an option to embed the source file in the PNG. If you subsequently open one of those PNGs in Draw.io, you can carry on editing it. I find it really handy.
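The mechanism behind this kind of feature is plain PNG metadata: the PNG format allows arbitrary `tEXt` chunks, so a tool can stash its source document alongside the image data and read it back later (I believe draw.io embeds its XML this way, though the exact chunk keyword and encoding are implementation details; the `"source"` keyword and the XML below are made up). A minimal sketch with only the standard library, writing a 1×1 PNG with an embedded text chunk and reading it back:

```python
import struct
import zlib

def png_chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialise one PNG chunk: big-endian length, type, data, CRC over type+data."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def build_png_with_text(keyword: bytes, text: bytes) -> bytes:
    """A minimal valid 1x1 grayscale PNG carrying a tEXt chunk."""
    sig = b"\x89PNG\r\n\x1a\n"
    # IHDR: width=1, height=1, bit depth 8, grayscale, default methods.
    ihdr = png_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    # tEXt payload is keyword, NUL separator, then the text itself.
    text_chunk = png_chunk(b"tEXt", keyword + b"\x00" + text)
    # IDAT: one scanline = filter byte 0 + one 8-bit pixel, zlib-compressed.
    idat = png_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    return sig + ihdr + text_chunk + idat + png_chunk(b"IEND", b"")

def read_text_chunks(png: bytes) -> dict:
    """Walk the chunk list and collect keyword -> text from tEXt chunks."""
    out, pos = {}, 8  # skip the 8-byte PNG signature
    while pos < len(png):
        length, = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, val = data.partition(b"\x00")
            out[key.decode()] = val.decode()
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return out

xml = "<mxfile><diagram>hello</diagram></mxfile>"  # stand-in source document
png = build_png_with_text(b"source", xml.encode())
print(read_text_chunks(png))
```

Ordinary image viewers ignore the extra chunk, which is why the file still works as a normal PNG everywhere else.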
This is my exact takeaway. I can't decide whether this article and many of the commenters are deliberately missing this point or whether it's actually not understood.