
Wasting time and having to be constantly vigilant is exhausting, and it's a slippery slope: it becomes easier to fall for deceptive content and to settle for "I don't know, it's probably close enough" instead of insisting on precision and accuracy.

Humans take a lot of shortcuts (such as more readily believing facts presented in a confident tone), and the "firehose of bs" exploits this. That was already the case before generative AI, but AI amplifies the industrial-scale imbalance between the time needed to generate partially incorrect content and the time and energy required to validate it.



Agreed that it is a slippery slope. Programming is understanding - like writing or teaching is understanding. To really understand something, we must construct it ourselves. We will be inclined to skip this step. This comment sums it up well:

> Salgat 8 days ago

> The problem with ML is that it's pattern recognition, it's an approximation. Code is absolute, it's logic that is interpreted very literally and very exactly. This is what makes it so dangerous for coding; it creates code that's convincing to humans but with deviations that allow for all sorts of bugs. And the worst part is, since you didn't write the code, you may not have the skills (or time) to figure out if those bugs exist

https://news.ycombinator.com/item?id=34140585


> To really understand something, we must construct it ourselves.

I think the real power of these bots will be to lead us down this path, as opposed to doing everything for us. We can ask one to justify and explain its solution, and it will do its best. If we're judicious with this, we can use it to build our own understanding and just trash the AI's output.


How is that worse than having to check every online post's date to estimate whether the solution is out of date? Or two StackOverflow results, where one is incorrectly marked as a duplicate and in the other the person posting the answer is convinced the question itself is wrong?

ChatGPT can completely cut out the online search and give an answer directly about things like compiler errors, and elaborate further on any detail in the answer. I think that 2-3 further GPT generations down the line it will be worth the time for some applications.

The problem I see is less the overall quality of responses than people overestimating where it can be used productively. But that will always be a problem with new tech; see Tesla drivers who regularly take a nap in the car because it hasn't crashed yet.


Unless the responses in those old online forums were intentionally malicious, they might be reasonably helpful even if they're not 100% correct.

ChatGPT, on the other hand, spews out complete nonsense most of the time, and the dangerous part is that the nonsense looks very reasonable. It gets very frustrating after a while: at first you're happy that it gave you a nice solution, and then it turns out not to be usable at all.


I'm a glass-half-empty sort of person: in my experience, even perfectly good answers for a different version can be problematic, and sometimes harmful.


Unless the training of ChatGPT has a mechanism to excise the influence of now out-of-date training input, it will become increasingly likely to give an outdated response as time goes by. Does its training have this capability?


Yes.

The trick is to use it as an LLM and not a procedural, transactional data set.

For instance, “how do I create a new thread in Python”. Then ask “how do I create a new thread in Python 3.8”. The answers will (probably) be different.
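
To make the difference concrete, here's roughly the divergence I'd expect (an illustrative sketch, not actual ChatGPT output): the generic question tends to get the classic threading.Thread answer, while the 3.8-specific one can reasonably steer you toward concurrent.futures, which has been available since 3.2.

    import threading
    from concurrent.futures import ThreadPoolExecutor

    def work(n):
        return n * n

    # "How do I create a new thread in Python" -- the classic answer
    t = threading.Thread(target=work, args=(5,))
    t.start()
    t.join()

    # "How do I create a new thread in Python 3.8" -- a pool from
    # concurrent.futures is the more idiomatic answer on 3.2+
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(pool.submit(work, 5).result())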

Any interface to chatgpt or similar can help users craft good prompts this way. It just takes thinking about the problem a little differently.

One wildly inefficient but illustrative approach is to use chatgpt itself to optimize the queries. For the Python threading example, I just asked it “ A user is asking a search engine ‘how do I create threads in Python’. What additional information will help ensure the results are most useful to the user?”.

The results:

> The user's current level of programming experience and knowledge of Python

> The specific version of Python being used

> The desired use case for the threads (e.g. parallel processing, concurrent execution)

> Any specific libraries or modules the user wants to use for thread creation

> The operating system the user is running on (as this may affect the availability of certain threading options)

So if you imagine something like Google autocomplete, but running this kind of optimization advice while the user builds their query, the AI can help guide the user to being specific enough to get the most relevant results.
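
Roughly, the loop I'm picturing looks like the sketch below; ask_llm is just a placeholder for whatever completion API you're calling, and the prompt wording is the one I used above.

    def refine_query(user_query, ask_llm):
        """Ask the model what extra detail would make this search query more useful."""
        prompt = (
            f"A user is asking a search engine '{user_query}'. "
            "What additional information will help ensure the results "
            "are most useful to the user?"
        )
        suggestions = ask_llm(prompt)
        # Surface these as autocomplete-style hints while the user types,
        # e.g. "Which Python version?", "Which OS?"
        return suggestions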


I understand this works well in many practical cases, but it seems to depend on a useful fraction of the training material making the version distinction explicit, which is particularly likely with Python questions since the advent of Python 3.

One concern I have goes like this: I seriously doubt that current LLMs are capable of anything that could really be called an understanding of the significance of the version number[1], but I would guess that a model characterizes the various Python-with-version strings it has seen as being close[2], so I can imagine it synthesizing an answer that is mostly built from facts about Python 2.7 (a concrete sketch of the kind of drift I mean is at the end of this comment). With a simple search engine you can go directly to checking the source of the reply, and dig deeper from there if necessary, but with an LLM that link is missing.

[1] The fact that it listed the version as being a factor in reply to your prompt does not establish that it does, as that can be explained simply by the frequency with which it has encountered sentences stating its importance.

[2] If only on account of the frequency with which they appear in similar sentences (though the whole issue might be complicated by how terms like 'Python3.8' are tokenized in the LLM's training input.)
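
Concretely, the kind of answer I'd worry about looks like the hypothetical snippet below. The individual facts are real (the thread module was renamed _thread in Python 3, and print became a function), but stitched together they only run on 2.7, and a 3.8 user might not immediately spot why.

    # Plausible-looking advice that has silently drifted to Python 2.7:
    import thread                          # renamed to _thread in Python 3

    def work(n):
        print "working on %d" % n          # print statement: SyntaxError on 3.x

    thread.start_new_thread(work, (5,))

    # What the Python 3.8 user actually needs:
    #   import threading
    #   threading.Thread(target=work, args=(5,)).start()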


It's all imperfect, for sure. For instance, see this old SO question [1], which does not specify a Python version. I pasted the text of the question and top answer into GPT-3 and prefaced it with the query "The following is programming advice. What is the language and version it is targeted at, and why?"

GPT-3's response:

> The language and version targeted here is Python 3, as indicated by the use of ThreadPoolExecutor from the concurrent.futures module. This is a module added in Python 3 and can be installed on earlier versions of Python via the backport in PyPi. The advice is tailored to Python 3 due to the use of this module.

That's imperfect, but I'm not trying to solve for Python specifically... just saying that the LLM itself holds the data a query engine needs to schematize a query correctly. We don't need ChatGPT to understand the significance of version numbers in some kind of sentient way; we just need it to surface that "for a question like X, here is the additional information you should specify to get a good answer". And THAT, I am pretty sure, it can do. No understanding required.

1. https://stackoverflow.com/questions/30812747/python-threadin...
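
Mechanically, the step I did by hand looks something like this; ask_llm is again just a placeholder for whatever completion call you use, and the prompt is the one quoted above.

    def detect_language_version(question, answer, ask_llm):
        """Paste a Q&A pair into the model and ask which language/version it targets."""
        prompt = (
            "The following is programming advice. What is the language and "
            "version it is targeted at, and why?\n\n"
            f"Question:\n{question}\n\nAnswer:\n{answer}"
        )
        return ask_llm(prompt)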


I don't think the issue is whether current LLMs have sufficient data, but whether they will be able to use it sufficiently well to make an improvement.

The question you posed GPT-3 here is a rather leading one, unlikely to be asked except by an entity knowing that the version makes a significant difference in this context, and I am wondering how you envisage this being integrated into Bing.

One way I can imagine: if the user's query specified a Python version, a response like the one GPT-3 gave here could be used to rank the candidate replies for relevance, rejecting a reply if the user asked about Python 2 and promoting it if Python 3 was asked for. Something like the sketch below.
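
Here classify_version stands in for a call like your "what language and version" prompt; these are hypothetical helpers, not anything Bing actually exposes.

    def rerank(candidates, requested_version, classify_version):
        """Promote replies whose detected Python version matches the user's query."""
        scored = []
        for doc in candidates:
            detected = classify_version(doc)      # e.g. "2", "3", or None
            if detected is None:
                score = 0                         # can't tell; leave it neutral
            elif detected == requested_version:
                score = 1                         # promote matching answers
            else:
                score = -1                        # demote mismatches
            scored.append((score, doc))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored]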

Another way I can imagine for Bing integration is that perhaps the LLM can be prompted with something like "what are the relevant issues in answering <this question> accurately?" in order to interact with the user to strengthen the query.

In either case, Bing's response to the user's query would be a link to some 3rd-party work rather than an answer created by the LLM, so that would answer my biggest concern over being able to check its veracity, though its usefulness would depend on the quality of the LLM's reply to its prompts.

On the other hand, the article says "Microsoft is betting that the more conversational and contextual replies to users’ queries will win over search users by supplying better-quality answers beyond links", apparently saying that they envision giving the user a response created by the LLM, which brings the question of verifiability back to center stage. Did you have some other form of Bing-LLM interaction in mind?


The problem I have with ChatGPT is that it doesn't give me any context to its answer or provide actual resources. Cite your darn sources already.



