ChatGPT fabricates a lot of stuff and can be deceptive for common queries, but for programming-related output it's easily verifiable, and it delivers as an extremely valuable search tool. I can easily ask ChatGPT to explain things like eBPF details without wasting time digging through the manuals. I hope Bing dominates Google and Stack Overflow in this.
It's easily verifiable, but it may still waste time. I've had many cases where ChatGPT makes up functions that do exactly what I need, only to find out those functions don't actually exist. This may not happen very often for super popular languages like Python or JavaScript, where the training data is huge, but it happens all the time for the long tail of languages. In those cases it would've been faster for me to do a regular search.
I do agree with the overall point though. If you understand when to use it and when it's more likely to give you nonsensical answers, it can save a huge amount of time. But when I ask it about a topic I don't know enough about to immediately verify the answer myself, I'm forced to double-check the answers for validity, which kind of defeats the purpose.
The best queries to ChatGPT are cases where I know what the answer should look like, I've just forgotten the syntax or some details. Bash scripts and Kubernetes manifests are good examples: I know them, I just keep forgetting the keywords because I only touch them every few weeks.
And don't get me started on asking ChatGPT about more general topics in e.g. economics or finance. What you get is a well-written summary of popular news and reddit opinions, which is dangerous if it's presented as "the truth". The big mistake here is that the training procedure assumes the amount of data correlates with correctness, which isn't true for many topics involving politics or similar incentives, where people and news outlets spread whatever conveniently benefits them and gets clicks.
Wasting time and having to be constantly vigilant is exhausting, and it's a slippery slope that makes it easier to fall for deceptive content and to settle for "I don't know, it's probably close enough" instead of insisting on precision and accuracy.
Humans take a lot of shortcuts (such as more readily believing facts presented in a confident tone) and the "firehose of bs" exploits this. It was already the case before generative AI, but AI amplifies, at industrial scale, the imbalance between the time needed to generate partially incorrect data and the time/energy required to validate it.
Agreed that it is a slippery slope. Programming is understanding - like writing or teaching is understanding. To really understand something, we must construct it ourselves. We will be inclined to skip this step. This comment sums it up well:
> Salgat 8 days ago
> The problem with ML is that it's pattern recognition, it's an approximation. Code is absolute, it's logic that is interpreted very literally and very exactly. This is what makes it so dangerous for coding; it creates code that's convincing to humans but with deviations that allow for all sorts of bugs. And the worst part is, since you didn't write the code, you may not have the skills (or time) to figure out if those bugs exist
> To really understand something, we must construct it ourselves.
I think the real power of these bots will be to lead us down this path, as opposed to it doing everything for us. We can ask it to justify and explain its solution and it will do its best. If we're judicious with this we can use it to build our own understanding and just trash the AI's output.
How is that worse than having to look at every online post's date to estimate whether the solution is out of date? Or two StackOverflow results where one is incorrectly marked as a duplicate and, in the other, the person posting the answer is convinced the question is wrong?
ChatGPT can completely cut out the online search and give an answer directly about things like compiler errors, and elaborate further on any detail in the answer. I think that 2-3 further GPT generations down the line it will be worth the time for some applications.
The problem I see is less the overall quality of the responses than people overestimating where it can be used productively. But that will always be a problem with new tech; see Tesla drivers who regularly take a nap in the car because it hasn't crashed yet.
Unless the responses in those old online forums were intentionally malicious, they might be reasonably helpful even if not 100% correct.
ChatGPT, on the other hand, spews out complete nonsense most of the time, and the dangerous part is that the nonsense looks very reasonable. It gets very frustrating after a while, because at first you're always happy that it gave you a nice solution, but then it turns out not to be usable at all.
I'm a glass-half-empty sort of person: in my experience, even perfectly good answers for a different version can be problematic, and sometimes harmful.
Unless the training of ChatGPT has a mechanism to excise the influence of now out-of-date training input, it will become increasingly likely to give an outdated response as time goes by. Does its training have this capability?
The trick is to use it as an LLM and not a procedural, transactional data set.
For instance, “how do I create a new thread in Python”. Then ask “how do I create a new thread in Python 3.8”. The answers will (probably) be different.
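Just as an illustration, here's roughly what either prompt tends to surface, as a minimal sketch using only the standard library (the version distinction mostly matters for the newer concurrent.futures API):

    # Rough sketch of the kind of answer either prompt surfaces (standard library only).
    import threading
    from concurrent.futures import ThreadPoolExecutor

    def work(n):
        return n * n

    # Plain threading.Thread: available in both Python 2 and 3.
    t = threading.Thread(target=work, args=(3,))
    t.start()
    t.join()

    # ThreadPoolExecutor: in the standard library since Python 3.2, so fine for 3.8.
    with ThreadPoolExecutor(max_workers=4) as pool:
        squares = list(pool.map(work, range(10)))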
Any interface to chatgpt or similar can help users craft good prompts this way. It just takes thinking about the problem a little differently.
One wildly inefficient but illustrative approach is to use chatgpt itself to optimize the queries. For the Python threading example, I just asked it “ A user is asking a search engine ‘how do I create threads in Python’. What additional information will help ensure the results are most useful to the user?”.
The results:
> The user's current level of programming experience and knowledge of Python
> The specific version of Python being used
> The desired use case for the threads (e.g. parallel processing, concurrent execution)
> Any specific libraries or modules the user wants to use for thread creation
> The operating system the user is running on (as this may affect the availability of certain threading options)
So if you imagine something like Google autocomplete, but running this kind of optimization advice while the user builds their query, the AI can help guide the user to being specific enough to get the most relevant results.
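To make that concrete, here's a rough sketch of the "query coach" idea, assuming the legacy OpenAI Completions endpoint and the text-davinci-003 model (the function name and prompt wording are just mine, not anything Bing actually does):

    import openai

    def clarifying_questions(user_query):
        # Ask the model what extra context would make the search more useful.
        prompt = (
            "A user is asking a search engine: '" + user_query + "'. "
            "What additional information will help ensure the results are most useful to the user?"
        )
        resp = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt,
            max_tokens=150,
            temperature=0,
        )
        return resp["choices"][0]["text"].strip()

    print(clarifying_questions("how do I create threads in Python"))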
I understand this works well in many practical cases, but it seems to depend on a useful fraction of the training material making the version distinction explicit, which is particularly likely with Python questions since the advent of Python 3.
One concern I have goes like this: I seriously doubt that current LLMs are capable of anything that could really be called an understanding of the significance of the version number[1], but I would guess that the model characterizes the various Python-with-version strings it has seen as being close[2], so I can imagine it synthesizing an answer that is mostly built from facts about Python 2.7. With a simple search engine, you can go directly to checking the source of the reply, and dig deeper from there if necessary, but with an LLM, that link is missing.
[1] The fact that it listed the version as being a factor in reply to your prompt does not establish that it does, as that can be explained simply by the frequency with which it has encountered sentences stating its importance.
[2] If only on account of the frequency with which they appear in similar sentences (though the whole issue might be complicated by how terms like 'Python3.8' are tokenized in the LLM's training input.)
It's all imperfect, for sure. For instance, see this old SO question [1], which does not specify a Python version. I pasted the text of the question and top answer into GPT-3 and prefaced it with the query "The following is programming advice. What is the language and version it is targeted at, and why?"
GPT-3's response:
> The language and version targeted here is Python 3, as indicated by the use of ThreadPoolExecutor from the concurrent.futures module. This is a module added in Python 3 and can be installed on earlier versions of Python via the backport in PyPi. The advice is tailored to Python 3 due to the use of this module.
That's imperfect, but I'm not trying to solve for Python specifically... just saying that the LLM itself holds the data a query engine needs to schematize a query correctly. We don't need ChatGPT to understand the significance of version numbers in some kind of sentient way, we just need it to surface that "for a question like X, here is the additional information you should specify to get a good answer". And THAT, I am pretty sure, it can do. No understanding required.
I don't think the issue is whether current LLMs have sufficient data, but whether they will be able to use it sufficiently well to make an improvement.
The question you posed GPT-3 here is a rather leading one, unlikely to be asked except by an entity knowing that the version makes a significant difference in this context, and I am wondering how you envisage this being integrated into Bing.
One way I can imagine is that if the user's query specified a python version, a response like that given by GPT-3 in this case might be used in ranking the candidate replies for relevance: reject it if the user asked about python 2, promote it if python 3 was asked for.
Another way I can imagine for Bing integration is that perhaps the LLM can be prompted with something like "what are the relevant issues in answering <this question> accurately?" in order to interact with the user to strengthen the query.
In either case, Bing's response to the user's query would be a link to some 3rd-party work rather than an answer created by the LLM, so that would answer my biggest concern over being able to check its veracity, though its usefulness would depend on the quality of the LLM's reply to its prompts.
On the other hand, the article says "Microsoft is betting that the more conversational and contextual replies to users’ queries will win over search users by supplying better-quality answers beyond links", apparently saying that they envision giving the user a response created by the LLM, which brings the question of verifiability back to center stage. Did you have some other form of Bing-LLM interaction in mind?
I am foreseeing a future in which programming language designers mine the most sought-after functions in Google/Bing/ChatGPT queries and then implement the ones that do not yet exist, because apparently there is a real need for them.
Yes, I had the same thought. LLMs might be instrumental in new language design. If you can identify the most common structures being asked for, it makes sense to build libraries, macros, or language features around them.
I agree. ChatGPT is really really bad. It just makes up stuff and wraps its fabrications in an air of authority.
A "bullshit sandwich" if you will.
When you tell people this, you get the reply "but so do random blogs! or reddit comments!". Well yes, but they're just random blogs and reddit comments, often peppered with syntactic and spelling mistakes, non sequiturs, and other absurdities. Nobody would take them seriously.
ChatGPT is very different. It doesn't say "this random redditor says this, and this other random redditor says the exact opposite, so IDK, I'm just a machine, please make up your mind".
What it says is "this is the absolute truth that I, a 'large language model', have been able to extract from the vast amount of information I have been trained on. You can rely on it with confidence."
I'm sorry to sound hyperbolic but this cannot end well.
I like bouncing my code problems off ChatGPT: it can give me an answer, and I don't feel bad if I forgot something simple. The issue is that I've had it give me completely wrong code, only for it to say "I'm sorry" and provide a second incorrect response.
ChatGPT doesn't say anything of the sort. In fact, it will vehemently insist that what it says is not necessarily true or accurate if you challenge it.
I'm sorry but this is demonstrably false. I have posted examples of this on HN before. Yes, if you tell ChatGPT that it's wrong, in some cases it says "I'm sorry" and tries again (and produces some other random guess). But if you ask it "are you sure?" it invariably affirms that yes, it's sure and it's in the right.
Hm, you're right. I'm pretty sure that it wasn't so gung-ho when I played with it earlier, but now even very explicit instructions along the lines of "you should only answer "yes" if it is absolutely certain that this is the correct answer" still give this response. Ditto for prompts like "is it possible that your answer was incorrect?"
Using a purpose built (or trained I guess) model for code generation would likely have better results. GitHub copilot is useful for this reason. I find ChatGPT for code is mainly useful if you want to instruct it in natural language to make subsequent changes to the output.
If you ask, there's a good chance ChatGPT can create that function for you. Just tell it: "That function `xyz()` doesn't exist in the library, can you write it for me?"
I had a lot of fun with ChatGPT’s wholly fabricated but entirely legitimate-sounding descriptions of different Emacs packages (and their quite detailed elisp configuration options) for integrated cloud storage, none of which exist.
I’m not sure that fabricated nonsense would actually make Bing’s results any worse than they are today.
“It’s okay I don’t mind verifying all these answers myself” is an odd sort of sentiment, and also inevitably going to prove untrue in one sense or another.
If it generated the code, I would have to audit that code for correctness/safety/etc.
Or, more likely, I would just lazily assume everything is fine and use it anyway, until one day the unexamined flaws destroyed something costly in a manner difficult to diagnose because I didn't bother to actually understand what it was doing.
There really should be more horror at the imminent brief and temporary stint of humans as editors, code reviewers, whatever, over generative AI mechanisms (temporary because that will be either automated or rendered moot next). I'm unaware of any functional human societies that have actually reached the "no one actually has to work unless they want to do so, because technology" state, so this is an interesting transition, for sure.
> Or, more likely, I would just lazily assume everything is fine and use it anyway, until one day the unexamined flaws destroyed something costly in a manner difficult to diagnose because I didn't bother to actually understand what it was doing.
Well yeah, I'm right there with you. But that feels a lot like any software, open or closed source. Human programmers on average are better than AI programming today, but human programmers aren't improving as fast as AI is. Ten years from now, AI code will be able to destroy your data in far more unpredictable and baroque ways than some recent CS grad.
> I'm unaware of any functional human societies that have actually reached the "no one actually has to work unless they want to do so, because technology" state, so this is an interesting transition, for sure.
This is a really interesting thought. Are we seeing work evaporate, or just move up the stack? Is it still work if everyone is just issuing natural language instructions to AI? I think so, assuming you need the AI's output in order to get a paycheck which you need to live.
Then again, as a very long time product manager, I'm relatively unfazed by the current state of AI. The hundreds of requirements docs I've written over decades of work were all just prompt engineering for human developers. The exact mechanism for converting requirements to product is an implementation detail ;)
It does such a good job at giving answers that sound right, and are almost correct.
I could imagine losing many hours from a ChatGPT answer. And if you have to go through the trouble to verify everything it says to make sure it's not just making crap up, then imo it loses much value as a tool.
It shows how form matters more than substance. Say real information in some poor structure and people will think you're wrong.
Say incorrect stuff authoritatively and people will think you're right.
It happens to me all the time. I can't structure accurate information in a better way than some bullshit artist can spit off whatever they imagine to be real, so everyone walks away believing their haughty nonsense.
ChatGPT exploits that phenomenon, which is why it sounds like some overly confident, oblivious dumb-dumb all the time. That's the training set.
Almost once a week I'll go through a reddit thread and find someone deep in the negatives who has clearly done their homework and is extraordinarily more informed than anyone else but the problem is everyone else commenting is probably either drunk or a teenager or both so it doesn't matter.
Stuff is hard and people are mostly wrong. That's why PhDs take years and bars for important things are set so high
But so do people: I spent an hour yesterday trying regexps that multiple people on Stackoverflow confirmed would definitely do what I needed, and guess what? They did not do what I needed.
Same with copilot. Sometimes it's ludicrously wrong in ways that sound good. I still have to do my job and make sure they are right. But it's right or right enough to save me significant effort at least 75% of the time. Right enough to at least point me in the right direction or inspire me at least 90% of the time.
Self Reply: I just now thought to use Copilot to get my regex and wow! I described it in a comment and it printed me one that was only two characters off, and now I have what I needed yesterday. I'd since solved the problem without a regex.
It's not perfect, but sometimes it's amazing. In your case, not only did it provide the right solution, but it was about as fast as theoretically possible. About as fast as if you already knew the answer.
I had a similar experience with a shell command. Searched Google, looked at a few posts; they weren't exactly what I needed but close. Modified it a few times and got it working. Went to save the command in a markdown file, and when I explained what the command did, Copilot made a suggestion for it. It was correct and also much simpler.
It went from taking 5-10 minutes to stumble through something just so I could do the thing I really wanted to do, to finding the answer instantly all from within the IDE. Can keep you in flow.
They released a zero-day for a security hole in the human brain. That's what ChatGPT is. The security hole is well known and well described; perhaps the most understandable treatment is the book Thinking, Fast and Slow. If I try to explain it I will surely botch it, but perhaps put it this way: things that appear more credible will be deemed credible because of the "fast" processes in our brains.
In this particular case, ChatGPT will write something nonsensical which people will accept more easily because of the way it is written. This is inevitable and extremely dangerous.
> Humans are still a lot better at writing something nonsensical that people will accept easily because of the way it's written.
Some are but not many. And then there's the amount. That's the crux of the matter. Have you seen that Aza Raskin interview where he posited one could ask the AI to write a thousand papers citing previous research against vaccines and then another thousand pro-vaccines? No human can do that.
> People are just as good at making up convincing sounding nonsense.
Perhaps as you just did, as I can find no one actually "injecting themselves with bleach."
The overall point stands: the difference between reading something dumb and doing that dumb thing is what it means to have agency. I personally don't think we should optimize the world 100% to prevent people who read something stupid from doing that stupid thing.
Or, if that's the path we're going to take, maybe we should first target things like the show Ridiculousness before we start talking about AI. After all, someone might do something dumb they see on TV!
People have absolutely injected themselves with what's known as "Miracle Mineral Solution", which is essentially bleach. It's more frequently drunk, of course.
I dunno, verifying and adjusting an otherwise complete answer is a lot more rote than originating what that answer would be, and I think that has value.
>It does such a good job at giving answers that sound right, and are almost correct.
For sure. But you have to compare against alternatives.
What would that be? Posting to stack overflow and maybe getting a helpful reply within 48 hours.
> I could imagine losing many hours from a ChatGPT answer.
Don't trust it. Verify it.
We expect to ask a question and get a good answer. In reality we should leverage how cheap the answers are.
Knowing little about how ChatGPT actually works, is there perhaps a variable that could be exposed, something that would represent the model's confidence in the solution provided?
I'd say you can't do that, because ChatGPT has no internal model for how the things it is explaining work; so there can't be any measure of closeness to the topic described, as would be the case for classification AIs.
ChatGPT models are language models; they represent closeness between text utterances. It works by looking for the chains of words most similar or usually connected to those indicated in the prompt, with no understanding of what those words mean.
As a metaphor, think of an intern who every morning is asked to buy all the newspapers in paper form, cut out the news sentence by sentence, and put all the pieces of paper in piles grouped according to the words they contain.
Then the director asks for a news item on the increase in interest rates. The intern goes to the pile where all the snippets about interest rates are placed, randomly grabs a bunch of them, and writes a piece by linking the fragments together.
The intern has a PhD in English, so it is easy for them to adjust the wording to ensure consistency; and the topics more talked about will appear more often in the snippets, so the ones chosen are more likely to deal with popular issues. Yet the ideas expressed are a collection of concepts that might have made sense in their original context, but have been decontextualized and put together pell-mell, so there's no guarantee that they're saying anything useful.
> ChatGPT models are language models; they represent closeness between text utterances. It works by looking for the chains of words most similar or usually connected to those indicated in the prompt, with no understanding of what those words mean.
No, it does not work that way. That’s how base GPT3 works. ChatGPT works via RLHF and so we don’t “know” how it decides to answer queries. That’s kind of the problem.
I don't think so. It doesn't understand what it says, it basically does interpolation between text it copy-pastes in a very impressive manner. Still it does not "understand" anything, so it cannot have any kind of confidence.
Take Stable Diffusion for instance: it can interpolate a painting from that huge dataset it has, and sometimes output a decent result that may look like what a good artist would do. But it doesn't have any kind of "creative process". If it tells you "I chose this theme because it reflects this deep societal problem", it will just be pretending.
It may not matter if all you want is a nice drawing, but when it's about, say, engineering, that's quite different.
It's not available for ChatGPT but the other GPT models can expose the probability for each generated token, which can serve as a proxy for confidence.
By tuning the temperature and top_p parameters you can also make the model avoid low-probability completions (useful for less creative use cases where you need exact answers).
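For example, a minimal sketch of reading those per-token probabilities, assuming the legacy OpenAI Completions endpoint with text-davinci-003 (again, not something ChatGPT itself exposes):

    import math
    import openai

    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt="The tallest mountain on Earth is",
        max_tokens=5,
        temperature=0,   # prefer the highest-probability completion
        top_p=1,
        logprobs=1,      # return the log-probability of each generated token
    )

    choice = resp["choices"][0]
    for token, lp in zip(choice["logprobs"]["tokens"],
                         choice["logprobs"]["token_logprobs"]):
        print(repr(token), "p ~", round(math.exp(lp), 2))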
> It's not available for ChatGPT but the other GPT models can expose the probability for each generated token, which can serve as a proxy for confidence.
A proxy for confidence in what exactly?
Language models represent closeness of words, so a high probability would only express that those words are put together frequently in the corpus of text; not that their meanings are at all relevant to the problem at hand. Am I wrong?
In cases where you ask GPT-3 questions that have a clear correct answer, I think you can use the probability to judge how correct the answer is. For example, when asking "How tall is Mount Everest?" I would want the completion "Mount Everest is ____ meters above sea level." to have a very high probability for the ____ tokens.
This is because I'm operating under the assumption that sequences of words that appear often in the training set are more likely to represent something correct (otherwise you might as well train on random words). This only holds if the training set is big enough that you can estimate correctly (e.g. if the training set is small a very rare/wrong phrase may appear very often).
Maybe confidence was the wrong word, but for this kind of questions I would trust a high-probability answer way more than a low one. For questions belonging to very specific subjects, where training material is scarce, the model might have very skewed probabilities so they become less useful.
> In cases where you ask GPT-3 questions that have a clear correct answer, I think you can use the probability to judge how correct the answer is. For example, when asking "How tall is Mount Everest?" I would want the completion "Mount Everest is ____ meters above sea level." to have a very high probability for the ____ tokens.
Maybe, as long as you're aware that this is the same kind of correctness that you get from looking at Google's first search results (the old kind of organic pages, not the "knowledge graph", which uses a different process, precisely to avoid being spammed by SEO), i.e. "correctness by popularity".
This means that the content that is more replicated will be considered more true by the system, regardless of its connection to reality or its coherence with the rest of the knowledge in the system. And you know what they say about big enough lies that you keep repeating millions of times.
I agree, and furthermore, a search engine is constrained to pick its responses from what's already out there.
This line of thought is a distraction, anyway. The likelihood that GPT-3 will do as well as a search engine on topics where there is an unambiguous and well-known answer does little to address the more general concern.
> This means that the content that is more replicated will be considered more true by the system, regardless of its connection to reality or its coherence with the rest of the knowledge in the system.
I understand the problem, but what better way do we currently have to measure its connection to reality? At least from a practical point of view it seems that LLMs have achieved way better performance than other methods in this regard, so repeatedness doesn't look like that bad a metric. Or rather, it's the best I think we currently have.
> I understand the problem, but what better way do we currently have to measure its connection to reality?
We can consider its responses to a broader range of questions than those having an unambiguous and well-known answer. Its propensity for making up 'facts', and for fabricating 'explanations' that are incoherent or even self-contradictory shows that any apparent understanding of the world being represented in the text is illusory.
This resonates with me. We have all worked with someone who is a superb bullshitter, 100% confident in their responses, yet they are completely wrong. Only now, we have codified that person into chatGPT.
I doubt it. Even if it was trained with 100% accurate information chatGPT would still prefer an incorrect decisive answer to admitting it doesn't know.
SEO-optimized sites can also be identified and avoided. There are various indicators of the quality of a site, to the point where I'm positive most people on HN know to stay away or bail from one of those sites without even being consciously aware of what gave them that sense of SEO.
General Purpose Bullshitting Technology. I've always found LLMs most useful as assistants when working on things I'm already familiar with, or as don't-trust-always-verify high temperature creatives. I think that attempts to sanitize their outputs to be super safe and "reliable sources" will trend public models towards blandness.
Add documentation to this method:
[paste a method in any language]
For me the results have been impressive. It's even more impressive if you are not a native English speaker, because it explains what the code does and also translates your domain terms into your own language.
More than code generation I see a really concrete application in having autogenerated and up to date documentation of public methods. It could be generated directly in your code or only by your IDE to help you in absence of human written documentation.
Another interesting thing it can do is basic code review: proposing "better" code and explaining what it changed and why.
It can also try to rewrite a given code in another language. I haven’t tried a lot of things due to the limitations in response size but for what I tested, it looks like it is able to convert the bulk of the work.
While I’m not really convinced by code generation itself (a la copilot) I truly think that GPT can be a really powerful tool for IDE editors if used cleverly, especially to add meaning to unclear, decade old codebases from which original contributors are long gone.
And knowing that the hard part is not writing code but reading it, I see GPT being a lot more useful here than for writing 10 lines in a keystroke.
> A common advice for documentation is "why not how", I'm not sure you can do "why" by looking at the "how".
You are right. That's the rule when you write the docs yourself.
But when you are left alone in an unknown codebase, having your IDE summarize the "what" in the autocompletion popup could be really useful. Especially in codebases with poor naming conventions.
> A common advice for documentation is "why not how", I'm not sure you can do "why" by looking at the "how".
The "why" is important for inline comments, but for function and method comments I think the biggest is neither "why" nor "how", but "what". As in, "what does this method do?" especially with regards to edge cases.
I tried a few methods just now; it gives okay-ish docs. Lots of people don't write great comments in the first place, so it's about on par. Sometimes it got some of those edge cases wrong though; e.g. a "listFiles()" which filters out directories and links isn't documented as such, but then again, many people wouldn't document it properly either.
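For instance, a hypothetical Python stand-in for that listFiles() example, with the "what" (including the edge case) spelled out:

    import os

    def list_files(path):
        """Return the names of regular files directly inside *path*.

        Directories and symbolic links are filtered out; the listing is not
        recursive and the order is arbitrary.
        """
        return [
            name for name in os.listdir(path)
            if os.path.isfile(os.path.join(path, name))
            and not os.path.islink(os.path.join(path, name))
        ]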
Maybe it's better in some programming languages, but my experience with Verilog/SystemVerilog output is that it generates a design with flaws almost every time (but very confidently). If you try to correct it with prompting, it comes up with reasonable-sounding responses about what it's fixing and then just creates more wild examples.
One pretty consistent way to see this is to ask for various very simple designs like an n-bit adder: it will almost always do something logically or syntactically incorrect with the carry in or carry out.
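For what it's worth, here's the carry chain it keeps getting wrong, as a tiny Python reference model (illustration only, obviously not the Verilog itself):

    def ripple_carry_add(a_bits, b_bits, carry_in=0):
        """a_bits and b_bits are LSB-first lists of 0/1; returns (sum_bits, carry_out)."""
        sum_bits, carry = [], carry_in
        for a, b in zip(a_bits, b_bits):
            sum_bits.append(a ^ b ^ carry)                  # full-adder sum
            carry = (a & b) | (a & carry) | (b & carry)     # full-adder carry
        return sum_bits, carry

    # 3 + 7 on a 4-bit adder: expect 10 with no carry-out.
    print(ripple_carry_add([1, 1, 0, 0], [1, 1, 1, 0]))     # ([0, 1, 0, 1], 0)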
ChatGPT has acted as an advanced rubber duck for me. It outputs a lot of bullshit but so often it gives me the prompt or way of thinking needed to move on.
And it’s so much faster than posting on stack overflow or some irc. It doesn’t abuse you for asking dumb questions either.
When it works it is great. I've been using it instead of Google a lot too, but when it makes mistakes it takes someone familiar with the subject to detect them. I'm not sure it is ready to be used as a search engine by everyone.
For example, recently I asked it for the best way to search in an mbox file on Arch Linux. It proceeded to recommend a number of tools, including mboxgrep. When I asked how to install it on Arch, it gave me a standard response using the package manager, but mboxgrep is not an Arch package. It isn't even an AUR package. It requires fetching the source and building it yourself (if I remember correctly, one has to use an older version of gcc too). None of that was mentioned by ChatGPT.
This is not the first time, BTW; another time it recommended a piece of software that Debian doesn't know about.
The key is that it is way faster and has a broader set of knowledge than a human. Being an editor is often easier and more productive than being both a single generator and editor
ChatGPT can play an interesting role by separating duties in a productive process. It can generate tons of suggestions, true or false, very quickly and in a form humans can understand. Sometimes this helps a lot.
The downside is the risk of atrophying one's own mental ability to generate such suggestions if excessively relied upon. Given my druthers, I'd prefer to be a generator of text ChatGPT would want to absorb than to be a consumer of the mystery meat it is regurgitating.
I tested chatgpt with some domain specific stuff and found it so wrong on the fundamentals that I immediately lost trust in any of its output for learning. I would not trust it to explain anything eBPF related reliably. You are more likely to get something that is extremely wrong or, worse, subtly wrong.
I found ChatGPT's answers relatively accurate for explaining programming-related queries, feeding it documentation and asking questions related to that, etc. But I've also tried to use it for travel and health related queries. For travel queries, it confidently tells me the wrong information: "Do most restaurants in Chiang Mai accept credit cards?" got "Yes, most restaurants in Chiang Mai accept credit cards!", which is completely false. I also got wildly inaccurate information about the quality of drinking water. And for health related queries, it tells me the same weasel-worded BS that I get on health spam blogs. I tried to dig out more information about the sources of both the travel and health related information, but ChatGPT simply said it doesn't know the details of its sources.
I think a new implementation of ChatGPT is worth exploring though, one that cites sources and gives links to further information, and that has some ability to validate its responses for accuracy.
It doesn’t consistently know words have individual letters since it’s trained using byte pair encodings. This is one reason earlier versions of it couldn’t generate rhymes.
When ChatGPT serves me broken code, I paste the errors back in and ChatGPT tries to make corrections. I don't see why ChatGPT couldn't do that itself with the right compiler, saving me from being a copy-and-paste clerk.
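Something like this loop, say, where ask_model() is a placeholder for whatever generates or repairs the code (hypothetical, just to show the shape of it):

    import os
    import subprocess
    import tempfile

    def try_compile(source):
        # Compile a C snippet with gcc and return (ok, compiler output).
        with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
            f.write(source)
            path = f.name
        try:
            proc = subprocess.run(["gcc", "-c", path, "-o", os.devnull],
                                  capture_output=True, text=True)
            return proc.returncode == 0, proc.stderr
        finally:
            os.remove(path)

    source = ask_model("Write a C function that ...")    # hypothetical helper, not a real API
    for _ in range(3):                                    # a few repair rounds
        ok, errors = try_compile(source)
        if ok:
            break
        source = ask_model("This fails to compile:\n" + errors + "\nPlease fix it.")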
I think we should let this C era meme die, the manuals are often terrible. I'm currently working with the AWS SDK Python documentation and it's a hot pile of garbage from all points of view (UX, info architecture, technical detail, etc.).
Python lang docs are "kind-of-OK" but when someone raves about them I'm left scratching my head. Information is not always well-organized, examples are hit-and-miss, parameter and return types not always clear, etc.
Referencing docs as a programmer is generally a nightmare and a time sink, and it's the one use case where ChatGPT is slowly becoming an indispensable crutch for me. I can ask for very specific examples that are not included in the docs, or that cannot be included in the docs because they're combinatorial in nature: "how can I mock this AWS SDK library by patching it with a context manager?" Occasionally it will hallucinate, but even if it gets it right 8/10 times - and it's higher than that in practice - it will prove revolutionary at least for this use case.
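To give an idea of the kind of answer I'm after, a minimal sketch assuming boto3 and the standard library's unittest.mock (the bucket name is made up):

    from unittest import mock
    import boto3

    def bucket_names():
        s3 = boto3.client("s3")
        return [b["Name"] for b in s3.list_buckets()["Buckets"]]

    fake = mock.MagicMock()
    fake.list_buckets.return_value = {"Buckets": [{"Name": "example-bucket"}]}

    # Patch boto3.client via a context manager so only this block sees the fake.
    with mock.patch("boto3.client", return_value=fake):
        assert bucket_names() == ["example-bucket"]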
> I'm currently working with the AWS SDK Python documentation and it's a hot pile of garbage from all points of view (UX, info architecture, technical detail, etc.).
I agree that pretty much all AWS documentation is woeful, and it's a travesty that the service is so expensive yet its documentation is so poor. I would gladly dump AWS and never use it again, as I hate paying top-dollar to decipher the AWS doc team's mistakes (not to mention that they are unresponsive to bug reports and feedback).
My point was made more in jest, and was supposed to point out the irony of the community's changing expectations of what documentation should be like. I predict that in a few years we'll be circling back to prioritizing writing software documentation well. (Kind of like how everybody was hating on XML for the past 20 years and it's now having a renaissance because it actually does what it's supposed to do very well.)
I'm amazed by how divisive it is. I've also been using it to significantly increase my productivity, be that documenting things or having it mutate code via natural language or various other tasks. I feel that if you keep in mind that hallucination is something that can happen, then you can somewhat mitigate that by prompting it in certain ways. E.g. asking for unit tests to verify generated functions, among other things.
I find this tool so useful, that I scratch my head when I read about how dismissive some people are of it.
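As a concrete example of that mitigation: if it hands me a function (a made-up slugify here), I ask it for tests and run them myself rather than trusting either half blindly.

    import re

    def slugify(text):                       # the "generated" function under test
        text = re.sub(r"[^a-z0-9]+", "-", text.lower())
        return text.strip("-")

    def test_slugify_basic():
        assert slugify("Hello, World!") == "hello-world"

    def test_slugify_collapses_separators():
        assert slugify("  a -- b  ") == "a-b"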
I think one of the reasons why Python got such a reputation for good docs is because its primary competitors back in the day were Perl and Ruby. Ruby has horrible documentation to this day, and Perl has extensive docs that are difficult to navigate; in comparison with either, Python was definitely superior.
I believe the exact opposite. If one could prove that text has not been generated by an AI, that would have immense value. StackOverflow has a built-in validation process ("mark as the solution"), which says that some human found that it solved the problem. Doesn't mean it's correct, but still, that's something.
I really wonder what impact ChatGPT will have on search engines. I could imagine that the first 4 pages of Google/Bing results end up being autogenerated stuff, and it will just make it harder to find trustworthy information.
For now, but perhaps we are at a level where enough knowledge is there that future solutions can be inferred from the past ones and documentation/code of libraries available on the internet.
Where are they going to get a steady, fresh firehose of data comparable to Stack Overflow's? Who are the magical entities that will be feeding them all these inputs for Bing to claim all the fame?