While somewhat off-topic, I had an interesting experience the other day that highlights the utility of GitHub's Copilot. I decided to run Copilot on a piece of code that was working correctly, to see whether it would invent non-existent issues. Surprisingly, it pinpointed an actual bug. Following this discovery, I asked Copilot to generate a unit test so I could better understand the issue. When I ran the test, the program crashed just as Copilot had predicted, and I then refactored the problematic lines per Copilot's suggestions. This was my first time witnessing Copilot's effectiveness in such a scenario, and it was small but significant proof to me that language models can be invaluable coding tools, capable of identifying and helping to resolve real bugs. They have limitations, but I believe those imperfections are merely temporary hurdles on the way to more robust coding assistants.
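To give a flavor of the pattern (a made-up illustration, not my actual code or Copilot's actual output): a small indexing bug, plus the kind of failing unit test an assistant might generate to reproduce the crash.

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import java.util.List;
    import org.junit.jupiter.api.Test;

    class LastAtOrBelowTest {

        // Hypothetical buggy method: meant to return the last element <= limit,
        // but the loop starts at size() instead of size() - 1, so get(i)
        // throws IndexOutOfBoundsException on the very first iteration.
        static int lastAtOrBelow(List<Integer> values, int limit) {
            for (int i = values.size(); i >= 0; i--) {   // BUG: should start at size() - 1
                if (values.get(i) <= limit) {
                    return values.get(i);
                }
            }
            throw new IllegalArgumentException("no element at or below limit");
        }

        // The kind of failing test an assistant might propose to reproduce the crash:
        // the call throws instead of returning 2, so the test fails.
        @Test
        void findsLastElementAtOrBelowLimit() {
            assertEquals(2, lastAtOrBelow(List.of(1, 2, 9), 5));
        }
    }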
Copilot at its present capabilities is already so valuable that not having it in some environment gives me the "disabledness" feeling that I otherwise only get when vim bindings are not enabled. Absolute miracle technology! I'm sure in the not-too-distant future we'll have privacy-preserving, open-source versions that are good enough that we don't have to shovel everything over to OpenAI.
That sounds like very basic code review - which I guess is useful in instances where one can't get a review from a human. If it has a low enough false-positive rate, it could be great as a background CI/CD bot that chimes in on the PR/changeset comments to say "You may have a bug here".
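Something like the sketch below is what I have in mind - just a rough outline, where LlmReviewer and PullRequestClient are made-up interfaces standing in for whatever model API and code-host API you'd actually wire up; it only comments when the model's reported confidence clears a threshold, to keep the false-positive noise down.

    import java.util.List;

    // Hypothetical sketch of a background review bot; LlmReviewer and
    // PullRequestClient are stand-ins for a real model API and code-host API.
    public class ReviewBot {

        record Finding(String file, int line, String message, double confidence) {}

        interface LlmReviewer {
            List<Finding> review(String diff);
        }

        interface PullRequestClient {
            void comment(String file, int line, String body);
        }

        private final LlmReviewer reviewer;
        private final PullRequestClient pr;
        private final double minConfidence;

        ReviewBot(LlmReviewer reviewer, PullRequestClient pr, double minConfidence) {
            this.reviewer = reviewer;
            this.pr = pr;
            this.minConfidence = minConfidence;
        }

        // Only surface findings above the confidence threshold, so the bot
        // stays quiet unless it is fairly sure "you may have a bug here".
        void run(String diff) {
            for (Finding f : reviewer.review(diff)) {
                if (f.confidence() >= minConfidence) {
                    pr.comment(f.file(), f.line(), "You may have a bug here: " + f.message());
                }
            }
        }
    }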
One nice thing about a machine code reviewing is no tedious passive-aggressive interactions or subjective style feedback you feel compelled to take etc.
Identifying potential bugs within a unit is only a part of a good code review; good code reviews also identify potential issues with broader system goals, readability, idiomaticity, elegance, and "taste" (e.g. pythonicity in Python), which require larger contexts than LLMs can currently muster.
So yes, the ability to identify a bug and provide a unit test to reproduce it is rather basic[1], compared to what a good code review can be.
1. An org I worked for used one such question for entry-level SWE interviews, in three parts: What's wrong with this code? Design test cases for it. Write the correct version (and check whether the tests pass).
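For a flavor of the format, here's a made-up example (not the actual interview question): part 1 is spotting the overflow, part 2 is a test case that exposes it, part 3 is the corrected version.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    class MidpointInterviewTest {

        // Part 1: what's wrong with this code? (a + b) overflows for large ints.
        static int midpointBuggy(int a, int b) {
            return (a + b) / 2;
        }

        // Part 3: the corrected version - the classic binary-search-style fix,
        // valid for non-negative a <= b.
        static int midpointFixed(int a, int b) {
            return a + (b - a) / 2;
        }

        // Part 2: a test case that exposes the bug. It fails against
        // midpointBuggy (which returns -1 here due to overflow) and passes
        // against midpointFixed.
        @Test
        void handlesLargeValues() {
            assertEquals(Integer.MAX_VALUE, midpointFixed(Integer.MAX_VALUE, Integer.MAX_VALUE));
        }
    }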
Sharing knowledge, improving code quality, readability, and comprehensibility, reviewing test efficacy and coverage, validating business requirements and functionality, highlighting missing features or edge cases, etc. AI can fulfill this role, but it does so in addition to other automated tools like linters and whatnot; it isn't as of yet a replacement for a human, only an addition.
The better your code is before submitting it for review, the smoother the review will go, though. So if it's safe and allowed, by all means have Copilot take a look at your code first. But don't trust that it catches everything.
Calling it 'very basic' actually exalts the concept of code reviews, because the ideal code review is more than just identifying bugs in the code under review.
If I were to call the Mercedes A-Class a 'very basic Mercedes', it would imply that I believe superior versions of the make exist.
Try it on a million-line code base, where it's not so cut and dried to even determine whether the code is running correctly, or what "correctly" means when that changes from day to day.
"A tool is only useful if I can use it in every situation".
LLMs don't need to find every bug in your code - even if they found an additional 10% of genuine bugs compared to existing tools, that would still be a pretty big improvement to code analysis.
In reality, I suspect the figure is much higher than 10%.
If it takes you longer to vet hallucinations than to just test your code better, is it an improvement? If you accept a bug fix for a hallucination that you got too lazy to check because you grew dependent on AI to do the analysis for you, and the bug "fix" itself causes other unforeseen issues or fails to recognize why an exception in this case might be worth preserving, is it really an improvement?
What if indeed. Most static analysis tools (disclaimer: anecdotal) produce very few false positives these days. This may be much worse in C/C++ land, though, I don't know.
But also a much, much shorter attention span and tolerance for BS.
If you ask the LLM to analyze those 1,000,000 lines 1,000 at a time, 1,000 times, it'll do it, with the same diligence and attention to detail across all 1,000 chunks.
Ask a human to do it and their patience will be tested. Their focus will waver, they’ll grow used to patterns and miss anomalies, and they’ll probably skip chunks that look fine at first glance.
Sure, the LLM won't find big-picture issues at that scale. But it'll find plenty of code smells and minor logic errors that deserve a second look.
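A crude sketch of what I mean, where LlmClient is a made-up stand-in for whatever model API you'd actually use: split the source into fixed 1,000-line chunks and review each one independently, so chunk 1,000 gets the same attention as chunk 1.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    // Rough sketch of chunked review; LlmClient stands in for a real model API.
    public class ChunkedReview {

        interface LlmClient {
            String review(String sourceChunk);
        }

        static final int CHUNK_LINES = 1000;

        // Read a file, split it into 1,000-line chunks, and ask the model to
        // review each chunk on its own, reporting findings per line range.
        static void reviewFile(Path file, LlmClient llm) throws IOException {
            List<String> lines = Files.readAllLines(file);
            for (int start = 0; start < lines.size(); start += CHUNK_LINES) {
                int end = Math.min(start + CHUNK_LINES, lines.size());
                String chunk = String.join("\n", lines.subList(start, end));
                System.out.printf("lines %d-%d: %s%n", start + 1, end, llm.review(chunk));
            }
        }
    }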
Ok, why don't you run this experiment on a large public open source code base? We should be drowning in valuable bug reports right now but all I hear is hype.
While true, on the other hand an AI is a tool, and it can have a much larger context size and apply all of it at once. It also isn't limited by availability or time constraints: if you have only one developer who can do a review, tooling or AI that catches 90% of what that developer would catch is still worth a lot.
Yesterday I separated a 5,000-line class into smaller domains. It didn't provide the end solution and it wasn't perfect, but it gave me a good plan for where to place what.
Once it is capable of processing larger context windows, it will become impossible to ignore.
That's rather an exception in my experience. For unit tests it starts hallucinating hard once you have functions imported from other files. This is probably the reason most unit tests in their marketing materials are things like Fibonacci…
How did you prompt Copilot to identify issues? In my experience the best I can do is put in code comments describing what I want a snippet to do, and Copilot tries to write it. I haven't had good luck asking Copilot to rewrite existing code. The nearest I've gotten is:
// method2 is identical to method1 except it fixes the bugs
public void method2(){
These things are amazing when you first experience them, but I think in most cases the user fails to realise how common their particular bug is. But then you also need to realise there may be bugs in what has been suggested. We all know there are issues with Stack Overflow responses too.
Probably 85% of codebases are just rehashes of the same stuff. Copilot has seen it all, I guess.