Hacker News | aster0id's comments

How many false positives did the AI throw up?

Even if it does have false positives, I expect it would make a nicer starting point for finding and verifying bugs/vulnerabilities, compared to wading through the entire codebase until you find something. Even if it is a false positive, it would probably be due to sketchy looking code (hopefully, unless it hallucinated completely new code) that you can take a look at, and maybe spot something else that the AI didn't catch.

Besides the HN submission, XBOW and Hacktron AI have found plenty of vulnerabilities in code.


Does it matter? They found 12 vulnerabilities. Clearly there was enough signal:noise that they could uncover these as real.

It doesn't look like they had 1 AI run for 20 minutes and then 30 humans sift through for weeks.


Does it matter?

Yes, we have been on the receiving end of AI generated bug reports and in the vast majority of cases they are really bad. But you still need humans to sift through them. And when you ask the submitter questions, it’s often clear that they just give the questions to an LLM again to answer.

It costs a huge amount of human manpower, so if the company who made this had an AI based solution with a far lower false-positive rate, that would be great.


> It doesn't look like they had 1 AI run for 20 minutes and then 30 humans sift through for weeks.

It does, though, look like they were running their AI over the codebase for an extended period of time (not per run, but multiple runs over the period of a year).

> Does it matter?

Hell yes, false reports are the bane of the bug bounty industry.


They don't appear to go into detail about anything except how great it is that they found the bugs, what those bugs were, and how rare it is for other people to find bugs.

I think that it would be helpful from a research point of view to know what sort of noise their AI tool is generating, but, because they appear to be trying to sell the service, they don't want you to know how many dev months you will lose chasing issues that amount to nothing.


I wonder too. Did it take many human hours to verify everything?

This is incredible research. So much harm can be prevented if this makes it into law. I hope it does. Kudos to the Anthropic team for making this public.

I had a similar problem this year after having moved to a new country, working a remote job and separated from my partner. Having had a terrible social life since I was a kid, I knew it in my bones that I'd have to find myself new friends or else. So I did - I renewed my relationship with old friends, joined a book club (was a big reader as a kid), and my dog helped me make friends at the dog park.

I find it interesting that I've thought about the exact social mechanics of making friends before as well - low stakes in person common context where you meet on a regular basis is key.


A demo video would've been nice


Anything would have been nice, this looks like a landing page with zero substance


I agree with the premise but take issue with the measure for "success": do you feel excited to get up and work on Monday?

We're humans and no matter what you're pursuing, you'll hit a point where your brain will adjust to the new reality and things will start feeling mundane. This is called the hedonic treadmill.

To me, what has helped is developing hobbies and relationships outside of work. We're social animals and need connection with others to feel fulfilled. Personally, my own life feels way more fulfilled right now than when I was just working on interesting projects at work or on my startup (that went nowhere).


I was hooked by the first few paragraphs but the immediate switch to focus on work was disappointing.

The happiest people I know treat work like the necessary evil to be endured to fulfill all other facets of life.


Or you totally love doing what you do at work and, after spending a week at the beach, you can’t wait to go back because you’re so close to solving that interesting problem you’ve been working on for more than a month.


There is danger to that as well. Work can be an addiction. It is often solitary and removes you from focus on your actual self, friends, family, or community, in favor of "the work."


I'm in exactly this place. Looking for help (books) to get out. Care to recommend anything?


Ah, to have any real amount of time to work on something. Sounds surreal.


It’s great!


I bet


What were you looking to read about in that spot?

Work shouldn't be treated as a "necessary evil".

Reconciling the work vs. meaning split is hugely important.

Even if it means making less money short term, aligning work and purpose through work like politics and writing can make us way happier long-term.


Yeah but work isn't all there is to life, at least for me. There are way more fulfilling things. If you like your work more than anything else in life, good for you. Different strokes for different folks.


I don't think that's possible for the vast majority of working people.


The happiest people I know either don’t work or love their work. I can’t think of any that fit your description.


> Code is so cheap it’s practically free. Code that works continues to carry a cost, but that cost has plummeted now that coding agents can check their work as they go.

I personally think that even before LLMs, the cost of code wasn't necessarily the cost of typing out the characters in the right order, but having a human actually understand it to the extent that changes can be made. This continues to be true for the most part. You can vibe code your way into a lot of working code, but you'll inevitably hit a hairy bug or a real world context dependency that the LLM just cannot solve, and that is when you need a human to actually understand everything inside out and step in to fix the problem.


I wonder if we will trend towards a world where maintainability is just a waste of time and money, when you can just knock together a new flimsy thing quicker and cheaper than maintaining one thing over multiple iterations.


I don't think most business processes can afford to have that many issues with their code. Customers and contracts will be lost. Reputations will be lost.


Without maintainability, adding a new type of input or feature will break existing features.

Doesn’t matter how quick it is to write from scratch, if you want varying inputs handled by the same piece of code, you need maintainability.

In a way, software development is all about adding new constraints to a system and making sure the old constraints are still satisfied.


Don't adequate tests make this much less of an issue?


I don’t think that will ever be true. Let’s take a shell session as an example of ad-hoc code: People are still writing programs and scripts. Stuff doesn’t really change that often to warrant starting from scratch. Easier to add a new format to a music player than writing a new player from scratch.


Be careful with sharing this URL in your Instagram DMs. I built something similar a few weeks ago for watching Instagram reels, shared a link to it with someone on Instagram, and my account instantly got suspended because this type of stuff violates Instagram's TOS and they crack down really hard on it.


How can that be? Aren't Instagram DMs end-to-end encrypted? I'm curious how they find these kinds of messages.


They make the app that stores the plaintext at each end... they don't need to decrypt your messages across the wire to read them.


Ah, I understand now, thank you. So they can actually read it, just not in transit but from the app. Isn't that a security violation?


All e2e encrypted apps can do this. It's the price you pay for a completely closed ecosystem that coddles you at every turn because you're too much of a little bean to know what real security is.

Edit: this isn't a dig at you, it's a dig at how Google and Apple treat you.


I didn't know that they are able to do this. Now that I know, it makes perfect sense.


A what violation?

I kid. But only sorta.


They advertise e2e as something that secures your messages so that no one can read them, including them. I just got surprised when I realized they can view all the messages before transit. I never even thought that they could do that before encryption.


They control the software doing the encryption & decryption - they could ship your private keys to themselves if they truly wanted to be evil.

Who would know?


Dog enrichment calendar - I have a lot of different types of treats, toys and activities that I'd like to do with my dog but I fell into routines and just gave him two or three toys and treats on repeat. So I'm building an app where I'd be able to configure an inventory of all the treats and toys I own and the app would remind me to use a new toy or treat every day, to minimize repetition. You'll also be reminded ahead of time for toys and treats that require preparation.
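
For what it's worth, the rotation idea can be sketched in a few lines. This is only a rough illustration, not the actual app: the `Item` fields, the `prep_days` notion and the "least recently used first" rule are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Item:
    """One entry in the (hypothetical) inventory of toys, treats and activities."""
    name: str
    kind: str                          # e.g. "toy", "treat", "activity"
    prep_days: int = 0                 # days of lead time needed before use
    last_used: Optional[date] = None   # None means never used


def pick_next(inventory: list[Item]) -> Item:
    """Return the least recently used item; never-used items come first."""
    return min(
        inventory,
        key=lambda i: (i.last_used is not None, i.last_used or date.min),
    )


def plan(start: date, days: int, inventory: list[Item]) -> list[tuple[date, Item]]:
    """Assign one item per day, marking each as used so it rotates to the back."""
    schedule = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        item = pick_next(inventory)
        item.last_used = day
        schedule.append((day, item))
    return schedule


def prep_reminders(schedule: list[tuple[date, Item]]) -> list[tuple[date, str]]:
    """For items that need preparation, list (reminder date, item name) pairs."""
    return [
        (day - timedelta(days=item.prep_days), item.name)
        for day, item in schedule
        if item.prep_days > 0
    ]
```

With three items and a four-day plan, each item is used once before the first one comes around again, and anything with `prep_days > 0` gets a reminder that many days ahead.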


I relate to this. I started building side projects last year, and being used to all the bells and whistles of CI/CD, serverless/containers and amazing monitoring and dashboarding tooling, I defaulted to those patterns even for my tiny projects. To make matters worse, I tried building everything on top of free tiers of various services, which made configuration and setup even harder as I was trying to glue things together in non standard ways just to make free stuff look like the stuff I have at my job.

I quickly learned that I needed none of that crap. Now I usually just have one dev environment (my local machine) and one prod, usually a free cloudflare worker. DB is almost always a free tier postgres instance. Testing and prod deployment happen on git pre-commit and post-commit hooks instead of inside a CI pipeline. No docker is usually necessary as I just build typescript services which have native support on most platforms. DB migrations are run directly from my local machine when I need them to run, instead of having specialized config in a CI pipeline.


I doubt that the product folks over at Google overseeing an experimental project like this have such outsized influence over something core like the ads engine


I'm feeling deeply cynical here. I wonder if the people at Google overseeing this experiment are from or also oversee the ads engine team?

