On an unrelated note, this malware team has assembled a great dataset for training AIs to identify security problems. Every commit contains some security flaw, and the open source community will be going through and identifying them. (Thanks, maintainers, for the cleanup work; definitely not fun!)
Currently that doesn't work, but yeah, it'll be really cool when we have tech like that! (It'll still only be able to detect known classes of vulnerabilities, but we don't invent new classes very often.)
Like in this article, we have a patch that introduces a security flaw (disabling landlock). Later, there is a patch that specifically fixes it. The job of the LLM is to reproduce the fixing patch given the problem patch, or at the very least to explain that the problem patch results in landlock always being disabled. To be clear, this is much, much harder than the problems LLMs are solving now: it requires knowledge of build-system behavior that isn't included in the context (recognizing that a failed compile check silently disables the feature, and that this particular check always fails).
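To make the mechanism concrete, here's an illustrative sketch (my own, not the actual xz check) of the kind of feature-probe source a build system compiles to decide whether landlock support should be enabled. The build system only looks at whether this compiles; if it fails for any reason, it quietly builds with the sandbox disabled, and the malicious commit hid a single stray character in the real probe so the compile could never succeed:

```c
/* Illustrative landlock feature probe (a sketch, not the actual xz check).
 * A build system compiles this file; if compilation succeeds it defines
 * something like HAVE_LINUX_LANDLOCK, otherwise it quietly builds without
 * the sandbox. A single stray character hidden at the top of such a probe
 * is a syntax error, so the check always fails and landlock is always off. */
#include <linux/landlock.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    /* Referencing the syscall number and an API constant is all the probe
     * needs to do: it only has to compile, not to run. */
    (void)SYS_landlock_create_ruleset;
    (void)LANDLOCK_CREATE_RULESET_VERSION;
    return 0;
}
```

The nasty part is that none of the project's actual C code changes; the flaw lives entirely in the fact that the probe can never compile, so the feature is silently reported as unavailable on every platform.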
There was another example where this team submitted a patch that swapped safe_fprintf for plain fprintf while adding some additional behavior. It was later pointed out that this lets unsanitized control characters reach the output stream, which can be used to hide some of the file names shown while an archive is being extracted.
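To illustrate the difference (a sketch of the general idea, not libarchive's actual safe_fprintf implementation): a sanitizing printer escapes non-printable bytes before they reach the terminal, while plain fprintf passes them straight through, so a hostile entry name can use control characters to rewrite the listing and obscure what was actually written to disk:

```c
#include <ctype.h>
#include <stdio.h>

/* Illustrative sanitizing printer (a sketch of the idea behind a helper
 * like safe_fprintf, not libarchive's actual implementation): escape any
 * non-printable byte so a hostile filename cannot inject terminal control
 * sequences into the output. */
static void print_sanitized(FILE *out, const char *name)
{
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++) {
        if (isprint(*p))
            fputc(*p, out);
        else
            fprintf(out, "\\x%02X", *p);   /* render control bytes visibly */
    }
    fputc('\n', out);
}

int main(void)
{
    /* An entry name containing a carriage return: on a terminal, plain
     * fprintf lets "harmless-readme.txt" overwrite "evil.sh" on the same
     * line, hiding it from the listing; the sanitizing printer shows the
     * \x0D instead. */
    const char *hostile = "evil.sh\rharmless-readme.txt";

    fprintf(stdout, "%s\n", hostile);   /* control byte hits the terminal */
    print_sanitized(stdout, hostile);   /* control byte is escaped */
    return 0;
}
```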