This is hard for lots of companies. Some ignore the problem entirely until there's a fire drill (which can be a huge risk if you end up on an old major version that won't get patched). Some keep everything up to date, and then taking a new security patch is trivial. It's always a risk/reward tradeoff between the risk of breaking production with an upgrade and the value an org sees from staying up to date. We work on this problem at Infield (https://www.infield.ai/post/introducing-infield), where we tackle both sides of the project management problem: "Which dependencies should I prioritize upgrading?" and "How difficult and likely to break production is this upgrade?"
To your specific points
> 1. How do you decide what's actually urgent? CVSS? EPSS? Manual assessment?
The risk factors we track are open CVEs, abandonment (is this package supported by the maintainer?), and staleness (how deep in the hole am I?).
We also look at the libyear metric as an overall indication of dependency health.
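For anyone not familiar, libyear is just release-date drift summed across your dependencies:

    libyear(dep)  = years between the release of the version you run and the latest release
    total libyear = sum of libyear(dep) over your direct dependencies

A total of 10 libyears could be one very stale dependency or ten slightly stale ones.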
> 2. Do you treat "outdated but not vulnerable" dependencies differently from "has CVEs"?
We group upgrades into three general swimlanes:
- "trivial" upgrades (minor/patch versions of packages that respect semantic versioning, dev/test only packages). We batch these together for our customers regardless of priority.
- "could break". These deserve standalone PRs and an engineer triaging when these become worth tackling, if ever.
- "major frameworks". Think something like Rails. These are critical to keep on supported versions of because the rest of the ecosystem moves with them, and vulnerabilities in them tend to have a large blast radius. Upgrading these can be hard. You'll definitely need to upgrade someday to stay supported, and getting there has follow-on benefits on all your other dependencies, so these are high priority.
> 3. For those using Dependabot/Renovate/Snyk - what's your workflow? Do you review every alert or have you found a good filtering system?
We offer a GitHub app that integrates with alerts from Dependabot. While security teams are happy with just a scanner, the engineering teams that actually do this upgrade work need to mash that up with all the other data we're talking about here.
Thanks! We support Python, JS, and Ruby right now (started with dynamic languages).
I'm not sure what you mean by prioritization on the issues, but generally we are trying to help you figure out what to upgrade next, and to actually do it too.
PHP would definitely be in scope; either that or Java is likely to be next for us. If you are familiar with PHP's ecosystem I'd be interested in your take on what's most important / problematic there.
This is cool, it looks to me like you're integrating static analysis on the user's codebase and the underlying dependency. Very curious to see where it goes.
We've found dependency upgrades to be deceptively complex to evaluate safety for. Often you need context that's difficult or impossible to determine statically in a dynamically typed language. An example I use for Ruby is the kwarg separation in the Ruby 2.7 -> 3 migration (https://www.ruby-lang.org/en/news/2019/12/12/separation-of-p...). It's trivial to profile for impacted call sites at runtime but basically impossible to do statically without adopting something like Sorbet. Do you have any benchmarks on how reliable your evaluations are on plain JS vs. TypeScript codebases?
We ended up embracing runtime profiling for deprecation warnings / breaking changes as part of upgrading dependencies for our customers and have found that context unlocks more reliable code transformations. But you're stuck building an SDK for every language you want to support, and it's more friction than installing a GitHub app.
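For reference, the kwarg breakage I have in mind looks roughly like this (a minimal sketch with a made-up method; real codebases hit it through layers of indirection):

    # Ruby 2.7 treats a trailing hash as keyword arguments (with a
    # deprecation warning); Ruby 3.0 passes it as a positional argument
    # and raises ArgumentError instead.
    def track(event, source: "web")
      [event, source]
    end

    opts = { source: "api" }

    track("signup", opts)    # works on 2.7, ArgumentError on 3.0
    track("signup", **opts)  # the 3.0-safe form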
This is very true. It can be a real fire drill if it turns out you need to go up a major version in some other dependency in order to get a security fix. It can get even worse in JS if you're on some abandoned package that's pinned to an old version of some transitive dependency which turns out to be vulnerable. Then you're scrambling to migrate to some alternate package with no clear upgrade path.
On the flipside sometimes you get lucky and being on an old version of a package means you don't have the vulnerability in the first place.
libyear is a helpful metric for tracking how much of this debt you might have.
I have been in the position of contending with very old (4+ year) transitive dependencies brought in by contemporary dependencies, where npm and Node complain about deprecations and associated security issues. I end up reaching for icky package.json `overrides` to force these transitive dependencies to upgrade.
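For anyone who hasn't hit this, a minimal sketch of what that looks like (package names are just illustrative; `overrides` needs npm 8.3+):

    {
      "dependencies": {
        "some-modern-package": "^4.2.0"
      },
      "overrides": {
        "some-modern-package": {
          "minimist": "^1.2.8"
        }
      }
    }

This forces the stale minimist that some-modern-package pins up to a patched release, at the risk of breaking the package that pinned it.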
This seems to show the power of the reasoning models over interacting with a prompted chat-tuned LLM directly. If I navigate backwards on your link Sonnet 4 gets it right.
I've used a similar prompt - "How can you make 1000 with exactly nine 8s using only addition?"
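(If you sit down with it, nine 8s and only addition can't actually reach 1000, assuming whole numbers written with 8s; the classic puzzle uses eight 8s, which is presumably what makes it a useful prompt:

    888 + 88 + 8 + 8 + 8 = 1000              <- eight 8s
    With nine 8s: a singles + b copies of 88 + c copies of 888 would need
      a + 2b + 3c = 9   and   a + 11b + 111c = 125
    Subtracting: 9b + 108c = 116, but the left side is divisible by 9 and 116 isn't.

)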
I think the kind of application here matters a lot, specifically whether you're trying to make a change to a web app or if you're hacking on library code.
In ruby, for example, I can pretty trivially clone any open source gem and run the specs in < 5 minutes. Patching something and opening a PR in under an hour is definitely standard.
On the other hand, getting a development environment running for a company's proprietary web app is often a hassle. Mostly though this isn't because of the language or dependencies, it's because of:
- Getting all the dependent services up and running (postgres version X, redis Y, whatever else) with appropriate seed data.
- Getting access to development secrets
My company (infield.ai) upgrades legacy apps, so we deal with setting up a lot of these. We run them in individual siloed remote developer environments using devcontainers. It works OK once we've configured the service containers.
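For anyone curious, a stripped-down sketch of what one of those looks like (names, ports, and commands are placeholders, not our actual config):

    // .devcontainer/devcontainer.json
    {
      "name": "legacy-app",
      "dockerComposeFile": "docker-compose.yml",
      "service": "app",
      "workspaceFolder": "/workspace",
      "forwardPorts": [3000],
      "postCreateCommand": "bin/setup"
    }

with a docker-compose.yml alongside it that pins the app container plus the postgres/redis service containers and loads seed data.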
It also deleted all my side-loaded books. That was the last straw for me. I only buy DRM-free media from now on and only use respectful hardware. I use my Remarkable 2 primarily for e-reading now, though I concede fully it's not the best user experience for reading. But I don't have to worry that it will delete my books! I can also now "write in the margins" which I've found to be a powerful way to take notes. I can't bring myself to write on physical books, but with Remarkable you can have a copy that is stock and a copy with your notes on it. Best of both worlds!
They're tracking different related things. I run a startup in this space and we track: aggregate libyear of your direct dependencies; total # of direct dependencies with libyear > 2; # of direct dependencies at least one major version behind; dependencies that have been abandoned by the maintainer.
I think the top-line aggregate libyear number is helpful to monitor over time to get a general sense of the slope of your technical debt. If the number is trending upwards then your situation is getting worse and you're increasing the chance you find yourself in an emergency (i.e., a CVE comes out and you're on an unsupported release line and need to go up major versions to take the patch).
Tracking total # of major versions behind gets at the same thing but it's less informative. If you're on v1 of some package that has a v2 but is actively releasing patches for the v1 line that should be a lower priority upgrade than some other package where your v1 line hasn't had a release in 5 years.
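A rough sketch of what computing these looks like (hypothetical data shape, not our actual code):

    require "date"

    # One record per direct dependency: release dates for the version you
    # run and for the latest release, plus major version numbers.
    Dep = Struct.new(:name, :installed_released_on, :latest_released_on,
                     :installed_major, :latest_major, keyword_init: true)

    def libyear(dep)
      (dep.latest_released_on - dep.installed_released_on).to_f / 365.25
    end

    def report(deps)
      {
        aggregate_libyear:  deps.sum { |d| libyear(d) }.round(1),
        stale_deps:         deps.count { |d| libyear(d) > 2 },
        majors_behind:      deps.count { |d| d.latest_major > d.installed_major },
        possibly_abandoned: deps.count { |d| Date.today - d.latest_released_on > 730 }
      }
    end

The abandonment check here is just a release-date heuristic; it's the fuzziest of the four.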
It feels like it just has so many weird edge cases. A stable 2.3 branch that hasn't changed while the 1.2 branch has major security issues punishes you for not using the 1.x version.
A regularly updated 1.x branch for docs/security looks like you're doing fine even though the project is on 3.x and deprecating soon.
Perhaps as a vague guide to point to potential issues, sure.
Oh man, this brings me back! Almost 10 years ago I was working on a Rails app trying to detect the file type of uploaded spreadsheets (xlsx files were being detected as application/zip, which is technically true but useless).
I found "magic" that could detect these and submitted a patch at https://bugs.freedesktop.org/show_bug.cgi?id=78797. My patch got rejected for needing to look at the first 3KB of the file to figure out the type. They had a hard limit that they wouldn't look past the first 256 bytes. Now in 2024 we're doing this with deep learning! It'd be cool if Google released some speed benchmarks here against the old-fashioned implementations. Obviously it'd be slower, but is it 1000x or 10^6x?
Co-author of Magika here (Elie). We didn't include the measurements in the blog post to avoid making it too long, but we did take them.
Overall, file takes about 6ms for a single file and 2.26ms per file when scanning multiple files. Magika is at 65ms for a single file and 5.3ms per file when scanning multiples.
So in the worst case Magika is about 10x slower, due to the time it takes to load the model, and about 2x slower on repeated detection. This is why we said it is not that much slower.
We will have more performance measurements in the upcoming research paper. Hope that answers the question.
Is that single-threaded libmagic vs Magika using every core on the system? What are the numbers like if you run multiple libmagic instances in parallel for multiple files, or limit both libmagic and magika to a single core?
Testing it on my own system, magika seems to use a lot more CPU-time:
file /usr/lib/* 0,34s user 0,54s system 43% cpu 2,010 total
./file-parallel.sh 0,85s user 1,91s system 580% cpu 0,477 total
bin/magika /usr/lib/* 92,73s user 1,11s system 393% cpu 23,869 total
Looks about 50x slower to me. There's 5k files in my lib folder. It's definitely still impressively fast given how the identification is done, but the difference is far from negligible.
Electricity is cheap. If this is sufficiently or actually important for your org, you should measure it yourself. There are too many variables and factors subject to your org’s hardware.
Totally disagree. Most end users are on laptops and mobile devices these days, not desktop towers. Thus power efficiency is important for battery life. Performance per watt would be an interesting comparison.
You might be surprised. Rename your Photo.JPG as Photo.PNG and you'll still get a perfectly fine thumbnail. The extension is a hint, but it isn't definitive, especially when you start downloading from the web.
Of course, it's arguably unlikely a virus scanner would opt for an ML-based approach, as they specifically need to be robust against adversarial inputs.
I mean if you care about that you shouldn't be running anything that isn't highly optimized. Don't open webpages that might be CPU or GPU intensive. Don't run Electron apps, or really anything that isn't built in a compiled language.
Certainly you should do an audit of all the Android and iOS apps as well, to make sure they've been made in an efficient manner.
Block ads as well, they waste power.
This file identification is SUCH a small aspect of everything that is burning power in your laptop or phone as to be laughable.
Whilst energy usage is indeed a small aspect this early on when using bespoke models, we do have to consider that this is a model for simply identifying a file type.
What happens when we introduce more bespoke models for manipulating the data in that file?
This feels like it could slowly boil to the point of programs using orders of magnitude more power, at which point it'll be hard to claw it back.
That's a slippery slope argument, which is a common logical fallacy[0]. This model being inefficient compared to the best possible implementation does not mean that future additions will also be inefficient.
It's equivalent to saying that many people programming in Ruby causes all future programs to be less efficient. Which is not true. In fact, many people programming in Ruby has caused Ruby to become more efficient, because it gets optimised as it gets used more (or Python, for that matter).
It's not as energy efficient as C, but it hasn't caused it to get worse and worse, and spiral out of control.
Likewise smart contracts are incredibly inefficient mechanisms of computation. The result is mostly that people don't use them for any meaningful amounts of computation, that all gets done "Off Chain".
Generative AI is definitely less efficient, but it's likely to improve over time, and indeed things like quantization have allowed models that would normally require much more substantial hardware resources (and therefore be more energy intensive) to be run on smaller systems.
The slippery slope fallacy is: "this is a slope. you will slip down it." and is always fallacious. Always. The valid form of such an argument is: "this is a slope, and it is a slippery one, therefore, you will slip down it."
In general you're right, but I can't think of a single local use for identifying file types by a human on a laptop - at least, one with scale where this matters. It's all going to be SaaS services where people upload stuff.
We are building a data analysis tool with great UX, where users select data files, which are then parsed and uploaded to S3 directly, on their client machines. The server only takes over after this step.
Since the data files can be large, this approach bypasses having to transfer the file twice, first to the server and then to S3 after parsing.
I've ended up implementing a layer on top of "magic" which, if magic detects application/zip, reads the zip file manifest and checks for telltale file names to reliably detect Office files.
The "magic" library does not seem to be equipped with the capabilities needed to be robust against the zip manifest being ordered in a different way than expected.
But this deep learning approach... I don't know. It might be hard to shoehorn in to many applications where the traditional methods have negligible memory and compute costs and the accuracy is basically 100% for cases that matter (detecting particular file types of interest). But when looking at a large random collection of unknown blobs, yeah, I can see how this could be great.
If you're curious, here's how I solved it for ruby back in the day. Still used magic bytes, but added an overlay on top of the freedesktop.org DB: https://github.com/mimemagicrb/mimemagic/pull/20
magic is the core detection logic of file that was extracted out to be available as a library. So these days file is just a higher-level wrapper around magic.
> At inference time Magika uses Onnx as an inference engine to ensure files are identified in a matter of milliseconds, almost as fast as a non-AI tool even on CPU.
> They had a hard limit that they wouldn't see past the first 256 bytes.
Then they could never detect zip files with certainty, given that to do that you need to read up to 65KB (+ 22) at the END of the file. The reason is that the zip archive format allows "garbage" bytes both at the beginning of the file and in between local file headers... and it's actually not uncommon to prepend a program that self-extracts the archive, for example. The only way to know if a file is a valid zip archive is to look for the End of Central Directory record, which is always at the end of the file AND allows for a comment of unknown length at the end (and as the comment length field takes 2 bytes, the comment can be up to 65K long).
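Concretely, a robust "is this a zip?" check ends up looking something like this (a sketch in Ruby):

    # The End of Central Directory record starts with the signature
    # PK\x05\x06 and may be followed by a comment of up to 65535 bytes,
    # so you have to scan the last 65535 + 22 bytes of the file for it.
    def looks_like_zip?(path)
      size = File.size(path)
      tail_len = [size, 65_535 + 22].min
      tail = File.open(path, "rb") do |f|
        f.seek(size - tail_len)
        f.read(tail_len)
      end
      !tail.rindex("PK\x05\x06".b).nil?
    end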
That's why the whole question is ill formed. A file does not have exactly one type. It may be a valid input in various contexts. A zip archive may also very well be something else.
How does this work on the backend? Does it only trace method calls when an exception is thrown, or does it profile the call stack of every request?
Something I've been interested in is the performance impact of using https://docs.ruby-lang.org/en/3.2/Coverage.html to find unused code by profiling production. Particularly using that to figure out any gems that are never called in production. Seems like it could be made fast.
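A minimal sketch of the kind of thing I mean (Ruby 2.6+ oneshot mode; where exactly you hook it in is the fiddly part):

    # Hypothetical placement: somewhere that runs before the gems you care
    # about are loaded (e.g. the top of config/boot.rb), since Coverage only
    # tracks files required after it starts.
    require "coverage"
    require "json"

    # oneshot_lines records each line at most once, which keeps the runtime
    # overhead low enough to at least consider in production.
    Coverage.start(oneshot_lines: true)

    at_exit do
      result = Coverage.result  # { "/path/file.rb" => { oneshot_lines: [3, 7, ...] } }
      gem_files_hit = result.keys.grep(%r{/gems/})
      File.write("/tmp/coverage-#{Process.pid}.json", JSON.dump(gem_files_hit))
    end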
Good project management is the key to this work. You need to spend the time up front (scripts can help here, but the tooling isn't awesome) to figure out everything that's going to block your language upgrade. For all those blockers you want to work as hard as you can to find backwards-compatible fixes (like upgrading a blocking dependency to a version that's dual-compatible with your current and next language version). Then you work through small incremental PRs so your eventual large upgrade is easier. This also works for code changes around breaking changes - often you can make these changes ahead of time, standalone. I've done individual Rails minor version upgrades that took 100 PRs of pre-work.
In my experience often developers try to take a shortcut where they try to bang out the large upgrade in one giant PR. Sometimes this works but it's very risky - more often I see very long running branches that get abandoned. It's much better to make guaranteed incremental progress, plus your PRs are much easier for your team to review.
My startup Infield solves this problem for python / ruby / JS backends through software for the project planning (we run a solver over your dependency graph and read every changelog to identify breaking changes) and people for the breaking changes (our team will open PRs where we fix breaking changes so your code is compatible). We're starting to think about moving into the statically typed backend world with java / .net / maybe scala. I'd love to hear more about exactly what you've run into and whether my experience here matches with these ecosystems.