This is hard for lots of companies. Some ignore the problem entirely until there's a fire drill (which can be a huge risk if you end up on an old major version that won't get patched). Some keep everything up to date, and then taking a new security patch is trivial. It's always a risk/reward tradeoff between the risk of breaking production with an upgrade and the value an org sees from staying up to date. We work on this problem at Infield (https://www.infield.ai/post/introducing-infield), where we tackle both sides of the project management problem: "Which dependencies should I prioritize upgrading?" and "How difficult and likely to break production is this upgrade?"
To your specific points
> 1. How do you decide what's actually urgent? CVSS? EPSS? Manual assessment?
The risk factors we track are open CVEs, abandonment (is this package supported by the maintainer?), and staleness (how deep in the hole am I?).
We also look at the libyear metric as an overall indication of dependency health.
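For anyone not familiar, libyear is just release-date drift summed across your dependencies:

    libyear(dep)  = years between the release of the version you run and the latest release
    total libyear = sum of libyear(dep) over your direct dependencies

A total of 10 libyears could be one very stale dependency or ten slightly stale ones.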
> 2. Do you treat "outdated but not vulnerable" dependencies differently from "has CVEs"?
We group upgrades into three general swimlanes:
- "trivial" upgrades (minor/patch versions of packages that respect semantic versioning, dev/test only packages). We batch these together for our customers regardless of priority.
- "could break". These deserve standalone PRs and an engineer triaging when these become worth tackling, if ever.
- "major frameworks". Think something like Rails. These are critical to keep on supported versions of because the rest of the ecosystem moves with them, and vulnerabilities in them tend to have a large blast radius. Upgrading these can be hard. You'll definitely need to upgrade someday to stay supported, and getting there has follow-on benefits on all your other dependencies, so these are high priority.
> 3. For those using Dependabot/Renovate/Snyk - what's your workflow? Do you review every alert or have you found a good filtering system?
We offer a GitHub app that integrates with alerts from Dependabot. While security teams are happy with just a scanner, the engineering teams that actually do this upgrade work need to mash that up with all the other data we're talking about here.
Thanks! We support Python, JS, and Ruby right now (started with dynamic languages).
I'm not sure what you mean by prioritization on the issues, but generally we are trying to help you figure out what to upgrade next, and to actually do it too.
PHP would definitely be in scope; either that or Java is likely to be next for us. If you are familiar with PHP's ecosystem I'd be interested in your take on what's most important / problematic there.
This is cool, it looks to me like you're integrating static analysis on the user's codebase and the underlying dependency. Very curious to see where it goes.
We've found dependency upgrades to be deceptively complex to evaluate safety for. Often you need context that's difficult or impossible to determine statically in a dynamically typed language. An example I use for Ruby is the kwarg separation in the Ruby 2.7 -> 3 migration (https://www.ruby-lang.org/en/news/2019/12/12/separation-of-p...). It's trivial to profile for impacted call sites at runtime but basically impossible to do statically without adopting something like Sorbet. Do you have any benchmarks on how reliable your evaluations are on plain JS vs. TypeScript codebases?
We ended up embracing runtime profiling for deprecation warnings / breaking changes as part of upgrading dependencies for our customers and have found that context unlocks more reliable code transformations. But you're stuck building an SDK for every language you want to support, and it's more friction than installing a GitHub app.
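For reference, the kwarg breakage I have in mind looks roughly like this (a minimal sketch with a made-up method; real codebases hit it through layers of indirection):

    # Ruby 2.7 treats a trailing hash as keyword arguments (with a
    # deprecation warning); Ruby 3.0 passes it as a positional argument
    # and raises ArgumentError instead.
    def track(event, source: "web")
      [event, source]
    end

    opts = { source: "api" }

    track("signup", opts)    # works on 2.7, ArgumentError on 3.0
    track("signup", **opts)  # the 3.0-safe form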
This is very true. It can be a real fire drill if it turns out you need to go up a major version in some other dependency in order to get a security fix. It can get even worse in JS if you're on some abandoned package that's pinned to an old version of some transitive dependency which turns out to be vulnerable. Then you're scrambling to migrate to some alternate package with no clear upgrade path.
On the flipside sometimes you get lucky and being on an old version of a package means you don't have the vulnerability in the first place.
libyear is a helpful metric for tracking how much of this debt you might have.
I have been in the position of contending with very old (4+ year) transitive dependencies brought in by contemporary dependencies, where npm and Node complain about deprecations and associated security issues. I end up reaching for icky package.json `overrides` to force these transitive dependencies to upgrade.
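For anyone who hasn't hit this, a minimal sketch of what that looks like (package names are just illustrative; `overrides` needs npm 8.3+):

    {
      "dependencies": {
        "some-modern-package": "^4.2.0"
      },
      "overrides": {
        "some-modern-package": {
          "minimist": "^1.2.8"
        }
      }
    }

This forces the stale minimist that some-modern-package pins up to a patched release, at the risk of breaking the package that pinned it.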
This seems to show the power of the reasoning models over interacting with a prompted chat-tuned LLM directly. If I navigate backwards on your link Sonnet 4 gets it right.
I've used a similar prompt - "How can you make 1000 with exactly nine 8s using only addition?"
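(If you sit down with it, nine 8s and only addition can't actually reach 1000, assuming whole numbers written with 8s; the classic puzzle uses eight 8s, which is presumably what makes it a useful prompt:

    888 + 88 + 8 + 8 + 8 = 1000              <- eight 8s
    With nine 8s: a singles + b copies of 88 + c copies of 888 would need
      a + 2b + 3c = 9   and   a + 11b + 111c = 125
    Subtracting: 9b + 108c = 116, but the left side is divisible by 9 and 116 isn't.

)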
I think the kind of application here matters a lot, specifically whether you're trying to make a change to a web app or if you're hacking on library code.
In ruby, for example, I can pretty trivially clone any open source gem and run the specs in < 5 minutes. Patching something and opening a PR in under an hour is definitely standard.
On the other hand, getting a development environment running for a company's proprietary web app is often a hassle. Mostly though this isn't because of the language or dependencies, it's because of:
- Getting all the dependent services up and running (postgres version X, redis Y, whatever else) with appropriate seed data.
- Getting access to development secrets
My company (infield.ai) upgrades legacy apps, so we deal with setting up a lot of these. We run them in individual siloed remote developer environments using devcontainers. It works OK once we've configured the service containers.
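For anyone curious, a stripped-down sketch of what one of those looks like (names, ports, and commands are placeholders, not our actual config):

    // .devcontainer/devcontainer.json
    {
      "name": "legacy-app",
      "dockerComposeFile": "docker-compose.yml",
      "service": "app",
      "workspaceFolder": "/workspace",
      "forwardPorts": [3000],
      "postCreateCommand": "bin/setup"
    }

with a docker-compose.yml alongside it that pins the app container plus the postgres/redis service containers and loads seed data.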
It also deleted all my side-loaded books. That was the last straw for me. I only buy DRM-free media from now on and only use respectful hardware. I use my Remarkable 2 primarily for e-reading now, though I concede fully it's not the best user experience for reading. But I don't have to worry that it will delete my books! I can also now "write in the margins" which I've found to be a powerful way to take notes. I can't bring myself to write on physical books, but with Remarkable you can have a copy that is stock and a copy with your notes on it. Best of both worlds!
They're tracking different related things. I run a startup in this space and we track: aggregate libyear of your direct dependencies; total # of direct dependencies with libyear > 2; # of direct dependencies at least one major version behind; dependencies that have been abandoned by the maintainer.
I think the top-line aggregate libyear number is helpful to monitor over time to get a general sense of the slope of your technical debt. If the number is trending upwards then your situation is getting worse and you're increasing the chance you find yourself in an emergency (i.e., a CVE comes out and you're on an unsupported release line and need to go up major versions to take the patch).
Tracking total # of major versions behind gets at the same thing but it's less informative. If you're on v1 of some package that has a v2 but is actively releasing patches for the v1 line that should be a lower priority upgrade than some other package where your v1 line hasn't had a release in 5 years.
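A rough sketch of what computing these looks like (hypothetical data shape, not our actual code):

    require "date"

    # One record per direct dependency: release dates for the version you
    # run and for the latest release, plus major version numbers.
    Dep = Struct.new(:name, :installed_released_on, :latest_released_on,
                     :installed_major, :latest_major, keyword_init: true)

    def libyear(dep)
      (dep.latest_released_on - dep.installed_released_on).to_f / 365.25
    end

    def report(deps)
      {
        aggregate_libyear:  deps.sum { |d| libyear(d) }.round(1),
        stale_deps:         deps.count { |d| libyear(d) > 2 },
        majors_behind:      deps.count { |d| d.latest_major > d.installed_major },
        possibly_abandoned: deps.count { |d| Date.today - d.latest_released_on > 730 }
      }
    end

The abandonment check here is just a release-date heuristic; it's the fuzziest of the four.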
It feels like it just has so many weird edge cases. A stable 2.3 branch that hasn't changed while the 1.2 branch has major security issues punishes you for not using the 1.x version.
A regularly updated 1.x branch for docs/security looks like you're doing fine even though the project is on 3.x and deprecating soon.
Perhaps as a vague guide to point to potential issues, sure.
Oh man, this brings me back! Almost 10 years ago I was working on a Rails app trying to detect the file type of uploaded spreadsheets (xlsx files were being detected as application/zip, which is technically true but useless).
I found "magic" that could detect these and submitted a patch at https://bugs.freedesktop.org/show_bug.cgi?id=78797. My patch got rejected for needing to look at the first 3KB of the file to figure out the type. They had a hard limit that they wouldn't look past the first 256 bytes. Now in 2024 we're doing this with deep learning! It'd be cool if Google released some speed benchmarks here against the old-fashioned implementations. Obviously it'd be slower, but is it 1000x or 10^6x?
Co-author of Magika here (Elie). We didn't include the measurements in the blog post to avoid making it too long, but we did take them.
Overall, file takes about 6ms for a single file and 2.26ms per file when scanning multiple files. Magika is at 65ms for a single file and 5.3ms per file when scanning multiples.
So in the worst case Magika is about 10x slower, due to the time it takes to load the model, and about 2x slower on repeated detection. This is why we said it is not that much slower.
We will have more performance measurements in the upcoming research paper. Hope that answers the question.
Is that single-threaded libmagic vs Magika using every core on the system? What are the numbers like if you run multiple libmagic instances in parallel for multiple files, or limit both libmagic and magika to a single core?
Testing it on my own system, magika seems to use a lot more CPU-time:
file /usr/lib/* 0,34s user 0,54s system 43% cpu 2,010 total
./file-parallel.sh 0,85s user 1,91s system 580% cpu 0,477 total
bin/magika /usr/lib/* 92,73s user 1,11s system 393% cpu 23,869 total
Looks about 50x slower to me. There's 5k files in my lib folder. It's definitely still impressively fast given how the identification is done, but the difference is far from negligible.
Electricity is cheap. If this is sufficiently or actually important for your org, you should measure it yourself. There are too many variables and factors subject to your org’s hardware.
Totally disagree. Most end users are on laptops and mobile devices these days, not desktop towers. Thus power efficiency is important for battery life. Performance per watt would be an interesting comparison.
You might be surprised. Rename your Photo.JPG as Photo.PNG and you'll still get a perfectly fine thumbnail. The extension is a hint, but it isn't definitive, especially when you start downloading from the web.
Of course, it's arguably unlikely a virus scanner would opt for an ML-based approach, as they specifically need to be robust against adversarial inputs.
I mean if you care about that you shouldn't be running anything that isn't highly optimized. Don't open webpages that might be CPU or GPU intensive. Don't run Electron apps, or really anything that isn't built in a compiled language.
Certainly you should do an audit of all the Android and iOS apps as well, to make sure they've been made in an efficient manner.
Block ads as well, they waste power.
This file identification is SUCH a small aspect of everything that is burning power in your laptop or phone as to be laughable.
Whilst energy usage is indeed a small aspect this early on when using bespoke models, we do have to consider that this is a model for simply identifying a file type.
What happens when we introduce more bespoke models for manipulating the data in that file?
This feels like it could slowly boil to the point of programs using orders of magnitude more power, at which point it'll be hard to claw it back.
That's a slippery slope argument, which is a common logical fallacy[0]. This model being inefficient compared to the best possible implementation does not mean that future additions will also be inefficient.
It's equivalent to saying that many people programming in Ruby causes all future programs to be less efficient. Which is not true. In fact, many people programming in Ruby has caused Ruby to become more efficient, because it gets optimised as it gets used more (or Python, for that matter).
It's not as energy efficient as C, but it hasn't caused it to get worse and worse, and spiral out of control.
Likewise smart contracts are incredibly inefficient mechanisms of computation. The result is mostly that people don't use them for any meaningful amounts of computation, that all gets done "Off Chain".
Generative AI is definitely less efficient, but it's likely to improve over time, and indeed things like quantization have allowed models that would normally require much more substantial hardware resources (and therefore be more energy intensive) to be run on smaller systems.
The slippery slope fallacy is: "this is a slope. you will slip down it." and is always fallacious. Always. The valid form of such an argument is: "this is a slope, and it is a slippery one, therefore, you will slip down it."
In general you're right, but I can't think of a single local use for identifying file types by a human on a laptop - at least, one with scale where this matters. It's all going to be SaaS services where people upload stuff.
We are building a data analysis tool with great UX, where users select data files, which are then parsed and uploaded to S3 directly, on their client machines. The server only takes over after this step.
Since the data files can be large, this approach bypasses having to transfer the file twice, first to the server and then to S3 after parsing.
I've ended up implementing a layer on top of "magic" which, if magic detects application/zip, reads the zip file manifest and checks for telltale file names to reliably detect Office files.
The "magic" library does not seem to be equipped with the capabilities needed to be robust against the zip manifest being ordered in a different way than expected.
But this deep learning approach... I don't know. It might be hard to shoehorn in to many applications where the traditional methods have negligible memory and compute costs and the accuracy is basically 100% for cases that matter (detecting particular file types of interest). But when looking at a large random collection of unknown blobs, yeah, I can see how this could be great.
If you're curious, here's how I solved it for ruby back in the day. Still used magic bytes, but added an overlay on top of the freedesktop.org DB: https://github.com/mimemagicrb/mimemagic/pull/20
magic is the core detection logic of file that was extracted out to be available as a library. So these days file is just a higher-level wrapper around magic.
> At inference time Magika uses Onnx as an inference engine to ensure files are identified in a matter of milliseconds, almost as fast as a non-AI tool even on CPU.
> They had a hard limit that they wouldn't see past the first 256 bytes.
Then they could never detect zip files with certainty, given that to do that you need to read up to 65KB (+ 22) at the END of the file. The reason is that the zip archive format allows "garbage" bytes both at the beginning of the file and in between local file headers... and it's actually not uncommon to prepend a program that self-extracts the archive, for example. The only way to know if a file is a valid zip archive is to look for the End of Central Directory record, which is always at the end of the file AND allows for a comment of unknown length at the end (and as the comment length field takes 2 bytes, the comment can be up to 65K long).
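Concretely, a robust "is this a zip?" check ends up looking something like this (a sketch in Ruby):

    # The End of Central Directory record starts with the signature
    # PK\x05\x06 and may be followed by a comment of up to 65535 bytes,
    # so you have to scan the last 65535 + 22 bytes of the file for it.
    def looks_like_zip?(path)
      size = File.size(path)
      tail_len = [size, 65_535 + 22].min
      tail = File.open(path, "rb") do |f|
        f.seek(size - tail_len)
        f.read(tail_len)
      end
      !tail.rindex("PK\x05\x06".b).nil?
    end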
That's why the whole question is ill formed. A file does not have exactly one type. It may be a valid input in various contexts. A zip archive may also very well be something else.
How does this work on the backend? Does it only trace method calls when an exception is thrown, or does it profile the call stack of every request?
Something I've been interested in is the performance impact of using https://docs.ruby-lang.org/en/3.2/Coverage.html to find unused code by profiling production. Particularly using that to figure out any gems that are never called in production. Seems like it could be made fast.
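A minimal sketch of the kind of thing I mean (Ruby 2.6+ oneshot mode; where exactly you hook it in is the fiddly part):

    # Hypothetical placement: somewhere that runs before the gems you care
    # about are loaded (e.g. the top of config/boot.rb), since Coverage only
    # tracks files required after it starts.
    require "coverage"
    require "json"

    # oneshot_lines records each line at most once, which keeps the runtime
    # overhead low enough to at least consider in production.
    Coverage.start(oneshot_lines: true)

    at_exit do
      result = Coverage.result  # { "/path/file.rb" => { oneshot_lines: [3, 7, ...] } }
      gem_files_hit = result.keys.grep(%r{/gems/})
      File.write("/tmp/coverage-#{Process.pid}.json", JSON.dump(gem_files_hit))
    end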
Good project management is the key to this work. You need to spend the time up front (scripts can help here, but the tooling isn't awesome) to figure out everything that's going to block your language upgrade. For all those blockers you want to work as hard as you can to find backwards-compatible fixes (like upgrading a blocking dependency to a version that's dual-compatible with your current and next language version). Then you work through small incremental PRs so your eventual large upgrade is easier. This also works for code changes around breaking changes - often you can make these changes ahead of time, standalone. I've done individual Rails minor version upgrades that took 100 PRs of pre-work.
In my experience often developers try to take a shortcut where they try to bang out the large upgrade in one giant PR. Sometimes this works but it's very risky - more often I see very long running branches that get abandoned. It's much better to make guaranteed incremental progress, plus your PRs are much easier for your team to review.
My startup Infield solves this problem for python / ruby / JS backends through software for the project planning (we run a solver over your dependency graph and read every changelog to identify breaking changes) and people for the breaking changes (our team will open PRs where we fix breaking changes so your code is compatible). We're starting to think about moving into the statically typed backend world with java / .net / maybe scala. I'd love to hear more about exactly what you've run into and whether my experience here matches with these ecosystems.