
My headcanon is that Docker exists because Python packaging and dependency management were so bad that dotCloud had no choice but to invent some porcelain on top of Linux containers, just to provide a pleasant experience for deploying Python apps.


That's basically correct. But the more general problem is that engineers simply lost the ability to succinctly package applications and their dependencies into simple to distribute and run packages. Somehow, around the same time Java made .jar files mainstream (just zip all the crap with a manifest), the rest of the world completely forgot how to do the equivalent of statically linking in libraries, and forgot that we're all running highly scheduled, multithreaded operating systems now.

The "solution" for a long time was to spin up single application Virtual Machines, which was a heavy way to solve it and reduced the overall system resources available to the application making them stupidly inefficient solutions. The modern cloud was invented during this phase, which is why one of the base primitives of all current cloud systems is the VM.

Containers "solved" both the dependency distribution problem and the resource allocation problem, sort of at once.


> engineers simply lost the ability to succinctly package applications and their dependencies into simple to distribute and run packages.

but this is what docker is

If anything, java kinda showed it doesn't have to suck, but as not all things are java, you need something more general


With the difference that with docker you are shipping the runtime along with your source code as well.


which is great when you realize that not all software is updated at the same time.

how managing multiple java runtime versions is supposed to work is still beyond me... it's a different tool at every company, and the instructions never seem to work


It's less complicated than you might think. A Java Development Kit (JDK) is a filesystem directory, and includes everything necessary to run a Java program. Most of the mysterious installers and version managers are managing a collection of these JDK directories in some fixed location on disk. You can download a JDK directory (tarball), and use the `java` binary within it directly.

There is also a convention of using the `JAVA_HOME` environment variable to allow tools to locate the correct JDK directory. For example, in a unix shell, add `$JAVA_HOME/bin` to your `PATH`.
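
For example (the URL and directory names are illustrative, not a real vendor link; any JDK tarball works the same way):

    curl -LO https://example.com/jdk-21_linux-x64.tar.gz
    tar -xzf jdk-21_linux-x64.tar.gz -C /opt
    export JAVA_HOME=/opt/jdk-21
    export PATH="$JAVA_HOME/bin:$PATH"
    java -version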


Managing Java runtimes is just: export JAVA_HOME=/path; ./app.sh


But the java runtime needs to be at /path then, and it needs to stay there as long as ./app.sh needs it. And when app2.sh needs a different version you need that to be at /path2
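
i.e. each app ends up pinning its own JDK path, something like this (paths illustrative):

    JAVA_HOME=/opt/jdk-8  /opt/jdk-8/bin/java  -jar app1.jar
    JAVA_HOME=/opt/jdk-21 /opt/jdk-21/bin/java -jar app2.jar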


You need the runtime though


And even a java program may need a system wide install of ffmpeg or opencv or libgtk or VC runtime 2019 but not 2025 or some other dependency.

And sometimes you want to ship multiple services together.

In any case 'docker run x' is easier and seemingly less error-prone than a single sudo apt-get install
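
Something like this is what I mean (Debian/Ubuntu package names; the image name is made up):

    # host install: every dependency lands system-wide, at whatever version the distro ships
    sudo apt-get install ffmpeg libopencv-dev libgtk-3-0

    # vs. everything pinned inside one image
    docker run --rm -v "$PWD:/data" myorg/video-tool:1.4 /data/input.mp4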


I would argue that the traditional way to install applications (particularly servers) on UNIX wasn’t very compatible with the needs that arose in the 2000s.

The traditional way tends to assume that there will be only one version of something installed on a system. It also assumes that when installing a package you distribute binaries, config files, data files, libraries and whatnot across lots and lots of system directories. I grew up on traditional UNIX. I’ve spent 35+ years using perhaps 15-20 different flavors of UNIX, including some really, really obscure variants. For what I did up until around 2000, this was good enough. I liked learning about new variants. And more importantly: it was familiar to me.

It was around that time I started writing software for huge collections of servers sitting in data centers on a different continent. Out of necessity I had to make my software more robust and easier to manage. It had to coexist with lots of other stuff I had no control over.

It would have to be statically linked, everything I needed had to be in one place so you could easily install and uninstall. (Eventually in all-in-one JAR files when I started writing software in Java). And I couldn’t make too many assumptions about the environment my software was running in.

UNIX could have done with a re-thinking of how you deal with software, but that never happened. I think an important reason for this is that when you ask people to re-imagine something, it becomes more complex. We just can’t help ourselves.

Look at how we reimagined managing services with systemd. Yes, now that it has matured a bit and people are getting used to it, it isn’t terrible. But it also isn’t good. No part of it is simple. No part of it is elegant. Even the command line tools are awkward. Even the naming of the command line tools fails the most basic litmus test (long prefixes that require too many keystrokes to tab-complete say a lot about how people think about usability - or don’t).

Again, systemd isn’t bad. But it certainly isn’t great.

As for blaming Python, well, blame the people who write software for _distribution_ in Python. Python isn’t a language that lends itself to writing software for distribution and the Python community isn’t the kind of community that will fix it.

Point out that it is problematic and you will be pointed to whatever mitigation that is popular at the time (to quote Queen “I've fallen in love for the first time. And this time I know it's for real”), and people will get upset with you, downvote you and call you names.

I’m too old to spend time on this so for me it is much easier to just ban Python from my projects. I’ve tried many times, I’ve been patient, and it always ends up biting me in the ass. Something more substantial has to happen before I’ll waste another minute on it.


> UNIX could have done with a re-thinking of how you deal with software, but that never happened.

I think it did, but the Unix world has an inherent bad case of "not invented here" syndrome, and a deep cultural reluctance to admit that other systems (OSes, languages, and more) do some things better.

NeXTstep fixed a big swath of issues (in the mid-to-late 1980s). It threw out X and replaced it with Display Postscript. It threw out some of the traditional filesystem layout and replaced it with `.app` bundles: every app in its own directory hierarchy, along with all its dependencies. Isolation and dependency packaging in one.

(NeXT realised this is important, but also that it has to be readable and user-friendly: it replaced parts of the traditional filesystem with something more readable. Fifteen years later, Nix realised the first lesson but forgot the second, so it throws out the traditional FHS and replaces it with something less readable, which needs software to manage it. The NeXT way means you can install an app with a single `cp` command or one drag-and-drop operation.)
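
For concreteness, a bundle is just a directory tree; this is the modern macOS descendant of the NeXT layout (names illustrative):

    MyApp.app/                    # one directory is the whole installed app
      Contents/
        Info.plist
        MacOS/MyApp               # the executable
        Frameworks/               # bundled libraries and dependencies
        Resources/                # icons, data files, translations

    # "install" is one copy, "uninstall" is one delete
    cp -R MyApp.app /Applications/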

Some of this filtered back upstream to Ritchie, Thompson and Pike, resulting in Plan 9: bin X, replace it with something simpler and filesystem-based. Virtualise the filesystem, so everything is in a container with a virtual filesystem.

But it wasn't Unixy enough so you couldn't move existing code to it. And it wasn't FOSS, and arrived at the same time as a just-barely-good-enough FOSS Unix for COTS hardware was coming: Linux on x86.

(The BSDs treated x86 as a 2nd class citizen, with grudging limited support and the traditional infighting.)


I can’t remember NeXTStep all that well anymore, but the way applications are handled in Darwin is a partial departure from the traditional unix way. Partial, because although you can mostly make applications live in their own directory, you still have shared, global directory structures where app developers can inflict chaos. Sometimes necessitating third party solutions for cleaning up after applications.

But people don’t use Darwin for servers to any significant degree. I should have been a bit more specific and narrowed it down to Linux and possibly some BSDs that are used for servers today.

I see the role of Docker as mostly a way to contain the “splatter” style of installing applications. Isolating the mess that is my application from the mess that is the system so I can both fire it up and then dispose of it again cleanly and without damaging my system. (As for isolation in the sense of “security”, not so much)


> a way to contain the “splatter” style of installing applications

Darwin is one way of looking at it, true. I just referred to the first publicly released version. NeXTstep became Mac OS X Server became OS X became macOS, iOS, iPadOS, watchOS, tvOS, etc. Same code, many generations later.

So, yes, you're right, little presence on servers, but still, the problems aren't limited to servers.

On DOS, classic MacOS, on RISC OS, on DR GEM, on AmigaOS, on OS/2, and later on, on 16-bit Windows, the way that you install an app is that you make a directory, put the app and its dependencies in it, and maybe amend the system path to include that directory.

All single-user OSes, of course, so do what you want with %PATH% or its equivalent.

Unix was a multi-user OS for minicomputers, so the assumption is that the app will be shared. So, break it up into bits, and store those component files into the OS's existing filesystem hierarchy (FHS). Binaries in `/bin`, libraries in `/lib`, config in `/etc`, logs and state in `/var`, and so on -- and you can leave $PATH alone.
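
Even on a modern Debian-ish system, a typical server package still scatters itself in exactly this way (the package name here is hypothetical, but the split is the standard one):

    $ dpkg -L myserver
    /usr/sbin/myserver
    /usr/lib/myserver/modules.so
    /etc/myserver/myserver.conf
    /lib/systemd/system/myserver.service
    /var/log/myserver
    /usr/share/man/man8/myserver.8.gz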

Made sense in 1970. By 1980 it was on big shared departmental computers. Still made sense. By 1990 it was on single-user workstations, but they cost as much as minicomputers, so why change?

The thing is, the industry evolved underneath. Unix ended up running on a hundred million times more single-user machines (and VMs and containers) than multiuser shared hosts.

The assumptions of the machine being shared turned out to be wrong. That's the exception, not the rule.

NeXT's insight was to only keep the essential bits of the shared FHS layout, and to embed all the dependencies in a folder tree for each app -- and then to provide OS mechanisms to recognise and manipulate those directory trees as individual entities. That was the key insight.

Plan 9 virtualised the whole FHS. Clever but hard to wrap one's head around. It's all containers all the way down. No "real" FHS.

Docker virtualises it using containers. Also clever but in a cunning-engineer's-hacky-kludge kind of way, IMHO.

I think GoboLinux maybe made the smartest call. Do the NeXT thing, junk the existing hierarchy -- but make a new more-readable one, with the filesystem as the isolation mechanism, and apply it to the OS and its components as well. Then you have much less need for containers.


I agree with you that the issue is packaging. And having developers try to package software is the issue IMO. They will come up with the most complicated build system to handle all scenarios, and the end result will be brittle and unwieldy.

There’s also the overly restrictive dependency list, because each dep in turn is happy to break its API every 6 months.


Exactly this, but not just Python. The traditional way most Linux apps work is that they are splayed over your filesystem with hard coded references to absolute paths and they expect you to provide all of their dependencies for them.

Basically the Linux world was actively designed to make apps difficult to distribute.


It wasn't about making apps difficult to distribute at all, that's a later side effect. Originally distros were built around making a coherent unified system of package management that made it easier to manage a system due to everything being built on the same base. Back then Linux users were sysadmins and/or C programmers managing (very few) code dependencies via tarballs. With some CPAN around too.

For a sysadmin, distros like Debian were an innovative godsend for installing and patching stuff. Especially compared to the hell that was Windows server sysadmin back in the 90s.

The developer oriented language ecosystem dependency explosion was a more recent thing. When the core distros started, apps were distributed as tarballs of source code. The distros were the next step in distribution - hence the name.


Right but those things are not unrelated. Back in the day if you suggested to the average FOSS developer that maybe it should just be possible to download a zip of binaries, unzip it anywhere and run it with no extra effort (like on Windows), they would say that that is actively bad.

You should be installing it from a distro package!!

What about security updates of dependencies??

And so on. Docker basically overrules these impractical ideas.


It’s still actively bad. And security updates for dependencies are easy to do when the dependency's developer is not bundling them with feature changes and actively breaking the API.


I would say those are good points, not impractical ideas.

You make software harder to distribute (so inconvenient for developers and distributors) but gain better security updates and lower resource usage.


The success of Docker shows that this is a minority view.


I was replying to a comment comparing the distribution of self-contained binaries to Linux package management. This is a much more straightforward question.

Containers are a related (as the GP comment says) thing, but offer a different and varied set of tradeoffs.

Those tradeoffs also depend on what you are using containers for. Scaling by deploying large numbers of containers on a cloud provider? Applications with bundled dependencies on the same physical server? As a way of providing a uniform development environment?


> Those tradeoffs also depend on what you are using containers for. Scaling by deploying large numbers of containers on a cloud provider? Applications with bundled dependencies on the same physical server? As a way of providing a uniform development environment?

Those are all pretty much the same thing. I want to distribute programs and have them work reliably. Think about how they would work if Linux apps were portable as standard:

> Scaling by deploying large numbers of containers on a cloud providers?

You would just rsync your deployment and run it.

> Applications with bundled dependencies on the same physical server?

Just unzip each app in its own folder.

> As a way of providing a uniform development environment?

Just provide a zip with all the required development tools.
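
i.e. in all three cases "deployment" collapses to copying files around, roughly like this (hosts and paths made up):

    rsync -a ./build/myapp/ web1:/srv/myapp/ && ssh web1 '/srv/myapp/bin/run'
    unzip app2.zip -d /opt/app2 && /opt/app2/bin/run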


> Those are all pretty much the same thing. I want to distribute programs and have them work reliably.

Yes, they are very similar in some ways, but the tradeoffs (compared to using containers) would be very different.

> You would just rsync your deployment and run it.

If you are scaling horizontally and not using containers you are already probably automating provisioning and maintenance of VMs, so you can just use the same tools to automate deployment. You would also be running one application per VM so you do not need to worry about portability.

> Just unzip each app in its own folder.

What is stopping people from doing this? You can use an existing system like AppImage, or write a Windows-like installer (Komodo used to have one). The main barrier as far as I can see is that users do not like it.

> Just provide a zip with all the required development tools.

vs a container you still have to configure it and isolation can be nice to have in a development environment.

vs installing what you need with a package manager, it would be less hassle in some cases but this is a problem that is largely solved by things like language package managers.


> What is stopping people from doing this?

Most Linux apps do not bundle their dependencies, don't provide binary downloads, and aren't portable (they use absolute paths). Some dependencies are especially awkward like glibc and Python.

It is improving with programs written in Rust and Go, which tend to a) be statically linked and b) be more modern, so they are less likely to make the mistake of using absolute paths.
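
For example, a cgo-free Go build has no runtime .so dependencies at all, which is what makes the unzip-anywhere model work (package path made up):

    CGO_ENABLED=0 go build -o myapp ./cmd/myapp
    ldd ./myapp
    #   not a dynamic executable   (exact wording varies by libc)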

Incidentally this is also the reason Nix has to install everything globally in a single root-owned directory.

> The main barrier as far as I can see is that users do not like it.

I don't think so. They've never been given the option.


> Most Linux apps do not bundle their dependencies, don't provide binary downloads, and aren't portable (they use absolute paths).

That is because the developers choose not to, and no one else chooses to do it for them. On the other hand lots of people package applications (and libraries) for all the Linux distros out there.

> I don't think so. They've never been given the option.

The options exist. AppImage does exactly what you want. Snap and Flatpak are cross distro, have lots of apps, and are preinstalled by many major distros.


Docker was the tool for those who couldn't create a deb or rpm package.


You mean a deb and an rpm right? And multiple versions of each.


The same as we have docker images with different OS flavours now: "-debian", "-alpine", "-slim", etc.


deb or rpm packages are a tool for people who did not care about reproducible code and who always lived "at the edge".


So Debian, Ubuntu LTS, RHEL, distributions with multi-year release cycles are living at the edge? Ok.


Sure, it is not as edgy as Arch or something, but unless you have your own mirror, your stuff can be broken at any time.

To be fair, they are _usually_ pretty good about that; the last big breakage I've seen was that git "security" fix which basically broke git commands as root. There are also some problems with Ubuntu LTS kernel upgrades, but docker won't save you here; you need to use something like AMI images.


The irony is that the majority of docker images are built from the same packages and break all the same. But in your eyes, `apt install package` is bad but `RUN apt install package` inside a `Dockerfile` somehow makes it reproducible. I suspect you are confusing "having an artifact" with "reproducible builds" [1]. Having a docker image as an artifact is the same as having tar/zip with your application and its dependencies or having a filesystem snapshot or having VM image like AMI/OVM/VMDK. You can even have a deb file with all your dependencies vendored in.
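To make that concrete, something like this resolves whatever the mirrors serve at build time, exactly like apt on a host; getting closer to reproducible means pinning things yourself (the digest and version below are placeholders):

    docker build -t app - <<'EOF'
    FROM debian:bookworm-slim
    # resolves whatever the mirror serves *today*, same as apt-get on a host
    RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg
    # closer to repeatable: pin the base by digest and the package version
    #   FROM debian:bookworm-slim@sha256:<digest>
    #   RUN apt-get install -y ffmpeg=<exact version>
    EOF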

[1]: https://en.wikipedia.org/wiki/Reproducible_builds


If only handling Dockerfiles were as easy.


"Dockerfile is simple", they promised. Now look at the CNCF landscape.


I stopped listening to cloud related podcasts, because it started to feel like it was just PR for whatever product the guest came up with.


why would you do this?

If you are considering bare-metal servers with deb files, you compare them to bare-metal servers with docker containers. And in the latter case, you immediately get all the compatibility, reproducibility, ease of deployment, ease of testing, etc... and there is no need for a single YAML file.


If you need a reliable deployment without catching 500 errors from Docker Hub, then you need a local registry. If you need a secure system without accumulating tons of CVEs in your base images, then you need to rebuild your images regularly, so you need a build pipeline. To reliably automate image updates, you need an orchestrator or switch to podman with `podman auto-update` because Docker can't replace a container with a new image in place. To keep your service running, you again need an orchestrator because Docker somehow occasionally fails to start containers even with --restart=always. If you need dependencies between services, you need at least Docker Compose and YAML or a full orchestrator, or wrap each service in a systemd unit and switch all restart policies to systemd. And you need a log collection service because the default Docker driver sucks and blocks on log writes or drops messages otherwise. This is just the minimum for production use.
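
And the podman route is not exactly zero-config either; from memory it ends up roughly like this (image name made up):

    podman create --name web --label io.containers.autoupdate=registry docker.io/myorg/web:stable
    podman generate systemd --new --name web > ~/.config/systemd/user/container-web.service
    podman rm web        # the unit recreates the container itself
    systemctl --user daemon-reload
    systemctl --user enable --now container-web.service podman-auto-update.timer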


Yes, running server farms in production is complex, and docker won't magically solve _every one_ of your problems. But it's not like using deb files will solve them either - you need most of the same components either way.

> If you need a reliable deployment without catching 500 errors from Docker Hub, then you need a local registry.

Yes, and with debs you need a local apt repository

> If you need a secure system without accumulating tons of CVEs in your base images, then you need to rebuild your images regularly, so you need a build pipeline.

presumably you were building your deb with a build pipeline as well... so the only real change is that the pipeline now has to have a timer as well, not just "on demand"

> To reliably automate image updates, you need an orchestrator or switch to podman with `podman auto-update` because Docker can't replace a container with a new image in place.

With debs you only have unattended-upgrades, which is not sufficient for deployments. So either way, you need _some_ system to deploy the images and monitor the servers.

> To keep your service running, you again need an orchestrator because Docker somehow occasionally fails to start containers even with --restart=always. If you need dependencies between services, you need at least Docker Compose and YAML or a full orchestrator, or wrap each service in a systemd unit and switch all restart policies to systemd.

deb files have the same problems, but here dockerfiles have an actual advantage: if you run a supervisor _inside_ docker, then you can actually debug this locally on your machine!

No more "we use fancy systemd / ansible setups for prod, but on dev machines here are some junky shell scripts" - you can poke the things locally.

> And you need a log collection service because the default Docker driver sucks and blocks on log writes or drops messages otherwise. This is just the minimum for production use.

What about deb files? I remember bad old pre-systemd days where each app had to do its own logs, as well as handle rotations - or log directly to third-party collection server. If that's your cup of tea, you can totally do this in docker world as well, no changes for you here!

With systemd's arrival, the logs actually got much better, so it's feasible to use systemd's logs. But here is the good news: docker has a "journald" driver, so it can send its logs to systemd as well... So there is feature parity there too.
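
Concretely (image name made up):

    docker run -d --name web --log-driver=journald myorg/web:1.0
    journalctl CONTAINER_NAME=web -f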

The key point is there are all sorts of so-called "best practices" and new microservice-y ways of doing things, but they are all optional. If you don't like them, you are totally free to use traditional methods with Docker! You still get to keep your automation, but you no longer have to worry about your entire infra breaking, with no easy revert button, because your upstream released a broken package.


You switched from

> ease of deployment

to

> running server farms in production is complex

You just confirmed my initial point:

> "Dockerfile is simple", they promised. Now look at the CNCF landscape.

> with debs you need local apt repository

No, you don't need an apt repository. To install a deb file, you need to scp/curl the file and run `dpkg`.

> presumably you were building your deb with a build pipeline as well

You don't need to rebuild the app package every time there is a new CVE in a dependency. Security updates for dependencies are applied automatically without any pipeline; you just enable `unattended-upgrades`, which is present out of the box.
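
On Debian/Ubuntu that is roughly:

    sudo apt-get install unattended-upgrades
    sudo dpkg-reconfigure -plow unattended-upgrades   # writes /etc/apt/apt.conf.d/20auto-upgrades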

> With debs you only have automatic-updates, which is not sufficient for deployments.

Again, you only need to run `dpkg` to update your app. preinst, postinst scripts and systemd unit configuration included in a deb package should handle everything.

> deb files have the same problems

No, they don't. deb files intended to run as a service have systemd configuration included and every major system now runs systemd.

> but here dockerfiles have an actual advantage: if you run supervisor _inside_ docker, then you can actually debug this locally on your machine!

Running a supervisor inside a container is an anti-pattern. It just masks errors from the orchestrator or external supervisor. Also usually messes with logs.

> No more "we use fancy systemd / ansible setups for prod, but on dev machines here are some junky shell scripts" - you can poke the things locally.

systemd/ansible are not fancy but basic beginner-level tools to manage small-scale infrastructure. That tendency to avoid appropriate but unfamiliar tools and retreat into more comfortable spaces reminds me of the old joke about a drunk guy searching for keys under a lamp post.

> What about deb files? I remember bad old pre-systemd days where each app had to do its own logs, as well as handle rotations - or log directly to third-party collection server.

Everything was out of the box - syslog daemon, syslog function in libc, preconfigured logrotate and logrotate configs included in packages.

There are special people who write their own logs bypassing syslog and they are still with us and they still write logs into files inside containers.

There are already enough rants about journald, so I'll skip that.

> but you no longer have to worry about your entire infra breaking, with no easy revert button, because your upstream released broken package.

Normally, updates are applied in staging/canary environments and tested. If upstream breaks a package - you pin the package to a working version, report the bug to the upstream or fix it locally and live happily ever after.


> Basically the Linux world was actively designed to make apps difficult to distribute.

It has "too many experts", meaning that everyone has too much decision making power to force their own tiny variations into existing tools. So you end up needing 5+ different Python versions spread all over the file system just to run basic programs.


It was more like, library writers forgot how to provide stable APIs for their software, and applications decided they just wanted to bundle all the dependencies they needed together and damn the consequences on the rest of the system. Hence we got static linked binaries and then containers.


even if you have a stable interface... the user might not want to install it and then forget to remove it down the line


Pretty much this; systems with coherent isolated dependency management, like Java, never required OS-level container solutions.

They did have what you could call userspace container management via application servers, though.


NodeJS, Ruby, etc also have this problem, as does Go with CGO. So the problem is the binary dependencies with C/C++ code and make, configure, autotools, etc... The whole C/C++ compilation story is such a mess that, almost five decades in, inventing containers was pretty much the only sane way of tackling it.

Java at least uses binary dependencies very rarely, and they usually have the decency of bundling the compiled dependencies... But it seems Java and Go just saw the writing on the wall and mostly just reimplement everything. I did have problems with the Snappy compression in the Kafka libraries, though, for instance.


The issue is with cross-platform package management without proper hooks for the platforms themselves. That may be ok if the library is pure, but as soon as you have bindings to another ecosystem (C/C++ in most cases), then it should be user-configurable instead of the provider doing the configuration with post-install scripts and other hacky stuff.

If you look at most projects in the C world, they only provide the list of dependencies and some build config (Makefile/Meson/CMake/...). But the latter is more of a sample, and if your platform is not common or differs from the developer's, you have the option to modify it (which is what most distros and ports systems do).

But good luck doing that with the sprawling tree of modern package managers, where there are multiple copies of the same libraries inside the same project, just because.


Funnily enough, one of the first containers I did on my current job was to package a legacy Java app.

It was pretty old, and required a very specific version of java, not available on modern systems. Plus some config files in global locations.

Packaging it in the docker container made it so much easier to use.
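
The Dockerfile for that kind of thing is only a few lines; roughly this (base image tag and paths illustrative, not the real app):

    cat > Dockerfile <<'EOF'
    FROM eclipse-temurin:8-jre
    # config the app expects in a "global" location, now scoped to the image
    COPY legacy.conf /etc/legacy-app/legacy.conf
    COPY app.jar /opt/legacy-app/app.jar
    CMD ["java", "-jar", "/opt/legacy-app/app.jar"]
    EOF
    docker build -t legacy-app . && docker run --rm legacy-app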


That makes sense, but still - there is nothing OS-specific here, like a system lib or even a database; it's just the JRE version that you needed to package into the container, or am I missing something?


I don't agree with this. Java systems were one of the earliest beneficiaries of container-based systems, which essentially obsoleted those ridiculously over-complicated, and language-specific, application servers that you mentioned.


Java users largely didn't bother with containers IME, mostly for the same reasons that most Java users didn't bother with application servers. Those who did want that functionality already had it available, making the move from an existing Java application server to Docker-style containers a minor upgrade at best.


This is just a testament to how widely Java is used, and in how many different ways. Sounds like you're more focused on "Core Java" applications, or something like that. Every company I've been with since the late 90s was using application servers of some kind until Docker came along. And all of the more recent ones switched to containers and ditched the application servers.

The switch was often much more than a minor upgrade, because it often made splitting up monoliths possible in ways that the Java ecosystem itself didn't have good support for.


Tomcat and Jetty are application servers which are in almost every Spring application. There are also application servers of the kind you mentioned, like Wildfly, but they are not obsolete as a whole.


Tomcat and Jetty are not application servers according to the Jakarta EE definition. They're just servlet containers.

The reason Spring includes those libraries is partly historical - Spring is old, and dates from the application server days. Newer frameworks like Micronaut and Quarkus use more focused and performant libraries like Netty, Vert.x, and Undertow instead.


Not really, and apparently there is enough value that there are now startups replicating application servers by running WebAssembly docker containers as Kubernetes pods.


What are you thinking of specifically? Because using WASM doesn't sound like "replicating application servers", but rather like an attempt to address things like the startup speed of typical large Java apps.

Unless you just mean that using Kubernetes at all is replicating application servers, which was my point. Kubernetes makes language-specific application servers like Wildfly/JBoss or Websphere obsolete, and is much more powerful, generic, and an improvement in just about every respect.


I'd rather deal with Websphere 5 than Kubernetes; that version number is on purpose, anyone who was there will get it.

As for the question: I mean the startups trying to sell the use of WebAssembly-based pods as the next big idea.


Pyinstaller predates Docker. It's not about any individual language not being able to do packaging, it's about having a uniform interface for running applications in any language/architecture. That's why platforms like K8s don't have to know a thing about Python or anything else and they automatically support any future languages too.


Sure they definitely were using Docker for their own applications, but also dotCloud was itself a PaaS, so they were trying to compete with Heroku and similar offerings, which had buildpacks.

The problem is/was that buildpacks aren't as flexible and only work if the buildpack exists for your language/runtime/stack.



