
We have a similar use case: an all-Elixir codebase that needs Python for the ML libraries. We decided to use IPC: Elixir spawns a process and communicates over stdio. https://github.com/akash-akya/ex_cmd makes it a breeze to stream stdin and stdout. This also has the added benefit of keeping the Python side completely stateless and all the domain logic on the Elixir side. Spawning a process might be slower than enqueuing a job, but in our case the job usually runs long enough to make that irrelevant.
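
A rough sketch of the shape of it (predict.py, the records, and Jason for JSON encoding are stand-ins, not our actual setup):

    results =
      ExCmd.stream!(~w(python3 predict.py),
        input: Stream.map(records, &(Jason.encode!(&1) <> "\n"))
      )
      # the stream yields stdout chunks as the Python side emits them
      |> Enum.into("")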

This might be of interest to others: last night I stumbled across Hornbeam, a library in a similar vein from the author of Gunicorn. It handles WSGI/ASGI apps and also has a specific wrapper for ML inference.

https://erlangforums.com/t/hornbeam-wsgi-asgi-server-for-run... https://github.com/benoitc/hornbeam


We also had a similar use case, so I built Snex[0] - specifically for Elixir-Python interop. Elixir-side spawns interpreters with Ports managed by GenServers, Python-side has a thin asyncio runtime to run arbitrary user code. Declarative environments (uv), optimized serde with language-specific objects (like `%MapSet{}` <-> `set`), etc. Interpreters are meant to be long lived, so you pay for initialization once.

It's a very different approach from ex_cmd's, as it's not really focused on the "streaming data" use case. Mine is a very command/reply-oriented approach, though the commands can flow both ways (calling BEAM modules from Python). The assumption is that big data is passed around out of band; I may have to revisit that.

[0]: https://github.com/kzemek/snex


I have one vibecoded ML pipeline now, and I'm strongly considering just clauding it into Nx so I can ditch the Python.

I did exactly this in early 2025 with a small keyword tagging pipeline.

You may run into some issues with Docker and native deps once you get to production. Don't forget to cache the Bumblebee files.


No problem. It's an SLM; I have a dedicated on-prem GPU server that I deploy behind Tailscale for inference. For training, I reach out to Lambda Labs and just rent a beefy GPU for a few hours for the cost of a Starbucks coffee.

Similar use case as well. I use Erlang ports to spawn a Python process. Error handling is a mess, but using Python as a small scripting layer and Elixir for all the database/application/architecture work has been ideal.
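
The spawn-and-talk part itself is small; the error handling around it is where it gets messy. A minimal sketch (worker.py, the request payload, and the JSON protocol are placeholders; {:packet, 4} means both sides frame each message with a 4-byte length prefix):

    port =
      Port.open({:spawn_executable, System.find_executable("python3")},
        [:binary, :exit_status, {:packet, 4}, args: ["worker.py"]]
      )

    Port.command(port, Jason.encode!(request))

    receive do
      {^port, {:data, reply}} -> {:ok, Jason.decode!(reply)}
      {^port, {:exit_status, status}} -> {:error, {:python_exited, status}}
    end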

Honestly, you saved yourself some major headaches down the line with this approach.

At my work we run a fairly large webshop and have a ridiculous number of jobs running at all times. At this point most are running in Sidekiq, but a sizeable portion remains in Resque simply because it does just that: it starts a process.

Resque workers start by creating a fork, and that becomes the actual worker.

So when a job allocates half your available RAM, it's all discarded and returned to the OS when the fork exits, which is FANTASTIC.

Sidekiq, like most job queues, uses threads, which is great, but all RAM allocated to the process stays allocated, and generally unused; it's especially bad with plain malloc. We used jemalloc for a while, which helped since it allocates memory better for multithreaded applications, but the easiest fix is to just create a process.

I don't know how memory-intensive ML is; what generally screwed us over was image processing (ImageMagick and its many memory leaks) and... large CSV files. Yeah, come to think of it, you made an excellent architectural choice.


Is this part of a web server or some other system where you could end up spawning N Python processes instead of 1 at a time?

I use a similar strategy for Python calls from Elixir. This is in a web server; the Python workers are part of a process pool. We start up N workers, and they hang around and answer requests when needed. I just have an RPC abstraction that handles all the fiddly bits; the two sides pass Erlang terms back and forth. Pretty simple.
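
For anyone curious what the term round-trip might look like, hypothetically (the message shapes here are invented, and the Python side needs an External Term Format codec):

    id = System.unique_integer([:positive])
    Port.command(port, :erlang.term_to_binary({:request, id, args}))

    receive do
      {^port, {:data, data}} ->
        # :safe avoids atom-table exhaustion from a misbehaving worker
        {:reply, ^id, result} = :erlang.binary_to_term(data, [:safe])
        result
    end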

No, it's a background job. We can easily control the Python process count by controlling the job queue concurrency on the Elixir side.
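
E.g. with Oban (assuming that's the queue; the names are invented), the queue's concurrency cap is effectively the Python process cap:

    # config/config.exs -- at most 2 of these jobs (hence at most
    # 2 spawned Python processes) run at a time
    config :my_app, Oban,
      repo: MyApp.Repo,
      queues: [ml_inference: 2]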

ES should be thought of as a JSON key-value store plus a search engine. The key-value store is fully consistent and supports read-after-write semantics; a refresh is needed before documents show up in the search API. In some cases it does make sense to treat it as a database, provided key-value semantics are enough.
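
Concretely (a sketch using the Req HTTP client; the index and field names are made up):

    # get-by-id is real-time: read-after-write just works
    Req.put!("http://localhost:9200/items/_doc/1", json: %{name: "widget"})
    Req.get!("http://localhost:9200/items/_doc/1")   # sees the doc immediately

    # the search API only sees the doc after a refresh
    # (automatic every ~1s by default, or forced):
    Req.post!("http://localhost:9200/items/_refresh")
    Req.post!("http://localhost:9200/items/_search",
      json: %{query: %{match: %{name: "widget"}}}
    )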

I used it about 7 years ago. Text search wasn't heavily used, but we leaned on keyword filters heavily. It's like having a database where you can throw any query at it and get a response in reasonable time, because you effectively have an index on every field.



Exactly. I was reading the blog and wondering the whole time how it's better than --update-refs, which I've been using a lot recently.
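
For anyone who hasn't tried it (needs git 2.38+):

    # rewrite a whole stack of branches in one interactive rebase
    git rebase -i --update-refs main

    # or make it the default
    git config --global rebase.updateRefs true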


This is one of the things Elixir got right on day one: all the libraries in the ecosystem log through the standard library's Logger, which makes dealing with logs so much easier than in other languages.
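
Since everything funnels through Logger, metadata and formatting get configured once for every dependency (field names here are just examples):

    require Logger

    Logger.metadata(request_id: "req-123")   # attached to this process's log lines
    Logger.info("user signed in", user_id: 42)

    # config/config.exs
    config :logger, :console, metadata: [:request_id, :user_id]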


Have you checked the JS grammar? It has a regex grammar, though I'm not sure if that's what you're looking for.



> For example, you don’t need sidekiq or Redis as Elixir/Erlang is a concurrent, distributed platform. This simplifies operations and is cheaper to run.

This is simply wrong. You need a Sidekiq alternative (there is Exq, which is protocol-compatible with Sidekiq; others like Oban are available as well) because Erlang processes are not durable, nor do you get retry, concurrency control, etc. Everyone loves to claim that because Erlang is distributed and you can connect nodes, Redis is useless, but this is rarely utilized in the context of a web server. I have been using Elixir for more than 5 years now, and I have never seen a valid use case for connecting two nodes.

In reality, the tech stack of an Elixir web app is mostly similar, but Elixir makes life a lot easier if you ever have to deal with any kind of concurrency. Making concurrent requests is as simple as mapping over a list, and it just works, unlike other languages where you have to second-guess whether the library you're using is thread-safe.
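
E.g. (fetch/1 stands in for whatever work you're fanning out):

    urls
    # each call runs in its own process, so there's no shared mutable
    # state to worry about, and concurrency is bounded
    |> Task.async_stream(&fetch/1, max_concurrency: 10, timeout: 30_000)
    |> Enum.map(fn {:ok, result} -> result end)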


> have never seen a valid use case for connecting two nodes

Have you never worked with WebSockets, LiveView, caching, OTP, etc.? Once you're running more than one node, those rely on distributed Elixir.

And okay, need durability? Oban, which runs without Redis (it's backed by Postgres) and kicks the crap out of Sidekiq.
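
A worker is just a module, and the job is persisted in Postgres with retries and backoff built in (module and function names here are invented):

    defmodule MyApp.Workers.Thumbnail do
      use Oban.Worker, queue: :media, max_attempts: 5

      @impl Oban.Worker
      def perform(%Oban.Job{args: %{"image_id" => id}}) do
        MyApp.Media.generate_thumbnail(id)
        :ok
      end
    end

    # enqueue:
    %{image_id: 123} |> MyApp.Workers.Thumbnail.new() |> Oban.insert()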


It depends on the layer; some layers can take advantage of how the data is persisted. For example, if you use Avro/Protobuf, the decoder will handle it for you. If that's not the case, you have to implement the migration yourself. There is a paper[1] on this subject, "Online, Asynchronous Schema Change in F1", which explains how to implement it.

1: https://dl.acm.org/doi/abs/10.14778/2536222.2536230
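
To make the Avro/Protobuf point concrete, a hypothetical evolution where old and new readers keep working:

    // v1
    message User { string name = 1; }

    // v2 adds a field: old decoders skip the unknown field, and
    // new decoders reading v1 data see the default value ("")
    message User {
      string name = 1;
      string email = 2;
    }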


Thanks, I'm really enjoying that paper.


Salesforce has this model of development and it's horrible. They won't let you run the code locally since the environment is proprietary. Even to run the tests, the code has to be uploaded and run on their servers.


Same in the semiconductor sector. You run everything remotely off a centralized on-prem mainframe, and your laptop is just a thin client to an X11 session on the server. At least that's how it was 8 or so years ago. Maybe they've started moving to the cloud by now.


But that's understandable: you need powerful machines to run simulations and the like. And we could collaborate with colleagues over the phone, with a shared VNC session, etc.


Ah yes, the joys of developing in Salesforce "Apex". At a previous company, we had some crazy "build scripts" that would sync a local filesystem into SF, but I recall it being pretty fragile. There were also annoying namespacing issues...


This was a long time ago, but I had a similar experience with Blackbaud (a Salesforce competitor). The suite of required APIs was only installed on a server, so you could not run the code locally. You could either edit code locally and copy the files over before building, or simply do all development while remoted into the dev server.


With all the time/effort Salesforce has put into SFDX and the VS Code plugins, releasing an Apex compiler must just be completely off the table. I do a lot of Salesforce work, and introducing new developers to the SF way of doing things results in a lot of "wut" facial expressions.


In a previous company, we'd have people making direct edits in Salesforce. We'd then export the edits with the Salesforce CLI "force" utility, bring them into a local git repo, and run some rather buggy scripts that would change the object namespaces/prefixes to a generic prefix. Then you could finally do a diff and a PR in the normal way.

We then had other scripts that would take the "generic" code and change the namespaces/prefixes for upload into QA or prod environments. It was quite painful.


It does have PDF import support; I'm using pdf.js to extract the data, and it works decently from what I've tested.

