I was hoping to migrate to Fly.io, and during my testing I found that simple deploys would drop connections for a few seconds during the switchover. Try a `watch -n 2 curl <serviceipv4>` during a deploy to see for yourself (try any one of the documented strategies, including blue-green). I wonder how many people know this?
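If you want finer resolution than `watch` gives you, a minimal probe like this works too (Python; the target is left as a placeholder, same as above):

```python
# Minimal sketch: poll the service once a second and log failures, so a
# deploy-time drop shows up even if it lasts less than watch's 2s interval.
import time
import urllib.error
import urllib.request

URL = "http://SERVICE_IPV4/"  # placeholder, same as <serviceipv4> above

while True:
    stamp = time.strftime("%H:%M:%S")
    try:
        with urllib.request.urlopen(URL, timeout=2) as resp:
            print(f"{stamp} ok ({resp.status})")
    except (urllib.error.URLError, OSError) as exc:
        print(f"{stamp} DROPPED: {exc}")
    time.sleep(1)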
When I tested it, I was hoping for, at worst, early termination of old connections with no dropped new connections, and at best I expected it to gracefully wait for old connections to finish. But nope: just a full-downtime switchover every time. But then when you think about the network topology described in their blog posts, you realize there's no way it could've been done correctly to begin with.
It's very rare for me to comment negatively on a service, but the fact that this happened, paired with the way support acted like we were crazy when we sent video evidence of it, definitely irked me by infrastructure-company standards. Wouldn't recommend it outside of toy applications now.
> It feels like it's compelling to those scared of or ignorant of Kubernetes
I've written pretty large deployment systems for Kubernetes. This isn't it. There's a real space for Heroku-like deploys done properly, and no one is really doing it well (or at least not without ridiculously thin or expensive compute resources).
Yeah, I had a similar experience where my builds were frozen for a couple of days, such that I was not able to release any updates. When I emailed their support, I got an auto-response asking me to post in the forum. Pretty much all hosts are expected to offer a ticket system, even for their unmanaged services, if it's a problem on their side. I just moved all my stuff over to Render.com; it's more expensive, but it's been reliable so far.
That forum post just says what OP said: that they will ignore all tickets from unmanaged customers. Which is a pretty shitty thing to do to your customers.
The cheapest plan that gets email support is nothing more than a commitment to spend a minimum of $29/mo on their services. That is, if you spend >=$29/mo, it costs nothing extra. Not what I'd call "managed".
> I've written pretty large deployment systems for Kubernetes. This isn't it. There's a real space for Heroku-like deploys done properly, and no one is really doing it well (or at least not without ridiculously thin or expensive compute resources).
Have you tried Google Cloud Run (based on Knative)? I've never used it in production, but on paper it seems to fit the bill.
Yeah, we're mostly hosted there now. The CPU/virtualization feels slow, but I haven't had time to confirm (we had to offload even super-small ffmpeg operations).
It's in a weird place between Heroku and Lambda. If your container has a bad startup time, like one of our Python services, autoscaling can't be used, as latency becomes a pain. It's also common to deploy services there that need things like health checks (unlike functions, which you assume are alive); that implies at least one instance of sustained use, assuming you do per-minute health checks. Their domain-mapping service is also really, really bad and can take hours to issue a cert for a domain, so you have to be very careful about putting a load balancer in front of it for hostname migrations.
I don't care right now, but the fact that we're paying 5x for compute is starting to bother me a bit. An 8-core/16GB 'node' is ~$500/month (vs. ~$100 on DO), assuming you don't scale to zero (which you probably won't). Plus I'm pretty sure the 8 cores reported aren't a meaty 8 cores.
But it's been pretty stable and nice to use otherwise!
A 6C/12T dedicated server with 32GB of RAM is $65 a month with OVH.
I do get that it's a bare server, but if you deploy even just bare containers to it, you'd save a good bit of money and get better performance.
It depends on what the 6 cores are. Like, I have an 8C/8T dedicated server sitting in my closet that costs $65 per the number of times you buy it. (Usually once.) The cores are not as fast as the highest-end Epyc cores, however ;)
At the $65/month level for an OVH dedicated server, you get a 6-core CPU from 2018 and a 500Mbps public network limit. Doesn't even seem like that good a deal.
There is also a $63/month option that is significantly worse.
We also run some small ffmpeg workloads and experimented with Cloud Run consuming Pub/Sub via Eventarc triggers. Since Cloud Run's opaque scaling is tied to HTTP requests, Eventarc uses a push subscription. In Pub/Sub, push subscriptions don't give you any knobs to turn regarding rate limiting/back pressure, so it basically tries to DoS your service and then backs off. This setup was basically impossible to tune or monitor properly.
Our solution was to migrate the service to Kubernetes, using an HPA that scales on the number of un-acked messages in the subscription, plus a pull subscription to ensure reliable delivery (if the service is down, messages just sit in the queue rather than being retried indefinitely). A rough sketch of the consumer side is below.
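For the curious, the pull consumer looks roughly like this (hedged sketch; the project, subscription, and `process_job` names are made up, and the flow-control numbers are arbitrary):

```python
# Sketch of a pull-based Pub/Sub consumer: messages are acked only after
# the work succeeds, so if the service is down or crashes mid-job they
# stay in the subscription instead of hammering a dead HTTP endpoint.
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
# "my-project" and "ffmpeg-jobs" are hypothetical names.
subscription_path = subscriber.subscription_path("my-project", "ffmpeg-jobs")

def callback(message):
    try:
        process_job(message.data)  # hypothetical ffmpeg worker function
        message.ack()              # ack only on success
    except Exception:
        message.nack()             # redeliver later; the queue absorbs it

# flow_control is the back-pressure knob that push subscriptions lack.
future = subscriber.subscribe(
    subscription_path,
    callback=callback,
    flow_control=pubsub_v1.types.FlowControl(max_messages=4),
)
try:
    future.result()  # block, processing messages as they arrive
except KeyboardInterrupt:
    future.cancel()
```

The HPA then scales on the subscription's backlog; on GKE that's typically the `pubsub.googleapis.com|subscription|num_undelivered_messages` external metric, if I remember right.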
I'm convinced Cloud Run/Functions are only useful for trivial HTTP workloads at this point, and I rarely consider them.
Triggered deploys to Kubernetes, you mean? There are a million ways to solve this problem, for better or worse. We use GitLab CI, so we invoke helm in our pipelines (I'm sure there's a way to do this with GitHub Actions), but there's also Flux CD, Argo, etc.
We use Kubernetes (GKE) elsewhere, so we luckily already had this machinery in place. I can see the appeal of Cloud Run/Functions as a way to avoid taking that plunge.
I have yet to have a positive experience with Cloud Run. I have one project on it, and Cloud Run is very unpredictable with autoscaling. Sometimes it starts spinning containers up/down for no apparent reason, and after chasing Google support for months, they said it is "expected behavior". Good luck trying to debug this independently, because you don't have access to the Knative logs.
Starting containers on Cloud Run is weirdly slow, and oh boy, is that thing expensive. I'm getting the impression that pure VMs + Nomad would be a way better option.
> I'm getting the impression that pure VMs + Nomad would be a way better option
As a long-time Nomad fan (disclaimer: I now work at HashiCorp), I would certainly agree. You lose some on the maintenance side, because there's stuff to deal with that Google would otherwise abstract away for you, but the added flexibility is probably worth it.
> Starting containers on Cloud Run is weirdly slow
What is this about? I assumed a highly throttled CPU or terrible disk performance. A Python process that would start in 4 seconds locally could easily take 30 seconds there.
I just use AWS EC2, a load balancer, and auto scaling groups. The user_data pulls and runs a Docker image. To deploy, I trigger an instance refresh, which has no downtime. The obvious downside is more configuration than with more managed services.
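For reference, the deploy step is basically one API call (sketch with boto3; the ASG name and preference values are placeholders):

```python
# Kick off a rolling instance refresh: the ASG replaces instances in
# batches, and each new instance's user_data pulls and runs the image.
import boto3

autoscaling = boto3.client("autoscaling")

response = autoscaling.start_instance_refresh(
    AutoScalingGroupName="my-web-asg",  # hypothetical ASG name
    Preferences={
        "MinHealthyPercentage": 90,  # keep most capacity serving during the roll
        "InstanceWarmup": 120,       # seconds before a new instance counts as healthy
    },
)
print("refresh started:", response["InstanceRefreshId"])
```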
I have been using Google Cloud Run in production for a few years and have had a very good experience. It has the fastest autoscaler I have ever seen, second only to FaaS, which is not a good option for client-facing web services.
Cloud Run is compatible with Knative YAML but actually runs on Borg under the hood, not Kubernetes, at least when not using the "Cloud Run on GKE" option via Anthos.