I was hoping to migrate to Fly.io, and during my testing I found that simple deploys would drop connections for a few seconds during the switchover. Try a `watch -n 2 curl <serviceipv4>` during a deploy to see for yourself (try any one of the documented strategies, including blue-green). I wonder how many people know this?
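If you want finer resolution than `watch` gives you, a minimal probe like this works too (Python; the target is left as a placeholder, same as above):

```python
# Minimal sketch: poll the service once a second and log failures, so a
# deploy-time drop shows up even if it lasts less than watch's 2s interval.
import time
import urllib.error
import urllib.request

URL = "http://SERVICE_IPV4/"  # placeholder, same as <serviceipv4> above

while True:
    stamp = time.strftime("%H:%M:%S")
    try:
        with urllib.request.urlopen(URL, timeout=2) as resp:
            print(f"{stamp} ok ({resp.status})")
    except (urllib.error.URLError, OSError) as exc:
        print(f"{stamp} DROPPED: {exc}")
    time.sleep(1)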
When I tested it, I was hoping for, at worst, early termination of old connections with no dropped new connections, and at best I expected it to gracefully wait for old connections to finish. But nope: just a full-downtime switchover every time. But then when you think about the network topology described in their blog posts, you realize there's no way it could've been done correctly to begin with.
It's very rare for me to comment negatively on a service, but the fact that this happened, paired with the way support acted like we were crazy when we sent video evidence of it, definitely irked me by infrastructure-company standards. Wouldn't recommend it outside of toy applications now.
> It feels like it's compelling to those scared of or ignorant of Kubernetes
I've written pretty large deployment systems for Kubernetes. This isn't it. There's a real space for Heroku-like deploys done properly, and no one is really doing it well (or at least not without ridiculously thin or expensive compute resources).
Yeah, I had a similar experience where my builds were frozen for a couple of days, such that I was not able to release any updates. When I emailed their support, I got an auto-response asking me to post in the forum. Pretty much all hosts are expected to offer a ticket system, even for their unmanaged services, if it's a problem on their side. I just moved all my stuff over to Render.com; it's more expensive, but it's been reliable so far.
That forum post just says what OP said: that they will ignore all tickets from unmanaged customers. Which is a pretty shitty thing to do to your customers.
The cheapest plan that gets email support is nothing more than a commitment to spend a minimum of $29/mo on their services. That is, if you spend >=$29/mo, it costs nothing extra. Not what I'd call "managed".
> I've written pretty large deployment systems for Kubernetes. This isn't it. There's a real space for Heroku-like deploys done properly, and no one is really doing it well (or at least not without ridiculously thin or expensive compute resources).
Have you tried Google Cloud Run (based on Knative)? I've never used it in production, but on paper it seems to fit the bill.
Yeah, we're mostly hosted there now. The CPU/virtualization feels slow, but I haven't had time to confirm (we had to offload even super-small ffmpeg operations).
It's in a weird place between Heroku and Lambda. If your container has a bad startup time, like one of our Python services, autoscaling can't be used, as latency becomes a pain. It's also common to deploy services there that need things like health checks (unlike functions, which you assume are alive); that implies at least one instance of sustained use, assuming you do per-minute health checks. Their domain-mapping service is also really, really bad and can take hours to issue a cert for a domain, so you have to be very careful about putting a load balancer in front of it for hostname migrations.
I don't care right now, but the fact that we're paying 5x for compute is starting to bother me a bit. An 8-core/16GB 'node' is ~$500/month (vs. ~$100 on DO), assuming you don't scale to zero (which you probably won't). Plus I'm pretty sure the 8 cores reported aren't a meaty 8 cores.
But it's been pretty stable and nice to use otherwise!
A 6C/12T dedicated server with 32GB of RAM is $65 a month with OVH.
I do get that it's a bare server, but if you deploy even just bare containers to it, you'd save a good bit of money and get better performance.
It depends on what the 6 cores are. Like, I have an 8C/8T dedicated server sitting in my closet that costs $65 per the number of times you buy it. (Usually once.) The cores are not as fast as the highest-end Epyc cores, however ;)
At the $65/month level for an OVH dedicated server, you get a 6-core CPU from 2018 and a 500Mbps public network limit. Doesn't even seem like that good a deal.
There is also a $63/month option that is significantly worse.
We also run some small ffmpeg workloads and experimented with Cloud Run consuming Pub/Sub via Eventarc triggers. Since Cloud Run's opaque scaling is tied to HTTP requests, Eventarc uses a push subscription. In Pub/Sub, push subscriptions don't give you any knobs to turn regarding rate limiting/back pressure, so it basically tries to DoS your service and then backs off. This setup was basically impossible to tune or monitor properly.
Our solution was to migrate the service to Kubernetes, using an HPA that scales on the number of un-acked messages in the subscription, plus a pull subscription to ensure reliable delivery (if the service is down, messages just sit in the queue rather than being retried indefinitely). A rough sketch of the consumer side is below.
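For the curious, the pull consumer looks roughly like this (hedged sketch; the project, subscription, and `process_job` names are made up, and the flow-control numbers are arbitrary):

```python
# Sketch of a pull-based Pub/Sub consumer: messages are acked only after
# the work succeeds, so if the service is down or crashes mid-job they
# stay in the subscription instead of hammering a dead HTTP endpoint.
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
# "my-project" and "ffmpeg-jobs" are hypothetical names.
subscription_path = subscriber.subscription_path("my-project", "ffmpeg-jobs")

def callback(message):
    try:
        process_job(message.data)  # hypothetical ffmpeg worker function
        message.ack()              # ack only on success
    except Exception:
        message.nack()             # redeliver later; the queue absorbs it

# flow_control is the back-pressure knob that push subscriptions lack.
future = subscriber.subscribe(
    subscription_path,
    callback=callback,
    flow_control=pubsub_v1.types.FlowControl(max_messages=4),
)
try:
    future.result()  # block, processing messages as they arrive
except KeyboardInterrupt:
    future.cancel()
```

The HPA then scales on the subscription's backlog; on GKE that's typically the `pubsub.googleapis.com|subscription|num_undelivered_messages` external metric, if I remember right.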
I'm convinced Cloud Run/Functions are only useful for trivial HTTP workloads at this point, and I rarely consider them.
Triggered deploys to Kubernetes, you mean? There are a million ways to solve this problem, for better or worse. We use GitLab CI, so we invoke helm in our pipelines (I'm sure there's a way to do this with GitHub Actions), but there's also Flux CD, Argo, etc.
We use Kubernetes (GKE) elsewhere, so we luckily already had this machinery in place. I can see the appeal of Cloud Run/Functions as a way to avoid taking that plunge.
I have yet to have a positive experience with Cloud Run. I have one project on it, and Cloud Run is very unpredictable with autoscaling. Sometimes it starts spinning containers up/down for no apparent reason, and after chasing Google support for months, they said it is "expected behavior". Good luck trying to debug this independently, because you don't have access to the Knative logs.
Starting containers on Cloud Run is weirdly slow, and oh boy, is that thing expensive. I'm getting the impression that pure VMs + Nomad would be a way better option.
> I'm getting the impression that pure VMs + Nomad would be a way better option
As a long-time Nomad fan (disclaimer: I now work at HashiCorp), I would certainly agree. You lose some on the maintenance side, because there's stuff to deal with that Google would otherwise abstract away for you, but the added flexibility is probably worth it.
> Starting containers on Cloud Run is weirdly slow
What is this about? I assumed a highly throttled CPU or terrible disk performance. A Python process that would start in 4 seconds locally could easily take 30 seconds there.
I just use AWS EC2, a load balancer, and auto scaling groups. The user_data pulls and runs a Docker image. To deploy, I trigger an instance refresh, which has no downtime. The obvious downside is more configuration than with more managed services.
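For reference, the deploy step is basically one API call (sketch with boto3; the ASG name and preference values are placeholders):

```python
# Kick off a rolling instance refresh: the ASG replaces instances in
# batches, and each new instance's user_data pulls and runs the image.
import boto3

autoscaling = boto3.client("autoscaling")

response = autoscaling.start_instance_refresh(
    AutoScalingGroupName="my-web-asg",  # hypothetical ASG name
    Preferences={
        "MinHealthyPercentage": 90,  # keep most capacity serving during the roll
        "InstanceWarmup": 120,       # seconds before a new instance counts as healthy
    },
)
print("refresh started:", response["InstanceRefreshId"])
```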
I have been using Google Cloud Run in production for a few years and have had a very good experience. It has the fastest autoscaler I have ever seen, second only to FaaS, which is not a good option for client-facing web services.
Cloud Run is compatible with Knative YAML but actually runs on Borg under the hood, not Kubernetes, at least when not using the "Cloud Run on GKE" option via Anthos.