The cost of egress traffic is a very good reason for many organizations not to fully migrate to a cloud provider anytime soon. And since, unlike with storage costs, there doesn't seem to be an actual cost-based reason for it (other than: it makes migrating to competitors cost-prohibitive in a subset of cases), that seems kind of... weird?
Small example: an actual company I do some work for is in the business of delivering creative assets to distributors. This results in an egress of around 180TB per month, which is, on average, just around 500Mb/s.
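(Back-of-the-envelope on where that average comes from, assuming decimal terabytes and a 30-day month:)

    tb_per_month = 180
    bits = tb_per_month * 1e12 * 8        # decimal terabytes -> bits
    seconds = 30 * 24 * 3600              # ~one month
    print(round(bits / seconds / 1e6), "Mb/s average")   # ~556 Mb/s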
So, this company currently operates 2 racks in commercial data centers, linked via 10Gb/s Ethernet-over-DWDM, with 2x512Mb/s and 1x1Gb/s Internet uplinks per DC. Each rack has 2 generic-OEM servers with ~64 AMD Zen cores, 1/2TB RAM, ~8TB NVMe and ~100TB SAS RAID6 storage per node.
Just the cost-savings over egress on AWS is enough to justify that setup, including the cost of an engineer to keep it all up and running (even though the effort required for that turns out to be minimal).
So, are cloud providers ignoring a significant market here, or is the markup on their current customers lucrative enough?
> other than: it makes migrating to competitors cost-prohibitive in a subset of cases
My theory: it forces third party services into the same cloud.
Suppose you use AWS and you want to pay a third party SaaS provider for some service involving moderate-to-large amounts of data. Here's one of many examples: Snowflake.
And look at this remarkable choice: you get to pick AWS, Azure, or GCP! Snowflake is paying a lot of money to host on those clouds, and they're passing those costs on to customers.
Snowflake is big. They have lots of engineers. They are obviously cloud-agnostic: they already support three clouds. It would surely be much cheaper to operate a physical facility, and they could plausibly offer better performance (because NVMe is amazing), and they could split the cost savings with customers. But they don’t, and my theory is that egress from customers to Snowflake would negate any cost savings, and the variable nature of the costs would scare away customers.
So my theory is that the ways customers avoid egress fees make the major clouds a lot of money. IMO regulators should take a very careful look at this, but it's an excellent business decision on the part of the clouds.
It's this, plus locking customers into one cloud, because egress kills intercloud syncing on any moderately large data set. Any smart customer would have a duplicated setup across clouds if egress cost what it actually costs, instead of 100x-plus that.
There have been other instances in which exit fees, which is what this amounts to, were considered anticompetitive, e.g. [1] (although this was settled so there is no ruling).
Last month, Google itself started waiving egress costs for GCP customers leaving the platform, which, according to some sources, is simply a direct consequence of new EU legislation (the Data Act) [2], but according to others is in anticipation of wider-reaching EU antitrust investigations [3].
I'm a little ignorant of the upcoming regs, but are they aiming to basically say "if a customer wants to leave a service, then the provider must provide them with all their data for free"?
I've not thought about the unintended consequences of this, but it feels like a reasonable regulation to have.
Ingress and egress should be treated the same. Absent anti-competitive motives, a provider can't really claim that egress in particular should be more expensive than ingress - even more so when ingress is often completely free.
The asymmetry is obviously meant to trap customers, which is anti-competitive.
Wholesale connectivity is usually priced per unit for the 95th percentile sample, in the dominant direction. For most cloud services, outbound is the dominant direction by far. That's why ingress is free in almost every hosting environment.
Additionally, settlement-free peering is usually based on having a roughly balanced traffic ratio (something like 2:1 or 3:1 counts as roughly balanced). Attracting more ingress traffic by making it free, while charging for egress, helps the provider balance their ratios, which in turn helps their case for settlement-free peering.
I'm pretty sure ingress started off costing less due to the nature of usage. If you were to see a graph of the usage you'd see ingress is used less. One reason is that an HTTP request takes fewer bytes than the response to it. So the disparity in cost can't simply be attributed to a desire to trap customers.
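A rough sketch of how that 95th-percentile ("burstable") metering works - conventions vary a little between carriers, but this is the common form:

    # One utilization sample per direction every 5 minutes for the month;
    # throw away the top 5% of samples and bill on the highest remaining
    # sample in the dominant direction.
    def p95(samples_mbps):
        s = sorted(samples_mbps)
        return s[int(len(s) * 0.95) - 1]   # discard the top 5% of samples

    def billable_mbps(in_samples, out_samples):
        # hosting traffic is almost always outbound-dominant, so ingress
        # effectively rides along for free
        return max(p95(in_samples), p95(out_samples))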
I kind of doubt it's easier. If you use your own servers you need some stuff to manage them. If you use cloud, you need some cloud engineers to manage your cloud infrastructure. On top of that, if your developers use cloud APIs and frameworks, they have to learn that.
As someone working at a dev shop/full-service agency mostly using cloud (Azure), and having also managed servers on my own in the past, it's an easy breakdown.
1: We have a dozen or so regular clients, and being in a relatively high-salary-cost country, our cloud bill, while not insignificant, is still far less than one full-time employee, let alone the cost of having 24/7 on-call handled (we have enough 24/7-critical clients to need it).
2: The cloud provides broken-down billing; it's easy to pass it to the customer and just point to it, whereas with our own employee we would need to structure the billing process somehow. Our customers are outside of the IT field and don't care as long as it works and doesn't cost too much (it's a mostly fixed expenditure and easily separable from our development hours).
3: We have one client that does on-prem (partly out of habit) that's actually looking at moving parts to the cloud, partly because they've expanded internationally, but I wouldn't be surprised if it's also a bit of testing the waters, since they have to spend a fair bit of their time on security and stability (security doesn't go away entirely of course, but I'd be more confident in my backup processes on a large cloud compared to just locally).
4: I'm a tad on the fence about cloud APIs:
On the positive side: no need to implement a bunch of systems (badly?) and no need to build/install monitoring tools for those (and the extra administration burden that comes with updating those tools for X number of clients).
On the negative: there's a tad of churn in some cloud APIs. Luckily Azure is mostly stable, with plenty of migration time before you come around to a customer again (reading about Goog's churn feels like horror stories), and yes, there is a bit of vendor lock-in if you go all-out (luckily we've not seen any reason to change over).
If you don't still have that on-call ops engineer, you're tricking yourself into thinking you have cover that you simply don't. The low-level hardware and networking issues that a cloud provider would abstract away in a well-set-up system are a rounding error in the set of ops issues.
I used to provide services like that, on call, and the number of ops-related interventions for my cloud customers was consistently higher than for those on colo setups.
It's easier but only at the beginning, when the needs are simple and you dream of unpredictable QPS spikes that cloud will magically handle for you, summed up pretty well in https://world.hey.com/dhh/we-have-left-the-cloud-251760fb
I think the QPS dream is something that was teased to show feasibility (and scaling up services on the cloud IS simple when needed).
I think the bread and butter for the big clouds, however, are customers like the shop I'm at (see sibling comment): we can do development work and put it on our clients (mostly non-IT, with no interest in hiring their own department), and if we for some reason need to part ways it's easy to "hand over the keys" since subscriptions can be handed over between organizations.
But getting customers back is usually equally easy, since there was never any fuss when leaving; so when they've become dissatisfied with the "cheap" options, they don't remember leaving being a problem. Regardless, the winner is the cloud provider (but the costs are still small enough for each client that nobody is unhappy).
Yes, AWS-style cloud is just software-defined infrastructure.
It's like virtual lego - the bricks fit the same way as physical lego but the difference is you can rent as many virtual bricks as you like, almost instantly with little up-front cost.
You still need to know how to build the lego pirate ship either way.
I disagree. The company that put their infrastructure on AWS may well have made an informed choice. But the third party service hosted in AWS doesn’t have a choice, because AWS would punish their mutual customer with egress fees if the service moved out of AWS.
But it was heavily marketed that way, so at the very least you have to acknowledge that there was a fair bit of false advertising...
I don't think making an informed decision has anything to do with competitiveness in any case. Example: I might make an informed decision to publish my app on the App Store, but this doesn't mean its practices are not anti-competitive.
Cloud wasn't predatory in this... the price has always been there. If you lease a car and they mandate you get your oil changed with them for a price, and the price sucks, that's YOUR fault.
No one got locked into the cloud and THEN the egress prices went up... everyone went in knowing this.
The industry needs to man up and own shitty decisions rather than double down on them forever.
I'd argue it's still quite predatory. Basically, they reel in users with other services/products and then charge extortionate prices for egress specifically with massive margins just because they can.
In certain use cases it becomes all or nothing, either you host everything on the cloud or nothing at all which is certainly abusive and highly anti-competitive.
Sign up now for free (and we will auto charge you next month)...
The nice razor handle is cheap, but the blades are expensive.
No one hid the prices from anyone... it was all very up front and out in the open. No more data centers, no more systems admins, no more capacity planning, just scale on a dime... Here is the price chart....
And nobody wanted to do that HARD work any more. They could just grow and grow and grow and it would save them money.
You know what happens when you're not gonna run out of storage or bandwidth or servers... Everything gets bloated.
So sure, we can say abusive... Amazon abused all the poor stupid VPs who took the lazy way out and let their systems get fat and bloated on button-mashing "l"-shaped engineers. Crying about the lock-in and the pricing after you signed up for the service is your own fault. Take the hit and move on!
It's pretty simple. Excessive egress costs = vendor lock in, and yes, forcing third party services into the same cloud (the walled garden), and limiting customer choice.
Just another reason so many orgs are getting heartburn from going too deep too fast into the cloud.
(Format: provider / free allowance / cost of 1TB of egress)
Free:
Cloudflare -- Free for most services
OVH Cloud -- Free and unlimited
Scaleway -- Free for most services
Great:
Hetzner 20-60 TB / mo per instance $1.08
Not bad:
Linode 1-20 TB / mo per instance $5.00
Oracle Cloud 10 TB / mo $8.50
A bit much:
Backblaze 3x the amount of data stored $10.00
Bunny CDN -- $10.00
DigitalOcean 100 GB - 10 TB / mo per instance $10.00
UpCloud 500 GB - 24 TB / mo per instance $10.77
Vultr 2 TB / mo for most services $10.00
Uh...
Fly.io 100 GB / mo $20.00
Are you actually serious?
Microsoft Azure 100 GB / mo $78.30
Amazon Web Services 100 GB / mo $92.16
Railway -- $100.00
Zeabur 10-100 GB, depends on plan $100.00
Google Cloud Depends on service $111.60
Screw you guys:
Render 100 GB - 1 TB, depends on plan $300.00
Vercel 100 GB - 1 TB, depends on plan $400.00
Netlify 100 GB - 1 TB, depends on plan $550.00
(We use Netlify and have well over 1TB of monthly traffic. They're insanely expensive for what they are. As soon as we have roadmap time to revisit it, we'll move away.)
I'm starting to think of cloud as less of an asset and as more of a liability. We can leverage them for temporary scale, but in no way will we tie ourselves to a particular vendor.
One thing we should all have a subtle understanding of: if you are a company selling infrastructure tooling as a service (Vercel, for example), you should probably try to avoid hyperscalers.
The adage is that you should outsource the things that are not your core competence. However if your product is infrastructure for developers then outsourcing that is going to be painful.
You are forced to pass down costs from your upstream supplier, and headroom is necessary (somewhat) because your provider can dictate terms essentially on a whim.
I feel like this is such an obvious statement, but we do seem to be in the gold-rush of avoiding responsibility so I'm not certain.
Practically, this means that if you are an infrastructure tooling company, then you must reinvent all pieces of the cloud supply chain before building your product.
That’s a very expensive proposition. While the paper cost of a few racks in a colo is low - the people time to get where you want to be is high. If you mess up in this evolution - there is a risk that others outcompete you on something that isn’t your core product.
The people time of setup in a colo or managed provider is trivial if you hire someone who actually knows what they're doing, and usually lower than a cloud setup. It's stupidly easy to mess up a cloud setup in ways that are hard with a rack.
I find this is mostly an issue with a lot of shops simply having nobody with actual ops experience.
1) It's not like cloud databases are problem free. You can have very non-trivial issues, and they especially have nonstandard (and very nontrivial) transaction semantics. I'm not saying this is necessarily a problem, but you need to be on top of it. You need someone with real database administration experience.
You especially CANNOT "START TRANSACTION; SELECT * FROM products; SELECT * FROM five_other_tables; UPDATE sales ...; COMMIT TRANSACTION" (so all product prices and ... are included in the transaction "and nothing goes wrong"). It does not scale, it does not scale on postgres, it does not scale on Aurora, it does not scale on Spanner, it does not scale anywhere. It will scale on MySQL and MariaDB because it will just ignore your SELECT statements, which will cause a disaster at some point.
2) The big problem with cloud databases is the same as with normal databases: let bad developers just do what they want, and you'll have 40 queries per pageload, each returning 20-30 MB of data. You NEED to optimize as soon as you hit even medium scale. And, sorry, "just slap a cache in front of it" is very likely to make the problem worse.
Cloud hyperscalers salivate seeing this and have simply said: "don't pay an annoying database/system admin to tell you to FIX IT, simply pay us 10x what you pay that guy".
There are two kinds of people. There are people who look at the $10 pageload as both a technical embarrassment and a potential business disaster. And there are business people who build hyperscalers, dine the CTO and hold a competition for a $100 pageload, while ordering their next Ferrari.
3) There's nothing automatic about automatic sharding. And if you "just shard by UUID", your query performance will be limited. It will not scale. It just won't; you need to THINK about WHAT you shard, and HOW. No exceptions.
4) You'd be surprised how far Ansible + k8s + a Postgres operator go (for both CI/CD and distributed databases).
Once in my career, I did a high 7 figure migration from cloud to colo.
Our startup's lead investor was invested in a few firms that aimed to bring a cloud experience on-prem, so we enjoyed strong board support and had a strong team. On paper, the migration saved 10-20x its TCO over 3 years. We weren't a fly-by-night operation, and we used high-quality colo facilities near the cloud.
Everything went smoothly enough at first: the hardware was 12 weeks late, and there were some minor hiccups in the build. Time to turn off the original infra was maybe 2 months longer than planned. All in all, 3 months of net delay where we were running both stacks and the financial clock was ticking. We even had industry press and tech talks on the success of the whole effort.
Now here is where the problems actually started: 1 year post-build, folks wanted SSDs. While I had fought to get SSDs in the initial build, I was overruled by a consultant for the main DB vendor we had at the time. It's tough to ask for $500k+ in SSDs with 1/4 the storage volume of spinners if everyone else thinks the drives are a waste.
2 years in, the main cloud vendor dropped prices by north of 50% for the hardware we had bought. To make matters worse, the service team wasn’t using the hardware we had to full effect - usage was around 25%.
By year four, contracts weren't renewed and the workload was migrated back to cloud. In total, we had probably saved 10-25% vs. aggressive negotiations with the cloud vendor. We could probably have beaten the colo price if we had aggressively tuned the workload on the cloud, but there were political reasons not to do this. If you add in opportunity cost and the use of VC funds… we were probably negative ROI.
Given this backdrop, I never worked on colos again (this all went down 10+ years ago).
So you had a badly planned transition, did it in a single big switch (why?), over-provisioned even though your existing cloud experience put you in a perfect position to under-provision, and it sounds like you paid up front for no good reason?
Sounds like lots of valuable lessons there, but assuming from this that cloud was better for you seems flawed.
1. Don't do migrations all at once. In a large deployment you likely have multiple locations anyway, so do one at a time, and/or go rack by rack.
2. Verify usage levels, and take advantage of your ability to scale into cloud instances to ensure you only provision tiny headroom on your non-cloud environment. Doing so makes the cost gap to a cloud environment much bigger.
3. The advantage is being able to tune servers to your workload. That you had to fight for SSDs is already a big red flag - if you don't have a culture of testing and picking based on what actually works, yeah, you may actually be best off in the cloud, because you'll overpay by a massive factor no matter what you choose, and in a cloud you can at least rapidly rectify the repeated major mistakes that will be made.
4. Don't pay upfront unless you're at a scale where it brings you discounts at a return that makes it worth it. Rent or lease to own. For smaller environments, rent managed servers. The cloud provider dropping prices 2 years in shouldn't matter, because by year 3 you should have paid off the hardware at an average cost well below cloud anyway, and you can start cycling out hardware as your load requires and as/when hardware fails and/or it's no longer cost effective to have it take up space in your racks. Depending on your setup, that might be soon, or you might find yourself still running 6+ year old hardware you paid off 3 years ago for many tasks.
If you can't beat cloud on TCO, you're doing something very wrong. I use cloud for lots of things that aren't cost sensitive, but there are extremely few scenarios where it is the cost effective option.
Hardware beyond 3 years starts to be inefficient due to the space and power cost; paid-off servers aren't free :) Otherwise, we did mortgage the servers in the backend - buying upfront gave us around a 50% discount on top of the steep discounts you get from buying direct from an integrator vs. Dell et. al.
On the rent-to-own topic, the people-time cost of the migration was budgeted at around 10-20% of total TCO if I recall correctly, but we were in the realm of "we will have to hire people to maintain this project at 24/7 uptime". As the people time was relatively fixed (to manage a sane on-call rotation), shrinking the infra footprint would have simply increased the people-time portion of the TCO. If the footprint had shrunk by 75%, the maintenance cost would have become problematic. Due to the higher-intensity work of build-outs vs. maintenance, the people-time factor would have risen had we started with more experimental amounts. As it happened, tech salaries also rose ~2x over the 3-5 years that the colo existed.
On the workload front, as happens with many large organizations, there are teams who have the attitude of "don't touch my stuff". At the time the buy was initiated, there were several estimates indicating that we would need 4x the hardware we were buying within 5 years.
Whenever someone says that they are beating cloud TCO, I'd suggest you do the following math (a rough sketch of the arithmetic follows the list).
- Sum up all costs related to the on-prem hardware: include smart hands, power, racks/network gear, power adapters.
- Calculate a depreciation schedule for the existing hardware targeted to 3 years.
- Add a 10% cost of capital to discount future savings
- Amortize all "non-server" hardware costs onto your principal bottleneck (be it CPU/Network/Memory/Storage)
- Project how your savings compare against cloud costs under different assumptions of future cloud discounts. Both negotiated and public.
- Project how your savings compare under different utilization assumptions.
- Add a bus factor into the investment: what happens if your team leaves / hardware gets smashed / the workload becomes more efficient / another crisis occurs?
Next, and more controversially:
- Add in the people cost of everyone involved in the maintenance of your on-prem infrastructure.
- Project what happens under varying assumptions for how much of a raise they will ask for. What happens if tech compensation rises by another 2x over the next 3 years?
- Explore what happens if on cloud you needed X% fewer engineers. Depending on what you are doing X could be 0%, or even a negative percentage - but for many shops some fraction of existing work can be automated or avoided.
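A toy version of that arithmetic - every number below is a placeholder, not anything from my actual migration:

    capex = 400_000                  # servers, racks, network gear, optics
    years = 3                        # depreciation horizon
    cost_of_capital = 0.10           # discount rate applied to future savings
    colo_opex = 60_000               # per year: space, power, smart hands, transit
    people = 150_000                 # per year: share of on-call / maintenance
    cloud_list = 500_000             # per year, sized to actual utilization
    cloud_discount = 0.30            # assumed negotiated discount off list

    onprem_per_year = capex / years + colo_opex + people
    cloud_per_year = cloud_list * (1 - cloud_discount)

    # discount each year's savings back to today
    npv_savings = sum(
        (cloud_per_year - onprem_per_year) / (1 + cost_of_capital) ** t
        for t in range(1, years + 1)
    )
    print(f"3-year NPV of the on-prem route: ${npv_savings:,.0f}")
    # rerun with different discount, utilization and salary assumptions
    # to see how quickly the answer flips sign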
It was this math that made me turn away from non-cloud offerings. I still use cloud alternatives for some personal projects - but I don't bill myself in that setting.
> Hardware beyond 3 years starts to be inefficient due to the space and power cost; paid-off servers aren't free
Hence my caveat, but racks are typically rented at a fixed price per rack, and most places never even reach a scale where they need a full rack per location, so my experience is that for most people it pays to keep servers quite a bit longer because most people have spare space in racks that are already being paid for.
Once you're at a scale where you typically will have whole racks aging out at once, it shifts the calculation somewhat, but even then it varies greatly depending on e.g. your balance between storage and compute. It's very rare that it pays to throw out hardware at the 3-year mark, except in markets where the power and real-estate cost is unusually high, but then moving your hosting wholesale often pays off - e.g. I've in the past moved entire workloads from London to cheaper locations.
> Otherwise, we did mortgage the servers in the backend - buying upfront gave us around a 50% discount on top of the steep discounts you get from buying direct from an integrator vs. Dell et. al.
Either you loan-financed the hardware or the earlier mention of the "use of VC funds" was irrelevant, then. And nobody gives you a 50% discount for paying up front. You might have paid 50% less than the full cost of the purchase price plus interest, sure. That's not a discount, that's just not paying interest. In the end, whether you buy on credit, rent, or lease to own, you end up with a cost curve per server per month, and that is what is relevant to plug into your models.
> On the rent-to-own topic, the people time cost of the migration was budgeted at around 10-20% of total TCO if I recall correctly
Migration is a one-off, so this only makes sense given a time frame to write it off over. 10%-20% written off over 1-2 years wouldn't be completely crazy, though high. If your time horizon is so short that this matters to you, then you have organizational problems.
Put another way: While I did consulting in this space, I'd often offer to do the transition for clients for a percentage of their savings for the first few months, because I knew exactly how much these transitions would take us, and the clients would take a look at the proposals and accept my hourly rate instead when they realised how much they'd save, how quickly.
Reduction in egress fees alone often paid for my fees in a couple of months (one system I migrated, which was admittedly atypical, saw hosting costs drop 90% thanks to egress fees alone), and we usually saw devops costs drop at the same time. Most of my clients outsourced 100% of their devops, so the costs were easy to quantify.
> If the footprint had shrunk by 75% - the maintenance cost would have become problematic.
This is backward thinking. If the footprint drops by 75%, the cost for that drops. If your maintenance costs don't drop as fast, it doesn't matter - your total cost is still lower, but if your maintenance cost isn't elastic, you have an organizational problem.
Whether or not the proportion spent on each factor then changes might be a political consideration, but if you surrender savings because the budget "would have become problematic", then it's no wonder you ended up with a failed transition - there are incentives to avoid savings in order to maintain numbers that were broken from the outset. This sounds more and more dysfunctional to me.
> Whenever someone says that they are beating cloud TCO, I'd suggest you do the following math.
Done all of these many times, and never once had cloud come out remotely competitive.
To your "controversial" point, what I usually see when people think their cloud setup is comparative is that they carry out no accounting of how much time they actually spend maintaining cloud-specific things. When they get to the point of handing it over to someone specializing in it (as I did for years), it's often a surprise to them just how much time they offload from their teams.
A few of the other things that seem to shine through here are: 1) an assumption of capital outlays. I haven't needed capital outlays for colo'd environments in any setup I've done in the last 20 years (I did for a handful before that); the cost of financing directly with the provider or via a leasing company is priced in when I compare costs with cloud, because otherwise it wouldn't be comparable. If you then want to pay upfront, that's a choice, not a necessity, unless your credit is absolutely worthless.
2) Comparing only colo vs. cloud instead of adding in hybrid or managed hosting. If you build a purely self-hosted environment, a lot of the price advantage gets eaten up because you need to assume a far higher amount of spare capacity, which will drive up your cost even if you get it right - though people often end up far too conservative here and assume peaks far higher than what they ever need.
The moment you have a hybrid setup that can scale into cloud instances as needed, which is typically little extra effort to set up (you're going to be using an orchestrator anyway), you can easily double (or triple, if people were being conservative) the typical load on your colo servers and cut the hardware and rack cost accordingly - and usually when people do this they still end up hardly ever actually spinning up cloud instances, because most people's traffic varies far less than they'd like to think.
Even more so given there are now plenty of providers that offer you a seamless transition from colo, via managed servers, to VPSs, to cloud instances, with often surprisingly little difference in time to spin up extra capacity. Your setup just needs a method to register a new resource, irrespective of what is underneath - I've deployed systems that way for nearly 20 years now, with hybrid setups spanning the gamut from colo, via managed servers, to VPSs and AWS instances in a single setup.
The net effect tends to be to have to defend retaining the capability to scale into cloud because of how rarely it ends up being used.
3) an assumption that you need to hire people vs. e.g. outsourcing. Most companies never reach a scale where they need even a single full-time person doing hands-on ops - you're better off leaning on colo support, and retainers for monitoring and out-of-hours support for fractional scaling until you reach a scale where staffing several dozen full-time staff becomes viable. I've never had a problem scaling this up/down on an hour-by-hour basis with commitments for base-level needs on a month-by-month basis. For years I used to provide fractional support for companies to facilitate this type of thing.
4) an assumption that you can't automate the same things in a colo environment as in a cloud environment. For a well-managed colo environment, past racking hardware, if you can't boot the system via IPMI etc. straight into an image that automatically enrolls the server in your orchestrator, you're doing something wrong. If the cost of physically managing your servers is more than a rounding error, you're doing it wrong.
Yet the flexibility of public cloud environments is like the golden ticket for people doing devops consulting - when I was consulting in this space, the one constant was that the clients in cloud environments ended up paying me 2x-3x as much for assistance with similar size and complexity workloads. And I still usually cut their costs significantly compared to what they used to pay. E.g. the time that goes into maintaining network setups in a typical cloud environment - things that are solved by plugging machines physically into isolated switches - is staggering. Yes, usually people could do it cheaper than they are in public cloud setups, but the risk of getting it wrong also tends to be far higher.
I don't get why you'd do this. This sounds like a false choice. If you want a company to deliver racks with hardware to you, plenty of companies do exactly this. You don't need cloud for that.
The choice is not between "do all hardware and building" and "cloud". There's a whole spectrum in between. In fact, it's quite hard to do the building. You can go colo (building + power + electrical done by vendor, optionally network), dedicated (servers done by vendor), hybrid hosting, hyperscaler (you get VMs), ...
As soon as you move above colo, you don't have the hardware issues mentioned here. Or, well, you do have issues, but you "just" file a ticket.
I wrote an orchestrator from scratch in the pre-k8s days, and even factoring in the opportunity cost of that, colo and eventually Hetzner were far cheaper than cloud - to the point where (and we costed this out yearly) the company wouldn't have survived a year if we had had to pay AWS-level prices for infrastructure.
Cloud is great for many things, but not cutting cost.
In the case of fly.io it's a little different, because they have an anycast network with traffic being accepted at the closest region and proxied to your running VM. This is not unlike a CDN, and if you look at the North America/Europe pricing of most CDNs, fly.io is quite similar.
I believe Cloudflare is playing the long game of commoditizing their competitors and making their profit through other auxiliary services.
Also, on the plan, routing is atrocious, with all requests from Africa and Asia being directed into Europe - which helps keep costs down, but then it can't be compared with fly.io.
If I'm not mistaken, this is copied from the OP, which is pointing out that you get 100GB/mo free (that's the "100GB/mo" part). The $92.16 cost is for 1TB of egress (i.e. $0.09/GB x 1024). It's clearer in the original article.
Not even close. I think a 1Gbps IP transit hookup at a data center goes for $200-400 a month right now, and you can push 192TB a month with that. That's not even the bulk cost: they buy larger amounts in volume and probably get a huge discount on it. In fact, given their size, they might not pay for it at all (peering agreements), so it's just operating costs.
A good chunk of network capital is also amortized into these costs, it isn't just the 20 cents per megabit for IP settlement, it would also be all the routers those ports are connected to, the racks those sit in, the electricity those consume, and the networks beneath that from the hypervisor upwards.
Add on the usual and egregious $150-400/mo for a SMF/OS1 crossconnect in many datacenters.
Now, I'll grant you, "not even close" and handsome margins likely still apply, but "I'm going to hook a server directly into a 1Gbps IP transit port for $200" vs. all the complexity of a hyperscaler network is not apples to apples.
> A good chunk of network capital is also amortized into these costs, it isn't just the 20 cents per megabit for IP settlement, it would also be all the routers those ports are connected to, the racks those sit in, the electricity those consume, and the networks beneath that from the hypervisor upwards.
Some of that is capex, and some is opex. But it's worth noting that the hyperscalers are doing something far, far more complex than almost any single-tenant physical installation would do.
In a hyperscaler cloud, I can run an instance anywhere in the AZ, and I can connect it to my VPC, and I get what appears to be, functions like, and performs as if I'm plugged in to conventional network infrastructure. This is amazing, and the clouds don't even charge for this service as such. There are papers written about how this works, presentations are given, costs are bragged about, and entire industries of software-defined networking are devoted to enabling use cases like this.
But an on-prem or normal datacenter installation doesn't need any of this. Your favorite networking equipment vendor will happily sell you a single switch with tens of Tbps of switching bandwidth and individual links pushing 1Tbps. It takes maybe 2RU. If you are serving 1 billion DAU, you can serve up a respectable amount of content on that one switch. You could stick all the content in one rack (good luck at that scale), or you can scale across a few racks, and you don't need any of the amazing things that hyperscalers use.
And, realistically, very very few companies will ever need to scale past that in a single datacenter -- there aren't a whole lot of places where even 100% market penetration can find 1 billion people all close enough to a single datacenter that it makes sense.
So the apples-to-oranges comparison cuts both ways. You really can get a lot of mileage out of just plugging some off-the-shelf hardware into a transit port or two.
Notably, "anywhere in the AZ" is simply a launch constraint, and the hyperscaler can and does place you in the way that is most advantageous for them while still matching the "customer promise" (the constraint). In practice they do model both the volume and cost of east-west traffic in their network topology and will, wherever possible, not place Instance A and Instance B all that far apart in the AZ, particularly if that AZ is one of the big ones with 20+ datacenters worth of capacity.
The important takeaway message I'm trying to deliver is that AWS is comically overpriced regardless of what they are trying to run. Direct connections aside, reasonable cloud competitors with similar networks are charging $0-1000 for transit and the same amount would cost $8500 on AWS. Cloudflare in particular has demonstrated that you can run these systems so efficiently that they can charge nothing for egress and still make a healthy profit selling related cloud services.
While I agree with what you said in general, Snowflake is a poor example. Data warehouses like Snowflake really can use the Tbps+ aggregated bandwidth between S3 and EC2 in the same region. There is no way for this to work over the Internet.
Snowflake and most other cloud services offer on-prem deployments for these use cases, so it really doesn't make sense for them to roll their own data centers; it would be pretty niche. Cloud works for startups because they are there already, and on-prem works for enterprise customers with their own hardware.
It's an interesting point, but I doubt it's the lion's share of egress that's going to other data centers vs. to customers. Fan-out is where it gets expensive.
That's not the point. The point is that if the customer and SAAS provider "happen to" be in the same cloud then nobody pays any egress costs, but if the provider operated out of their own datacenter then the customer would pay additional egress costs to AWS to export their data to the SAAS provider. Snowflake is a good example because by the nature of the business customers send them a lot of data. This not only makes the customer pay more, but it also makes their billing and cost accounting much more complicated and unpredictable.
Their list price for storage capacity is only on the order of 2x what S3 charges, and Snowflake and S3 likely both offer discounts. Comparing compute costs is harder.
If I were running a service like Snowflake, I would certainly appreciate the effortless scaling that the major clouds offer. But I also know what I pay for actual servers and roughly what I would pay in a major cloud, and I much prefer the former.
Yeah, storage is much more competitive, but there have been leaks about what hardware they run on, and their margins are incredible - which makes sense, since they are selling software, not hardware!
tbf, Snowflake's whole thing & what makes them "next gen" vs a traditional data warehouse is that you don't pay much for data at rest. You pay as little as possible for the "warehousing" part, and mostly/only for compute/querying.
Just as with Databricks: you don't pay when you don't use their software, cool. But then you pay per core per second for the license, in addition to running the VMs, right?
Sure, S3-like storage is 'cheap', but if you want up-to-date data pipelines you'll end up running these things for many hours each day. Plus you pay for each bucket operation, and so on.
“Let’s get rid of IT and stop using on prem software. Outsource it all to the cloud and SaaS. We’ll save so much money!” … said the entire world pretty much …
This is hilarious. Now that nobody makes on prem software anymore and nobody knows how to do IT, it’s time to turn the screws. The prices are only going up. You can’t move your data because it’s all locked into proprietary silos and only readable by one app running in one place.
I saw a hilarious fuck up a few months ago. Company sets up an AWS-hosted, always-on VPN solution. Connects 1000 staff through it. Celebrates how they saved $50k on the VPN solution. Gets a $25k AWS bill for just the first month of egress traffic. Turns out the data was leaving AWS (and being billed as egress) three separate times.
It was not an AWS product before or after. It was a SaaS VPN before and a vendor product running on EC2 after. The TCO for a year was $50k before (licenses mostly), and it was replaced with a total cost per month of $25k (licenses, EC2 instances and egress bandwidth) because of a misunderstanding of how it worked. As the concentrator and egress gateway were separate components, they deployed them in the org's home region. The end result was an egress damage multiplier, because local traffic was routed globally to a different AWS region over the public internet and then left as egress again.
There was also a global policy rolled out which pushed the VPN client to various infra machines in branch offices. Some of them were running GitHub runners and data processing software which had terabytes of egress per month.
The whole thing was a disaster and they rolled it back within a week.
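Back-of-the-envelope on how an always-on VPN gets to numbers like that, using AWS's rough $0.09/GB internet-egress list rate (the per-seat split below is my own guess, not from the post above):

    price_per_gb = 0.09              # rough AWS internet-egress list rate
    monthly_bill = 25_000            # what they were billed for egress
    billed_gb = monthly_bill / price_per_gb      # ~278,000 GB billed
    hairpins = 3                     # the same bytes billed as egress 3 times
    real_tb = billed_gb / hairpins / 1000        # ~93 TB of underlying traffic
    seats = 1000
    print(f"~{real_tb:.0f} TB of traffic, ~{billed_gb / hairpins / seats:.0f} GB/seat/month")
    # ...before counting the infra machines pushing terabytes on their own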
Likely they are mixing timelines (one-time or yearly cost vs. AWS's monthly charges).
Cisco AnyConnect-capable VPN appliances (that can do 10GbE) are very expensive, and licenses are per user - so if an appliance needed an upgrade, it is conceivable that it could cost $50k in the first year.
This. It wasn’t Cisco. Far crappier vendor! I don’t want to name names but they had a few high profile incidents that suggested their software was written by idiots. There was a panic move by the security people to switch away and still tick the box but they didn’t really understand the billing and architectural models of AWS.
Pricing was around $75k for 1000 seats for a year. They thought it was going to be $25k a year, but it turned out to be that a month.
If you still want to use some AWS services, you can get an AWS Direct Connect fiber cross connect from your data center rack to AWS, just like you do with your Internet connections. They operate Direct Connect in lots of third party carrier-neutral data centers. AWS egress over Direct Connect is $0.02/GB instead of $0.09/GB over the public Internet. You can serve customers through your unmetered Internet connections while accessing S3 (or whatever) via Direct Connect on the backend.
I can pay overpriced cross-connect rates in giant name brand datacenters, with or without terminating one end at Direct Connect. (AFAICT the $1600/mo or so for 10Gbps doesn’t actually cover the cost of the cross-connect.)
But that extra $65k/mo to fully utilize the link is utterly and completely nuts. My mind boggles when someone calls that a good deal. I can buy and fully depreciate the equipment needed to utilize that in a couple of days. (Well, I can’t actually buy equipment on the AWS end, but I can _rent_ it, from AWS, for a lot less money than $64k/mo.)
And I don’t believe at all that it costs AWS anything close to this much to operate the thing. I can, after all, send 10Gbps between two EC2 instances (or between S3 and EC2) for a lot less money.
That $65k is simply a punitive charge IMO. AWS wants me to avoid paying it by not doing this or, in cases where I can’t avoid it (e.g. the thing is actually a link to my office or a factory or whatever) to collect as much money as they can without driving me off AWS entirely.
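For reference, that $65k figure is just the $0.02/GB metering applied to a saturated 10Gbps link for a month; the port and cross-connect fees mentioned above come on top:

    gbps = 10
    seconds = 30 * 24 * 3600                 # ~one month
    gb_per_month = gbps / 8 * seconds        # ~3.24 million GB at line rate
    print(f"${gb_per_month * 0.02:,.0f}/month in Direct Connect egress")   # ~$64,800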
I'm a customer of the GCP equivalent: Partner Interconnect. Our DC is in an Equinix facility; they wire up drops for us that go layer 3 straight into GCP. Unmetered 1Gbps for about 250 bucks a month each (paid to EQX, not GCP). Are AWS really charging you per GB for data egress from AWS into your own DC over an AWS Direct Connect??
I'm not familiar with GCP but this page below seems basically identical to AWS Direct Connect's pricing structure, including $0.02/GB for egress. Are you sure nobody is paying egress on your behalf?
It really depends on the quality of the peering you expect. It doesn't matter, until it does. Consumer ISPs sometimes do their utmost to not peer with open exchanges, and the entire thing gets even more complex when you go to places where bandwidth is more expensive (i.e. Oceania).
There's a reason the favorite chart to exemplify value Cloudflare reps like to show is Argo Smart Routing, and why it costs about $100 per TB just like AWS and GCP.
I agree, and I would also put forward that most people don't understand what peering is or how it works. When people (usually developers who are not network engineers and have not worked at that level of the stack) talk about "egress", they mean delivering bits from your network (cloud or otherwise) to any other network on the internet. How can you put just one price on delivering a bit either to a host within the same datacenter or one on the opposite side of the planet? Physics still mean that one is more expensive than the other.
The existence of the world wide web has tricked us into thinking that sending traffic anywhere is/should be the same, but of course it is not. So while the price you (a cloud customer) pay for egress pricing is (often) indiscriminate on where that traffic is going, using common sense, we can understand that some traffic is more expensive than others, and the price we pay is a blended price with that aspect "baked in" or "priced in".
I'd like to see the data on that! Would it really be cheaper if they charged granularly by cost, like old telephone plans that list each country and cents/min plus a connection charge?
We are using Argo for our xhr search traffic as it makes more sense than setting up different servers/vms in parts of the world. Each request is only 1kb max.
But I would not use it for static assets. For this we use Bunny edge storage to provide faster response times at very reasonable prices.
Definitely lucrative enough. The use case you've described isn't particularly uncommon, but lots of companies just pay for the egress.
The problem is that there are now multiple generations of software engineers that do not know how bandwidth is priced. They've only used managed providers that charge per unit of ingress/egress, at some fractional dollar per GB.
I’ve had people refuse to believe that bandwidth is actually very cheap and cloud markup is insane (hundreds or even thousands of times cost).
I show them bare metal providers and colo that bills by size of pipe rather than transfer. They refuse to believe it or assume there must be a catch. There usually isn’t, though sometimes the very cheapest skimp on things like rich peering and can be slightly slower or less reliable. But still cheapest is relative here. Expensive bare metal or colo bandwidth is still usually hundreds of times less than big three cloud egress.
It’s just nuts.
It’s a subset of a wider problem of multiple generations of developers being fully brainwashed by “cloud native” in lots of ways. What an amazing racket this all has been for providers…
Cloud providers have done an incredible job creating an entire generation of people who have no knowledge of such things.
$0.09/GB? I guess that’s just what it costs. Expected when you’ve never looked at, considered, or even heard of things like peering and buying transit or even co-location. Enter 95th percentile billing on a gig port for $500/mo or whatever…
Same goes for hardware. Want to watch something glorious? Take someone who has only ever used VMs, etc in big cloud and give them even a moderate spec new bare metal server. The performance melts their brains.
Then they realize the entire thing - colo, servers, bandwidth, even staff is a fraction of the cost.
A big part of the problem is that software development has wrapped itself around cloud native. Sure that metal server has brain melting performance for nothing, but try provisioning it or running it. Nobody knows how and we are dependent on stacks like K8S that require an ops team to run.
The industry really found a way to stop the downward cost spiral of personal computing and bring everyone back into mainframe. Moore’s Law is not dead but you no longer see the benefits. The difference is just pocketed by cloud providers who will use the gains to pack more low performing VMs onto less hardware.
Any recommended reading on how this stuff truly works? I feel like I understand the core protocols (TCP, IP, BGP, etc.) but not how the internet is 'built'.
Also very few people buy Equinix transit except maybe as a backup.
You buy space in Equinix DCs because they are network neutral and have a high density of ISP POPs, i.e you can light up fibre between your cage and HE, Level3, whatever other transits you want to use + peer with everyone on the local peering fabric super easily.
I consider Equinix transit to be an option only when you really can't be assed to do anything else and you don't intend to hammer the links anyway, otherwise yeah you should be doing all the things above because otherwise you could have rented colo in some shitty backwater DC instead.
> It’s a subset of a wider problem of multiple generations of developers being fully brainwashed by “cloud native” in lots of ways. What an amazing racket this all has been for providers…
No one believes they can do anything anymore. It's not just hosting and hardware either. Every single thing becomes a SaaS that gets purchased, and they all refuse to believe it can be done in-house.
> It’s just nuts.
It's exactly that. I've seen upper management in many companies value external solutions as more cost effective because if they had to do it internally it'd cost 2x as much.
My feeling is that egress is easily measured, so it's where costs that are hard to assess get moved to.
It doesn't feel great to be line item billed for stuff at 10x the rate of credible other offers.
I think there is also some geo-specific pricing that gets hidden in a global price; bandwidth can be a lot more expensive in some locations than others and if you are charged 5x for egress in south america, nobody will use the south america locations and that's not good for business.
Right. Egress is an imperfect, but reasonable metric for overall utilization. If they started charging for CPU hertz above a certain threshold, that'd be a harder sell.
I don’t believe this. Operating an internal cloud network is expensive, but it’s expensive because of internal traffic, and they don’t charge for that internal traffic. Egress is just like traffic to any other system, and AWS doesn’t charge for that.
Also:
> It doesn't feel great to be line item billed for stuff at 10x the rate of credible other offers.
The inter-AZ charges kill me. They’re charging you for traffic over what is almost certainly private fiber, of a distance no more than 50 miles or so. Just absurd.
Then, they have the audacity to recommend you make everything Multi-AZ. "Don't worry, RDS-RDS traffic isn't billed!" Yes, but the apps running in us-east-1a that are hitting -1b and -1c due to load balancers most assuredly do get billed for traffic.
What’s extra special is that the cloud providers are now disincentivised from making those load balancers zone topology aware. If they made them efficiently pick the same zone whenever possible, they would be missing out on cross-AZ traffic billing, which is basically a money printer.
BTW, if nothing has changed in the last year or two: even if you want to run in a single AZ but also have an (automatically) replicated RDS instance, I believe the replica MUST be in another AZ. Meaning that any potential failover puts you in exactly the position you have mentioned - apps running in one AZ, database in the other. Suddenly you pay $0.01/GB for transfer between those, billed in each direction separately.
Correct. The same thing also happens if you use the Blue/Green functionality to do an upgrade – unless you specify (and even then, I'm not sure if you can do so), the Blue instance will be in a different AZ than Green.
Are you a Google Cloud customer looking to exit Google Cloud? If so, you are eligible for free data transfer when you migrate all of your Google Cloud workloads and data from Google Cloud to another cloud provider or an on-premises data center.
180 TB is still a small customer in the scale of big cloud providers, so they probably just don't care. If the customers are willing to pay the price they are taking their money, if they go to one of the smaller providers they are fine with it too.
Also, it could be possible to set up some hybrid solution and offload the egress-heavy asset serving to another provider, and only run the "brains" inside AWS/etc.
> People use AWS for the same reason they rent houses: less investment required and you can change things more quickly.
Yes, exactly. And it's very often the case that these capabilities are more valuable to users than "real price" as you mean. You... understand this, right? That value is measured contextually, not objectively?
Our 20TB/month costs us $414... are you saying that you can amortize those servers, rent the space and pay for upkeep for less than $4k a month?
We have another service that has 20x the bandwidth, but it's a legacy GAE app that has super cheap premium egress... But I'm told that AWS says their pre-pay discount will be competitive in about a year, at the rate we're growing.
The negotiated egress prices are much lower, so long as you are buying future egress. If they're not worried about you jumping ship (you use a lot of their aws-native services), you can get a great deal.
I could buy a whole off-lease, fully depreciated server and a year of colo with a gig link for under $4,000, yes. I would almost certainly see better performance than any cloud provider can give me, too.
The AWS Enterprise Discount Program apparently requires $1M per year spend. 180TB is about $13k on AWS so presumably not enough to be interesting to them. Hopefully someone who works at AWS can share some info.
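Roughly where that $13k comes from, using AWS's long-standing public internet-egress tiers (rates vary a little by region and over time, and this ignores the small free allowance):

    # ~180 TB/month priced through the public tiers
    tiers = [(10_000, 0.09), (40_000, 0.085), (100_000, 0.07), (float("inf"), 0.05)]
    remaining, cost = 180_000, 0.0      # GB
    for size, rate in tiers:
        used = min(remaining, size)
        cost += used * rate
        remaining -= used
    print(f"${cost:,.0f}/month")        # ~$12,800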
EDP can be great if you can meet their required year over year growth requirement and if your spend is high enough to make the discounts (which are variable and negotiated often at a product offering level, not blanket) offset the required top-shelf support contract. For smaller orgs even at the $1-2M/mo level, it can often be a risk not worth taking vs other cost savings mechanisms and locating high-egress use cases elsewhere.
Egress bandwidth pricing has been the sacred cow money firehose forever, despite transit costs continuing to shrink.
Yeah, I've built several setups where total cost of ownership was lower than egress cost alone would have been on AWS. Both w/physical hardware in colos and with rented managed servers at providers like OVH and Hetzner.
Context matters here. How critical is that workload? What's the economic and reputational impact for the company if one of the physical connections, or some technical problem with the data center, causes a downtime of hours or days?
Looking at the uptime from AWS, and at most big outage notices I have read in the last few years, there does not seem to be a benefit in regards to reliability when using cloud.
See Reddit; see Amazon/AWS outages taking Netflix/Disney Plus etc. down with them.
Honestly, it's a lot better to keep your architecture cloud-agnostic and test restores regularly on a different provider/region.
Also: store your backups somewhere else - not on your primary cloud provider, not connected to your auth system, not overwritable from anywhere.
I am not aware of an AWS outage in their 15-16 years of existence that an architecture built according to the recommended best practices - distribution across availability zones and regions - would not be able to withstand. I am willing to be proven wrong. Can you provide one example?
Of course, these come with increased cost, but I am thinking web retail on a large scale, or airline companies, where for example a downtime of a few hours will easily wipe out any savings made by relying on a local data center. It might not be the solution for a smaller company.
I guess you did not have any service running on AWS in winter 2021? It does not really matter which region; with certain services you could get lucky, but basically everything depending on fresh IAM creds was down.
And if Netflix/Disney/Slack/Ring and Amazon themselves cannot do it, with a multiple of the resources I have access to, good luck to you.
There are also times where your company might rely on a single AWS product and an integration might simply break (AWS IoT MQTT -> SQS, for example); it took them 2 days to fix.
You need an on-call team, cloud or not, but taking one FTE position and dedicating it to managing 2 servers (documentation/updates/backup & restore procedure testing) seems rather... high.