Under the Hood of Amazon EC2 Container Service (allthingsdistributed.com)
136 points by werner on July 20, 2015 | 34 comments


The biggest issue I had with ECS is that you need to create the EC2 instances for your ECS cluster yourself, using the ECS-optimized AMI so that they have the ECS agent on them... BUT you have to pre-scale that cluster manually thereafter.

In the task definition it would be MUCH better if you could select the EC2 instance type you want, collect the instances in an ASG, and have the task slicing scale the ASG accordingly.

Right now you have to manually determine the slice size for each container-to-EC2 mapping and manually scale the ASG.

Further, it was noted that ECS is actually NOT AZ-aware: it will spread load over the EC2 instances in the pool, but it won't also balance the tasks across AZs...

So, it's a fantastic version one... but these are resiliency and scaling features that should have already been included.


Nope, we have built a new scheduler for you that will allow placement over multiple AZs, replace failed containers, allow them to connect to ELBs, etc.
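
For example, attaching a service to an ELB looks roughly like this (a minimal boto3 sketch; the cluster, task definition, and ELB names are placeholders):

    import boto3

    ecs = boto3.client("ecs", region_name="us-east-1")

    # Create a long-running service; the service scheduler replaces failed
    # tasks and registers new ones with the ELB automatically.
    ecs.create_service(
        cluster="my-cluster",             # placeholder cluster name
        serviceName="web",
        taskDefinition="web-task:1",      # family:revision of a registered task definition
        desiredCount=4,
        loadBalancers=[{
            "loadBalancerName": "my-elb", # existing ELB
            "containerName": "web",       # container named in the task definition
            "containerPort": 80,
        }],
        role="ecsServiceRole",            # IAM role that lets ECS register with the ELB
    )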


Thanks for responding! Can you say how you deal with the significant latencies over multiple AZs?

Also, would you mind answering my other question about whether or not I can run Marathon or Chronos against ECS since it runs Mesos under the hood?


Can you elaborate on the intra-/inter-AZ latencies you are seeing? For AZ-independent services this should be single-digit milliseconds.


When I'm balancing a single deployment across multiple AZs (e.g. US-East -> US-West), the latencies between the containers seem far higher than just the 200-300ms predicted by speed of light. Am I doing something wrong?


No Mesos under the hood. You can bring Mesos as your own scheduler.


Thanks again - this is really helpful. I had talked to someone who had left Amazon but knew the internal workings, and he said ECS was just Mesos privately branded, like Chef -> OpsWorks, but I guess I must have misunderstood.

Thanks for clarifying!


Thanks - All my info is from the containerization pop-up you guys held last week in SF... The presenters did not seem to know about this.


Check out hyper.sh; this is the future of public CaaS. After all, you don't need EC2 to host your containers if you can run them directly on a hypervisor.


I've been using IBM Container Service. They have Docker containers running on bare metal servers with a free tier and a trial account for 30 days. With respect to clusters, you can set up a scalable group with min/max sizes and they'll take care of routing across the containers in the group.


How do they handle the isolation in a multi-tenant environment?


> Right now - you have to manually [...] scale the ASG.

You have to manually scale the AutoScalingGroup, you say? :-)

More seriously, what's the barrier to configuring the ASG to grow and shrink automatically?


Heh, yeah, basically you can create an ASG, but ECS has no trigger to scale it based on the number of task containers you attempt to launch on the cluster...

There is a workaround, apparently, which is to create a custom metric, but AWS says this has not been tested to their knowledge... So any ASG will be implicitly "static"...


That is definitely inaccurate. We have ASGs that scale based on CPU load and custom CloudWatch metrics.

With ECS I can imagine a metric that keeps track of the number of tasks and hosts, or ports used, or something. I think an ASG is the perfect tool to auto-scale ECS clusters.
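
E.g., a small script run on a schedule (an untested sketch; the cluster name and metric namespace are made up):

    import boto3

    ecs = boto3.client("ecs")
    cloudwatch = boto3.client("cloudwatch")

    # Pending tasks mean the scheduler could not find room on any instance,
    # so they make a decent scale-out signal for the ASG.
    cluster = ecs.describe_clusters(clusters=["my-cluster"])["clusters"][0]

    cloudwatch.put_metric_data(
        Namespace="Custom/ECS",
        MetricData=[{
            "MetricName": "PendingTasks",
            "Value": cluster["pendingTasksCount"],
            "Unit": "Count",
        }],
    )

    # A CloudWatch alarm on Custom/ECS PendingTasks > 0 can then trigger a
    # scale-out policy on the ASG; a companion alarm can scale back in.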


Ah, true... you could create custom metrics to trigger on... I was speaking from the native ECS standpoint...

Also - all my information is only from the one-day container session I attended at the pop-up last week... so I am certainly not an SME on ECS.


I think it is interesting that AWS seems to be moving to consistent data stores. Previously they were championing eventual consistency everywhere, even when it made for painful products (SimpleDB) or painful APIs (retry loops when using EC2 APIs).


My impression is that they have several different backend data stores, each with different trade-offs and consistency models, and they choose the one that makes sense for each app/service.

The way Werner describes the ECS data store sounds very similar to Google's Megastore:

To achieve concurrency control, we implemented Amazon ECS using one of Amazon’s core distributed systems primitives: a Paxos-based transactional journal based data store that keeps a record of every change made to a data entry. Any write to the data store is committed as a transaction in the journal with a specific order-based ID. The current value in a data store is the sum of all transactions made as recorded by the journal. Any read from the data store is only a snapshot in time of the journal. For a write to succeed, the write proposed must be the latest transaction since the last read.
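
In other words, optimistic concurrency on top of an ordered log. A toy in-memory sketch of the shape (mine, not Amazon's; a real system would track versions per key/partition and replicate the journal with Paxos):

    class ConflictError(Exception):
        pass

    class JournaledStore:
        def __init__(self):
            self.journal = []               # ordered (txn_id, key, value) records

        def read(self, key):
            snapshot_id = len(self.journal) # snapshot = a position in the journal
            value = None
            for _, k, v in self.journal:    # current value = replay of all transactions
                if k == key:
                    value = v
            return value, snapshot_id

        def write(self, key, value, snapshot_id):
            # The write only commits if it is the next transaction after the
            # snapshot; otherwise the caller must re-read and retry.
            if len(self.journal) != snapshot_id:
                raise ConflictError("journal advanced since snapshot, retry")
            self.journal.append((snapshot_id + 1, key, value))

    store = JournaledStore()
    _, snap = store.read("desired_count")
    store.write("desired_count", 4, snap)   # commits: nothing intervened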


I believe raw eventual consistency has failed as a programming API. I believe CRDTs in their many incarnations provide a great alternative but i) CRDTs are quite new and ii) they require a more complex API.


I don't know, S3 and DynamoDB are both eventually consistent. And keeping in mind the CAP theorem, it makes sense. And I for one love SimpleDB - it's just that, simple. And great for prototyping (really cheap) and small production loads. Often you just need a place to stick your data; scalability can be achieved by adding a caching layer.


You can choose eventually or fully consistent reads in DynamoDB. Given that full consistency comes at a higher cost (a read from a quorum of replicas), we expose that cost to you.

BTW nobody wants eventual consistency; it is a fact of life among many trade-offs. I would rather not expose it, but it comes with other advantages ...
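
The choice is per request, e.g. in boto3 (the table name and key are placeholders):

    import boto3

    table = boto3.resource("dynamodb").Table("my-table")

    # Default: eventually consistent, at half the read-capacity cost.
    item = table.get_item(Key={"id": "42"})

    # Opt in to a strongly consistent read for this request only.
    item = table.get_item(Key={"id": "42"}, ConsistentRead=True)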


They have actually been making S3 _more_ consistent over time: in the newer regions you get e.g. read-your-writes for object creation. DynamoDB also supports consistent reads, though it still defaults to eventual consistency.

In my mind, there's definitely a trend towards consistency here. I'd love to see an AWS blog post about the reasons behind this!


We should be glad. Eventual consistency is hard to reason about, particularly when its tradeoffs have to do with other people's systems...


US Standard now provides read-after-write consistency when accessed through the Northern Virginia endpoint [1].

[1] http://aws.amazon.com/s3/faqs/#What_data_consistency_model_d...


I think you can force consistency on DynamoDB for a price.


What are some other examples?

In general, consistent vs. available (or neither, that's possible too of course) is a trade-off, and you'd want to pick one vs. the other depending on your business case.


I know I'm bound to be proved wrong here, but I think _every_ product after the original set (EC2, S3, SimpleDB) has been consistent or now has a consistency option: the major ones are RDS, EBS, EFS, DynamoDB, Redshift, ElastiCache, Route 53, Kinesis, SES.

Some of those APIs are sort of odd, admittedly, and could be covering up eventual consistency under the covers (Route53 in particular springs to mind there!)

Edit: And S3 and SimpleDB now expose more consistency than they did at launch.


DynamoDB encourages eventual consistency by charging only half a read capacity unit for each eventually consistent read.
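
Back-of-the-envelope (1 read capacity unit = one strongly consistent read/sec of an item up to 4 KB, or two eventually consistent reads/sec; the workload numbers are made up):

    reads_per_second = 100                   # hypothetical workload, items <= 4 KB

    consistent_rcu = reads_per_second * 1.0  # 1 RCU per strongly consistent read
    eventual_rcu = reads_per_second * 0.5    # 0.5 RCU per eventually consistent read

    print(consistent_rcu, eventual_rcu)      # 100.0 50.0 -> consistency costs 2x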


I consider consistency a bargain at twice the price :-)


A glaring gap currently is security. Per-container IAM roles would go a long way IMHO, but that still leaves "other" secret management, which is a PITA. Other options such as Kubernetes lack the AWS/ELB integration; all seem to be lacking a good security management model.


FWIW, Kubernetes provides its own load balancer, which you can put behind an ELB.

Other than that, Kubernetes works on AWS out of the box, with a one-line setup.

Full disclosure: I work at Google, on Kubernetes.


I'm not sure that's what I would call an integration... AWS provides for easy host management and elastic scaling, traditionally through the integration of the ELB with Auto Scaling groups, and now with lifecycle hooks. I'm not aware that Kubernetes integrates with this stuff in any way or provides a sufficient alternative. Reading through the documentation, I was not able to find information about connection draining on rolling updates, taking hosts out of service for maintenance/scaling/replacement, and so on. I am aware that Kubernetes will run on AWS now and there is a guide for setting it up.

However, this really wasn't the point of my comment, which is that security for application secrets (and AWS API access) is currently a sore spot. It would be nice if Kubernetes would adopt some of HashiCorp's stuff like Consul, consul-template, and Vault. Maybe that's too far up the container stack, though, and a popular bundling of technologies will appear.


How does this compare to Kubernetes?


I guess the better question would be "how does Kubernetes compare to this?" It's not like Kubernetes is a market leader or anything.


I've heard that it just runs Mesos under the hood; how does this differ from running Mesos on my own? Can I run Chronos or Marathon against it?



