The biggest issue I had with ECS is that you have to create the EC2 instances for your ECS cluster yourself, using an AMI that has the ECS agent on it... BUT you then have to pre-scale that cluster manually thereafter.
In the task definition - it would be MUCH better if you could select the EC2 instance type you want, collect the instances in an ASG, and have the task slicing scale the ASG accordingly.
Right now - you have to manually determine the container-to-EC2 slice size and manually scale the ASG.
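To make the pain concrete, here is roughly what that manual loop looks like in boto3 (just a sketch - the ASG name and the tasks-per-instance figure are all invented, nothing here is computed by ECS for you):

    import boto3

    ASG_NAME = "my-ecs-asg"  # hypothetical ASG backing the cluster

    autoscaling = boto3.client("autoscaling")

    # The "slice size" is a hand-tuned guess at tasks-per-instance,
    # based on the CPU/memory reservations in the task definition.
    TASKS_PER_INSTANCE = 4  # assumption, not anything ECS tells you

    desired_tasks = 12
    desired_instances = -(-desired_tasks // TASKS_PER_INSTANCE)  # ceil division

    # ECS never calls this for you -- you bump the ASG yourself.
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=ASG_NAME,
        DesiredCapacity=desired_instances,
        HonorCooldown=False,
    )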
Further, it was noted that ECS is actually NOT AZ-aware: it will spread load over the EC2 instances in the pool -- but it won't also balance the tasks across AZs....
So, it's a fantastic version one... but these are resiliency and scaling features that should have been included already.
Nope, we have built a new scheduler for you that will allow placement over multiple AZs, replace failed containers, allow them to connect to ELBs, etc.
When I'm balancing a single deployment across multiple regions (e.g. US-East -> US-West), the latencies between the containers seem far higher than the few tens of milliseconds the speed of light would predict. Am I doing something wrong?
Thanks again - this is really helpful. I had talked to someone who had left Amazon but knew the internal workings, and he said ECS was just Mesos privately rebranded, like Chef -> OpsWorks, but I guess I must have misunderstood.
Check out hyper.sh - this is the future of public CaaS. After all, you don't need EC2 to host your containers if you can run them directly on a hypervisor.
I've been using IBM Container Service. They have Docker containers running on bare-metal servers, with a free tier and a 30-day trial account. With respect to clusters, you can set up a scalable group with min/max sizes and they'll take care of routing across the containers in the group.
Heh, yeah basically you can create an ASG, but ECS has no trigger to scale it based on the number of task containers you attempt to launch on the cluster....
There is apparently a workaround, which is to publish a custom metric, but AWS says this hasn't been tested to their knowledge... So in practice any ASG will be "static"...
That is definitely inaccurate. We have ASGs that scale based on CPU load and custom CloudWatch metrics.
With ECS I can imagine a metric that keeps track of the number of tasks and hosts, or ports used, or something like that. I think an ASG is the perfect tool for auto-scaling ECS clusters.
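E.g. (purely a sketch in boto3 - the cluster name and the 4-tasks-per-host slice size are my assumptions): derive a free-capacity number from the cluster and publish it as a custom CloudWatch metric, then point an ASG scale-out alarm at it.

    import boto3

    CLUSTER = "my-ecs-cluster"   # hypothetical
    ASSUMED_TASKS_PER_HOST = 4   # hand-picked slice size, see upthread

    ecs = boto3.client("ecs")
    cloudwatch = boto3.client("cloudwatch")

    # Crude free-capacity signal: slots on registered hosts minus running tasks.
    c = ecs.describe_clusters(clusters=[CLUSTER])["clusters"][0]
    free_slots = (c["registeredContainerInstancesCount"] * ASSUMED_TASKS_PER_HOST
                  - c["runningTasksCount"])

    # Publish it; a CloudWatch alarm on this metric can then drive an ASG
    # scaling policy, closing the loop that ECS doesn't close itself.
    cloudwatch.put_metric_data(
        Namespace="Custom/ECS",
        MetricData=[{
            "MetricName": "FreeTaskSlots",
            "Dimensions": [{"Name": "ClusterName", "Value": CLUSTER}],
            "Value": float(free_slots),
            "Unit": "Count",
        }],
    )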
I think it is interesting that AWS seems to be moving to consistent data-stores. Previously they were championing eventual consistency everywhere, even when it made for painful products (SimpleDB) or painful APIs (retry loops when using EC2 APIs).
My impression is that they have several different backend data stores, each with different trade-offs and consistency models, and they choose the one that makes sense for each app/service.
The way Werner describes the ECS data store sounds very similar to Google's Megastore:
To achieve concurrency control, we implemented Amazon ECS using one of Amazon’s core distributed systems primitives: a Paxos-based transactional journal based data store that keeps a record of every change made to a data entry. Any write to the data store is committed as a transaction in the journal with a specific order-based ID. The current value in a data store is the sum of all transactions made as recorded by the journal. Any read from the data store is only a snapshot in time of the journal. For a write to succeed, the write proposed must be the latest transaction since the last read.
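Not ECS's actual code, of course, but the rule in that paragraph is easy to sketch: reads return a snapshot ID, and a write commits only if nothing else has been journaled since that snapshot.

    import threading

    class ToyJournalStore:
        """Toy transactional journal: current state is just a replay
        of the ordered log of committed writes."""

        def __init__(self):
            self._lock = threading.Lock()
            self._journal = []  # ordered (txn_id, key, value) records

        def read(self, key):
            """A read is a snapshot in time of the journal."""
            with self._lock:
                snapshot_id = len(self._journal)
                value = None
                for _, k, v in self._journal:  # replay the log
                    if k == key:
                        value = v
                return value, snapshot_id

        def write(self, key, value, snapshot_id):
            """Commit only if the caller's snapshot is still the latest
            transaction (coarse: one journal across all keys)."""
            with self._lock:
                if snapshot_id != len(self._journal):
                    return False  # someone committed since our read; retry
                self._journal.append((len(self._journal) + 1, key, value))
                return True

    store = ToyJournalStore()
    state, seen = store.read("cluster-state")
    assert store.write("cluster-state", "task-placed", seen)       # commits
    assert not store.write("cluster-state", "stale-write", seen)   # rejected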
I believe raw eventual consistency has failed as a programming API. I believe CRDTs in their many incarnations provide a great alternative, but i) CRDTs are quite new and ii) they require a more complex API.
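For anyone who hasn't met one: a grow-only counter is about the simplest CRDT there is - a sketch, nothing AWS-specific:

    class GCounter:
        """Grow-only counter CRDT: each replica increments its own slot,
        and merge takes the element-wise max. Merges commute, are
        associative, and idempotent, so replicas converge without
        coordination."""

        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.counts = {}  # replica_id -> count

        def increment(self, n=1):
            self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

        def value(self):
            return sum(self.counts.values())

        def merge(self, other):
            for rid, c in other.counts.items():
                self.counts[rid] = max(self.counts.get(rid, 0), c)

    # Two replicas increment independently, then converge on merge.
    a, b = GCounter("a"), GCounter("b")
    a.increment(3); b.increment(2)
    a.merge(b); b.merge(a)
    assert a.value() == b.value() == 5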
I don't know - S3 and DynamoDB are both eventually consistent, and keeping the CAP theorem in mind, that makes sense. And I for one love SimpleDB - it's just that, simple. And great for prototyping (really cheap) and small production loads. Often you just need a place to stick your data; scalability can be achieved by adding a caching layer.
You can choose eventually consistent or fully consistent reads in DynamoDB. Given that full consistency comes at a higher cost (reading from a quorum of replicas), we expose that cost to you.
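In boto3 terms that choice is literally one flag on the read (table and key names invented):

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Eventually consistent read (the default): cheaper, may lag recent writes.
    item = dynamodb.get_item(
        TableName="my-table",
        Key={"id": {"S": "task-42"}},
    )

    # Strongly consistent read: served from a quorum, and billed at twice
    # the read capacity of an eventually consistent read.
    item = dynamodb.get_item(
        TableName="my-table",
        Key={"id": {"S": "task-42"}},
        ConsistentRead=True,
    )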
BTW nobody wants eventual consistency - it is a fact of life among many trade-offs. I would rather not expose it, but it comes with other advantages...
They have actually been making S3 _more_ consistent over time: in the newer regions you get e.g. read-your-writes for object creation. DynamoDB also supports consistent reads, though it still defaults to eventual consistency.
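i.e. (a boto3 sketch with invented bucket/key names) in those regions this is guaranteed to pass for a brand-new key, while overwrites and deletes remain eventually consistent:

    import boto3

    s3 = boto3.client("s3")

    # Read-your-writes for object *creation*: a GET after a successful PUT
    # of a new key is guaranteed to see it in those regions.
    s3.put_object(Bucket="my-bucket", Key="brand-new-key", Body=b"hello")
    body = s3.get_object(Bucket="my-bucket", Key="brand-new-key")["Body"].read()
    assert body == b"hello"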
In my mind, there's definitely a trend towards consistency here. I'd love to see an AWS blog post about the reasons behind this!
In general, consistent vs. available (or neither - that's possible too, of course) is a trade-off, and you'd want to pick one vs. the other depending on your business case.
I know I'm bound to be proved wrong here, but I think _every_ product after the original set (EC2, S3, SimpleDB) has been consistent or now has a consistency option: the major ones are RDS, EBS, EFS, DynamoDB, Redshift, ElastiCache, Route53, Kinesis, SES.
Some of those APIs are sort of odd, admittedly, and could be hiding eventual consistency under the covers (Route53 in particular springs to mind there!)
Edit: And S3 and SimpleDB now expose more consistency than they did at launch.
A glaring gap currently is security. Per-container IAM roles would go a long way IMHO, but that still leaves "other" secret management, which is a PITA. Other options such as Kubernetes lack the AWS/ELB integration; all seem to be lacking a good security management model.
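One stopgap I could imagine until per-container roles exist (a sketch - the role ARN and names are made up): an agent on the host assumes a narrowly-scoped role per container via STS and injects the temporary credentials.

    import boto3

    sts = boto3.client("sts")

    # Hypothetical: one narrowly-scoped IAM role per container, assumed by
    # an agent on the host. The instance profile then only needs
    # sts:AssumeRole on those roles, not the containers' permissions.
    creds = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/my-container-role",
        RoleSessionName="container-abc123",
        DurationSeconds=3600,
    )["Credentials"]

    # Inject the temporary keys into the container's environment, e.g. via
    # `docker run -e`. They expire, so the agent has to refresh them.
    env = {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }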
I'm not sure that's what I would call an integration... AWS traditionally provides for easy host management and elastic scaling through the integration of the ELB with Auto Scaling groups, and now with lifecycle hooks. I'm not aware that Kubernetes integrates with this stuff in any way or provides a sufficient alternative. Reading through the documentation, I was not able to find information about connection draining on rolling updates, taking hosts out of service for maintenance/scaling/replacement, and so on. I am aware that Kubernetes will run on AWS now and that there is a guide for setting it up.
However, this really wasn't the point of my comment, which is that security for application secrets (and AWS API access) is currently a sore spot. It would be nice if Kubernetes would adopt some of HashiCorp's stuff like Consul, consul-template, and Vault. Maybe that's too far up the container stack, though, and a popular bundling of technologies will appear.