In what way is pushing 20K requests/second a "concurrency fail"?
Node.js doesn't make it easy to share state between processes, but once you're scaling to multiple processes, the jump to multiple machines probably isn't far behind. You'll need to design a distributed algorithm, or just centralize your shared state in something like Redis anyway.
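For illustration, here's a minimal sketch of that "centralize in Redis" approach from Clojure. This is my own example, not from the thread: it uses the carmine client (any Redis client would do), and the connection details and key name are placeholders.

    (ns shared-counter
      (:require [taoensso.carmine :as car]))

    ;; Placeholder connection; assumes Redis on localhost:6379.
    (def conn {:pool {} :spec {:host "127.0.0.1" :port 6379}})

    (defn hit!
      "Atomically increment a counter shared by every process and machine."
      []
      (car/wcar conn (car/incr "requests")))

Because the increment happens inside Redis, any number of processes on any number of machines can call this without coordinating among themselves.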
That's a good argument for a plain web app fronting a DB server. But what if you want to implement a server that keeps a lot of data in memory? What if you're not the user of Cassandra, Redis, Lucene or a specialized analytics engine but its creator?
For request/response scenarios (unlike batch processing) you basically have two options: (a) use a multi-process shared-memory design (or even just the file cache), which is very cumbersome to implement because you cannot use pointers, and hence cannot use any of the data structures in your favorite language's library; or (b) use shared in-process state with powerful concurrency primitives. That's what Clojure helps you do.
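A minimal sketch of option (b), assuming nothing beyond core Clojure: an atom holding a plain map, safely mutated from any number of request-handling threads.

    ;; Shared in-process state: url -> hit count.
    (def hits (atom {}))

    (defn record-hit! [url]
      ;; swap! retries the pure update function until it applies atomically.
      (swap! hits update-in [url] (fnil inc 0)))

    ;; From any thread:
    ;; (record-hit! "/index.html")
    ;; @hits  ;=> {"/index.html" 1}

Because the shared structure is an ordinary immutable map, you keep your whole standard library, which is exactly what the multi-process shared-memory design takes away.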
You cannot avoid choosing between these two architectures by distributing across machines. Distribution is built on top of whatever you do on a single machine.
If you consider the multi-server case, this is a consensus problem. To solve it, you either need an algorithm like Paxos or a central synchronization point like a Redis or memcached server. Yes, Node.js fails compared to Clojure when it comes to taking advantage of multiple cores in this example. Clojure's concurrency primitives are really slick. But that's not the whole story.
Seems like a rare edge case. Also a bad design, IMHO.
Expanding your network IO handling code to use multiple cores sounds like fun, but it's the wrong way to do things.
Network handling code is not CPU-heavy. Have a single process handle the IO, and hand CPU-heavy tasks off to other threads, handled with callbacks.
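As a rough sketch of that pattern (my own illustration; the function names in the usage comment are made up): one thread owns the IO, and CPU-heavy work goes to a fixed-size pool, returning via a callback.

    (import '(java.util.concurrent Executors ExecutorService))

    ;; One worker thread per core for CPU-bound work.
    (def ^ExecutorService cpu-pool
      (Executors/newFixedThreadPool
        (.availableProcessors (Runtime/getRuntime))))

    (defn submit-cpu-work
      "Run heavy-fn on the pool; invoke callback with its result."
      [heavy-fn callback]
      (.submit cpu-pool
               ^Runnable (fn [] (callback (heavy-fn)))))

    ;; In the IO loop (expensive-transform and write-response! are hypothetical):
    ;; (submit-cpu-work #(expensive-transform body)
    ;;                  (fn [result] (write-response! conn result)))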
I also wish we'd stop benchmarking these sorts of things in NNk requests/second. As if most use cases will get anywhere near that in production.
Seems like a generalization. People are interested in evented web programming for applications/services that are less traditional in design - where memcached doesn't buy you anything and NNk requests/second do matter. If you're putting your evented application on an 8-core box, you most certainly want to leverage a good number of those cores.
This is fascinating, but for a lot of us building web apps we have to scale beyond a single box, which means shared state needs to be in something like memcached.
I love Clojure, but its concurrency constructs aren't that relevant to most web programming. Which is fine, but let's keep in mind that developing a web app with Clojure is painful right now due to the lack of libraries and the small size of the community. I'm eagerly awaiting the day that changes.
There are a few usable libraries for building web apps in Clojure (Compojure is an example). These libraries are not as comprehensive as libraries in other languages and the communities are small as you noted. That said, I wouldn't describe the experience as painful. Some of the libraries are pleasant to use.
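For a taste (a minimal sketch in current Compojure and Ring syntax, not code from this thread), a complete routed app is only a few lines:

    (ns hello.core
      (:require [compojure.core :refer [defroutes GET]]
                [ring.adapter.jetty :refer [run-jetty]]))

    (defroutes app
      (GET "/" [] "Hello from Clojure"))

    ;; Start a server on port 3000:
    ;; (run-jetty app {:port 3000})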
I agree with your comment regarding Clojure's concurrency constructs. I am building my web app to run on more than one machine. I have not yet had a need to use any of the concurrency constructs.
I'm still trying to figure out exactly where the Clojure concurrency model is useful. How many problem domains require extremely high throughput in a single shared memory space? It seems to me that you're usually either well within the bounds of what a single CPU can handle, or you're going to need multiple machines and can just keep throwing procs at the problem.
Counter-question: How many people have been so trained by decades of terror with multithreading horrors that they can no longer see when it is the right choice when you have a non-nightmare multithreading capability?
When you encounter a new paradigm like this, it's very important to work in it for at least a little while and give yourself some time to untrain before writing off the usefulness of the paradigm. I work mostly on servers, and the answer to your question is: really quite a lot. (Unfortunately, I'm in the worst place: I untrained myself and am now able to see where this is useful, but not really able to use it at work. Sad.)
This echoes my sentiments rather precisely. On the not infrequent occasion that I am building something which requires massive concurrency, I _also_ need it to be distributed, if only for service reliability. Multi-core might be the (very near) future, but distributed processing is now. Languages that make distributed computing easy (well, easier) are more valuable, to my mind, than those that make multi-core computing simpler. That is why, for me, Clojure is a lovely little toy, but Erlang's a workhorse.
Those are still different issues: threading and concurrency in one memory space versus distributed computing. Clojure is attempting to optimize for the former... it makes it easy to write thread-safe applications.
They are distinct issues, to be sure, but, then again, they are part and parcel. Properly implemented, a language with distributed computing primitives buys you single-address-space concurrency: it's the degenerate case of a cluster. No matter what, a single machine is going to offer less in terms of total computational capacity and fail-over ability. Languages that optimize for single machines, however nice, are something of a tease.
It's great for doing things like k-means clustering or Delaunay triangulation. Those might not come up a lot for you, but some people need to do them, and would love to be able to speed that up by throwing several processor cores at the problem. Solving them on a cluster is a lot harder, and often unnecessary.
(On k-means clustering, in particular, Clojure-style transactional memory gives an almost linear speedup with the number of processor cores, without significantly changing the code. That's worth something.)
Sorry, I can't point to any code, but in both cases the approach is the same: you have some shared data structures, like an array of regions containing points, or a red-black tree, or a priority queue, and several threads all read and write these data structures. Since transactional memory lets you run concurrently unless there are actual memory conflicts, this often gives better concurrency than locks.
Of course, for examples like these, the tricky part is designing data structures that don't have many inherent memory conflicts. Red-black trees, for example, are tricky because the rebalancing transformations tend to step on other threads' toes.
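To make the shape of this concrete, here's a minimal STM sketch of my own (not the poster's code): regions of points held in refs, with threads moving points between them transactionally.

    ;; Eight regions, each a ref holding a set of points.
    (def regions (vec (repeatedly 8 #(ref #{}))))

    (defn move-point!
      "Transactionally move point p between regions; the transaction is
       retried only on a real conflict over the same refs."
      [from to p]
      (dosync
        (alter (regions from) disj p)
        (alter (regions to)   conj p)))

Threads working on disjoint pairs of regions never conflict, which is where the near-linear speedup comes from.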
Has this been done by someone? I remember the same answer (use Clojure + Terracotta) being given months ago when someone in the Clojure Google group asked for Erlang-style distribution support. I was wondering if someone had actually done it, to see how well it works.
I can answer this question for Java + Terracotta. Terracotta works much of its magic through bytecode instrumentation so I expect that the differences between Clojure and Java are minimal.
Pros:
1. Makes it very easy to turn an existing threaded application into a multi-node application, often with no code changes required. Killer feature.
2. Integrates seamlessly with Spring and Ehcache. Perhaps less relevant for Clojure developers, but a big boon for JEE people.
Cons:
1. It uses multicast for replication. This is an implementation detail, not a conceptual drawback, but it's a pain to set up when your machines live in different network segments.
2. Can potentially transmit stupid amounts of data over the wire. I suppose this is part design, part configuration.
If you are going to use Terracotta you really do need to design for it - even if it could in theory wrap a naive design, you'll probably thrash the network.
Some ideas for Terracotta-friendly design I can think of, right off the bat:
1. If this node is the only one using this particular shared data structure right now, it gets to operate locally without network transactions until it's done.
2. If you consistently use a small part of a large data structure (e.g., a hash), you'll only have that small part "paged in" on that one node. The whole data structure can be larger than RAM. Thus, node affinity can be important.
3. Be careful how much work is done inside a synchronized method of a shared data structure. It should be a medium amount: too much and you'll make other users wait; too little and you'll thrash the network. (A sketch follows this list.)
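Here's a rough Clojure sketch of point 3 (my own illustration; the batch size is an arbitrary knob, not a Terracotta setting). Clojure's locking form compiles to the same monitorenter/monitorexit bytecode as Java's synchronized, which is what Terracotta instruments.

    ;; Imagine this list is declared as a Terracotta root.
    (def shared-queue (java.util.LinkedList.))

    (defn push-batch!
      "Append items in medium-sized batches: one lock per batch, instead
       of locking per item (network thrash) or once for the whole stream
       (other nodes wait)."
      [items]
      (doseq [batch (partition-all 64 items)]
        (locking shared-queue
          (doseq [item batch]
            (.add shared-queue item)))))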
Amazon Compute Cluster instances have 23 GB of RAM. If top is to be believed, Node.js by the time it's done is at around 1.6% memory and well above 420% CPU. Aleph is at 9% memory and steady at around 380% CPU the whole time. No surprises, really.
So wait: in his first post, "aleph (~8.5K req/s) edges out Node.js v.0.1.100 (~7.0K req/s)", and that was with one Node.js process on one core. Node.js scales fairly linearly (from what I've seen) as you add processors and processes, so I find it very hard to believe that running an additional Node.js process would still be slower than Aleph. I say testing fail.
I think Amazon EC2 cannot be trusted for these benchmarks.
On the physical hardware I tested, Node scales very linearly with the number of cores added.
On EC2 I have seen very strange results / bad performance. There is something seriously odd here, and I suspect it has to do with Amazon's virtualization.
On the box the author tested with, I would expect Node (or nginx) to easily serve 50K req/sec.
I'll have to do some more research to figure this out.
Different architectures, different OSes, different results. On a 13-inch Core 2 Duo MacBook, Node outpaced Aleph. I posted the code; anyone with the time can confirm.
Please try using an external database / key-value store lookup instead of printing "hello world", and show us memory and CPU usage. ^_^ Especially for Clojure.