So Node would probably outperform the given Clojure example, at roughly 13.5K req/s vs. 8.5K req/s (YMMV).
It is also worth noting that the Clojure library used (Aleph) does not seem to support streaming the response.
The most interesting aspect is Clojure's support for futures, and JavaScript's lack thereof. If you're willing to trade speed for cheap syntactic pleasure, Clojure has an advantage here : ).
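For anyone who hasn't played with them, this is roughly what those futures look like in plain Clojure - a minimal sketch, where the helper functions (load-user, load-friends, render-page) are made-up names:

    ;; future runs its body on a pooled background thread right away;
    ;; deref (@) blocks only at the point where the value is needed.
    (defn handle-request [req]
      (let [user    (future (load-user (:user-id req)))
            friends (future (load-friends (:user-id req)))]
        ;; both lookups run concurrently; we block here for the results
        (render-page @user @friends)))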
Clojure should also scale very well on a real 4-core machine (it's the JVM, after all), and I wouldn't be surprised if it were neck and neck with Node.js. Without numbers it's just talk (I don't have access to a real four-core machine, so I can't say myself).
Also, I note that your code is very verbose for managing multiple cores, while the Clojure version is pretty much identical to the original Node version.
The Clojure example is using a library. There are libraries for Node that make the process handling just as easy.
Anyway, I think you are mistaken if you believe a Clojure HTTP server could outperform its Node counterpart. I mean, Node is really just a set of air-thin JavaScript bindings to some pretty good C code / native system calls. Node's HTTP parser is also written in C.
The problem with benchmarking raw Node.js code is that not a lot of real web apps will look like that. Naked Node.js pushes all the asynchronous complexity onto the application writer - writing everything with callbacks is equivalent to hand-writing a state machine.
A fair comparison on say a 4-core machine would be 4 JVMs running Clojure webservers with some front-end, and 4 node.js instances running one of the library front-ends, with the benchmarked page written with one of the CPS-transforming toolkits (coffeescript, narrative JS, jwacs, etc.) or maybe monad toolkits if/when somebody decides to write one.
Node scales linearly if you add a process per additional core (see my original link). That does allow for a certain extrapolation from a single-process/single-CPU benchmark.
Anyway, you are correct - a real benchmark between Node and Clojure would be needed to prove my argument. But I think reasoning about the underlying machinery is a better use of my time in this case. Bugs aside, there is no reason to believe Clojure could perform faster if it ends up doing more work under the hood. And from what I can tell, there are many more layers of abstraction between Clojure's HTTP implementation and the native system calls than there are for Node.
That's not really a valid test. NodeJS can easily make use of multiple cores with child processes, but by design, it's a very low-level API, so that stuff isn't used by the HTTP server by default. In other words, it doesn't assume you want to use multiple processes for your HTTP server. Maybe you want to use them for something else. (Like, if you have an HTTP server and an IRC server sharing the same box, and you want to make sure they are isolated from one another, or something.)
When leveraging multiple cores, NodeJS is closer to nginx in terms of performance than it is to Jetty or Django. (Of course, nginx uses almost no memory, and NodeJS has this big honkin VM juggling JavaScript contexts, so it's not as space-efficient.)
For stupid web apps that serve content straight from cache, or for servers that are very I/O bound (like anything involving messaging) ... then Node.js may blow Clojure out of the water.
For everything else ... i.e. most startups I see on HN ... you really have to take the CPU-bound or synchronous logic out of Node.js, otherwise it sucks big donkey balls.
People striving for the ultimate development experience have really forgotten what it's like to have brute processing power. Web apps like PlentyOfFish, which had 1.2 billion page views/month in 2009, ran on only 2 load-balanced web servers ... and this considering that on a dating site everything's dynamic, so you really can't rely on much caching.
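Rough arithmetic on those figures: 1.2 billion views over a 30-day month is about 1,200,000,000 / 2,592,000 s ≈ 460 requests/second on average, or roughly 230/s per server; even allowing for peaks several times the average, that's on the order of a thousand dynamic requests per second per box.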
Clojure might turn out to be the best of both worlds ... a productive programming language coupled with the brute CPU-bound performance that the JVM can yield.
At first I misread what you said as "NodeJS can easily make use of multiple cores with child processes, by design". It's true, though. Many NodeJS proponents know this. Google Chrome, which uses the same JavaScript engine, V8, works the same way. The fact that it uses lots of lightweight processes is a big selling point of Chrome.
I'm not completely sure about this, but I remember hearing that SpiderMonkey is more suited to using threads, and that's why CouchDB uses it rather than V8.
Multiple processes are an inefficient way of making use of multiple cores. Either you pay memory overhead to duplicate state, or you pay IPC overhead to get access to shared state of some kind.
In many cases it's more efficient precisely for the reasons you mention. Data duplication removes locking and cache coherence penalties. If you structure your program to communicate by message passing over IPC (for example, via a database) you get those benefits, plus possibly more efficient mutual exclusion mechanisms (CAS, tuned MVCC, etc).
Also, don't forget that GC penalties are going to be paid by all threads in a process.
Multiple threads in a single process don't preclude data duplication in any way; and optimal IPC is always going to be slower than optimal in-process message passing. I wouldn't consider bringing a database into the communication channel to ever approach optimality.
As to GC, that presumes that you are using GC. For a request/response style server application, there are more optimal techniques than GC: make sure all request-associated allocations come out of a per-request heap, and you can free that heap en masse, in one go, once that request is done with.
Assuming you proxy requests to the different processes. That's not what those examples are doing - they simply share a server socket and let the kernel load-balance incoming connections across them.
The main problem both systems have is that they're relatively immature. node.js is still undergoing heavy development, with a few breaking changes every couple of releases (although it's slowing down in that regard). And Aleph basically just came out, right?
So I'd regard both as "things to watch" or bootstrapping solutions, not something I'd use for my Global Thermonuclear War management system.
I wrote Aleph this weekend, so yes, it's very new. However, right now it's just a few hundred lines around Netty, which is anything but immature.
I'm certainly not claiming it's flawless, or that there won't be any breaking changes in the future. However, I think this is a great example of one of Clojure's strengths: even brand new projects tend to have solid, well-tested foundations.
I agree. I had a similar problem with CouchDB a while ago when I tried to write the application in CouchDB. The interface was there, but it was obvious that some of the key features hadn't been implemented yet, requiring painful and complex workarounds. I went back to technologies I knew were tried and true.
If you're interested in playing around with tech and having fun, then these technologies are great. But for business I would avoid them for now.
Threads aren't going to scale to the same number of concurrent clients as an evented system though. Isn't that the whole point of things like node? Serving huge numbers of relatively lightweight requests?
If you're doing thread-per-request, then no. If you've got an event-driven I/O system backed by a thread pool for executing parallel requests, then yes.
I set up a Rack adapter through Jetty 7 on JRuby that works exactly like this. The SelectChannelConnector is event-driven I/O, and the QueuedThreadPool handles requests in parallel.
The result? 6k req/sec on my laptop for a simple Rack app, basically the same as what's shown in these examples, since my machine is quite a bit slower, and I've clocked 10k/sec on faster machines.
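For the curious, the same shape (non-blocking connector in front, bounded worker pool behind) can be wired up directly from Clojure against Jetty 7's Java API - just a sketch, with a trivial placeholder handler:

    ;; The SelectChannelConnector does the accepts/reads with NIO; actual
    ;; request handling is dispatched onto a bounded pool of worker threads.
    (import '(org.eclipse.jetty.server Server)
            '(org.eclipse.jetty.server.nio SelectChannelConnector)
            '(org.eclipse.jetty.server.handler AbstractHandler)
            '(org.eclipse.jetty.util.thread QueuedThreadPool))

    (defn start-server []
      (let [server    (Server.)
            connector (doto (SelectChannelConnector.) (.setPort 8080))
            pool      (QueuedThreadPool. 50)]
        (.setThreadPool server pool)
        (.addConnector server connector)
        (.setHandler server
          (proxy [AbstractHandler] []
            (handle [target base-request request response]
              (.setContentType response "text/plain")
              (.println (.getWriter response) "hello")
              (.setHandled base-request true))))
        (.start server)
        server))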
I was referring to the second example, using threads. Specifically this claim:
In Clojure we don't need to use callbacks. This means for common things like talking to databases, we don't need them to have asynchronous interfaces. That's because we have really fantastic primitives in the language itself for dealing with concurrency. This code runs twice as fast as the Node.js counterpart - probably due to the excellent perf of Clojure coupled with leveraging multiple cores.
You can't tell whether or not something is using "threads" or not just by looking at it. Javascript is a fairly standard Algol language with a modestly unusual object model, and if all you know is other Algol languages you may not realize what is possible. There are numerous languages where you can write the equivalent of:
function do_something():
    socket = wait_for_a_socket()
    data = read_everything_from_socket(socket)
    process_data_long_and_hard_with_lots_of_io(data)
    write_to_disk(data)
    return_result(socket, data)
Where, as my function names imply, further code may be called that does things like talk to databases, or wait for other data, or any amount of other I/O, and you may not have to write a single asynchronous callback. Why? Because unwrapping code into continuations and managing them in the runtime is a trivial compiler transformation when you design your language to work like that from day one. Like Erlang, or, in this case, Clojure.
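As a rough Clojure rendering of the pseudocode above (all the I/O helpers are made-up names), the handler reads top to bottom, and getting it off the main thread is a one-liner:

    ;; Straight-line, blocking-style code; JVM threads do the waiting,
    ;; so no callback plumbing appears in the application code.
    (defn do-something []
      (let [socket (wait-for-a-socket)
            data   (read-everything-from-socket socket)]
        (process-data-long-and-hard-with-lots-of-io data)
        (write-to-disk data)
        (return-result socket data)))

    (future (do-something))   ; run it concurrently with everything else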
The fact that Node.js requires you to manually shatter your code into teensy-weensy little fragments and manually wire it back together is a hack which should not be mistaken for a feature. Javascript requires you to do that, because it's an Algol language and just doesn't work any other way. If that's what you love doing, great, but that's an awful lot of time and effort spent on writing plumbing (and debugging plumbing, and debugging nontrivial asynchronous plumbing in a mutable language isn't in the worst tier of programming tasks in the world, but it's solidly in the second...) that you could have been spending on writing code that actually solves customer problems.
(I am aware of the libraries that pretend to help this. They are a joke compared to working in a language that actually supports this. I can call functions and send messages and read files and read from databases and write functions to abstract all this and I don't spend one second wondering how I'm going to wire all the pieces together at runtime. No amount of code slathered over Javascript can match that, short of an entirely new language that compiles into Javascript. (Which is inevitable. And it will be hailed as a brilliant breakthrough.) All of the libraries I was pointed to last time I brought this up use the exact same obvious hack, which helps with the case of stringing a handful of teensy-weensy fragments of code onto a single string but can't handle anything more.)
Node uses a thread pool also. From Ryan's JSConf slides:
"Blocking (or possibly blocking) system calls are executed in the
thread pool. Signal handlers and thread pool callbacks are marshaled back into
the main thread via a pipe."
The point is that it has a concept of a main thread. You need to get away from that, and rather have a pool of threads ready to dispatch events, not just a pool for blocking calls to avoid the main thread blocking.
Why? Everything that runs in the main thread is Real Work to compute the response. It makes more sense to run multiple node.js processes. With a thread pool for all events, you have a lock in the critical path between accept()ing a connection and handing the fd off to a worker thread, and all worker threads pay the GC penalty. With multiple processes, you have one less lock hot spot, and the processes do their GCs concurrently.
What makes you think that a thread pool dispatching events would use a lock? No modern efficient thread pool implementation dispatching work items is serialized by a single lock, to my knowledge.
What you ought to have is multiple async accept calls outstanding, and as they complete they are entered into work queues. Worker threads (i.e. the thread pool) pulling work off those queues should steal work from other queues when their own queue is empty.
Pulling items off a queue, whether by its associated worker thread or by another worker thread stealing its work, should be lock-free.
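The JVM has a ready-made version of this in java.util.concurrent.ForkJoinPool (JDK 7+), which keeps a work-stealing deque per worker thread; a rough sketch from Clojure, with the connection-handling body as a stand-in:

    ;; Each ForkJoinPool worker owns a deque; idle workers steal from the
    ;; tails of the others, so no single lock serializes dispatch.
    (import '(java.util.concurrent ForkJoinPool Callable))

    (def pool (ForkJoinPool. (.availableProcessors (Runtime/getRuntime))))

    (defn handle-connection [conn]
      ;; stand-in for parsing the request and computing the response
      (str "handled " conn))

    ;; called as each async accept completes
    (defn on-accept [conn]
      (let [^Callable task (fn [] (handle-connection conn))]
        (.submit pool task)))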
As I mentioned elsewhere, GC is not necessarily (or even often) the most efficient approach for request / response style servers. A more optimal approach is a heap associated with the request which can be freed in a single go when the request is done with, with all allocations associated with that request (i.e. that don't need to persist between requests) coming out of that heap.
You can even design a GC around this principle: have one GC heap per worker thread, and collect it after every request has been processed. There should be little or no roots for this heap associated with the worker itself, which should be (very) low down in its call stack after it is done with the request. If you have write barriers for any mutations to inter-request (shared) state, you can trace those to find out which bits of the worker thread's GC heap you need to keep (copy out). Then you can simply zero the GC heap and reset the free pointer. You can make your write barriers smart so that they are associated with that worker's heap, so you don't have to wander all over the shared heap looking for roots.
"What makes you think that a thread pool dispatching events would use a lock? No modern efficient thread pool implementation dispatching work items is serialized by a single lock, to my knowledge."
Are you talking about thread-safe queues implemented using atomic instructions? Those aren't free - how do you think they're implemented at the hardware level? The main advantage of atomic instructions over locks is removing the possibility of waiting on a preempted thread holding the lock (lock-freedom). They also have lower overhead than making system calls to provide locking. But the equivalent mutual exclusion logic (and contention penalties) just get moved down to the chipset level - now instead of waiting on other threads, you're waiting on other cores/CPUs.
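For a concrete picture of what those atomic instructions buy (and cost), Clojure's own atoms are exactly a CAS retry loop; a toy lock-free push looks roughly like this:

    ;; compare-and-set! retries until no other thread has raced us. There is
    ;; no lock, but every contended retry still pays in cache-line traffic.
    (def items (atom '()))

    (defn push! [coll item]
      (loop []
        (let [old @coll]
          (when-not (compare-and-set! coll old (cons item old))
            (recur)))))

    ;; usage: (push! items 42)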
"What you ought to have is multiple async accept calls outstanding, and as they complete they are entered into work queues. Worker threads (i.e. the thread pool) pulling work of those queues should steal work from other queues when their own queue is empty.
See e.g. http://www.bluebytesoftware.com/blog/2008/09/17/BuildingACus...
That article is horrible. Please do not follow the author's advice.
Besides the fact that the code deadlocks (see the first comment on the article), it's also easy to see that if one thread starts generating all the work the "solution" is going to become a single global queue (the "local work stealing queue" of that thread), with all the other threads looping through the global queue and then all each other's queues just to reach there!
The one good thing about that article is that the throughput gains on the toy benchmark (which doesn't deadlock or spawn tasks non-uniformly) nicely illustrate my point about the expense of contention even if using atomic instructions.
What the code in the article attempts to do is alleviate contention by partitioning the tasks among several queues. The problem is that if the work is not distributed uniformly among the queues, some threads will be left idle. The way to overcome that is to fake a global queue by having some way to synchronize the partitioned queues. The optimal solution to how to do this depends not only on the particular system you're running on, but also the pattern of work spawning by the application. And all this depends on being deadlock-free (something not managed by the article)!
Why would you ever go through something so horrible for an HTTP server? Multiple node.js processes are much simpler and more efficient.
I don't think you actually read what I wrote, or if you did, you willfully misunderstood it.
"The problem is that if the work is not distributed uniformly among the queues, some threads will be left idle" - this is why you use work-stealing queues! The very nature of work stealing queues is that the worker threads aren't left idle - they steal work from other threads' queues.
And CGI is not anything like the GC I talked about - if you have a process per request, where are you going to put your shared state?
But much of this discussion is beside the point. Don't forget, the OS scheduler is at its heart an event dispatcher when there are more runnable threads than CPU cores. The thread stack is little different from the context provided to a triggered event. You want to have the same number of runnable threads as CPU cores in order to avoid the kernel cost of a context switch. You can do that by having multiple single-threaded processes, or multiple threads in a single process. While neither choice of partitioning affects the degree to which you can use an eventing style to serve requests, one - the separate-process model - makes it much harder to share state. And therein lies the reason why I believe that optimal performance lies in threads, rather than processes. There are other good reasons for using processes instead - but it will be at some cost to efficiency.
This looks really nice! Our server is based on Node right now but we've had the idea of porting to Erlang later. Maybe Clojure + Netty is another option to explore.
Node uses Google's V8 virtual machine, which they built for Chrome. JavaScript is single-threaded, so yes, unless you spawn a child process it'll be limited to a single core.
Only implicitly: no threading model is documented in the ECMAScript Language Specification [1], and no widely-used interpreter implements threading. The new Web Workers specification adds something like threading to JavaScript in the context of the browser [2], and Node intends to add support for this on the server as well [3]. Have a look at [4] for a nice article on threading in JavaScript.
It uses Google's open source V8 JavaScript engine. (http://code.google.com/p/v8/) It doesn't support multiple cores AFAIK. To use multiple cores, the server would need to spawn subprocesses or run multiple servers behind nginx.
I initially had Javascript off, so I figured it was part of the joke when the space for the shortest Node.js web server was empty. When the shortest Clojure web server was also zero characters, I got suspicious.
Don't forget to add a non-Javascript version for people without Javascript (rare) and bots (common).
Why not? Aesthetics improve readability and maintainability, (probably) trading out for less raw speed. If I'm less concerned about performance than maintainability, why shouldn't aesthetics play a large role in my decision?
Because it can lead you to make arbitrary choices. People usually disagree vehemently on matters of aesthetics. I didn't say it shouldn't play any role, it just shouldn't be an important one. Focus too much on syntax over semantics and you're bikeshedding.
http://gist.github.com/430932
edit: Who down voted this / why?