Not sure about that. It is true that having no space overhead might be a good thing, but the other reasons simply don't look that good.
"algorithms which loop rather than recurse" - something like (primitive example, I know) `f(a, x): f(a+x, x-1); f(a, 0): a` ends in a tail call, so it should be compiled into exactly the loop he's talking about. There is no difference and no overhead: a single `jmp f` at the end, and the arguments should even already be in the right place.
"Sometimes it’s about shuffling, nudging and swapping" - except for the situations where simply inlining/analysing the functions that do the swap lets you eliminate the swap completely and specialise the function to read the values from the swapped places instead. Sometimes you don't need to swap - you just want the values in the right places and don't care how that is done under the hood. A swap done directly with a machine operation in C++ has to do the actual swap.
It was a cheap shot at FP tbh :/ Still right that C++ sometimes gets things right, but for 2 wrong reasons this time.
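To make the two points above a bit more concrete, here is a rough C++ sketch. All function names are made up for illustration. The first pair is the `f(a, x)` example written as a tail call and as the loop it should turn into; the second pair is a swap written at source level (via `std::swap`, not an explicit machine instruction) next to the hand-specialised version that just reads the values from the swapped places.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Tail-recursive form of f(a, x): f(a+x, x-1); f(a, 0): a.
// (Hypothetical name; unsigned types so x counts down to exactly 0.)
std::uint64_t f_rec(std::uint64_t a, std::uint64_t x) {
    if (x == 0) return a;
    return f_rec(a + x, x - 1);  // tail call: no work remains after it
}

// The hand-written loop that computes the same result.
std::uint64_t f_loop(std::uint64_t a, std::uint64_t x) {
    while (x != 0) {
        a += x;
        x -= 1;
    }
    return a;
}

// Swap expressed at source level, followed by a read of both values.
int sub_after_swap(int a, int b) {
    std::swap(a, b);
    return a - b;
}

// The same function specialised by hand: no swap, just read from swapped places.
int sub_specialised(int a, int b) {
    return b - a;
}

// Quick sanity check that each pair agrees.
void check_pairs() {
    assert(f_rec(0, 4) == 10);  // 4 + 3 + 2 + 1
    assert(f_rec(0, 4) == f_loop(0, 4));
    assert(sub_after_swap(7, 2) == -5);  // 2 - 7
    assert(sub_after_swap(7, 2) == sub_specialised(7, 2));
}
```

Whether `f_rec` really ends up without a recursive call depends on the compiler and flags. The C++ standard doesn't guarantee tail call elimination, but optimising builds with GCC or Clang typically do it for a function this simple, and unoptimised builds typically don't.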
Yes. And I guess if you choose a strict language like OCaml, even the space overhead isn't that bad. With a lazy language like Haskell, the unevaluated thunks can eat a lot of memory. But you can always make your Haskell programs strict, if you know enough magic.