The wait_all() call "yields" to the operating system's thread scheduler, which switches to another thread that's ready to run (or, if fibers are used instead of threads, a 'fiber scheduler' switches to a different fiber within the same thread).
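To make the fiber case concrete, here's a minimal Python sketch. It assumes a cooperative, generator-based scheduler, and the names (Scheduler, Task, wait_all, worker) are hypothetical, not any particular library's API. Every task runs on the same OS thread. When a task calls wait_all() and its dependencies aren't finished, it yields back to the scheduler, which resumes another ready task instead of blocking the thread.

```python
from collections import deque


class Task:
    """Hypothetical fiber-like task backed by a Python generator."""

    def __init__(self, name, gen):
        self.name = name
        self.gen = gen
        self.done = False
        self.result = None


class Scheduler:
    """Minimal round-robin 'fiber scheduler': all tasks share one OS thread."""

    def __init__(self):
        self.ready = deque()
        self.trace = []  # order in which tasks were resumed

    def spawn(self, name, gen):
        task = Task(name, gen)
        self.ready.append(task)
        return task

    def run(self):
        while self.ready:
            task = self.ready.popleft()
            self.trace.append(task.name)
            try:
                next(task.gen)  # resume the task until its next yield
            except StopIteration as stop:
                task.done = True  # generator returned: task is finished
                task.result = stop.value
                continue
            self.ready.append(task)  # task yielded: requeue it


def wait_all(tasks):
    """Hypothetical wait_all(): yield to the scheduler until all tasks finish."""
    while not all(t.done for t in tasks):
        yield  # switch to another ready fiber instead of blocking the thread
    return [t.result for t in tasks]


def worker(n, steps):
    total = 0
    for _ in range(steps):
        total += n
        yield  # stand-in for waiting on I/O; lets other fibers run
    return total


def main(sched, results):
    a = sched.spawn("a", worker(1, 2))
    b = sched.spawn("b", worker(10, 3))
    values = yield from wait_all([a, b])
    results.extend(values)


sched = Scheduler()
results = []
sched.spawn("main", main(sched, results))
sched.run()
print(results)  # [2, 30]
```

The trace shows "main" being resumed between the workers: each time it re-checks wait_all(), finds work outstanding, and yields again. A real fiber scheduler would typically park the waiting fiber and wake it when the tasks complete rather than re-polling it, but the context switch itself is the same idea.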
Since an async/await runtime also has to switch to a different "context" whenever a task awaits, the performance difference between the two approaches really shouldn't be all that big, especially if the task scheduler uses fibers (YMMV, of course).