You're confusing a few terms. There's latency (the time to begin an action) and speed (the time to complete it once begun).
Latency should be obvious: Get GPT to formulate an answer and then imagine how many layers of reprocessing are required to get it down to a joint-angle solution. Maybe they are shortcutting with end-to-end networks, but...
That brings us to slowness. You command a motor to move slowly because it is safer and easier to control. Less flexing, less inertia, etc. Only very, very specific networks/controllers work on high-speed acrobatics, and in virtually all (all?) cases that is because the robot is executing a pre-optimized task and just trying to stay on that task despite some real-world perturbations. Small perturbations are fine; sure, all that requires gobs of processing, but you're really just sensing "where is my arm vs where it should be" and mapping that to motor outputs.
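To make the "stay on the pre-optimized task" point concrete, here is a minimal sketch of that kind of tracking loop (not any particular robot's controller; the gains, timestep, and toy single-joint dynamics are all made up): its only job is to measure the error between where the joint is and where the reference trajectory says it should be, and map that error to a motor command.

    import numpy as np

    # Illustrative gains and timestep for a 500 Hz tracking loop (made up).
    KP, KD, DT = 50.0, 5.0, 0.002

    def pd_track(q, qd, q_ref, qd_ref):
        """Map 'where is my arm vs where it should be' to a motor torque."""
        return KP * (q_ref - q) + KD * (qd_ref - qd)

    # Pre-optimized reference trajectory for a single joint,
    # standing in for whatever an offline optimizer produced.
    t = np.arange(0.0, 2.0, DT)
    q_ref = 0.5 * np.sin(np.pi * t)            # desired joint angle
    qd_ref = 0.5 * np.pi * np.cos(np.pi * t)   # desired joint velocity

    q, qd = 0.0, 0.0  # actual state, nudged by the real world
    for k in range(len(t)):
        tau = pd_track(q, qd, q_ref[k], qd_ref[k])
        qdd = tau - 0.2 * np.random.randn()    # crude dynamics + small disturbance
        qd += qdd * DT
        q += qd * DT

Note what this never does: question the trajectory itself. It only fights to stay on it.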
Aside: This is why the Atlas demos are so cool: they tolerate a lot more perturbation than the typical demo.
Where things really slow down is in planning. It's tremendously hard to come up with that desired path for your limbs, and that adds enormous latency. But we're getting much better at this using end-to-end learned trajectories in free space or in static environments.
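For a sense of what "coming up with that desired path" means in the simplest possible case, here's a hedged sketch (free space, static world, made-up joint configurations): a straight line in joint space with a cubic time-scaling so velocity is zero at both ends. Everything hard about planning is exactly what this omits: collisions, joint limits, dynamics, and a world that changes while you compute.

    import numpy as np

    def cubic_joint_plan(q_start, q_goal, duration, dt):
        """Naive free-space plan: straight line in joint space with a
        cubic time-scaling so velocity is zero at both endpoints."""
        q_start, q_goal = np.asarray(q_start, float), np.asarray(q_goal, float)
        t = np.arange(0.0, duration + dt, dt)
        s = 3 * (t / duration) ** 2 - 2 * (t / duration) ** 3   # goes 0 -> 1
        return q_start[None, :] + s[:, None] * (q_goal - q_start)[None, :]

    # e.g. a 6-DoF arm moving between two configurations (radians, illustrative)
    plan = cubic_joint_plan([0, 0, 0, 0, 0, 0],
                            [0.4, -0.8, 1.2, 0.0, 0.6, -0.3],
                            duration=2.0, dt=0.002)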
But don't get me started on reacting and replanning. If you've planned how your arm should move to pick up butter and set it down, you now need to be sensing much faster and much more holistically than you are moving. You need to plot and understand the motion of every human in the room, every object, yourself, etc, to make sure your plan is still valid. Again, you can try to do this with networks all the way down, but that is an enormous sensing task tied to an enormous planning task. So, you go slowly so that your body doesn't change much w.r.t. the environment.
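Here's a hedged sketch of what that monitoring/replanning loop looks like (the perception, planner, and geometry calls are placeholders, not a real robotics API): the point is that the validity check has to run against everything that moves, every cycle, faster than your own arm is moving.

    import time

    def plan_still_valid(plan, tracked_objects, horizon_s=1.0):
        """Placeholder check: predict every tracked thing forward over a short
        horizon and test the remaining plan against those predictions."""
        for obj in tracked_objects:
            predicted = obj.predict(horizon_s)    # hypothetical tracker API
            if plan.intersects(predicted):        # hypothetical swept-volume check
                return False
        return True

    def execute_with_replanning(planner, perception, controller, goal):
        plan = planner.plan(goal)                   # expensive up-front planning
        while not plan.done():
            objects = perception.tracked_objects()  # humans, objects, yourself
            if not plan_still_valid(plan, objects):
                controller.slow_stop()              # the old plan can't be trusted
                plan = planner.plan(goal)           # replan: more latency
                continue
            controller.step(plan)                   # keep following the valid plan
            time.sleep(0.01)                        # ~100 Hz monitoring loop

Moving slowly is effectively a way to cheat on plan_still_valid: if the body barely changes relative to the environment between checks, the old plan stays valid longer.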
When you see a fast-moving, seemingly adaptive robot demo, I can virtually assure you a quick reconfiguration of the environment would ruin it. And especially those martial-arts demos from the Chinese humanoid robots - they would likely do essentially the same thing regardless of where they were in the room or what was going on around them - zero closed loop at the high level, only closed at the "how do I keep doing this same demo" level.
Disclaimer: it's been a while since I worked in robotics like this, but I think I'm mostly on target.
This is basically spot on, though with modern neural networks task performance is now pretty good; it's evaluating the models that is still slow. Forward passes are fast, but the moment you have, e.g., a learned model of the hardware, things get slow again, because computing inverse Jacobians through it is painfully slow.
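To illustrate the inverse-Jacobian point: if the forward model of the hardware is a learned network f(q) -> end-effector pose, going the other way means differentiating through it and pseudo-inverting the result at every control step. A rough sketch with a stand-in "learned" model (finite differences here for clarity; autodiff removes the extra forward passes but not the per-step pseudoinverse):

    import numpy as np

    def learned_fk(q):
        """Stand-in for a learned forward model of the hardware:
        joint angles q (n,) -> end-effector position (3,).
        A real network's forward pass is similarly cheap."""
        return np.array([np.sum(np.cos(q)), np.sum(np.sin(q)), q[0] * q[-1]])

    def numerical_jacobian(f, q, eps=1e-5):
        """Finite-difference Jacobian: one extra forward pass per joint, per step."""
        y0 = f(q)
        J = np.zeros((y0.size, q.size))
        for i in range(q.size):
            dq = np.zeros_like(q)
            dq[i] = eps
            J[:, i] = (f(q + dq) - y0) / eps
        return J

    # Resolved-rate style step: joint velocities that nudge the end effector
    # toward a target, via the pseudoinverse of the learned model's Jacobian.
    q = np.array([0.1, -0.4, 0.7, 0.2])
    target = np.array([2.5, 0.5, 0.1])
    J = numerical_jacobian(learned_fk, q)                 # 3 x n_joints
    qdot = np.linalg.pinv(J) @ (target - learned_fk(q))   # commanded joint velocities

The forward passes are the cheap part; it's extracting the Jacobian and taking the pseudoinverse on every control cycle, for every chain you care about, that eats the budget.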