The problem is not the CPU per se, but bandwidth congestion on the PCIe lanes between the CPU and the drives.
These NVMe drives can talk directly to each other for RAID, which means a much larger total bandwidth is available, and potentially improved latency as well.
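To put rough numbers on that (a sketch; the link widths and the single x16 uplink are my assumptions, e.g. drives sitting behind a PCIe switch):

    # PCIe 4.0 gives roughly 2 GB/s of usable bandwidth per lane
    per_lane_gb_s = 2.0
    drives, lanes_per_drive = 8, 4
    aggregate_drive_bw = drives * lanes_per_drive * per_lane_gb_s   # ~64 GB/s the drives can move in total
    host_uplink_lanes = 16                                          # assumed x16 uplink from the switch to the CPU
    host_uplink_bw = host_uplink_lanes * per_lane_gb_s              # ~32 GB/s bottleneck at the CPU
    print(aggregate_drive_bw, host_uplink_bw)

With peer-to-peer transfers between the drives, parity and rebuild traffic can stay below the switch instead of squeezing through that uplink.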
RDMA means you might not be serving via the CPU at all.
The difference isn't that big, though. Sure, with software RAID, writing 1GB to an 8-disk RAID6 means the host writes 8/6 x 1GB = 1.33GB over its PCIe lanes. But with RAID offload the data crosses the fabric twice: once from the host into the master drive, and again as the master forwards each member drive its share of the stripe, so the total NVMe bandwidth consumed nearly doubles.
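Back-of-the-envelope for that (a sketch; I'm assuming the offload path hands the whole payload to one master drive, which then forwards each member its share of the stripe peer to peer):

    n, parity_disks, payload_gb = 8, 2, 1.0

    # Software RAID6: the host computes parity and writes the full stripes itself.
    sw_fabric_traffic = payload_gb * n / (n - parity_disks)    # 8/6 * 1 GB ~= 1.33 GB

    # Offload: the payload goes to the master drive, which then forwards the other
    # drives their (n-1)/n share of the full stripe (data + parity) peer to peer.
    offload_host = payload_gb                                   # 1 GB over the host lanes
    offload_p2p = sw_fabric_traffic * (n - 1) / n               # ~1.17 GB drive-to-drive
    offload_total = offload_host + offload_p2p                  # ~2.17 GB total on the fabric

    print(sw_fabric_traffic, offload_total)

So the host-side lanes see less traffic (1GB instead of 1.33GB), but the total moved across the fabric is higher.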
I also wonder: if you have 8 NVMe drives and write a stripe to one of them, it does the RAID calculation and sends each disk its share of the stripe. What happens if that master NVMe dies? It's not really RAID if a single disk failure can kill the whole array.
And even with e.g. AMD Matisse, aka desktop Ryzen Zen 2 on AM4, the peer-to-peer traffic turns around in the PCIe root complex instead of consuming Infinity Fabric bandwidth.