Maybe it's a good use for deepfake technology. Send the audio quickly (that works fine on a telephone, so it must be OK), then have the receiver fake up a talking head based on it.
You'd be surprised - audio is very slow. The telephone works because we are trained to compensate and also because there are no visual cues. (Weird thing - the old analog lines had much lower latency than today's cell phones.)
I can recall using the Hack to force international calls over cable instead of satellite (to reduce latency) when I used to do a lot of international liaison for BT.
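For a rough sense of why the cable route helps, here is a back-of-the-envelope sketch of propagation delay only. The geostationary altitude and the speed of light are standard figures; the 6,000 km cable length and the fibre refractive index of 1.47 are assumptions chosen for illustration, and switching, codec, and buffering delays are ignored.

```python
# Propagation delay only: satellite hop vs. undersea cable.
C_KM_PER_S = 299_792.458       # speed of light in vacuum
GEO_ALTITUDE_KM = 35_786       # geostationary orbit altitude above the equator
CABLE_LENGTH_KM = 6_000        # assumed transatlantic-scale cable length
FIBER_INDEX = 1.47             # assumed refractive index of the fibre


def one_way_satellite_ms(altitude_km: float = GEO_ALTITUDE_KM) -> float:
    """Minimum one-way delay: ground -> satellite -> ground, straight up and down."""
    return 2 * altitude_km / C_KM_PER_S * 1000


def one_way_cable_ms(length_km: float = CABLE_LENGTH_KM,
                     index: float = FIBER_INDEX) -> float:
    """One-way delay through fibre, where light travels at c / index."""
    return length_km * index / C_KM_PER_S * 1000


if __name__ == "__main__":
    print(f"satellite: {one_way_satellite_ms():.0f} ms one way")
    print(f"cable:     {one_way_cable_ms():.0f} ms one way")
```

Under these assumptions the satellite hop costs about 239 ms each way against about 29 ms for the cable, so a conversational round trip over satellite is close to half a second before any equipment delay is added.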