Fun stuff. Reminds me of writing code to restore a state snapshot for a sound module with its own processor. It had four byte-wide shared I/O registers in a row. After restoring almost all memory, I put a two-byte infinite-loop branch instruction in the last two bytes and had the sound CPU jump to it. Then I loaded a 1- or 2-byte instruction into the first two bytes, modified the branch offset so this instruction executed as part of the loop, let it run a few times, and changed the offset back so the branch was a single-instruction loop again. I repeated this for each instruction needed to finish loading memory, restore all registers, and finally jump to the execution address with everything restored.
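
A rough Python sketch of the trick, on a made-up toy CPU rather than the real sound chip. The opcodes, the port address 0xF4, the NOP padding for 1-byte instructions, and names like `ToyCPU` and `run_instruction` are all assumptions for illustration. It only covers the "feed one instruction at a time through the loop" part, not the final jump.

```python
# Toy model only: opcodes, addresses, and names are invented for illustration.
NOP, LDA_IMM, STA, BRA = 0x00, 0xA9, 0x85, 0x80

PORTS = 0xF4          # assumed address of the four shared I/O bytes
SLOT = PORTS          # first two bytes: the instruction to run
LOOP = PORTS + 2      # last two bytes: the branch instruction
SELF_LOOP = 0xFE      # offset -2: branch jumps to itself
VIA_SLOT = 0xFC       # offset -4: branch jumps back to the slot


class ToyCPU:
    def __init__(self):
        self.mem = bytearray(256)
        self.a = 0
        self.pc = 0

    def step(self):
        """Execute the single instruction at pc."""
        op = self.mem[self.pc]
        arg = self.mem[(self.pc + 1) & 0xFF]
        if op == NOP:
            self.pc = (self.pc + 1) & 0xFF
        elif op == LDA_IMM:
            self.a = arg
            self.pc = (self.pc + 2) & 0xFF
        elif op == STA:
            self.mem[arg] = self.a
            self.pc = (self.pc + 2) & 0xFF
        elif op == BRA:
            # Signed offset, relative to the address after the branch.
            offset = arg - 0x100 if arg >= 0x80 else arg
            self.pc = (self.pc + 2 + offset) & 0xFF
        else:
            raise ValueError(f"unknown opcode {op:#04x} at {self.pc:#04x}")


def run_instruction(cpu, insn, spins=6):
    """Host side: run one 1- or 2-byte instruction via the loop.

    The instruction must be safe to execute several times, since the
    host cannot tell exactly how often the loop goes around.
    """
    assert len(insn) in (1, 2)
    padded = bytes(insn) + bytes([NOP]) * (2 - len(insn))
    cpu.mem[SLOT:SLOT + 2] = padded     # CPU is parked on the branch, so this is safe
    cpu.mem[LOOP + 1] = VIA_SLOT        # widen the loop to include the slot
    for _ in range(spins):              # stand-in for "let it run a few times"
        cpu.step()
    cpu.mem[LOOP + 1] = SELF_LOOP       # shrink back to the one-instruction loop
    for _ in range(spins):              # wait until the CPU is parked again
        cpu.step()


def restore_demo():
    cpu = ToyCPU()
    # Park the CPU on a branch-to-self in the last two port bytes.
    cpu.mem[LOOP] = BRA
    cpu.mem[LOOP + 1] = SELF_LOOP
    cpu.pc = LOOP
    # Finish "loading memory" two instructions per byte, then restore A.
    for addr, value in [(0x10, 0xDE), (0x11, 0xAD)]:
        run_instruction(cpu, [LDA_IMM, value])
        run_instruction(cpu, [STA, addr])
    run_instruction(cpu, [LDA_IMM, 0x42])
    return cpu


if __name__ == "__main__":
    cpu = restore_demo()
    print(hex(cpu.mem[0x10]), hex(cpu.mem[0x11]), hex(cpu.a), hex(cpu.pc))
```

In the simulation the host steps the CPU itself. On real hardware the two sides run concurrently, which is why only the single offset byte is changed while the loop is live and the slot is rewritten only once the CPU is back in the self-loop.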