The universal problem is "generating code from a formal spec" is extremely, maddeningly, pull-your-hair-outingly hard. A slightly easier thing, as you said, generating test cases, which is what the Mongo people have been advancing: https://arxiv.org/abs/2006.00915
(Another interesting approach in this space is P, which is lower-level than TLA+ but also can compile to C#: https://github.com/p-org/P)
Check out Coyote (which is an evolution of P) and allows you to "model check" .NET code to find hard to find race condition-y bugs in your implementation. We have used it very successfully on a number of projects at Microsoft. I should probably do more public talks on it to get the word out. Finding subtle bugs in your implementation using tools like Coyote is easier than most people realize.
TLA+ and Coyote pair quite well actually. Verifying the high level design in TLA+ while checking subtle bugs as you translate that design to an implemention.
Aren't F* and Dafny kinda also solving this problem of bridging the gap between formal spec and executable code? Another interesting question is if there are gaps between the abstract systems-level specs of TLA and the relatively low-level specs of F* that might need bridging?
(Another interesting approach in this space is P, which is lower-level than TLA+ but also can compile to C#: https://github.com/p-org/P)