If you need recoverability from every state to A or B, then it’s not a 2-state (with intermediates) problem and you do need to go through the complexity of modelling all 6 states.
Also, the full state representation is not the problem, that’s easy enough to do with a reasonable type system. The problem is defining the transitions. But if you need the full transition matrix (or even any substantial subset), then you have a lot of things to do. But it’s things you’d have to do with a non-state machine based approach.
Realistically the only overhead with a sparse state machine representation is adding is the “catch-all” error branch you need to deal with any forbidden transitions. Everything else, you’d need to deal with anyway for normal operation. And you probably want the catch-all branch anyway, for debuggability when it breaks.
Also, the full state representation is not the problem, that’s easy enough to do with a reasonable type system. The problem is defining the transitions. But if you need the full transition matrix (or even any substantial subset), then you have a lot of things to do. But it’s things you’d have to do with a non-state machine based approach.
Realistically the only overhead with a sparse state machine representation is adding is the “catch-all” error branch you need to deal with any forbidden transitions. Everything else, you’d need to deal with anyway for normal operation. And you probably want the catch-all branch anyway, for debuggability when it breaks.