I agree that the images correspond to the same region in object space. Further assumptions on optical resolution don't work well, as the optical resolution depends on the f-number.
The diffraction-limited angular resolution depends on the aperture diameter (and the wavelength), not the f-number. There should be no difference between capturing the image in high resolution and blowing it up for a lower-resolution sensor.
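To put rough numbers on this, here is a minimal sketch assuming the Rayleigh criterion for a circular aperture and green light at 550 nm (both are my assumptions, not something stated above; the function names are just illustrative). It shows that the angular limit is set by the aperture diameter alone, while the size of the blur spot on the sensor scales with the f-number, which may be why the two quantities get mixed up.

```python
def rayleigh_angle_rad(aperture_diameter_m, wavelength_m=550e-9):
    """Diffraction-limited angular resolution (Rayleigh criterion), in radians."""
    return 1.22 * wavelength_m / aperture_diameter_m


def airy_radius_on_sensor_m(focal_length_m, aperture_diameter_m, wavelength_m=550e-9):
    """Radius of the Airy disc on the sensor, in metres (1.22 * wavelength * f-number)."""
    f_number = focal_length_m / aperture_diameter_m
    return 1.22 * wavelength_m * f_number


# Same 5 mm aperture, two hypothetical focal lengths (f/2 and f/10).
aperture = 0.005
for focal_length in (0.010, 0.050):
    angle = rayleigh_angle_rad(aperture)
    spot = airy_radius_on_sensor_m(focal_length, aperture)
    print(f"f/{focal_length / aperture:.0f}: angle = {angle:.3e} rad, "
          f"spot radius = {spot * 1e6:.2f} um")
```

The angle is identical in both cases; only the spot size on the sensor changes, in proportion to the focal length.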
All that should be needed is a 200 MP sensor that can output the entire frame at 12 MP, plus the central 12 MP area at full resolution. It's similar to how our eyes work.
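A rough sketch of that readout idea, assuming a single-channel frame stored as a list of rows and a hypothetical `foveated_readout` helper (no real sensor exposes this function; it only illustrates the two outputs). Note that binning 4x4 cuts the pixel count by 16, so it would map 192 MP to 12 MP exactly; 200 MP to 12 MP is only approximately that ratio.

```python
def foveated_readout(frame, bin_factor):
    """Return (binned full frame, full-resolution central crop of the same size)."""
    height, width = len(frame), len(frame[0])
    if height % bin_factor or width % bin_factor:
        raise ValueError("frame dimensions must be divisible by bin_factor")

    out_h, out_w = height // bin_factor, width // bin_factor
    area = bin_factor * bin_factor

    # Low-resolution view of the whole frame: average each bin_factor x bin_factor block.
    binned = [
        [
            sum(
                frame[r * bin_factor + dr][c * bin_factor + dc]
                for dr in range(bin_factor)
                for dc in range(bin_factor)
            ) / area
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

    # High-resolution view of the centre: same pixel count, unbinned.
    top, left = (height - out_h) // 2, (width - out_w) // 2
    center = [row[left:left + out_w] for row in frame[top:top + out_h]]
    return binned, center


# Toy 8x8 "sensor" with pixel value = row * 8 + column.
toy_frame = [[r * 8 + c for c in range(8)] for r in range(8)]
wide_view, center_view = foveated_readout(toy_frame, 4)
```

Both outputs have the same pixel count, so the downstream pipeline handles a 12 MP-sized image either way; only the field of view differs.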