I think a major reason for this is transfer learning. For computer vision there are many good pretrained models, trained on huge datasets like ImageNet, that can be fine-tuned for custom tasks. Other fields often lack such pretrained models and large datasets, so transforming a dataset into an image dataset and fine-tuning a pretrained vision model often works better than training a model from scratch.