These diffusion models make it possible to create far more than can be communicated with MIDI plus an instrument. Riffusion gave a hint of this: rather than just notes and drum hits plus some processing, the output becomes one big pulsating, expressive mass that would not be reproducible without the granularity of a diffusion model. The results are reminiscent of the serendipity of live recordings with lots of tracks, where interesting things happen from the interplay of many different layers. Generating a few dozen clips would generally give me 2 or 3 with a beautiful emotional passage that really lights up the pleasurable music part of my brain. Mass farming these clips seems like a good route to some amazing music.
Music is hard to describe well without using artist names or references to specific songs. There isn't really an alternative way to describe things: "Airy EDM with tropical feel" doesn't cut it.
This space will belong to scrappy, shadowy decentralised organisations that let you type "give me a filtered french disco song using mizell brothers era johnny hammond jazz funk samples, lil uzi rapping, with a thundercat bassline and crooning".