From what I've seen in the generated tracks so far (this one and others), they're pretty good locally but just ignore the overall composition. For example, any generated blues track will have a vague blues feel but won't keep the 12-bar form. The bluegrass example here doesn't even seem to keep to 4/4 (or is extremely fluid about it...). Maybe one day someone will add higher-level "what's the current section, how far are you into it" inputs to that model to get something better: literally preparing the structure first and then filling it in. That should give much better results with context like "you're playing blues in A with quick change and generating bars 3-4, match the previous bars in style".
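To make the "structure first, then fill in" idea concrete, here's a rough sketch of what that per-bar context could look like for a quick-change 12-bar blues in A. Everything here (`BarContext`, `plan_section`, `conditioning_for`, the prompt format) is made up for illustration; no existing model takes input in this form as far as I know.

```python
from dataclasses import dataclass

# Quick-change 12-bar blues in scale degrees (bar 2 moves to IV).
QUICK_CHANGE = ["I", "IV", "I", "I", "IV", "IV", "I", "I", "V", "IV", "I", "V"]
CHORDS_IN_A = {"I": "A7", "IV": "D7", "V": "E7"}


@dataclass
class BarContext:
    """Hypothetical per-bar conditioning record."""
    bar: int         # 1-based position within the section
    total_bars: int  # length of the section in bars
    chord: str       # chord the bar should sit on
    section: str     # label for the current section


def plan_section(section="blues chorus"):
    """Lay out the whole section before any audio is generated."""
    total = len(QUICK_CHANGE)
    return [
        BarContext(i + 1, total, CHORDS_IN_A[degree], section)
        for i, degree in enumerate(QUICK_CHANGE)
    ]


def conditioning_for(plan, start_bar, end_bar):
    """Describe bars start_bar..end_bar (1-based, inclusive) as a text prompt."""
    bars = plan[start_bar - 1:end_bar]
    chords = " | ".join(b.chord for b in bars)
    first = bars[0]
    return f"{first.section}, bars {start_bar}-{end_bar} of {first.total_bars}, chords: {chords}"


plan = plan_section()
print(conditioning_for(plan, 3, 4))
```

The point is only that the generator never has to discover the form on its own: each chunk it fills in already knows where it sits in the 12 bars and which chord it is on.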
I mean, ChatGPT knows how to plan this out https://chat.openai.com/share/976077c0-138b-4363-8065-3c8eed... Painting in that picture should be much easier than generating something freeflowing. Generating a good structure isn't that hard for most styles, because you can literally take the same pattern and make a few random changes that stay in key. (See lots of pop songs using the same 3-4 chord progression.)
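A quick sketch of the "same pattern plus a few random in-key changes" part, assuming a major key and plain triads. The substitution table is just a handful of common functional swaps I picked (I↔vi, IV→ii, V→vii°), not any kind of standard, and note names are spelled with sharps only to keep it short.

```python
import random

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]            # semitones above the tonic
QUALITIES = ["", "m", "m", "", "", "m", "dim"]  # triad quality per scale degree

# Illustrative in-key swaps, as 0-based scale degrees (an assumption, not a rule).
SUBSTITUTES = {0: [5], 3: [1], 4: [6], 5: [0]}


def diatonic_chords(tonic):
    """Return the seven diatonic triads of a major key (sharp spellings only)."""
    root = NOTES.index(tonic)
    return [NOTES[(root + step) % 12] + q for step, q in zip(MAJOR_STEPS, QUALITIES)]


def vary(pattern, n_changes, rng):
    """Swap up to n_changes chords for in-key substitutes."""
    out = list(pattern)
    candidates = [i for i, degree in enumerate(out) if degree in SUBSTITUTES]
    for i in rng.sample(candidates, min(n_changes, len(candidates))):
        out[i] = rng.choice(SUBSTITUTES[out[i]])
    return out


def render(pattern, tonic):
    """Turn scale degrees into chord names for the given key."""
    chords = diatonic_chords(tonic)
    return [chords[degree] for degree in pattern]


BASE = [0, 4, 5, 3]  # I-V-vi-IV, the usual pop progression
rng = random.Random(0)
print(render(BASE, "C"))
print(render(vary(BASE, 1, rng), "C"))
```

Because every substitute is itself a scale degree of the same key, whatever `vary` returns is still diatonic, so the "fill in" stage can't be handed an out-of-key chord by the planner.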