Have you heard the piano continuations of AudioLM? https://google-research.githu...

phillipharr1s · on Oct 12, 2022

Pretty sure the first continuation is a famous piece with a few notes messed up. Can't remember the name. Honestly it only sounds marginally better than the old markov chain continuations.

macrolocal · on Oct 12, 2022

Yep, Moonlight Sonata (mov. 3) no less. Talk about over-fitting!

vladf · on Oct 12, 2022

Isn’t that as good as it gets? The whole point of the continuations is that given a short leading prompt from a real piece that it should continue it realistically.

It didn’t get to train on the test set, if that’s what you’re implying, and I find it hard to believe the assertion that continuations are copies of the train set (if that’s your claim).

p1esk · on Oct 12, 2022

It definitely copied a piece of Moonlight sonata in the last 7 seconds of the first continuation sample: https://youtu.be/4Tr0otuiQuU?t=516

vladf · on Oct 13, 2022

Wow, good find! They definitely sound similar but it’s not a facsimile. I wonder if this holds for the other samples.

I guess in retrospect we asked it to continue the music in a likely way, not be novel. And it definitely convinced me enough to be impressive. An NN that composes completely fresh music, whatever that means (I’m sure most modern human music has a hefty dose of cross song sampling), would certainly be a good next goal post.

holub008 · on Oct 12, 2022

Interestingly, the original piece is a later Beethoven Sonata, Op. 31 No. 3. The model has its styles down! https://youtu.be/P-Q5aBAw-T4?t=78

bloep · on Oct 12, 2022

Indeed, there is lots of denial or ignorance in this thread (ignorance in the technical sense). AudioLM already produced impressive results and it's a tiny fraction of what is already possible because performance simply improves with scale. One can probably solve music generation today with a ~$1B budget for most purposes like film or game music, or personalized soundtracks. This is not science fiction.

p1esk · on Oct 12, 2022

I don't see a lot of progress in AudioLM compared to results from 2018: https://storage.googleapis.com/magentadata/papers/maestro/in...

What's more interesting and concerning - listen carefully to the first piano continuation example from AudioLM, notice the similarity of the last 7 seconds to Moonlight sonata: https://youtu.be/4Tr0otuiQuU?t=516

I'm afraid we will see a lot of this with music generation models in the near future.

bloep · on Oct 12, 2022

There are quite simple tricks to avoid repetition/copying in NNs, e.g. by (1) training a model to predict the "popularity" of the main model's outputs and penalizing popular/copied productions by backpropping through that model so as to decrease the predicted popularity, or (2) by conditioning on random inputs (LLMs can be prompted with imaginary "ID XXX" prefixes before each example to mitigate repetitions), or (3) by increasing temperature or optimizing for higher entropy. LLM outputs are already extremely diverse and verbatim copying is not a huge issue at all. The point being, all evidence points to this not being a show stopper if you massage these evolutionary methods for long enough in one or more of the various right ways.

p1esk · on Oct 12, 2022

I'm not sure what you mean by "backpropping through that model so as to decrease the predicted popularity". During training, we train a model to literally reproduce famous chunks of music exactly as they are in the training set. We can also learn to predict popularity at the same time, but we can't backpropagate anything that will reduce popularity, because this would directly contradict the main loss objective of exact reproduction.

Having said that, I think the idea of predicting popularity is good - we can use it for filtering already generated chunks during post-training evaluation phase.

I don't think the other two methods you suggest would help here, we want to generate while conditioning on famous pieces, and we don't want to increase temperature if we want to generate conservative, but still high quality pieces.

It's true that we (humans) are less sensitive to plagiarism in the text output, but even for LLMs it is a problem when it tries to generate something highly creative, such as poetry. I personally noticed multiple times a particular beautiful poetry phrases generated by GPT-2 only to google it and find out they were copied verbatim from a human poem.

bloep · on Oct 13, 2022

What I had in mind was kind of like a reward model that is trained by on longer outputs that have a very high similarity to training examples. Something similar has been done to prevent LLMs from using toxic language. You'd simply backprop through that model like in GANs. And no it does not contradict the overall training objective completely because the criterion would be long verbatim copies and it would not affect shorter copies of sound fragments and the like which you would want a music model to produce in order for it to sound realistic and natural.

p1esk · on Oct 13, 2022

Oh OK, so you mean training the model after it has already been trained on the main task, right? Like finetuning. Yes, I think the GAN-like finetuning is a good idea. Though it's less clear where the labels would come from, it seems like some sort of fingerprint would need to be computed for each generated sequence, and this fingerprint would need to be compared against a database of fingerprints for every sequence in the training set. This could be a huge database.

bloep · on Oct 14, 2022

You'd need something Spotify.

Another similar possibility might be to do more RL with this data, e.g. using upside-down RL. One can possibly steer this with user feedback as well.

sdenton4 · on Oct 13, 2022

Meanwhile, the music industry is full of copyright cases brought over matching combinatorial fragments... Humans have the same problem in this case.