
That raises an interesting difference between cleaning AI-generated sound and cleaning ordinary recordings. In an ordinary recording, there is an objective reality to discover -- a certain collection of voices was summed to create a signal. With (most? the best?) existing AI audio generation, the waveform is created from whole cloth, and extracting voices from it is an act of creation, not just discovery.

I've come across AI-generated music that outputs something like MIDI and controls synthesizers. Its audio quality was crystal-clear, but the music was boring. That's not to say the approach is a dead-end, of course -- and indeed, as a musician, the idea of that kind of output is exciting. But getting good data to train something that outputs separate MIDI-ish voices seems much harder than getting raw audio signals.



Generative models can certainly create midi, but no one has done it yet. Given that the same techniques are producing video, audio, images, and language, all you need to do is build and train a model with an appropriate architecture.

It’s easy to forget this is all pretty new stuff and it still costs a lot to make the base models. But the techniques are (more or less) well documented and implementable with open source tools.
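
For a sense of how unexotic the data-prep side is, here's a minimal sketch of flattening MIDI into an event-token stream that any sequence model could train on. It uses pretty_midi; the NOTE_ON/NOTE_OFF/TIME_SHIFT vocabulary is my own illustrative assumption, not any published encoding.

    # Sketch: flatten a MIDI file into a simple event-token sequence.
    # The NOTE_ON/NOTE_OFF/TIME_SHIFT vocabulary is an illustrative
    # assumption, not a standard encoding.
    import pretty_midi

    def midi_to_tokens(path):
        pm = pretty_midi.PrettyMIDI(path)
        events = []
        for inst in pm.instruments:
            for note in inst.notes:
                events.append((note.start, f"NOTE_ON_{note.pitch}"))
                events.append((note.end, f"NOTE_OFF_{note.pitch}"))
        events.sort()
        tokens, prev_t = [], 0.0
        for t, name in events:
            gap_ms = int(round((t - prev_t) * 1000))
            while gap_ms > 0:  # emit time-shift tokens in <= 1 s chunks
                step = min(gap_ms, 1000)
                tokens.append(f"TIME_SHIFT_{step}")
                gap_ms -= step
            tokens.append(name)
            prev_t = t
        return tokens

    # tokens = midi_to_tokens("song.mid")  # then train any sequence model on these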


> Generative models can certainly create midi, but no one has done it yet.

Note-sequence generation from statistical models has a long history, at least as long as text generation, if not longer.

Have a look at section 2.1 of this survey paper [0], which cites a paper from 1957 as the first work applying Markov models to music generation.

And, of course, there has been plenty of follow-up work in the six decades since, using GANs, LSTMs, and transformers.

[0]: https://www.researchgate.net/publication/345915209_A_Compreh...
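
For anyone curious, the core idea behind those early systems fits in a few lines. Here's a toy sketch of a first-order Markov chain over MIDI pitch numbers; the training melody is made up purely for illustration.

    # Toy first-order Markov chain over MIDI pitch numbers.
    # The training melody is an arbitrary made-up example.
    import random
    from collections import defaultdict

    training_melody = [60, 62, 64, 62, 60, 64, 65, 67, 65, 64, 62, 60]

    # Count transitions: pitch -> list of observed next pitches.
    transitions = defaultdict(list)
    for a, b in zip(training_melody, training_melody[1:]):
        transitions[a].append(b)

    def generate(start=60, length=16):
        out = [start]
        for _ in range(length - 1):
            choices = transitions.get(out[-1]) or [start]  # restart on dead ends
            out.append(random.choice(choices))
        return out

    print(generate())  # e.g. [60, 62, 64, 65, 67, 65, ...]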


Yes, in fact I think at some point everyone has written their own Markov generator, or at least run Dissociated Press. But we’ve really only seen meaningfully high-quality output over the last few years.


I think it depends on how you define that. People were quite happy with HMM-based MIDI generators that could produce Beethoven- or Mozart-like sequences 10, maybe even 15 or 20 years ago. But of course other people eventually pointed out that the results became boring. Then LSTMs improved long-term dependencies and people were impressed by the quality of whole generated pieces. But still others thought it wasn't good enough. Then the goalposts moved again with transformers and neural vocoders, and now we want top-40-grade direct audio generation. And these latest systems can kind of, sort of, do it! But still there are people who demand better. And so on; things will continue to improve.

Progress only moves as fast as expectations, and expectations move with technology. Music is not special in this respect. So at any given time in the past, you could say that some people "see meaningfully high-quality" output while others are disappointed. You see exactly these two ends of the spectrum even now with text-to-image and text-to-audio technology.


> cites a paper from 1957

By Fred Brooks no less…

https://en.m.wikipedia.org/wiki/Fred_Brooks


Do you know if anyone has tried training a text-to-music or text-to-midi model where the training data includes things like emotion labels for each note interval or chord progression?


That sounds expensive and inefficient. People's interpretations of music (and abstract art more generally) can be shockingly different; I suspect the model would not get a clear signal from the result.

But that makes me wonder to what extent labeling can be programmed -- extracting chord changes, dynamics changes, tempo, gross timbral characteristics, etc.
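
Some of those are indeed extractable automatically. Here's a rough sketch with librosa; the feature choices, and how you'd map them onto labels, are my own assumptions rather than a validated annotation scheme.

    # Rough sketch: programmatic "labels" from raw audio with librosa.
    import librosa

    y, sr = librosa.load("track.wav")  # hypothetical input file

    # Global tempo estimate (BPM).
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)

    # Chroma: 12 pitch-class energies per frame, a starting point for
    # detecting chord changes.
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)

    # Dynamics: frame-wise RMS energy.
    rms = librosa.feature.rms(y=y)[0]

    # Gross timbre: spectral centroid ("brightness") per frame.
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]

    print("tempo (BPM):", tempo)
    print("RMS range:", rms.min(), "-", rms.max())
    print("mean centroid (Hz):", centroid.mean())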


And maybe even labels like popularity, play count, etc., so it has a better sense of what “sounds good” to certain groups.


It has been done - first by OpenAI (MuseNet, which is no longer available) and later by Stanford (Anticipatory Music Transformer): https://nitter.net/jwthickstun/status/1669726326956371971


I believe Spotify's Basic Pitch [0] is already a step towards building something like this.

[0]: https://basicpitch.spotify.com/about
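
If I remember its README right, the Python API looks roughly like this (from memory, so double-check against the current docs):

    # From memory of the basic-pitch README -- verify against current docs.
    from basic_pitch.inference import predict

    # midi_data should be a pretty_midi.PrettyMIDI object with the transcribed notes.
    model_output, midi_data, note_events = predict("some_recording.wav")
    midi_data.write("some_recording.mid")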


We’ve done it! wavtool.com


That’s really neat. How long have you been working on this?


Thanks! It grew out of an old side project. Been full time on it since December.



