
You joke, but even those videos of people claiming their dog can talk are just like this. It's cute because it's a real dog making sounds we really want to believe are words, when it's just the dog mimicking sounds because it gets pettin's and treats.

What I want is for "AI" to do something impressive. Why are we trying to make the system generate the sounds itself? We don't make artists do that; we give them instruments. Give the models actual instruments, and then have them play those instruments like a real artist. I would be much more impressed with an AI that understands composition and scoring, the use of musical voices, and key signatures. That would still be generative. I guess I just don't understand the point of the direction being taken. It's like a solution looking for a problem.



We work with what we have. We don't have a lot of recordings of the physical movements of musicians; we have recordings of the music they produce.

Similarly, we don't have recordings of the actions of painters; we have finished paintings -- but if you're not impressed with what AI can do in the visual sphere, your standards are, to put it mildly, high.


I'm not really sure how to take this. We absolutely have recordings of instruments. You can buy them as complete sample sets. You train on complete recordings, and then tell the model how to use the sampled instruments to compose a song in the style of the training data. Building something that makes a waveform look like another waveform just seems like a very odd direction to take.

Yes, my standard is: if it isn't at least as good as what's available now, what's the point?


Well, we found with Midjourney et al. that these models can work very well despite having no preconceived or symbolic notions of composition, color theory, perspective, or anything else, and they still produce really good results in the image generation space. It's the same idea here, except it's much earlier days.

In the same way, many successful musicians can't read sheet music or don't know music theory; they just know how to produce something that sounds good.


>In the same way, many successful musicians can't read sheet music or know music theory, they just know how to produce something that sounds good.

Right, because they can operate the instruments that make the sound through natural talent, but they don't have to draw the waveforms. Audio generation is very different from image generation. It's just very odd to me.



