
> But as the various regulatory and judicial and legislative processes grind through different parts of the modern intellectual property issue made so abundantly legible by the modern AI training data gold rush it seems ever more clear that one way or another, we’re going to get a new social contract on IP.

what do you mean by that? as far as I'm aware, for ANYTHING you publish, whether it's on the internet or not, if there isn't a copyright notice you should assume "all rights reserved"



Yes, that's correct. Under the Berne Convention, copyright in a work and any derivatives is held by the author unless they explicitly disclaim it or another legal provision applies (e.g. fair use for teaching or parody).

However, does an LLM count as a derivative work or a transformative one? That's something for the lawyers to answer.


>> derivative work or a transformative one?

This isn't settled even for humans. There are trials that resolve disputes over what counts as fair use. (Every developer here has heard of this one: https://en.m.wikipedia.org/wiki/Google_LLC_v._Oracle_America....)

Artificial intelligence currently has no concept of responsibility (not legal, not ethical), and the law can never pose an existential threat to it. The only workable approach I can think of, as of right now, is that every single product touched by AI must have a human who is legally responsible for it.


This has an easy answer — it’s just not the one that people who desperately want to use LLMs for copyright washing want to hear.

The test for what constitutes a derivative work has not changed; it’s the same whether a single human author produced something, or a team of humans, or an LLM. It will be up to a court to decide whether a work is similar enough to be considered derivative.

If an LLM spits out a verbatim copy, that’s obviously infringement. But if the LLM spits out something similar? Well if the LLM spits out something like George Harrison’s My Sweet Lord [1], a court may well decide that it’s derivative of He’s So Fine. Especially if the LLM “subconsciously” “knew” about He’s So Fine because it was part of the training corpus.

[1] https://en.wikipedia.org/wiki/My_Sweet_Lord#Copyright_infrin...


What are the opinions on [0]? And what's the situation for language models rather than image models?

Also, what are the opinions on making generative AI (that doesn't ask permission from creators) public domain? Should it be a violation of the license if people/groups "creating" with AI receive donations exceeding the cost of hosting the work? And are you allowed to run the models on hardware made by for-profit entities, like Nvidia?

[0] https://arxiv.org/abs/2212.03860


obligatory IANAL, but seeing LLMs:

- regurgitate entire passages word for word, until that behavior is publicized and quickly RLHF'd away

- rip GitHub repos almost entirely (some new Sonnet 3.5 demos Anthropic employees were bragging about on Twitter were basically 1:1 copies of a person's public repo)

It seems clear to me that not only can copyrighted work be retained and returned nearly in its entirety by the architectures that undergird current frontier models, but also that the engineers working on these models will readily mistake a model regurgitating work for it "creating novel work".
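
For anyone who wants to poke at this themselves, here's a minimal sketch of the kind of check I mean: prompt a model with the opening of a known text and measure the longest verbatim run shared between its continuation and the original. The query_model stub is hypothetical (swap in whichever API you actually use); the overlap measurement is just standard-library difflib, and the 50-word threshold is an arbitrary illustration, not a legal standard.

    import difflib

    def longest_verbatim_run(reference: str, output: str) -> str:
        # Longest contiguous character run shared by the reference text and the model output.
        matcher = difflib.SequenceMatcher(None, reference, output, autojunk=False)
        m = matcher.find_longest_match(0, len(reference), 0, len(output))
        return reference[m.a:m.a + m.size]

    def query_model(prompt: str) -> str:
        # Hypothetical stand-in for a real LLM API call.
        raise NotImplementedError

    reference = open("original_passage.txt").read()
    completion = query_model(reference[:200])  # prompt with just the opening of the passage
    run = longest_verbatim_run(reference, completion)
    print(f"longest verbatim overlap: {len(run.split())} words")
    if len(run.split()) > 50:
        print("that looks like regurgitation, not novel work")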



