This may not be an appropriate question, but after searching the comments I couldn't find anyone asking something similar.
If I licensed my blog under the GPL, would that be a decent way of "protecting" myself? I write programming articles for their own sake. I'm not trying to turn them into a book to sell; I'm basically just collecting my thoughts. I don't even run analytics on it, and I have no intention of selling this information at all.
Would that be a good way to prevent companies from monetizing my writing without complying with the GPL?
Their position is that this is fair use, which means your license isn't what allows them to use it, so changing your license won't stop them from being allowed to use it. Fair use is a limited right to use copyrighted works regardless of the creator's license or wishes. It's a powerful right: the right to use copyrighted works even when the creator wants to ban you from using them.
Whether they are right that this is fair use is above my pay grade, but if they're right, then the GPL won't help at all. You can't stop fair use of your copyrighted works; that's not a technicality, it's the whole point of the fair use doctrine. The best you can do is try to show that a particular use isn't covered by fair use.
I suppose, but how is it fair use if they are literally giving you the answer verbatim from the source? I don't mind paraphrasing. What upsets me is someone copying the material and making money off it, even more so when it's a trillion-dollar corporation that could easily afford attribution.
From other comments, it seems like this is far from legally settled.
It may not be fair use at all; I'm not qualified to judge on that point. It's simply the position they are taking in releasing these models. We won't know for sure if they're right until a court rules on it.
If training machine learning models is fair use (not totally settled, but it seems likely), then they aren't bound by the license of the original content.
Setting up a robots.txt file is probably your best option. In the EU, commercial text and data mining must respect a machine-readable opt-out. In the US they could legally ignore it, but I reckon they'll follow it anyway to stay on the safe side and avoid having their crawler blocked for misbehaving.
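As a rough illustration of how the opt-out works, here is a minimal sketch using Python's standard `urllib.robotparser`, which shows how a crawler that honors robots.txt would interpret a file that blocks specific bots. The user-agent tokens `GPTBot` (OpenAI) and `CCBot` (Common Crawl) and the blog URL are examples I picked, not a complete or authoritative list; check each crawler's own documentation for the token it actually uses. Note that robots.txt only works if the crawler chooses to check it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a personal blog. "GPTBot" and "CCBot" are
# example crawler tokens; every other agent ("*") is allowed everywhere.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines instead of fetching it.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    post = "https://blog.example.com/posts/my-article"  # hypothetical URL
    for agent in ("GPTBot", "CCBot", "Googlebot"):
        print(agent, crawler_allowed(ROBOTS_TXT, agent, post))
```

With this file, `GPTBot` and `CCBot` are refused for every path while other crawlers such as `Googlebot` can still index the blog, so the posts stay visible in normal search results.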
This is what some folks are trying to get the courts to decide regarding GitHub Copilot's skirting of OSS licenses: https://githubcopilotlitigation.com
There aren't really any laws or cases yet that apply to language models creating derivative works. Maybe there will be in the future. Currently, in practice, GPT will digest your blog and incorporate it into itself.