Hacker News
matus-pikuliak 3 months ago | on: Claude for Chrome
That is absolutely not a reliable defense. Attackers can break these defenses. Some attack inputs are semantically meaningless, yet they can still nudge the model to produce harmful outputs. I wrote a blog post about this:
https://opensamizdat.com/posts/compromised_llms