
I can somewhat agree with this if we were discussing forum or comment section moderation. However, in this case, because of the nature of the model, which finds correlations between anything and anything else in ways a human never could, modifying or censoring inputs and outputs prevents me from trusting the model the way I should. If I'm digging deep into geopolitical issues from, say, an anarchist perspective, I don't want my results to be tampered with, however likely they are to violate the sensibilities of some archetypal Silicon Valley type. If I can't trust the model in one place, I can't trust it in any.

A sincere question: why should the results be mapped against statistics representing actual violence against certain groups? Why can we not consider all groups equal?



I agree the topic is complex and fraught. I have opinions but they're not strongly held or informed by debate, and unfortunately I don't think even a generally well-moderated forum like HN is the best place to have that debate (plus I don't have time).

However, in this case I wasn't trying to argue how moderation should work; I was trying to examine a hypothesis about how it does work. The additional mapping against statistical crime data that I mentioned would help measure whether that hypothesis is correct.

As I said in a sibling comment, my wording was imprecise. When I said "it makes sense to use crime stats to weight moderation strength," I really meant "if OpenAI were trying to avoid additional harm to already-targeted groups, then to verify that it would make sense to..." So it was the results of the post's research that I suggested should be mapped against statistics, not ChatGPT's output.
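To make "mapped against statistics" concrete, here's a minimal sketch of the kind of check I mean, assuming you had the post's per-group flag rates in hand alongside something like per-group hate-crime counts. All of the group names and numbers below are made up purely for illustration:

    # Hypothetical check: does moderation strength track real-world targeting?
    # flag_rate and incidents are made-up placeholders, not real data.
    from scipy.stats import spearmanr

    groups    = ["group_a", "group_b", "group_c", "group_d"]
    flag_rate = [0.92, 0.75, 0.40, 0.15]   # fraction of test prompts flagged per group (hypothetical)
    incidents = [1500, 900, 300, 100]      # reported incidents targeting each group (hypothetical)

    rho, p = spearmanr(flag_rate, incidents)
    print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
    # A strong positive rho would be consistent with the hypothesis that the
    # filter is weighted toward groups that are more heavily targeted.

Rank correlation rather than Pearson, since incident counts are heavy-tailed; either way it's only a rough consistency check, not proof of intent.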

In answer to your sincere question though, and at the risk of going down a rabbit hole, I'll say this. Censorship is bad, but I can see why some may be required (see "yelling fire in a movie theater," libel, direct threats of violence, etc.). The question then becomes, how do you minimize censorship while also attempting to avoid direct harm?

In short, asymmetric filtering could potentially limit the amount of censorship by focusing it on groups that are actively attacking other groups.


IMHO, neural-net AIs are sensors; they're great at surfacing things that are going on in the training data. I would speculate that ChatGPT is labeling these things as differently harmful because they are differently harmful. It's a sensor, like a thermometer, albeit a very complex one. (And, like any sensor, it has nuances from implementation and metric definition: mercury thermometers and barometric pressure, wet bulb vs. dry bulb, etc.)

> Why can we not consider all groups equal?

Because they aren't. (If that's a thing you find debatable, LMK!) Attempting to consider them all equal runs into problems, just like you'd run into problems trying to consider all pumps at the gas station (including diesel) equal, even if, for the sake of the metaphor, all the prices were equal.

> why should the results be mapped against statistics

I'm not sure that's answerable outside of a specific context. Personally, I think we should because it's fascinating and, I would expect, a really informative way to explore the question in more detail. Culturally, it's because you get better communities when you do things like give the high-accessibility seat on the train to the person with a broken leg; i.e., go out of your way to treat harmed people with more care.

If your question is about the inverse, something like "why should language that's harmful against one group be OK when used against a group that doesn't experience it harmfully," I dunno what to say. I feel like the question kinda answers itself; sort of an "ain't broke, don't fix it."

Or maybe your question is more "why does this disagree with [me] about what's harmful to [me]?" I dunno. Maybe the training data didn't include enough for the AI-as-sensor to detect that, maybe that data doesn't exist, maybe (very cynically) that data doesn't actually point to that "conclusion" when run through the AI. Kinda like how, the first time many people experience delayed-onset muscle soreness, they think it's "I'm hurt" pain.


This makes sense, but how do you square it with the fact that this article is about how it's simply another model making these decisions about what's an acceptable input? Like, if it's a matter of trust, it's in principle the same kind of trust you're putting into the model to begin with.

Given that, I don't get how I could ever "trust" one of these things to tell me any kind of truth. Even if it's totally "uncensored," it's just reading webpages, and at least with webpages I can see some citation. Even Wikipedia can really just tell me something that I'll have to verify elsewhere.

How could one trust something that gives different answers to the same question if you ask it enough?



