A similar but distinct result showcases the contrast with what the models guardrail. HN's safety and alignment teams (the community) will reliably flag-kill any reference to Somali healthcare fraud in Minnesota. The fraud is real, and prosecutions were pursued by the DoJ under federal administrations of both parties, but prevailing safety norms make it undiscussable, even in contexts where it is highly relevant, like "why is autism skyrocketing in the US?"
The models, however, will consider the topic where humans will not. This is likely because this aspect of human safety and alignment is not transmitted via text tokenization: rather than being objected to in writing, the text is silently killed in most contexts. Consequently, models find it possible to discuss what humans won't.
If most such text were accompanied by human excoriation of the view, it would likely be detected as harmful.
> will reliably flag-kill any reference to Somali healthcare fraud in Minnesota
Almost certainly because of how these tend to get framed.
The Minnesota situation involves, at this point, a couple dozen bad actors being charged. Most of them are Somali.
Now, we can look at this in more than one way, but mostly branching off from two distinct paths:
One - that there is some specific relationship between many of these people that resulted in them sharing information with each other and becoming involved. The people committing the fraud met each other in the same community, so that is the proximal cause of their relationship, but we pass no value judgment on the community as a whole and don't try to extrapolate beyond that, the same way we would not extrapolate the actions of the mafia to every Italian person in the country.
Two - we could frame it as some sort of immigration issue and make it seem as though these actions reflect on the 80,000 other Somali people in the state and on the broader immigration conversation in this country, superimposing the crimes of a few onto a much larger group, the vast majority of whom had nothing to do with any of this.
The first framing allows for reasonable discussion without getting politically charged. The second incites quite a lot of discord because it is fundamentally a bad-faith argument, meant to bolster a political ideology.
The community is working as intended. Your premise reveals the flaw in your reasoning: "Somali healthcare fraud in Minnesota," when the story is actually about Medicaid providers taking advantage of a vulnerable community.
> The sprawling case has also become politically and culturally fraught, as Somali Americans make up 82 of the 92 defendants charged so far, according to the U.S. Attorney’s Office for Minnesota.