
I haven't read the full study, but it's been on my mind for a while.

https://en.wikipedia.org/wiki/Stylometry

The best course of action to combat this correlation/profiling seems to be using a local LLM that rewrites the text while keeping the meaning untouched.

Ideally built into a browser like Firefox/Brave.
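A minimal sketch of what that could look like: the extension builds a rewrite request for a locally running model server (an Ollama-style `/api/generate` endpoint is assumed here; the model name and prompt wording are placeholders, not anything from the study).

```python
import json

# Placeholder prompt; a real tool would tune this carefully.
REWRITE_PROMPT = (
    "Rewrite the following comment so the meaning is unchanged but the "
    "wording, rhythm, and punctuation habits are different:\n\n{}"
)

def build_rewrite_request(comment: str, model: str = "llama3") -> bytes:
    """Build the JSON body a browser extension might POST to a local
    model server (e.g. http://localhost:11434/api/generate)."""
    payload = {
        "model": model,
        "prompt": REWRITE_PROMPT.format(comment),
        "stream": False,  # want one complete rewrite, not a token stream
    }
    return json.dumps(payload).encode("utf-8")
```

The actual HTTP call is left out on purpose; the point is that nothing leaves the machine.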



We don't use (much) stylometry, so this won't help. It's totally something you could try, but we use interests and clues: semantic information you reveal about yourself.

The blog post might be more approachable if you want to get a quick take: https://simonlermen.substack.com/p/large-scale-online-deanon...


Thanks for providing the details; I've just been lazy about reading the paper :))

I'm not a fan of your proposed changes, as they further lock down platforms.

I'd like to see better tools for users to engage with. Maybe if someone is in a Firefox anonymous (or private tab) profile, they should be warned when writing about locations, jobs, politics, etc. Even there a small local LLM would be useful: not foolproof, but an extra layer of checks. Paired with protection against stylometry :D
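As a rough illustration of the "warn before posting" idea, here is a keyword screen. The topic names and regex patterns are invented for this sketch; a small local model would do this far less crudely.

```python
import re

# Hypothetical screen: topics mapped to telltale phrases.
SENSITIVE = {
    "location": [r"\bI live in\b", r"\bmy city\b"],
    "job":      [r"\bI work at\b", r"\bmy employer\b", r"\bmy boss\b"],
    "politics": [r"\bI voted\b", r"\bmy party\b"],
}

def warn(draft: str) -> list[str]:
    """Return the list of sensitive topics a draft comment touches."""
    hits = []
    for topic, patterns in SENSITIVE.items():
        if any(re.search(p, draft, re.IGNORECASE) for p in patterns):
            hits.append(topic)
    return hits
```

A private-tab composer could surface `warn(draft)` as a nudge before the comment is submitted.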


Mitigations are pretty difficult. I understand it is kind of cool that some websites have really open APIs where you can just read everything, and there are some cool apps that used HN data in the past. But there should at least be some consideration that LLMs are then going to read everything and potentially discover things. Users might have thought this was protected by obscurity; who would read their 5-year-old comments?


How much would injecting noise and red herrings into pseudonymous posts help?

It seems like it would make sense to get in the habit of distorting your posts a bit: make random gender swaps (e.g. s/my husband/my wife), drop hints that indicate the wrong city (s/I met my friend at Blue Bottle coffee/I met my friend at Coffee Bean), maybe even use an LLM to fire off posts indicating false interests (e.g. some total crypto bro thing).
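Those s/old/new/ swaps could be sketched as a substitution table. Every entry below is an invented decoy for illustration only.

```python
import re

# Hypothetical decoy table: telltale detail -> plausible false lead.
DECOYS = {
    r"\bmy husband\b":  "my wife",
    r"\bBlue Bottle\b": "Coffee Bean",
    r"\bSeattle\b":     "Denver",
}

def distort(text: str) -> str:
    """Apply simple s/old/new/ swaps to muddy biographical details."""
    for pattern, decoy in DECOYS.items():
        text = re.sub(pattern, decoy, text, flags=re.IGNORECASE)
    return text
```

Of course, a fixed table is itself a fingerprint if reused verbatim; the decoys would need to vary per persona.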


This is probably a good use case for something like OpenClaw. Have it take over your accounts and inject a bunch of non-offensive noise using a variety of personas to pollute their analysis. Meanwhile, you take your real thoughts and opinions underground.


I don't think this is working any more, but there was a stylometric analysis of HN users a few years ago, and it was extremely effective (at least for myself and people who felt the need to post in the comments): https://news.ycombinator.com/item?id=33755016


There is also a practical issue here: people usually don't write a lot on LinkedIn; most profiles just have structured biographical information. We use very limited stylometry in Section 6 for matching Reddit users whom we synthetically split according to time.


L33tsp34k also accomplishes this. The original anonymizing hacker stylometry :)

I am intrigued by the idea that in the future, communities might create a merged brand voice that their members choose to speak in via LLMs, to protect individual anonymity.

Maybe only your close friends hear your real voice?

Speaking of which, here's a speculative fiction contest: https://www.protopianprize.com/

Disclaimer: I am an independent researcher with Metagov (one host org), and have been helping them think through some related events.

EDIT: I've belatedly realized that stylometry isn't involved, but I think some of the above "what if" thought could still hold :)


You're absolutely right. It's not just a matter of what you post-- it's a matter of how you post


Was this written by a human?

Sometimes you can just tell something's off. No exclamation mark, double dash instead of an emdash. Human-slop on my HN? This place is becoming more and more like Reddit, I swear!


> The best course of action to combat this correlation/profiling seems to be using a local LLM that rewrites the text while keeping the meaning untouched.

A problem with that is then your post may read like LLM slop, and get disregarded by readers.

Another reason why LLMs are destruction machines.


[flagged]


I don't really understand the argument you're proposing.

Is it impression in a stylistic sense (flourishes in the language used), which is what I'm proposing the LLM for?

Or is it impression in the subjective sense of what an author would instill through their message: feelings, imagery, and such?

Or the impression given to the reader? "This person gives me the impression that they know what they're talking about", or "don't know what they're talking about"?

I don't know which argument you're proposing, but I'd like to make an observation about LLM usage. I don't know which model the Perplexity response is based on, but some of them are "eager to please" by default in conversation ("you're absolutely right" and all the other memes). If you "preload" it with a contrarian approach ("make a brutally honest critique of this comment in reply to this other comment"), it will gladly do a 180: https://chatgpt.com/s/t_699f3b13826c8191b701d0cc84923e71


> There are no two ways of expressing something in ways that might create equal impressions.

> Relevant: https://www.perplexity.ai/search/hey-hey-someone-on-hn-wrote...

Did you just use an LLM to write your comment and are citing it as a source?


The link doesn't work; it says the thread is private.


The link is private.



