Hacker News
Detecting voting rings using HyperLogLog counters (2013) (opensourceconnections.com)
57 points by bemmu on Aug 24, 2016 | 12 comments


>Each post gets 2 HLL counters, one positive votes and one negative. When a user upvotes or downvotes a post, their id is submitted to the corresponding HLL counter.

Third, you wouldn't be able to undo your vote.
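
A minimal Python sketch of the scheme in the quote, assuming a simplified HyperLogLog with 1,024 registers and SHA-1 hashing. The names `HyperLogLog` and `PostVotes` are hypothetical and not taken from the article. It also illustrates the undo problem: each register only keeps a running maximum, so there is no operation that removes an id once it has been added.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog; an illustrative sketch, not a tuned implementation."""

    def __init__(self, p=10):
        self.p = p                      # number of index bits (assumed default)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        idx = h >> (64 - self.p)                       # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the first 1-bit
        # Registers only ever grow, which is why a vote cannot be "un-added".
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)          # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


class PostVotes:
    """Two counters per post: one for upvoters, one for downvoters."""

    def __init__(self):
        self.up = HyperLogLog()
        self.down = HyperLogLog()

    def vote(self, user_id, up=True):
        (self.up if up else self.down).add(user_id)
```

Adding the same `user_id` twice leaves the estimate unchanged, which is the property the article relies on for counting unique voters.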


It shouldn't be an issue given that HLL is approximate and that relatively few votes are undone. You could also debounce the vote before recording it in the counter, at the cost of a small delay.
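
One way the debounce idea could look, as a rough Python sketch. The class `DebouncedVotes`, its 30-second default delay, and the explicit `now` arguments are all assumptions made for illustration; `counter` is any object with an `add` method, such as an HLL counter.

```python
class DebouncedVotes:
    """Hold votes for a grace period so an undo never reaches the counter."""

    def __init__(self, counter, delay=30.0):
        self.counter = counter   # anything with .add(user_id)
        self.delay = delay       # grace period in seconds (assumed value)
        self.pending = {}        # user_id -> time the vote was cast

    def vote(self, user_id, now):
        # Keep the earliest timestamp if the user clicks repeatedly.
        self.pending.setdefault(user_id, now)

    def undo(self, user_id):
        # Returns True only if the vote was still pending and got dropped.
        return self.pending.pop(user_id, None) is not None

    def flush(self, now):
        # Commit every vote that has outlived the grace period.
        ready = [u for u, t in self.pending.items() if now - t >= self.delay]
        for user_id in ready:
            self.counter.add(user_id)
            del self.pending[user_id]
        return len(ready)
```

An undo that arrives after the flush still cannot be reverted, so this only narrows the window rather than closing it.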


Nice idea, but it doesn't matter if this doesn't work in practice.


I don't think this would work for small-ish subs, would it? Most upvotes one gets might actually come from the same group of people...


Yeah. This seems as likely to detect cohorts of similarly interested groups of people as it would a voting ring.

This might be useful though, in that things which trend towards honest are actually trending toward general interest.


Maybe you could compare the user's ratio to the median ratio for all users who have posted in that particular sub, or something along those lines.
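
A rough Python sketch of that comparison. It assumes the "ratio" is lifetime unique voters divided by typical per-post unique voters, so that lower means more suspicious; the function name `flag_outliers` and the `fraction` cutoff of 0.5 are hypothetical choices, not values from the article.

```python
from statistics import median


def flag_outliers(ratios, fraction=0.5):
    """ratios: dict mapping user -> ratio, for users who posted in one sub.

    Flags users whose ratio falls below `fraction` of the sub's median,
    so a sub where everyone has a low ratio flags nobody.
    """
    if not ratios:
        return []
    cutoff = fraction * median(ratios.values())
    return sorted(user for user, r in ratios.items() if r < cutoff)
```

For example, `flag_outliers({"a": 1.1, "b": 4.0, "c": 5.0, "d": 6.0})` flags only `"a"`, while a small sub where every ratio sits near 1.2 produces no flags.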


Ya. I would be curious about the false positive rate for this. For a website with millions of members and few voting rings, you would need the false positive rate to be really, really low for this to work.
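
To put numbers on that concern, here is a small base-rate calculation in Python. All figures in the usage note (1,000,000 users, 100 ring members, the false positive rates) are made-up assumptions for illustration, and the function name `precision` is hypothetical.

```python
def precision(users, ring_members, fpr, tpr=1.0):
    """Fraction of flagged users who really are ring members.

    fpr: false positive rate among honest users.
    tpr: true positive rate among ring members (assumed perfect by default).
    """
    true_pos = ring_members * tpr
    false_pos = (users - ring_members) * fpr
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else 0.0
```

With 1,000,000 users, 100 ring members and a 1% false positive rate, about 10,000 honest users get flagged alongside the 100 real ones, so fewer than 1 in 100 flags is correct. The false positive rate has to drop to roughly 0.01% before half of the flags are real.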


Ye gods, what a waste of time. This is what you get when you let reddit brogrammers run away with their own pedestrian imaginations.

  To keep a count of uniques, you have 
  to store every IP address that you 
  ever see. And upon receiving a new IP 
  address, you have to first check that 
  the new IP address has not been run 
  across before, and only then do you 
  increment the site counter. Under the 
  best of situations, the storage and 
  the computation probably scale 
  as O(log(n)). 
Okay, guy. Scamper off back to your SEO click-bait advertising gif banner day job, and don't work on anything important.

The seas are boiling the foundations of the ecosystem in a stew of toxic waste and plastic. Species are going extinct. Governments are waging wars with robots, while starving countries are crushed, bought and sold wholesale, and spy satellites are enumerating all of us, as we walk to seven eleven for another pack of smokes, and here's an article about tracking unique users by... checking their... IP addresses...

What is this? 1998?


It looks like you stopped reading after the second paragraph; this caused you to completely mischaracterize the OP.

In particular, the paragraph you quoted is the second in the OP; in it, the author is summarising the conventional technique, not the one they propose.

Of course, this is a common device, i.e., "here's the conventional way to do X; here's what we propose instead."

In other words, to provide the appropriate context, the paragraph you copied above should be prefaced with "here's the conventional way to do X."

In fact, the next paragraph begins like this: "With the HyperLogLog counter it’s all different"
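
For contrast, the conventional technique from the quoted paragraph can be sketched in a few lines of Python. The class name `ExactUniqueCounter` is hypothetical, and a hash set is only one possible way to store the addresses.

```python
class ExactUniqueCounter:
    """Exact unique-visitor count: remembers every address ever seen."""

    def __init__(self):
        self.seen = set()   # grows with the number of distinct addresses
        self.count = 0

    def add(self, ip):
        # Check first, and only increment for an address not seen before.
        if ip not in self.seen:
            self.seen.add(ip)
            self.count += 1
        return self.count
```

The count is exact, but the memory used grows with every new address, whereas a HyperLogLog counter keeps a fixed number of small registers regardless of how many ids it has seen.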


This is all just an over-engineered way of saying:

"If a person's lifetime tally of all unique fans (say: 150 different people, cumulative) consistently matches the typical number of unique fans per thread/article (say: ~135 on any particular post) then those fans are probably employees with vested interests"

(in other words, it's the same people upvoting that guy every time, and they're all in a gang)

You don't need big O notation and hash tables to convey that idea.

See: https://en.wikipedia.org/wiki/HyperLogLog

And: https://en.wikipedia.org/wiki/Cardinality
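
The heuristic in the quoted sentence above, written out as a Python sketch. It uses exact sets instead of HLL counters to keep the arithmetic visible; the function name `ring_ratio` is hypothetical.

```python
def ring_ratio(posts):
    """posts: non-empty list of sets, each holding the voter ids for one post.

    Returns lifetime unique voters divided by mean unique voters per post.
    A value close to 1 means the same people vote on every post.
    """
    if not posts:
        raise ValueError("need at least one post")
    lifetime = set().union(*posts)
    mean_per_post = sum(len(p) for p in posts) / len(posts)
    if mean_per_post == 0:
        raise ValueError("no votes recorded")
    return len(lifetime) / mean_per_post
```

With 150 lifetime fans and about 135 fans per post, the ratio is 150 / 135 ≈ 1.11. An author whose 20 posts each attract 135 entirely different voters would score 20.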


There is no info on the accuracy of the HyperLogLog algorithm relative to the data set size. These algorithms are for big data applications.

But I would test it if I were working at Reddit or at Hacker News.


Play with the JavaScript implementation that the site links to. Even after trying it a couple of times, the error bounds swing up to 20%.
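
For reference, the usual rule of thumb is that HyperLogLog's relative standard error is about 1.04 / sqrt(m) for m registers, independent of the data set size once the count is large. The Python sketch below just evaluates that formula; the function names are hypothetical, and how many registers the linked JavaScript demo uses is not stated here, so treat the mapping to its 20% swings as a guess.

```python
import math


def hll_relative_error(registers):
    """Approximate relative standard error for a given register count."""
    return 1.04 / math.sqrt(registers)


def registers_for_error(target):
    """Smallest power-of-two register count whose error is at most `target`."""
    needed = (1.04 / target) ** 2
    return 1 << math.ceil(math.log2(needed))
```

By this formula 16 registers give roughly 26% error, 1,024 give about 3.3%, and 16,384 give about 0.8%. Swings near 20% would be consistent with a few dozen registers, and small counts can behave differently because implementations apply separate corrections there.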



