Hacker News | gweinberg's comments

How is "more than 100" people "rallying" even remotely newsworthy? What's the threshold, three?

If you read the article instead of just criticizing the headline:

> They listened to Michigan Attorney General Dana Nessel criticizing the lack of transparency with DTE, the utility that's associated with the Saline Township proposal, and legislators who protested tax breaks for data center projects.

> ...

> "We're talking about 1.4 gigawatts, which is, of course, enough to provide energy to a city of a million people," Nessel said. "I think we should be taking this extremely seriously, don't you? Do you guys trust DTE? Do you trust Open AI? Do we trust Oracle to look out for our best interests here in Michigan?"

this wasn't just a random group of 100 people, they were organized enough to get the state AG as well as multiple state legislators to speak. seems fairly newsworthy to me.


In Lansing, it was below freezing and windy most of the day. If I noticed 100 people standing around on the pavement for hours in that, I'd probably imagine they deserved at least some regard for their concerns. But then, I'm not a Michigan politician that needs to get gamer Johnny out of my basement and on to a cushy non-profit no-show kickback job, courtesy of whatever big tech outfit wants a data center.

Three people could be a group of friends. More than 100 is clearly different.

Given that there are usually _zero_ people rallying in Lansing, this is notable enough for the local newspaper.


It’s not just this group. A co-worker of mine went to his town meeting about a proposed data center. When he showed up it was standing room only and they had to move the meeting to a bigger venue. I’ve heard stories like this from a few people now around Michigan where they have been trying to put data centers. No one wants them.

There are dozens of us!

It’s a movement at 50[1].

[1] A. Guthrie, 1967


It would be noteworthy if 100 people showed up to my 5 year old's piano recital.

Not so much for a 300-acre, noisy, water-hogging data center.


There is very little common space in Michigan. There is a lot of private land, and a lot of public land, but very few spaces where people congregate. So when they do, it stands out quite a bit.

> What's the threshold, three?

The threshold is an organization organizing it. Getting 100 people out demonstrates your political power to your supporters and the people you seek to influence. Getting 1,000 people demonstrates that you have more of it.


Since the population is around 112k-114k people, that means around 111,900 people didn’t rally, on the low end.

Lol, I prefer that version of the headline:

"99.9% of residents did not show up to protest new datacenters in Michigan"


You could use that same statistic for literally every protest ever; it doesn't mean the causes aren't worthwhile.

It wouldn't be newsworthy if we could trust our representatives not to give extra weight to the opinions of the people who yell the loudest.

I don't understand why anyone thinks self-reported happiness scores mean anything at all. I don't see how they possibly could. If someone says he's a 10 on his personal scale I guess that means he can't imagine being much happier, but I don't see how that means he's particularly happy.

I read the page on Lindley's paradox, and it's astonishing bullshit. It's well known that with sufficiently insane priors you can come up with stupid conclusions. The page asserts that a Bayesian would accept as reasonable priors that it's equally likely that the probability of a child being born male is precisely 0.5 as it is that it has some other value, and also that if it has some other value, all values in the interval from zero to one are equally likely. But nobody on God's green earth would accept those as reasonable values, least of all a Bayesian. A Bayesian would say there's zero chance of it being precisely 0.5, but it is almost certainly really close to 0.5, just like a normal human being would.

A few points because I actually think Lindley’s paradox is really important and underappreciated.

(1) You can get the same effect with a prior distribution concentrated around a point instead of a point prior. The null hypothesis prior being a point prior is not what causes Lindley’s paradox.

(2) Point priors aren’t intrinsically nonsensical. I suspect that you might accept a point prior for an ESP effect, for example (maybe not—I know one prominent statistician who believes ESP is real).

(3) The prior probability assigned to each of the two models also doesn’t really matter, Lindley’s paradox arises from the marginal likelihoods (which depend on the priors for parameters within each model but not the prior probability of each model).
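To make the marginal-likelihood point concrete, here is a minimal pure-Python sketch using the figures from Lindley's 1957 example (49,581 boys in 98,451 births); the variable names are my own. It shows the p-value and the Bayes factor pulling in opposite directions on the same data:

```python
import math

# Lindley's 1957 example: 49,581 boys observed in 98,451 births.
x, n = 49_581, 98_451

# Exact log of the Binomial(n, 0.5) pmf at x, via log-gamma to avoid overflow.
log_m0 = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
          + n * math.log(0.5))
m0 = math.exp(log_m0)        # marginal likelihood under H0: theta = 0.5

# Under H1 with a flat Uniform(0,1) prior on theta, the Beta integral
# of C(n,x) * theta^x * (1-theta)^(n-x) over [0,1] is exactly 1/(n+1).
m1 = 1.0 / (n + 1)

bf01 = m0 / m1               # Bayes factor in favor of H0

# Two-sided p-value via the normal approximation with continuity correction.
z = (x - 0.5 - n / 2) / math.sqrt(n / 4)
p_value = math.erfc(z / math.sqrt(2))

print(f"p-value ~= {p_value:.4f}")   # about 0.024: rejects H0 at the 5% level
print(f"BF01    ~= {bf01:.1f}")      # about 19: the same data favor H0
```

Note that only the marginal likelihoods m0 and m1 enter the Bayes factor; the prior probabilities assigned to the two models never appear.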


Are you seriously saying that, because a point distribution may well make sense if the point in question is zero (or 1), other points are plausible also? Srsly?

The nonsense isn't just that they're assuming a point probability, it's that, conditional on that point probability not being true, there's only a 2% chance that theta is .5 ± .01. Whereas the actual a priori probability is more like 99.99%.


Srsly? Srsly.

> The nonsense isn't just that they're assuming a point probability, it's that, conditional on that point probability not being true, there's only a 2% chance that theta is .5 ± .01. Whereas the actual a priori probability is more like 99.99%.

The birth sex ratio in humans is about 51.5% male and 48.5% female, well outside of your 99.99% interval. That’s embarrassing.

You are extremely overconfident in the ratio because you have a lot of prior information (but not enough, clearly, to justify your extreme overconfidence). In many problems you don’t have that much prior information. Vague priors are often reasonable.


Wikipedia has a section on this that I thought was presented fine.

https://en.wikipedia.org/wiki/Lindley%27s_paradox#The_lack_o...

Indeed, Bayesian approaches need effort to correct bad priors, and indeed the original hypothesis was bad.

That said. First, in defense of the prior, it is infinitely more likely that the probability is exactly 0.5 than it is some individual uniformly chosen number to each side. There are causal mechanisms that can explain exactly even splits. I agree that it's much safer to use simpler priors that can at least approximate any precise simple prior, and will learn any 'close enough' match, but some privileged probability on 0.5 is not crazy, and can even be nice as a reference to help you check the power of your data.

One really should separate out the update part of Bayes from the prior part of Bayes. The data fits differently under a lot of hypotheses. Like, it's good to check expected log odds against actual log odds, but Bayes updates are almost never going to tell you that a hypothesis is "true", because whether your log loss is good is relative to the baselines you're comparing it against. Someone might come up with a prior on the basis that particular ratios are evolutionarily selected for. Someone might come up with a model that predicts births sequentially using a genomics-over-time model and get a loss far better than any of the independent random variable hypotheses. The important part is the log-odds of hypotheses under observations, not the posterior.


Wikipedia is infamously bad at teaching math.

This Veritasium video does a great job at explaining how such skewed priors can easily appear in our current academic system and the paradox in general: https://youtu.be/42QuXLucH3Q?si=c56F7Y3RB5SBeL4m


Yeah, it may seem like a "better" (because stronger) conclusion to the author that if you have more pigeons than pigeonholes you must have more than one pigeon in a hole, even if you allow negative or irrational numbers of pigeons. But you're pretty much only invoking the pigeonhole principle in discrete math, where "more than one" means "at least 2".


Yeah, I think that was something that irked me about the "general" formulation: It suddenly brought in an average, i.e. a real, even though the "common" formulations only dealt with integers. This may be more general but made reasoning and understanding harder.
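For what it's worth, the discrete bound is easy to check exhaustively for small cases; a throwaway sketch (naming is my own) verifying that any assignment of n pigeons to m holes with n > m puts at least ceil(n/m) >= 2 pigeons in some hole:

```python
from collections import Counter
from itertools import product
from math import ceil

# Brute-force every possible assignment for a few small (n, m) pairs.
for n, m in [(3, 2), (5, 4), (7, 3)]:
    for assignment in product(range(m), repeat=n):
        fullest = max(Counter(assignment).values())
        # The averaging argument gives ceil(n/m); discreteness gives >= 2.
        assert fullest >= ceil(n / m) >= 2
print("pigeonhole bound holds for all checked cases")
```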


For a fingerprint to be useful it must not only be unique but also persistent. If I have a process that randomly installs and deletes wacky fonts, I'm unique at any given time, but the me of today can't be linked to the me of tomorrow, right?


Point still taken; however, you can only really check whether a given font is installed, not obtain a list of all fonts. Thus, installing a wacky font is pointless, as the fingerprinter won’t bother to check that particular font. There is queryLocalFonts on Chrome, but this requires a permission popup.


Correct, however:

> By following users over time, as their fingerprints changed, they could guess when a fingerprint was an ‘upgraded’ version of a previously observed browser’s fingerprint, with 99.1% of guesses correct.

https://coveryourtracks.eff.org/static/browser-uniqueness.pd...

https://mullvad.net/en/browser/browser-fingerprinting


>If I have a process that randomly installs and deletes wacky fonts, I'm unique at any given time, but the me of today can't be linked to the me of tomorrow, right?

See: https://xkcd.com/1105/

Services with a large enough fingerprinting database can filter out implausible values and flag you as faking your fingerprint, which is itself fingerprintable.


The problem we’re falling into under this (ostensibly accurate) point is when we start making this a game, where fingerprinting is either “100% effective and insidious”, or “can’t be 100% certain 100% of the time, so it’s ineffective and nobody will use it against me”.

The point is that a sufficiently motivated actor could use a very broad array of tactics, some automated and some manual, to identify, observe, track, and/or locate a target. Maybe they can’t pin you down with your browser fingerprints because you’ve been smart enough to use tools that obfuscate it, but that’s not happening in a vacuum. Correlating one otherwise useless datapoint that happens to persist long enough to tie things together at even low-ish confidence is still a hugely worthwhile sieve with which to filter people out of the possibility pool.

The problem isn’t that it doesn’t affect most average people, or that it’s terribly imprecise. The problem is that it’s even a little effective, while being nearly impossible to completely avoid. It’s also a problem if that’s used by a malicious state actor against a journalist, to pick a rather obvious example. Because even in isolation, this kind of violation of civil liberties necessarily impacts all of society.

The public should be given more information and control, broadly speaking, for when they are asked to trade their rights for convenience, security, and/or commerce. In particular, I think the United States has allowed bad faith arguments against regulatory actions and basic consumer rights so corporate lobbyists can steamroll any chance of even baseline protections. It would behoove all of us to be more distrustful of companies and moneyed interests, while being more engaged with, and demanding of, our governments.


But they still wouldn't be able to confidently connect his different fingerprints to the same individual, just that he is one of a group of individuals who fake their fingerprints.


It would depend on what your existing fingerprint is. If you're using some sort of rare browser/OS/hardware combination (eg. pale moon/gentoo linux/IBM thinkpad) it might be worth spoofing, but if your configuration is relatively "normie" (eg. firefox/windows/relatively recent intel or amd cpu/igpu), you're probably making yourself stick out more by faking your fingerprint.


The issue is that, especially on desktop, I doubt there are many fingerprints that more than 100 people have, given everything that they test. I would even suspect that most common desktop fingerprints are classified as bots.
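For a rough sense of why common fingerprints are so rare: the surprisals of (roughly) independent attributes add up in bits. The per-attribute probabilities below are invented purely for illustration; real figures come from measurement studies like the EFF's browser-uniqueness paper linked above:

```python
import math

# Hypothetical share of visitors matching each attribute value exactly.
attributes = {
    "user agent":        0.01,
    "screen resolution": 0.05,
    "timezone":          0.2,
    "installed fonts":   0.001,
    "canvas hash":       0.002,
}

# Assuming independence (optimistic for the tracker, but indicative),
# surprisal in bits is -log2(p), and independent surprisals sum.
total_bits = sum(-math.log2(p) for p in attributes.values())
print(f"{total_bits:.1f} bits of identifying information")

# Singling out one user among N requires about log2(N) bits;
# ~33 bits is enough to distinguish everyone on Earth.
print(f"enough to single out roughly 1 in {2 ** total_bits:,.0f} users")
```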


> If I have a process that randomly installs and deletes wacky fonts, I'm unique at any given time

Technically for fonts, there’s no API for listing installed fonts, so trackers have to check each font by name. Likely they won’t be checking super obscure font names.

That method might help for other signals though.


It's likely that yes, you will end up with an alias that links you because of a cookie somewhere, or a fingerprint of the elliptic curve when you do an SSL handshake, or any number of other ways.

The ironic thing is that because of GDPR and CCPA, ad tech companies got really good at "anonymizing" your data. So even if you were to somehow not have an alias linking your various anonymous profiles, you will still end up quickly bucketed into a persona (and multiple audiences) that resembles you quite well. And it's not multiple days of data we're talking about (although it could be), it's minutes, and in the case of contextual multi-armed bandits, your persona is likely updated "within" a single page load, and you are targeted in ~5ms within the request/response lifecycle of that page load.

The good news is that most data platforms don't keep data around for more than 90 days because then they are automatically compliant with "right to be forgotten" without having to service requests for removal of personal data.


You seem to be confused about the difference between "less" and "more". In general a yes-no question gives less than 1 bit of information if yes and no are not equally likely. There is no way it can be expected to give more.


> There is no way it can be expected to give more.

It is indeed not possible for it to give more, because it only has a single bit answer, which by the pigeonhole principle can't give you more than one bit.

The best yes/no questions are the ones which are independent of each other and bisect the group evenly. "Are you female" is typically good because it will be approximately half the population. Then you want independent questions that bisect the population again, like "does your first name have more than the median number of letters" which should be mostly independent of the first question. Another good one is conditional questions like "are you taller than the median for your sex" since a pure height question wouldn't be independent of sex but that one is.

Whereas bad questions would be ones with highly disproportionate responses, like "do you have pink hair with black and green highlights" which might be true for someone somewhere but is going to have >99% of people answering no, or "were you born on the planet Mercury" which will be 100% no and provide zero bits of information.
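The point about even bisection is just the binary entropy function; a quick sketch (the function name is my own) showing why the 50/50 question yields the full bit while the lopsided ones yield almost nothing:

```python
import math

def bits(p: float) -> float:
    """Expected information (Shannon entropy) of a yes/no question
    whose answer is 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(bits(0.5))    # 1.0   -> an even split yields the full bit
print(bits(0.99))   # ~0.08 -> a lopsided question is nearly worthless
print(bits(1.0))    # 0.0   -> "born on Mercury": zero information
```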


The commutation problem has nothing to do with matrices. Rotations in space do not commute, and that will be the case whether you represent them as matrices or in some other way.
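A quick pure-Python illustration of this, using the standard 90-degree rotation matrices about the x- and y-axes (the helper name is my own):

```python
def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# 90-degree rotations about the x-axis and y-axis.
RX = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]
RY = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]

print(matmul(RX, RY))                      # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(matmul(RY, RX))                      # [[0, 1, 0], [0, 0, -1], [-1, 0, 0]]
print(matmul(RX, RY) == matmul(RY, RX))    # False: order matters
```

The same non-commutativity shows up whether you use matrices, quaternions, or Euler angles; it's a property of the rotations themselves.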


Right, just monkeys, and they only had 200 study monkeys in the first place. Pretty much a bacon double nothingburger.


200 is a lot of monkeys



Well, I had to write it down, but I have to write down everything these days. But from the way the problem was phrased, it was obvious you don't have to actually find the numbers.


They're prone to genetic issues because they're inbred.


So are some groups of human. It would be beneficial to be able to reverse that.


Yes and I don't see how that changes anything I said?

