
There's a link to the repo with our own rating system package. Maybe I should have shown that directly? The intention was to specifically draw attention to this finding.


You could just take "Show HN" out of the title. (Email the mods if it's too late to edit.)


The reference by Rathmanner & Hutter presents a useful analogy. It argues that Kolmogorov complexity (and Solomonoff induction) are best viewed as a conceptual gold standard, like a perfect chess computer that does an exhaustive tree search. Practical methods are approximations.

There are a few results where researchers were able to automatically infer evolutionary trees and such, by using a standard compression algorithm in place of K(x).
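
As a rough sketch of how that works in practice (this is the normalized compression distance of Cilibrasi & Vitányi; the data below is made up purely for illustration), you cluster objects by how well they compress together:

    import zlib

    def c(data: bytes) -> int:
        # length of the compressed data, a crude stand-in for K(x)
        return len(zlib.compress(data, 9))

    def ncd(x: bytes, y: bytes) -> float:
        # normalized compression distance: small for related data, near 1 for unrelated
        cx, cy, cxy = c(x), c(y), c(x + y)
        return (cxy - min(cx, cy)) / max(cx, cy)

    a = b"ACGTACGTACGTACGTTTAC" * 50
    b = b"ACGTACGTACGTACGTTGAC" * 50   # small "mutation" of a
    d = bytes(range(256)) * 4          # unrelated data
    print(ncd(a, b))  # small
    print(ncd(a, d))  # much larger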


Fair enough! I'd like to point out that the Kolmogorov complexity approach can make sense of subjective probability too. Since you lack precise enough information to predict the dice roll, the most compressed way to write down your observations will involve a Shannon-style code with your subjective probabilities. If you have enough information but not enough computational power, the resource-bounded variants of Kolmogorov complexity may be more applicable.


Can you please define "Shannon-style code"? (Obviously, you don't have the Shannon code, the compression method, in mind.)

Also, isn't Kolmogorov complexity uncomputable, so that you run into multiple "who shaves the barber" issues when trying to determine it?


The compression code can be specified first. If you have a lot of data, the specification will be negligible in length, compared to the code itself. Together, the specification and the Shannon code give an upper bound on the Kolmogorov complexity. If this is the shortest known program, we may consider the Shannon code probabilities as our "best explanation" of the data.
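
To make that concrete, here's a toy sketch (mine, just for illustration): describe the data by a short specification of the symbol frequencies plus a Shannon-style encoding under those frequencies. For long data the specification is negligible, and the total upper-bounds K(data), up to the fixed overhead of the decompressor.

    import math
    from collections import Counter

    def shannon_code_bits(data: str) -> float:
        # ideal Shannon code length: about -log2 p(symbol) bits per occurrence
        counts = Counter(data)
        n = len(data)
        return sum(c * -math.log2(c / n) for c in counts.values())

    data = "HTTHHHTH" * 500                      # pretend these are our observations
    spec_bits = 8 * len(repr(Counter(data)))     # crude cost of specifying the code
    code_bits = shannon_code_bits(data)
    print(8 * len(data), spec_bits + code_bits)  # raw length vs specification + Shannon code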

You can also get posterior probabilities using a universal prior such as 2^-K(x), but of course, this can only be approximated in the limit of infinite runtime.


Since this topic isn't so well-known, I wrote the case arguing that frequentist interpretations don't work, but algorithmic information theory (Kolmogorov complexity) does. I want to make this accessible and persuasive, so thoughts, questions, and arguments would be appreciated!


A simple sentence that I've found useful for pedagogy: "the probability of that coin toss being 50% does not talk about the coin; it talks about you, and about your partial knowledge of the universe."

You can add: "The coin toss itself is deterministic and the result can be computed if you know the initial position and speed." They will inevitably bother you about the physical impossibility of measuring the starting position and speed exactly, and then you say "ok, forget about the coin. You have 5 white and 5 black balls inside this opaque cylinder. What's the probability that the top ball is white? This does not talk about the balls (the color of the top one is already determined) but about your partial knowledge of them".

(EDIT: formatting)


> the probability of that coin toss being 50% does not talk about the coin; it talks about you, and about your partial knowledge of the universe.

But it does talk about the coin - a weighted coin would have a different probability. Same in the example with the white/black balls - if they weren't 5 white and 5 black but 6 white and 4 black, the probability you would assign to the top one would be different. Again, the probability is a way to describe the balls themselves, not just our knowledge.

I get the general idea of representing probability as uncertainty and partial knowledge but your statements strike me as just straight up incorrect.


Thought experiment: you're presented with a jar and offered the chance to bet ten bucks on drawing a white ball. You know nothing about the contents of the jar. What odds would you take?

Now, you see ten white balls get added, then are blinded while other balls are added (maybe). You estimate the jar can't hold more than about a hundred balls. What odds would you take?

Now, you see ten white and ten black get put in, and saw it was empty before. What odds?

Now, you see ten white and fifty black, but the whites are larger, and you get to draw a ball. What odds?

The difference between the second-to-last and the last is the missing information we usually think of when we talk about randomness being missing information.

And you'll see that the previous scenarios don't change anything about that.


This is still a partial knowledge situation - you have the information that there are a certain proportion of colored balls in the chamber, but not the information about their order. The probability includes the information we do know and allows inferences about information we don’t know.


Sure, I just can't agree with the parent statement that probability has nothing to do with the object it describes, which is demonstrably false.


It has to do with your knowledge of the object. To that extent, it has something to do with the object it describes.


They didn't say that it has "nothing to do with the object it describes" though. It has to do with your knowledge, which itself has to do with the object your knowledge describes


"The probability of that coin toss being 50% does not talk about the coin." I think there is a subtle equivocation going on here.


Well, it's getting philosophical now. We do not have a way to experience the true nature of the coin. We can only experience an "image" [0] of the coin and we summarize all "images" of that object into knowledge.

[0] By "image" I include not only your vision, but also hearing, touch, and other modes of perception.


I do not think there is anything very philosophical here, just the point that the claim in the post I was replying to was demonstrably false.

I am intrigued by the idea of the coin having a "true nature" that we have no way to experience; I would like to know what this elusive "true nature" is, but if we cannot experience it, I don't suppose you can tell me. Instead, I will settle for an explanation of how you know it has such a true nature.


I don't think it's demonstrably false: If you don't know that the coin is weighted, the probability is 50%. Probabilities are predictions and estimates, not fundamentally about the thing itself, but about what we know about the thing.


The principle of indifference? I know that it is a commonplace assumption, but it feels to me as though one is assuming one has more information than is justified. Coming back to the article's "economist's wager", is it rational to bet with even odds on something you know nothing about? If the assumption is interpreted as a testable hypothesis about outcomes, why would complete ignorance imply any particular result? On the other hand, if it is interpreted strictly as a statement about one's knowledge, why present it exactly as if one had sufficient knowledge of the situation to know that the probability is 0.5? Maybe the author will have an answer in part 2.


> If you don't know that the coin is weighted, the probability is 50%.

No, if it's weighted it's not 50%. Your prior probability is 50%, but neither a Bayesian nor a frequentist would claim the true probability is known before testing.


There is no such thing as "true probability" in Bayesian interpretation, it only exists in frequentist world.

Notice that it is possible to build a robot which flips a coin in such a way that it's always heads - sure, you might need to build a different robot if the coin is "biased" (you probably mean its weight distribution is uneven), but it's still possible.


But that's the thing: the "true" probability is unknowable, and may even be an ill-defined concept. It is a deterministic process, so "probability" is just a simplifying concept to describe our best guess belief about how the coin behaves in the aggregate.


The true probability requires an infinite sequence of tests, so it's by definition unknowable. But it's what any sort of statistics attempts to approximate.


But, once the approximations are within undetectable differences from the "true probability", then you are done.

Not only is the true probability unknowable, it is also unencodeable; but if we are to accept a limitation on encoding, then we CAN give a true probability subject to that.

Like... if we are to determine a coin's probability to 1 decimal place, then we can do that.


> the true probability

Interesting expression.

After testing, it turned out that you flipped it and it landed on heads.

Does that mean that you've discovered that the "true probability" for that flip should have been 100% heads?


Of course not. If you're a frequentist you can say your best estimate is 100% heads with an unknown variance, and if you're a Bayesian you work out p(a|b) = p(a)p(b|a)/p(b) and update your priors (which will not give 100% heads). The more coins you flip, the better you can estimate the true probability.


IMO (I should have defined this) the true probability would require an infinite sequence of tests to determine.


Ok, so let's imagine you build a servo-driven flipping machine to carry out an infinite series of tests and notice, after a thousand of them, that 99% of the time, the coin flip matches the orientation of the coin when it's loaded into the machine.

What have you learned about the coin's true flip probability?


You've learned that the system of coin + machine has resulted in the same orientation 99% of the time. You can put some error bars on that, investigate the differences (did that 1% where it changed happen disproportionately with a certain side of the coin up?) and from that provide an estimate for whether the coin is fair. If the confidence intervals aren't small enough for you, you can do more experiments. The confidence interval will never be 0 until you've done an infinite sequence of trials. (Only axiomatic logic can have confidence interval 0, and it doesn't make statements about the real world, only about the axiomatic system in use.)
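
For concreteness, here's roughly what those error bars look like under a plain normal approximation, assuming 990 matches in 1000 trials:

    import math

    matches, n = 990, 1000
    p_hat = matches / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)           # standard error
    low, high = p_hat - 1.96 * se, p_hat + 1.96 * se  # ~95% confidence interval
    print(f"{p_hat:.3f} is in [{low:.3f}, {high:.3f}]")

More trials shrink the interval, but it only collapses to a point in the infinite limit.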


So, let's say that we continued the servo tests 1e99 times, with the coin loaded in each orientation equally. We measured 50.00% flips for heads and tails, and continue to see the 0.99 correlation with the initial orientation. The 1% of the time that the correlation doesn't match, it doesn't seem to show any bias for one side or the other.

So after an "infinite" number of tests, we continue to get 50.00% frequency of heads, but with an 0.99 correlation with the orientation when loaded into the machine.

Now I load a coin into the machine and ask you to name the true probability that the result is heads. I don't tell you the initial orientation, but I know it privately.

What's the true probability of heads? Our testing found precisely 50.00% frequency of heads. But are you still sure the probability is an intrinsic property of the system, rather than a property of your state of knowledge of the system?

We can continue the pattern; maybe the 1% error itself correlates at 0.99 with someone running the microwave in the kitchen. This drops the line voltage and causes the servo to impart a little less momentum to the coin, causing it to flip one time fewer on average. Neither of us has currently checked whether the microwave is running... And so on...


But what do the error bars themselves mean? Are they not probabilistic in nature themselves?

Say you conduct a thousand trials and calculate the error bar based on the results. If you conduct a hundred such experiments (each consisting of a thousand trials) and one of the experiments violates the error bar, does that invalidate it?


>What have you learned about the coin's true flip probability?

Nothing because you only tested the coin flip machine in aggregate. If you have a different throwing mechanism the results could be completely different.


When I flip it with my hand, aren't I testing the coin-hand system in aggregate?

If nothing is flipping the coin, then what coin flips are we making predictions about?


But what if the apparent probability after n trials does not converge as n grows arbitrarily large?

If we observed such a system in nature, what would its "probability" mean?


I like the balls-in-a-container example better - the mechanics of determinism are more obvious - and so is the contrast between perfect and partial knowledge.

And there's another interesting point - you could view "there's five white and five black balls" as your model. If in reality, there's one white ball and nine black - then your math is still right, but your model is wrong.

If you do experiments with the wrong model (assuming 50/50, getting samples from 1/10) - your best conclusion would be the model is wrong.

But for many settings, you'd end up declaring your container or the hand used to pull out balls has magical powers (and, to borrow from Douglas Adams, go on to prove that black is white and get killed on the next zebra crossing).


> But it does talk about the coin - a weighted coin would have a different probability.

Nobody would a priori assume it is a weighted coin, in this example. A coin is chosen because it's been a standard weight and measure-backed object for centuries. It is about the observer's knowledge, which includes assumptions from every day experience.

You have to base a prior on assumptions. If you assume nothing and flip it 1000x, and calculate the probability, then you base that on assumptions of your own flipping ability, and hand wave it away with count divided by trials.


>Again, the probability is a way to describe the balls themselves, not just our knowledge.

Probability is about the fact that you don't know everything about the balls themselves. If you could make 100% predictions then your knowledge of the balls would be equivalent to the balls' description.

Why do you even use "describe the balls themselves" to describe this situation? From the perspective of probability you just set an upper bound on how much knowledge you could possibly have about an object; it's still knowledge.


I think this is just semantics: you can either read the parent comment as describing a hypothetical, fair coin with a 50% chance for each state, or as saying that a god-like entity could reduce this to a deterministic computation by replaying the coin flip exactly (e.g. same ambient air conditions, position of coin on finger, exact sequence of muscle nerve activations...), and that because that's impossible for you to do, you just assign the minimum-information-content probability (50%) to the coin flip.


Forget the coin and the balls — does the nucleus decay? You're not missing any knowledge; there isn't any.


This is definitely the most interesting example, but it's not obvious that a situation where the relevant information is fundamentally inaccessible is a situation where you aren't missing any information.

It's your best bet for a scenario where you can be sure that nobody else has more information than you do, though.


It may be deterministic, but are you sure it would be computable? One does not necessarily imply the other.


As a caveat, while this intuition works for classical mechanics, it does not work for quantum mechanics. All observations are consistent with wave function collapse being fundamentally random. Any hidden variables would need to be transmitted many times faster than the speed of light (~10000x, last time I checked the experiments), and are therefore inconsistent with our understanding of special relativity.


Note that pilot wave theory, an (out of vogue) interpretation of quantum mechanics, also recasts the apparent randomness in quantum mechanics as due to our ignorance of the exact state of the pilot wave.

Even Einstein struggled with quantum mechanics, famously saying "[God] does not play dice with the universe".


Wigner's friend might have something to say about this...

I don't have a specific argument to make here, only the feeling that if it were all just a matter of what a given observer knows, no-one would be talking about there being a QM measurement problem.


This is a very good point!

In fact, some people do argue that there is no measurement problem in the Copenhagen formulation of quantum mechanics to begin with – at least if you take it seriously and strictly go by the rule that the laws laid down by Bohr et al. only concern you as the observer and your knowledge about the system, and not the system itself. Following this train of thought, there is nothing "real" about the wavefunction and it is just a tool to come up with predictions. The same goes for the collapse of the wave function (which just describes a change in your ability to predict future measurements, and not a change of the object) and the term "measurement" (which we might as well replace with "enlightenment", i.e. the moment in which we obtain knowledge about the system).

In that sense, the only difference between classical and quantum mechanics is that our knowledge (viewed as a mathematical quantity) behaves differently in both theories: In classical physics, when we conduct multiple measurements of a given system in a row, our knowledge about that system will increase – to the point that, once we have measured all system properties to sufficient accuracy, we'll be able to predict what any future measurement of any of those properties will yield (again, with some predictable uncertainty). So the knowledge of all our measurements has added up; it is an additive quantity.

In QM, this is fundamentally different: We can only know anything about the object the very moment we look at it. The rules of quantum mechanics (again, in the very strict interpretation laid out above) dictate that the second we conduct a measurement, we can forget about any knowledge obtained through previous measurements of other (conjugate) observables: Future measurements of those observables are inherently unpredictable. In that sense, our knowledge about quantum-mechanical objects never "adds up" to anything. (To see that this is really the distinguishing feature between classical and quantum mechanics, recall that the existence of conjugate observables really is the only thing setting apart the quantum from the classical world: Without conjugate observables it would be impossible to distinguish, say, 100 electrons in a superposition of spin up and down from an ensemble of 100 electrons of which 50 are in a spin up state and the other 50 are in a spin down state.)

Of course, this whole interpretation is very unsatisfactory to lots of people (myself included) for a whole bunch of reasons. I assume that, to a large degree, this is due to the fact that laws of nature that put human observers in their very center seem rather undesirable. (At least since the time we switched from a geocentric to a heliocentric view of the world.)

But my impression is that there's another reason: Our intuition from classical mechanics & statistics has taught us that objects exist independently of us as observers and behave in a deterministic fashion, at least provided we as observers know enough about them. (Meaning that the more we know about the coin's initial position and velocity, the more likely we are to predict the outcome of the coin toss. If we don't know anything about the coin, though, the outcome is as unpredictable as measuring spin up/down in quantum mechanics.) Unfortunately, this whole line of argument is circular: The reason we believe that the existence of physical objects is independent of us is precisely because knowledge in classical mechanics is an additive quantity and we can get to the point where we know "enough" to come up with deterministic predictions. That is, we never have to discard knowledge when running new measurements and so our knowledge takes on an independent "role" – which we call reality.


This is basically the Bayesian interpretation of probability.


Saying that "the probability of a coin toss of 50% talks about you" is not an interpretation of probability. Saying that we are "50% sure" is also not an interpretation of probability. It's a nonsensical statement. It's like saying we are "50% angry". It doesn't really mean anything.


I don't understand your claims that these statements are meaningless. They are commonly uttered and understood.


I can understand expressions such as "pretty sure" or "completely sure". I do not understand the expression "to be X% sure". If someone says they're "37% sure" tomorrow will rain, what does that mean exactly?


Can you understand expressions like "more sure of A than B" or "as sure of A as of B"? Then, they are as sure that tomorrow will rain as they are sure that throwing three dice the sum will be 9 or less (37.5%).
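
(You can check the 37.5% by enumerating the three dice:)

    from itertools import product

    rolls = list(product(range(1, 7), repeat=3))
    favourable = sum(1 for r in rolls if sum(r) <= 9)
    print(favourable, len(rolls), favourable / len(rolls))  # 81 216 0.375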


It's clear that 37.5% sure is 0.5% more sure than 37% sure. The problem remains how to interpret these numbers.


You're asking about the interpretation of a statement such as "I assign the same probability to events A and B"?

That would mean that both are equally likely as far as that person knows.


No, I'm not asking that.


> If someone says they're "37% sure" tomorrow will rain, what does that mean exactly?

When someone says they're "37% sure" tomorrow will rain they mean that they assign the same probability to "tomorrow will rain" that they do to "if you throw three dice you'll get 9 or less" or "when you threw three dice you got 9 or less". In the second case the event is either true or false already and there is no uncertainty for you, their probability assignment is their best guess with the information they have.


The question is what is probability according to the subjective interpretation of probability? The answer usually given is that probability is a degree of belief. Thus, a probability of 37% means that you're 37% sure that some event will take place. What I'm saying is this definition is meaningless unless you define what it means to be X% sure about something, but the definition of "being X% sure" must not rely on the notion of probability because "probability" is what we are trying to define in the first place!


And what I try to explain is that one way to define what it means to be X% sure about A is to say that

- you put a number on it p(A)

- which is between 0 and 1

- and allows you to compare how sure you are about different things p(A) and p(B)

This number can be used to compute how sure you are about composite things:

p(A or B) = p(A) + p(B) - p(A and B)

p(A and B) = p(A given B) p(B) = p(B given A) p(A)

That number p happens to correspond to the notion of probability, but it has not been defined using a pre-existing notion of probability: https://en.wikipedia.org/wiki/Cox%27s_theorem


What do you mean "you put a number on it"? What number? If the number is arbitrary, which is what your explanation suggests, it cannot mean anything.


It's not completely arbitrary; it represents the degree of plausibility you assign to the event.

These numbers have to obey some rules if you require that a set of beliefs is consistent.

The number you assign to the plausibility of A and the number you assign to the plausibility of not-A have to sum to 1.

If you think A and B are equally plausible, you have to put the same number on them.

If you think that A and not-A are equally plausible, you have to assign the number 0.5 to both.

If you put the number p(heads)=p(tails)=0.5 as your degree of plausibility that the coin I just flipped (I actually did it!) is showing heads or tails, it's not an "arbitrary" number. It means that you think both (exhaustive) outcomes are equally plausible. Why do you say it cannot mean anything?


It seems to me that if a probability is a quantity representing a degree of belief, and it's only meaningful in relation to another quantity representing another degree of belief, in the sense that we can only say that being X% sure is being more sure than being Y% sure, if X > Y, or equally sure, if X = Y, then such a quantity only has an ordinal value, which is to say the quantity itself is meaningless. For it to be meaningful it has to have an interpretation that does not always refer us to another degree of belief. It also must not refer to a "degree of plausibility" since this is just another expression for "probability", and probability is what we are trying to define.


I'm not sure I understand where you see a problem.

In the example of the degree of belief (between 0 and 1) that you have that the coin on my desk is showing one face or the other, don't you agree that the right numbers that represent your indifference are 0.5 and 0.5? The quantity itself is not meaningless.


You say indifference, but indifference with regards to what? In economics, an individual is said to be indifferent between two alternatives if those alternatives result in the same level of utility for him or her, utility being an abstract concept representing well-being. But you don't say why this person is indifferent to the coin showing one face or the other. Because if it's because he or she thinks both options are equally likely, then we again have a problem, since we don't know what "likely" means.


At what point does the following argument derail for you?

0) I tossed a coin, it lies flat on my desk

1) You have some degree of belief about the statements H:“the coin shows heads” and T:“the coin shows tails”

2) You want to quantify that degree of belief

3) You postulate that you can put a number on your degree of belief about some statement A with the following properties:

3a) p(A) is between 0 (false) and 1 (true)

3b) p(A or B) = p(A) + p(B) - p(A and B)

3c) p(A and B) = p(A given B) p(B) = p(B given A) p(A)

4) p(H) + p(T) = 1

5) Unless your degree of belief about H is higher than your degree of belief about T

or your degree of belief about T is higher than your degree of belief about H ...

6) ... it follows that p(H) = p(T) = 0.5


(in reply to @kgwgk's comment https://news.ycombinator.com/item?id=25313531)

The argument is flawless, the problem is with the interpretation.

> p(A or B) = p(A) + p(B) - p(A and B)

How does one add degrees of belief and what sense do we make out of the result?

> p(H) = p(T) = 0.5

Sure, two equal quantities representing degrees of belief must mean the degrees of belief are of the same magnitude. But what about P(H) = 2P(T)? What does it mean for one degree of belief to be twice as large as the other?


>> p(A or B) = p(A) + p(B) - p(A and B)

> How does one add degrees of belief and what sense do we make out of the result?

That's how we postulate [1] that the numeric representations of degrees of belief are added. Doesn't that look like a property that you want a numeric representation of degrees of belief to have?

If you have some degree of belief about A, some degree of belief about B, and you believe that A and B are mutually exclusive, wouldn't you want the number representing the degree of belief of "any of them" p(A or B) to be the sum p(A)+p(B)?

>> p(H) = p(T) = 0.5

> Sure, two equal quantities representing degrees of belief must mean the degrees of belief are of the same magnitude. But what about P(H) = 2P(T)? What does it mean for one degree of belief to be twice as large as the other?

Consider p(H or T) = p(H) + p(T) = 2 p(H) = 2 p(T). Isn't it natural to quantify the degree of belief that I got any outcome with a number that is the sum of the numeric representations of the degrees of belief that I got each outcome?

Or say that, instead of flipping a coin, I toss two of them. They're lying flat on my desk right now. The number of heads up is 0, 1, or 2.

How would you describe your degree of belief about the statements "X=0: there are no heads", "X=1: there is one" and "X=2: there are two"?

Wouldn't you say that your degree of belief about "X=1" is of the same magnitude as your degree of belief about "X=0 or X=2"?

Wouldn't you say that your degree of belief about "X=0" is of the same magnitude as your degree of belief about "X=2"?

Wouldn't that make the numerical representation of your degree of belief about "X=1" twice as large as the numerical representations of your degrees of belief about each of "X=0" and "X=2"? (Where you assign numbers to degrees of belief using the representation we're discussing.)

p(X=1) = p(X=0) + p(X=2) = 2 p(X=0) = 2 p(X=2)

[1] in fact I think this is what we get from postulates which are a bit more general, but for the sake of the discussion we may stay in this level
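
(To spell out the two-coin numbers, a quick enumeration:)

    from itertools import product
    from collections import Counter

    counts = Counter(sum(side == "H" for side in outcome)
                     for outcome in product("HT", repeat=2))
    for k in sorted(counts):
        print(f"p(X={k}) = {counts[k]}/4")
    # p(X=1) = 2/4, twice p(X=0) = 1/4 and p(X=2) = 1/4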


Addition is required by the axioms of probability, not by the interpretation of it. When interpreting probability as a degree of belief, this property is not only not useful, but is particularly troublesome, because adding and subtracting degrees of belief doesn't appear to make a lot of sense.

All in all, to me it's clear that these degrees of belief are a theoretical construct, not an empirical reality. I don't think people assess the truth value of a statement on a continuum from truth to false. This is not how the human psyche works. Personally, no, it's not natural for me to have a degree of belief (in the way that you have defined them) about a statement, and I have no idea how to interpret arithmetic operations involving these "things".


37% of the time that someone says they are 37% sure of a statement X, the statement X is true (assuming they're correctly calibrated, etc.).


Of course. But if you pronounce a fancy word like "Bayesian" there's a large number of minds that shut down irremediably.


That's also why we call it QBism [1] instead of quantum bayesianism

[1] https://en.wikipedia.org/wiki/Quantum_Bayesianism


There seems to be a lot of quibbling about the simple sentence here, but I find it clarifying. Discussing coin tosses is a thought experiment with a very practical physical analog, so spelling out clearly what the thought experiment's actual subject is keeps you from getting lost in the weeds of the physical execution of flipping coins.


How does the ball situation help? It has the same problems of physical impossibility of measuring however the balls were ordered. (Modelling someone's brain?) I guess the argument works on someone with an unscientific model of the human brain, but that's one step forward and two steps back.


Somebody just put the balls there carefully, and did not tell you in what order, just how many of each color.


That still boils down to a lack of prior information which I don't think removes the argument for "I don't have enough information."

Probably have to use actual quantum phenomena that behave probabilistically by definition if you want a currently irrefutable physical example.

I'm personally not convinced even this is fundamentally probabilistic, and we currently have to rely on probability theory as a crutch for complex behaviors we just don't quite understand yet or don't have the time and resources to compute.


> "I don't have enough information."

My point exactly. Probability theory is a precise mathematical formalization of the concept of "not enough information".


You don't need quantum physics to formulate a philosophical standpoint that happens to agree with the Copenhagen interpretation.

I'm not sure if it helps, but I suppose a compromise here would be the assumption that you don't really know your own starting configuration, which is why you naturally draw probabilistic inferences, e.g. that the sun will come up tomorrow like every day.

If that has a biological explanation, then the top comment was not just appealing to the elusive argument of Platonic ideals.


I would have liked to know what the Kolmogorov approach has to say about the examples used to deflate frequentism ("which of my friends will start a business?"). I don't see from the article how the "smallest-program" approach could say anything useful about those, either. Maybe that wasn't the point--but after poking holes in the frequentist view, it uses unrelated examples like digits of pi to illustrate the Kolmogorov idea, so I'm left unable to directly compare the kinds of statements the two approaches can make.

Even going back to dice or coins would have helped me compare them. Like, I know frequentists can show how the variance in coin-toss outcomes decreases as the sample size increases. What can the smallest-program approach say about that? Or was the point that those variant outcomes aren't "real" enough to talk about? Does that mean there is a connection to constructivism in mathematics here?

It seems either approach benefits from more data, and there must be a concept related to a "confidence interval" where, as 100, then 200, then 300 digits of pi roll in, your pi-program stays the same size while other programs have to keep growing to accommodate the new data. Like, the ratio of the smallest program to the naive encoding ought to say something about how potentially predictive the small program is.

Thanks for the interesting article. It definitely made me think about the issues, and now I'm curious to know more about the topic.


One aspect of the Kolmogorov approach is that it implicitly models things like biased probabilities through compressibility.

The shortest representation of rolls of fair dice or coins is their exact results, but if there's "less randomness" in some way (biased coin/die, sum of two dice which means non-uniform probabilities, combination of some predictable pattern with random noise) then there are more compact representations of that information, and all of that gets captured by the Kolmogorov approach without any explicit handling of the various possibilities.
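
A cheap way to see this in action, using an off-the-shelf compressor as a rough stand-in for K (the sequences here are made up):

    import random, zlib

    random.seed(0)
    n = 100_000
    fair = "".join(random.choices("HT", k=n))
    biased = "".join(random.choices("HT", weights=[9, 1], k=n))

    print(len(zlib.compress(fair.encode(), 9)))    # roughly n/8 bytes: about 1 bit per toss
    print(len(zlib.compress(biased.encode(), 9)))  # noticeably smaller: "less randomness"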


Thanks! I hope to better address your concerns in Part 2.


OK - so apply Kolmogorov complexity to election polling.

How does that work out?

I think you're confusing various possible maps with the territory in a less than useful way.

Given that frequentist interpretations are approximations - and understood as such - and Kolmogorov complexity isn't computable at all, what problem have you solved here?


Hm I admit it's hard to talk convincingly about election prediction, since we don't have practical algorithms to do this; a lot of it comes down to human judgment.

The philosophical point (which might be approximated algorithmically someday, or by intelligent minds today) is that your election probabilities should come out of an overall highly compressed model of the world. In theory, a Bayesian who uses the prior 2^-K(x) over all strings x should, with sufficient life experience, come up with good estimates, in a certain sense.
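
As a toy illustration of that prior (not a practical algorithm; a standard compressor stands in for K here), simple regular strings get overwhelmingly more weight than incompressible ones:

    import random, zlib

    random.seed(0)

    def log2_prior_weight(s: bytes) -> int:
        # log2 of the (unnormalized) weight 2^-C(s), with C measured in bits
        return -8 * len(zlib.compress(s, 9))

    patterned = b"01" * 500
    noise = bytes(random.getrandbits(8) for _ in range(1000))
    print(log2_prior_weight(patterned))  # a few hundred bits of description: heavily favoured
    print(log2_prior_weight(noise))      # about 8000 bits: essentially ruled out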

I'll have to think about this example more carefully when fulfilling my promise of writing about how this theory relates to everyday decision-making. Thanks for pointing out a potential weakness :)


The article is excellent, congratulations.

A couple observations/questions.

1) You didn't comment on the Bayesian viewpoint that probability reflects a subjective idea about the state of the world. One might argue, for example, that probability isn't measurable, and that therefore, strictly speaking, a statement about the objective probability of an event isn't meaningful. Experimental evaluation would have to be done on an entire model instead. Do you have any objections to that point of view?

2) I don't find the case about Kolmogorov complexity to be actually convincing, at least not as per the requirements the rest of the article sets. "3141592..." could pass as either "random digits" or "first digits of pi". As for the fact that it's highly unlikely a true RNG would have generated exactly those digits: we are back to a frequentist argument there. It's likely I'm missing something, could you elaborate more or give me a pointer?


Isn't that what the Occam's razor argument was for? Sure, both an RNG and the "40 digits of pi" program can produce that output, but the "40 digits of pi" program is shorter, and therefore has lower Kolmogorov complexity.


Is it really shorter?

For all I know this is a false dichotomy. True RNGs don't exist, for one, in a deterministic philosophy.

RNGs are by definition non-deterministic, but algorithms are deterministic--by the definition that I learned. We use floating gates picking up cosmic background radiation or the fallout from radioactive elements to come close to truly random. XKCD's butterfly joke applies.

Computable numbers have a computable complexity. For a non-deterministic program, the fair comparison to a true RNG would be a program that outputs all digits of pi up to infinity, I suppose. Vice versa, I'm not sure if a pseudo-random number generator couldn't be fit into the equivalent Kolmogorov complexity of the 40 digits of pi.

Are you saying it couldn't be? On the one hand it is the case that pseudo RNGs simply rely on intractable complexity, which must regularly supersede 40 digits of pi, equivalent to a 40-byte key, as opposed to your 4kb RSA key chains and whatnot.

On the other hand I think your argument is akin to the gambler's fallacy: I have absolutely no clue, but I will hope it will not have been a primitive RNG, right?

For reference, here's a simple pseudo RNG taken from TinyPTC example code to produce a TV-noise graphic in a loop:

        noise = seed;                   /* start from the current LFSR state */
        noise >>= 3;
        noise ^= seed;                  /* xorshift-style mixing step */
        carry = noise & 1;
        noise >>= 1;
        seed >>= 1;
        seed |= (carry << 30);          /* feed the carry bit back into the 31-bit seed */
        noise &= 0xFF;                  /* keep an 8-bit grey value */
        pixel[index] = (noise<<16) | (noise<<8) | noise;  /* pack grey into RGB */


Yeah but is it? They are both programs with logarithmic Kolmogorov complexity, and Kolmogorov complexity is only defined up to additive constants unless you commit to a computational model. If you do commit to one, which is the shortest is just a function of the specifics of the model you chose, which isn't really interesting; it's literally code golf at that point.


In order to discriminate between the output of an RNG and the digits of pi, you are right that they have the same Kolmogorov complexity to within additive constants.

However, you can ask for resource-bounded Kolmogorov complexity. Suppose you ask for the shortest length of the description of a PTIME machine which outputs the N bits of an RNG versus one which outputs the first N bits of pi. Complexity theorists believe that the first will be longer. Proving that conclusively might possibly have a bearing on P=BPP.

When you go all the way down to finite-state machines, KC becomes something equivalent to finite-state information-lossless compression, like Lempel-Ziv compressibility.

Long story short: by restricting the resources available for the computation, it is possible to discriminate among some such examples as you point out.


> They are both programs with logarithmic Kolmogorov complexity

In the case of true random numbers, how is that so?

Very few random sequences can be generated by a logarithmic-sized program, since most strings are not significantly compressible [1]. A simple counting argument shows that: there are 2^n strings of n bits, but only 2^(lg n) = n logarithmic-sized strings, a much smaller number!

[1] http://theory.stanford.edu/~trevisan/cs154-12/kolcomplexity-...
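
In concrete numbers, say for n = 1024:

    import math

    n = 1024
    total = 2 ** n                      # number of n-bit strings, about 1.8e308
    log_len = int(math.log2(n))         # "logarithmic-sized" means about lg n = 10 bits
    short = 2 ** (log_len + 1) - 1      # all strings (programs) of length <= lg n
    print(total > 10**300, short)       # True 2047: almost no n-bit string has such a short program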


1) I rather like the subjective view! Bayesians used to struggle to justify a choice of prior, but it turns out that 2^-K(x) is universal in the sense that it never falls below a constant factor of any given (semi-)computable finite (semi-)measure.

2) Sorry, I should have clarified that the programs are deterministic. So if you want to use an RNG, you also have to supply a string of random bits that cause the RNG to output forty digits of pi.


Why is frequentism bad because it only gives certainty for infinite samples, but complexity is good despite being non-computable?

It's two sides of the same coin -- computable uncertainty vs non-computable certainty.


I don't think frequentism is "bad"; just insufficient as a gold standard interpretation of probabilistic claims. I liked an analogy from the reference by Rathmanner & Hutter: the most "correct" chess-playing program involves a complete search along the tree of possible games. In practice, we try to approximate this ideal.

In the case of Kolmogorov complexity, a reasonable takeaway might be to use the shortest program that we're able to find, even if it's not the shortest overall.


Some thoughts on the composition:

* "I wrote the case arguing that frequentist interpretations don't work, but algorithmic information theory does": if that's what you are up to here then I think it would be for readers if you stated that up front in some way. And hit me with some kind of summary at the end that makes the concise version of your argument at the end, it's a long article.

* Shorter might be better: There's a lot of stuff in here that I think you can pare out in the probability discussion that maybe isn't adding that much to your argument. I think there is a lot to be gained by assuming a generous reader.

* Betting might be a distraction to your point: This might be confusing the imperfect knowledge of participants in a market with the imperfect knowledge of all the physical forces involved in a physical phenomenon and how that relates to the seeming "randomness" of a coin flip for your reader. (The liquidity and stuff.. this is just not related to your point.)

* Don't undermine your point with unrelated assumptions: "I imagine they wouldn’t consider their world unlikely at all: they would just add a new law to their description of physics: all dice, as if by divine intervention, are deemed to exhibit this strange behaviour" this led me to think that you were just sort of shooing away the whole last X decades of high vs. low energy physics. We collectively certainly don't think that we have the rules correct, precisely because of this complication: we find the idea that we need 2 sets of rules improbable and believe that there must be a way to explain everything with a single set of rules. So your mythical dice society probably would consider their dice exception a very unlikely world.. they would be confident they have the world wrong!

A dubious assertion (or at least one that would need a whole lot of explanation) can be an off-ramp for a subset of readers.


Thanks for the detailed critique! I'll take some time to think about how to better make the points that I wanted to convey with those sections.


There were a few points that I, as someone unfamiliar with many of the ideas presented, got hung up on.

First, the paragraph that begins with "At first blush, the requirement to use..." Seems to be a non sequitur. I don't fully understand how the previous section creates a requirement to use deterministic programs, so I could use more explanation on how that requirement is established.

Second, a very simple concrete example of what one of these programs would look like would be immensely helpful. After re-reading the article a bit I have a mental image of a program that contains a long, compressed string and a decompression algorithm that somehow models the system you're interested in. I can imagine how you might get a useful interpretation of probability from the decompression system, but there are enough open questions there that I'm not sure I have the correct interpretation.

Hope that helps!


Thanks. I should clarify that the computer is deterministic, so as to avoid building randomness into the definition of randomness!

I skimmed over an example too quickly, but your intuition is about right. For that sequence, two possible programs are:

- Compute and print the first 40 digits of pi.

- Decompress the following string according to a Shannon code with probabilities (1/36,1/18,1/12,[etc]): [insert code]
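
Here's roughly how those code lengths work out, assuming the probabilities continue as the two-dice-sum distribution:

    import math
    from fractions import Fraction

    # probabilities for the sum of two dice: 1/36, 1/18, 1/12, ..., 1/6, ..., 1/36
    probs = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
    for s, p in probs.items():
        print(s, p, math.ceil(-math.log2(p)), "bits")
    # frequent sums get short codewords (7 -> 3 bits), rare ones long (2 or 12 -> 6 bits)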


Such an interesting post, thank you for sharing! I'm in my second year of a maths degree currently, and we obviously studied frequentist probability/stats in the first year, but I'm not taking any probability modules this year. I found the tone & accessibility was just right for me :)


My only thought is that discovering a workable definition of "scientific method" is a reach. Philosophers spent a century searching for such a thing, in vain.

On the other hand, providing something that just works would be beneficial enough, even if falling short of the philosophical holy grail, so it's worth pursuing.

I'm a physicist, and physicists have always wondered why math works so well in physics. There's this famous essay by Eugene Wigner:

https://www.dartmouth.edu/~matc/MathDrama/reading/Wigner.htm...


It sounds like you are arguing that i.i.d. frequentism doesn't work. I view AIT as generalizing frequentism to non-i.i.d. time series. This is formalized as Martin-Löf tests for randomness, and Solomonoff induction.


This is awesome! Good work!

Are you going to touch on Chaitin's Omega?


Thanks! :)

I wasn't planning to go there! While I enjoy the idea, for now I'm trying to focus on what's needed to make sense of the problem of induction. Is there a nice connection that I missed?


Suggestions are welcome; I'm still learning Rust.

To put context around my provocative comment earlier: I write C++ professionally. C++ is the right choice for my organization, though for purely historical reasons. In the long run, I think we'd all be better off if Rust or similar languages replaced their predecessors. So my comment came from a place of personal frustration, the same that led me to find Rust in the first place.

The old wording was too negative; I'm just excited to show that contest programming, which one might imagine to be hard to translate, is not only practical but indeed arguably nicer in Rust.


Just be careful to not adapt the michaelochurch approach. Very risky :)


I made this cookbook specifically to show it doesn't have to be so hard. While designing an industrial-strength data structure library is an advanced skill, there are easier ways to implement the core ideas that should suffice for contest and coursework purposes. Very few people have to build actual libraries.


Fair point, it's fixed now.


Would be a stronger argument if you had a better rating on CF :) As of now I will stick to the Petr / Egor philosophy.


What's their philosophy? I think Petr was using C# at some point.


Well I would say their philosophy first of all relies on strong tools. Intellij with CHelper plugin allows for simpler testing and also has incredibly powerful debugging as I am sure you have seen in their screencasts.

Also I think they sort of show the value of Java's simpler approach with everything being a reference in terms of writing code quickly. The alternative approach that C++ competitors use relies on a very specific style unlike what production code at say Google uses, with pointers used as sparingly as possible. With everything being a reference it is much easier to work with graphs for instance.

To me Rust goes even further into the C++ direction where it becomes more and more difficult to quickly write code that compiles. With Java I have no issue writing 100 lines and having everything compile the first time around, particularly with an editor like Intellij. With C++ even Clion struggles significantly with producing helpful info, simply due to the complexity of C++.

Anyway don’t have enough time to really eloquently think through all aspects of this. Also I was just kidding about CF rating, you are far better than me obviously.

