Yeah, zero is probably not the best thing to return if left == right; same with ...

hddqsb · on March 8, 2022

Psst, if you use an inclusive-exclusive interval for the parameters and an exclusive-exclusive interval for the implementation (see my top-level comment), it becomes nicer and you can keep your original invariants. You just need `left -= 1`:

    def binary_search(left, right, is_green):
        left -= 1
        while left + 1 < right:
            middle = (left + right) // 2
            if is_green(middle):
                left = middle
            else:
                right = middle
        return right

    # Usage.
    binary_search(0, len(array), lambda i: array[i] < 6)

This is equivalent* to @anonymoushn's version if you define `left = lo - 1` and `right = hi`.

*Equivalent except for the rounding in the division.

anonymoushn · on March 8, 2022

Very good. I also find the division rounding and symmetry in your implementation aesthetically appealing. I think I just instinctively reach for half-open intervals for reasons unrelated to this particular problem.

For signed search ranges, if you modify my implementation above to compute the midpoint as (lo + hi) // 2 and (lo + hi) < 0 and the language rounds integer division toward zero, you gain a rounding-related bug. In this sort of circumstance, your approach does not gain that bug because it doesn't care which way division is rounded. Python doesn't have overflow and rounds down, so ~all of the errors we can make on that line won't happen anyway :)

If the range is unsigned, lo-1 is likely to be MAX_VALUE, but it seems like it still does the right thing.

tylerhou · on March 8, 2022

I saw your comments above as well. I'm not 100% convinced that this is better; I actually like the three "edge cases" because I find that code/proof of correctness easier to understand. In your code, I have to break the analysis into two cases:

1. The entire range isn't red, so at some point [proof needed] we will move left to a green element. From that point on we can continue analysis like normal. (I can't think of an easy way to prove such a claim without messing with floor division and induction, etc... but it seems intuitively true.)

2. The entire range is red, so green will never be moved, so right will point to the first element and green will point to one before the first. This returns the correct value.

I'll have to come up with a way to easily prove statement 1 above to be convinced that it is better. But I can include this version in the follow up.

hddqsb · on March 9, 2022

Defining `left - 1` to be green `right` to be red can be thought of as "padding" the input to ensure that there is at least one green and one red, but I'll concede that it is sneaky.

(A less sneaky version is to explicitly define new predicates: `is_green2(i) := (i == left - 1) || is_green(i)` and `is_red2(i) := (i == right) || is_red(i)`, where `left` and `right` refer to the function's original parameters (inclusive-exclusive). These predicates can be used as the invariants, and are sufficient for showing that at the end of the loop the result is either the first red, or one-past-the-end if everything is green.)

An alternative is to tweak the invariants:

Let `left` and `right` be the function's original parameters (inclusive-exclusive), and let `l` (that's a lowercase L) and `r` be the variables used in the loop (exclusive-exclusive).

The original invariants are `is_green(l)` and `is_red(r)`, which require sneaky definitions when `l = left - 1` and `r = right`.

The tweaked invariants are `is_green(i) for all i where left <= i <= l` and `is_red(i) for all i where r <= i < right`. In other words, everything up to and including `l` is green, and everything from `r` onward is red.

These invariants are vacuously true when `l = left - 1` and `r = right`, so they hold before the loop (without having to define `left - 1` to be green and `right` to be red).