And then 3 years later we have [0] and 6 years after that we have [1]. I appreciate Masters of the Universe types like Bloch and Lea contributing wickedly-efficient code, but somehow it's always mere mortals who end up mopping things up after the fact. Whether it's a bug in the actual algorithm or a "Comparison method violates its general contract!" exception that happens once in a blue moon, I think putting TimSort in Java was a mistake.
> I appreciate Masters of the Universe types like Bloch and Lea contributing wickedly-efficient code, but somehow it's always mere mortals who end up mopping things up after the fact
I think this is the wrong way to look at the problem. Progress is always iterative. If a given person happens to make a bigger iteration, then people call them 'masters of the universe', but that iteration isn't inherently different from the normal, smaller kind of iteration. And consider: if every line of code has a chance of being buggy, then a bigger change is likelier to be more buggy. I say more buggy rather than have more bugs, because bugginess is rarely disconnected; the whole conception is likely to be more flawed the newer it is. If we were to only accept small contributions, then somewhere along the way, in the limit, we would still arrive at timsort. And all the flaws of timsort wouldn't be gone, they would just be amortized into smaller chunks along the way from here to there. Which may be preferable, but that's something you have to weigh; it's not as cut-and-dried as 'if we only make changes conservatively, then we avoid catastrophic failure'.
All this not to mention that there are people who are 'masters of the universe' at fixing bugs.
> I think this is the wrong way to look at the problem. Progress is always iterative. If a given person happens to make a bigger iteration, then people call them 'masters of the universe', but that iteration isn't inherently different from the normal, smaller kind of iteration.
+1. Bloch himself iterated on work by Arthur van Hoff and others at Sun before him.
The "Comparison method violates its general contract!" exception is due to a bug in the caller's code, not in the sort implementation. If you have a bug in your code, just because you're able to get away with it today doesn't mean you should expect to get away with it forever.
I agree in theory. But in practice that specific caller's code bug was so widespread and Timsort broke so much existing code that it warranted offering a "-Djava.util.Arrays.useLegacyMergeSort=true" option.
The specific line from the Javadoc you're alluding to [0] is:
> The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y.
The problem is the second, implied half of that statement: pre-Timsort it was ", and if you don't, results might not be sorted in the correct order", while with Timsort it became ", and if you don't, you may get a RuntimeException."
It's way too easy to accidentally violate that condition: System.currentTimeMillis(), incorrect null handling, random numbers. Sometimes not even in the Comparator itself. The condition I posted, plus the other two:
> The implementor must also ensure that the relation is transitive: (x.compareTo(y)>0 && y.compareTo(z)>0) implies x.compareTo(z)>0.
> Finally, the implementor must ensure that x.compareTo(y)==0 implies that sgn(x.compareTo(z)) == sgn(y.compareTo(z)), for all z.
are just way too easy for someone to inadvertently violate for a RuntimeException to be the justified response when it happens.
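To make it concrete, here's a minimal, self-contained sketch (class and variable names are mine) of one classic way to break the contract: a subtraction-based comparator that overflows for values far apart. Whether the exception actually fires depends on which elements TimSort happens to compare while merging, which is exactly why it shows up once in a blue moon:

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.Random;

    public class BrokenComparatorDemo {
        public static void main(String[] args) {
            // Subtraction overflows for values far apart, so
            // sgn(compare(x, y)) != -sgn(compare(y, x)) in those cases.
            // The correct version would be Integer::compare.
            Comparator<Integer> broken = (a, b) -> a - b;

            Integer[] data = new Random(42).ints(100_000)
                                           .boxed()
                                           .toArray(Integer[]::new);

            // TimSort (Arrays.sort for objects since Java 7) may detect the
            // inconsistency mid-merge and throw IllegalArgumentException:
            // "Comparison method violates its general contract!"
            // The old merge sort would have silently produced mis-ordered
            // output instead.
            Arrays.sort(data, broken);
        }
    }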
It sounds like it didn't break code; it just revealed that a lot of code was already broken. An invalid comparison function is a serious problem, and accepting unsorted output rather than fixing the damn bug is not a good workaround.
It's good to have a contract that defines behaviour even in the case of invalid input.
That could be throwing an error, or it could be returning unsorted output. Either way is ok.
(The C and C++ answer is to leave the behaviour in the face of invalid input undefined. Undefined means that anything goes, including formatting your hard disk.)
If you supply an invalid comparison function to your sort function, which would you prefer to happen - that it crash, or that it give you unsorted output? (If you actually wanted sorted output, you should have provided a valid comparison function.)
Ideally I'd like a compile-time error — but that's beyond the current level of technology, so the next best thing is a crash. Having code that does unintended things is the worst-case scenario IMO. (Obviously it's the programmer's fault for providing buggy code, but that's a fault that every programmer shares. Personally, I appreciate any help in guarding against it.)
It is possible for a compiler to catch that particular mistake at compile time. It's also possible for the user to provide a proof to a given compiler that a comparison function is well-formed. But I think that the parent was referring to a function that would check arbitrary comparison functions in Java for correctness, without a proof, which is afaik provably impossible.
> but that's beyond the current level of technology
Is it? In Rust and other languages you would use trait bounds to restrict the input types to be comparable with each other, which would make this a compile time error.
Most of the time I want it to crash in development and give me unsorted output in production; debugging an unhandled exception is super easy compared to debugging sometimes incorrectly sorted output. In rare cases I would also want it to crash in production. I definitely don't want it to change between the two scenarios when I update my JVM.
Dynamic Ad Insertion wreaks havoc with this. Podcast eps can have different lengths and segments can have different start times depending on the ad profile of the listener.
And that's just at a single point in time: It's not uncommon for episodes to be re-edited and re-uploaded, or ads to be changed up (with different lengths) over time.
Overcast is an interesting case study on optimizing for listeners vs. creators.
The app developer, Marco Arment, is also co-host of Accidental Tech Podcast. One cool feature in Overcast is chapter markers: you can embed ID3v2 chapter frames in your RSS feed's MP3 files indicating when different segments start. This lets the listener jump between segments; it's a pretty slick experience.
Unsurprisingly, ATP uses this functionality. Advertising chapters are labelled as such, but the timecodes are always deliberately skewed so that skipping to the next chapter after an ad still gives you the last 15 seconds or so of the ad read.
I suppose this is a slightly better user experience than disabling the Skip Next button altogether on an ad segment, but it still irks me to be on the losing end of a conflict of interest.
...says the guy (me) complaining about a free podcast being played in a free podcast app.
Even as I typed my follow-up, I was thinking, "ya know, I wouldn't put it past Arment to just replicate the Now Playing UI and then season to taste." So, yeah, bad example.
> a free podcast app
Arment's attention to little details is one of the reasons I <cough> paid whatever trivial amount he wants for an app I use daily. ;-)
> The identity "Me while I'm visiting nytimes.com" is distinct from the identity "Me while visiting cnn.com".
Trying to solve this through purely technical means is futile. If you block it at the user-agent, sites will share data at the back-end to create a super-profile.
Right now it's really convenient for advertisers to run an ad auction right in the user's web browser because all the context is there -- take that away and you'll see user data aggregated on the back-end instead.
Absent some type of regulation and enforcement, I really don't see how this puts a dent in the "reads a lot of articles on NY times about dogs, sees a lot of ads on cnn.com for dog food" profile aggregation.
> If you block it at the user-agent, sites will share data at the back-end to create a super-profile
This needs a bit more technical detail. If you mean they'll combine IP + other fingerprinting, we can work on mitigation techniques there too.
> I really don't see how this puts a dent [...]
It does, as it asks sites to more explicitly install something server-side with their HTTP server instead of embedding this one-line script tag. Changing from the browser being the store of cross-site identifiers to the backend has a chance to shine more light on the practice and increase the burden of tracking. It can make a real dent.
Regulation/enforcement are orthogonal to technical solutions. There are also varying levels of support for the former vs the latter and we shouldn't mix them nor should we blindly say "regulation and enforcement" without nuance. Many, including myself, are against most regulation/enforcement approaches due to implementation incompetence (intentions notwithstanding). But regardless of that debate, it shouldn't muddy the technical debate.
> This needs a bit more technical detail. If you mean they'll combine IP + other fingerprinting, we can work on mitigation techniques there too.
Yeah, but instead of playing cat and mouse, just make it illegal and fine anyone caught violating it.
Honestly, banning tracking would end the race to the bottom and be good for publishers and consumers. It probably won't affect FB & Google because they are too big to be displaced. It may kill a bunch of middlemen, but they are leeches and should die anyway.
> just make it illegal and fine anyone caught violating it.
I agree that this should happen. Unfortunately, this requires a revolutionary political movement which has so far failed to materialise. When there isn't adequate appetite from the rest of society for the protections you want, your solution has to be a technical one.
Maybe another way to kill the middlemen is to develop technology which undermines their business model, e.g. DTube; YaCy.
Fortunately, they won't do it on the back-end. The ad industry has a massive fraud problem, and the lack of trust prevents them from accepting traffic data they haven't seen themselves.
If you really force the industry to switch to "trust me, I've seen these users, now pay me" APIs on the back-end, it'll be a massive shake-up of the entire business model.
Do people 'see' what the sites are doing, and which? Does it matter if you just prevent it from happening?
> The negative side is that you can no longer [...] block it in your browser
If they're not doing it in your browser then you don't need to block it in your browser, because they're not doing it, because they're doing it in their back-end (which is not your browser) instead of in your browser (which is).
In contrast, when I go to news.ycombinator.com, nothing is blocked. It gives me some idea of what companies respect my privacy and what companies are happy to sell my internet browsing history out to advertising networks and data brokers.
Yes, I'm blocking it as much as possible regardless, but I think it's still valuable to be aware of which sites are good actors and what sites are not. The little number on the uBO toolbar icon is a rough reminder of this.
> If they're not doing it in your browser then you don't need to block it in your browser, because they're not doing it, because they're doing it in their back-end (which is not your browser) instead of in your browser (which is).
The problem is not that the tracking is in my browser. The problem is the tracking.
If the tracking all happens server-side I have no idea what sites are tracking me and I can't do anything to prevent it. I can't even avoid it because I can't see what sites do it.
This is - from a perspective of not wanting to be tracked everywhere I go on the internet - worse than having javascript trackers on each page which my browser can choose to not run.
The original comment that you seemed to be replying to was "Right now it's really convenient for advertisers to run an ad auction right in the user's web browser because all the context is there". I thought you were saying blocking that crap in your browser didn't make a difference.
I can't see your pic because I never allow JS outside of a VM.
If tracking is enabled in a browser it becomes vastly easier for them to assign unique cookies to follow you. OK, now they can do it with ETags and browser fingerprinting - mitigating the latter is possible, I don't know about the former.
But this...
> If the tracking all happens server-side I have no idea what sites are tracking me and I can't do anything to prevent it.
...is dubious. Etags and fingerprints aside, tracking non-cooperating (cookie declining) browsers has to be harder. I agree with you about tracking being the problem though.
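For anyone wondering how the ETag variant works in principle, here's a rough, hypothetical sketch using the JDK's built-in HTTP server (everything here is illustrative, not something any particular site is known to do): the server mints an identifier, stuffs it into the ETag header, and the browser echoes it back in If-None-Match every time it revalidates its cached copy, so it behaves like a cookie you never accepted. Clearing the cache (or refusing to cache) discards it, which is roughly what mitigation would mean here.

    import com.sun.net.httpserver.HttpServer;
    import java.net.InetSocketAddress;
    import java.util.UUID;

    public class EtagTrackerSketch {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/", exchange -> {
                // On revalidation the browser sends the ETag back in
                // If-None-Match -- effectively a persistent identifier.
                String id = exchange.getRequestHeaders().getFirst("If-None-Match");
                if (id == null) {
                    id = UUID.randomUUID().toString(); // first visit: mint an ID
                }
                // (A real tracker would log `id` against this request here.)
                exchange.getResponseHeaders().set("ETag", id);
                exchange.getResponseHeaders().set("Cache-Control", "private, must-revalidate");
                byte[] body = "hello".getBytes();
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
        }
    }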
Harder on shared internet connections, for sure. But my apartment's internet connection is for the most part my own traffic, or guests who bring their phone over. Any traffic coming from that can be trivially tied to me.
I can use a VPN to hide my IP on most of my devices, except for when I'm trying to watch Netflix/Amazon/whatever. But I wish I didn't have to.
There's a big difference between your ISP knowing stuff and the river of scum that is advertising.
> But I wish I didn't have to.
One way or another you will always have to. Perhaps the most important way of destroying the ad industry online is to have an alternative means of funding sites. Maybe that would work.
Of course my ISP knows my identity, but my point is it's also probably not hard for an advertising company to get one piece of data that links my real identity to my cable modem's IP address, and then any tracking data they've previously accumulated can potentially be tied to that after the fact. They just need to tell when the IP was assigned to me, which is probably easy to infer from a sudden change in what websites an IP is visiting.
On a related note, remember that time when AT&T and Verizon were just giving out the cell numbers associated with their customers' IP addresses to whoever asked, because they're complete fucking morons who thought that was a good idea?
One thing that might counteract this is that right now, violating user's privacy is normalized. Regulation exists (GDPR), enforcement is missing. Everyone is doing it, most sites have cookie notices/consent forms that blatantly violate GDPR, and because everyone is doing it, nobody is willing or motivated to change.
Break the platform and force everyone to re-engineer, and you can start severely hitting the companies that continue violating GDPR in the "new world".
Without knowing more about the "sophisticated anti-piracy system" employed by the app author, it's hard to determine if Google's claim of malicious behaviour is valid.
This could be something as simple as bytecode obfuscation, or something as complex as scraping every bit of personal information available through possibly-questionable means and sending to an insecure server. Copy protection schemes are notoriously user-hostile.
That doesn't seem to be a problem in any other field of security in the world. If you steal a shirt from a shop you'll get told whether they have security footage of you stealing, an alarm went off or a security guard saw you.
It might be more convenient for Google to not say, but it's pretty disrespectful towards clients that choose their platform to make a living and might get caught as a false positive.
"Your app has been removed because it was found to be malware. Please reply to this message if you believe this to be a mistake." Doesn't help malware authors at all.
They apparently don't offer an easy way to contact someone if there is a mistake, or to get a human to check whether the algorithm has correctly identified malware or has flagged a legitimate app.
There is a very simple solution. Have a competent reviewer look at the code and decide whether the intent is malicious or not.
If the intent is clearly not malicious and no rules were broken, the reviewer should file a bug report to fix the virus scanner and reinstate the developer account. No further explanation required.
If rules were breached but it may have been done in good faith, issue a warning to the developer and explain in general terms how to fix the problem. Charge a review fee high enough to deter any abuse of the review system.
It scales just fine if they charge enough to discourage abuse. Also, the law should frankly require them to offer proper conflict resolution if they run one of two commercially viable app stores.
Apparently he had zero trouble with antivirus apps fixing his situation. But I guess Google is so special it can't accomplish what companies with a fraction of Google's budget or manpower can do.
Doesn't have to be taken away by Google either. They're extremely bad at processing DMCA notices, so a competitor or malicious teen can take you off the store for weeks. No recourse.
I guess you can say that. So ... maybe his investment of time and money in the android ecosystem has been taken away. And this should give pause to other such investors.
What if the reason is they were scraping all your private data against the Terms of Service, and hiding it in obfuscated dynamic bytecode loading... if that was the case, you think it's still bad that they can be kicked off the app store?
I mean, we only have the developer's word that they weren't shipping malware in these bytecode files, unless someone reverse engineers it (unlikely), or Google publicly explains the reason for the banning (which they never do).
I had a junior dev show me a neat site where you can paste in a Java thread dump and it performs an analysis. After explaining why it's a bad practice to send diagnostic details to an un-trusted third party I think he understood, but it seems like every week I'm finding people using ngrok, unauthorized password managers, grammarly, JWT parsers, Base64 encoders, and all manner of questionable tools.
I too wonder if I'm out of touch, if I'm tilting at windmills.
> I'm finding people using ngrok... and all manner of questionable tools.
At least ngrok supports end-to-end TLS tunnels[0], where you use your own TLS key/certs and the ngrok server never sees plaintext (the ngrok client is also open source, so for the truly paranoid you can examine it to ensure it isn't doing anything nefarious).
But I agree... I've seen people at a company where I used to work pasting sensitive data into a public pastebin. It still hurts my brain to think about it.
You're not out of touch. People like to trade security for convenience and while it doesn't always present an issue, it's a bad habit to get into. It's also one thing to take personal risks, it's another to put risks onto the company.
Sure, don't put arbitrary shit on the internet and know where your data is going. But every example you gave is incredibly useful to many people on a daily basis.
Still too dangerous, and I don't trust new developers to make that determination. Once you get into the habit of pasting development details into random website textboxes hosted who-knows-where with who-knows-what ad networks, you're one keystroke away from leaking sensitive details that are correlated to your employer's IP range.
Or maybe I'm a crank and need to lighten up. That's why I'm asking.
It's a reasonable thing to worry about. Bad actors exist. IP is valuable. Computers are insecure. People are lazy. You have to be careful out there.
> I don't trust new developers to make that determination.
Ignoring this issue is a sign of professional immaturity. Recommend you view it as an opportunity to educate the younger members of your team. Show them the power of a solid CLI toolbox that respects your privacy while delivering solid performance.
Still, you shouldn't be dogmatic about it. Webapp tools can be useful for understanding a new programming language or API. Just be judicious.
I agree 100%, perhaps I could have phrased that better. I try to use it as a teachable moment: "Hey, instead of using base64decode.org did you know you could use atob and btoa in a web inspector?"
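In the same spirit, if the person lives in Java all day, the JDK has had a built-in codec since Java 8, so there's never a reason to paste anything into a website. A tiny illustrative example:

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class LocalBase64 {
        public static void main(String[] args) {
            String encoded = Base64.getEncoder()
                    .encodeToString("hello, world".getBytes(StandardCharsets.UTF_8));
            String decoded = new String(Base64.getDecoder().decode(encoded),
                    StandardCharsets.UTF_8);
            System.out.println(encoded); // aGVsbG8sIHdvcmxk
            System.out.println(decoded); // hello, world
        }
    }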
Security-related scanners are a tough one though. Free XSS scanners, free TLS cert checkers: The best intentions can result in unintended disclosure. Developers have it constantly beaten into their head "Security! Security! Security!" and are often given nothing more than an OWASP cheat-sheet, so I can totally understand and empathize with the thought process that leads someone to plug a company URL into a free web-hosted XSS scanner.
Hum. I can see both cases. I would also think that even if they copy/paste sensitive information, like SSNs or passwords, it will be so diluted in the noise of other people's data that it won't matter. Most of these websites - I made one myself - are run by people like us, and we don't care about what gets sent to the server.
Google offers a data loss prevention service as part of GCP. You could use it for offensive security to find likely PII without much concern about the noise.
Google runs this service (or an internal version of it) to make sure people serving third-party ads aren't sending sensitive data. At one of my past companies, our customers would send out email marketing campaigns that contained URLs with tracking parameters with PII. We wound up having to just strip off any query parameters we didn't explicitly need, because Google kept flagging us for PII leaks caused by our customers.
So yeah, there's a lot of noise. But, people are listening out there!
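For what it's worth, the query-parameter stripping mentioned above doesn't take much code. A minimal allowlist-based sketch (the parameter names and URL below are hypothetical, not from the actual system):

    import java.net.URI;
    import java.net.URLDecoder;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.Set;
    import java.util.stream.Collectors;

    public class QueryParamAllowlist {
        // Hypothetical allowlist: only parameters we explicitly need survive.
        private static final Set<String> KEEP = Set.of("id", "campaign");

        static String strip(String url) {
            URI uri = URI.create(url);
            String query = uri.getRawQuery();
            if (query == null) return url;
            String kept = Arrays.stream(query.split("&"))
                    .filter(p -> KEEP.contains(
                            URLDecoder.decode(p.split("=", 2)[0], StandardCharsets.UTF_8)))
                    .collect(Collectors.joining("&"));
            String base = url.substring(0, url.indexOf('?'));
            return kept.isEmpty() ? base : base + "?" + kept;
        }

        public static void main(String[] args) {
            // The "email" parameter (PII) is dropped; allowlisted ones stay.
            System.out.println(strip(
                    "https://example.com/landing?id=42&email=jane%40example.com&campaign=spring"));
            // -> https://example.com/landing?id=42&campaign=spring
        }
    }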
> Among other things, that resulted in us cooperating around monitoring potential hate sites on our network and notifying law enforcement when there was content that contained an indication of potential violence.
That indicates deep inspection of traffic going through CloudFlare.
[0] https://dertompson.com/2012/11/23/sort-algorithm-changes-in-...
[1] https://bugs.openjdk.java.net/browse/JDK-8203864