And then 3 years later we have [0] and 6 years after that we have [1]. I appreciate Masters of the Universe types like Bloch and Lea contributing wickedly-efficient code, but somehow it's always mere mortals who end up mopping things up after the fact. Whether it's a bug in the actual algorithm or a "Comparison method violates its general contract!" exception that happens once in a blue moon, I think putting TimSort in Java was a mistake.
> I appreciate Masters of the Universe types like Bloch and Lea contributing wickedly-efficient code, but somehow it's always mere mortals who end up mopping things up after the fact
I think this is the wrong way to look at the problem. Progress is always iterative. If a given person happens to make a bigger iteration, then people call them 'masters of the universe', but that iteration isn't inherently different from the normal, smaller kind of iteration. And consider: if every line of code has a chance of being buggy, then a bigger change is likelier to be more buggy. I say more buggy rather than have more bugs, because bugginess is rarely disconnected; the whole conception is likely to be more flawed the newer it is. If we were to only accept small contributions, then somewhere along the way, in the limit, we would still arrive at timsort. And all the flaws of timsort wouldn't be gone, they would just be amortized into smaller chunks along the way from here to there. Which may be preferable, but that's something you have to weigh; it's not as cut-and-dried as 'if we only make changes conservatively, then we avoid catastrophic failure'.
All this not to mention that there are people who are 'masters of the universe' at fixing bugs.
> I think this is the wrong way to look at the problem. Progress is always iterative. If a given person happens to make a bigger iteration, then people call them 'masters of the universe', but that iteration isn't inherently different from the normal, smaller kind of iteration.
+1. Bloch himself iterated on work by Arthur van Hoff and others at Sun before him.
The "Comparison method violates its general contract!" exception is due to a bug in the caller's code, not in the sort implementation. If you have a bug in your code, just because you're able to get away with it today doesn't mean you should expect to get away with it forever.
I agree in theory. But in practice that specific caller's code bug was so widespread and Timsort broke so much existing code that it warranted offering a "-Djava.util.Arrays.useLegacyMergeSort=true" option.
The specific line from the Javadoc you're alluding to [0] is:
> The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y.
The problem is the second, implied half of that statement: pre-Timsort it was ", and if you don't, results might not be sorted in the correct order", while with Timsort it became ", and if you don't, you may get a RuntimeException."
It's way too easy to accidentally violate that condition: System.currentTimeMillis(), incorrect null handling, random numbers. Sometimes not even in the Comparator itself. The condition I posted, plus the other two:
> The implementor must also ensure that the relation is transitive: (x.compareTo(y)>0 && y.compareTo(z)>0) implies x.compareTo(z)>0.
> Finally, the implementor must ensure that x.compareTo(y)==0 implies that sgn(x.compareTo(z)) == sgn(y.compareTo(z)), for all z.
are just way too easy for someone to inadvertently violate for a RuntimeException to be the justified response when it happens.
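To make it concrete, here's a minimal, self-contained sketch (class and variable names are mine) of one classic way to break the contract: a subtraction-based comparator that overflows for values far apart. Whether the exception actually fires depends on which elements TimSort happens to compare while merging, which is exactly why it shows up once in a blue moon:

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.Random;

    public class BrokenComparatorDemo {
        public static void main(String[] args) {
            // Subtraction overflows for values far apart, so
            // sgn(compare(x, y)) != -sgn(compare(y, x)) in those cases.
            // The correct version would be Integer::compare.
            Comparator<Integer> broken = (a, b) -> a - b;

            Integer[] data = new Random(42).ints(100_000)
                                           .boxed()
                                           .toArray(Integer[]::new);

            // TimSort (Arrays.sort for objects since Java 7) may detect the
            // inconsistency mid-merge and throw IllegalArgumentException:
            // "Comparison method violates its general contract!"
            // The old merge sort would have silently produced mis-ordered
            // output instead.
            Arrays.sort(data, broken);
        }
    }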
It sounds like it didn't break code; it just revealed that a lot of code was already broken. An invalid comparison function is a serious problem, and accepting unsorted output rather than fixing the damn bug is not a good workaround.
It's good to have a contract that defines behaviour even in the case of invalid input.
That could be throwing an error, or it could be returning unsorted output. Either way is ok.
(The C and C++ answer is to leave the behaviour in the face of invalid input undefined. Undefined means that anything goes, including formatting your hard disk.)
If you supply an invalid comparison function to your sort function, which would you prefer to happen - that it crash, or that it give you unsorted output? (If you actually wanted sorted output, you should have provided a valid comparison function.)
Ideally I'd like a compile-time error — but that's beyond the current level of technology, so the next best thing is a crash. Having code that does unintended things is the worst-case scenario IMO. (Obviously it's the programmer's fault for providing buggy code, but that's a fault that every programmer shares. Personally, I appreciate any help in guarding against it.)
It is possible for a compiler to catch that particular mistake at compile time. It's also possible for the user to provide a proof to a given compiler that a comparison function is well-formed. But I think that the parent was referring to a function that would check arbitrary comparison functions in Java for correctness, without a proof, which is afaik provably impossible.
> but that's beyond the current level of technology
Is it? In Rust and other languages you would use trait bounds to restrict the input types to be comparable with each other, which would make this a compile time error.
Most of the time I want it to crash in development and give me unsorted output in production; debugging an unhandled exception is super easy compared to debugging sometimes incorrectly sorted output. In rare cases I would also want it to crash in production. I definitely don't want it to change between the two scenarios when I update my JVM.
Dynamic Ad Insertion wreaks havoc with this. Podcast eps can have different lengths and segments can have different start times depending on the ad profile of the listener.
And that's just at a single point in time: It's not uncommon for episodes to be re-edited and re-uploaded, or ads to be changed up (with different lengths) over time.
Overcast is an interesting case study on optimizing for listeners vs. creators.
The app developer, Marco Arment, is also co-host of Accidental Tech Podcast. One cool feature in Overcast is chapter markers: you can embed ID3v2 chapter frames in your RSS feed's MP3 files indicating when different segments start. This lets the listener jump between segments; it's a pretty slick experience.
Unsurprisingly, ATP uses this functionality. Advertising chapters are labelled as such, but the timecodes are always deliberately skewed so that skipping to the next chapter after an ad still gives you the last 15 seconds or so of the ad read.
I suppose this is a slightly better user experience than disabling the Skip Next button altogether on an ad segment, but it still irks me to be on the losing end of a conflict of interest.
...says the guy (me) complaining about a free podcast being played in a free podcast app.
Even as I typed my follow-up, I was thinking, "ya know, I wouldn't put it past Arment to just replicate the Now Playing UI and then season to taste." So, yeah, bad example.
> a free podcast app
Arment's attention to little details is one of the reasons I <cough> paid whatever trivial amount he wants for an app I use daily. ;-)
> The identity "Me while I'm visiting nytimes.com" is distinct from the identity "Me while visiting cnn.com".
Trying to solve this through purely technical means is futile. If you block it at the user-agent, sites will share data at the back-end to create a super-profile.
Right now it's really convenient for advertisers to run an ad auction right in the user's web browser because all the context is there -- take that away and you'll see user data aggregated on the back-end instead.
Absent some type of regulation and enforcement, I really don't see how this puts a dent in the "reads a lot of articles on NY times about dogs, sees a lot of ads on cnn.com for dog food" profile aggregation.
> If you block it at the user-agent, sites will share data at the back-end to create a super-profile
This needs a bit more technical detail. If you mean they'll combine IP + other fingerprinting, we can work on mitigation techniques there too.
> I really don't see how this puts a dent [...]
It does, as it asks sites to more explicitly install something server-side with their HTTP server instead of embedding this one-line script tag. Changing from the browser being the store of cross-site identifiers to the backend has a chance to shine more light on the practice and increase the burden of tracking. It can make a real dent.
Regulation/enforcement are orthogonal to technical solutions. There are also varying levels of support for the former vs the latter and we shouldn't mix them nor should we blindly say "regulation and enforcement" without nuance. Many, including myself, are against most regulation/enforcement approaches due to implementation incompetence (intentions notwithstanding). But regardless of that debate, it shouldn't muddy the technical debate.
> This needs a bit more technical detail. If you mean they'll combine IP + other fingerprinting, we can work on mitigation techniques there too.
Yeah, but instead of playing cat and mouse, just make it illegal and fine anyone caught violating it.
Honestly, banning tracking would end the race to the bottom and be good for publishers and consumers. It probably won't affect FB & Google because they are too big to be displaced. It may kill a bunch of middlemen, but they are leeches and should die anyway.
> just make it illegal and fine anyone caught violating it.
I agree that this should happen. Unfortunately, this requires a revolutionary political movement which has so far failed to materialise. When there isn't adequate appetite from the rest of society for the protections you want, your solution has to be a technical one.
Maybe another way to kill the middlemen is to develop technology which undermines their business model, e.g. DTube; YaCy.
Fortunately, they won't do it on the back-end. The ad industry has a massive fraud problem, and the lack of trust prevents them from accepting traffic data they haven't seen themselves.
If you really force the industry to switch to "trust me, I've seen these users, now pay me" APIs on the back-end, it'll be a massive shake-up of the entire business model.
Do people 'see' what the sites are doing, and which? Does it matter if you just prevent it from happening?
> The negative side is that you can no longer [...] block it in your browser
If they're not doing it in your browser then you don't need to block it in your browser, because they're not doing it, because they're doing it in their back-end (which is not your browser) instead of in your browser (which is).
In contrast, when I go to news.ycombinator.com, nothing is blocked. It gives me some idea of what companies respect my privacy and what companies are happy to sell my internet browsing history out to advertising networks and data brokers.
Yes, I'm blocking it as much as possible regardless, but I think it's still valuable to be aware of which sites are good actors and what sites are not. The little number on the uBO toolbar icon is a rough reminder of this.
> If they're not doing it in your browser then you don't need to block it in your browser, because they're not doing it, because they're doing it in their back-end (which is not your browser) instead of in your browser (which is).
The problem is not that the tracking is in my browser. The problem is the tracking.
If the tracking all happens server-side I have no idea what sites are tracking me and I can't do anything to prevent it. I can't even avoid it because I can't see what sites do it.
This is - from a perspective of not wanting to be tracked everywhere I go on the internet - worse than having javascript trackers on each page which my browser can choose to not run.
The original comment that you seemed to be replying to was "Right now it's really convenient for advertisers to run an ad auction right in the user's web browser because all the context is there". I thought you were saying blocking that crap in your browser didn't make a difference.
I can't see your pic because I never allow JS outside of a VM.
If tracking is enabled in a browser it becomes vastly easier for them to assign unique cookies to follow you. OK, now they can do it with ETags and browser fingerprinting - mitigating the latter is possible, I don't know about the former.
But this...
> If the tracking all happens server-side I have no idea what sites are tracking me and I can't do anything to prevent it.
...is dubious. Etags and fingerprints aside, tracking non-cooperating (cookie declining) browsers has to be harder. I agree with you about tracking being the problem though.
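For anyone wondering how the ETag variant works in principle, here's a rough, hypothetical sketch using the JDK's built-in HTTP server (everything here is illustrative, not something any particular site is known to do): the server mints an identifier, stuffs it into the ETag header, and the browser echoes it back in If-None-Match every time it revalidates its cached copy, so it behaves like a cookie you never accepted. Clearing the cache (or refusing to cache) discards it, which is roughly what mitigation would mean here.

    import com.sun.net.httpserver.HttpServer;
    import java.net.InetSocketAddress;
    import java.util.UUID;

    public class EtagTrackerSketch {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/", exchange -> {
                // On revalidation the browser sends the ETag back in
                // If-None-Match -- effectively a persistent identifier.
                String id = exchange.getRequestHeaders().getFirst("If-None-Match");
                if (id == null) {
                    id = UUID.randomUUID().toString(); // first visit: mint an ID
                }
                // (A real tracker would log `id` against this request here.)
                exchange.getResponseHeaders().set("ETag", id);
                exchange.getResponseHeaders().set("Cache-Control", "private, must-revalidate");
                byte[] body = "hello".getBytes();
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
        }
    }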
Harder on shared internet connections, for sure. But my apartment's internet connection is for the most part my own traffic, or guests who bring their phone over. Any traffic coming from that can be trivially tied to me.
I can use a VPN to hide my IP on most of my devices, except for when I'm trying to watch Netflix/Amazon/whatever. But I wish I didn't have to.
There's a big difference between your ISP knowing stuff and the river of scum that is advertising.
> But I wish I didn't have to.
One way or another you will always have to. Perhaps the most important way of destroying the ad industry online is to have an alternative means of funding sites. Maybe that would work.
Of course my ISP knows my identity, but my point is it's also probably not hard for an advertising company to get one piece of data that links my real identity to my cable modem's IP address, and then any tracking data they've previously accumulated can potentially be tied to that after the fact. They just need to tell when the IP was assigned to me, which is probably easy to infer from a sudden change in what websites an IP is visiting.
On a related note, remember that time when AT&T and Verizon were just giving out the cell numbers associated with their customers' IP addresses to whoever asked, because they're complete fucking morons who thought that was a good idea?
One thing that might counteract this is that right now, violating user's privacy is normalized. Regulation exists (GDPR), enforcement is missing. Everyone is doing it, most sites have cookie notices/consent forms that blatantly violate GDPR, and because everyone is doing it, nobody is willing or motivated to change.
Break the platform and force everyone to re-engineer, and you can start severely hitting the companies that continue violating GDPR in the "new world".
Without knowing more about the "sophisticated anti-piracy system" employed by the app author, it's hard to determine if Google's claim of malicious behaviour is valid.
This could be something as simple as bytecode obfuscation, or something as complex as scraping every bit of personal information available through possibly-questionable means and sending to an insecure server. Copy protection schemes are notoriously user-hostile.
That doesn't seem to be a problem in any other field of security in the world. If you steal a shirt from a shop you'll get told whether they have security footage of you stealing, an alarm went off or a security guard saw you.
It might be more convenient for Google to not say, but it's pretty disrespectful towards clients that choose their platform to make a living and might get caught as a false positive.
"Your app has been removed because it was found to be malware. Please reply to this message if you believe this to be a mistake." Doesn't help malware authors at all.
They apparently don't offer an easy way to contact someone if there is a mistake, or to get a human to check whether the algorithm has correctly identified malware or has flagged a legitimate app.
There is a very simple solution. Have a competent reviewer look at the code and decide whether the intent is malicious or not.
If the intent is clearly not malicious and no rules were broken, the reviewer should file a bug report to fix the virus scanner and reinstate the developer account. No further explanation required.
If rules were breached but it may have been done in good faith, issue a warning to the developer and explain in general terms how to fix the problem. Charge a review fee high enough to deter any abuse of the review system.
It scales just fine if they charge enough to discourage abuse. Also, the law should frankly require them to offer proper conflict resolution if they run one of two commercially viable app stores.
Apparently he had zero trouble with antivirus apps fixing his situation. But I guess Google is so special it can't accomplish what companies with a fraction of Google's budget or manpower can do.
Doesn't have to be taken away by Google either. They're extremely bad at processing DMCA notices, so a competitor or malicious teen can take you off the store for weeks. No recourse.
I guess you can say that. So ... maybe his investment of time and money in the android ecosystem has been taken away. And this should give pause to other such investors.
What if the reason is they were scraping all your private data against the Terms of Service, and hiding it in obfuscated dynamic bytecode loading... if that was the case, you think it's still bad that they can be kicked off the app store?
I mean, we only have the developer's word that they weren't shipping malware in these bytecode files, unless someone reverse engineers it (unlikely), or Google publicly explains the reason for the banning (which they never do).
I had a junior dev show me a neat site where you can paste in a Java thread dump and it performs an analysis. After explaining why it's a bad practice to send diagnostic details to an un-trusted third party I think he understood, but it seems like every week I'm finding people using ngrok, unauthorized password managers, grammarly, JWT parsers, Base64 encoders, and all manner of questionable tools.
I too wonder if I'm out of touch, if I'm tilting at windmills.
> I'm finding people using ngrok... and all manner of questionable tools.
At least ngrok supports end-to-end TLS tunnels[0], where you use your own TLS key/certs and the ngrok server never sees plaintext (the ngrok client is also open source, so for the truly paranoid you can examine it to ensure it isn't doing anything nefarious).
But I agree... I've seen people at a company where I used to work pasting sensitive data into a public pastebin. It still hurts my brain to think about it.
You're not out of touch. People like to trade security for convenience and while it doesn't always present an issue, it's a bad habit to get into. It's also one thing to take personal risks, it's another to put risks onto the company.
Sure, don't put arbitrary shit on the internet and know where your data is going. But every example you gave is incredibly useful to many people on a daily basis.
Still too dangerous, and I don't trust new developers to make that determination. Once you get into the habit of pasting development details into random website textboxes hosted who-knows-where with who-knows-what ad networks, you're one keystroke away from leaking sensitive details that are correlated to your employer's IP range.
Or maybe I'm a crank and need to lighten up. That's why I'm asking.
It's a reasonable thing to worry about. Bad actors exist. IP is valuable. Computers are insecure. People are lazy. You have to be careful out there.
> I don't trust new developers to make that determination.
Ignoring this issue is a sign of professional immaturity. Recommend you view it as an opportunity to educate the younger members of your team. Show them the power of a solid CLI toolbox that respects your privacy while delivering solid performance.
Still, you shouldn't be dogmatic about it. Webapp tools can be useful for understanding a new programming language or API. Just be judicious.
I agree 100%, perhaps I could have phrased that better. I try to use it as a teachable moment: "Hey, instead of using base64decode.org did you know you could use atob and btoa in a web inspector?"
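In the same spirit, if the person lives in Java all day, the JDK has had a built-in codec since Java 8, so there's never a reason to paste anything into a website. A tiny illustrative example:

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class LocalBase64 {
        public static void main(String[] args) {
            String encoded = Base64.getEncoder()
                    .encodeToString("hello, world".getBytes(StandardCharsets.UTF_8));
            String decoded = new String(Base64.getDecoder().decode(encoded),
                    StandardCharsets.UTF_8);
            System.out.println(encoded); // aGVsbG8sIHdvcmxk
            System.out.println(decoded); // hello, world
        }
    }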
Security-related scanners are a tough one though. Free XSS scanners, free TLS cert checkers: The best intentions can result in unintended disclosure. Developers have it constantly beaten into their head "Security! Security! Security!" and are often given nothing more than an OWASP cheat-sheet, so I can totally understand and empathize with the thought process that leads someone to plug a company URL into a free web-hosted XSS scanner.
Hum. I can see both cases. I would also think that even if they copy/paste sensitive information, like SSNs or passwords, it will be so diluted in the noise of other people's data that it won't matter. Most of these websites - I made one myself - are run by people like us, and we don't care about what gets sent to the server.
Google offers a data loss prevention service as part of GCP. You could use it for offensive security to find likely PII without much concern about the noise.
Google runs this service (or an internal version of it) to make sure people serving third-party ads aren't sending sensitive data. At one of my past companies, our customers would send out email marketing campaigns that contained URLs with tracking parameters with PII. We wound up having to just strip off any query parameters we didn't explicitly need, because Google kept flagging us for PII leaks caused by our customers.
So yeah, there's a lot of noise. But, people are listening out there!
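For what it's worth, the query-parameter stripping mentioned above doesn't take much code. A minimal allowlist-based sketch (the parameter names and URL below are hypothetical, not from the actual system):

    import java.net.URI;
    import java.net.URLDecoder;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.Set;
    import java.util.stream.Collectors;

    public class QueryParamAllowlist {
        // Hypothetical allowlist: only parameters we explicitly need survive.
        private static final Set<String> KEEP = Set.of("id", "campaign");

        static String strip(String url) {
            URI uri = URI.create(url);
            String query = uri.getRawQuery();
            if (query == null) return url;
            String kept = Arrays.stream(query.split("&"))
                    .filter(p -> KEEP.contains(
                            URLDecoder.decode(p.split("=", 2)[0], StandardCharsets.UTF_8)))
                    .collect(Collectors.joining("&"));
            String base = url.substring(0, url.indexOf('?'));
            return kept.isEmpty() ? base : base + "?" + kept;
        }

        public static void main(String[] args) {
            // The "email" parameter (PII) is dropped; allowlisted ones stay.
            System.out.println(strip(
                    "https://example.com/landing?id=42&email=jane%40example.com&campaign=spring"));
            // -> https://example.com/landing?id=42&campaign=spring
        }
    }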
> Among other things, that resulted in us cooperating around monitoring potential hate sites on our network and notifying law enforcement when there was content that contained an indication of potential violence.
That indicates deep inspection of traffic going through CloudFlare.
[0] https://dertompson.com/2012/11/23/sort-algorithm-changes-in-...
[1] https://bugs.openjdk.java.net/browse/JDK-8203864