I think that's a mixed blessing. I believe Java did this deliberately to avoid the trouble that C and C++ have with signed and unsigned integer types having to coexist. Personally I've never been inconvenienced by Java's lack of unsigned integer types, but I'm sure it can be annoying in some situations.
I'm quite fond of Ada's approach to integer types, but I suspect I'm in a minority.
> Silent integer overflow/wrap-around. It's not C- did it really have to copy this insanity?
Curiously this cropped up 10 days ago. [0] You're not alone. The great John Regehr put it thus: [1]
> Java-style wrapping integers should never be the default, this is arguably even worse than C and C++’s UB-on-overflow which at least permits an implementation to trap.
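The wrapping behaviour under discussion is easy to demonstrate; a minimal sketch (class name is just for illustration):

```java
public class WrapDemo {
    public static void main(String[] args) {
        int big = Integer.MAX_VALUE;        // 2147483647
        System.out.println(big + 1);        // -2147483648: silently wraps, no error
        System.out.println((long) big + 1); // 2147483648: widening first avoids the wrap
    }
}
```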
> The fact that arrays got type variance wrong.
At least Java has the defence that they didn't know how it would pan out. C# has no such excuse in copying Java.
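For anyone who hasn't hit it: because Java arrays are covariant, the type error the compiler can't catch is deferred to a runtime check. A small sketch:

```java
public class CovarianceDemo {
    public static void main(String[] args) {
        Object[] objs = new String[1]; // compiles fine: arrays are covariant
        try {
            objs[0] = Integer.valueOf(42); // runtime check fires instead of a compile error
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException at runtime");
        }
    }
}
```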
> No concept of `const` or immutability.
I recall a Java wizard commenting that although a const system is the sort of feature that aligns with Java's philosophy, it's just too difficult to retrofit.
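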
> For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.
Since Java 8, though, the standard library has had static helper methods for unsigned arithmetic on the Integer and Long classes.
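For reference, those helpers reinterpret the signed bit pattern rather than adding a new type. A quick sketch:

```java
public class UnsignedDemo {
    public static void main(String[] args) {
        int x = -1; // bit pattern 0xFFFFFFFF, i.e. 2^32 - 1 when read as unsigned
        System.out.println(Integer.toUnsignedString(x));       // 4294967295
        System.out.println(Integer.toUnsignedLong(x));         // 4294967295
        System.out.println(Integer.divideUnsigned(x, 2));      // 2147483647, not 0
        System.out.println(Integer.compareUnsigned(x, 1) > 0); // true: 4294967295 > 1
    }
}
```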
I don't know about Ada, but I enjoy Rust's strictness when it comes to numeric types.
> Java-style wrapping integers should never be the default, this is arguably even worse than C and C++’s UB-on-overflow which at least permits an implementation to trap.
EXACTLY. It's f-ing stupid. C's excuse was compilers doing magic on UB or whatever. Java has no such excuse. They just wanted it to behave the same as C/C++ to attract C++ devs.
> At least Java has the defence that they didn't know how it would pan out. C# has no such excuse in copying Java.
My understanding was that they DID know it was wrong and chose to do it anyway because it was more convenient and ergonomic to allow it that way. I guess they realized that was a terrible idea, because the generic collection interfaces do it correctly.
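The contrast shows up the same way in Java once generics arrived: generic types are invariant by default, and covariance has to be requested explicitly with a wildcard. A sketch (the commented-out lines would not compile):

```java
import java.util.ArrayList;
import java.util.List;

public class VarianceDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        // List<Object> objects = strings;        // compile error: generics are invariant
        List<? extends Object> readOnly = strings; // explicit, read-only covariance
        // readOnly.add(Integer.valueOf(42));     // compile error: cannot write through it
        System.out.println(readOnly.size());
    }
}
```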
I don't see how const and immutability align with Java's original philosophy of being object-oriented, which is all about opaque objects that control internal mutable state. The very fact that it's taken until now to have records is proof-positive that "everything is an object" was taken pretty literally for most of its life. Immutable data doesn't really jibe with that.
> I don't see how const and immutability align with Java's original philosophy of being object-oriented, which is all about opaque objects that control internal mutable state.
That's an interesting point, but an object presents an interface and promises to deliver some particular behaviour. A const system is a way of letting the type-system formalise some of an object's promises, no?
I don't think this is particularly 'leaky' (in the sense of leaky abstractions). Java's String class doesn't let me access its internal character array, but it still matters to me that it promises never to mutate it, nor to let anyone else mutate it (at least ignoring reflection). That's relevant at the level of the interface, not only at the level of the implementation.
I get what you're saying and I don't really disagree with you. An object's methods are an interface and its method signatures are a contract about what "messages" (in Alan Kay parlance) it will accept and return.
A C++ style const system would seem to be compatible with that.
And, in every practical sense, I would love such a thing existing in Java. I don't give a crap about whatever "OOP philosophy" and purity, even if my statement were correct/true.
However, (and this is just navel-gazing, honestly), adding const to object methods is exposing information about its internal state. That's not very "objecty" in the Alan-Kay-ish, Actor-model-ish sense. An object's internal state is "none of your business."
> Java's String class doesn't let me access its internal character array, but it still matters to me that it promises never to mutate it, nor to let anyone else mutate it (at least ignoring reflection).
I feel like this is a little different. Strings in Java are technically a class, but they're really treated like primitives (evidenced by the fact that literals are magically made into String objects).
But, it doesn't really matter. I agree. It's great that String promises to be immutable.
I'd argue that immutable class instances aren't really "objects" anymore- they're just (possibly opaque) data types.
> An object's internal state is "none of your business."
An object's state is my business, as immutable objects can be used in ways that mutable ones cannot. They can be passed to arbitrary functions with no need for defensive copying. They can also be useful in concurrent programming. None of that means breaching the separation of interface and implementation.
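To make the defensive-copying point concrete, here's a minimal sketch contrasting a mutable field type (Date) with an immutable one (Instant); the class names are made up for illustration:

```java
import java.time.Instant;
import java.util.Date;

// With a mutable field type, correctness requires copies on the way in and
// out, or callers could mutate the object's internal state behind its back:
final class MutableBacked {
    private final Date start;
    MutableBacked(Date start) { this.start = new Date(start.getTime()); }
    Date getStart() { return new Date(start.getTime()); }
}

// With an immutable field type, references can be shared freely:
final class ImmutableBacked {
    private final Instant start;
    ImmutableBacked(Instant start) { this.start = start; }
    Instant getStart() { return start; }
}

public class DefensiveCopyDemo {
    public static void main(String[] args) {
        Date d = new Date(0);
        MutableBacked m = new MutableBacked(d);
        d.setTime(999); // caller mutates its own Date...
        System.out.println(m.getStart().getTime()); // ...object is unaffected: 0
    }
}
```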
> Strings in Java are technically a class, but they're really treated like primitives (evidenced by the fact that literals are magically made into String objects).
Immutable objects can generally be treated as values, that's their charm. There's a good talk on this topic, The Value of Values. [0]
> immutable class instances aren't really "objects" anymore- they're just (possibly opaque) data types
They're certainly still objects. The essence of object-orientation is in dynamic dispatch, not in stateful programming.
> An object's state is my business, as immutable objects can be used in ways that mutable ones cannot. They can be passed to arbitrary functions with no need for defensive copying. They can also be useful in concurrent programming. None of that means breaching the separation of interface and implementation.
I'm not advocating for object oriented programming. What I'm saying is that if you "buy in" to the actual, abstract, concept of object oriented programming, then the internal structure or state of the object you're communicating with is, by definition, out of your control. Of course, in practice, you know that sending a "+ 3" message to the object "Integer(2)" is always going to return the same result, but you have no idea if the Integer(2) object you're talking to is logging, writing to a database, tweeting, or anything else. And in "true" OOP, you're not supposed to know- you just take your Integer(5) response message and go on your way. When I say "true OOP" I'm thinking about something like Smalltalk or an Actor framework/language.
I'm not talking about anything practical here. Just the "pure" concepts. Obviously, Java has made pragmatic choices to allow escape hatches from "true" OOP in a few places: unboxed primitives, static methods, and a handful of other things, probably.
So it's just very un-Smalltalk-like for an object's API/protocol/contract to make any kind of reference or promise about its internal state at all. That is implementation in a pure OO sense.
> if you "buy in" to the actual, abstract, concept of object oriented programming, then the internal structure or state of the object you're communicating with is, by definition, out of your control
That's not specific to OOP though, it's a very general concept in programming.
A program is generally decomposed into smaller units which make some promise about how they will behave, hiding their internal workings from the programmer who makes use of them. This is just as true for C/Forth/Haskell as for Python/Java/Smalltalk, depending on how a program is designed.
> you have no idea if the Integer(2) object you're talking to is logging, writing to a database, tweeting, or anything else. And in "true" OOP, you're not supposed to know- you just take your Integer(5) response message and go on your way
Right, you're meant to interact with an object in such a way that you rely only on the documented behaviour that the object promises to provide, you aren't meant to rely on knowledge of its internals. Objects are also a good way of cleanly separating concerns, and then composing the solutions.
On further thought I got it wrong earlier. You're right that internal state isn't my business, but immutability isn't about internal state.
Whether String (or some other class) is mutable or not isn't an implementation detail, it's an important property of the public interface offered by the class, and it's only a property of the public interface. I don't care whether my JVM implements String in Java or in assembly code, neither do I care if it's immutable internally, but I do care that the implementation satisfies the advertised behaviour of the class, and String promises to be (that is, to appear) immutable.
The internal implementation is required to meet the constraints imposed by the class's public interface, and in the case of String, those constraints include that the class must appear immutable to the user, even under concurrent workloads. In principle the implementation is permitted to have mutable internal state, provided the object always appears immutable to the user.
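String is actually a live example of this: in OpenJDK's implementation it lazily caches its hash code in a mutable internal field, yet remains observably immutable. A sketch:

```java
public class HashCacheDemo {
    public static void main(String[] args) {
        // Built from a char[] so no precomputed literal is involved:
        String s = new String(new char[] { 'h', 'i' });
        int first = s.hashCode();  // computed (and, in OpenJDK, cached internally)
        int second = s.hashCode(); // observably identical either way
        System.out.println(first == second); // true
    }
}
```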
Similarly, whether a class is thread-safe, is a public-facing attribute of the class. The class can implement thread-safety any way it wants.
> EXACTLY. It's f-ing stupid. C's excuse was compilers doing magic on UB or whatever. Java has no such excuse. They just wanted it to behave the same as C/C++ to attract C++ devs.
But... as you yourself are saying, Java's behavior is not "the same as C/C++". Java wraps while in C and C++ signed overflow is undefined. (Interestingly, C++20 now mandates two's-complement representation, even though signed-overflow arithmetic itself remains UB for the moment. While I'm not one for proof by authority, it looks like some very well-informed people disagree with you about the usefulness of this feature.)
Signed integer overflow checking can be almost free. Until it isn't, because it doesn't play nicely with SIMD code. So the code you want to run fastest will pay the biggest price. This article is from 2016 so take it with a grain of salt, but it looks like this can cause 20% to 40% slowdowns: https://blog.regehr.org/archives/1384
I understand that there are performance implications.
But a 20% to 40% slowdown for number crunching in a language that is primarily designed for writing super indirection-heavy, heap-allocation-heavy, application architectures is just nothing.
Having some kind of high performance math section of the standard library would be fine. But the default behavior is, frankly, dangerous. And for a 20% speed up on operations that are probably far less than 1% of the typical Java application?
> a language that is primarily designed for writing super indirection-heavy, heap-allocation-heavy, application architectures
Are there Java design documents that describe the language in these terms, as opposed to something like "a general-purpose object oriented language"?
> But the default behavior is, frankly, dangerous.
You keep saying variations of this, but you haven't really made the case.
True, if you increment a number, you will typically expect the result to be greater. But how many application domains are there where 2^31 - 1 is really the exact upper limit of the range of valid values? I would think that in most cases catching an overflow would come much too late, because the actual error is exceeding some application-specific limit rather than the artificial limit of the range of int. Or put differently, I bet 99.9% of ArrayIndexOutOfBounds errors are because indices leave their legal range without ever overflowing int.
> Are there Java design documents that describe the language in these terms, as opposed to something like "a general-purpose object oriented language"?
I'm sure there aren't. And truthfully, I understand that Java was supposed to be efficient enough to run on small devices and whatnot. But if you look at the evolution of the language as well as where it's mostly used in recent history (no more web browser applets, for example), it seems to me that it has a bit of an identity crisis. Is it the low level implementation language of the JVM platform, or is it a high level app development language?
> True, if you increment a number, you will typically expect the result to be greater.
I'd say this is a pretty big deal for people who are reasoning about code.
> But how many application domains are there where 2^31 - 1 is really the exact upper limit of the range of valid values? I would think that in most cases catching an overflow would come much too late, because the actual error is exceeding some application-specific limit rather than the artificial limit of the range of int.
Agreed. But I've almost never seen code that actually checks value ranges before and after math operations. And Java doesn't make it easy or efficient to do a "newtype" pattern so that the types actually are limited in any meaningful way.
Instead, most enterprisey backend systems I've worked on just accept a JSON request, deserialize some `quantity` fields to an `int`, and go to town with it.
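For what it's worth, records have made a newtype-ish pattern somewhat less painful (Java 16+). A hypothetical range-checked wrapper, where the 0..10_000 bound is an arbitrary application-specific limit I made up:

```java
// "Newtype"-style wrapper: the invariant is enforced at construction,
// so an out-of-range quantity can never reach the arithmetic.
record Quantity(int value) {
    Quantity {
        if (value < 0 || value > 10_000)
            throw new IllegalArgumentException("quantity out of range: " + value);
    }
}

public class NewtypeDemo {
    public static void main(String[] args) {
        Quantity ok = new Quantity(5);
        System.out.println(ok.value()); // 5
        try {
            new Quantity(-1); // rejected here, not deep in some calculation
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

It's still boxing per value unless the JIT scalarizes it, which is part of why people don't bother.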
> Or put differently, I bet 99.9% of ArrayIndexOutOfBounds errors are because indices leave their legal range without ever overflowing int.
I'm sure that's true. But indexing a collection is not the issue I was thinking about.
> I believe Java did this deliberately to avoid the trouble that C and C++ have with signed and unsigned integer types having to coexist.
The problems really only come from mixing those types, and the simple solution is to disallow such mixing without explicit casts in cases where the result type is not wide enough to represent all possible values - this is exactly what C# does.
I think Java designers just assumed that high-level code doesn't need those, and low-level code can use wrappers that work on signed types as if they were unsigned (esp. since with wraparound, many common operations are the same).
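Java does draw a similar line for its primitives, for what it's worth: widening conversions are implicit, narrowing ones require an explicit cast (it just has no unsigned types to mix). A sketch, with the non-compiling line commented out:

```java
public class CastDemo {
    public static void main(String[] args) {
        byte b = 10;
        int i = b;                   // implicit widening: every byte value fits in an int
        // byte bad = i;             // compile error: narrowing needs an explicit cast
        byte back = (byte) i;        // explicit cast: programmer accepts possible truncation
        byte truncated = (byte) 300; // 300 doesn't fit in a byte: becomes 44
        System.out.println(back + " " + truncated);
    }
}
```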
> Java-style wrapping integers should never be the default
The ironic thing about this one is that C# introduced "checked" and "unchecked" specifically to control this... and then defaulted to "unchecked", so most C# code out there assumes the same. Opportunity lost.
While we're on the subject of numeric types - the other mistake, IMO, is pushing binary floating point numbers as the default representation for reals. It makes sense perf-wise, sure - but humans think in decimal, and the mismatch with binary floats sometimes translates into very expensive bugs. At the very least, a modern high-level language should offer decimal floating-point types that are at least as easy to use as binary floating-point (e.g. first-class literals, overloaded operators etc).
C# almost got it right with "decimal"... except that fractional literals still default to "double", so you need to slap the "M" suffix everywhere. It really ought to be the other way around - slower but safer choice by default, and opt into fast binary floating-point where you actually need perf.
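In Java terms the mismatch looks like this, with BigDecimal as the nearest equivalent of C#'s decimal (minus the literal and operator support):

```java
import java.math.BigDecimal;

public class DecimalDemo {
    public static void main(String[] args) {
        System.out.println(0.1 + 0.2); // 0.30000000000000004: binary floating point
        // Note the String constructor: new BigDecimal(0.1) would inherit
        // double's rounding error before BigDecimal ever sees the value.
        System.out.println(new BigDecimal("0.1").add(new BigDecimal("0.2"))); // 0.3
    }
}
```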
> At least Java has the defence that they didn't know how it would pan out. C# has no such excuse in copying Java.
I think both Java and C# did it as an attempt to offer some generic data structure that could cover as many use cases as possible, since neither had user-defined generic types. In retrospect, it was an error - but before true generics became a thing, it was also a godsend in some cases.
My opinion is that a high-level language like Java has no business making me guess how many bytes my numeric values will occupy. It's insane. Since when does Java give a crap about memory space? "Allocations are cheap!" they said. "Computers are fast!" they said about indirection costs. Then they stopped and asked me if I want my number to occupy 1, 2, 4 or 8 bytes? Are you kidding me?
Yes, you should have those types available so that your Java code can interact with a SQL database, or do some low-ish level network crap, or FFI with C or something. But the default should basically be a smart version of BigInteger that maybe the JVM and/or compiler could guesstimate the size of or optimize while running.
Thus, IMO, there should be a handful of numeric types that are strict in behavior and do not willy-nilly cast back and forth. Ideally you'd have Integer, UInteger, PositiveInteger, and a similar suite for Decimal types.
Scheme has done numbers correctly since basically forever.
> the default should basically be a smart version of BigInteger that maybe the JVM and/or compiler could guesstimate the size of or optimize while running.
I suspect this would be disastrous for performance. I believe Haskell uses a similar approach though.
Sometimes you want to store 20 million very small values in an array. Forcing use of bigint would preclude doing this efficiently (in the absence of very smart compiler optimisations that is).
As int_19h points out, the Ada approach lets us escape the low-level world of int8/int16/int32/int64 while retaining efficiency and portability and avoiding use of bigint.
> there should be a handful of numeric types that are strict in behavior and do not willy-nilly cast back and forth
I agree that reducing the number of implicit conversions allowed in a language is generally a good move. Preventing bugs is typically far more valuable than improving writeability. This is another thing Ada gets right.
> I suspect this would be disastrous for performance. I believe Haskell uses a similar approach though.
>
> Sometimes you want to store 20 million very small values in an array. Forcing use of bigint would preclude doing this efficiently (in the absence of very smart compiler optimisations that is).
I suspect that it would. I also suspect that I don't care. :p
We're talking about Java. Yes, you can write high-performance Java and I wouldn't want to take that option away. But look at the "default" Java application. You have giant graphs of object instances- all heap allocated, with tons of pointer chasing. You have collections (not arrays) that we don't have to guess the maximum size of.
If you're storing 20 million small values in an array, then go ahead and use byte[] or whatever. But that should be in some kind of high performance package in the standard library. The "standard" Integer type should err toward correctness over performance- the very same reason Java decided to be "C++ with garbage collection".
I'm also not literally talking about the BigInteger class as it's written today. I'm talking about a hypothetical Java that exists in a parallel universe where the built-in Integer type is just arbitrarily large. It could start with a default size of 4 or 8 bytes, since that is a sane default. Maybe the compiler would have some analysis that sees the number could never actually be large enough to need 4 bytes and just compile it to a short or byte. These things should be immutable anyway, so maybe the plus operator can detect overflow (or better if the JVM could do some kind of lower-level exception mechanism so the happy path is optimized) and upsize the returned value size. Remember, integer overflow doesn't actually happen very often- that's exactly the reason people don't typically complain about it or ever notice it (except me ;)), so it's okay if the JVM burps for a few microseconds on each overflow.
All this doesn't matter because it'll never, ever, actually happen. I just think they made the wrong call and it has unfortunately led to lots of real world bugs. It's hard to write correct, robust software in Java.
I suspect the performance penalty would be so severe it might undermine the appeal of Java. I don't have hard numbers on this though, perhaps optimising compilers can tame it somewhat. Presumably Haskell does.
A more realistic change might be to have Java default to throwing on overflow. The Math.addExact family of methods can give this behaviour in Java. In C# it's much more ergonomic: you just use the checked keyword in your source, or else configure the compiler to default to checked arithmetic (i.e. throw-on-overflow). This almost certainly brings a performance penalty though.
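Side by side, the default versus the opt-in checked behaviour:

```java
public class CheckedDemo {
    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        System.out.println(max + 1); // -2147483648: the default silently wraps
        try {
            Math.addExact(max, 1);   // opt-in checked arithmetic (Java 8+)
        } catch (ArithmeticException e) {
            System.out.println("overflow trapped: " + e.getMessage());
        }
    }
}
```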
Yeah, I don't have any real intuition about the performance cost, either. But real-world Haskell programs do fine, as you said. And Haskell has fast-math libraries that, presumably, give you the fast-but-risky C arithmetic.
I also agree that a "more realistic" option is to just throw on overflow by default, the same way we throw on divide-by-zero.
OP mentioned "Ada's approach to types", as well. Ada lets you write stuff like "type T is range 1 .. 20" or "type T is digits 18 range -1.0 .. 1.0". This then gets mapped to the appropriate hardware integer or floating-point type.
Yeah, I've read little snippets like that from blog posts and stuff, but I've never written a single line of Ada, so I really don't know how that works out in practice.
What happens if you overflow at runtime? A crash, I assume/hope?
My point of view is that this is the opposite of what I'm talking about anyway. Java is a high level language where we are usually writing in Java because we're agreeing to give up a lot of raw performance (heap allocations, tons of pointer chasing) in order to have convenient models (objects) and not have to worry about memory management, etc.
In light of the above, I don't see why the default for Java is to have these really nitty-gritty numeric types. I don't want to guess how big a number can be before launching my cool new product. Just like I don't use raw arrays in Java and have to guess their max size- I just use List<> and it will grow forever.
> What happens if you overflow at runtime? A crash, I assume/hope?
In Ada, if range constraints are broken at runtime, a Constraint_Error is raised (or 'thrown', if you prefer). [0] (That's assuming of course that range checks haven't been disabled, which is an option that Ada compilers offer you.)
> I don't see why the default for Java is to have these really nitty-gritty numeric types
At the risk of retreading our earlier discussion:
I think the short answer is performance. Java has lofty goals of abstraction, yes, but it also aims to be pretty fast. If it didn't, its appeal would diminish considerably, so it's reasonable that they struck a balance like this. Same goes for why primitives aren't objects.
It depends on the base type - you can get the traditional unsigned integer wraparound behavior, too. But Ada is very explicit about this, to the point of referring to them as "modular types", and defining them using the mod keyword instead of range.
Think of range of permissible values as a contract. I agree that the default should be "no limit", but there are many cases where you do, in fact, want to limit it, that have nothing to do with performance per se - but if the language has direct support for this, then it can also use the contract to determine the most optimal representation.
[0] https://news.ycombinator.com/item?id=26538842
[1] https://blog.regehr.org/archives/1401