FWIW, we use BLAS just like MATLAB, NumPy, R, and Julia.
The speed depends on the BLAS implementation.
Nd4j has a concept of "backends". Backends let us swap in different BLAS implementations as well as different ways of executing operations.
We have a data buffer type that lets people pick single or double precision. Those data buffers then have an allocation type that can be JavaCPP pointers, NIO byte buffers (direct/off-heap), or normal Java arrays.
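From user code that looks roughly like the sketch below. It's a minimal example, and the exact data-type setter has moved around between Nd4j releases, so treat that method name as approximate:

    import org.nd4j.linalg.api.buffer.DataBuffer;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    public class PrecisionExample {
        public static void main(String[] args) {
            // Ask the data buffer layer for double precision instead of the
            // default float. (Assumed setter; the exact call has varied
            // between Nd4j releases.)
            Nd4j.setDataType(DataBuffer.Type.DOUBLE);

            // The INDArray API stays the same no matter which backend or
            // allocation type sits underneath.
            INDArray a = Nd4j.create(new double[]{1, 2, 3, 4}, new int[]{2, 2});
            INDArray b = Nd4j.ones(2, 2);
            System.out.println(a.mmul(b));
        }
    }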
We are also currently working on surpassing the JVM's memory limits ourselves via JavaCPP's pointers. That gives us 64-bit addressing, which people normally only have access to in C++.
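For example, JavaCPP's pointer classes take long capacities and indices, so an allocation like the one below lives off-heap and isn't bounded by -Xmx or by int-sized array indexing. This is just a sketch of the idea, not Nd4j's internal allocation code:

    import org.bytedeco.javacpp.DoublePointer;

    public class OffHeapExample {
        public static void main(String[] args) {
            // Allocate 1 billion doubles (~8 GB) off-heap. Capacity and
            // indices are longs, so this isn't limited to int-sized Java
            // arrays, and the GC never moves it.
            long n = 1_000_000_000L;
            DoublePointer buf = new DoublePointer(n);
            buf.put(n - 1, 42.0);             // 64-bit indexing
            System.out.println(buf.get(n - 1));
            buf.deallocate();                 // freed explicitly, not by the GC
        }
    }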
Every current JVM matrix lib that uses netlib-java or jblas is also going to have problems with JVM-to-native communication. The reason is that passing around Java arrays and byte buffers is slower than how we handle it: we pass around longs (raw pointer addresses) that are addressed via Unsafe or allocated in JNI, where we retain the pointer address directly. We expose that to our native operations.
We are solving this by writing our own C++ backend called libnd4j, which supports CUDA as well as normal OpenMP-optimized for loops for computation.
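To make the pointer-passing idea concrete, here's a self-contained sketch using sun.misc.Unsafe. The execOp native method is hypothetical and just stands in for a native op binding; it is not the actual libnd4j JNI surface:

    import sun.misc.Unsafe;
    import java.lang.reflect.Field;

    public class RawPointerSketch {
        // Hypothetical JNI binding: the native side only ever sees 64-bit
        // addresses, so no Java array or ByteBuffer crosses the boundary.
        static native void execOp(int opCode, long xAddress, long zAddress, long length);

        static Unsafe getUnsafe() throws Exception {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        }

        public static void main(String[] args) throws Exception {
            Unsafe u = getUnsafe();
            long n = 1024;
            long x = u.allocateMemory(n * 8);        // off-heap buffer of doubles
            long z = u.allocateMemory(n * 8);
            for (long i = 0; i < n; i++)
                u.putDouble(x + i * 8, (double) i);  // write through the raw address
            // execOp(0, x, z, n);  // a native op would read/write via the same addresses
            u.freeMemory(x);
            u.freeMemory(z);
        }
    }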
We also offer a unified interface to cuBLAS and CBLAS (the latter implemented by OpenBLAS as well as MKL).
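In practice that means the same call hits whichever BLAS the backend wires up. A minimal sketch (the artifact names in the comment are examples):

    import java.util.Arrays;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    public class GemmDispatch {
        public static void main(String[] args) {
            INDArray a = Nd4j.rand(512, 512);
            INDArray b = Nd4j.rand(512, 512);

            // mmul lowers to a single gemm call. Whether that gemm is served
            // by OpenBLAS/MKL (through CBLAS) or by cuBLAS depends only on
            // which backend artifact (e.g. nd4j-native vs nd4j-cuda) is on
            // the classpath.
            INDArray c = a.mmul(b);
            System.out.println(Arrays.toString(c.shape()));
        }
    }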
FWIW, I more or less agree with you, but that doesn't mean it shouldn't exist.
JVM-based environments can bypass the JVM just like Python does now.
The fundamental problem with the JVM is that no one just took what works on other platforms and mapped the concepts 1 to 1.
Our idea with nd4j is to not only allow people to write their own backends, but also to provide a sane default platform for numerical computing on the JVM. Things like garbage collection shouldn't be a hindrance for what is otherwise a great platform for bigger workloads (Hadoop, Spark, Kafka, ...).
In summary, we know the JVM has been bad for this till now - it's our hope to fix that.
Looking at it, I don't see anything dealing with sparse matrices or factorization (e.g., LU, QR, SVD). All the Java libraries for SVD are pretty bad. Plus, none of your examples mention double precision. Does the library support it?
I find it interesting that the NumPy comparison doesn't mention which BLAS NumPy is linked against, but it does for Nd4j. NumPy is highly dependent on a good BLAS, and the basic Netlib one isn't that great.
We actually appreciate feedback like this, thank you.
As for netlib-java, it links against any BLAS implementation you give it. It has this idea of a JNILoader which can dynamically link against the fallback BLAS (the one you mentioned).
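For context, calling netlib-java directly looks roughly like this; getInstance() returns whichever implementation its loader found (a system BLAS if one could be linked, otherwise the pure-Java F2J fallback):

    import java.util.Arrays;
    import com.github.fommil.netlib.BLAS;

    public class NetlibExample {
        public static void main(String[] args) {
            // Picks the implementation at load time: native if available,
            // otherwise the pure-Java F2J reference BLAS.
            BLAS blas = BLAS.getInstance();

            // 2x2 column-major matrices; computes C = 1.0 * A * B + 0.0 * C
            double[] a = {1, 3, 2, 4};
            double[] b = {5, 7, 6, 8};
            double[] c = new double[4];
            blas.dgemm("N", "N", 2, 2, 2, 1.0, a, 2, b, 2, 0.0, c, 2);
            System.out.println(Arrays.toString(c));   // [19.0, 43.0, 22.0, 50.0]
        }
    }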
LAPACK doesn't implement any sparse linear algebra. If you think the landscape of "Java matrix libraries" is fragmented, when really they're all just different takes on wrapping BLAS and LAPACK or writing equivalent functionality in pure Java, wait until you look into sparse linear algebra libraries. There's no standard API, there are 3ish common and a dozen less common storage formats, only one or two of these libraries have any public version control or issue tracker whatsoever, and licenses are all over the map. The whole field is a software engineering disaster, and yet it's functionality you just can't get anywhere else.
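For anyone who hasn't run into these storage formats, here's a bare-bones sketch of CSR (compressed sparse row), one of the common layouts. Real libraries differ in exactly these details, which is a big part of the fragmentation:

    // Compressed Sparse Row (CSR): nonzero values plus column indices, with
    // row pointers marking where each row's entries start. Other layouts
    // (CSC, COO, ...) slice the same data differently.
    public class CsrMatrix {
        final int rows, cols;
        final double[] values;   // nonzero values, stored row by row
        final int[] colIndices;  // column index of each nonzero
        final int[] rowPtr;      // rowPtr[i]..rowPtr[i+1] spans row i's nonzeros

        CsrMatrix(int rows, int cols, double[] values, int[] colIndices, int[] rowPtr) {
            this.rows = rows; this.cols = cols;
            this.values = values; this.colIndices = colIndices; this.rowPtr = rowPtr;
        }

        // y = A * x, the one operation every sparse library more or less agrees on
        double[] multiply(double[] x) {
            double[] y = new double[rows];
            for (int i = 0; i < rows; i++)
                for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++)
                    y[i] += values[k] * x[colIndices[k]];
            return y;
        }
    }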
> However there are quite a few sparse blas and lapack implementations now.
There's the NIST sparse BLAS, and MKL has a similar but not exactly compatible version. These never really took off in adoption (MKL's widely used of course, but I'd wager these particular functions are not). What sparse LAPACK are you talking about?
> If you want to help us fix it we are hiring ;).
We were at the same dinner a couple weeks ago actually. I'm enjoying where I am using Julia and LLVM, not sure if you could pay me enough to make me want to work on the JVM.