The people I know who do work in machine learning (Jon Kleinberg, Thorsten Joachims, and their students) work in straight-up C, not Haskell, as the author suggests. Most of the dominant methods used today, such as the SVM, boosting, and the dynamic programming methods used in genomics are (i) numerical more than symbolic and (ii) involve moving a LOT of data around, so you want control of your data structures. (Sometimes these guys compress in-memory pointers!)
I looked long and hard at Scala and decided that I didn't like what I saw. One problem is that, based on the JVM, type erasure is a big "broken window" that seriously damages the Scala type system. You can use generics for ~years~ in Java and never notice the problems that type erasure causes, but I find that 2/3 of the designs that I specifically want an advanced type system for do not work in Scala because of type erasure.
Another issue with Scala is that the interface with Java, particularly using collections, is awkward; people who want to work in the JVM but want concision might be happier with Groovy. Scala's Lispy-Lists are convenient for the pattern matching capabilities in Scala, but are much less efficient than the vector-based Lists that come with Java when your data gets big.
As for oCaml, the real excitement is in F#. You get the wonderful oCaml language and all of its features, plus you get an interface to the C#/.NET world that's much nicer than Scala's interface to the JVM. You've got access to the whole .NET framework base class library, plus all sorts of stuff that's been written for .NET.
I'd also put C# on the border between a "mainstream" and "advanced" language. C#'s generics implementation just added support for covariant and contravariant inheritance, which is one of the features that had me interested in Scala. C# has good lambdas, really neat stuff in LINQ, and "expression trees" which offer metaprogramming capabilities above and beyond any static language I've seen.
Note that there is a good open source implementation of the .NET framework, Mono, so you can develop in C# and F# and target Linux, MacOS and other platforms.
Starting with version 2.7.2, Scala added an experimental feature (which will no longer be experimental as of 2.8.0) called Manifests that allow you to selectively reify erased types. See this blog post for more: http://www.scala-blogs.org/2008/10/manifests-reified-types.h...
This, combined with specialization (compiling primitive-specialized versions of code with generic types to improve performance of, for example, using function types on primitive collections) will make it easier to write performant, numeric code in Scala than in any other JVM language (up to the limits of the JVM, of course).
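For anyone reading this on a later Scala version, here is a minimal sketch of what selective reification looks like. It assumes Scala 2.10 or newer, where ClassTag took over this role from Manifest; the helper name collectOf is made up for illustration and is not from the linked post.

```scala
import scala.reflect.ClassTag

object TagDemo {
  // Hypothetical helper: keep only the elements whose runtime class is T.
  // The implicit ClassTag carries T's class past erasure, so the match
  // below is a real runtime check rather than an unchecked one.
  def collectOf[T](xs: List[Any])(implicit tag: ClassTag[T]): List[T] =
    xs.collect { case tag(x) => x }

  // Only the two strings survive the filter.
  val strings: List[String] = collectOf[String](List(1, "a", 2.0, "b"))
}
```

Note that a ClassTag only records the outermost class, so it would not by itself tell a List[Foo] from a List[Bar].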
> Another issue with Scala is that the interface with Java, particularly using collections, is awkward; [...]
Really? For me one of Scala's best attributes, by far, is the ability to use its implicit conversions to make awkward Java APIs smooth like butter.
> One problem is that, based on the JVM, type erasure is a big "broken window" that seriously damages the Scala type system.
I'm not sure I understand or agree. Understand: If you don't use casting, Scala's type system seems just as strong as ML, albeit not as good at inference. (F# has the exact same inference weakness as soon as you start using OO and inheritance.) Agree: having the ability to consciously drop into a dynamically typed/runtime-casted world for small bits of code is quite nice (true for F# too). And I've yet to see a place where erasure (a) caused, even in theory, a runtime type error, or (b) lost any type information within the scala language at compile or link time (which is where the type checking lives). I think it's a red herring / mere implementation detail.
> As for oCaml, the real excitement is in F#. You get the wonderful oCaml language and all of its features, plus you get an interface to the C#/.NET world that's much nicer than Scala's interface to the JVM.
Counterpoint: I've written production code in both Scala and F#; give me Scala any day of the week. I really wish Scala's .NET support were more mature; I'd ditch F# for Scala in a heartbeat if I could.
Using implicits is really nice to clean up the interface to some Java APIs (especially for APIs making heavy usage of anonymous inner objects) but I experienced more ugly boilerplate than I would like when having to use Java collection classes from Scala, too.
To be more specific, when I want to transform a collection given by a Java library in a functional manner, I usually end up with one or two localized import statements (to avoid e.g. scala.collection.mutable leaking into my other methods) and additional calls to convert the Java collection into a Scala-specific one and back again. This doesn't look like a big issue, but it increased the line count in a lot of cases from 1 to 3-5 ... which puts it uncomfortably close to 'dumb for loop territory'.
Admittedly, this is a small price to pay compared to the constant annoyance that is Java, but for my part I fear that there might be a slight tendency in Scala to prefer fixing interesting problems over useful ones.
> ... lost any type information within the scala language at compile or link time...
How about this use case?
  xs match {
    case x: List[Foo] => ...
    case x: List[Bar] => ...
  }
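To make the question concrete, here is a small sketch of what erasure does to a match like this. Foo and Bar are placeholder classes I made up, and the compiler is expected to emit an "unchecked" warning for the typed pattern rather than reject it.

```scala
object ErasureDemo {
  final case class Foo(n: Int)
  final case class Bar(s: String)

  // The element type is erased, so at runtime this pattern only tests
  // "is this a List?" and cannot see what the list contains.
  def isListOfFoo(x: Any): Boolean = x match {
    case _: List[Foo] => true
    case _            => false
  }

  val fooResult: Boolean = isListOfFoo(List(Foo(1)))
  val barResult: Boolean = isListOfFoo(List(Bar("a"))) // matches as well
}
```

Both results come out true, which is why a second case for List[Bar] after the List[Foo] case could never be reached.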
> give me Scala any day of the week
Why do you have the preference for Scala? I'm curious because Scala is my language of choice for my personal projects, but I would ditch it the second a credible F# derivative would appear on the JVM (F#'s stronger emphasis on functional programming, cleaner syntax and abstaining from 'fixing typing' in OOP are the reasons for my preferences).
My guess as to what the poster was getting at (warning: sleep deprived today, rambling ahead):
F# has basically two type systems -- the O'Caml/ML one, and C#'s. As a plus, you get full type inference so long as you stick with ML discriminated unions or records; as a minus, you lose the ability to apply concepts from OO without converting them to classes -- in which case you lose the ability to apply some handy ML-isms.
Scala tries to pull off a fairly deep unification of OO with functional types; for example making discriminated unions into case classes; or blending OO ideas of objects and classes with ML ideas about modules (including functors) into a single concept. It's far more ambitious in this regard than F#, but not necessarily an unqualified success either.
E.g., is SomeCollection<String> a subtype of SomeCollection<Object> (using Java-like syntax)? For some kinds of collections in some circumstances this is sensible (e.g. when your collections are immutable); sometimes the inverse relationship is appropriate; and in some cases you don't want the two types to be related at all (like Java collections, arrays being the exception).
Co/Contravariance (maybe View Bounds?) in Scala enable you to encode this relationship, but from my understanding are somewhat handicapped by type erasure.
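A rough sketch of the three cases in Scala's declaration-site variance syntax. All class names here (ImmutableBox, Printer, MutableCell) are invented for the example.

```scala
object VarianceDemo {
  // Covariant: a read-only box of String can stand in for a box of Any.
  final class ImmutableBox[+A](val value: A)

  // Contravariant: a printer that accepts Any can stand in for a printer of String.
  trait Printer[-A] { def show(a: A): String }

  // Invariant: a mutable cell of String is unrelated to a mutable cell of Any.
  final class MutableCell[A](var value: A)

  val strBox: ImmutableBox[String] = new ImmutableBox("hi")
  val anyBox: ImmutableBox[Any] = strBox

  val anyPrinter: Printer[Any] = new Printer[Any] {
    def show(a: Any): String = "<" + a + ">"
  }
  val strPrinter: Printer[String] = anyPrinter

  // The next line is rejected by the compiler because MutableCell is invariant:
  // val bad: MutableCell[Any] = new MutableCell[String]("hi")
}
```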
Another problem Scala tackles is that using inheritance as a 'code sharing facility' is a bit tricky (e.g. tractability and the fragile base class problem) and not composable. Scala's traits and self types are a real improvement in this regard. That the order in which traits are mixed in can influence an object's behavior, but not its type, can be seen as a problem though.
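The mixin-order point can be illustrated with a toy example. The trait names (Greeter, Loud, Polite) are made up; both objects below satisfy the same static type, yet they behave differently because the linearization order differs.

```scala
object MixinOrder {
  trait Greeter { def greet: String = "hello" }
  trait Loud extends Greeter {
    override def greet: String = super.greet.toUpperCase
  }
  trait Polite extends Greeter {
    override def greet: String = super.greet + ", please"
  }

  // Accepts anything that mixes in both traits, in either order.
  def greetVia(g: Loud with Polite): String = g.greet

  // Polite is mixed in last, so it wraps Loud: the suffix stays lower case.
  val loudThenPolite: String = greetVia(new Loud with Polite {})

  // Loud is mixed in last, so it upper-cases the suffix too.
  val politeThenLoud: String = greetVia(new Polite with Loud {})
}
```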
F# takes the .Net object system 'as is' and introduces 'types' more aligned to functional programming (aka. discriminated unions) as separate entities. It does not try to integrate both concepts as does Scala with its case classes.
> To be more specific, when I want to transform a collection given by a Java library in a functional manner, I usually ended up with one or two localized import statements - to avoid e.g. scala.collection.mutable leaking into my other methods - and additional calls to convert the Java collection into a Scala-specific one and back again.
I think the other poster was right about this being more of a library issue than a language issue. In my experience it was rare to do more than wrap Java collections to or from an Iterable[]; that could be done easily without having to bring full mutable collections into the namespace.
Also, it might be part of the cost of working with "enterprise" Java, but it was actually really rare for me to work with a vanilla Java collection rather than some library's implementation of its own damn 'typesafe' iterator, like this: http://xerces.apache.org/xerces-j/apiDocs/org/w3c/dom/NodeLi.... It was super-nice to toss together my own wrappers of some library or another's semi-standard iteration API and have a ton of functionality come along via the magic of mixins; and get it all automatically applied for the cost of a single import statement via implicits.
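Roughly what such a wrapper can look like for the DOM NodeList interface. This is a sketch against a current Scala (2.13 or 3), where the method to implement is called iterator; the object and method names (DomImplicits, nodeListToIterable, DomDemo) are my own.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.{Node, NodeList}
import scala.language.implicitConversions

object DomImplicits {
  // View a NodeList as a Scala Iterable; map, filter, etc. come for free.
  implicit def nodeListToIterable(nodes: NodeList): Iterable[Node] =
    new Iterable[Node] {
      def iterator: Iterator[Node] =
        Iterator.tabulate(nodes.getLength)(i => nodes.item(i))
    }
}

object DomDemo {
  import DomImplicits._ // the single import that enables the conversion

  private val xml = "<r><a/><b/></r>"
  private val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))

  // getChildNodes returns a NodeList; map is supplied by the implicit view.
  val childNames: List[String] =
    doc.getDocumentElement.getChildNodes.map(_.getNodeName).toList
}
```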
> match { case x: List[Foo] => ... case x: List[Bar] => ... }
I suppose such a thing is made impossible by erasure, it's true. I haven't encountered a need for it -- maybe because the problem there is erasure + dynamic/runtime typing, rather than erasure on its own. Also it's something that's coming for Scala, it sounds like. (I haven't paid as much attention as I should to its ongoing development.)
> Why do you have the preference for Scala?
Part of it's just the intuitive feel: When I code in Scala, I'm writing in Scala -- it might be a fairly huge language, but it is its own thing. Coding in F# feels like bouncing back and forth between C# and O'Caml, depending on how functional you're feeling at the moment -- here's a C#-ish clause; there's an O'caml-esque one.
As an example, I don't believe you can use both inheritance and discriminated unions in F#. In Scala you can and it's actually quite useful.
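On the Scala side, a sketch of what that combination can look like: a sealed base class that carries shared state and behavior, with case classes as the union's alternatives. Shape, Circle and Rect are invented names.

```scala
object ShapeDemo {
  // The base class contributes a field and a concrete method via inheritance.
  sealed abstract class Shape(val name: String) {
    def area: Double
    def describe: String = name + " with area " + area
  }

  // The case classes are the alternatives of the "union".
  final case class Circle(r: Double) extends Shape("circle") {
    def area: Double = math.Pi * r * r
  }
  final case class Rect(w: Double, h: Double) extends Shape("rect") {
    def area: Double = w * h
  }

  // Because Shape is sealed, the compiler can check this match for exhaustiveness.
  def isRound(s: Shape): Boolean = s match {
    case Circle(_)  => true
    case Rect(_, _) => false
  }
}
```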
Another example is that in F# you can omit type declarations on function arguments until you start using OO, then they become mandatory and infect your code. On the one hand the extra inference is nice but on the other it really drives home that you are coding in two languages, not one.
Besides that intuitive part, I think that Scala's module/class-level type system, although complicated, helps tremendously for writing big, complex programs. Traits/mixins, flexibility in type constraints, and allowing types as members of other types add up to a really powerful ability to modularize without sacrificing. At least for me, a lot of my older O'Caml projects made fairly heavy use of functors. F# didn't bother to try to support the idea and only supports plain-vanilla C# interfaces. Scala embraced functors, extended them, and made them much better.
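A small sketch of the "types as members" idea, written functor-style: a module signature with an abstract type member, one implementation, and a function parameterized over any such module. The names (Monoid, IntSum, foldAll) are illustrative only, and this is just one of several ways to encode it.

```scala
object ModuleDemo {
  // A "signature": an abstract type plus operations over it.
  trait Monoid {
    type T
    def empty: T
    def combine(a: T, b: T): T
  }

  // A "structure" implementing the signature for Int addition.
  object IntSum extends Monoid {
    type T = Int
    val empty: Int = 0
    def combine(a: Int, b: Int): Int = a + b
  }

  // Functor-like: works for any module m; the list's element type
  // depends on the module that was passed in (m.T).
  def foldAll(m: Monoid)(xs: List[m.T]): m.T =
    xs.foldLeft(m.empty)(m.combine)

  val total: Int = foldAll(IntSum)(List(1, 2, 3))
}
```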
> In my experience it was rare to do more than wrap Java collections to or from an Iterable[]
Could you give me a simple example of how, say, filtering and mapping a collection received from a Java library, and then passing it back to a method expecting a collection of the original type, would look? (Quite likely I overcomplicate this in my own code.)
Assuming you want your maps and filters to be purely functional, you need only two simple methods: One that constructs an object that implements the Iterable[X] trait from the java object, and another that constructs an instance of the java collection from an Iterable[X]. All the implementation that you'd like is already done for you in the iterable trait, but you can always selectively override them if you'd like to provide a more tailored implementation.
Then it's up to your taste whether you'd like the conversion methods to be implicit (and thus available with a single import but making your code more 'magic') or explicit (single import + adding calls to wrap/unwrap methods).
Total overhead: the centralized conversion methods (if you write them yourself, that means implementing elements() and, in the iterator, next() and hasNext()), one import per module that uses the conversions, and, optionally, explicit calls to perform the conversion.
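For comparison, here is how the round trip can look with converters from the standard library instead of hand-written ones. This is a sketch that assumes Scala 2.13 or newer, where they live in scala.jdk.CollectionConverters (in 2.12 the equivalent import was scala.collection.JavaConverters._); the method name shout is made up.

```scala
import java.util.{List => JList}
import scala.jdk.CollectionConverters._

object RoundTrip {
  // Take a Java list, filter and map it functionally, hand a Java list back.
  def shout(in: JList[String]): JList[String] =
    in.asScala            // wrap as a Scala Buffer (no copy)
      .filter(_.nonEmpty) // drop empty strings
      .map(_.toUpperCase) // transform the rest
      .asJava             // wrap the result as a java.util.List
}
```

This is the explicit flavor: one import plus the asScala/asJava calls at the boundary.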
Scala 2.8 has manifests, which reify generic types. What do you wish you could do with reified types that you can't do with manifests?
Also, in Scala 2.7 I wrote my own library to wrap Java collections that didn't take much time or effort and allowed me to seamlessly pass scala collections to java code and use java collections the same as scala ones. Scala 2.8 is supposed to have such a library built in so you don't have to write it.
Lastly, your comparison between Scala's functional lists and Java's ArrayList is pointless. Scala has an ArrayBuffer class too, so you can choose whichever is appropriate. You can pattern match against ArrayBuffer as well.
> Note that there is a good open source implementation of the .NET framework, Mono, so you can develop in C# and F# and target Linux, MacOS and other platforms.
Is Mono actually usable with F#? The last time I checked (2 months ago) the F# plugin for MonoDevelop didn't work, and the questions from various people on how to get it to work went unanswered by the developer.
The runtime performance of F# code was shaky at best, the REPL had (on OS X at least) startup times long enough to make it practically useless, and without TCO the usefulness of F# really took a hit.
Not wanting to bash on Mono or F# (which is my favorite language), but writing production software with this combination is currently not something I would recommend.
[Disclaimer: I'm a compiler dev working on F# at MSFT.]
Performance and user experience on Mono are something we're definitely trying to improve - we spent a fair amount of time trying to make our Mono story better for our 2.0 release, and plan on investing more in this area as we move forward.
We can't be everywhere at once, so your feedback is really important here - If you're having problems with our tools on Mono, please let us know via Microsoft Connect or fsbugs@microsoft.com. Also, don't be bashful about filing Mono bugs. So far, they've been really great about responding to any issues that have cropped up.
On http://github.com/vasili/FSharpBinding is an FSharpBinding that works with MonoDevelop. Please read the README to understand current functionality. I have tested with new F# projects plus existing F# projects like Mandelbrot sets, Stephen Wolfram's Rule 30, etc. There is still a fair amount of polishing that needs to be done; that is a work-in-progress that I am working on with other colleagues.
I was curious if F# is really that close to ocaml, and it seems they are related: http://stackoverflow.com/questions/179492/f-and-ocaml with some differences (eg. F# doesn't seem to have those crazy +. *. etc float operators for type inference).
Here's a ridiculously detailed table comparing F#, ocaml, haskell, scala, ML (ie. everything in the article but scheme), at the aptly named http://hyperpolyglot.wikidot.com/ml
> The people I know who do work in machine learning work in straight-up C, not Haskell, as the author suggests.
Is their work in applying ML theory, or creating more of it? ML almost always involves large datasets in practical use, but when you're trying to come up with a new method for ensuring correctness in some edge-case, you would be testing it with hypothetical extreme inputs, not feeding it mounds of data that won't trigger the problem. Haskell works well to model computer science research problems—that is, research into CS, not using CS to research other domains. Once your model has "solidified", you should by all means optimize it in C before running it on real-world data.