Some thoughts. The people I know who do work in machine learning (Jon Kleinberg,...

jorgeortiz85 · on May 24, 2010

Starting with version 2.7.2, Scala added an experimental feature (which will no longer be experimental as of 2.8.0) called Manifests, which allows you to selectively reify erased types. See this blog post for more: http://www.scala-blogs.org/2008/10/manifests-reified-types.h...
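
To make the idea concrete, here is a minimal sketch of what reifying an erased type buys you. It assumes a later Scala (2.10 or newer), where `ClassTag` took over this role from `ClassManifest`; the names `ReifiedSketch`, `filled` and `elementClass` are made up for illustration.

```scala
import scala.reflect.ClassTag

object ReifiedSketch {
  // Without the ClassTag context bound this would not compile: after erasure
  // the JVM would not know which kind of array (int[], String[], ...) to allocate.
  def filled[T: ClassTag](n: Int, value: T): Array[T] = Array.fill(n)(value)

  // The element type survives to runtime through the implicit tag.
  def elementClass[T](implicit tag: ClassTag[T]): Class[_] = tag.runtimeClass
}
```

The tag is an ordinary implicit argument supplied by the compiler at the call site, which is why the reification is "selective": only code that asks for it pays for it.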

This, combined with specialization (compiling primitive-specialized versions of code with generic types to improve performance of, for example, using function types on primitive collections) will make it easier to write performant, numeric code in Scala than in any other JVM language (up to the limits of the JVM, of course).
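
A rough sketch of what specialization looks like in source, assuming the Scala 2 compiler (`@specialized` compiles elsewhere but may have no effect). The object and method names are hypothetical.

```scala
object SpecializedSketch {
  // With @specialized, the Scala 2 compiler also emits Int and Double variants
  // of this method, so the loop can avoid boxing each element.
  def mapInPlace[@specialized(Int, Double) T](xs: Array[T])(f: T => T): Unit = {
    var i = 0
    while (i < xs.length) {
      xs(i) = f(xs(i))
      i += 1
    }
  }
}
```

For example, `SpecializedSketch.mapInPlace(Array(1, 2, 3))(_ * 2)` doubles the array in place.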

The NLP communities at Stanford and Berkeley, which do a fair amount of machine learning, have a long history of working in Java (http://nlp.stanford.edu/software/, http://nlp.cs.berkeley.edu/Main.html#Software) and have recently started using Scala as well (http://www.scalanlp.org/).

Scala's interface with Java collections IS a bit clunky. Thankfully, this is a library issue, not a language issue, and can be solved with better libraries. See http://github.com/jorgeortiz85/scala-javautils (for Scala 2.7.x) and http://github.com/scalaj/scalaj-collection (for Scala 2.8.x).

crux_ · on May 24, 2010

> Another issue with Scala is that the interface with Java, particularly using collections, is awkward; [...]

Really? For me one of Scala's best attributes, by far, is the ability to use its implicit conversions to make awkward Java APIs smooth like butter.

> One problem is that, based on the JVM, type erasure is a big "broken window" that seriously damages the Scala type system.

I'm not sure I understand or agree. Understand: If you don't use casting, Scala's type system seems just as strong as ML's, albeit not as good at inference. (F# has the exact same inference weakness as soon as you start using OO and inheritance.) Agree: having the ability to consciously drop into a dynamically typed/runtime-casted world for small bits of code is quite nice (true for F# too). And I've yet to see a place where erasure (a) caused, even in theory, a runtime type error, or (b) lost any type information within the Scala language at compile or link time (which is where the type checking lives). I think it's a red herring / mere implementation detail.

> As for oCaml, the real excitement is in F#. You get the wonderful oCaml language and all of its features, plus you get an interface to the C#/.NET world that's much nicer than Scala's interface to the JVM.

Counterpoint: I've written production code in both Scala and F#; give me Scala any day of the week. I really wish Scala's .NET support were more mature; I'd ditch F# for it in a heartbeat if I could.

Artemidoros · on May 24, 2010

Using implicits is really nice to clean up the interface to some Java APIs (especially for APIs making heavy usage of anonymous inner objects) but I experienced more ugly boilerplate than I would like when having to use Java collection classes from Scala, too.

To be more specific, when I want to transform a collection given by a Java library in a functional manner, I usually end up with one or two localized import statements (to avoid e.g. scala.collection.mutable leaking into my other methods) and additional calls to convert the Java collection into a Scala-specific one and back again. This doesn't look like a big issue, but it increased the line count in a lot of cases from 1 to 3-5, which puts it uncomfortably close to 'dumb for loop territory'.

Admittedly, this is a small price to pay compared to the constant annoyance that is Java, but for my part I fear that there might be a slight tendency in Scala to fix interesting problems in preference to useful ones.

> ... lost any type information within the scala language at compile or link time...

How about this use case?

match { case x: List[Foo] => ... case x: List[Bar] => ... }

> give me Scala any day of the week

Why do you prefer Scala? I'm curious because Scala is my language of choice for my personal projects, but I would ditch it the second a credible F# derivative appeared on the JVM (F#'s stronger emphasis on functional programming, its cleaner syntax and its abstaining from 'fixing typing' in OOP are the reasons for my preference).

chc · on May 24, 2010

Out of curiosity, what do you mean by "fixing typing"? (I'm genuinely curious. I don't know Scala and F# well enough to really understand.)

crux_ · on May 24, 2010

My guess as to what the poster was getting at (warning: sleep deprived today, rambling ahead):

F# has basically two type systems -- the O'Caml/ML one, and C#'s. As a plus, you get full type inference so long as you stick with ML discriminated unions or records; as a minus, you lose the ability to apply concepts from OO without converting them to classes -- in which case you lose the ability to apply some handy ML-isms.

Scala tries to pull off a fairly deep unification of OO with functional types; for example making discriminated unions into case classes; or blending OO ideas of objects and classes with ML ideas about modules (including functors) into a single concept. It's far more ambitious in this regard than F#, but not necessarily an unqualified success either.

Artemidoros · on May 24, 2010

Yes, that's about what I meant, and more concise than my own explanation :-)

Artemidoros · on May 24, 2010

For example, is SomeCollection<String> a subtype of SomeCollection<Object> (using Java-like syntax)? For some kinds of collections in some circumstances this might be sensible (e.g. when your collections are immutable), sometimes having an inverse relationship might be appropriate, and in some cases you don't want these two types to be related at all (like Java collections, arrays being the exception).

Co/Contravariance (maybe View Bounds?) in Scala enable you to encode this relationship, but from my understanding are somewhat handicapped by type erasure.
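
The three relationships can be sketched with declaration-site variance annotations. All class names below are invented for illustration; note that the compiler checks these relationships statically.

```scala
object VarianceSketch {
  class Animal { def name: String = "animal" }
  class Dog extends Animal { override def name: String = "dog" }

  // Covariant (+A): safe because the box is immutable, A is only ever read.
  final class ReadOnlyBox[+A](val value: A)

  // Contravariant (-A): A is only ever consumed.
  trait Describer[-A] { def describe(a: A): String }

  // Invariant: a mutable cell both reads and writes A, so no relationship.
  final class MutableCell[A](var value: A)

  val dogBox: ReadOnlyBox[Dog] = new ReadOnlyBox(new Dog)
  val animalBox: ReadOnlyBox[Animal] = dogBox // a box of dogs is a box of animals

  val animalDescriber: Describer[Animal] = new Describer[Animal] {
    def describe(a: Animal): String = "a " + a.name
  }
  val dogDescriber: Describer[Dog] = animalDescriber // the inverse relationship

  // Rejected by the compiler, by design:
  // val cell: MutableCell[Animal] = new MutableCell[Dog](new Dog)
}
```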

Another problem Scala tackles is that using inheritance as a 'code sharing facility' is a bit tricky (e.g. tractability and the fragile base class problem) and not composable. Scala's traits and self types are a real improvement in this regard. That the order of mixing in traits can influence an object's behavior but not its type can be seen as a problem, though.
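
A small sketch of the mixin-order point, with hypothetical traits: both objects below have the same static type, but they behave differently because the traits are stacked in a different order.

```scala
object MixinOrderSketch {
  trait Greeter { def greet: String = "hello" }
  trait Loud extends Greeter { override def greet: String = super.greet.toUpperCase }
  trait Polite extends Greeter { override def greet: String = super.greet + ", please" }

  // The trait mixed in last runs first and calls "super" towards the left.
  val loudThenPolite = new Greeter with Loud with Polite // "HELLO, please"
  val politeThenLoud = new Greeter with Polite with Loud // "HELLO, PLEASE"

  // The order is invisible to the type checker: this assignment compiles.
  val sameStaticType: Greeter with Loud with Polite = politeThenLoud
}
```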

F# takes the .Net object system 'as is' and introduces 'types' more aligned to functional programming (aka. discriminated unions) as separate entities. It does not try to integrate both concepts as does Scala with its case classes.

crux_ · on May 24, 2010

> To be more specific, when I want to transform a collection given by a Java library in a functional manner, I usually ended up with one or two localized import statements - to avoid e.g. scala.collection.mutable leaking into my other methods - and additional calls to convert the Java collection into a Scala specific and back again.

I think the other poster was right about this being more of a library issue than a language issue. In my experience it was rare to do more than wrap Java collections to or from an Iterable[]; that could be done easily without having to bring full mutable collections into the namespace.

Also, it might be part of the cost of working with "enterprise" Java, but it was actually really rare for me to work with a vanilla Java collection rather than some library's implementation of its own damn 'typesafe' iterator, like this: http://xerces.apache.org/xerces-j/apiDocs/org/w3c/dom/NodeLi.... It was super-nice to toss together my own wrappers of some library or another's semi-standard iteration API and have a ton of functionality come along via the magic of mixins; and get it all automatically applied for the cost of a single import statement via implicits.
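
As a rough illustration of that kind of wrapper, here is a sketch that lets a DOM `NodeList` be used like a Scala sequence. `DomSketch`, `nodeListToSeq`, `parse` and `childNames` are hypothetical names, not part of any library; only the `org.w3c.dom` and `javax.xml.parsers` calls are standard JDK APIs.

```scala
import scala.language.implicitConversions
import org.w3c.dom.{Document, Node, NodeList}

object DomSketch {
  // One implicit view: anywhere this is in scope, a NodeList gains map,
  // filter, foreach and the rest of the sequence API.
  implicit def nodeListToSeq(nodes: NodeList): IndexedSeq[Node] =
    (0 until nodes.getLength).map(i => nodes.item(i))

  // Helper for the example: parse an XML string with the JDK's DOM parser.
  def parse(xml: String): Document = {
    val builder = javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder()
    builder.parse(new org.xml.sax.InputSource(new java.io.StringReader(xml)))
  }

  def childNames(xml: String): List[String] = {
    val children: NodeList = parse(xml).getDocumentElement.getChildNodes
    children.map(_.getNodeName).toList // map comes from the implicit view
  }
}
```

Client code would pick the conversion up with a single `import DomSketch._`.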

> match { case x: List[Foo] => ... case x: List[Bar] => ... }

I suppose such a thing is made impossible via erasure, it's true. I haven't encountered a need for it -- maybe because the problem there is erasure + dynamic/runtime typing, rather than erasure on its own. Also it's something that's coming for Scala, it sounds like. (I haven't paid as much attention as I should to its ongoing development..)
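
A sketch of both halves of this: the runtime really cannot tell the two lists apart, and one possible workaround is to carry the element type along explicitly. This assumes a later Scala where `ClassTag` is available; `Tagged` and `describe` are invented for the example.

```scala
import scala.reflect.ClassTag

object ErasureSketch {
  // At runtime both values are just "a List"; the element type is gone.
  val sameRuntimeClass: Boolean = List(1, 2).getClass == List("a", "b").getClass

  // Workaround: store a tag for the element type next to the list.
  final case class Tagged[A](items: List[A])(implicit val tag: ClassTag[A])

  def describe(t: Tagged[_]): String =
    if (t.tag.runtimeClass == classOf[String]) "list of strings"
    else if (t.tag.runtimeClass == classOf[Int]) "list of ints"
    else "list of something else"
}
```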

> Why do you have the preference for Scala?

Part of it's just the intuitive feel: When I code in Scala, I'm writing in Scala -- it might be a fairly huge language, but it is its own thing. Coding in F# feels like bouncing back and forth between C# and O'Caml, depending on how functional you're feeling at the moment -- here's a C#-ish clause; there's an O'caml-esque one.

As an example, I don't believe you can use both inheritance and discriminated unions in F#. In Scala you can and it's actually quite useful.
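
On the Scala side, the combination might look like the following sketch (the shape names are made up): the cases form a closed union you can pattern match on, while also inheriting shared behaviour from their parent.

```scala
object UnionSketch {
  sealed abstract class Shape {
    def area: Double
    // Ordinary OO inheritance: every case gets this method for free.
    def isLargerThan(other: Shape): Boolean = area > other.area
  }
  final case class Circle(radius: Double) extends Shape {
    def area: Double = math.Pi * radius * radius
  }
  final case class Rect(width: Double, height: Double) extends Shape {
    def area: Double = width * height
  }

  // ML-style case analysis; "sealed" lets the compiler check exhaustiveness.
  def label(s: Shape): String = s match {
    case Circle(r)  => "circle of radius " + r
    case Rect(w, h) => "rectangle " + w + " by " + h
  }
}
```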

Another example is that in F# you can omit type declarations on function arguments until you start using OO, then they become mandatory and infect your code. On the one hand the extra inference is nice but on the other it really drives home that you are coding in two languages, not one.

Besides that intuitive part, I think that Scala's module/class-level type system, although complicated, helps tremendously for writing big, complex programs. Traits/mixins, flexibility in type constraints, and allowing types as members of other types add up to a really powerful ability to modularize without sacrificing. At least for me, a lot of my older O'Caml projects made fairly heavy use of functors. F# didn't bother to try to support the idea and only supports plain-vanilla C# interfaces. Scala embraced them, extended them, and made them much better.
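
One way to picture "types as members of other types" is the sketch below, which treats an object as a small ML-style module and writes a function that is generic over such modules. It assumes a Scala version with dependent method types (2.10 or later); `Monoid`, `IntSum`, `StringConcat` and `combineAll` are illustrative names only.

```scala
object ModuleSketch {
  // A "signature": an abstract type member plus operations over it.
  trait Monoid {
    type T
    def empty: T
    def combine(a: T, b: T): T
  }

  // Two "structures" that fix the type member.
  object IntSum extends Monoid {
    type T = Int
    val empty: Int = 0
    def combine(a: Int, b: Int): Int = a + b
  }
  object StringConcat extends Monoid {
    type T = String
    val empty: String = ""
    def combine(a: String, b: String): String = a + b
  }

  // Functor-like: the argument and result types depend on the module passed in.
  def combineAll(m: Monoid)(xs: List[m.T]): m.T =
    xs.foldLeft(m.empty)((acc, x) => m.combine(acc, x))
}
```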

Artemidoros · on May 24, 2010

> In my experience it was rare to do more than wrap Java collections to or from an Iterable[]

Could you give me a simple example of how, say, filtering and mapping a collection received from a Java library and passing it back to a method expecting a collection of the original type would look? (Quite likely I overcomplicate this in my own code.)

crux_ · on May 26, 2010

Sorry for the late reply.

Assuming you want your maps and filters to be purely functional, you need only two simple methods: one that constructs an object that implements the Iterable[X] trait from the Java object, and another that constructs an instance of the Java collection from an Iterable[X]. All the implementation that you'd like is already done for you in the Iterable trait, but you can always selectively override methods if you'd like to provide a more tailored implementation.

Then it's up to your taste whether you'd like the conversion methods to be implicit (and thus available with a single import but making your code more 'magic') or explicit (single import + adding calls to wrap/unwrap methods).

Total overhead: centralized conversion methods, assuming you write them yourself = implementation of elements(), and in the iterator, next() and hasNext(). One import per module that uses the conversions, and optionally, explicit calls to perform the conversion.
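
A minimal sketch of the explicit variant, written against a current Scala where the method is called `iterator` rather than `elements`. The names `JavaInteropSketch`, `wrap`, `unwrap` and `shout` are hypothetical, and the choice of `ArrayList` as the target collection is an assumption.

```scala
import java.util.{ArrayList => JArrayList, List => JList}

object JavaInteropSketch {
  // Java -> Scala: only iterator (with hasNext/next) has to be written;
  // filter, map, etc. come from the Iterable trait.
  def wrap[A](underlying: java.lang.Iterable[A]): Iterable[A] = new Iterable[A] {
    def iterator: Iterator[A] = new Iterator[A] {
      private val it = underlying.iterator()
      def hasNext: Boolean = it.hasNext
      def next(): A = it.next()
    }
  }

  // Scala -> Java: copy the elements into a fresh ArrayList.
  def unwrap[A](xs: Iterable[A]): JList[A] = {
    val out = new JArrayList[A]()
    xs.foreach(x => out.add(x))
    out
  }

  // Filter and map a Java list, then hand back a Java list of the same type.
  def shout(words: JList[String]): JList[String] =
    unwrap(wrap(words).filter(_.nonEmpty).map(_.toUpperCase))
}
```

Later standard libraries ship ready-made converters (for instance `scala.jdk.CollectionConverters` in 2.13), so hand-written wrappers like these are mostly needed for non-standard iteration APIs.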

_ivvf · on May 24, 2010

Scala 2.8 has manifests, which reify generic types. What do you wish you could do with reified types that you can't do with manifests?

Also, in Scala 2.7 I wrote my own library to wrap Java collections that didn't take much time or effort and allowed me to seamlessly pass scala collections to java code and use java collections the same as scala ones. Scala 2.8 is supposed to have such a library built in so you don't have to write it.

Lastly, your comparison between Scala's functional lists and Java's ArrayList is pointless. Scala has an ArrayBuffer class too, so you can choose between either as appropriate. You can pattern match against ArrayBuffer as well.
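
For instance, a sketch of matching on an `ArrayBuffer` by shape (the `describe` function is made up for illustration):

```scala
import scala.collection.mutable.ArrayBuffer

object BufferSketch {
  // Sequence patterns work on ArrayBuffer much as they do on List.
  def describe(buf: ArrayBuffer[Int]): String = buf match {
    case ArrayBuffer()          => "empty"
    case ArrayBuffer(only)      => "one element: " + only
    case ArrayBuffer(first, _*) => "starts with " + first
  }
}
```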

Artemidoros · on May 24, 2010

> Note that there is a good open source implementation of the .NET framework, Mono, so you can develop in C# and F# and target Linux, MacOS and other platforms.

Is Mono actually usable with F#? The last time I checked (2 months ago) the F# plugin for MonoDevelop didn't work, and the questions from various people on how to get it to work went unanswered by the developer.

The runtime performance of F# code was shaky at best, the REPL had (on OS X at least) startup times long enough to make it practically useless, and without TCO the usefulness of F# really took a hit.

Not wanting to bash on Mono or F# (which is my favorite language), but writing production software with this combination is currently not something I would recommend.

jopamer · on May 24, 2010

[Disclaimer: I'm a compiler dev working on F# at MSFT.]

Performance and user experience on Mono are something we're definitely trying to improve - we spent a fair amount of time trying to make our Mono story better for our 2.0 release, and plan on investing more in this area as we move forward.

We can't be everywhere at once, so your feedback is really important here - If you're having problems with our tools on Mono, please let us know via Microsoft Connect or fsbugs@microsoft.com. Also, don't be bashful about filing Mono bugs. So far, they've been really great about responding to any issues that have cropped up.

Thanks!

Artemidoros · on May 24, 2010

That's awesome! Thanks for sharing. I'll give Mono/F# another spin in the next week and will report problems as you suggested. Again, thanks.

vigalchin · on May 29, 2010

There is an FSharpBinding that works with MonoDevelop at http://github.com/vasili/FSharpBinding. Please read the README to understand current functionality. I have tested with new F# projects plus existing F# projects like Mandelbrot sets, Stephen Wolfram's Rule 30, etc. There is still a fair amount of polishing that needs to be done; it is a work in progress.

Kind regards,

Vasili I. Galchin

vigalchin · on May 29, 2010

Hello,

There is an FSharpBinding that works with MonoDevelop at http://github.com/vasili/FSharpBinding. Please read the README on this web site. I have tested new F# projects and several existing F# projects like Mandelbrot set, Stephen Wolfram's Rule 30, etc. This binding still needs polishing; I am working on that with other colleagues.

Kind regards,

Vasili I. Galchin

10ren · on May 24, 2010

I was curious whether F# is really that close to OCaml, and it seems they are related: http://stackoverflow.com/questions/179492/f-and-ocaml with some differences (e.g. F# doesn't seem to have those crazy +. *. etc. float operators for type inference).

Here's a ridiculously detailed table comparing F#, OCaml, Haskell, Scala and ML (i.e. everything in the article but Scheme), at the aptly named http://hyperpolyglot.wikidot.com/ml

derefr · on May 24, 2010

> The people I know who do work in machine learning work in straight-up C, not Haskell, as the author suggests.

Is their work in applying ML theory, or creating more of it? ML almost always involves large datasets in practical use, but when you're trying to come up with a new method for ensuring correctness in some edge-case, you would be testing it with hypothetical extreme inputs, not feeding it mounds of data that won't trigger the problem. Haskell works well to model computer science research problems—that is, research into CS, not using CS to research other domains. Once your model has "solidified", you should by all means optimize it in C before running it on real-world data.

sid0 · on May 24, 2010

Yeah, it's clear that more than a bit of PL research has fed back into C# as a language and .NET as a whole, in stark contrast to Java/JVM.