22 March 2009

Programming Language of Choice

Some of you know that I am (or at least used to be) a bit of a programming language snob. In fact, on several occasions, I've met (in NLP or ML land) someone who recognizes my name from PL land and is surprised that I'm not actually a PL person. My favorite story: after I taught machine learning for the second time, Ken Shan, a friend from my PL days, came to visit. I announced his visit and got an email from a student who had taken ML from me, saying:

I _knew_ your name was familiar! I learned a ton about Haskell from your tutorial, for what it's worth. Great read back in my freshman year in college. (Belatedly) Thanks for writing it!

And it's not like my name is particularly common!

At any rate (and, admittedly, this is somewhat of an HBC-related question), I'm curious what programming language(s) other NLP folks tend to use. I've included the subset of the programming language shootout list that I think is most likely to be used, but if your language isn't there, feel free to write it in as a comment. You can select as many as you like, but please try to vote only for those that you actually use regularly, and that you actually use for large projects. E.g., I use Perl a lot, but only for programs on the order of 100 lines... so I wouldn't select Perl.


What programming language(s) do you use for large-ish projects?
C/C#/C++/Objective-C
D
Eiffel
Erlang
F#
Haskell
Java
Lisp
Matlab
OCaml/SML/ML
Perl
Python
R
Ruby
Scala
Sheme
Smalltalk
Other

29 comments:

  1. You're entering dangerous terrain, possibly starting a programming language war. My experience is that most people who think programming language X is stupid think so because they do not understand language X.
    Sometimes it is also the case that somebody does not grasp field Y, so language X seems evil. In my opinion this is the case for software design and Java. It's impressive how many people try to produce actual software in Java without having any idea about its inherent software design mechanics.

  2. I'm hoping that we can all avoid the PL war debates as they're clearly a total waste of time :). (Incidentally, I basically agree with everything else you said.)

  3. You just got me curious: what are Java's "inherent software design mechanics"? :)

    [Personally, I mainly use Python for NLP.]

  4. I mostly use (and voted for) Python. But it is inherently horrifyingly slow for any ML-ish or number-heavy project. I am now trying to get to grips with Scala as a snappier replacement with some cool features and Java's library support, but it is too early to give any real judgement of it just yet.

    I am also intrigued about Java's inherent software design mechanics.

  5. What about Prolog?
    I was surprised not to see Prolog on the list.
    Personally, I use Java, but I was taught that Prolog is the typical NLP language; after all, it was built for machine translation.

  6. I find Python + Java (via Jython) very convenient.

  7. Hi Hal,

    I'm surprised you lumped C# in with C and C++. C# is a Microsoft language that may share some syntax with them, but in my opinion it is used in very different realms. Though maybe it's different in the NLP world. Not a big deal, but C# would probably make sense as its own category.

  8. One thing I've been finding recently is that a lot of NLP is actually very suitable for Unix-style programming, where you operate on streams of text. At this point the answer becomes much more flexible, because you end up with a lot of small components glued together with scripts.

    E.g., I recently migrated a chunk of our code from Ruby + SQL to a couple of small Java programs, some standard Unix utilities (written in C, of course), and a mix of sed, awk and bash (this wasn't anything NLP-specific; it's a basic data-munging application for calculating correlations between features). It has no trouble talking to the rest of the application because its interface is utterly language-agnostic.

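A minimal sketch of the stream-oriented style described in comment 8 (not the commenter's actual code): a Python filter that reads raw text on stdin and writes one lowercased token per line on stdout, so it can be chained with standard Unix tools. The naive regex tokenizer and the script name `tokenize.py` are assumptions made for illustration only.

```python
import re
import sys
from typing import Iterable, Iterator, TextIO

# Assumption: a naive tokenizer; \w+ matches runs of letters, digits and underscores.
TOKEN_RE = re.compile(r"\w+")


def tokenize_stream(lines: Iterable[str]) -> Iterator[str]:
    """Yield one lowercased token for each word found in each input line."""
    for line in lines:
        for match in TOKEN_RE.finditer(line):
            yield match.group(0).lower()


def run(inp: TextIO, out: TextIO) -> None:
    """Read text from `inp` and write one token per line to `out`."""
    for token in tokenize_stream(inp):
        out.write(token + "\n")


if __name__ == "__main__":
    # Behaves as a plain Unix filter: text in on stdin, text out on stdout.
    run(sys.stdin, sys.stdout)
```

A possible pipeline (file names hypothetical): `cat corpus.txt | python tokenize.py | sort | uniq -c | sort -rn | head`. Because the only interface is lines of text, the next stage could be written in C, awk or Java without any change to this script.
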
  9. Is Sheme a new Shell/Scheme lovechild I didn't know of, or simply a typo? :)

    Following up on the comment about lumping C# in with C/C++: not even C++ should be lumped in with C.

  10. I'd take Steve's argument a step further: as an avid F# (and .NET) user, I wonder how many people know about the one big advantage of .NET, which is that the whole platform is language-agnostic: you can write a component in one language and use it from another. My setup: a numerical library in C++, a wrapper in C#, my machine learning code in F# (go algebraic datatypes and higher-order functions!), and my experiments scripted in (Iron)Python.

    The right tool for the right job ...

  11. The "right" answer to this question is highly task dependent. Do I need to invert a small dense matrix or a large sparse one? Do I have more data than will fit in memory? Do I need to deliver a working, threadsafe application to integrate into a web server? Am I teaching NLP to a class? Am I a theoretician who thinks in vectors and matrices? Is the programming exploratory or production-oriented? Are the algorithms pretty well known, or will there be a lot of fiddling?

    I used to teach Prolog in our intro NLP class at Carnegie Mellon before the turn of the century (i.e. after the internet, before the web). I haven't touched it in years. It's great for backtracking search and matching with unification, and a pain for just about everything else because there's no random access.

    I've been using R a lot lately to interface with BUGS, do simple vector/matrix calculations, and for graphics.

    I cut my professional coding teeth on C -- we used it for all of the internal product code at SpeechWorks. And I worked on abstract machine implementations for Prolog-like languages in C, and some students I was working with at the time (Shuly Wintner and Yan Qu) never really forgave me for subjecting them to C.

    I use Java now because I need types and don't need to be tempted by pointers: I'm a relatively lousy programmer (compared to the really good pros who can write C faster than I can write Java; see TopCoder, or visit SpeechWorks). Therefore my programming time (especially time devoted to algorithmic optimizations) is our major bottleneck.

    Our customers are pretty comfortable with just about anything because real programmers are used to working in heterogeneous environments. Some of them have said "nothing but C is fast enough", but others have actually profiled our code in production environments and found it faster than some of the competition written in C. (Note, I'm not saying this is due to Java vs. C, but rather programmer skill and time.)

    Speaking of programming language flame wars, the "Shootout" to which you link is completely bogus as an evaluation of real-world programs. I actually tried to submit improved Java code to them and it was rejected as "not being enough like the C code" or "not natural enough Java"!!!

  12. I use Lua and am really happy with it. (Shameless plug: the Torch toolbox is written in Lua with a C back-end.)

  13. Janne Sinkkonen, 08 April 2009, 00:10

    You forgot Clojure. I just implemented a modification of LDA with it, and it doesn't feel bad at all.

  14. For any Haskellers or potential Haskellers in the audience, I thought I might plug the Haskell NLP community, with mailing list, wiki and darcs repositories.

    It's still very young, but I hope this will grow up into something more organised. For example, as a community we could figure out if there are any particular barriers to more NLP work being done in Haskell and crank out the requisite libraries on hackage...

  15. Java is best for large NLP projects, as there are lots of libraries and the language offers good tool support for large projects (Eclipse, unit testing, JavaDoc, coverage, etc.).

    C++ is unbeaten for speech recognition and other signal processing work (and of course HTK/ATK are in the C++ world).

    For machine learning experiments, a mix of bash, Perl, Python, R, Matlab, Gnuplot works best.

    The funny thing is that PROLOG and LISP (hey, you didn't spell Scheme correctly; now you owe Abelson & Sussman a beer!) are now less useful for serious NLP than FORTRAN 90, because the latter is better for statistics, and that's where the wind blows...

    I'm looking into Scala at the moment; it looks exciting and compiles to the JVM (and I hear Twitter is being partly ported from Ruby to Scala).

    You will be disappointed to hear that I've had one bad experience with Haskell, because there are (almost) no people around who are good at programming and NLP and who also know Haskell (the code was eventually ported to Java).

    One poster said that C# was language-agnostic, but it's Windows-specific (Mono notwithstanding), not standardized (unlike C/C++), and not much different from the Java platform from which it was, erm, "inspired". If you use UIMA, you can (after some work) also get to a setup in which your components can be built in Java, C, C++, Perl and Python (thanks to SWIG).

    One thing I will never understand is why linguists, who you'd think would have a sense for the aesthetics of language, don't create their own "small" (domain-specific) languages for building NLP systems.

    Best
    Jochen Leidner

  16. >> not standardized

    It IS standardized through ECMA. Java, however, is not standardized.

    >> not much different from the Java platform

    Um, that's complete BS, if you consider the latest iteration of the language. Reified generics, functional programming, built-in iterators, properties and LINQ won't appear in Java for another few years. These days the tables have turned and Java gets a lot of its "inspiration" from C#.

    The rest of your post was quite valid, you just undermined your own credibility towards the end there.

  19. Any plans to release the results of this poll?
