22 March 2009

Programming Language of Choice

Some of you know that I am (or at least used to be) a bit of a programming language snob. In fact, on several occasions, I've met (in NLP or ML land) someone who recognizes my name from PL land and is surprised that I'm not actually a PL person. My favorite story is from after teaching machine learning for the second time, when I had Ken Shan, a friend from my PL days, visit. I announced his visit and got an email from a student who had taken ML from me saying:


I _knew_ your name was familiar! I learned a ton about Haskell from your tutorial, for what it's worth. Great read back in my freshman year in college. (Belatedly) Thanks for writing it!

And it's not like my name is particularly common!

At any rate (and, admittedly, this is a somewhat HBC-related question) I'm curious what programming language(s) other NLP folks tend to use. I've tried to include a subset of the programming language shootout list here that I think are most likely to be used, but if you need to write in another, feel free to do so in a comment. You can select as many as you would like, but please try to vote only for those that you actually use regularly, and that you actually use for large projects. E.g., I use Perl a lot, but only for O(100)-line programs... so I wouldn't select Perl.


What programming language(s) do you use for large-ish projects?
C/C#/C++/Objective-C
D
Eiffel
Erlang
F#
Haskell
Java
Lisp
Matlab
OCaml/SML/ML
Perl
Python
R
Ruby
Scala
Sheme
Smalltalk
Other
  

29 comments:

Anonymous said...

You're entering dangerous terrain, possibly starting a programming language war. My experience is that most people who think programming language X is stupid do think so because they do not understand language X.
Sometimes it is also the case that somebody does not grasp field Y, so language X seems evil. In my opinion this is the case for software design and Java. It's impressive how many people try to produce actual software in Java without having any idea about its inherent software design mechanics.

hal said...

I'm hoping that we can all avoid the PL war debates as they're clearly a total waste of time :). (Incidentally, I basically agree with everything else you said.)

Anonymous said...

You just got me curious - what's java's "inherent software design mechanics"? :)

[personally I mainly use Python for nlp]

Yoav said...

I mostly use (and voted for) Python. But it is inherently horrifyingly slow for any ML-ish or number-heavy project. I am now trying to get to grips with Scala as a snappier replacement with some cool features and Java's library support, but it is too early to give any real judgement of it just yet.

I am also intrigued about Java's inherent software design mechanics.

Anonymous said...

What about Prolog?
I was surprised not to see Prolog on the list.
Personally, I use Java, but I was taught that Prolog is a typical NLP language; after all, it was built for machine translation.

Anonymous said...

I find Python + Java (via Jython) very convenient.

Steve said...

Hi Hal,

I'm surprised you lumped C# in with C and C++. C# is a Microsoft language that may have some syntactical similarities but in my opinion is used in very different realms. Though maybe it's different in the NLP world. Not a big deal, but C# would probably make sense as its own category.

David R. MacIver said...

One thing I've been finding recently is that a lot of NLP is actually very very suitable for unix style programming where you operate on streams of text. At this point the answer becomes much more flexible, because you end up with a lot of small components glued together with scripts.

e.g. I recently migrated a chunk of our code from Ruby + SQL to a couple of small Java programs + some standard unix utilities (written in C of course) + a mix of sed, awk and bash (this wasn't actually anything that NLP specific - it's a basic data munging application for calculating correlations between features). It has no trouble talking to the rest of the application because the interface to it is utterly language agnostic.

Anonymous said...

Is Sheme a new Shell/Scheme lovechild I didn't know of or simply a typo:)?

In follow-up to the comment about mixing C# with C/C++, not even C++ should be mixed with C.

Jurgen Van Gael said...

I'd take Steve's argument a step further: as an avid F# (and .NET) user, I wonder how many people know about the one big advantage of .NET: the whole platform is language agnostic, so you write in one language and use in another. My setup: a numerical library in C++, a wrapper in C#, my machine learning code in F# (go algebraic datatypes and higher order functions!) and my experiments scripted in (Iron)Python.

The right tool for the right job ...

Anonymous said...

The "right" answer to this question is highly task dependent. Do I need to invert a small dense matrix or a large sparse one? Do I have more data than will fit in memory? Do I need to deliver a working, threadsafe application to integrate into a web server? Am I teaching NLP to a class? Am I a theoretician who thinks in vectors and matrices? Is the programming exploratory or production-oriented? Are the algorithms pretty well known, or will there be a lot of fiddling?

I used to teach Prolog in our intro NLP class at Carnegie Mellon before the turn of the century (i.e. after the internet, before the web). I haven't touched it in years. It's great for backtracking search and matching with unification, and a pain for just about everything else because there's no random access.

I've been using R a lot lately to interface with BUGS, do simple vector/matrix calculations, and for graphics.

I cut my professional coding teeth on C -- we used it for all of the internal product code at SpeechWorks. And I worked on abstract machine implementations for Prolog-like languages in C, and some students I was working with at the time (Shuly Wintner and Yan Qu) never really forgave me for subjecting them to C.

I use Java now, not only because I need types and don't need to be tempted by pointers, but because I'm a relatively lousy programmer (compared to the really good pros who can write C faster than I can write Java -- see TopCoder, or visit SpeechWorks). Therefore my programming time (especially devoted to algorithmic optimizations) is our major bottleneck.

Our customers are pretty comfortable with just about anything because real programmers are used to working in heterogeneous environments. Some of them have said "nothing but C is fast enough", but others have actually profiled our code in production environments and found it faster than some of the competition written in C. (Note, I'm not saying this is due to Java vs. C, but rather programmer skill and time.)

Speaking of programming language flame wars, the "Shootout" to which you link is completely bogus as an evaluation of real-world programs. I actually tried to submit improved Java code to them and it was rejected as "not being enough like the C code" or "not natural enough Java"!!!

JasonWeston said...

I use Lua and am really happy with it. (Shameless plug: the Torch toolbox is written in Lua with a C back-end.)

Janne Sinkkonen said...

You forgot Clojure. I just implemented a modification of LDA with it, and it doesn't feel bad at all.

kowey said...

For any Haskellers or potential Haskellers in the audience, I thought I might plug the Haskell NLP community, with mailing list, wiki and darcs repositories.

It's still very young, but I hope this will grow up into something more organised. For example, as a community we could figure out if there are any particular barriers to more NLP work being done in Haskell and crank out the requisite libraries on hackage...

Mobile Searcher said...

Java is best for large NLP projects, as there are lots of libraries and the language offers good tool support for large projects (Eclipse, unit testing, JavaDoc, coverage, etc).

C++ is unbeaten for speech recognition and other signal processing work (and of course HTK/ATK are in the C++ world).

For machine learning experiments, a mix of bash, Perl, Python, R, Matlab, Gnuplot works best.

The funny thing is that PROLOG and LISP (hey, you didn't spell Scheme correctly - now you owe Abelson & Sussman a beer!) are now less useful for serious NLP than FORTRAN 90, because the latter is better for statistics, and that's where the wind blows...

I'm looking into Scala at the moment, which looks exciting and compiles to the JVM (and I hear Twitter is being ported from Ruby to Scala in part).

You will be disappointed to hear that I've had one bad experience with Haskell, because there are (almost) no people around who are good at programming and NLP and that know Haskell (code was eventually ported to Java).

One poster said that C# was language agnostic, but it's Windows-specific (Mono notwithstanding), not standardized (unlike C/C++) and not much different from the Java platform from which it was, erm, "inspired"; if you use UIMA you can (after some work) also get to a setup in which your components can be built in Java, C, C++, Perl and Python (thanks to SWIG).

One thing I will never understand is that linguists, who you'd think have a sense for aesthetics of language, don't create their own "small" (domain specific) languages for building NLP systems.

Best
Jochen Leidner

Anonymous said...

>> not standardized

It IS standardized through ECMA. Java, however, is not standardized.

>> not much different from the Java platform

Um, that's complete BS, if you consider the latest iteration of the language. Reified generics, functional programming, built-in iterators, properties and LINQ won't appear in Java for another few years. These days the tables have turned and Java gets a lot of its "inspiration" from C#.

The rest of your post was quite valid, you just undermined your own credibility towards the end there.

Ted said...

Any plans to release the results of this poll?
