Main Page - NLTK
NLTK — the Natural Language Toolkit — is a suite of open source Python modules, data and documentation for research and development in natural language processing. NLTK contains Code supporting dozens of NLP tasks, along with 30 popular Corpora and extensive Documentation including a 360-page online Book. Distributions for Windows, Mac OSX and Linux are available.
I ran across this the other day, and I was just blown away by it. I’m a big fan of Python (and use it whenever I get a chance professionally) as well as being interested in linguistics, so being able to write Python code to manipulate this stuff was just…so…cool. It’s startling how much time I can waste just counting things in the provided corpora. I’m really looking forward to playing with it more, and using it to teach myself about Natural Language Processing as well as new bits of Python that I haven’t fully come to grips with yet (e.g. the generator functions). It really pushes all my ooooh! Shiny! buttons.
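For anyone wondering what "counting things" with a generator function looks like, here is a minimal sketch using only the Python standard library. The sample text and the `tokens` helper are made up for illustration and stand in for one of the NLTK corpora; this is not NLTK's own API, just the general idea.

```python
import re
from collections import Counter

# Hypothetical stand-in for a corpus; a real NLTK corpus would be much larger.
SAMPLE_TEXT = (
    "The quick brown fox jumps over the lazy dog. "
    "The dog barks; the fox runs away."
)


def tokens(text):
    """Generator function: lazily yield lowercased word tokens one at a time."""
    for match in re.finditer(r"[A-Za-z']+", text):
        yield match.group().lower()


# Counter consumes the generator without building an intermediate list.
word_counts = Counter(tokens(SAMPLE_TEXT))

# Show the three most frequent words with their counts.
print(word_counts.most_common(3))
```

Because `tokens` yields one word at a time, the same pattern should scale to much bigger texts without holding every token in memory at once.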
I’ve successfully installed it and got it to run some of the examples (including the graphing demo) on my Windows box and my Mac OS X laptop. As is unfortunately typical, although it seems to run better on the Mac, installing it on Windows was actually quite a bit easier. Yeah, yeah, pre-compiled binaries are the work of the devil, but for Joe end-user, even a relatively sophisticated Joe end-user, it’s a pain to get partway through, realize that it won’t compile because it’s missing some compilers, have to go get the developer toolkit and install that, and then get back to building the thing you were trying to get to work in the first place. I didn’t have any particular problems on the Mac, other than the time it took to download 183 MB of stuff that I’ll probably barely use, but it was pretty painful compared to double-clicking on an exe and then clicking a couple of buttons. I’m sure hard-core Mac users are so used to the pain that they don’t even perceive it as pain, and I’m certainly not saying that the developers of the NLTK ought to devote any time at all to improving the installation. But if you’re thinking of playing with this, you aren’t a hard-core Mac user, and you have access to both kinds of system, I’d suggest trying it in Windows first, at least to get your feet wet. You’ll be messing around with the actual NLTK code a lot faster.