Archive for November 6th, 2007

Nerd Pr0n

Main Page - NLTK

NLTK — the Natural Language Toolkit — is a suite of open source Python modules, data and documentation for research and development in natural language processing. NLTK contains Code supporting dozens of NLP tasks, along with 30 popular Corpora and extensive Documentation including a 360-page online Book. Distributions for Windows, Mac OSX and Linux are available.

I ran across this the other day, and I was just blown away by it. I’m a big fan of Python (and use it whenever I get a chance professionally) as well as being interested in linguistics, so being able to write Python code to manipulate this stuff was just…so…cool. It’s startling how much time I can waste just counting things in the provided corpora. I’m really looking forward to playing with it more, and using it to teach myself about Natural Language Processing as well as new bits of Python that I haven’t fully come to grips with yet (e.g. the generator functions). It really pushes all my ooooh! Shiny! buttons.
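For a flavor of the "counting things" mentioned above, here is a minimal sketch that uses a generator function and only the Python standard library, not NLTK itself. The sample text and the `words` helper are made up for illustration; a real NLTK corpus would be far larger and would be loaded through the toolkit's own corpus readers.

```python
import re
from collections import Counter


def words(text):
    """Generator function: yield lowercase words one at a time."""
    for match in re.finditer(r"[a-z']+", text.lower()):
        yield match.group()


# Hypothetical stand-in for a corpus, just to have something to count.
SAMPLE = "The cat sat on the mat. The dog sat on the log."

# Counter consumes the generator lazily, one word at a time.
counts = Counter(words(SAMPLE))
print(counts.most_common(3))
```

Because `words` yields one token at a time, the same loop should work on a much bigger text without building an intermediate list first.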

I’ve successfully installed it and got it to run some of the examples (including the graphing demo) on my Windows box and my Mac OS X laptop. As is unfortunately typical, although it seems to run better on the Mac, installing it on Windows was actually quite a bit easier. Yeah, yeah, pre-compiled binaries are the work of the devil, but for Joe end-user–even for a relatively sophisticated Joe end-user–it’s a pain to get partway through, realize that it won’t compile because it’s missing some compilers, have to go get the developer toolkit and install that, and then get back to building the thing you were trying to get to work in the first place. I didn’t have any particular problems, other than the time it took to download 183 MB of stuff that I’ll probably barely use, but it was pretty painful compared to double-clicking on an exe and then clicking a couple of buttons. I’m sure hard-core Mac users are so used to the pain that they don’t even perceive it as pain, and I’m certainly not saying that the developers of the NLTK ought to devote any time at all to improving the installation. But if you’re thinking of playing with this, you aren’t a hard-core Mac user, and you have access to both kinds of system, I’d suggest trying it on Windows first, at least to get your feet wet. You’ll be messing around with the actual NLTK code a lot faster.

Tuesday, November 6th, 2007

Adieu, OED Online

Parting is such sweet sorrow. I love the OED Online, and I don’t quite know what I’m going to do to replace it, but I don’t quite love it enough for $395 a year for an individual subscription. You folks with your fancy institutional subscriptions don’t know how good you’ve got it. So when I got the notice that my subscription was expiring and they were going to charge me for another year, I bit the bullet and canceled.

Actually, what really irks me is that I have the OED 3.1 on CD-ROM, but it stopped working one day when its asinine copy-protection scheme (which calls for relicensing the software every 90 days) screwed up and made the program unlaunchable.

Tuesday, November 6th, 2007

It lives again!

It’s been a little less than a year since I’ve posted to this blog, but it’s been a busy year for me. I met a woman, fell in love, proposed to her, bought a house with her, and moved in together (along with her teen-aged daughter). My beloved cat also died. During all that, I pretty much stopped blogging except for the occasional LJ post, but now that things are settling down again I’ve felt some of that old itch to write stuff down.

So here we are again… Once more into the breach, dear friends!

Tuesday, November 6th, 2007

Another day, another upgrade

Testing the latest WP bugfix release.

Tuesday, November 6th, 2007