Archive for the ‘software’ Category

Microsoft, sucking by design

    • Ars picked up this tidbit at the recent RSA 2008 security conference in San Francisco, where David Cross, Microsoft’s product unit manager for Windows security, discussed the company’s security directions post-Vista. "The reason we put UAC into the platform was to annoy users. I’m serious," Cross is quoted as saying.

Microsoft reasons that by annoying the users every time a program requires rights that MS thinks it shouldn’t, users will put pressure on developers to fix those programs.  This ignores the fact that users will, rightly, blame Microsoft and not the particular program for this misfeature, and that if they get annoyed enough, they’ll turn off the security entirely.  Even if the user thought to complain to the vendors of the program, and the vendor jumped right on doing something about it, the lag between the time it first started annoying the user (i.e. as soon as it was installed on Vista) and when a patch would be available to fix the "problem" would encourage the user to just turn the damn security feature off.  And then, if you don’t want to be nagged incessantly to turn it back on, you also end up turning off that warning too–which requires telling the Security Control Panel not to warn you about anything.  If Microsoft didn’t have the arrogant, overbearing culture that they do, they’d have designed it the way ZoneLabs designed their popup warnings about programs trying to do things that might be dangerous: allow the user to white-list the particular program if they know it’s safe, but re-inquire if something has changed about the program (indicating it might have been tampered with by a virus or trojan), and if you didn’t care about a particular class of warning message, disable just that message.

Tuesday, June 10th, 2008

The World’s Cutest Little Wiki

I’ve started using this for all my personal note-taking projects; the ability to just copy a page and have a new wiki that you can carry around on a thumb drive is nifty^2.

Friday, April 18th, 2008

OED CD-Rom on Vista

I finally got the OED working again on my new machine (which involved sending the original disks back to OUP-US so they would send me the new point release version that works under Vista, because there’s apparently no patch process).

So, yay.

Friday, February 15th, 2008

Browser Security

NoScript - JavaScript/Java/Flash blocker for a safer Firefox experience! - what is it? - InformAction

I don’t know if you care, but I’m reasonably paranoid about computer security, and this Firefox plugin is the best approach I’ve found.  It defaults to disallowing every form of scripting on any site you visit until you explicitly approve it (via a little toolbar button in the status line that shows you all the sites the page is trying to run scripts from), either temporarily or permanently.  It would be hard to get control more fine-grained than that and still be useful, though I do still use Adblock to block certain scripts from running on a site that I otherwise trust, just to block out ads that contain motion.

Note to advertisers if you happen to stumble across this:  I will never put up with any ad that contains anything that moves, blinks, or changes color.  As soon as  I see one of those, I not only block the offending ad, I block the entire provider.  If you want me to see your ad, you had better make it just sit there rather than demanding my attention.

Wednesday, January 23rd, 2008

Speaking of Transcription

If I were a journalist, a student, or really anyone whose job required taking notes on what people say, I would be all over this.  As it is I’m kind of wishing my job actually did require it, just so I could play with it.

Livescribe :: Smartpen

The Livescribe smartpen revolutionizes the act of writing by recording and linking audio to what you put on paper. Tap on words or drawings in your notes, and the smartpen replays recorded audio from the time you were writing. Transfer notes to your PC to backup, replay, and share them online.

The pen contains a computer and recorder that can record up to a hundred hours of audio at a time, and a little optical sensor that tracks the position of the pen against the tiny dots printed on special paper that you take notes on (according to at least one account I’ve read you can print your own paper with a laser printer, as well as buying it fairly cheap from the manufacturer).  It uses those dots to synch the audio recording against what you were writing at the time, so that clicking on the notes lets it replay the audio from around that time.  It’s no help if your note-taking (like mine in many of my college classes) consisted of just staring off into space or drawing random doodles, though I guess you’d at least still be able to listen to the lecture again, but if you’re at least semi-diligent about putting something as a mnemonic trigger on the page, this is so brilliant I can’t stand it.

By the way, the book that finally taught me how to take useful class notes was The Study Game: How To Play and Win with Statement-PIE.  Long out of print, it’s still remembered fondly (at least by me and five reviewers on Amazon) for its simple, concise, and practical approach to learning how to listen actively in order to organize your notes into paragraphs consisting of Statement, Proof, Information, Examples.  Most of the time, what a student needs is not a complete transcription of what the teacher said (the production of which generally takes up so much attention that there’s little left over to actually process what’s being said), but a summary of the key information.  Unfortunately, at least some of that time you really do need an accurate transcription, particularly of complex ideas that are new to you and so are hard to summarize.  That’s where being able to replay just that portion of the lecture with LiveScribe would be incredibly useful.  Yes, good teachers will do more than just rattle it off, and can provide a number of ways to convey and reinforce the crucial points, including putting them in class handouts, writing them on the board…but honestly, even at (or is it especially at) the college level, good teachers are few and far between, and even they are not generally bringing their A game to teaching Intro Calc at eight in the morning.

Saturday, December 1st, 2007

W00t! Sort of

Grousing about losing my access to the online OED (and a sleepless night due to a head-cold) led me to give another go at installing the CD-ROM version, and this time I discovered some helpful information on the Oxford University Press site, namely that the symptom I was seeing (launching and immediately exiting with no error message) was caused by a Microsoft security patch. Figures. Fortunately, MS had developed a hot-fix for this, and the OUP had a link. I installed it, and wonder of wonders, it actually worked and I had my lovely OED CD-ROM working on my desktop again.

So that’s the w00t.

The sort-of is because in the process I discovered that they’ve released two new point-releases of the CD-ROM since I bought it, which among other things allow it to download updates from their site, fix the printing problems, and remove the stupid, stupid relicense-every-ninety-days restriction. Which would be great, except there doesn’t seem to be any upgrade path from the v3.0 2004 disks that I have to the v3.1.1 2005 disks. So it appears that unless I want to buy it again, I’m stuck with the original retarded DRM. I’m going to dig around further, but at least in the meantime I can once again bask in the glory that is the OED.

update: It turns out there is an upgrade from 3.0 to 3.1.1 for $70, so I’ve ordered it.  Just never having to re-install the license is worth that to me, plus it appears that the 3.1.1 version is necessary to run reliably on Mac OS X under an emulator, which would be my ideal way to do it.  Pretty much the only thing I use my Windows machine for is to play City of Heroes and run a couple of other programs that only like Windows (until MS broke it, the OED CD-ROM was one of those).  Being able to carry the OED around on my Mac laptop would be a consummation devoutly to be wished.

Wednesday, November 21st, 2007

I was excited there for a minute

Amazon.com: Kindle: Amazon’s New Wireless Reading Device

Initially I misread the announcement and thought that the Kindle came bundled with the OED as its built-in dictionary. I would happily have paid $399 for the complete contents of the OED in something that weighed less than a paperback book that happened to also be able to wirelessly download 80,000 other books and store up to 200 of them at once, particularly since the OED on CD-ROM goes for about $236 (and if my experience is any guide will just stop working round about the 3rd license update). Unfortunately a second read reveals that it’s the far less exciting New Oxford American Dictionary that ships with the Kindle. The NOAD is the dictionary that comes bundled with Mac OS X, which is fine and all, but without the quotations and the date chart, it just isn’t the same. sigh

Monday, November 19th, 2007

More Nerd Pr0n

SAGE: Open Source Mathematics Software

General and Advanced Pure and Applied Mathematics
Use SAGE for studying a huge range of mathematics, including algebra, calculus, elementary to very advanced number theory, cryptography, numerical computation, commutative algebra, group theory, combinatorics, graph theory, and exact linear algebra.

Recently I’ve found myself having to do real mathematics for the first time in many years. Surprisingly, despite being a programmer of actuarial math calculations, there’s not a lot of call for solving equations; the algorithms seldom change. This year, though, we’ve been rushing to implement changes for the Pension Protection Act which have made things very much more complicated and changed the way we calculate benefits in a big way. And what I’ve found is that I’d forgotten a lot of what I once knew.

So I decided to refresh my memory of a lot of the math I once studied as an undergrad, and, being a nerd, one of the first things I did (after buying a couple of books) was to go fishing around for some software to play with. I was particularly interested in stuff that would let me model and graph equations, to try to regain some intuitive sense of their behavior, and I wanted something fairly easy to program. While it’s possible to write spreadsheets to validate the actuarial calculations I’ve been working on (and I have), it’s a bitch-and-a-half to read them again later or debug them. And if possible I wanted it to be open-source.

SAGE is what I found, and to my delight it’s yet another application that makes heavy use of Python. And, no, I didn’t go looking on a Python site to find these. Straight Google searches turned up both SAGE and the NLTK. It’s no coincidence that applications looking for a way to provide straightforward but powerful programming tend to be built on or with Python, but it wasn’t one of my search criteria.

Again I installed it on both my Windows and Mac boxen, and again the Windows installation was a bit more straightforward, though this time not by much. In the case of Windows, you have to first install a VMware player (free, but not open source) so that SAGE can run in its own virtual machine, and then you have to configure your firewall (I use ZoneAlarm) so that you can hit the web-server that SAGE runs (if you’re going to use the graphical interface, which is built as a web application). In the case of Mac, StuffIt repeatedly had problems opening the .tar.gz file and I ended up just downloading and unzipping and untarring it from the command-line; after that running the setup.py script was straightforward.

Once you have it set up, it’s a breeze to use. You can run it from a command prompt (in fact, you have to start it that way), but the most convenient way to use it is to run a “notebook” sub-application that sets up a web server; surf to that server on your localhost and you get a graphical interface (web-page) that lets you create and manage “notebook” pages–basically persistable interactive sessions. You can even upload these sessions to public instances of SAGE running on the internet (for instance at the University of Washington math department); in fact a great way to explore SAGE is to surf there, create an account and start playing around.

Like most really powerful pieces of software, there is a learning curve to using SAGE, and it’s a steeper curve if you don’t know any Python, but a lot of the most basic stuff (assigning variables, solving simple equations) is pretty much exactly what you expect. Tip: Type whatever you want to evaluate into the box on the screen (the boxes are for code). Press Shift+Enter to make SAGE evaluate what you’ve entered and create a new empty box below. To edit a box you’ve already evaluated, use your mouse to put the cursor in it. To insert a new empty box before an existing one, hover your mouse above the top border of that box until you see a bluish-purple line across the page, then click.

I’d also recommend starting with the SAGE Programming for Newbies link from the SAGE Documentation, even though it’s incomplete, rather than the SAGE Tutorial. The Tutorial jumps right into operator precedence, “rings” and other such minutiae without even stopping to explain what’s facing you at the prompt once you’ve completed the install. The SAGE Programming for Newbies is a much gentler introduction (and you can skip the parts that are too gentle, like “what is a computer”).

SAGE is also built to interface with other standard mathematics software packages, like Maple, Mathematica, MATLAB, and so forth, if that floats your boat. I don’t have access to them, but I can see how that would be useful.

What can I say? I find this kind of thing really, really cool.

Wednesday, November 7th, 2007

Nerd Pr0n

Main Page - NLTK

NLTK — the Natural Language Toolkit — is a suite of open source Python modules, data and documentation for research and development in natural language processing. NLTK contains Code supporting dozens of NLP tasks, along with 30 popular Corpora and extensive Documentation including a 360-page online Book. Distributions for Windows, Mac OSX and Linux are available.

I ran across this the other day, and I was just blown away by it. I’m a big fan of Python (and use it whenever I get a chance professionally) as well as being interested in linguistics, so being able to write Python code to manipulate this stuff was just…so…cool. It’s startling how much time I can waste just counting things in the provided corpora. I’m really looking forward to playing with it more, and using it to teach myself about Natural Language Processing as well as new bits of Python that I haven’t fully come to grips with yet (e.g. the generator functions). It really pushes all my ooooh! Shiny! buttons.
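To give a flavor of the counting, here is a minimal sketch in plain Python rather than NLTK itself, so it runs without downloading any corpora. The function names and the two-line sample "corpus" are made up for the example, and the tokenizer is written as a generator function, since that is one of the bits of Python I mentioned wanting to practice. NLTK's own FreqDist class covers the same ground far more thoroughly.

```python
import re
from collections import Counter


def tokens(lines):
    """Generator function: lazily yield lowercase words, one at a time."""
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            yield word


def singletons(counts):
    """Return the words that occur exactly once, in sorted order."""
    return sorted(word for word, n in counts.items() if n == 1)


# Hypothetical stand-in for a real corpus file.
sample = [
    "The quick brown fox jumps over the lazy dog.",
    "The dog barks; the fox runs.",
]

# Counter consumes the generator one word at a time.
counts = Counter(tokens(sample))
print(counts.most_common(1))
print(singletons(counts))
```

Because tokens() yields words lazily, the same code would work on a file object for a corpus too big to hold in memory as a list of words.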

I’ve successfully installed it and got it to run some of the examples (including the graphing demo) on my Windows box and my Mac OS X laptop. As is unfortunately typical, although it seems to run better on the Mac, installing it on Windows was actually quite a bit easier. Yeah, yeah, pre-compiled binaries are the work of the devil, but for Joe end-user–even for a relatively sophisticated Joe end-user–it’s a pain to get partway through, realize that it won’t compile because it’s missing some compilers, have to go get the developer toolkit and install that, and then get back to building the thing you were trying to get to work in the first place. I didn’t have any particular problems, other than the time it took to download 183 MB of stuff that I’ll probably barely use, but it was pretty painful compared to double-clicking on an exe and then clicking a couple of buttons. I’m sure hard-core Mac users are so used to the pain that they don’t even perceive it as pain, and I’m certainly not saying that the developers of the NLTK ought to devote any time at all to improving the installation, but if you’re thinking of playing with this and you aren’t a hard-core Mac user, I’d suggest that if you have access to both kinds of system, at least to get your feet wet you should try it in Windows first. You’ll be messing around with the actual NLTK code a lot faster.

Tuesday, November 6th, 2007

Hapax Legomena and Spam

It used to be that hapax legomena were mostly something of interest to linguists and other word-freaks–after all, what use is a word that doesn’t occur anywhere else (and thus is often of uncertain meaning)? Well, if you’re a spamming low-life, then with what passes for ingenuity among your kind you might think that if people start filtering out words like “viagra” and “cock” in the subject header, you could substitute “v1agra” or “cokc” and get your evil missives through. How can a list of disallowed words possibly guard against hapax legomena?

It turns out, though, that there are better ways to filter than against obvious “spammy” words like viagra and cialis. Paul Graham in A Plan For Spam wrote:

I think it’s possible to stop spam, and that content-based filters are the way to do it. The Achilles heel of the spammers is their message. They can circumvent any other barrier you set up. They have so far, at least. But they have to deliver their message, whatever it is. If we can write software that recognizes their messages, there is no way they can get around that.

Using a pseudo-Bayesian probabilistic approach, Graham’s plan calls for a user to train the filter by classifying each message in a corpus of mail received as spam or not-spam. Tim Peters, who worked on the Python implementation of this approach, called SpamBayes, took to calling the not-spam “ham” and the name seems to have stuck in the anti-spam community. The filter breaks all the messages apart into words (defined in this case as any run of text delimited by whitespace or punctuation) and then ranks the words as to their spamminess and hamminess (the extent to which the mere presence of the word in a message is a good predictor of whether the message is spam or ham). A weighted aggregate score is computed for all the words in the message, and the filter classifies it as spam, ham, or not-sure (roughly equal ham and spam scores). Because of the need to communicate, and in particular to get you to visit a web-page or click on a link to sell you stuff, for any given person certain words are found in almost all spam messages but in almost no real messages (e.g. “cheap” and “click”, or words with numbers in them). Words that are commonly found in both types of messages, such as your name, or articles and prepositions, end up with a middling score that basically doesn’t change the final result. The Graham approach has proved to be remarkably accurate, often getting no false positives or false negatives after only a week or two of training; I don’t think anyone has reported that they never get any unsures, no matter how much training is done, but that’s to be expected. Most spam when examined statistically over the whole body of the message leaves lots of clues to its spammy nature.
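For the curious, the train-then-score loop can be sketched in a few dozen lines of Python. This is a toy under loud assumptions, not SpamBayes: the class name, the tokenizer pattern, the clamping to 0.01 and 0.99, and the two cutoffs are all mine, and the combining rule is the simple product formula from Graham's essay rather than whatever SpamBayes actually uses.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into a set of lowercase tokens (assumed pattern)."""
    return set(re.findall(r"[a-z0-9$']+", text.lower()))


class TinyFilter:
    """Toy Graham-style filter; names and thresholds are illustrative."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """Record which tokens appeared in one hand-classified message."""
        if is_spam:
            self.spam_counts.update(tokenize(text))
            self.n_spam += 1
        else:
            self.ham_counts.update(tokenize(text))
            self.n_ham += 1

    def spamminess(self, token):
        """Estimate P(spam | token), clamped away from 0 and 1."""
        s = self.spam_counts[token] / max(self.n_spam, 1)
        h = self.ham_counts[token] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.5  # never seen before: neither spammy nor hammy
        return min(0.99, max(0.01, s / (s + h)))

    def score(self, text):
        """Combine per-token probabilities into one message score."""
        spam_product = 1.0
        ham_product = 1.0
        for token in tokenize(text):
            p = self.spamminess(token)
            spam_product *= p
            ham_product *= 1.0 - p
        return spam_product / (spam_product + ham_product)

    def classify(self, text, ham_cutoff=0.2, spam_cutoff=0.9):
        """Return 'spam', 'ham', or 'unsure' using assumed cutoffs."""
        s = self.score(text)
        if s >= spam_cutoff:
            return "spam"
        if s <= ham_cutoff:
            return "ham"
        return "unsure"


# Four made-up training messages, two of each kind.
f = TinyFilter()
f.train("cheap pills click here now", is_spam=True)
f.train("click now for cheap watches", is_spam=True)
f.train("lunch on friday with uncle ignatz", is_spam=False)
f.train("notes for the friday meeting", is_spam=False)

print(f.classify("click here for cheap pills"))
print(f.classify("friday lunch notes"))
print(f.classify("zyzzyva"))
```

Note that "for" turns up equally in both piles, so it scores 0.5 and drops out of the result, which is the middling-score behavior described above. A real filter would also need to guard against the products underflowing on long messages, which this sketch ignores.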

So where do hapax legomena come in with Bayesian spam filtering? They don’t. One of the truly nifty things about the scheme is that by definition, any word that has never been seen before counts as .5: neither spammy nor hammy, and has no effect on the ultimate rating of the message. So the spammer’s trick of making a visually similar hapax legomenon is foiled (as is the other trick of padding the message with words unrelated to the spam to try to lower the score–unless the words happen to be hammy for your particular corpus they won’t budge the score at all.) But as soon as the message is classified one way or the other, based on other clues in the message or by the user if the message rates an unsure, upon training that hapax legomenon becomes a clue for the type of message it is. So v1agra becomes at least a mild clue for spam, while once your uncle Ignatz writes you his name becomes at least a mild clue that the message is ham.
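The arithmetic behind "has no effect" is easy to check. Here is a minimal sketch, assuming the simple product rule from Graham's essay for combining per-word probabilities; the function name and the three word scores are made up for the illustration.

```python
from math import prod


def combine(probabilities):
    """Graham-style combination of per-word spam probabilities."""
    spam = prod(probabilities)
    ham = prod(1.0 - p for p in probabilities)
    return spam / (spam + ham)


known = [0.9, 0.8, 0.2]        # made-up scores for words seen in training
padded = known + [0.5] * 10    # the same message plus ten never-seen words

print(combine(known))
print(combine(padded))
```

Each 0.5 multiplies the spam product and the ham product by the same factor, so it cancels out of the ratio and the two scores agree. Padding only helps the spammer if the padding words are already hammy for your particular mail.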

The SpamBayes Project has a nice discussion of how this all works.

Thursday, October 28th, 2004