Archive for the ‘Languages’ Category

Learning Japanese

The Foreigner - Japan: So You Want To Learn Japanese.

Cruel, but funny.

Just to show what a geek I am, the laughing girl spot illo is Yukino Miyazawa from _Kareshi Kanojo no Jijou_ by Tsuda Masami.

Wednesday, January 12th, 2005

Out of Little Eggcorns

Mighty weathervanes grow. Naked Translations, in a post on how to translate flip-flop into French, has a really neat discussion of the evolution of the French term for weathervane: girouette. A crucial step looks like it might have been an eggcorn. Wirewite became gyrouette, probably because the folk etymology of gire (turn) + rouette (little wheel) seemed more compelling than the word’s real ancestry: an Anglo-Norman borrowing of Norse veðrviti (weather + indicator).

From the same post:

Finally, this morning on Radio 4, I heard two men bicker over whether English spelling should be simplified or not. The one against it argued that a word’s spelling gives us a good idea of its etymology and origin, the other argued that a word’s spelling is actually often misleading (see girouette!). However, I think that if you simplify English, then you’ll lose any chance at all of knowing where a word comes from and what its relationship to other words is.

For what it’s worth, I agree. As a word nerd, I often find the advantages of simplified spelling overstated.

Monday, December 13th, 2004

The OED, now with more OE!

Old English in the OED - June 2002 Newsletter - Oxford English Dictionary

The revision of Old English material in the Third Edition will be thoroughgoing. Every single Old English quotation, whether already in OED or newly added, is being checked against the most recent reliable edition of the text, with new bibliographical details and additional context being given where appropriate. Dating of quotations has been radically revised, with NED’s assumed composition dates replaced by a simple threefold division of all pre-1150 quotations into ‘early OE’ (up to 950), ‘OE’ (950-1100), and ‘late OE’ (1100-1150), based firmly on manuscript dates as agreed by the most recent scholarship.

I’m thinking of ponying up for an individual subscription to the online edition as my holiday present to me this year. It’s spendy ($295), but oh-so-tempting.

Tuesday, November 30th, 2004

Plus Je Too Darn Lovey

I love songs that playfully quote other languages. “Plus Je T’Embrasse”, as performed by Blossom Dearie, is one such, containing the lovely line that’s the headline of this post and the only English in the song. Short phrases are what tickle me the most. Something like the Beatles’ “Michelle, Ma Belle” is somehow too earnest to really strike my fancy. Or maybe it’s just overexposure to the Beatles.

Anime theme songs are a fruitful source of this sort of thing. Probably J-Pop songs in general are, and I’m just not familiar enough to say for sure. But there definitely seems to be an unwritten law of anime lyrics that the song contain at least one phrase of English, not counting loan words like gaarufurendo or purezento. Or I think not counting them; modern Japanese is such an avid borrower from English that it’s hard for me to tell what’s been adopted and what’s just quoted for hipness. In general, I tend to assume that words that have been adapted to the Japanese phonetic system are mostly assimilated, and words where an effort is made to preserve the English pronunciation are intended to sound exotic, but I’m sure it’s not as straightforward as that for native speakers, particularly the younger generations. For instance, the original Japanese theme for Sailor Moon, “Moonlight Densetsu” (something like Moonlight Legend, or Legendary Moonlight), includes moonlight, midnight, weekend, and happy-end, all with (pretty much) English pronunciation, as opposed to “miracle romance”, which is adapted to the more native-sounding mirakuru romansu.

Monday, November 29th, 2004

The Case for Dropping Whom

Casey and Andy

’cause everybody likes a good cartoon about grammar

Friday, November 19th, 2004

Hapax Legomenon

A hapax legomenon is a word or phrase that occurs only once in a given corpus (usually an entire language, but sometimes a particular text, or the work of a particular author). They are often found in dead languages, but my friend badger may have found one in Spanish: parracial.

It apparently occurs in a poem by Pablo Neruda:

La parracial rosa devora
y sube a la cima del santo:
con espesas garras sujeta
el tiempo al fatigado ser:
hincha y sopla en las venas duras,
ata el cordel pulmonar, entonces
largamente escucha y respira.

It doesn’t appear in any of the Spanish dictionaries that she consulted (or in any of the online ones that I looked at), and a Google search turns up 7 hits: 2 hits to her blog mentioning her search, 2 hits to another blog referring to her blog, 1 hit to the poem itself, and 2 to an essay about Neruda.
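For the programmers in the audience: if you have a text on hand, listing its once-only words takes just a few lines. This is only a rough sketch. The function name and the sample sentence are made up for illustration, and a word that occurs once in one text is merely a candidate; a true hapax would have to be checked against a far larger corpus, as badger did with dictionaries and Google.

```python
import re
from collections import Counter

def hapax_legomena(text):
    """Return the words that occur exactly once in text, in order of first appearance."""
    # Lowercase, then keep runs of letters only (accented letters included).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    # dict.fromkeys drops repeats while preserving first-seen order.
    return [w for w in dict.fromkeys(words) if counts[w] == 1]

# Made-up sample sentence, not a quotation.
sample = "the rose devours the saint and the rose breathes"
print(hapax_legomena(sample))
```

Here "the" and "rose" repeat, so they are filtered out and only the once-only words are printed.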

Thursday, October 28th, 2004

The Language Museum

The Language Museum is an interesting little site that attempts to realize, as a website, the Language Museum proposed in Bodmer’s _The Loom of Language_. There you can look at word lists of various languages, organized by family, side-by-side (with English translations).

Thursday, October 28th, 2004

Cronaca: Hominid hearing & speech

Via Cronaca: Hominid hearing & speech:

Early humans evolved the anatomy needed to hear each other talk at least 350,000 years ago. This suggests rudimentary form of speech developed early on in our evolution.

Wednesday, June 23rd, 2004

That’s a big county

Looking at the MLA Interactive Language Map is fascinating. What a neat toy.

Just for yucks, I decided to look up Yiddish, and found some surprising concentrations. There doesn’t seem to be anywhere in the country that reports more than 99 Yiddish speakers (or maybe 199, the colors being very similar) until you drill down for a closer look. There you can see a few pockets in the up-to-19,999 range. For instance, Philadelphia has 2,922 Yiddish speakers.

One surprise was that the entire top of Maine was colored in. Changing the view to By Zip instead of By County shrank the top of Maine to a tiny dot. I’m picturing this one old Jew living on the rocky shore, contributing his Yiddish to the county score. Conclusion: Aroostook is one big-ass county, particularly for a New England state. Ayuh.

Thanks to Language Hat for the link.

Friday, June 18th, 2004

Machine Translation

As a programmer, moreover one who has made a stab at a program to inflect Latin words, I’m naturally quite interested in the field of machine translation. It’s one of those things, like playing chess, that just seems as if it ought to be tricky, but do-able, at least until you really start trying to do it. Also like chess, it seems that the most promising approach, in terms of building a successful machine rather than gaining a deeper understanding of how humans do it, is to take as much advantage as possible of what machines are good at doing: crunching numbers.

According to the old (possibly apocryphal) story, one of the first machine translation programs translated the saying “The spirit is willing, but the flesh is weak” to Russian, and then back as “The meat is good, but the vodka is rotten.”

Today, BabelFish gives us:
Дух охотно готов, но плоть слаба

Spirit is willingly ready, but flesh is weak

(I can’t read the Cyrillic at all, but the English isn’t bad compared to the story.)

Language Weaver represents an interesting approach to machine translation that departs radically from attempting to program a sophisticated model of the language’s underlying grammar. Following pioneering work by IBM and certain Japanese groups, it starts with a large corpus of texts in the source language that have already been translated by humans into the target language, and then crunches numbers to produce a set of probabilities mapping runs of words in the one to the corresponding runs in the other (the following is from the Language Weaver web site):

The USC/ISI research team led by Dr. Kevin Knight and Dr. Daniel Marcu has developed a new, statistical/cryptographic approach to the automatic translation of human languages. In contrast to current commercial machine translation systems, the statistical translation system uses techniques that automatically learn how texts can be translated from one language into another. All that Language Weaver’s statistics-based translation engine requires for “learning” is a large collection of sentence pairs that are mutual translations of each other. Language Weaver learns the translation patterns for every word and phrase in the training data. It can then use those patterns to translate new text of the same type.

The statistical basis of the translation engine, and its potential for commercial success, are analogous to the technology behind today’s commercial speech recognition systems. We believe that statistics-based automatic translation will be the breakout product in the automatic translation market just as it has been in speech recognition. These advances will change the nature of translation, and are a decisive step toward pervasive real-time conversion of textual information between languages.

This harkens back to an idea I’d run across before, which apparently originated with Warren Weaver: treat a foreign language as a code to break, except now with modern cryptanalysis methods and computing power.

“One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’”
- Warren Weaver, March 1947 (another interesting tidbit from their website, also referenced here)
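
To get a feel for what “crunching numbers” over sentence pairs can mean, here is a toy sketch in the spirit of the early IBM work (what the literature calls IBM Model 1). It is emphatically not Language Weaver’s engine, which the passage quoted above doesn’t describe in any detail. The three-sentence German-English corpus and the function names are my own inventions for illustration. The program is never told which word means what; it just redistributes probability over repeated passes until the co-occurrence patterns settle.

```python
from collections import defaultdict

def train_word_probs(pairs, iterations=10):
    """Estimate t(target word | source word) by expectation-maximization."""
    pairs = [(src.split(), tgt.split()) for src, tgt in pairs]
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Start out knowing nothing: every target word is equally likely.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected (source, target) pairings
        total = defaultdict(float)  # expected pairings per source word
        for src, tgt in pairs:
            for tw in tgt:
                # Split the credit for tw among the source words in this
                # sentence, in proportion to the current estimates.
                norm = sum(t[(sw, tw)] for sw in src)
                for sw in src:
                    share = t[(sw, tw)] / norm
                    count[(sw, tw)] += share
                    total[sw] += share
        # Re-estimate the probabilities from the expected counts.
        t = defaultdict(float, {(sw, tw): c / total[sw]
                                for (sw, tw), c in count.items()})
    return t

def best_translation(t, source_word):
    """Pick the target word with the highest estimated probability."""
    candidates = {tw: p for (sw, tw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)

# Hypothetical toy corpus: each pair is a sentence and its translation.
corpus = [("das haus", "the house"),
          ("das buch", "the book"),
          ("ein buch", "a book")]

table = train_word_probs(corpus)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(table, word))
```

Because “das” shows up with “the” twice and “buch” with “book” twice, those pairings soak up probability first, which in turn leaves “house” for “haus” and “a” for “ein”. A real system needs vastly more data and has to handle word order and multi-word phrases, which this sketch ignores entirely.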

Monday, June 7th, 2004