Flexibility and Ambiguity in Chinese

I was surprised to find a new appreciation for Chinese (the language, not the people) on this trip. I've never been a good student of Chinese; my last formal course in it was in middle school, and I almost failed that. I can understand Chinese fine, but I can't write it to save my life. Understandably, I tend to shy away from Chinese literature, even though I can read it without trouble. So it came as a surprise to find myself thinking about two aspects of the language I had never considered before.

The first thought is about how words are constructed. You might have heard that Chinese doesn't have an alphabet and instead combines radicals in some fashion. It is strange to think of letters and words in Chinese. I don't know how linguists classify the language (that is, IANAL: I Am Not A Linguist), but Chinese seems to jump straight from pen strokes to morphemes. Each radical (usually) has its own meaning, and often may be a "word" by itself. "Word" is in quotes because, while it is the smallest free-standing unit, it is often not sufficient to refer to objects. For example, "lion" may be written as 獅 by itself, but in isolation 獅子 is more often used. In this sense Chinese phrases look more like idioms, except that Chinese also has idioms proper, whose meanings are less transparent. Furthermore, each individual word may be used in multiple phrases, which gives that word by itself some flexibility in meaning.

What most textbooks don't mention, however, is how new words (in the combination-of-radicals sense) are often created. There is a common, but not universal, pattern in Chinese words, where one portion of the word gives the pronunciation and another portion gives the semantic association. These portions, especially those for semantics, are often radicals, but they can also be entire words. The word for lion, for example, 獅, contains the word for master, 師. Indeed, the two words are homophones, and the remaining radical, 犭, is often used in words related to animals. This compositional nesting is similar to the prefix and suffix system in English, which allows the creation of words like anti-dis-establish-ment-arian-ism. I suppose one could call it an infix system (the prime English example being abso-fucking-lutely), except that it's more specific than that. It allows authors to transcribe colloquial, spoken Chinese, which uses words and sounds that previously had no written form. One example is the Cantonese word for stuff, 嘢, which takes the traditional word (野) and adds the radical for mouth (口), marking it as a mostly spoken word. The expressive power of this system had escaped me until now.

The second source of appreciation comes from the ambiguous meaning of single words. The inspiration came from a restaurant in the Hong Kong Sheraton called 雲海. This translates directly to "sea of clouds", which is decently poetic, but not succinct enough to be a restaurant name. Then there's the matter of connotations. The word 雲, or cloud, can also mean a large amount at high density, usually referring to people (雲雲人海). Similarly for sea, 海, which also connotes unsurpassed depth and vastness. For two weeks, the question of how to properly translate this name popped into my head whenever I was bored. My best attempt, although it's still missing some of the connotations, is "rolling clouds". Of course, "rolling" here cannot be translated directly back into Chinese either. There's an art to translating more abstract concepts; it reminds me of the various translations of Jabberwocky, whose meaning is not so much written as connoted. For Chinese, I think, there's a power that comes from the ambiguity and flexibility of each word, giving every phrase a deeper connotation than it would otherwise have.

Displaying Calendars

I've talked about my small obsession with digital calendars before. I just want to mention that "logarithmic calendars" seem to be in vogue recently. Along those lines, I discovered the timeline widget from the MIT Simile project. It's a cool idea: the same data displayed in two separate views at different time scales. I also like how time is truly represented in a single dimension, allowing the user to scroll infinitely into the past or the future. Having played with it a little, my only complaint is that creating and syncing more than three such timelines really slows down the browser, which is unfortunate, as it would be cool to see events on the day, week, month, year, and decade scales simultaneously. I think for this to work there would also need to be a hierarchical classification of events. We definitely think of history this way (World War II is a time period, which can be further divided into battles, and each battle into smaller engagements and skirmishes), and there's literature suggesting that our brains organize our past experiences this way too (Conway, 1996. Autobiographical Memories and Autobiographical Knowledge). This makes me think of Gantt charts, but I have yet to see a good integration of Gantt charts with logarithmic calendars (although I haven't really looked).
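For the curious, here's a minimal sketch of one way a logarithmic timeline could place events, under my own assumption of what "logarithmic" means here: distance from the present is compressed with a log, so tomorrow and a decade ago both fit on one axis. The function name `log_position` and the `unit_days` parameter are made up for illustration; nothing here is taken from the Simile widget.

```python
import math
from datetime import datetime, timedelta


def log_position(event, now, unit_days=1.0):
    """Signed position of `event` on a log-compressed axis centred on `now`.

    Positive means future, negative means past. Distance grows like
    log(1 + |days| / unit_days), so one unit away lands near 0.69,
    ten units near 2.40, and a thousand units near 6.91.
    """
    delta_days = (event - now).total_seconds() / 86400.0
    return math.copysign(math.log1p(abs(delta_days) / unit_days), delta_days)


if __name__ == "__main__":
    now = datetime(2011, 11, 1)
    for label, days in [("tomorrow", 1), ("next week", 7), ("next year", 365), ("a decade ago", -3650)]:
        print(f"{label:>12}: {log_position(now + timedelta(days=days), now):+.2f}")
```

Nearby days stay spread out while distant years bunch together, which is the zoomed-in-near-now effect; it is a single axis rather than several synced bands.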

On My First Return to Hong Kong in Three Years

Potential titles for travel books on Hong Kong

  • Pollution (You Name It, We Got It)
  • Holy Shit, Chinese People!
  • What Personal Bubble?
  • Wrong Side of the Road, Dude!
  • ¿Hablas cantonés, mandarín e inglés? (Do You Speak Cantonese, Mandarin, and English?)

More seriously, some of the changes since the last time I was here (which was December 2008):

  • A lot of ads have Android/iPhone app icons, even Facebook links. QR codes don't seem to be quite as popular yet, although there's a similar system that seems local to Hong Kong.
  • All the steps on stairs seem really short. I can't tell if I've grown taller in the past three years, but I've noticed that I'm definitely above average height when on the trains. YES.
  • When I first got to the States I kept doing price conversions back to HKD. Now I do it the other way around, except that I have no clue what the baseline should be. 39 HKD seems instinctively more than 5 USD, but it's ultimately the same. Obviously, small numbers are inherently more likeable (since we see them more often). I wonder if there's literature on how the conversion rate influences spending...
  • I've lost a lot of my spatial memory of Hong Kong, even for places I used to visit weekly. On top of that, my parents' house has been remodeled, so it's a little harder to get around.
  • Whenever I've stayed in the States for a while and then come back, I become allergic to something in Hong Kong and always end up with a runny nose, sneezing constantly. This time I got away with a partially stuffed nose. I think being sick just before leaving somehow buffered whatever I was reacting to.

Font Fun

I'm heading to DC tomorrow for the AAAI Fall Symposium on Advances in Cognitive Systems. My paper was accepted as a poster, so the last two weeks were spent wrangling with beamer. One can only focus on LaTeX for so long, so to pass the time when I'm not sword fighting I decided to play around with some fonts. Can you guess the major web businesses that use the following fonts?

While I was making the poster, my advisor noted that people sometimes confuse the Michigan block M wordmark with the Missouri block M wordmark. So I looked up all the M states (turns out there are eight of them; my initial list left out Maine and Massachusetts) and collected their wordmarks for comparison:

University of Maine

University of Maryland

University of Massachusetts

University of Michigan

University of Minnesota

University of Missouri

Mississippi State University

Montana State University

Book Tracking

Those of you who follow me on Twitter will know that I recently closed my Shelfari account. The reason is their restriction on exporting my library: a new policy introduced in the last year requires your profile to be at least 90% complete before your library can be exported. This didn't use to be the case, and it understandably led to a lot of complaints (and the Shelfari staff never justified it). Getting up to that percentage required me to join a few groups (which I didn't want to do) and add "friends" (which I don't have... kidding! *sob*). I joined a few generic groups, but really didn't want to contact other people to be friends, and eventually gave up on that process. The upshot is that I closed my account entirely.

I've therefore set up a new account at Goodreads. It has worked well for me so far; I particularly like how tagging is done through a "hovering" dialog box, so I can very quickly move from one book to the next. In Shelfari this was done through a modal dialog, which required more clicks to do the same thing. In general Goodreads does a better job at user interface. There is a lot less AJAX crud, which makes pages load faster. I'm also keeping an eye on the recommendation system, although I don't have high hopes for it. I suspect the system is much more useful for fiction, where people read largely similar books, than for non-fiction, where it's boring to read about the exact same topic over and over again.

Anyway, in the process of switching to Goodreads I had to get my library out of Shelfari. Recall that I had given up on making Shelfari let me export my list. Instead, I loaded their list of my books and saved the HTML. I then wrote a quick script that extracted the authors and titles of the books. Ah, the advantages of being a programmer... Here I hit a snag: Goodreads allows users to import books, but only by ISBN. Shelfari includes ISBNs in its exported CSV file, but not on its normal display page. Luckily, I had a backup file of my library, so I had ISBNs for the majority of my books, but not all of them. For the rest, I used the Library of Congress' Search via URL service, which returns detailed book information given a title and an author... which I had! Putting everything together took about an hour, and manually verifying that the books were correct a little longer, but at the end I had moved my entire library with minimal loss of information. The other upshot is that I cleaned up my list a little, removing books I'm no longer interested in.
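The HTML-scraping and ISBN-matching steps might look something like the sketch below. It is not the original script: the class names "book-title" and "book-author" are placeholders for whatever markup the saved page actually used, and the ISBN backup is modelled as a plain dict keyed by lowercase title. Books that end up in `missing` are the ones that would still need a lookup, for example through the Library of Congress service.

```python
from html.parser import HTMLParser


class BookListParser(HTMLParser):
    """Collect (title, author) pairs from a saved book-list page.

    ASSUMPTION: titles and authors sit in elements whose class attribute
    contains "book-title" and "book-author"; real markup will differ.
    """

    def __init__(self):
        super().__init__()
        self.books = []      # completed (title, author) pairs
        self._field = None   # "title", "author", or None
        self._title = None   # title waiting for its author

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "book-title" in classes:
            self._field = "title"
        elif "book-author" in classes:
            self._field = "author"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "title":
            self._title = text
        elif self._title is not None:
            self.books.append((self._title, text))
            self._title = None


def split_by_isbn(books, isbn_backup):
    """Split books into those with an ISBN in the backup and those still missing one."""
    found, missing = [], []
    for title, author in books:
        isbn = isbn_backup.get(title.lower())
        if isbn:
            found.append((title, author, isbn))
        else:
            missing.append((title, author))
    return found, missing
```

Feeding the saved page to `BookListParser().feed(...)` and passing `parser.books` to `split_by_isbn` gives the list ready for a Goodreads import plus the short list that needs manual lookup.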

To make sure this wouldn't happen again, I checked the file that Goodreads exports. It had more information than Shelfari's, which was nice. What caught my eye was that, in addition to the ratings I gave my books, the exported spreadsheet also contained other readers' average ratings. This allowed me to make the following plot:

The x-axis is the average rating other Goodreads readers gave a particular book, while the y-axis is my own rating. Each red cross is one book, and the red line is a linear regression over these points. The blue line is y=x; that is, what the regression would look like if my ratings were exactly in line with the average reader's. As you can see, I have a slightly lower opinion of books in general, especially at the lower end of the scale. Qualitatively, my tastes agree with the average reader's, but the discrete ratings on my side make it hard to get a good regression.
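Fitting a line like the red one doesn't need anything fancy. Here's a minimal sketch of an ordinary least-squares fit; it assumes the export has columns named "My Rating" and "Average Rating" and that a "My Rating" of 0 means the book is unrated. Check both assumptions against your own export file before trusting the numbers.

```python
import csv
import io


def ratings_from_export(csv_text):
    """Return (average rating, my rating) pairs from a Goodreads-style CSV export.

    ASSUMPTION: the columns are named "Average Rating" and "My Rating", and
    a "My Rating" of 0 marks an unrated book, so those rows are skipped.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        mine = float(row["My Rating"])
        if mine > 0:
            pairs.append((float(row["Average Rating"]), mine))
    return pairs


def least_squares(pairs):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Made-up sample rows standing in for a real export file
    sample = "Title,My Rating,Average Rating\nA,2,3.0\nB,4,4.0\nC,0,4.2\nD,5,4.5\n"
    slope, intercept = least_squares(ratings_from_export(sample))
    print(f"mine = {slope:.2f} * average + {intercept:.2f}")
```

Comparing the fitted slope and intercept against the y = x line (slope 1, intercept 0) is what the plot does visually.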

PS. A book I'm reading that is not listed on my shelf is Donald Knuth's The TeXbook. One might expect a book about a pseudo-programming language for typesetting to be dry, but Knuth makes it pretty interesting.


Scattered Thoughts

In lieu of writing single posts about each of these topics (which would require willpower I don't have), I've decided to give one-paragraph abstracts of my thoughts. If any of them particularly interest you, though, I might spend the time to refine and lengthen them.

Computer Science Education

The gist of this train of thought (and it's a fricking freight train) is that computer science education should be mandatory much earlier on, in the same way that maths is. There are many arguments for this, the most powerful being that while computers are now ubiquitous, many people still view them as magic, aka "sufficiently advanced technology". Another powerful argument is that, like literature and mathematics and science, computer science teaches a different way of thinking. If science is the study of why (things are the way they are) and mathematics the study of what (the relationships between structures and their implications), then computer science is the study of how (to achieve a specific effect through many small steps). I sincerely think that computer science is the study of process, of the "how" of things; programs are merely a formal language for describing how to change things from one state to another. Since this is about computer science education, I also think that computer science is relatively easy to teach, precisely because computers are everywhere. Students don't need to wait for the teacher's validation - if their program works, it works! On a negative note, a recent paper suggests that not everyone can become a good programmer...

Mike Rowe's Testimony to Congress (transcript)

I heard about this from the Blogosaur, and another friend of mine agrees with her, but I have a slightly different opinion to share. I don't dispute that plumbers, welders, and other "dirty jobs" and skilled laborers are as necessary now as they were before, or that they are worthy of respect. What I disagree with is Rowe's suggestion that hard work is no longer valued, or the more blatant assertion that technology does not require hard work. Programmers at start-ups work no less hard than the skilled laborers of yesteryear, and these are the same people who would have been fascinated by metalwork and paved roads and suspension bridges 40 years ago. Rowe talks of people who don't know how to fix things, who are afraid to get dirty; how many computer owners know how to fix their computers? Do computer technicians get any more face time with their customers than plumbers? Market economics suggests that as wages for skilled labor increase, this gap will be filled. It might not be filled by Americans, but more likely, people will simply become more willing to go into those jobs. If plumbers really are as essential to our society as psychiatrists, is it so bad that we pay more for the latter? Isn't that, in itself, a reflection of our culture's valuation of plumbers?

PS. This goes nicely with my thoughts.

Finite and Infinite Games (wikipedia)

I read this on a friend's suggestion a few years ago. I recently found my notes on it, reported back to my friend, and have since thought a lot more about its subject. The big takeaway for me is that you can only lose a game if you are playing it, and you can only play a game if you willingly join it. A corollary is that if you're losing a game, you can always decide to play a larger game instead, until you play the largest game of all (aka life), in which no one can lose. It supports something I've come to firmly believe: if you don't like something, either change it or change yourself. It also fits into the "Ha Ha Only Serious" hacker mindset, and explains why I derailed my philosophy class onto unicorns for 15 minutes. But that's a different story.

Teach For America (wikipedia)

Some of you may know that I had planned on Teach For America as my backup in case I wasn't accepted into any grad schools. I abandoned my application when I was accepted into Michigan, but I also did a little more research, primarily by reading Donna Foote's Relentless Pursuit. Hindsight is 20/20 and this may sound like sour grapes, but I have philosophical disagreements with TFA's approach. While their goal of educational equality is commendable, I don't think having a bunch of recent graduates teach for two years is the best way to achieve it. This approach is inherently transient, despite the stated hope that TFA fellows will go on to impact education at the policy level. Another stated assumption, that good leaders will be good teachers, may also be unjustified, and that's without even considering whether the people they hire are good leaders. I personally believe that to become a good teacher one must first know and love the subject, which is more than I can say for most graduates. That applications to the program have surged in recent years suggests to me that many people see it not as an opportunity to solve the education problem, but merely as a line on their resume (assuming, of course, they make it through the program). Undoubtedly, my views would be different if I had gone through the program, but I would like to think that I've done enough teaching outside of TFA to know that I like it.

Circular Logic - "glow"

The NYTimes has a regular column on math puzzles. I don't usually look at them, but when I do I prefer Dos Equis... what? Oh yes, Numberplay. Most of the time I can't be bothered to figure out the answer, but one of the questions this week happens to be computationally easy. The question is:

Consider the word "glow." If you replace each letter with its counterpart in a mirror alphabet you will get the legitimate word "told." What other words exhibit this same property?

So I wrote a little script in Python:

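#!/usr/bin/env python3
import re

if __name__ == "__main__":
    words = set()
    # Load the system word list, keeping only lowercase words of 2+ letters
    with open("/usr/share/dict/cracklib-small", "r") as src:
        for word in src:
            word = word.strip()
            if len(word) <= 1 or re.search("[^a-z]", word):
                continue
            words.add(word)

    # Mirror each letter (a<->z, b<->y, ...): ord("a") + ord("z") == 219
    for word in sorted(words):
        mirror = "".join(chr(219 - ord(c)) for c in word)
        if mirror in words:
            print(word, mirror)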

This script uses the computer's dictionary file (which I've used before), mutates the letters, then checks if the result is in the dictionary. The script outputs:

all   zoo
ark   zip
art   zig
blip  york
de    wv
dr    wi
drib  wiry
elm   von
era   viz
err   vii
fir   uri
fm    un
ge    tv
girl  trio
girt  trig
glib  tory
glow  told
gm    tn
gs    th
hob   sly
hold  slow
holt  slog
holy  slob
horn  slim
ir    ri
irk   rip
iv    re
ivy   reb
levi  over
low   old
lug   oft
md    nw
me    nv
mix   nrc
mn    nm
mrs   nih
ms    nh
nh    ms
nih   mrs
nm    mn
nrc   mix
nv    me
nw    md
oft   lug
old   low
over  levi
re    iv
reb   ivy
ri    ir
rip   irk
slim  horn
slob  holy
slog  holt
slow  hold
sly   hob
th    gs
tn    gm
told  glow
tory  glib
trig  girt
trio  girl
tv    ge
un    fm
uri   fir
vii   err
viz   era
von   elm
wi    dr
wiry  drib
wv    de
york  blip
zig   art
zip   ark
zoo   all

As a sanity check, notice that "glow" does indeed turn into "told" (and vice versa).

Problem solved in 10 minutes.
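Since the mirror is symmetric, every pair shows up twice in the list above. A small variant, sketched here with `mirror_pairs` as an illustrative name, builds the a-z mirror once with `str.maketrans` instead of the `219 - ord(c)` arithmetic and reports each pair only once:

```python
import string

# Translation table mapping a->z, b->y, ..., z->a
MIRROR = str.maketrans(string.ascii_lowercase, string.ascii_lowercase[::-1])


def mirror_pairs(words):
    """Return each mirror pair once, as an alphabetically ordered tuple."""
    words = set(words)
    pairs = set()
    for word in words:
        mirror = word.translate(MIRROR)
        if mirror in words:
            pairs.add(tuple(sorted((word, mirror))))
    return sorted(pairs)


if __name__ == "__main__":
    for left, right in mirror_pairs(["glow", "told", "zoo", "all", "cat"]):
        print(left, right)
```

No letter is its own mirror (the alphabet has an even number of letters), so a word can never pair with itself and every pair really does appear exactly twice in the original output.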

PS. I would have commented on the post, but I have no clue what my NYTimes password is.


I read a book recently (or maybe it was the internet; I don't remember) where the author talked about how history has been male dominated and asked the question, "what happened to herstory?" Well, if you want to play that game...

I used a short regex to find all the words in my computer's dictionary file that start with "his", "man", and "male". I picked a few and systematically replaced those prefixes with "her", "woman", and "female", with slight hand-tuned adjustments for spelling. Here are some particularly funny ones (with commentary).

NOTE: Before you yell at me to say that not all feminists do things as pointless as wordplay, I know. I'm using the word "feminist" (and related terms) below to refer to those who do play these games.

  • femalefactor - female criminals; also, the critical element behind every successful man
  • femalevolent - things like the silent treatment; also, witches
  • herpanic - what Spanish women would do if they found out about this change
  • herred - past tense of the sound female snakes make
  • womanager - your female boss
  • womanatee - another name for baby tees
  • womandrake - a shapely plant
  • womaneuver - the special way females handle vehicles
  • womangled - what every woman's hair looks like when she wakes up
  • womanhole - *ahem*
  • womania - what Freud and folk psychology called "hysteria" (hersteria?)
  • womanifold - laundry; not to be confused with "womanifolds"
  • womanipulate - actually, this is the etymology of the word "manipulate"
  • womanservant - subject of a lot of male fantasy
  • womanslaughter - it seems like the unmodified version fits the feminist movement better

We can, of course, go further. See if you can identify the original word for these:

  • abdowomen - the medical name for a pregnant belly
  • accompaniwoment - another word for escort
  • antidisestablishwomentarianism - the longest word in the English language
  • Archeredes - a virtually-unknown female ancient Greek mathematician
  • ewomancipation - the process of women obtaining political rights and equality under the law
  • hashersh - why feminism feels good
  • homomorphersm - another word for lesbianism
  • husbgyny - the domestication of women
  • hywoman - a membrane around the penis which breaks on first sexual intercourse
  • multidiwomensional - the idea that women cannot be rated on a single scale
  • portwomanteau - a famous species of wine grape
  • rowomantics - the behaviors of the female crew team
  • serapher - another word for angel
  • whersical - this list
  • womenstruate - actually, I have no clue what this means

I'm all for feminism, but really: there are more important things to talk about than sewomantics. Also, study etymology.

PS. Also see the ad feminam logical fallacy.


Twitter Wordle

A summary of my tweets (created with Wordle):

A poor excuse for not writing, I know. I'm hoping to have something up soon, on computer science education.

Some Trends

I've been playing with graph creators lately. No, not the mathematical graphs... well, I mean... the other type of mathematical graphs (aka plots). In particular, I was trying my hand at gnuplot, and of course I needed data. Where better to get it than my own journal?

The x-axis shows time; each point is the average over six months, i.e., the first or second half of a year. The y-value is the percentage of entries in that period which contain the given word (or variants thereof; for example, the Twitter plot also includes the word "tweet"). This is all generated programmatically: I extended my journal script to calculate the percentage of entries containing given terms and output it in table (space-separated value) form, which gnuplot then reads. Since the data is highly specific, my gnuplot script is also particular to this application. I abused gnuplot's ability to skip non-existent columns and had it plot columns 2 through 8 regardless of how many terms a data file actually contains. The source is below:

#!/usr/bin/env gnuplot

set terminal png

set title "Prevalence of Term over Time" font "Arial,16"

set border 3

set xdata time
set timefmt "%Y-%m"
set format x "%Y"
set xlabel "Date" font "Arial,12"
set xtics nomirror rotate by -45
set mxtics 0

set ylabel "Percent Entries with Term" font "Arial,12"
set yrange [0:100] 
set ytics 10 nomirror
set key autotitle columnheader enhanced reverse outside font "Arial,12"
set style data lines # linewidth 2 ?
set grid

# plot [raw_data] using [x-col]:[y-col] [attributes]...
plot "data.csv" using 1:2 linewidth 2, \
             "" using 1:3 linewidth 2, \
             "" using 1:4 linewidth 2, \
             "" using 1:5 linewidth 2, \
             "" using 1:6 linewidth 2, \
             "" using 1:7 linewidth 2, \
             "" using 1:8 linewidth 2

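The journal script itself isn't shown, but the half-year bucketing it does could be sketched roughly as follows. The assumptions are mine: entries are (date, text) pairs, a term counts if any of its variants appears as a case-insensitive substring, and periods are labelled by their first month (YYYY-01 or YYYY-07) so that the `timefmt "%Y-%m"` setting above can read them. Function names and sample entries are hypothetical, and term names must not contain spaces, since they become column headers.

```python
from collections import defaultdict
from datetime import date


def half_year_label(day):
    """Label a date by the first month of its half-year: YYYY-01 or YYYY-07."""
    return f"{day.year}-{'01' if day.month <= 6 else '07'}"


def term_prevalence(entries, terms):
    """Percent of entries per half-year that mention each term.

    entries: iterable of (date, text) pairs.
    terms: dict mapping a column name to a list of variants,
           e.g. {"Twitter": ["twitter", "tweet"]}.
    Returns a list of (label, [percent for each term]) sorted by label.
    """
    names = list(terms)
    totals = defaultdict(int)
    hits = defaultdict(lambda: [0] * len(names))
    for day, text in entries:
        label = half_year_label(day)
        totals[label] += 1
        lowered = text.lower()
        for i, name in enumerate(names):
            # Case-insensitive substring match on any variant of the term
            if any(variant.lower() in lowered for variant in terms[name]):
                hits[label][i] += 1
    return [
        (label, [100.0 * count / totals[label] for count in hits[label]])
        for label in sorted(totals)
    ]


def to_table(rows, names):
    """Space-separated table with a header row, for gnuplot's columnheader option."""
    lines = ["Date " + " ".join(names)]
    for label, percents in rows:
        lines.append(label + " " + " ".join(f"{p:.1f}" for p in percents))
    return "\n".join(lines)


if __name__ == "__main__":
    terms = {"Reddit": ["reddit"], "Twitter": ["twitter", "tweet"]}
    entries = [
        (date(2010, 2, 1), "Spent the evening on Reddit."),
        (date(2010, 3, 1), "Climbing today."),
        (date(2010, 8, 1), "Tweeted a link from reddit."),
    ]
    print(to_table(term_prevalence(entries, terms), list(terms)))
```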
In the future I'm considering writing a script that generates gnuplot scripts. It'd be easier than remembering all these ugly commands.
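A generator like that could start as small as this. It's a hypothetical sketch (the function name and the subset of settings kept are my choices) that writes a stripped-down version of the script above for any number of series, so the column count no longer has to be hard-coded.

```python
def gnuplot_script(data_file, n_series, title="Prevalence of Term over Time"):
    """Return a gnuplot script plotting columns 2..n_series+1 against column 1."""
    settings = [
        "set terminal png",
        f'set title "{title}" font "Arial,16"',
        "set xdata time",
        'set timefmt "%Y-%m"',
        'set format x "%Y"',
        'set ylabel "Percent Entries with Term"',
        "set yrange [0:100]",
        "set key autotitle columnheader",
        "set style data lines",
        "set grid",
    ]
    series = []
    for col in range(2, n_series + 2):
        # The first series names the file; the rest reuse it via gnuplot's "" shorthand
        source = f'"{data_file}"' if col == 2 else '""'
        series.append(f"{source} using 1:{col} linewidth 2")
    plot = "plot " + ", \\\n     ".join(series)
    return "\n".join(settings + [plot]) + "\n"


if __name__ == "__main__":
    # Print the script; redirect it to a .gp file and run gnuplot on that
    print(gnuplot_script("data.csv", 3), end="")
```

Because `set terminal png` is given without `set output`, gnuplot writes the image to standard output, so something like `gnuplot plot.gp > plot.png` should work.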

And here are the resulting plots. The first one is on my use of online services. You can clearly tell when I started using Reddit and how quickly it dominated my life. The growth of Twitter is much slower by comparison. Facebook is huge mostly because it's my main source of news on how other people are doing.

Next are some hobbies of mine, at least the ones I can think of off the top of my head. It's obvious when I started climbing more often (and therefore better); it's also clear that whenever I teach at CTY, it becomes a big part of my life. As a computer science major and now a grad student, I clearly code more than the chart shows; one could chalk that up to habituation, I guess.

Here's another one, on academic topics. These were all topics I considered studying in college. I eventually majored in computer science and got a certificate in engineering design, but took extra classes in all these subjects. I think it's neat that these topics have all been mentioned more often over time, meaning I'm still into the same subjects I was in high school and college... except design, which I haven't done any of since junior year.

I would love to make more of these, but I can't think of any more sets of terms to compare. Comment if you can.

Personal Responsibility

Author's note: I know this is pretty bad writing, but I've been struggling to put this idea down in words and this is the best I could do.

On the bus today I overheard someone say they were only taking 12 credits, followed by "but it's not my fault I'm only taking 12 credits." That made me angry. I've been thinking about the idea of personal responsibility for the past month (and trying, unsuccessfully, to write about it), and that kind of excuse is the last thing I want to hear. The main idea is that people are responsible for themselves no matter the circumstances. In psychology we recently read Neil Postman, who wrote that TV and newspapers are destroying people's capacity to understand longer pieces of language, a capacity that was more common back when people listened to 7-hour speeches and debates (the era the Lincoln-Douglas debates came from). But it's not the case that books don't exist anymore, just that TV and newspapers, with their short, out-of-global-context pieces, are much more attention-grabbing and much less attention-requiring. If people want to, they can still choose to read, but they don't. To blame the media for this is to take responsibility away from the individual, over what the individual could control but chose not to. Either you accept that you made the choice and live with it, or you get out. To say you didn't have a choice is to say you're not in charge of your life - and in that case, you deserve all the misery you feel.

I don't know when I acquired this sense of individual responsibility (there are traces of my high school existentialism class in it), but recent news events have made me think about it more often. There was the WikiLeaks diplomatic cable leak, where I believe the intelligence analyst Bradley Manning should be court-martialed but that WikiLeaks has committed no crime. The justification is that Manning signed a contract and swore an oath to keep data confidential, and leaking the cables violated that contract and broke his word. WikiLeaks made no such promise, and their actions are at worst inappropriate or distasteful, but in no way illegal. There was the case of two people who exploited a bug in casino video poker machines to win jackpots. They are being sued by the casino for fraud, but, I assume, the developers are not suffering any consequences. This case is particularly egregious because at a casino you're expected to try to win. You may be thrown out or barred for counting cards (which, again, casinos can legally do because they are private property, although look at Atlantic City), but counting cards is not against the law. How is exploiting a bug in video poker any different? It is the developers of the machine who are ultimately responsible. If there were no way to load poker in the house's favor, casinos wouldn't offer it; in this case, the poker machine was simply faulty.

There's the case of a stalker who used information from Facebook to guess the security questions on people's email accounts, then blackmailed them with explicit photos he found there. I'm not saying the stalker is guilt-free; he broke the law, and he chose to blackmail these people. But the victims were also naive to use such easily answerable security questions, not to mention taking and keeping explicit photos of themselves.

In my mind I extend this responsibility of choice to a person's entire life course. To think that people didn't know dropping out of school would result in a hard life is absurd. It's not that dropouts always end up nowhere (look at Bill Gates and Steve Jobs), but that people chose that course and later complain about it, or blame the circumstances for their suffering without considering their own complicity in ending up there in the first place. There are things we have no control over (one could be born with AIDS, or into war, or be abused), and even then the circumstances should not be blamed for all of the victim's failures. Being forced to immigrate by your parents due to bankruptcy does not justify reading romance novels all day and only ever working as a server.

Let me be clear on this part, in case you haven't gotten it yet. I don't think it's bad to read novels all day and serve as a waitress for the rest of your life. If that has been your greatest desire since kindergarten, I think it's wonderful you have achieved your ultimate, if perhaps small, life goal. My problem is when you find that situation is not to your liking, and you refuse to accept responsibility for being in that situation or to take the initiative to change it. It's your fucking life, fucking live up to it.

PS. I am aware that there are gray areas in the above philosophy. There are circumstances that, although preventable, would have required inhuman foresight. Flood insurance may have made life easier, but if you live where floods are unlikely, not having it is potentially excusable. It comes down to what "common sense" dictates, which is debatable. But the big idea here is not to prevent all negative experiences, but to not use them as a crutch that prevents happiness.

PPS. Related reading: The Rise of the New Global Elite.