Sunday, February 4, 2018

The Multilingual Idioms List

Linguaholic created a crowdsourcing project, The Multilingual Idioms List. I think two things are new in this project.

  • As far as I know, there has never been a dictionary that pairs idioms, and only idioms, from different languages. It's true that numerous dictionaries of idioms for a specific language have been published. The explanations or definitions of the idioms may be in the same language as the idioms, or in a different language. When they are in a different language (called the target language for the sake of argument), more often than not a matching idiom in the target language cannot be found, and a wordy explanation is provided instead. The Multilingual Idioms List project handles this situation differently: it leaves the entry blank on the target language side. This is actually a good thing. The blank either positively acknowledges the lack of a match, or catches readers' attention and waits for other native speakers to supply a good idiom later.
  • The List is multilingual, not limited to two languages. Unlike the compilers of published idiom dictionaries in which the source and target languages differ, the contributors, or in a sense lexicographers, of the crowdsourced List are not language professionals. This is not a big problem since the List is not a highly technical dictionary. The big advantage, on the other hand, is that the contributors are almost all native speakers. This is significant because good or even correct usage of idioms depends very much on real-life experience in the language environment. Being native may be more relevant to this project than being professional if being both is not possible.

Today, I made a small contribution to the List, by adding the column Chinese (since no one before me had done that), and providing a dozen or so idioms, as follows:

a bitter pill: 不得不吞的苦果
a piece of cake: 小菜一碟
Achilles' heel: 软肋
add insult to injury: 雪上加霜;往伤口上撒盐
an arm and a leg: 倾家荡产
beat around the bush: 拐弯抹角
best of both worlds: 两全其美
bite the bullet: 硬着头皮上
burn the midnight oil: 开夜车
cast in stone: 板上钉钉
cat nap: 打个盹儿
from A to Z: 从头到尾
from scratch: 从零开始
have eyes in the back of one's head: 眼观六路,耳听八方
hit the road: 上路
let the cat out of the bag: 抖包袱
kick the bucket: 见阎王
off the hook: 如释重负

In Chinese, there are different types of idioms. 成语 (literally probably "solidified or invariable phrases") are more formal and literary, mostly of four characters, such as "自相矛盾" ("self-contradictory"), "纸上谈兵" ("talk of military strategy (only) on paper"). 歇后语 (literally "sentences said after taking a rest") are colloquial proverbs, such as "和尚打伞,无法无天" ("A monk holds up an umbrella. No hair|law. No sky.", or "The dharma is obscured and heaven blocked."). Obviously some idioms are in neither category, and yet are expressions that cannot be literally interpreted, such as "硬着头皮上", literally "go ahead with hardened scalp", which I consider matching "bite the bullet" in English.

I can think of one improvement that may be made to the current List. It would be nice to provide a place to enter the literal translation of an idiom and, optionally, a brief explanation. For instance, I would love to add that the Chinese idiom "软肋" for "Achilles' heel" literally means "soft rib" because the rib bone is relatively weak and fragile, and that "雪上加霜" for "add insult to injury" literally means "add frost on top of snow", a phrase that may not need an explanation. With these additions, the List would be more fun to read. For instance, we would learn that instead of "beating around the bush", the Chinese "make turns and scratch corners" ("拐弯抹角"), and the French "turn around the pot" ("tourner autour du pot"). While English-speaking people consider Greek a difficult language ("It's all Greek to me!"), Chinese is regarded as the difficult language by far more other peoples; "Chinese" occurs 24 times out of about 100, compared to 12 for "Greek", on the Wikipedia page for Greek to me. Through this List, we learn a little more about different cultures. But the technical limitation of the List is understandable; it is in the format of a spreadsheet, where adding two more columns (literal meaning and explanation) for each language would make the list too hard to read. Another option is to add comments to the spreadsheet cells, where the comments are not shown unless the mouse is over the cell.

Overall, this is a great project. I hope they'll set up a Wikipedia page, with versions in many different languages contributed by the same volunteers that build the List.

Saturday, November 11, 2017

Chinese translation of a poem by Kahlil Gibran

Kahlil Gibran (1883 – 1931) was an accomplished Lebanese poet. His well-known poem On Children

Your children are not your children.
They are the sons and daughters of Life's longing for itself.
They come through you but not from you,
And though they are with you, yet they belong not to you. 
has been translated into Chinese as follows:
or in another version:

The second line, plainly paraphrased, means that the children are the offspring or outcome of the longing of Life for itself. Here Life acts as an entity, as if it existed in space and time. It tries to find itself, and in the process the children are born who appear to belong to you, the addressee of the author. The Chinese rendering of this abstract description, "生命为自己所渴望的儿女", is a grammatically perplexing one. Let's build up from the basics. "他所渴望的是工作" is "What he longs for is a job". Based on that model, "自己所渴望的" must mean "what (someone/something) he/she/it-self longs for", or here specifically, "what (something) itself longs for". (I added "someone" or "something" solely to work around the problem that the word he/she/it-self cannot stand alone.) Now, if we substitute Life for this something, we get "what Life itself longs for", or "生命自己所渴望的" in Chinese, which doesn't match the original meaning; the author intends to say the children are the outcome of the longing, not of what Life longs for. Life longs for itself and this longing process begets the children. Unfortunately, the translation "生命为自己所渴望的儿女" is not saying the same thing, either. In fact, it says something a native Chinese speaker has trouble understanding. I can't even think of a good literal translation of this ambiguous and possibly ungrammatical phrase. In contrast, the second translation, "他们是生命对于自身渴望而诞生的孩子", is a good one, thanks to the extra word "诞生" added by the translator. Literally it says "They are the children born out of Life's longing for itself", which is remarkably close to Gibran's original.

The third line is deceptively simple. What exactly does the author mean by "through you but not from you"? The first Chinese translation, "他们是借你们而来,却不是从你们而来", uses "借" (v. "to borrow"; prep. "with the help of") for "through", and "从" for "from". The second translation, "他们借助你来这世界,却非因你而来", uses "借助" ("with the help of") for "through", and "因" ("because", "because of", "due to") for "from". Both translations interpret "through you" as "with the help of you". The first renders "from" literally, while the second changes it to "because of". I checked the translations of this line into a few other languages. For example:
Spanish: Vienen a través vuestro, pero no de vosotros.
French: Ils viennent à travers vous mais non de vous.
German: Sie kommen durch dich, aber nicht von dir.
Italian: Tu li metti al mondo, ma non li crei.
Only the Italian version does not literally translate the prepositions "through" and "from" in the original poem. Instead, the sentence means, plainly put, "You put them into the world, but do not create them."

The Italian rendering, in my opinion, has strayed a little too far from the author's possibly deliberate wording, which borders on a mischievous play on words. Similarly, the Chinese translations, which change the author's "through" to "with the help of" and (in one case) "from" to "because of", would be frowned upon by the author. We know that, unlike scholarly translation, which should be literal, some or even a great deal of flexibility is allowed in the translation of literary, especially poetic, works. But the Spanish, French and German translations I found all stubbornly stick to the literal mapping of the two prepositions. My take on this is that if the original poem can be understood in its original language and also in the target language with a literal translation, no word change should be made, and I believe that is exactly the case here. We can make sense of "They come through you but not from you" if we use a good analogy. Imagine a scene in which bright sunlight shines through the window and comes into the room. This sunlight (the children in Gibran's poem) comes through the window glass (you) and yet it is not truly from the window or glass, but from the sun. In this interpretation, the light travels literally through the glass, without the help of the glass (contrary to both Chinese interpretations), without the glass somehow putting the light down into the room (contrary to the Italian interpretation), and having no cause-and-effect relation with the glass (contrary to the second Chinese translation). The light belongs to the sun because the sun created it. The light can come into the room simply because only the window, out of the whole external wall, is transparent. Gibran's "through you but not from you", when likened to "through the window glass but not from the glass", is a clever play on the prepositions and yet makes perfect sense. There is no need to replace them unless they are misunderstood.
The best Chinese translation may simply be a literal one, "他们通过你而来,却不是从你而来". If needed, a translator's note can be provided to help the reader. Anything else will likely tarnish the beauty of this line.

Thursday, September 14, 2017

Language difficulty

Chinese has been widely considered to be one of the most difficult languages in the world. What constitutes the difficulty of a language? Can it be measured and how? Whenever someone posts a message about language difficulty on a forum, it almost always generates a heated discussion. Comments range from "English is the easiest because the verbs have minimum conjugations and nouns have no gender", "Chinese and Japanese are hard because there're too many characters or kanji's", to "No language is inherently more difficult than any other because native speakers grow up speaking it with about the same effort", and "Language difficulty is subjective perception", to name a few.

Most language enthusiasts on various forums are not scholars. The diversity of those opinions results from the lack of a good definition of language difficulty. But we can tell that most people are referring to the difficulty experienced by an adult (not a young child) in learning a foreign language (not the mother tongue), and in many cases the adult's native language is English. If we qualify the discussion with these requirements, i.e.

  • the learner is an adult;
  • the language whose difficulty is evaluated is learned by the adult as a foreign language;
  • the difficulty is evaluated when the adult's native language is specified
then a measurement of language difficulty becomes meaningful.

I believe that in many social sciences, there are two general methods to measure a quantity, internal and external. For example, in linguistics, a researcher can define a set of factors pertinent to the correlation between orthography (spelling) and pronunciation in order to calculate the orthographic depth of a language, i.e. "the degree to which a written language deviates from simple one-to-one letter-phoneme correspondence". Alternatively, one can simply conduct a controlled study among a group of people (cohort) and see which language causes how many spelling errors in dictation or in a similar experiment.

When it comes to rating language difficulty, we can devise a set of rules, individually assess each language against these rules, and then sum the rule ratings (with weights). Such rules may include, e.g., the percentage of words that have a cognate or loan relationship with words in the learner's native language, whether the nouns have genders and cases, how many variations there are in verb conjugation, whether the dominant word order differs from that of the learner's native language, etc. But so far I'm not aware of such internal research, even though it's doable.
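The weighted sum described above can be sketched in a few lines of Python. This is only an illustration of the idea, not an existing study: the rule names, the weights, the 0 to 5 rating scale and the example ratings are all my own hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str    # what the rule measures
    weight: int  # relative importance (hypothetical value)


# Hypothetical rules and weights, loosely following the factors named above.
RULES = (
    Rule("few_cognates_or_loanwords", 4),
    Rule("noun_genders_and_cases", 2),
    Rule("verb_conjugation_variety", 2),
    Rule("word_order_differs", 2),
)

MAX_RATING = 5  # each rule is rated from 0 (no obstacle) to 5 (severe)


def difficulty_score(ratings, rules=RULES):
    """Return the weighted sum of the per-rule ratings for one language."""
    total = 0
    for rule in rules:
        rating = ratings[rule.name]
        if not 0 <= rating <= MAX_RATING:
            raise ValueError(f"rating for {rule.name} must be 0..{MAX_RATING}")
        total += rule.weight * rating
    return total


# Made-up ratings for an imaginary target language, for illustration only.
example_ratings = {
    "few_cognates_or_loanwords": 5,
    "noun_genders_and_cases": 0,
    "verb_conjugation_variety": 1,
    "word_order_differs": 3,
}
example_score = difficulty_score(example_ratings)
```

The score only means something relative to a specified native language, since every rating is an assessment of the target language against the learner's own. The hard part of such research would be justifying the weights, which this sketch simply assumes.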

The external evaluation, on the other hand, has been done and is widely quoted. The most well-known data for native English speakers are from the Defense Language Institute (DLI) of the US, which statistically measures the time learners take to achieve a certain language proficiency level. The data from the official Web page for this study are duplicated below for your convenience.

  • Category I languages, 26-week courses, include Spanish, French, Italian and Portuguese.
  • Category II, 35 weeks, includes German and Indonesian
  • Category III, 48 weeks, includes Dari, Persian Farsi, Russian, Uzbek, Hindi, Urdu, Hebrew, Thai, Serbian Croatian, Tagalog, Turkish, Sorani and Kurmanji
  • Category IV, 64 weeks, includes Arabic, Chinese Mandarin, Korean, Japanese and Pashto
The earliest version of this data was on a Web page of Dr. William Baxter of the University of Michigan, which he got "from documents I got at a workshop of some kind" (private email). But Dr. Baxter later removed it from his Web site, so you have to reference it from elsewhere; it is duplicated below.

Languages included (languages regularly offered at the University of Michigan are in capital letters; this is NOT a complete list). Two figures follow each group's language list:
  (a) hours of instruction required for a student with average language aptitude to reach level-2 speaking proficiency;
  (b) speaking proficiency level expected of a student with superior language aptitude, after 720 hours of instruction.

  • GROUP I: Afrikaans, Danish, DUTCH, FRENCH, Haitian Creole, ITALIAN, Norwegian, PORTUGUESE, Romanian, SPANISH, Swahili, SWEDISH. (a) 480; (b) 3
  • GROUP II: Bulgarian, Dari, FARSI (PERSIAN), GERMAN, (Modern) Greek, HINDI-URDU, INDONESIAN, Malay. (a) 720; (b) 2+ / 3
  • GROUP III: Amharic, Bengali, Burmese, CZECH, Finnish, (MODERN) HEBREW, Hungarian, Khmer (Cambodian), Lao, Nepali, PILIPINO (TAGALOG), POLISH, RUSSIAN, SERBO-CROATIAN, Sinhala, THAI, TAMIL, TURKISH, VIETNAMESE. (a) 720; (b) 2 / 2+
That data differs considerably from DLI's current data. I had some email exchanges with DLI, but they didn't explain these discrepancies.

I've found one possible source about the difficulty in learning a few languages as a foreign language when the adults' native language is NOT English, a book by Robert Marzari, Leichtes Englisch, schwieriges Französisch, kompliziertes Russisch. The book content is unavailable on either Amazon or Google Books; hence my "possible source". Without reading the book, I interpret the book title as English < French < Russian in order of difficulty (which may be intuitive to most polyglots). And it's not clear whether that order is for all Europeans in general (i.e. average) or for a specific native-language group.

Unfortunately, I'm not aware of any other research on this topic. But as you can already see, an otherwise hot topic can be made quite cool by the above analysis, cool as opposed to hot or debatable, and cool in the sense of being interesting.

Monday, July 10, 2017

Tian Ji's horse racing and the electoral vote system

[The following was written on November 11, 2016.]

The author of the famous military strategy book The Art of War, Sun Wu, commonly known as Sun Tzu, had a descendant, Sun Bin, who also wrote a book with the same title. In ca. 340 BC, Sun Bin advised his patron Tian Ji at a horse racing event and helped him win the race. The following is the excerpt from Sima Qian's Records of the Grand Historian about this interesting story:

(The ambassador of the Qi state went to the Liang state. Sun Bin, as a convicted criminal, went to visit and talk to him secretly. The Qi ambassador regarded Sun as valuable and carried him back to Qi. Tian Ji, the Qi general, gave him a warm reception. Ji and some princes often bet heavily on horse racing. Mr. Sun saw that all the horses were about equally capable, rated superior, average, and inferior. So Sun advised Tian Ji, "Sir, you just bet heavily. I'll make you win." Tian Ji trusted him and bet a thousand units of gold with the king and the princes. Right before the race, Mr. Sun said, "Use your inferior horse to race with his best horse, use your average horse to race with his inferior horse, and use your best horse to race with his average horse." After three rounds, Tian Ji had lost one and won two, and carried away one thousand units of gold. Then Ji recommended Mr. Sun to King Wei, who interviewed Sun on military tactics and appointed him Chief of Staff.)
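Sun Bin's plan is easy to check with a small simulation. The sketch below makes two assumptions that are mine, not Sima Qian's: the numeric speeds are invented so that the king's horse is slightly faster within each class, and the king is assumed to field his horses in the order superior, average, inferior.

```python
from itertools import permutations

# Hypothetical speeds: within each class the king's horse is slightly faster,
# but each of Tian Ji's horses beats the king's horse one class below.
KING = {"superior": 9, "average": 6, "inferior": 3}
TIAN_JI = {"superior": 8, "average": 5, "inferior": 2}

# Assumed running order for the king's horses.
KING_ORDER = ("superior", "average", "inferior")


def rounds_won(tian_order, king_order=KING_ORDER):
    """Count the rounds Tian Ji wins when horses race in the given orders."""
    return sum(
        TIAN_JI[mine] > KING[his]
        for mine, his in zip(tian_order, king_order)
    )


# Racing class against class loses every round.
like_for_like = rounds_won(("superior", "average", "inferior"))

# Sun Bin's plan: inferior vs. best, best vs. average, average vs. inferior.
sun_bin_plan = rounds_won(("inferior", "superior", "average"))

# Brute force over all six line-ups to confirm no plan does better.
best_possible = max(rounds_won(order) for order in permutations(KING_ORDER))
```

With these made-up speeds, the like-for-like line-up wins no rounds, Sun Bin's line-up wins two out of three, and the brute-force search finds no line-up that wins more. The inferior horse is sacrificed against an opponent that could not have been beaten anyway.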

Fast forward to 2016. We see that the electoral vote in the US presidential race matters while the popular vote does not and that the two votes mathematically represent two different winners in this 2016 presidential race. Although neither Hillary Clinton nor Donald Trump can move her or his supporters from one state to another, there is similarity between the electoral vote system and Tian Ji's winning strategy. If democracy is the name given to the principle of the minority obeying the majority, the popular vote is the only true democracy. (As of this writing, Clinton has won 60,274,974 popular votes, while Trump has won 59,937,338.)
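To see how the two counts can point to different winners, here is a toy tally in Python. Everything in it is hypothetical: the three states, their elector counts and vote totals are invented, and I assume a simple winner-take-all rule in every state, which is a simplification of the real system.

```python
from collections import Counter

# Hypothetical states: (electors, popular votes per candidate). Not real data.
STATES = {
    "State 1": (10, {"X": 51, "Y": 49}),
    "State 2": (10, {"X": 51, "Y": 49}),
    "State 3": (9, {"X": 10, "Y": 90}),
}


def tally(states):
    """Return (popular, electoral) vote counts under winner-take-all."""
    popular = Counter()
    electoral = Counter()
    for electors, votes in states.values():
        popular.update(votes)  # add this state's votes to the national total
        winner = max(votes, key=votes.get)  # ties are not handled in this sketch
        electoral[winner] += electors
    return popular, electoral


popular, electoral = tally(STATES)
```

In this made-up case candidate Y has 188 popular votes to X's 112, yet X collects 20 electors to Y's 9. X wins two states narrowly and loses one badly, much as Tian Ji gave up one round in order to take the other two.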

Some people decide not to vote for reasons such as (A) equal dislike of the candidates; (B) lack of interest in politics; (C) living in a non-swing state, where one person's vote matters little. Group C may be small. But it's the only one of the three that would make a difference if the American electoral vote system were abolished or even mitigated (e.g. by adjusting the weights, i.e. the electors assigned to different states). If that happened, swing states would have lower voter turnout and non-swing states would have higher turnout. But since there are fewer swing states than non-swing states, the total popular vote count would be higher.

Sunday, April 16, 2017

自由: "freedom" or "liberty"?

A Chinese reader asked me about the difference between "freedom" and "liberty" when translating Chinese "自由" into English. We can find many answers with a Google search for "difference between freedom and liberty". One article maintains that "Freedom is a state of being capable of making decisions without external control", while liberty "is freedom which has been granted to a people by an external control". And some like this laboriously attempt to make a clear distinction between these two words.

Having read a handful of such answers without being satisfied with any of them, I told the person asking me the question: 1. the etymology of the two words differs; 2. in general usage, "liberty" is more abstract and philosophical than "freedom". Other than these two points, there is no difference, but in a given context, one of the two words is usually the more common. For example, nowadays we say "freedom of speech", not "liberty of speech". (But see the ngram figure in Appendix 1.) We say "Liberty, Equality, Fraternity", not "Freedom, Equality, Fraternity". These set phrases are fixed by convention, just as the Chinese idiom is "破釜沉舟" ("cut off all means of retreat", "decide to fight to death"), not "破釜沉船", even though "舟" and "船" are completely synonymous.

Making distinctions between words is so intriguing that someone even builds a Web site dedicated to this task. Language professionals and general public alike are fond of writing articles on these topics. While many such articles are valuable contributions to the correct usage in English, there is one common deficiency not fully recognized: the judges are the native speakers of the language, not linguists or scholars. An age-old debate among lexicographers is relevant here: Should a dictionary be prescriptive, directing people toward correct or supposedly correct usage, or be descriptive, faithfully documenting the actual usage in the native speaker community? Nowadays there may be more dictionaries in the latter category, presumably consistent with the increased level of public education. In the case of "freedom" vs. "liberty", if enough people, not English-as-a-foreign-language learners but native speakers, ask the question about their difference, the very fact that they ask this is a sign that the distinction, if there is a theoretical one, hardly exists in practice. Instead of making a great effort to separate them, it would be better to acknowledge, in modesty, the lack of difference between them.

The prescriptive-descriptive dichotomy, however, only applies to everyday language usage. In academic fields, especially science and technology, but to some extent the social sciences and humanities as well, the "prescriptive" approach should be supported. Take osteoarthritis as an example. An educated English speaker would think this meant inflammation (-itis) of a bone (osteo-) joint (-arthr-). But that is not what it is. Then, should the distinction between "freedom" and "liberty", if non-existent in practice, be made in academic circles as two different terms in the social sciences or humanities, followed by educative admonition to the public about the research outcome? Scholars have the freedom of research and can make any distinction between any pair of words in their research. In fact, social scientists and particularly philosophers habitually do that. As to whether the distinction should be imposed on the public, No!


Appendix 1

This figure is the Google ngram showing the historical usage of "freedom of speech" and "liberty of speech". We can see that from the mid-19th century on, "freedom of speech" has significantly gained in usage over "liberty of speech". But before that time, it only had slightly higher usage frequency.

Appendix 2

Some Weibo users gave me a few helpful pointers on this topic. One user informed me that political theorist and philosopher Isaiah Berlin's Four Essays on Liberty used "freedom" and "liberty" interchangeably. Two other users directed me to political scientist Hanna Pitkin's Are Freedom and Liberty Twins? According to Pitkin, most people don't make a distinction between these two terms, but Hannah Arendt is an exception. However, the author questioned Arendt's distinction from the point of view of political science as well as etymology (see the bottom of p.6 and p.9 of the article).

Monday, January 9, 2017

Comparison of Chinese and Western Etymology

In my last post, I said "Most languages in the world take the alphabetic writing system. Studying the internal history of its vocabulary primarily means analyzing phonological and morphological changes through time." In this post,[note1] I'll expand on that point and contrast that with the Chinese tradition.

Take the word language as an example. In English, we read

late 13c., langage "words, what is said, conversation, talk," from Old French langage "speech, words, oratory; a tribe, people, nation" (12c.), from Vulgar Latin *linguaticum, from Latin lingua "tongue," also "speech, language," from PIE *dnghu- "tongue" (see tongue (n.)).
The -u- is an Anglo-French insertion (see gu-); it was not originally pronounced. Meaning "manner of expression" (vulgar language, etc.) is from c. 1300. ...

Source: Online Etymology Dictionary

In Spanish, we have

idioma m. language. [LL. idiōma: id. <Gk. idiōma: peculiarity (as lang.) <idiousthai: to make one's own <idios. See idio-.]; idiomático,ca a. idiomatic. [Gr. idiōmatikos: particular.]
Source: A Comprehensive Etymological Dictionary of the Spanish Language with Families of Words based on Indo-European Roots by Edward A. Roberts, 2014.

And most importantly, in French, we have

LANGUE, sf. a tongue; formerly lengue, from L. lingua. For in=en=an see § 71, and Hist. Gram. p. 48. — Der. langage, languette.
Source: An Etymological Dictionary of the French Language by Auguste Brachet, 1882.

The reason for my praise "most importantly" is that Auguste Brachet, the "romanistischer Autodidakt"-turned-professor according to (German) Wikipedia, created a monumental masterpiece in not just French etymology but etymology in general. In addition to what a regular etymologist would do, such as tracing the word form to its etymons in the same or other languages, Mr. Brachet systematically summarized the rules of the morphological and phonological changes and applied them to individual words in his dictionary. In the said example, he noted that for the change in > en > an in the development of Latin lingua to French langue, the reader can consult his rule 71 in the book, where he says

I in Latin position [i.e. "when followed in the Latin word by two consonants" according to him, a convention not exactly the same as adopted today; my note] is changed to e in Merovingian Latin: thus fermum, ..., for firmum, ...' and this e, pronounced ei (see § 66), has produced two distinct French forms, according as it has preferred the open è sound, or the i sound.

You can choose to follow up to rule 66 in this book and p.48 of his A Historical Grammar of the French Tongue for more information about these sound (phonological) as well as spelling (orthographic) changes.

Western etymological publications may be divided into two groups: (1) dictionaries that give etymons or source words; (2) scholarly books and research articles on phonological and orthographic changes over time. Mr. Brachet's dictionary is unique in that it merges the two into one, so that the reader is conveniently offered the explanation of sound changes right in the headword entry, obviating the need to research as to why, e.g., the first i in *linguaticum would change to a in the history of the English word language.

However, a word consists of not only its sound and spelling but its meaning as well, which etymology cannot avoid tracing. But as linguist Calvert Watkins warned us, it is "more hazardous to attempt to reconstruct meaning than to reconstruct linguistic form". Sense development is much less researched and also less described in dictionaries. Unlike phonology, semantics, or the study of the meanings of words, is not easily subject to formal (as in "formal logic") structural analysis. And yet tracing the sense development is the primary task of Chinese etymology. Chinese phonological development is a separate field of study; it is not incorporated in etymology, because the meaning of Chinese characters (or words, whose meanings are almost always based on the component characters) is largely dissociated from the sound. Take the character 文 ("text") as an example.

Source: 谢光辉《汉语字源字典》, 北京大学出版社, 2000年, 29页
Translation of the embedded text: "文" is a pictographic character. "文" in oracle bone script (甲骨文) and bronze inscription script (金文) resembles a standing person facing forward. His chest bears tattoo of decorative patterns. This is in fact a vivid description of the ancient "文身" (tattoo) custom. Thus the original meaning of "文" was a person with tattoo on his body, as well as pattern, texture. Later, the meaning was extended to character, article, culture, civilization etc.

That was a typical entry of Chinese character etymology. For simple characters especially pictographic ones, it is simply pure 依类象形 or description of the object according to what it looks like. The focus is on the meaning, not the reading or sound. Some more complicated characters may be decomposed into elements each of which is analyzed the same way, as in the case of "秦" (see my last post).

It should be noted that the majority of the characters (at least 80%) are of the type 形声字 or characters of form and sound, such as "指" (finger; to point), where the form radical "扌" suggests the meaning, i.e. something related to the hand, and the sound component "旨" suggests its reading, i.e. zhǐ. The classical Shuowen Jiezi (说文解字) dictionary, unsurprisingly, points out that this character "从手旨聲" (the meaning is based on "手" and the sound on "旨").

Similarities and differences between Chinese and Western etymology can also be revealed from the definition of the word etymology itself. The Webster dictionary defines it as "the history of a linguistic form (as a word) shown (1) by tracing its development since its earliest recorded occurrence in the language where it is found, (2) by tracing its transmission from one language to another, (3) by analyzing it into its component parts, (4) by identifying its cognates in other languages, or (5) by tracing it and its cognates to a common ancestral form in an ancestral language" (I added the parenthesized numbers). Thus we see that most western dictionaries with etymological information meet the requirements (1) and (2), sometimes (3). Wiktionary and Friedrich Kluge's An Etymological Dictionary of the German Language also meet (4) and (5) most of the time. What if we apply these requirements to Chinese character etymology? (1) is often met if we interpret it as finding the first occurrence in history, which nowadays is made drastically easy with the aid of a computer-based search. But tracing its development in the course of long history, either inside Chinese or (2) across different languages, is rarely done. (3) is done, though with significant differences from that in western languages. (4) and (5) are rare because they're mostly irrelevant to Chinese characters.

How is analyzing a Chinese character into its components special compared to the western tradition? While a character, e.g. "指", can be analyzed into "扌" (for meaning) and "旨" (for sound), there is no systematic change of a component from one form to another. Take rule 126, one of the many summarized by Mr. Brachet for French, as an example: "Before a, initial c ... passes through the successive aspirated sounds k'h, tk'h, kch, ch." He supports this rule of ca- > ch- with about 80 words as evidence, champ < campus, chien < canis, etc. Can we construct an analogy of this rule and find supporting examples in Chinese etymology? Since Chinese does not use an alphabetic writing system, there's hardly any need to deal with the sound change of a character in etymology. Instead, we may substitute the change in form of a character. For example, after studying the 金文, 小篆, and 楷体 forms of "指" and other characters with "扌" on the left side, we may conclude that all (or most) such characters have gone through the predictable change of this radical in these forms, just as the French ca- changed to ch-. Similarly, all or most characters with "旨" on the right side probably went through the same change as shown here (see the row for 字源演变). Thus, we find in etymological studies a parallel between Chinese and western languages in identifying common component change in characters or words.

However, Chinese etymological dictionaries are also interested in finding the "root cause" of the most basic characters. Because the characters are ultimately from pictographs in origin, this "root cause" finding is mostly "依类象形" (describing the object according to what it looks like). If we must find a parallel for this practice in western etymology, it is equivalent to answering the question why, e.g., the Proto-Indo-European stem from which the Modern English word word is ultimately derived is *were-, that is, why that sound. Obviously, except for some onomatopoeias, there is no answer, or no such research. While Chinese etymologists have forged ahead in that direction, so far this "research" is, I'm afraid, very much based on guesswork, simply because there is no record left in history about why a specific character was invented to be of that form. "文" may indeed be a symbol for a person with a tattoo, but there is no hard proof. Such interpretation is too error-prone. In my last post, I quoted the article 许慎为何将象释成母猴——“为”字趣释 (Why did Xu Shen interpret an elephant as a female monkey: interesting interpretation of character "为"). In a recent weibo blog post, a scholar interpreted, purely based on its resemblance, "夷" in its original oracle bone script as a person squatting, while in 《汉语字源字典》 (Dictionary of Chinese Character Etymology) by another scholar in this field, it was thought to represent a man bound by ropes, to serve as a slave or for sacrifice. On this stretch of imagination, I have but one comment: "汉字字源,看图识字,见仁见智" (Chinese character etymology / Look at pictures and learn to be literate / Trust your opinions and beliefs).

[note1] Due to the unique nature of the Chinese language, etymology can be of characters as well as words. This post is about character etymology.

Sunday, September 4, 2016

Why is it rare to see Chinese etymology?

People who speak English as their native language are used to dictionaries in which each entry contains not only the definition of the word and example phrases or sentences, but also a brief etymology, as in this example from the Merriam-Webster dictionary for the word word.

Middle English, from Old English; akin to Old High German wort word, Latin verbum, Greek eirein to say, speak, Hittite weriya- to call, name
First Known Use: before 12th century

A Chinese dictionary, on the other hand, almost never gives the etymology. In this blog posting, I'll try to explain why.

For the sake of discussion, we need to make a distinction between two types of Chinese dictionaries. Due to the nature of the Chinese language, the English word dictionary (or its equivalent in most other languages) can mean either "字典" (literally "character-dictionary") or "词典", also written as "辞典" (literally "word-dictionary"), in Chinese. I have not seen a dictionary for general Chinese words published by anyone that contains etymological information for the headwords.[note1] Hereinafter, a Chinese etymological dictionary refers only to a character-dictionary.

The disappointment at the lack of an etymological dictionary of Chinese words does not extend to dictionaries of Chinese characters, or 字典. Back in the Eastern Han dynasty (25–220 AD), the scholar Xu Shen (c. 58 – c. 147 AD) wrote the monumental dictionary Shuowen Jiezi (literally "Explaining Graphs and Analyzing Characters" according to Wikipedia). Since Xu lived only a thousand years or less after a large number of Chinese characters were invented, the etymology he gave in the book for each of the 9,000-plus characters is mostly trustworthy. Take the character "秦" (qín) as an example. (This character is significant in that it is the ultimate source for the word China in English or its equivalent in most other languages in the world. Two other sources of the word referring to China are Khitan as in the case of Russian, and silk.)

(The fief given to the descendant of Boyi. The land is suitable for crops. The character has a meaning based on "禾" ("crop") and contains an abbreviation or syncope of the character "舂". Another theory claims that this character is the name of a crop. This character in Zhouwen script [a script used just before the time of the First Emperor], "𥠼", is based on "秝". Pronounced with the initial consonant of 匠 combined with the final of 鄰.)

This is an excellent example of Chinese character etymology; it not only describes the source of the character but also analyzes the morphology or form of the character, as evidenced by the construction of "秦" through "禾" and part of "舂". The significance of Xu's book in the history of the Chinese language is such that almost two millennia later, scholars are still using his book in research. The only major revision came after the 1899 discovery of oracle bones, which the Shang dynasty (c. 1600 BC–c. 1046 BC) people used for divination. The oracle bone script predates Xiaozhuan script, the primary source for Xu Shen's character etymology because the latter is the earliest script known to Xu. Owing to this gap of knowledge, Xu inevitably made numerous mistakes in his otherwise near-perfect dictionary. One good example can illustrate the point. In the article 许慎为何将象释成母猴——“为”字趣释 (Why did Xu Shen interpret an elephant as a female monkey: interesting interpretation of character "为"), the author explained how the simple character "为", meaning "for" or "to do" nowadays, evolved from the oracle-bone pictograph depicting a man holding an elephant leash but mistaken for a female monkey by Xu Shen. (By the way, elephants indeed roamed around middle and northern China three thousand years ago, but the species was not the same as in southern China or India today.)

With all the background information, now we may answer the question why it is rare to see Chinese etymology. By that I don't mean you can't find character etymology at all. Books such as 《汉语字源字典》 ("Dictionary of Chinese Character Etymology") and the Web site Chinese Etymology by Richard Sears are available. But this is almost never incorporated into a Chinese dictionary other than a specialized etymological dictionary. If a general English reader is not more academically inclined than a Chinese reader, why does a common English dictionary such as Webster's, American Heritage, or the OED (Oxford English Dictionary) include etymology without hesitation? The reason may be that Chinese (character) etymology almost never helps a reader in studying the Chinese language due to the long history and evolution of the character. (Can you stretch your imagination far enough to associate the scene of a man and an elephant with the sense of "for" or its slightly older sense of "to do"? See above.) In addition to the long history, I believe there's another, more subtle, element clouding Chinese etymology. Most languages in the world use an alphabetic writing system. Studying the internal history of their vocabulary primarily means analyzing phonological and morphological changes through time; e.g., there was a systematic change of f to h in Spanish for a large number of words. The semantic evolution of words is studied second, and less often, because it is "more hazardous to attempt to reconstruct meaning than to reconstruct linguistic form" as linguist Calvert Watkins said. And yet, Chinese characters rarely went through systematic morphological changes that apply to a large number of characters and, since Chinese is not based on an alphabetic writing system, phonological changes are not conducive to the study of etymology per se. This leaves a large part of Chinese etymology to the study of semantic evolution, which is, as stated, more error-prone in scholarly reconstruction.

There is another reason for not incorporating etymology in Chinese dictionaries. Many characters originate from pictographs or pictograph-like glyphs such as those of the Xiaozhuan script. A publication has to render them as images instead of text, which is an editorial inconvenience. The images with their explanatory texts take up a significant amount of space relative to the definitions and usage examples, which a regular user cares more about. This is in contrast with the etymology in an English dictionary, which can be made brief and still make sense to the minority of interested readers. And yet a third reason may be that it's just the custom of Chinese lexicography, i.e. no etymology except in specialized dictionaries. This is probably also the reason why dictionaries of languages other than English lack etymology. (Try to find etymology in any dictionary of Spanish, French, German or Italian in a bookstore or library!) But nobody knows the original cause or reason for this custom.

Therefore, unlike a language where a student may make use of etymology in vocabulary study, optionally combined with some mnemonics (as demonstrated in my book for Spanish), Chinese characters have to be studied in a different way. Etymology comes in handy only for the very first few characters, such as "火" ("fire") and "山" ("mountain"), which are frequently used to impress complete beginners. After 10 or 20 such "pictographs", rote memorization is commonly adopted, though books such as Tuttle Learning Chinese Characters that laboriously make up mnemonics are helpful. Fortunately, a large portion of the character repertoire consists of characters combining two parts, one more or less representing the meaning and the other representing the sound. However, in none of these cases does etymology play any role.

[note1] By emphasizing "general", I'd like to point out that a special group of Chinese words, 成语 (idioms), are an exception, in that dictionaries of Chinese idioms almost always give the first occurrence of the idioms and sometimes even briefly describe the sense development as well.
With regard to dictionaries of words in general, one may think of the book 《辭源》, literally "word origin". First published in 1915, it bears a misleading title because it's no more than a dictionary (albeit of high quality) of Chinese words with no etymology. In fact, even if we take an alternative interpretation of "辭源" as "first occurrence of word", this book fails as well; e.g., the entry for "中国" does not list its first occurrence in the Book of Documents, or the bronze inscription which the Book records. Another book we can even more readily dismiss is the 《詞源》 by Zhang Yan in the Song dynasty because the book is on the subject of the literary genre 词 (ci poetry), not "words".

Saturday, August 13, 2016

Translation of "technical"

The dictionary translation of "technical" is "技术的", as in "technical skill", "technical innovations". But the word is often used in a more general, "non-technical", context, particularly as an adverb, "technically", e.g., "Technically, driving at 31 mph at a speed limit of 30 is speeding." In this case, instead of "技术的", a very natural Chinese equivalent may be "严格说来" (strictly speaking).

Another example (modified from the original),

--- begin quote ---
the problems are technical, not systemic. Afterward, when she told her sister they had named the problems as "technical," her sister responded “What does that mean?” Indeed that was the question I had, because the discussion was not about technical issues at all
--- end quote ---

The word "technical" literally translated to "技术的" in this context indeed causes confusion to people who do not speak English at all, but might make some sense if the Chinese reader knows a little English. A more meaningful translation, I think, would be "具体操作的", as in "这些问题是有关具体操作的,而不是整体上的(或体制上的)". But if the reader or listener is moderately proficient in English, the translation "这些问题是有关技术性细节的" works, too.

Saturday, May 28, 2016

"Oriental" is not derogatory

On May 20th, Obama signed a bill that removes "Negro," "Oriental" and a few other terms from federal laws, specifically, "striking 'a Negro, Puerto Rican, American Indian, Eskimo, Oriental, or Aleut or is a Spanish speaking individual of Spanish descent' and inserting 'Asian American, Native Hawaiian, a Pacific Islander, African American, Hispanic, Puerto Rican, Native American, or an Alaska Native'." The bill, sponsored by New York congresswoman Grace Meng, an Asian American born in 1975, focused on the word "Oriental" but included other derogatory terms such as "Negro".

No doubt "Negro" is offensive and derogatory, reminding us all of the dark history of slavery. But does "Oriental" have the same effect of arousing a mental image of Chinese exclusion, coolies, or other more subtle discrimination in later decades? As an Asian American myself who came to the United States in the early 1990s, I say No to this specific question. Discrimination against Asian Americans has never been completely eliminated and takes different forms from that against, say, African Americans: secretly raising college entrance standards, racial slurs in public broadcasts with impunity, and others. But it never occurred to me that the word "Oriental" would be offensive to me in any way. About twenty years ago, I worked at a lab, where we all shared one telephone. One day the phone rang. My coworker, a white technician, came to me saying, "It's for you. The guy has an Oriental accent". That sounded absolutely normal to me. Interestingly, I have only now realized that the word "Oriental" has indeed rarely been used in recent years. In fact, I don't recall hearing it again in daily conversation ever since. But that may be just due to a natural evolution of the English language in which some words gain and some words lose popularity, rather than people's realization of a newly acquired offensive sense.

I'm not the only Oriental, a.k.a. Asian, who considers the word neutral. Two years ago, a reader commented on an article saying "the word 'Oriental' is still widely used here in Japan". I want to add that the word is also commonly used as part of English translations for thousands if not millions of hotels, restaurants, and all kinds of businesses in China, including the famous 东方明珠, officially named Oriental Pearl Tower, the tallest structure in China from 1994 to 2007 and one of the most visited places in Shanghai. Right after Obama signed the bill, an Asian American wrote My 'Oriental' Father: On The Words We Use To Describe Ourselves. Her father emigrated from Hong Kong to the US in 1969 and has always insisted on using the term "Oriental" to refer to himself and the style of his Chinese restaurant, in spite of the author's repeated reminders that the term has picked up an offensive connotation over the years. Readers of the article generally consider "Oriental" to be neutral as well. I can't agree more with the following comment currently at the top:

As a dumpy old white guy, I have never thought of Oriental as a disrespectful term. Yet, regardless of my feelings on the matter, if someone feels marginalized by the term, it shouldn't be a problem for me to use a word or phrase that they find more appropriate.

That being said, there is indeed a distinction we can make between self-reference and reference to others, as one reader comments:

This is a critical point that is very different from words used by others to describe each of us. Your wife [referring to another reader's comment] is comfortable referring to herself as "Oriental," like the author's father. But it may be different for her if someone else uses the same word in a different way, such as "it is hard to tell what Orientals are thinking" or "inscrutable Oriental."

That is because there is often a need to consider intent (versus ignorance) in the words used by others to describe each of us. A shift to geographically based terms like European, African, Asian reduces that need somewhat.

Very well said! However, whether a word becomes derogatory should follow a simple "democracy" rule, so to speak. If a large number of people speaking this language use the word in a derogatory sense, it is so. If not, it is not. There's no magic. It's a descriptive rule that is, in this case, challenged not by prescriptive linguists or scholars but, ironically, by some younger-generation Asian Americans, up to Congresswoman Grace Meng, good intentions notwithstanding. Although eliminating one word from our vocabulary or limiting its use to specialized areas is harmless, if we continue to move words into the dictionary of tabooed language, our life will nevertheless become increasingly inconvenient.

By the way, it would be interesting to find the origin of the new, allegedly derogatory, connotation of "Oriental", something no article I've read has touched upon. It's not likely that one single incident or a fictional scene created such a dramatic effect. Certain young Asian Americans may have suffered from subtle, implicit unfairness in the context of which the word "Oriental" was used. If this wild guess is completely unfounded, another source of this connotation may be a continuation and resurgence of Orientalism, most famously expounded by Palestinian-American scholar Edward Said in the late 1970s. In a Foreign Policy article Chinese Is Not a Backward Language, the author uses the term "Orientalism 2.0" as a label for the re-emerging notion of western superiority and corresponding eastern inferiority. Is there a causal association with "Oriental" derogation? The Orientalist ideas are largely restricted to academic circles. If the derogatory sense of "Oriental" has truly been felt mostly by scholars and "leaked" to some highly educated young Asian Americans, that may indeed be the origin of the new connotation we are looking for, and it's consistent with the fact that the general public is not aware of the semantic evolution.

Saturday, March 12, 2016

English "can" and Chinese "会"

An auxiliary verb is one that cannot be used alone and must work with a regular verb. English "can" is an example, e.g. "I can speak Chinese", where the verb "speak" cannot be omitted. But in the case of Chinese "会", both "我会说中文" and "我会中文" are perfectly grammatical. In this blog posting, we'll compare the English "can" with its Chinese counterpart "会" particularly in the context of language study.

The sentence "我会中文" must be translated to English as "I know Chinese", or "I can [a verb such as speak] Chinese", but not "I can Chinese", because "会" is used as a regular transitive verb, a usage not existing for English "can". In the first translation here, "会" matches "know". But if you mull over the connotation, there's a subtle nuance that easily escapes our attention. To know is to have knowledge. "I know Chinese" implies that I have knowledge of this language, a passive knowledge not readily leading to an action. The Chinese "会", on the other hand, often suggests a more active role, and "我会中文" is more accurately translated to "I can [a verb such as speak] Chinese" than to "I know Chinese". The only problem with this "more accurate" translation is that we can't assume "会" is unambiguously "can speak"; of the various aspects of the language skill, speaking is only one, parallel with reading, writing and listening comprehension.

There seems to be a deficiency in second language education in China when compared to that in other countries. "哑巴英语" (literally, "mute or dumb English"), referring to English education with emphasis on scoring high on paper tests at the expense of speaking skills, was and probably still is widespread in China. But language study in other countries is generally in better shape, and someone said to know a language is assumed to be able to speak that language. As a result, "我会中文" and "I can speak Chinese" become equivalent in real-life situations.

It's obvious that Chinese "会" is used as an auxiliary verb when it's followed by a regular verb, just like English "can". When "会" is followed by a noun, a usage missing for English "can", it is a full-fledged regular verb. In this sense, "会" means "be capable of" or "know" as in "know a language". The noun that follows must represent a type of skill. A language is probably the most common example. But many other skills work as well, e.g., "他会魔术" ("he can do magic", "he knows how to perform magic"), "他会书法" ("he can do calligraphy", "he's good at calligraphy"), "他会量子力学" ("he knows quantum mechanics", although this English sentence may be better interpreted as "他懂量子力学"). In other cases, it becomes ambiguous whether the object is a noun or verb, e.g., "我会游泳" ("I can swim", "I know how to swim"), where "游泳" can be both a noun and a verb.

Chinese is not the only language where the verb for "can" may function not only as an auxiliary verb but also as a regular verb. In the Facebook Polyglots group, one German learner asks, "Why do I come across sentences where the main verb is left out; 'Ich kann Deutsch auch'....Where is 'Sprechen'?!". That's simply because the German word "können" (for which "kann" is the first person singular form) serves as a regular verb here. Interestingly, the question asks "Where is 'Sprechen' [speak]?", consistent with the above observation that "speaking" is the dominant or default aspect of the language skill.