Saturday, November 11, 2017

Chinese translation of a poem by Kahlil Gibran

Kahlil Gibran (1883 – 1931) was an accomplished Lebanese poet. His well-known poem On Children

Your children are not your children.
They are the sons and daughters of Life's longing for itself.
They come through you but not from you,
And though they are with you, yet they belong not to you. 
has been translated into Chinese as follows:
or in another version:

The second line, plainly paraphrased, means that the children are the offspring or outcome of Life's longing for itself. Here Life acts as an entity, as if it existed in space and time. It tries to find itself, and in the process the children are born, children who appear to belong to you, the poem's addressee. The Chinese rendering of this abstract description, "生命为自己所渴望的儿女", is grammatically perplexing. Let's build up from the basics. "他所渴望的是工作" is "What he longs for is a job". Based on that model, "自己所渴望的" must mean "what (someone/something) he/she/it-self longs for", or here specifically, "what (something) itself longs for". (I added "someone" or "something" solely because the word he/she/it-self cannot stand alone.) Now, if we substitute Life for this something, we get "what Life itself longs for", or "生命自己所渴望的" in Chinese, which does not match the original meaning: the author intends to say that the children are the outcome of the longing, not of what Life longs for. Life longs for itself, and this longing begets the children. Unfortunately, the translation "生命为自己所渴望的儿女" does not say that either. In fact, it says something a native Chinese speaker has trouble understanding; I can't even think of a good literal translation of this ambiguous and possibly ungrammatical phrase. In contrast, the second translation, "他们是生命对于自身渴望而诞生的孩子", is a good one, thanks to the word "诞生" ("born") added by the translator. Literally it says "They are the children born out of Life's longing for itself", which is remarkably close to Gibran's original.

The third line is deceptively simple. What exactly does the author mean by "through you but not from you"? The first Chinese translation, "他们是借你们而来,却不是从你们而来", uses "借" (v. "to borrow"; prep. "with the help of") for "through", and "从" for "from". The second translation, "他们借助你来这世界,却非因你而来", uses "借助" ("with the help of") for "through", and "因" ("because", "because of", "due to") for "from". Both translations interpret "through you" as "with the help of you". The first renders "from" literally, while the second changes it to "because of". I checked the translations of this line into a few other languages, for example:
Spanish: Vienen a través vuestro, pero no de vosotros.
French: Ils viennent à travers vous mais non de vous.
German: Sie kommen durch dich, aber nicht von dir.
Italian: Tu li metti al mondo, ma non li crei.
Only the Italian version does not literally translate the prepositions "through" and "from" in the original poem. Instead, the sentence means, plainly put, "You put them into the world, but do not create them."

The Italian rendering, in my opinion, strays a little too far from the author's possibly deliberate wording, which borders on mischievous wordplay. Similarly, the Chinese translations, which change the author's "through" to "with the help of" and (in one case) "from" to "because of", would likely be frowned upon by the author. We know that, unlike scholarly translation, which should be literal, the translation of literary and especially poetic works allows some or even a great deal of flexibility. But the Spanish, French and German translations I found all stubbornly stick to the literal mapping of the two prepositions. My take on this is that if a line of the original poem makes sense both in its original language and in a literal translation, no word should be changed, and I believe that is exactly the case here. We can make sense of "They come through you but not from you" with a good analogy. Imagine bright sunlight shining through a window into a room. This sunlight (the children in Gibran's poem) comes through the window glass (you), and yet it is not truly from the window or the glass, but from the sun. In this interpretation, the light literally travels through the glass: it does so without the help of the glass (contrary to both Chinese interpretations), without the glass somehow putting the light down into the room (contrary to the Italian interpretation), and without any cause-and-effect relation to the glass (contrary to the second Chinese translation). The light belongs to the sun because the sun created it. The light can come into the room simply because the window is the only transparent part of the external wall. Gibran's "through you but not from you", likened to "through the window glass but not from the glass", is a clever play on prepositions and yet makes perfect sense. There is no need to replace them unless they are misunderstood.
The best Chinese translation may simply be a literal one, "他们通过你而来,却不是从你而来". If needed, a translator's note can be provided to help the reader. Anything else will likely tarnish the beauty of this line.

Thursday, September 14, 2017

Language difficulty

Chinese has been widely considered to be one of the most difficult languages in the world. What constitutes the difficulty of a language? Can it be measured and how? Whenever someone posts a message about language difficulty on a forum, it almost always generates a heated discussion. Comments range from "English is the easiest because the verbs have minimum conjugations and nouns have no gender", "Chinese and Japanese are hard because there're too many characters or kanji's", to "No language is inherently more difficult than any other because native speakers grow up speaking it with about the same effort", and "Language difficulty is subjective perception", to name a few.

Most language enthusiasts on various forums are not scholars. The diversity of their opinions results from the lack of a good definition of language difficulty. But we can tell that most people are referring to the difficulty an adult (not a young child) experiences in learning a foreign language (not the mother tongue), and in many cases the adult's native language is English. If we qualify the discussion with these requirements:

  • the learner is an adult;
  • the language whose difficulty is evaluated is learned by the adult as a foreign language;
  • the difficulty is evaluated for a specified native language of the learner,
then a measurement of language difficulty becomes meaningful.

I believe that in many social sciences there are two general methods to measure a quantity: internal and external. For example, in linguistics, a researcher can define a set of factors pertinent to the correlation between orthography (spelling) and pronunciation in order to calculate the orthographic depth of a language, i.e. "the degree to which a written language deviates from simple one-to-one letter-phoneme correspondence". Alternatively, one can simply conduct a controlled study among a group of people (a cohort) and count how many spelling errors each language causes in dictation or a similar experiment.

When it comes to rating language difficulty, we can devise a set of rules, assess each language against these rules individually, and then sum the rule ratings (with weights). Such rules might include the percentage of words that have a cognate or loan relationship with words in the learner's native language, whether nouns have genders and cases, how much verb conjugation varies, and whether the dominant word order differs from that of the learner's native language. For lack of a better term, we may call this an internal evaluation.
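The following minimal Python sketch makes the weighted-sum idea concrete. The rule names, the 0-to-1 rating scale and the weights are all hypothetical and were chosen only for illustration. The post does not prescribe any particular rules or weights.

```python
# Hypothetical rules and weights for an "internal evaluation" of language
# difficulty; each rating is assumed to be on a 0 (easy) to 1 (hard) scale.
WEIGHTS = {
    "non_cognate_vocabulary": 3.0,  # share of words without cognate/loan links to the learner's native language
    "noun_gender_and_case": 2.0,    # do nouns have genders and cases?
    "verb_conjugation": 2.0,        # how much verb conjugation varies
    "word_order_mismatch": 1.0,     # dominant word order differs from the native language
}


def internal_score(ratings, weights=WEIGHTS):
    """Weighted sum of per-rule ratings for one language and one learner background."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[rule] * ratings[rule] for rule in weights)


# Made-up ratings for a hypothetical target language and learner.
example = {
    "non_cognate_vocabulary": 0.5,
    "noun_gender_and_case": 1.0,
    "verb_conjugation": 0.5,
    "word_order_mismatch": 0.0,
}
print(internal_score(example))  # 4.5
```

A real evaluation would need carefully justified ratings and weights for each rule and for each learner's native language. The sketch only shows how the separate rule ratings combine into a single comparable score.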

The external evaluation, on the other hand, has been done and is widely quoted. The best-known data for English native speakers come from the Defense Language Institute (DLI) of the US, which statistically measures the time learners take to reach a certain language proficiency level. The figures from the official Web page for this study are duplicated below for your convenience.

  • Category I languages, 26-week courses, include Spanish, French, Italian and Portuguese.
  • Category II, 35 weeks, includes German and Indonesian
  • Category III, 48 weeks, includes Dari, Persian Farsi, Russian, Uzbek, Hindi, Urdu, Hebrew, Thai, Serbian Croatian, Tagalog, Turkish, Sorani and Kurmanji
  • Category IV, 64 weeks, includes Arabic, Chinese Mandarin, Korean, Japanese and Pashto
The earliest version of this data was on a Web page of Dr. William Baxter of the University of Michigan, who got it "from documents I got at a workshop of some kind" (private email). Dr. Baxter later removed it from his Website, so it now has to be referenced from another copy; it is duplicated below.

Languages included (languages regularly offered at the University of Michigan are in capital letters; this is NOT a complete list). For each group, (a) is the number of hours of instruction required for a student with average language aptitude to reach level-2 speaking proficiency, and (b) is the speaking proficiency level expected of a student with superior language aptitude after 720 hours of instruction.

GROUP I: Afrikaans, Danish, DUTCH, FRENCH, Haitian Creole, ITALIAN, Norwegian, PORTUGUESE, Romanian, SPANISH, Swahili, SWEDISH. (a) 480 hours; (b) level 3
GROUP II: Bulgarian, Dari, FARSI (PERSIAN), GERMAN, (Modern) Greek, HINDI-URDU, INDONESIAN, Malay. (a) 720 hours; (b) level 2+ / 3
GROUP III: Amharic, Bengali, Burmese, CZECH, Finnish, (MODERN) HEBREW, Hungarian, Khmer (Cambodian), Lao, Nepali, PILIPINO (TAGALOG), POLISH, RUSSIAN, SERBO-CROATIAN, Sinhala, THAI, TAMIL, TURKISH, VIETNAMESE. (a) 720 hours; (b) level 2 / 2+
That data differs considerably from DLI's current data. I had some email exchanges with DLI, but they didn't explain these discrepancies.

[Update 2018-04]
Dr. Robert Marzari, the author of Leichtes Englisch, schwieriges Französisch, kompliziertes Russisch, kindly sent me a summary of the result of his research and granted me permission to post it here.

In my book I tried to evaluate the difficulty of seven European languages (English, French, Spanish, Italian, Russian, Polish - and German) for a German speaking learner; for the evaluation of the German language I imagined a Romance speaker, i.e. a mixture of a French, Italian and Spanish speaker. The results of the evaluation therefore do not show absolute degrees of complexity, but rather relative degrees of difficultness, i.e. relative to a German or Romance speaker.
   If you could get hold of my book (at a University library perhaps?) just take a look at the charts on pages 269 to 275: On these charts I give the results of my evaluation of those seven languages according to the linguistic subsystems of phonetics, writing system, grammar, lexicon and textual structurization (i.e. reading difficulty).
   According to these, the degree of a learner's difficulty is as follows:

          Active competence    Passive competence   Complete competence
          (speaking+writing)   (reading)
Spanish   29 points            11 points            40 points
English   33 points            13 points            46 points
Italian   35 points            13 points            48 points
French    43 points            10 points            53 points
Russian   51 points            15 points            66 points
German    50 points            18 points            68 points
Polish    54 points            16 points            70 points

This excellent research indicates that, for a German native speaker, language difficulty in terms of complete competence ranks as Spanish < English < Italian < French < Russian < Polish. This is quite consistent with many polyglots' experience, although the reading (passive competence) scores give a slightly different order. Apparently this research uses an internal evaluation (see the description above), rating various aspects of a language instead of measuring how much students actually struggle to learn it. Thus, placing German in this list makes sense even though its assumed learners speak a different native language, a Romance language instead of German.
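The following Python sketch checks the arithmetic. It re-enters Dr. Marzari's scores from the table above, confirms that complete competence is the sum of active and passive competence, and sorts the languages from easiest to hardest. German is excluded by default because its scores refer to a Romance-speaking learner rather than a German-speaking one. The function names are my own.

```python
# Dr. Marzari's scores: (active competence, passive competence) in points.
# More points = more difficult for the stated learner
# (a German speaker; for German itself, a Romance speaker).
SCORES = {
    "Spanish": (29, 11),
    "English": (33, 13),
    "Italian": (35, 13),
    "French": (43, 10),
    "Russian": (51, 15),
    "German": (50, 18),
    "Polish": (54, 16),
}


def complete(lang):
    """Complete competence = active (speaking+writing) + passive (reading)."""
    active, passive = SCORES[lang]
    return active + passive


def ranking(key, exclude=("German",)):
    """Languages from easiest to hardest by `key`; ties keep the table order."""
    return sorted((lang for lang in SCORES if lang not in exclude), key=key)


by_complete = ranking(complete)
by_reading = ranking(lambda lang: SCORES[lang][1])
print(by_complete)  # ['Spanish', 'English', 'Italian', 'French', 'Russian', 'Polish']
print(by_reading)   # ['French', 'Spanish', 'English', 'Italian', 'Russian', 'Polish']
```

The complete-competence order reproduces the ranking quoted in the text. The reading order differs mainly in French, which has the lowest passive score.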

Unfortunately, I'm not aware of any other research on this topic. But as you can already see, an otherwise hot topic can be made quite cool by the above analysis, cool as opposed to hot or debatable, and cool in the sense of being interesting.

Monday, July 10, 2017

Tian Ji's horse racing and the electoral vote system

[The following was written on November 11, 2016.]

The author of the famous military strategy book The Art of War, Sun Wu, commonly known as Sun Tzu, had a descendant, Sun Bin, who also wrote a book with the same title. Around 340 BC, Sun Bin advised his patron Tian Ji at a horse race and helped him win. The following is an excerpt from Sima Qian's Records of the Grand Historian about this interesting story:

(The ambassador of the Qi state went to the Liang state. Sun Bin, a convicted criminal, went to visit him and talked to him secretly. The Qi ambassador regarded Sun as valuable and carried him back to Qi. Tian Ji, the Qi general, gave him a warm reception. Ji and some princes often bet heavily on horse racing. Mr. Sun saw that all the horses were about equally capable, graded as superior, average, and inferior. So Sun advised Tian Ji, "Sir, just bet heavily. I'll make you win." Tian Ji trusted him and bet a thousand units of gold against the king and the princes. Right before the race, Mr. Sun said, "Race your inferior horse against his best horse, your average horse against his inferior horse, and your best horse against his average horse." After three rounds, Tian Ji had lost one round and won two, and carried away one thousand units of gold. Then Ji recommended Mr. Sun to King Wei, who consulted Sun on military tactics and appointed him Chief of Staff.)
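Sun Bin's reasoning can be checked by brute force. The Python sketch below uses hypothetical speeds that do not appear in the source. In this model, each of Tian Ji's horses is slightly slower than the king's horse of the same grade but faster than the king's horse one grade lower. That is one way to read "all the horses were about equally capable". The sketch counts the rounds Tian Ji wins under every possible assignment of his horses.

```python
from itertools import permutations

# Hypothetical speeds (higher is faster); not given in the historical text.
KING = {"superior": 10, "average": 8, "inferior": 6}
TIAN_JI = {"superior": 9, "average": 7, "inferior": 5}

GRADES = ("superior", "average", "inferior")


def wins(order):
    """Rounds Tian Ji wins when his horses, listed in `order`,
    race the king's superior, average and inferior horses in turn."""
    return sum(TIAN_JI[mine] > KING[theirs] for mine, theirs in zip(order, GRADES))


same_grade = wins(GRADES)                            # superior vs superior, etc.
sun_bin = wins(("inferior", "superior", "average"))  # Sun Bin's assignment
best = max(permutations(GRADES), key=wins)           # best of all 6 assignments
print(same_grade, sun_bin, best, wins(best))
```

Under these assumptions, racing same grade against same grade loses all three rounds. Sun Bin's assignment wins two rounds, and no assignment can win all three, because the king's superior horse beats every horse Tian Ji has.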

Fast forward to 2016. In the US presidential race, the electoral vote matters while the popular vote does not, and in this 2016 race the two votes produced two different winners. Neither Hillary Clinton nor Donald Trump can move her or his supporters from one state to another. Still, there is a similarity between the electoral vote system and Tian Ji's winning strategy: losing some contests by wide margins while winning more contests by narrow margins can win the overall race. If democracy is the name given to the principle of the minority obeying the majority, the popular vote is the only true democracy. (As of this writing, Clinton has won 60,274,974 popular votes, while Trump has won 59,937,338.)
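A toy tally illustrates the analogy. The three states and all vote counts below are invented for illustration. The sketch assumes every state awards all its electors to its plurality winner, that is, winner-take-all, as most US states do. Tie-breaking is not modeled.

```python
# Toy model of winner-take-all electoral votes vs. the popular vote.
# All states and numbers are invented for illustration.
STATES = {
    # state: (electors, votes for A, votes for B)
    "P": (10, 5_100, 4_900),
    "Q": (10, 5_100, 4_900),
    "R": (10, 2_000, 8_000),
}


def tally(states):
    """Return (popular vote totals, electoral vote totals) for candidates A and B."""
    popular = {"A": 0, "B": 0}
    electoral = {"A": 0, "B": 0}
    for electors, votes_a, votes_b in states.values():
        popular["A"] += votes_a
        popular["B"] += votes_b
        # Winner-take-all: the state's plurality winner gets all its electors.
        # A tie would go to B here; real tie-breaking rules are not modeled.
        winner = "A" if votes_a > votes_b else "B"
        electoral[winner] += electors
    return popular, electoral


popular, electoral = tally(STATES)
print("popular:", popular)
print("electoral:", electoral)
```

Here candidate A wins two states narrowly and loses the third badly. A takes 20 of 30 electors while trailing by 5,600 popular votes, much like Tian Ji winning two close rounds and conceding one.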

Some people decide not to vote because of (A) equal dislike of the candidates; (B) lack of interest in politics; or (C) living in a non-swing state, where one person's vote matters little. Group C may be small, but it is the only one of the three that would make a difference if the American electoral vote system were abolished or even mitigated (e.g., by adjusting the weights, i.e. the number of electors assigned to each state). If that happened, swing states would have lower voter turnout and non-swing states higher turnout. But since there are fewer swing states than non-swing states, the total popular vote count would be higher.

Sunday, April 16, 2017

自由: "freedom" or "liberty"?

A Chinese reader asked me about the difference between "freedom" and "liberty" when translating Chinese "自由" into English. A Google search for "difference between freedom and liberty" turns up many answers. One article maintains that "Freedom is a state of being capable of making decisions without external control", while liberty "is freedom which has been granted to a people by an external control". Others laboriously attempt to draw a clear distinction between these two words.

Having read a handful of such answers without being satisfied by any of them, I told the person asking the question: 1. the etymology of the two words differs; 2. in general usage, "liberty" is more abstract and philosophical than "freedom". Other than these two points, there is no difference, although in a given context one of the two words is usually more common. For example, nowadays we say "freedom of speech", not "liberty of speech". (But see the ngram figure in Appendix 1.) We say "Liberty, Equality, Fraternity", not "Freedom, Equality, Fraternity". These set phrases are fixed by convention, just as the Chinese idiom is "破釜沉舟" ("cut off all means of retreat", "decide to fight to the death"), not "破釜沉船", even though "舟" and "船" are completely synonymous.

Making distinctions between words is so intriguing that someone has even built a Web site dedicated to this task. Language professionals and the general public alike are fond of writing articles on these topics. While many such articles are valuable contributions to correct English usage, they share one deficiency that is not fully recognized: the judges are the native speakers of the language, not linguists or scholars. An age-old debate among lexicographers is relevant here: should a dictionary be prescriptive, directing people toward correct or supposedly correct usage, or descriptive, faithfully documenting the actual usage in the native speaker community? Nowadays there may be more dictionaries in the latter category, presumably consistent with the increased level of public education. In the case of "freedom" vs. "liberty", if enough people, not learners of English as a foreign language but native speakers, ask about their difference, the very fact that they ask is a sign that the distinction, if there is a theoretical one, hardly exists in practice. Instead of making a great effort to separate them, it would be better to acknowledge, in modesty, the lack of difference between them.


Appendix 1

This figure is the Google ngram showing the historical usage of "freedom of speech" and "liberty of speech". We can see that from the mid-19th century on, "freedom of speech" has significantly gained in usage over "liberty of speech". But before that time, it only had slightly higher usage frequency.

Appendix 2

Some Weibo users gave me a few helpful pointers on this topic. One user informed me that the political theorist and philosopher Isaiah Berlin's Four Essays on Liberty used "freedom" and "liberty" interchangeably. Two other users directed me to political scientist Hanna Pitkin's article "Are Freedom and Liberty Twins?". According to Pitkin, most people don't distinguish between these two terms, Hannah Arendt being an exception. However, Pitkin questioned Arendt's distinction from the standpoint of political science as well as etymology (see the bottom of p. 6 and p. 9 of the article).

Appendix 3

The prescriptive-descriptive dichotomy, however, only applies to everyday language usage. In academic fields, especially science and technology, but to some extent the social sciences and humanities as well, the prescriptive approach should be supported. Take osteoarthritis as an example. An educated English speaker would think it means inflammation (-itis) of the bone (osteo-) joints (-arthr-). But it does not: the term denotes a degenerative joint disease, and specialists keep that prescribed meaning regardless of what its parts suggest. Then, should a distinction between "freedom" and "liberty", if nonexistent in practice, be made in academia, treating them as two different terms in the social sciences or humanities, followed by an admonition to the public about the research outcome? Scholars have freedom of research and can distinguish any pair of words in their research. In fact, social scientists and particularly philosophers habitually do that. As to whether such a distinction should be imposed on the public: no!

Monday, January 9, 2017

Comparison of Chinese and Western Etymology

In my last post, I said "Most languages in the world take the alphabetic writing system. Studying the internal history of its vocabulary primarily means analyzing phonological and morphological changes through time." In this post,[note1] I'll expand on that point and contrast that with the Chinese tradition.

Take the word language as an example. In English, we read

late 13c., langage "words, what is said, conversation, talk," from Old French langage "speech, words, oratory; a tribe, people, nation" (12c.), from Vulgar Latin *linguaticum, from Latin lingua "tongue," also "speech, language," from PIE *dnghu- "tongue" (see tongue (n.)).
The -u- is an Anglo-French insertion (see gu-); it was not originally pronounced. Meaning "manner of expression" (vulgar language, etc.) is from c. 1300. ...

Source: Online Etymology Dictionary

In Spanish, we have

idioma m. language. [LL. idiōma: id. <Gk. idiōma: peculiarity (as lang.) <idiousthai: to make one's own <idios. See idio-.]; idiomático,ca a. idiomatic. [Gr. idiōmatikos: particular.]
Source: A Comprehensive Etymological Dictionary of the Spanish Language with Families of Words based on Indo-European Roots by Edward A. Roberts, 2014.

And most importantly, in French, we have

LANGUE, sf. a tongue; formerly lengue, from L. lingua. For in=en=an see § 71, and Hist. Gram. p. 48. — Der. langage, languette.
Source: An Etymological Dictionary of the French Language by Auguste Brachet, 1882.

The reason for my praise "most importantly" is that Auguste Brachet, a self-taught scholar of Romance languages ("romanistischer Autodidakt", according to German Wikipedia) who later became a professor, created a monumental masterpiece not just in French etymology but in etymology in general. In addition to what a regular etymologist would do, such as tracing a word form to its etymons in the same or other languages, Mr. Brachet systematically summarized the rules of morphological and phonological change and applied them to individual words in his dictionary. In the example above, he noted that for the change in > en > an in the development of Latin lingua into French langue, the reader can consult his rule 71, where he says

I in Latin position [i.e. "when followed in the Latin word by two consonants" according to him, a convention not exactly the same as the one adopted today; my note] is changed to e in Merovingian Latin: thus fermum, ..., for firmum, ..., and this e, pronounced ei (see § 66), has produced two distinct French forms, according as it has preferred the open è sound, or the i sound.

You can follow up with rule 66 in this book and p. 48 of his A Historical Grammar of the French Tongue for more information about these sound (phonological) and spelling (orthographic) changes.

Western etymological publications may be divided into two groups: (1) dictionaries that give etymons or source words; (2) scholarly books and research articles on phonological and orthographic changes over time. Mr. Brachet's dictionary is unique in that it merges the two into one, so that the reader is conveniently offered the explanation of sound changes right in the headword entry, obviating the need to research as to why, e.g., the first i in *linguaticum would change to a in the history of the English word language.

However, a word has not only sound and spelling but also meaning, which etymology cannot avoid tracing. But as the linguist Calvert Watkins warned, it is "more hazardous to attempt to reconstruct meaning than to reconstruct linguistic form". Sense development is much less researched and less described in dictionaries. Unlike phonology, semantics, the study of word meanings, is not easily subject to formal (as in "formal logic") structural analysis. And yet tracing sense development is the primary task of Chinese etymology. Chinese phonological development is a separate field of study. It is not incorporated into etymology, because the meaning of Chinese characters is largely dissociated from their sound. The same holds for Chinese words, whose meanings are almost always based on their component characters. Take the character 文 ("text") as an example.

Source: 谢光辉《汉语字源字典》, 北京大学出版社, 2000年, 29页
Translation of the embedded text: "文" is a pictographic character. In oracle bone script (甲骨文) and bronze inscription script (金文), "文" resembles a person standing and facing forward, with a tattoo of decorative patterns on his chest. This is in fact a vivid depiction of the ancient "文身" (tattoo) custom. Thus the original meaning of "文" was a person with a tattoo on his body, as well as pattern or texture. Later, the meaning was extended to character, article, culture, civilization, etc.

That was a typical entry of Chinese character etymology. For simple characters, especially pictographic ones, it is pure 依类象形, or description of the object according to what it looks like. The focus is on the meaning, not the reading or sound. More complicated characters may be decomposed into elements, each analyzed the same way, as in the case of "秦" (see my last post).

Of course, the majority of characters (at least 80%) are of the type 形声字, or characters of form and sound, such as "指" (finger; to point). In it, the form radical "扌" suggests the meaning, i.e. something related to the hand, and the sound component "旨" suggests the reading, i.e. zhǐ. Unsurprisingly, the classical Shuowen Jiezi (说文解字) dictionary points out that this character is "从手旨聲" (the meaning is based on "手" and the sound on "旨").

Similarities and differences between Chinese and Western etymology can also be revealed by the definition of the word etymology itself. The Webster dictionary defines it as "the history of a linguistic form (as a word) shown (1) by tracing its development since its earliest recorded occurrence in the language where it is found, (2) by tracing its transmission from one language to another, (3) by analyzing it into its component parts, (4) by identifying its cognates in other languages, or (5) by tracing it and its cognates to a common ancestral form in an ancestral language" (I added the parenthesized numbers). Thus we see that most Western dictionaries with etymological information meet requirements (1) and (2), and sometimes (3). Wiktionary and Friedrich Kluge's An Etymological Dictionary of the German Language also meet (4) and (5) most of the time. What if we apply these requirements to Chinese character etymology? (1) is often met if we interpret it as finding the first occurrence in history, which computer-based searches have made drastically easier nowadays. But tracing a character's development over its long history, either within Chinese or (2) across different languages, is rarely done. (3) is done, though with significant differences from how it is done for Western languages. (4) and (5) are rare because they are mostly irrelevant to Chinese characters.

How does analyzing a Chinese character into its components differ from the Western tradition? While a character such as "指" can be analyzed into "扌" (for meaning) and "旨" (for sound), there is no systematic change of a component from one form to another. Take rule 126, one of the many rules Mr. Brachet summarized for French, as an example: "Before a, initial c ... passes through the successive aspirated sounds k'h, tk'h, kch, ch." He supports this rule of ca- > ch- with about 80 words as evidence: champ < campus, chien < canis, etc. Can we construct an analogous rule and find supporting examples in Chinese etymology? Since Chinese does not use an alphabetic writing system, there is hardly any need to deal with the sound change of a character in etymology. Instead, we may substitute the change in the form of a character. For example, we can study the 金文 (bronze inscription), 小篆 (small seal), and 楷体 (regular script) forms of "指" and of other characters with "扌" on the left side. We may then conclude that all (or most) such characters went through the same predictable change of this radical across these forms, just as French ca- changed to ch-. Similarly, all or most characters with "旨" on the right side probably went through the same change as shown here (see the row for 字源演变). Thus, we find in etymological studies a parallel between Chinese and Western languages in identifying common changes in the components of characters or words.

However, Chinese etymological dictionaries are also interested in finding the "root cause" of the most basic characters. Because the characters are ultimately pictographic in origin, this "root cause" finding is mostly "依类象形" (describing the object according to what it looks like). If we must find a parallel for this practice in Western etymology, it would be answering the question of why, e.g., the Proto-Indo-European stem from which the Modern English word word is ultimately derived is *were-, that is, why that sound. Obviously, except for some onomatopoeias, there is no answer, and no such research. While Chinese etymologists have forged ahead in that direction, so far this "research" is, I'm afraid, largely guesswork, simply because history has left no record of why a specific character was invented in that form. "文" may indeed be a symbol for a person with a tattoo, but there is no hard proof. This approach is too error-prone. In my last post, I quoted the article 许慎为何将象释成母猴——“为”字趣释 (Why did Xu Shen interpret an elephant as a female monkey: interesting interpretation of character "为"). In a recent Weibo post, a scholar interpreted "夷" in its original oracle bone script, purely based on its resemblance, as a person squatting. In contrast, 《汉语字源字典》 (Dictionary of Chinese Character Etymology), by another scholar in this field, takes it to represent a man bound by ropes, to serve as a slave or a sacrifice. On this stretch of imagination, I have but one comment: "汉字字源,看图识字,见仁见智" (Chinese character etymology / Look at pictures and learn to be literate / Trust your opinions and beliefs).

[note1] Due to the unique nature of the Chinese language, etymology can be of characters as well as words. This post is about character etymology.