English for Chinese

Saturday, July 25, 2020

Adjective in the form of the past participle of an intransitive verb

First, a few grammatical terms. Everyone knows what an adjective is, like "big" in "a big car". Past participle (PP hereinafter) is a form of a verb that you use after "have" to indicate a completed action, like "opened" in "I have opened the door". A verb is intransitive when it is not followed by an object, like "happen" in "The incident happened", although it can be followed by a complement indicating time, place, etc. A transitive verb is followed by an object, like "hit" in "He hit him".

Sometimes the PP of a verb can be used as an adjective, like "opened" in "the opened jar", referring to the jar that was opened (by somebody), which is slightly different from "the open jar", where the speaker emphasizes the state of the jar more than someone's opening action.

All is fine if the verb is transitive. That is, there is no problem in using PP of a transitive verb as the modifier of a noun (nominal modifier), serving the function of an adjective. But can PP of an intransitive verb do so? The answer is sometimes but not always. We can say "an expired license", which is the same as "a license that has expired". The phrase "the disappeared man" seems to be acceptable, referring to the man that has disappeared, not necessarily implying that the man was forced to disappear by e.g. abduction. (The verb disappear does have the rare transitive sense of "to make vanish" according to Wiktionary, but we don't discuss it here.) On the other hand, we cannot say *"a come guest" (* means incorrect) and have to say "a guest that has come".

An interesting question is: how do we know when the PP of an intransitive verb can be used as an adjective or nominal modifier? I posted a question to the Facebook Linguistics group. One reader, apparently a linguist, referred me to the concept of "unaccusative verb". According to Wikipedia, "an unaccusative verb is an intransitive verb whose grammatical subject is not a semantic agent. In other words, it does not actively initiate, or is not actively responsible for, the action of the verb." Let me paraphrase. Just because a word (or phrase) is the grammatical subject in front of a verb doesn't always mean it actively (主动地) takes the action indicated by the verb. For example, "The window broke" doesn't mean the window wanted to break and therefore broke. It broke probably because someone broke it, or the bad weather caused it to break. This is different from "A guest comes" because the guest can walk and take action by himself and comes. Note that in linguistics, "accusative" refers to the relationship between the verb and its immediate action on its direct object; it has nothing to do with the action of accusing someone of doing something bad, although "John accuses Jake" does have the accusative action in it ("accuses Jake").

The article goes on to say "[u]naccusative past participles can be used as nominal modifiers with active meaning", and gives a criterion to identify such verbs. For example, in the archaic sentence "He is fallen/come" (which means He, usually referring to Jesus, has fallen / come), because "is" instead of "has" is used, both "fall" and "come" are unaccusative. Well, obviously, in Modern English, only "a fallen tree", not *"a come visitor", makes sense. So I'm afraid we can only say some unaccusative past participles can be used as nominal modifiers or adjectives. The article lists 6 groups of unaccusative verbs given by Perlmutter (1978). But I don't think all are fit to be used as nominal modifiers. Specifically, I would say (a) and (c) won't work (e.g. *"the happened event"). In (f), only "survive" works.

For native English speakers, this is a non-issue because which intransitive verb can and which cannot be turned into PP and act as an adjective naturally comes to the mouth or pen (nowadays keyboard). For English learners, it may be more fruitful to just learn them by reading and listening than by studying the grammatical rule. Nevertheless, the linguists' effort to decipher the underlying grammatical rule is intriguing to the curious mind.

Monday, February 10, 2020

"self-driving" vs "self-driven", "self-limiting" vs "self-limited"

In English, the compound adjectives <NP>-<V>ing (noun or noun phrase followed by verb in its -ing form) and <NP>-<V>ed (noun or noun phrase followed by verb in its -ed or past participle-like form) imply different relationships between <NP> and <V>. Specifically, in the former case, <NP> is the object[note] of the action <V>, while in the latter, <NP> is the agent of <V>. For example, "man-made" implies that man makes (whatever follows), as in "a man-made satellite". If you were to say "man-making", it would denote something that makes man or a human!

But this analysis seems to break down when the first element is the word "self". A Google exact phrase search for "self-driving car" currently returns about 6,980,000 results and a search for "self-driven car" returns about 540,000. While the -ed form gets less than 10% as many results as the -ing form, most articles using it appear to be written by native speakers, suggesting that both forms are accepted (but people may be subconsciously treating "self" as an object more than an agent?). After all, it makes sense because "self" means, well, self; there's no need to distinguish between agent and object.

The recent coronavirus causes pneumonia that is self-limited, according to China’s National Health Commission. So, let's check "self-limiting disease" vs. "self-limited disease", a term referring to a disease that runs its course without medical treatment (treatment may speed up the process, but that's a separate point). "Self-limiting" is slightly more popular than "self-limited", 118,000 vs. 105,000 on Google. Indeed, when the <NP> is "self", either the -ing or the -ed form of the verb is accepted.

_________
[note] A more technical term for "object" here is "patient", not in any way related to a sick person in a hospital.

Friday, September 27, 2019

What's special about English "until"/"till"?

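The English preposition until (till) is a highly unusual word, if not among all human languages then at least among the dozen or so most widely spoken ones: it carries a semantic reversal of state. Below is what I posted in a Facebook linguistics group: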

English "until"/"till" has a side effect of reversing the state. For example:
"The scientists had not found a solution to the problem until 1970."
(informally, "did not find")
It implies that the solution *was* found in 1970. I don't know if there's a linguistic term to describe this state reversal. But in many other languages (probably Spanish, German, Chinese, Hindi, Persian, etc.), the word generally translated as English "until" does not seem to have this implication. This lack affects the English writing of the people speaking those languages natively. For example, a Persian scientist wrote (mcijournal.com/article-1-62-en.pdf): "Microgravity has different effects on normal and cancer cells, but the related mechanisms are not well-known till now." He didn't mean to say the mechanisms are (finally) well known now; by "till now", he meant "so far" or "even as of today", a continuation of the state of "are not well known".
This implied reversal of state of English "until" seems to be more obvious if the sentence is negative.

I'd like to know
(1) whether there's a linguistic term for this semantic reversal of state implied by English "until";
(2) whether it's correct to list Spanish, German, Chinese, Hindi, Persian as the languages (maybe all languages except English?) in which a simple equivalent of English "until" does not have this semantic reversal.

(See facebook.com/groups/generallinguistics/permalink/10157494708249346/ and the discussion there.)

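In that post I proposed that the English sentence
"The scientists had not found a solution to the problem until 1970."
implies that the scientists finally found a solution to the problem in 1970, while the literal translations of this sentence into Spanish, German, Chinese, Hindi and Persian do not carry this implication. For example, the Chinese "科学家们直到1970年没有找到对这个问题的解决方法" would generally be understood to mean that the solution had still not been found in 1970, i.e. the state of not having found it persists, whereas in English the state reverses from not found to found. The members of that Facebook group come from all over the world, and most are linguistics scholars or students; the group as a whole can probably read at least twenty or thirty languages. No one raised a substantive objection to the languages I listed, and some added other languages that lack this state reversal, e.g. Italian. As to whether linguistics has a term for the phenomenon I tentatively call semantic reversal of state, several people suggested a few concepts (presupposition, telicity, etc.), but I don't think any of them fits completely. I therefore tentatively conclude that English until or till is a rare, or even the only, preposition in human languages that expresses continuation in time up to a point and also has the property of a semantic reversal of state.
(By the way, I contacted the Persian-speaking Iranian biologist by email and confirmed that he had indeed meant to say "the related mechanisms are not well known so far".)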

Wednesday, July 24, 2019

Word order flexibility

According to Wikipedia, about half of the world's languages take subject–object–verb (SOV) as the primary word order of a sentence, while one third follow the subject–verb–object (SVO) order. English and Chinese belong to the SVO category; e.g., in "She loves him" or "她爱他", if any of the three words or characters is re-positioned, the meaning of the sentence will be altered or be completely lost.

Recently I was reading the very entertaining tale of Phyllis and Aristotle. Legend has it that "Aristotle advised his pupil Alexander to avoid the king's seductive mistress, Phyllis, but was himself captivated by her. She agreed to ride him, on condition that she could play the role of dominatrix." (summarized by Wikipedia) On the Wikipedia page, the Old French verse that told this story ended with Aristotle excusing himself to Alexander, saying

Amour vainc tot, & tot vaincra
tant com li monde durera

with Modern English translation as "Love conquers all, and all shall conquer / As long as the world shall last".

English readers don't need to be fluent in French, much less Old French, to identify the French words corresponding to the English words; e.g. amour "love", vainc "conquers" (think of vanquish), tot "all" (think of total), etc. But what's troubling to me is that the second part of the first line, tot vaincra, is translated as "all shall conquer". The English word conquer is a transitive verb, i.e. it must be followed by an object. It took me a while to realize that "all shall conquer" actually means "(love) shall conquer all". The original author of the verse didn't write "& vaincra tot" simply because the inversion that places vaincra at the end makes it rhyme with the last word of the second line, durera ("last"). But an average English reader having no knowledge of French will have difficulty understanding "all shall conquer". So I edited the Wikipedia page to read "and shall conquer all". A few months later, someone disagreed and changed the translation back, saying it's poetic English.

I took this issue to a language forum and asked for people's opinions. As expected, most forum members agree with me. One even says he initially thought "all shall conquer" meant "all will fight back", which is a totally wrong interpretation. But one member, apparently a native Frenchman, disagreed with me and said the reader should adapt to the text of the author and the translator should respect the style of the author. Others disagreed with him, and my response was that "the adaptation should not go so far as to rendering the 'translated' text incomprehensible in the target language". I have no doubt that his mother tongue influences his appreciation of English speakers' low tolerance of flexible word order. If he were to translate the Old French verse into Chinese (suppose he knows some Chinese), the Chinese verse would probably read "爱征服一切，一切征服", the latter part of which likewise makes no sense to a native Chinese speaker.

In Romance languages such as French or Spanish, the primary word order is also SVO. However, occasionally we see sentences in which a constituent is moved to a different position than the SVO rule would stipulate. (E.g. "Ont été reçus Pierre, Paul et Marie", possibly in response to "Qui a été reçu ?") Native speakers are used to these sentence structures and can understand the meaning based on context and/or the idiomatic nature of such expressions. As far as I know, there is no metric or index in linguistics to measure the word order flexibility of a language. We know that highly inflected languages such as Latin and Russian have fairly flexible word order. But English and Chinese would be quite low on such a metric, while various Romance languages are probably in the middle. Sentences such as "That I know", or "那个我知道", of an apparent OSV order, are exceptions, and their OVS variants, i.e. "That know I" and "那个知道我", are completely prohibited or meaningless, even though the equivalent may be understood in French in a certain context.

Thursday, February 28, 2019

Mutual intelligibility in writing only

When we are not sure whether to call a distinct form of language a language or a dialect of a language, we call it a language variety. To determine whether a language variety is a language or a dialect, linguists have proposed various criteria. Among them, mutual intelligibility may be the best known, in spite of some complications. One such complication is the separation between written and spoken intelligibility, as in the language varieties spoken in China; people in many regions of China may pronounce the same characters so differently that they can communicate with each other only by writing and not verbally. How do we define or measure mutual intelligibility in writing only (MIW hereinafter) in general? Here I describe a thought experiment that may serve as a starting point. Two people with high school or more education who natively speak language varieties A and B, respectively, but not both, are subjected to a test. Each person reads aloud, at normal speed, 100 sentences randomly selected from the entire corpus of Modern A and B, respectively, and the other person listens. (As an approximation to the entire corpus, take the Internet and book content indexed by Google as an example.) Each sentence is followed by 10 interpretations given in the language variety the other person understands, and he or she chooses the correct one (10 choices instead of 4 or 5 just to reduce the chance of a correct random guess). Then repeat the test, switching the two people along with their respective language varieties. If >=50 sentences are correctly understood, A and B are excluded from MIW. If the result is <50, they are further subjected to a test in which 100 sentences selected from the entire corpus are shown in writing. If >=90 sentences are correctly understood, we consider varieties A-B a case of MIW.
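The decision rule in this thought experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the returned labels are made up here, the scores in the usage note are hypothetical, and since the description above does not say how to combine the two directions of the test, the sketch assumes that the weaker direction decides (otherwise the intelligibility would not be mutual).

```python
from typing import Optional, Tuple

LISTEN_CUTOFF = 50  # from the text: >=50 of 100 spoken sentences understood
READ_CUTOFF = 90    # from the text: >=90 of 100 written sentences understood


def classify_pair(listening: Tuple[int, int],
                  reading: Optional[Tuple[int, int]] = None) -> str:
    """Classify a pair of varieties A and B.

    listening: correct answers out of 100 in the spoken test, as
               (B listening to A, A listening to B).
    reading:   the same two scores for the written test, if it was run.
    """
    # Assumption: "mutual" means the weaker direction decides the outcome.
    if min(listening) >= LISTEN_CUTOFF:
        return "excluded: verbally intelligible"
    if reading is None:
        return "undecided: written test needed"
    if min(reading) >= READ_CUTOFF:
        return "MIW"
    return "not MIW"
```

With hypothetical scores, `classify_pair((12, 20), (95, 93))` gives "MIW", while `classify_pair((80, 75))` is excluded at the listening stage and never reaches the written test.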

Thus, Sichuanese-Mandarin will be disqualified because their speakers can communicate verbally (and of course in writing). But Shanghainese-Mandarin, Hunanese-Mandarin, Shanghainese-Hunanese are good examples of MIW. Cantonese warrants more discussion. It's obvious that the Cantonese-Mandarin (or -Shanghainese etc.) pair has no verbal MI. There are grammar particles, pronouns and some common words unique to Cantonese. When a literate person who natively speaks Cantonese but has not learned written Chinese in the way Chinese is taught in mainland China writes in Cantonese, can the writing be understood with >=90% correctness by one speaking Mandarin only? Suppose the content is absolutely randomly selected from the entire Cantonese corpus, and is not purely colloquial and definitely not contrived to contain a disproportionately high ratio of Cantonese-specific markers or characters. I don't know the answer, and an actual experiment is needed. One example of such a written script in Cantonese is a Wikipedia page. I personally don't know Cantonese and I may or may not be able to correctly answer 90 out of 100 questions in a reading comprehension test. Note that Cantonese is special in that many native Cantonese speakers do read Chinese text proficiently, although Mandarin or other Chinese dialect speakers don't read Cantonese text (such as that Wikipedia page), creating asymmetric intelligibility, which is quite common in the world. Thus, when these two people try to communicate by writing, the preferred script they choose will be Chinese, not Cantonese. In discussing MIW, we should define two levels, one only allowing the written script to be the textual representation of the spoken language (e.g. Cantonese text for Cantonese speech), the other allowing the two people to choose whatever their preferred script is. Technically, we should limit MIW to the first case.

According to Wikipedia, Icelandic-Faroese and German-Dutch are MIW pairs. Based on a posting in the Facebook Linguistics group, the following are additional language varieties that are candidates for MIW:

* Scots-English, and many languages in south Asia with Sanskrit roots
* Swiss German-Standard German
* Hanoi Vietnamese-Southern Vietnamese (but highly disputed in the Sinosphere group whose members are mostly Vietnamese)
* Danish-some other Scandinavian languages

Note that I'm dealing with MIW between language varieties, a concept encompassing dialects within a language as well as languages, styles, registers, etc. While MIW within a language may not be limited to Chinese, it's probably safe to say the Chinese language has the most MIW pairs among its dialects, due to the dissociation between the pronunciation and the written form.

Most people inside mainland China consider Mandarin and Cantonese two dialects within the Chinese language, while many language lovers and polyglots outside of China consider them two distinct languages, calling the dialect view political propaganda. The MI criterion strictly prohibits political influences and is meant to be scientific and linguistic. What I have proposed above is a supplement to the MI criterion because MI does not inherently restrict intelligibility to listening comprehension, leaving MIW as a valid option. Are Mandarin and Cantonese two dialects or two languages? The answer is, in terms of MI in listening, they are languages. But in terms of MIW, they may be dialects or languages, depending on the results of the proposed experiment.

Saturday, August 4, 2018

"below" is not an adjective

In a technical discussion forum about databases, someone posted an off-topic message: "to all Oracle staff, this phrase is not English: 'follow the below steps' How does this slip into Oracle Support tech note documents". Indeed, the English word "below" should not be used as if it were an adjective (see e.g. Wiktionary). But I've seen this incorrect usage for 20+ years, especially in the IT industry. In the beginning, it mostly occurred in messages written by people with Indian-like names. Nowadays it also occurs in messages written by Chinese people and those of other ethnicities.

In any case, instead of saying "the below steps", we should say "the following steps", or "the steps below" (implying "located" before "below"). I'm guessing the adjectival usage of "below" is probably due to influence from the antonym "above", which *can* be used as an adjective as in "the above steps".

In light of the descriptivism vs. prescriptivism debate in which the latter has slowly lost ground in the past century, some people may argue that as more and more people start to use "below" as an adjective, this usage may eventually become accepted; after all, language evolves with the way it's spoken by the people. In fact, Merriam-Webster has already acknowledged this usage, listing it after adverb, preposition, and noun. But for now, the majority of native speakers do not consider this usage acceptable, nor does any other English dictionary. It's wise to be standard-compliant and stop saying "the below steps".

(A good discussion is found on Daily Writing Tips.)

Friday, July 6, 2018

Basic Chinese Characters

I finally finished my little book Basic Chinese Characters. It contains 2500 commonly used Chinese characters selected by the Ministry of Education of China, with pinyin and definitions manually added by me. The book sorts the characters by frequency of usage according to Google's estimate of the occurrences of each character on the Internet (a method that only I have used and that I probably invented). Some more descriptions of the book, plus sample pages, are at yong321.freeshell.org/bcc/. The book is available on Amazon as an e-book.

The book is in the format of character - pinyin (tones marked with numbers) - definition. For example,

1-99
二 er4 two
三 san1 three
四 si4 four
六 liu4 six
七 qi1 seven
零 ling2 zero
本 ben3 notebook; (measure word for books etc.); 本来(lai2) originally
日 ri4 sun
所 suo3 (function word, roughly “that which”); 所以 therefore; bureau
下 xia4 down, below; to go down
...
1200-1299
止 zhi3 to stop
脆 cui4 brittle, crispy
诞 dan4 birth
碍 ai4 blocking, hindrance
散 san4 to scatter, to disperse; scattered, loose (w.p. san3)
兽 shou4 beast
逝 shi4 to drift away; 逝世(shi4) to pass away, to die
猪 zhu1 pig
暂 zan4 temporary
腊 la4 preserved meat
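For readers who want to process entries in this character - pinyin - definition format, here is a minimal Python sketch of a parser. It is not part of the book; the names `Entry` and `parse_entry` are made up for illustration, and it handles only the first pinyin reading on a line. It also assumes that tone numbers run from 1 to 4, with 5 allowed for a neutral tone, so a digit outside that range is reported as an error.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    character: str
    syllable: str
    tone: int
    definition: str


# One character, a pinyin syllable ending in a tone digit, then the definition.
ENTRY_RE = re.compile(r"^(\S)\s+([a-zü]+)([0-9])\s+(.+)$")


def parse_entry(line: str) -> Entry:
    """Split a line such as '三 san1 three' into its three parts."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a character-pinyin-definition line: {line!r}")
    character, syllable, digit, definition = match.groups()
    tone = int(digit)
    # Assumption: tones 1-4, plus 5 for a neutral tone.
    if not 1 <= tone <= 5:
        raise ValueError(f"tone number out of range in {line!r}")
    return Entry(character, syllable, tone, definition)
```

For instance, `parse_entry("猪 zhu1 pig")` yields the character 猪, the syllable "zhu", tone 1 and the definition "pig".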

Free offer: If you, as a reader of this blog, are interested in this book, for a limited time I can selectively offer it for free on one condition and with one wish. You must not share my book with anyone else. If your friend would like a copy, please have him or her contact me directly. But since there is no technical way to enforce this requirement, I can only trust you, as in a verbal agreement. In addition to this requirement, I sincerely hope you can write an honest review and post it to the Amazon.com website, or if that is not feasible, on Goodreads. You can request a free copy by sending me an email at yong321@yahoo.com. It would be nice if you could tell me about one or a few book reviews you previously wrote on Amazon.com.

Irrespective of any interest in the book, if you have any comments, suggestions, or corrections, please let me know. They are highly appreciated.

Monday, April 30, 2018

Ludwig Feuerbach and the End/Outcome of Classical German Philosophy

The 200th anniversary of the birth of Karl Marx (May 5, 1818 - March 14, 1883) is coming soon. This great thinker is one of the very few that have had profound influence over human history. His numerous works, along with those of his close friend and also great thinker, Friedrich Engels, have been translated into dozens of languages and meticulously studied around the world. This short posting is about one single word in the title of Engels' book, Ludwig Feuerbach and the End of Classical German Philosophy.

In the 1980s, I read about disagreement with the Chinese rendering of the word, “终结” (literally "end", "termination"), in a Chinese article. If my memory serves me right, the author of the article was 朱光潛, a renowned scholar and philosopher in China. He argued that, as the original German title "Ludwig Feuerbach und der Ausgang der klassischen deutschen Philosophie" uses the word "Ausgang", literally "exit" or "outcome", there's no reason to change it to "end" in English or “终结” in Chinese, which is obviously different in meaning. Since both Wikipedia and Marxists.org use the word "end" in English and only a small number of websites on the Internet use the word "outcome", I had some email exchanges with a knowledgeable volunteer on Marxists.org, Ben, partly duplicated as follows:

Ben:
it always struck me as strange that this has always been translated as "end" - maybe it was a result of a certain "Stalino-Hegelian" teleology, which infected the movement in the 20th century? 'Ausgang' would probably be better translated as 'denouement' (as in a novel or play) or, as you suggest, "outcome".
Communist greetings

Me:
If it was the result of "Stalino-Hegelian" teleology, why would scholars in the English world be affected, as would the Russian and Chinese translators, which is understandable? British or American translators don't need to go through Russian and Chinese sources to do the German-to-English translation.

Ben:
I think it is worth bearing in mind that the project of translating Marx and Engels into English was also overseen by mainly Soviet funds and Soviet-type scholars. I am not suggesting that they have not done an outstanding job (of course they have!) but am merely pointing out that ideology and outlook cannot *but* find reflection in translation work and rendering somebody's thoughts into another language.
Communist greetings

Me:
I wanted to confirm that Russian translators were responsible for the popular English translation "end" but couldn't find definitive evidence. According to the translation by Foreign Language Press in Beijing, this 1976 English translation in China is based on the 1951 edition by Foreign Languages Publishing House in Moscow. Then I found an earlier one, published in 1946, Ludwig Feuerbach and the End of Classical German Philosophy, by Progress Publishers, which according to Wikipedia "was a Moscow-based Soviet publisher founded in 1931. It was noted for its English-language editions of books on Marxism-Leninism".

As we can see, in the English translation as early as 1946, the Moscow edition already used "end" for "Ausgang", as if Engels was announcing the death of classical German philosophy. A good description is in fact given by the last link, i.e. "Engels considered this something of a summation or closure of the post-Hegelian criticism Marx and he had initiated in The German Ideology 43 years before." Note that the words "summation", "closure", although not literally matching "Ausgang", are a good paraphrase of it.

The Wikipedia page also gives the title translation in other languages. French uses "fin", Portuguese "fim", Japanese "終結", and Russian "конец", all meaning "end". It's a small surprise that all these semi-official translations in various languages somewhat deviate from the German original.

Sunday, March 11, 2018

A few word-play jokes

First, a translation of a poem (ci-poem to be exact) by Ms. Li Qingzhao (李清照, 1084 – ca 1155/1156), a poet at the turn of the Northern-to-Southern Song dynasty.

李清照《永遇乐·落日熔金》
落日熔金， Sunset of molten gold
暮云合璧， Evening clouds of enclosing jade
人在何处。 Where am I standing?
染柳烟浓， Mist coloring the willows thickens
吹梅笛怨， Flute plays “The plum of melancholy”
春意知几许。 How's the springtime coming?
元宵佳节， The joyous Festival of Lantern
融和天气， in this clement weather
次第岂无风雨。 “Will it not be windy and rainy soon?”
来相召、香车宝马，谢他酒朋诗侣。 “Sorry”, said I to my wine-and-poetry friends, who came to invite me for an outing, in their fragrant BMW

Second, a list of words offered to "improve" English vocabulary, with a caution to the readers when I posted it to Weibo. And the "facts" stated therein are not to be trusted.

English vocabulary (non-)study
英语词汇的（非）学习
Learners of limited vocabulary should wear gas masks to avoid poisoning.
词汇有限的学习者须戴防毒面具

* infantry:
In the mid-20th century, the first public child care facility in the US was established in the suburb of Chicago, Jenkins Infantry, named after the owner Mary Jenkins.

* indefatigable
At the end of the 3-month clinical trial, 35% of the volunteers presented no change in either the body-mass index or the normalized adipose quantity. These indefatigable participants were advised to join a more aggressive weight watch program.

* bruxiathesaurus
A group of international paleontologists recently discovered never-seen-before dinosaur fossils, tentatively named bruxiathesaurus, on the evidence that these creatures apparently would grind their teeth while sleeping. Bruxia or bruxism, grinding or clenching teeth at night, is common among homo sapiens. This is the first time dinosaurs are found to have this behavior.

* infarction
Some patients with irritable bowel syndrome (IBS) try to “hold in” flatulence. There is no controlled study on either any benefit or harm done by this practice of infarction.

Sunday, February 4, 2018

The Multilingual Idioms List

Linguaholic created a crowdsourcing project, The Multilingual Idioms List. I think two things are new in this project.

As far as I know, there was never a dictionary that pairs idioms and only idioms from different languages. It's true that numerous dictionaries of idioms for a specific language have been published. The explanations or definitions of the idioms may be in the same language as the idioms, or in a different language. When they are in a different language (called target language for the sake of argument), more often than not a matching idiom in the target language cannot be found, and a wordy explanation is provided. The Multilingual Idioms List project handles this situation differently: leaving the entry blank on the target language side. This is actually a good thing. It either positively acknowledges such lack, or catches readers' attention and waits for other native speakers to find a good idiom in later times.
The List is multilingual, not limited to two languages. Unlike any published dictionary of idioms where the source and target languages differ, the contributors, or in a sense lexicographers, of the crowdsourcing List are not language professionals. This is not a big problem since the List is not a highly technical dictionary. The big advantage, on the other hand, is that the contributors are almost all native speakers. This is significant because good or even correct usage of idioms is very much dependent on real life experience in the language environment. Being native may be more relevant to this project than being professional if being both is not possible.

Today, I made a small contribution to the List, by adding the column Chinese (since no one before me had done that), and providing a dozen or so idioms, as follows:

a bitter pill	不得不吞的苦果
a piece of cake	小菜一碟
Achilles' heel	软肋
add insult to injury	雪上加霜；往伤口上撒盐
an arm and a leg	倾家荡产
beat around the bush	拐弯抹角
best of both worlds	两全其美
bite the bullet	硬着头皮上
burn the midnight oil	开夜车
cast in stone	板上钉钉
cat nap	打个盹儿
from A to Z	从头到尾
from scratch	从零开始
have eyes in the back of one's head	眼观六路，耳听八方
hit the road	上路
let the cat out of the bag	抖包袱
kick the bucket	见阎王
off the hook	如释重负

In Chinese, there are different types of idioms. 成语 (literally probably "solidified or invariable phrases") are more formal and literary, mostly of four characters, such as "自相矛盾" ("self-contradictory"), "纸上谈兵" ("talk of military strategy (only) on paper"). 歇后语 (literally "sentences said after taking a rest") are colloquial proverbs, such as "和尚打伞，无法无天" ("A monk holds up an umbrella. No hair|law. No sky.", or "The dharma is obscured and heaven blocked."). Obviously some idioms are in neither category, and yet are expressions that cannot be literally interpreted, such as "硬着头皮上", literally "go ahead with hardened scalp", which I consider matching "bite the bullet" in English.

I can think of one improvement that may be made on the current List. It would be nice to provide a place to enter the literal translation of an idiom and optionally a brief explanation. For instance, I would love to add that the Chinese idiom "软肋" for "Achilles' heel" literally means "soft rib" because the rib bone is relatively weak and fragile, and that "雪上加霜" for "add insult to injury" literally means "add frost on top of snow", a phrase that may not need an explanation. With these additions, the List would be more fun to read. So for instance, we'll know that instead of "beat around the bush", the Chinese "make turns and scratch corners" ("拐弯抹角"), and the French "turn around the pot" ("tourner autour du pot"). While English-speaking people consider Greek a difficult language ("It's all Greek to me!"), Chinese is the language regarded as such by by far the most other peoples; "Chinese" occurs 24 times out of about 100, compared to 12 for "Greek", on the Wikipedia page for Greek to me. Through this List, we know a little more about different cultures. But the technical limitation of the List is understandable; it is in the format of a spreadsheet, where adding two more columns (literal meaning and explanation) for each language would make the list too hard to read. Other options include adding comments to the spreadsheet cell, where the comments are not shown unless the mouse is over the cell.

Overall, this is a great project. I hope they'll set up a Wikipedia page, with versions in many different languages contributed by the same volunteers that build the List.

Saturday, November 11, 2017

Chinese translation of a poem by Kahlil Gibran

Kahlil Gibran (1883 – 1931) was an accomplished Lebanese poet. His well-known poem On Children

Your children are not your children.
They are the sons and daughters of Life's longing for itself.
They come through you but not from you,
And though they are with you, yet they belong not to you.

has been translated into Chinese as follows:

你们的孩子，都不是你们的孩子
乃是生命为自己所渴望的儿女。
他们是借你们而来，却不是从你们而来
他们虽和你们同在，却不属于你们。

or in another version:

你的儿女，其实不是你的儿女。
他们是生命对于自身渴望而诞生的孩子。
他们借助你来这世界，却非因你而来，
他们在你身旁，却并不属于你。

The second line, plainly paraphrased, means that the children are the offspring or outcome of the longing of Life for itself. Here Life acts as an entity as if it exists in space and time. It tries to find itself, and in the process are born the children who appear to belong to you, the addressee of the author. The Chinese rendering of this abstract description, "生命为自己所渴望的儿女", is a grammatically perplexing one. Let's build up from the basics. "他所渴望的是工作" is "What he longs for is a job". Based on that model, "自己所渴望的" must mean "what (someone/something) he/she/it-self longs for", or here specifically, "what (something) itself longs for". (I added "someone" or "something" solely to work around the problem that the word he/she/it-self cannot stand alone.) Now, if we substitute Life for this something, we get "what Life itself longs for", or "生命自己所渴望的" in Chinese. That doesn't match the original meaning; the author intends to say the children are the outcome of the longing, not the thing that Life longs for. Life longs for itself and this longing process begets the children. Unfortunately, the translation "生命为自己所渴望的儿女" is not saying the same thing, either. In fact, it says something a native Chinese speaker has trouble understanding. I can't even think of a good literal translation of this ambiguous and possibly ungrammatical phrase. In contrast, the second translation, "他们是生命对于自身渴望而诞生的孩子", is a good one, thanks to the extra word "诞生" added by the translator. Literally it says "They are the children born out of Life's longing for itself", which is remarkably close to Gibran's original.

The third line is deceptively simple. What exactly does the author mean by "through you but not from you"? The first Chinese translation, "他们是借你们而来，却不是从你们而来", uses "借" (v. "to borrow"; prep. "with the help of") for "through", and "从" for "from". The second translation, "他们借助你来这世界，却非因你而来", uses "借助" ("with the help of") for "through", and "因" ("because", "because of", "due to") for "from". Both translations interpret "through you" as "with the help of you". The first literally renders "from", while the second changes it to "because of". I checked the translations of this line into a few other languages. For example:
Spanish: Vienen a través vuestro, pero no de vosotros.
French: Ils viennent à travers vous mais non de vous.
German: Sie kommen durch dich, aber nicht von dir.
Italian: Tu li metti al mondo, ma non li crei.
Only the Italian version does not literally translate the prepositions "through" and "from" in the original poem. Instead, the sentence means, plainly put, "You put them into the world, but do not create them."

The Italian rendering, in my opinion, has gone a little too far from the author's possibly deliberate wording, which borders on a mischievous play on words. Similarly, the Chinese translations, which change the author's "through" to "with the help of" and (in one case) "from" to "because of", would be frowned upon by the author. We know that unlike scholarly translation, which should be literal, some or even a great deal of flexibility is allowed in the translation of literary, especially poetic, works. But the Spanish, French and German translations I found all stubbornly stick to the literal mapping of the two prepositions. My take on this is that if the original poem can be understood in its original language and also in the translated language with literal translation, no word change should be made, and I believe that is exactly the case here. We can make sense of "They come through you but not from you" if we use a good analogy. Imagine the scene in which bright sunlight shines through the window and comes into the room. This sunlight (the children in Gibran's poem) comes through the window glass (you) and yet it is not truly from the window or glass, but from the sun. In this interpretation, the light travels literally through the glass, without the help of the glass (contrary to both Chinese interpretations), without the glass somehow putting the light down into the room (contrary to the Italian interpretation), and having no cause-and-effect relation with the glass (contrary to the second Chinese translation). The light belongs to the sun because the sun created it. The light can come into the room simply because the window is the only transparent part of the whole external wall. Gibran's "through you but not from you", when likened to "through the window glass but not from the glass", is a clever play on the prepositions and yet makes perfect sense. There is no need to replace them unless they are misunderstood.
The best Chinese translation may simply be a literal one, "他们通过你而来，却不是从你而来". If needed, a translator's note can be provided to help the reader. Anything else will likely tarnish the beauty of this line.

Thursday, September 14, 2017

Language difficulty

Chinese has been widely considered to be one of the most difficult languages in the world. What constitutes the difficulty of a language? Can it be measured and how? Whenever someone posts a message about language difficulty on a forum, it almost always generates a heated discussion. Comments range from "English is the easiest because the verbs have minimum conjugations and nouns have no gender", "Chinese and Japanese are hard because there're too many characters or kanji's", to "No language is inherently more difficult than any other because native speakers grow up speaking it with about the same effort", and "Language difficulty is subjective perception", to name a few.

Most language enthusiasts on various forums are not scholars. The diversity of those opinions is a result of no good definition of language difficulty. But we can tell that most people are referring to the difficulty experienced by an adult (not a young child) in learning a foreign language (not mother tongue), and in many cases the adult's native language is English. If we qualify the discussion with these requirements, i.e.

the learner is an adult;
the language whose difficulty is evaluated is learned by the adult as a foreign language;
the difficulty is evaluated when the adult's native language is specified

then a measurement of language difficulty becomes meaningful.

I believe that in many social sciences, there are two general methods to measure a quantity, internal and external. For example, in linguistics, a researcher can define a set of factors pertinent to the correlation between orthography (spelling) and pronunciation in order to calculate the orthographic depth of a language, i.e. "the degree to which a written language deviates from simple one-to-one letter-phoneme correspondence". Alternatively, one can simply conduct a controlled study among a group of people and see which language causes how many spelling errors in dictation or in a similar experiment.

When it comes to rating language difficulty, we can devise a set of rules and individually assess each language against these rules and then sum the rule ratings (with weights); e.g., percentage of words that have cognates or loan relationships with the words in the learner's native language, whether the nouns have genders and cases, how many variations in verb conjugation, whether the dominant word order differs from that of his native language, etc. For lack of a better term, we may call this an internal evaluation.
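The weighted sum described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the factor names, the weights, the 0-10 rating scale and the two made-up languages are all my assumptions, not data from any study.

```python
# Hypothetical factors and weights (summing to 1); higher rating = harder
# for the learner. None of these numbers come from real research.
WEIGHTS = {
    "few_cognates": 0.4,
    "noun_gender_and_case": 0.2,
    "verb_conjugation": 0.2,
    "word_order_difference": 0.2,
}

def difficulty(ratings, weights=WEIGHTS):
    """Return the weighted sum of the per-factor ratings (each 0-10)."""
    return sum(weights[factor] * ratings[factor] for factor in weights)

# Two invented languages rated against the same learner's native language.
language_a = {"few_cognates": 2, "noun_gender_and_case": 5,
              "verb_conjugation": 6, "word_order_difference": 1}
language_b = {"few_cognates": 9, "noun_gender_and_case": 0,
              "verb_conjugation": 1, "word_order_difference": 4}

score_a = difficulty(language_a)
score_b = difficulty(language_b)
print(f"A: {score_a:.1f}, B: {score_b:.1f}")
```

Note how the choice of weights decides the outcome: language B has almost no inflection yet still scores as harder here, only because I gave the cognate factor the largest weight.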

The external evaluation, on the other hand, has been done and is widely quoted. The best-known data for native English speakers are from the Defense Language Institute (DLI) of the US, which statistically measures the time learners take to achieve a certain language proficiency level. The official Web page for this study is https://www.ausa.org/articles/dlis-language-guidelines, duplicated below for your convenience.

Category I languages, 26-week courses, include Spanish, French, Italian and Portuguese.
Category II, 35 weeks, includes German and Indonesian
Category III, 48 weeks, includes Dari, Persian Farsi, Russian, Uzbek, Hindi, Urdu, Hebrew, Thai, Serbian Croatian, Tagalog, Turkish, Sorani and Kurmanji
Category IV, 64 weeks, includes Arabic, Chinese Mandarin, Korean, Japanese and Pashto

The earliest version of this data was on a Webpage of Dr. William Baxter of the University of Michigan, which he got "from documents I got at a workshop of some kind" (private email). But Dr. Baxter later removed it from his Website, so you have to reference it from archive.org, duplicated below.

	Languages included (Languages regularly offered at the University of Michigan are in capital letters; this is NOT a complete list)	Hours of instruction required for a student with average language aptitude to reach level-2 speaking proficiency	Speaking proficiency level expected of a student with superior language aptitude, after 720 hours of instruction
GROUP I	Afrikaans, Danish, DUTCH, FRENCH, Haitian Creole, ITALIAN, Norwegian, PORTUGUESE, Romanian, SPANISH, Swahili, SWEDISH	480	3
GROUP II	Bulgarian, Dari, FARSI (PERSIAN), GERMAN, (Modern) Greek, HINDI-URDU, INDONESIAN, Malay	720	2+ / 3
GROUP III	Amharic, Bengali, Burmese, CZECH, Finnish, (MODERN) HEBREW, Hungarian, Khmer (Cambodian), Lao, Nepali, PILIPINO (TAGALOG), POLISH, RUSSIAN, SERBO-CROATIAN, Sinhala, THAI, TAMIL, TURKISH, VIETNAMESE	720	2 / 2+
GROUP IV	ARABIC, CHINESE, JAPANESE, KOREAN	1320	1+

That data differs considerably from DLI's current data. I had some email exchanges with DLI, but they didn't explain these discrepancies.

[Update 2018-04]
Dr. Robert Marzari, the author of Leichtes Englisch, schwieriges Französisch, kompliziertes Russisch ("Easy English, difficult French, complicated Russian"), kindly sent me a summary of the result of his research and granted me permission to post it here.

In my book I tried to evaluate the difficulty of seven European languages (English, French, Spanish, Italian, Russian, Polish - and German) for a German speaking learner; for the evaluation of the German language I imagined a Romance speaker, i.e. a mixture of a French, Italian and Spanish speaker. The results of the evaluation therefore do not show absolute degrees of complexity, but rather relative degrees of difficultness, i.e. relative to a German or Romance speaker.
If you could get hold of my book (at a University library perhaps?) just take a look at the charts on pages 269 to 275: On these charts I give the results of my evaluation of those seven languages according to the linguistic subsystems of phonetics, writing system, grammar, lexicon and textual structurization (i.e. reading difficulty).
According to these the degree of a learner's difficulty is as follows:

     active competence  passive competence  complete competence
     (speaking+writing)     (reading)
Spanish   29 points         11 points          40 points
English   33 points         13 points          46 points
Italian   35 points         13 points          48 points
French    43 points         10 points          53 points
Russian   51 points         15 points          66 points
German    50 points         18 points          68 points
Polish    54 points         16 points          70 points

This excellent research indicates that a German native speaker rates language difficulty as Spanish < English < Italian < French < Russian < Polish, which is remarkably consistent with many polyglots' experience, although reading has a slightly different order. Apparently this research uses an internal evaluation (see above for a description), rating various aspects of a language instead of checking students' learning challenges. Thus, placing German in this language list makes sense even though the German learners speak a different native language, a Romance language instead of German.
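As a quick check of the table, the following Python sketch copies Dr. Marzari's points, confirms that each "complete" score is the sum of the active and passive scores, and sorts the languages both ways. The variable names are mine; the numbers are taken from the table above.

```python
# (active, passive, complete) points copied from Dr. Marzari's table.
scores = {
    "Spanish": (29, 11, 40),
    "English": (33, 13, 46),
    "Italian": (35, 13, 48),
    "French": (43, 10, 53),
    "Russian": (51, 15, 66),
    "German": (50, 18, 68),
    "Polish": (54, 16, 70),
}

# Complete competence should be active plus passive competence.
consistent = all(a + p == c for a, p, c in scores.values())

# Rank from easiest to hardest by complete and by passive (reading) points.
by_complete = sorted(scores, key=lambda name: scores[name][2])
by_passive = sorted(scores, key=lambda name: scores[name][1])

print(consistent)
print(" < ".join(by_complete))
print(" <= ".join(by_passive))
```

Sorting by the passive column moves French to the easiest position and German to the hardest, which is the "slightly different order" for reading mentioned above.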

Unfortunately, I'm not aware of any other research on this topic. But as you can already see, an otherwise hot topic can be made cool by the above analysis, cool as opposed to hot or debatable, and cool in the sense of being interesting.

Monday, July 10, 2017

Tian Ji's horse racing and the electoral vote system

[The following was written on November 11, 2016.]

The author of the famous military strategy book The Art of War, Sun Wu, commonly known as Sun Tzu, had a descendant, Sun Bin, who also wrote a book with the same title. In ca. 340 BC, Sun Bin advised his patron Tian Ji at a horse racing event and helped him win. The following is the excerpt from Sima Qian's Records of the Grand Historian about this interesting story:

齐使者如梁，孙膑以刑徒阴见，说齐使。齐使以为奇，窃载与之齐。齐将田忌善而客待之。忌数与齐诸公子驰逐重射。孙子见其马足不甚相远，马有上、中、下、辈。于是孙子谓田忌曰：“君弟重射，臣能令君胜。”田忌信然之，与王及诸公子逐射千金。及临质，孙子曰：“今以君之下驷与彼上驷，取君上驷与彼中驷，取君中驷与彼下驷。”既驰三辈毕，而田忌一不胜而再胜，卒得王千金。于是忌进孙子于威王。威王问兵法，遂以为师。
(The ambassador of the Qi state went to the Liang state. Sun Bin, as a convicted criminal, went to visit and talk to him secretly. The Qi ambassador regarded Sun as valuable and carried him back to Qi. Tian Ji, the Qi general, gave him a warm reception. Ji and some princes often bet heavily on horse racing. Mr. Sun saw that all the horses were about equally capable, rated superior, average, and inferior. So Sun advised Tian Ji, "Sir, you just bet heavily. I'll make you win." Tian Ji trusted him and bet a thousand units of gold with the king and the princes. Right before the race, Mr. Sun said, "Use your inferior horse to race with his best horse, use your average horse to race with his inferior horse, and use your best horse to race with his average horse." After three rounds, Tian Ji lost one and won two of the three rounds, and carried away one thousand units of gold. Then Ji recommended Mr. Sun to King Wei, who interviewed Sun on military tactics and appointed him Chief of Staff.)
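Sun Bin's advice can be checked by brute force. The Python sketch below tries every way Tian Ji could line up his three horses against the king's fixed order of superior, average, inferior. The speed numbers are hypothetical; I only assume, following the story, that the king's horse is slightly faster within each class while each of Tian Ji's horses still beats the king's horse one class lower.

```python
from itertools import permutations

# Hypothetical speed ratings (not from Sima Qian): the king wins any
# same-class race, but Tian Ji's horse beats the king's next class down.
king = {"superior": 9, "average": 6, "inferior": 3}
tian = {"superior": 8, "average": 5, "inferior": 2}

# The king is assumed to race his horses in this order.
KING_ORDER = ["superior", "average", "inferior"]

def wins(lineup):
    """Count the rounds Tian Ji wins; lineup[i] races KING_ORDER[i]."""
    return sum(tian[mine] > king[his] for mine, his in zip(lineup, KING_ORDER))

# Evaluate all six possible lineups of Tian Ji's horses.
results = {lineup: wins(lineup) for lineup in permutations(tian)}
best = max(results, key=results.get)

for lineup, won in results.items():
    print(lineup, "->", won, "of 3")
print("best lineup:", best)
```

Under these assumptions, racing class against class loses all three rounds, and the only lineup that wins two rounds is the one Sun Bin proposed: sacrifice the inferior horse against the king's best.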

Fast forward to 2016. We see that the electoral vote in the US presidential race matters while the popular vote does not and that the two votes mathematically represent two different winners in this 2016 presidential race. Although neither Hillary Clinton nor Donald Trump can move her or his supporters from one state to another, there is similarity between the electoral vote system and Tian Ji's winning strategy. If democracy is the name given to the principle of the minority obeying the majority, the popular vote is the only true democracy. (As of this writing, Clinton has won 60,274,974 popular votes, while Trump has won 59,937,338.)
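The similarity can be made concrete with a toy election in Python. Everything here is invented for illustration: three states with ten electors each, two candidates X and Y, and a winner-take-all rule in every state. It is not a model of the actual 2016 results.

```python
# Hypothetical states: (electors, votes for X, votes for Y).
states = {
    "State 1": (10, 510, 490),   # X wins narrowly
    "State 2": (10, 520, 480),   # X wins narrowly
    "State 3": (10, 100, 900),   # X loses badly
}

popular = {"X": 0, "Y": 0}
electoral = {"X": 0, "Y": 0}

for electors, x_votes, y_votes in states.values():
    popular["X"] += x_votes
    popular["Y"] += y_votes
    # Winner-take-all: all of a state's electors go to whoever leads there.
    winner = "X" if x_votes > y_votes else "Y"
    electoral[winner] += electors

print("popular vote:", popular)
print("electoral vote:", electoral)
```

Like Tian Ji, candidate X loses one "round" by a wide margin and wins the other two by small margins, ending up with more electors despite far fewer votes overall.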

Some people decide not to vote because of (A) equal dislike of the candidates; (B) lack of interest in politics; or (C) living in a non-swing state, where one person's vote matters little. Group C may be small, but it's the only one of the three that would make a difference if the American electoral vote system were abolished or even mitigated (e.g. by adjusting the weights, i.e. the electors assigned to different states). If that happened, swing states would have lower voter turnout and non-swing states would have higher turnout. But since there are fewer swing states than non-swing states, the total popular vote count would be higher.

Sunday, April 16, 2017

自由: "freedom" or "liberty"?

A Chinese reader asked me about the difference between "freedom" and "liberty" when translating Chinese "自由" into English. We can find many answers with a Google search for "difference between freedom and liberty". One article maintains that "Freedom is a state of being capable of making decisions without external control", while liberty "is freedom which has been granted to a people by an external control". And some like this laboriously attempt to make a clear distinction between these two words.

Having read a handful of such answers without being satisfied with any of them, I told the person asking me the question: 1. the etymology of the two words differs; 2. in general usage, "liberty" is more abstract and philosophical than "freedom". Other than these two points, there is no difference, but in a given context, one of the two words is usually more common. For example, nowadays we say "freedom of speech", not "liberty of speech". (But see the ngram figure in Appendix 1.) We say "Liberty, Equality, Fraternity", not "Freedom, Equality, Fraternity". These set phrases are fixed by convention, just as the Chinese idiom is "破釜沉舟" ("cut off all means of retreat", "decide to fight to death"), not "破釜沉船", even though "舟" and "船" are completely synonymous.

Making distinctions between words is so intriguing that someone has even built a Web site www.differencebetween.net dedicated to this task. Language professionals and general public alike are fond of writing articles on these topics. While many such articles are valuable contributions to the correct usage in English, there is one common deficiency not fully recognized: the judges are the native speakers of the language, not linguists or scholars. An age-old debate among lexicographers is relevant here: Should a dictionary be prescriptive, directing people toward correct or supposedly correct usage, or be descriptive, faithfully documenting the actual usage in the native speaker community? Nowadays there may be more dictionaries in the latter category, presumably consistent with the increased level of public education. In the case of "freedom" vs. "liberty", if enough people, not English-as-a-foreign-language learners but native speakers, ask the question about their difference, the very fact that they ask this is a sign that the distinction, if there is a theoretical one, hardly exists in practice. Instead of making a great effort to separate them, it would be better to acknowledge, in modesty, the lack of difference between them.

________________________

Appendix 1

This figure is the Google ngram showing the historical usage of "freedom of speech" and "liberty of speech". We can see that from the mid-19th century on, "freedom of speech" has significantly gained in usage over "liberty of speech". But before that time, it only had slightly higher usage frequency.

Appendix 2

Some Weibo users gave me a few helpful pointers on this topic. One user informed me that political theorist and philosopher Isaiah Berlin's Four Essays on Liberty used "freedom" and "liberty" interchangeably. Two other users directed me to political scientist Hanna Pitkin's Are Freedom and Liberty Twins? According to Pitkin, most people don't make a distinction between these two terms, but Hannah Arendt is an exception. However, the author questioned Arendt's distinction from the point of view of political science as well as etymology (see the bottom of p.6 and p.9 of the article).

Appendix 3

The prescriptive-descriptive dichotomy, however, only applies to everyday language usage. In academic fields, especially in science and technology, but to some extent in the social sciences and humanities as well, the "prescriptive" approach should be supported, in accordance with the principle of division of linguistic labor as proposed by the philosopher Hilary Putnam. Take osteoarthritis as an example. An educated English speaker would think this meant inflammation (-itis) of bone (osteo-) joint (-arthr-). But it does not. Then, should the distinction between "freedom" and "liberty", if non-existent in practice, be made in academic circles as two different terms in the social sciences or humanities, followed by educative admonition to the public about the research outcome? Scholars have the freedom of research and can make any distinction between any pair of words in their research. In fact, social scientists and particularly philosophers habitually do that. As to whether the distinction should be imposed on the public: No!

Monday, January 9, 2017

Comparison of Chinese and Western Etymology

In my last post, I said "Most languages in the world take the alphabetic writing system. Studying the internal history of its vocabulary primarily means analyzing phonological and morphological changes through time." In this post,[note1] I'll expand on that point and contrast that with the Chinese tradition.

Take the word language as an example. In English, we read

late 13c., langage "words, what is said, conversation, talk," from Old French langage "speech, words, oratory; a tribe, people, nation" (12c.), from Vulgar Latin *linguaticum, from Latin lingua "tongue," also "speech, language," from PIE *dnghu- "tongue" (see tongue (n.)).
The -u- is an Anglo-French insertion (see gu-); it was not originally pronounced. Meaning "manner of expression" (vulgar language, etc.) is from c. 1300. ...
Source: Online Etymology Dictionary

In Spanish, we have

idioma m. language. [LL. idiōma: id. <Gk. idiōma: peculiarity (as lang.) <idiousthai: to make one's own <idios. See idio-.]; idiomático,ca a. idiomatic. [Gr. idiōmatikos: particular.]
Source: A Comprehensive Etymological Dictionary of the Spanish Language with Families of Words based on Indo-European Roots by Edward A. Roberts, 2014.

And most importantly, in French, we have

LANGUE, sf. a tongue; formerly lengue, from L. lingua. For in=en=an see § 71, and Hist. Gram. p. 48. — Der. langage, languette.
Source: An Etymological Dictionary of the French Language by Auguste Brachet, 1882.

The reason for my praise "most importantly" is that Auguste Brachet, the "romanistischer Autodidakt"-turned-professor according to (German) Wikipedia, created a monumental masterpiece in not just French etymology but etymology in general. In addition to what a regular etymologist would do, such as tracing the word form to its etymons in the same or other languages, Mr. Brachet systematically summarized the rules of the morphological and phonological changes and applied them to individual words in his dictionary. In the example above, he noted that for the change in > en > an in the development of Latin lingua to French langue, the reader can consult his rule 71 in the book, where he says

I in Latin position [i.e. "when followed in the Latin word by two consonants" according to him, a convention not exactly the same as adopted today; my note] is changed to e in Merovingian Latin: thus fermum, ..., for firmum, ...' and this e, pronounced ei (see § 66), has produced two distinct French forms, according as it has preferred the open è sound, or the i sound.

You can choose to follow up to rule 66 in this book and p.48 of his A Historical Grammar of the French Tongue for more information about these sound (phonological) as well as spelling (orthographic) changes.

Western etymological publications may be divided into two groups: (1) dictionaries that give etymons or source words; (2) scholarly books and research articles on phonological and orthographic changes over time. Mr. Brachet's dictionary is unique in that it merges the two into one, so that the reader is conveniently offered the explanation of sound changes right in the headword entry, obviating the need to research as to why, e.g., the first i in *linguaticum would change to a in the history of the English word language.

However, a word comprises not only its sound and spelling but its meaning as well, which etymology cannot avoid tracing. But as linguist Calvert Watkins warned us, it is "more hazardous to attempt to reconstruct meaning than to reconstruct linguistic form". Sense development is much less researched and also less described in dictionaries. Unlike phonology, semantics, or the study of the meanings of words, is not easily subject to formal (as in "formal logic") structural analysis. And yet tracing the sense development is the primary task of Chinese etymology. Chinese phonological development is a separate field of study; it is not incorporated in etymology, because the meaning of Chinese characters (or words, whose meanings are almost always based on the component characters) is largely dissociated from the sound. Take the character 文 ("text") as an example.

Source: 谢光辉《汉语字源字典》, 北京大学出版社, 2000年, 29页
Translation of the embedded text: "文" is a pictographic character. "文" in oracle bone script (甲骨文) and bronze inscription script (金文) resembles a standing person facing forward. His chest bears a tattoo of decorative patterns. This is in fact a vivid description of the ancient "文身" (tattoo) custom. Thus the original meaning of "文" was a person with a tattoo on his body, as well as pattern, texture. Later, the meaning was extended to character, article, culture, civilization etc.

That was a typical entry of Chinese character etymology. For simple characters especially pictographic ones, it is simply pure 依类象形 or description of the object according to what it looks like. The focus is on the meaning, not the reading or sound. Some more complicated characters may be decomposed into elements each of which is analyzed the same way, as in the case of "秦" (see my last post).

Needless to say, the majority of the characters (at least 80%) are of the type 形声字 or characters of form and sound, such as "指" (finger; to point), where the form radical "扌" suggests the meaning, i.e. something related to hand, and the sound component "旨" suggests its reading, i.e. zhǐ. The classical Shuowen Jiezi (说文解字) dictionary, unsurprisingly, points out that this character "从手旨聲" (the meaning is based on "手" and the sound on "旨").

Similarities and differences between Chinese and Western etymology can also be revealed from the definition of the word etymology itself. The Webster dictionary defines it as "the history of a linguistic form (as a word) shown (1) by tracing its development since its earliest recorded occurrence in the language where it is found, (2) by tracing its transmission from one language to another, (3) by analyzing it into its component parts, (4) by identifying its cognates in other languages, or (5) by tracing it and its cognates to a common ancestral form in an ancestral language" (I added the parenthesized numbers). Thus we see that most western dictionaries with etymological information meet the requirements (1) and (2), sometimes (3). Wiktionary and Friedrich Kluge's An Etymological Dictionary of the German Language also meet (4) and (5) most of the time. What if we apply these requirements to Chinese character etymology? (1) is often met if we interpret it as finding the first occurrence in history, which nowadays is made drastically easy with the aid of a computer-based search. But tracing its development in the course of long history, either inside Chinese or (2) across different languages, is rarely done. (3) is done, though with significant differences from that in western languages. (4) and (5) are rare because they're mostly irrelevant to Chinese characters.

How is analyzing a Chinese character into its components special compared to the western tradition? While a character e.g. "指" can be analyzed into "扌" (for meaning) and "旨" (for sound), there is no systematic change of a component from one form to another. Take rule 126, one of the many summarized by Mr. Brachet for French, as an example, "Before a, initial c ... passes through the successive aspirated sounds k'h, tk'h, kch, ch." He supports this rule of ca- > ch- with about 80 words as evidence, champ < campus, chien < canis, etc. Can we construct an analogy of this rule and find supporting examples in Chinese etymology? Since Chinese does not use an alphabetic writing system, there's hardly any need in dealing with the sound change of a character in etymology. Instead, we may substitute the change in form of a character. For example, after studying the 金文, 小篆, and 楷体 forms of "指" and other characters with "扌" on the left side, we may conclude that all (or most) such characters have gone through the predictable change of this radical in these forms, just as the French ca- changed to ch-. Similarly, all or most characters with "旨" on the right side probably went through the same change as shown here (see the row for 字源演变). Thus, we find in etymological studies a parallel between Chinese and western languages in identifying common component change in characters or words.

However, Chinese etymological dictionaries are also interested in finding the "root cause" of the most basic characters. Because the characters are ultimately from pictographs in origin, this "root cause" finding is mostly "依类象形" (describing the object according to what it looks like). If we must find a parallel for this practice in western etymology, it is equivalent to answering the question why e.g. the Proto-Indo-European stem from which the Modern English word word is ultimately derived is *were-, that is, why that sound. Obviously, except for some onomatopoeias, there is no answer, or no such research. While Chinese etymologists have forged ahead in that direction, so far this "research" is, I'm afraid, very much based on guesswork, simply because there is no record left in history about why a specific character was invented to be of that form. "文" may indeed be a symbol for a person with a tattoo, but there is no hard proof. And this approach is too error-prone. In my last post, I quoted the article 许慎为何将象释成母猴——“为”字趣释 (Why did Xu Shen interpret an elephant as a female monkey: interesting interpretation of character "为"). In a recent weibo blog post, a scholar interpreted, purely based on its resemblance, "夷" in its original oracle bone script as a person squatting, while in 《汉语字源字典》 (Dictionary of Chinese Character Etymology) by another scholar in this field, it was thought to represent a man bound by ropes, to serve as a slave or as a sacrifice. On this stretch of imagination, I have but one comment: "汉字字源，看图识字，见仁见智" (Chinese character etymology / Look at pictures and learn to be literate / Trust your opinions and beliefs).

________________________
[note1] Due to the unique nature of the Chinese language, etymology can be of characters as well as words. This post is about character etymology.

Sunday, September 4, 2016

Why is it rare to see Chinese etymology?

People speaking English as the native language are used to dictionaries in which each headword contains not only the definition of the word and example phrases or sentences, but also brief etymology, as in this example in the Merriam-Webster dictionary for the word word.

Middle English, from Old English; akin to Old High German wort word, Latin verbum, Greek eirein to say, speak, Hittite weriya- to call, name
First Known Use: before 12th century

A Chinese dictionary, on the other hand, almost never gives the etymology. In this blog posting, I'll try to explain why.

For the sake of discussion, we need to make a distinction between two types of Chinese dictionaries. Due to the nature of the Chinese language, the English word dictionary (or its equivalent in most other languages) can mean either "字典" (literally "character-dictionary") or "词典", also written as "辞典" (literally "word-dictionary"), in Chinese. I have not seen a dictionary for general Chinese words published by anyone that contains etymological information for the headwords.[note1] Hereinafter, a Chinese etymological dictionary refers only to a character-dictionary.

The disappointment at the lack of an etymological dictionary of Chinese words does not extend to dictionaries of Chinese characters, or 字典. Back in the Eastern Han dynasty (25–220 AD), the scholar Xu Shen (c. 58 – c. 147 AD) wrote the monumental dictionary Shuowen Jiezi (literally "Explaining Graphs and Analyzing Characters" according to Wikipedia). Since Xu lived in a period only one thousand years or less after a large number of Chinese characters were invented, the etymology he gave in the book for each of the 9000-plus characters is mostly trustworthy. Take the character "秦" (qín) as an example.[note2]

伯益之後所封國。地宜禾。从禾，舂省。一曰秦，禾名。𥠼，籒文秦从秝。匠鄰切
(The fief given to the descendant of Boyi. The land is suitable for crops. The character has a meaning based on "禾" ("crop") and contains an abbreviation or syncope of the character "舂". Another theory claims that this character is the name of a crop. This character in Zhouwen script [a script used just before the time of the First Emperor], "𥠼", is based on "秝". Pronounced with the initial consonant of 匠 combined with the final of 鄰.)

This is an excellent example of Chinese character etymology; it not only describes the source of the character but also analyzes its morphology or form, as shown by the construction of "秦" from "禾" and part of "舂". The significance of Xu's book in the history of the Chinese language is such that almost two millennia later, scholars still use it in research. The only major revision came after the 1899 discovery of oracle bones, which the people of the Shang dynasty (c. 1600 BC–c. 1046 BC) used for divination. The oracle bone script predates Xiaozhuan script, which was the primary source for Xu Shen's character etymology because it was the earliest script known to him. Owing to this gap in knowledge, Xu inevitably made numerous mistakes in his otherwise near-perfect dictionary. One good example illustrates the point. In the article 许慎为何将象释成母猴——“为”字趣释 (Why did Xu Shen interpret an elephant as a female monkey: an interesting interpretation of the character "为"), the author explains how the simple character "为", meaning "for" or "to do" nowadays, evolved from an oracle-bone pictograph depicting a man leading an elephant on a leash, an image that Xu Shen mistook for a female monkey. (By the way, elephants indeed roamed central and northern China three thousand years ago, but the species was not the same as the one found in southern China or India today.)

With this background, we can now address the question of why etymology is so rarely seen in Chinese dictionaries. By that I don't mean you can't find character etymology at all. Books such as 《汉语字源字典》 ("Dictionary of Chinese Character Etymology") and the Web site Chinese Etymology by Richard Sears are available. But such information is almost never incorporated into a Chinese dictionary other than a specialized etymological dictionary. If the general English reader is no more academically inclined than the Chinese reader, why does a common English dictionary such as Merriam-Webster, American Heritage, or the OED (Oxford English Dictionary) include etymology without hesitation? The reason may be that Chinese (character) etymology almost never helps a reader in studying the Chinese language, due to the long history and evolution of the characters. (Can you stretch your imagination far enough to associate the scene of a man and an elephant with the sense of "for" or its slightly older sense of "to do"? See above.)

In addition to the long history, I believe there's another, more subtle, factor clouding Chinese etymology. Most languages in the world use an alphabetic writing system. Studying the internal history of their vocabulary primarily means analyzing phonological and morphological changes through time; e.g., there was a systematic change of f to h in Spanish in a large number of words. The semantic evolution of words is studied second and less often, because it is "more hazardous to attempt to reconstruct meaning than to reconstruct linguistic form", as the linguist Calvert Watkins said. Chinese characters, however, rarely went through systematic morphological changes that apply to a large number of characters and, since Chinese is not based on an alphabetic writing system, phonological changes are not conducive to the study of etymology per se. This leaves a large part of Chinese etymology to the study of semantic evolution, which is, as stated, more error-prone in scholarly reconstruction.

There is another reason for not incorporating etymology in Chinese dictionaries. Many characters originate from pictographs or pictograph-like glyphs such as those of Xiaozhuan script. A publisher has to render them as images instead of text, which is an editorial inconvenience. The images with their explanatory texts take up a significant amount of space relative to the definitions and usage examples, which a regular user cares more about. This is in contrast with the etymology in an English dictionary, which can be kept brief and still make sense to the minority of interested readers. A third reason may be that it's just the custom of Chinese lexicography, i.e. no etymology except in specialized dictionaries. This is probably also the reason why dictionaries of languages other than English lack etymology. (Try to find etymology in any dictionary of Spanish, French, German or Italian in a bookstore or library!) But nobody knows the original cause or reason for this custom.

Therefore, unlike the vocabulary of a language where a student may make use of etymology, optionally combined with some mnemonics (as demonstrated in my book for Spanish), Chinese characters have to be studied in a different way. Etymology comes in handy only for the very first few characters, such as "火" ("fire") and "山" ("mountain"), which are frequently used to impress complete beginners. After 10 or 20 such "pictographs", rote memorization is commonly adopted, although books such as Tuttle Learning Chinese Characters that laboriously make up mnemonics are helpful. Fortunately, a large portion of the character repertoire consists of characters combining two parts, one more or less representing the meaning and the other representing the sound. However, in none of these cases does etymology play any role.

___________________________________
[note1] By emphasizing "general", I'd like to point out that a special group of Chinese words, 成语 (idioms), is an exception, in that dictionaries of Chinese idioms almost always give the first occurrence of each idiom and sometimes even briefly describe its sense development as well.
With regard to dictionaries of words in general, one may think of the book 《辭源》, literally "word origin". First published in 1915, it bears a misleading title because it's no more than a dictionary (albeit of high quality) of Chinese words with no etymology. In fact, even if we take an alternative interpretation of "辭源" as "first occurrence of word", this book fails as well; e.g., the entry for "中国" does not list its first occurrence in the Book of Documents, or the bronze inscription which the Book records. Another book we can even more readily dismiss is the 《詞源》 by Zhang Yan of the Song dynasty, because that book is on the subject of the literary genre 词, not "words".
[note2] Incidentally, the character "秦" is significant in that traditionally many scholars, including Paul Pelliot, believed that it is the ultimate source of the word China in many languages in the world, although more recent research has attributed the origin to "晋". Two other sources of words referring to China are Khitan, as in the case of Russian, and silk.