Wednesday, July 24, 2019

Word order flexibility

According to Wikipedia, about half of the world's languages use subject–object–verb (SOV) as their primary word order, while one third use subject–verb–object (SVO). English and Chinese belong to the SVO category: in "She loves him" or "她爱他", moving any of the three words or characters alters the meaning of the sentence or destroys it entirely.

Recently I was reading the very entertaining tale of Phyllis and Aristotle. Legend has it that "Aristotle advised his pupil Alexander to avoid the king's seductive mistress, Phyllis, but was himself captivated by her. She agreed to ride him, on condition that she could play the role of dominatrix." (as summarized by Wikipedia) On the Wikipedia page, the Old French verse that tells this story ends with Aristotle excusing himself to Alexander, saying

Amour vainc tot, & tot vaincra
tant com li monde durera

with the Modern English translation "Love conquers all, and all shall conquer / As long as the world shall last".

English readers don't need to be fluent in French, much less Old French, to match the French words to their English counterparts; e.g. amour "love", vainc "conquers" (think of vanquish), tot "all" (think of total), etc. What troubles me, though, is that the second half of the first line, tot vaincra, is translated as "all shall conquer". The English verb conquer is normally transitive, so a reader expects its object to follow it. It took me a while to realize that "all shall conquer" actually means "(love) shall conquer all". The poet wrote & tot vaincra rather than & vaincra tot simply because putting vaincra at the end of the line makes it rhyme with the last word of the second line, durera ("last"). An average English reader with no knowledge of French, however, will have difficulty understanding "all shall conquer". So I edited the Wikipedia page to read "and shall conquer all". A few months later, someone disagreed and changed the translation back, calling it poetic English.

I took this issue to a language forum and asked for people's opinions. As expected, most forum members agreed with me. One even said he had initially thought "all shall conquer" meant "all will fight back", a totally wrong interpretation. But one member, apparently a native Frenchman, disagreed, saying that the reader should adapt to the author's text and the translator should respect the author's style. Others disagreed with him, and my response was that "the adaptation should not go so far as to render the 'translated' text incomprehensible in the target language". I have no doubt that his mother tongue shapes his sense of how little word-order flexibility English speakers will tolerate. If he were to translate the Old French verse into Chinese (supposing he knows some Chinese), the Chinese verse would probably read "爱征服一切，一切征服" (literally "love conquers everything, everything conquers"), the second half of which likewise makes no sense to a native Chinese speaker.

In Romance languages such as French or Spanish, the primary word order is also SVO. However, one occasionally sees sentences in which a constituent has been moved away from the position the SVO rule would dictate. Native speakers are used to these structures and understand them from context and/or because the expressions are idiomatic. As far as I know, linguistics has no metric or index for measuring the word order flexibility of a language. We know that highly inflected languages such as Latin and Russian have fairly flexible word order, whereas English and Chinese would rank quite low on such a metric, with the various Romance languages probably somewhere in the middle. Sentences such as "That I know" or "那个我知道", with an apparent OSV order, are exceptions, and their OVS variants, "That know I" and "那个知道我", are either ungrammatical or, in the Chinese case, read as a different sentence ("that one knows me"), even though the corresponding OVS order may be understood in French in certain contexts.

Thursday, February 28, 2019

Mutual intelligibility in writing only

People in many regions of China may pronounce the same characters so differently that they can communicate with each other only in writing, not in speech. How do we define or measure mutual intelligibility in writing only (MIW hereinafter) in general? Here I describe an experiment that may serve as a starting point. Two people with at least a high school education, who natively speak language varieties A and B respectively (and, if A and B are different, do not speak the other's variety), take a test. The first person reads aloud, at normal speed, 100 sentences randomly selected from the entire corpus of modern A. (As an approximation to the entire corpus, take the Internet and book content indexed by Google.) Each sentence is followed by 10 interpretations given in B, and the listener chooses the correct one (10 choices instead of 4 or 5 to reduce the chance of guessing correctly). The test is then repeated with the two people, and their respective varieties, switched. If 50 or more sentences are correctly understood, A and B are excluded from MIW. If fewer than 50 are, the pair takes a second test in which 100 sentences selected from the entire corpus are shown in writing. If 90 or more sentences are correctly understood, we consider A-B a case of MIW.
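The decision procedure above can be sketched in Python. This is a minimal illustration, not part of the original proposal: the function names are hypothetical, the scores in the usage lines are made up, and the rule that a pair counts as MIW only when both directions pass is an assumption of this sketch, since the text does not say how the two directions combine. The last function computes the binomial probability of reaching a cutoff by pure guessing, which shows why 10 choices help: guessing among 10 options yields 10 correct answers out of 100 on average, versus 25 with 4 options.

```python
from math import comb
from typing import Optional, Tuple

N_SENTENCES = 100    # sentences per test, as in the proposed protocol
SPOKEN_CUTOFF = 50   # >= 50 correct by ear: the pair is excluded from MIW
WRITTEN_CUTOFF = 90  # >= 90 correct in writing: the pair counts as MIW


def classify_direction(spoken: int, written: Optional[int] = None) -> str:
    """Classify one direction: a native speaker of one variety tested on the other.

    Returns "verbal" (intelligible in speech), "MIW", or "none".
    """
    if not 0 <= spoken <= N_SENTENCES:
        raise ValueError("spoken score out of range")
    if spoken >= SPOKEN_CUTOFF:
        return "verbal"  # understood by ear, so the written test is not needed
    if written is None:
        raise ValueError("a written score is required when the spoken score is below the cutoff")
    if not 0 <= written <= N_SENTENCES:
        raise ValueError("written score out of range")
    return "MIW" if written >= WRITTEN_CUTOFF else "none"


def is_miw_pair(a_on_b: Tuple[int, Optional[int]], b_on_a: Tuple[int, Optional[int]]) -> bool:
    """Each tuple is (spoken score, written score) for one direction.

    ASSUMPTION of this sketch: the pair is MIW only if both directions are MIW.
    """
    return all(classify_direction(*scores) == "MIW" for scores in (a_on_b, b_on_a))


def chance_tail(cutoff: int, choices: int, n: int = N_SENTENCES) -> float:
    """Binomial probability that pure guessing among `choices` options gets >= `cutoff` right."""
    p = 1 / choices
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(cutoff, n + 1))


if __name__ == "__main__":
    # Hypothetical scores, for illustration only (not experimental data).
    print(is_miw_pair((12, 94), (15, 91)))    # True: both directions pass only in writing
    print(is_miw_pair((70, None), (15, 91)))  # False: one direction is intelligible by ear
    print(f"{chance_tail(SPOKEN_CUTOFF, 10):.1e}", f"{chance_tail(SPOKEN_CUTOFF, 4):.1e}")
```

With either number of choices, reaching 50 correct answers by guessing alone is extremely unlikely, but 10 choices make it several orders of magnitude less likely than 4.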

Thus, Sichuanese-Mandarin is disqualified, because speakers of the two can communicate verbally (and, of course, in writing). But Shanghainese-Mandarin, Hunanese-Mandarin and Shanghainese-Hunanese are good examples of MIW. Cantonese warrants more discussion. The Cantonese-Mandarin (or Cantonese-Shanghainese, etc.) pair obviously has no verbal mutual intelligibility (MI). There are grammatical particles, pronouns and some common words unique to Cantonese. When a literate native Cantonese speaker who has not learned written Chinese the way it is taught in mainland China writes in Cantonese, can someone who speaks only Mandarin understand the writing with at least 90% accuracy? Suppose the content is randomly selected from the entire Cantonese corpus, is not purely colloquial, and is certainly not contrived to contain a disproportionately high ratio of Cantonese-specific markers or characters. I don't know the answer; an actual experiment is needed. One example of such written Cantonese is a page on the Cantonese Wikipedia. I personally don't know Cantonese, and I may or may not be able to answer 90 out of 100 questions correctly in a reading comprehension test.

Note that Cantonese is special in that many native Cantonese speakers read standard written Chinese proficiently, whereas speakers of Mandarin or other Chinese dialects don't read Cantonese text (such as that Wikipedia page). This creates asymmetric intelligibility, which is quite common around the world. Thus, when these two people communicate in writing, they will choose standard written Chinese, not written Cantonese. In discussing MIW, we should therefore define two levels: one allowing only the written form of the spoken language (e.g. Cantonese text for Cantonese speech), the other allowing the two people to use whatever script they prefer. Strictly speaking, MIW should be limited to the first case.

According to Wikipedia, Icelandic-Faroese and German-Dutch are MIW pairs. (The article also lists French and some other Romance languages but gives no good reference to support the claim. From what I know of a few Romance languages, this pairing is invalid.) Based on a post in the Linguistics group on Facebook, the following are additional candidate MIW pairs:

* Scots-English
* Many South Asian languages with Sanskrit roots
* Swiss German-Standard German
* Hanoi Vietnamese-Southern Vietnamese (though highly disputed in the Sinosphere group, whose members are mostly Vietnamese)
* Danish-some other Scandinavian languages

Note that I'm dealing with MIW between language varieties, a concept encompassing dialects within a language as well as languages, styles, registers, etc. While MIW within a single language is probably not limited to Chinese, it's probably safe to say that Chinese has the most MIW pairs among its dialects, because pronunciation and written form are dissociated.