Thursday, September 14, 2017

Language difficulty

Chinese has been widely considered to be one of the most difficult languages in the world. What constitutes the difficulty of a language? Can it be measured and how? Whenever someone posts a message about language difficulty on a forum, it almost always generates a heated discussion. Comments range from "English is the easiest because the verbs have minimum conjugations and nouns have no gender", "Chinese and Japanese are hard because there're too many characters or kanji's", to "No language is inherently more difficult than any other because native speakers grow up speaking it with about the same effort", and "Language difficulty is subjective perception", to name a few.

Most language enthusiasts on various forums are not scholars. The diversity of those opinions results from the lack of a good definition of language difficulty. But we can tell that most people are referring to the difficulty experienced by an adult (not a young child) in learning a foreign language (not a mother tongue), and in many cases the adult's native language is English. If we qualify the discussion with these requirements, i.e.

  • the learner is an adult;
  • the language whose difficulty is evaluated is learned by the adult as a foreign language;
  • the difficulty is evaluated when the adult's native language is specified
then a measurement of language difficulty becomes meaningful.

I believe that in many social sciences, there are two general methods to measure a quantity, internal and external. For example, in linguistics, a researcher can define a set of factors pertinent to the correlation between orthography (spelling) and pronunciation in order to calculate the orthographic depth of a language, i.e. "the degree to which a written language deviates from simple one-to-one letter-phoneme correspondence". Alternatively, one can simply conduct a controlled study among a group of people (cohort) and see which language causes how many spelling errors in dictation or in a similar experiment.

When it comes to rating language difficulty, we can devise a set of rules and individually assess each language against these rules and then sum the rule ratings (with weights); e.g., percentage of words that have cognate or loan relationship with the words in the learner's native language, whether the nouns have genders and cases, how many variations in verb conjugation, whether the dominant word order differs from that of his native language, etc. For lack of a better term, we may call this an internal evaluation.
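To make the weighted sum concrete, here is a minimal sketch in Python. The rule names, the 0-to-1 ratings, and the weights are all made up by me for illustration; they do not come from any study, and a real evaluation would have to justify each of them.

```python
# Hypothetical rules and weights for an internal evaluation.
# Every number here is an illustrative assumption, not measured data.
WEIGHTS = {
    "non_cognate_vocabulary": 3.0,  # share of words with no cognate or loan link
    "noun_gender_and_case": 2.0,    # burden of genders and cases
    "verb_conjugation": 2.0,        # amount of variation in conjugation
    "word_order_difference": 1.0,   # dominant word order differs from learner's
}


def internal_difficulty(ratings, weights=WEIGHTS):
    """Return the weighted sum of per-rule ratings (higher means harder)."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"no rating given for: {sorted(missing)}")
    return sum(weights[rule] * ratings[rule] for rule in weights)


# Two imaginary languages rated against the same rules, each rating in 0..1.
language_a = {
    "non_cognate_vocabulary": 0.5,
    "noun_gender_and_case": 0.0,
    "verb_conjugation": 0.5,
    "word_order_difference": 0.0,
}
language_b = {
    "non_cognate_vocabulary": 1.0,
    "noun_gender_and_case": 0.5,
    "verb_conjugation": 0.5,
    "word_order_difference": 1.0,
}

print("A:", internal_difficulty(language_a))
print("B:", internal_difficulty(language_b))
```

The ratings depend on the learner's native language (cognates and word order are relative), so the same target language gets a different score for a different learner.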

The external evaluation, on the other hand, has been done and is widely quoted. The best-known data for native English speakers are from the Defense Language Institute (DLI) of the US, which statistically measures the time learners take to achieve a certain language proficiency level. The official Web page for this study is https://www.ausa.org/articles/dlis-language-guidelines, duplicated below for your convenience.

  • Category I languages, 26-week courses, include Spanish, French, Italian and Portuguese.
  • Category II, 35 weeks, includes German and Indonesian
  • Category III, 48 weeks, includes Dari, Persian Farsi, Russian, Uzbek, Hindi, Urdu, Hebrew, Thai, Serbian Croatian, Tagalog, Turkish, Sorani and Kurmanji
  • Category IV, 64 weeks, includes Arabic, Chinese Mandarin, Korean, Japanese and Pashto
The earliest version of this data was on a Web page of Dr. William Baxter of the University of Michigan, who obtained it "from documents I got at a workshop of some kind" (private email). But Dr. Baxter later removed it from his Website, so you have to retrieve it from archive.org. It is duplicated below.

Languages included (languages regularly offered at the University of Michigan are in capital letters; this is NOT a complete list), followed by two figures for each group:
  • Hours: hours of instruction required for a student with average language aptitude to reach level-2 speaking proficiency
  • Level: speaking proficiency level expected of a student with superior language aptitude, after 720 hours of instruction

GROUP I: Afrikaans, Danish, DUTCH, FRENCH, Haitian Creole, ITALIAN, Norwegian, PORTUGUESE, Romanian, SPANISH, Swahili, SWEDISH
    Hours: 480    Level: 3
GROUP II: Bulgarian, Dari, FARSI (PERSIAN), GERMAN, (Modern) Greek, HINDI-URDU, INDONESIAN, Malay
    Hours: 720    Level: 2+ / 3
GROUP III: Amharic, Bengali, Burmese, CZECH, Finnish, (MODERN) HEBREW, Hungarian, Khmer (Cambodian), Lao, Nepali, PILIPINO (TAGALOG), POLISH, RUSSIAN, SERBO-CROATIAN, Sinhala, THAI, TAMIL, TURKISH, VIETNAMESE
    Hours: 720    Level: 2 / 2+
GROUP IV: ARABIC, CHINESE, JAPANESE, KOREAN
    Hours: 1320   Level: 1+

That data differs considerably from DLI's current data. I had some email exchanges with DLI, but they didn't explain these discrepancies.

[Update 2018-04]
Dr. Robert Marzari, the author of Leichtes Englisch, schwieriges Französisch, kompliziertes Russisch, kindly sent me a summary of the result of his research and granted me permission to post it here.

In my book I tried to evaluate the difficulty of seven European languages (English, French, Spanish, Italian, Russian, Polish - and German) for a German speaking learner; for the evaluation of the German language I imagined a Romance speaker, i.e. a mixture of a French, Italian and Spanish speaker. The results of the evaluation therefore do not show absolute degrees of complexity, but rather relative degrees of difficultness, i.e. relative to a German or Romance speaker.
   If you could get hold of my book (at a University library perhaps?) just take a look at the charts on pages 269 to 275: On these charts I give the results of my evaluation of those seven languages according to the linguistic subsystems of phonetics, writing system, grammar, lexicon and textual structurization (i.e. reading difficulty).
   According to these the degree of a learner's difficulty is as follows:
          active competence    passive competence   complete competence
          (speaking+writing)   (reading)
Spanish   29 points            11 points            40 points
English   33 points            13 points            46 points
Italian   35 points            13 points            48 points
French    43 points            10 points            53 points
Russian   51 points            15 points            66 points
German    50 points            18 points            68 points
Polish    54 points            16 points            70 points

This excellent research indicates that, for a native German speaker, language difficulty ranks as Spanish < English < Italian < French < Russian < Polish, which is quite consistent with many polyglots' experience, although reading has a slightly different order. Apparently this research uses an internal evaluation (see above for a description), rating various aspects of a language instead of checking students' learning challenge. Thus, placing German in this language list makes sense even though the learners of German are assumed to speak a different native language, a Romance language instead of German.
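The orderings can be recomputed directly from the points in Dr. Marzari's table. The following is only a small sketch of mine; the points are his, while the function name and layout are my own choices.

```python
# Points from Dr. Marzari's table: language -> (active, passive).
SCORES = {
    "Spanish": (29, 11),
    "English": (33, 13),
    "Italian": (35, 13),
    "French": (43, 10),
    "Russian": (51, 15),
    "German": (50, 18),
    "Polish": (54, 16),
}


def ranking(kind):
    """Return the languages from easiest to hardest.

    kind is "active", "passive", or "complete" (active + passive).
    Ties keep the order in which the languages appear in the table.
    """
    def points(lang):
        active, passive = SCORES[lang]
        return {
            "active": active,
            "passive": passive,
            "complete": active + passive,
        }[kind]

    return sorted(SCORES, key=points)


for kind in ("active", "passive", "complete"):
    print(f"{kind:9s}", " < ".join(ranking(kind)))
```

Sorting by the passive (reading) column moves French to the front and German to the end, which is the slightly different order mentioned above.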

Unfortunately, I'm not aware of any other research on this topic. But as you can already see, an otherwise hot topic can be made quite cool by the above analysis, cool as opposed to hot or debatable, and cool in the sense of being interesting.

Monday, July 10, 2017

Tian Ji's horse racing and the electoral vote system

[The following was written on November 11, 2016.]

The author of the famous military strategy book The Art of War, Sun Wu, commonly known as Sun Tzu, had a descendant, Sun Bin, who also wrote a book with the same title. In ca. 340 BC, Sun Bin advised his patron Tian Ji at a horse racing event and helped him win the race. The following is the excerpt from Sima Qian's Records of the Grand Historian about this interesting story:

齐使者如梁,孙膑以刑徒阴见,说齐使。齐使以为奇,窃载与之齐。齐将田忌善而客待之。忌数与齐诸公子驰逐重射。孙子见其马足不甚相远,马有上、中、下、辈。于是孙子谓田忌曰:“君弟重射,臣能令君胜。”田忌信然之,与王及诸公子逐射千金。及临质,孙子曰:“今以君之下驷与彼上驷,取君上驷与彼中驷,取君中驷与彼下驷。”既驰三辈毕,而田忌一不胜而再胜,卒得王千金。于是忌进孙子于威王。威王问兵法,遂以为师。
(The ambassador of the Qi state went to the Liang state. Sun Bin, as a convicted criminal, went to visit and talk to him secretly. The Qi ambassador regarded Sun as valuable and carried him back to Qi. Tian Ji, the Qi general, gave him a warm reception. Ji and some princes often bet heavily on horse racing. Mr. Sun saw that the horses did not differ much in speed and that they were graded superior, average, and inferior. So Sun advised Tian Ji, "Sir, you just bet heavily. I'll make you win." Tian Ji trusted him and bet a thousand units of gold with the king and the princes. Right before the race, Mr. Sun said, "Use your inferior horse to race with his best horse, use your best horse to race with his average horse, and use your average horse to race with his inferior horse." After three rounds, Tian Ji lost one and won two of the three rounds, and carried away one thousand units of gold. Then Ji recommended Mr. Sun to King Wei, who interviewed Sun on military tactics and appointed him Chief of Staff.)
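Sun Bin's pairing can be checked with a few lines of Python. The speed numbers are hypothetical; the only assumption that matters is that each of Tian Ji's horses is a little slower than the king's horse of the same grade but faster than the king's horse one grade lower.

```python
def count_wins(mine, theirs):
    """Count the rounds in which my horse is faster than the opposing horse."""
    return sum(m > t for m, t in zip(mine, theirs))


# Hypothetical speeds, listed as superior, average, inferior.
king = [9.0, 7.0, 5.0]
tian_ji = [8.0, 6.0, 4.0]

# Grade against grade: Tian Ji loses every round.
naive_wins = count_wins(tian_ji, king)

# Sun Bin's order against the king's superior, average, inferior:
# inferior vs. superior, superior vs. average, average vs. inferior.
sun_bin_order = [tian_ji[2], tian_ji[0], tian_ji[1]]
sun_bin_wins = count_wins(sun_bin_order, king)

print("grade against grade:", naive_wins, "wins out of 3")
print("Sun Bin's pairing:  ", sun_bin_wins, "wins out of 3")
```

With these numbers the grade-against-grade matchup gives no wins, while Sun Bin's pairing gives two wins and one loss, the same result as in the story. Tian Ji's horses are slower in total, yet he wins the majority of rounds.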

Fast forward to 2016. We see that the electoral vote in the US presidential race matters while the popular vote does not, and that the two votes point to two different winners in the 2016 presidential race. Although neither Hillary Clinton nor Donald Trump can move her or his supporters from one state to another, there is a similarity between the electoral vote system and Tian Ji's winning strategy. If democracy is the name given to the principle of the minority obeying the majority, the popular vote is the only true democracy. (As of this writing, Clinton has won 60,274,974 popular votes, while Trump has won 59,937,338.)
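The similarity is easier to see with a toy tally. Everything in this sketch is invented: three imaginary states with equal numbers of electors, two candidates A and B, and a winner-take-all rule in every state. Candidate A "wastes" a huge margin in one state, just as Tian Ji's inferior horse loses badly in one round, while B wins the other two narrowly.

```python
# Imaginary states: name -> (electors, votes for A, votes for B).
# All figures are made up; no state is tied, and each is winner-take-all.
STATES = {
    "State 1": (10, 900_000, 100_000),  # A wins by a huge margin
    "State 2": (10, 480_000, 520_000),  # B wins narrowly
    "State 3": (10, 490_000, 510_000),  # B wins narrowly
}


def tally(states):
    """Return (popular, electoral) vote totals for candidates A and B."""
    popular = {"A": 0, "B": 0}
    electoral = {"A": 0, "B": 0}
    for electors, votes_a, votes_b in states.values():
        popular["A"] += votes_a
        popular["B"] += votes_b
        winner = "A" if votes_a > votes_b else "B"
        electoral[winner] += electors  # all electors go to the state winner
    return popular, electoral


popular, electoral = tally(STATES)
print("popular vote:  ", popular)
print("electoral vote:", electoral)
```

Here A collects far more popular votes in total, but B takes two of the three states and therefore the electoral vote.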

Some people decide not to vote for one of these reasons: (A) equal dislike of the candidates; (B) lack of interest in politics; (C) living in a non-swing state, where one person's vote matters little. Group C may be small. But it's the only one of the three that would make a difference if the American electoral vote system were abolished or even mitigated (e.g. by adjusting the weights, i.e. the electors assigned to different states). If that happened, swing states would have lower voter turnout and non-swing states would have higher. But since there are fewer swing states than non-swing states, the total popular vote count would be higher.