Thursday, September 14, 2017

Language difficulty

Chinese is widely considered to be one of the most difficult languages in the world. What constitutes the difficulty of a language? Can it be measured, and if so, how? Whenever someone posts a message about language difficulty on a forum, it almost always generates a heated discussion. Comments range from "English is the easiest because the verbs have minimal conjugation and nouns have no gender" and "Chinese and Japanese are hard because there are too many characters or kanji" to "No language is inherently more difficult than any other because native speakers grow up speaking it with about the same effort" and "Language difficulty is a subjective perception", to name a few.

Most language enthusiasts on various forums are not scholars. The diversity of their opinions results from the lack of a good definition of language difficulty. But we can tell that most people are referring to the difficulty experienced by an adult (not a young child) in learning a foreign language (not the mother tongue), and in many cases the adult's native language is English. If we qualify the discussion with these requirements, i.e.

  • the learner is an adult;
  • the language whose difficulty is evaluated is learned by the adult as a foreign language;
  • the difficulty is evaluated when the adult's native language is specified
then a measurement of language difficulty becomes meaningful.

I believe that in many social sciences, there are two general methods to measure a quantity, internal and external. For example, in linguistics, a researcher can define a set of factors pertinent to the correlation between orthography (spelling) and pronunciation in order to calculate the orthographic depth of a language, i.e. "the degree to which a written language deviates from simple one-to-one letter-phoneme correspondence". Alternatively, one can simply conduct a controlled study among a group of people (cohort) and see which language causes how many spelling errors in dictation or in a similar experiment.

When it comes to rating language difficulty, we can devise a set of rules, assess each language individually against these rules, and then sum the rule ratings (with weights). Example rules are the percentage of words that have a cognate or loan relationship with words in the learner's native language, whether the nouns have genders and cases, how many variations there are in verb conjugation, whether the dominant word order differs from that of the learner's native language, etc. For lack of a better term, we may call this an internal evaluation.
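The weighted sum described above can be sketched in a few lines of Python. This is only an illustration: the rule names, the weights, and the ratings are all made up for the example and do not come from any study.

```python
# Hypothetical rule weights (integers, higher = more influence).
# These names and numbers are assumptions for illustration only.
WEIGHTS = {
    "non_cognate_vocabulary": 4,
    "noun_gender_and_case": 2,
    "verb_conjugation": 2,
    "word_order_difference": 2,
}


def difficulty_score(ratings, weights=WEIGHTS):
    """Return the weighted sum of per-rule ratings (each on a 0-10 scale)."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[rule] * ratings[rule] for rule in weights)


# Made-up ratings of an imaginary language for an imaginary learner.
example_ratings = {
    "non_cognate_vocabulary": 7,
    "noun_gender_and_case": 5,
    "verb_conjugation": 3,
    "word_order_difference": 0,
}

score = difficulty_score(example_ratings)
print(score)
```

The hard part of an internal evaluation is not the arithmetic but choosing the rules and weights, which is where different researchers may reasonably disagree.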

The external evaluation, on the other hand, has been done and is widely quoted. The best-known data for native English speakers come from the Defense Language Institute (DLI) of the US, which statistically measures the time learners take to achieve a certain language proficiency level. The official Web page for this study is https://www.ausa.org/articles/dlis-language-guidelines, duplicated below for your convenience.

  • Category I languages, 26-week courses, include Spanish, French, Italian and Portuguese.
  • Category II, 35 weeks, includes German and Indonesian
  • Category III, 48 weeks, includes Dari, Persian Farsi, Russian, Uzbek, Hindi, Urdu, Hebrew, Thai, Serbian Croatian, Tagalog, Turkish, Sorani and Kurmanji
  • Category IV, 64 weeks, includes Arabic, Chinese Mandarin, Korean, Japanese and Pashto
The earliest version of this data was on a Web page of Dr. William Baxter of the University of Michigan, who obtained it "from documents I got at a workshop of some kind" (private email). Dr. Baxter later removed it from his website, so it now has to be retrieved from archive.org. It is duplicated below.

Each group below lists three items:
  (1) Languages included (Languages regularly offered at the University of Michigan are in capital letters; this is NOT a complete list)
  (2) Hours of instruction required for a student with average language aptitude to reach level-2 speaking proficiency
  (3) Speaking proficiency level expected of a student with superior language aptitude, after 720 hours of instruction

  • GROUP I: Afrikaans, Danish, DUTCH, FRENCH, Haitian Creole, ITALIAN, Norwegian, PORTUGUESE, Romanian, SPANISH, Swahili, SWEDISH
    Hours: 480; level after 720 hours: 3
  • GROUP II: Bulgarian, Dari, FARSI (PERSIAN), GERMAN, (Modern) Greek, HINDI-URDU, INDONESIAN, Malay
    Hours: 720; level after 720 hours: 2+ / 3
  • GROUP III: Amharic, Bengali, Burmese, CZECH, Finnish, (MODERN) HEBREW, Hungarian, Khmer (Cambodian), Lao, Nepali, PILIPINO (TAGALOG), POLISH, RUSSIAN, SERBO-CROATIAN, Sinhala, THAI, TAMIL, TURKISH, VIETNAMESE
    Hours: 720; level after 720 hours: 2 / 2+
  • GROUP IV: ARABIC, CHINESE, JAPANESE, KOREAN
    Hours: 1320; level after 720 hours: 1+
That data differs considerably from DLI's current data. I exchanged some emails with DLI, but they didn't explain the discrepancies.

[Update 2018-04]
Dr. Robert Marzari, the author of Leichtes Englisch, schwieriges Französisch, kompliziertes Russisch, kindly sent me a summary of the result of his research and granted me permission to post it here.

In my book I tried to evaluate the difficulty of seven European languages (English, French, Spanish, Italian, Russian, Polish - and German) for a German speaking learner; for the evaluation of the German language I imagined a Romance speaker, i.e. a mixture of a French, Italian and Spanish speaker. The results of the evaluation therefore do not show absolute degrees of complexity, but rather relative degrees of difficultness, i.e. relative to a German or Romance speaker.
   If you could get hold of my book (at a University library perhaps?) just take a look at the charts on pages 269 to 275: On these charts I give the results of my evaluation of those seven languages according to the linguistic subsystems of phonetics, writing system, grammar, lexicon and textual structurization (i.e. reading difficulty).
   According to these the degree of a learner's difficulty is as follows:
            active competence    passive competence   complete competence
            (speaking+writing)   (reading)
Spanish     29 points            11 points            40 points
English     33 points            13 points            46 points
Italian     35 points            13 points            48 points
French      43 points            10 points            53 points
Russian     51 points            15 points            66 points
German      50 points            18 points            68 points
Polish      54 points            16 points            70 points

This excellent research indicates that, for a native German speaker, difficulty in speaking or writing ranks as Spanish < English < Italian < French < Russian < Polish, which is quite consistent with many polyglots' experience. Although the difficulty in reading follows a slightly different order, reading is generally less challenging for an adult than speaking and is given fewer points. As a result, complete competence, i.e. the sum of the two types of points, has the same order as speaking+writing competence. As this research is an internal evaluation (see above for a description), placing German in this language list makes sense even though the assumed learners of German speak a different native language, a Romance language instead of German.
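For readers who want to check the arithmetic, here is a small Python sketch that recomputes the sums and rankings from the table quoted above. The numbers are copied from Dr. Marzari's summary. The variable and function names are my own, and German is left out of the rankings because it was rated for a different (Romance-speaking) learner.

```python
# Points from the quoted table: (active, passive, complete).
POINTS = {
    "Spanish": (29, 11, 40),
    "English": (33, 13, 46),
    "Italian": (35, 13, 48),
    "French": (43, 10, 53),
    "Russian": (51, 15, 66),
    "German": (50, 18, 68),
    "Polish": (54, 16, 70),
}


def ranking(column, exclude=()):
    """Languages sorted from easiest to hardest by the given column index."""
    langs = [lang for lang in POINTS if lang not in exclude]
    return sorted(langs, key=lambda lang: POINTS[lang][column])


# Complete competence should equal active + passive in every row.
sums_ok = all(a + p == c for a, p, c in POINTS.values())

# German was rated for Romance speakers, so exclude it when ranking
# difficulty for a German-speaking learner.
active_order = ranking(0, exclude=("German",))
passive_order = ranking(1, exclude=("German",))
complete_order = ranking(2, exclude=("German",))

print(sums_ok)
print(" < ".join(active_order))
print(active_order == complete_order)
```

The reading (passive) ranking starts with French rather than Spanish, which is the "slightly different order" mentioned above.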

Unfortunately, I'm not aware of any other research on this topic. But as you can already see, an otherwise hot topic can be made quite cool by the above analysis, cool as opposed to hot or debatable, and cool in the sense of being interesting.
