Monday, January 23, 2017

Language facts: Norwegian

Norwegian is a Scandinavian language, part of the Germanic language group, with slightly more than 4 million native speakers. Two standard varieties of Norwegian exist, Bokmål and Nynorsk (literally "new Norwegian").

Oslo, Norway
Source: AdobeStock.com

Nynorsk is used primarily in the western regions, by around 0.5 million people. Bokmål is used in the rest of Norway and remains the preferred variant for writing Norwegian, although spoken Norwegian resembles Nynorsk more than Bokmål.


Nynorsk vs. Bokmål conflict

Today Nynorsk and Bokmål have equal legal status in Norway, though the private and commercial sectors of Norway's economy are completely dominated by Bokmål, while all public bodies uphold both variants. Interestingly, both Bokmål and Nynorsk are only writing standards and do not provide guidelines on the spoken form of the language. As a result, a mixture of dialects is used in everyday (even official) communication and no spoken form is considered "incorrect".
The main difference between the two forms lies in their historical origin. Bokmål is a Norwegianized version of Danish used by the elite and upper class, as Danish was the standard for writing Norwegian from the 16th to the 19th century. Nynorsk resulted from opposition to the Danish language and tried to establish the language on "pure" Norwegian rural dialects. These two writing standards and their use turned into a fundamental political controversy in Norway, mainly throughout the 20th century.
The decades-long efforts to merge the Norwegian writing standards into one common language (called Samnorsk) failed after a series of language reforms, and the policy was eventually abandoned in 2002 due to strong public resistance, keeping this interesting linguistic schizophrenia very much alive.

Alphabet

The Norwegian alphabet consists of 29 letters. It contains the 26 letters of the standard English alphabet plus three more at the end: … X Y Z Æ Ø Å. Certain letters can be modified by diacritics (é, è, ê, ó, ò, ô and, occasionally in Nynorsk, also ì, ù and ỳ).


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Æ Ø Å
a b c d e f g h i j k l m n o p q r s t u v w x y z æ ø å
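
Because Æ, Ø and Å come after Z, alphabetical order in Norwegian differs from plain character-code order. The following is a minimal Python sketch of that ordering, built only from the letter listing above. The names NORWEGIAN_ALPHABET, RANK and norwegian_sort_key are made up for this illustration, and accented letters such as é are simply placed last here, which is an assumption of the sketch and not an official collation rule.

```python
# Lowercase alphabet exactly as listed above (26 English letters + æ ø å).
NORWEGIAN_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"

# Position of each letter in the alphabet (hypothetical helper table).
RANK = {letter: index for index, letter in enumerate(NORWEGIAN_ALPHABET)}


def norwegian_sort_key(word):
    """Map a word to a list of alphabet positions for sorting.

    Characters outside the 29 letters (for example é) get the position
    after å; this is a simplification made for this sketch.
    """
    return [RANK.get(ch, len(NORWEGIAN_ALPHABET)) for ch in word.lower()]


words = ["ås", "zoo", "øl", "ære", "and"]
sorted_words = sorted(words, key=norwegian_sort_key)

print(len(NORWEGIAN_ALPHABET))  # number of letters in the alphabet
print(sorted_words)             # words in Norwegian alphabetical order
```

Sorting the same words without the key would put "ås" before "ære" and "øl", because the character code of å is lower than those of æ and ø.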

Friday, January 13, 2017

Machine translation in 2017: It's getting neural

In November 2016, something happened to the most widely used open machine translation platform, Google Translate. Users around the globe, mainly those translating between English and Chinese, Spanish, French, Japanese, German, Korean, Portuguese or Turkish, noticed a major improvement in the machine translation results: the service suddenly translated whole sentences, worked with much broader context and got, well, more human. Google itself claims the platform has improved "more in a single leap than we’ve seen in the last ten years combined". So what happened?

Comparison of Statistical Machine Translation vs. Neural Machine Translation results.
Source: blog.google.com

SMT vs. NMT

In short, neural networks and AI happened. Rather than relying on statistical methods that solve problems "by force" (the larger the databases and the more computing power available, the better the results), neural networks use artificial neurons and loosely imitate the workings of a biological brain. Google's Statistical Machine Translation (SMT) methods impressed the world with their ability to translate words and short phrases with more or less acceptable accuracy in over 100 languages (currently 103, to be exact). But the newly implemented Neural Machine Translation (NMT) goes beyond this. Using deep-learning techniques, it first picks the most relevant translation variant that fits the context of the whole sentence rather than just isolated phrases, and then adjusts it to match human speech and grammar as closely as possible (as demonstrated in the picture above).

How good can NMT get

Naturally, neural networks get better over time as they learn, and Google's NMT still has a lot of learning to do before it matches professional human translation, mainly for inflected languages (seems like Latin and Greek will be the last to go). But the recent evolution of the technology shows rapid improvement in machine translation. Over the next few years, Google will be perfecting its NMT results for all 103 languages covered and building the translation feature into the very DNA of "intelligent" online platforms and apps.

As it has now been 10 years since the introduction of Google Translate, it will be interesting to see where the service will be in another 10 years and how deeply it will affect the professional human translation industry. Will human translators become human editors with additional required specialization and skills by 2027?



Friday, January 6, 2017

Language facts: Latvian

Latvian is the official state language of Latvia and an official EU language. There are about 1.5 million native Latvian speakers in Latvia and about 150,000 abroad.

Riga, Latvia.
Source: AdobeStock.com

Latvian is one of the two living Baltic languages (the other being Lithuanian), which form a group of their own within the Indo-European language family. Latvian is an inflected language with several analytical forms, three dialects, and German syntactic influence (as the ruling class in the Baltic region was German until the 19th century). In German, the language is called Lettisch, and Lettish is also an older English term for Latvian.

Language as a living relic

It is still a bit of a mystery how the Baltic languages developed in their early stages as they evolved from Proto-Indo-European, the common ancestor of the largest language family in the world (Indo-European). Both Latvian and Lithuanian contain linguistic features supposedly characteristic of the early stages of the proto-language, which makes the Baltic branch particularly interesting to academics. In fact, Latvian and Lithuanian used to be dialects of one common language in the Baltics and started to diverge only after the 8th century AD. Mutually intelligible dialects still existed in modern history (by some estimates as late as the 17th century).

Apart from German, Russian also had its say in the evolution of modern Latvian. (It is actually very interesting to trace the outlines of historical conflicts and battles for zones of influence in the German and Russian linguistic impact on the minor languages of Central and Eastern Europe.) The first wave of Russification in the late 19th century, followed by almost 50 years of Soviet occupation (from 1940 to 1990) and Stalin's intent for Russia to colonize the Baltic region, reduced the ethnic Latvian share of the population (from 80% before World War II to only 52% in 1989). After massive deportations of Latvians, the area was populated by immigrants who kept Russian as their mother tongue. After the Soviet Union collapsed in 1991, Latvia introduced policies to strengthen the use and teaching of the Latvian language, and the share of native Latvian speakers in Latvia has since increased to more than 60%.

Alphabet

The modern standard Latvian alphabet uses 22 unmodified letters of the Latin alphabet (all except Q, W, X and Y) and adds a further eleven letters by modification, for 33 letters in total. Latvian spelling has an almost perfect correspondence between graphemes and phonemes: nearly every phoneme has its own letter, so a reader need not learn how a word is pronounced but can simply pronounce it as written.

A, Ā, B, C, Č, D, E, Ē, F, G, Ģ, H, I, Ī, J, K, Ķ, L, Ļ, M, N, Ņ, O, P, R, S, Š, T, U, Ū, V, Z, Ž 
a, ā, b, c, č, d, e, ē, f, g, ģ, h, i, ī, j, k, ķ, l, ļ, m, n, ņ, o, p, r, s, š, t, u, ū, v, z, ž
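
The letter counts given above can be checked directly against the uppercase listing. The short Python sketch below does only that; the variable names are chosen for this illustration and carry no special meaning.

```python
import string

# Uppercase Latvian alphabet, copied from the listing above.
LATVIAN_UPPER = (
    "A Ā B C Č D E Ē F G Ģ H I Ī J K Ķ L Ļ M N Ņ O P R S Š T U Ū V Z Ž"
).split()

# Letters shared with the basic 26-letter Latin alphabet.
unmodified = [l for l in LATVIAN_UPPER if l in string.ascii_uppercase]

# Letters formed by adding a diacritic (macron, caron or cedilla/comma).
modified = [l for l in LATVIAN_UPPER if l not in string.ascii_uppercase]

# Basic Latin letters that Latvian does not use.
missing = sorted(set(string.ascii_uppercase) - set(LATVIAN_UPPER))

print(len(unmodified), len(modified), len(LATVIAN_UPPER))
print(missing)
```

The three counts should come out as 22, 11 and 33, and the missing letters as Q, W, X and Y, matching the description of the alphabet.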