Friday, January 13, 2017

Machine translation in 2017: It's getting neural

In November 2016, something happened to the most widely used open machine translation platform, Google Translate. Users around the globe, mainly those translating between English and Chinese, Spanish, French, Japanese, German, Korean, Portuguese or Turkish, noticed a major improvement in the results: the service suddenly translated whole sentences, worked with a much broader context and got, well, more human. Google itself claims the platform has improved "more in a single leap than we’ve seen in the last ten years combined". So what happened?

Comparison of Statistical Machine Translation vs. Neural Machine Translation results.
Source: blog.google.com

SMT vs. NMT

In short, neural networks and AI happened. Rather than relying on statistical methods that solve problems "by force" (the larger the databases and the more computing power available, the better the results), neural networks compute with artificial neurons that are loosely modeled on a biological brain. Google's Statistical Machine Translation (SMT) methods impressed the world with their ability to translate words and short phrases with more or less acceptable accuracy in over 100 languages (currently 103, to be exact). But the newly implemented Neural Machine Translation (NMT) goes beyond this. Using deep-learning techniques, it first picks the most relevant translation variant that fits the context of the whole sentence rather than just isolated phrases, and then adjusts it to match human speech and grammar as closely as possible (as demonstrated in the picture above).

How good can NMT get

Naturally, neural networks get better over time as they learn, and Google's NMT still has a lot of learning to do before it matches professional human translation, mainly for inflected languages (it seems Latin and Greek will be the last to go). But the recent evolution of the technology shows how rapidly machine translation is improving. Over the next few years, Google will be perfecting its NMT results for all 103 languages covered and building the translation feature into the very DNA of "intelligent" online platforms and apps.

As it has now been 10 years since the introduction of Google Translate, it will be interesting to see where the service will be in another 10 years and how deeply it will affect the professional human translation industry. Will human translators become human editors with additional required specialization and skills by 2027?



Friday, January 6, 2017

Language facts: Latvian

Latvian is the official state language of Latvia and an official EU language. There are about 1.5 million native Latvian speakers in Latvia and about 150,000 abroad.

Riga, Latvia.
Source: AdobeStock.com

Latvian is one of the two living languages of the Balts (the other being Lithuanian), a group of its own within the Indo-European language family. Latvian is an inflected language with several analytical forms, three dialects, and German syntactical influence (as the ruling class in the Baltic region was German until the 19th century). In German, the language is called Lettisch, which is related to Lettish, an older English term for Latvian.

Language as a living relic

It is still a bit of a mystery how the Baltic languages developed in their early stages when evolving from Proto-Indo-European, the common ancestor of the largest language family in the world (the Indo-European). Both Latvian and Lithuanian contain linguistic features supposedly characteristic of the early stages of the proto-language, which makes the Baltic branch particularly interesting to academics. In fact, Latvian and Lithuanian used to be just dialects of one common language in the Baltics and started to diverge more only after the 8th century AD. Mutually intelligible dialects still existed in modern history (by some estimates as late as the 17th century).

Apart from German, Russian also had its say in the evolution of modern Latvian. (It's actually very interesting to observe how the historical conflicts and battles for zones of influence left their outlines on the minor languages of Central and Eastern Europe in the form of German and Russian linguistic impact.) The first wave of Russification in the late 19th century, followed by almost 50 years of Soviet occupation (from 1940 to 1990) and Stalin's intent for Russia to colonize the Baltic region, reduced the ethnic Latvian share of the population (from 80% before World War II to only 52% in 1989). After massive deportations of Latvians, the area was populated by immigrants who kept Russian as their mother tongue. After the Soviet Union collapsed in 1991, Latvia introduced policies to strengthen the use and teaching of the Latvian language, and the share of native Latvian speakers in Latvia increased to more than 60% accordingly.

Alphabet

The modern standard Latvian alphabet uses 22 unmodified letters of the Latin alphabet (all except Q, W, X and Y). It adds a further eleven letters by modification, for 33 letters in total. Latvian spelling has an almost perfect correspondence between graphemes and phonemes. Every phoneme has its own letter, so a reader need not learn how a word is pronounced, but can simply pronounce it as written.

A, Ā, B, C, Č, D, E, Ē, F, G, Ģ, H, I, Ī, J, K, Ķ, L, Ļ, M, N, Ņ, O, P, R, S, Š, T, U, Ū, V, Z, Ž 
a, ā, b, c, č, d, e, ē, f, g, ģ, h, i, ī, j, k, ķ, l, ļ, m, n, ņ, o, p, r, s, š, t, u, ū, v, z, ž
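
The letter counts above can be double-checked with a few lines of Python. This is only a minimal sketch: it assumes that a "modified" letter is one that Unicode decomposes into a base letter plus a combining mark, which happens to hold for all the Latvian diacritic letters. The names `LATVIAN` and `is_modified` are made up for this illustration.

```python
import string
import unicodedata

# The 33 upper-case letters of the modern Latvian alphabet, as listed above.
LATVIAN = "A Ā B C Č D E Ē F G Ģ H I Ī J K Ķ L Ļ M N Ņ O P R S Š T U Ū V Z Ž".split()

def is_modified(letter: str) -> bool:
    """True if the letter decomposes into a base letter plus a diacritic (assumed definition)."""
    return len(unicodedata.normalize("NFD", letter)) > 1

plain = [letter for letter in LATVIAN if not is_modified(letter)]
modified = [letter for letter in LATVIAN if is_modified(letter)]
# Basic Latin letters that Latvian does not use.
unused = sorted(set(string.ascii_uppercase) - set(plain))

print(len(plain), len(modified), len(LATVIAN))  # 22 11 33
print(unused)                                   # ['Q', 'W', 'X', 'Y']
```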

Friday, December 2, 2016

Language facts: Vietnamese

Vietnamese belongs to the Austroasiatic language family (which also includes Khmer, spoken in Cambodia). It was heavily influenced by Chinese due to centuries of Chinese rule, and as a result around half of the Vietnamese vocabulary consists of naturalized Chinese expressions. Later, as a result of the French occupation and strong cultural influence from the West, a lot of new words were added (such as "tivi" for TV).

Terraced rice fields in the sunset – Mu Cang Chai, Yen Bai, Vietnam.
Source: AdobeStock.com


An emigrated language

Vietnamese is the national language of Vietnam, spoken by approximately 70 million people in Vietnam and about another 3 million abroad, mostly in East and Southeast Asia as well as in the United States and Australia, as a result of large-scale Vietnamese emigration. Vietnamese-speaking communities, and the cultural influence of these often well-integrated minorities, have led to the language being recognized in some surprising parts of the world. In the Czech Republic, for example, Vietnamese even has an official status. It is recognized as one of the minority languages, which entitles Czech citizens from the Vietnamese community to use Vietnamese in communication with public authorities and courts. In municipalities where the Vietnamese make up more than 10% of the population, the language is also used in public information channels (including election information), and the minority is entitled to request assistance in its language.

Alphabet

Vietnamese uses the Latin alphabet (quốc ngữ), with frequent use of diacritics. Until the 20th century, the language was written using a modified set of Chinese characters (chữ nôm).

A Ă Â B C D Đ E Ê G H I K L M N O Ô Ơ P Q R S T U Ư V X Y

a ă â b c d đ e ê g h i k l m n o ô ơ p q r s t u ư v x y



Friday, November 25, 2016

How to update your multilingual product catalog for 2017

The year is coming to an end once more (and as the years pass, it seems to happen disturbingly faster, right?), and the time has come to update the yearly product catalog again. In all the languages it comes in and, hopefully, some more, since everyone wants to expand internationally. Your best choice is to entrust the job to a professional language service provider, but even then there are things to watch to make sure the process runs smoothly.

How to update your multilingual product catalog for 2017 and keep your sanity?
Source: AdobeStock.com

If you keep just 3 things in mind this year, those 2017 multilingual updates won't turn into a nightmare. Read on and enjoy:


1. Have your translation resources revised before you reuse them. If you don't have any, you should have them created (see how to create translation memory here).

Translation resources consist mainly of translation memories and glossaries (translation memory explained in detail here). These are databases of already translated text, which significantly reduce translation cost and delivery times when applied in translation, especially for repetitive content, which surely applies to product catalogs.
The translation resources, however, degrade over time if they are not maintained and properly updated. They need to be checked from time to time because any errors left in the resources will propagate into all future translations. Or in plain words: You don't want to have the same typos in a catalog 3 years in a row...

2. Ask for a price estimate including a pre-translation analysis. Don't pay for repetitive, or already translated content.

As explained above, having a good part of the content already translated and stored in the form of translation resources reduces both translation time and translation cost. Your supplier shouldn't charge the full rate for repetitive or exact-match content, and the price tag on your translation project needs to reflect this. Therefore, always ask your translation supplier for an estimate that contains a detailed analysis of reused content, and check how this affects the final price.
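
To give a rough idea of what such a pre-translation analysis counts, here is a minimal Python sketch. It is not how any particular translation tool works: the function names, the sample segments and the tiny translation memory are all hypothetical, and real tools also score partial ("fuzzy") matches, which this sketch ignores.

```python
from collections import Counter

def normalize(segment: str) -> str:
    """Collapse whitespace and case so trivially different segments compare equal."""
    return " ".join(segment.split()).lower()

def analyze_segments(segments, translation_memory):
    """Classify each segment as an exact TM match, an in-document repetition, or new text."""
    tm_keys = {normalize(source) for source in translation_memory}
    seen = set()
    counts = Counter()
    for segment in segments:
        key = normalize(segment)
        if key in tm_keys:
            counts["exact match"] += 1   # already translated in an earlier project
        elif key in seen:
            counts["repetition"] += 1    # repeated within this document
        else:
            counts["new"] += 1           # needs a fresh translation
        seen.add(key)
    return counts

# Hypothetical translation memory (source text -> stored translation).
tm = {"Stainless steel housing": "Edelstahlgehäuse"}
catalog = [
    "Stainless steel housing",
    "Battery not included",
    "Battery not included",
    "New for 2017",
]
report = analyze_segments(catalog, tm)
print(dict(report))  # {'exact match': 1, 'new': 2, 'repetition': 1}
```

In an estimate built on such a report, only the "new" segments would normally be charged at the full rate.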


3. Always insist on having the translated content reviewed and proofread.

Humans make mistakes, and machines programmed by humans make mistakes too. Until the day comes when AI takes over translation, it's always a healthy idea to review translated documents before they are published. Especially with technical content, it can be critical not to overlook mistakes (e.g. a translator confusing "always push this button" with "never push this button"), because such mistakes can end in very unpleasant lawsuits. For this reason, demand that your texts not only be translated, but also fully reviewed and edited by a second native translator, and then as a final step proofread by humans with computer-assisted checkers. By the way, it's also good to make sure your translation supplier holds indemnity insurance - just in case...




That's it. The issues are clear, the process is given, but if it sounds like too much to handle, just leave the job to us. We specialize in technical translation and we're experts on catalogs, manuals, and guidelines.


Order your translation now, or contact our project coordinators for further information and enjoy the Happy Holidays – this time stress-free.









Wednesday, November 16, 2016

Language facts: Slovenian

Slovenian (or Slovene) is a Slavic language from the South Slavic group, most closely related to Croatian and a distant relative of languages such as Russian. Slovenian should NOT be confused with Slovak, which does not have much in common with it apart from both being Slavic languages. Interestingly, the two languages call themselves by nearly the same names (slovenski and slovenščina in Slovenian, slovenský and slovenčina in Slovak), which literally mean Slavic in the old Slavic language. Slovenian is spoken by about 2 million people in Slovenia, a small country that nevertheless has both high mountains (the Alps) and a sea (the Adriatic), as well as by Slovenian communities in neighboring countries and emigrants around the world. Slovenian is also an official EU language.

Church in the middle of Lake Bled, Slovenia.
Source: AdobeStock.com

The least homogeneous Slavic language

Slovenian is a heavily inflected language with some ancient grammatical peculiarities, such as the dual grammatical number. Despite the small number of speakers, the dialects are highly diverse, and strong dialects from opposite sides of the country, influenced by neighboring languages, are practically mutually unintelligible. This is because compulsory schooling used to be in languages other than Slovenian (mainly German and Italian). Standardized Slovenian as a national language was formed in the 18th century based on the Upper and Lower Carniolan dialects.


Alphabet

Slovenian uses the Latin alphabet without the letters Q, X, Y, W and with the addition of three extra letters: Č, Š and Ž. The letters Q, X, Y, W, however, are used as independent letters in encyclopedias and dictionary listings, as are Ć and Đ, which appear in foreign names (and as such they are all included in the alphabet here).



A B C Č Ć D Đ E F G H I J K L M N O P Q R S Š T U V W X Y Z Ž
a b c č ć d đ e f g h i j k l m n o p q r s š t u v w x y z ž

Friday, October 28, 2016

Translation tips: How to localize dates?

There were times, not so long ago, when not even Europe had a unified calendar, let alone the world. And although the IT revolution made us reduce most information to zeros and ones, including everyday things, calendar dates can still turn into a real pain when it comes to localization.

Calendar date formatting.
Source: AdobeStock.com

Calendar date formatting

Different languages and cultures use various formats for writing dates. The reasons behind a specific format are usually historical and cultural, but some are also driven by technical development. Calendar dates can vary as follows:
  • Order of date components (e.g. day-month-year = little-endian; month-day-year = middle-endian; year-month-day = big-endian); the day-month-year format is the most common in the majority of countries around the world, mainly due to the Western religious and legal customs of writing dates (e.g. the 1st day of November, Anno Domini 2016)
  • Usage of leading zeros in days and months (e.g. 01-01-2016 vs. 1-1-2016) – German-speaking and German-influenced regions, for instance, tend to use 
  • Separators like hyphens, dots, etc. (e.g. 01-01-2016, 01.01.2016, 1 January 2016, 1. January 2016 or 01/01/2016)
  • Year format (e.g. 01-01-2016 vs. 01-01-16)
  • Numeral type usage – Arabic vs. Roman (e.g. 1. XII. 2016 vs. 1.12.2016)
  • Months name usage (months can be written down using both names and numbers, e.g. 1.1.2016 vs. 1.January 2016)
  • Other language or cultural specifics (e.g. 1st January 2016 in English, or adding AD (Anno Domini), or CE (common era) to the date)
  • Reversed day and month: a popular format used mainly in the United States and often the default setting on many computers, e.g. 01-31-2016 for January 31, 2016.
There is also the ISO 8601 standard for data elements and interchange formats, which uses the YYYY-MM-DD format.
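
The variations listed above can be tried out with a small Python sketch. The `format_date` helper and its style names are invented for this illustration and cover only a handful of the patterns mentioned; a real localization project would normally rely on locale data rather than hand-written patterns.

```python
from datetime import date

ROMAN_MONTHS = ["I", "II", "III", "IV", "V", "VI",
                "VII", "VIII", "IX", "X", "XI", "XII"]

def format_date(d: date, style: str) -> str:
    """Render a date in one of a few illustrative styles (hypothetical helper)."""
    if style == "little-endian":     # day-month-year with leading zeros
        return f"{d.day:02d}.{d.month:02d}.{d.year}"
    if style == "middle-endian":     # month-day-year, as common in the United States
        return f"{d.month:02d}-{d.day:02d}-{d.year}"
    if style == "big-endian":        # year-month-day, the ISO 8601 calendar date
        return d.isoformat()
    if style == "no-leading-zeros":  # day-month-year without leading zeros
        return f"{d.day}.{d.month}.{d.year}"
    if style == "short-year":        # two-digit year
        return f"{d.day:02d}-{d.month:02d}-{d.year % 100:02d}"
    if style == "roman-month":       # month written as a Roman numeral
        return f"{d.day}. {ROMAN_MONTHS[d.month - 1]}. {d.year}"
    raise ValueError(f"unknown style: {style}")

d = date(2016, 1, 31)
for style in ("little-endian", "middle-endian", "big-endian",
              "no-leading-zeros", "short-year", "roman-month"):
    print(f"{style:17} {format_date(d, style)}")
```

The same date object produces six different strings, which is exactly why a date copied unchanged from a source text can be misread in the target language.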


Time zones matter in date localization

Not only the formatting but also time zones need to be taken into consideration, depending on where the observer is located. This can be rather tricky with important historical dates. For example, the attack on Pearl Harbor, generally known to have happened on December 7th, 1941, actually took place on December 8th in Japanese time.
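
The Pearl Harbor example can be reproduced with Python's standard `datetime` module. This is a sketch under a few assumptions that are not stated in the text above: the attack is taken to have started at about 7:48 a.m. local time, Hawaii is assumed to have been 10.5 hours behind UTC in 1941, and Japan 9 hours ahead. Fixed offsets are used to keep the example self-contained; in practice a time zone database would normally be used instead.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for 1941 (see the note above).
HAWAII_1941 = timezone(timedelta(hours=-10, minutes=-30), "HST")
JAPAN = timezone(timedelta(hours=9), "JST")

# Assumed local start time of the attack.
attack_hawaii = datetime(1941, 12, 7, 7, 48, tzinfo=HAWAII_1941)
# The same instant, expressed in Japanese time.
attack_japan = attack_hawaii.astimezone(JAPAN)

print(attack_hawaii.isoformat())  # 1941-12-07T07:48:00-10:30
print(attack_japan.isoformat())   # 1941-12-08T03:18:00+09:00
```

Both lines describe the same moment, yet the calendar date differs, so a localized text has to make clear whose clock it is quoting.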