The following tutorials have been accepted for MT Summit XVI.
Monday, September 18, Morning
To NMT or Not to NMT – Applications of Neural MT in a Localization Ecosystem
Dimitar Shterionov (KantanMT)
While Neural Machine Translation (NMT) is steadily replacing Phrase-based Statistical MT (PBSMT) as the state of the art in MT in academic environments, we observe that in industry the preferred and widely used option is still PBSMT. Major reasons for the slow adoption of NMT include its computational complexity, which affects cost, and uncertainty about NMT output quality compared with PBSMT.
This tutorial comprises three parts. The first part will cover the basics of NMT and initial engine training. Each participant will be given access to the KantanMT platform to train and test NMT engines. The second part will focus on the quality assessment of NMT engines and the comparison to PBSMT. It will involve performing Language Quality Review (LQR) for NMT with KantanLQR™. NMT and PBSMT engines trained on the same data for 5 language pairs (English -> Chinese, French, German, Japanese and Spanish) will be provided, and participants will take part in an A/B test (a side-by-side comparison) for their preferred language pair. They will be able to monitor progress during the test and at its end. In the third part of this tutorial we will show how to customise the NMT engine: how to improve its quality and how to tailor it to a specific translation project. We will address post-editing techniques (e.g., PEX rules and tokenizer exceptions), as well as engine adaptation.
The learning outcome of this tutorial will be twofold: on the one hand, participants will be familiarised with an industry-viable NMT solution, and on the other, they will be equipped to understand how well such a solution might suit their localization or translation production line. This will allow them to make an informed decision as to whether, and for which projects, they can adopt NMT.
Dr Dimitar Shterionov is Head MT Researcher at KantanMT. Dimitar holds a PhD in Computer Science from KU Leuven, Belgium. He has worked on the design and development of Artificial Intelligence software for learning and reasoning with uncertain data. Since 2016 Dimitar has led KantanLabs, a research and development group committed to advancing language technology. Within KantanLabs, Dimitar and the team work on introducing innovative technology into the KantanMT platform, such as efficient word reordering, improved alignment, and Neural MT.
Monday, September 18, Afternoon
Machine Translation Customization with Microsoft Translator Hub
Chris Wendt (Microsoft)
– Short overview of statistical MT
– Overview of the Microsoft Translator web service and API
– Probabilistic models and their function
– The power of customization
– The Translator Hub: Walk through an engine customization
– Training data, suitable and unsuitable documents for training
– Test and tuning sets
– Collaborative features in the Translator Hub
– Continuous retraining, or one-time shots
– Deploying your customized system
– Using your customized system in applications, TMSs and CMSs
– Using your customized system in your own code
Good if you bring your problems, questions and issues. We’ll be able to discuss them and come up with a good solution, which will be helpful for the other attendees. OK if you don’t. Best if you can bring a laptop. Best if you have translation memory or previously translated documents with at least 5000 segments, but no problem if you don’t have it. We will discuss the concepts using Microsoft Translator Hub and the Microsoft Translator API as examples. Many of the concepts are transferable to other MT systems, but other MT systems will not be discussed in detail.
Chris Wendt graduated as Diplom-Informatiker from the University of Hamburg, Germany, and subsequently spent a decade on software internationalization for a multitude of Microsoft products, including Windows, Internet Explorer, MSN and Bing, bringing these products to market with equal functionality worldwide. Since 2005 he has led the program management team for Microsoft’s Machine Translation development, responsible for Microsoft Translator services, including Bing Translator, Skype Translator, the Translator API and Microsoft’s self-service MT customization system, the Translator Hub. Chris is responsible for the design of these products, connecting Microsoft’s research activities with their practical use in services and applications. That includes Microsoft’s own applications, but more importantly third-party applications and enterprise use. Chris’ goal in life is breaking down language barriers between the humans inhabiting earth. He believes it’ll take us a while to get there, but that we are moving in the right direction. Slowly, but occasionally moving a bit faster. We are right in the middle of one of these occasions. He is based at Microsoft headquarters in Redmond, Washington, USA.
Friday, September 22, Morning
Controlled Language helps you improve comprehensibility and translation quality
Tetsuzo Nakamura (JTCA / electrosuisse japan)
I suggest that you use Controlled Language techniques to improve the comprehensibility and translation quality of your documents. Controlled Language has been around for a long time, but it has never become central to writing guidelines. It is true that some Controlled Language expressions may sound unnatural to native speakers; however, its approach is more far-sighted and better suited to the ICT age. In particular, Controlled Language techniques help English become a real “lingua franca”. I will show you the effects and possibilities of Controlled Language as follows.
First, you may want to know the differences between Controlled Language and Technical Writing, which has long been believed to be the best writing guideline. They are similar to each other; however, Technical Writing cannot improve comprehensibility for an international audience. I will compare the two and show you the differences.
Next, you may want to know the types of Controlled Language in English, known as Controlled English. Several varieties of Controlled English are available on both sides of the Atlantic, such as ASD-STE100 Simplified Technical English, which was originally developed for the European aerospace industry; the Global English Style Guide from SAS, an American business solutions company; and Rule-Based Writing, which was developed by tekom, the largest Technical Communication organization in Europe. These three are similar to each other in terms of English grammar, but they differ in their purpose and mission. I will compare their methodologies and show you which Controlled English is best suited to you.
Lastly, you may want to know about Controlled Language in other languages. I will show you my “Plain and Logical Japanese 77 rules”, a form of Controlled Japanese, and extend its rules to English writing. Many of the rules are applicable to English writing, so it is possible to create common rules across languages.
Tetsuzo Nakamura is a Technical Communicator with over 35 years of diverse experience in both technical communication and advertising. He worked for Yamaha for 30 years and then left to pursue a career as a TC specialist. Since then he has played a central role in the Japan Technical Communicators Association (JTCA) as a lecturer and continues to improve English document quality for Japanese manufacturers.