UTX (simple glossary format)

Asia-Pacific Association for Machine Translation

UTX (universal terminology exchange) FAQ

Updated: 2012/5/27


Creating a glossary would require a lot of effort. I don't want to do it!
I don't want to waste my time and money!

Actually, you are wasting your time by NOT making a glossary. Writing or translating documents without a glossary is time-consuming and laborious. If you create a glossary, you save time. If you have reliable information in the form of a glossary that other people can reference, you will see fewer mistakes. Nobody (including you) has to spend time looking up terms that someone else already knows. You don't have to wonder which term is best every single time you have multiple alternatives.

If you choose to create documents or develop software without a glossary, you are also wasting money. No matter how much money you spend implementing wonderful features in your application, your users will not notice them if the names of those features are inappropriate or inconsistent. Using a consistent glossary can prevent this situation.

Isn't it hard to create a glossary?

That's exactly where UTX can help. UTX drastically simplifies the creation and maintenance of glossaries by providing a minimal set of simple rules.

Are you getting any money for maintaining UTX?

No. The members of the UTX team (i.e. AAMT members) are dedicated volunteers. The activity of the UTX team is financed by AAMT.

How can I find out more about UTX?

The brochure, specification, and sample dictionaries are available.

I don't want to share any glossaries with others!

That's a pity, but you can still benefit from UTX by easily merging other UTX glossaries into your own.
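As a rough illustration of such a merge, the sketch below combines two glossaries held in memory. It assumes each entry is a dict with the hypothetical keys `src`, `tgt`, and `pos`, and that your own entries take priority when both glossaries define the same source term. Neither choice comes from the UTX specification.

```python
def merge_glossaries(own, other):
    """Merge two lists of entries; entries in `own` win on conflict.

    Entries are keyed by (source term, part of speech). The key names
    "src", "tgt", and "pos" are illustrative assumptions.
    """
    merged = {(e["src"], e["pos"]): e for e in other}
    merged.update({(e["src"], e["pos"]): e for e in own})
    return sorted(merged.values(), key=lambda e: e["src"])


own = [{"src": "file", "tgt": "ファイル", "pos": "noun"}]
other = [
    {"src": "file", "tgt": "書類", "pos": "noun"},
    {"src": "window", "tgt": "ウィンドウ", "pos": "noun"},
]

merged = merge_glossaries(own, other)
for entry in merged:
    print(entry["src"], entry["tgt"], entry["pos"], sep="\t")
```

Letting your own entries win keeps your established terminology stable while still picking up terms you had not covered.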

Creating a UTX glossary

Do I have to include thousands of entries to create a useful glossary?

Absolutely not! The UTX team's research has shown that a glossary containing as few as fifty entries is enough to enhance the quality of machine translation for a 4000-word document. How much a UTX glossary improves the overall efficiency of human translators is more difficult to measure, but the benefits of a glossary also extend to improved readability and comprehension for readers.

What kind of terms should we include in a UTX glossary?

A UTX glossary should contain technical terms within a specific domain. The majority of such terms are compound nouns. Please refer to the brochure and specification for the details.

How can I edit a UTX glossary?

A UTX glossary can be edited with any spreadsheet application (such as Microsoft Excel or LibreOffice) or any text editor that can handle UTF-8 (such as "Notepad" included in Windows operating systems).

Can a UTX glossary include sentences?

It can, but sentences are better handled by translation memory formats, such as TMX. We recommend excluding sentences from a UTX glossary unless they are absolutely necessary. Generally speaking, you should avoid including excessively long terms in a UTX glossary. Keeping terms within a reasonable length makes the columns of a UTX glossary more readable.

Is a UTX glossary high-quality?

A UTX glossary should be high-quality, because its entries are hand-picked and it should be inspected by a dictionary administrator. By contrast, automatically generated raw glossary data contains many inappropriate entries that degrade the quality of translation. This situation could be referred to as "big data, big noise". UTX's term status property allows a dictionary administrator to authorize or reject terms collected from various term contributors.
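The review step can be pictured with a small sketch. The status labels (`approved`, `provisional`, `rejected`) and the dict keys used here are hypothetical placeholders, not the values defined in the UTX specification.

```python
# Hypothetical status labels; the real values are defined in the UTX specification.
APPROVED = "approved"
PROVISIONAL = "provisional"
REJECTED = "rejected"


def apply_review(entries, decisions):
    """Return new entries with the administrator's decisions applied.

    `decisions` maps a source term to a status label. Terms without a
    decision keep their current status.
    """
    return [
        {**entry, "status": decisions.get(entry["src"], entry["status"])}
        for entry in entries
    ]


def approved_only(entries):
    """Keep only the entries an administrator has authorized."""
    return [e for e in entries if e["status"] == APPROVED]


# Candidate terms collected from contributors, all awaiting review.
candidates = [
    {"src": "term status", "tgt": "用語ステータス", "status": PROVISIONAL},
    {"src": "big data", "tgt": "大きいデータ", "status": PROVISIONAL},
    {"src": "glossary", "tgt": "用語集", "status": PROVISIONAL},
]
decisions = {"term status": APPROVED, "big data": REJECTED}

reviewed = apply_review(candidates, decisions)
usable = approved_only(reviewed)
print([e["src"] for e in usable])
```

Only the authorized entries would then be passed on to translators or translation software, while unreviewed candidates stay in the glossary for a later decision.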

Can I use UTX to normalize terms?

Yes. Detailed instructions will be provided in the future.

Do I have to pay AAMT to create a UTX glossary?

No. AAMT doesn't charge you for the use of the UTX specification.

Can I change or sell existing UTX glossaries?

It depends on the license stated in the header of the glossary. The UTX specification recommends indicating the license of a glossary. A Creative Commons license is a good choice. Of course, you can apply any license to your glossary as long as it is legally reasonable. You can also keep the glossary for your own internal use. But UTX glossaries become richer and more useful if you share them!

It's impossible to choose only one translation for one term!
How can I follow the "one word, one meaning" principle?

If you find it hard to follow this principle, you might have one or both of the following problems.

1. Assuming that you are the author of the document, you might be using multiple meanings for a particular term.

You should avoid using ambiguous terms in technical documentation. It is not a good idea to use multiple terms for a single meaning or a single term for multiple meanings. For example, avoid using "terms" to refer to an agreement, especially if your main topic is terminology. If you use potentially ambiguous terms, such terms must be clearly defined and differentiated to show their proper uses.

2. You are mixing multiple domains into one glossary.

In principle, one domain requires one glossary. If a translation project deals with multiple domains (medical devices, for example), you may need separate glossaries for medicine, machinery, medical devices per se, and perhaps more. You don't want to use a single glossary for the entire project, because it would not be compartmentalized and would be hard to reuse.

If entries from multiple domains are included in one glossary without a good reason, the situation can be called "domain contamination." Different domains require different terminology. For example, "file" and "window" mean different things in carpentry and in ICT. If you maintain one glossary for one domain, one translation is enough for each source term.
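One way to spot possible domain contamination is to look for source terms that have more than one translation. The sketch below assumes a glossary reduced to hypothetical (source term, target term) pairs; the sample entries are made up for illustration.

```python
from collections import defaultdict


def find_conflicts(entries):
    """Return source terms that map to more than one target term."""
    targets = defaultdict(set)
    for src, tgt in entries:
        targets[src].add(tgt)
    return {src: tgts for src, tgts in targets.items() if len(tgts) > 1}


# "file" appears once as a computer file and once as a carpenter's tool.
glossary = [
    ("file", "ファイル"),
    ("file", "やすり"),
    ("window", "ウィンドウ"),
]

conflicts = find_conflicts(glossary)
for src, tgts in conflicts.items():
    print(f"{src}: {len(tgts)} competing translations")
```

A conflict is not always an error, but each one is worth a look: it usually means either that a term is ambiguous or that entries from two domains belong in separate glossaries.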

For Translation Client/Language Service Provider

Why do we want to use UTX?

Have you ever been frustrated that technical terms were not translated correctly? Perhaps all you needed to do was create a glossary.

Even if you have no plans to use UTX for machine translation, you will still benefit from UTX's simple terminology management. With proper understanding, agreement, and arrangement, you might be able to collect new target term candidates from individual translators. Then you can create and use your own glossary.

If you do plan to use RBMT (rule-based machine translation), you can quickly build high-quality user dictionaries based on the UTX glossary.

We are translating books and games. How can we use UTX?

In books or game software, you will encounter tons of terms, many of them proper nouns. They could be names of characters, skills, items, places, etc. These are actually all "technical terms." Without a glossary, how would you properly keep track of thousands of terms over several months of translation? The readers of your book will be confused, and the users of your game will be angry when they see inconsistent terms. Also, your translation is likely to involve many translators and checkers. UTX is very useful for standardizing the use of terms across translators and checkers, with or without terminology tools.

Can I propose a translation project that uses UTX?

Sure! Please let us know your ideas using the contact form.

I don't see why UTX could improve translation productivity.

That's perhaps because you are not reusing UTX glossaries. They are most useful when they are reused and/or shared among various users, tools, and environments.

Do I need to have a style guide to create a good UTX glossary?

A style guide is strongly recommended to maintain consistency. A number of well-established style guides are available for English and other languages. For Japanese, the JTF Standard Style Guide can be used.

Machine Translation

Why is UTX a tab-separated format instead of XML?

UTX is designed to be simple. It is so simple that a UTX glossary is viable with only three mandatory columns (source term, target term, and part of speech). These are manageable without XML.
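To give a feel for how little machinery a tab-separated glossary needs, here is a minimal reading sketch. The sample data, the single `#`-prefixed header line, and the column names `src`, `tgt`, and `src:pos` are assumptions made for illustration; the UTX specification defines the actual header layout.

```python
import csv

# Hypothetical sample: the header layout and column names are assumptions;
# consult the UTX specification for the real ones.
SAMPLE = (
    "#src\ttgt\tsrc:pos\n"
    "user dictionary\tユーザー辞書\tnoun\n"
    "translation memory\t翻訳メモリ\tnoun\n"
)


def read_glossary(text):
    """Return a list of dicts, one per entry, keyed by column name."""
    lines = text.splitlines()
    # Treat the first line as the column header, minus its leading '#'.
    columns = lines[0].lstrip("#").split("\t")
    reader = csv.reader(lines[1:], delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(columns, row)) for row in reader if row]


entries = read_glossary(SAMPLE)
for entry in entries:
    print(entry["src"], "->", entry["tgt"], f"({entry['src:pos']})")
```

The same rows open directly in a spreadsheet application, which is the practical advantage of a tab-separated layout over XML for contributors who are not developers.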

Why do we need a format if it is that simple?

Many glossaries are published on the web, but many of them are very hard to use. They don't follow best practices for glossaries. They often include similar entries without indicating priorities or clarifying differences in usage. Their entries are not well-formed, and they don't list basic forms (singular or root form). However simple UTX looks, it can serve its purpose as a glossary because it keeps to a defined specification.

Does UTX replace TBX, TBX-Basic, or any other existing glossary formats?

No. A UTX glossary can be created from scratch, as a collection of technical terms hand-picked by translators. It can be created with very little effort (see the diagram below). It can serve as a basis for large-scale, complicated termbases for bigger translation projects. But it is quite useful as it is for small to medium-sized translation projects.


Position of UTX and TBX

What's wrong with TBX, TBX-Basic, or any other existing glossary formats?

There is nothing wrong with them. It's just that they are too complicated for a wider range of term contributors. Term contributors may or may not be familiar with XML or the details of various glossary formats. They can be professional translators who simply know the appropriate translations.

It would be nice to leverage such knowledge in the form of a usable glossary.

What is the difference between a system dictionary and a user dictionary (in translation software)?

Rule-based translation software uses two types of dictionaries: system dictionaries and user dictionaries. A system dictionary is a collection of pre-defined terms that are fine-tuned to achieve the best translation results. A user dictionary is a collection of terms defined and added by the user to further increase the translation quality for a specific translation project. For this purpose, the entries of a user dictionary usually supersede those of a system dictionary. In general, a user dictionary should not include entries that are already included in a system dictionary. The user, however, can choose more suitable translations by adding such terms to a user dictionary, thereby overriding the translations in the system dictionary.

What is the difference between a glossary and a user dictionary (of translation software)?

A glossary is a collection of technical terms that can be used by people or by software. A glossary may include definitions and descriptions, which are not used by translation software (the software would need them in a form that it can process). In contrast, a user dictionary is created and used specifically for translation software. One can convert a glossary into a user dictionary. At that point, the contents of the glossary and the user dictionary are very similar. However, a user dictionary may have additional properties or entries that are not used by people. Generally, an extensive glossary can be a very good source for a high-quality user dictionary.

For developers (MT etc.)

Is the UTX specification established with RBMT (rule-based machine translation) in mind?

Yes. But UTX can be used with almost any translation or terminology tool.

Why did AAMT create the UTX format? What is the background?

Commercial translation software packages like SYSTRAN are known worldwide, but you might not be familiar with translation software in Japan, where AAMT is based. The UTX specification is not limited to Japanese software or the Japanese language, but some historical background may help explain why UTX was established in Japan. In Japan, there are a number of commercial RBMT translation software packages. These high-end applications are shipped with 7 to 8 million basic and technical terms. They are highly sophisticated, and they have 30 or more options to control various aspects of translation (the high-end version of SYSTRAN has only 2 options for Japanese). As they can guess conjugations for user dictionary entries, there is no need to supply detailed properties for each term entry.

Still, translation software needs well-made glossaries to achieve good translation results. Large dictionaries can improve translation quality. They can, however, degrade translation quality if the quality of the dictionaries is not adequately maintained. Our research showed that a small number of well-chosen terms in a UTX glossary significantly improves translation quality. This is why we created a simple glossary format for getting appropriate technical terms reflected in translation.

We are using SMT. We don't need a glossary!

Perhaps you do. If you are using SMT and want to ensure the quality of translation, your project requires a separate process of terminological verification (which is integrated into the system if you are using RBMT instead). Even if you don't use a glossary when you translate, you will still need one for quality assurance. And because that verification is a separate process, it takes extra time and effort.
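A separate terminological verification pass can be as simple as the sketch below. It assumes a glossary of hypothetical (source term, target term) pairs and uses naive substring matching, which ignores inflection and word boundaries; a real check would need language-aware matching.

```python
def check_terms(source, target, glossary):
    """Return glossary pairs whose source term occurs in `source`
    but whose target term is absent from `target`."""
    missing = []
    for src, tgt in glossary:
        if src.lower() in source.lower() and tgt not in target:
            missing.append((src, tgt))
    return missing


glossary = [
    ("user dictionary", "ユーザー辞書"),
    ("glossary", "用語集"),
]

source = "Open the user dictionary and import the glossary."
# The MT output writes "ユーザ辞書" (no long vowel mark) instead of the approved term.
target = "ユーザ辞書を開き、用語集をインポートします。"

problems = check_terms(source, target, glossary)
for src, tgt in problems:
    print(f"'{src}' should be translated as '{tgt}'")
```

Running such a check over every segment after translation is the kind of extra time and effort that a separate verification process involves.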

When converting to UTX, will it be a lossy conversion?

It depends. Although UTX can hold any amount of information if you define extra columns, doing so may not always be a good idea. If you need to maintain a number of extra properties, you may want to consider an XML-based format instead. But we also need to realize that when we convert one format to another, only certain properties are essential.
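The trade-off can be made concrete with a small sketch that keeps only the essential properties and reports what a conversion would lose. The column names (`src`, `tgt`, `pos`) and the extra properties in the sample entry are hypothetical.

```python
ESSENTIAL = ("src", "tgt", "pos")  # hypothetical names for the three mandatory columns


def to_minimal(entries):
    """Project entries onto the essential columns and report what was dropped."""
    dropped = set()
    minimal = []
    for entry in entries:
        dropped.update(key for key in entry if key not in ESSENTIAL)
        minimal.append({key: entry[key] for key in ESSENTIAL})
    return minimal, sorted(dropped)


# An entry as it might come out of a richer termbase (made-up properties).
termbase = [
    {
        "src": "termbase",
        "tgt": "用語ベース",
        "pos": "noun",
        "definition": "A database of terms.",
        "context": "Import the termbase before translating.",
    }
]

minimal, dropped = to_minimal(termbase)
print("kept:", list(minimal[0]))
print("dropped:", dropped)
```

Listing the dropped properties before converting makes it easier to decide whether the loss matters for the project or whether extra columns are worth defining.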

Why does UTX define a limited number of term properties?

Additional properties contribute very little to improving the accuracy or appropriateness of translation. Reducing complexity matters more.

I would like to contribute glossaries/write a tool for conversion.

Thank you! Please let us know using the contact form.

Can I make suggestions to the UTX specification?

Sure! Please let us know your ideas using the contact form.