CCSG

Appendix A (A List and Description of Translation Tools)

Dictionaries: There are many kinds of dictionaries and related reference works. Using dictionaries well requires knowing their strengths and weaknesses, how their entries are structured, and which abbreviations and descriptive labels they use. In all instances, experienced translators ought to be familiar with the key dictionaries for their language pairs and areas of work, and know how to read and use dictionary entries.

Monolingual dictionaries:
- Source language dictionaries list and explain the different typical meanings a source language word may have in different contexts. They may help translators check what a word or term means in a particular context.
- Monolingual target language dictionaries may help clarify possible meanings in the target language and provide collocations (usual word combinations). They may also offer synonyms.
Bilingual dictionaries:
- General bilingual dictionaries list under one entry the terms in another language that correspond to the various possible meanings of that term. Experienced translators may use these dictionaries as checking tools or to remind themselves of definitions they may have forgotten. Inexperienced translators may mistakenly believe that such dictionaries can supply a correct word they do not already know. However, if a translator does not know a word, it is dangerous for them to use it simply because they found it in a dictionary.
- Terminological or specialized dictionaries can be especially useful when it comes to subject-specific terminology (e.g., medical terminology). However, languages differ in the extent to which they use technically correct terminology for subjects or prefer more everyday terms (compare "He has athlete's foot" to "He has tinea pedis"). Translators should not use terms with which they are not familiar unless they have solid evidence that these are the right terms for their needs. They may need to consult experts on a final choice. The more information a dictionary offers on the context in which suggested equivalents are embedded, the better for the translator.
Spelling dictionaries are useful during the copyediting and proofreading stages undertaken by translators. Incorrect spelling (and punctuation, layout, etc.) can trip up both interviewers and respondents when reading questions, and may also create a poor impression of the project in general. Spellcheckers included in word processors are useful, but manual proofreading remains a necessary final step to catch errors a spellchecker cannot detect because the wrong word is itself correctly spelled (e.g., form/from, on/in, healthy/wealthy).
There are numerous online dictionaries and thesauri, both monolingual and bilingual; for instance, YourDictionary, Lexicool, and WordReference.com.

Thesauri: Thesauri group together words of similar or related meaning. They can be helpful for finding the most appropriate word after looking up a related word known not to be quite right; the user may know the word passively, and recognize it among those offered. Since a thesaurus only offers synonyms, and does not define words, extensive knowledge of the language is required to identify the starting place for a search and to decide whether a term found is appropriate.

Word processors such as MS Word also offer broadly comparable, if more limited, 'Synonyms' and 'Thesaurus' functions in at least some languages.

Internet: The Internet makes it possible to see multiple examples of words in context and to check how frequently they seem to be used (e.g., through a Google search). However, the Internet offers usage without quality assurance. A particular word might appear only on translated websites or on websites from countries that do not use the language in question as a first language. The word or phrase found may then not be correct for the target language or for the level of diction required for the survey. Search engines such as Google should therefore always be used with caution, and the nature of the site from which one intends to extract information should always be double-checked.

The Internet can be used to check:

- The frequency of occurrence of particular phrases or words. Again, however, raw frequency does not necessarily say much about the real use of a term or expression, because (1) some websites are linked to each other and appear more often than others, (2) the context in which a term or expression is found does not always correspond to the context of interest, but is nevertheless counted as a hit, and (3) the websites using a certain term or expression may themselves be translations, so there is no guarantee of correct, native-speaker-level language use.
- The contexts in which words appear.
- Official terminology vs. everyday terminology, as evidenced by the contexts in which occurrences are found.

Listservs and newsgroups: Translators often use translation-related listservs and/or newsgroups to post questions and inquiries. Survey translation needs might not be well addressed, but questions about general usage (e.g., regional terms or levels of vocabulary) could be answered. Some languages are likely to be better served than others. A list of translation-related newsgroups can be found here.

Translation software: We distinguish below between general translation software readily available on the market—that is, not specifically designed for questionnaire translation—and tools that are specifically developed for survey translation needs.

1. General translation software, not specifically designed for survey translations

Demonstration versions of general translation tools are usually available on software producer websites. Companies also usually offer to consult on prospective customers' needs. The usefulness of any of these tools for a given project depends on many factors, including the repetitive nature of the project, the scope or complexity of the project, the suitability of the tools for the specifics of a project, the budget available, and the ability of staff to work with such tools.

(a) Computer-Assisted Translation Tools help to produce consistent translations across languages and time by relying on translation memories. For instance, they provide translators with standard phraseology, such as response scales, used over and over in a survey. Depending on the product, they can also provide systematic documentation of the translation process, including document and project management. Survey agencies and international projects often use proprietary translation tools. There are also tools on the market such as SDL Trados or Déjà Vu that can be adapted to comparative survey translation. Some examples of computer-assisted translation tools are:

- Across
- Déjà Vu by Atril Solutions
- MetaTexis
- SDL Trados
- Transit by STAR Group
- Wordfast

(b) Fully automated translation systems (machine translation), such as Google Translate, are explicitly not recommended here, as they provide neither procedures for consistent translation (translation memory) nor process quality control via systematic documentation. Moreover, these systems cannot take context into account, which is crucial for finding optimal translation solutions, nor do they allow for the kind of systematic optimization of translations that the TRAPD process provides.

A translation memory is a database that stores translations, as they are produced, for future use. 'Future use' can be within the same translation, only a few minutes after a segment is first translated, or an entirely new translation task months later. The source text segment and the corresponding target text segment produced as its translation are saved together as a 'translation unit.' A segment may consist of a few words, whole sentences, or, depending on the material involved, extended stretches of text. Translation memories display source and target text segments alongside each other and thus facilitate review. In addition, they help ensure that all segments up for translation have actually been translated, because the system runs through the entire text automatically without leaving any gaps.
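To make the idea of a 'translation unit' concrete, the sketch below stores source/target pairs and lists the segments of a text that still lack a translation. It is a minimal illustration under simplifying assumptions: the class and function names (`TranslationUnit`, `TranslationMemory`, `untranslated`) are hypothetical, lookup is exact-match only, and no commercial tool's data model or API is implied. The German response label is illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class TranslationUnit:
    """A stored pair: one source segment and the translation produced for it."""
    source: str
    target: str

class TranslationMemory:
    """Minimal in-memory store of translation units, keyed by source segment."""

    def __init__(self) -> None:
        self._units: Dict[str, TranslationUnit] = {}

    def add(self, source: str, target: str) -> None:
        # A new or revised unit is available immediately, within this job and later ones.
        self._units[source] = TranslationUnit(source, target)

    def lookup(self, source: str) -> Optional[TranslationUnit]:
        # Exact lookup only; similarity-based ("fuzzy") retrieval is not modeled here.
        return self._units.get(source)

def untranslated(segments: List[str], tm: TranslationMemory) -> List[str]:
    """Return segments with no stored translation, i.e., the gaps still to fill."""
    return [s for s in segments if tm.lookup(s) is None]

tm = TranslationMemory()
tm.add("Strongly agree", "Stimme voll und ganz zu")
print(untranslated(["Strongly agree", "Strongly disagree"], tm))  # ['Strongly disagree']
```

Real tools store considerably more per unit (language codes, dates, translator, project), but the source/target pairing is the core of the concept.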
When translation memory is used, it offers '100% matches' for previously translated source text segments that are completely identical, and 'fuzzy matches' for previously translated source text segments that are similar but not identical. Depending on the software, the degree of match required for a match to be presented to the translator can be defined. Translators accept or reject the matches offered. Whatever a translator produces as a new translation, or revises by modifying an existing translation, also becomes part of the dynamically created and expanding translation memory. Translations produced using translation memory can thus benefit from technology, but must be driven by translator decisions. The translation memory software simply presents pre-existing translation choices for consideration; it provides no judgment of how appropriate an offered translation is for a specific new context. It is therefore essential that the memory has been built from good translations and that the staff translating and using the software are highly qualified and experienced (see Translation: Team). Properly vetted translation memories can be useful for texts that are highly repetitive and where consistency of repeated elements is crucial. They can also be of value with texts that are used repeatedly but with slight modifications.
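The distinction between '100% matches' and 'fuzzy matches' with a user-defined threshold can be sketched as follows. This is an assumption-laden illustration, not how any particular product scores similarity: it uses Python's standard `difflib.SequenceMatcher.ratio()` as a stand-in similarity measure, and the function name `find_matches` and the default threshold of 0.75 are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

def find_matches(segment: str, memory: Dict[str, str],
                 threshold: float = 0.75) -> List[Tuple[float, str, str]]:
    """Return (score, stored_source, stored_target) candidates, best first.

    A score of 1.0 corresponds to a '100% match'; scores from `threshold`
    up to (but not including) 1.0 are offered as 'fuzzy matches'.
    difflib's ratio() is an illustrative stand-in for a tool's own metric.
    """
    candidates = []
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold:
            candidates.append((score, source, target))
    # Highest similarity first; the translator still accepts or rejects each one.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates

memory = {
    "How often do you watch television?": "Wie oft sehen Sie fern?",
    "What is your age?": "Wie alt sind Sie?",
}
for score, src, tgt in find_matches("How often do you watch the television?", memory):
    print(f"{score:.2f}  {src}  ->  {tgt}")
```

Raising `threshold` shows fewer but closer candidates; lowering it shows more candidates that need heavier editing. In either case the offered target text is only a suggestion.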
A terminology tool stores multilingual terms alongside additional information on these terms, such as a definition, synonyms, and context examples. Often, a terminology tool is used alongside a translation memory as a source of richer information.
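A terminology entry and a simple check of which stored terms occur in a segment might look like the sketch below. The field names (`equivalents`, `definition`, `synonyms`, `context`) and the function `terms_in_segment` are hypothetical, chosen to mirror the kinds of information mentioned above; matching is naive case-insensitive substring search, whereas real tools handle inflection and word boundaries more carefully.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TermEntry:
    term: str                      # source-language term
    equivalents: Dict[str, str]    # language code -> approved target-language term
    definition: str = ""
    synonyms: List[str] = field(default_factory=list)
    context: str = ""              # example sentence showing the term in use

def terms_in_segment(segment: str, termbase: List[TermEntry]) -> List[TermEntry]:
    """Return entries whose term or one of its synonyms occurs in the segment."""
    text = segment.lower()
    hits = []
    for entry in termbase:
        forms = [entry.term] + entry.synonyms
        # Naive substring check; a production tool would match word forms properly.
        if any(form.lower() in text for form in forms):
            hits.append(entry)
    return hits

termbase = [
    TermEntry("household", {"de": "Haushalt"}, synonyms=["home"]),
    TermEntry("income", {"de": "Einkommen"}),
]
for entry in terms_in_segment("How many people live in your household?", termbase):
    print(entry.term, "->", entry.equivalents["de"])
```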
Alignment tools can be used to compare a source text with its translation and match the corresponding segments. With alignment tools, translations can be aligned post hoc, that is, after a translation has been finalized; the aligned segments can then be imported into a translation memory and made available for future translations. Alignment tools are typically used when a translation memory could not be used until a translation was finalized; alignment then allows the final version of the translation, rather than only a draft, to be stored in the database.
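The simplest possible form of post-hoc alignment pairs sentences by position, as sketched below. This is a deliberately naive illustration: the functions `split_sentences` and `align_one_to_one` are hypothetical, the sentence splitter only breaks after `.`, `?` or `!`, and real alignment tools use more robust methods (for example, length-based or anchor-based alignment) and let the user correct the result. When the sentence counts differ, the sketch refuses to guess and flags the text for manual alignment.

```python
import re
from typing import List, Tuple

def split_sentences(text: str) -> List[str]:
    # Naive splitter: breaks after '.', '?' or '!' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s.strip()]

def align_one_to_one(source_text: str, target_text: str) -> List[Tuple[str, str]]:
    """Pair source and target sentences by position.

    Raises ValueError when the sentence counts differ, because a 1:1
    pairing is then unreliable and the text should be aligned by hand.
    """
    src = split_sentences(source_text)
    tgt = split_sentences(target_text)
    if len(src) != len(tgt):
        raise ValueError(f"{len(src)} source vs {len(tgt)} target sentences; align manually")
    return list(zip(src, tgt))

# The resulting pairs could then be imported into a translation memory.
for pair in align_one_to_one("How old are you? Where do you live?",
                             "Wie alt sind Sie? Wo wohnen Sie?"):
    print(pair)
```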
Translation memory vs. machine translation:
- Translation memories do not ‘translate,’ but just offer similar translations (if these do exist) from a database, which need to be worked on by a competent and experienced translator.
- Translation memories are built on human translation, whereas machine translation is a fully automated process.
- Quality translations never rely on machine translation alone. Survey questions are a complex text type with multiple functions and components; because complete and easy understanding by the average population is of utmost importance, they must meet communication requirements in the target languages as well. As a result, any reduction of human involvement in the decision-making process of survey translation is ill advised.
Concordance function: This software feature (existing in translation memory software) allows the translator to search for terms within the translation memory. The contextual usage of a given word is then displayed, much as in a concordance.
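A keyword-in-context (KWIC) display of the kind a concordance function produces can be sketched as below. The function name `concordance` and its parameters are hypothetical; the sketch assumes the memory is available as a list of (source, target) pairs, searches either side, and uses case-insensitive whole-word matching, so that, for example, a search for 'zufrieden' does not return 'unzufrieden'.

```python
import re
from typing import List, Tuple

def concordance(keyword: str, units: List[Tuple[str, str]],
                side: int = 1, width: int = 20) -> List[str]:
    """Return KWIC lines for `keyword` in stored (source, target) units.

    `side` selects the text to search (0 = source, 1 = target).
    Matching is case-insensitive and whole-word.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for unit in units:
        text = unit[side]
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Right-align the left context so all keywords line up in one column.
            lines.append(f"{left:>{width}}[{m.group()}]{right}")
    return lines

units = [
    ("How satisfied are you with your job?", "Wie zufrieden sind Sie mit Ihrer Arbeit?"),
    ("Are you dissatisfied?", "Sind Sie unzufrieden?"),
    ("Are you satisfied?", "Sind Sie zufrieden?"),
]
print("\n".join(concordance("zufrieden", units, width=10)))
```

Aligning the keyword in a fixed column is what lets the translator scan the surrounding contexts quickly, which is the main value of a concordance view.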
Corpora: A corpus is “a large collection of authentic texts that have been gathered in electronic form according to a specific set of criteria” [zotpressInText item="{2265844:GX3ZBUGG,9}"]. The relevance and usability of corpora for research stems from three essential characteristics. Firstly, corpora present language ‘as is,’ i.e. they empirically show how language is actually used. Secondly, corpora typically comprise very large collections of texts, which enables statistical analysis and inference about frequencies of various phenomena in language use. Thirdly, corpora in electronic formats are searchable and often equipped with various tools (such as concordances, frequency lists, key words in context etc.) and, as such, can be a useful source of insights about language in use.
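A frequency list, one of the standard corpus tools mentioned above, can be sketched with Python's standard library. The function name `frequency_list` is hypothetical, and the tokenization (lowercased runs of letters, optionally joined by an apostrophe) is a simplifying assumption; corpus software typically uses language-specific tokenizers and may count lemmas rather than word forms.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

def frequency_list(texts: Iterable[str], top: int = 10) -> List[Tuple[str, int]]:
    """Count word-form frequencies across a collection of texts."""
    counts = Counter()
    for text in texts:
        # [^\W\d_] matches any Unicode letter; an optional "'xx" part keeps "don't" whole.
        counts.update(re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower()))
    return counts.most_common(top)

corpus = ["How often do you vote?", "How often do you go out?", "Do you vote?"]
print(frequency_list(corpus, top=3))
```

With a corpus of millions of words, the same counting supports the frequency-based inferences about language use described above.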
- Corpora may be based on various design criteria. For instance, they may comprise texts of specific genres, or texts from specific authors, fields of knowledge, or historical periods. Other corpora aim to provide a broad cross-section of various genres, styles, and authors. Many of the latter are termed ‘national corpora’ (e.g., the British National Corpus) and are usually compiled by academics with public support in an effort to represent the ‘general language’ of a particular country, area, or group.
- Corpora may be monolingual (such as most national corpora) or multilingual. Multilingual corpora usually contain parallel texts and, as such, are known as parallel corpora. Texts in a parallel corpus may represent original writing on similar topics in multiple languages (e.g., news collections in various languages), or the different language versions may be interrelated (e.g., texts in the original language aligned with their translations into various languages). The latter are called translational corpora and provide insights into the characteristics of translated texts and the so-called ‘translatese’ in various language pairs or groups. One of the largest such searchable collections is EUR-Lex, the collection of European Union law in EU official languages.
- Corpora may contain texts produced by native speakers or those generated by non-native speakers, such as language learners. Learners’ corpora help researchers to identify typical errors and enhance language teaching materials or curricula on this basis.
- Moreover, while corpora started off with written texts, there has been an increasing effort to compile spoken language corpora (including corpora of interpreted speech, such as EPIC, the parallel corpus of European Parliament speeches and their simultaneous interpretations).
- Corpora have found multiple uses in areas such as linguistics (language features such as lexical density, semantic prosody etc.), language learning, discourse analysis (incl. critical discourse analysis), translation studies, etc.
- There are a number of corpus analysis tools (known as concordancers), which can interrogate corpora in various ways. They can be applied to existing public and nonpublic corpora or to specific corpus-based research projects. Queries are facilitated if corpus elements have been previously tagged, i.e. marked for various characteristics, such as parts of speech, grammatical tense, or other relevant characteristics.
- 3MC surveys can be informed by corpora of survey questionnaires with translations from various research projects, particularly if the translated versions are official and have undergone a rigorous procedure, such as some version of ‘committee approach’ or TRAPD (see above and [zotpressInText item="{2265844:2MJMKXPF}" format="%a% (%d%)"]). At present (early 2016), no such corpora are available. However, with such corpora in place, researchers could reuse survey questions and their existing approved translations (to enhance comparability within and across surveys) and avoid translating the same questions again (to reduce costs and eliminate errors in new translations). Such corpora could also be a useful learning resource for item designers, questionnaire translators, and researchers studying surveys.
- Another idea is to compile question banks from various surveys, either in a specific language or regardless of language. Such an attempt has been undertaken by GESIS (Leibniz Institute for the Social Sciences, Germany), which runs a databank of survey items and scales in the social sciences. Such question banks could also provide a useful starting point for creating a translational corpus of survey questions.
- Translation management: in addition to facilitating translation, tools are available that facilitate project management of the entire translation workflow. Most of the commercial packages listed in Further Reading offer such functionality.

2. Translation software specifically designed for survey translations

Some tools have also been designed for questionnaire translation, but only for internal use within institutes or projects. As these are neither publicly accessible nor available for public use, these Guidelines concentrate on one particular tool that has been developed specifically for questionnaire translation and is currently being adapted for use with the team approach (or TRAPD translation scheme). The so-called 'Translation Management Tool,' as the name indicates, will not only be usable throughout the whole questionnaire translation process, including the TRAPD model plus quality assurance steps, but will also facilitate managing the whole translation workflow. CentERdata has been developing it for the Survey on Health, Ageing and Retirement in Europe (SHARE), which has been using this tool since its first wave (however, its predecessor, the 'Language Management Tool,' is a different product with some features in common). CentERdata is now collaborating with the translation team of the [zotpressInText item="{2265844:883WBJP7}" format="%a% (%d%)"] to make it usable for the rigorous ESS questionnaire translation scheme, which consists of the team approach following the TRAPD model. Once development is complete, the tool will be available online, and references will be added here.

References

[zotpressInTextBib style="apa" sortby="author"]