Cross-Lingual Question-Answering

Abstract: Web4Health uses a technique called cross-lingual question-answering. The question is translated into English by Google machine translation, and answers are also searched for in the Web4Health English database.


Written by: Jacob Palme, professor of computer science, Stockholm University
First version: 22 Jul 2008.
Latest revision: 23 Oct 2013.

Sometimes, when I ask Web4Health questions in Swedish, German or Italian, I also get answers in English. Why?


The natural-language question-answering method used in Web4Health means that we have to produce question-matching templates for each page. These templates also often need to be updated, based on entries in the usage logs where the system did not provide the best answer to a certain question. Developing and managing these templates requires a special competence. Not even an ordinary professional translator can do it without a few days of instruction on how to create such templates.

Because of this, it is an advantage if only a few people need to have this particular competence. It is also very important that a change to these templates can be made in one language, with the result immediately available for natural-language question-answering in the other languages as well.

We have implemented this using a technique called cross-lingual natural-language question-answering [1], [2]. How this works is shown in the figure below. The figure uses Italian, but Italian can be replaced by any other language for which a machine translator to English is available. If no machine translator is available, word-for-word dictionary lookup may also give acceptable results.
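The word-for-word dictionary lookup mentioned above can be sketched as follows. This is only an illustration: the function name `word_for_word` and the tiny Swedish-to-English dictionary are invented for the example and are not taken from Web4Health.

```python
import re
from typing import Dict


def word_for_word(question: str, dictionary: Dict[str, str]) -> str:
    """Translate a question one word at a time.

    Words missing from the dictionary are passed through unchanged,
    since many medical terms are similar across languages.
    """
    words = re.findall(r"\w+", question.lower())
    return " ".join(dictionary.get(word, word) for word in words)


# Hypothetical toy dictionary, for illustration only.
SWEDISH_TO_ENGLISH = {"vad": "what", "är": "is"}

english_question = word_for_word("Vad är depression?", SWEDISH_TO_ENGLISH)
```

Word order and inflection are ignored here, which is why such a lookup can only be expected to give acceptable, not good, results.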

Incoming questions are translated to English by Google machine translation. The English question is then put to the English-language answering engine. When the results have been found, the corresponding native-language objects are shown. This could be implemented so that the user never sees that any language other than their own is involved. We have chosen, however, to show the English answer if the text of the answer has not yet been translated from English into the user's language. This means that users will see some English answers mixed with their native-language answers.
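The flow described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Web4Health implementation: the `Answer` class, the `answer_question` function and the two toy stand-ins for the machine translator and the English answering engine are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Answer:
    """One answer page: the English text plus any translations that exist."""
    page_id: str
    english_text: str
    translations: Dict[str, str] = field(default_factory=dict)  # language code -> text


def answer_question(
    question: str,
    user_lang: str,
    translate_to_english: Callable[[str, str], str],
    english_engine: Callable[[str], Optional[Answer]],
) -> Optional[str]:
    """Translate the question, query the English engine, choose the display text."""
    english_question = translate_to_english(question, user_lang)
    answer = english_engine(english_question)
    if answer is None:
        return None
    # Show the native-language text if it exists, otherwise the English text.
    return answer.translations.get(user_lang, answer.english_text)


# Toy stand-ins (assumptions) for the translator and the answering engine.
def toy_translate(text: str, source_lang: str) -> str:
    return {"Cos'è la depressione?": "What is depression?"}.get(text, text)


PAGES = {
    "what is depression?": Answer(
        "dep-1", "Depression is ...", {"it": "La depressione è ..."}
    )
}


def toy_engine(english_question: str) -> Optional[Answer]:
    return PAGES.get(english_question.lower())


italian_result = answer_question("Cos'è la depressione?", "it", toy_translate, toy_engine)
swedish_result = answer_question("What is depression?", "sv", toy_translate, toy_engine)
```

In this sketch the Italian user gets the Italian text, while the Swedish user gets the English text because no Swedish translation of that page exists in the toy data.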

It is also possible to set up this process without having any translated answers. This will allow users to ask questions in their native language, but get the answer in English. Since many people handle English better as a passive than as an active language, this would be a useful tool for them.

Web4Health also has some texts which are only available in the native language, since each national editor can add texts which are only available in his/her own language. For these texts, a native-language question-answering system is used to find answers.

Web4Health has compared the quality of the answers found in this way to question-answering directly in the language of the questions. These comparisons indicate that, with the Google machine-translation engine used as is, the quality will be somewhat inferior to that of direct language answering. However, if the dictionary used by Google is extended with the terminology suitable for our subject area, the quality will be almost as good as with direct language answering.

The reason for this is that the standard Google dictionaries are designed for office documents, not health. For example, the word "body" is translated by Google as if it meant "main part", which is the most common use of this word in office documents, but which, of course, is usually not suitable when talking about health.

One might argue that augmenting the dictionary with new terminology is as much work as writing the classification separately in each language. In our case the terminology comprises about 6000 words, but many of them already have a suitable translation in Google, so not all of them need to be added to an additional dictionary. The argument does not hold, however, because the same dictionary entry can be used in the classification of many answers. For example, the dictionary entry for "cause" can be used in many pages discussing causes of various disorders. Another important advantage of cross-lingual question-answering is that developing the dictionary does not require the special competence needed for doing the classification. Thus, cross-lingual question-answering allows a separation of tasks between people with different competences.
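The idea of a subject-area dictionary that takes precedence over the general one can be sketched as below. This is an illustration only: the function `translate_words` and the Italian entries are invented for the example, and the sketch says nothing about how Google's dictionaries are actually extended.

```python
from typing import Dict, List


def translate_words(
    words: List[str],
    general: Dict[str, str],
    domain: Dict[str, str],
) -> List[str]:
    """Look each word up in the domain dictionary first, then the general one.

    Words found in neither dictionary are passed through unchanged.
    """
    return [domain.get(word, general.get(word, word)) for word in words]


# Hypothetical entries: the general dictionary reflects office-document usage,
# and the small domain dictionary overrides it for health texts.
GENERAL = {"corpo": "main part", "causa": "cause"}
DOMAIN = {"corpo": "body"}

health_translation = translate_words(["corpo"], GENERAL, DOMAIN)
shared_translation = translate_words(["causa"], GENERAL, DOMAIN)
```

Only the words whose general translation is unsuitable need a domain entry, and each entry then serves every question in which that word appears.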


