Sometimes, when I ask Web4Health questions in Swedish, German or Italian, some of the answers come back in English. Why?
Answer:
The natural-language question-answering method used in Web4Health means that we have to produce question-matching templates for each page. These templates also often need to be updated, based on entries in the usage logs where the system did not provide the best answer to a certain question. The work of developing and managing these templates requires special competence. Not even an ordinary professional translator can do it without a few days of instruction on how to create such templates.
Because of this, it is an advantage if only some of the people involved need to have this particular competence. It is also very important that a change to these templates can be made in one language and that the result is immediately available for natural-language question-answering in the other languages as well.
We have implemented this using a technique called cross-lingual natural-language question-answering [1], [2]. How this works is shown in the figure below. The figure uses Italian, but Italian can be replaced by any other language for which a machine translator to English is available. If no machine translator is available, word-for-word dictionary lookup may also give acceptable results.
Incoming questions are translated to English by Google machine-translation. The English question is then put to the English-language answering engine. When the results have been found, the corresponding native-language objects are shown. This could be implemented so that the user never sees that any language other than the user's own is involved. We have chosen, however, to show the English answer if the text of the answer has not yet been translated from English into the user's language. This means that users will see some English answers mixed with their native-language answers.
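The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Web4Health implementation: `translate_to_english` is a hypothetical stand-in for the machine-translation step, the keyword-set "templates" are a toy replacement for the real question-matching templates, and all sample data is invented.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class Answer:
    answer_id: str
    texts: Dict[str, str]  # language code -> answer text; "en" is always present


# Invented sample data: one answer is translated to Italian, one is not.
ANSWERS = {
    "dep-causes": Answer("dep-causes", {"en": "English text on causes",
                                        "it": "Testo italiano sulle cause"}),
    "dep-treatment": Answer("dep-treatment", {"en": "English text on treatment"}),
}

# Toy English "templates": an answer matches if all its keywords are in the question.
TEMPLATES: Dict[str, Set[str]] = {
    "dep-causes": {"causes", "depression"},
    "dep-treatment": {"treatment", "depression"},
}


def translate_to_english(question: str, source_lang: str) -> str:
    """Hypothetical stand-in for the machine-translation step (lookup table only)."""
    toy_table = {
        ("it", "cause della depressione"): "causes of depression",
        ("it", "trattamento della depressione"): "treatment of depression",
    }
    return toy_table.get((source_lang, question.lower()), question)


def find_english_answer(english_question: str) -> Optional[Answer]:
    """Match the English question against the English templates."""
    words = set(english_question.lower().split())
    for answer_id, keywords in TEMPLATES.items():
        if keywords <= words:
            return ANSWERS[answer_id]
    return None


def answer_question(question: str, user_lang: str) -> Optional[Tuple[str, str]]:
    """Return (language shown, text), preferring the user's own language."""
    answer = find_english_answer(translate_to_english(question, user_lang))
    if answer is None:
        return None
    if user_lang in answer.texts:
        return user_lang, answer.texts[user_lang]
    # Not yet translated: show the English answer instead.
    return "en", answer.texts["en"]
```

Calling `answer_question("Cause della depressione", "it")` returns the Italian text, while `answer_question("Trattamento della depressione", "it")` falls back to the English text because that answer has no Italian version in the sample data. Note that the templates exist only in English, so a template change takes effect for every language at once.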
It is also possible to set up this process without having any translated answers. This will allow users to ask questions in their native language, but get the answer in English. Since many people handle English better as a passive than as an active language, this would be a useful tool for them.
Web4Health also has some texts which are only available in the native language, since each national editor can add texts which are only available in his/her own language. For these texts, a native-language question-answering system is used to find answers.
Web4Health has compared the quality of the answers found in this way to question-answering directly in the language of the questions. These comparisons indicate that, with the Google machine-translation engine used as is, the quality will be somewhat inferior to that of direct-language answering. However, if the dictionary used by Google is extended with the terminology suitable for our subject area, the quality will be almost as good as with direct-language answering.
The reason for this is that the standard Google dictionaries are designed for office documents, not health. For example, Google translates the word "body" as if it meant "main part", which is the most common use of this word in office documents, but which, of course, is usually not suitable when talking about health.
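The effect of a subject-area dictionary can be illustrated with a word-for-word lookup in which a health glossary is consulted before a general dictionary. This is a minimal sketch, not the mechanism Google or Web4Health actually uses: both dictionaries and all their entries are invented for illustration, with the Italian word "corpo" chosen only to mirror the "body" example above.

```python
from typing import Dict

# Invented toy dictionaries (Italian -> English), for illustration only.
GENERAL_DICTIONARY: Dict[str, str] = {
    "dolore": "pain",
    "nel": "in the",
    "corpo": "main part",  # office-document sense
}
HEALTH_GLOSSARY: Dict[str, str] = {
    "corpo": "body",  # health sense overrides the general entry
}


def translate_word_for_word(text: str,
                            glossary: Dict[str, str],
                            general: Dict[str, str]) -> str:
    """Translate each word, preferring the glossary over the general dictionary.

    Words found in neither dictionary are passed through unchanged.
    """
    translated = []
    for word in text.lower().split():
        translated.append(glossary.get(word, general.get(word, word)))
    return " ".join(translated)
```

With the glossary, `translate_word_for_word("Dolore nel corpo", HEALTH_GLOSSARY, GENERAL_DICTIONARY)` gives "pain in the body"; with an empty glossary the same call gives "pain in the main part", which would be unlikely to match a health-related template.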
One might argue that augmenting the dictionary with new terminology is as much work as writing the classification separately in each language. (In our case the terminology covers about 6000 words, but many of them already have suitable translations in Google, so not all of them need to be added to an additional dictionary.) However, this is not true, because the same dictionary entry can be used in the classification of many answers. For example, the dictionary entry for "cause" can be used in many pages discussing causes of various disorders. Another important advantage of cross-lingual question-answering is that development of the dictionary does not need the special competence needed for doing the classification. Thus, cross-lingual question-answering allows a separation of tasks between people with different competences.