The natural-language question-answering method used in Web4Health means
that we have to produce question-matching templates for each page. These
templates also often need to be updated, based on entries in the usage logs
where the system did not provide the best answer to a certain question.
Developing and managing these templates requires special competence.
Not even a professional translator can do it without a few days
of instruction on how to create such templates.
Because of this, it is an advantage if only some of the people involved
need to have this particular competence. It is also very important that a
change to these templates can be made in one language, with the result
immediately available for natural-language question-answering in the other
languages as well. We have implemented this using a technique called
cross-lingual natural-language question-answering [1], [2].
How this works is shown in the figure below.
The figure uses Italian, but Italian can be replaced by any other language
for which a machine translator to English is available. If no machine
translator is available, word-for-word dictionary lookup may also give
acceptable results.
Incoming questions are translated to English by Systran machine translation.
The English question is then put to the English-language answering engine.
When the results have been found, the corresponding native-language objects
are shown. This could be implemented so that the user never sees that any
language other than his or her own is involved. We have chosen, however, to
show the English answer if the text of the answer has not yet been
translated into the user's language. This means that users will see some
English answers mixed with their native-language answers.
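
The flow described above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the Web4Health code: the
answer store ANSWER_TEXTS, the function answer_question, and the stand-in
translator and engine (toy_translate, toy_engine) are all hypothetical names,
and the real translation step is performed by Systran rather than by a
lookup table.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical answer store: answer id -> language code -> answer text.
ANSWER_TEXTS: Dict[str, Dict[str, str]] = {
    "A1": {"en": "English text of A1", "it": "Italian text of A1"},
    "A2": {"en": "English text of A2"},  # not yet translated into Italian
}


def answer_question(
    question: str,
    user_lang: str,
    translate_to_english: Callable[[str, str], str],
    english_engine: Callable[[str], List[str]],
) -> List[Tuple[str, str]]:
    """Return (language, text) pairs for a question asked in user_lang."""
    # Step 1: machine-translate the incoming question to English.
    english_question = translate_to_english(question, user_lang)
    results = []
    # Step 2: only the English-language engine does the matching.
    for answer_id in english_engine(english_question):
        texts = ANSWER_TEXTS[answer_id]
        # Step 3: show the native-language text, or fall back to English.
        lang = user_lang if user_lang in texts else "en"
        results.append((lang, texts[lang]))
    return results


def toy_translate(question: str, source_lang: str) -> str:
    """Stand-in for the machine translator (assumption, not a real API)."""
    lookup = {("it", "domanda uno"): "question one"}
    return lookup.get((source_lang, question), question)


def toy_engine(english_question: str) -> List[str]:
    """Stand-in for the English answering engine: returns answer ids."""
    return ["A1", "A2"] if english_question == "question one" else []


demo = answer_question("domanda uno", "it", toy_translate, toy_engine)
```

In this sketch the Italian user receives the Italian text of A1 and the
English text of A2, which mirrors the mixed result lists described above.
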
It is also possible to set up this process without having any translated
answers. This will allow users to ask questions in their native language,
but get the answer in English. Since many people handle English better
as a passive than as an active language, this would be a useful tool for
them.
Web4Health also has some texts that are available only in a native
language, since each national editor can add texts that exist only
in his or her own language. For these texts, a native-language
question-answering system is used to find answers.
Web4Health has compared the quality of the answers found in this way to
question-answering directly in the language of the questions. These
comparisons indicate that taking the Systran machine-translation engine
as is, the quality will be somewhat inferior to that of direct language
answering. However, if the dictionary used by Systran is extended with the
terminology suitable for our subject area, the quality will be almost as
good as with direct language answering.
The reason for this is that the standard Systran dictionaries are designed
for office documents, not health. For example, Systran translates the word
"body" as if it meant "main part", which is the most common use of this
word in office documents, but which is, of course, usually not suitable
when talking about health.
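
The effect of a subject-area dictionary can be illustrated with a toy
word-for-word translator in Python. This is a sketch under assumptions and
says nothing about how Systran dictionaries are actually structured: the two
dictionaries, their Italian entries, and the function translate_words are
invented for illustration. The only point is that a domain entry takes
precedence over the general one.

```python
from collections import ChainMap
from typing import Mapping

# Hypothetical general-purpose dictionary, tuned for office documents.
GENERAL_IT_EN = {"il": "the", "corpo": "main part", "dolore": "pain"}

# Hypothetical health dictionary: only entries that must be overridden.
HEALTH_IT_EN = {"corpo": "body"}


def translate_words(sentence: str, dictionary: Mapping[str, str]) -> str:
    """Word-for-word lookup; unknown words are passed through unchanged."""
    words = sentence.lower().split()
    return " ".join(dictionary.get(word, word) for word in words)


# General dictionary alone picks the office-document sense.
general_only = translate_words("il corpo", GENERAL_IT_EN)

# ChainMap searches the health dictionary first, then the general one.
with_domain = translate_words("il corpo", ChainMap(HEALTH_IT_EN, GENERAL_IT_EN))
```

Here general_only is "the main part" while with_domain is "the body". Words
that the general dictionary already handles well, such as "dolore", need no
entry in the additional dictionary.
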
One might argue that augmenting the dictionary with new terminology is as
much work as writing the classification separately in each language. (In
our case the terminology comprises about 6000 words, but many of them
already have a suitable translation in Systran, so not all of them need to
be added to an additional dictionary.) However, this is not true, because
the same dictionary entry can be used in the classification of many
answers. For example, the dictionary entry for "cause" can be used in many
pages discussing causes of various disorders. Another important advantage
of cross-lingual question-answering is that developing the dictionary does
not require the special competence needed for doing the classification.
Thus, cross-lingual question-answering allows tasks to be divided between
people with different competences.
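
The reuse argument can be made concrete with a small Python sketch. The
template format shown here (a set of required English keywords per page) and
the page ids are assumptions made purely for illustration; the actual
Web4Health templates are not described in this text.

```python
from typing import Dict, FrozenSet, List

# Hypothetical English templates: page id -> keywords used for matching.
TEMPLATES: Dict[str, FrozenSet[str]] = {
    "page-1": frozenset({"cause", "insomnia"}),
    "page-2": frozenset({"cause", "anxiety"}),
    "page-3": frozenset({"treatment", "anxiety"}),
}


def pages_using(word: str) -> List[str]:
    """List the pages whose template relies on the given English word."""
    return sorted(page for page, words in TEMPLATES.items() if word in words)


def entries_needed() -> int:
    """Count distinct words: one dictionary entry each, per language."""
    return len(set().union(*TEMPLATES.values()))


def keyword_slots() -> int:
    """Count keyword uses across all templates, repeats included."""
    return sum(len(words) for words in TEMPLATES.values())


shared = pages_using("cause")
```

In this toy data a single dictionary entry for "cause" serves two pages, and
four dictionary entries cover six keyword uses. The gap widens as more pages
share the same vocabulary.
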