Tag-Based Machine Translation

The difficulties in MT (machine translation) are mostly due to various types of ambiguity, concerning polysemy of words, phrase/clause attachment, coordination, anaphoric reference, scope of logical/modal operators, and so on. Unknown words and phrases are another major source of difficulty. Translation accuracy is expected to drastically improve if the input documents are marked up with appropriate tags which resolve such ambiguities or supply missing information. Some GDA tagsets will be designed for this purpose.

An MT system which exploits such tags to generate very accurate translations could be developed quickly if you already have a translation dictionary. The GDA sense tag dictionary and your translation dictionary could, for the most part, be aligned automatically. The tags will almost guarantee that the translation is intelligible. Naturalness, stylistic quality, and so on will remain major research issues.
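As a minimal sketch of how a sense tag could drive lexical choice once the two dictionaries are aligned, consider the following. Everything here is an assumption made for illustration: the sense identifiers, the dictionary layout, and the function names are hypothetical and are not taken from any GDA tagset.

```python
from typing import List, Optional, Tuple

# Hypothetical aligned dictionary keyed by (source word, sense id).
# The sense ids are invented for illustration; they are not real GDA sense tags.
TRANSLATION_DICT = {
    ("bank", "bank.finance"): "banque",
    ("bank", "bank.river"): "rive",
    ("interest", "interest.money"): "intérêt",
}

# Assumed fallback for words that carry no sense tag.
DEFAULT_TRANSLATION = {"bank": "banque", "interest": "intérêt"}


def transfer_word(word: str, sense: Optional[str]) -> str:
    """Pick a target word, using the sense tag when one is present."""
    if sense is not None and (word, sense) in TRANSLATION_DICT:
        return TRANSLATION_DICT[(word, sense)]
    # Untagged or unknown sense: use the default, or pass the word through.
    return DEFAULT_TRANSLATION.get(word, word)


def transfer(tokens: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Word-by-word lexical transfer over (word, sense) pairs."""
    return [transfer_word(word, sense) for word, sense in tokens]
```

The sketch covers only word choice. Word order, agreement, and style are left to the rest of the transfer system.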

As discussed before, the GDA tagsets are not intended to serve as a knowledge representation language, so they are not an interlingua either. You cannot translate a tagged document by looking at the tags only. The first tag-based MT systems will therefore be transfer-style systems. Once an interlingua (or an appropriate internal representation language) is defined, however, tagged documents can be translated into it and then into any other language.

Tagged documents can be used as an example base for EBT (example-based translation) (Sato, 1991) of tagged texts. That is, a translation output is generated by retrieving and modifying tagged passages in the target language whose annotation patterns are similar to those of the input tagged text. This is a variant of EBT: the usual EBT uses syntactic patterns to retrieve examples, whereas this tag-adapted EBT uses annotation patterns instead.
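The retrieval step can be sketched as follows, under strong simplifying assumptions: an annotation pattern is reduced to a flat sequence of tag labels, and similarity is measured with Python's standard difflib.SequenceMatcher. The tag labels, the example base, and the function names are hypothetical. The source does not specify a similarity measure, so this one is a stand-in.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple

# Hypothetical example base: annotation patterns paired with target-language
# passages. The tag labels are invented for illustration, not actual GDA tags.
Example = Tuple[List[str], str]

EXAMPLE_BASE: List[Example] = [
    (["np", "v", "np"], "Le chat mange la souris."),
    (["np", "v", "np", "pp"], "Marie lit un livre dans le jardin."),
    (["np", "v"], "Il dort."),
]


def pattern_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity in [0, 1] between two tag-label sequences."""
    return SequenceMatcher(None, a, b).ratio()


def retrieve(input_pattern: Sequence[str],
             example_base: Sequence[Example] = EXAMPLE_BASE) -> Example:
    """Return the example whose annotation pattern best matches the input."""
    return max(example_base,
               key=lambda example: pattern_similarity(input_pattern, example[0]))
```

A real system would compare structured annotations rather than flat label sequences, and would then modify the retrieved passage to fit the input. The sketch covers retrieval only.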

This technology will be further applied to translation of plain texts. That is, the input text could be first automatically tagged and then translated as discussed above. The ambiguities arising in the initial tagging process are resolved by matching with the annotation patterns of the examples in the tagged example base.
