Tag-Based Browsing and Question Answering

Most work in IR has so far addressed retrieving the documents relevant to a query from a set of documents. Retrieval that pinpoints the relevant parts of a single document (which may be a hypertext module) is also very useful, for instance when you want to find exactly where in a computer manual to read in order to resolve the trouble at hand. This type of IR for document browsing has not received the attention it deserves. Tags encoding the semantic structure of text are extremely useful, or even indispensable, here.

It is widely recognized that hyperlinks of some sort are important in document browsing. For instance, readers benefit greatly from hyperlinks from an occurrence of a term to the part of the same document where it is defined. Obviously such links can be implemented by tags. Such tags may not be in the GDA tagsets; if so, proposals are welcome about what they should be like and how to handle them. Tagging for hyperlinking probably requires more intensive human involvement than the usual tagging assumed in GDA, and hence may well be studied in its own right.

Hyperlinking of this kind will consist of two phases: the term extraction phase and the linking phase. The term extraction phase extracts the terms to be hyperlinked from and to. The linking phase establishes hyperlinks among parts of the document. Among the various types of hyperlinks, let us consider definition-reference links. When a term occurs several times in a document, normally a few of the occurrences are associated with descriptions that define it. The other occurrences refer to the definition. Hyperlinks from a referring occurrence to the definition occurrence are extremely useful for readers to understand the document without tears. If a reader cannot remember the meaning of a term she has just come across, she can jump to its definition with a single mouse click.

The hardest problem here is how to identify the definition parts of the document. If the document is marked up with GDA tags, then term extraction algorithms (Nakagawa, 1996) can consult the tags to find the definition parts.
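
The two phases can be illustrated with a minimal Python sketch. It assumes that definition occurrences have already been marked up. The `term` element, its `role` attribute, and the generated `id` and `href` attributes are hypothetical names chosen for illustration; they are not taken from any GDA tagset, and the sketch is not the term extraction algorithm of Nakagawa (1996).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <term> and its "role" attribute are illustrative only
# and are not part of any GDA tagset.
SAMPLE = (
    "<doc>"
    "<p>A <term role='def'>daemon</term> is a background process.</p>"
    "<p>Restart the <term>daemon</term> after editing the file.</p>"
    "</doc>"
)


def extract_terms(root):
    """Term extraction phase: separate defining and referring occurrences."""
    definitions = {}  # normalized term -> first defining element
    references = []   # (normalized term, referring element) pairs
    for elem in root.iter("term"):
        key = (elem.text or "").strip().lower()
        if elem.get("role") == "def":
            definitions.setdefault(key, elem)
        else:
            references.append((key, elem))
    return definitions, references


def link_terms(definitions, references):
    """Linking phase: point each referring occurrence at its definition."""
    for index, key in enumerate(sorted(definitions)):
        definitions[key].set("id", f"def-{index}")
    linked = 0
    for key, elem in references:
        target = definitions.get(key)
        if target is not None:  # terms without a definition stay unlinked
            elem.set("href", "#" + target.get("id"))
            linked += 1
    return linked


root = ET.fromstring(SAMPLE)
definitions, references = extract_terms(root)
link_count = link_terms(definitions, references)
print(ET.tostring(root, encoding="unicode"))
```

Because extraction finishes before linking begins, a referring occurrence that precedes its definition in the text is still linked. The sketch sidesteps the hard part, deciding which occurrences are definitions, by reading it directly from the assumed markup.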

Going one step further, IR and document browsing may lead to question answering. Given that the semantic structure of documents can be recognized automatically thanks to the tags, the reply to a query may be natural language sentences composed by consulting the semantic structure, instead of simply the set of relevant (parts of) documents. Of course, the reply could be accompanied by extralinguistic materials such as figures, as discussed in the previous section.
