K.A.I.E.C.

Kai's Automatic Indexing Evaluation Center

Assessing Thesaurus-Based Annotations for Semantic Search Applications

Kai Eckert, Magnus Pfeffer and Heiner Stuckenschmidt

International Journal on Metadata, Semantics and Ontologies, 2008, to appear.

Thesaurus-based indexing is a common approach for improving the results of document retrieval. With the growing number of documents available, manual indexing is no longer feasible, and statistical methods for automated document indexing are becoming an attractive alternative. We argue that the quality of the thesaurus is of crucial importance in automatic indexing, both in its ability to adequately cover the contents to be indexed and in its suitability for the specific indexing method used, because there is no human in the loop who can spot and avoid indexing errors. We propose an interactive tool for thesaurus evaluation that combines statistical measures with appropriate visualization techniques to support the detection of potential problems in a thesaurus. We describe this method and show its application to the evaluation of indexing results. The examples show that the tool supports the detection and correction of errors, leading to better indexing results.
