K.A.I.E.C.

Kai's Automatic Indexing Evaluation Center


Interactive Thesaurus Assessment for Automatic Document Annotation


Kai Eckert, Heiner Stuckenschmidt, Magnus Pfeffer

Proceedings of The Fourth International Conference on Knowledge Capture (K-CAP 2007), Whistler, Canada

Thesaurus-based indexing is a common approach for increasing the performance of document retrieval. With the growing number of documents available, manual indexing is not a feasible option. Statistical methods for automated document indexing are an attractive alternative. We argue that the quality of the thesaurus used as a basis for indexing, in particular its ability to adequately cover the contents to be indexed, is of crucial importance in automatic indexing because there is no human in the loop who can spot and avoid indexing errors. We propose a method for thesaurus evaluation that combines statistical measures with appropriate visualization techniques to support the detection of potential problems in a thesaurus. We describe this method and show its application in the context of two automatic indexing tasks. The examples show that the method indeed eases the detection and correction of errors, leading to a better indexing result.
