Semtinel: Thesaurus Analysis Beyond Numbers

Semtinel is one of the main contributions of my doctoral dissertation, which is available online and describes the approach in detail.

What is Semtinel?

Semtinel is thesaurus analysis software that provides various visualisation and analysis techniques for monitoring and improving the quality of your thesaurus. Its main users are therefore thesaurus developers and developers of applications that use a thesaurus internally. The Semtinel Workbench is released under the GPL on SourceForge. Visit us there: Semtinel on SourceForge.

Release note in my blog plus external reception:

Further information


Thesaurus-based indexing is a common approach for improving the performance of document retrieval. With the growing number of documents available, manual indexing is no longer a feasible option, and statistical methods for automated document indexing are an attractive alternative. We argue that in automatic indexing the quality of the underlying thesaurus, in particular its ability to adequately cover the contents to be indexed, is of crucial importance, because there is no human in the loop who can spot and avoid indexing errors.

With Semtinel, we are developing a framework that enables a human expert to supervise the automatic indexing process interactively and effectively. The Semtinel Framework provides various analysis approaches that combine statistical measures with appropriate visualisation techniques. The goal is to detect potential problems in a thesaurus that affect the quality of the document annotations.
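To make the idea of a statistical measure over a thesaurus more concrete, here is a minimal sketch of one possible measure: the information content (IC) of a concept, computed once from annotation counts and once from a structural baseline, with the difference between the two as the value of interest. This is an illustration under stated assumptions, not Semtinel's actual implementation. The toy thesaurus, the annotation counts, the uniform baseline and all function names are hypothetical, and the hierarchy is assumed to be a simple tree in which every concept has at least one annotation in its subtree.

```python
import math

# Hypothetical toy thesaurus: concept -> list of narrower concepts (a tree).
NARROWER = {
    "economics": ["finance", "trade"],
    "finance": ["banking"],
    "trade": [],
    "banking": [],
}

# Hypothetical number of documents annotated directly with each concept.
ANNOTATIONS = {"economics": 2, "finance": 1, "trade": 10, "banking": 3}


def subtree(concept):
    """Return the concept together with all of its narrower concepts."""
    result = [concept]
    for child in NARROWER[concept]:
        result.extend(subtree(child))
    return result


def information_content(weights, root):
    """IC(c) = -log2 of the share of the total weight found in c's subtree.

    Assumes every subtree has a positive weight; a zero weight would
    require smoothing, which this sketch leaves out.
    """
    total = sum(weights[c] for c in subtree(root))
    return {
        c: -math.log2(sum(weights[d] for d in subtree(c)) / total)
        for c in subtree(root)
    }


def ic_difference(root):
    """Annotation-based IC minus the IC under a uniform baseline."""
    observed = information_content(ANNOTATIONS, root)
    # Baseline assumption: every concept carries the same weight.
    expected = information_content({c: 1 for c in NARROWER}, root)
    return {c: observed[c] - expected[c] for c in observed}


if __name__ == "__main__":
    for concept, diff in ic_difference("economics").items():
        print(f"{concept:10s} {diff:+.3f}")
```

In this sketch, a negative difference marks a concept that is used more often than the baseline suggests, and a positive difference marks one that is used less often. Values of this kind could then be mapped to colours in a visualisation of the hierarchy, so that an expert can spot conspicuous regions of the thesaurus at a glance.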

More information:


These demos are outdated and were created with a very early prototype.

STW Analysis

In this presentation, we use IC Difference Analysis to gain some insight into the German Economic Standard Thesaurus (Standard Thesaurus Wirtschaft, STW). Click here (Flash, JavaScript and pop-ups required).

MeSH Analysis

This presentation resembles the previous analysis, but here the Medical Subject Headings (MeSH) are used as the thesaurus. Click here (Flash, JavaScript and pop-ups required).