Topic models for decision-support systems. Part I: Training, representation and exploitation of topic models

Analyzing document collections in Science, Technology, and Innovation (STI) is essential for informed policy-making. Advances in topic modeling offer powerful tools to uncover key themes within large, heterogeneous STI datasets that can inform decision-making. However, challenges such as aligning model outputs with expert knowledge, determining the desired topic granularity, and addressing model variability limit their widespread adoption. This paper presents the Topic Analysis and Search Engine (tase), a platform developed within the European IntelComp project to address these limitations. tase combines Bayesian and neural topic modeling techniques within a unified framework that incorporates an expert-in-the-loop approach for training and curating models. This framework is further enhanced with a Solr-based exploitation tool, featuring an innovative indexing method and a novel criterion for document retrieval. These features enable efficient semantic similarity calculations and seamless integration into decision-support systems. We demonstrate tase's scalability and effectiveness through two real-world STI use cases, highlighting its potential for broader applications. The software is freely available as open source under the MIT license.

topic modeling; user-in-the-loop; latent Dirichlet allocation; neural topic modeling; semantic similarity; decision-support systems
