Electronic International Standard Serial Number (EISSN)
1873-6769
abstract
Topic modeling has been extensively used across disciplines to extract knowledge from text collections for policy design, implementation, and monitoring. However, flat topic models are limited in their ability to group overly broad topics with more specific ones. Hierarchical topic models (HTMs) address this limitation by offering thematic analysis at different levels of granularity, but existing HTMs rely on complex implementations, which complicates hyperparameter tuning and topic curation guided by domain experts. This paper introduces two novel algorithms for hierarchical topic modeling: HTM-WS (HTM with word selection) and HTM-DS (HTM with document selection). These expert-in-the-loop methods enable domain experts to refine topic hierarchies by selecting subtopics for further partitioning. They are model-agnostic (compatible with both Bayesian and neural topic models) and require minimal hyperparameter tuning, comparable to that of flat topic models. Both algorithms are integrated into a tool for training and exploiting topic models, developed in the context of the IntelComp project. Experiments on three datasets of scientific papers demonstrate the effectiveness of the algorithms through quantitative evaluation with automatic metrics and qualitative assessment via human judgment, including comparisons both between the two methods and against existing hierarchical topic models from the literature.
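The abstract does not spell out how document selection and word selection build the input for a second-level model, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a flat model has already produced per-document topic proportions and a word-to-topic assignment; the function names, the `threshold` parameter, and the toy data are all hypothetical.

```python
from typing import Dict, List, Sequence


def select_documents(doc_topic: Sequence[Sequence[float]],
                     topic: int,
                     threshold: float = 0.5) -> List[int]:
    """Document selection (assumed reading): keep the indices of documents
    whose proportion for the expert-chosen topic is at least `threshold`."""
    return [i for i, props in enumerate(doc_topic) if props[topic] >= threshold]


def select_words(docs: Sequence[Sequence[str]],
                 word_topic: Dict[str, int],
                 topic: int) -> List[List[str]]:
    """Word selection (assumed reading): keep in every document only the
    tokens assigned to the expert-chosen topic; drop documents left empty."""
    kept = []
    for tokens in docs:
        filtered = [w for w in tokens if word_topic.get(w) == topic]
        if filtered:
            kept.append(filtered)
    return kept


# Toy corpus and hypothetical outputs of a first-level flat topic model.
docs = [
    ["topic", "model", "lda"],
    ["policy", "design", "model"],
    ["neural", "topic", "policy"],
]
doc_topic = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
word_topic = {"topic": 0, "model": 0, "lda": 0, "neural": 0,
              "policy": 1, "design": 1}

# The expert picks topic 0 for further partitioning.
ds_indices = select_documents(doc_topic, topic=0, threshold=0.5)
ds_subcorpus = [docs[i] for i in ds_indices]
ws_subcorpus = select_words(docs, word_topic, topic=0)
```

Under this reading, either subcorpus would then be passed to any flat topic model (Bayesian or neural) to train the subtopics of the selected topic, which is what would keep the procedure model-agnostic and its tuning comparable to that of a flat model.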