Tuning BART models to simplify Spanish health-related content

publication date

  • March 2023

start page

  • 111

end page

  • 122

issue

  • 70

International Standard Serial Number (ISSN)

  • 1135-5948

Electronic International Standard Serial Number (EISSN)

  • 1989-7553

abstract

  • Health literacy has become an increasingly important skill for citizens to make health-relevant decisions in modern societies. Technology to support text accessibility is needed to help people understand information about their health conditions. This paper presents a transfer learning approach implemented with BART (Bidirectional and Auto-Regressive Transformers), a sequence-to-sequence technique that is trained as a denoising autoencoder. To accomplish this task, pre-trained models have been fine-tuned to simplify Spanish texts. Since fine-tuning a language model requires sample data to adapt it to a new task, the process of creating a synthetic parallel dataset of Spanish health-related texts is also introduced in this paper. On the test set, the fine-tuned models reached SARI values of 59.7 for a multilingual BART (mBART) model and 29.74 for a pre-trained mBART model on the Spanish summary generation task. They also improved the readability of the original texts according to the Inflesz scale.
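
    As an illustration only, not the authors' code, the sketch below shows how an mBART checkpoint might be fine-tuned on a parallel complex/simple corpus with the Hugging Face transformers library, matching the setup the abstract describes. The checkpoint name, file names, column names ("complex"/"simple"), and hyperparameters are assumptions.

```python
# Illustrative sketch only: fine-tuning an mBART checkpoint for Spanish text
# simplification with Hugging Face transformers. The checkpoint, file names,
# column names, and hyperparameters are assumptions, not details from the paper.
from datasets import load_dataset
from transformers import (
    DataCollatorForSeq2Seq,
    MBartForConditionalGeneration,
    MBart50TokenizerFast,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

CHECKPOINT = "facebook/mbart-large-50"  # a public multilingual BART model

tokenizer = MBart50TokenizerFast.from_pretrained(
    CHECKPOINT, src_lang="es_XX", tgt_lang="es_XX"  # Spanish on both sides
)
model = MBartForConditionalGeneration.from_pretrained(CHECKPOINT)

# Hypothetical parallel corpus: each row pairs a complex health-related
# sentence with a simplified reference.
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def preprocess(batch):
    # Tokenize sources and targets; the targets become the training labels.
    inputs = tokenizer(batch["complex"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["simple"], max_length=512, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = data.map(preprocess, batched=True,
                     remove_columns=data["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="mbart-es-simplify",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=3e-5,
        predict_with_generate=True,  # generate text during evaluation
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),  # dynamic padding
    tokenizer=tokenizer,
)
trainer.train()
```

    After training, SARI scores of the kind the abstract reports can be computed with the evaluate library's "sari" metric, which compares model outputs against the original sources and reference simplifications.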

subjects

  • Computer Science

keywords

  • language models; lexical simplification; multilingual BART; Spanish