Improving Selection of Synsets from WordNet for Domain-Specific Word Sense Disambiguation

authors

  • LOPEZ AREVALO, IVAN
  • SOSA SOSA, VICTOR JESUS
  • Rojas Lopez, Franco
  • Tello Lea, Edgar

publication date

  • January 2017

start page

  • 128

end page

  • 145

volume

  • 41

International Standard Serial Number (ISSN)

  • 0885-2308

Electronic International Standard Serial Number (EISSN)

  • 1095-8363

abstract

  • Word Sense Disambiguation (WSD) is a fundamental task for Information Retrieval, Information Extraction, web search, and indexing, among other applications. The literature contains several works on the generic WSD task, but in recent years domain-specific WSD has attracted the attention of several researchers. This paper describes an approach to domain-specific WSD that selects the predominant sense (a synset from WordNet) of ambiguous words. To achieve this, the method uses two corpora: the domain-specific test corpus (containing the target ambiguous words) and a domain-specific auxiliary corpus (obtained by using relevant words from the domain-specific test corpus). The approach has four main stages: (1) auxiliary corpus generation; (2) related features extraction (from the auxiliary corpus); (3) test features extraction (from the test corpus); and (4) features integration. The proposed approach has been tested on two domain-specific corpora (Sports and Finance) and on one balanced corpus, the British National Corpus (BNC). Even though our WSD approach showed some limitations on the general-domain corpus, the results obtained on the domain-specific corpora, which are our main interest, were better than those reported in previous works.
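  • The abstract names the four stages but does not say how features are extracted, weighted, or integrated. As a rough illustration only, the Python sketch below shows one way stages (2) to (4) could fit together. It is not the authors' method. Everything in it is an assumption: the toy sense inventory `SENSES` (standing in for WordNet synsets), the window-based co-occurrence features, the fixed integration weight, and the overlap-based scoring. Stage (1), auxiliary corpus generation, is not simulated; the auxiliary texts are simply hard-coded.

```python
from collections import Counter

# Hypothetical toy sense inventory standing in for WordNet synsets:
# each synset ID maps to a small "signature" of words tied to that sense.
SENSES = {
    "bank.n.01": {"river", "slope", "land", "water", "edge"},
    "bank.n.02": {"financial", "institution", "deposit", "money", "loan"},
}


def context_features(texts, target, window=3):
    """Count words occurring within `window` tokens of `target` (assumed feature type)."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for t in tokens[lo:hi] if t != target)
    return counts


def integrate(related, test, weight=0.5):
    """Merge auxiliary-corpus and test-corpus features with a fixed, assumed weight."""
    merged = Counter()
    for word, c in related.items():
        merged[word] += weight * c
    for word, c in test.items():
        merged[word] += (1 - weight) * c
    return merged


def predominant_sense(senses, features):
    """Return the sense whose signature has the highest weighted feature overlap."""
    return max(senses, key=lambda s: sum(features[w] for w in senses[s]))


# Hypothetical example data: a tiny "auxiliary corpus" and "test corpus".
auxiliary = [
    "the bank approved the loan after the deposit cleared",
    "money moved from one bank to another financial institution",
]
test = ["she opened an account at the bank to deposit money"]

features = integrate(context_features(auxiliary, "bank"), context_features(test, "bank"))
print(predominant_sense(SENSES, features))  # bank.n.02
```

    A real system would draw sense signatures from WordNet itself (for example, synset glosses and related lemmas) and build the auxiliary corpus automatically from relevant test-corpus words; the fixed 0.5 weight here only marks where an integration strategy would plug in.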

keywords

  • domain-specific word sense disambiguation; wordnet; synset; context