Electronic International Standard Serial Number (EISSN)
1095-8363
abstract
Word Sense Disambiguation (WSD) is a fundamental task useful for Information Retrieval, Information Extraction, web search, and indexing, among others. Several works in the literature address the generic WSD task, but in recent years domain-specific WSD has attracted the attention of several researchers. In this context, this paper describes an approach for domain-specific WSD that selects the predominant sense (a synset from WordNet) of ambiguous words. To achieve this, the method uses two corpora: the domain-specific test corpus (containing the target ambiguous words) and a domain-specific auxiliary corpus (obtained by using relevant words from the domain-specific test corpus). The approach has four main stages: (1) auxiliary corpus generation; (2) related features extraction (from the auxiliary corpus); (3) test features extraction (from the test corpus); and (4) features integration. The proposed approach has been tested on domain-specific corpora (Sports and Finance) and on one balanced corpus, the BNC. Even though our WSD approach showed some limitations when dealing with the general-domain corpus, the results obtained for the domain-specific corpora, which are our main interest, were better than those reported in previous works.
Classification
keywords
domain-specific word sense disambiguation; wordnet; synset; context