Comparative evaluation of link-based approaches for candidate ranking in link-to-Wikipedia systems

publication date

  • April 2014

start page

  • 733

end page

  • 773

volume

  • 49

International Standard Serial Number (ISSN)

  • 1076-9757

Electronic International Standard Serial Number (EISSN)

  • 1943-5037

abstract

  • In recent years, the task of automatically linking pieces of text (anchors) mentioned in a document to the Wikipedia articles that represent their meaning has received extensive research attention. Typically, link-to-Wikipedia systems first find a set of Wikipedia articles that are candidates to represent the meaning of an anchor and then rank these candidates to select the most appropriate one. In this ranking process, the systems rely on context information obtained from the document where the anchor is mentioned and/or from Wikipedia. In this paper we focus on the use of Wikipedia links as context information. In particular, we review several state-of-the-art candidate ranking approaches that rely on Wikipedia link information. In addition, we provide a comparative empirical evaluation of these approaches on five corpora: the TAC 2010 corpus and four corpora built from actual Wikipedia articles and news items.
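To make the idea of link-based candidate ranking concrete, the sketch below scores each candidate article by its relatedness to context articles using the Wikipedia Link-based Measure of Milne and Witten (2008), which compares the sets of articles linking to two pages. This is an illustrative assumption only: this record does not say which measures the paper evaluates or how they are aggregated. The helper names (`link_relatedness`, `rank_candidates`), the averaging over context articles, and the toy in-link data are all hypothetical.

```python
import math

def link_relatedness(inlinks_a, inlinks_b, total_articles):
    """Milne-Witten style relatedness in [0, 1] from shared incoming links.

    Assumes total_articles is larger than either in-link set.
    """
    a, b = set(inlinks_a), set(inlinks_b)
    common = a & b
    if not common:
        return 0.0  # no shared in-links: treat as unrelated
    big, small = max(len(a), len(b)), min(len(a), len(b))
    denominator = math.log(total_articles) - math.log(small)
    if denominator <= 0:
        return 0.0  # degenerate case outside the stated assumption
    distance = (math.log(big) - math.log(len(common))) / denominator
    return max(0.0, 1.0 - distance)

def rank_candidates(candidates, context_articles, inlinks, total_articles):
    """Hypothetical ranking: order candidates by mean relatedness to the context."""
    scores = {}
    for cand in candidates:
        rels = [
            link_relatedness(inlinks.get(cand, ()), inlinks.get(ctx, ()), total_articles)
            for ctx in context_articles
        ]
        scores[cand] = sum(rels) / len(rels) if rels else 0.0
    return sorted(candidates, key=lambda c: scores[c], reverse=True)

# Toy data (invented article IDs): which "Jaguar" fits a wildlife context?
inlinks = {
    "Jaguar_(animal)": {1, 2, 3, 4, 5},
    "Jaguar_Cars": {6, 7, 8, 9},
    "Leopard": {1, 2, 3, 10},
    "Amazon_rainforest": {2, 4, 11, 12},
}
ranking = rank_candidates(
    ["Jaguar_Cars", "Jaguar_(animal)"], ["Leopard", "Amazon_rainforest"], inlinks, 1000
)
print(ranking)  # ['Jaguar_(animal)', 'Jaguar_Cars']
```

Averaging is only one way to combine per-context scores; a real system might weight context articles or combine link evidence with textual features.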

keywords

  • semantic relatedness; web search; WordNet