VR-Tree: A Novel Tree-based Approach for Modeling Web Query Interfaces

authors

  • Marin Castro, Heidy Marisol
  • Sosa Sosa, Victor Jesus

publication date

  • February 2017

start page

  • 367

end page

  • 390

issue

  • 3

volume

  • 49

International Standard Serial Number (ISSN)

  • 0925-9902

Electronic International Standard Serial Number (EISSN)

  • 1573-7675

abstract

  • Web Query Interfaces (WQIs) play a very important role in retrieving Deep Web content. WQIs allow users to query domain-specific databases for obtaining information of interest from diverse domains such as car rentals, hotels, airfare, etc. As the number of WQIs on the web is increasing drastically, some research efforts are focused on building a single (unified) WQI that allows users to query and integrate information available in different web databases related to a specific domain. A very important task in this WQIs' integration process is the extraction, modeling and understanding of WQIs' semantic content. However, this task is challenging because of the great heterogeneity in the design of WQIs. This paper presents a novel tree-based approach for the modeling and understanding of WQIs. A tree schema called the Visual Reduced Tree (VR-Tree) is built from the tree produced by a web browser's render engine, applying a set of well-defined functions and guided by a set of heuristic rules to identify the WQI's main components and their relationships. The proposed strategy was evaluated by running a collection of experiments over the Tel-8 and ICQ datasets from the UIUC repository. The results show that the automatic modeling of WQIs is possible with a high degree of precision if compared against previous approaches, simplifying the modeling task by only considering visual and spatial properties of WQI components using the VR-Tree schema proposed in this work.

keywords

  • web query interfaces; modeling; schema tree; render tree; heuristic rules