Electronic International Standard Serial Number (EISSN)
1573-7675
abstract
Web Query Interfaces (WQIs) play a key role in retrieving Deep Web content. They let users query domain-specific databases for information of interest in domains such as car rentals, hotels, and airfare. As the number of WQIs on the web grows rapidly, some research efforts focus on building a single, unified WQI that lets users query and integrate the information available in different web databases within a specific domain. A central task in this integration process is the extraction, modeling, and understanding of the semantic content of WQIs. This task is challenging because WQI designs are highly heterogeneous. This paper presents a novel tree-based approach for modeling and understanding WQIs. A tree schema called the Visual Reduced Tree (VR-Tree) is built from the tree produced by a web browser's render engine by applying a set of well-defined functions, guided by a set of heuristic rules, to identify the main components of a WQI and their relationships. The proposed strategy was evaluated through a collection of experiments over the Tel-8 and ICQ datasets from the UIUC repository. The results show that WQIs can be modeled automatically with a high degree of precision compared with previous approaches, and that the VR-Tree schema proposed in this work simplifies the modeling task by considering only the visual and spatial properties of WQI components.
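To make the idea of reducing a render tree more concrete, the following is a minimal sketch and not the VR-Tree construction itself, whose functions and heuristic rules are not given in this abstract. The `Node` class, the `FIELD_TAGS` set, and the three reduction rules (drop invisible or zero-area nodes, drop empty containers, collapse containers with a single child) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical render-tree node: a tag, a visibility flag, and a box."""
    tag: str
    visible: bool = True
    box: Tuple[int, int, int, int] = (0, 0, 0, 0)  # x, y, width, height
    children: List["Node"] = field(default_factory=list)


# Assumed set of tags treated as meaningful WQI components.
FIELD_TAGS = {"input", "select", "textarea", "label"}


def reduce_tree(node: Node) -> Optional[Node]:
    """Return a reduced copy of `node`, or None if nothing useful remains."""
    _, _, width, height = node.box
    # Assumed rule 1: drop nodes that are hidden or occupy no area.
    if not node.visible or width <= 0 or height <= 0:
        return None
    kept = [r for r in (reduce_tree(c) for c in node.children) if r is not None]
    # Components of interest are always kept.
    if node.tag in FIELD_TAGS:
        return Node(node.tag, True, node.box, kept)
    # Assumed rule 2: drop containers that hold nothing of interest.
    if not kept:
        return None
    # Assumed rule 3: collapse a container that only wraps one child.
    if len(kept) == 1:
        return kept[0]
    return Node(node.tag, True, node.box, kept)


# A toy form: a label/input row, a hidden row, and a doubly wrapped select.
form = Node("form", box=(0, 0, 300, 200), children=[
    Node("div", box=(0, 0, 300, 30), children=[
        Node("label", box=(0, 0, 100, 30)),
        Node("input", box=(100, 0, 200, 30)),
    ]),
    Node("div", visible=False, box=(0, 30, 300, 30), children=[
        Node("input", box=(0, 30, 300, 30)),
    ]),
    Node("div", box=(0, 40, 300, 30), children=[
        Node("div", box=(0, 40, 300, 30), children=[
            Node("select", box=(0, 40, 300, 30)),
        ]),
    ]),
])

reduced = reduce_tree(form)
```

In this toy case the hidden row disappears and the two wrappers around the select collapse, so the reduced tree keeps only the groupings that a user would actually see. Relationships between components (for example, pairing a label with its field) would need further spatial rules that are not sketched here.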
Classification
keywords
web query interfaces; modeling; schema tree; render tree; heuristic rules