Bridging from syntactic to statistical methods: Classification with automatically segmented features from sequences
Articles
Overview
published in
- PATTERN RECOGNITION Journal
publication date
- November 2015
start page
- 3749
end page
- 3756
issue
- 11
volume
- 48
International Standard Serial Number (ISSN)
- 0031-3203
Electronic International Standard Serial Number (EISSN)
- 1873-5142
abstract
- To integrate the benefits of statistical methods into syntactic pattern recognition, a Bridging Approach is proposed: (i) acquisition of a grammar per recognition class; (ii) comparison of the obtained grammars to find substructures of interest, represented as sequences of terminal and/or non-terminal symbols, and filling the feature vector with their counts; (iii) hierarchical feature selection and hierarchical classification, deducing and accounting for the domain taxonomy. The bridging approach retains the benefits of syntactic methods: it preserves structural relations and gives insights into the problem. Yet it does not require distance calculations and thus avoids a non-trivial, task-dependent design step. Instead, it relies on statistical classification from many features. Our experiments concern a difficult problem of chemical toxicity prediction. The code and the data set are open-source. © 2015 Elsevier Ltd. All rights reserved.
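Step (ii) of the abstract, filling a feature vector with substructure counts, can be illustrated with a minimal sketch. It is not the authors' released code: it assumes each substructure is a contiguous run of symbols and that overlapping occurrences are counted, and the function names and toy SMILES-like tokens are hypothetical.

```python
from typing import List, Sequence, Tuple

Symbol = str
Pattern = Tuple[Symbol, ...]


def count_occurrences(sequence: Sequence[Symbol], pattern: Pattern) -> int:
    """Count contiguous (possibly overlapping) occurrences of pattern in sequence."""
    n, m = len(sequence), len(pattern)
    if m == 0 or m > n:
        return 0
    return sum(
        1 for i in range(n - m + 1) if tuple(sequence[i:i + m]) == pattern
    )


def feature_vector(
    sequence: Sequence[Symbol], substructures: Sequence[Pattern]
) -> List[int]:
    """Build one count per substructure, in the order the substructures are given."""
    return [count_occurrences(sequence, p) for p in substructures]


# Hypothetical toy data: substructures as symbol sequences, one tokenized input.
substructures: List[Pattern] = [("C", "C"), ("C", "=", "O"), ("N",)]
sequence = ["C", "C", "C", "=", "O", "N"]

vector = feature_vector(sequence, substructures)
print(vector)
```

The resulting count vectors could then be passed to any statistical classifier, which is the sense in which the approach needs no task-specific distance between structures.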
Classification
keywords
- syntactic pattern recognition; grammatical inference; feature segmentation; SMILES parser; feature extraction