Bridging from syntactic to statistical methods: Classification with automatically segmented features from sequences

publication date

  • November 2015

start page

  • 3749

end page

  • 3756

issue

  • 11

volume

  • 48

International Standard Serial Number (ISSN)

  • 0031-3203

Electronic International Standard Serial Number (EISSN)

  • 1873-5142

abstract

  • To integrate the benefits of statistical methods into syntactic pattern recognition, a Bridging Approach is proposed: (i) acquisition of a grammar per recognition class; (ii) comparison of the obtained grammars to find substructures of interest, represented as sequences of terminal and/or non-terminal symbols, and filling the feature vector with their counts; (iii) hierarchical feature selection and hierarchical classification, deducing and accounting for the domain taxonomy. The Bridging Approach retains the benefits of syntactic methods: it preserves structural relations and gives insight into the problem. Yet it does not involve distance calculations and thus saves a non-trivial, task-dependent design step; instead, it relies on statistical classification from many features. Our experiments concern the difficult problem of chemical toxicity prediction. The code and the data set are open source. (C) 2015 Elsevier Ltd. All rights reserved.
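As a rough illustration of step (ii), filling the feature vector with substructure counts, the following minimal sketch assumes the substructures of interest have already been found by comparing the class grammars and are given as tuples of symbols. It simplifies heavily: it counts contiguous occurrences in a flat token sequence only (no non-terminals or parse trees), and the token list and patterns are hypothetical examples, not the authors' released code or data.

```python
from typing import List, Sequence, Tuple


def count_occurrences(tokens: Sequence[str], pattern: Tuple[str, ...]) -> int:
    """Count contiguous (possibly overlapping) occurrences of `pattern` in `tokens`."""
    n, m = len(tokens), len(pattern)
    if m == 0 or m > n:
        return 0
    return sum(1 for i in range(n - m + 1) if tuple(tokens[i:i + m]) == pattern)


def feature_vector(tokens: Sequence[str], patterns: List[Tuple[str, ...]]) -> List[int]:
    """One count per substructure of interest, in a fixed pattern order.

    Assumption: `patterns` is a hypothetical stand-in for the substructures
    obtained by grammar comparison; here they are plain terminal sequences.
    """
    return [count_occurrences(tokens, p) for p in patterns]


if __name__ == "__main__":
    # Hypothetical tokenization of the SMILES string "CC(=O)O" (acetic acid).
    tokens = ["C", "C", "(", "=", "O", ")", "O"]
    patterns = [("C", "C"), ("=", "O"), ("O",)]
    print(feature_vector(tokens, patterns))  # [1, 1, 2]
```

The resulting fixed-length count vectors can then be handed to any statistical classifier, which is the point of the bridge: no distance function between sequences has to be designed.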

keywords

  • syntactic pattern recognition; grammatical inference; feature segmentation; SMILES parser; feature extraction