NLP-inspired structural pattern recognition in chemical application Articles
Overview
published in
- PATTERN RECOGNITION LETTERS Journal
publication date
- August 2014
start page
- 11
end page
- 16
volume
- 45
Digital Object Identifier (DOI)
International Standard Serial Number (ISSN)
- 0167-8655
Electronic International Standard Serial Number (EISSN)
- 1872-7344
abstract
- In this paper we report on a new structural pattern recognition approach for in silico prediction of chemical activity. It is based on grammatical inference on strings representing chemical compounds, and on the string edit distance between a chemical compound and a formal grammar generalizing an activity class. In the late 1980s Weininger published a chemical language with a very simple and natural grammar. Recently, algorithms suitable for processing this language have been developed. Modeling chemical activity with formal grammars, and chemical compounds as words, yields a way to search for "structural alerts", that is, molecular substructures and their combinatorial patterns that cause a molecule to have properties of interest. A biodegradability prediction system has been constructed to serve as an example throughout the paper. The source code and various files from the experiment are available from the corresponding author on request. © 2014 Elsevier B.V. All rights reserved.
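
The edit-distance idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' method or code. It makes a loud simplification: instead of an inferred formal grammar, each activity class is represented by a small finite set of SMILES strings, and the distance from a compound to the class is the smallest Levenshtein distance to any member. The function names (`edit_distance`, `distance_to_class`, `predict`), the class labels, and the example strings are all hypothetical, and the class labels carry no chemical meaning. Distances are computed per character, so a two-letter atom symbol such as `Cl` counts as two symbols, which a real SMILES tokenizer would avoid.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insert, delete, substitute."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting i characters of a to reach the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def distance_to_class(smiles: str, class_strings: list[str]) -> int:
    """Assumption: a finite string set stands in for the class grammar's language."""
    return min(edit_distance(smiles, s) for s in class_strings)


def predict(smiles: str, classes: dict[str, list[str]]) -> str:
    """Return the label of the class whose string set is nearest to the compound."""
    return min(classes, key=lambda name: distance_to_class(smiles, classes[name]))


# Hypothetical toy classes: two aliphatic alcohols and two aromatic compounds.
classes = {
    "class_a": ["CCO", "CCCO"],            # ethanol, 1-propanol
    "class_b": ["c1ccccc1", "Cc1ccccc1"],  # benzene, toluene
}

if __name__ == "__main__":
    print(predict("CCCCO", classes))  # 1-butanol
```

Replacing the finite string set with a grammar would require an error-correcting parser that computes the minimum number of edits needed for the grammar to accept the string; the nearest-string version above only conveys the shape of the decision rule.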
Classification
keywords
- structural pattern recognition; grammar inference; natural language processing; chemical descriptors; SMILES; activity prediction; graph kernels; language; design