NLP-inspired structural pattern recognition in chemical application

authors

  • SIDOROVA, YULIA
  • ANISIMOVA, M.

publication date

  • August 2014

start page

  • 11

end page

  • 16

volume

  • 45

International Standard Serial Number (ISSN)

  • 0167-8655

Electronic International Standard Serial Number (EISSN)

  • 1872-7344

abstract

  • In this paper we report on a new structural pattern recognition approach for in silico prediction of chemical activity. It is based on grammatical inference on strings representing chemical compounds, and on the string edit distance between a chemical compound and a formal grammar that generalizes an activity class. In the late 1980s Weininger published a chemical language, SMILES, with a very simple and natural grammar, and algorithms suitable for processing this language have recently been developed. Modeling chemical activity with formal grammars, and chemical compounds as words, makes it possible to search for "structural alerts": molecular substructures and combinatorial patterns of substructures that cause a molecule to have properties of interest. A biodegradability prediction system serves as a running example throughout the paper. The source code and various files from the experiment are available from the corresponding author on request. © 2014 Elsevier B.V. All rights reserved.
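The abstract's central quantity, the edit distance between a compound's string and a grammar, can be illustrated with a small sketch. It assumes the grammar is regular and given as a hypothetical finite automaton over SMILES characters (the abstract does not say which grammar class the authors infer). It computes the minimum number of single-character insertions, deletions and substitutions needed to turn the string into some word of the grammar's language. This uses a standard shortest-path formulation (Dijkstra over pairs of input position and automaton state) and is not necessarily the authors' algorithm. The function name `edit_distance_to_language` and the toy automaton are illustrative only.

```python
import heapq
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical automaton encoding: state -> list of (symbol, next_state).
Transitions = Dict[int, List[Tuple[str, int]]]


def edit_distance_to_language(s: str, delta: Transitions, start: int,
                              accepting: FrozenSet[int]) -> int:
    """Minimum unit-cost insertions, deletions and substitutions that turn s
    into some string accepted by the automaton (start, delta, accepting).

    Dijkstra over nodes (i, q): the first i characters of s are consumed
    and the automaton is in state q. All edge costs are 0 or 1.
    """
    n = len(s)
    dist = {(0, start): 0}
    heap = [(0, 0, start)]
    while heap:
        d, i, q = heapq.heappop(heap)
        if d > dist.get((i, q), float("inf")):
            continue  # stale heap entry
        if i == n and q in accepting:
            return d  # first accepting node popped has minimal cost
        moves = []
        if i < n:
            moves.append((i + 1, q, 1))  # delete s[i]
        for sym, q2 in delta.get(q, []):
            moves.append((i, q2, 1))  # insert sym
            if i < n:
                # match s[i] with sym (cost 0) or substitute it (cost 1)
                moves.append((i + 1, q2, 0 if sym == s[i] else 1))
        for i2, q2, c in moves:
            nd = d + c
            if nd < dist.get((i2, q2), float("inf")):
                dist[(i2, q2)] = nd
                heapq.heappush(heap, (nd, i2, q2))
    raise ValueError("no accepting state is reachable")


# Hypothetical toy "activity class" C+(=O)O: a carbon chain ending in a
# carboxylic acid group. This is NOT a grammar taken from the paper.
TOY_DELTA: Transitions = {
    0: [("C", 1)],
    1: [("C", 1), ("(", 2)],
    2: [("=", 3)],
    3: [("O", 4)],
    4: [(")", 5)],
    5: [("O", 6)],
}
TOY_ACCEPT = frozenset({6})

for smiles in ["CC(=O)O", "CC(=O)N", "CC(O)O", ""]:
    print(repr(smiles), edit_distance_to_language(smiles, TOY_DELTA, 0, TOY_ACCEPT))
# 'CC(=O)O' 0  (acetic acid is in the language)
# 'CC(=O)N' 1  (one substitution)
# 'CC(O)O' 1   (one insertion of '=')
# '' 6         (insert the shortest word, C(=O)O)
```

A compound with a small distance to a class grammar would be a candidate member of that class; with several class grammars, one plausible rule is to predict the class of the nearest one. Character-level edits are a simplification here, since a single atom such as `Cl` spans two SMILES characters.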

keywords

  • structural pattern recognition; grammar inference; natural language processing; chemical descriptors; SMILES; activity prediction; graph kernels; language; design