Automatic learning framework for pharmaceutical record matching

authors

published in

IEEE Access Journal

publication date

September 2020

start page

171754

end page

171770

volume

8

Digital Object Identifier (DOI)

https://doi.org/10.1109/access.2020.3024558

full text

http://hdl.handle.net/10016/33564

Electronic International Standard Serial Number (EISSN)

2169-3536

abstract

Pharmaceutical manufacturers need to analyse a vast number of products in their daily activities. The same product is often registered several times by different systems using different attributes, yet these companies require accurate, high-quality information about their products because the products are drugs. The central hypothesis of this research work is that machine learning can be applied to this domain to efficiently merge different data sources and match the records related to the same product. Manual matching is not feasible because the number of records to be matched is extremely high. This article presents a framework for pharmaceutical record matching based on machine learning techniques in a big data environment. The proposed framework aims to exploit the well-known rules for matching records from different databases in order to train machine learning models. The trained models are then evaluated by predicting matches for records that do not follow these known rules. Finally, the production environment is simulated by generating a huge number of record combinations and predicting the matches. The results show that, despite the good results obtained with the training datasets, the average accuracy of the best model in the production environment is around 85%. This shows that matches that do not follow the known rules can be predicted and, considering that this amount of data cannot be processed manually, the results are promising.
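
The workflow outlined in the abstract (label record pairs with a known rule, train on those pairs, then predict matches for records the rule cannot cover) can be illustrated with a toy sketch. This is not the authors' framework: the records, the "identical product code" rule, the string-similarity feature and the single learned threshold are all assumptions made for illustration, standing in for the real data sources and machine learning models.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical records from two systems: (product code, free-text name).
SOURCE_A = [
    ("A100", "Ibuprofen 400 mg tablets"),
    ("A200", "Paracetamol 500 mg tablets"),
    ("A300", "Amoxicillin 250 mg capsules"),
]
SOURCE_B = [
    ("A100", "ibuprofen 400mg tablet"),
    ("A200", "paracetamol 500mg tablet"),
    ("", "amoxicillin 250mg capsule"),  # no code: the known rule cannot apply
]


def similarity(name_a, name_b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def rule_label(rec_a, rec_b):
    """Assumed known rule: identical product codes mean the same product."""
    return rec_a[0] == rec_b[0]


def train_threshold(pairs):
    """Learn a similarity cut-off from rule-labelled pairs.

    The cut-off is the midpoint between the lowest-scoring match and the
    highest-scoring non-match (a stand-in for a real trained model).
    """
    pos = [similarity(a[1], b[1]) for a, b in pairs if rule_label(a, b)]
    neg = [similarity(a[1], b[1]) for a, b in pairs if not rule_label(a, b)]
    return (min(pos) + max(neg)) / 2


# 1. Training set: only pairs in which both records carry a code,
#    so the known rule gives a trustworthy label.
coded_b = [rec for rec in SOURCE_B if rec[0]]
training_pairs = list(product(SOURCE_A, coded_b))
threshold = train_threshold(training_pairs)

# 2. Prediction: generate all combinations involving records the rule
#    cannot label, and keep those scoring above the learned cut-off.
uncoded_b = [rec for rec in SOURCE_B if not rec[0]]
predicted = [
    (a, b)
    for a, b in product(SOURCE_A, uncoded_b)
    if similarity(a[1], b[1]) > threshold
]

for rec_a, rec_b in predicted:
    print(f"{rec_a[1]!r} matches {rec_b[1]!r}")
```

Training only on pairs where both records carry a code keeps unlabelled true matches out of the negative examples. A realistic setup would use many features per pair and a proper classifier rather than one threshold.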
