Supervised machine learning using encrypted training data

Preservation of privacy in data mining and machine learning has emerged as an absolute prerequisite in many practical scenarios, especially when the processing of sensitive data is outsourced to an external third party. Currently, privacy preservation methods are mainly based on randomization and/or perturbation, secure multiparty computations and cryptographic methods. In this paper, we take advantage of the partial homomorphic property of some cryptosystems to train simple machine learning models with encrypted data. Our basic scenario has three parties: multiple Data Owners, which provide encrypted training examples; the Algorithm Owner (or Application), which processes them to adjust the parameters of its models; and a semi-trusted third party, which provides privacy and secure computation services to the Application in some operations not supported by the homomorphic cryptosystem. In particular, we focus on two issues: the use of multiple-key cryptosystems, and the impact of the quantization of real-valued input data required before encryption. In addition, we develop primitives based on the outsourcing of a reduced set of operations that allows to implement general machine learning algorithms using efficient dedicated hardware. As applications, we consider the training of classifiers using privacy-protected data and the tracking of a moving target using encrypted distance measurements.

Supervised machine learning using encrypted training data Articles