'A priori' Shapley data value estimation

authors

published in

Pattern Analysis and Applications (journal)

publication date

April 2025

start page

1

end page

19

issue

2

volume

28

Digital Object Identifier (DOI)

https://doi.org/10.1007/s10044-025-01454-5

full text

https://hdl.handle.net/10016/48049

International Standard Serial Number (ISSN)

1433-7541

Electronic International Standard Serial Number (EISSN)

1433-755X

abstract

Distributed machine learning approaches are required when training data cannot be collected in a central location, due to storage, transmission or privacy/security constraints. An important task in any distributed machine learning context, and Federated Learning is no exception, is data value estimation or credit allocation, where the goal is to reward each participant proportionally to their contribution to the final performance of the machine learning model. However, all existing data value estimation techniques require that training be completed before the data values are obtained, and in this sense they can be considered as 'a posteriori' approaches. Thus, all potential contributors must participate in the training process, regardless of the quality of their data or the final reward they can obtain. Here we present an 'a priori' Shapley data value estimation technique in which, based on some statistical measures provided by the participants, the central counterpart or aggregator can obtain reasonably accurate data value estimates before actually starting the distributed learning process. To the best of our knowledge, this is the first 'a priori' data value estimation approach proposed in the literature, and it can be used for the pre-selection of participants or to implement new pricing schemes. The introduced algorithms have been benchmarked using a variety of datasets and a logistic regression model, and we show that our 'a priori' estimates are very accurate, compared to the centralized Shapley data values.
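For readers unfamiliar with the reference quantity mentioned in the abstract, the following is a minimal sketch of the standard, exact Shapley value computation for a small number of participants. It is not the 'a priori' technique of the paper, which the abstract does not specify; it only illustrates the 'a posteriori' baseline in which every coalition's utility must already be known. The function name `shapley_values`, the participant labels "A" and "B", and the accuracy figures in `TOY_UTILITY` are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley values for a small list of players.

    `utility` maps a frozenset of players to a real-valued score
    (for example, the accuracy of a model trained on their pooled data).
    The cost grows exponentially with the number of players.
    """
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


# Hypothetical utilities (illustrative numbers, not results from the paper).
TOY_UTILITY = {
    frozenset(): 0.50,
    frozenset({"A"}): 0.70,
    frozenset({"B"}): 0.60,
    frozenset({"A", "B"}): 0.80,
}


def toy_utility(coalition):
    return TOY_UTILITY[frozenset(coalition)]


values = shapley_values(["A", "B"], toy_utility)
print(values)
```

The values sum to the gain of the full coalition over the empty one, which is the efficiency property that makes Shapley values a natural basis for credit allocation. Evaluating `utility` on every coalition requires training a model per coalition, which is the cost an 'a priori' estimate aims to avoid.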
