
Master's Thesis Ahmed Saidani


A Systematic Comparison of Federated Machine Learning Libraries

In traditional machine learning, a central server gathers the data of different clients and uses it to train a model centrally. Data owners lose their data sovereignty and place their data privacy in the hands of this central entity. This inhibits the willingness to share data voluntarily, which potentially leads to the ever-persisting bottleneck of insufficient training data in ML systems. Federated machine learning, on the other hand, is a novel machine learning paradigm that addresses this issue (Shaheen et al., 2022). In federated machine learning, the central server decides on a machine learning model and distributes it among all clients. The model is then trained in a decentralized manner on the client side, and only the output of that training is sent back to the server. The server uses an averaging algorithm such as FedAvg to combine the results acquired from the different clients into a final output (Kairouz et al., 2021). The data therefore never leaves the clients' devices.

Data sovereignty remains with the data owner, which provides a certain degree of privacy by design (Li et al., 2020). This is particularly valuable in privacy-critical industries, for instance banking (e.g., training a credit rating model without accessing the clients' financial data), insurance (e.g., determining the clients' insurance premiums), or healthcare. Even though use cases for federated learning already exist, real-life applications are yet to be seen (Müller et al., 2021). The main reasons for this are privacy concerns, data heterogeneity, and scalability issues. To address these issues and ease the development of federated learning systems, multiple libraries have been developed (e.g., TFF, FATE, PaddleFL, LEAF, PySyft, and FedML). This has given FL practitioners a wide array of choices, but it has also made picking a library more confusing. Efforts to compare the accuracy, privacy, and time consumption of these libraries (Chai et al., 2020), as well as their functionality (He et al., 2020), have already been made. However, these efforts were high-level: they neither measured quality quantitatively through KPIs nor provided a guideline for practitioners to choose an adequate library.

Since every library has its own capabilities and limitations, it is cumbersome for practitioners to identify the fitting one for their specific use case. Hence, this thesis aims to research important requirements for FedML libraries and identify KPIs to classify them. This can be achieved by providing an overview of existing FedML libraries and their corresponding capabilities and limitations. These capabilities and limitations can be derived from the official library documentation and by benchmarking the libraries against specific KPIs with the help of a modular tool into which they can be integrated and evaluated.
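
To make the aggregation step concrete, the following is a minimal, library-agnostic Python sketch of FedAvg. It assumes that each client returns its locally trained model weights together with the number of samples it trained on; the names used here (aggregate, client_updates) are illustrative and do not correspond to any particular library's API.

import numpy as np

def aggregate(client_updates):
    """Combine client weights into a global model via a weighted average (FedAvg).

    client_updates: list of (weights, num_samples) tuples, where weights is a
    list of NumPy arrays, one per model layer.
    """
    total_samples = sum(n for _, n in client_updates)
    num_layers = len(client_updates[0][0])
    averaged = []
    for layer in range(num_layers):
        # Weight each client's contribution by its share of the total data.
        layer_avg = sum(weights[layer] * (n / total_samples)
                        for weights, n in client_updates)
        averaged.append(layer_avg)
    return averaged

# Example: three clients with differently sized local datasets.
client_updates = [
    ([np.array([1.0, 2.0]), np.array([0.5])], 100),
    ([np.array([2.0, 0.0]), np.array([1.5])], 50),
    ([np.array([0.0, 1.0]), np.array([1.0])], 50),
]
global_weights = aggregate(client_updates)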

Research questions

1. What are the important functional and non-functional requirements for FL libraries and what are the metrics to benchmark them?

2. How can a modular software application be developed that benchmarks the different federated learning libraries using these KPIs?

3. What guidelines/best practices for choosing a federated learning library can be extracted from the benchmarking?

Contribution

1. To answer the first research question, we will conduct a literature review as well as expert interviews to identify the different federated learning libraries used in both industry and academia and to compare the capabilities, features, and maturity of these libraries (qualitative comparison), as well as other quality requirements for these libraries (e.g. communication efficiency, time consumption, scalability, ...). After that, we will use metrics to measure the most important quality requirements (quantitative comparison).

2. We will use the design science methodology to develop a software application that tests the different libraries and benchmarks them using the KPIs that we developed. Ideally, we will have a web frontend running on multiple Docker containers to simulate the multiple clients and one backend simulating the server. The clients would perform actions that are relevant for the benchmarking, for example training a model with 100 data points, checking how long it took them to get results, how accurate the results are, and how long it took the server to respond after the client's request (see the sketch after this list).

3. Finally, we will use deductive scientific reasoning to develop guidelines/best practices from the results of our research.
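
As a first illustration of how such a benchmark run could look, the following Python sketch records three example KPIs (local training time, model accuracy, and server response latency) for one library. It assumes that each library is wrapped behind a small adapter object exposing train, evaluate, and request_global_model methods; this adapter interface and the chosen KPIs are working assumptions for the tool, not an existing API of any of the libraries mentioned above.

import time
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    library: str
    training_time_s: float    # wall-clock time of one local training run
    accuracy: float           # accuracy of the trained model on held-out data
    server_latency_s: float   # time until the server answers a client request

def run_benchmark(adapter, train_data, test_data):
    # Measure how long local training takes (e.g., on 100 data points).
    start = time.perf_counter()
    adapter.train(train_data)
    training_time = time.perf_counter() - start

    # Measure how accurate the resulting model is.
    accuracy = adapter.evaluate(test_data)

    # Measure how long the (simulated) server takes to respond.
    start = time.perf_counter()
    adapter.request_global_model()
    latency = time.perf_counter() - start

    return BenchmarkResult(adapter.name, training_time, accuracy, latency)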
