Multifactor Incentive Mechanism for Federated Learning in IoT: A Stackelberg Game Approach

In the era of the Internet of Things (IoT), remote sensors and endpoint appliances generate vast amounts of data. Decentralized and collaborative learning builds on these IoT data to enable classification and recognition tasks by inviting multiple data owners. Federated learning (FL), as a popular collaborative learning framework, can significantly improve model performance without collecting the original data. To invite data owners to participate in FL, researchers have designed various incentive mechanisms. However, existing solutions still suffer high costs and low utility due to information asymmetry, where the reputation, computation power, and data quantity of the data owners are not known in advance. Therefore, we propose a Stackelberg game-based multifactor incentive mechanism for FL (SGMFIFL). First, we design a Top-K cost selection algorithm based on reverse auction, which reduces the cost of selecting data owners. Next, we devise a multifactor reward function based on reputation, accuracy, and reward rate, so that data owners with high reputation and high accuracy receive greater rewards. In particular, to ensure that SGMFIFL can provide reliable incentives in IoT, we use blockchain to provide a secure and trusted environment. Finally, we construct a two-stage Stackelberg game model for the task publisher and the data owners and derive an optimal equilibrium solution for both stages of the game. Experiments conducted on two well-known data sets, MNIST and CIFAR10, demonstrate the significant performance of the proposed mechanism.


I. INTRODUCTION
WITH the proliferation of the Internet of Things (IoT), billions of remote sensors and endpoint appliances are connected to the Internet, generating massive IoT data. In addition, owing to the rapid development of smart technologies and various networks, such as wireless sensor networks [1], [2], heterogeneous wireless networks [3], and hierarchical hybrid networks (HHNs) [4], the IoT has shown a gradual trend toward the Internet of Everything. Smart applications powered by IoT data have contributed to the popularity of many IoT services in our lives, such as smart healthcare [5], autonomous driving [6], intelligent transportation [7], and smart cities [8].
Most IoT services benefit from the cloud; however, the cloud stores, computes, and communicates in a centralized fashion. The required IoT data are generated by IoT devices owned by individuals or organizations, who do not know what data are stored, when they are used, and what they are used for. In other words, users lose control of their personal data when using IoT services. In addition, to meet the needs of large-scale IoT applications, limited wireless communication resources may not enable reliable and robust information transfer. In the process, IoT services are exposed to massive cyber attacks, such as advanced persistent threats (APTs), botnets, etc.
To address the above-mentioned threats to IoT trust, security, and privacy (TSP), existing researchers have made several attempts. IoT network intrusion detection systems (NIDSs) are designed to prevent misuse of IoT data; although the false positive rate can be optimized based on the Bloom filter [9], the high false positive rate and false negative rate of NIDSs cannot be addressed at the same time. Homomorphic encryption [10], [11] and secure multiparty computation [12] are often used to secure IoT data privacy, but complex cryptographic operations lead to poor efficiency and significant expense. Thanks to advances in mobile cloud computing (MCC) [13], edge computing [14], mobile robotic systems [15], and long-range adaptive communication [16], which break the limits of data latency and communication bandwidth, edge IoT devices are now equipped with storage and computing power, and costs for cloud providers have been significantly reduced. Blockchain [17], [18], [19], as a decentralized and traceable distributed database, offers a potential solution to IoT TSP threats, especially when dealing with large-scale heterogeneous IoT data that can be distributed over communication networks for real-time processing.

Fig. 1. Main idea of our proposed SGMFIFL in IoT. We have divided SGMFIFL into a cloud layer, an incentive layer, a data layer, and an application layer. Our work is mainly focused on the incentive layer.

Recently, researchers have put forward the concept of smart blockchain by combining traditional blockchain with artificial intelligence technologies, such as data mining, machine learning, and deep learning. Presently, the top smart blockchain projects include SingularityNET, DeepBrain Chain, and Matrix AI. Smart blockchain extends the functionality of traditional blockchain and makes great efforts to address IoT TSP threats, such as cyber-physical-social systems (CPSSs) [20] and crowdsourcing/crowdsensing. More seriously, however, due to the restrictions of data protection regulations, such as the General Data Protection Regulation (GDPR) [21] and the California Privacy Rights Act (CPRA) [22], the development of IoT smart services encounters great challenges.
To overcome such challenges, federated learning (FL) [23] was proposed by Google; a detailed explanation of the definition, classification, and architecture of FL is given in [24]. As a decentralized and collaborative learning framework for IoT that uploads only model parameters rather than raw data [25], FL not only greatly weakens the TSP threat but also complies with the rigorous GDPR regulatory guidelines, enabling smarter and more complex IoT tasks, such as mobile crowd sensing [26].
Despite its enormous advantages in enabling decentralized and collaborative learning for the IoT, FL still faces some fatal challenges. On the one hand, data owners (e.g., individuals or organizations with IoT devices) consume a large amount of computational resources for local training. Without enough rewards, self-interested data owners will not contribute resources to FL. On the other hand, existing research [27], [28] demonstrates that publicly shared model updates and gradients can still compromise privacy. Data owners are thus at risk of privacy breaches through involvement in FL.
Unfortunately, previous incentive mechanisms relied excessively on model accuracy as the contribution metric and rarely considered the reputation of the data owner at the same time. Unreliable or random behaviors caused by untrustworthiness have a serious impact on the performance of the FL model. In addition, the distribution of rewards by the FL task publisher, which we call the reward rate, can determine the final reward for the data owner. Since the reward rate is determined only by the task publisher, we realize that we cannot consider only the influence of one side on the incentive mechanism. Therefore, a multifactor incentive mechanism is more reasonable and preferable in the practical application of FL. The incentive process is not unilateral, one-time, and simple, but two-sided, repeated, and complex. Therefore, we need an effective tool to make the incentive process easier to operate and understand.
To solve the above problems, we propose a Stackelberg game-based multifactor incentive mechanism for FL (SGMFIFL); the main idea is shown in Fig. 1. Since the task publisher does not know the qualifications of the data owners in advance, which leads to higher costs when motivating them, we introduce a reverse auction to reduce these costs. On this basis, we consider the impact of the behavior of the task publisher and the data owners on the incentive mechanism, devise a multifactor reward function, and design an incentive mechanism based on a two-stage Stackelberg game (TSG) model. In addition, to provide reliable incentives in IoT, we use blockchain to ensure a secure and trusted environment.
The remainder of this article is organized as follows. Section II reviews the literature relating to the design of incentive mechanisms. Section III introduces the basics of FL and the system model. Section IV presents the technical details of the proposed SGMFIFL. Section V presents the performance evaluation. Conclusions are drawn in Section VI.

II. RELATED WORK
In this section, we review the related literature on incentive mechanism methodology and contribution measurement to situate our research within existing work.

A. Incentive Mechanism Methodology
Early incentives mainly used the Equal Incentive [39], where users were viewed as equally important, so all users were rewarded equally. This paradigm proved vulnerable to free-riding users in [40]. Considering the differences among users, the Union Incentive was proposed in [41], i.e., users were incentivized based on their marginal utility when entering the union, but the reward share was related to the sequence in which they joined. Hence, Wang and Tseng [42] reported the Shapley Incentive, which eliminated the effect of user joining sequence by traversing all user combinations and rewarding users based on average marginal utility.
The additional cost of motivating users is caused by the lack of a priori knowledge of the available computational resources and the quantity of trainable data. To address this issue, [31], [33], [37] reduced the impact of this additional cost based on auction theory. Considering users' expectation of high incentives, [38], [43] used contract theory to specify the resources users need to contribute and the rewards they deserve. In this case, a user can choose the appropriate contract item to maximize their utility. Most of the above approaches designed incentives in a static environment; nevertheless, the actual environment is often dynamic. To determine the best strategy in a dynamic environment and thus optimize the utility of both parties, DRL-based incentives [30], [31], [44] have been widely proposed. However, DRL-based methods have the common drawback of consuming large amounts of training time and computational resources. Therefore, a lightweight and efficient incentive mechanism is more practical in collaborative learning scenarios for the IoT.
Game theory is regarded as a useful vehicle for studying FL incentive mechanisms, primarily through the noncooperative game [45] and the Stackelberg game [30], [32], [35]. In the noncooperative game [45], users are concerned with maximizing their own utility and there is no cooperation or agreement between users, whereas the Stackelberg game maximizes the utility of both parties. For instance, Xiao et al. [35] translated the design of the incentive mechanism into a Stackelberg game and the game reached a Nash Equilibrium.
Nevertheless, these Stackelberg game-based incentive mechanisms took little or no account of the reputation of the users. Noncredible users exhibit behaviors that can have a serious impact on the performance of the FL model. Incentive mechanism design in the IoT needs to be carried out in a trusted environment, so the reputation of participants is particularly important. A comparison of the advantages and disadvantages of the above incentive mechanism methods is shown in Table I.

B. Contribution Measurement
The general practice of an incentive mechanism is to allocate rewards according to the user's contribution to FL. The more rewards an individual wants to obtain, the more data it needs to train on. Therefore, the amount of data [29], [30], [31] was regularly used as an indicator of contribution. More training data means longer training time, and the contribution of individuals often depends on the training time [34]. Longer training time also leads to higher computational and communication costs [31], [35]. Furthermore, another simple and intuitive way to measure individual contributions is based on the local model accuracy, which can be viewed as a concave function of the training data. Due to the openness of FL and the heterogeneity among devices, users may upload unreliable or even malicious model updates, which negatively affects the FL global model. Therefore, researchers used reputation [36], [37], [38] as a measure of individual contribution to keep FL stable and reliable. However, as users are exposed to the risk of privacy breaches, [32], [33] introduced differential privacy and its variants [46], [47] into the incentive design, using the spendable privacy budget as a new indicator of contribution.
However, the above methods rely too heavily on model accuracy as a contribution metric and rarely consider the reputation of the data owner at the same time. In addition, the FL task publisher can determine the final reward for the data owner through the reward rate. Different from the above work, we comprehensively consider the multifactor contribution metrics of reputation, model accuracy, and reward rate to design a more practical and reasonable incentive mechanism. The advantages and disadvantages of the above factors are shown in Table II.

III. PRELIMINARIES
In this section, we introduce the basics of FL and blockchain and design a new FL model. Furthermore, we explain the details of the model. Table III lists the main notations used in this article.

A. Basic Federated Learning
Definition 1 (FL Objective [36]): Given a data owner i who has training data set D_i with input x and corresponding label y, the global loss function L, and the weight of i given by w_i = n_i / (Σ_{j=1}^N n_j), with n_i the number of samples in D_i, the goal of FL is

min_θ L(θ) = Σ_{i=1}^N w_i L_i(θ),  where L_i(θ) = (1/n_i) Σ_{(x,y)∈D_i} loss(f(x; θ), y)

and loss(·, ·) is the loss function measuring the difference between the predicted and true values; a smaller loss means better performance of the FL-trained model.
The gradient G_i^t generated by the local training of data owner i can be expressed as G_i^t = ∂L_i(θ^t)/∂θ, where T is the number of local iterations and ∂ denotes the partial derivative. The global gradient obtained by the task publisher after aggregating the gradients G_i^t sent by all data owners is G^t = Σ_{i=1}^N w_i G_i^t. With a given learning rate α, the global model parameter update in round t can be expressed as θ^{t+1} = θ^t − α G^t.
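As a minimal sketch (the function name and toy values are ours, not from the article), the weighted aggregation and parameter update above look like this in NumPy:

```python
import numpy as np

# One FL aggregation round, following the formulas above: each owner i
# contributes gradient G_i with weight w_i = n_i / sum_j n_j, and the
# global update is theta <- theta - alpha * sum_i w_i * G_i.
def aggregate_and_update(theta, gradients, sample_counts, alpha):
    n = np.asarray(sample_counts, dtype=float)
    weights = n / n.sum()                      # w_i = n_i / sum_j n_j
    global_grad = sum(w * g for w, g in zip(weights, gradients))
    return theta - alpha * global_grad         # theta^{t+1} = theta^t - alpha * G^t

theta = np.zeros(3)
grads = [np.array([1.0, 0.0, 2.0]), np.array([3.0, 1.0, 0.0])]
theta = aggregate_and_update(theta, grads, sample_counts=[100, 300], alpha=0.1)
```

Owners with more samples pull the global gradient proportionally harder, exactly as the weight w_i dictates.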

B. Blockchain
Blockchain is a chain structure formed by connecting different blocks in chronological order. It has the characteristics of traceability and tamper resistance and can provide a trusted execution environment. Each block is composed of the current block hash, the parent block hash, a Merkle tree, a timestamp, and a random number (nonce). The blockchain achieves trust-free consensus through mechanisms such as PoW and PoS and guarantees the security of data transmission and storage based on cryptography. This secure and tamper-proof storage can effectively ensure the reliability and traceability of IoT data.
In particular, the distributed and collaborative learning paradigm in the IoT, that is, the FL we study, is well suited to exploiting the above characteristics of blockchain to provide a safe and reliable incentive method.
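As a toy illustration of the block structure described above (our sketch, not a production blockchain), each block can store its parent's hash, a Merkle-style root over its records, a timestamp, and a nonce:

```python
import hashlib, json

# Merkle-style root over a block's records: hash leaves, then repeatedly
# hash adjacent pairs until one root remains.
def merkle_root(records):
    hashes = [hashlib.sha256(r.encode()).hexdigest() for r in records]
    while len(hashes) > 1:
        if len(hashes) % 2:                    # duplicate last hash on odd levels
            hashes.append(hashes[-1])
        hashes = [hashlib.sha256((a + b).encode()).hexdigest()
                  for a, b in zip(hashes[::2], hashes[1::2])]
    return hashes[0]

def make_block(parent_hash, records, nonce=0):
    header = {"parent": parent_hash, "merkle": merkle_root(records),
              "time": 0, "nonce": nonce}       # fixed timestamp keeps the demo deterministic
    header["hash"] = hashlib.sha256(json.dumps(header, sort_keys=True).encode()).hexdigest()
    return header

genesis = make_block("0" * 64, ["reputation: owner1=0.90"])
block1 = make_block(genesis["hash"], ["reputation: owner1=0.95"])
```

Because each block commits to its parent's hash, altering any stored reputation record would change every subsequent hash, which is the tamper-resistance property relied on below.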

C. System Model
We assume that the FL system model contains one task publisher and N data owners, where the task publisher is acted by a server residing in the cloud. In round t of model training, the task publisher sends the initial global model M_0 and the reward strategy to the data owners, who use their respective data sets D_i to train the model and send the gradients G_i to the task publisher. The task publisher performs attack detection on the received gradients and calculates the reputation of each data owner according to the detection score before aggregating and publishing a new global model M_t. The above process iterates until the accuracy target of the global model is reached. The task publisher evaluates the contribution of each data owner based on reputation, model accuracy, and reward rate. After providing rewards, the task publisher writes the updated reputation to the reputation blockchain. The system model with blockchain shown in Fig. 2 includes the following steps.
Step 1 (Post Task Information): The task publisher posts the FL task information to the data owner, which includes the reward strategy, the required computing power, and the amount of data.
Step 2 (Report Cost): After receiving the task information, the data owners report the costs required to complete the task, and the task publisher selects the K data owners with the smallest costs (Top-K) to participate in FL.
Step 3 (Local Training): The data owner is given the global model for model training and uploads the gradients.
Step 4 (Attack Detection): After the data owner sends the gradient, the task publisher filters out malicious attackers by calculating the gradient distance through cosine similarity.
Step 5 (Compute Reputation): The task publisher calculates the reputation of data owners based on historical behavior and current state. The reputation generated by historical behavior is stored in the traceable reputation blockchain; in the absence of blockchain, the task publisher would have to recalculate the reputation every time, resulting in many meaningless calculations and a waste of resources.
Step 6 (Assess Contribution): The task publisher evaluates the contributions of the data owners, rewards them, and stores the updated result in the reputation blockchain.
Step 7 (Update Reputation): The task publisher updates the reputation value of the data owner in this task and stores the updated result in the reputation blockchain.As the reputation blockchain is safe and reliable, it can also reduce malicious data owner attacks to a certain extent.
1) Report Cost: There is a problem of information asymmetry [38], [48] between the task publisher and the data owners: the task publisher does not know the qualifications of the data owners, such as available computing resources, data size, and data quality, resulting in more cost for the task publisher when designing incentive mechanisms. To solve the problem of information asymmetry in FL, we design a reverse auction-based [31] cost-reporting module from an economic point of view. Unlike a double auction [49], where both buyers and sellers submit prices and quantities to bid, in a reverse auction multiple sellers submit asking prices to compete to sell items to a single buyer. In this process, the buyer can quickly grasp the qualifications of the sellers, while the sellers conduct the reverse auction according to the lowest-bidder rule, which also ensures fair competition. In FL, the task publisher acts as the buyer and the data owners act as sellers. Each data owner actively reports the cost of completing the task; in this process, the data owner has a real cost C_i and a reported cost B_i.
Definition 2 (The Cost of the Data Owner): For an arbitrary round t ∈ T, the cost of data owner i can be expressed as C_i^t = a_i (Acc_i^t − Acc^{t−1}), where a_i is related to the computing power of the data owner and denotes the average cost of improving accuracy from Acc^{t−1} to Acc_i^t. For simplicity, we assume that C_i is a linear function: a smaller a_i means a smaller C_i, and vice versa. We assume that a rational data owner will report the real cost; the details are explained in Theorem 4. The task publisher can select the data owners with the smallest Top-K cost according to Algorithm 1 for model training.
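The heap-based Top-K selection can be sketched as follows (owner IDs and bids are illustrative; we assume a size-K max-heap, consistent with the O(N log K) bound of Theorem 1):

```python
import heapq

# Select the K data owners with the smallest reported costs. A max-heap of
# size K over the costs gives O(N log K) instead of O(N log N) global sorting.
def top_k_cheapest(reported_costs, k):
    """reported_costs: {owner_id: cost}; returns the k owners with smallest cost."""
    heap = []                                   # max-heap via negated costs
    for owner, cost in reported_costs.items():
        if len(heap) < k:
            heapq.heappush(heap, (-cost, owner))
        elif cost < -heap[0][0]:                # cheaper than current worst kept
            heapq.heapreplace(heap, (-cost, owner))
    return sorted(owner for _, owner in heap)

bids = {"o1": 5.2, "o2": 3.1, "o3": 9.4, "o4": 2.8, "o5": 4.0}
selected = top_k_cheapest(bids, k=3)
```

Each of the N bids triggers at most one O(log K) heap operation, which is where the complexity bound comes from.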
Theorem 1: The time complexity of selecting the data owners with the smallest Top-K cost is O(N log K).
Proof: The traditional global sorting algorithm selects the top K low-cost data owners as follows: sort all reported costs in ascending order and output the first K data owners, which takes O(N log N) time. Instead, Algorithm 1 maintains a max-heap of size K over the reported costs: each of the N insertions or replacements costs O(log K), so the total time complexity is O(N log K).

2) Attack Detection: Due to the parameter-averaging nature of FL, a malicious attacker adding small random noise to the gradient does not affect the convergence of the global model. In this case, a malicious attacker must upload anomalous gradients that differ from the normal gradient to corrupt the global model, i.e., the task publisher can identify anomalous gradients by calculating the similarity of the gradients, thereby filtering out malicious attackers.
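The similarity-based filtering just described can be sketched as follows (toy gradients; the threshold for flagging an attacker is an illustrative choice left open here):

```python
import numpy as np

# Score each uploaded gradient G_i by its cosine similarity to the baseline
# gradient G; strongly misaligned (low-score) updates flag likely attackers.
def detection_scores(gradients, baseline):
    b = baseline / np.linalg.norm(baseline)
    return [float(np.dot(g, b) / np.linalg.norm(g)) for g in gradients]

baseline = np.array([1.0, 1.0, 0.0])
uploads = [np.array([2.0, 2.0, 0.0]),      # aligned with the baseline
           np.array([-1.0, -1.0, 0.0])]    # reversed direction: likely malicious
scores = detection_scores(uploads, baseline)
```

An aligned gradient scores near 1 and a reversed one near −1, so a simple cutoff on the score separates honest updates from sign-flipping attacks.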
Suppose the baseline gradient generated by an honest data owner is G = ∇L^t(θ), and G_i, i ∈ {1, 2, 3, ..., n}, are the possible gradients uploaded by the data owners. The task publisher uses the cosine similarity to measure the distance between G_i and G and generates a detection score S_i for each uploaded gradient: S_i = (G_i · G) / (||G_i|| ||G||). The final detection score S of the task publisher aggregates the individual scores S_i.

3) Compute Reputation: The cost can only measure the computing power of a data owner, not reflect its trustworthiness. In addition, the computing power only affects the global model's convergence speed, but the trustworthiness of the data owner directly affects the global model's performance.
Inspired by [38], we take reputation as an indicator of the trustworthiness of the data owner and compute reputation based on a subjective logic model. The subjective logic model consists of a fact space R_s and a conceptual space B_s, with R_s = {pos, neg}, where pos denotes a positive event and neg denotes a negative event. B_s can be represented by a triple T that captures not only trust T_t and distrust T_d but also uncertainty T_u, so as to describe subjective trust more accurately. Therefore, T = (T_t, T_d, T_u), with T_t + T_d + T_u = 1, and we use T to describe the trust relationship between data owners. Let m, n, and p be predetermined weight parameters (for a detailed definition, please refer to [36]); the reputation of the data owner in a time interval can then be expressed as a weighted combination of T_t, T_d, and T_u. In the reputation computing process, considering that historical behaviors and current events contribute differently to reputation, we introduce a freshness factor f into the subjective logic model, so that historical behaviors are assigned lower weights and current events are assigned higher weights.
Definition 3 (The Reputation of the Data Owner): Given the reputation R_i^{t−1} of data owner i in round t − 1 and the reputation r_i^t of the current event in round t, the reputation of data owner i is the freshness-weighted combination of R_i^{t−1} and r_i^t, with the historical term R_i^{t−1} receiving the lower weight.
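A minimal sketch of such a freshness-weighted update; the exponential-moving-average form below is our assumption matching the description (current events weighted higher than history), not the exact formula of Definition 3:

```python
# Assumed freshness-weighted reputation update: with freshness f in (0, 1],
# the current event's reputation gets weight f and history gets 1 - f.
def update_reputation(prev_rep, current_rep, freshness):
    assert 0.0 < freshness <= 1.0
    return (1.0 - freshness) * prev_rep + freshness * current_rep

r = 0.80                                                     # R_i^{t-1}: historical reputation
r = update_reputation(r, current_rep=0.95, freshness=0.6)    # recent behavior dominates
```

With freshness f > 0.5, a single misbehaving round pulls reputation down quickly, while old good behavior decays, which matches the stated design intent.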

IV. SGMFIFL MECHANISM
In this section, inspired by [35], we design SGMFIFL, a multifactor incentive mechanism based on the Stackelberg game, and analyze the two stages of the game to solve the optimal solution.

A. Multifactor Reward Function

Definition 4 (Multifactor Reward Function): For an arbitrary round t ∈ T, given the reputation R_i^t, the accuracy Acc_i^t, and the reward rate γ^t, the multifactor reward function I_i^t is defined in (12). I_i^t is a function of Acc_i^t, while R_i^t reflects the credibility of data owner i, which depends on the behavior of the data owner throughout model training, reflects the reliability and stability of the model, and has been determined before the task publisher rewards the data owners. Acc_i^t reflects the model accuracy contributed by data owner i. In addition, the task publisher can control the reward to the data owners through γ^t. Therefore, both parties can affect the final multifactor reward value I_i^t.

Theorem 2: The task publisher fairly rewards data owners based on reputation and model accuracy.
Proof: To analyze fairness for the data owners, we take the partial derivatives of (12) with respect to R_i^t and Acc_i^t, respectively, and find ∂I_i^t/∂R_i^t > 0 and ∂I_i^t/∂Acc_i^t > 0. The task publisher's rewards to data owners are thus positively correlated with reputation and model accuracy: when Acc_i^t > Acc_j^t, I_i^t > I_j^t, and similarly, when R_i^t > R_j^t, I_i^t > I_j^t. We quantify fairness by finding the correlation coefficient using a correlation diagram [50]: the fairness coefficient between R_i^t and I_i^t equals 1 and, interestingly, the fairness coefficient between Acc_i^t and I_i^t also equals 1. More details can be found in [36].
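For illustration, a simple multiplicative form reproduces the monotonicity used in Theorem 2; this form is our stand-in, not the article's (12):

```python
# Assumed illustrative reward: I = gamma * R * Acc. It merely reproduces the
# monotonicity of Theorem 2 (reward increasing in both reputation R and
# accuracy Acc, scaled by the publisher's reward rate gamma).
def reward(reputation, accuracy, gamma):
    return gamma * reputation * accuracy

# Higher reputation at equal accuracy earns more, as fairness requires:
hi = reward(reputation=0.9, accuracy=0.85, gamma=0.5)
lo = reward(reputation=0.6, accuracy=0.85, gamma=0.5)
```

Any reward function with strictly positive partial derivatives in R and Acc would satisfy the same fairness argument.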

B. Utility Function of Both Sides of the Game

Definition 5 (Utility Function of the Data Owner): For an arbitrary round t ∈ T, the utility of data owner i equals the reward received from the task publisher minus the cost:

U_c(γ^t, Acc_i^t, Acc_{−i}^t) = I_i^t − C_i^t,  (13)

where Acc_{−i}^t is the local accuracy vector (Acc_1^t, Acc_2^t, ..., Acc_N^t) except Acc_i^t. Substituting (6) and (12) into (13) yields the expanded form (14) of U_c. The direct benefit to the task publisher comes from the global model M it ultimately obtains, which can be measured by the incremental model accuracy V(Acc^t, Acc^{t−1}). According to the law of diminishing marginal utility, it can be expressed as

V(Acc^t, Acc^{t−1}) = λ ln(1 + Acc^t) − λ ln(1 + Acc^{t−1}),  (15)

where V(Acc^{t−1}) = λ ln(1 + Acc^{t−1}) and λ is a system parameter related to the global model M.

Definition 6 (Utility Function of the Task Publisher): For an arbitrary round t ∈ T, the utility of the task publisher equals the model accuracy increment minus the rewards paid to all data owners:

U_s(γ^t, Acc^t) = V(Acc^t, Acc^{t−1}) − Σ_{i=1}^N I_i^t.  (16)

C. Two-Stage Stackelberg Game
Problem: Since both sides intend to maximize their utility, the Nash Equilibrium problem is transformed into the utility maximization problem (17). As the task publisher can control the total reward through the reward rate γ^t, and each data owner can affect the final reward received by controlling the Acc_i^t generated in each round, the two utilities affect each other, but the task publisher still dominates the overall reward strategy. Hence, we treat the utility maximization problem of (17) as a TSG, where the task publisher is considered the leader and all data owners act as followers.
Definition 7 (Optimal Equilibrium): For an arbitrary round t ∈ T, the optimal equilibrium solution of the TSG is the pair ((γ^t)*, (Acc_i^t)*). The primary goal of SGMFIFL is to determine the optimal training strategy (Acc_i^t)* for each data owner, thus ensuring that the data owner obtains maximum utility. The determination of (Acc_i^t)* involves the two stages of the game. 1) In stage 1, the task publisher announces I_i^t, and the data owners determine the optimal training strategy (Acc_i^t)* under the condition that they know I_i^t. 2) In stage 2, all data owners try to determine a training strategy greater than (Acc_i^t)* and thus receive more rewards from the task publisher, a process that forms a noncooperative game. Therefore, for the stage 2 noncooperative game, we need to ensure that the optimal training strategy (Acc_i^t)* generated in TSG stage 1 also forms a stage 2 Nash Equilibrium, i.e., no data owner can gain additional profit by unilaterally changing (Acc_i^t)*. This means that (Acc_i^t)* is the optimal solution for the entire TSG.
Taking the first-order partial derivative of (14) with respect to Acc_i^t and setting it to zero, we obtain the optimal training strategy (Acc_i^t)*. Rewriting Acc_i^t in (14) with the corresponding (Acc_i^t)*, we can verify that the second-order partial derivative is negative, i.e., no data owner can obtain additional incentives by unilaterally changing the training strategy.
The task publisher, as the leader, can determine the optimal reward rate (γ^t)* of the stage 1 Stackelberg game after observing the optimal training strategy (Acc_i^t)* of the stage 2 noncooperative game.

Rewriting (16) by replacing Acc_i^t with (Acc_i^t)*, we obtain U_s(γ^t, (Acc_i^t)*, (Acc_{−i}^t)*). Calculating its first-order partial derivative with respect to γ^t and then the second-order partial derivative, we find from (24) that U_s(γ^t, (Acc_i^t)*, (Acc_{−i}^t)*) is strictly concave in γ^t, so there exists a unique maximum (γ^t)*. Substituting this solution into (23) completes the derivation.

Theorem 4: The optimal solution for all data owners reporting cost is (C_i^t)* = a_i((Acc_i^t)* − Acc^{t−1}).

Proof: As shown by (14), when the incentive I_i^t remains unchanged, a data owner may expect to obtain greater utility by reporting a cost less than the true cost. Assuming B_i < C_i, with the corresponding utility U_c(γ^t, Acc_i^t, Acc_{−i}^t), we can show that such misreporting does not increase the data owner's utility, so a rational data owner reports the real cost.
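The two-stage solution can be sketched numerically by backward induction under assumed closed forms (our illustrative stand-ins for (12)-(16), chosen so that the owner's utility is concave in Acc and the publisher's utility is concave in γ):

```python
import numpy as np

# ASSUMED illustrative forms, not the article's exact (12)-(16):
#   owner utility     U_c = gamma * R * ln(1 + Acc) - a * (Acc - acc0)
#   publisher utility U_s = lam * ln(1 + mean(Acc*)) - sum_i gamma * R_i * ln(1 + Acc_i*)
def best_acc(gamma, R, a, acc0):
    # Stage 2 best response: dU_c/dAcc = gamma*R/(1+Acc) - a = 0 => Acc* = gamma*R/a - 1
    return max(acc0, gamma * R / a - 1.0)

def publisher_utility(gamma, Rs, costs, acc0, lam):
    accs = [best_acc(gamma, R, a, acc0) for R, a in zip(Rs, costs)]
    value = lam * np.log(1.0 + np.mean(accs))
    payment = sum(gamma * R * np.log(1.0 + acc) for R, acc in zip(Rs, accs))
    return value - payment

Rs, costs, acc0, lam = [0.9, 0.7], [0.3, 0.25], 0.1, 2.0
grid = np.linspace(0.01, 1.0, 500)     # Stage 1: publisher searches the reward rate
gamma_star = max(grid, key=lambda g: publisher_utility(g, Rs, costs, acc0, lam))
```

The interior maximizer found by the grid search plays the role of (γ^t)*: paying more first buys accuracy faster than it costs, then the diminishing ln-valued accuracy gain is outweighed by the growing payments.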

V. PERFORMANCE EVALUATION
In this section, we conduct extensive experiments to verify the effectiveness of SGMFIFL. We first describe the experimental setup, including data sets, baselines, and evaluation metrics. Then, we present the experimental results for SGMFIFL.

A. Experiment Setup
In our experiments, we use a server with an Intel Xeon Gold 5220 CPU @ 2.20 GHz and 125-GB RAM and use PyTorch, based on the Python 3 programming language, as the underlying ML training library. We follow the parameter settings of [35]; the main parameter settings are shown in Table IV.
Data Sets: We conduct experiments using the publicly available MNIST [51] and CIFAR10 [52] data sets. MNIST is a 0-9 digit classification problem and contains 60 000 training samples and 10 000 test samples; the samples are 28×28 grayscale images. CIFAR10 contains ten categories of 32×32 color images, with a total of 50 000 training samples and 10 000 test samples. For convenience, we refer to it as CIFAR below. Furthermore, to evaluate our scheme in a more realistic setting with imbalanced sample distributions, we also consider a non-IID data set following the configuration of [23]. By default, the data are distributed across 100 data owners; each data owner holds two classes of random samples, each with 100 images.
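The non-IID partition described above can be sketched as follows (synthetic labels stand in for MNIST; the helper name is ours):

```python
import numpy as np

# Non-IID partition as in the experiments: each of n_owners data owners holds
# classes_per_owner random classes with per_class images from each class.
def partition_non_iid(labels, n_owners=100, classes_per_owner=2, per_class=100, seed=0):
    rng = np.random.default_rng(seed)
    by_class = {c: list(np.flatnonzero(labels == c)) for c in np.unique(labels)}
    owners = []
    for _ in range(n_owners):
        chosen = rng.choice(list(by_class), size=classes_per_owner, replace=False)
        idx = []
        for c in chosen:
            take = rng.choice(by_class[c], size=per_class, replace=False)
            idx.extend(int(i) for i in take)
        owners.append(idx)
    return owners

# Synthetic stand-in for MNIST labels: 6000 samples per class, 10 classes.
labels = np.repeat(np.arange(10), 6000)
owners = partition_non_iid(labels)
```

Each resulting owner sees only two of the ten classes, which is what makes the local data distributions pathological relative to the IID case.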
Baselines: We compare our proposed SGMFIFL with two existing state-of-the-art methods.
2) The rewards earned by data owners in FIFL [36] are proportional to their reputation and contributions (i.e., the similarity between the local gradient and the global gradient).

Main Evaluation Metrics: 1) Time Metric: We use the time to evaluate the effectiveness of Top-K; a shorter time means better Top-K performance. 2) Utility Metric: We use the ultimate utility of the task publisher and data owners to evaluate the effectiveness of SGMFIFL; a greater utility indicates better SGMFIFL performance.

B. Experimental Results
First, we verify the effectiveness and convergence of the Top-K algorithm, especially the variation of the cost for different values of K. Then, we evaluate the impact of various game strategies on the utility of both parties to verify the existence of a unique Stackelberg Equilibrium. Next, we measure the impact of parameters such as N, λ, a_i, and R_i^t on the utility. Finally, we compare SGMFIFL with other incentive mechanisms.
Effectiveness of Top-K Under the Time Metric: To verify the effectiveness of Top-K, we consider the costs of four models, namely, MNIST_CNN, MNIST_MLP, CIFAR_CNN, and CIFAR_MLP, in the IID and non-IID settings, respectively. As shown in Table IV, we compare the cost with GlobalSort and RandomChoice under different models when K is taken as 5, 10, and 20. For the sake of illustration, we take K = 5 as an example. Under the MNIST_CNN model, the maximum cost reduction of Top-K compared to GlobalSort and RandomChoice is 6.38% and the average reduction is 5.76%. When using the CIFAR10 data set, the maximum cost reduction of Top-K under the CIFAR_CNN model is 6.85% and the average reduction is 6.38%; under the CIFAR_MLP model, the maximum reduction is 9.1% and the average reduction is 8.28%.

In the non-IID setting, we also discuss the effectiveness of Top-K with K = 5 as an example. Under the MNIST_CNN model, Top-K reduces the maximum cost by 5.62% and the average cost by 4.16% compared to GlobalSort and RandomChoice. With the MNIST_MLP model, the maximum cost reduction is 2.73% and the average reduction is 1.61%. Similarly, when using the CIFAR10 data set, the maximum cost reduction under the CIFAR_CNN model is 7.45% and the average reduction is 6.69%; under the CIFAR_MLP model, the maximum reduction is 25.35% and the average reduction is 13.37%.

For K taken as 10 and 20, the change in the cost of Top-K is similar to the case of K = 5. It is worth noting that our Top-K performs better in the IID setting than in the non-IID setting; in particular, CIFAR_CNN and CIFAR_MLP do not perform very well for K = 20 in the non-IID setting.
Convergence Result When Using Top-K: We verify the convergence of the Top-K algorithm on SGMFIFL using K = 20 as an example. As shown in Fig. 3, in the IID setting, the final training loss for MNIST_MLP is 0.053, while that for CIFAR_MLP is 1.075. Similarly, the final training loss for MNIST_CNN is 0.101, while that for CIFAR_CNN is 0.336. As shown in Fig. 4, in the non-IID setting, the final training loss of MNIST_MLP is 0.024, while that of CIFAR_MLP is 1.071. Similarly, the final training loss of MNIST_CNN is 0.034, while that of CIFAR_CNN is 0.353. The results show that SGMFIFL with the Top-K algorithm converges well, and the convergence results on the MNIST data set are better than those on CIFAR10.
Task Publisher Utility: We evaluate the effect of the reward rate on the task publisher's utility in Fig. 5. Since the task publisher's utility function is strictly concave with respect to the reward rate γ_t, there exists a unique reward rate (γ_t)* = 0.85 that achieves the maximum utility U_s = 1.7675. We can observe that U_s continues to grow while γ_t < 0.85 and begins to decrease once γ_t > 0.85.

Data Owner Utility: The effect on the utility of the data owners is shown in Fig. 6. Note that U_c is strictly concave with respect to Acc_i^t; in this case, all data owners can reach a Nash Equilibrium, i.e., there is a unique optimal training strategy for each data owner to obtain its maximum reward. We observe that as Acc_i^t increases, the utility of the data owner initially increases rapidly and then stabilizes. In addition, a smaller a_i means less cost for the data owner and, correspondingly, greater utility; for example, since a_1 = 3.7 < a_20 = 5.12, we have U_1 > U_20.
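Because each stage utility is strictly concave in a single decision variable, the unique maximizer can be located numerically, e.g., by ternary search over the feasible interval. A minimal sketch, where the utility u_s and its coefficients are illustrative assumptions, not the paper's actual function from (15):

```python
import math

def ternary_search_max(f, lo, hi, tol=1e-9):
    """Locate the maximizer of a strictly concave function f on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1   # maximum lies to the right of m1
        else:
            hi = m2   # maximum lies to the left of m2
    return (lo + hi) / 2

# Illustrative (hypothetical) publisher utility: concave gain from the
# accuracy induced by the reward rate, minus the payout itself.
u_s = lambda g: 2.0 * math.log(1.0 + g) - 0.8 * g

g_star = ternary_search_max(u_s, 0.0, 5.0)
# Analytically, 2/(1+g) = 0.8 gives g* = 1.5 for this toy utility.
```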
Impact of the System Parameter λ: We observe the effect of λ on both utility and strategy in Fig. 7. According to (15), the utility of the task publisher increases with λ, so the task publisher is more willing to increase the reward to motivate the data owners to train the model. Similarly, as λ increases, the reward rate increases correspondingly, and the data owners in turn train the model better and expect more rewards.
Impact of the Reputation R_i^t: To illustrate the impact of reputation on the utility of both parties, we set the reputation values to [0.2, 0.4, 0.6, 0.8, 1], respectively. In Fig. 8, we find that the utility of the task publisher exhibits the same trend under different reputation values, but the optimal reward rate shifts forward as the reputation value increases; for example, at a reputation value of 1 the reward rate is 0.85, while at a reputation value of 0.6 it is 1.4. In this case, the task publisher prefers to select data owners with a high reputation. At the same time, a higher reputation value means that the data owner receives more incentives from the task publisher.
Comparison of Different Incentive Mechanisms: In Fig. 9, we see that SGMFIFL suffers no loss in accuracy when using the Top-K algorithm compared to FIFL and TSG; the accuracy of the three is close. As illustrated in Figs. 10 and 11, we investigate the effects of λ and N under various incentive mechanisms, such as TSG and FIFL. In Fig. 10, we observe the same trend of variation for the three incentive mechanisms, with SGMFIFL outperforming the baselines. Compared with TSG and FIFL, SGMFIFL increases the utility of the task publisher by an average of 10.83% and 43.36%, respectively. In Fig. 11, we discuss the data owner utility under the three methods. We can see that FIFL outperforms SGMFIFL and TSG when N < 40. After N > 40, SGMFIFL and TSG first grow rapidly and then plateau. Nevertheless, compared to FIFL, SGMFIFL increases the utility of data owners by an average of 26.82%, and compared to TSG by an average of 13.94%, remaining consistently higher than TSG. The reason is that SGMFIFL considers the reputation of the data owner and is designed with a reputation module in the system model to avoid the malicious behavior of unreliable data owners. In contrast to SGMFIFL and TSG, FIFL does not consider that the task publisher can change the reward that the data owner eventually receives, so even as N continues to increase, FIFL always remains below SGMFIFL.

VI. CONCLUSION
In this article, we proposed SGMFIFL, a decentralized and collaborative multifactor incentive mechanism based on TSG for IoT. We designed a Top-K cost selection algorithm based on reverse auction, which reduced the cost by up to 12.28% and 25.61% on the MNIST and CIFAR10 data sets, respectively. Then, we devised a multifactor reward function based on reputation, model accuracy, and reward rate, and analyzed the utility of the task publisher and the data owners based on the TSG. We derived the optimal solution of the TSG such that the utility of the task publisher and the data owners was optimal, i.e., stage 1 reached a Stackelberg Equilibrium and stage 2 a Nash Equilibrium. SGMFIFL increased the utility of the task publisher and the data owners by an average of up to 43.36% and 26.82%, respectively. To ensure that SGMFIFL can provide reliable incentives in IoT, we used blockchain to provide a secure and trusted environment. The experimental results have shown that SGMFIFL significantly reduces the cost and improves the utility, providing an effective solution for an efficient and fair FL incentive mechanism.
In future work, we will study the performance of SGMFIFL on more data sets under different models.

Fig. 2. System model of FL based on the reputation blockchain. The model includes two entities, the task publisher and the data owner. The reputation blockchain, as a distributed database stored in the system model, provides a reliable platform for the entire FL task.

Fig. 3. Convergence of our Top-K in the IID setting.

Fig. 4. Convergence of our Top-K in the non-IID setting.

Fig. 5. U_s versus γ_t. Utility of the task publisher when varying the value of the reward rate.

Fig. 6. U_c versus Acc_i^t. Utility of the data owner when varying the value of the accuracy.

Fig. 7. U_s and U_c versus λ. Utility of the task publisher and data owner when varying the value of λ.

Fig. 8. U_s versus R_i^t. Utility of the task publisher when varying the value of the reputation.

Fig. 9. Accuracy of the global model when varying the value of the epoch under various incentive mechanisms.

Fig. 10. U_s versus λ. Utility of the task publisher when varying the value of the system parameter under various incentive mechanisms.

Fig. 11. U_c versus N. Utility of the data owner when varying the number of data owners under various incentive mechanisms.

TABLE I
COMPARISON OF THE ADVANTAGES AND DISADVANTAGES OF DIFFERENT METHODS IN INCENTIVE MECHANISMS

The main idea of the Top-K algorithm is as follows: first, we insert the first K elements into Top_heap; then we read each remaining C_i^t in turn as a candidate element and compare it with Top (the heap root). If C_i^t > Top, we ignore it; otherwise, we replace Top and re-adjust Top_heap. This process is iterated until only the K smallest costs remain in Top_heap, giving a time complexity of O(N log K).

This means that the utility obtained by a data owner who reports a cost B_i is still no greater than the utility obtained by reporting the real cost C_i^t, i.e., no data owner can increase its utility by misreporting the cost. Therefore, rational data owners choose to report the actual cost C_i^t to obtain greater utility, i.e., (C_i^t)* = a_i((Acc_i^t)* − Acc^{t−1}).
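A minimal sketch of the heap-based Top-K selection described above, assuming the reported costs arrive as a plain list; Python's heapq (a min-heap) simulates the max-heap Top_heap by negating the costs:

```python
import heapq

def top_k_costs(costs, k):
    """Select the k data owners with the smallest reported costs.

    Keeps a size-k max-heap whose root (Top) is the largest cost
    currently retained; each later cost is ignored if it exceeds Top,
    otherwise it replaces Top. Runs in O(N log K) time.
    """
    # Negate costs so the min-heap root corresponds to the max element.
    heap = [(-c, i) for i, c in enumerate(costs[:k])]
    heapq.heapify(heap)
    for i, c in enumerate(costs[k:], start=k):
        if c < -heap[0][0]:                 # cheaper than the current Top
            heapq.heapreplace(heap, (-c, i))
    # Return (owner index, cost) pairs sorted by ascending cost.
    return sorted(((i, -neg) for neg, i in heap), key=lambda p: p[1])

selected = top_k_costs([7.2, 3.1, 5.5, 9.0, 2.4, 6.7, 4.8], k=3)
# selected → [(4, 2.4), (1, 3.1), (6, 4.8)]
```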

TABLE V
EFFECTIVENESS OF THE TOP-K