Energy-delay tradeoff for virtual machine placement in virtualized multi-access edge computing: a two-sided matching approach

By decoupling network functions from the underlying physical machines (PMs) at the edge of the network, virtualized multi-access edge computing (MEC) enables the deployment of new network services and elastic network scaling in a more flexible, scalable and cost-effective manner, thereby reducing maintenance costs. Although appealing performance gains can be achieved, the placement of virtual machines (VMs) on top of the shared PMs to support computation-intensive applications for smart mobile devices becomes a major challenge, especially as the network scale increases. In this paper, we address the VM placement problem in a virtualized MEC system, with the goal of balancing energy consumption against computing/offloading delay. To capture this tradeoff, we formulate a weighted sum based cost minimization problem as a pure 0–1 integer linear programming problem, which is NP-complete and difficult to solve with low complexity. Based on the one-to-one mapping relation constraint, the VM placement problem is then converted into a many-to-many two-sided matching problem between the VM instances and the PMs. Motivated by the student project allocation problem, we develop an extended two-sided matching algorithm with lower computational complexity to solve the matching problem. Simulation results demonstrate the effectiveness of the proposed algorithm and show that the normalization factor plays a significant role in achieving a lower total cost.


Introduction
Nowadays, along with the technological evolution of fifth generation (5G) cellular networks and the Internet of Things (IoT), the growing popularity of smart mobile devices (SMDs) has been driving the rapid development of various attractive mobile applications and multimedia services, such as extended reality, holographic telepresence, cloud gaming, autonomous driving, eHealth, etc. (Zhang et al. 2019b, 2020b; Chettri and Bera 2020). These newly emerging applications and services rely heavily on high data rates and low-latency transmission, which pose a major challenge to the computation capabilities, storage resources, and energy consumption of the SMDs. Therefore, it is very challenging to run computation-intensive applications on resource-constrained SMDs while meeting strict delay requirements. The tension between the limited capabilities of SMDs and the high requirements of computation-intensive and delay-sensitive applications becomes the bottleneck for providing a satisfactory user experience, and may thereby defer the advent of new business opportunities for prospective applications in future sixth generation (6G) wireless systems of the 2030s (Mao et al. 2016; Strinati et al. 2019; Zhang et al. 2020a).
To overcome this challenge, multi-access edge computing (MEC) has recently emerged as a new computing paradigm to provide cloud-computing capability within the radio sharing paradigm during the virtualization process not only enables the agility of network configuration by dynamically responding to new network services at acceptable maintenance costs, but also facilitates the opportunity to realize scalable, elastic, and programmable next-generation wireless networks (Hawilo et al. 2019;Kulkarni et al. 2020). For the NFV-enabled MEC architecture, virtualized network functions (VNFs) in terms of virtual resources (e.g., computing, memory, and storage resources) can be instantiated at the network edge, e.g., the MDC integrated at the 5G NR gNodeB, which is described as virtualized MDC (vMDC) acting as a service provider for the virtualized edge computing. Particularly, as a virtualization infrastructure in virtualized MEC system, the resourceful vMDC is capable of creating multiple VMs that are simultaneously running on the underlying computation-enabled PMs in a flexible and cost-effective manner. With the virtualized MEC, the SMDs can purchase these virtualized resources in the form of customized VM instances as services offered by the vMDC for executing the offloaded computation workload.
From the perspective of the service provider for edge computing, the costs of operation and maintenance primarily arise from the total energy consumption of dedicated hardware devices, i.e., PMs, hosted by the vMDC in the virtualized MEC (Dai et al. 2016). To save costs, a natural idea is to allow several VM instances to run concurrently on top of a single PM using the virtualization software platform. With the increasing scale of the virtualized MEC, a large number of active PMs will incur more energy consumption and higher associated costs. Therefore, it is imperative to optimize the placement of the many VM instances over the underlying hardware devices, i.e., the efficient mapping of VM instances onto suitable PMs, aiming to increase the utilization of physical resources and reduce the total energy consumption at the system level. Under the scenario of the virtualized MEC, it is of paramount importance to provide solutions that effectively achieve the optimal VM placement on top of the PMs and accordingly assign the virtual resources to the SMDs while satisfying the requirements of the computation-intensive workload.
Fig. 1 An illustration of a virtualized MEC scenario consisting of K SMDs and one 5G NR gNodeB equipped with a vMDC
To the best of our knowledge, no prior research has studied the performance balance between energy consumption and computing/offloading delay for VM placement on top of the PMs in the virtualized MEC system. In particular, the effect on VM placement of creating a single VM instance that spans multiple different PMs has not yet been thoroughly studied in the existing literature. To bridge this research gap, in this paper we investigate the problem of VM placement by properly capturing the energy-delay tradeoff during computing and offloading in the virtualized MEC, aiming to minimize the weighted sum based cost function while satisfying the placement constraints.
Anticipating the great challenges in directly solving the VM placement problem, we propose a two-sided matching approach, as outlined in Fig. 2. Our approach converts the weighted sum based cost minimization problem (see Problem Domain in Fig. 2) for VM placement in the virtualized MEC scenario (see Scenario Domain in Fig. 2) into a many-to-many two-sided matching problem between the VM instances and the PMs. Additionally, to find a two-sided matching between the VM instances and the PMs, we develop an effective low-complexity algorithm based on the classic student project allocation (SPA) algorithm (see Algorithm Domain in Fig. 2). The main contributions of this paper can be summarized as follows:
• We develop a novel VM placement optimization framework to achieve the energy-delay tradeoff for local computing, computation offloading, and computing at the vMDC by taking into account the more complex mapping relation between the VM instances and the PMs. Our framework is the first in the literature to identify the tightly coupled interplay between energy consumption and computing/offloading latency from the perspective of both the virtualization infrastructure and the users.
• We formulate the problem of VM placement as a pure 0-1 integer linear programming problem, which is NP-complete and hard to solve directly through the traditional centralized exhaustive method, especially with the increasing scale of the virtualized MEC system. To capture the tradeoff, the objective of the VM placement is to minimize the weighted sum based cost function for computing and offloading by optimizing the VM instance placement relations on top of the PMs.
• To tackle this problem with a reduced time complexity, we transform the VM placement problem into a many-to-many two-sided matching problem between the VM instances and the PMs. To describe the specific utility obtained by a matching agent over others, we define the preference value of a given VM instance for a given PM by taking into account the placement constraints. Based on that, the stability of the matching is further formulated to minimize the weighted sum based cost function for computing and offloading. Moreover, to obtain a two-sided matching between the VM instances and the PMs, we propose an effective low-complexity algorithm based on the SPA-(S, P) algorithm to identify the tradeoff between energy consumption and computing/offloading latency.
The rest of this paper is organized as follows. We first introduce the related work in Sect. 2. Section 3 describes the system model, followed by a formulation of the optimization problem. In Sect. 4, we provide the two-sided matching approach to solve this problem and further present an effective algorithm with low complexity to derive a feasible solution. Simulation results are discussed in Sect. 5. Finally, Sect. 6 concludes this paper.
Fig. 2 Outline of our two-sided matching approach for the VM placement problem in virtualized MEC system

Related work
Currently, numerous research efforts have dealt with the problem of VM placement for data centers, and most of them focused on the scenario of cloud computing (Dai et al. 2016; Rampersaud and Grosu 2017; Guo et al. 2018; Gharehpasha and Masdari 2020; Wei et al. 2020). In Liu et al. (2018), an ant colony system based approach was proposed to achieve the VM placement by effectively minimizing the number of active physical servers to reduce the energy consumption for cloud computing. To capture the balance between saving PM power and guaranteeing VM performance, a power-aware and performance-guaranteed VM placement scheme was developed from the perspective of both cloud providers and users through an ant colony optimization method. In Dai et al. (2016), two greedy approximation algorithms were presented to intelligently place VMs on top of the underlying physical servers in a data center to reduce the total energy consumption while satisfying the tenants' service level agreements. Based on multi-linear programming, in Rampersaud and Grosu (2017), a shared perception online algorithm was proposed for VM packing. Particularly, cloud providers should provide a large number of users with on-demand VM instances and package their associated vectors to minimize the number of physical servers, aiming to reduce the total operational costs. In Guo et al. (2018), a shadow routing based VM placement algorithm was designed by enabling the VMs to share the CPU/memory resources on the same PM in large data centers within the network cloud scenario. To improve the energy efficiency of cloud data centers, Gharehpasha and Masdari (2020) combined the sine cosine algorithm and ant lion optimizer as the discrete multi-objective and chaotic functions to obtain an optimal VM assignment by minimizing the power consumption for the number of active PMs. In Wei et al. (2020), a joint bin-packing heuristic and genetic algorithm was presented to balance the use of multiple physical resources in cloud data centers, aiming to reduce the fragmentation of resources and to maximize the service rate of VM placement. All the above works (Dai et al. 2016; Rampersaud and Grosu 2017; Guo et al. 2018; Gharehpasha and Masdari 2020; Wei et al. 2020), even though they provide valuable insights on the potential of efficient VM placement on top of the physical servers towards the overall performance improvement for cloud data centers, cannot be directly applied to the virtualized MEC scenario, which integrates the NFV and the MEC framework.
On the other hand, the combination of the NFV paradigm and the MEC system to improve the overall performance of the virtualized MEC has attracted interest in the research community (Li et al. 2019; Tan et al. 2018; Liang et al. 2019; Chen et al. 2019; Rezvani et al. 2019). In Li et al. (2019), a polynomial time heuristic algorithm was proposed to solve the problem of the VNF placement for the request of service function chain (SFC) in the NFV-enabled edge computing. In Tan et al. (2018), a framework of the full-duplex-based virtual small cell network was developed to achieve both the MEC and the caching optimization for two kinds of heterogeneous services, i.e., high-data-rate service and computationally sensitive service, aiming to save the cost of backhaul resources and to ensure the delay requirements of users. In Liang et al. (2019), an offloading scheduling method was presented to solve the problem of joint radio-and-computation resource allocation for the multi-user virtualized MEC systems with the I/O interference, i.e., the interference between VMs sharing the same physical platform. The proposed method improved two system metrics, namely, the sum offloading throughput maximization as well as the sum mobile energy consumption minimization. In Chen et al. (2019), a double deep Q-network based learning algorithm was proposed to solve the stochastic computation offloading problem in the sliced RAN with virtualized MEC system, with the objective of maximizing the long-term utility performance without a priori knowledge of network dynamics. In Rezvani et al. (2019), a two-phase resource allocation framework was proposed for the parallel cooperative joint multi-bitrate video caching and transcoding in the heterogeneous virtualized MEC networks. The aim of this framework is to jointly optimize the available physical resources with user association and request scheduling for maximizing the slice revenues. However, the above works only explored the application of NFV in the MEC environment from different perspectives, e.g., VNF placement of SFC (Li et al. 2019) as well as joint computation offloading and multi-dimensional resource allocation (Tan et al. 2018; Liang et al. 2019; Chen et al. 2019; Rezvani et al. 2019), without investigating the VM placement problem over the underlying PMs under the virtualization settings.
Meanwhile, many recent works have been dedicated to the VM placement problem in the virtualized MEC scenario. In Zhang et al. (2019a), an optimization framework of energy consumption minimization was developed for computing and offloading by jointly optimizing the VM placement matrix and the number of PMs via the non-uniform weighted hypergraph model. In Wang et al. (2017), a mathematical model was formulated to minimize the hardware consumption required by the VMs for supporting the given workloads in the virtualized MEC scenario, while meeting the heterogeneous latency requirements of different computation-intensive applications. By capturing the impact of users' mobility on the dynamicity of the virtualized MEC system, an effective algorithm was proposed in Plachy et al. (2016) to achieve the flexible selection of communication path together with dynamic VM placement according to the expected users' movement. A latency-aware heuristic placement algorithm has also been designed to obtain the optimal placement of VM replica copies, aiming at the minimization of the average response time of deploying multiple applications among several MEC servers in the virtualized MEC networks.
The related work in Zhang et al. (2019a) is heuristic, and the authors only investigated the effect of energy efficiency on VM placement within the scenario of virtualized MEC. By contrast, we consider a tradeoff that balances the energy consumption and the computing/offloading delay for VM placement. This is because both the computation delay and the energy consumption are critical to the users during computation offloading, especially for obtaining the optimal placement of VM instances on top of the PMs. Furthermore, the most prominent feature of the proposed VM placement schemes, whether for cloud data centers (Dai et al. 2016; Rampersaud and Grosu 2017; Guo et al. 2018; Gharehpasha and Masdari 2020; Wei et al. 2020) or for the virtualized MEC (Wang et al. 2017; Plachy et al. 2016), is that they assume that a single VM instance must be placed on top of one PM based on the conventional virtualization software platform, e.g., the Hypervisor. We wish to remark that the traditional Hypervisor cannot support the creation of a single VM instance that spans different PMs. However, ScaleMP, Inc. has shown that the innovative versatile ScaleMP (vSMP) ServerONE is capable of aggregating different underlying PMs into a single high-performance VM. From a practical point of view, a single VM instance can therefore be placed across multiple different PMs under the virtualized MEC scenario.
Motivated by the above mentioned reasons, the work presented in this paper extends the traditional VM placement constraint in a more practical manner to realize the more complex mapping relation between VM instances and PMs in the vMDC of the virtualized MEC system. On the other hand, as already mentioned, the placement of the VM instances on top of the PMs should incorporate both the computation delay of the offloaded workload from the users and the energy consumption of the PMs. Unfortunately, none of these works have completely identified the tradeoff between energy consumption and computing/offloading delay for the VM placement over the underlying PMs. This research gap motivates us to pursue a solution for the problem of VM placement across multiple different PMs by minimizing the weighted sum based cost function with respect to the total energy consumption as well as the total computing/offloading delay.

System model and problem formulation
In this section, we introduce the system model for the VM placement in virtualized MEC system. We first provide an overview of the system scenario, and then, present the details on the communication model, followed by the computation model via the partial computation offloading rule. Based on the system model, we further discuss the problem formulation for energy-delay tradeoff for the VM placement in detail. The key notations and variables used throughout this paper are summarized in Table 1 for the ease of reference.

System overview
As shown in Fig. 1, we consider an uplink NFV-enabled MEC system scenario, where one 5G NR gNodeB integrated with a vMDC serves K SMDs. The SMDs are randomly distributed within the coverage area of the gNodeB. Denote by $\mathcal{K} = \{1, 2, \dots, K\}$ the set of all the SMDs, each with a computation workload of computation-intensive and delay-sensitive applications to be executed. We assume that each SMD has limited computation capability due to its energy- and size-constrained computing resources, so that completing the workload locally consumes a large amount of energy and time. As a consequence, with the aid of the resourceful vMDC in the virtualized MEC, every SMD adopts a partial computation offloading rule. In this way, part of the computation workload of the SMD is processed locally and the rest is offloaded to the vMDC via the gNodeB using uplink transmission.
We focus on one particular system time frame with duration T. Within duration T, we assume that the location of each SMD remains unchanged. For convenience, the size of the computation data, including the program codes and input parameters, is adopted to characterize the computation workload of every SMD. We then define the computation workload of SMD k as $W_k = \{L_k, X_k\}$, where $L_k$ (in bits) is the total data size of the workload of SMD k, and $X_k$ (in bits) is the data size of the workload offloaded to the vMDC, for $0 < X_k < L_k$ and $k \in \mathcal{K}$.
With the partial computation offloading rule, every SMD requires substantial physical resources, including computing, memory, storage, and network resources, to fulfill the application requirements of its computation-intensive workload. We thus assume that the vMDC owns M PMs that provide the physical resources for executing the computation workload offloaded from all the SMDs within duration T. The set of all the PMs in the vMDC is represented by $\mathcal{M} = \{1, 2, \dots, M\}$. For analytical simplicity, we only consider two types of physical resources for every PM, i.e., CPU kernels and memory. Particularly, we define the available physical resources of PM m as an available resource vector $(\alpha_m, \beta_m)$, where $\alpha_m$ is the number of available CPU kernels and $\beta_m$ is the amount of available memory in GB, for $m \in \mathcal{M}$. We further define a resource requirement vector $(u_m, v_m)$, where $u_m$ and $v_m$ refer to the number of required CPU kernels and the amount of required memory in GB, respectively, to maintain the normal operation of PM m when it is idle, for $m \in \mathcal{M}$.
To implement the virtualized MEC paradigm, a large number of VMs should be created by the vMDC to run simultaneously on top of the underlying PMs by using the virtualization software platform, as shown in Fig. 1. Meanwhile, the SMDs can purchase these customized VM instances offered by the vMDC for computation offloading. To this end, we suppose that there are N VM instances running concurrently on top of the M PMs in the vMDC within duration T. The set of all the VM instances is denoted as $\mathcal{N} = \{1, 2, \dots, N\}$. We assume that each SMD can only purchase a limited number of VM instances created by the vMDC, and that the total number of VM instances purchased by all the SMDs equals the size of the VM instance set $\mathcal{N}$. Specifically, the number of VM instances purchased by SMD k is much smaller than N, and we denote by $\Delta_k$ the set of VM instances purchased by SMD k. We then characterize VM instance n by a virtual resource vector $\Gamma_n = (c_n, s_n)$, where $c_n$ and $s_n$ are the number of CPU kernels and the amount of memory in GB, respectively, for $n \in \mathcal{N}$. We further use $\alpha_{n,m}$ and $\beta_{n,m}$ to denote the number of CPU kernels and the amount of memory of VM instance n that are placed on top of PM m, respectively. To represent the placement relationship between VM instance n and PM m, we introduce a binary variable $b_{n,m}$, which is determined by:
$$b_{n,m} = \begin{cases} 1, & \text{if VM instance } n \text{ is placed on top of PM } m, \\ 0, & \text{otherwise.} \end{cases}$$
Thereby, the elements $c_n$ and $s_n$ in the virtual resource vector $\Gamma_n$ of VM instance n can be respectively interpreted as:
$$c_n = \sum_{m \in \mathcal{M}} b_{n,m}\, \alpha_{n,m}, \qquad s_n = \sum_{m \in \mathcal{M}} b_{n,m}\, \beta_{n,m}.$$
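The placement model above can be illustrated with a minimal Python sketch. It assumes that the CPU kernels and memory of a VM instance are the sums of its per-PM shares over the PMs that host it; the function names (`vm_resources`, `hosting_pms`) and the small instance are hypothetical and serve only as an illustration.

```python
def vm_resources(b, alpha, beta, n):
    """Return (c_n, s_n) for VM instance n.

    Assumption: c_n and s_n are the sums of the per-PM shares alpha[n][m]
    and beta[n][m] over the PMs m that host VM n (b[n][m] == 1).
    """
    pms = range(len(b[n]))
    c_n = sum(b[n][m] * alpha[n][m] for m in pms)
    s_n = sum(b[n][m] * beta[n][m] for m in pms)
    return c_n, s_n


def hosting_pms(b, n):
    """Indices of the PMs on which VM instance n is placed."""
    return [m for m, placed in enumerate(b[n]) if placed == 1]


# Hypothetical instance: 2 VM instances, 3 PMs (0-based indices).
b = [[1, 1, 0],      # VM 0 spans PM 0 and PM 1
     [0, 0, 1]]      # VM 1 sits on PM 2 only
alpha = [[2, 2, 0], [0, 0, 4]]   # CPU kernels of VM n placed on PM m
beta = [[4, 4, 0], [0, 0, 8]]    # memory (GB) of VM n placed on PM m

print(vm_resources(b, alpha, beta, 0))  # (4, 8)
print(hosting_pms(b, 0))                # [0, 1]
```

A row of `b` with more than one nonzero entry corresponds to a single VM instance spanning several PMs, which is the case that conventional one-PM-per-VM placement excludes.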

Communication model
As previously mentioned, every SMD employs the partial computation offloading rule because of the powerful computing, storage, and network resources of the vMDC integrated at the gNodeB for the virtualized MEC. With such an offloading rule, the considered time frame with duration T for executing the whole computation workload of the K SMDs can be divided into three stages, i.e., the offloading stage, the computing stage, and the downloading stage, as depicted in Fig. 3. For the K SMDs to offload their workload to the gNodeB for computing via uplink transmissions, we assume that computation offloading in the offloading stage is operated on the same channel with bandwidth B. To avoid co-channel interference, similar to Wang et al. (2018), we adopt a TDMA-based co-channel access protocol to efficiently schedule the multiple access of the K SMDs for computation offloading. As such, in the offloading stage, the K SMDs offload their computation workload to the gNodeB one by one, each within its allocated time slot, as shown in Fig. 3. Let $t_k^{O}$ be the time slot allocated to SMD k for offloading its workload to the gNodeB within duration T, for $k \in \mathcal{K}$. In the computing stage, the vMDC at the gNodeB performs the edge execution of the offloaded computation workload of the K SMDs within the allocated time slot. We denote by $T^{C}$ the time slot allocated to the vMDC for completing the execution of the workload from the K SMDs. Similar to Bi and Zhang (2018), we neglect the time consumed in the downloading stage for sending the computation results back to the K SMDs, owing to the higher transmit power of the gNodeB and the smaller number of bits associated with the computation results.
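As a small illustration of the time frame structure, the following Python sketch checks whether a slot allocation fits into the frame. It assumes that the per-SMD offloading slots and the computing slot are consecutive and must together fit within duration T, with the downloading stage neglected; the function name `frame_is_feasible` is hypothetical.

```python
def frame_is_feasible(offloading_slots_s, computing_slot_s, frame_s):
    """Check that the TDMA offloading slots plus the computing slot fit in T.

    Assumption: the slots are consecutive and non-overlapping, and the
    downloading stage takes negligible time.
    """
    return sum(offloading_slots_s) + computing_slot_s <= frame_s


print(frame_is_feasible([0.1, 0.2], 0.3, 1.0))  # True
print(frame_is_feasible([0.6, 0.5], 0.1, 1.0))  # False
```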
Based on the time slot allocation introduced above, we only consider the uplink transmission for computation offloading in the offloading stage, where each SMD offloads its workload to the gNodeB. For analytical simplicity, we assume that the uplink transmission follows the quasi-static block fading channel model, in which the channel gain remains unchanged within the considered time slot but varies independently from one slot to another. Here, the channel gain between SMD k and the gNodeB is determined by $g_k$, the small-scale fading channel gain between SMD k and the gNodeB, by $d_k$, the distance between SMD k and the gNodeB, and by the path loss exponent, which is at least 2. As in Yang et al. (2018), the small-scale fading channel gain is subject to the Rayleigh distribution, i.e., $g_k \sim \mathcal{CN}(0, 1)$, the complex Gaussian distribution with zero mean and unit variance. Let $p_k$ denote the transmit power of SMD k when it offloads its workload to the gNodeB via uplink transmission, with $P_k^{\max}$ being its maximum allowable value, i.e., $p_k \le P_k^{\max}$. The maximum transmit power $P_k^{\max}$ (in Watt) of SMD k for uplink transmission is calculated from $P_0$, the reference received power at the gNodeB at a reference distance $d_0$. Based on the TDMA-based co-channel access protocol, the achievable rate $R_k$ (in bps) for uplink transmission from SMD k to the gNodeB is determined by the bandwidth B, the transmit power $p_k$, the channel gain, and $\sigma^2$, the power of the additive white Gaussian noise at the gNodeB. To simplify the analysis, we suppose that every SMD has a fixed transmit power for uplink transmission, which is proportional to its maximum transmit power during the offloading phase. Thus, the transmit power of SMD k is specified as a fraction of $P_k^{\max}$ given by an adjustment factor in (0, 1].
Fig. 3 The time slot allocation of the system time frame with duration T
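The communication model can be sketched in Python as follows. The sketch assumes a Shannon-type rate for a single interference-free TDMA user, which is a common choice but is not confirmed as the exact expression used here; the function and parameter names are hypothetical.

```python
import math


def transmit_power_w(max_power_w, adjustment):
    """Fixed transmit power as a fraction of the maximum transmit power."""
    if not 0.0 < adjustment <= 1.0:
        raise ValueError("adjustment factor must lie in (0, 1]")
    return adjustment * max_power_w


def uplink_rate_bps(bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Assumed Shannon-type achievable rate without co-channel interference."""
    snr = tx_power_w * channel_gain / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)


# Hypothetical numbers: 1 MHz channel and a received SNR of 3.
rate = uplink_rate_bps(1e6, tx_power_w=1.0, channel_gain=3.0, noise_power_w=1.0)
print(rate)  # 2000000.0
```

Because TDMA gives each SMD its own slot, no interference term appears in the signal-to-noise ratio in this sketch.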

Local computing
Since only $X_k$ bits in computation workload $W_k = \{L_k, X_k\}$ are offloaded to the vMDC integrated at the gNodeB via uplink transmission, the data size of the remaining workload to be computed locally at SMD k is $L_k - X_k$, for $k \in \mathcal{K}$. We assume that each SMD has a given number of CPU kernels as its local computing resources, and that each CPU kernel of SMD k has a fixed and equal computing capability $f_k$, measured in CPU cycles per second. We also denote by $\wp_1$ the number of CPU cycles required to execute one bit of raw data for local computing at the SMDs, which is determined by the features of the computation-intensive and delay-sensitive applications. Note that the number of CPU cycles required to process one bit of raw data is assumed to be equal for every SMD. As such, the total number of CPU cycles required for SMD k to process $L_k - X_k$ bits locally is $\wp_1 (L_k - X_k)$. Thus, the computation delay for processing the computation workload $W_k$ locally is the ratio of $\wp_1 (L_k - X_k)$ to the aggregate local computing capability of SMD k, i.e., its number of CPU kernels multiplied by $f_k$. According to Fan et al. (2007), we model the energy consumption per CPU cycle at SMD k as proportional to $f_k^2$, with a factor that includes an effective capacitance coefficient depending on the chip architecture of the CPU at SMD k. Correspondingly, the energy consumption for local computing at SMD k is obtained by multiplying the per-cycle energy consumption by the total number of required CPU cycles.
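The local computing model can be sketched in Python as follows, assuming that the local delay is the number of required CPU cycles divided by the aggregate capability of all local CPU kernels, and that the local energy is the per-cycle energy times the number of cycles. The per-cycle energy is passed in as a single number because its exact expression is not reproduced here; all names are hypothetical.

```python
def local_cycles(total_bits, offloaded_bits, cycles_per_bit):
    """CPU cycles needed to process the non-offloaded part of the workload."""
    if not 0 < offloaded_bits < total_bits:
        raise ValueError("partial offloading requires 0 < X_k < L_k")
    return cycles_per_bit * (total_bits - offloaded_bits)


def local_delay_s(total_bits, offloaded_bits, cycles_per_bit,
                  num_kernels, kernel_hz):
    """Assumed local delay: cycles / (number of kernels * cycles per second)."""
    cycles = local_cycles(total_bits, offloaded_bits, cycles_per_bit)
    return cycles / (num_kernels * kernel_hz)


def local_energy_j(total_bits, offloaded_bits, cycles_per_bit,
                   energy_per_cycle_j):
    """Assumed local energy: per-cycle energy times the required cycles."""
    cycles = local_cycles(total_bits, offloaded_bits, cycles_per_bit)
    return energy_per_cycle_j * cycles


# Hypothetical SMD: 1000-bit workload, 400 bits offloaded, 100 cycles per bit.
print(local_delay_s(1000, 400, 100, num_kernels=2, kernel_hz=1e4))  # 3.0
```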

Computation offloading
By adopting the TDMA-based co-channel access protocol to coordinate the access of multiple SMDs for computation offloading in the virtualized MEC, every SMD offloads its computation workload to the gNodeB within its allocated time slot, as shown in Fig. 3. With $X_k$ bits of workload offloaded to the vMDC, the total amount of uplink transmitted data for SMD k is $X_k$ scaled by a communication overhead factor that accounts for forward error control coding and data encryption during uplink transmission. Thus, based on the achievable rate $R_k$ for uplink transmission, the offloading delay of SMD k within the allocated time slot is modeled as the ratio of the total amount of uplink transmitted data to $R_k$. Accordingly, the energy consumption of computation offloading for SMD k is given by the product of the transmit power $p_k$ and the offloading delay.
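A minimal Python sketch of the offloading stage follows, assuming that the offloading delay is the overhead-scaled data size divided by the uplink rate and that the offloading energy is the transmit power multiplied by that delay. The names `offloading_delay_s`, `offloading_energy_j`, and `overhead` are hypothetical.

```python
def offloading_delay_s(offloaded_bits, overhead, rate_bps):
    """Assumed offloading delay: (overhead * X_k) / R_k."""
    if overhead < 1.0:
        raise ValueError("overhead factor is assumed to be at least 1")
    return overhead * offloaded_bits / rate_bps


def offloading_energy_j(tx_power_w, delay_s):
    """Assumed offloading energy: transmit power times offloading delay."""
    return tx_power_w * delay_s


# Hypothetical numbers: 1 Mbit offloaded, 20% overhead, 2 Mbps uplink.
delay = offloading_delay_s(1e6, overhead=1.2, rate_bps=2e6)
energy = offloading_energy_j(0.1, delay)
```

With these numbers the delay is about 0.6 s and the energy about 0.06 J, which shows how a larger overhead factor raises both costs linearly.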

Computing at the vMDC
For computing at the vMDC, the VM instances purchased by the SMDs process the total amount of received data offloaded from the SMDs under the scenario of the virtualized MEC. We consider an average computing capability of one CPU kernel for every VM instance, which is also measured in CPU cycles per second. Recall that VM instance n owns $c_n$ CPU kernels under the virtualized MEC framework. Thus, the computing capability of VM instance n is $c_n$ times this per-kernel capability, for $n \in \mathcal{N}$, and the computing capability of the VM instances purchased by SMD k equals the per-kernel capability multiplied by $\sum_{n \in \Delta_k} c_n$. Let $\wp_2$ be the number of CPU cycles required to execute one bit of raw data for computing at the vMDC. Thereby, from the perspective of the vMDC, the computation delay of the workload offloaded from SMD k is the ratio of the CPU cycles required for that workload to the computing capability of the VM instances purchased by SMD k. As a result, the computation delay of the overall workload from the K SMDs at the vMDC is obtained by aggregating these per-SMD delays. We focus on the energy consumption in the computing stage in terms of the number of active PMs. According to Fan et al. (2007), the relationship between the energy consumption and the CPU utilization of a PM is approximately linear. Essentially, the energy consumption of the PM consists of static power consumption and dynamic power consumption. The static power consumption, drawn when the PM is idle, is related to the PM's hardware configuration, whereas the dynamic power consumption is associated with the CPU utilization rate. To simplify the analysis, we define the CPU utilization rate of PM m as the ratio of the number of CPU kernels already assigned to the total number of CPU kernels of PM m. Therefore, as in Fan et al. (2007), the power consumption of PM m is defined as $P_m^{\text{idle}}$ plus the CPU utilization rate multiplied by $P_m^{\max} - P_m^{\text{idle}}$, where $P_m^{\text{idle}}$ is the static power consumption when PM m is idle and $P_m^{\max}$ is the maximum power consumption when PM m is fully loaded. Note that the static power consumption of the PM is almost 30% of its maximum power consumption, and the maximum power consumption of the PM is approximately 75% of the nameplate value (Fan et al. 2007). Hence, the energy consumption of the M PMs during the computing stage is obtained by multiplying the power consumption of each PM by $T^{C}$ and summing over the PMs, where $E_m^{\text{idle}} = P_m^{\text{idle}} T^{C}$ is the energy consumption when PM m is idle.
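The linear power model described above can be sketched in Python as follows. The sketch assumes that every listed PM is powered on for the whole computing slot and that utilization is the assigned-to-total CPU kernel ratio; the tuple layout and function names are hypothetical.

```python
def cpu_utilization(assigned_kernels, total_kernels):
    """CPU utilization rate: assigned kernels over total kernels of a PM."""
    return assigned_kernels / total_kernels


def pm_power_w(p_idle_w, p_max_w, utilization):
    """Linear power model: idle power plus a utilization-proportional part."""
    return p_idle_w + (p_max_w - p_idle_w) * utilization


def pms_energy_j(pms, computing_time_s):
    """Total energy of the PMs over the computing slot.

    `pms` is a list of (p_idle_w, p_max_w, assigned_kernels, total_kernels).
    Assumption: every listed PM is active for the whole slot.
    """
    return sum(
        pm_power_w(p_idle, p_max, cpu_utilization(assigned, total))
        * computing_time_s
        for p_idle, p_max, assigned, total in pms
    )


# Hypothetical PMs: one half loaded, one idle, computing slot of 2 s.
print(pms_energy_j([(60, 200, 4, 8), (60, 200, 0, 8)], 2.0))  # 380.0
```

The idle PM still contributes its static energy in this example, which is why consolidating VM instances onto fewer active PMs reduces the total energy.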

Problem formulation
By extending computation and storage resources to the network edge in close proximity to end users, the virtualized MEC saves more energy for the SMDs than pure local computing, but at the price of an increased offloading delay for the SMDs. Moreover, the placement of the VM instances on top of the PMs has a significant impact not only on the computation delay of the workload offloaded from the SMDs, but also on the energy consumption of the PMs in the vMDC integrated at the gNodeB.
Keeping this fact in mind, both the computing delay and energy consumption are critical to the SMDs during computation offloading while finding the optimal placement of VM instances on top of the underlying PMs. As a result, there is a tradeoff between energy consumption and computing/ offloading delay due to their coupling interplay under the considered virtualized MEC framework. Our objective is thus to identify the tradeoff between energy consumption and computing/offloading delay from the perspective of the vMDC and all the SMDs by optimizing the placement relations between the VM instances and the PMs.
To capture such a tradeoff, we employ a weighted sum based cost function, which is calculated as the weighted sum of the energy consumption cost and the computing/offloading delay cost. More precisely, the cost function combines the total energy consumption cost $E_{\text{total}}$ and the total computing/offloading delay cost $T_{\text{total}}$ through a weighting factor $w \in [0, 1]$, which reflects the tradeoff between energy consumption and computing/offloading delay, and a normalization factor, which ensures the same range for the two costs. Since the energy consumption of every SMD, including local computing and computation offloading, as well as that of the active PMs at the vMDC has been obtained through (9), (11), and (16), the total energy consumption cost is composed of these three energy components. Meanwhile, based on the local computing delay, computation offloading delay, and computing delay at the vMDC for every SMD given by (8), (10), and (12), the total computing/offloading delay cost is composed of these three delay components. Under the above setup, we aim to minimize the weighted sum based cost function for computing and offloading within duration T by optimizing the VM instance placement relations on top of the underlying PMs. The optimization problem is mathematically formulated in (20), subject to constraints (21)–(23). In (20), constraints (21) and (22) represent the placement constraints of VM instances on top of the PMs in terms of the number of CPU kernels and the amount of memory of the VM instances. Constraint (23) is the binary constraint on the placement variable, which dictates whether a VM instance is placed on top of the associated PM.
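The weighted sum based cost can be sketched in Python as follows. The sketch assumes that the weight w is applied to the energy cost, that 1 - w is applied to the delay cost, and that the normalization factor scales the delay cost; this assignment and the name `weighted_cost` are assumptions made for illustration.

```python
def weighted_cost(w, energy_total, delay_total, norm):
    """Assumed cost: w * E_total + (1 - w) * norm * T_total."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("weighting factor must lie in [0, 1]")
    return w * energy_total + (1.0 - w) * norm * delay_total


# Hypothetical totals: 10 J of energy, 2 s of delay, normalization factor 3.
for w in (0.0, 0.5, 1.0):
    print(w, weighted_cost(w, 10.0, 2.0, 3.0))
```

Setting w to 1 makes the cost purely energy-driven, setting it to 0 makes it purely delay-driven, and the normalization factor keeps the two terms on a comparable scale so that neither dominates by unit choice alone.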

Proposed two-sided matching approach
In this section, we consider the solution to the optimization problem in (20) by identifying an optimal VM placement scheme. Note that the formulated problem in (20) is a pure 0-1 integer linear programming problem, owing to the binary variable for VM instance placement in objective function (20) and the two linear constraints in (21) and (22). In general, such a problem is NP-complete and computationally intractable. In particular, as the scale of the virtualized MEC grows and N and M become very large, it is extremely complex to solve the problem directly with a traditional centralized exhaustive method. However, since the optimization problem in (20) contains only one binary variable, it can be converted into a matching problem (Manlove 2013). Inspired by (Bayat et al. 2016; Gu et al. 2015), in what follows we propose an effective algorithm based on two-sided matching theory (Manlove 2013; Bayat et al. 2016; Gu et al. 2015; Gao et al. 2017) to obtain a sub-optimal solution of the formulated problem.
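To make the cost of the centralized exhaustive method concrete, the following Python sketch enumerates every assignment for a tiny instance. It is a simplified illustration, not the formulation in (20): it assumes that each VM instance is placed on exactly one PM, and it uses a hypothetical per-pair cost matrix `cost[n][m]` in place of the weighted sum cost. The number of candidates is M^N, which is why this approach does not scale.

```python
from itertools import product


def exhaustive_placement(cost, cpu_req, mem_req, cpu_cap, mem_cap):
    """Try every VM -> PM assignment; return (best_cost, best_assignment).

    Simplifying assumptions (hypothetical, for illustration only):
    each VM is placed on exactly one PM, and cost[n][m] is the cost of
    placing VM n on PM m.
    """
    n_vms, n_pms = len(cost), len(cpu_cap)
    best_cost, best_assign = float("inf"), None
    for assign in product(range(n_pms), repeat=n_vms):  # M**N candidates
        cpu_used = [0] * n_pms
        mem_used = [0] * n_pms
        for n, m in enumerate(assign):
            cpu_used[m] += cpu_req[n]
            mem_used[m] += mem_req[n]
        # CPU and memory capacity constraints of every PM must hold.
        feasible = all(
            cpu_used[m] <= cpu_cap[m] and mem_used[m] <= mem_cap[m]
            for m in range(n_pms)
        )
        if not feasible:
            continue
        total = sum(cost[n][m] for n, m in enumerate(assign))
        if total < best_cost:
            best_cost, best_assign = total, assign
    return best_cost, best_assign


if __name__ == "__main__":
    cost = [[1, 3], [2, 1], [4, 2]]        # 3 VMs, 2 PMs
    result = exhaustive_placement(
        cost, cpu_req=[2, 2, 2], mem_req=[8, 8, 8],
        cpu_cap=[4, 4], mem_cap=[16, 16],
    )
    print(result)
```

Even for the lower ends of the ranges considered later (N = 200, M = 50), the number of candidate assignments under this simplification is 50^200, which motivates the matching-based approach.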
As noted previously, the binary variable indicates whether one given VM instance is placed on top of one given PM, i.e., it encodes a one-to-one mapping relation constraint for each pair under the virtualized MEC scenario. We remark, however, that this work considers a more complex mapping relation between the VM instances and the PMs by capturing the fact that one VM can technically be placed across multiple different PMs. Therefore, a many-to-many two-sided matching between the VM instances and the PMs is invoked to obtain a sub-optimal solution to the formulated problem in (20).

Preference and preference list
The primary goal of the two-sided matching framework is to optimally match two sets of matching agents, given their individual utilities. In the classical stable marriage matching model, each man or woman is called a matching agent. In our work, a matching agent is either a VM instance or an underlying PM. As in (Bayat et al. 2016), the preference, or preference value, of an agent over other agents can be characterized by a specific utility value that serves as the performance metric of that agent with respect to the other agents. To obtain the matching result of the considered VM placement problem, it is essential to characterize the preference values for both the VM instances and the PMs. For ease of exposition, we first define the preference value of a given VM instance to a given PM as follows.
Definition 1 (Preference Value): the degree of matching between the remaining resources of PM m and the resources required by VM instance n is defined as the preference value of VM instance n to PM m, for n ∈ N and m ∈ M.
Based on Definition 1, we use Ψ_{n,m} and Q_{n,m} to denote the preference values, in terms of the number of CPU kernels and the amount of memory, respectively, of VM instance n placed on top of PM m. The preference values Ψ_{n,m} and Q_{n,m} can be respectively expressed by: where α_m + u_m and β_m + v_m are the total number of CPU kernels and the total amount of memory of PM m, respectively. From (24) and (25), if both preference values Ψ_{n,m} and Q_{n,m} are negative, PM m does not have enough physical resources to deploy VM instance n. Additionally, if the preference values of PM m are negative for all the VM instances, then no VM instance can be deployed on PM m. In that case, an additional PM must be switched to active mode to host the associated VM instances.
Accordingly, the preference matrix, denoted by m , between PM m and the N VM instances can be specified by: where [Ψ_{1,m}, Ψ_{2,m}, ⋯ , Ψ_{n,m}, ⋯ , Ψ_{N,m}] are the CPU-kernel preference values of the VM instances placed on PM m, [Q_{1,m}, Q_{2,m}, ⋯ , Q_{n,m}, ⋯ , Q_{N,m}] are the memory preference values of the VM instances placed on top of PM m, and [·]^T denotes the transpose of a matrix. Since we consider the number of CPU kernels and the amount of memory as the two types of physical resources of every underlying PM, we employ q_1 and q_2 to represent the weights of the CPU kernels and the memory, where q_1 + q_2 = 1 and 0 ≤ q_1, q_2 ≤ 1. As a result, the preference vector m of PM m over the N VM instances can be obtained as follows: where n,m represents the preference degree of VM instance n placed on top of PM m. From (27), we remark that the smaller the preference degree n,m , the more suitable it is to place VM instance n on top of PM m. To enable both the PMs and the VM instances to reach a more satisfactory matching, we then define the notion of stability for this kind of matching, which benefits both the underlying PMs and the VM instances. The formal definition of the stability of a matching is provided in Definition 2 as follows.
Definition 2 (Stability): a matching G is said to be stable if there is no blocking pair (BP). A pair (n, m) is defined as a BP, for n ∈ N and m ∈ M, if and only if all of the following conditions are satisfied:
(1) n finds m acceptable;
(2) either n is unmatched in G, or n prefers m to G(n);
(3) either
(3.1) m is under-subscribed and one of the following three conditions is satisfied: (a) G(n) ∈ M, and n prefers m to G(n); or (b) G(n) ∉ M, and n is under-subscribed; or (c) G(n) ∉ M, and n is full and n prefers m to its current worst partner m_worst; or
(3.2) m is full and m prefers n to its current worst partner n_worst, and one of the following two conditions is satisfied: (a) G(m) ∉ N; or (b) G(m) ∈ N, and m prefers n to G(m).
We remark that, in the classical stable marriage model, a stable matching is defined as a complete matching between a set of men and a set of women in which no BP exists (Gale and Shapley 1962). In Definition 2, G(x) represents the partner of player x in matching G. For conciseness, we write G(n) = m ∈ M. As such, the many-to-many matching G is said to be stable if it admits no BP. To find a stable matching, we first need to establish the preference lists of the matching agents, i.e., the VM instances and the underlying PMs in the considered virtualized MEC scenario. The preference list of an agent is an ordered list, based on its preferences, over those agents of the other set that it finds acceptable (Gale and Shapley 1962).
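The notion of a blocking pair can be illustrated on the simpler one-to-one case of the classical stable marriage model. The following Python sketch is not the many-to-many test of Definition 2; it only checks the one-to-one analogue, and the function name `blocking_pairs` and the agent labels are hypothetical. It assumes that every matched pair is mutually acceptable.

```python
def blocking_pairs(matching, pref_a, pref_b):
    """List the blocking pairs of a one-to-one matching.

    matching: dict mapping each matched agent a to its partner b.
    pref_a[a]: list of acceptable b's, most preferred first.
    pref_b[b]: list of acceptable a's, most preferred first.
    """
    partner_of_b = {b: a for a, b in matching.items()}
    pairs = []
    for a, prefs in pref_a.items():
        current = matching.get(a)          # None if a is unmatched
        for b in prefs:
            if b == current:
                break                      # remaining b's are worse for a
            if a not in pref_b[b]:
                continue                   # b does not find a acceptable
            rival = partner_of_b.get(b)
            # b is free, or b strictly prefers a to its current partner.
            if rival is None or pref_b[b].index(a) < pref_b[b].index(rival):
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    pref_vm = {"n1": ["m1", "m2"], "n2": ["m1", "m2"]}
    pref_pm = {"m1": ["n1", "n2"], "m2": ["n1", "n2"]}
    print(blocking_pairs({"n1": "m1", "n2": "m2"}, pref_vm, pref_pm))
    print(blocking_pairs({"n1": "m2", "n2": "m1"}, pref_vm, pref_pm))
```

In the second call, n1 and m1 each prefer the other to their assigned partner, so the pair (n1, m1) blocks the matching.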
Based on the preference values defined above, let F and S represent the preference lists of the underlying PMs and of the VM instances, respectively. In particular, we denote by F_m the preference list of PM m over the N VM instances. From (24), preference list F_m of PM m ranks each VM instance in the acceptable set W_m = { 1,m , 2,m , ⋯ , n,m , ⋯ , N,m | m ∈ M } in ascending order of preference, with F_m ∈ W_m. Hence, the preference lists of the M PMs can be obtained as F = {F_1, F_2, ⋯ , F_m, ⋯ , F_M}. On the other hand, when selecting the PMs that provide physical resources to a VM instance, we should consider the delay experienced by the SMDs. The SMDs want faster services and require lower delay for their applications. As a result, each SMD is more inclined to run on an underlying PM whose CPU kernels have a higher computing capability, measured in CPU cycles per second. As mentioned before, we employ to represent the average computing capability of one CPU kernel for every VM instance. This CPU computing capability is provided by each PM and refers to the PM's own CPU computing capability. For convenience, we assume that all CPU kernels of a given PM have the same computation capability. Therefore, the CPU kernel computing capabilities of the M PMs can be collected in the set P = { 1 , 2 , ⋯ , m , ⋯ , M } . Meanwhile, the preference list of VM instance n over the M PMs, denoted by S_n, is defined in terms of the computation capability of one CPU kernel of each PM. Preference list S_n is arranged in descending order of the CPU computing capability of the PMs in the acceptable set P = { 1 , 2 , ⋯ , m , ⋯ , M } , with S_n ∈ P. The preference lists of the N VM instances are further denoted by the set S = {S_1, S_2, ⋯ , S_n, ⋯ , S_N}.
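The construction of the two kinds of preference lists can be sketched in Python as follows. The forms used here are assumptions for illustration: the preference values are taken as the resource slack left on the PM (free CPU kernels minus c_n, free memory minus s_n), a VM instance is treated as unacceptable if either slack is negative, and the preference degree is the weighted sum q_1·Ψ + q_2·Q. All function names and numerical values are hypothetical.

```python
def preference_degrees(cpu_free, mem_free, vms, q1=0.5, q2=0.5):
    """Return {vm_index: preference degree} for the VMs that fit on one PM.

    ASSUMED forms: psi = cpu_free - c_n, q = mem_free - s_n,
    degree = q1 * psi + q2 * q (smaller means a tighter, better fit).
    vms is a list of (c_n, s_n) = (CPU kernels, memory in GB).
    """
    degrees = {}
    for n, (c_n, s_n) in enumerate(vms):
        psi = cpu_free - c_n
        q = mem_free - s_n
        if psi < 0 or q < 0:
            continue                      # assumption: VM n does not fit
        degrees[n] = q1 * psi + q2 * q
    return degrees


def pm_preference_list(degrees):
    """Rank acceptable VMs in ascending order of preference degree."""
    return sorted(degrees, key=degrees.get)


def vm_preference_list(cpu_capability):
    """Rank PMs in descending order of per-kernel computing capability."""
    return sorted(cpu_capability, key=cpu_capability.get, reverse=True)


if __name__ == "__main__":
    vms = [(4, 32), (8, 60), (16, 16), (2, 8)]
    f_m = pm_preference_list(preference_degrees(8, 64, vms))
    s_n = vm_preference_list({"m1": 2.0e9, "m2": 3.5e9, "m3": 2.8e9})
    print(f_m, s_n)
```

Because the per-kernel capability does not depend on the VM instance in this sketch, every VM instance obtains the same ordering of PMs.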

Algorithm design
Our objective is to obtain a performance balance between energy consumption and computing/offloading delay for the VM placement on top of the underlying PMs in the virtualized MEC system. It is worth mentioning that each PM seeks the most appropriate VM instances, while each VM instance seeks the PMs with higher CPU computing capability. In addition, a PM can host multiple VM instances, and a VM instance can be distributed across multiple different PMs, which makes this a many-to-many two-sided matching problem. To solve this matching problem and obtain a proper matching result, we refer to the classic student project allocation (SPA) problem. In the SPA problem, students are allocated to projects (offered by different lecturers) with the aid of the lecturers. The projects prefer better students, and the students prefer the projects they like. Due to space limitations, the details of the SPA problem are omitted here, and readers can refer to (El-Atta and Moussa 2009) for a more detailed description. Based on the SPA problem, we let the SMDs, the VM instances, and the PMs play the roles of the lecturers, the projects, and the students, respectively. With the preference lists established above, we can apply the SPA-(S, P) algorithm to find a two-sided matching between the VM instances and the PMs that provide the underlying physical resources. The details of the two-sided matching algorithm via the SPA are presented in Algorithm 1.
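The proposal-and-rejection mechanics behind such matching algorithms can be sketched with a simplified capacity-constrained deferred acceptance procedure in Python. This is not Algorithm 1 and not the SPA-(S, P) algorithm: it assumes that each VM instance is hosted by a single PM, that each PM ranks VM instances with a fixed score `pm_rank[m][n]` (smaller is better), and that an overloaded PM evicts its least-preferred VM instances. With heterogeneous resource demands, this sketch does not guarantee stability in the sense of Definition 2.

```python
from collections import deque


def deferred_acceptance(vm_prefs, pm_rank, cpu_req, mem_req, cpu_cap, mem_cap):
    """Simplified VM-proposing deferred acceptance with PM capacities.

    vm_prefs[n]: PMs acceptable to VM n, most preferred first.
    pm_rank[m][n]: rank of VM n at PM m (smaller = more preferred).
    Returns {vm: pm} for placed VMs; unplaced VMs are absent.
    """
    next_choice = {n: 0 for n in vm_prefs}   # index of the next PM to try
    hosted = {m: [] for m in cpu_cap}        # VMs provisionally held per PM
    queue = deque(vm_prefs)                  # VMs still looking for a PM
    while queue:
        n = queue.popleft()
        if next_choice[n] >= len(vm_prefs[n]):
            continue                         # n has exhausted its list
        m = vm_prefs[n][next_choice[n]]
        next_choice[n] += 1
        hosted[m].append(n)
        # Keep the PM's VMs ordered from most to least preferred.
        hosted[m].sort(key=lambda v: pm_rank[m][v])
        # Evict least-preferred VMs until both capacities hold again.
        while (sum(cpu_req[v] for v in hosted[m]) > cpu_cap[m]
               or sum(mem_req[v] for v in hosted[m]) > mem_cap[m]):
            queue.append(hosted[m].pop())
    return {v: m for m, held in hosted.items() for v in held}


if __name__ == "__main__":
    vm_prefs = {n: ["m1", "m2"] for n in ("n1", "n2", "n3")}
    rank = {"n3": 0, "n1": 1, "n2": 2}
    placement = deferred_acceptance(
        vm_prefs, {"m1": rank, "m2": rank},
        cpu_req={"n1": 4, "n2": 4, "n3": 4},
        mem_req={"n1": 16, "n2": 16, "n3": 16},
        cpu_cap={"m1": 8, "m2": 8}, mem_cap={"m1": 32, "m2": 32},
    )
    print(placement)
```

Each VM instance proposes to every PM at most once, so the loop terminates after at most N·M proposals in this simplified setting.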
We then analyze the computational complexity of the proposed two-sided matching algorithm in Algorithm 1 based on the SPA. The complexity of Algorithm 1 is dominated by the iterations that determine the most appropriate VM instance from the preference list and by the total number of PMs over which these iterations run. In the first stage, which handles the preferences of PM m over the N VM instances, each iteration finds the most appropriate VM instance from preference list F_m. For a VM instance n that has been matched by PM m, we need to compare and select the most appropriate PM according to preference list S_n of VM instance n over all PMs. If VM instance n has already been matched, we need to replace the worst PM that had been matched according to preference list S_n. Hence, the complexity of this stage in Algorithm 1 is O(M + N). In the second stage, the loop over the PMs continues until all M PMs have been processed in Algorithm 1. The computational complexity of the second stage is therefore O(M).
In summary, the overall computational complexity of Algorithm 1 for finding a two-sided matching between the VM instances and the PMs is O(M(M + N)) = O(M^2 + MN), which reduces to O(M^2) when N is of the same order as M. Hence, the proposed two-sided matching algorithm requires only polynomial complexity to find a solution of the optimization problem in (20). We further remark that the complexity of the proposed algorithm grows polynomially with the numbers of PMs and VM instances, which indicates the scalability of the proposed algorithm in the NFV-enabled MEC system scenario with an increasing number of SMDs.

Simulation results and discussions
In this section, we present simulation results to verify our theoretical analysis and evaluate the performance of the VM placement framework presented in this paper. Since it is extremely difficult to conduct repeatable and large-scale experiments for VM placement on a real 5G virtualized MEC platform, we use a simulator developed in Python 3.7 as our simulation framework for performance evaluation. The simulations are run on a desktop PC with 4.80-GHz Intel Pentium Dual Core i7 CPU and 16 GB memory. All the simulation results are obtained in real time through our simulation framework.
As emphasized previously, the most prominent feature of the existing works on VM placement, whether for cloud data centers (Dai et al. 2016; Rampersaud and Grosu 2017; Guo et al. 2018; Gharehpasha and Masdari 2020; Wei et al. 2020) or for the virtualized MEC (Wang et al. 2017; Plachy et al. 2016), is that they assume that a single VM instance must be placed on top of one PM via the Hypervisor or another virtualization software platform. In contrast, our proposed VM placement framework extends the traditional VM placement constraint in a more practical manner to realize the more complex mapping relation between VM instances and PMs in the virtualized MEC system through a newly emerging high-performance virtualization platform, e.g., the vSMP ServerONE. Correspondingly, we mainly study the impacts of the number of PMs and the number of VM instances on the performance of the proposed two-sided matching algorithm, and we further gain insights into how the various system parameters affect the performance metrics in the simulations. Under the given simulation configurations, the performance of our proposed algorithm is numerically demonstrated and discussed in terms of the three optimization metrics most closely associated with the energy-delay tradeoff in this work, i.e., the total energy consumption, the total delay, and the total cost.

Simulation settings
Regarding the simulations, we consider an uplink virtualized MEC system scenario, in which one 5G NR gNodeB integrated with a vMDC serves a given number of SMDs. Without loss of generality, we choose the number of SMDs from the range [50, 200] in steps of 25. All the SMDs are assumed to be randomly distributed within the coverage area of the gNodeB in the real-time simulations. We further assume that each SMD can purchase δ_k = 4 VM instances from the resourceful vMDC for computation offloading.
For simplicity, we only consider two types of physical resources for each PM, i.e., CPU kernels and memory. Similar to many works that evaluated their algorithms or schemes by randomly generating VMs and PMs (Zhang et al. 2019a; Liu et al. 2018), we adopt the same approach throughout the simulations. More specifically, we randomly choose the number of VM instances hosted by the vMDC from the range [200, 800]. For convenience, we set the number of PMs that provide the underlying physical resources according to the range [50, 300] in steps of 50. Different from the traditional standards, we exploit the standard VM instance shapes provided by the Oracle cloud infrastructure to set the specific resources of all the VM instances, i.e., CPU kernels and memory size. In particular, we select a portion of the Oracle VM instance shapes for each VM instance. That is, the number of CPU kernels c_n and the amount of memory S_n for each VM instance are randomly selected from the parameter range of Bayat et al. (2016); Rezvani et al. 2019) and the parameter range of [7, 320] GB, respectively. For the PM's operation in the simulations, we assign the number of available CPU kernels α_m and the amount of available memory β_m of each PM according to the range [6, 60] and the range [480, 2048] GB, respectively. Furthermore, the number of required CPU kernels u_m is set to Bi and Zhang (2018); Chettri and Bera 2020;Filippou et al. 2020), and the amount of required memory v_m is randomly allocated within the range [60, 480] GB; these resources are required by each PM to maintain its own normal operation.
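A random instance in the spirit of these settings could be generated as in the following Python sketch. The memory range of the VM instances and the available CPU and memory ranges of the PMs follow the values stated above; the CPU-kernel range of the VM instances, `vm_cpu_range`, is a hypothetical placeholder, and the uniform integer sampling and the function name `generate_instance` are likewise assumptions made for illustration.

```python
import random


def generate_instance(n_vms, n_pms, seed=0, vm_cpu_range=(1, 24)):
    """Draw a random VM/PM instance.

    vm_cpu_range is a HYPOTHETICAL placeholder for the VM CPU-kernel
    range; uniform integer sampling is also an assumption.
    Returns (vms, pms) as lists of (cpu_kernels, memory_gb) tuples.
    """
    rng = random.Random(seed)              # seeded for repeatability
    vms = [
        (rng.randint(*vm_cpu_range), rng.randint(7, 320))
        for _ in range(n_vms)
    ]
    pms = [
        (rng.randint(6, 60), rng.randint(480, 2048))
        for _ in range(n_pms)
    ]
    return vms, pms


if __name__ == "__main__":
    vms, pms = generate_instance(n_vms=200, n_pms=50, seed=1)
    print(len(vms), len(pms), vms[0], pms[0])
```

Fixing the seed makes each run repeatable, which is convenient when averaging results over many independent realizations.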
In our simulations, the channel coefficients were generated following the quasi-static block fading channel model, and the remaining parameters were configured according to the settings of various works (Zhang et al. 2019a, 2020c; Tan et al. 2018; Bi and Zhang 2018; Wang et al. 2019). All the simulation results in this section are averaged over 1000 independent channel realizations. For clarity, the simulation configurations and detailed parameter settings are listed in Table 2, and they remain unchanged in the sequel unless explicitly stated otherwise.
Figure 4 shows the impact of the number of PMs on the total energy consumption under a given number of VM instances. As can be observed from this figure, the total energy consumption decreases markedly as the number of PMs increases from M = 50 to M = 150. When the number of PMs is larger than 150, the total energy consumption tends to a constant value for the given number of VM instances. This can be explained by the fact that increasing the number of PMs enhances the overall computing capacity of the system, which reduces the total energy consumption owing to the improved utilization of the underlying dedicated physical resources achieved by our proposed two-sided matching algorithm. As shown in Fig. 4, when N = 300 the total energy consumption falls from 3.3878 × 10^4 to 1.6616 × 10^4 and then remains nearly constant. Moreover, the total energy consumption is higher for a larger number of VM instances, mainly because more computation tasks need to be assigned as the number of VM instances increases. The results indicate that the choice of the number of PMs M and the number of VM instances N has a significant effect on the total energy consumption of the virtualized MEC system.
In Fig. 5, we compare the impact of the number of PMs on the total delay under a given number of VM instances. From Fig. 5, it is evident that the total delay gradually decreases as the number of PMs grows from M = 50 to M = 150. The curves of the total delay then become steady and show no further change when the number of PMs is greater than 150. The reason is that a larger number of PMs provides much more computing capability for the offloaded workload, which reduces the total delay, e.g., from 11.9989 to 11.0333 when N = 400. In addition, as the number of VM instances increases, the total delay grows continuously. Clearly, a larger computation workload leads to a higher total delay, e.g., an increase from 5.5842 to 14.0755 when M = 100. Therefore, it is confirmed that the system performance can be improved by flexibly choosing the number of PMs based on the amount of computation tasks offloaded from the SMDs to the vMDC.

Results
As depicted in Fig. 6, we compare the total cost as the number of PMs increases from M = 50 to M = 300, with the weighting factor w set to 0.2, 0.5, and 0.8 and a given number of VM instances. It can be immediately observed that the total cost decreases as the number of PMs increases. Under the different settings of the weighting factor w, the total cost tends to a fixed value when the number of PMs increases from M = 150 to M = 300. This is because a higher number of PMs leads to a faster calculation speed, which reduces both the energy consumption and the delay from a system-level perspective and thereby lowers the total cost. For instance, the total cost decreases from 2.993 × 10^4 to 1.5582 × 10^4 when N = 400 and w = 0.5. Besides, this observation emphasizes the importance of selecting a proper number of VM instances for the total cost. In addition, the total cost under a smaller weighting factor is clearly lower than that under a larger one. This result can be explained by the fact that the total cost obtained with our proposed optimization framework depends on the proper selection of both the weighting factor and the number of VM instances.
Fig. 4 The impact of the number of PMs on the total energy consumption
Fig. 5 The impact of the number of PMs on the total delay
As illustrated in Fig. 7, we examine the impact of the number of VM instances, from N = 200 to N = 800, on the total energy consumption of the system under a given number of PMs. In this figure, a smaller number of VM instances yields a lower total energy consumption than a larger number of VM instances. Meanwhile, the curves become steeper as the number of PMs decreases. Note that the number of VM instances represents the number of computation tasks offloaded from the SMDs to the vMDC and assigned to the PMs. Thus, our proposed two-sided matching algorithm obtains a lower total energy consumption for a smaller number of VM instances, e.g., from 8.8378 × 10^4 to 1.0273 × 10^4 when M = 100. Beyond that, the total energy consumption increases as the number of PMs decreases. That is, the total energy consumption with M = 100 PMs is clearly higher than that with M = 250 PMs. The explanation is that a larger number of PMs can share the computational burden and thus reduce the associated energy consumption.
In Fig. 8, we further study the effect of varying the number of VM instances from N = 200 to N = 800 on the total delay of our proposed VM placement framework. As shown in this figure, the total delay grows with the number of VM instances, as expected. Apart from that, the curves of the total delay under different numbers of PMs lie close to each other. The major reason is that the increasing calculation workload assigned to the PMs by the VM instances places a heavy computational burden on the physical resources. This in turn causes a higher total delay, e.g., a significant growth from 5.5842 to 22.3395 when M = 100. However, although the total delay keeps growing, a lower total delay can be achieved by adopting a larger number of PMs. This indicates that more PMs are required to process the computation tasks in order to reduce the total delay under a given number of VM instances.
Fig. 6 The impact of the number of PMs on the total cost
Fig. 7 The impact of the number of VMs on the total energy consumption
Fig. 8 The impact of the number of VMs on the total delay
From the results of the three sub-figures in Fig. 9, we further evaluate the performance of our proposed VM placement framework in terms of the total cost as the number of VM instances increases from N = 200 to N = 800. It is clear from Fig. 9 that the total cost grows continuously with the number of VM instances, indicating that the considered system becomes less cost-effective when more VM instances are employed. In particular, as the weighting factor increases, the curves vary more distinctly, i.e., the change in the total cost is even more pronounced with a larger weighting factor. Moreover, it can also be observed in the three sub-figures that the total cost decreases steadily as the number of PMs grows. This is because the vMDC then owns more physical resources to process the offloaded computation tasks, which improves the system performance, i.e., reduces the total cost of our proposed framework. For instance, the total cost varies from 7.9285 × 10^3 to 4.2922 × 10^4 with w = 0.5 and M = 200 when the number of VM instances increases from 200 to 800. This result implies that the proposed two-sided matching algorithm responds in a more cost-effective way to an increase in the number of PMs when the weighting factor in our formulated weighted sum based cost minimization problem is properly assigned.
Figure 10 plots the impact of both the number of PMs and the number of VM instances on the total cost under a given normalization factor χ_T for weighting factor w = 0.5. In particular, the normalization factor χ_T is set to three constant values, i.e., 1000, 2000, and 3000. The result is obtained with the number of PMs varying from M = 50 to M = 300 and the number of VM instances varying from N = 200 to N = 700.
It is observed from this figure that the larger the normalization factor, the higher the total cost of our proposed VM placement framework under a given number of PMs and a given number of VM instances. This is because a larger normalization factor gives the total delay a higher weight, which increases the total cost. From the figure, we can also see that, with weighting factor w = 0.5, the total cost increases as the number of PMs decreases and as the number of VM instances grows. The reason is that the vMDC in the considered system can use more PMs to process more offloaded computation tasks, thereby reducing the total cost. However, if we increase the number of VM instances, more physical resources are occupied, so that both the energy consumption and the delay increase accordingly, which leads to a higher total cost. In particular, the total cost falls from 1.5514 × 10^4 to 7.928 × 10^3 with normalization factor χ_T = 1000 and N = 200 VM instances. This implies that our proposed two-sided matching algorithm can offer a significant performance improvement when a proper normalization factor is used.

Summary
The performance of our VM placement framework introduced in Sect. 4 is numerically evaluated. Using our simulator developed in Python 3.7, we quantitatively demonstrate the effects of the number of PMs and the number of VM instances on the performance of the proposed matching algorithm. In particular, we focus on the total energy consumption, the total delay, and the total cost as the performance metrics. Furthermore, we obtain results illustrating the sensitivity of the system performance to the choice of parameters in different scenarios, such as weighting factor w and normalization factor χ_T in the weighted sum based cost function for computing and offloading.
The observations from the above analysis can be summarized as follows:
• As the number of PMs M increases, the total energy consumption follows a substantially downward trend in the VM placement framework under a given number of VM instances N when M ≤ 150.
• The total delay decreases slowly as the number of PMs grows from M = 50 to M = 150. That is to say, the system performance can be improved by flexibly choosing the number of PMs based on the amount of computation tasks offloaded from the SMDs to the vMDC.
• For the same number of VM instances N, the total cost clearly decreases as the number of PMs M increases in the virtualized MEC system. Moreover, the total cost under a smaller weighting factor w is clearly smaller than that under a larger one.
• For the same number of PMs M, a smaller number of VM instances N yields a lower total energy consumption than a larger number of VM instances N.
• As the number of VM instances N increases, the total delay follows an upward trend for the same number of PMs M.
• The total cost grows continuously when the number of VM instances increases from N = 200 to N = 800. More particularly, as the weighting factor w increases, the total cost follows an upward trend for the VM placement framework.
• For the same weighting factor w, the larger the normalization factor χ_T, the higher the total cost of our proposed VM placement framework in the virtualized MEC system under a given number of PMs M and a given number of VM instances N.

Conclusion
In this paper, we proposed a VM placement optimization framework for achieving a performance balance between energy consumption and computing/offloading delay in the virtualized MEC system. This framework incorporates a more complex mapping relation between the VM instances and the PMs than the existing works. Accordingly, we formulated a weighted sum based cost minimization problem as a pure 0-1 integer linear programming problem. The formulated optimization problem is NP-complete and computationally intractable, especially as the network scale increases. To tackle this problem with affordable computational complexity, we transformed the VM placement problem into a many-to-many two-sided matching problem. We further developed an effective algorithm based on the SPA-(S, P) algorithm to obtain a sub-optimal solution of the formulated problem. Finally, we validated and evaluated the performance of our proposed framework through numerical analysis under different numbers of PMs and VM instances.