Bayesian Optimization of Queuing-Based Multichannel URLLC Scheduling

This paper studies the allocation of shared resources between ultra-reliable low-latency communication (URLLC) and enhanced mobile broadband (eMBB) in the emerging 5G and beyond cellular networks. In this paper, we design a unique queuing mechanism for the joint eMBB/URLLC system. The aim is to flexibly schedule URLLC traffic to enhance the total eMBB throughput and the reliability of URLLC packets (i.e., the probability of not dropping URLLC packets in each mini-slot) while maintaining a satisfactory transmission latency as per the 3GPP requirements. Precisely, by deriving the steady-state probabilities of URLLC queue backlog analytically, we formulate a stochastic optimization problem to maximize the total normalized eMBB throughput and the URLLC utility. Due to the stochastic nature of the objective function, it is expensive to evaluate it for any set of inputs, and thus the Bayesian optimization is applied to obtain the optimal results of such a black-box objective function. Numerical results demonstrate that the proposed queuing mechanism never violates the latency requirement of the URLLC services but improves the reliability. It also enhances the total normalized eMBB throughput as compared to the method without queuing.

, [3], [4], [5]. Ultra-reliable low-latency communications (URLLC) and enhanced Mobile Broadband (eMBB) services are two main categories that have been introduced in the 5G new radio. In the application of the former service, 3GPP has specified that each URLLC packets should be served in a very short time (i.e., up to one millisecond) with the packet error rate of less than 10 −5 , or even down to 10 −9 [6]. With such strict requirements, the possible applications of URLLC include intelligent transportation, industry automation and generally mission-critical services [7]. While URLLC has been characterized by low latency and ultra reliable requirements, the eMBB service targets high data rates to bring a faster and better user experience. The target data rates for eMBB services usually are higher than 100Mb/s [8].
When transmitting the eMBB and URLLC traffic in the same frequency-time spectrum, different time scales are applied to optimize the resource blocks allocation [9]. The eMBB services often occupy a more significant proportion of the transmission resource blocks, and the infrequent URLLC traffic is normally served spontaneously [10]. Thus, each eMBB packet can be coded over a longer time slot and each URLLC packet is served in shorter mini-slot to achieve the low latency requirement. Besides, 3GPP proposed a resource allocation and puncturing scheme for the URLLC/eMBB coexistence system [11]. In such puncturing scheme, the transmission channel suspends the ongoing eMBB packets in order to transmit URLLC packets. Specifically, when the channel senses an incoming URLLC packet, it immediately punctures the eMBB traffic in the next mini-slot, or the URLLC packet will be dropped as a loss. The authors in [12] elaborated the benefits of using the puncturing scheduling policy on the joint system of eMBB and URLLC, which are considering the sporadic and random characteristics of the URLLC traffic and transmitting such latency-critical service upon the ongoing eMBB transmissions to enhance the reliability. However, such a scheduling technique influences the quality of services of the eMBB services. In more detail, the utility of eMBB services (i.e., the data rate or total throughput) is directly influenced by the number of preemption URLLC packets and more puncturing on the eMBB channels could cause the decoding of eMBB to fail.

A. Relevant Works
With the preemptive puncturing policy, the current studies focus on the optimization problems that maximize the 1536-1276 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
allocation of URLLC packets among eMBB traffic within the target latency while maintaining the utility of eMBB users. Some researchers assumed the URLLC traffic has a higher scheduling priority than eMBB traffic. For example, the work in [13] considered a risk-sensitive model for eMBB transmission and optimized the URLLC reliability as a chance constraint which was relaxed based on Markov's inequality. Besides, in [12], the authors showed the advantage of eMBB-aware URLLC scheduling decisions. Such a method primarily punctures eMBB transmission, which caused the low modulation and coding scheme index. Some other researchers attempted to jointly schedule URLLC with eMBB services. In [14], the authors considered three different loss models for eMBB rate, i.e., linear, convex and threshold. They maximized the eMBB utility under different loss models while guaranteeing the URLLC requirements. Considering the different distributions of URLLC traffic, [15] formed a chance-constrained optimization problem of the joint eMBB/URLLC system, and the cumulative distribution function (CDF) was applied to transform the chance constraint into a linear constraint. Their proposed algorithm showed a fair allocation of resource blocks to eMBB users.
Some researchers implemented machine learning to solve the optimization of the joint eMBB/URLLC system. For instance, in [16], the authors used a flexible transmission interval selection policy and applied supervised learning to the joint optimization. However, their studies scheduled eMBB and URLLC unevenly (i.e., scheduled URLLC in advance). The authors in [17] proposed a long-term monitor system under the changing environment, which balanced the cost of eMBB traffic and the gain of URLLC services. Remarkably, the previous research has also shown that deep reinforcement learning (DRL) has significant advantages in enhancing the eMBB requirements in such a coexistence system. E.g., In [8], the authors proposed a model-free DRL to minimize the impact of puncturing on eMBB users during the URLLC transmission. The deep deterministic policy gradient based method was used in their system to solve the continuous action domain for the resource scheduling. Besides, [18] reduced the impact of URLLC immediate scheduling policy on eMBB utility while considering the variance of eMBB data rate. The DRL-based model intelligently assigned the URLLC packets among the eMBB users.
The existing studies have already achieved significant outcomes on the design of URLLC scheduling policy when it coexists with eMBB traffic, e.g., [12], [13], [14], [15], [16], [17]. However, in general, all of the works mentioned above assumed that the incoming URLLC packets could only either be served or dropped immediately at the beginning of each mini-slot, which only ensured the URLLC latency requirement but caused packet losses to the URLLC. The work in [19] considered a queuing mechanism for URLLC traffic, where each transmission channel had an individual queue. It is assumed that the packet arrivals are evenly distributed between different queues. The base station (BS) makes the puncturing decision of each eMBB channel separately in turns at the beginning of each time slot. The even distribution of packet arrivals and separate decision making of different channels would lead to more drops depending on the channel conditions. Therefore, in this paper we propose a new queuing scheme with a single queue wherein the dynamic scheduling of the multi-channel system could be managed jointly and hence more effectively. We have implemented the results of [19] as a benchmark and verified the effectiveness of the new scheduling scheme as compared with [19].

B. Motivation and Contributions
In this paper, we propose a novel queue-based allocation mechanism for the URLLC traffic in a multi-channel system and aim to improve the URLLC reliability within the target latency by simultaneously queuing/transmitting multiple URLLC packets in each mini-slot over multiple channels. The main contributions of this paper are summarized as follows: • We study the URLLC scheduling in a coexisting URLLC/eMBB downlink transmission system. In such a joint system, we assume the BS has already allocated the radio resources among the eMBB users on different channels at the beginning of each eMBB time slot before scheduling the URLLC packets. We propose a M/G/∞ queuing mechanism for infrequent URLLC traffic and derive the expressions that analyze the URLLC queuing mechanism, including the number of queuing packets and the instantaneous mean latency of the served URLLC packets. The differences between the analytical and numerical results are below 10 −3 . The results also show that the proposed queuing mechanism can wisely adjust the number of URLLC queuing packets to balance the trade-off between the reliability and latency requirements of URLLC services. • We apply the puncturing scheduling for the joint eMBB/URLLC system and propose an objective function that maximizes the total normalized eMBB throughput and the URLLC utility among all possible system states. The URLLC utility is quantified as the successful transmission probability of URLLC traffic with no drop over all channels which considers the probabilities of successful transmission of URLLC traffic on each channel and the probability of not dropping URLLC packets. However, it is complicated to derive the closedform of the steady state probabilities by analyzing the dynamic of the URLLC queue. The standard optimization techniques cannot be used in the case of solving a blackbox objective function, thus, in this paper, we apply the Bayesian optimization (BO) algorithm to solve the optimization problem. Besides, we use the additive structure with the BO algorithm to reduce high dimensions by decomposing the original search space into several sub-spaces. We implement different acquisition functions of the BO algorithm and use the random search as a comparison to show the better performance of BO. The results show that the BO algorithm can quickly reach the optimal point with a limited training set. • We evaluate the performance of the proposed algorithms by comparing them with two benchmarks. One is the optimization problem for the case of without the queuing scheme, another is the proposed mechanism in [19]. Based on the results, firstly, our one queue scheme shows the flexibility of the URLLC scheduling compared to [19], where the scheduling of the URLLC is rigid. Beside, the scheme without queuing is more sensitive to the total number of channels and the value of URLLC arrival loads. Without enough channels, the mean value of total eMBB throughput of the no queuing mechanism is significantly lower than the queue-based scheme. Moreover, we define the URLLC outage probability as the probability that there exists dropped URLLC packets in a mini-slot. A slight increase in the number of transmission channels can significantly improve the URLLC outage probability of the proposed queuing scheme. However, the URLLC outage probability of the scheme without queuing hardly meets the 3GPP requirements, even with a large number of channels. These results show that our proposed queuing system can enhance both the total normalized eMBB throughput and URLLC requirements.

C. Organisation
The rest of this paper is organized as follows. Section II introduces the multi-channel scheduling policy for URLLC with queuing in coexistence with eMBB. Moreover, the working principles of the URLLC queuing mechanism are explained and an optimization problem is formulated to find the optimal puncturing weights. In Section III, the dynamics of the proposed queuing are mathematically analyzed aiming to characterize the achievable performance of URLLC and eMBB traffic depending on the puncturing weights. Section IV reviews the BO framework with additive structures and focuses on developing an algorithm based on BO to optimize the URLLC scheduling. The detail of using the additive structure algorithm to reduce the high-dimension problem into a serial of low-dimensional sub-problems is also explained. Section V presents the numerical results to evaluate the performance of the queuing mechanism and the developed algorithm for URLLC scheduling. Finally, the conclusions are given in Section VI.
Notations: We use E[.] and P[.] to denote the expectation and the probability respectively. N and N + are represented as the set of natural numbers and positive integers, respectively.

A. System Model
We consider a downlink wireless system that contains a base station (BS) to simultaneously provide eMBB and URLLC services. The system frequency resources are divided into B channels of equal bandwidth. We study the URLLC scheduling over a set of channels B = {1, 2, . . . , B} in coexistence with eMBB traffic. These channels are primarily scheduled for eMBB transmission and will be shared with URLLC traffic adopting a puncturing policy as proposed in [14]. The puncturing policy will be optimally designed to simultaneously minimize the eMBB throughput loss due to URLLC transmission and enhance the URLLC reliability measure. For satisfying the URLLC latency requirement, it is assumed that  1. Scheduling policy for URLLC in the joint eMBB/URLLC system with the proposed M/G/∞ queuing mechanism. The served URLLC packets are shown in red rectangles at each mini-slot. New arriving and queue backlog packets in mini-slot m are shown in black and yellow colours, respectively. the system operates on two different timescales as proposed by 3GPP [20], i.e., the eMBB time slot t, at the beginning of which the eMBB users are scheduled, and mini-slot m to schedule the sporadic URLLC packets. Each of the eMBB time slot is divided into M equal size mini-slots. The set of mini-slots is represented by M = {1, 2, . . . , M}. The frequency-time spectrum of the downlink transmission system on different scheduling intervals is illustrated in Fig. 1 along with Table I listing the notations used in this work.
We assume block fading for eMBB users where the channel state of an eMBB user at one frequency channel remains constant during one time slot. The set of possible channel rates for an eMBB user is represented by C embb = {C embb,1 , C embb,2 , . . . , C embb,N embb c }, where N embb c is the cardinality of the set C embb . Let S embb = [s b ] be the 1 × B vector of the system state of eMBB that represents the instantaneous channel rates of eMBB users over B channels at current time slot, where s b ∈ C embb denotes the channel rate of the eMBB user transmitting at channel b ∈ B. Moreover, we consider the probability of successful transmission as the system state of URLLC. The set of possible probabilities of successful transmission for URLLC wireless channels is denoted by , ∀s s s ∈ S. The wireless system undergoes the channel variations in each eMBB slot which are modeled as an independent and identically distributed random process over the set of C eMBB and C urllc . Thus, our scheduler is assumed to know (or able to estimate over time) the distribution of system state (i.e., P s (s s s)) across the eMBB slot.
The URLLC transmissions are determined and realized by the puncturing weights selected in different channels given s s s. Suppose W W W = w s s s b , ∀b ∈ B, ∀s s s ∈ S is the B × N s matrix of URLLC puncturing weights, where w s s s b ∈ [0, 1] is the weight of puncturing on the channel b in system state s s s. We assume that the puncturing weights are the same over different minislots of one time-slot and hence time-homogeneous over one time-slot. The realization of each puncturing weight is random based on the variable π s s s b,m , which is generated using a Bernoulli distribution with the success probability of w s s s b at the beginning of each mini-slot. If π s s s b,m = 1, a URLLC packet will be overridden upon the scheduled eMBB transmission on channel b in mini-slot m when the system state is s s s. We assume that each channel can transmit at most one URLLC packet per mini-slot. Otherwise, π s s s b,m = 0 and channel b will not be punctured by any URLLC packet in mini-slot m when the system state is s s s. Given π s s s b,m , the total number of mini resource blocks allocated to URLLC puncturing over all channels in mini-slot m can be computed as

B. URLLC Scheduling Policy With Queuing
We propose the multi-channel scheduling for the URLLC services with one M/G/∞ queue, which can simultaneously serve multiple URLLC packets at the beginning of each mini-slot to be punctured over different channels depending on π s s s b,m . In particular, M means we consider Poisson URLLC arrivals, G means that service times have a General distribution, and ∞ specifies that the queue length is considered infinite. As supported by [20], we adopt Poisson URLLC packet arrivals that enable us to analytically derive the expected number of served URLLC packet and the probability of no drop. Moreover, we consider infinite queue length in this work so that we could analyze the number of URLLC packet drops that caused by the delay of URLLC scheduling rather than the limitation of the queue.
The proposed queuing mechanism enhances the URLLC reliability by not immediately dropping unserved packets and keep them in a queue while meeting the strict latency requirements. 3GPP has suggested an extreme latency requirement for URLLC, i.e., each URLLC packet should be served within 0.3 milliseconds [11], [20], [21]. In addition, 3GPP advises that the length of each URLLC mini-slot is 0.125 milliseconds. Accordingly, we establish the working principle of URLLC queuing mechanism as the URLLC packets can wait in the queue until next mini-slot if it cannot be transmitted immediately. If a URLLC packet has already queued for one mini-slot and will not be transmitted in the next mini-slot, it will be dropped from the queue and counted as a loss.
We assume that URLLC packets have the same fixed packet size and randomly arrive in each mini-slot. Let A m denote the number of URLLC packets that arrive at the queue line between mini-slot m − 1 and mini-slot m.
Suppose PMF of A m is denoted by p a (y) as where p a (y) ∈ [0, 1] and y∈N p a (y) = 1. Let λ = E A m denote the average URLLC arrival rate (in units of packets/mini-slot).
Let Q s s s m−1 be the number of URLLC packets already stored in the queue before the mini-slot m in system state s s s. Thus, based on the working principle of the queuing mechanism, if in mini-slot m system state s s s, the number of served URLLC packets μ s s s m and the number of dropped URLLC packets D s s s m can be derived as Accordingly, in system state s s s ∈ S, the one-step queuing equation for the queue link is expressed as Fig. 2 illustrates an example on how the URLLC queuing mechanism operates in mini-slots. Suppose in this case that the number of provided channels is not sufficient to transmit the URLLC backlog packets in mini-slot m, i.e., C s s s m < Q s s s m−1 . The reserved queue backlog will be dropped and the newly arrived packets will be backlogged in the queue backlog, i.e., Q s s s m = A m . In the second case, when the system provides enough transmission opportunities for URLLC queue backlogs and there are some surplus opportunities for the newly arrived packets, there will be no drops and the queue backlog in the next mini-slot becomes the reserve of newly arrived packets.

C. Problem Formulation
Here, we formulate the URLLC multi-channel scheduling problem as a stochastic optimization problem that optimally determines the puncturing weights for all channels and all system states. Since the URLLC packets will be punctured on the ongoing eMBB transmissions, the URLLC scheduling policy should strike the balance between mitigating the adverse impact of URLLC puncturing on the average eMBB throughput and satisfying the URLLC requirements. Therefore, we formulate an optimization problem aiming to maximize the total normalized eMBB throughput and URLLC utility. The URLLC utility is quantified as the successful transmission probability of URLLC traffic with no drop over all channels in the system state s s s ∈ S.
Accordingly, the optimization problem is formulated as: where U s s s embb denotes the total normalized eMBB throughput and U s s s urllc represents URLLC utility over all channels in system state s s s ∈ S. The constraint (6b) ensures that the value of puncturing weights could only vary between zero and one.
The URLLC utility U s s s urllc is defined as the product of the probability that no URLLC packet would be dropped in each mini-slot (i.e., P(D s s s m = 0)) and the successful transmission probability of URLLC traffic over all channels in system state s s s ∈ S. Thus, U s s s urllc is expressed as The total eMBB throughput in system state s s s ∈ S is defined as the total eMBB rate over all channels (i.e., b∈B s b ) scaled by that is the ratio of one time slot the eMBB used excluding URLLC transmissions. In particular, w s s s b E [μ s m ]/ b∈B w s s s b represents the expected number of minislots punctured for URLLC service on channel b to the total number of mini-slots in one eMBB slot in system state s s s ∈ S. Therefore, the total normalized eMBB throughput in system state s s s ∈ S can be calculated as By solving the optimization problem (6) and obtaining the optimal puncturing weights, the proposed scheduling policy enhances the URLLC reliability by maximizing U s s s urllc in (6), while the URLLC latency requirement is supported by the URLLC queuing mechanism. To solve (6), we need to derive E [μ s m ] and P(D s s s m = 0) by studying the queue dynamics. The next section provides the queuing analysis to establish the objective function as the function of W W W .

III. PERFORMANCE ANALYSIS WITH QUEUING
The optimization equation in (6) contains two metrics that demonstrate the efficacy of the URLLC scheduling policy when coexisting with eMBB services, i.e., the total normalized eMBB throughput and URLLC utility. In this section, we mathematically analyze the URLLC queuing mechanism in order to derive the objective function as a function of W W W .

A. eMBB Throughput Analysis
Here, we start the analysis by studying the eMBB throughput in (8). This requires the analysis of x where x represents the number of served URLLC packets, x ∈ {0, . . . , B}. Accordingly, the expected number of served URLLC packets in mini-slot m system state s s s ∈ S can be expressed as x · p s s s,m μ (x).
Therefore, the first step to find the expected value of served URLLC packets is deriving the PMF of served URLLC packets, i.e., p s s s,m μ (x). Considering (3), the event of μ s s s m = x occurs when either (C s s s where F C contains all the combinations that choose x channels out of B. Also, K C and K C denote possible subsets of F C and its complementary set, respectively. Moreover, δ ∈ K C denotes the channels that are chosen to transmit URLLC packets and δ ∈ K C denotes the channels in which no URLLC puncturing happens. According to (11), note that p s s s c is time-homogeneous across mini-slots in one eMBB slot and does not depend on m. This is because the puncturing weights are also timehomogeneous.
Moreover, the probability of P(Q s s s m−1 + A m = x) can be written as where p a (y) = P(A m = y) is the PMF that the number of the newly arrived URLLC packets is A m = y as introduced in (2) and p s s s Q (x − y) represents the steady state probability of the URLLC queuing backlog. In particular, Based on (12), the next step of deriving E μ s m in order to obtain the total normalized eMBB throughput is to find the steady state probability of the URLLC queue backlog.

B. Steady State Probability of the URLLC Queuing Mechanism
To obtain the steady state probabilities of the URLLC queue backlog, we apply the Markov chain model considering the transition probabilities among them [22], [23], [24]. In particular, we assume that the state of the Markov chain represents the number of URLLC packets backlogged in the queue that can change from zero to infinity. State zero represents the empty queue. The state transition probability matrix is denoted by and, ∞ k=0 ψ s s s n,k = 1.
The steady-state solution of the Markov chain in system state s s s ∈ S, denoted by p s s s Q (q), q ∈ N, can be calculated as [22] p s s s where the output of the function F (A) is the matrix A with its right-most column replaced with all 1's. I is the identity matrix of size q, 0 q,q denotes the zero matrix of size q × q. Proposition 1: The time-homogeneous transition probabilities of the URLLC queuing mechanism given s s s ∈ S over eMBB slot can be computed as: Proof: Given s s s ∈ S, the steps of finding the transition probabilities ψ s n,k , ∀n, k ∈ N can be divided into two cases as follows:  Table. II part (a)). In this scenario, the number of provided mini-slots for URLLC packets is always less than the number of queue backlog (i.e., Consequently, the newly arrived URLLC packets will be queued and hence Q s s s m = A m . Therefore, the transition probability of such a scenario in system state s s s ∈ S can be calculated as The second scenario of Case 1 is when B > Q s s s m−1 , but the number of provided resource blocks is no larger than the number of queuing packets (i.e., C s s s m ≤ Q s s s m−1 ≤ B as shown in Table. II part (b)). In this scenario, the next queue state only depends on the value of A m . The transition probabilities of such a scenario can be calculated as To sum up, the transition probabilities of the two scenarios for Case 1 can be combined as the first term in (17), i.e., p a (k) min{B,n} x=0 p s s s c (x), for any k ∈ N. In Case 2, the number of provided mini-slots is larger than the number of URLLC backlog packets, and the BS starts to allocate mini-slots for transmitting the newly arrived packets, i.e., Q s s s m−1 < C s s s m ≤ B as shown in Table. II gray parts. Let θ, θ ∈ {1, . . . , B − n} denote the number of provided mini-slots for transmitting the newly arrived URLLC packets. In this case, the number of provided mini-slots is expressed as C s s s m = n + θ, C s s s m ∈ {n + 1, . . . , B}. In Case 2, the queuing mechanism will not drop any URLLC packets from the queue backlog (i.e., D s s s m = 0). If the number of newly arrived URLLC packets is larger than θ, the remnant URLLC packets will be queued in the next mini-slot, i.e., Q s s s m = A m − θ > 0, k ∈ N + . Otherwise, all the new arriving URLLC packets will be transmitted and the next state goes to state zero, i.e., Therefore, with the transition probability equation in (17), we can compute the steady-state probability of the queue based on (16). Having the steady-state probabilities of the queue, the PMF of the number of served packets can be derived based on (10), (11), and (12). Consequently, by substituting (10) into (9), E μ s m can be found.

C. No-Drop Probability of URLLC Packets
The second performance metric of the URLLC scheduling policy is the URLLC utility, which depends on the probability of not dropping URLLC packets in each mini-slot based on (7).
where, p s s s Q (q) and p s s s c (x) denote the steady state probability of the URLLC queuing backlog and the PMF of provided minislots, as introduced in (16) and (11), respectively. Accordingly, the no-drop probability of URLLC depends on the distribution of URLLC arriving packets and eMBB puncturing weights and thus is the same over different mini-slots of one time-slot (i.e., time-homogeneous over one time-slot).
Proof: The probability of not dropping URLLC packets can be calculated as Based on (21)-(23), the no-drop probability is updated as (20). This completes the proof.

IV. BAYESIAN OPTIMIZATION FOR URLLC SCHEDULING
With the mathematical analysis of the queue dynamics, it becomes apparent that finding the closed-form of the objective function in (6) is infeasible because there is no closed-form solution for the steady-state probabilities of the URLLC queue. In such a case, the optimization problem can be solved as a black-box function where the inputs are the puncturing weights. For each input, the black-box function can be computed. However, the computational cost of acquiring the black-box function for each possible input is high since the representation of the objective function and its derivatives are uncertain and unknown [25], [26]. In other words, during the evaluation, only the outputs of the black-box function for each input will be required and there is no need to analyze the first or second-order derivatives. This is also called "derivativefree" problem, which prevents the use of first and second-order derivatives in optimization algorithms, as required in gradient descent, Newton's, and quasiNewton methods [27]. Thus, it has always been challenging to shorten the evolutionary computation time for stochastic black-box optimization.
Recently, many researchers have been focused on solving the black-box optimization based on the BO algorithm [25], [28], [29], which is an efficient optimization method to approach a global maximum (or minimum) of an unknown objective function by iteratively developing a global statistical model of it. BO algorithm has two critical elements that need to be chosen appropriately for the different objective functions [29], [30]. The first one is the surrogate model, i.e., a substitute function used to hypothesize the optimized black-box function.
Starting with a prior model over the function and a likelihood, at each iteration the surrogate model can compute the posterior distribution by conditioning on the previous evaluations of the objective function. Then, the second critical element of BO, acquisition function, is constructed, which maps the belief of the objective function by measuring the foreground of each position in the input space. After that, the goal is to find the input that maximizes the acquisition function and uses it as a new sampling point for the black-box function evaluation. BO puts the new input in the black-box function to obtain the corresponding output, the surrogate model is then updated. This completes one iteration of the BO loop.
In this section, we first introduce the fundamentals of the surrogate model and acquisition functions. After that, the principles of additive structure to solve high dimensional BO problems will be presented. The proposed solution of (6) using the BO and additive structure algorithms will also be explained.

A. Surrogate Model
We employ the Gaussian process (GP) regression [31] for the surrogate model that provides a Bayesian posterior probability distribution of the objective function for different values of input considering a prior with multivariate Gaussian distribution. GP is a commonly used surrogate model for BO since it can efficiently and effectively summarize a large number of functions, smooth the transition with plenty of observations and thus better quantify the uncertainty in the surrogate model. Besides, the elegant marginalization of the Gaussian distribution contributes to calculating the edges and conditions of the closed-form.
Suppose the black-box objective function in (6) is denoted as where ρ 0 is the prior mean function, which is set to zero as GPs are flexible enough to model the mean arbitrarily well [32]. Moreover, K 0 (W W W In this paper, κ 0 (W W W 1:r , W W W 1:r ) follows the squared exponential covariance function, which specifies the covariance between each pair of input variables as BO combines the prior beliefs about f (W W W ), updates the priors with the training set, and finally obtains the posterior to approximate the black-box function. Given the observations and Gaussian prior, the posterior distribution of f (W W W ) conditioned on N (r) can be obtained as [27], [32] where ρ r (W W W ) and σ 2 r (W W W ) denote the posterior mean and variance, which are calculated as and, Based on the posterior distribution, in BO, an acquisition function will be constructed to quantify the evaluation of the objective function at a new sampling point W W W . In other words, the acquisition function is computed based on the current posterior distribution, then, the next sampling point for observation and evaluation is selected by the maximizer of the acquisition function over the objective domain.

B. Acquisition Function
The acquisition function of BO is used to propose the next sampling point in the search space where there is a tradeoff between exploitation and exploration. The goal of both exploitation and exploration is to maximize the acquisition function to determine the next sampling point. However, the former refers to sampling in places where the surrogate model predicts a high objective, and the latter refers to sampling in places where the prediction uncertainty is high.
Given the current observations N (r), in each iteration the position of the next evaluation point can be decided by maximizing the acquisition function (i.e., denoted as g(.)). In particular, the next sample will be selected as The standard acquisition functions developed for BO are Expected Improvement (EI), Probability of Improvement (PI), and upper confidence bound (UCB) [27], [29]. The performance of the acquisition function depends on the choice of the surrogate model. The derived expressions of the above three commonly used acquisition functions based on the GP hyperparameters and previous observations set N (r) are presented as follows. 1) Expected Improvement: The principle of the Expected Improvement algorithm is to maximize the expected improvement over the current best [27], [33], and the expression of Expected Improvement is defined as where f * r = max j≤r f (W W W j ) is the largest observed value in N (r). Under GP, the expression of (33) is analytically computed based on (29) as . Moreover, Φ and φ are the CDF and Probability Density Function (PDF) of the standard normal distribution, respectively. Parameter ξ ≥ 0 in (34) determines the amount of exploration during optimization [28], [34].
With higher values of ξ, the acquisition criterion selects the next sampling point which has a larger surrogate variance (i.e., σ 2 r (.) in (31)). Conversely, with lower values of ξ, the acquisition criterion results in selecting the next sampling point with a larger surrogate mean value (i.e., ρ r (.) in (30)).
2) Probability of Improvement: The strategy of Probability of Improvement is to maximize the probability of improving the best current value [35]. Under GP, this algorithm is analytically computed as 3) Upper Confidence Bound: The idea of the upper confidence bound is to maximize the regret in the optimization process [27], [36]. Under GP, the upper confidence bound is analytically computed as In the simulation section, we implemented all the acquisition functions above to compare the performance among them.

C. Additive Structure
BO is a common technique to optimize an expensiveto-evaluate d-dimensional black-box objective function [30]. However, the algorithm meets challenges when the input space of the objective function is larger since the reliable search and estimation of the surrogate model and acquisition functions in higher dimensional spaces may require more evaluations. In this paper, we apply the additive structure to reduce the high dimensional input space of BO; this technique is an advanced algorithm that can accelerate the search and optimization efficiently [37], [38].
The additive structure assumes that the black-box objective function is an additive function as a sum of sub-functions of all combinations of lower dimensional coordinates. It has been proved in [38] that an additive function only depends linearly on d even though it depends on all d input dimensions. Thus, the working principle of the additive structure in this paper is dividing the high dimensional input space into multiple sub-spaces, so that the number of variables in each sub-space is small, e.g., less than twenty. Accordingly, this technique decomposes the original black-box function in to multiple sub-functions. Since the acquisition function can be optimized component-wise, for each sub-black-box function, we carry out the basic BO optimization, and update the surrogate model and acquisition function in each sub-space.
where w w w s s s 1:r and f s s s (w w w s s s 1:r ) indicate the input sequence and corresponding outputs. The prior mean function is set to zero function, i.e., ρ s s s 0 (w w w s s s 1:r ) = 0 0 0, ∀s s s ∈ {s s s 1 , . . . , s s s Ns }. Also, K s s s 0 (w w w s s s 1:r , w w w s s s 1:r ) is the covariance matrix, which follows the squared exponential covariance function as shown in (28). Then, the mean function and covariance matrix of the original black-box function in (25) is obtained as [38] The decomposed objective functions are independent, and thus the corresponding surrogate model and acquisition functions are updated in the decomposed sub-space domains with the corresponding posterior mean and variance.

D. Proposed Solution
Here, the proposed solution for (6) using BO and the additive structure is presented. As shown in Algorithm 1, we first create the initial training set N (r) with r training samples, which includes the input data and the corresponding output data under f (W W W ). Each input is randomly generated puncturing weights over all channels and possible system states. We decompose the search space, initial training set, and the original objective function into N s parts. From step 5 to step 9 in algorithm 1, we find the next sampling point in each sub-space. As step 8 shows, with the posterior mean and variance of each sub-function, we apply the acquisition function in the sub-space to maximally determine where to  ∈ {s s s 1 , . . . , s s s Ns } do 6: Find the GP prior distribution of f s s s (w w w s s s ).

7:
Find the posterior mean and variance of f s s s (w w w s s s ) using N s s s (r) and update the posterior distribution. 8: Find the next sampling point as w w w s s s r+1 = arg max w w w s s s g(w w w s s s |N s s s (r)).

9:
Obtain output sample, i.e., f s s s (w w w s s s r+1 ). 10: end for 11 sample the next sub-objective function. The corresponding output data in each sub-space is found by putting the obtained sampling data in step 8 into the sub-objective function. Then the new sampling data for original objective function is obtained as shown in step 11 and 12. After updating the training set, we iterate between step 2 and step 14 until reaching a maximum number of iterations, i.e, R.

E. Computational Complexity
It has been proved that the computational complexity of the BO algorithm is an order of O(r * 3 ), where r * is the number of samples evaluated until convergence [39], [40]. This computational complexity is dominantly due to the computations required to find the posterior. The computation of the posterior requires the computation of the inverse of the kernel matrix, whose size depends on the number of sample evaluations. More specifically, in the derivation process of ρ r (W W W ) in (30) and σ 2 r (W W W ) in (31), the computational complexity is dominated by the matrix inverse K −1 0 (W W W 1:r , W W W 1:r ) in (28). Based on [37], [38], [39], [40], the standard methods for matrix inversion of positive definite symmetric matrices require computational time complexity O(r 3 ) for inversion of an r by r matrix. Therefore, it can be concluded that the complexity of the proposed algorithm only grows cubic in the number of sample evaluations r * . With the additive structure, the proposed solution in Algorithm 1 further enhances the time efficiency of BO by decomposing the black-box function into several parts and optimizing them in parallel. This will reduce the number of sample evaluations required for convergence.
Besides, the proposed scheme is suitable for timecritical applications. Since this paper considers a stochastic optimization of URLLC scheduling, the algorithm finds an optimal scheduling solution for each possible system state combination. With the optimized results for the system states distribution, the BS could choose the eMBB puncturing weights at the beginning of each decision slot based on which rates the channels undergo. Thus, the complexity of the proposed algorithm does not directly influence the URLLC latency requirement.

V. RESULTS
In this section, we first validate the accuracy of the expressions that analyze the URLLC queue in Section III. The results also show the influence of the queuing mechanism in URLLC requirements. After that, we compare the optimization ability of different acquisition functions with a limited initial training sets. The optimization results of proposed solution are presented in terms of the total normalized eMBB throughput and the utility of URLLC. We evaluated the performance of our proposed queuing mechanism by comparing it with two benchmarks, i.e., the immediate dropping policy in most of the existing works [12], [13], [14], [15], [16], [17] and the scheduling scheme in [19]. After implementing the joint eMBB-URLLC systems of the benchmarks, the results show that the proposed queuing mechanism in this paper not only enhances the probability of not dropping URLLC packets within the target latency but also improves the total normalized eMBB throughput.

A. Simulation Setup
According to [14], each URLLC packet should be transmitted within one millisecond, and the scheduling time should be no more than 0.3 milliseconds. 3GPP has suggested that the duration of each eMBB time slot is set to one millisecond with a mini-slot duration of 0.125 milliseconds [11], [14]. Thus, each eMBB slot contains M = 8 URLLC mini-slots in this paper. Same as [14], where the authors considered a finite set of eMBB channel rates with equal probability and i.i.d. across users and slots, in this paper, the set of possible channel rates for an eMBB user is assumed as C = {0.1, 0.5, 1} Mbps. We considered all combinations of system states and assumed all possible system states have equal probability of being chosen. The URLLC packets were generated randomly in each mini-slot based on the Poisson distribution with the arrival rate λ packets/mini-slot. The maximum number of the iterations in Algorithm 1 is set to R = 200. Each experiment was repeated 30 times, and the results show the averaged values.

B. The Performance of the Queuing Mechanism
First, we investigate the influence of the queue-based scheduling on the URLLC requirements assuming the puncturing weights are the same over all the channels, i.e., w b = w, ∀b ∈ B. We define α as the ratio of the expected number of served URLLC packets that were waited for one mini-slot in the queue before transmission (i.e., E[Q m−1 ] − E[D m ]) to the expected number of total served URLLC packets as  The measure α shows the percentage of the served URLLC packets that need to be queued to satisfy the ultra-reliability requirement. It also stands for the mean latency of the served packets in each mini-slot. A higher α indicates that with the queue, the BS could transmit more URLLC packets, whereas a small value of α shows the system can transmit URLLC packets directly without queuing. Fig. 3 shows α versus the puncturing weight (i.e. w) when the URLLC arrival load is λ = 0.5 packets/mini-slot. This result confirms the validity of the analytical results in Section III, since there is no significant difference between the analytical and numerical results. The mean square errors between analytical and numerical results are below 10 −3 . This figure also shows that increasing the puncturing weights will reduce the instantaneous mean latency of served URLLC packets to 0.18 mini-slot for B = 2 and further to zero if there are more channels available (e.g., B = 5 and B = 8).
To further present how the BS wisely schedules the URLLC packets with the queuing mechanism, Fig. 4 presents the analytical and numerical results of the number of served URLLC packets that have waited one mini-slot in the queue, i.e., At first, the values of β increase when increasing the intensity of w. For example, when B = 2 and w = 0.5, the value  of β increases from zero to 0.351 packets. This is because both queued and dropped URLLC packets reduce when the BS decides to puncture more mini-slots. However, the decreasing ratio of E[D m ] (as shown in Fig. 5) is higher than the decreasing ratio of E[Q m−1 ] (as shown in Fig. 6), since with larger values of w, the URLLC packets have a higher chance to be served rather than dropped. However, after the peak points, the BS provides enough mini-slots for URLLC packets, the number of dropped URLLC packets approaches zero (as shown in Fig. 5), and the results show decreasing trends only because of the number of queued URLLC packets reduces.
Furthermore, Fig. 4 also shows that if there are enough channels, the peak points of β are smaller, and the number of queued URLLC packets could quickly approach zero. For example, in Fig. 4, comparing the results of B = 2 and B = 8, with more transmission channels, the peak point of β is smaller and the trend drops more quickly. Especially when w = 0.7, the BS achieves the URLLC requirements with both of the number of queued and dropped packets are zero for B = 8. However, for B = 2, even with w = 1, the URLLC packets still need to be queued to achieve a higher served rate. Besides, the results in Fig. 4, Fig. 5 and Fig. 6 also show that the analytical results match well with the numerical results.

C. Bayesian Optimization Convergence and Performance
In this subsection, we first investigate the convergence of the proposed algorithm to find the optimal solution of the optimization problem (6) as compared to the random search.  The initial training set N (r) contains r = 5 training samples. The URLLC arrival load is λ = 0.1 packets/mini-slot. Here, we first set ξ = 0.01 as the trade-off parameter between exploitation and exploration since it is the most recommended default value in BO literature [30], [34], [37], [38]. Later, the influence of different values of ξ will be explored in this section. The probability of URLLC successful transmission is s b urllc = 0.9, ∀b ∈ B. Fig. 7 and Fig. 8 show convergence behaviors of the proposed algorithm with different acquisition functions in comparison with the random search when transmitting under different numbers of channels. In Fig. 7, for B = 2, it is shown that all acquisition functions could sense larger local points than the random search. In this case, the convergence time is short since the original input space is relatively small with d = 18. However, by increasing the number of channels B = 4 as in Fig. 8, the set of possible system states is significantly expanded (N s = 81). This leads to a higher dimensional input space, i.e., d = 324 and hence longer convergence time. Overall, all three acquisition functions could obtain satisfactory optimization results, and the EI acquisition function shows a better convergence performance than others. Thus, we choose the EI acquisition function for the following results.
Here, we also study the impact of the amount of exploration of the EI acquisition function by modifying ξ. Fig. 9 shows the convergence of the proposed algorithm with the  EI acquisition function when applying different values of ξ to find the optimal solution of the optimization problem (6). We consider four different levels of trade-off parameters, i.e., ξ ∈ {0.001, 0.01, 0.3, 3}. In Fig. 9, for ξ = 0.01, the proposed algorithm converges in a few iterations. However, a much lower value of ξ = 0.001 puts the focus on 'more local' optimization rather than exploration for global optimization by choosing the next sampling point with higher posterior mean. Therefore, the algorithm does not reach to the optimal solution in 200 iterations as for ξ = 0.01. With ξ = 0.3, the algorithm converges to the optimal point as ξ = 0.01 but more slowly compared to the case of ξ = 0.01, since increasing the tradeoff parameter increases exploration in the acquisition criterion. Increasing ξ to ξ = 3 will further slow down the convergence due to more exploration and this results in not reaching the maximum in the first 200 iterations. Therefore, we choose to use ξ = 0.01 in the simulations since it will strike a good balance between exploration and exploitation.
Besides, Fig. 10 shows the performance of the proposed algorithm to achieve the global maximum in comparison with the random search. The global maximum is obtained by evaluating all possible combinations of inputs in the grid search. The number of possible combinations of the puncturing weights in gird search grows exponentially as the number of transmission channels increases. Thus, we consider the case that B = 1 in which the number of possible system states is N s = 3 and the dimensional input space is d = 3. Let assume  Fig. 10 confirms the capability of the proposed BO-based algorithm in reaching to the optimal global solution. It is shown that the solution from the Bayesian optimization provides the minimal expected deviation from the global minimum/maximum and the asymptotic density of calculations of the objective function is much greater around the point of global minimum/maximum [41], [42].
In contrast, the random search slowly progress and cannot reach the global point as only parts of the values of the puncturing weights are tried out. The advantage of proposed solution is that it chooses only the relevant search space and discards the ranges that will most likely not deliver the best solution. Thus, instead of randomly choosing the next sampling point, the proposed algorithm optimizes the surrogate model and acquisition function according to the previous sampling points at each iteration.
To further investigate the performance of the proposed algorithm, Fig. 11 shows the optimal value of the objective function achieved by the proposed algorithm with EI acquisition function versus the URLLC packet arrival rate. These results verify the effectiveness of the proposed algorithm in comparison with the random search. In particular, the gap in the performance is significant for the higher URLLC arrival rate. This is because the performance is sensitive to the puncturing weights when the BS needs to serve more of URLLC packets.

D. Performance Comparison With Benchmarks
We implemented two benchmarks to highlight the performance of proposed URLLC queuing scheme in the trade-off problem of the joint eMBB-URLLC system. 1) Benchmark 1: Immediate Dropping Scheme: In this scheme, if the newly arrived URLLC packets cannot be transmitted, such packets will be dropped immediately as the losses. Obviously, such policy increases the number of URLLC drops, and to optimize the probability of not dropping URLLC packets, more mini-slots of eMBB services will be interrupted.
2) Benchmark 2: Scheduling Scheme in [19]: In this scheme, each transmission channel has an individual URLLC  queue. The newly arrived packets are distributed evenly among these queues, oblivious to the channel gains.
To evaluate the performance of the proposed scheme, in Fig. 12 to Fig. 17, the total normalized eMBB throughput, the URLLC utility, and the URLLC outage probability are analyzed in comparison with two benchmarks. The URLLC arrival rate in Fig. 12 to Fig. 17 changed from 0.01 to 0.1 packets/mini-slot. The objective functions were optimized by the proposed algorithm when using Expected Improvement as the acquisition function. Moreover, we also compare the influence of the probability of URLLC successful transmission on the performance of the proposed scheme. The probability of URLLC successful transmission in Fig. 12 to Fig. 14 is s b urllc = 0.9, ∀b ∈ B and in Fig. 15 to Fig. 17 is s b urllc = 0.8, ∀b ∈ B. Fig. 12 shows the total normalized eMBB throughput (i.e., U eMBB = s s s∈S P s s s U s s s eMBB ) versus the URLLC served rate (i.e., E [μ m ]) under different numbers of channels. The results confirm that the our proposed solution with one queue offers the highest eMBB throughput as compared with Benchmarks 1 and 2. Overall, the proposed solution could maintain the total normalized eMBB throughput above 80%, even when B = 2. Although Benchmark 2 performs better than Benchmark 1, the scheduling in [19] is less flexible than our proposed solution with one queue due to even distribution of packets among channels. In the proposed solution with one queue, the URLLC packets are distributed considering the channel conditions of different channels. Besides, as shown in Fig. 12, by queuing the URLLC packets, the total normalized eMBB throughput of both the proposed solution and Benchmark 2 are robust with serving more URLLC packets. In contract, in Benchmark 1, the total normalized eMBB throughput is more sensitive to URLLC served rate. Furthermore, the performance under all three schemes could be improved by increasing the number of transmission channels. But no-queue scheme is much more sensitive to the number of channels. In Fig. 12, when the URLLC arrival load is λ = 0.02 packets/mini-slot, the calculated E[μ m ] of the no-queue scheme in the cases of B = 2 and B = 6 are 0.01981 and 0.01631 packets/mini-slot, respectively. And the total normalized eMBB throughput can be improved from 77% to 86% when changing the number of channels form B = 6 to B = 2. However, the total normalized eMBB throughput of the proposed solution only has a small change.
To further assess the influence on the probability of not dropping URLLC packets, we define the URLLC outage probability as  Fig. 13, with more channels, all three schemes could reduce the URLLC outage probabilities and enhance the expected values of URLLC served rates. However, the outage probabilities are significantly lower with our proposed solution with one URLLC queue. Fig. 14 shows the URLLC utility (i.e., U urllc = s s s∈S P s s s U s s s urllc ) versus the URLLC served rate (i.e., E [μ m ]) under different numbers of channels. The results confirm that the proposed solution provides the highest URLLC utility compared to the other two benchmarks. Besides, with increasing the URLLC arrivals, the URLLC utility decreases. This is due to the increase of both the URLLC outage probability and the number of failures in transmitted URLLC packets.   Furthermore, we investigate the influence of the probability of URLLC successful transmission by setting s b urllc = 0.8, ∀b ∈ B in Fig. 15-Fig. 17, and compare them to the corresponding results in Fig. 12-Fig. 14 where s b urllc = 0.9, ∀b ∈ B. In Fig. 15, the total normalized eMBB throughput values are smaller than the results in Fig. 12. This is because with a lower probability of URLLC successful transmission, the BS will thrive to maintain the URLLC utility by puncturing more eMBB transmissions. Thus, the number of served URLLC packets increases, and Fig.16 shows decreasing trends of URLLC outage probabilities compared to Fig. 13. However, the overall URLLC utility in Fig. 17 is smaller compared to Fig. 14, which implies that the number of successfully transmitted URLLC packets decreases. In contrast, the proposed URLLC queue mechanism robustly provides stable probabilities of not dropping URLLC regardless of S urllc .
Overall, compared to the the benchmarks, the proposed URLLC scheduling policy with the queuing mechanism could enhance the URLLC reliability within the target latency without causing a significant influence on the eMBB transmissions.

VI. CONCLUSION
In this paper, we developed an optimal URLLC scheduling policy on top of scheduled eMBB channels considering a URLLC queuing mechanism. The scheduling problem has been formulated as a stochastic optimization problem aiming to maximize the total normalized eMBB throughput and URLLC utility. By studying the queue dynamics, we mathematically analyzed the expected value of served URLLC packets and probability of not dropping URLLC packets to derive the objective function. However, it is infeasible to have a closed-form expression due to the computational complexities of deriving the steady-state solution of the URLLC queue. Thus, we applied Bayesian optimization combined with the additive structure algorithm to solve the high dimensional black-box scheduling optimization. The results illustrated that the proposed novel queuing scheme could enhance the probability of not dropping URLLC packets within the target latency requirement in each mini-slot and maintain the total normalized eMBB throughput among all system states. Furthermore, the results also showed that the Bayesian algorithm performed well in finding an optimal point of the black-box function compared with the random search. We also implemented the no-queue scheme and the scheduling scheme in [19] as benchmarks. The results indicated that the optimal URLLC scheduling policy with the proposed queuing scheme has a better flexibility and significantly outperforms the other two scheduling policies in maintaining the total eMBB throughput and reducing the probability of not dropping URLLC packets. Future cellular networks could consider applying the proposed queue mechanism model to a model-free joint eMBB/URLLC scheduling problem by applying deep reinforcement learning algorithms. In addition, the existing DRL-based algorithms were designed to only maximize the expected longterm reward without considering the variance of the reward distribution. Thus, future cellular networks should be aware of the potential risks caused by the randomness of longterm rewards and develop more robust joint eMBB/URLLC scheduling strategies.