Energy-Efficient Edge Computing Service Provisioning for Vehicular Networks: A Consensus ADMM Approach

In vehicular networks, in-vehicle user equipment (UE) with limited battery capacity can achieve opportunistic energy saving by offloading energy-hungry workloads to vehicular edge computing nodes via vehicle-to-infrastructure links. However, how to determine the optimal portion of workload to be offloaded based on the dynamic states of energy consumption and latency in local computing, data transmission, workload execution and handover, is still an open issue. In this paper, we study the energy-efficient workload offloading problem and propose a low-complexity distributed solution based on consensus alternating direction method of multipliers. By incorporating a set of local variables for each UE, the original problem, in which the optimization variables of UEs are coupled together, is transformed into an equivalent general consensus problem with separable objectives and constraints. The consensus problem can be further decomposed into a bunch of subproblems, which are distributed across UEs and solved in parallel simultaneously. Finally, the proposed solution is validated based on a realistic road topology of Beijing, China. Simulation results have demonstrated that significant energy saving gain can be achieved by the proposed algorithm.


A. Background and Motivation
T HE rapid development of vehicular networks will spur an array of applications in the domains of travel assistance, self-driving, video streaming, and online gaming [1]- [4], which require enormous computation resources to process a large volume of workload data and have strict timeliness Time required to deliver the L2 report T k,P T Time required for the sMAG (nMAG) to send the data (T k,P T ) packets to the nMAG (RSU) ϕ Time required to deliver the HI message k Time for confirming the received profile and creating a new cache entry requirements [5], [6].To support the delay-sensitive and multimedia-rich services in vehicular networks, vehicular edge computing (VEC), in which workloads are processed at the network edges to eliminate excessive network hops, has been proposed [7].VEC not only reduces the computation response time, but also alleviates the traffic congestion problem in capacity-constrained backhaul links [8], [9].Furthermore, VEC allows opportunistic energy saving for in-vehicle user equipments (UEs) with limited battery capacity such as smart phones and wearable devices.Traditionally, all of the workloads have to be processed locally on the UE, which dramatically reduces the battery endurance time and impedes the service delivery reliability.With the assistance of VEC, the energy-hungry workloads can be offloaded from the UE to nearby VEC nodes with higher computing capability and abundant energy supply via vehicle-to-infrastructure (V2I) links [10].As a result, the energy expenditure of local computing is saved at the costs of increased latency caused by workload offloading and the additional energy consumption for transmitting the computation workload [11].
There exist some works that have tried to improve energy efficiency of UEs via workload offloading [12]- [15].You et al. studied resource allocation problems under the computation latency constraint for MEC offloading systems in order to minimize the weighted sum energy consumption for mobile UEs [14].In [15], Li et al. introduced MEC into virtualized cellular networks with machine-to-machine communications, where each UE chooses to access virtual networks so as to minimize the energy consumption and execution time.However, some critical challenges have been neglected in previous studies, which are summarized as follows.
First, workload offloading may not always lower energy consumption due to communication costs.To minimize the energy consumption, the tradeoff between energy saving of workload offloading and energy consumption of communication should be optimized dynamically based on a number of factors including channel conditions, workload attributes, vehicle velocity, computing capability, etc., which has not been thoroughly analyzed from the perspective of energy efficiency [12].Second, the offloading decisions of adjacent UEs are often intertwined with each other via the constraint term of VEC node's computing capability, and the size of the joint optimization problem grows rapidly with the number of UEs.Centralized optimization approaches proposed in [13] faces severe complexity and scalability problems.Last but not least, the intermittent connectivity between vehicles and road side units (RSUs) poses another critical challenge.A vehicle that have moved out of the RSU coverage during workload data transmission will result in frequent offloading failures, which is not considered in previous works [12]- [15].

B. Contributions
In this paper, we investigate how to address the above challenges by exploring consensus alternating direction method of multipliers (ADMM), which is a powerful tool for solving distributed convex optimization problems.It takes a decomposition-coordination procedure, in which the joint optimization problem is firstly decomposed into several tractable subproblems that can be solved in parallel, and then the solutions of all the subproblems are coordinated to obtain the global solution of the original problem [16].The main contributions of this work are summarized as follows.
• We introduce queuing theory to derive the stochastic traffic models at both UEs and VEC nodes with the consideration of queue heterogeneity.By assuming that the generated workload follows a Poisson process and the service time follows an exponential distribution, the workload traffic models of the UE and the VEC node can be regarded as a M/M/1 queue and a M/M/c queue, respectively.Then, the closed-form expressions of computation latency and waiting latency are derived based on Little's law and Erlang's formula.A review of related works is presented in Section II.Section III describes the system model.The problem formulation is presented in Section IV.The consensus ADMM-based distributed algorithm is proposed in Section V. Simulation results and related analysis are elaborated in Section VI. Conclusions and future directions are summarized in Section VII.

II. RELATED WORKS
Mobile edge computing (MEC) is regarded as a promising solution to achieve the performance gain of proximate data processing, short-range transmission, and location awareness [17].There have been many works investigating MEC in vehicular networks.Feng et al. proposed a VEC framework named autonomous vehicular edge (AVE) to increase the computational capabilities of vehicles in a decentralized manner [7].In [10], Zhang et al. designed an offloading scheme to improve the transmission efficiency with considerations of the task execution time and the vehicle mobility.In [18], Taleb et al. developed a cloud-based MEC offloading framework and proposed a predictive computation mode transfer scheme to improve task transmission efficiency in vehicular networks.These works mainly focus on low-latency and high-reliability system design, and have not considered the energy saving problems for in-vehicle UEs with limited battery capacity.
There are many studies that investigate the energy efficiency issue in edge computing through workload offloading and system resource allocation.In [12], the workload allocation between fog and cloud is optimized to minimize the system energy consumption under different service delay constraints.In [13], Mao et al. proposed an effective computation offloading strategy to construct a green MEC system with energy harvesting devices.
Nevertheless, the above-mentioned works mainly target on static cellular networks, and thus cannot be applied directly for vehicular networks with highly dynamic and unreliable connections.Without considering the fast mobility of vehicles, conventional static decision-making schemes will result in frequent offloading failures when the connectivity between vehicles and the RSU becomes unavailable before the workload data has been fully uploaded.Although there exist some works which have applied MEC for vehicular networks [7], [10], [18], they mainly address the workload offloading problem from a delay minimization perspective, and have not considered the energy efficiency issues of in-vehicle UEs with limited battery capacity.Their results cannot be directly utilized to solve the energy-efficient workload offloading problem investigated in this work.Moreover, most of the previous solutions rely on a centralized optimization approach, the computing complexity of which increases significantly with the number of UEs.It is better to address the problem from a distributed perspective considering the complexity and scalability issues.Therefore, there lacks a unified distributed solution to address the energy saving problems for in-vehicle UEs with the considerations of vehicle mobility.
We next review the related studies about ADMM, which is used to solve the formulation of the joint optimization in this work.ADMM, which is known as a powerful tool for solving distributed convex optimization problems [16], has been widely applied in many aspects.Yin et al. considered a fog-assisted data streaming scenario [19], and proposed a hybrid ADMM (H-ADMM) method to solve the social welfare optimization problem and reduce the communication overhead.In [20], Vu et al. investigated the energy efficiency optimization problem of small-cell networks with multiantenna transceivers and base stations.By using Charnes-Cooper's transformation, the original optimization problem was transformed into an equivalent convex program, and an ADMM-based decentralized algorithm was presented to solve the problem and achieve a fast convergence.
This work is an extension of our previous work [1].Different from the previous studies, we employ the consensus ADMM approach to address the energy saving problem.The differences between ADMM and the consensus ADMM are summarized as follows.In ADMM, the primal variables are updated in an alternating or sequential fashion, which can be regarded as a modified version of the conventional method of multipliers based on the Gauss-Seidel approach [16].On the other hand, the consensus ADMM employs a series of local variables, based on which the primal variables no longer need to be updated sequentially.Instead, the coupled objectives and constraints of the joint optimization problem can be separated and distributed across UEs, where each UE only has to deal with its own objective and constraint term.In other words, the consensus ADMM is actually the extension of ADMM for solving the consensus problems, which aims to achieve a consensus between the local variables and the global variables in a dynamically changing environment [21].Hence, the consensus ADMM approach not only reduces the amount of information that needs to be exchanged, but also enables parallel decision making [22].This is of significant importance for vehicular networks with fast mobility, capacityconstrained communication links, and strict timeliness requirements.Convergence to the global solution is guaranteed as long as the convergence requirements can be satisfied [23].Furthermore, a more realistic handover model based on the IFP-NEMO is utilized, which takes into account both the link reestablishment and the data forwarding latency.Last but not least, we provide a comprehensive analysis regarding the convergence and complexity properties.We also validate the proposed scheme by using a real-world road topology.

III. SYSTEM MODEL
In this section, we elaborate the overall system model of VEC, the data transmission model, and the computation workload offloading model in details.

A. The Overall System Model
The hierarchical computing framework for vehicular networks is shown in Fig. 1, which is composed of three layers, i.e., the control layer, the VEC server layer, and the vehicular network layer.In the control layer, a centralized controller is responsible for the inter-cell resource coordination and handover management [24].To maintain Internet connectivity for moving vehicles, the improved fast proxy mobile IPv6 based network mobility basic support (IFP-NEMO) mechanism is adopted [25].In IFP-NEMO, the mobility management of vehicles is performed by a mobile access gateway (MAG), which acts as a proxy mobility agent.In the distributed VEC server layer, M RSUs are deployed uniformly along an unidirectional lane and connected to the MAGs via Ethernet connections.For each RSU, there exists a co-located VEC node with c homogeneous servers.The m-th RSU and the colocated VEC node are denoted as R m and I m , respectively.
In the vehicular network layer, we can divide the road into corresponding M segments based on the coverage areas of M RSUs, e.g., segment m corresponds to the coverage of RSU R m .We assume that there exist K vehicles in the mth segment traveling towards the same direction, which is shown in Fig. 1.The k-th vehicle is denoted as V k .The communication device mounted on each vehicle has twofolded functions.On the one hand, it allows the vehicle to transmit data to the RSU and offload workloads to the VEC node via dedicated V2I links.On the other hand, it acts as an access point and provides free connections for in-vehicle UEs via short-range communication technologies such as Wi-Fi [26].The UE inside vehicle V k is denoted as U k .The set of in-vehicle UEs is defined as For any UE U k ∈ U, an array of applications are executed, which accordingly generate a series of computation workloads.Without loss of generality, the workload generated at UE U k is assumed to follow a Poisson process with an average arrival rate λ k [27]- [29], which can be either processed locally by the UE itself or offloaded to VEC node I m .The key attributes of the workloads generated by UE U k can be described by a triplet {θ k , δ k , τ k }, where θ k represents the data size of workloads, δ k is the required computation resource for processing the workloads, and τ k represents the delay constraint.We assume that each workload has the same computation complexity, which is defined as δ.This assumption is valid since a higher complexity workload is equivalent to several several basic workloads with the same computation complexity.Thus, we have δ k = δλ k .By further assuming that the service time follows an exponential distribution, the workload traffic models of UE U k and VEC node I m can be regarded as a M/M/1 queue and a M/M/c queue, respectively.
The workload offloading and execution are implemented as the following three steps: (i.) each UE U k ∈ U determines the portion of workload offloaded to VEC node I m , i.e., 0 ≤ p o k ≤ 1, and transmits the workload related data to RSU R m ; (ii.) the offloaded workload is processed at the VEC node; (iii.) the obtained computation results are fed back to UE U k .
Remark 1.In this work, we only consider the simplified single-segment case in order to derive a tractable solution.The more complicated multi-segment case is beyond the scope of this paper and will be investigated in future works.Nevertheless, the proposed solution can be easily extended to the multi-segment scenario by adopting a time-slot model.That is, the number of vehicles in each segment remains constant within a slot and varies across different slots.Hence, the proposed solution can be applied for the optimization of workload offloading within each segment in a slot-by-slot fashion.
Remark 2. A justification for the M/M/1 and M/M/c queuing models is that the same traffic and service time models have been adopted in a number of previous works such as [27]- [29].Moreover, the solution structure does not depend on the specific traffic models.The proposed solution can be extended to other traffic models.

B. The Transmission Model
We assume that each vehicle is allocated with an orthogonal spectrum resource block so that the co-channel interference among vehicles can be ignored.In the offloading mode, data are actually transmitted from UE U k to RSU R m in a two-hop fashion, i.e., data are firstly sent from UE U k to vehicle V k in the first hop, and then are forwarded from vehicle V k to RSU R m in the second hop.The signal to noise ratio (SNR) expressions of the first-hop link and the second-hop link are calculated as where P U k and P V k are the transmission power of UE U k and vehicle V k , respectively.g U k and g V k are the channel gain between U k and V k , and the channel gain between V k and RSU R m , respectively.N 0 is the additive white Gaussian noise (AWGN).
The effective SNR of the two-hop link, i.e., ( [30], is expressed as Hence, the transmission time required by UE U k for uploading workload data with size p o k θ k , i.e., T t k , can be obtained as where B k refers to the channel bandwidth.Due to the fast vehicle mobility, vehicle V k might move out of the communication range of RSU R m during data transmission, which results in an offloading failure.Denote the dwell time of V k inside the coverage of RSU R m as τ o k .An offloading failure occurs if τ o k < T t k .Therefore, τ o k also represents the delay constraint of data transmission because V k can only transmit data to RSU R m when it remains within segment m.That is, an offloading request is admissible if and only if where d k denotes the distance between the location of V k and the coverage edge of RSU R m in the vehicle heading direction, and vk denotes the average velocity of V k within segment m. Remark 3.Both d k and vk can be estimated from the GPS data [31], which are generally available for latest vehicles.For example, if V k moves in the centrifugal direction to leave the coverage area of RSU R m with radius d m , d k is calculated as The energy consumed for transmitting the workload data to the in-vehicle access point is calculated as

C. The Computation-Offloading Model
Based on the Poisson splitting property [32], if the workload of UE U k follows a Poisson process with an average rate λ k , then the workload that is processed locally on UE U k follows a Poisson process with an average rate (1−p o k )λ k .Furthermore, the workload offloaded from UE U k to VEC node I m also follows a Poisson process with an average rate p o k λ k .Next, by using Little's law, the local computing latency T l k of UE U k is calculated as where denotes the normalized workload of other on-going applications, which reflects the occupancy rate of CPU resources, i.e., 0 ≤ S l k ≤ 1.For example, S l k = 1 represents that the CPU is completely occupied by other applications.
The energy consumption of local workload execution is given by where β k represents the local power consumption per unit workload execution.The energy consumption of UE U k , which contains the energy consumed for local workload execution and workload data uploading, is expressed as Taking ( 6) and ( 8) into ( 9), the expression of E total k (p o k ) is written as (10).
Remark 4. u l k , β k and S l k depend on the intrinsic nature of CPU, workload complexity, and other ongoing applications.To simplify the problem, the values of u l k , β k and S l k are assumed as constants during the decision making process, and may vary across different decision making processes.It is noted that the values of u l k , β k and S l k are privacy information of UE U k , which are generally unknown for VEC node I m .Hence, conventional centralized optimization algorithms which require perfect knowledge of UE's private information cannot be directly applied.
Due to the limited computation resources, the VEC node cannot execute a massive number of workloads simultaneously.In VEC node I m , the workloads offloaded from different UEs are pooled together and wait to be processed by VEC servers.Since the combination of independent Poisson processes is also Poisson [32], the sum rate λ e m is calculated as Considering the c homogeneous servers deployed in VEC node I m , the computing capability of each server is defined as u e m .Based on the M/M/c queuing model and Erlang's formula [33], the average waiting latency of each workload at VEC node I m can be calculated as where ρ e m is the server occupancy, and ϕ(c, ρ e m ) is the Erlang C formula which represents the waiting probability.ρ e m and ϕ(c, ρ e m ) are calculated as In RSU R m , the computation results also have to wait in a queue before they can be processed and delivered back to UE U k .Hence, the average waiting latency of each computation result at RSU R m , i. e. , T t m (p o k ), can be expressed as where u t m denotes the transmission processing rate of RSU R m , and η denotes the computation resouce required to process each result.The transmission latency from RSU R m to U k is ignored, due to the fact that the size of computation results is usually negligible compared to that of the input data.
If vehicle V k has already moved out of the coverage of RSU R m when the results are ready for transmission, i.e., a handover occurs when T t k + T e m + T t m > τ o k , then the results have to be forwarded firstly from the serving MAG (sMAG) to the centralized controller, and then sent from the centralized controller to the next MAG (nMAG) with which V k will be attached.The handover process of the IFP-NEMO is carried out from two perspectives in parallel: link reestablishment and data forwarding [25].The procedure of link reestablishment is illustrated as follows.
• Step 1: The previous wireless layer 2 (L2) link between R m and V K is disconnected, which requires a time of with which V k is reconnected, and V k is established, which takes a time of T on .Hence, the total latency of link reestablishment is calculated as The procedure of data forwarding is illustrated as follows: • Step 1: The predictive mode of IFP-NEMO is activated when V k sends a L2 report to the sMAG.The time required to deliver the L2 report is denoted as t L2 .• Step 2: Upon receiving the L2 report, the sMAG sends a handover initiate (HI) message to the nMAG, which contains a number of key information including vehicle ID, home network prefix, mobile network prefix, and centralized controller address.The time required to deliver the HI message is denoted as ϕ. .
Once both the link reestablishment and the data forwarding processes are completed, the nMAG sends the data packet to RSU R m (R m = R m ), which takes a time of T k,P T .The handover latency T h k is defined as the total duration during which V k cannot send or receive any data packet due to either link reestablishment latency or data forwarding latency.To calculate the handover latency, the following four cases are considered, which are shown in Fig. 2.
, the link reestablishment process is finished earlier than the data forwarding process.Therefore, the data can be directly sent to V k from the nMAG without buffering.Furthermore, since T L2 + ϕ + 2 k + κ > T of f , the handover latency should be calculated from the moment that the previous L2 link has been disconnected, which is given by • Case B (T L2 + ϕ + 2 k + κ < T of f and T k,data > T k,link ): In this case, the sMAG starts to transfer data to the nMAG even though the L2 link has not been disconnected.Therefore, the handover latency is calculated from the moment when the sMAG starts to transfer data packets to the nMAG.T h k is calculated as • Case C (T L2 + ϕ + 2 k + κ > T of f and T k,data < T k,link ): Since T k,data < T k,link , V k has not reconnected with the nMAG when the data forwarding process is completed, and the delivered data have to be buffered in nMAG.T h k is calculated as • Case D (T L2 + ϕ + 2 k + κ < T of f and T k,data < T k,link ): This case is similar to case B. The only difference is that the data forwarding process is finished earlier, and the nMAG has to wait for the link reestablishment process to be finished.Therefore, T h k is calculated as A robust approach is to consider the worst-case scenario, Hence, the latency caused by workload offloading is the sum of the workload transmission latency, the waiting latency at VEC node I m , the remote workload execution latency, the waiting latency at RSU R m , and the handover latency, which is given by

IV. PROBLEM FORMULATION
The objective is to minimize the total energy consumption of K UEs within the coverage of RSU R m .The formulated energy-efficient workload offloading problem is given as follows: Here  is not convex.Second, the optimization variables of K UEs are coupled through the term C 2 .Furthermore, it is noted that the problem size of P1 grows enormously fast with the number of UEs.Therefore, it is difficult to solve P1 via centralized solutions because the VEC node or the RSU has to collect every detailed piece of information from all of UEs.This might be infeasible for practical implementation considering the communication overhead constraint and the threat of privacy leakage.Hence, we aim at addressing P1 in a distributed manner.

V. CONSENSUS ADMM-BASED ENERGY-EFFICIENT WORKLOAD OFFLOADING
In this section, we introduce an energy-efficient distributed solution based on consensus ADMM.First, we provide a brief introduction to consensus ADMM for the readers' better understanding.Then, we introduce the problem transformation which is a prerequisite for applying consensus ADMM.Next, the implementation procedures of the proposed distributed solution are elaborated.Finally, we analyze the convergence and complexity properties.

A. Introduction to Consensus ADMM
Generally, ADMM is suitable to solve the problems with the following forms [23]: where , and c ∈ R q3×1 .The augmented Lagrangian of ( 24) is given by where ρ ∈ R ++ denotes the penalty parameter in the augmented Lagrangian, which is used to increase the speed of convergence in ADMM [27].ρ can be adjusted by using the self-adaptive approach [16].µ denotes the vector of Lagrange multipliers.
The problem ( 24) can be solved via the following iterations: where t is the index of iteration.
Next, we consider a global consensus problem with a global variable vector z, i.e., z ∈ R q1×1 , and several local variable vectors x i , i.e., x i ∈ R q1×1 , i = 1, • • • , N , which is formulated as [22]: The consensus constraint guarantees that all of the local variables should be equal to the global variable.The augmented Lagrangian corresponding to ( 29) is given by The resulting iterations are given given by Remark 5.It is noted that the x-minimization and yminimization steps in ( 26) and ( 27) are carried out in a sequential fashion, while the x i -minimization in ( 31) is carried out in parallel for each i = 1, • • • , N .

B. Problem Transformation
To apply ADMM, we have to transform problem P1 into a tractable form.First, the original problem with a fractionalform is transformed to a new problem with a subtractiveform objective function.Second, the problem with coupled variables is further transformed to a decomposable problem with separable objectives and decoupled variables.The details are illustrated as follows.
1) Nonlinear Fractional Programming: It can be observed from ( 10) that E total k (p o k ) is a fractional-form function.Hence, we can employ nonlinear fractional programming to transform the original problem in the fractional form into an equivalent problem in the subtractive form.
Let us define the numerator and denominator of (10) as = min , where p o * k denotes the global optimal solution for UE U k .Based on nonlinear fractional programming [34], we have the following property: Theorem 1: ψ * k is achieved if and only if Proof: The detailed proof is omitted due to space limitation.A similar proof can be found in our previous work [35].
Theorem 1 indicates the necessary and sufficient conditions to obtain ψ * k .Accordingly, p o * k can be obtained by solving the following transformed problem: Remark 6.It can be easily proved that the objective of P2 is convex with regards to p o k by calculating the corresponding second derivative.
However, the specific value of * k required to solve P2 is still unavailable.To obtain ψ * k , the iterative Dinkelbach method can be used [34].Denote the iteration index as n and the initial value of ψ k as a small positive number.At the n-th iteration, p o k [n] is derived by using ψ k [n] obtained from the (n − 1)-th iteration, which is given by P3 : min How to solve P3 is provided in Subsection V-C.Then, upon obtaining The iteration process will stop if where ε represents the stopping criteria.The above implementation procedures are summarized as the outer loop of Algorithm 1.
2) Consensus Problem Formulation: At each iteration n, problem P3 has to be solved with a given ψ k [n].However, the objectives in P3 are not separable because the workload offloading variables of K UEs are coupled through the constraint term C 2 .To provide a distributed solution, local copies of the global optimization variables are introduced to transform P3 into a general consensus problem.Specifically, defining the vector of global optimization variables as For instance, the local copy of the global variable p o k−1 (i.e., the offloading strategy of UE U k−1 ) at UE U k is p o,k k−1 .Furthermore, we define the feasibility set of the local optimization variables for UE U k as ω k , which is given by We define the local objective function associated with the feasibility set , the solution is feasible, then χ k is equivalent to its global counterpart, i.e., Otherwise, if the constraints cannot be satisfied, Therefore, the general consensus problem corresponding to P3 is given by P4 : min where C 7 denotes the consensus constraint, i.e., the local variables duplicated at different UEs should be equal to the global variables.Remark 7. C 7 guarantees that P3 and P4 are equivalent.

C. Consensus ADMM-based Distributed Solution
In this subsection, the proposed consensus ADMM-based solution is elaborated in details.Let Λ be the K ×K matrix of the Lagrange multipliers corresponding to the consensus constraint C 7 in P4.Λ is given by Λ where µ k is a K × 1 vector.
The augmented Lagrangian for P4 is expressed as The resulting iterations of updating local variables, global variables, and Lagrange multipliers are given in ( 44), (45), and (46), respectively.Remark 8. From (44), it is clear that the optimization of po k is carried out independently for each UE.As a result, P4 can be decomposed into a set of subproblems, which are distributed across UEs and solved in parallel.The corresponding optimization objective for U k is exactly χ k .
Based on the above analysis, the energy-efficient workload offloading algorithm based on consensus ADMM is summarized as Algorithm 1.It consists of two loops.The outer loop represents the iterations to solve the nonlinear fractional programming problem, and the corresponding iteration index is n.The inner loop represents the iterations for updating the primal and dual variables, and the corresponding iteration index is defined as t.In iteration n of the outer loop, given ψ k [n], the primal and dual variables are updated sequentially to find the optimal workload offloading strategy, which are elaborated as follows: 1) {p o k } update : In iteration t of the inner loop, the optimization of po k [t+1] is carried out by using (44).It is noted that (44) is actually a quadratic programming (QP) problem [16], which can be easily solved by existing QP solvers.
where r k and s denote the primal residual and the dual residual, respectively.pri and dual denote the thresholds for r k and s, respectively.Moreover, as proved in Subsection V-D, the primal and dual update iterations in consensus ADMM satisfy objective convergence, residual convergence and dual variable convergence as t → ∞. 11: Update t → t + 1; 13: end while 17: Convergence = False  (38).The stopping criteria of the outer loop is given in (39).In the final iteration of outer loop, the obtained workload offloading strategy converges to the optimal strategies, i.e., p o * k .ψ * k is calculated by using p o * k as (34).

D. Property Analysis
In this subsection, we analyze the convergence and complexity of the proposed algorithm.
1) Convergence of the Inner Iteration: The objective function of P4 is closed, proper, and convex, and the corresponding epigraph is a closed nonempty convex set.Furthermore, the Lagrangian L({p o k }, Λ) has a saddle point.Thus, based on [16], the inner iteration satisfies residual convergence, objective convergence and dual variable convergence, which is shown as below.
• Residual convergence: which indicates that the iterations approach feasibility.• Objective convergence:  A similar proof can be found in [35].
3) Complexity: In each iteration of the outer loop, P4 is solved to produce a decreasing sequence of ψ k .Here, we define n loop the required number of iterations by the outer loop to reach convergence.Similarly, in each iteration of the inner loop, (44), (45) and (46) are updated sequentially to obtain p o * k [n] for a given ψ k [n].We define t loop as the number of iterations required by the inner loop to reach convergence.Hence, the computation complexity for solving each decomposed subproblem is O(n loop t loop ).

VI. SIMULATION RESULTS AND DISCUSSIONS
In this section, we validate the proposed algorithm based on the real-world topology of the Xidan area in Beijing, China.This area is featured with the Chang'an avenue, which is the road to several scenic spots such as Tian'an men Square and the Forbidden City as well as the headquarters of many companies and government agencies are located in this area.An aerial snapshot obtained from the Baidu map is shown in Fig. 3. First, The data of the digital map downloaded from OpenStreetMap is imported to SUMO.Then, vehicle traffics are generated based on the realistic road topologies, which are marked as small yellow triangles in Fig. 3.The critical attributes of each vehicle such as location and velocity are obtained during simulation, based on which the average velocity of each vehicle is estimated by using a simple rolling window regression approach [36].The RSUs are also deployed  along the Chang'an avenue.The simulation parameters are summarized in Table II [37]- [39].The proposed algorithm is compared with two heuristic algorithms including the brute-force searching algorithm, and the static offloading algorithm algorithm [11].In the static offloading algorithm, the portion of offloaded workload is fixed and the same for any UE.
Fig. 4 shows the relationship between p o k and the energy consumption of UE U k under different average vehicle velocities.It is clear that the energy consumption decreases firstly and then increases with p o k .When p o k is small, the energy consumption of transmission is less than that of the local computing.Hence, more energy can be saved by increasing o k .However, when p o k > 0.5, the energy consumed for data transmission starts to dominate the total energy consumption.In other words, the energy saving brought by workload offloading cannot compensate the energy consumed for data transmission.As a result, the energy consumption increases monotonically with p o k .Furthermore, we found that the energy consumption also increases with the average vehicle velocity when p o k > 0.5.The reason is that higher velocity will cause more offloading failures when p o k is large.This not only increases transmission energy consumption, but also results in higher energy consumption of local computing because more workloads have to be processed locally.
Fig. 5 shows the energy consumption versus vehicle velocity.The proposed algorithm is compared with the static offloading algorithm under different workload offloading portions.When p o k is large, i.e., p o k = 0.85 and p o k = 0.6, the energy consumption of the static offloading algorithm increases dramatically with the vehicle velocity.The reason behind is that higher velocity leads to frequent offloading failures.In comparison, the energy consumption of the proposed algorithm remains constant when the vehicle velocity is increased from 40 to 70 km/h.Simulation results demonstrate that the proposed algorithm is more robust to the negative impact caused by high vehicle mobility.Even when the velocity exceeds 70 km/h, the proposed algorithm still outperforms the static offloading algorithm.The reason is that the proposed algorithm is able to reduce the energy consumption by dynamically adjusting the offloading portion.For example, the optimal offloading portions for the velocities of 70, 80, and 90km/h are 0.4576, 0.3918, and 0.3407, respectively.That is, as velocity increases, the portion of workload to be offloaded is also reduced accordingly to avoid offloading failure.Fig. 6 shows the average energy consumption per UE versus the RSU coverage diameter with different numbers of UEs.Enhancing the RSU coverage has positive impacts on the energy consumption.It not only reduces the handover latency but also relaxes the latency requirement of data transmission.Hence, the average energy consumption per UE decreases monotonically with the RSU coverage diameter.However, when the coverage diameter reaches a certain value, the performance improvement becomes saturated because the optimal energy consumption have already been achieved.Furthermore, when the number of UEs is doubled, the average energy consumption per UE only increases slightly.The reason is that the offloading portion is dynamically adjusted in accordance with the number of UEs.
The convergence performance of the proposed algorithm is shown in Fig. 7.The brute-force searching algorithm which examines all possible of combinations to find the optimal solution is utilized as a performance benchmark.It is observed that the proposed algorithm can converge rapidly to the optimal result only within 2 ∼ 3 iterations.
Fig. 8 shows the total energy consumption versus different workload offloading portions.The offloading portion of any UE is kept as the same, i.e., p o k = p o k , ∀k = k.The numerical results are consistent with Fig. 4, i.e., the energy consumption decreases firstly and then increases with p o k .Moreover, it is observed that the maximally allowed offloading portion decreases monotonically as the number of UEs increases.This is due to the constraint C 2 of problem P1 that the sum arrival rate of all UEs' workload cannot exceed the processing node of the VEC node.VII.CONCLUSION this paper, we have investigated the energy-efficient workload offloading for in-vehicle UEs with limited battery capacity, and proposed the consensus ADMM-based energyefficient resource allocation algorithm.First, by taking the high mobility of vehicles into account, we have proposed a queuing model to derive the closed-form expressions of the computation latency and the waiting latency.Then, we have formulated a workload offloading optimization problem with the explicit considerations of the overall energy consumption and latency.Next, we have proposed a consensus ADMMbased distributed solution.The formulated joint problem was decomposed into a set of subproblems and solved in parallel.Finally, a real-world topology based simulation has been conducted.For the future work, we will investigate the delay minimization problem in VEC by employing machine learning based workload prediction and computation resource prediction.

2 ) 3 )
{p o , µ k } update : Compared with {p o k } update, the optimization of p o [t+1] and µ k [t+1] can be carried out more easily due to the nature of un-constrained quadratic optimization.The specific updating processes of p o k [t+1] and µ k [t + 1] are shown in (45) and (46), respectively.Termination criteria of the inner iteration: The inner iteration stops workload arrival rate λ k 2 ∼ 5 workload/s Local computing capability u l k 1.4 ∼ 2.2 GHz Workload computation complexity δ 0.5 GHz/workload Edge computing capability u e m 12 GHz Noise power N 0 -97 dBm Time required to deliver the TPBU 20 ms message κ Time required to deliver the L2 report T L2 15 ms Time required for the sMAG (nMAG) to 10 ∼ 30 ms send the data packets to the nMAG (RSU) T k,P T (T k,P T ) Time required to deliver the HI message ϕ 10 ms Time for confirming the received profile 10 ms and creating a new cache entry k which indicates that the objective function eventually converges to the optimal value.•Dual variable convergence: µ k [t] → µ * k as t → ∞, where µ * k is a dual optimal vector.2) Convergence of the Outer Iteration: It can be proved that p o k [n] converges to p o * k in a super-linear speed as n increases.

Fig. 7 . 10 Fig. 8 .
Fig. 7.The convergence performance of the proposed algorithm.(K = 10, 15, and dm = 400m) Dwell time of vehicleV k inside the coverage of RSU Rm d kDistance between the location of V k and the coverage edge of RSU Rm in the vehicle heading direction vk Upon receiving the HI message, the nMAG confirms the received profile of V k and creates a new cache entry, which takes a time of k .• Step 4: The nMAG sends a tentative proxy binding update (TPBU) message to the centralized controller.The time required to deliver the TPBU message is denoted as κ.The centralized controller confirms the received profile of V k and creates a new cache entry, which takes a time of k .The sMAG sends the data packets to the nMAG via the centralized controller, which takes a time of T k,P T .The total time required for data forwarding is calculated as • Step 3: • Step 5: • Step 6: , C 1 and C 2 represent the computing capability constraints, i.e., the workload arrival rates λ k (1 − p o k ) and U k ∈U p o k should not exceed the processing rate at UE U k and VEC node I m , respectively.C 3 denotes latency constraint of data transmission.C 4 and C 5 denote the latency constraints of local and remote workload executions, respectively.C 6 is the boundary constraint of p o k .It is infeasible to find a polynomial-time solution for P1 due to the following two reasons.First, the objective function Initialize: n, t, p o k , ψ k , pri , dual , and ε.
0018-9545 (c) 2018 IEEE.Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.This article has been accepted for publication in a future issue of this journal, but has not been fully edited.Content may change prior to final publication.Citation information: DOI 10.1109/TVT.2019.2905432,IEEE Transactions on Vehicular Technology IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL.XX, NO.XX, XXX 2019 11