Integrated Thermal and Energy Management of Connected Hybrid Electric Vehicles Using Deep Reinforcement Learning


Abstract-The climate-adaptive energy management system (EMS) holds promising potential for harnessing the concealed energy-saving capabilities of connected plug-in hybrid electric vehicles (PHEVs). This research focuses on exploring the synergistic effects of artificial intelligence control and traffic preview to enhance the performance of the EMS. A high-fidelity model of a multimode connected PHEV is calibrated using experimental data as a foundation. Subsequently, a model-free multistate deep reinforcement learning (DRL) algorithm is proposed to develop the integrated thermal and energy management (ITEM) system, incorporating the features of engine smart warm-up and engine-assisted heating for cold climate conditions. The optimality and adaptability of the proposed system are evaluated through both offline tests and online hardware-in-the-loop (HIL) tests, encompassing a homologation driving cycle and a real-world driving cycle in China with real-time traffic data. The results demonstrate that ITEM achieves a close to dynamic programming (DP) fuel economy performance with a margin of 93.7%, while reducing fuel consumption by 2.2% to 9.6% as the ambient temperature decreases from 15 °C to −15 °C in comparison to state-of-the-art DRL-based EMS solutions.

Index Terms-Adaptability, climate adaptive, deep reinforcement learning (DRL), integrated thermal and energy management (ITEM), optimality, plug-in hybrid electric vehicles (PHEVs).

I. INTRODUCTION
PLUG-IN hybrid electric vehicles (PHEVs) have remarkable potential to augment conventional powertrain efficiency and significantly curtail carbon emissions [1]. By combining the advantages of series-parallel and power-split hybrid powertrains, a novel multimode dedicated hybrid transmission (DHT) has gained widespread attention and application, offering enhanced control flexibility and striking energy-saving benefits [2]. Although the concept of multimode PHEVs has proven promising for enhancing fuel economy, the actual performance under real driving conditions is highly dependent on the energy management system (EMS) and thermal management system (TMS) [3], [4], since the engine efficiency and TMS-related accessory loads are greatly influenced by the coolant temperature [5]. Current research often presupposes that the engine is preheated at the commencement of the driving mission [6], but vehicles invariably encounter cold start conditions [7]. Moreover, it is challenging to maintain coolant temperature in a PHEV due to the intermittent operation of the engine [8]. Given that using an electric heater in cold climates can be highly energy consuming, optimizing the integrated thermal and energy management (ITEM) system could engender reliance on engine-assisted heating instead of electric heating [9].
Effective control systems are required to address the inherent coupling between the EMS and TMS, and a handful of studies have begun to explore the synergistic optimization of ITEM [10]. Pham et al. [11] proposed an integrated strategy for energy and thermal management applicable to parallel HEVs using the equivalent consumption minimization strategy (ECMS), where the battery state of charge (SOC) is designed to be temperature-related. Shams-Zahraei et al. [12] introduced an energy management framework that incorporates engine thermal management, where a cost function based on the SOC and coolant temperature is developed to obtain the optimal SOC trajectory using dynamic programming (DP). Although the above studies have initiated the exploration of more nuanced and accurate system models to depict the reciprocal impact of the EMS and TMS [13], substantial challenges still persist in this field due to the complexity of solving coupled thermal and energy management under varying climate conditions [14]. To effectively address this problem, research might need to concentrate on the following areas: 1) the use of real-time traffic and terrain data provided by the intelligent transportation system (ITS) and geographic information system (GIS) to achieve predictive ITEM for near-global optimality [15], [16] and 2) adopting optimization methodologies empowered by reinforcement learning (RL) to solve complex energy and thermal management problems [17].
Recently, some studies have made use of the information from the ITS for model predictive control (MPC) to solve the ITEM of HEVs [18], [19]. Based on traffic data, Wang et al. [20] developed a speed prediction-based ITEM system built upon MPC to reduce the energy consumption of cabin air conditioning by optimizing the compressor schedule. Hu et al. [21] proposed a multihorizon MPC for ITEM to utilize both short- and long-range speed prediction, considering that the EMS and TMS differ a lot in terms of time constants. However, the shortcomings of MPC in terms of real-time performance severely impact its industrial applications [22], and its effectiveness heavily relies on the model accuracy [23].
Comparatively, data-driven approaches hold promise in addressing the aforementioned issues. By reframing the EMS into Markov decision processes (MDPs) [24], several studies used the deep Q-network (DQN) [25] and double DQN [26] to establish EMSs with a discretized action space, namely the engine output power [27], and the results proved that RL methods outperform conventional online optimization methods, such as ECMS, in both fuel economy optimality and computational burden [28]. Furthermore, the actor-critic (AC) framework can be adopted to enable continuous action-based EMS; for example, the deep deterministic policy gradient (DDPG) has been adopted for competent EMS with a continuous output of engine power [29], [30]. Moreover, some improved RL algorithms capable of handling high-dimensional state variables [31], such as twin-delayed DDPG (TD3), can be combined with multivariate trip information for traffic-aware EMS [32]. Although excellent examples of RL have already been demonstrated in the field of EMS [33], its application in ITEM remains largely untapped.
Therefore, it is of utmost significance to develop a DRL-based ITEM system for multimode PHEVs [34]. To the best of our understanding, there is a scarcity of research focused on trip-oriented integrated thermal and energy management for PHEVs using DRL. To fill this gap, this research is based on the promising multimode PHEV that embodies series, parallel, and power-split functionalities. Then, a DRL-based ITEM system is proposed, as depicted in Fig. 1, which integrates multisource traffic and terrain information processed by a spatiotemporal data processing (STDP) framework in real time.
The principal features of the proposed ITEM strategy encompass the following elements: 1) a novel model-free optimization methodology is proposed for the trip-oriented ITEM of a multimode PHEV with engine-assisted heating for cold climate operation, where the powertrain model is calibrated with experimental data; 2) multivariate states, which encompass the ambient temperature and engine coolant temperature, in addition to traffic and terrain information, are integrated into the state space, thus promoting optimal decision-making in real-world driving situations; and 3) numerous features, including the bounded double Q-values, the STDP framework, and delayed policy updates, are merged into the ITEM agent for enhanced learning ability, and the results under various driving cycles verify the optimality and adaptability of the proposed control system.

II. MODELING OF THE MULTIMODE PHEV

A. Multimode PHEV Longitudinal Dynamic Model
The PHEV in this work adopts a dual-mode DHT with dual clutches, as shown in Fig. 1, where the shift drums D_1 and D_2 control whether the system operates in the series hybrid (SH) or power-split hybrid (PSH) mode, and r_PG, r_f, and r_e refer to the gear ratios of the PG, final drive, and the transmission sets shown in Fig. 1. When operating in the PSH mode, the planetary gear (PG) functions as an electric continuously variable transmission (ECVT), where the generator acts as a speed regulator. The longitudinal dynamic model is given by

T_dem = r [ m_v a + (1/2) ρ C_d A v^2 + m_v g f cos ϑ + m_v g sin ϑ ]

and the powertrain parameters can be found in Table I. Here, T_dem refers to the torque demand, and a represents the acceleration. Air density is denoted by ρ, while gravitational acceleration is represented by g. Additionally, m_v, v, and A represent the vehicle mass, velocity, and frontal area, respectively. The wheel radius is represented by r, and the rolling resistance coefficient is defined as f. The air resistance coefficient and slope gradient are indicated by C_d and ϑ, respectively. Also, T_ICE, T_GEN, T_DM, ω_ICE, ω_GEN, and ω_DM refer to the torques and angular speeds of the engine, generator, and drive motor, respectively.
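As a minimal numerical sketch, the force balance above can be evaluated directly; all vehicle parameter values below are illustrative placeholders, not the calibrated values of Table I:

```python
import math

def torque_demand(v, a, theta, m_v=2000.0, r=0.35, rho=1.225,
                  C_d=0.30, A=2.5, f=0.012, g=9.81):
    """Wheel torque demand from the longitudinal force balance:
    inertial, aerodynamic, rolling, and grade resistance terms."""
    F_inertia = m_v * a                      # m_v * a
    F_aero = 0.5 * rho * C_d * A * v ** 2    # aerodynamic drag
    F_roll = m_v * g * f * math.cos(theta)   # rolling resistance
    F_grade = m_v * g * math.sin(theta)      # road grade
    return r * (F_inertia + F_aero + F_roll + F_grade)
```

At standstill on flat ground only the rolling term remains, which gives a quick sanity check on the implementation.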

B. Powertrain Model Calibrated With Experimental Data
The efficiency map of the generator can be observed in Fig. 2(a). The power of the electric machines (EMs) can be calculated as

P_EM = T_EM ω_EM η_EM^µ

where T_EM, ω_EM, and η_EM represent the torque, angular speed, and efficiency of the EMs, respectively, and µ determines whether the EM operates as a motor or a generator: µ is assigned a value of −1 when the EM functions as a motor, and µ is set to 1 when the EM serves as a generator. Besides, the engine fuel consumption model considering the thermal effect is built upon the steady-state map shown in Fig. 2(b), where the hot-map fueling rate is scaled by a temperature-dependent correction term parameterized by α and β, the fitting parameters calibrated with experimental data; here, T_tar and T_col are the target and actual values of the coolant temperature, respectively. The thermal dynamics of the engine coolant temperature T_col is modeled as

dT_col/dt = [LHV ṁ_f − P_ICE − (Q_exh + Q_rad + Q_cab)] / (m_ICE C_ICE)    (7)

where LHV refers to the lower heating value of gasoline, and the heat carried by exhaust gases, emitted via air convection, and delivered for cabin heating through the radiator is denoted as Q_exh, Q_rad, and Q_cab, respectively. Besides, m_ICE and C_ICE represent the equivalent thermal mass and specific heat capacity of the engine body and cooling system, respectively. Also, the engine speed is limited to 1200 r/min when the coolant temperature is less than 40 °C.

The battery's charge-discharge characteristics are modeled with the open-circuit voltage and internal resistance [35], as shown in Fig. 3. The battery output power follows the power balance

P_bat = P_trac − P_ICE + P_aux

where P_trac, P_ICE, and P_aux represent the power of traction, engine, and auxiliary components, respectively, and the battery current can be calculated as

I_bat = [V_oc − (V_oc^2 − 4 R_int P_bat)^(1/2)] / (2 R_int)

with V_oc the open-circuit voltage and R_int the internal resistance. This article mainly focuses on the ITEM operation in cold climates, with an ambient temperature range of [−15 °C, 15 °C]; thus, the accessory load of the TMS-related auxiliary components can be approximated by its average value according to experiments, as shown in Fig. 4. Rather than using the positive temperature coefficient (PTC) device only, the heat of the coolant can be utilized for cabin heating when its temperature reaches 75 °C, the warm state of the engine coolant, resulting in a reduction of the accessory load. The simulation results of the calibrated model are presented in Fig. 5, which exhibit a high level of consistency with the bench test data. Notably, the actual EMS implemented in the vehicle controller of Dongfeng Motor is designated as the benchmark (BMK) strategy, which is calibrated using a finite state machine for mode selection and lookup tables for determining the output power of the engine and battery, as shown in Fig. 5(a). The simulated engine power aligns well with the actual strategy, and the trends of coolant temperature and SOC captured by the simulation accurately characterize the real system, as illustrated in Fig. 5(b) and (c).
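A minimal sketch of the coolant update in (7) and of the Rint battery model follows; the lower heating value, the lumped thermal capacitance, V_oc, and R_int are illustrative constants, not the calibrated or SOC-dependent values used in this work:

```python
import math

def coolant_temp_step(T_col, mdot_f, P_ICE, Q_exh, Q_rad, Q_cab,
                      dt=1.0, LHV=44.0e6, mC=80.0e3):
    """One explicit Euler step of (7). LHV is in J/kg and mC lumps
    m_ICE * C_ICE (J/K); both are illustrative placeholder values."""
    dT = (LHV * mdot_f - P_ICE - (Q_exh + Q_rad + Q_cab)) / mC
    return T_col + dT * dt

def battery_current(P_bat, V_oc=350.0, R_int=0.1):
    """Discharge current of the Rint model, P_bat = V_oc*I - I^2*R_int.
    V_oc and R_int stand in for the SOC-dependent maps of Fig. 3."""
    disc = V_oc ** 2 - 4.0 * R_int * P_bat
    if disc < 0.0:
        raise ValueError("demanded power exceeds battery capability")
    return (V_oc - math.sqrt(disc)) / (2.0 * R_int)
```

The quadratic root in `battery_current` is the smaller of the two solutions, so the current stays physically meaningful (zero at zero power).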

III. REINFORCEMENT LEARNING-BASED ITEM STRATEGY
A. Problem Formulation With RL

In our research, we harness RL for the optimization of ITEM. The core capability of RL lies in its competency to manage assignments that follow the MDP. This property empowers it to construct a precise correlation between environmental states and the ideal responses. Four elements form the nucleus of the RL iterative process: 1) environment model; 2) value function V_π(s_t); 3) samples e_t = (s_t, a_t, s_{t+1}); and 4) policy π_θ(s_t) parametrized by θ. Within this context, the control policy, symbolized as π*, is developed focusing on boosting the overall return in the long run rather than seeking immediate gains. The optimization is to maximize the total discounted reward

π* = arg max_π E[ Σ_{t=0}^{T} γ^t r_t(s_t, a_t) ]

where r_t(s_t, a_t) denotes the reward associated with the state and action at time step t, and γ is the discount factor belonging to [0, 1]. The reward function r_t is designed to include the engine fueling rate (mL/s), denoted as ṁ_f(a_t), and the penalty for battery SOC (%) deviation, denoted as ς(s_t), which are both set to negative:

r_t = −[ ṁ_f(a_t) + λ ς(s_t) ]

The weight λ balances the engine fuel consumption and battery charge depletion, where the SOC difference is defined as ΔSOC_t = SOC_t − SOC_SC, with SOC_SC indicative of the SOC target value.
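The reward above can be sketched as follows; the quadratic form of the SOC penalty ς and the value of the weight λ are assumptions for illustration, since the text only specifies that both terms enter negatively:

```python
def reward(mdot_f, soc, soc_target=0.34, lam=400.0):
    """Negative fuel rate (mL/s) plus a weighted SOC-deviation penalty.
    soc_target corresponds to SOC_SC; lam is an assumed weight."""
    soc_dev = soc - soc_target       # SOC difference relative to SOC_SC
    return -(mdot_f + lam * soc_dev ** 2)
```

At the SOC target with zero fueling the reward is exactly zero, and any fuel use or SOC deviation makes it negative, matching the stated design.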

B. Design of the State and Action Space
Multivariate state space: the state space S_t of the ITEM agent is expected to consider key parameters related to three parts.

The thermal states of the powertrain and cabin are taken into account when optimizing the power distribution; thus, the state space in this work includes the power demand of the TMS P_t^Aux, engine coolant temperature T_t^e, and ambient temperature T_t^a:

S_t^T = {P_t^Aux, T_t^e, T_t^a}

The energy management-related variables include the vehicle's velocity v_t, acceleration χ_t, and the battery SOC denoted as SOC_t:

S_t^E = {v_t, χ_t, SOC_t}

Multisource traffic and terrain data processed by an STDP framework [32] are incorporated as

S_t^I = {d_t^rem, TL_t, v_t, ϑ_t}

and the complete state space is

S_t = {S_t^T, S_t^E, S_t^I}    (13)

where d_t^rem refers to the remaining mileage of the journey, and TL_t represents the traffic light state, which is determined based on the signal phasing and timing (SPaT) data cd_t^{G/R} and the SPaT type S_TL. Here, cd_t^G and cd_t^R are defined as the countdowns of the green and red light, respectively. Besides, T_G and T_R, respectively, refer to the total time lengths of the green and red phases. The definition of the SPaT type is as follows: S_TL = 1 when the vehicle has not entered the traffic light waiting area; S_TL = 0 when the vehicle is in the traffic light waiting area while the traffic signal is green; and S_TL = −1 when the vehicle is in the traffic light waiting area while the traffic signal is red. Moreover, v_t = v(t − T_b : t + T_f) and ϑ_t = ϑ(t − T_b : t + T_f) represent the vectors of velocity and road slope for the past and future horizon, provided by the digital map service from time step t − T_b to t + T_f. Here, T_b symbolizes the backward indexed time steps, while T_f signifies the forward indexed time steps, i.e., the length of the future horizon. The future horizon's vehicle velocity at both the initial time step and the final time step T is set to zero, implying a stationary state. The recorded historical velocity and road slopes are denoted as v(t − T_b : t − 1) and ϑ(t − T_b : t − 1). Correspondingly, the future horizon's v(t : t + T_f) and ϑ(t : t + T_f) can be obtained by transforming the map-forecast future velocity v(t_i : t_{i+n}) and GIS-based road slopes ϑ(t_i : t_{i+n}) from the spatial domain to the desired time span, where d_i, t_{d_i}, and v_{d_i} symbolize the driving distance, time, and velocity at a specific location i.
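The three-valued SPaT type S_TL defined above maps directly to a small helper, sketched here as:

```python
def spat_type(in_waiting_area, signal_green):
    """S_TL per the definition above: 1 when the vehicle is outside
    the traffic-light waiting area, 0 inside the area on green,
    and -1 inside the area on red."""
    if not in_waiting_area:
        return 1
    return 0 if signal_green else -1
```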

C. Training of ITEM
This work leverages TD3 to train the agent, where the state space is expanded to encompass the EMS, TMS, and traffic/terrain information, as shown in (13). The actor network is trained to output the engine power P_ICE and drive mode M_DHT by a_t = π_θ(s_t) + N(0, δ²), where the mean-zero Gaussian noise N(0, δ²) is added to the deterministic policy for the balance of exploration and exploitation. Training commences with the initialization of the replay memory D with a capacity of M before the offline training begins. The transition tuples from the interaction process are preserved in the replay buffer for experience replay; if the buffer reaches its limit, the oldest data are replaced with new ones. The evaluate actor network is responsible for generating optimal commands: for each transition tuple e_t = (s_t, a_t, r_t, s_{t+1}), the state s_t is fed into the evaluate actor network to produce the action a_t = π_θ(s_t).
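The replay memory with oldest-data replacement described above is naturally a fixed-capacity FIFO; a minimal sketch:

```python
from collections import deque
import random

class ReplayBuffer:
    """Experience replay as described for the ITEM agent: capacity M,
    oldest transitions overwritten once the buffer is full."""
    def __init__(self, capacity):
        # deque with maxlen drops the oldest entry automatically
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        # uniform random minibatch for the critic/actor updates
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)
```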
This action is then employed by the pair of evaluate critic networks independently to compute the Q-values, represented as Q_{w1}(s_t, π_θ(s_t)) and Q_{w2}(s_t, π_θ(s_t)), with w_1 and w_2 denoting the parameters of evaluate critic networks I and II, respectively. They are utilized for the update of the evaluate critic networks by minimizing the TD error-based loss functions

L(w_i) = E[ (y_t − Q_{w_i}(s_t, a_t))² ],  i = 1, 2
y_t = r_t + γ min_{i=1,2} Q_{w'_i}(s_{t+1}, ã_{t+1})

where Q_{w'_1}(s_{t+1}, ã_{t+1}) and Q_{w'_2}(s_{t+1}, ã_{t+1}) refer to the Q-values ascertained by the pair of target critic networks. Moreover, ã_{t+1} is the modified action using clipped noise from the target actor network

ã_{t+1} = π_{θ'}(s_{t+1}) + clip(ε, −c, c),  ε ∼ N(0, δ̃²)

where ±c refers to the upper and lower noise bounds. Also, the action-value function Q_{w_i}(s_t, a_t) refers to the expected accumulated reward following policy π

Q_{w_i}(s_t, a_t) = E_π[ Σ_k γ^k r_{t+k} | s_t, a_t ]

Moreover, Q_{w1}(s_t, π_θ(s_t)) is used for updating the weights of the evaluate actor network via the deterministic policy gradient

∇_θ J(θ) = E[ ∇_a Q_{w1}(s_t, a)|_{a=π_θ(s_t)} ∇_θ π_θ(s_t) ]

The target networks are updated every d steps, implying that the pace of weight updates in the target networks lags behind that of the evaluate networks. Concurrently, the parameters of the evaluate networks are optimized synchronously with the training progress, while the target networks' weights track the corresponding evaluate networks with soft updates with a ratio of σ

w'_i ← σ w_i + (1 − σ) w'_i,  θ' ← σ θ + (1 − σ) θ'

Upon the completion of the targeted episodes E, the optimal policy is chosen and loaded into the rapid control prototype (RCP). The overall learning framework of the proposed ITEM is illustrated in Fig. 6.
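The clipped double-Q target used to update both evaluate critics can be sketched as below; the callables stand in for the paper's target actor and target critic networks, and the default hyperparameter values are illustrative:

```python
import random

def td3_target(r, s_next, target_actor, target_q1, target_q2,
               gamma=0.99, sigma=0.2, c=0.5):
    """y_t = r_t + gamma * min_i Q_{w'_i}(s_{t+1}, a~_{t+1}), where the
    target action is perturbed by clipped Gaussian noise (TD3-style
    target policy smoothing)."""
    noise = max(-c, min(c, random.gauss(0.0, sigma)))  # clip(eps, -c, c)
    a_next = target_actor(s_next) + noise
    return r + gamma * min(target_q1(s_next, a_next),
                           target_q2(s_next, a_next))
```

Taking the minimum of the two target critics is what bounds the double Q-values and counteracts overestimation bias.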

IV. TESTING AND VALIDATION SETUP
The ITEM system is trained under the WLTC driving cycle during the charge sustaining (CS) stage, where both the initial and target SOC values are set to 34% according to the parameter settings of the real vehicle controller in Dongfeng Motor. Since the homologation driving cycle does not have real-time traffic information, the synthetic construction of traffic data proposed in the authors' previous study [32] is adopted in this article. In addition, the ambient temperature in this work is sampled according to the probability shown in Fig. 7, fitted to the Beijing average temperature from November to April. Besides, the comparative study is carried out against a state-of-the-art RL-based EMS (RL-EMS), whose action space is consistent with that of the ITEM agent. Furthermore, the SOC compensation procedure is performed to obtain the corrected fuel consumption before comparing the control results at identical terminal SOC, which aligns with the SAE J1711 standard [36].
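The SOC compensation can be sketched as a linear correction, in the spirit of the SAE J1711 procedure referenced above; the fuel-equivalence factor below is an illustrative assumption, not a calibrated value:

```python
def corrected_fuel(fuel_mL, soc_init_pct, soc_end_pct, mL_per_pct=30.0):
    """Correct measured fuel (mL) to an identical terminal SOC.
    A terminal SOC below the initial value means net battery depletion,
    so an equivalent amount of fuel is added; mL_per_pct (mL of fuel
    per % SOC) is an assumed equivalence factor."""
    return fuel_mL + (soc_init_pct - soc_end_pct) * mL_per_pct
```

This makes strategies with different terminal SOC values directly comparable on a single fuel figure.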
In addition to the offline optimality testing, the hardware-in-the-loop (HIL) testing system, depicted in Fig. 8, is utilized to validate the real-time control performance of the ITEM system under the CS stage. This platform comprises an upper computer that manages and configures input and output (I/O) interfaces, communication interfaces, and test cases using NI VeriStand software. The I/O signals from the RCP are connected to the HIL terminals, enabling real-time operation of the powertrain model loaded in the real-time computer (RTPC).
The HIL test is based on a real-world driving situation: a detailed capture of a 27-km drive during peak hours was carried out in Beijing, China, and the reconstructed traffic data can be seen in Fig. 9. Also, both the ambient temperature and the initial coolant temperature are set to −10 °C, and a higher initial SOC of 45% is selected.

V. RESULTS AND DISCUSSION

A. Parameter Design and Learning Ability
The specific hyperparameters utilized during the training process of the ITEM agent are listed in Table II, and a total of 500 training episodes have been designated. Additionally, the parameters T_b and T_f corresponding to the velocity and slope vectors of the state inputs have been set to 10 and 50 s, respectively. The neural network architecture consists of four fully connected layers, comprising 200, 150, 100, and 50 neurons, respectively. These values have been determined based on comprehensive experiments and the recent literature in this field.
The convergence curves of the offline training of the RL-based EMS and the proposed ITEM system are displayed in Fig. 10. The RL-based EMS shows a faster convergence rate as it does not consider thermal variables in its state space. Although the ITEM converges more slowly, it achieves a higher return.

B. Optimality Validation in Homologation Driving Cycle
To understand the training process and the performance after convergence, Fig. 11 shows the results from the 450th to the 500th episode. The fuel consumption after correction is illustrated, since each episode has a different terminal SOC. Furthermore, the ITEM demonstrates better fuel economy than the conventional RL-EMS as the temperature decreases. This is because of the increased necessity of substituting electric heating with coolant heating at lower temperatures. The training results of both algorithms were separately fitted, and the fuel consumption is recorded with the initial coolant temperature ranging from −10 °C to 10 °C, as shown in Table III.
The ITEM is compared against the RL-EMS and BMK strategies to analyze the performance under homologation cycles, as shown in Fig. 12. In this test, the ambient temperature is 10 °C and the initial coolant temperature is set to 23 °C. Fig. 12(a) presents the comparison of SOC dynamics, revealing that the ITEM strategy achieves a shallower depth of discharge compared to RL-EMS and BMK. This is attributed to the fact that ITEM learns to expedite the engine warm-up process. In terms of the discharge process and the shape of the SOC trajectory, ITEM exhibits results closest to those of DP, outperforming the other two control systems and leading to improved fuel economy. In comparison to the RL-EMS and BMK control strategies, ITEM achieves fuel savings of 3.4% and 4%, respectively, while maintaining the highest average water temperature. The comparison of the different control systems can be found in Table IV.
Fig. 13 illustrates the influence of the state variables on the output of the actor network, explaining the degree to which each state variable affects the output. The data generated during the simulation training process are collected and fed into a decision tree, and the results are obtained using the gradient boosting regression tree package [32]. The results show that the SOC has the highest feature importance, followed by the coolant temperature and velocity. Besides, traffic and terrain data also play a significant role in the decision-making.

C. Online HIL Experiment in Real-World Driving Condition
The engine output power curve serves as a critical indicator for evaluating the performance of the control systems (see Fig. 14). It is evident that both the ITEM and BMK algorithms initiate engine preheating prior to 1000 s. However, the ITEM algorithm promptly commences engine operation once the engine reaches a temperature of 40 °C, thereby raising the coolant temperature. Although BMK achieves a higher coolant temperature than RL-EMS before 2000 s through rapid preheating, it struggles to sustain the coolant temperature above 75 °C for an extended duration. This limitation hampers its capability to substitute cabin electric heating. Furthermore, a zoomed-in view of the engine output power, coolant temperature, drive motor power, and battery SOC during the driving cycle from 1500 to 3500 s is presented in Fig. 15. The ITEM efficiently accomplishes engine preheating while maintaining the coolant temperature consistently. Conversely, BMK refrains from the engine operation mode after completing the preheating process due to the logic of the rule-based EMS, since the battery SOC has not reached its threshold for engine operation. The RL-EMS, lacking thermal management-related states, commences engine preheating around 1600 s. However, after completing preheating, RL-EMS fails to raise the coolant temperature adequately. Hence, despite RL-EMS showing commendable performance in energy management, its overall energy-saving effectiveness is compromised.
The HIL test results of RL-ITEM, RL-EMS, and BMK can be found in Fig. 16, illustrating the corrected fuel economy, the energy consumption caused by electric heating, as well as the duration of the engine warm state allowed for cabin heating. The fuel economy of ITEM outperforms the state-of-the-art RL-based EMS and the rule-based BMK system by 7.1% and 12.4%, respectively, as evidenced by less energy usage for electric heating and a longer duration at the coolant temperature required for cabin heating; the detailed results are summarized in Table V. Therefore, the proposed method can be modified and applied to the calibration of intelligent ITEM systems to tackle complex real-world driving conditions and explore the concealed fuel-saving potentials in climate-adaptive EMSs.

VI. CONCLUSION
This article proposes a trip-oriented thermal and energy management system for a multimode connected PHEV. The performance of the proposed ITEM has been validated by comprehensive experiments under homologation and real-world drive cycles with different initial SOC conditions, and the findings of the study are summarized as follows.
1) A multimode PHEV model with high-fidelity thermal and energy consumption characteristics is calibrated using experimental data. Based on this, a model-free multistate RL algorithm with a hybrid action space is designed for the trip-oriented ITEM system, featuring engine smart warm-up and engine-assisted heating.

Fig. 1. Overall design of the intelligent ITEM system and dual-mode PHEV powertrain configuration.

Fig. 4. Average accessory load of the TMS-related auxiliary components in case of cold and warm status of engine coolant.

Fig. 5. Experimental validation of the powertrain model and control system. (a) Framework of benchmark strategy. (b) Validation of velocity and coolant temperature. (c) Validation of engine power and battery SOC.

Fig. 6. Learning framework of the RL-based ITEM system.

Fig. 9. Trip profile of the reconstructed real driving cycles.

Fig. 10. Learning curve of episode return in ITEM and EMS.

Fig. 11. Training results in terms of fuel economy for ITEM and EMS during the 450th to the 500th episodes.

Fig. 13. Feature importance of the state variables of the RL-based ITEM and EMS.

Fig. 15. Magnified illustration of (a) engine power and coolant temperature and (b) drive motor power and battery SOC.

Fig. 16. HIL test results in terms of fuel economy, energy consumption of electric heating, as well as the duration of engine warm state allowed for cabin heating.
Hao Zhang, Boli Chen, Member, IEEE, Nuo Lei, Graduate Student Member, IEEE, Bingbing Li, Rulong Li, and Zhi Wang

TABLE II HYPERPARAMETERS OF TD3 ALGORITHM

TABLE III COMPARISON OF FITTED AVERAGE FUEL CONSUMPTION (FC)

TABLE V HIL VALIDATION FOR GENERALIZATION TEST IN −10 °C

2) The proposed ITEM controller incorporates techniques including an STDP framework, bounded double Q-values, and delayed policy updates. Offline tests confirm that the ITEM achieves a fuel economy enhancement from 2.2% to 9.6% when the ambient temperature drops from 15 °C to −15 °C, compared with the state-of-the-art RL-based EMS, and realizes on average 93.7% of the performance of DP.

3) The adaptability of the ITEM system is facilitated by integrating the coolant temperature, ambient temperature, as well as multisource traffic and terrain data processed by the STDP framework into the state space. The HIL experiments conducted at −10 °C demonstrate the real-time implementation in real-world driving scenarios, which reduces the fuel cost by 7.1% and 12.4% compared to the RL-EMS and rule-based control strategies, respectively.