Advanced Fault Diagnosis for Lithium-Ion Battery Systems

Lithium-ion batteries have become the mainstream energy storage solution for many applications, such as electric vehicles and smart grids. However, various faults in a lithium-ion battery system (LIBS) can potentially cause performance degradation and severe safety issues. Developing advanced fault diagnosis technologies is becoming increasingly critical for the safe operation of LIBS. This paper provides a comprehensive review of fault mechanisms, fault features, and fault diagnosis of various faults in LIBS, including internal battery faults, sensor faults, and actuator faults. Future trends in the development of fault diagnosis technologies for a safer battery system are presented and discussed.


Introduction
As one of the most promising energy storage systems, lithium-ion batteries have been widely used in various applications, such as electric vehicles (EVs) and smart grids. Lithium-ion batteries have become the mainstream energy storage solution owing to their inherent benefits, such as high energy density, high power density, and long lifespan. However, the potential risks arising from abusive operating conditions and harsh environments pose a huge challenge to the safety of Li-ion battery systems (LIBSs). An effective real-time battery management system (BMS) is critical to ensuring the safety of LIBSs. A BMS has several functionalities, such as state-of-charge monitoring, thermal management, charging management, and equalization management. It also tracks the health status and monitors potential faults of the LIBS. Without suitable diagnostics and fault handling, a minor fault could eventually lead to severe damage to the LIBS [1]. The importance of fault diagnostics and fault handling has been demonstrated repeatedly in several recent severe incidents [2]-[4].
There are different fault modes in the LIBS, and the fault mechanisms are usually very complex. From a control perspective, these fault modes can be divided into battery faults, sensor faults, and actuator faults. The battery faults, which include overcharge, overdischarge, overheat, external short circuit (ESC), internal short circuit (ISC), electrolyte leakage, battery swelling, accelerated battery degradation, and thermal runaway (TR), are the most critical faults in the LIBS. These faults are also intertwined. Overcharge and overdischarge could lead to various undesirable battery side reactions, resulting in accelerated degradation. These side reactions and the gases generated by the chain reactions during TR may eventually cause battery swelling. Such swelling, along with mechanical damage, may in turn lead to electrolyte leakage. The ISC is typically caused by separator failure due to manufacturing defects, overheat, mechanical collisions, or penetration by metal dendrites or mechanical punctures. Fortunately, the Joule heat generated by an ISC develops into TR only when the equivalent ISC resistance reaches a very low level [5]. Abnormal heat generation occurs under various conditions, such as side reactions during overcharge/overdischarge, ISC, ESC, and loose contact at the cell connector, which further increases battery temperature. Temperature plays an important role in thermal management, battery-pack equalization, capacity/power degradation, and TR [6]. Overheat is the direct cause of battery TR and can also be facilitated by the chain reactions during TR [7], resulting in a vicious positive feedback cycle.
Feng et al. [8] studied the mechanisms of the chain reactions during TR for a Li-ion battery with Li(NixCoyMnz)O2 (NCM)/graphite electrodes and a polyethylene-based ceramic-coated separator. The decomposition of the solid electrolyte interface (SEI), the reaction between the electrolyte and the anode, the melting of the separator, the decomposition of the NCM cathode, and the decomposition of the electrolyte occur sequentially as the temperature rises. In Ref. [9], they found that 12% of the heat released in the TR of a single cell is sufficient to trigger the TR of the adjacent battery cells. Lamb et al. [10] investigated failure propagation in a multi-cell Li-ion battery pack when TR is induced in a single cell. They analyzed the failure propagation for different cell types and electrical connections (parallel or series). Feng et al. [11] summarized four approaches to delaying or preventing TR propagation: increasing the TR onset temperature, improving heat dissipation, reducing the accumulated energy during TR, and adding thermally resistant layers between adjacent batteries. Hofmann et al. [12] proposed an explosion prevention method based on reducing the battery pressure during TR, which is particularly practical for explosions caused by the electrolyte.
Besides the battery faults, sensor faults can also cause severe issues for LIBS operation, because all the feedback-based algorithms in the BMS depend heavily on the sensor measurements [13]. Sensor faults in the LIBS mainly include the voltage sensor fault, current sensor fault, and temperature sensor fault. The current sensor fault affects the accuracy of state of charge (SOC) estimation [14] and multi-state estimation [15], [16]. The estimated SOC and the temperature measurements are used to update the battery model parameters in real time for high-accuracy prediction [17], [18]. Li-ion batteries must be operated within the safe voltage and temperature ranges [19]. Exceeding these ranges may reduce battery performance or even cause accidents. Voltage and temperature sensor faults could also cause equalization errors or thermal management errors in the BMS.
Actuator faults have a more direct impact on control system performance than battery faults and sensor faults. Potential actuator faults in the LIBS, including the terminal connector fault, cooling system fault, CAN bus fault, high-voltage contactor fault, and fuse fault, are summarized in Ref. [20]. If the cooling system fails, the battery cannot be maintained within the proper operating temperature range, which may even trigger TR. A battery connection fault will not only cause insufficient power supply but also increase the risk of accidents [21], [22]. A poor connection between batteries raises the contact resistance and generates excessive heat, which further causes the temperature to rise [23]-[25]. As the charging and discharging process continues, an arc or spark may occur, resulting in the melting of the battery terminals [26].
Extensive research has been conducted on fault diagnostics for the different components of the LIBS. Among the approaches proposed, the most widely used battery fault diagnosis strategy is the model-based approach rather than the data-driven approach, because obtaining rich battery fault data is usually time-consuming and costly. Within the data-driven approaches, signal processing methods, rather than machine learning-based methods, are mainly used for battery fault diagnosis. Sensor faults and actuator faults usually affect the external signals of the battery, such as voltage, current, and temperature. Therefore, phenomenological models, such as the equivalent circuit model (ECM), are sufficient for the diagnostic requirements. The ECM is simpler in computation and structure than electrochemical models (EM), so it is easier to design various control and diagnostic tools based on the ECM [27]. In general, observer-based methods and signal processing methods are widely used for sensor fault diagnosis and actuator fault diagnosis, respectively.
However, fault diagnostics for the LIBS still faces many challenges. These challenges will be thoroughly discussed in Section 3.4. To the best of our knowledge, there is no review of fault diagnostics for LIBS in the existing literature. For a clear and systematic understanding of the state of the art of LIBS fault diagnostics, this article provides a comprehensive review of fault mechanisms, fault features, and fault diagnosis techniques for Li-ion batteries, sensors, and actuators in the LIBS. The state-of-the-art approaches for LIBS diagnostics and their advantages and limitations are also summarized. In addition, some representative algorithms are classified and discussed to stimulate innovative ideas for LIBS fault diagnosis. Finally, this article discusses future trends and suggestions for improving LIBS fault diagnostics for a safer battery system.
This review is organized as follows. Section 1 introduces the impact of LIBS faults on the safety of the LIBS and the relationships between the faults. The research progress of LIBS fault diagnosis is also briefly summarized. Section 2 introduces the fundamentals of LIBS fault diagnosis, with a focus on methodologies for fault diagnosis systems. Section 3 is the main body of this review. It conducts a comprehensive survey of fault mechanisms, fault features, and diagnostic algorithms in LIBS fault diagnosis, and discusses the current problems and challenges. Section 4 presents future trends in LIBS fault diagnosis at three different stages of the fault diagnosis. Section 5 presents the conclusions and suggestions drawn from the latest developments in LIBS fault diagnosis. For a better understanding of the abbreviations used in this review, a list of all acronyms and abbreviations is shown in Table 1.
Figure 1. Flowchart of the general fault diagnosis system. Adapted from [29], [30].

Feature extraction
Feature extraction is a pre-processing step for fault diagnostics. The accuracy of feature extraction depends heavily on the method used. Here, we focus on two main feature extraction methods: signal processing-based and model-based. Various signal processing techniques have been developed to extract useful features in the time, frequency, and time-frequency domains, such as root mean square amplitude, spectral analysis [31], wavelet transformation [32], entropy-based methods [33], rough sets [34], and principal component analysis [35]. For example, battery faults and connection faults can cause abnormal fluctuations in the battery voltage response and temperature response. The entropy-based method [33] can be used to capture these anomalies because it measures the degree of randomness or disorder of time series data. Note that some methods, such as rough sets [34] and principal component analysis [35], can reduce the dimension of the fault features, which is very useful for reducing the complexity of the diagnostic system.
[...] fault can cause a significant change in battery contact resistance. ISC faults and thermal faults can be characterized by an SOC decrease and an increase in ohmic internal resistance, respectively. For different fault models, the filter [17], observer [18], or least-squares algorithm [36] needs to be designed accordingly to extract key states or parameters. Theoretically, artificial intelligence algorithms can also be applied to extract fault features as an alternative to the physics-based model. This approach is expected to extract more accurate features through online training and continuous improvement, but at the computational cost of continuous training.
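As a minimal sketch of the entropy-based feature mentioned in this subsection, the following Python snippet computes the Shannon entropy of a window of voltage samples from a fixed-range histogram. The choice of Shannon entropy, the function name `shannon_entropy`, the voltage range, the bin count, and the sample values are all illustrative assumptions; the cited works may use a different entropy definition.

```python
import math
from collections import Counter


def shannon_entropy(samples, lo, hi, n_bins=24):
    """Shannon entropy (bits) of a signal window over a fixed histogram range.

    lo, hi and n_bins are assumed design parameters: a fixed range keeps
    the entropy of different windows comparable.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and n_bins >= 1")
    width = (hi - lo) / n_bins
    # Assign each sample to a bin, clamping out-of-range values to the ends.
    counts = Counter(
        min(max(int((x - lo) / width), 0), n_bins - 1) for x in samples
    )
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical cell-voltage windows (volts): one steady, one fluctuating.
steady = [3.72, 3.73, 3.72, 3.71, 3.72, 3.73, 3.72, 3.72]
erratic = [3.72, 3.58, 3.73, 3.42, 3.71, 3.53, 3.72, 3.47]

h_steady = shannon_entropy(steady, lo=3.0, hi=4.2)
h_erratic = shannon_entropy(erratic, lo=3.0, hi=4.2)
```

In this toy example the fluctuating window spreads over several histogram bins and therefore yields a higher entropy than the steady one, which is the property a diagnostic threshold would exploit.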
For battery system faults, the advantages and disadvantages of the above two types of feature extraction methods are shown in Table 3. Many battery system faults can cause capacity fade, extra charge depletion, increased heat generation, or increased battery cell inconsistency. This abnormal behavior can be captured by analyzing the external voltage and temperature response of the battery system using signal processing methods. The signal processing method does not require modeling work, but it may not achieve fault isolation, as some battery faults have similar electrical and thermal responses. Besides, the signal processing method can only detect faults when the abnormality in the battery system response reaches a certain level, which makes it difficult to detect minor faults. In contrast, it is easier for the model-based method to quantify and locate specific faults by exploiting the relationship between faults and model states or parameters.

Diagnostic methods
There are many studies on diagnostic methods. As shown in Fig. 2, we classify the diagnostic methods into knowledge-based, model-based, and data-driven ones, following Refs. [19], [37]-[40]. Specifically, by using graph theory, such as the signed directed graph [41], fault tree [42], and failure mode and effects analysis [43], a fault diagnosis network can be constructed based on the fault propagation relationships between the various components in the system. A fault can then be located using the relevant search theory. An expert system is a computer program designed to simulate the reasoning and decision-making of human experts [44]. The knowledge and rules are established by utilizing the historical database and the rich experience of domain experts. Fuzzy logic, which conforms to the natural thinking process of human beings and facilitates the processing of qualitative knowledge, can be applied to fault diagnosis by using fuzzy parameters, fuzzy models, or fuzzy thresholds.
Table 4 gives a comparison of various knowledge-based diagnostic methods in terms of their key technologies, advantages, and disadvantages. Multiple battery faults, sensor faults, and actuator faults may occur in the battery system. Graph theory has a clear causal relationship, and its diagnostic results are easy to interpret. However, the complex fault mechanisms of the battery system make it difficult to establish an accurate diagnostic network. The expert system method does not require a physics-based model. However, several problems arise when it is applied to battery systems, such as difficulties in knowledge acquisition and inaccurate knowledge representation. The fault states of batteries can be characterized by anomalies such as rapid SOC decline, intense heat generation, and large voltage fluctuations. These fuzzy parameters can be processed by the fuzzy logic method. However, developing effective rules is still a big challenge.

Model-based methods.
For model-based fault diagnostics, a residual signal is typically obtained by comparing the measurable signal with the signal generated by the model [45]. Subsequently, the residual is evaluated to determine the diagnostic results [46]. The development of high-fidelity battery models [47], including electrical models, thermal models, and multiphysics models, provides the basis for model-based fault diagnosis. Thanks to the in-depth understanding of battery system dynamics, these methods can not only detect faults but also locate faults and estimate their magnitude. Therefore, they are becoming the mainstream method for LIBS fault diagnostics. It should be noted that these methods could be affected by model uncertainty, interference, and noise. Model-based methods can be divided into four categories: state estimation, parameter estimation, parity space, and structural analysis theory.

Table 5 (excerpt). Structural analysis theory. Key technology: structural analysis of system dynamic equations. Advantages: easy to analyze fault detectability and isolability; the workload of selecting residual generators is reduced. Disadvantages: strongly dependent on the redundant information of the system model.
The state estimation methods essentially utilize an observer or filter to reconstruct or estimate the internal states, such as the SOC and the internal temperature of batteries. After that, the residuals containing the fault information can be obtained by comparing the estimated signals with the sensor measurements [37]. The basic idea of parameter estimation for fault diagnosis is that faults will affect the physical system process, which in turn changes the model parameters [48], [49]. Therefore, fault detection and isolation (FDI) of the LIBS can be achieved by detecting changes in the parameters of the battery's electrical model and thermal model. The dynamic model of the battery system determines the relationship between input and output variables. The parity space method can be used to verify this relationship by analyzing the input and output measurements of the battery system [50], [51]. Structural analysis theory finds and utilizes the structurally over-determined part of the system dynamic equations [52], and then achieves the structural detectability and isolability analysis of faults [53]-[58].
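The residual generation and evaluation steps described above can be illustrated with a minimal Python sketch. It assumes the simplest possible ECM (a constant open-circuit voltage in series with an ohmic resistance) and a fixed threshold. The function names, parameter values, and threshold are illustrative assumptions and are not taken from the cited works; a practical diagnostic scheme would use an observer or filter and a more careful residual evaluation.

```python
def voltage_residuals(currents, measured_voltages, ocv=3.7, r0=0.05):
    """Residual = measured terminal voltage minus model-predicted voltage.

    Assumed model: V = OCV - I * R0, with discharge current positive.
    OCV is held constant here, which is only reasonable over a short window.
    """
    return [v - (ocv - i * r0) for i, v in zip(currents, measured_voltages)]


def evaluate_residuals(residuals, threshold=0.05):
    """Flag every sample whose residual magnitude exceeds the threshold."""
    return [abs(r) > threshold for r in residuals]


# Hypothetical data: the third voltage sample is 0.2 V below the model output.
currents = [1.0, 2.0, 2.0]            # amperes
measured = [3.65, 3.60, 3.40]         # volts
residuals = voltage_residuals(currents, measured)
flags = evaluate_residuals(residuals)
```

The fixed threshold is the crudest form of residual evaluation. It trades sensitivity to minor faults against robustness to model error and noise, which is the limitation of model-based methods noted in this subsection.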
A comparison of the above model-based methods is given in Table 5. Various filters and observers have been applied to fault diagnosis for the LIBS, such as the Kalman filter (KF) [59], extended Kalman filter (EKF) [60], unscented Kalman filter [61], particle filter (PF) [62], Luenberger observer [63], and adaptive observer [64]. The state estimation method can support the state monitoring function of the BMS and can detect faults with excellent real-time performance. In comparison, parameter estimation methods, such as filter methods [65] and least-squares methods [66], can be combined with other methods to locate specific LIBS faults. However, they require higher battery model accuracy and sufficient current excitation [67]. For the parity space methods, such as the parity equation method [68] and the constrained optimization method [69], fault isolation of sensors and actuators in the LIBS can be easily achieved based on the different hypothesized fault-free subsets of inputs and outputs. One obvious advantage of structural analysis theory is the ability to provide fault detectability and isolability analysis regardless of the LIBS parameter values, which greatly reduces the workload of designing residual generators for fault isolation.
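To make the parameter estimation idea concrete, the following Python sketch fits the two parameters of an assumed static model V = OCV - I * R0 to current and voltage samples by ordinary least squares. The model, the function name `fit_ocv_r0`, and the sample data are illustrative assumptions rather than the method of any cited reference. The explicit check on the current samples mirrors the requirement for sufficient current excitation: with a constant current the two parameters cannot be separated.

```python
def fit_ocv_r0(currents, voltages):
    """Least-squares estimate of (OCV, R0) for the model V = OCV - I * R0."""
    n = len(currents)
    if n < 2 or n != len(voltages):
        raise ValueError("need at least two matched current/voltage samples")
    mean_i = sum(currents) / n
    mean_v = sum(voltages) / n
    sxx = sum((i - mean_i) ** 2 for i in currents)
    if sxx == 0:
        # No excitation: the slope (and hence R0) is not identifiable.
        raise ValueError("current is constant; R0 cannot be identified")
    sxy = sum((i - mean_i) * (v - mean_v) for i, v in zip(currents, voltages))
    r0 = -sxy / sxx                 # slope of V versus I is -R0
    ocv = mean_v + r0 * mean_i      # intercept of the fitted line
    return ocv, r0


# Hypothetical noise-free samples generated with OCV = 3.7 V, R0 = 0.05 ohm.
currents = [0.0, 1.0, 2.0, 3.0]
voltages = [3.7 - 0.05 * i for i in currents]
ocv_hat, r0_hat = fit_ocv_r0(currents, voltages)
```

Under these assumptions, a fault indicator could compare `r0_hat` against its value for a healthy cell, since a drift in an estimated parameter is what parameter-estimation-based diagnosis looks for.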
2.3.3. Data-driven methods. These methods analyze and process the operating data directly to detect faults, without relying on an accurate analytical model or the experience of experts. For data-driven fault diagnosis of the LIBS, the fault detection process is simplified by not considering the complicated fault mechanisms and system structure, especially for TR and accelerated battery degradation, which are affected by various unclear and coupled factors. However, the implementation of this method generally requires proper pre-processing of the raw LIBS data. Because the fault mechanisms are neglected, it is not easy to analyze and interpret faults using this method. Furthermore, some data-driven methods also have inherent limitations, such as the need for a large amount of historical data, high computational cost, and training complexity [70]. The data-driven methods commonly used in the fault diagnosis domain include signal processing, machine learning, and information fusion.
Fault diagnosis based on signal processing usually uses various signal processing techniques to extract fault feature parameters, such as deviation, variance, entropy, and correlation coefficient. After that, a fault is detected by comparing them with the values in a normal state. Artificial neural networks (ANN) [71] and support vector machines (SVM) [72] are two typical machine learning algorithms. ANN-based fault diagnosis learns the implicit rules from given pairs of inputs and outputs in the offline training phase, and then forms a nonlinear black-box model for use in the online operation phase. A well-trained ANN can distinguish between the normal and abnormal states of the battery system. The main function of SVM-based fault diagnosis is to transform the input space into a high-dimensional space through a kernel function and to find the optimal hyperplane in this new space. This method treats LIBS fault diagnosis as a sample classification problem and trains an accurate classifier based on historical data. Information fusion represents a process of reasoning and decision-making based on uncertain information. Based on the analysis of multi-source information, more reliable fault detection can be achieved.
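As a minimal illustration of the signal-processing route, the Python sketch below uses the deviation feature: each cell voltage is compared with the pack median, which serves here as a stand-in for the normal-state value. Using the median as the reference, the function name `deviating_cells`, and the 0.1 V limit are assumptions made for this example only.

```python
from statistics import median


def deviating_cells(cell_voltages, limit=0.1):
    """Return indices of cells whose voltage deviates from the pack median.

    The median is used as a simple proxy for the normal-state voltage
    because it is insensitive to a single abnormal cell.
    """
    reference = median(cell_voltages)
    return [
        idx for idx, v in enumerate(cell_voltages)
        if abs(v - reference) > limit
    ]


# Hypothetical snapshot of four series-connected cells (volts).
snapshot = [3.70, 3.71, 3.69, 3.40]
suspects = deviating_cells(snapshot)
```

Such a check detects that something is abnormal but says nothing about the cause, which reflects the fault isolation limitation of signal processing methods discussed in this section.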
Table 6 presents a comparative analysis of these data-driven diagnostic methods. Because the LIBS dynamics are neglected, the signal processing method is easy to implement and suitable for fault detection, but it is difficult to locate faults directly when multiple LIBS faults are coupled. Machine learning algorithms can adapt to the training sample set by adjusting their own parameters and can extract knowledge from the current training samples. Theoretically, a battery black-box model based on the ANN can achieve higher accuracy than the EM and ECM of the battery. However, the lack of LIBS fault data may cause over-fitting problems. That is, ANNs with poor generalization ability are likely to cause undesired false alarms of LIBS faults. Compared with the ANN, the SVM has better generalization ability and is applicable to small sample cases [73], which makes it especially suitable for the LIBS with a limited amount of fault data. The most critical issue of the SVM is the selection of the optimal kernel function for a specific problem. In order to make full use of the existing multi-source information of the LIBS to improve the accuracy of fault diagnosis, an effective fusion algorithm is essential.

Table 6 (excerpt).
SVM. Advantages: good generalization ability; applicable to small sample cases. Disadvantages: difficult to select the optimal kernel function; low efficiency for large-scale training sets.
Information fusion. Key technology: appropriate information fusion algorithms. Advantages: more accurate diagnostic result. Disadvantages: difficult to select effective fusion algorithms.

Fault-tolerant control
Fault-tolerant control (FTC) is used to maintain safe operation and meet certain performance requirements when a fault occurs in the system [74]. An architecture of FTC is shown in Fig. 3. In general, FTC can be classified into active FTC and passive FTC [75].
There are few existing studies on FTC in the LIBS, and passive FTC is used in most cases. The purpose of passive FTC is to design a robust controller such that the system is robust to certain faults [76]. Passive FTC assumes prior knowledge of the faults and therefore does not need real-time fault information or online adjustment of the controller. However, passive FTC may be ineffective under unknown faults. Hu et al. [77] developed a dual-redundancy method to achieve FTC of temperature sensors. When a sensor fault occurred, the optimal value determined by relevant algorithms was taken as the sensor output value to ensure proper operation of the system. Berdichevsky et al. [78] reported that each cell in the Tesla Roadster battery pack is equipped with two fuses, one at the cell's anode and one at its cathode. This scheme relies mainly on the structural design of the battery system and can effectively prevent the entire battery system from malfunctioning in the case of a short circuit. However, the substantial number of additional components makes the battery structure more complicated.
In contrast, active FTC uses real-time fault information to re-adjust the controller parameters, or even change the controller configuration, after the fault occurs. That is, active FTC can actively handle the fault in real time so that the system can still achieve the specified functions under fault conditions [79]. Despite its high complexity and high computational cost, this method greatly improves system performance and has attracted increasing attention in both academia and industry. Therefore, the application of active FTC in the LIBS has the potential to become an important research field.

Evaluation system
For battery system faults, the performance of the diagnostic system will vary with the diagnostic method used. A good evaluation system can compare various diagnostic algorithms and help design a better fault diagnosis method. The key to establishing a good evaluation system for fault diagnosis is to establish a reasonable performance index system and develop appropriate evaluation methods [81].
According to their functionalities, the major performance indexes for the diagnostic system can be roughly divided into detection performance, diagnostic performance, and robustness [82], [83]. Detection performance can be assessed by sensitivity, time delay, false alarm rate, missed detection rate, and misclassification rate. This index is closely related to the timeliness of fault handling for the LIBS. Diagnostic performance refers to the capability of fault isolation and the accuracy of fault estimation. False alarms and missed detections are common in LIBS fault diagnosis, and they can cause additional troubleshooting and safety risks. Robustness is the most difficult performance index to measure and achieve. A diagnostic algorithm that is not robust to model uncertainty, interference, and noise can hardly be used in a practical LIBS.
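Two of the detection performance indexes named above can be computed directly from labeled test outcomes, as in the following Python sketch. It assumes the common definitions (false alarm rate as the fraction of healthy samples that raise an alarm, missed detection rate as the fraction of faulty samples that raise none); the function name and the sample labels are hypothetical.

```python
def detection_rates(truth, alarms):
    """Return (false_alarm_rate, missed_detection_rate).

    truth[k]  is True when sample k is actually faulty.
    alarms[k] is True when the diagnostic system raised an alarm for sample k.
    """
    if len(truth) != len(alarms):
        raise ValueError("truth and alarms must have the same length")
    false_alarms = sum(1 for t, a in zip(truth, alarms) if a and not t)
    missed = sum(1 for t, a in zip(truth, alarms) if t and not a)
    n_faulty = sum(1 for t in truth if t)
    n_healthy = len(truth) - n_faulty
    far = false_alarms / n_healthy if n_healthy else 0.0
    mdr = missed / n_faulty if n_faulty else 0.0
    return far, mdr


# Hypothetical test run: four healthy samples followed by two faulty ones.
truth = [False, False, False, False, True, True]
alarms = [False, True, False, False, True, False]
far, mdr = detection_rates(truth, alarms)
```

Here one of the four healthy samples triggers an alarm and one of the two faulty samples is missed, so the sketch reports a false alarm rate of 0.25 and a missed detection rate of 0.5.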
To date, there are no standardized evaluation methods for LIBS fault diagnosis. In general, an evaluation method for the diagnostic system follows the process of determining the weight of each index, evaluating each index, and determining the final evaluation result. The weight has a big impact on the final evaluation result. Typically, the weight is related to the importance and reliability of the indexes. For example, a battery short-circuit fault has a higher weight than a sensor fault owing to its higher threat to the LIBS. In most cases, it is difficult to evaluate some indexes quantitatively, for example, robustness. One possible solution is to first evaluate each index qualitatively and then quantify the qualitative indexes in a unified framework.
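The three-step evaluation process outlined above (weight each index, evaluate each index, combine the results) can be sketched in Python as a weighted average of qualitative grades mapped onto a common numeric scale. The grade scale, the index names, the weights, and the function name are all hypothetical choices made for illustration, since no standardized evaluation method exists.

```python
# Assumed mapping from qualitative grades to a unified numeric scale.
GRADE_TO_SCORE = {"poor": 0.25, "fair": 0.5, "good": 0.75, "excellent": 1.0}


def overall_score(grades, weights):
    """Weighted average of graded indexes; weights need not sum to one."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        weights[name] * GRADE_TO_SCORE[grades[name]] for name in weights
    )
    return weighted / total_weight


# Hypothetical assessment of one diagnostic algorithm.
grades = {"detection": "excellent", "diagnosis": "good", "robustness": "fair"}
weights = {"detection": 2.0, "diagnosis": 1.0, "robustness": 1.0}
score = overall_score(grades, weights)
```

Because the result is sensitive to the weights, as noted above, any comparison of algorithms using such a score is only meaningful when the same weights and grade scale are applied to all of them.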

Fault diagnosis for LIBS
Fault diagnosis is critical to ensuring the safety of LIBSs. Therefore, it is necessary to study the fault mechanisms, fault features, and diagnostic methods for LIBSs. Fig. 4 illustrates an overview of the faults in the LIBS. The faults of the LIBS are affected by inherent defects, improper use, and harsh environments. These internal and external factors and their complicated coupling relationships make fault diagnosis of the LIBS a difficult task. In general, the faults of the LIBS are hidden, and it is difficult to directly and accurately determine early fault conditions from voltage, current, and temperature signals alone. Each type of fault poses a certain threat to the LIBS. Battery faults could lead to system performance degradation and even catastrophic accidents such as battery burning and explosion. Sensor faults of the BMS will affect the normal operation of the control system, leading to ineffective state estimation, equalization management, and thermal management in the LIBS. Actuator faults often lead to ineffective control actions, which further affect the system response.
[...] [87]. At the microscopic level, Alavi et al. [27] summarized several electrochemical failures, such as the loss of electrical contact, current collector corrosion, SEI growth, electrolyte decomposition, fracture in the lattice structure of electrodes, lithium plating, loss of active material, reduction of the negative electrode diffusion coefficient, porosity change of the electrode, and change of particle size. In addition, several battery faults, including overcharge/overdischarge, accelerated battery degradation, battery swelling, electrolyte leakage, ESC, ISC, overheat, and TR, are very important in real applications.
Although the cut-off voltage can be pre-set in the protection circuit, overcharge and overdischarge faults still occur in EVs due to the inconsistency among cells, inaccurate condition monitoring, and charging system faults [88]. For example, if the voltages of series-connected cells are not monitored well by the BMS, the cells with the highest and lowest voltages will be overcharged and overdischarged, respectively, resulting in rapid aging of the battery. Accelerated battery degradation is caused by undesired side reactions within the cell, which are accompanied by the loss of cyclable Li-ions and active material [89], [90]. Typically, these adverse side reactions, such as the phase change or decomposition of the cathode material [91], electrolyte decomposition [89], and SEI decomposition and growth at the anode [92], [93], are caused by various external factors [38], including overcharge/overdischarge, low temperature, high-voltage storage, and high-rate cycling. Specifically, the excessive delithiation of the anode causes SEI decomposition during overdischarging [94]. After recharging, the newly regenerated SEI changes the electrochemical properties of the anode [95], resulting in an increase in resistance and a degradation of capacity [96]. Repeated overdischarge will accelerate battery capacity degradation, the extent of which depends on the depth of discharge [88]. During overcharging, lithium deposition (mossy or dendritic type) will occur at the surface of the anode [97]. Meanwhile, excessive deintercalation of lithium will contribute to irreversible phase change and even collapse of the cathode structure, with gas release and heat generation. Temperature is also a very important factor affecting battery operation. Under low-temperature charging conditions, lithium plating is more likely to occur at the anode due to the slow diffusion process [98]. High temperatures can cause SEI decomposition and accelerate the capacity fade. The battery capacity can drop significantly when the battery is operated or stored above 50 °C, especially in the high SOC range [6]. Under high-rate discharge conditions, a large amount of Li-ions is transferred in a short time, which may cause incomplete de-intercalation of Li-ions and capacity loss.
Battery swelling, electrolyte leakage, and ESC faults are often caused by other battery faults or component faults, and there is a causal relationship between them. First, the gas generated by the side reactions during overcharge and by the chain reactions during TR may cause the internal pressure to rise, and may even lead to an explosion. Then, battery swelling and mechanical damage become the main causes of electrolyte leakage. Finally, electrolyte leakage can further cause a battery ESC as well as short circuits in adjacent electronic components. Besides, EVs may suffer from water immersion, collision deformation, or electric wire failure during operation. Therefore, an ESC may also occur when electrodes with a voltage difference are accidentally connected by conductors [99]. The ESC is a fast discharge process and results in abnormally high heat generation.
ISC, one of the most common faults leading to TR, can be caused by different separator failures such as deformation, penetration, shrinkage, or melting. For example, mechanical loading [100] can cause the deformation and fracture of the separator, and the electrical short-circuit under mechanical loading is generally predicated by the formation of internal cracks in the battery stack [101]. Separator penetration can be caused by mechanical shock or by dendrites formed during overcharge and overdischarge [62]. Moreover, the thermal runaway reaction is more severe when the penetration occurs at the center of the battery [102]. Separator shrinkage or melting caused by high temperature and contamination of the separator by impurities are also causes of separator failure [103]. Once the separator fails, the ISC is triggered by the contact between the anode and the cathode. Battery capacity [104] and the heat accumulated during the initial phase [105] are key factors in determining the consequences of ISCs. Studies [106] show that the worst location for an ISC is the edge of the electrode, where the heat dissipation is limited by the low thermal conductivity of the electrolyte and separator materials.
Overheat and TR are mutually reinforcing. The causes of TR are summarized in Ref. [99], including mechanical abuse, electrical abuse, and thermal abuse. Specifically, thermal abuse or overheat is the direct cause of TR. Overheat is usually caused by abnormal heat generation, external heat transfer, and poor heat dissipation. Abnormal heat generation occurs in many scenarios, such as side reactions during overcharge and overdischarge, ESC, ISC, and battery connection fault. A portion of the heat could be transferred to adjacent cells and the environment. Moreover, a fault or improper design of the cooling system can also result in poor heat dissipation. Battery overheat caused by these factors may trigger TR, and the mechanism of the chain reactions during TR for a Li-ion battery is elaborated in Ref. [107], including capacity degradation at high temperature, SEI decomposition, the reaction between anode and electrolyte, separator melting, cathode decomposition, electrolyte decomposition, the reaction between anode and binder, electrolyte burning, etc.
3.1.2. Li-ion battery fault features. It is important to note that, unlike for sensor and actuator faults, data acquisition is a key step in battery fault diagnosis. Therefore, the state of the art of data acquisition for battery fault diagnosis is discussed in this subsection before the Li-ion battery fault features. Besides real-time data from EVs, data can also be acquired through substitute tests and simulation models in academic research.
Many test methods have been developed for Li-ion battery research, such as penetration [106], mechanical loading [108], [109], external heating [110], overcharge [111], and ESC tests [112]. Implementing these test methods often requires a combination of advanced techniques, including optical, infrared, chemical, and thermal methods [111], [113]-[115]. For the ISC fault, the battery fault of greatest concern, the currently accepted ISC substitute tests mainly include penetration, adding phase change material into the separator, inducing dendrite growth by electrical abuse, and connecting an equivalent ISC resistance in parallel to the cell. Moreover, a new approach to the ISC substitute test controls the separator porosity and the pressing force [116]. Because experimental test methods are costly and time-consuming, much research has been devoted to developing high-fidelity models that can simulate battery failure behavior, such as the equivalent circuit model [60], [117], two-state thermal model [118], electrochemical-thermal model [119], [120], 3D electrochemical-thermal model [105], 3D electrochemical-thermal-ISC coupled model [66], [121], mechanical-electrical-thermal coupled model [122], and finite element model [123].
In general, Li-ion battery fault features can be obtained from two sources. First, a fault feature can be directly extracted from measurements or transformed from basic features. Second, a fault feature can also be reflected by certain model parameters. Battery faults are typically difficult to determine directly from current, voltage, and temperature measurements. Instead, fault features are often extracted through signal processing from the abnormal responses caused by faults. For example, due to the extra charge depletion, ISC can be inferred from two implicit features: the continuous reduction of the SOC and the rise of heat generation [124]. These two features can be captured from the responses of battery voltage and temperature [125]. For a short circuit under mechanical abuse conditions, a local force drop [126] can be regarded as a fault feature, which is consistent with the voltage drop and temperature rise. Moreover, fault features transformed from basic features allow battery faults to be detected more sensitively. For example, in Ref. [117], the differential of the voltage and the fluctuation function of the internal resistance are considered as the fault features. In Ref. [125], the correlation coefficient between cell voltages captures the abnormal voltage drop. The entropy of battery temperature [127] and voltage [128] serve as the features of temperature abnormality and voltage fault, respectively.
For the quantitative analysis of faults, certain parameters of the battery model are regarded as fault features, such as the ISC equivalent resistance [129] and thermal model parameters [63], [130] related to convective cooling resistance fault, internal thermal resistance fault, and TR fault. Liu [131] and Wu [132] analyzed the relationship between battery faults and parameter changes, and summarized the diagnostic rules for common battery faults. In many studies, certain model parameters are regarded as state of health (SOH) indicators, such as capacity and internal resistance [133]-[135]. However, the results derived from these parameters may vary under different operating conditions [136]. Since certain electrochemical properties, such as the side reaction current density [137], are uniquely related to the degree of battery degradation regardless of operating conditions, they can be used as indicators of battery SOH. It should be noted that most battery fault features are at the cell level. In the case of series-connected battery modules, the difference of SOC as well as ohmic internal resistance [36], [138] in the mean-difference model (MDM) of the battery pack can be used as effective ISC fault features.
Based on the battery model and measured data, model-based methods use state estimation and parameter estimation techniques to generate residuals and detect faults. Fault isolation can be achieved by constructing a fault signature table. Due to its simplicity and intuitive nature, the model-based method is widely used for fault diagnosis in battery cells and packs [139]. Based on the electrochemical model, Alavi et al. [62] estimated the transport rate of Li-ions in both positive and negative electrodes by the PF algorithm and then compared the estimated data with the boundary condition to detect lithium plating. Because overcharge and overdischarge can cause the model parameters to change, Sidhu et al. [59], [60] constructed multiple battery signature-fault models by impedance spectroscopy technology and equivalent circuit methodology, using KF or EKF to estimate the model terminal voltage and generate residuals. A probability-based approach was also applied to indicate the probability of failure. However, this method suffers from the difficulty of identifying multiple models and running the EKF for each of them. Dey et al. [63] added convective cooling resistance fault, internal thermal resistance fault, and TR fault into a two-state thermal model, and designed the FDI for these three thermal faults based on the Luenberger observer.
Another implementation of the model-based method is to combine the information of adjacent cells in a battery pack. Fault diagnosis can be achieved based on the difference in battery states or model parameters between the faulty cell and the normal cell. For example, Feng et al. [138] proposed a model-based ISC fault diagnostic scheme, as shown in Fig. 5. They first calculated the voltages and temperatures of both the average and the worst cells, the worst cell being the most likely to have an ISC fault. Then, their SOCs and internal ohmic resistances were obtained by state estimation and parameter estimation methods, namely EKF and RLS with a forgetting factor. Finally, the ISC fault and its fault level were determined from the deviations of voltage, temperature, SOC, and internal ohmic resistance between batteries. Zhang et al. [141] estimated the resistance of the parallel-connected battery group (PCBG), and identified the capacity fade fault by comparing the PCBG resistance among different PCBGs. Moreover, two fault causes, an inconsistent aging fault and a loose contact fault, can be distinguished by comparing the PCBG resistances. Ouyang et al. [117] estimated the basic parameters of the MDM by the RLS algorithm and then calculated the differential of the voltage and the fluctuation function of the internal resistance. Based on a statistical method, the ISC fault is determined by comparing the estimated and calculated parameters with a threshold. Given that the micro-short circuit (MSC) causes the SOC difference to increase continuously, Gao et al. [36] estimated the SOC difference based on the MDM with an EKF. The extra depleting current is identified, and the short circuit resistance is detected and calculated. Without the need to estimate the SOC of each cell, this method can quantitatively describe the ISC fault at a small computational cost.
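As a rough illustration of how a continuously growing SOC difference can be turned into a quantitative short-circuit estimate, the sketch below fits a straight line to the SOC difference, converts the slope into an extra depleting current, and divides the cell voltage by that current. It is a simplified sketch, not the scheme of Ref. [36]: it assumes the SOC difference (as a fraction between 0 and 1) has already been estimated, that the leak is roughly constant over the window, and that the cell voltage can be represented by a single mean value. The function name and all parameter names are hypothetical.

```python
def estimate_isc_resistance(times_s, soc_diff, capacity_ah, mean_voltage_v):
    """Estimate leak current (A) and equivalent ISC resistance (ohm).

    times_s        : sample times in seconds
    soc_diff       : SOC of the reference (mean) cell minus SOC of the
                     suspect cell, as a fraction (assumed already estimated)
    capacity_ah    : cell capacity in ampere-hours
    mean_voltage_v : representative cell voltage over the window
    """
    n = len(times_s)
    if n < 2 or n != len(soc_diff):
        raise ValueError("need at least two matching samples")
    t_mean = sum(times_s) / n
    d_mean = sum(soc_diff) / n
    den = sum((t - t_mean) ** 2 for t in times_s)
    if den == 0:
        raise ValueError("time samples must not all be equal")
    # Least-squares slope of the SOC difference, in 1/s.
    slope = sum((t - t_mean) * (d - d_mean)
                for t, d in zip(times_s, soc_diff)) / den
    # d(SOC)/dt = I / C, with C converted from Ah to ampere-seconds.
    leak_current_a = slope * capacity_ah * 3600.0
    if leak_current_a <= 0:
        # No growing SOC gap: treat the cell as not short-circuited.
        return leak_current_a, float("inf")
    return leak_current_a, mean_voltage_v / leak_current_a


# Hypothetical data: the SOC gap grows by 1% per hour on a 2 Ah cell.
leak, r_isc = estimate_isc_resistance(
    [0.0, 3600.0, 7200.0], [0.0, 0.01, 0.02], 2.0, 3.6)
```

A smaller estimated resistance indicates a more severe short circuit, which is why the resistance itself is a convenient fault-level indicator.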
Fig. 6. A short-circuit fault diagnosis scheme based on the correlation coefficient method. Adapted from [125], [142].
The signal processing method is a typical data-driven method that extracts useful fault features directly from the battery measurement data to detect faults. It does not require the construction of an accurate analytical battery model and is suitable for a wide range of applications. Dubarry et al. [133]-[135] applied incremental capacity analysis (ICA) to identify various contributions to capacity loss; ICA is more sensitive than traditional charge-discharge curves. The correlation coefficient can be used to determine whether the trends of two voltage curves match each other. For example, Xia et al. [125] proposed a short-circuit fault diagnosis scheme using this method, as shown in Fig. 6. The voltage of each cell in the battery pack is readily available, but battery inconsistency makes it difficult to determine battery faults directly from voltage. Therefore, Xia et al. captured the abnormal voltage drop by calculating the correlation coefficient between cell voltages. Then, the short-circuit fault was detected by comparing the calculated correlation coefficient with a threshold. Owing to the mathematical properties of the correlation coefficient, this method is robust to inconsistencies in OCV and internal resistance, and the detection process does not require hardware or analytical redundancy. Using real-time voltage data extracted from the National Service and Management Center of Electric Vehicles (NSMC-EV) in Beijing, Li et al. [142] verified the voltage fault detection of the battery pack based on the interclass correlation coefficient method. Considering that the remaining charging capacity (RCC) of the MSC cell increases each time the battery pack is fully charged, due to the extra charge depletion, Kong et al. [143] estimated the RCC of each cell based on the uniform charging cell voltage curve (CCVC) hypothesis. From the difference between the RCCs after two adjacent charges, the leakage current and MSC resistance can be obtained. Based on a large amount of raw temperature data derived from NSMC-EV in Beijing, Hong et al. [127] applied the Shannon entropy to capture the temperature abnormality of the battery pack. In addition, the abnormality coefficient, covering over-temperature and excessive temperature difference, was quantitatively evaluated to predict both the time and location of temperature faults in battery packs. Wang et al. [128] employed the modified Shannon entropy to analyze the voltage evolution of each cell and accurately predicted both the time and location of voltage faults in battery packs. Liu et al. [144] regarded all cell voltage values at each time step as an index and implemented the entropy weight method to obtain the objective weight of each index. According to the comprehensive score and the threshold, battery voltage abnormality can be accurately identified.
Another typical data-driven method is machine learning, which learns the underlying laws from a large number of battery training samples. However, it is currently less used in battery fault diagnosis because large amounts of battery fault data are difficult to obtain. For example, Yang et al. [145] proposed a method based on the random forest (RF) classifier to detect electrolyte leakage of ESC cells, as shown in Fig. 7. The leaked cells have a lower discharge capacity and a higher maximum temperature rise. Therefore, these two features were fed into the pre-trained RF model. First, every training subset Si was resampled randomly from the training data set using the Bootstrap method, and then every single decision tree Ci was generated from the corresponding Si. Finally, the output classification results indicate the leakage conditions, which are determined by the voting results of all decision trees. With a large number of offline ESC fault tests, the trained RF classifier can produce the correct result rapidly. Zhao et al. [146] combined the 3σ multilevel screening strategy (3σ-MSS) and a machine learning algorithm to establish a battery fault diagnosis model, in which the 3σ-MSS is used to build the criteria of fault-free cell terminal voltages, and a neural network is applied to fit the cell fault distribution in a battery pack. Kim et al. [147], [148] proposed a distance-based outlier detection approach with a Z-score standardized pre-processing method for battery fault diagnosis. The estimated capacity and resistance parameters were subjected to cluster analysis to distinguish healthy cells, shorted cells, and aged faulty cells.
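The correlation-coefficient idea described above for short-circuit detection can be sketched in a few lines. The sketch computes the Pearson correlation coefficient between two cell voltages over a moving window and flags the windows where the trends stop matching. It is a minimal illustration rather than the published scheme: the window length, the threshold, the handling of flat windows, and all function names are assumptions chosen for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if either is flat."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant signal
    return sxy / math.sqrt(sxx * syy)


def detect_voltage_anomaly(v_a, v_b, window=3, threshold=0.5):
    """Return the sample indices at which the windowed correlation drops below threshold."""
    flagged = []
    for end in range(window, len(v_a) + 1):
        r = pearson(v_a[end - window:end], v_b[end - window:end])
        if r is not None and r < threshold:
            flagged.append(end - 1)
    return flagged


# Hypothetical charging data: cell B is offset but follows the same trend,
# cell C starts to drop at index 3 while the others keep rising.
cell_a = [3.5, 3.6, 3.7, 3.8, 3.9, 4.0]
cell_b = [v + 0.02 for v in cell_a]
cell_c = [3.5, 3.6, 3.7, 3.6, 3.5, 3.4]

healthy_flags = detect_voltage_anomaly(cell_a, cell_b)
faulty_flags = detect_voltage_anomaly(cell_a, cell_c)
```

Because the coefficient depends only on the trend of the two signals, the constant offset of cell B does not raise a flag, which mirrors the robustness to cell inconsistency noted in the text. A practical implementation would also need a strategy for windows in which the voltages barely change.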
Knowledge-based fault diagnosis relies on the understanding of battery mechanisms and on long-term accumulated knowledge and experience. Xiong et al. [149] proposed a rule-based detection method for over-discharged Li-ion batteries. Based on the increase of temperature and the decrease of voltage during battery over-discharge, temperature and voltage rules are established respectively, and failure detection and early warning are given directly by a Boolean expression. However, appropriate fixed or time-varying thresholds in the rules are not easy to determine in real applications. Muddappa et al. [150] designed an electrochemical model-based observer to generate voltage, temperature, and SOC residuals. These residuals, along with the temperature change rate, voltage level, and SOC level, are then incorporated into fuzzy rules to detect various fault types, including overcharge, over-discharge, and battery aging. Huber et al. [151] proposed a method for classifying battery separator defects using optical inspection, combining techniques such as expert knowledge, machine learning, and machine vision in the diagnosis process. Such integration of multiple diagnostic techniques generally achieves high precision and robustness, but at the cost of high computational complexity.

Table fragment: comparison of battery fault diagnosis methods. The header row and several cells are missing; each entry lists the surviving cells in their original order.
(method cell missing): Insensitive to the noise; Affected by model uncertainty; High computational cost.
(method cell missing): Insensitive to the noise; Complexity of multiple models; High computational cost.
Luenberger observer [63]: Thermal faults; Thermal model parameters corresponding to different thermal faults; Quantitative assessment of faults; Sensitive to the noise; Poor robustness to measurement noise.
Nonlinear observer based on Lyapunov analysis [130]: Thermal faults; Thermal model parameters; FDI for three thermal faults; Affected by model uncertainty; Needs high model accuracy.
Unassigned cells: PDE-based observer; A large amount of training data is required; A large amount of fault data is not easily available.
Rule-based method [149]: Overcharge; Increase of temperature and the decrease of voltage; Easy to implement and understand; Not easy to determine the appropriate parameters in the rules; Poor robustness to unknown interferences.
Fuzzy logic [150]: Overcharge, overdischarge and aged battery; Parameters derived from voltage, temperature and SOC; Easy to deal with uncertainty in knowledge; Poor self-learning capability; High computational cost.
Fusion method integrating expert knowledge, machine learning, and machine vision [151]: Battery separator defects; Non-quality related optical effects; High precision; Good robustness; High system complexity; High computational cost.

Sensor fault mechanisms.
In general, the reliability of sensors is affected by manufacturing defects, harsh environments, and working conditions. The sensors in the LIBS discussed in this paper mainly consist of voltage, current, and temperature sensors. Conventional current and voltage sensors used in EV battery systems are Hall effect sensors. Additionally, some advanced technologies, such as constant current source circuit acquisition and isolation amplifier acquisition, are also applied in single-cell voltage acquisition. Thermocouples and resistance temperature detectors are commonly used temperature sensors. For Hall effect sensors, temperature variations can change the magnetic properties of the ferrite core, and flaws can develop in the core, such as corrosion, cracks, and core breakage, all of which could result in bias [152]. Due to mechanical shocks or other causes that can change the value of the Hall voltage, changes in the orientation of the induced magnetic field would lead to a scaling error. For thermocouples, the failure of the thermocouple junction, such as corrosion, degradation, and changes in material composition at long-term high temperatures, can lead to bias, scaling, intermittent, and/or complete failure [153]. For resistance temperature detectors, prolonged exposure to high temperatures, as well as vibration and shock, can change their characteristics, further leading to signal drift [152], [154].

Sensor fault features.
Since sensor faults affect the measurement signals directly, the fault features of voltage, current, and temperature sensors are often considered as some form of bias, drift, scaling, or complete failure signal in sensor measurements [19], [155]. Besides, sensor faults can also be classified into additive and multiplicative faults [37], [45]. In Ref. [155], typical ranges for common sensor faults from the literature are summarized, which provides realistic magnitudes for the sensor faults. Voltage measurement is one of the most critical measurements in a battery system due to its high sensitivity to common electrical faults, including short circuits, overcharge, and overdischarge [156].
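The four fault forms named above are easy to express as simple signal transformations, which is how they are commonly injected into clean data when testing a diagnosis algorithm. The sketch below is illustrative only: the function name, the argument names, and the choice to model complete failure as a reading stuck at a fixed value are assumptions, and the magnitudes are arbitrary rather than taken from Ref. [155]. Bias and drift are additive faults, while scaling is a multiplicative fault.

```python
def apply_sensor_fault(true_values, kind, start, magnitude, dt=1.0):
    """Return a copy of true_values with a sensor fault active from index `start`.

    kind      : "bias", "drift", "scaling", or "failure"
    magnitude : offset (bias), slope per second (drift), gain (scaling),
                or stuck-at reading (failure)
    dt        : sampling period in seconds, used only for drift
    """
    faulty = []
    for k, x in enumerate(true_values):
        if k < start:
            faulty.append(x)                                 # sensor still healthy
        elif kind == "bias":
            faulty.append(x + magnitude)                     # constant additive offset
        elif kind == "drift":
            faulty.append(x + magnitude * (k - start) * dt)  # offset grows with time
        elif kind == "scaling":
            faulty.append(x * magnitude)                     # multiplicative gain error
        elif kind == "failure":
            faulty.append(magnitude)                         # reading stuck at a fixed value
        else:
            raise ValueError("unknown fault kind: " + kind)
    return faulty


# Hypothetical 3.7 V cell voltage with a 50 mV bias appearing at sample 2.
clean = [3.7, 3.7, 3.7, 3.7]
biased = apply_sensor_fault(clean, "bias", start=2, magnitude=0.05)
```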
3.2.3. Sensor fault diagnosis methods. Table 8 summarizes the sensor fault diagnosis methods used in the battery system, which can be divided into three types: sensor topology-based method, model-based method, and fusion method.
The sensor topology-based method mainly relies on the sensor configuration and redundancy of sensor functionalities, and is easy to implement. Xia et al. [157], [158] proposed a fault-tolerant voltage measurement method for series-connected battery packs by measuring the total voltage of multiple cells instead of measuring the voltage of individual cells. Then, a matrix interpretation of the sensor topology was developed. For this sensor topology, sensor or cell faults can be isolated by locating abnormal signals without additional hardware expense. Kang et al. [159] presented a multi-fault diagnostic scheme that combines a voltage measurement topology and the correlation coefficient method, in which the correlation coefficient is used to detect fault features. In this sensor topology, each cell and connection resistor is associated with two sensors, which enables isolation of voltage sensor faults, short circuit faults, and connection faults.
The model-based method generates residuals by using sensor measurements and a priori information or constraint relationships expressed by the model. After analyzing and evaluating the residuals, the magnitude, type, and location of faults can be determined. Typical battery models for sensor fault diagnosis include the EM [160]-[162], ECM [163]-[167], lumped-parameter thermal models [168], [169], and two-state lumped parameter thermal models [52]. Lombardi et al. [13] tested the electrical relationship between the current sensor and voltage sensor measurements based on Kirchhoff's law to generate residuals, and achieved the FDI of voltage and current sensors according to the battery pack structure and the residual set associated with each sensor. Liu et al. [169] proposed a systematic scheme that applies structural analysis theory to detect and isolate voltage, current, and temperature sensor faults. Specifically, the structural over-determined parts of the system model are found, and subsequently, fault detectability and isolability analysis is performed. Then, diagnostic tests are developed by selecting the minimum over-determined set. Finally, the residuals are generated by checking the analytical redundancy relationship in each test. Structural analysis theory [20], [52], [169] can effectively reduce the workload in selecting residual generators. However, this type of analysis is easily affected by noise and model uncertainty. Due to inaccurate initial values, unknown interference, and noise, residuals generated directly through the constraint relationships of a model may carry errors. Observers or filters can reduce the impact of these factors, and sensor fault diagnosis based on various observers follows a similar process, as shown in Fig. 8. These methods first estimate the battery states based on the battery model and the current, voltage, and temperature sensor measurements. Then, the residuals containing the sensor fault information are generated by comparing the measured outputs with the estimated outputs. Finally, the FDI of sensor faults is achieved through residual evaluation, and alarms and fault flags are set. Xu et al. [14] treated the current sensor fault as a bias signal on the system input, and used the proportional integral observer (PIO) to implement fault detection and estimation. Although this method is accurate and easy to implement, improper setting of the PIO parameters may cause instability of the diagnostic system. Marcicki et al. [168] provided a scheme based on a modified nonlinear parity equation method to achieve voltage, current, and temperature sensor FDI. In this method, a subset of inputs and outputs is hypothesized to be non-faulty. Under this assumption, the residuals are generated by the forward and inverse models of the system, but the minimum detectable fault magnitude is limited by the observer error. By adding a bounded deviation to the sensor measurement, Dey et al. [18] achieved the fault detection, isolation, and estimation of voltage, current, and temperature sensors by using a sliding-mode observer method. Liu et al. [17] presented a model-based diagnostic scheme using EKF to estimate the output voltage for detecting current or voltage sensor faults, which is robust to inaccurate initial values and noise. However, an accurate process noise covariance matrix for the EKF is not easy to determine in practice. He et al. [170], [171] achieved the FDI of the current and voltage sensors in a series battery pack based on an adaptive extended Kalman filter (AEKF), which shows better noise robustness because the AEKF can adjust the process and measurement noise covariance matrices adaptively.
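The residual evaluation step at the end of this process is often a statistical test rather than a plain threshold on each sample. One common choice is a two-sided cumulative sum (CUSUM) test, sketched below under simple assumptions: the residual has a known nominal mean when no fault is present, and the allowance `drift` and decision limit `threshold` are tuning parameters picked by the designer. The function and parameter names are hypothetical, and the residuals are assumed to come from whichever observer or filter is in use.

```python
def cusum_alarm(residuals, drift=0.01, threshold=0.1, nominal_mean=0.0):
    """Return the first index at which a two-sided CUSUM test raises an alarm, else None.

    drift     : allowance subtracted at each step so that noise does not accumulate
    threshold : decision limit on the accumulated sums
    """
    g_pos = 0.0  # accumulates positive deviations of the residual
    g_neg = 0.0  # accumulates negative deviations of the residual
    for k, r in enumerate(residuals):
        deviation = r - nominal_mean
        g_pos = max(0.0, g_pos + deviation - drift)
        g_neg = max(0.0, g_neg - deviation - drift)
        if g_pos > threshold or g_neg > threshold:
            return k
    return None


# Hypothetical voltage residual (V): zero at first, then a 50 mV sensor bias.
residual = [0.0] * 5 + [0.05] * 5
alarm_index = cusum_alarm(residual)
```

With these settings the bias that starts at sample 5 is flagged at sample 7: a larger `threshold` lowers the false alarm rate at the price of a longer detection delay, which is the trade-off any residual evaluation scheme has to balance.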

Strategies combining multiple model-based methods can compensate for the inherent flaws of a single method. For example, Liu et al. [52] constructed two tests based on the structural analysis theory. Then, the residuals were generated based on EKF in each diagnostic test. The generated residuals were further evaluated by the statistical cumulative sum test to detect the sensor faults. This fusion scheme reduces the effort required to find the appropriate residual generator and is robust to noise and inaccurate initial values. However, it also increases the system complexity and computational cost.

Table 8 fragment: sensor fault diagnosis methods. The header row and some method cells are missing; each entry lists the surviving cells in their original order.
Kirchhoff's law [13]: FDI of voltage, current and temperature sensor; Simple and low computational cost; Subject to noise and model uncertainty; Not suitable for fault estimation.
(method cell missing): Convenient detectability and isolability analysis; Less workload of designing a residual generator; Highly dependent on redundant information from the system; Poor robustness to noise and model uncertainty.
PIO [14]: Fault detection and estimation of current sensor; Accurate; Easy to implement; Improper setting of PIO parameters may cause instability; Needs high model accuracy and proper parameters of PIO.
Nonlinear parity equation method [168]: FDI of voltage, current and temperature sensor; Efficient, easy to detect the large fault; Minimum detectable fault magnitude is limited by the observer error; Needs high model accuracy; Low sensitivity for fault detection.
Sliding-mode observer [18]: FDI and fault estimation of voltage, current and temperature sensor; Good noise robustness; Sensitive to model uncertainty; Needs high model accuracy.
EKF [17]: Fault detection of voltage or current sensor; Insensitive to noise and inaccurate initial values; The accurate process noise covariance is not easy to determine; Affected by the process noise and model accuracy.
(method cell missing): Insensitive to noise and inaccurate initial values; Update the noise covariance matrix; High computational cost; Needs high model accuracy.
Fusion method integrating EKF and structural analysis theory [52]: FDI of voltage, current and temperature sensor; Accurate, low false alarm rate and missed detection rate; High system complexity; High computational cost.

Research on actuator fault diagnosis in the battery system mainly focuses on the battery connection fault and the cooling system fault [20], [26]. During EV operations, vibrations may contribute to loose or poor electrical connections between batteries [26]. Once the operating and driving voltage becomes too low, the relay and drive motor cannot operate as specified. Fan failures are often caused by electric wire faults or blade damage, which, along with motor faults, would severely affect the normal operation of the cooling system.
3.3.2. Actuator fault features. Similar to battery fault features, actuator fault features can also be derived from abnormal system behavior and equivalent fault parameters. If a battery connection fault occurs, the resistance will increase dramatically and generate significant heat. Moreover, a single high inter-cell contact resistance can cause uneven current flow, resulting in a severe imbalance in the battery pack [172]. Therefore, some features, such as the increased contact resistance, temperature rise, and voltage inconsistency, can be used to characterize the connection fault. In the system model, the battery connection fault can be considered as a gain fault due to the sharp increase in internal resistance caused by a poor connection. The cooling system fault is often considered as an additive fault because it causes a deviation of the effective heat transfer coefficient in the thermal model.

Actuator fault diagnosis methods. Actuators with different functionalities have various fault mechanisms and features, so there is no universal diagnostic method for actuator fault diagnosis. Two typical diagnostic methods are the model-based method and the signal processing method.
The model-based method can be directly applied to the fault diagnosis of the cooling system. The cooling system, such as the cooling fan or drive motor, is used in the battery system to increase the rate of cooling. The effective heat transfer coefficient, a parameter in the thermal model, varies with the type of convection. Therefore, the cooling system fault can be regarded as a thermal model parameter deviation, which can be detected by typical model-based methods. Liu et al. [20], [169] used the structural analysis theory to implement the cooling system FDI based on the lumped thermal model. Marcicki et al. [168] detected the cooling system fault caused by the failure of the fan motor based on the nonlinear parity equation method. It is also important to note that the entropy method [173] is an effective tool for describing the degree of disorder of a time series; it has a wide range of applications and is suitable for handling actuator faults with abnormal fluctuations, such as battery connection faults. Taheri et al. [174] studied the energy loss caused by the contact resistance of Li-ion battery assemblies for the first time. Zheng et al. [33] proposed an entropy-based connection fault diagnosis scheme, as shown in Fig. 9. Their preliminary analysis identified two reasons for battery pack power fade: the internal resistance increase fault and the contact resistance increase fault. In order to account for both the internal and the contact resistance in the resistance calculation, the voltage is usually measured between the ends of the cell and the connecting wire. Then, they established a simplified battery ECM and identified the model parameter containing the contact resistance by the total least squares algorithm. Considering that poor contact conditions between the batteries can make the contact resistance highly unstable, they captured the unstable characteristics of the contact resistance by calculating the Shannon entropy of the cell resistance and thus distinguished between cell faults and connection faults. Yao et al. [26] identified the cell connection state by calculating the entropy value of the cell voltage. After a comparative analysis of sample entropy, local Shannon entropy, and ensemble Shannon entropy, it was found that the ensemble Shannon entropy can predict the accurate time and location of a battery connection failure in real time. Sun et al. [175] used Shannon entropy to process the cell voltage measurements after wavelet transformation, and detected the battery connection fault accurately. Compared with the Shannon entropy iteration method used in Ref. [26], this method simplifies the calculation process and is easier to implement due to the relatively reasonable interval parameters.
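To make the entropy idea concrete, the following sketch computes the Shannon entropy of a window of samples by sorting them into equal-width intervals and applying H = -sum(p_i * log2(p_i)) to the interval frequencies. It is a generic illustration, not the implementation of Refs. [26], [33], or [175]: the interval range, the number of intervals, the function name, and the resistance values are assumptions made for the example.

```python
import math


def shannon_entropy(samples, lo, hi, bins=4):
    """Shannon entropy (in bits) of samples histogrammed into `bins` equal intervals on [lo, hi]."""
    if not samples:
        raise ValueError("samples must not be empty")
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        idx = int((s - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # clamp out-of-range samples to the edge intervals
        counts[idx] += 1
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


# Hypothetical identified resistance (milliohm) over one window.
stable_contact = [1.2, 1.3, 1.25, 1.4]   # all values fall into one interval
loose_contact = [0.5, 1.5, 2.5, 3.5]     # values jump across all four intervals

h_stable = shannon_entropy(stable_contact, lo=0.0, hi=4.0)
h_loose = shannon_entropy(loose_contact, lo=0.0, hi=4.0)
```

A steady signal concentrates in one interval and yields an entropy of zero, whereas an unstable contact spreads the samples across intervals and raises the entropy, so a threshold on the windowed entropy separates the two cases. The result depends on the interval parameters, which is why their selection matters in the methods discussed above.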

Issues and challenges
Issues and challenges in LIBS fault diagnosis can be divided into two categories: those related to the diagnostic objects and those related to diagnosis or control.
The issues associated with the diagnostic objects are summarized as follows: 1) Many battery fault mechanisms have not been fully understood. For the wide variety of Li-ion batteries, there is no unified understanding of the battery fault mechanisms in the existing literature. 2) Standardized substitute test approaches for battery faults have not been developed. Some destructive methods have poor controllability and repeatability, and often trigger severe faults instantaneously, which fails to simulate the incubation phase of a fault. 3) For some faults, there is no well-established mathematical model that accurately describes the fault behavior, such as a model that simulates the growth process of lithium dendrites. 4) The relations between external behaviors and internal mechanisms are not clear. Different conditions could cause the same fault. However, most of the existing research focuses on a single fault mechanism without considering the coupling between different fault mechanisms.
There are also some diagnostic or control-related challenges: 1) The commonly available battery data are voltage, current, and temperature, which do not contain any information on the internal electrochemical dynamics of the battery. Extracting appropriate features to characterize the internal state of the battery remains a challenge. 2) The internal battery state is difficult to monitor directly due to the uncertainties in modeling and measurement. 3) Fault data in LIBS are difficult to obtain, which limits the application of data-driven algorithms. 4) The threshold is closely related to the false alarm rate, the missed detection rate, and the time delay of fault detection. A fixed threshold or double threshold does not meet the requirements of complex real-world scenarios, and few studies have been conducted to develop adaptive thresholds. 5) It is not easy to detect some minor battery faults at an early stage. By the time such a fault is detected, however, it could already have caused serious harm to the battery system. It is also difficult for the battery system itself to correct these unrepairable faults. 6) Most studies on sensor fault diagnosis and battery fault diagnosis are based on the assumption that other components are fault-free. Isolating a battery fault from a sensor fault is still a challenging issue. 7) One of the challenges in the fault diagnosis of a parallel-connected battery pack is the lack of observability and controllability, because only one voltage sensor and one current sensor are used. Therefore, for a battery pack that has many parallel-connected battery cells, only pack-level faults can be detected. The parallel configuration and the traditional sensor placement make it difficult to accurately locate a particular faulty cell in the parallel-connected strings. A better sensor placement design and advanced diagnostic algorithms are needed to solve this problem. 8) The reviewed methods or algorithms provide a wide spectrum of available solutions for the fault diagnosis of battery systems. However, many model-based and data-driven algorithms mentioned in the paper cannot currently be used in practice due to the strict requirements of practical applications. The effectiveness of many model-based and data-driven methods has only been tested through simulation studies. Experimental studies are needed to verify their effectiveness. Moreover, the performance of the model-based and data-driven diagnostic algorithms mentioned in this paper is affected by many factors in practical applications, including model accuracy, interference and measurement noise, algorithm robustness, quantity and quality of data, battery inconsistency, sensor topology, and battery pack configuration. When the algorithms are applied to a certain scenario, they should be customized and tuned properly.
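The link between a threshold and the three detection metrics named in item 4) can be made concrete with a small calculation. The following Python sketch is illustrative only: the function name `threshold_metrics`, the synthetic residual sequence, and the assumption that the fault onset index is known (as it would be in a controlled test) are introduced here and are not taken from any reviewed method.

```python
from typing import Optional, Sequence, Tuple


def threshold_metrics(
    residuals: Sequence[float], fault_start: int, threshold: float
) -> Tuple[float, float, Optional[int]]:
    """Return (false_alarm_rate, missed_detection_rate, detection_delay).

    residuals   : residual per sample (hypothetical input)
    fault_start : index of the first faulty sample (known only in a test)
    threshold   : fixed alarm threshold applied to every sample
    """
    alarms = [abs(r) > threshold for r in residuals]
    healthy = alarms[:fault_start]
    faulty = alarms[fault_start:]

    # Share of healthy samples that raise an alarm.
    far = sum(healthy) / len(healthy) if healthy else 0.0
    # Share of faulty samples that raise no alarm.
    mdr = faulty.count(False) / len(faulty) if faulty else 0.0
    # Number of samples between fault onset and the first alarm after it.
    delay = faulty.index(True) if True in faulty else None
    return far, mdr, delay


# Synthetic residual: noise-like values, then a slowly growing fault from index 6.
residual = [0.1, 0.3, 0.2, 0.6, 0.1, 0.2, 0.3, 0.5, 0.8, 1.2, 1.5, 2.0]
low = threshold_metrics(residual, fault_start=6, threshold=0.4)
high = threshold_metrics(residual, fault_start=6, threshold=1.0)
```

On this synthetic sequence, the lower threshold detects the fault one sample after onset but raises a false alarm on one healthy sample, whereas the higher threshold removes the false alarm at the cost of a three-sample delay and a higher missed detection rate. This is the trade-off that a fixed threshold cannot resolve.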

Future trends of battery system fault diagnostics
Based on the discussion in previous sections, it is clear that the fault diagnosis of the battery system is multidisciplinary in nature. To develop a robust, reliable, and effective battery fault diagnosis system, some important tasks need to be completed at the different stages of fault diagnosis, namely preparation, analysis, and handling, which are summarized in Fig. 10.

Preparation stage
In the preparation stage, detailed mechanism research, advanced data acquisition, and processing techniques are essential for battery fault diagnosis. Multi-scale mechanism studies from the material, cell, and pack levels help gain an in-depth understanding of the battery system fault. The damage caused by faults could be contained by the fault diagnosis and safety protection at all levels. With the increasing demand for fast charging of EVs, the battery fault mechanism under fast charging conditions should be further investigated. Various side reactions promoted by high-rate charging could contribute to accelerated degradation and TR. Moreover, it is also important to develop controllable and repeatable fault substitute tests, as well as high-fidelity models for simulating the real faults, especially for the ISCs that present the greatest potential threat to battery system safety.
In order to improve the diagnostic performance over the entire lifespan of the battery, the effects of battery aging on diagnostic performance should also be considered. It is important to update and correct the model parameters of the battery, such as capacity and internal resistance, by using advanced techniques including model-based methods, machine learning methods, and fusion methods. Note that the battery ages slowly over time; therefore, the model parameters need to be updated on a long time scale or offline.
Data acquisition using intelligent sensors and integrated chips is also expected, owing to their high accuracy and versatile functionalities. As sensing technologies evolve, it is also very attractive to use advanced sensors to measure the physical and chemical characteristics within a battery directly. For example, based on built-in piezoelectric sensors [176], electrochemical acoustic time-of-flight analysis can be performed to capture the implicit correlation between waveform signal parameters and battery SOC and SOH. Omega load cell sensors [177] have been applied to measure the cell expansion caused by swelling of the electrode active material during charging. In addition to regular measurements including voltage, current, and temperature, fiber optic sensors [178] are capable of monitoring additional cell quantities (e.g., strain), and they are also robust to electromagnetic interference. In future research, fast and accurate sensing technologies will continue to be one of the hot topics for a safer battery system.
Feature extraction at the battery pack level is an important task. A number of studies on fault diagnostics of LIBS have been conducted by using the voltage response of battery cells or series-connected battery packs [125], [128], [138], [157]. However, there are few studies on the voltage response of parallel-connected battery packs [141]. It is worth noting that due to the self-balancing mechanism of the parallel structure, a battery fault can also cause transient voltage fluctuations of adjacent parallel cells within the same module.
In the existing literature, the widely used features of thermal-related faults are the temperature rise and fluctuation caused by abnormal heat generation. In fact, thermal-related faults can also affect adjacent cells and lead to uneven temperature distribution in the battery pack. Therefore, useful fault features can be derived by exploring the temperature distribution in the battery pack. Besides, suitable fault features can also be developed by feature transformation or fusion of multiple electrical and thermal features.
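As a minimal illustration of distribution-based thermal features, the following Python sketch computes the temperature spread across a pack and the deviation of each cell from the pack median at one time instant. The function names, the choice of the median as the reference, and the 3 °C limit are assumptions made for this example and are not taken from the reviewed studies.

```python
from statistics import median
from typing import Dict, List, Sequence


def temperature_distribution_features(temps_c: Sequence[float]) -> Dict[str, object]:
    """Spatial features from one snapshot of cell temperatures (deg C)."""
    ref = median(temps_c)                      # robust pack-level reference
    deviations = [t - ref for t in temps_c]    # per-cell offset from the reference
    hottest = max(range(len(temps_c)), key=lambda i: temps_c[i])
    return {
        "spread": max(temps_c) - min(temps_c),  # overall non-uniformity
        "max_deviation": max(deviations),       # largest offset above the reference
        "hottest_cell": hottest,                # index of the hottest cell
        "deviations": deviations,
    }


def suspect_cells(temps_c: Sequence[float], limit_c: float = 3.0) -> List[int]:
    """Indices of cells whose temperature exceeds the pack median by more than limit_c."""
    ref = median(temps_c)
    return [i for i, t in enumerate(temps_c) if t - ref > limit_c]


# Hypothetical snapshot: cell 2 is heating abnormally and warms its neighbour (cell 3).
snapshot = [25.0, 25.5, 31.0, 26.5, 25.2]
features = temperature_distribution_features(snapshot)
suspects = suspect_cells(snapshot)
```

In this snapshot the neighbouring cell is slightly warmer than the rest of the pack but stays below the limit, so only the heating cell is flagged. A practical implementation would also need to account for sensor placement and for the normal temperature gradient imposed by the cooling system.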

Analysis stage
In the analysis stage, accurate analysis of the battery system state, including condition monitoring, fault diagnosis, and fault prognosis, plays an important role in the safety of the battery system. The thresholds play a significant role in the trade-off between sensitivity and robustness for model-based fault diagnosis. Generally, the threshold is affected by a variety of factors, including modeling errors, random disturbances, and system inputs as well as outputs. The accuracy of condition monitoring and fault diagnosis can be improved by developing an adaptive threshold that takes into account battery aging and usage patterns. Model-based state estimation and parameter estimation methods are still commonly used in the fault diagnosis of LIBS. In comparison with the knowledge-based and data-driven methods, the model-based method is more suitable for fault isolation and fault size estimation because it makes full use of the battery system dynamics. For a safer battery system, fault isolation must be included, which identifies and locates a specific fault among battery, sensor, and actuator faults. Besides, due to its significant impact on fault severity analysis and subsequent countermeasures, fault estimation should also be considered an important research topic.
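One simple way to picture an adaptive threshold is a residual limit that widens with the load current (where modeling errors tend to be larger) and with battery aging. The following Python sketch assumes this linear form purely for illustration; the function names and all coefficients are hypothetical and would have to be calibrated for a real battery and model.

```python
from typing import List, Sequence


def adaptive_threshold(
    current_a: float,
    soh: float,
    base_v: float = 0.010,      # hypothetical noise floor of the voltage residual (V)
    per_amp_v: float = 0.002,   # hypothetical allowance for model error per ampere (V/A)
    aging_v: float = 0.050,     # hypothetical allowance per unit of lost SOH (V)
) -> float:
    """Voltage-residual threshold (V) that widens with load and with aging."""
    return base_v + per_amp_v * abs(current_a) + aging_v * (1.0 - soh)


def detect(
    residuals_v: Sequence[float], currents_a: Sequence[float], soh: float
) -> List[bool]:
    """Flag each sample whose residual magnitude exceeds its own threshold."""
    return [
        abs(r) > adaptive_threshold(i, soh)
        for r, i in zip(residuals_v, currents_a)
    ]


# Hypothetical voltage residuals (V) and load currents (A).
residuals = [0.005, 0.030, 0.015, 0.080]
currents = [0.0, 20.0, 0.0, 20.0]
flags_new = detect(residuals, currents, soh=1.0)
flags_aged = detect(residuals, currents, soh=0.8)
```

With these assumed coefficients, a 30 mV residual under a 20 A load is tolerated, whereas a 15 mV residual at rest is flagged for a new cell but not for an aged cell, whose larger expected modeling error widens the threshold.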
Note that imbalances in battery capacity, SOC, and internal resistance are often ignored in many existing studies. Therefore, future studies should also consider this important factor, especially for battery pack applications. Methods that are robust to battery inconsistency, such as the correlation coefficient method, are expected to be used directly for battery pack fault diagnosis or as a pre-processing technique for machine learning diagnostic methods. With the advent of the era of big data, data-driven methods are expected to promote the rapid development of battery system fault diagnosis. However, a single fault diagnosis algorithm has inherent limitations, and it is often difficult to achieve the desired effect. Therefore, a research trend in fault diagnostics is to fuse multiple fault features and diagnostic algorithms, further improving the overall performance of the diagnostic system.
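The robustness of the correlation coefficient to cell inconsistency follows from the fact that the Pearson coefficient is invariant to a constant offset between two signals. The Python sketch below illustrates this with a moving-window correlation between two cell voltages. The window length, the synthetic voltage data, and the treatment of a flat window as healthy are assumptions of this example, not details of the reviewed methods.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 1.0  # flat window: nothing to compare, treated as healthy (assumption)
    return sxy / sqrt(sxx * syy)


def moving_correlation(
    v1: Sequence[float], v2: Sequence[float], window: int
) -> List[float]:
    """Correlation of two cell voltages over a sliding window of fixed length."""
    return [
        pearson(v1[k - window:k], v2[k - window:k])
        for k in range(window, len(v1) + 1)
    ]


# Hypothetical series-connected cells sharing the same current.
cell_a = [3.70, 3.68, 3.71, 3.67, 3.70, 3.66, 3.69, 3.65]
cell_b = [v - 0.03 for v in cell_a]                       # healthy, constant offset
cell_c = cell_b[:4] + [3.62, 3.66, 3.60, 3.64]            # abnormal in the last samples

r_healthy = moving_correlation(cell_a, cell_b, window=4)
r_faulty = moving_correlation(cell_a, cell_c, window=4)
```

The constant 30 mV offset of the healthy cell leaves the coefficient at essentially one in every window, while the abnormal voltage behavior of the third cell drives the coefficient of the last window below zero.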
For some faults with a slow evolution process, such as the spontaneous ISC, early fault diagnostics and prognostics will play an increasingly vital role in ensuring the safety of the battery system. Based on the physics-based model and big data, combining knowledge and data will very likely become an inevitable trend for the next generation of intelligent battery diagnostics.

Fault handling stage
In the fault handling stage, rapid and efficient actions on the identified battery system faults, such as fault-tolerant control and necessary maintenance, are critical for maintaining the safe operation of the LIBS. Currently, the controller and fault diagnosis subsystems are usually designed separately. It is a potentially promising research topic to develop a fault-tolerant controller that integrates both. Although prognostics and health management (PHM) for rotating machinery systems has been developed and discussed in Ref. [179], research on battery system maintenance is still in its infancy. Given the complexity and the poor maintainability of the battery system, developing a PHM system for it is also an important topic.
Currently, the fault diagnosis development for the battery system is shifting from offline to online, and from local single-machine control to network-based remote control. The integrated management of monitoring, diagnosis, prognosis, and maintenance over the entire lifespan of the battery would be a future trend in the development of fault diagnostics for LIBS.

Conclusions
This paper provides a comprehensive survey on fault mechanisms, fault features, and fault diagnosis of various faults in LIBS, including battery internal faults, sensor faults, and actuator faults. The goal is to provide a comprehensive understanding of the latest technologies and stimulate innovative ideas for LIBS fault diagnosis.
None of the reviewed diagnostic methods is a one-size-fits-all solution for the different faults in the battery system. Battery faults have different fault modes, complex fault mechanisms, and coupled relationships between various faults. These faults typically result in abnormal changes in estimated battery states and model parameters such as capacity, internal resistance, SOC, and temperature. Therefore, model-based state estimation and parameter estimation have become the most common methods for battery fault diagnosis. Developing high-fidelity battery models can improve the performance of model-based methods. For example, since battery aging has an important impact on the diagnostic performance, it is important to establish an electro-thermal-aging coupling model and update the model parameters online. There are fewer machine learning-based methods in battery diagnostics because a large amount of fault data in LIBS is not easily available. With the advent of the era of big data, data-driven methods are expected to play an increasingly important role in LIBS fault diagnosis. However, a single fault diagnosis method has inherent limitations, and it is a promising trend to combine multiple fault features and multiple diagnostic methods to further improve the accuracy and robustness of LIBS diagnostics.
Sensor faults and actuator faults are often treated as unknown input signals and model parameter deviations, respectively. Since such faults do not involve the electrochemical information of the battery, a simple ECM is sufficient for the diagnostic requirements in the model-based method. More research work on model-based methods should focus on improving the detection sensitivity of early faults as well as the robustness to model uncertainties, unknown disturbances, and noise. For data-driven methods, the entropy method is particularly suitable for detecting battery connection faults by capturing the degree of disorder of signals with abnormal fluctuations. Model-based methods capture detailed battery system dynamics and are therefore more suitable for fault isolation and fault size estimation. These methods can be used for fault-tolerant control and subsequent countermeasures.
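The idea of measuring the degree of disorder of a signal can be sketched with a histogram-based Shannon entropy over a window of voltage samples. This is only one possible entropy measure and is used here as an assumption; the bin count, the voltage range, and the synthetic data are likewise hypothetical.

```python
from math import log2
from typing import Sequence


def shannon_entropy(
    samples: Sequence[float], lo: float, hi: float, bins: int = 8
) -> float:
    """Shannon entropy (bits) of samples binned uniformly over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in samples:
        k = int((s - lo) / width)
        k = min(max(k, 0), bins - 1)   # clamp out-of-range samples to the edge bins
        counts[k] += 1
    n = len(samples)
    # Empty bins contribute nothing to the entropy.
    return sum(-(c / n) * log2(c / n) for c in counts if c)


# Hypothetical voltage windows (V): a well-connected cell and a loose connection.
steady = [3.71, 3.72, 3.71, 3.73, 3.72, 3.71, 3.72, 3.73]
fluctuating = [3.42, 3.73, 3.58, 3.77, 3.47, 3.68, 3.52, 3.63]

h_steady = shannon_entropy(steady, lo=3.40, hi=3.80)
h_fluctuating = shannon_entropy(fluctuating, lo=3.40, hi=3.80)
```

The steady window falls into a single bin and has zero entropy, while the fluctuating window spreads over all eight bins and reaches the maximum of three bits. A diagnostic scheme would compare such a value with a threshold or with the entropy of neighbouring cells.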
For fault diagnosis of a battery pack, the differences in voltage, temperature, estimated capacity, SOC, and internal resistance between cells can be taken as effective fault features. Battery fault detection and even short-circuit current estimation can be performed based on the MDM of the battery pack with state estimation and parameter estimation. However, these model-based methods are affected by cell inconsistencies in the battery pack. Therefore, more data-driven methods that are robust to battery inconsistencies, such as the correlation coefficient method, should be developed for LIBS fault diagnosis. Battery fault diagnosis in the parallel-connected battery strings remains a challenge, and a better sensor placement design and advanced diagnostic algorithms are needed to solve this problem.
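To illustrate how a cell-to-pack difference can yield a short-circuit current estimate, the Python sketch below fits a straight line to the SOC difference between one cell and the pack mean. By charge conservation, an extra leakage current I depletes a cell of capacity C at a rate of I/C, so the slope of the SOC difference gives the current. The sketch assumes that SOC estimates are already available and that the cell capacity is known; the function names and data are hypothetical, and this is a simplified calculation rather than the estimation scheme of any reviewed paper.

```python
from typing import Sequence


def slope(t: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y versus t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    num = sum((a - mt) * (b - my) for a, b in zip(t, y))
    den = sum((a - mt) ** 2 for a in t)
    return num / den


def estimate_isc_current(
    time_s: Sequence[float],
    soc_cell: Sequence[float],
    soc_mean: Sequence[float],
    capacity_ah: float,
) -> float:
    """Leakage current (A) implied by the drift of the cell SOC from the pack mean."""
    diff = [c - m for c, m in zip(soc_cell, soc_mean)]
    dsoc_dt = slope(time_s, diff)             # SOC change per second
    return -capacity_ah * 3600.0 * dsoc_dt    # Ah -> As, sign: depletion is positive


# Synthetic data: 2 Ah cells, one cell loses an extra 0.1 A through a short circuit.
time_s = [0.0, 600.0, 1200.0, 1800.0, 2400.0, 3000.0, 3600.0]
soc_mean = [0.8 - 1e-5 * t for t in time_s]
soc_cell = [m - 0.1 * t / (2.0 * 3600.0) for m, t in zip(soc_mean, time_s)]

i_isc = estimate_isc_current(time_s, soc_cell, soc_mean, capacity_ah=2.0)
```

On this noise-free synthetic data the sketch recovers the injected 0.1 A. With real SOC estimates, cell inconsistency and estimation error would bias the slope, which is the limitation of model-based methods noted above.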
In conclusion, the fault diagnostics of LIBS is still at an early stage. Battery fault mechanisms under special conditions, such as fast charging, should be further investigated. For some slowly evolving faults, such as the spontaneous ISC, early fault diagnostics and prognostics will play an increasingly important role in ensuring the safety of the battery system. In practice, standardized substitute tests for LIBS faults need to be developed for diagnostic algorithm validation and diagnostic technology development. Thresholds play a significant role in the trade-off between the sensitivity and robustness of fault diagnosis. The accuracy of condition monitoring and fault diagnosis can be improved by developing adaptive thresholds that take into account battery aging and usage patterns. In addition, most studies on battery, sensor, and actuator fault diagnosis are based on the assumption that other components are fault-free, and therefore multi-fault detection and isolation in battery systems is still a challenging issue. To sum up, advanced LIBS fault diagnosis needs further research on (1) multi-scale mechanism studies at the material, cell, and pack levels; (2) development of battery models and diagnostic methods that can be implemented in practical applications; and (3) multi-fault diagnostics, early fault diagnostics and prognostics, and fault-tolerant control for battery packs.

3.1. Li-ion battery fault diagnosis
3.1.1. Li-ion battery fault mechanisms. The studies on the battery fault mechanisms provide useful insights into the battery failure process. The understanding of fault mechanisms serves as a foundation for developing the fault diagnostic methods. Currently, a few review papers have been published on the fault mechanisms of Li-ion batteries [84]-

FIGURE 4 - Overview of the faults in the Li-ion battery systems.

3.1.3. Li-ion battery fault diagnosis methods. Due to the lack of internal information and strong coupling among various battery faults, many conventional diagnostic methods applied in other fields are not suitable for battery fault diagnosis. Currently, methods used in battery fault diagnosis mainly consist of model-based, data-driven, and knowledge-based methods, as well as methods that integrate multiple diagnostic techniques. A comparison of battery fault diagnosis methods is shown in Table [130], Dey et al. extended the previous work to consider the temperature dependence of internal resistance, and incorporated a Lyapunov-based nonlinear observer approach to deal with the nonlinearities of the battery. Moreover, in Ref. [140], they utilized a partial differential equation (PDE) model-based scheme and realized the detection and size estimation of the thermal fault. Since the excessive charge depletion and abnormal heat generation of the ISC cells affect the voltage and temperature responses, the correlation can also be captured by the phenomenological model. Feng et al. [66] estimated model parameters using recursive least squares (RLS) with a forgetting factor. Their model includes parallel resistance, capacitance, ohmic resistance, and the temperature derivative of the equilibrium potential in the energy conservation equation. Then, the ISC fault detection was implemented based on the changes in these key parameters. Seo et al. [129] proposed a model-based switching method to detect ISC. By introducing the ISC resistance into the battery model, the accuracy of the open-circuit voltage (OCV) estimation is improved, and the ISC resistance can be estimated more accurately.
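The mechanics of recursive least squares with a forgetting factor can be sketched on a much simpler model than the one described above. The Python example below assumes a static cell model V = OCV - R0 * I with only two parameters; this simplification, the function name, the forgetting factor, and the synthetic data are assumptions of the example and do not reproduce the model or the settings of Ref. [66].

```python
from typing import Sequence, Tuple


def rls_forgetting(
    currents_a: Sequence[float],
    voltages_v: Sequence[float],
    lam: float = 0.98,      # forgetting factor (hypothetical value)
    p0: float = 1e6,        # large initial covariance: little trust in the initial guess
) -> Tuple[float, float]:
    """Estimate (OCV, R0) of the assumed model V = OCV - R0 * I by RLS."""
    theta = [0.0, 0.0]                      # parameter vector [OCV, R0]
    P = [[p0, 0.0], [0.0, p0]]              # covariance matrix (kept symmetric)
    for i_k, v_k in zip(currents_a, voltages_v):
        phi = [1.0, -i_k]                   # regressor, so that V = phi . theta
        Pphi = [
            P[0][0] * phi[0] + P[0][1] * phi[1],
            P[1][0] * phi[0] + P[1][1] * phi[1],
        ]
        denom = lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
        K = [Pphi[0] / denom, Pphi[1] / denom]              # gain vector
        err = v_k - (phi[0] * theta[0] + phi[1] * theta[1])  # prediction error
        theta = [theta[0] + K[0] * err, theta[1] + K[1] * err]
        # P <- (P - K phi^T P) / lam; phi^T P equals Pphi because P is symmetric.
        P = [[(P[r][c] - K[r] * Pphi[c]) / lam for c in range(2)] for r in range(2)]
    return theta[0], theta[1]


# Synthetic, noise-free data generated with OCV = 3.7 V and R0 = 0.05 ohm.
currents = [1.0, 2.0, 0.5, 3.0, 1.5, 2.5, 0.0, 1.0, 2.0, 3.0] * 3
voltages = [3.7 - 0.05 * i for i in currents]
ocv_est, r0_est = rls_forgetting(currents, voltages)
```

Because the forgetting factor discounts old samples, the estimates can follow slow parameter changes, and an abnormal shift in an identified parameter can then serve as a fault feature. The current sequence must vary for the two parameters to be distinguishable.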

FIGURE 7 - Electrolyte leakage fault detection based on the random forest classifier. Reproduced with permission from [145] (Copyright 2018 Elsevier).

FIGURE 9 - Battery connection fault diagnosis based on the entropy method. Reproduced with permission from [33] (Copyright 2013 Elsevier).

FIGURE 10 - Prospects of fault diagnosis technology for Li-ion battery system. Adapted from [179].

TABLE 3 - COMPARISON OF TWO FEATURE EXTRACTION METHODS.
Based on the measurements, model-based state estimation and parameter estimation methods can be used to extract fault features. In battery systems, fault features can be characterized by changes in battery states or model parameters. For example, a connection

TABLE 4 - COMPARISON OF KNOWLEDGE-BASED DIAGNOSTIC METHODS.

TABLE 5 - COMPARISON OF MODEL-BASED DIAGNOSTIC METHODS.

TABLE 6 - COMPARISON OF DATA-DRIVEN DIAGNOSTIC METHODS.
Difficult to detect minor faults and directly locate faults; Not suitable for systems with highly coupled components.

TABLE 7 - COMPARISON OF BATTERY FAULT DIAGNOSIS METHODS.

TABLE 8 - COMPARISON OF SENSOR FAULT DIAGNOSIS METHODS.