Experimental study on condensate heat transfer coefficient of multi-channel cylinder dryer integrated with Bayesian-optimized machine learning prediction

Abstract The condensation heat transfer coefficient of the multi-channel dryer, one of the key components of a paper-making machine, directly determines the efficiency of heat energy utilization. However, predicting the condensation heat transfer coefficient remains a challenge because heat transfer in a multi-channel dryer is a complex fundamental problem involving the thermal behavior of two-phase fluid systems. Herein, we developed four supervised machine learning models to predict the heat transfer coefficient of a multi-channel cylinder dryer under different working conditions. Experiments on the multi-channel cylinder dryer were performed under different steam mass fluxes and cooling water mass flow rates, and the measured data were used as the input data for training. The four trained Bayesian-optimized machine learning models show excellent capability for predicting the condensation heat transfer coefficient of the multi-channel cylinder dryer: the test-set values of $R^2$ for the Bayesian-optimized SVR, ANN, linear SVR, and RF are 0.983, 0.997, 0.996, and 0.953, respectively. In addition, the feature importance of the descriptors is quantified with a random forest algorithm. Our study suggests that machine learning models can effectively predict the condensate heat transfer coefficient of two-phase fluid systems, which would benefit not only the optimization of the structures and operating parameters of multi-channel cylinder dryers in industry but also the development of reasonable heat transfer coefficient correlations in fundamental research.


Introduction
Steam condensation two-phase flow has been widely used in refrigeration equipment [1], heat pump systems [2], reactor systems [3], water desalination [4], and various other applications owing to its high heat transfer efficiency and relatively stable temperature control. In a paper-making machine, the heat released by steam condensation at the internal surface of the cylinder is used to remove residual water from pulp or paper [5], and the overall thermal resistance of the cylinder dryer mainly consists of the resistances of the condensate layer, the dryer shell, and the dried paper. Extensive studies have revealed that the condensation heat transfer resistance largely affects the heat transfer efficiency of the cylinder dryer in a paper machine [6,7]. To enhance the condensation heat transfer coefficient (HTC), the multi-channel dryer (MCD) [8] was proposed by Argonne National Laboratory to significantly decrease the thickness of the condensate layer. In a multi-channel dryer, the incoming steam is restricted to flow through small horizontal rectangular channels distributed longitudinally on the inner wall of the drying cylinder, where the condensate is pushed out of the cylinder dryer by the steam flow. Compared with the traditional dryer, its condensation HTC is nearly 7-20 times larger [8][9][10]. To further reduce energy consumption, predicting the HTC of the MCD is highly important, yet it remains an elusive task because of the complex fluid and thermal behavior in these two-phase systems. Therefore, to design and optimize next-generation dryers, accurate prediction of the heat transfer coefficient for steam condensation in small channels is urgently needed.
Various calculation methods and correlations have been developed to predict the HTC of steam condensation in small horizontal channels [9][10][11]. The most widely adopted methods are based on empirical and semi-empirical correlations, which are generally developed from extensive experiments over a range of geometric and flow parameters with selected working fluids. For example, scholars developed an empirical correlation-based method to predict the heat transfer performance of nanofluid flow [12] and other kinds of heat and mass transfer phenomena [13,14]. Databases and correlations for the saturated flow boiling heat transfer coefficient of cryogens in uniformly heated tubes were built, based on which a new universal correlation for the saturated boiling HTC of cryogens was developed; the obtained correlation showed good predictive agreement with the database, with an MAE of 26.4 and 34.5% [15]. In addition, based on the characteristics of condensation heat transfer, scholars have constructed various correlations to predict the condensation HTC inside channels from experimental and numerical databases [16][17][18]. Gong et al. [18] found that the heat transfer in the condensate zone accounts for more than 10% of the total heat transfer and cannot be neglected. It has been demonstrated that the condensation HTC can vary over two orders of magnitude within one tube cross section in a large, flattened condenser tube when the condensation temperature is varied [19]. Shen et al. [20] found that the local HTC increases with steam quality and steam mass flow rate along the circumferential direction. Although the above studies have developed many correlations that give accurate heat transfer predictions for individual tests or configurations, these correlations cannot be directly used to predict the HTC outside the tested range. Meanwhile, heat transfer in small channels is a complex fundamental problem affected by multiple factors, including steam mass flux, steam saturation temperature, steam quality, system pressure, and channel aspect ratio. Therefore, more work is still required to develop correlations and two-phase flow theory, or a new method, to predict the condensation HTC universally.
Machine learning techniques have been successfully used to study the heat transfer coefficient of two-phase flow from the fundamental level to full-scale practical application [4,21,22]. Liu et al. developed extreme machine learning integrated with Bayesian methods and successfully predicted the color changes of mushroom slices during the drying process [21]. A reinforcement learning algorithm was constructed for optimizing tobacco drying process control [22]. Cho et al. [23] developed a consolidated database covering a broad range of geometric values and operating conditions for free-fall condensation heat transfer on the external surfaces of vertical tubes in the presence of non-condensable gases, on which MLP algorithms were adopted to predict the condensation heat transfer rate. The proposed nonlinear regression model showed good prediction accuracy with a mean absolute error (MAE) of 12.7%, which is much lower than that achieved by previously proposed correlations. Arif et al. [24] extracted new physical descriptors of boiling heat transfer from pool boiling experimental images, without labeling and training, by principal component analysis (PCA), which would promote the establishment of generalized correlations.
In our study, supervised machine learning algorithms are used to predict the heat transfer coefficient (HTC) of a multi-channel cylinder dryer (MCD) under different steam mass fluxes (ranging from 5 to 40 kg·m⁻²·s⁻¹) and cooling water mass flow rates (ranging from 56.2 to 532.8 kg·h⁻¹). The training and test databases for building the machine learning models were obtained from the MCD condensation experimental rig. The ANN, linear and non-linear support vector regression (SVR), and random forest algorithms are trained, and the predicted results of the four models are compared. In addition, the importance of each physical descriptor for the condensation heat transfer coefficient is obtained, which would guide a reasonable and effective strategy for optimizing MCD structures and operating conditions to enhance the energy utilization efficiency.

Materials and methods
The schematic diagram and experimental apparatus for measuring the condensation heat transfer coefficient of the MCD are shown in Figure 1. The experimental apparatus consists of three subsystems: the experimental section, the steam circuit, and the cooling water circuit. Three types of processes take place in the experimental section: the condensation heat transfer process, the gas-liquid two-phase change process, and the flow pattern change process. The channel, made of a rectangular aluminum plate, is 800 mm long, and its cross-section is 13.5 × 4.5 mm. Water vapor condensing channels and cooling water heat exchanging channels are set on both sides. One side of the steam channel can be observed through a high-temperature-resistant polycarbonate plate, and the other side is covered with a clamped stainless-steel plate that forms the cooling water heat exchange channels. Except for the visible section of the channel, all the outer surfaces are closely covered with two layers of thermal insulation cotton with a thickness of 30 mm.
In the steam circuit, the deionized water flows out of the water storage tank through the drainer and is pumped into the boiler by a variable frequency diaphragm pump. The steam is generated from the deionized water heated in the electric heating boiler and then enters the experimental section through a very short insulated pipe. The steam passes through the experimental section and becomes a gas-liquid two-phase fluid, which is fully condensed in the subcooler; on the one hand, this makes it convenient to measure the mass flow rate of the steam, and on the other hand, it prevents the steam from damaging the downstream flowmeters. The condensed deionized water is measured by the turbine flowmeters and returned to the water storage tank for recycling. The cooling water circuit is mainly used to simulate the drying process of the wet paper web. The moisture content of the wet paper web, which acts as the cooling load of the drying cylinder, changes during drying, and this is represented by an adjustable cooling water flow. First, the cooling water is pumped from the water tank to the experimental section for heat exchange; then it absorbs the heat released by the condensation of the steam; finally, it is cooled by the heat exchanger and returned to the water tank for storage.
Temperature and pressure are monitored by T-type thermocouples and pressure transmitters positioned at the inlet and outlet, where the measured pressure is used to judge whether the steam reaches the saturation point. The measured steam parameters include the mass flux, temperature, pressure, and pressure difference between the channel inlet and outlet. The main measured cooling water parameters include the temperatures at the channel inlet and outlet and the mass flow rate at the channel inlet. In addition, the temperature of the channel wall between the steam and the cooling water channel is collected at the inlet and outlet of the cooling water channel, respectively. In Figure 2, $T_s$ is the steam temperature measurement point, $T_{ci}$ is the temperature measurement point at the inlet of the cooling water channel, and $T_{co}$ is the temperature measurement point at the outlet of the cooling water channel. $T_{wi}$ is the temperature measurement point on the heat exchange wall close to the cooling water inlet, and $T_{wo}$ is the temperature measurement point on the heat exchange wall close to the cooling water outlet. $T_{ca}$ is the average of the cooling water temperatures at different positions along the channel, and $T_{wa}$ is the average temperature of the heat exchange wall along the length of the channel. To obtain the fluid temperature parameters more accurately, the experimental channels are divided into segments, and all average temperatures are taken as the average of the temperatures measured in each segment.
The heat transfer characteristics are characterized by the cooling water HTC $h_w$, the steam condensation HTC $h_v$, and the overall HTC $K$. The steam mass flux $G$ is defined as

$$G = \frac{\rho_v V_1}{A_1},$$

where $V_1$ is the volume flow rate read by the flowmeter, $\rho_v$ is the density of the steam, and $A_1$ is the cross-section area of the steam channel. The heat released by the steam can be written as

$$Q_v = h_v A \left(T_s - T_{wall}\right),$$

where $A$ is the heat transfer surface area, $h_v$ is the averaged condensation heat transfer coefficient of the steam channel, $T_s$ is the average steam temperature, and $T_{wall}$ is the average temperature of the wall between the steam channel and the cooling water channel. Both temperatures are recorded by T-type thermocouples evenly distributed along the length of the steam and cooling water channels. The heat flux $Q$ is the heat transfer rate per unit heat transfer area. The heat balance of the experimental channel plate states that the heat released by the steam equals the heat absorbed by the cooling water plus the heat loss $Q_l$. According to Fourier's law, the heat loss $Q_l$ can be calculated as

$$Q_l = \frac{T_s - T_o}{R_n + R_g + R_b + R_w},$$

where $T_s$ and $T_o$ are the temperature of the steam in the channel and the ambient air temperature, respectively, and $R_n$, $R_g$, $R_b$, and $R_w$ are the thermal resistances from the steam to the channel inner wall, of the channel wall, of the thermal insulation materials, and from the external surface of the thermal insulation materials to the surrounding medium, respectively. The external resistance is evaluated with the correlation for forced convection over a flat plate, in which Nu, Re, and Pr are related to the thickness of the boundary layer, $b$ is the width of the channel plate, $k_{air}$ is the thermal conductivity of air, $L$ is the length of the channel plate, and $k_{steam}$ is the thermal conductivity of steam.
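The two definitions that depend only on measured quantities, the steam mass flux and the series-resistance heat loss, can be illustrated with a minimal Python sketch. It assumes the usual forms $G = \rho_v V_1 / A_1$ and $Q_l = (T_s - T_o)/\sum R$; the function names and every numerical value except the 13.5 × 4.5 mm cross-section are hypothetical and chosen only for illustration.

```python
def steam_mass_flux(volume_flow_m3_s, steam_density_kg_m3, channel_area_m2):
    """Mass flux G = rho_v * V_1 / A_1, in kg m^-2 s^-1."""
    return steam_density_kg_m3 * volume_flow_m3_s / channel_area_m2


def heat_loss(t_steam, t_ambient, resistances):
    """Heat loss (W) through thermal resistances in series (each in K/W)."""
    return (t_steam - t_ambient) / sum(resistances)


# Channel cross-section from the text: 13.5 mm x 4.5 mm.
area = 13.5e-3 * 4.5e-3

# Hypothetical steam state and flow rate (not measured values).
g = steam_mass_flux(volume_flow_m3_s=3.0e-3,
                    steam_density_kg_m3=0.6,
                    channel_area_m2=area)

# Hypothetical resistances R_n, R_g, R_b, R_w in K/W.
q_l = heat_loss(t_steam=100.0, t_ambient=25.0,
                resistances=[0.1, 0.01, 5.0, 1.0])

print(f"G = {g:.2f} kg m^-2 s^-1, Q_l = {q_l:.2f} W")
```

With these illustrative inputs the mass flux falls inside the 5 to 40 kg·m⁻²·s⁻¹ range studied in the experiments.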
The heat absorbed by the cooling water can be taken as the heat released by the steam, because the heat loss is <0.5% of the total heat released by the steam and is therefore negligible. The heat absorbed by the cooling water, $Q_c$, is expressed as Equation (2):

$$Q_c = m c_p \left(T_{co} - T_{ci}\right),$$

where $m$ is the mass flow rate of the cooling water, $c_p$ is the specific heat of the cooling water, and $T_{co}$ and $T_{ci}$ are the outlet and inlet temperatures of the cooling water, respectively. The mass flow rate is obtained from $m = \rho_c V_2$, where $V_2$ is the volume flow rate measured by the flowmeter and $\rho_c$ is the density of the cooling water.
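As a worked illustration of the energy balance described above, the following Python sketch computes the heat absorbed by the cooling water and then an average condensation HTC from $h_v = Q/(A\,(T_s - T_{wall}))$, assuming the heat loss is negligible. Only the flow rate of 140.4 kg·h⁻¹ comes from the text; the temperatures, the specific heat, and the heat transfer area (taken here as channel length times channel width) are hypothetical.

```python
def cooling_water_heat(mass_flow_kg_h, cp_j_kg_k, t_out_c, t_in_c):
    """Heat absorbed by the cooling water in W (mass flow given in kg/h)."""
    mass_flow_kg_s = mass_flow_kg_h / 3600.0
    return mass_flow_kg_s * cp_j_kg_k * (t_out_c - t_in_c)


def condensation_htc(heat_w, area_m2, t_steam_c, t_wall_c):
    """Average condensation HTC h_v = Q / (A * (T_s - T_wall)), W m^-2 K^-1."""
    return heat_w / (area_m2 * (t_steam_c - t_wall_c))


# m = 140.4 kg/h is taken from the text; all other inputs are hypothetical.
q = cooling_water_heat(140.4, 4180.0, t_out_c=30.0, t_in_c=20.0)

# Hypothetical area: 0.8 m channel length times 13.5 mm channel width.
h_v = condensation_htc(q, area_m2=0.8 * 13.5e-3,
                       t_steam_c=100.0, t_wall_c=80.0)

print(f"Q = {q:.1f} W, h_v = {h_v:.0f} W m^-2 K^-1")
```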
The cooling water HTC can be calculated by the Gnielinski correlation [25], in which $k_c$, $\rho_c$, and $\mu_c$ denote the thermal conductivity, density, and dynamic viscosity of the cooling water, respectively; $d_c$ and $l_c$ represent the characteristic dimensions of the cooling water channel; Pr is the Prandtl number of the cooling water and $Pr_{wall}$ is the Prandtl number at the wall temperature; and $u_c$ is the cooling water velocity. The local condensation HTC of any minor test segment can be expressed as

$$h_{v,i} = \frac{Q_{v,i}}{A_i \left(T_{v,i} - T_{wall,i}\right)},$$

where $h_{v,i}$, $Q_{v,i}$, and $A_i$ are the condensation HTC, heat transfer rate, and heat transfer area of each minor segment, respectively (so that $Q_{v,i}/A_i$ is the segment heat flux), and $T_{v,i}$ and $T_{wall,i}$ are the steam temperature and the wall temperature in each minor segment. In each minor segment, the heat transfer rate absorbed by the cooling water is approximately equal to the heat transfer rate released by the steam. Therefore, the average condensation HTC $h_v$ of the steam channel side is obtained by averaging the local values $h_{v,i}$ along the channel, where $l$ is the channel length. For a paper dryer, the heat transfer resistance includes eight parts, as shown in Figure 3. Because the HTC is inversely proportional to the heat transfer resistance, the calculation of the overall HTC should in principle include the HTCs of all eight parts. If the less influential parts are neglected, the overall HTC is mainly composed of three parts: the condensation HTC from the steam to the inner surface of the cylinder (including steam convection and conduction through the condensate layer), the cylinder shell conduction, and the HTC to the outer medium (e.g., wet paper or coolant). Finally, the overall HTC $K$ can be calculated by

$$K = \left(\frac{1}{h_v} + \frac{\delta}{k} + \frac{1}{h_w}\right)^{-1},$$

where $\delta$ is the wall thickness and $k$ is the thermal conductivity of the aluminum.
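The following Python sketch shows how the cooling-water side and the overall HTC could be evaluated. It uses the commonly cited form of the Gnielinski correlation with the Petukhov friction factor, an entrance-length factor, and a liquid property correction; whether the authors used exactly this variant is an assumption, since the source equation is not shown. The function names and example numbers are hypothetical.

```python
import math


def gnielinski_nusselt(re, pr, pr_wall=None, d_over_l=0.0):
    """Nusselt number from the Gnielinski correlation.

    Commonly quoted validity: roughly 3000 < Re < 5e6.
    """
    f = (0.79 * math.log(re) - 1.64) ** -2  # Petukhov friction factor
    nu = (f / 8.0) * (re - 1000.0) * pr / (
        1.0 + 12.7 * math.sqrt(f / 8.0) * (pr ** (2.0 / 3.0) - 1.0)
    )
    nu *= 1.0 + d_over_l ** (2.0 / 3.0)  # entrance-length correction
    if pr_wall is not None:
        nu *= (pr / pr_wall) ** 0.11  # property correction for liquids
    return nu


def overall_htc(h_v, h_w, wall_thickness, wall_conductivity):
    """Overall HTC K from three thermal resistances in series."""
    return 1.0 / (1.0 / h_v + wall_thickness / wall_conductivity + 1.0 / h_w)


# Hypothetical operating point for the cooling water.
nu = gnielinski_nusselt(re=1.0e4, pr=5.0)
k_c, d_c = 0.6, 0.008            # W m^-1 K^-1 and m, both hypothetical
h_w = nu * k_c / d_c             # convective HTC from Nu = h d / k
K = overall_htc(h_v=8000.0, h_w=h_w,
                wall_thickness=0.002, wall_conductivity=200.0)
print(f"Nu = {nu:.1f}, h_w = {h_w:.0f}, K = {K:.0f} W m^-2 K^-1")
```

Because the resistances add in series, $K$ is always smaller than the smallest of the individual coefficients, which is why the weaker side (here the cooling water) dominates the overall value.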
The heat transfer characteristics were evaluated by measuring the influencing parameters at the corresponding positions using well-calibrated sensors and devices, whose ranges and accuracies are presented in Table 1.
The second-order uncertainty estimation method recommended by Moffat [26] is employed to evaluate the uncertainties of the key experimental results. Here, it is assumed that $R$ is a function of $n$ independent variables $X_1, X_2, \ldots, X_n$, so that $R$ can be written as

$$R = f(X_1, X_2, \ldots, X_n).$$

Assuming that the uncertainty of each independent variable follows the normal distribution, the uncertainty of $R$ can be calculated by

$$\delta R = \sqrt{\sum_{i=1}^{n} \left(\frac{\partial R}{\partial X_i}\,\delta X_i\right)^2}.$$

The relative uncertainty of $R$ is then calculated by

$$\frac{\delta R}{R} = \frac{1}{R}\sqrt{\sum_{i=1}^{n} \left(\frac{\partial R}{\partial X_i}\,\delta X_i\right)^2}.$$

The uncertainties of the HTC and other physical quantities are shown in Table 2.
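The root-sum-square propagation described above can be sketched in a few lines of Python. This is a minimal illustration that approximates the partial derivatives by central differences; the function name, the example HTC expression, and all numerical values are hypothetical and are not the uncertainties reported in Table 2.

```python
import math


def propagate_uncertainty(func, values, uncertainties, rel_step=1e-6):
    """Root-sum-square uncertainty of func(*values).

    Partial derivatives are approximated by central differences.
    """
    total = 0.0
    for i, (x, dx) in enumerate(zip(values, uncertainties)):
        step = rel_step * (abs(x) if x != 0 else 1.0)
        upper = list(values)
        lower = list(values)
        upper[i] = x + step
        lower[i] = x - step
        dfdx = (func(*upper) - func(*lower)) / (2.0 * step)
        total += (dfdx * dx) ** 2
    return math.sqrt(total)


def htc(q, area, delta_t):
    """Example result: h = Q / (A * dT)."""
    return q / (area * delta_t)


# Hypothetical measured values and their absolute uncertainties.
values = [1600.0, 0.0108, 20.0]
uncertainties = [16.0, 0.0001, 0.2]

dh = propagate_uncertainty(htc, values, uncertainties)
print(f"relative uncertainty of h: {100.0 * dh / htc(*values):.2f} %")
```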

Machine learning models
Four machine learning models optimized by the Bayesian optimization algorithm (BOA) are proposed to predict the condensation heat transfer coefficient of the MCD. As shown in Figure 4(a), the proposed approach involves two processes: learning and exploration. The learning process includes two stages: (1) the data acquired from the experimental setup of steam condensation in small channels are input into the BOA to determine the hyperparameters of the different machine learning models, including the SVR, ANN, and RF models; (2) the optimized hyperparameters are applied to construct the supervised machine learning models based on the acquired experimental data. The prediction sensitivity of the HTC is used to evaluate the prediction performance of the trained machine learning models. In the exploration process, the constructed models are applied to predict the HTC under other experimental conditions. Additionally, the feature weights of different factors on the HTC are evaluated; this exploration process is conducted off-line and is aimed at generating the HTC prediction model from the training data. All the machine learning algorithms in our study are called from the scikit-learn platform, and the codes have been made available in the repository.

Artificial neural network
ANN is a supervised learning algorithm that mimics the neural cells of the human brain [4] and is constructed from layers of neurons (transfer functions) between the inputs and outputs, as shown in Figure 4(b). The topological architecture consists of three kinds of layers: input, hidden, and output layers. Each layer contains a specified number of artificial neurons, and the specific problem predefines the number of neurons in the input and output layers. The number of hidden neurons is a variable of the ANN architecture optimization process used to obtain an optimal network architecture. Two neurons in two consecutive layers are connected by a weight. Each neuron receives signals from all neurons in the preceding layer and yields an output by applying an activation function to the weighted sum of all the signals. The activation function enables the network to learn the non-linear relationship between the input and output. Signals from the input layer are propagated through the entire network to obtain the final estimated values at the output layer. A linear activation function was used in the output layer because the ANN architectures were developed for regression purposes.
During training, the weights were updated using a backpropagation algorithm to minimize the difference between the output and the target of the network. An optimal ANN architecture is composed of a combination of hyperparameters, including the number of hidden layers, the number of neurons in each hidden layer, the activation function, the learning rate, the weight constraint, and the dropout rate, which yields the best prediction result on the validation data based on the Bayesian optimization algorithm (BOA).
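To make the forward propagation described above concrete, the following Python sketch evaluates a network with one hidden layer and a linear output neuron. The tanh activation, the weights, and the two inputs are all hypothetical; the study itself selects the architecture and activation by Bayesian optimization, and training by backpropagation is not shown.

```python
import math


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: one tanh hidden layer followed by a linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    # Linear output activation, as used for regression.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [1.2, -0.7, 0.5]
output_bias = 0.3

# Inputs would normally be scaled features such as G and m.
y = forward([0.6, 0.25], hidden_weights, hidden_biases,
            output_weights, output_bias)
print(y)
```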

Random forest
Random forest (RF) is an ensemble learning method that combines multiple decision trees [27]. The training dataset is obtained from the experimental measurements with the apparatus shown in Figure 1. First, the bootstrap resampling method randomly selects samples from the $n$ sets in each round. Then a weighted vote of all decision trees makes the final prediction. CART was utilized for the forest construction in our study. Equation (1) is applied to calculate the Gini index of every tree in the forest, which indicates the purity of the tree nodes. The Gini index is equal to 0 only if every sample of a node belongs to the same class.

$$f(t) = 1 - \sum_{i} p_i^2,$$

where $f(t)$ is the Gini index of node $t$ and $p_i$ is the relative frequency of class $i$ in node $t$. After the model is constructed, RF can evaluate the relative importance of all descriptors.
For a single tree, the contribution of a descriptor is obtained by summing the reduction of the Gini index over every node in which that descriptor is chosen for the split. The final importance of the descriptor is then averaged over all trees.
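The Gini index and the impurity reduction that underlies the descriptor importance can be illustrated with a short Python sketch. It follows the class-based Gini description given in the text; note that, for a continuous target such as the HTC, tree implementations usually split on a variance-based criterion instead, so this is an illustration of the idea rather than the authors' exact computation. The labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 - sum_i p_i^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Reduction of the Gini index achieved by splitting parent into two."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Hypothetical node with two classes ("low" and "high" HTC).
parent = ["low", "low", "high", "high"]
print(gini(parent))                                   # impure node
print(gini_decrease(parent, ["low", "low"], ["high", "high"]))
```

Summing `gini_decrease` over all nodes that split on a given descriptor, and averaging over the trees, gives the importance measure described above.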

Support vector machine
The support vector machine (SVM) [28] is mainly based on statistical learning theory and the principle of structural risk minimization, and it has shown outstanding success in the field of thermal energy conversion and utilization. Given training data consisting of an input matrix $Z = [Z_1, Z_2, \ldots, Z_n]$ and an output vector $R = [R_1, R_2, \ldots, R_n]$, the SVM constructs an optimized linear regression based on nonlinear kernel functions by mapping the original training samples to a higher-dimensional feature space. As a result, the nonlinear problem is converted to a linear one. The SVM regression function $f(z)$ is expressed as

$$f(z) = w \cdot \phi(z) + b,$$

where $\phi(z)$ is a nonlinear function through which the original input vectors are mapped into a higher-dimensional feature space, $w$ is the weight vector, and $b$ is the bias. Applying Lagrange multipliers and the optimality conditions, the SVM regression function is expressed as

$$f(z) = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^{*}\right) K(z, z_i) + b,$$

where $\alpha_i$ and $\alpha_i^{*}$ denote the Lagrange multipliers and $K(z, z_i)$ denotes the kernel function. The Gaussian radial basis function (RBF) is chosen as the kernel function in our study owing to its advantages, such as having few parameters and an excellent capacity for capturing nonlinear relationships.
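The dual form of the regression function can be evaluated directly once the support vectors and multipliers are known. The Python sketch below assumes the common RBF parameterization $K(z, z_i) = \exp(-\gamma \lVert z - z_i \rVert^2)$; the support vectors, dual coefficients, bias, and $\gamma$ are hypothetical values rather than fitted ones, and the optimization that produces them is not shown.

```python
import math


def rbf_kernel(z, z_i, gamma):
    """Gaussian RBF kernel exp(-gamma * ||z - z_i||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(z, z_i))
    return math.exp(-gamma * sq_dist)


def svr_predict(z, support_vectors, dual_coefs, bias, gamma):
    """Evaluate f(z) = sum_i (alpha_i - alpha_i*) K(z, z_i) + b.

    dual_coefs[i] stands for the difference (alpha_i - alpha_i*).
    """
    return sum(
        coef * rbf_kernel(z, sv, gamma)
        for coef, sv in zip(dual_coefs, support_vectors)
    ) + bias


# Hypothetical fitted quantities for a two-feature problem.
support_vectors = [[0.0, 0.0], [1.0, 0.0]]
dual_coefs = [2.0, -1.0]
prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs,
                         bias=0.5, gamma=1.0)
print(prediction)
```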

Bayesian optimization algorithm
For machine learning models such as the ANN, SVR, and RF adopted in our study, the accuracy of the predicted output depends heavily on the hyperparameters. During training, the core task is to determine these hyperparameters. For the ANN, they are the training function, the number of hidden layers, the number of neurons in each hidden layer, the learning rate, and the activation function. For the SVR model, the best hyperparameters are those that yield the lowest validation error; they are the box constraint, the insensitive loss function, the kernel function type (linear, polynomial), and the Gaussian spacing parameter.
In our study, the Bayesian optimization algorithm (BOA) [21] is applied to search for the best values or types of hyperparameters. Suppose the space of possible hyperparameters is $Y$ and the objective function to be minimized is the validation error $f$; the BOA optimization can then be described by Equation (25):

$$y^{*} = \arg\min_{y \in Y} f(y),$$

where $y^{*}$ is the set of hyperparameters that generates the smallest value of the objective score and $y$ denotes any point of the space $Y$. The problem is characterized as the global optimization of a black-box function whose expression and derivatives are unknown. The algorithm uses a surrogate probability model based on Bayes' rule, in which the values for the next iteration are chosen based on the results of prior iterations. Thus, it can locate the optimum more effectively than random, grid, or manual search.
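The loop of fitting a surrogate, choosing the next point, and evaluating the objective can be sketched in pure Python. This is a minimal one-dimensional illustration that assumes a Gaussian-process surrogate with an RBF kernel and the expected-improvement acquisition function; the text does not state which surrogate or acquisition function the authors used, and the objective, length scale, and candidate grid here are hypothetical stand-ins for a validation error over a hyperparameter.

```python
import math


def rbf(a, b, length=0.3):
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    var = 1.0 - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 1e-12))


def expected_improvement(mean, std, best):
    """Expected improvement for a minimization problem."""
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf


def bayes_opt(objective, candidates, n_init=3, n_iter=10):
    """Return the best (x, y) found among the evaluated candidates."""
    xs = candidates[:: max(1, len(candidates) // n_init)][:n_init]
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        best = min(ys)
        x_next = max(
            remaining,
            key=lambda c: expected_improvement(*gp_posterior(xs, ys, c), best),
        )
        xs.append(x_next)
        ys.append(objective(x_next))
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]


# Hypothetical objective standing in for a validation error.
candidates = [i / 20 for i in range(21)]
best_x, best_y = bayes_opt(lambda x: (x - 0.3) ** 2, candidates)
print(best_x, best_y)
```

Each iteration spends one objective evaluation where the surrogate predicts the largest expected gain, which is the sense in which prior results guide the next choice.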
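To evaluate the prediction performance of the different ML models, performance indicators including the determination coefficient ($R^2$) and the mean absolute percentage error (MAPE) are applied for a quantitative comparison of the predictive power of the various ML methods. They are defined as

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_{exp,i} - y_{pre,i}\right)^2}{\sum_{i=1}^{n} \left(y_{exp,i} - \bar{y}_{exp}\right)^2},$$

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left|\frac{y_{exp,i} - y_{pre,i}}{y_{exp,i}}\right|,$$

where $y_{exp}$ and $y_{pre}$ are the experimental results and predicted values, respectively, $\bar{y}_{exp}$ is the mean of the experimental results, and $n$ is the number of data points.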
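Both indicators are straightforward to compute. The Python sketch below uses their standard definitions, which are assumed to match the ones intended here; the function names and the sample values are hypothetical.

```python
def r_squared(y_exp, y_pre):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    mean_exp = sum(y_exp) / len(y_exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_pre))
    ss_tot = sum((e - mean_exp) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


def mape(y_exp, y_pre):
    """Mean absolute percentage error, in percent."""
    n = len(y_exp)
    return 100.0 / n * sum(abs((e - p) / e) for e, p in zip(y_exp, y_pre))


# Hypothetical experimental and predicted overall HTC values.
y_exp = [1500.0, 1800.0, 2100.0, 2600.0]
y_pre = [1480.0, 1830.0, 2050.0, 2640.0]
print(f"R2 = {r_squared(y_exp, y_pre):.4f}, MAPE = {mape(y_exp, y_pre):.2f} %")
```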

Results and discussion
As mentioned before, four machine learning models are used to predict the condensate heat transfer coefficient of the MCD and to estimate the importance of each physical parameter for the heat transfer coefficient based on the dataset from experimental measurements. The following sections summarize and discuss the results of the experiments and the machine learning predictions.

Thermal characteristics of small channel in MCD
The overall HTC is mainly composed of three parts: the condensation HTC from the steam to the inner surface of the cylinder (including steam convection and conduction through the condensate layer), the cylinder shell conduction, and the HTC to the outer medium (e.g., wet paper or coolant). Thus, the overall HTC must be used to consider the influencing factors comprehensively and to study the practical effect of heat transfer systematically. The HTC of steam condensation can be affected by many factors, including operating conditions and dimensionless groups, each of which is valid in a finite range. Some factors affecting the coolant HTC in a tube or channel heated by steam have been experimentally investigated, since some scholars consider that the coolant HTC contributes to a higher overall HTC. Convection heat transfer also occurs on the outside surface of a dryer because of air flow. One of the key factors influencing convection heat transfer is the coolant Nusselt number (Nu), and both the coolant Reynolds number (Re) and Prandtl number (Pr) are significant parameters related to Nu. Because convection heat transfer is promoted by turbulence, Re should be investigated, as it reflects the turbulence intensity of the coolant: the more intense the turbulence of the coolant, the better the convection heat transfer. Pr should also be investigated, since it reflects the influence of the physical properties of the coolant on convection heat transfer through $Pr = \mu_c Cp_c / k_c$, where $\mu_c$, $Cp_c$, and $k_c$ are the dynamic viscosity, specific heat capacity, and thermal conductivity of the coolant, respectively. Besides the mass flow rate of the coolant, increases in the coolant Re, Pr, and Nu result in a higher coolant HTC and thus better heat transfer performance.
The small channel in the MCD mainly involves two heat transfer systems, with water steam as the heating medium and cooling water as the thermal load, both of which directly affect the HTC of the small channel. Because heat transfer in a small channel responds with a delay and the experimental rig introduces some uncertain factors, the heat supply and the cooling water flow cannot be changed simultaneously. Two sets of experiments were therefore carried out under different operating conditions: one set was measured under a fixed steam mass flux of $G$ = 30 kg·m⁻²·s⁻¹ (Figure 5(a)), and the other was carried out under a fixed cooling water mass flow rate of 140.4 kg·h⁻¹ (Figure 5(b)). In addition, more temperature parameters of the channel wall along the axis of the small channel are shown in Table S1.
The influence of the thermal load on the heat transfer coefficient cannot be neglected. In our experiment, the cooling water was utilized as the thermal load, and its physical parameters, including the cooling water mass flow rate ($m$), Re, Pr, and Nu, are introduced to calculate the cooling water convection HTC, as shown in Figure 5(a). The turbulence of the cooling water, reflected by Re, is enhanced with increasing $m$ at $G$ = 30 kg·m⁻²·s⁻¹. Over the range of the cooling water flow rate, the flow clearly passes through laminar flow (Re < 2300), transition flow (2300 < Re < 4000), turbulence (4000 < Re < 10,000), and fully developed turbulence (Re > 10,000), which mainly originates from the unstable water flow field induced by the shear effect between the metal wall surface and the cooling water. The shear effect is largely determined by the value of Re. Specifically, when the shear effect is weak at a small Re, the cooling water boundary layer becomes thicker, resulting in an increase of the convection heat transfer resistance. Nu denotes the ratio of the conduction thermal resistance of the laminar bottom layer of the cooling water to the convection heat transfer resistance. Thus, Nu is small at a small Re. Conversely, when the shear effect is strong at a large Re, the heat transfer caused by turbulence is enhanced, resulting in a smaller convection heat transfer resistance and a larger Nu. Consequently, Nu increases as Re increases. Meanwhile, a large amount of cooling water reduces its average temperature ($T_{ca}$). Among the three parameters determining $Pr = \mu_c Cp_c / k_c$, the increase of $\mu_c$ outweighs the decrease of the other two parameters, so Pr increases. As a result, Nu increases significantly with increasing $m$.
Figure 5(b) shows the influence of the heat flux ($Q$) on all the HTCs at a fixed $m$ = 140.4 kg·h⁻¹. The steam condensation HTC ($h_v$), the cooling water convection HTC ($h_w$), and the overall HTC ($K$) all increase with increasing heat flux. According to Fourier's law, the condensation HTC inside the steam channel is enhanced by increasing $Q$. Meanwhile, an increase of $Q$ raises the cooling water temperature, which promotes the thermal conductivity ($k_c$) and Nu of the cooling water. Since both important components of $K$ increase, the increase in $K$ is expected. It can therefore be concluded that an increase in $Q$ results in better heat transfer performance. The relationships between the other 14 factors and the three HTCs are not given here, since the purpose of this paper is to use ML to achieve accurate predictions rather than to clarify these complicated processes one by one.

Results from machine learning
Based on the database from high-throughput experimental measurements, recently developed BOA-optimized machine learning approaches are employed to further elucidate the correlation between the condensation heat transfer coefficient and the physical parameters of small channels in the MCD. First, linear correlations among the 15 features ($G$, $m$, $T_{co}$, $T_{ca}$, $T_{wi}$, $T_{wo}$, $T_{wa}$, $T_s$, $Q$, Re, Pr, Nu, $h_v$, $h_w$, $K$) were investigated using the Pearson correlation coefficient. Figure 6 clearly shows that the overall HTC ($K$) is positively correlated with $G$, $m$, $T_s$, $Q$, Re, Pr, Nu, $h_v$, and $h_w$, and uncorrelated with $T_{co}$, $T_{ca}$, $T_{wa}$, $T_{wo}$, and $T_{wi}$. On the other hand, $m$ correlates positively with Re, Nu, and $h_w$ but negatively with $T_{co}$, $T_{ca}$, $T_{wi}$, $T_{wo}$, and $T_{wa}$, indicating that Re, Nu, and $h_w$ increase, while $T_{co}$, $T_{ca}$, $T_{wi}$, $T_{wo}$, and $T_{wa}$ decrease, with increasing $m$. $m$ is the main factor in the formulas for calculating Re, Nu, and $h_w$, since these quantities are proportional to it. The main reason that $m$ reduces $T_{co}$, $T_{ca}$, $T_{wi}$, $T_{wo}$, and $T_{wa}$ is that the cooling water is the cooling medium. Meanwhile, $T_{co}$, $T_{ca}$, $T_{wi}$, $T_{wo}$, and $T_{wa}$ show a strong negative correlation with Pr, indicating that Pr decreases as these temperatures increase. These results accord with a significant effect of the temperature boundary layer on Pr. Finally, $T_{wi}$, $T_{wo}$, and $T_{wa}$ show only a slight negative correlation with Re, Nu, and $h_w$, since the wall temperatures do not enter the calculation of Re, Nu, and $h_w$. $T_{wi}$, $T_{wo}$, and $T_{wa}$ show almost no correlation with $K$, indicating that they barely influence $K$.
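For reference, the Pearson correlation coefficient between two features can be computed as in the Python sketch below. The sample values of $m$, Re, and $T_{ca}$ are hypothetical and only mimic the reported signs of the correlations; they are not data from the experiments.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Hypothetical samples: Re rises and T_ca falls as the flow rate m rises.
m = [56.2, 140.4, 280.0, 400.0, 532.8]
re = [1500.0, 3700.0, 7400.0, 10600.0, 14100.0]
t_ca = [48.0, 41.0, 35.0, 32.0, 30.0]

print(f"r(m, Re)   = {pearson(m, re):+.3f}")
print(f"r(m, T_ca) = {pearson(m, t_ca):+.3f}")
```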
According to the experimental results, 15 features ($G$, $m$, $T_{co}$, $T_{ca}$, $T_{wi}$, $T_{wo}$, $T_{wa}$, $T_s$, $Q$, Re, Pr, Nu, $h_v$, $h_w$, $K$) are associated with each data point. The data set was split into two portions, with 80% used for training and 20% used for testing the machine learning models. The four constructed BOA-based ML models for predicting the overall HTC are shown in Figure 6. The HTC predicted by the different ML models for the steam condensation system shows good agreement with the experimental values, which indicates that the BOA can effectively optimize the hyperparameters and improve the accuracy of the predicted results compared with the manual selection used in previous studies [29]. The difference between the training and test results is smallest for the ANN and relatively larger for the RF than for the other ML models. The prediction accuracy of the ANN model mainly depends on both the number of hidden layer nodes and the number of neurons, so proper numbers of hidden layer nodes and neurons are required. Too many hidden layer nodes and neurons cause overtraining, resulting in poor generalization capacity. In our study, the BOA based on a Gaussian kernel function successfully obtains the optimal parameters.
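The 80/20 partition can be reproduced with a simple shuffled split, sketched below in Python. The random seed and the placeholder samples are hypothetical; the text does not state how the authors shuffled or seeded their split.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the sample indices and split them into train and test sets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(samples) * test_fraction)
    test = [samples[i] for i in indices[:n_test]]
    train = [samples[i] for i in indices[n_test:]]
    return train, test


# Placeholder data points; each would hold the features and the target K.
samples = list(range(100))
train, test = train_test_split(samples)
print(len(train), len(test))
```

Keeping the test portion untouched during hyperparameter optimization is what makes the test-set $R^2$ a fair measure of generalization.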
To quantify the performance of our trained BOA-based ML models, Figure 7 shows that the values of R 2 for the trained BOA-based SVR, ANN, linear SVR, and RF are 0.99, 0.999, 0.999, and 0.994, respectively, while the values of R 2 for the tested BOA-based SVR, ANN, linear SVR, and RF are 0.983, 0.997, 0.996, and 0.953, respectively. To present a complete description of the Bayesian optimization algorithm, the evolution of R 2 for the four machine learning algorithms is shown in Figure S1. The values of R 2 for all four algorithms increase with the optimization generation and approach a converged value. In this comparison, the random forest algorithm shows the best performance, while linear SVR is the worst because of the linear nature of the regression algorithm. In addition, the optimized hyperparameters of the four machine learning algorithms are listed in Table S1, which should help in reproducing the results. The time costs of the BOA optimization for the four machine learning algorithms are shown in Figure S2. The ANN algorithm is the most expensive because it has the largest number of hyperparameters. Interestingly, both the prediction performance and the time cost of hyperparameter optimization for the Random Forest algorithm show an obvious advantage over the other three algorithms. Therefore, the Random Forest is preferable for predicting the HTC in small channels.
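For reference, the determination coefficient quoted above follows the standard definition R 2 = 1 - SS_res / SS_tot. The sketch below assumes that definition; the function name `r_squared` and the sample values are illustrative placeholders, not data from this study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Placeholder values (NOT experimental data).
k_measured = [1.0, 2.0, 3.0, 4.0]
k_predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(k_measured, k_predicted))
```

A perfect prediction gives R 2 = 1, and a model that always predicts the mean of the measurements gives R 2 = 0.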
In previous studies on calculating the condensation heat transfer coefficient, the method usually adopted is based on a correlation dataset [10,30], where the determination coefficient is generally in the range of 0.6 to 0.8, as shown in Figure 8. By comparison, the machine learning prediction of the condensation heat transfer coefficient is acceptable. In addition, the MAPE of the above four models is shown in Figure 8(a), where the performance of the four ML models is consistent with their values of R 2 .
To better reflect the importance of all input feature parameters for the overall HTC, the RF algorithm was used to calculate the contribution percentage of each feature descriptor. Figure 8(b) shows that the four most important contributions, from m, Nu, h w , and Re, were 20.76, 19.15, 18.68, and 17.46%, respectively. An increased m can produce an extremely unstable flow field, which promotes h w as the main contributor to the overall HTC. As Nu and Re increase, the heat transfer induced by turbulence is enhanced, so h w also improves. As for h w , one of the main components of the condensation heat transfer coefficient of the MCD, it clearly has a positive effect on the condensation heat transfer coefficient K. As for Q, the fifth most important parameter, an enhanced Q raises the temperature of the cooling water, which further increases K according to Fourier's law. The remaining parameters have a combined importance contribution of 9.98%; they include the steam parameters (G, T s , h v ), the water parameters (T co , T ca , Pr), and the wall surface parameters (T wi , T wo , T wa ). As Figure 1 indicates, both G and T s are properties of the heat source, whereas K is related not to the physical properties of the heat source but to the thermal resistance in the heat transfer process. At low steam mass flux, the effect of h v on K is very small and negligible. The temperature parameters (T co , T ca , T wi , T wo , and T wa ) are not needed for calculating K. As for Pr, its value lies in the range of 3.89-6.97 because the cooling water undergoes no phase change under our experimental conditions, so its variation has minimal influence on the calculation of K.
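The importance of Q is not stated explicitly in the text, but it can be estimated from the reported shares. The short calculation below assumes that the RF importances are normalized to sum to 100% (as normalized impurity-based importances are in common random forest implementations); the resulting figure is an inference under that assumption, not a value reported by the study.

```python
# Importance shares reported in the text (percent).
reported_top_four = {"m": 20.76, "Nu": 19.15, "h_w": 18.68, "Re": 17.46}
others_combined = 9.98  # G, T_s, h_v, T_co, T_ca, Pr, T_wi, T_wo, T_wa

# ASSUMPTION: all importances sum to 100%, so the remainder belongs to Q.
top_four_total = sum(reported_top_four.values())
q_share = 100.0 - top_four_total - others_combined

print(f"top four total: {top_four_total:.2f}%")
print(f"inferred share of Q: {q_share:.2f}%")
```

Under this assumption the share of Q falls below that of Re, which is consistent with Q being ranked fifth.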
To compare the HTC predicted by our ML models with that from the Shah correlation [31], the experimental data are plotted against the values predicted by the different methods in Figure 9. The ML models predict the HTC of the MCD with good accuracy. Moreover, the MAPE of the Shah correlation with respect to the experimental data is 29.7%, while the MAPE values for the BOA-based SVR, ANN, linear SVR, and RF are 0.013, 1.16, 0.012, and 6.02%, respectively. As such, BOA-based supervised ML gives a very fast prediction of the HTC, and the results are more accurate than those of the Shah correlation-based prediction method.
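The MAPE used in this comparison can be sketched as follows, assuming the standard definition (the mean of |measured - predicted| / |measured|, expressed in percent). The function name `mape` and the sample values are illustrative placeholders, not data from this study.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Placeholder values (NOT experimental data): each prediction is off by 10%.
k_true = [100.0, 200.0]
k_model = [110.0, 180.0]
print(mape(k_true, k_model))
```

Because the error is normalized by each measured value, MAPE allows models to be compared across operating conditions with very different HTC magnitudes.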

Conclusions
In this work, four supervised machine learning models based on random forest (RF), support vector machine (SVM), and artificial neural network (ANN) algorithms are successfully used to predict the condensation heat transfer coefficient of the MCD, where the Bayesian optimization algorithm effectively optimizes the hyperparameters of the machine learning models and enhances their performance. The Pearson correlation analysis shows that h w , Re, Nu, and m of the coolant water are the main influencing factors of the condensation heat transfer coefficient of the MCD, which is consistent with the feature importance results from the RF algorithm. The determination coefficients of the Bayesian-based SVR, ANN, linear SVR, and RF on the test data were 0.983, 0.997, 0.996, and 0.953, respectively, which indicates that Bayesian optimization-based machine learning models can effectively predict the condensation heat transfer coefficient of the MCD with less computational and experimental expense. In addition, RF can calculate the feature importance of the descriptors; the importance values are 20.76, 19.15, 18.68, and 17.46% for m, Nu, h w , and Re, respectively. Our results indicate that machine learning models can accurately predict the condensation heat transfer coefficient of the MCD, which would be helpful for designing and optimizing the structures and operating parameters of the MCD in industry and scientific research.
Development Project (Grant No. 2020GY-105).

Figure 1. The schematic diagram and experimental apparatus for measuring condensation heat transfer of multi-channel cylinder dryer.

Figure 2. The schematic diagram of the experimental channel plate used as the test section, where the steam inlet and coolant outlet are presented and the locations of the pressure and temperature measurements are labeled.

Figure 3. The schematic figure of the cylinder dryer; the enlarged part denotes the heat transfer resistances in a cylinder dryer, including part A: steam convection heat transfer resistance, part B: condensed layer, part C: scale inside the dryer, part D: dryer shell, part E: dirt and air between the outside surface of the dryer and wet paper, part F: paper dried, part G: dryer felt and fabric, and part H: air boundary layer.

Figure 4. (a) The workflow of the machine learning model for predicting the heat transfer coefficient of small channels. (b) The artificial neural network is composed of one input layer, one output layer, and two hidden layers, where the nodes are arranged in layers with connections, indicating learned parameters, between every node of a layer and every node of the next layer. (c) The machine learning model of random forest.

Figure 6. Pearson's correlations between input layer features and output layer target.

Figure 7. The linear correlation between experimental and predicted overall HTC for the four ML models: ANN, RF, linear SVR, and SVR based on BOA. For each plot, the blue triangle points denote the training data, the red points denote the test data, and the black line denotes the reference for fitting the results.

Figure 8. (a) Mean absolute percentage error of the different models, including linear SVR, RF, ANN, and SVR based on BOA, and (b) the feature importance based on the RF algorithm. The importance is ranked according to which features are the most informative for the algorithm to make a decision.

Figure 9. The linear correlation comparison between experimental and predicted overall HTC from the Shah correlation and four ML models: BOA-based ANN, RF, linear SVR, and SVR.

Table 1. Characteristics of measuring devices.

Table 2. Analysis of relative uncertainties.