Application of Data Mining Techniques for the Investigation of Factors Affecting Transportation Enterprises

 Abstract — In land transportation, given the importance of this business and the competition among the organizations active in this field, applying new technologies to management and decision-making can be beneficial. We present three data mining techniques, clustering, association rules, and classification, to investigate the factors affecting the cost and time of road and rail transportation. Using K-means-based methods and comparing four clusterings, we apply Naive Bayes (probabilistic) classification and reach a total transportation accuracy of 97.91%. Finally, classification tree algorithms such as Bayesian theory and random forest are used, and their results and output rules are compared. This article is comprehensive and novel in its use of various effective parameters in land transportation. We confirm its efficiency using the 5V criterion (explained in its place) and then with results from the field; a larger setting, called land transit, could be generalized between the two countries. The methodology and results are discussed in more detail below.


INTRODUCTION
In today's world, data and information are considered organizational wealth. The world's largest and most successful companies are always looking for more appropriate commercial use of these virtual resources. On the other hand, with the growing complexity of business environments, the nature and volume of organizational data have become increasingly complex.

Theoretical Foundations and Research Background
So far, several definitions have been proposed for data mining.
In some of these definitions, data mining is a tool that enables users to communicate directly with large volumes of data. In some more precise definitions, data exploration has also been considered. Here are some of these definitions.
The term data mining refers to the semi-automated process of analyzing large databases to find functional patterns.[2] Data mining means searching in a collection of data to find patterns among them.[3] Data mining means extracting new and citable knowledge from a large database set.[3] Data mining means analyzing the observable data set to find reliable relationships between data.[4]

2-1. Data Mining Process
According to the Cross-Industry Standard Process for Data Mining (CRISP-DM), six phases can be considered for data mining projects: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.[5] This standard process, shown in Figure 1, does not move linearly; instead, it is possible to go back at any stage, and the process is repeated until the best result is achieved. We have also defined the acronyms for each factor influencing transportation in Table 2, so that the meaning of each factor is easy to understand.

K-Means Algorithm
Clustering is one of the branches of unsupervised learning. It is an automated process during which samples are divided into classes, called clusters, whose members are similar to each other. Thus, a cluster is a set of objects that are similar to each other and dissimilar to objects in other clusters. Different criteria can be considered for similarity; for example, a distance metric can be used, and points closer to each other can be considered a cluster, which is called distance-based clustering. One of these methods is k-means clustering.[2] In a simple form of this method, the required number of cluster centers is first selected randomly from the points. Then each data point is assigned to one of these clusters according to its degree of proximity (similarity), and thus new clusters are obtained. By repeating the same procedure, new centers can be calculated in each iteration as the average of the assigned data, and the data are re-assigned to the new clusters. This process continues until the assignments no longer change.[9]

Table 3. Literature review of different methods
As Table 3 shows, previous studies of the factors affecting transportation have used limited parameters, such as vehicle speed or transport flow, and methods such as clustering, classification, and ANN. The present research, in contrast, is comprehensive in terms of the factors affecting transportation (Table 1): clustering in Rapid Miner and classification tree algorithms such as Bayesian theory and random forest have been combined, and their results and output rules have been compared. With reference to Table 1, which includes the factors affecting transportation, and in comparison with previous studies (the table above), if a simple weighting method is used (assuming the weight of all factors is equal), the scores are as follows:
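The iterative procedure just described can be sketched in a few lines. The following is a minimal NumPy illustration on synthetic two-dimensional blobs (hypothetical data, not the paper's shipment data set or its Rapid Miner configuration):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: pick k random centers, assign points to the
    nearest center, recompute centers as cluster means, repeat."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance of every point to every center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # assignments stable: the data no longer change clusters
        centers = new_centers
    return labels, centers

# two well-separated synthetic blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centers = kmeans(X, k=2)
```

On well-separated data like this, the loop typically converges in a handful of iterations to the two underlying blobs.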

Table 4. Different factors affecting transportation, based on various methods
The parameters of Table 4, which have been used to analyze the strengths and weaknesses of each of the proposed models, have been extracted from the literature review. This table uses a simple scoring method (the weight of all parameters is assumed equal and set to 10 for ease of operation). For example, the two-component clustering model has a positive score of 20 and has weaknesses, including inattention to demand data, amount of demand, demanded tonnage (kg), etc., which in total give it a negative score of minus 120, while our new research, being comprehensive, has a positive score of 140. A table has then been compiled from the shipments exported over the last year and six months by an international transport company. First, the data were clustered using Rapid Miner, and then association rules were extracted. Finally, classification tree algorithms such as Bayesian theory and random forest were used, and the results and output rules were compared.

Clustering Assessment (Davies-Bouldin Index) [2]
There are different criteria for evaluating clustering algorithms, which can be divided into unsupervised and supervised classes. Unsupervised evaluation criteria, sometimes referred to as internal criteria in the scientific literature, use only the information contained in the data set itself. The most important task of a clustering algorithm is to minimize the within-cluster distance and maximize the distance between clusters.
Hence, the two important factors used in all evaluation criteria are cluster cohesion and cluster separation, which correspond, respectively, to minimizing the distance within each cluster (maximizing its density) and maximizing the separation between clusters.
One of these criteria is the Davies-Bouldin index, which is defined based on these two essential factors. The separation factor is defined as dis(c_i, c_j), where dis is the Euclidean distance between the centers of two clusters. The cohesion factor is defined as coh(C_i) = (1/|C_i|) Σ_{x ∈ C_i} dis(x, c_i), where x is a point inside the cluster and c_i is the center of the cluster. The general Davies-Bouldin formula is then DB = (1/NC) Σ_{i=1}^{NC} max_{j ≠ i} [(coh(C_i) + coh(C_j)) / dis(c_i, c_j)], where NC is the number of clusters.
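Under the definitions above, the index can be computed directly. The following is an illustrative sketch (the function name and the synthetic blobs are our own, not from the paper):

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: average, over clusters, of the worst
    (cohesion_i + cohesion_j) / separation(i, j) ratio."""
    ids = np.unique(labels)
    centers = np.array([X[labels == c].mean(axis=0) for c in ids])
    # cohesion: mean distance of a cluster's points to its own center
    coh = np.array([np.linalg.norm(X[labels == c] - centers[k], axis=1).mean()
                    for k, c in enumerate(ids)])
    nc = len(ids)
    total = 0.0
    for i in range(nc):
        total += max((coh[i] + coh[j]) / np.linalg.norm(centers[i] - centers[j])
                     for j in range(nc) if j != i)
    return total / nc

# two dense, well-separated clusters should give a value near zero
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(5, 0.3, (40, 2))])
labels = np.array([0] * 40 + [1] * 40)
score = davies_bouldin(X, labels)
```

A lower score indicates denser, better-separated clusters, which is why the paper later picks the cluster count that minimizes it.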

FP-Growth Algorithm
There are many algorithms for generating association rules. Two of the most widely used are the Apriori and FP-Growth algorithms; in this research, we use the second.
FP-Growth identifies frequent patterns without producing a set of candidate itemsets, with the help of a tree data structure. The algorithm, which follows a divide-and-conquer strategy, first transforms the data set into an FP-tree and then extracts frequent patterns directly from this tree.
Concisely, an FP-tree is a compact representation of the transaction data set. To build the tree, transaction items are read one by one and mapped as a path on the tree.
Because transactions typically share common items, the transaction paths overlap on the tree, and because of this, the tree is more compact than the initial data. In the first database scan, the one-item sets and their supports are identified, and the set of frequent items is arranged in descending order of support (the list L). A tree is then constructed as follows: first, the tree's root is created with the null label; the data set is then scanned a second time.
The items in each transaction are processed in L order, and a branch is created for each transaction. To facilitate tree navigation, a header table is created in which each item points to its occurrences in the tree. The tree is complete after all transactions have been scanned.[2][9]
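FP-Growth itself requires the full FP-tree machinery. As a compact illustration of what "frequent itemsets" means, the following brute-force enumerator (explicitly not FP-Growth, and far less efficient) finds the same itemsets on a toy transaction set with hypothetical shipment items:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force frequent-itemset enumeration. FP-Growth returns
    exactly the same itemsets, but via an FP-tree and without
    candidate generation; this naive version is only for illustration."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for cand in combinations(items, size):
            supp = sum(1 for t in transactions if set(cand) <= t) / n
            if supp >= min_support:
                result[cand] = supp
                found_any = True
        if not found_any:
            break  # anti-monotonicity: no larger itemset can be frequent
    return result

# hypothetical shipment transactions (origin, destination, load type)
T = [{"Sahlan", "Van", "container"},
     {"Sahlan", "Van", "container"},
     {"Zanjan", "Ankara", "zinc"},
     {"Sahlan", "Van", "zinc"}]
freq = frequent_itemsets(T, min_support=0.5)
```

Here {Sahlan, Van} appears in 3 of 4 transactions (support 0.75) and {Sahlan, Van, container} in 2 of 4 (support 0.5), mirroring the kind of origin/destination/load patterns reported later in the paper.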

Classification Algorithm
Bayesian theory is one of the statistical methods of classification.
In this method, different classes are each considered as a hypothesis with probability.
Each new training record increases or decreases the probability that the previous hypotheses are correct. Finally, the hypotheses with the highest probability are chosen as the class and labeled. This technique combines Bayes' theorem with a causal relationship between the data.
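As a sketch of how class hypotheses with probabilities can be turned into labels, here is a minimal from-scratch Gaussian Naive Bayes on synthetic numeric data (an assumed illustration, not the Rapid Miner operator used in the paper):

```python
import numpy as np

class SimpleGaussianNB:
    """Each class is treated as a hypothesis; the posterior probability
    of each hypothesis given a record decides the label."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        # prior P(c): relative frequency of each class in the training data
        self.priors = np.array([(y == c).mean() for c in self.classes])
        # per-class feature means and variances (Gaussian likelihoods)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.vars = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        # log P(c | x) up to a constant: log P(c) + sum_j log N(x_j; mu_cj, var_cj)
        scores = []
        for prior, mu, var in zip(self.priors, self.means, self.vars):
            ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)
            scores.append(np.log(prior) + ll)
        return self.classes[np.argmax(scores, axis=0)]

# synthetic, well-separated two-class data (hypothetical, not the shipment set)
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(6, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
acc = (SimpleGaussianNB().fit(X, y).predict(X) == y).mean()
```

The "naive" part is the per-feature independence assumption inside each class, which is what makes the likelihood a simple product (a sum in log space).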

The tree-based random forest produces many decision trees. To classify a new object, the input vector is run down each of the random forest's trees. Each tree provides a classification, giving that class a "vote," and the forest chooses the classification with the most votes among all the trees.[10] Each tree is formed as follows:
1. If N is the number of records in the training data set, N records are sampled at random with replacement from the original data. This sample is the training set for the tree.
2. If there are M input variables, a number m smaller than M is chosen so that at each node, m variables are randomly selected out of the M, and the best split on these m variables is used to split the node. The value of m is held constant during forest construction.
3. Each tree is grown as large as possible; there is no pruning.
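Steps 1 and 2 above (bootstrap sampling and random feature-subset selection) can be illustrated as follows on stand-in data; growing the actual unpruned trees of step 3 is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, m = 100, 6, 2          # N records, M variables, m variables per node (m < M)
X = rng.normal(size=(N, M))  # stand-in training data

# Step 1: bootstrap sample - N records drawn at random WITH replacement
boot_idx = rng.choice(N, size=N, replace=True)
X_boot = X[boot_idx]
# with replacement, roughly 63% of the original records appear in each sample
frac_unique = len(np.unique(boot_idx)) / N

# Step 2: at each node, m of the M variables are chosen at random and
# only they are examined for the best split (m is fixed for the forest)
feature_subset = rng.choice(M, size=m, replace=False)
```

The records left out of a given bootstrap sample (the "out-of-bag" records) are what make the forest's internal error estimate possible.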

Data Mining in Transportation
In transportation engineering, a great deal of data has been collected in areas such as traffic, accident analysis, asphalt conditions, road durability, maintenance management, and so on. Based on these data, decision-makers decide how to solve a related problem. Decision-makers are always looking for ways to access and use different data easily. The ability to identify available data, determine data features, extract required data, and convert data to the required formats is among the requirements for planning. In the real world of transportation, many different sources of information have to be merged to reach a solution. Recent research in transportation data mining has created new horizons for transportation-engineering decision-makers.
In 2014, Li and Chen combined three data mining methods, K-means clustering, decision trees, and neural networks, to predict freeway travel time under non-recurrent congestion.[11] Ahmadian et al. used the Apriori algorithm to obtain the association rules present in the time/place intervals of problems. They also used a two-step technique to identify co-occurrences over a certain period. Finally, they showed that the resulting rules can be used to determine the needs of citizens and improve the management of municipal services.[12] In his research on time series prediction, Peter Zhang used a combined Autoregressive Integrated Moving Average (ARIMA) model and neural networks. He concluded that the use of time series models alone does not match the accuracy and power of neural networks.
He proposed combining artificial neural network models and ARIMA as an optimal solution for predicting time series, claiming that a combination of artificial neural networks as a nonlinear model and the ARIMA model as a linear model performs better in time series prediction than either model alone. To prove this claim with empirical data, he first modeled the data set using the ARIMA model, compared the output with the output of his hypothesized combined model, and on that basis confirmed his hypothesis.[13] Lathia, Froehlich, and Capra, in an article on mining public transport usage for personalized intelligent transport systems, used data mining tools and techniques to reduce passengers' travel time to their destinations and even predicted travelers' travel patterns. A case study was carried out using the data available in the London Underground transportation system database. It estimated the travel time of each passenger and extracted the stations used most by each passenger; by presenting a model, it offers the passenger the route with the shortest travel time. Another purpose of the article was to determine the most used stations, because this information effectively improves and changes the travel plans of those stations.[15] In their article, Adams, Akano, and Asemota predicted the amount of electricity needed in Nigeria. The goal was to predict the amount of electricity needed by the country in the next ten years, given the amount of electricity generated between 1970 and 2009. The motivation was that in 2005, due to insufficient electricity production and reduced rainfall, only 40% of the country's electricity needs were met. Therefore, the group's objective was a model that could make this prediction. They used the ARIMA model for forecasting and explained the reason as follows: ARIMA is a general model that can fit data and series with both seasonal and non-seasonal periods. To determine the order of the model, they used the ACF and PACF. After modeling the test data with different autoregressive, moving average, and differencing orders, and evaluating the outputs based on the stationary R2 and Bayesian Information Criterion (BIC) parameters, the group selected the best model for the amount of required electricity and predicted the electricity demand for the next ten years.[17]

Crisp Process
America's most extensive software and hardware electronics company, National Cash Register (NCR), as part of its goals, has established teams of consultants and data mining technicians to serve the needs of its customers and give more value to its data warehouse (Teradata) customers.[29] The data mining process in the CRISP data mining standard is carried out in six stages, following the knowledge discovery method presented by Usama Fayyad et al.[6] In the following, the method of conducting the research based on these steps is described.

Understanding the Business Environment
Understanding Business Objectives: The first step in any data mining project is to determine what is expected from the implementation of the project.
Understanding the goals: Business goals are stated in business terms, while data mining goals should be examined in terms of data mining and its techniques. The most critical output at this stage is the list of data mining objectives. The criteria for data mining success must also be determined.
Preparing a project program: At the end of the first stage of CRISP, a program similar to that of other studies should be set up. In addition to specifying the details of the research activities and their timing, the methods and tools required in the project must be selected in this program. The output of this part of the process takes the form of implementation documents and a research schedule. The program of the present research was presented in advance in the form of a proposal.

Understanding The Data
Collection of initial data: In this stage, depending on the type of project, data are collected from different sources, via questionnaires or systematically.
Data Description: Data description can be defined as the preliminary examination of the data and the extraction of the initial features of the data set.
In this research, the data format, the number of variables, the type of variables (continuous or discrete), and the meaning of the variables are examined at this stage. In this section, it was also determined, according to the number of records, whether the type and volume of the data meet the requirements of the research.
Data discovery: After a preliminary description of the data, a closer look at the data should be taken. At this stage, the main features of the data should be extracted using statistical methods, and the features of the data should be depicted with the help of visualization tools. The output of this step is usually a report in which data errors, missing values, and duplicate records are identified. Data errors are meaningless values and outliers that affect the results.

Data Preparation
Data selection: In some problems, due to the large volume of records available, sampling methods are used to reduce the volume of data.
Data cleaning: In this section, the missing values should be replaced using the output report of the data quality check from the previous step. Also, the meaningless values of the variables should be deleted or replaced with meaningful values according to the report on the meanings of the variables from the previous step.
Reducing variables: At this stage, after data cleaning, existing feature selection methods should be used to find the influential variables and eliminate low-impact and ineffective variables.
This step is significant: when the number of variables is large, it reduces the implementation time and improves modeling efficiency.

Modeling and Evaluations
Selection of the modeling method: In this step, several classification or clustering techniques are first selected based on the data type. Depending on the preprocessing stage, we may need to return to this stage based on the evaluation parameters and examine different classification techniques and parameters.
Evaluation of results: In this step, the evaluation criteria of the model can be selected according to the field of research and a study of valuable articles and analyses done on the data; the best method and class in each step is selected based on these criteria. Where the cost is low and the budget is sufficient, the created model can be tested on other real data sets, and the accuracy of the developed model can be examined further.

Model Data Collection and Information
The collected data set includes 1,389 records related to the transportation of various products of a transport company over multiple types of railways and roads. The company exports products, including zinc ingots and aluminum ingots, to Turkey. The data consist of 14 numerical and discrete columns; the features are defined in Table 5. In this step, after describing the data with a diagram, if there are missing data, the approach of deleting or replacing the missing data is selected based on their number. Duplicate rows and extra variables are also checked in the software.

Clustering
For analyzing the transportation data, the first step is the clustering and classification of the load data; the k-means clustering model is used. First, the number of clusters is entered manually, and the optimal number of clusters is determined with the Davies-Bouldin evaluation parameter. In the analysis step, the optimal number of clusters is evaluated with the output of the centroid table.

Extracting Data of Association Rules
The process of discovering the dependency rules is one of the fundamental approaches in modern data mining science to find directions and patterns in the database that researchers have highly regarded.In the data extraction step, the objective is to generate association rules from the data set to extract the relationships between features in Transportation.
The basic assumptions are as follows: Consider the set I = {I1, I2, ..., In} as a set of items.
Consider D as a set of data, that is, database transactions, such that each transaction contains a collection of items; each transaction T is a subset of I (T⊆I). Each transaction has an identifier called a TID. If A is a set of items, we say that transaction T contains A if and only if A⊆T.
An association rule is a statement of the form A⇒B, where A⊆I, B⊆I, and A∩B = ∅. The criteria for the acceptability of association rules are two important parameters: the support and the confidence of the rules.
The rule A⇒B has support s in the transaction set D if s percent of the transactions in D include A∪B, and it has confidence c if c percent of the transactions that contain A also include B. Rules whose support is at least a minimum support threshold (min-sup) and whose confidence is at least a minimum confidence threshold (min-conf) are called strong association rules, and the objective of all algorithms is to find such rules (or, as will be seen in some cases, a subset of them). For example, a rule might state that 70% of all transactions involve both a TV and a video player (support), and that 80% of those who buy a TV also buy a video player (confidence).
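The two acceptability parameters can be computed directly from a transaction set. The shipment items below are hypothetical stand-ins for illustration:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(A, B, transactions):
    """conf(A => B) = support(A U B) / support(A)."""
    return support(A | B, transactions) / support(A, transactions)

# hypothetical shipment transactions (origin, destination, load type)
T = [{"Sahlan", "Van", "container"},
     {"Sahlan", "Van", "container"},
     {"Zanjan", "Ankara", "zinc"},
     {"Sahlan", "Van", "zinc"}]
s = support({"Sahlan", "Van"}, T)       # 3 of 4 transactions -> 0.75
c = confidence({"Van"}, {"Sahlan"}, T)  # every Van shipment came from Sahlan -> 1.0
```

Note that confidence can never be lower than the support of the combined itemset, since it divides that support by the (smaller or equal) support of the antecedent alone.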

4.3. Data Extraction (Classification Algorithms)
After clustering the load data and analyzing the clusters, ensemble tree classification algorithms such as random forest and the probabilistic Naive Bayes classifier are used, and the results and output rules are compared. The data must be divided into training and test sets. In the simple validation method, at all stages of implementation, for example, 70% of the data set is used for training and the rest for testing and calculating the indicators. A more robust alternative, used in this study, is the cross-validation method, which allows the whole data set to serve both for training and for testing. In this type of validation, the data are subdivided into K subsets; of these K subsets, one is used for validation and the other K-1 for training. This procedure is repeated K times, so that all data are used exactly once for validation. Finally, the average result of these K validation rounds is chosen as the final estimate.
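The K-fold splitting described above can be sketched as an index-partitioning helper (our own minimal version, not the software's implementation):

```python
def kfold_indices(n, k):
    """Indices for K-fold cross-validation: each of the k folds serves
    once as the validation set while the other k-1 folds train the model."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits

splits = kfold_indices(10, 5)  # 5 (train, validation) pairs over 10 records
```

Averaging a model's score over the k validation folds gives the final estimate the paper refers to; in practice the records are usually shuffled before splitting.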

Criteria for Evaluating Classification Algorithms
This matrix shows how the classification algorithm performs on the input data set by comparing the predicted and actual classes of the classification problem. According to Table 6, the concepts of True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) are as follows. True Positive: This value indicates the number of records whose actual class is positive, and the classifier has correctly identified their class as positive. False Positive: This value indicates the number of records whose actual class is negative, and the classifier has incorrectly identified their class as positive.
True Negative: This value indicates the number of records whose actual class is negative, and the classifier has correctly identified their class as negative.
False Negative: This value indicates the number of records whose actual class is positive, and the classifier has incorrectly identified their class as negative.
The essential types of evaluation criteria for classification algorithms are now described. The most critical measure is accuracy, or the classification rate, which is the percentage of test records that the classifier has classified correctly. This accuracy is calculated from the concepts of the confusion matrix according to the relation Accuracy = (TP + TN) / (TP + TN + FP + FN). Next, based on the definition of the confusion matrix and the total-accuracy evaluation parameters, the results are reported from the software output and then analyzed.
Finally, with the conclusion and analysis of clustering, the strategies in Transportation will be reported.
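The accuracy relation can be expressed as a one-line helper; the confusion-matrix counts below are hypothetical, chosen only to show the arithmetic:

```python
def total_accuracy(tp, tn, fp, fn):
    """Fraction of test records the classifier labeled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# hypothetical confusion-matrix counts for a binary problem
acc = total_accuracy(tp=90, tn=49, fp=1, fn=2)  # 139 correct out of 142
```

The same four counts also yield precision (TP / (TP + FP)) and recall (TP / (TP + FN)) when class imbalance makes accuracy alone misleading.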

Analysis of Total Accuracy Data Mining Process Results
In this section, based on the CRISP process model, the steps of data comprehension, data visualization, data preparation, and cleaning are described as run in the software.
Then the clustering and classification results are reported in the form of a table. The results of the optimal cluster analysis and the cluster centroid table are described at the end. Then, in the classification section, the random forest and Naive Bayes methods are applied to the clustered data set; the accuracy of the models is reported and then evaluated.
The statistical information of the data set, according to Figure 4, is as follows. In the statistical information section, the first column is the name of the data; the second column is the data type; the third column is the number of missing values for each variable; the fourth and fifth columns are the maximum and minimum of the numerical variables and the minimum and maximum values of the discrete variables; and the last column shows the mean and standard deviation of the numerical variables and the mode of the discrete variables. There are only two rows of missing data, in the supply column. The demand date variable is of the solar (Persian) calendar type and is treated as a discrete variable in the software. The amount of supply and demand ranges from 1 to 18, with an average of 1.698. Loading tonnage ranges from 4,900 kg to 487,070 kg. According to Figure 6, 38% of the destinations are to Van, Turkey, 13% to Ankara, 10% to Mersin, 6% to Gaziantep, 6% to Muratbey, and 5% to Konya, with the rest to Adana, Izmir, and other border cities of Turkey.
Estimated transportation time is defined as a minimum of 3 days to 9 days and an average of 4.61.
Transportation time is defined from the day of demand to unloading from a minimum of 3 days to 12 days.
The weather condition is defined in 232 lines in intemperate mode and 1156 lines in temperate mode.
46% of loads are in truck mode and 53% in rail mode. 96% of loads do not need a license for transportation.
The information covers 2017 to 2019. The maximum range of demanded and loaded tonnage is defined for truck transportation up to an average of 475,000 kg, while rail transportation ranges only from 25,000 to 50,000 kg.
For rail transportation, only the city of Sahlan is the origin of Transportation, and other loadings are defined in truck transportation in other cities.
In rail transportation, only the cities of Van, Gaziantep, Mersin, and Iskenderun are transportation destinations, and the rest are related to truck transportation.
The maximum transportation time, from the day of demand to unloading, lies in a range of three to 12 days for truck transportation, while rail transportation averages 5 days with a maximum of 8 days.
The highest transportation cost is related to truck transportation up to 1,500,000,000 Rials, and the difference in cost between rail and truck transportation is noticeable.

Pre-processing and Preparation of Data

5.1.1. Additional Variables
In the transport data analysis section, the first requirement of the clustering step is numerical and continuous descriptive variables. Given the estimated time variable and the time interval between transportation and unloading, the demand date feature, which is of the solar calendar type, is left out of the data in the modeling and knowledge extraction.

5.1.2. Deleting Missing Values
As mentioned in the data description section, only two rows of data are missing, in the supply column. One approach is to delete a row with missing data; the disadvantage of this method is the loss of the useful information in the other features that do have values. One of the best approaches is to replace the missing value with the average value of the same column, which keeps the distribution of the data largely intact. Therefore, in the software, mean replacement of the missing data is used for the numerical supply variable.
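Mean replacement of the missing supply values can be sketched as follows (illustrative values, not the actual data set):

```python
import numpy as np

def impute_mean(column):
    """Replace missing (NaN) entries with the mean of the observed values,
    so rows with other useful features do not have to be dropped."""
    col = np.asarray(column, dtype=float)
    out = col.copy()
    out[np.isnan(out)] = np.nanmean(col)
    return out

supply = [2.0, np.nan, 1.0, 3.0, np.nan]  # hypothetical supply column
filled = impute_mean(supply)              # NaNs become mean of [2, 1, 3] = 2.0
```

Mean imputation preserves the column mean; with only two missing rows out of 1,389, its distorting effect on the variance is negligible.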

Results of K-Means Modeling (clustering)
In this step, the K-means clustering algorithm is applied to the cleaned data. The number of clusters k is simulated and evaluated with the Davies-Bouldin parameter at each step, and the optimal number of clusters is chosen based on the evaluation value and the analysis of the results.
For clustering, the data should be converted to numerical values. The results of the Davies-Bouldin evaluation parameter for 3 and 4 clusters are reported and analyzed in the comparison section. The Davies-Bouldin evaluator relates the within-cluster distances to the between-cluster distances; a successful model is one in which the samples within a cluster are closer and denser, and farther away from the samples in other clusters. This means that the closer the Davies-Bouldin value is to zero, the more successful the model and the more optimal the number of clusters. According to Table 4, the lowest value of the evaluator is obtained with 4 clusters and the K-means algorithm; therefore, in the next section, the results of 4-means clustering are reported for classification.

K-means Clustering Analysis
In this step, according to the software results, the clusters are analyzed by the number of samples in each cluster. According to Figure 8, the number of samples is reported separately by cluster: cluster zero contains 1,180 items, the large majority of the samples; cluster 1 contains only four items; cluster 2 contains 24 items; and cluster 3 contains 180 items. The following output (Figure 9) also indicates which cluster each load transaction is in; this table, obtained with the 4-means method, is shown as Table 7. According to Table 7, the analysis of each cluster is as follows. Cluster 1 (0.28%): the loads transported in this group are the most expensive in terms of transportation cost compared to the other clusters (up to 1 billion more), and the average loading tonnage is the highest; the highest amounts of supply and demand are also in this group.
100% of the loads in this group do not require a license, and the transportation type of all transactions is 100% truck; there is no rail transportation. The origin of all loads is the city of Qazvin, with 75% going to the city of Adana and 25% to Muratbey. The load type is 75% rebar and 25% zinc ingot. The maximum transportation time from the day of demand to unloading occurs in the loads of this cluster. Only four items, with very different and unique features, are in this group.
Cluster zero (frequency 85.01%): The most loads (1,180 items) are in this group, which, on average, has the lowest cost compared to the other clusters and the shortest transportation time from the day of demand to the day of unloading. 99% of rail loads are in this cluster; apart from a tiny percentage in cluster 3, the other groups contain no rail loadings. This cluster also contains 37% truck loading. By far the lowest amounts of supply and demand are in this cluster. The origins are 62% from Sahlan and 26% from Zanjan, with the rest, in small percentages, from Qazvin, Tabriz, and Yazd; the destinations cover almost all cities, but the most frequent unloading place is Van, Turkey (42%). 37% of the load type is container and 22% is zinc ingot.
Cluster 2 (1.72%): After cluster 1, the relatively high transportation costs belong to the loads of this group. The highest loading tonnage after cluster 1 and the highest amount of supply and demand, with an average of 6.1, belong to the items in this group. 5% of the loads in this group need a license for transportation, and 40% of the transportation took place in intemperate weather conditions, whereas in the other clusters most of the transportation was in temperate conditions. 41% of the loads were sent from Rasht, 29% from Qazvin, and 20% from Qazvin; 33% of the destinations were Van and the rest Adana and Ankara. 100% of the loading was of truck type.
Cluster 3 (13.07%): Transportation costs in this group are average, less than clusters 1 and 2 and more than cluster zero. What distinguishes this group from the others is the estimated transportation time and the transportation time from the day of demand to the day of unloading, which are longer than in the other groups. Loading tonnage, supply, and demand in this cluster are average, and 6% of the loads need a license. 23% were transported in intemperate weather conditions. 6% of the loading is of the rail type, and 56% of the loading is sent from Zanjan to all Turkish cities on average.

5.3. Effective Variables in Load Analysis
In the discussion of analysis and classification of clusters, the deviation diagram (Figure 10) examines the factors affecting the classification of loads in the transportation system.In this figure, a unique color line is provided for each cluster; which specifies the average data scores for each column in the corresponding cluster.The low score of the blue cluster (cluster zero) shows the load with the lowest cost.
The red line is related to cluster 3 (medium cost loads and average supply and demand).
The green line is related to cluster 2 (loads with relatively high cost(. Indigo line has the highest score related to cluster 1 (highcost loads and high cargo, and so on).
However, the principal analysis of this diagram is separating the lines from the transportation time and transportation cost.The loading tonnage shows that these three factors are more effective in separating the clusters relative to the factors of weather conditions, type of load and origin, and destination.
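The deviation-diagram idea can be sketched in code: express each feature's per-cluster mean as a z-score and compare how far the cluster means spread apart. The records and feature values below are illustrative stand-ins, not the paper's data.

```python
# Sketch of the deviation-diagram idea: per-cluster feature means expressed
# as z-scores. A feature whose cluster means diverge widely (here, "cost")
# separates the clusters; one with similar means (here, "weather") does not.
# All values are illustrative, not the paper's data.

from statistics import mean, pstdev

records = [
    {"cluster": 0, "cost": 100, "weather": 1},
    {"cluster": 0, "cost": 120, "weather": 0},
    {"cluster": 1, "cost": 900, "weather": 1},
    {"cluster": 1, "cost": 950, "weather": 0},
]

def cluster_zscores(records, feature):
    """Mean z-score of `feature` within each cluster."""
    values = [r[feature] for r in records]
    mu, sigma = mean(values), pstdev(values)
    per_cluster = {}
    for r in records:
        per_cluster.setdefault(r["cluster"], []).append((r[feature] - mu) / sigma)
    return {c: mean(zs) for c, zs in per_cluster.items()}

cost_scores = cluster_zscores(records, "cost")
weather_scores = cluster_zscores(records, "weather")

# The spread between cluster means measures how well a feature separates clusters.
cost_spread = max(cost_scores.values()) - min(cost_scores.values())
weather_spread = max(weather_scores.values()) - min(weather_scores.values())
print(cost_spread > weather_spread)  # True: cost separates far better
```

A wide spread corresponds to cluster lines that sit far apart on that column of the deviation diagram.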

Modeling (Association Rules)
To generate association rules and frequent itemsets, discrete and binary features were selected and the FP-growth algorithm was run in RapidMiner to find the items that frequently occur together. The results are as follows:
1. 53% of transportations originated from the city of Sahlan, and 38% of destinations were the city of Van.
2. 46.5% of loadings were of truck transportation type.
3. 32% of loads were container type, and 24% were zinc ingots.
4. In 34% of transportations, the origin was Sahlan and the destination was the city of Van.
5. In 32% of transportations originating from Sahlan, the type of load was container.
6. In 30% of truck transportations, the loading was zinc ingots.
7. In 30% of truck transportations, the origin was the city of Zanjan.
8. In 32% of transportations destined for the city of Van, the loading was of container type.
9. In 32% of transportations, the origin was Sahlan, the destination was the city of Van, and the type of load was container.
With a minimum confidence of 70%, the if-then association rules are as follows:
1. With 83.5% confidence, if the destination of the loading is the city of Van, then the type of load is container.
2. With 83.5% confidence, if the destination of the loading is the city of Van, then the type of load is container and the origin is the city of Sahlan.
3. With 89.5% confidence, if the destination of the loading is the city of Van, then the origin is the city of Sahlan.
4. With 93.3% confidence, if the destination is the city of Van and the origin is the city of Sahlan, the type of load is container.
5. With 100% confidence, if the origin is the city of Zanjan, the transportation type is truck.
6. With 100% confidence, if the load type is zinc ingot, the transportation type is truck.
7. With 100% confidence, if the load type is container, the destination is the city of Van.
8. With 100% confidence, if the load type is container, the destination is the city of Van and the origin is the city of Sahlan.
9. With 100% confidence, if the load type is zinc ingot, the origin is Zanjan and the transportation type is truck.
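The confidence values behind rules like these come from a simple ratio of itemset supports. A minimal sketch, with illustrative transactions standing in for the load records:

```python
# Support and confidence, the two quantities behind FP-growth rules.
# conf(A -> B) = support(A ∪ B) / support(A).
# The transactions below are illustrative, not the paper's data.

transactions = [
    {"origin=Sahlan", "dest=Van", "load=container"},
    {"origin=Sahlan", "dest=Van", "load=container"},
    {"origin=Zanjan", "dest=Ankara", "load=zinc", "mode=truck"},
    {"origin=Zanjan", "dest=Adana", "load=zinc", "mode=truck"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

# "If the origin is Zanjan, the transportation type is truck" with confidence 1.0
c = confidence({"origin=Zanjan"}, {"mode=truck"})
print(c)  # 1.0
```

FP-growth's contribution is finding the frequent itemsets efficiently; once found, every rule's confidence is exactly this ratio.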

Naive Bayes Modeling Results (Data Extraction)
In this section, two classifiers, the probabilistic Naive Bayes method and the random forest ensemble method, are applied to the data set labeled with clusters, and the results of each are reported.
The data set (the clustering output) is divided into training and test data in ten steps by the cross-validation method, and the label (target) variable is the cluster, a multi-value discrete (nominal) variable.
After dividing the samples, Naive Bayes is used for classification.
The accuracy parameters specified in the output of the software are as follows: total accuracy 89.77%, with a class accuracy of 87.50% for cluster 1. The resulting confusion matrix is shown in Figure 13.
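For illustration, a compact categorical Naive Bayes can be written from scratch: class priors times per-feature conditional probabilities, with add-one (Laplace-style) smoothing so unseen values never get zero probability. The training pairs below are hypothetical, not the paper's data.

```python
# A minimal categorical Naive Bayes sketch. Training samples are illustrative
# stand-ins for the labeled cluster output; smoothing is a common default.

from collections import Counter, defaultdict
import math

def train_nb(samples):
    """samples: list of (features_dict, label) pairs."""
    label_counts = Counter(lbl for _, lbl in samples)
    value_counts = defaultdict(Counter)  # (feature, label) -> Counter of values
    for feats, lbl in samples:
        for f, v in feats.items():
            value_counts[(f, lbl)][v] += 1
    return label_counts, value_counts, len(samples)

def predict_nb(model, feats):
    """Return the label maximizing log P(label) + sum log P(value | label)."""
    label_counts, value_counts, n = model
    best, best_lp = None, -math.inf
    for lbl, c in label_counts.items():
        lp = math.log(c / n)  # class prior
        for f, v in feats.items():
            vc = value_counts[(f, lbl)]
            lp += math.log((vc[v] + 1) / (c + len(vc) + 1))  # add-one smoothing
        if lp > best_lp:
            best, best_lp = lbl, lp
    return best

samples = [
    ({"origin": "Sahlan", "mode": "rail"}, "cluster0"),
    ({"origin": "Sahlan", "mode": "rail"}, "cluster0"),
    ({"origin": "Qazvin", "mode": "truck"}, "cluster1"),
]
model = train_nb(samples)
print(predict_nb(model, {"origin": "Sahlan", "mode": "rail"}))  # cluster0
```

The "naive" assumption is the product over features: conditional independence given the class label, which is what makes the model trainable from simple counts.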

Random Forest Results
A random forest generates many decision trees. To classify a new object from an input vector, the forest runs the vector down each of its trees.
Each tree gives a classification; one can say the tree "votes" for that class, and the forest chooses the class with the most votes among all its trees. The per-cluster results of the confusion matrix are as follows:
a) Samples actually in cluster 0 mistakenly predicted in cluster 1: 0 samples; in cluster 2: 0 samples; in cluster 3: 3 samples.
b) Cluster 3 accuracy: 91.67%. Samples actually in cluster 3 and correctly predicted: 165; mistakenly predicted in cluster 0: 15; in cluster 2: 0; in cluster 1: 0.
c) Cluster 1 accuracy: 100%. Samples actually in cluster 1 and correctly predicted: 4; mistakenly predicted in clusters 0, 2, or 3: 0.
d) Cluster 2 accuracy: 54.17%. Samples actually in cluster 2 and correctly predicted: 13; mistakenly predicted in cluster 1: 0; in cluster 3: 8; in cluster 0: 3.
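The voting scheme itself is simple to sketch. The three "trees" below are illustrative threshold rules, loosely inspired by the cost and tonnage splits reported later in this paper, not the actual fitted trees:

```python
# Random-forest voting: each tree predicts a class, the forest takes the
# majority vote. The toy "trees" here are hand-written threshold rules,
# not trained models; thresholds are illustrative.

from collections import Counter

trees = [
    lambda r: "cluster1" if r["cost"] > 1_000_000_000 else "cluster0",
    lambda r: "cluster1" if r["tonnage"] > 228_000 else "cluster0",
    lambda r: "cluster0" if r["days"] < 11 else "cluster1",
]

def forest_predict(record):
    """Majority vote over all trees' predictions."""
    votes = Counter(tree(record) for tree in trees)
    return votes.most_common(1)[0][0]

cheap_load = {"cost": 150_000_000, "tonnage": 30_000, "days": 5}
print(forest_predict(cheap_load))  # cluster0 (3 of 3 votes)
```

Because each tree in a real forest is trained on a different bootstrap sample, the vote averages out individual trees' errors, which is why the ensemble behaves better on imbalanced data than a single model.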

Analysis of Results
The results of the two models, Naive Bayes and random forest, on the clustering output are summarized for analysis in Table 8. Looking closely at the results, it is evident that the accuracy of the random forest ensemble method, which uses ten decision trees for prediction, is 91.97%, higher than that of the Naive Bayes probabilistic method. Because the ratio of samples across clusters is very different and the clusters are imbalanced, model accuracy varies greatly between clusters. In the Naive Bayes method, cluster 1, which has only four items, yields a low confidence of 50%, while the other clusters average 81%, a 15% higher error than the random forest. In general, ensemble methods, by partitioning the data set, are more accurate on imbalanced data and improve the results.

Data Analysis of Random Forest Trees
The decision tree is a method whose output can be examined as a set of analyzable rules, making it a white-box method. Compared with black-box algorithms such as neural networks, its results can be studied as "if-then" rules, so the relationships between transportation factors can be analyzed. Some of the random-forest tree models are presented in the classification section.

Data Extraction (Analysis of Results)
The collected data set includes 1,388 records of the transportation of various products of a transportation company in rail and road modes. The company exports products including zinc ingots and aluminum ingots to Turkey. The data consists of 14 numerical and discrete columns. The features include origin and destination, type of load, transportation cost, estimated and actual transportation time, type of transportation, amount of supply, demanded tonnage, and loading cargo. The following information was extracted from descriptive statistics and frequency charts: the amount of supply and demand ranges from 1 to 18, with an average of 1.698.
Loading tonnage ranges from 4,900 kg to 487,070 kg. 53% of transportations originate from Sahlan and 30% from Zanjan; the rest, with much lower percentages, come from Qazvin, Isfahan, Yazd, Qom, Tehran, Ilam, Salafchegan, and Arak.
Estimated transportation time ranges from a minimum of 3 days to 9 days, with an average of 4.61. Transportation time from the day of demand to unloading ranges from a minimum of 3 days to 12 days. 46% of loads are truck type and 53% rail type. 96% of loads do not require a license for transportation, and the records span 2017 to 2019.
Demanded and loading tonnage reach up to 475,000 kg for truck transport but only 25,000 to 50,000 kg for rail transport.
For rail transportation, only the city of Sahlan is the origin; loads from all other cities travel by truck.
In rail transportation, only the cities of Van, Gaziantep, Mersin, and Iskenderun are destinations; the rest belong to truck transportation.
Transportation time from the day of demand to the day of unloading ranges from 3 to 12 days for truck transportation, while rail transportation averages 5 days with a maximum of 8 days.
The highest transportation cost belongs to truck transportation, up to 1,500,000,000 Rials, and the cost difference between rail and truck transportation is noticeable.
In the data-cleaning step, only two rows have missing values, in the amount-of-supply feature. Rather than deleting these rows, which would discard essential information in the other columns, the missing values are replaced with the column mean.
The demand-date variable is in the solar (Jalali) calendar, and since the time interval from the day of demand to the day of loading is already captured, this feature is not needed and was removed.
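The mean-substitution step can be sketched in a few lines; the supply values below are illustrative, with `None` marking the two missing entries:

```python
# Mean imputation as used in the cleaning step: missing supply values are
# replaced with the column mean instead of dropping whole rows.
# Values are illustrative, not the paper's data.

supply = [3, 5, None, 4, None]

known = [v for v in supply if v is not None]
col_mean = sum(known) / len(known)           # mean of the observed values
imputed = [col_mean if v is None else v for v in supply]

print(imputed)  # [3, 5, 4.0, 4, 4.0]
```

Row deletion would have lost the other 13 columns of those records, while mean substitution keeps them at the cost of slightly flattening the supply distribution.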

6.2.1. Data Extraction (Clustering)
To optimize the number of clusters and the quality of the clustering, the Davies-Bouldin evaluator was used: the smaller this evaluator, the more efficient the clustering, meaning within-cluster distances are smaller and between-cluster distances are larger. Based on Figure 5-1, the best clustering was obtained by K-means with k = 4.
• Cluster zero (the lowest cost, with most rail loads): the highest number of loads (85%), which on average had the minimum transportation cost, loading tonnage per kilogram, and amount of supply and demand, and the maximum transportation time from the day of demand to unloading.
These loads are mainly of rail type; 37% are containers and the rest zinc ingots, mostly originating from the city of Sahlan and sent, on average, to all cities in Turkey.
• Cluster 2 (relatively expensive truckloads in intemperate weather conditions): after cluster 1, the highest transportation cost, the highest loading tonnage, and the highest amount of supply and demand (average 6.1), with 24 samples in this group.
The difference between this cluster and cluster 1 is that 40% of its loads travel in intemperate weather conditions, mainly to Van, Adana, and Ankara.
• Cluster 3 (average indicators, with the longest transportation time interval): transportation costs in this group are average, less than clusters 1 and 2 and more than cluster zero. What distinguishes this group from the others is the gap between the estimated transportation time and the actual time from the day of demand to the day of unloading, which is larger than in other groups. On average, 180 loads, mainly zinc ingots, were sent from Zanjan to all Turkish cities.
In the data visualization section, the deviation diagram was used. Its analysis showed that transportation time, transportation cost, loading cargo, and weather conditions have a more significant impact than other factors in separating the types of transportation.
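As a sketch of how the Davies-Bouldin evaluator rewards compact, well-separated clusters, the toy 1-D implementation below uses mean absolute deviation as the scatter measure (a simplification of the standard definition) and gives a lower, i.e. better, score to tighter clusters:

```python
# Davies-Bouldin sketch: for each cluster, take the worst ratio of combined
# within-cluster scatter to between-centroid distance, then average. Lower is
# better. 1-D toy clusters keep the arithmetic visible; not the paper's data.

from statistics import mean

def davies_bouldin(clusters):
    """clusters: list of lists of 1-D points."""
    centroids = [mean(c) for c in clusters]
    scatters = [mean(abs(x - m) for x in c) for c, m in zip(clusters, centroids)]
    score = 0.0
    for i in range(len(clusters)):
        worst = max(
            (scatters[i] + scatters[j]) / abs(centroids[i] - centroids[j])
            for j in range(len(clusters)) if j != i
        )
        score += worst
    return score / len(clusters)

tight = [[1.0, 1.2], [5.0, 5.2], [9.0, 9.2]]   # compact, well-separated
loose = [[1.0, 4.0], [5.0, 8.0], [9.0, 12.0]]  # spread-out, near-overlapping
print(davies_bouldin(tight) < davies_bouldin(loose))  # True: tight is better
```

Running K-means for several values of k and picking the k with the lowest such score is the selection procedure that led to k = 4 here.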

6.2.2. Data Extraction (Association Rules)
First, frequent patterns were generated with the FP-growth method at a minimum confidence of 70%; based on these patterns, nine association rules were extracted over the qualitative indicators, as follows:
1. With 83.5% confidence, if the load destination is the city of Van, then the type of load is container.
2. With 83.5% confidence, if the load destination is the city of Van, then the type of load is container and the origin is the city of Sahlan.
3. With 89.5% confidence, if the load destination is the city of Van, then the origin is the city of Sahlan.
4. With 93.3% confidence, if the load destination is the city of Van and the origin is the city of Sahlan, the type of load is container.
5. With 100% confidence, if the origin is the city of Zanjan, the transportation type is truck.
6. With 100% confidence, if the load type is zinc ingot, the transportation type is truck.
7. With 100% confidence, if the load type is container, the destination is the city of Van.
8. With 100% confidence, if the load type is container, the destination is the city of Van and the origin is the city of Sahlan.
9. With 100% confidence, if the load type is zinc ingot, the origin is Zanjan and the transportation type is truck.

6.2.3. Data Extraction (Classification)
The following results are reported from the analysis and comparison of the classifiers' accuracy in Figure 21. Looking closely at the results, the random forest ensemble method, which uses ten decision trees for prediction, achieves 91.97% accuracy, higher than the Naive Bayes probabilistic method. Since the ratio of samples across clusters differs greatly, the clusters are imbalanced and model accuracy varies strongly between them. In the Naive Bayes method, cluster 1, which has only four items, yields a low confidence of 50%, while the other clusters average 81%, a 15% higher error than the random forest. Because of the imbalanced data, the rule-generating, white-box random forest ensemble method was used; by dividing the data set among ten decision trees, it balances the data and reports high accuracy on all classes. The transportation cost variable being the parent node in most trees indicates the impact of this variable on the clustering. Some of the valuable rules generated by the random-forest decision trees are as follows:
1. If the transportation cost is more than one billion Rials, the load is in cluster 1 (high-cost, high-demand truckloads).
2. If the transportation cost is less than 173 million Rials, the load is in the low-cost, low-demand rail cluster.
3. If the transportation cost is more than 519 million Rials, the load is in cluster 2; if it is between 200 and 519 million Rials, it is in cluster 3 (a direct impact of the transportation cost feature).
4. If the loading cargo is less than 363,165 kg, the load is in cluster zero, and if it is more than 383,360 kg, in cluster 3.
5. If the demanded cargo is less than 58,750 kg and the transportation time from the day of demand to unloading is less than 11 days, the load is in cluster zero.
6. If the demanded cargo is more than 203,000 kg and the transportation time is more than five days, the load is in cluster 2.
7. If the cargo is more than 228,000 kg and the transportation time is more than seven days, the load is in cluster 1.
8. If the amount of demand is less than 2.5, the load is in cluster zero.
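The cost-based rules above translate directly into an if-then classifier. This sketch encodes only rules 1-3, with thresholds taken from the text; `classify_by_rules` is a hypothetical helper, and loads matching none of the cost bands fall through to `None`:

```python
# Rules 1-3 from the random-forest trees, written as an if-then classifier.
# Thresholds are those reported in the text; the function name is ours.

def classify_by_rules(cost_rials):
    if cost_rials > 1_000_000_000:
        return 1   # rule 1: high-cost, high-demand truckloads
    if cost_rials < 173_000_000:
        return 0   # rule 2: low-cost, low-demand rail loads
    if cost_rials > 519_000_000:
        return 2   # rule 3, upper band
    if 200_000_000 < cost_rials < 519_000_000:
        return 3   # rule 3, middle band
    return None    # cost in a gap the stated rules do not cover

print(classify_by_rules(1_200_000_000))  # 1
print(classify_by_rules(150_000_000))    # 0
print(classify_by_rules(300_000_000))    # 3
```

Note that the stated bands leave a gap between 173 and 200 million Rials; in a real forest such gaps are resolved by other features and the majority vote.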

Conclusions and Future Suggestions
In this paper, three data mining techniques, clustering, association rules, and classification, were used to investigate the factors affecting the cost and time of road and rail transportation. In the first step, 1,388 transactions of road and rail transport were divided into four optimal clusters by the distance-based K-means method. The results showed that transportation cost, cargo, and the amount of supply and demand are by far the most effective features in separating the clusters, while weather, type of transport, and kind of load had a negligible effect on classifying transport loads. In clusters zero and one, transportation cost and supply and demand differed significantly from the other groups; in cluster 2, the intemperate weather conditions, and in cluster 3, the time interval from the day of demand to unloading, distinguished those clusters from the rest. In the association rules phase, nine rules with confidence above 85% were reported on the qualitative variables of load type and destination. In the classification step, two methods, the probabilistic Naive Bayes and the random forest ensemble, were applied to the clustering output.
The results showed that the random forest, with a total accuracy of 97.91%, classifies the clusters far more accurately than Naive Bayes. This also demonstrated the importance of ensemble (voting) algorithms for improving accuracy on imbalanced data. Eight useful rules were extracted from the random forest trees; they showed that transportation cost and loading tonnage appear as parent nodes in most trees and act as the criteria for comparing clusters and generating trees, confirming the importance of these factors in classifying the road and rail transportation system.
Researchers who wish to build on this study in the future could consider: 1) using other clustering methods, such as SOM (self-organizing maps) and fuzzy clustering; 2) using methods to optimize the number of clusters, such as the meta-heuristic genetic algorithm and particle swarm optimization.

Figure 2 :
Figure 2: Model of cumulative algorithms. To implement the classifiers, the data must be divided into two categories: training and test (Figure 2). Using the cross-validation method, the data set is divided into two classes, training and test. In the simple validation method, at every implementation stage, for example, 70% of the data set is used for training and the rest for testing and calculating the indicators. More robustly, the cross-validation method, which was used in this study, lets the whole data set serve both in training and in testing. In this type of validation, the data is subdivided into K subsets; each time, one of these K subsets is used as the test set and the remaining K-1 for training. True Positive: this value indicates the number of records whose actual class is positive and which the classifier has correctly identified as positive.
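The K-fold splitting described above can be sketched as follows; `kfold_indices` is a hypothetical helper, not RapidMiner's operator:

```python
# K-fold cross-validation: every record appears in exactly one test fold and
# in k-1 training folds, so the whole data set is used for both roles.

def kfold_indices(n, k):
    """Yield (train_indices, test_indices) pairs for n samples in k folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(kfold_indices(n=10, k=5))
print(len(folds))       # 5 folds
print(folds[0][1])      # [0, 1] -- first test fold

# The union of all test folds covers every sample exactly once.
covered = sorted(i for _, test in folds for i in test)
print(covered == list(range(10)))  # True
```

Accuracy is then averaged over the k test folds, which is how the 10-step validation in this study produces its total accuracy figures.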

Figure 5 :
Figure 5: Frequency of origin cities in Transportation

Figure 6 :
Figure 6: Frequency of destination cities in Transportation

Figure 7 :
Figure 7: Replacement of missing data

Figure 13 :
Figure 13: Results of Naive Bayes accuracy and confusion matrix

Figure 19 :
Figure 19: Comparison of the clustering evaluator using Davies-Bouldin. In the following, the results of K-means clustering are examined in terms of the number of samples per cluster and the central mean table of the clusters, and the clusters are analyzed in terms of transportation concepts (Figure 20).

Figure 20 :
Figure 20: Frequency chart of transported loads. • Cluster 1 (costly truckloads), with only four items, had the maximum transportation cost, loading tonnage per kilogram, supply and demand, and full transportation time from the day of request to unloading, and was placed in a group separate from the other clusters. All of its truckloads were sent from Qazvin to Adana, Turkey, and Murat Bay.

Table 5 :
Data and their type

Table 3 :
Comparison of two clustering methods

Table 7 :
Central table of clusters based on numerical and qualitative indicators of Transportation

Table 8 :
Classification Results