HyperETA: An Estimated Time of Arrival Method based on Hypercube Clustering

Estimated time of arrival (ETA), the predicted travel time of a given GPS trajectory, is extensively used in route planning. Deep learning has been widely applied to ETA prediction. However, such prediction tasks face several challenges, including small data sizes, low GPU precision, high training loss, and low accuracy. Herein, we present a new machine-learning algorithm called HyperETA for ETA prediction. HyperETA is based on a novel clustering method called hypercube clustering. We conducted experiments comparing HyperETA with a deep-learning-based method, DeepTTE, using taxi trajectories as a benchmark. Two variations of each method were evaluated. The results indicated that HyperETA outperformed the deep-learning approach in terms of prediction accuracy.


Introduction
Deep learning, a relatively new paradigm in artificial intelligence, has attracted substantial attention from the research community owing to its remarkable potential compared with traditional techniques. In contrast to traditional machine learning (ML) methods, deep learning can model sophisticated functions through layers of nonlinear transformations that are trainable end to end. Methods based on deep learning have considerably advanced the state of the art in diverse domains.
The success of deep learning has attracted the attention of researchers from different domains, who have explored its use for solving various problems. The problem considered in this paper is the estimated time of arrival (ETA) for a given path, which is a classic problem pertaining to travel planning, car navigation, and traffic dispatch. Numerous variables must be considered to predict the ETA accurately, and scholars have investigated different methods for improving the average accuracy of ETA predictions.
Several deep-learning-based methods have been proposed for solving the ETA problem [21] [19] [11] [16] [5] [6]. A seminal method is DeepTTE [21], which uses historical trajectories as training data and predicts the ETA of a given trajectory. DeepTTE is a hybrid model [3] that integrates recurrent neural network (RNN) and convolutional neural network (CNN) models. It has been widely cited, used as a baseline for comparison, and extended.
We present a novel machine learning algorithm called HyperETA for predicting the ETA of a given trajectory. HyperETA is based on a hypercube-clustering method [9], [10], which measures the similarities among the data of various trajectories. In a previous study, hypercube clustering outperformed other algorithms such as TraClus [12] and grid clustering. The primary concept of hypercube clustering is to represent trajectories by using hypercubes, a data structure that is more robust against noise in trajectory data. In the training phase, HyperETA builds a trajectory model from historical trajectories. During the prediction phase, a given trajectory is transformed into hypercubes; the trajectory model is used to estimate the hypercubes' times, and the ETA is computed by summing these times. Furthermore, in the proposed method, the parameters are set automatically, thus eliminating the need for guesswork or trial and error. Users can adjust the parameters by using metaparameters.
We conducted experiments to compare HyperETA with a deep-learning-based method called DeepTTE by using Chengdu taxi trajectories as a benchmark. The results demonstrated that HyperETA outperformed DeepTTE in terms of prediction accuracy.
The primary contributions of this study can be summarized as follows: 1. We designed a new ETA method based on hypercube clustering [9][10] to increase the accuracy of ETA prediction. 2. We designed an algorithm for setting parameters automatically. 3. We conducted experiments using a real-world dataset comprising GPS points generated by taxis in Chengdu. The percentage error of the proposed method on this dataset was significantly lower than that of existing methods. 4. We empirically compared HyperETA with DeepTTE, a popular deep-learning-based method. 5. We discuss a few problems associated with deep learning, including data size, GPU precision, training loss, and low accuracy.
The remainder of this paper is organized as follows. Section 2 introduces the related work, including DeepTTE and hypercube clustering. Section 3 presents the HyperETA algorithm. Section 4 describes the experiments and results. The final section presents our conclusion and an outline for future work.

Deep Learning Methods
Deep neural networks (DNNs) are considered the state of the art among ML techniques [15] and have been applied in many domains [20]. For example, NVIDIA's researchers used DNNs for hand gesture recognition [13]; they employed a high-end machine with four top-tier GPU cards and the latest deep learning software they had developed.
Training DNNs to obtain a good model involves stringent requirements. DNNs often require vast amounts of data to learn appropriate abstractions [20]. Consequently, massive computing power is required for data processing, which in turn demands expensive GPUs and compatible mainframes. Such computing power is essential for tuning the parameters that define deep learning models, such as learning rates, the number of neural network layers, and the network architecture. These parameters are typically determined intuitively and through trial and error, so each trial must run as fast as possible.
One limitation of DNNs is their high level of opacity [15]. The multilayer mathematical neuronal structure of DNNs makes them difficult to interpret (loosely defined as the science of comprehending what a model does [7]) and makes it hard to explain why certain inputs lead to certain outputs. Owing to this lack of transparency, DNNs are typically treated as black boxes [22].
Several studies have used deep learning to forecast travel times. For example, [21] used geo-convolutional neural networks in DeepTTE. DeepTTE has been widely cited, compared against, and extended [19] to predict bus travel times.
Graph convolutional neural networks (GCNNs) have been widely used to predict ETA [11] [16] [5] [6], where "Graph" refers to the required roadmaps. All of these methods accept spatiotemporal data. In addition, a GCNN-based origin-destination method [11] was proposed for ETA estimation. An origin-destination method is not concerned with paths; instead, it estimates the ETA based only on the origin and destination points. In [16], deep learning techniques such as GCNNs, long short-term memory (LSTM), and gated recurrent units (GRUs) were used to predict ETA. In [5], a hybrid spatiotemporal GCNN was used to predict ETA and possible congestion; the experimental dataset was not published online. In [6], a multilayer perceptron (MLP) was used to predict ETA.
A GCNN requires road network information to reduce the ETA problem to a graph problem, such as the travelling salesman problem (TSP). Such information may not be available in many applications. The present study focuses on methods that do not use road network information; therefore, GCNN methods were not considered in this study.
For most of the papers published thus far, the software has not been released online, and the programming language and hardware environment are not specified. In some of these studies, the experimental data are not freely available and therefore could not be included in our experiments. Some of the works require roadmaps, which is outside the focus of the proposed method.

Non-Deep Learning Methods
Trajectory clustering methods entail finding similar trajectories; hypercube clustering is one such method. A trajectory may be represented as a series of GPS waypoints or a series of handwritten position marks. By design, methods based on curve similarity, such as dynamic time warping (DTW) [8], are unsuitable for scenarios involving multiple common subtrajectories. DTW can measure the shape similarity between two trajectories; however, it cannot find common subtrajectories. Its computational cost is also high because it processes GPS points directly without a point-reduction method. Moreover, DTW is easily influenced by noisy data, which inflates the Euclidean distances and eventually breaks the alignment. Subtrajectory clustering [12] divides a trajectory into line segments; however, the time cost of finding common subtrajectories can be high when the lines are intricate. Most of these methods consider only spatial data; that is, they disregard the temporal dimension of spatiotemporal data [4]. TraClus [12] is a famous classical method in trajectory clustering [1][2][18] that processes spatiotemporal data well; it was specifically designed for trajectory clustering and is widely used for that purpose. However, hypercube clustering [9] has outperformed TraClus, and HyperETA is based on hypercube clustering.
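For reference, the textbook DTW recurrence mentioned above can be sketched in a few lines; this is a generic dynamic-programming formulation with Euclidean point costs, not the implementation used in [8] or [17]:

```python
import math

def dtw_distance(traj_a, traj_b):
    """Classic dynamic-time-warping distance between two point sequences.

    Each trajectory is a list of (x, y) points; the local cost is the
    Euclidean distance between matched points.
    """
    n, m = len(traj_a), len(traj_b)
    # dp[i][j] = minimal cumulative cost aligning traj_a[:i] with traj_b[:j]
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(traj_a[i - 1], traj_b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # insertion
                                  dp[i][j - 1],      # deletion
                                  dp[i - 1][j - 1])  # match
    return dp[n][m]
```

Because the alignment sums pointwise distances, a single noisy point inflates the total cost, which is consistent with the noise sensitivity noted above.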

ARCHITECTURE OF HYPERETA
In this section, we describe the architecture of HyperETA. Fig. 1 presents a schematic of the architecture, including the training process and the prediction process. In the training process, the trajectory model and parameters are determined; in the prediction process, the ETA of a given trajectory is predicted. Dotted lines indicate that the connected components are identical.
The following subsections describe the four essential steps of the proposed method: parameter determination, preprocessing, hypercube intersection, and ETA prediction. We designed Algorithm 1 to automatically determine the parameters for the preprocessing and ETA-prediction steps. Users input a sequence of latitudes, longitudes, or times to obtain the corresponding output y, x, or τ, respectively.
Here, γ and m are metaparameters. Their default values were 0.8 and 5, respectively, meaning that 80% of the hypercubes should contain at least five points.
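The paper defines the exact procedure in Algorithm 1, which is not reproduced here; the stated goal, choosing a cube range so that a γ fraction of hypercubes contains at least m points, can nonetheless be illustrated with the following one-dimensional sketch. The candidate-width grid and the search order are our assumptions, not the paper's procedure.

```python
def choose_cube_range(values, gamma=0.8, m=5):
    """Illustrative one-dimensional parameter search in the spirit of
    Algorithm 1 (the search strategy is an assumption): return the finest
    bin width such that at least a gamma fraction of the non-empty bins
    contains at least m points."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    for k in range(64, 0, -1):                       # k bins: finest candidate first
        width = span / k
        counts = [0] * k
        for v in values:
            idx = min(int((v - lo) / width), k - 1)  # clamp the maximum point
            counts[idx] += 1
        nonempty = [c for c in counts if c > 0]
        if sum(1 for c in nonempty if c >= m) >= gamma * len(nonempty):
            return width                             # finest width meeting the target
    return span                                      # fall back to a single bin
```

Running the same search independently over latitudes, longitudes, and timestamps would yield the three ranges y, x, and τ.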

Preprocessing
A trajectory is transformed into a hypercube sequence such that each hypercube includes a part of the trajectory. The trajectory points are sequentially scanned and segmented according to the cube range (x, y, τ). The position of a cube is the average of its internal points, and its direction is computed from the first and last points. Direction is the fourth dimension that turns cubes into hypercubes; in other words, a hypercube is a cube with a direction. If the cubes transformed from a trajectory are visualized, they appear as a sequence of boxes along the path. The direction of hypercube c is represented by the angle c.θ, defined in Equation (2), where p_1 and p_n are the first and last points inserted into c, respectively.
Furthermore, the preprocessing step prevents some fundamental problems. For a trajectory with non-fixed point intervals, this step transforms the points into fixed-range hypercubes. Moreover, it reduces computational costs by reforming the basic units of computation.
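The segmentation just described can be sketched as follows. The exact rule for deciding when a point leaves the current cube is our assumption (here, the cube is anchored at its first point); the position and direction computations follow the text.

```python
import math

def trajectory_to_hypercubes(points, dx, dy, dtau):
    """Hedged sketch of the preprocessing step: segment a trajectory into
    hypercubes of range (dx, dy, dtau). Each point is (x, y, t). A new cube
    starts whenever the next point leaves the range anchored at the cube's
    first point (an assumed segmentation rule)."""
    cubes, current = [], []
    for p in points:
        if current:
            x0, y0, t0 = current[0]
            if (abs(p[0] - x0) > dx or abs(p[1] - y0) > dy
                    or abs(p[2] - t0) > dtau):
                cubes.append(_finish_cube(current))
                current = []
        current.append(p)
    if current:
        cubes.append(_finish_cube(current))
    return cubes

def _finish_cube(pts):
    """Cube position = mean of internal points; direction from first to last."""
    n = len(pts)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    ct = sum(p[2] for p in pts) / n
    p1, pn = pts[0], pts[-1]
    theta = math.atan2(pn[1] - p1[1], pn[0] - p1[0])  # direction angle c.theta
    return {"x": cx, "y": cy, "t": ct, "theta": theta, "points": pts}
```

For example, a trajectory whose points jump beyond the x-range once is split into two hypercubes, each carrying its own averaged position and direction.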

Hypercube Intersections
This step checks the intersection relation between two hypercube sequences in four dimensions; it is invoked in the next step, ETA prediction. First, the algorithm checks whether the two sequences intersect in temporal space. If they do not overlap temporally, there is no intersection, and the check costs only constant time. If a temporal intersection exists, further examination begins; this examination covers temporal intersections, geographic intersections, and similarity in the relative directions of the hypercubes. The average time complexity of this step is O(n), where n is the number of hypercubes.
Algorithm 3 presents the hypercubes-intersection method. The input is two sequences, C_a and C_b (in temporal order), and the output is information pertaining to their intersection. If there is no temporal intersection, an empty set is output at only constant time cost. Line 1 checks whether C_a and C_b intersect in the time domain. Only hypercubes with temporal intersections are checked for geographic intersection and direction similarity. Line 3 checks two hypercubes that have a temporal intersection; f(c_a, c_b) is Equation (3). For direction checking, we define that two subtrajectories have no intersection relation if their angular difference exceeds the threshold φ. Line 4 stores the relation in E_AB.

Algorithm 3 Hypercubes-Intersection method
The output is information on the hypercube intersections, which is kept in a bipartite graph data structure G_AB = (A, B, E_AB), where each vertex in A is a hypercube in C_a, each vertex in B is a hypercube in C_b, and E_AB = {(c_a, c_b) | c_a ∈ C_a and c_b ∈ C_b}. Each edge in G_AB represents a pair of common subtrajectories because, with proper normalization, it can be shown that the two subtrajectories satisfy the definition of common subtrajectories.
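A hedged sketch of this intersection test is given below. The overlap predicates (axis-aligned range tests with tolerances dx, dy, dτ) are our assumptions; Algorithm 3's actual predicate f(c_a, c_b) is Equation (3) in the paper. For clarity the sketch uses a double loop; a merge-style sweep over the time-sorted sequences would yield the O(n) average cost stated earlier.

```python
import math

def hypercubes_intersection(C_a, C_b, dx, dy, dtau, phi):
    """Sketch of the hypercubes-intersection check. Each cube is a dict with
    keys x, y, t, theta; sequences are sorted by t. Returns the bipartite
    edge set E_AB as (index_in_A, index_in_B) pairs."""
    if not C_a or not C_b:
        return []
    # Line-1-style constant-time rejection: whole temporal extents disjoint.
    if C_a[-1]["t"] + dtau < C_b[0]["t"] or C_b[-1]["t"] + dtau < C_a[0]["t"]:
        return []
    edges = []
    for i, ca in enumerate(C_a):
        for j, cb in enumerate(C_b):
            if abs(ca["t"] - cb["t"]) > dtau:
                continue                       # no temporal intersection
            if abs(ca["x"] - cb["x"]) > dx or abs(ca["y"] - cb["y"]) > dy:
                continue                       # no geographic intersection
            diff = abs(ca["theta"] - cb["theta"]) % (2 * math.pi)
            diff = min(diff, 2 * math.pi - diff)
            if diff <= phi:                    # similar direction
                edges.append((i, j))
    return edges
```

Each returned pair corresponds to one edge of G_AB, i.e., one candidate pair of common subtrajectories.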

ETA Prediction
This step builds on the techniques described in the previous subsections. Algorithm 4 is the primary method for predicting ETA. The input is a given hypercube series and a trajectory model (TM); the model includes a hypercube series, the original trajectories, and a mapping table from the hypercubes to the original trajectories. The remaining inputs x, y, and τ are determined by Algorithm 1, and the input θ is the angular difference between two hypercubes that Hypercube Intersection() tolerates. The output is the predicted ETA of C_given. Lines 2-6 predict the ETA of each given hypercube. In Line 3, the Hypercube Intersection() function is Algorithm 3. In Line 4, we further apply DTW [17] to find the most similar subtrajectory in the set C, which increases accuracy. The next GPS point on that subtrajectory is counted as well, and the time cost of the subtrajectory is accumulated into TotalTime.
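The accumulation loop of Algorithm 4 can be sketched as follows. The model is simplified to a flat list of hypercubes annotated with observed travel times (the "seconds" field and the nearest-match selection are our assumptions), and the DTW refinement of Line 4 is omitted.

```python
import math

def predict_eta(C_given, model_cubes, dx, dy, phi):
    """Sketch of Algorithm 4's main loop: for each given hypercube, find the
    nearest intersecting model hypercube with a similar direction and add its
    observed travel time to the running total."""
    total_time = 0.0
    for c in C_given:
        best = None
        for mc in model_cubes:
            if abs(c["x"] - mc["x"]) > dx or abs(c["y"] - mc["y"]) > dy:
                continue                      # no geographic intersection
            diff = abs(c["theta"] - mc["theta"]) % (2 * math.pi)
            diff = min(diff, 2 * math.pi - diff)
            if diff > phi:
                continue                      # directions too dissimilar
            d = math.hypot(c["x"] - mc["x"], c["y"] - mc["y"])
            if best is None or d < best[0]:
                best = (d, mc["seconds"])
        if best is not None:
            total_time += best[1]             # accumulate into TotalTime
    return total_time
```

In the full method, the candidate set would come from Algorithm 3 and the best candidate would be chosen by DTW over the underlying subtrajectories rather than by plain nearest distance.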

Experimental Design
Data Description We evaluated the proposed method on a dataset of taxi trajectories in Chengdu [21]. This dataset contains 19,400 trajectories (712,000 GPS records) of 4,565 taxi drivers collected in August 2014 in Chengdu, China. The shortest trajectory contains only 15 GPS records (2.5 km), and the longest contains 119 GPS records (40 km). Moreover, we found a few abnormal GPS subtrajectories in the training and testing data during the experiments; these abnormal data were removed from the test data.
We compared the following methods in our experiments: 1. DeepTTE, which uses the deep learning framework PyTorch to predict ETA. 2. DeepTTE-GPU, which uses GPUs (graphics processing units) to accelerate computation, as is common with PyTorch. 3. HyperETA-noDTW, a variant of HyperETA without the DTW refinement. 4. HyperETA, the primary method developed in this study.
Criteria We computed mean absolute percentage error (MAPE) values to evaluate the prediction results. MAPE expresses the accuracy as a ratio, as presented in (4):

MAPE = (100%/n) Σ_{i=1}^{n} |A_i − F_i| / A_i,  (4)

where A_i is the actual value and F_i is the forecast value. Mean absolute error (MAE) and root mean square error (RMSE) were also used in the experiments. MAE is the arithmetic average of the absolute errors, that is, the average difference between the actual and forecast values. RMSE squares the errors before averaging, so large errors are penalized more heavily than small ones.
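The three criteria can be computed directly from their definitions; the following sketch mirrors Equation (4) and the standard MAE and RMSE formulas.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, Equation (4): the mean of
    |A_i - F_i| / A_i, expressed as a percentage."""
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def mae(actual, forecast):
    """Mean absolute error: the average of |A_i - F_i|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean square error: squaring amplifies large errors, so RMSE
    penalizes a few big misses more heavily than MAE does."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))
```

For instance, forecasts of 110 and 180 against actual values of 100 and 200 give a MAPE of 10% and an MAE of 15, while the RMSE of about 15.8 reflects the heavier weighting of the larger error.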
Lab Environment The software and hardware environments in which the experiments were performed are outlined as follows: - CPU: Intel Core i7-4790 3.6 GHz
Meta-parameter Setting In this experiment, the metaparameters γ and m were set to 0.8 and 5, respectively, for Algorithm 4 of HyperETA. This setting means that most of the hypercubes, approximately 80%, contain at least five GPS points. θ was set to 10°.
The software used in the experiments, including HyperETA and DeepTTE, has been released on GitHub.

Accuracy Comparison
The experimental results are listed in Table 1.
The Test Data columns present the results of each algorithm on the test data. HyperETA significantly outperformed DeepTTE and DeepTTE-GPU on these data: its MAPE was only 20.52%, and it also achieved the most favorable RMSE and MAE values. HyperETA outperformed HyperETA-noDTW, indicating that coupling HyperETA with the DTW technique increased accuracy. The error rate of the deep-learning-based DeepTTE was 33.63%, and that of DeepTTE-GPU, which uses a GPU instead of a CPU, was even higher at 39.12%.
The Training Loss columns present each algorithm's predictions on the training data. Intuitively, the results should be nearly perfect with near-zero error because these data had already been learned. However, DeepTTE yielded distinctly different results, with a significant error rate of 16.88%; for DeepTTE-GPU, the error rate was a striking 25.75%, worse than that of the CPU version. HyperETA and HyperETA-noDTW yielded the expected results, with error rates of only 3% and 3.14%, respectively, considerably lower than those of DeepTTE and DeepTTE-GPU. The RMSE of HyperETA on the training data was slightly higher than that of DeepTTE, but its MAE was smaller, in line with the other experimental values. This indicates that HyperETA made some large errors when predicting long trajectories; however, these errors occupied only small parts of the long trajectories, explaining why the MAPE of HyperETA was considerably better than those of the other techniques.
Distance Fig. 4, Fig. 5, and Fig. 6 compare the model performance levels for trajectories of different lengths. The benchmark is the same as the Test Data in Table 1, but the test dataset is divided into five subsets according to distance. HyperETA outperformed DeepTTE for every path length. As illustrated in Fig. 4, most of the MAPE values of DeepTTE were higher than 22%, whereas those of HyperETA were lower than 16.2%; beyond the 5-10 distance bin, the MAPE of HyperETA decreased significantly. As indicated in Fig. 5, the RMSE of DeepTTE increased with distance, whereas HyperETA exhibited the opposite trend, with decreasing RMSE values. As shown in Fig. 6, the MAE of DeepTTE (in seconds) increased with distance, whereas the prediction quality of HyperETA remained steady regardless of distance.

Discussion
Data Size HyperETA outperformed DeepTTE because DeepTTE requires massive training data to achieve high accuracy, whereas HyperETA achieved higher accuracy with a small quantity of training data. This suggests that deep learning requires a considerable quantity of labeled data to solve problems and is impractical when the data volume is limited.
GPU DeepTTE-GPU yielded poorer results than DeepTTE because of the GPU's limited precision. In this study, we used a GTX 750 Ti, a consumer-level GPU, which is inferior to supercomputing GPUs such as the TITAN V. Specifically, consumer-level GPUs are effectively limited to single-precision floating-point arithmetic, whereas supercomputing GPUs offer full-throughput double-precision arithmetic. This considerably influences accuracy because each value in the feature vectors of DeepTTE is normalized to a floating-point number between 0 and 1. Narang [14] described how the single-precision format (FP32) influences gradient descent in backpropagation. PyTorch 1.6 uses mixed-precision techniques to increase computational speed, attaining the speed of FP16 while maintaining the accuracy of FP32; the default precision of a CPU is FP64 (the double-precision floating-point format). Furthermore, the GPU in this study was simultaneously used as the display card during the experiments, which affected its computational speed. Thus, an additional supercomputing GPU would benefit deep learning; however, it would not have helped in this study because the small mainframe we used, the size of which was half that of a standard desktop, had only one PCI-E slot. These practical problems indicate that accelerating deep learning with a GPU requires large and expensive equipment, which is impractical for executing deep learning on small devices such as Internet of Things (IoT) or mobile devices.
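The precision gap can be demonstrated in a few lines. This is a generic illustration of FP32 rounding (the helper to_fp32 is ours), not a reproduction of DeepTTE's arithmetic.

```python
import struct

def to_fp32(x):
    """Round a Python float (FP64) to the nearest single-precision value
    by packing it into 4 bytes and unpacking it again."""
    return struct.unpack("f", struct.pack("f", x))[0]

# A tiny update to a normalized feature value survives in FP64 but is
# rounded away in FP32, because 1e-12 is far below FP32's ~1e-7 relative
# precision at this magnitude.
base, delta = 1e-4, 1e-12
assert base + delta != base                    # distinguishable in FP64
assert to_fp32(base + delta) == to_fp32(base)  # identical after FP32 rounding
```

Small gradient updates of this kind vanishing under FP32 is precisely the effect Narang [14] discusses for backpropagation.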
Training Loss DeepTTE's predictions on its own training data were significantly poor, indicating that DeepTTE performed poorly even at this fundamental level, given that the training data had already been seen during the training stage.

CONCLUSION
In this study, we proposed a novel approach called HyperETA for estimating the ETA of a given trajectory. HyperETA significantly outperformed DeepTTE and DeepTTE-GPU. Our research was a preliminary study comparing hypercube clustering with deep learning. Numerous tasks are involved in the problems we studied, and they cannot all be completed in one study. In future work, we will increase the robustness of the proposed method, including reimplementing HyperETA on a GPU to accelerate computation; PyTorch tensors can be used to quickly transfer programming functionality from the CPU to the GPU.