A hexagon-based method for polygon generalization using morphological operators

Abstract Numerous methods based on square rasters have been proposed for polygon generalization. However, these methods ignore the inconsistent distance measurement among the neighborhoods of squares, which may result in an imbalanced generalization in different directions. As an alternative raster, the hexagon offers consistent connectivity and isotropic neighborhoods. This study proposes a hexagon-based method for polygon generalization using morphological operators. First, we defined three generalization operators, aggregation, elimination, and line simplification, based on hexagonal morphological operations. We then used corrective operations with selection, skeleton extraction, and exaggeration to detect, classify, and correct unreasonably reduced narrow parts of the polygons. To assess the effectiveness of the proposed method, we conducted experiments comparing the hexagonal raster with square raster and vector data. Unlike vector-based methods, in which separate algorithms simplify either areal objects or exterior boundaries, the hexagon-based method performs both simplifications simultaneously. Compared with the square-based method, the results of the hexagon-based method were more balanced across all neighborhood directions, matched the original polygons better, and had smoother simplified boundaries. Moreover, the hexagon-based method ran faster than the square-based method, with time differences ranging from less than 1 min to more than 50 min.


Introduction
Map generalization is one of the most challenging and innovative research fields in modern cartography. It represents a process in which geospatial information is summarized and abstracted when spatial data are reduced from a large scale to a small scale, rather than simple data compression or graphic simplification. Map generalization is a crucial step in digital spatial information processing for the construction and application of geographic information systems (GISs) (Guo et al. 2007).

Areal features are a basic type of data in GIS applications and are among the most essential and abundant map components. Thus, the generalization of polygons is an important aspect of map generalization. Simplification is one approach to polygon generalization, including the simplification of areal objects and exterior boundaries (Monmonier 1983, Müller and Zeshen 1992, Wu et al. 2017, Li et al. 2021). Areal object simplification aims to address the conflicts among adjacent polygons or within a polygon owing to the reduction in the map scale, where a new polygon may be used to denote the geographic and geometric features of adjacent polygons or of a polygon at the target scale. The basic generalization operators include aggregation and elimination (selective omission), which are supplemented by selection, collapse, exaggeration, and other operators (Wu et al. 2017).

For general polygonal features such as lakes, residential areas, and other areal features, aggregation is a commonly studied operator in different domains. Aggregation refers to the geometric combination of adjacent polygonal features when the map scale decreases (Ai and Liu 2002). It is contingent on proximity, including not only topological proximity but also visual proximity calculated by the minimum visual distance (Guo et al. 2012). When the distance between two objects is less than a threshold value that makes it difficult to visually distinguish the gap between them, the spatial relationship between the two objects is referred to as visual proximity, and the distance threshold value is defined as the minimum visual distance (Peng 1995). Therefore, object proximity is a crucial criterion for detecting objects and merging them into one object (Ai et al. 2019). A simple method to determine the neighboring relationship is buffer analysis with a defined radius, which can easily result in self-intersection of the buffer areas (Bader and Weibel 1997, Weng et al. 2012, Deng et al. 2018). Van Oosterom and Schenkelaars (1995) proposed a generalized area partitioning (GAP) tree for 'on-the-fly' map generalization, in which the least significant small polygons are iteratively merged into the neighboring polygon with the longest common boundary length. This method can manage polygon merging only for touching polygons. Furthermore, Delaunay triangulation is a common method used to define the spatial relationship of area objects by constructing a triangulation network and then identifying the type of each triangle. For example, Jones et al. (1995) developed a simplicial data structure (SDS) based on constrained Delaunay triangulation and designed proximal search functions to determine the closest neighboring objects to any given object and calculate the distance. Aggregation can then be performed by deleting the common edges or allocating the triangles of the intervening region to the separated objects.
However, SDS fails to consider the islands within a polygon and to distinguish the concave area of a polygon from the blank area between adjacent polygons. To address this problem, Ai and Guo (2000) proposed a formal Delaunay data model (FDDM) supporting map generalization, which involves the description of polygon loops, islands, boundaries and vertices. Therefore, formal conditional retrieval based on FDDM can be performed to support the search for neighboring polygons and aggregation operators. Recently, deep learning has been introduced for map generalization by learning cartographic knowledge from existing map products, particularly for building simplification. Yan et al. (2019) applied graph convolutional neural networks to building pattern classification by analyzing graph-structured vector data. A similar method was utilized for building grouping, which is a preliminary step for building simplification (Yan et al. 2020). Feng et al. (2019) employed deep convolutional neural networks (DCNNs) for raster-based building simplification and compared the performances of three convolutional neural networks: U-net, residual U-net and generative adversarial network (GAN).
Exterior boundary simplification is regarded as line simplification in numerous studies (Bader and Weibel 1997, Shen et al. 2018, Kronenfeld et al. 2020). In early studies, line simplification was implemented by compressing the number of vertices (Douglas and Peucker 1973, Lang 1969, Opheim 1982, Visvalingam and Williamson 1995, Li and Openshaw 1992). The Douglas-Peucker (DP) algorithm is a commonly used algorithm that recursively subdivides the original line using a perpendicular distance tolerance and eliminates vertices that fall within a specified distance of the line segment joining the two original points on the line. This algorithm can produce self-intersecting lines given complex input lines, and its output tends to be angular rather than smooth. To address these drawbacks, Müller (1990), Saalfeld (1999), Wu et al. (2004), and Liu et al. (2020) introduced improvements to the DP algorithm. Line simplification studies consider not only geometric criteria (such as shape preservation, topological consistency, and geo-characteristic considerations), but also semantic criteria. Tutić and Lapaine (2009) proposed an algorithm for line simplification with the property of area preservation, in which new points are utilized to replace a subset of points on the given lines. Tong et al. (2015) presented an approach for polygonal boundary simplification via structured total least-squares adjustment with constraints (STLSC) to maintain the area of the original polygons in the simplification process. Based on Delaunay triangulation, Ai et al. (2017) developed a scale-driven simplification algorithm, an extension of Perkal's proposal (Perkal 1966). When applied to land use data, this algorithm can preserve the main shape of the polyline and approximately meet the area-preserving constraint for different types of geographic features. Kulik et al. (2005) utilized a weighting function to design an ontology-driven map generalization algorithm known as DMin, which incorporates both geometric and semantic information to simplify a line.
As shown in these studies, most existing algorithms for polygon generalization are applicable only to vector data structures. However, these data structures have two major disadvantages. First, it is difficult to develop an algorithm for polygon generalization that considers the simplification of both areal objects and exterior boundaries. Second, the computation of each operator implemented in the vector data structure is typically complex (Monmonier 1983). In contrast, owing to its simple data structure and space-primary characteristics, raster data have been considered by many researchers to be more suitable for manipulating areal features to complete certain operations, such as merging, splitting, and partitioning (Su and Li 1995, Su et al. 1997b, Câmara and López 2000, Shen et al. 2019). Therefore, studies have increasingly focused on raster data for map generalization based on mathematical morphology theory (Li 1994, Su et al. 1997b, Wang et al. 2005, Damen et al. 2008), neural networks (Cheng et al. 2013), superpixel segmentation (Shen et al. 2018, Shen et al. 2019), and machine learning (Feng et al. 2019). Mathematical morphology is the most widely used method for this purpose. Li (1994) summarized the corresponding relationship between morphological operators and map generalization operators. He also utilized a combination of opening and closing operators to develop operators corresponding to both aggregation and shape refinement. Moreover, cartographic rules can be realized by controlling the sizes of the grid and the structuring element, which greatly reduces the computational complexity and improves the computational efficiency. Wang et al. (2005) and Damen et al. (2008) later combined morphological operators to address building simplification issues.
In theory, morphological operator manipulation depends on the neighborhood relationships among cells. However, all raster-based studies on map generalization with mathematical morphology are based on a square grid structure, where two types of neighborhood connectivity (edge connectivity in the orthogonal direction and corner connectivity in the diagonal direction) result in different distance measurements. The distance between the centroids of the central cell and its neighboring cells in the diagonal direction is √2 times that in the orthogonal direction, which may lead to excessive expansion and contraction in the diagonal direction, further influencing the balance of the generalization results in each direction. Inconsistent connectivity also leads to at least two different definitions of neighborhoods: (1) the 4-neighborhood, including only the cells with edge connectivity; and (2) the 8-neighborhood, including all cells with edge connectivity and corner connectivity. This results in the connectivity paradox defined by Duff et al. (1973), wherein the adjacency between two objects is ambiguous. Beyond these problems, the rasterization of polygon features can lead to shape distortion and precision loss (Lin and Peng 2001, Sahr et al. 2003), especially for square raster data of low resolution. These issues result from the grid geometry of the data structure, which has been ignored by most studies related to map generalization with raster data.
As an alternative grid geometry of tessellation models for raster data structures, hexagons have been proven to be superior to squares in spatial modeling and neighborhood processing owing to their consistent connectivity with neighbors, higher symmetry, compact topology, and more isotropic structure (Birch et al. 2007, Raposo 2013, Wang et al. 2020). Unlike squares, one cell is exclusively connected to six neighboring cells via edges in the hexagonal tessellation model. This consistent connectivity avoids the connectivity paradox that occurs in squares and contributes to a uniform spatial meaning for hexagons. Hence, the spatial relationship between the central cell and its six neighbors is uniform adjacency, and it is considered more suitable for neighborhood processing (de Sousa et al. 2006, Birch et al. 2007, Wang and Kwan 2018). The distances between the centroids of the central cell and each neighboring cell are equal, which allows the hexagonal tessellation model to suffer less from orientation bias and sampling bias from edge effects than the square tessellation model (Krebs 1989, Birch et al. 2007, Tong et al. 2013, Wang and Kwan 2018). In addition, the six-fold radial symmetry of hexagons makes them more isotropic than squares with four-fold symmetry (Raposo 2013). Compared to squares, hexagons have a shape more similar to that of circles and shorter perimeters for equal areas, contributing to their more compact structure (Hales 2001). Therefore, hexagons can improve the representation precision for spatial features in analytical terms (Burt 1980, Poorthuis and Zook 2015, Wang and Kwan 2018) and in equal sampling meshes (Lin and Peng 2001). Furthermore, this reduces, to some extent, the precision loss in the process of rasterization with squares. Therefore, the hexagon is a more suitable tessellated grid for planar space. An increasing number of studies have introduced hexagons for spatial modeling in various domains, such as image processing, cartography, ecology and hydrological analysis (Carr et al. 1992, Middleton and Sivaswamy 2005, Birch et al. 2007, Wang et al. 2020). Hexagons have also been extended to global grid systems (Saff and Kuijlaars 1997, Kimerling et al. 1999, Tong et al. 2013). Raposo (2013) proposed a hexagonal quantization algorithm to simplify lines by resampling hexagonal tessellations of variable scales. He concluded that lines simplified by hexagons maintain a higher location precision than those simplified by traditional squares. Nevertheless, few studies have explored the applicability of hexagonal tessellation models to other map generalization operations.
This study proposed a new hexagon-based method for polygon generalization (an important part of cartographic generalization), which was developed by combining hexagonal morphological operators. The remainder of this study is organized as follows. In Section 2, we describe the hexagonal grid structure applicable to generalization operators, introduce hexagon-based mathematical morphological operators by showing the corresponding detailed calculations in the generated grid structure and discuss the relationship between morphological operators and generalization operators. Subsequently, two parts of the hexagon-based polygon generalization method developed with hexagonal morphological operators are presented in Section 3. Section 4 analyzes the performance of the hexagon-based method by comparing its experimental results with square raster data and vector data. Finally, Section 5 presents the conclusion of this study.

Hexagon-based mathematical morphology
The application of the hexagonal tessellation model to image processing systems dates back to the 1960s, when Petersen and Middleton (1962) used hexagonal grid cells as an alternative sampling grid to sample a two-dimensional Euclidean space. They believed that squares were not the most effective sampling grids. As an increasingly common method in image processing, mathematical morphology was first applied to hexagonal images by Golay (1969) to present parallel computation by integrating different morphological operators. By comparing morphological algorithms on square and hexagonal images, Serra (1982) preferred the hexagon as an ideal grid for the manipulation of morphological operators because of its uniform connectivity and other topological advantages (Middleton and Sivaswamy 2005). Furthermore, more complicated morphological operators (Deutsch 1972, Staunton 1996) have since been proposed using hexagons.

Hexagonal grid structure
To introduce mathematical morphology into map generalization using hexagons, it is crucial to generate a hexagonal grid structure as the data structure of the hexagonal tessellation model for manipulating generalization operators. Grid coding and neighboring relationships are the basic elements used to establish the hexagonal grid structure for storing, calculating, and processing hexagonal cells.
Vital to constructing a hexagonal grid structure, the grid coordinate system should have the following characteristics: (1) suitable conversion from vector data or other grid data and (2) easy establishment of a neighborhood relationship. As summarized by Wang et al. (2020), five grid coordinate systems are designed for the hexagonal grid structure: the offset coordinate, three-axis coordinate, axial coordinate, doubled coordinate and Gosper curve coordinate systems (Her 1995, Nagy 2003, Birch et al. 2007, Xin et al. 2017, Lukić and Nagy 2019). Among these, the three-axis coordinate system is one of the most widely used grid coordinate systems for hexagonal grid structures.
In the three-axis coordinate system, the three primary axes x, y, and z cross each other at 60 degrees, as shown in Figure 1(a). Grid coding is represented by the grid coordinates of each hexagonal cell, and the neighboring relationship can be summarized using the grid coding pattern. Essentially, the grid coordinates of cell c are the grid distances between its projected cells on the x, y and z axes and the origin cell, which can be calculated as

$g(x, y, z) = \left( \frac{2\,\overline{PO}}{3\,l_{hex}},\ \frac{2\,\overline{QO}}{3\,l_{hex}},\ \frac{2\,\overline{RO}}{3\,l_{hex}} \right)$

where g(x, y, z) are the grid coordinates of cell c; l_hex refers to the side length of the hexagonal cell; point O is the centroid of the origin cell; P, Q, and R are the vertical projection points of the centroid of cell c on the x, y, and z axes, respectively; and PO, QO, and RO are the distances from points P, Q, and R to point O, respectively, with the sign of their values being the same as that of the axes on which points P, Q, and R are located. Here, the hexagonal cell c in orange in Figure 1(a) is considered as a case to illustrate the calculation of its grid coordinates. As shown in Figure 1(a), the projection points P, Q, and R can be obtained by drawing perpendicular lines from the centroid of cell c to the three axes. The values of PO, QO, and RO are measured to be 3 l_hex, −(9/2) l_hex, and (3/2) l_hex, respectively. Therefore, the calculated grid coordinates of the cell are (2, −3, 1). Using the same method, the grid coordinates of the other hexagonal cells are derived, as shown in Figure 1(b), and these are the exact grid coding in the hexagonal grid structure. There is a constraint on the grid coordinates (x, y, z) of each hexagonal cell, represented by the formula x + y + z = 0, which ensures that the grid coordinates of each cell are unique. This mathematical consistency makes rasterization simple and efficient. The three-axis coordinate system is symmetrical, distance measurements are the same in all orientations, and rotation matrices can be defined. Therefore, it is considered suitable for applications related to spatial relationships (Birch et al. 2007).
In addition, there is a consistent pattern of grid coding to define the neighborhoods in the three-axis coordinate system. Figure 1(c) displays this pattern, where for cell c (x, y, z), the grid coding of the neighbors to its left, upper left, upper right, right, lower right and lower left are (x − 1, y + 1, z), (x, y + 1, z − 1), (x + 1, y, z − 1), (x + 1, y − 1, z), (x, y − 1, z + 1) and (x − 1, y, z + 1), respectively. In practical applications, the grid coding of each cell can be determined from a specified origin cell and the above grid-coding pattern among neighborhoods. The hexagonal grid structure generated with grid coding and neighboring relationships based on the three-axis coordinate system facilitates grid location, neighborhood search, and neighborhood processing in the subsequent raster-based calculations. This structure is suitable only for the calculation and analysis of planar space.
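The grid coding and neighborhood pattern above translate directly into a small data structure. The following Python sketch (helper names are ours and hypothetical; the original implementation described in Section 4 was written in JavaScript) stores cells as (x, y, z) triples satisfying x + y + z = 0 and enumerates the six neighbors using the offsets listed above.

```python
# Minimal sketch of the three-axis (cube) coordinate grid described above.
# Cell coding: a tuple (x, y, z) with the constraint x + y + z == 0.
from typing import Tuple, List

Cell = Tuple[int, int, int]

# Offsets to the six edge-connected neighbors, in the order
# left, upper left, upper right, right, lower right, lower left.
NEIGHBOR_OFFSETS: List[Cell] = [
    (-1, +1, 0), (0, +1, -1), (+1, 0, -1),
    (+1, -1, 0), (0, -1, +1), (-1, 0, +1),
]

def is_valid(cell: Cell) -> bool:
    """Check the x + y + z = 0 constraint that makes grid coordinates unique."""
    return sum(cell) == 0

def neighbors(cell: Cell) -> List[Cell]:
    """Return the six edge-connected neighbors of a hexagonal cell."""
    x, y, z = cell
    return [(x + dx, y + dy, z + dz) for dx, dy, dz in NEIGHBOR_OFFSETS]

def grid_distance(a: Cell, b: Cell) -> int:
    """Grid distance between two cells; identical in all six directions,
    unlike the mixed orthogonal/diagonal distances of a square grid."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]), abs(a[2] - b[2]))

if __name__ == "__main__":
    c = (2, -3, 1)                      # the example cell from Figure 1(a)
    assert is_valid(c)
    print(neighbors(c))                 # six uniformly adjacent cells
    print(grid_distance((0, 0, 0), c))  # -> 3
```

The uniform grid distance computed here is exactly the property exploited in the following sections: every neighbor lies one step away, so morphological operations expand and contract the image equally in all six directions.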

Hexagon-based morphological operators and generalization operators
As a unified approach to numerous image processing problems, mathematical morphology is a science of form and structure using the set theoretic method. It was initiated by Matheron and Serra in the 1960s (Matheron 1975, Serra 1982, Su and Li 1995). Golay (1969) utilized the principle of morphology in hexagon-based images. The basic idea of its image processing operations is to utilize a probe called a structuring element to collect the information reflected by the images. When this probe is constantly moving in the image, the relationship between various parts of the image can be detected to develop the structural characteristics of the image. Based on the mathematical morphological methodology, Li (1994) discussed the applicability of morphological operators to implement map generalization operators based on traditional square raster data, including aggregation, elimination, collapse, displacement, and other generalization operators. Similarly, this study summarizes the hexagon-based morphological operators essential for polygon generalization, which were developed in the hexagonal grid structure established in Section 2.1. Moreover, the relationship between morphological operators and generalization operators was discussed using simple cases.

Structuring element
The structuring element (SE) is a critical element for all morphological operators. Its function is similar to that of a kernel (or mask) in a convolution operation. Therefore, the morphological operators can change the shape of the image by convolution (Su et al. 1997a, 1997b), where the SE constantly moves inside and outside the image to detect the regions that conform to certain calculation rules. The movement of the structuring element B can be geometrically defined as B + x, indicating that B shifts by |x| in the direction of vector x, as shown in Figure 2(a). Unlike the regular shape of a kernel composed of n-order neighborhoods, the shape of the SE is arbitrary and must be designed according to the morphological operators and the intended effect of image processing. Additionally, the size and origin of an SE determine the processed result of the image. Three SEs are shown in Figure 2(b). They result in different processing effects for various morphological operators, which will be discussed in subsequent sections.

Basic morphological operators
The two primary operators of mathematical morphology are erosion and dilation. Erosion is defined as

$A \ominus B = \{\, x \mid B + x \subseteq A \,\}$

where $\ominus$ is the notation of the erosion operator, A is the input image, B is the SE, and B + x is the translation of B by x. A ⊖ B consists of all origins of SE B for which B can be completely contained in image A during the process of translation. This operator is called 'A eroded by B'. Dilation is the dual operation of the erosion operator, expressed as

$A \oplus B = \bigcup_{a \in A} (B + a)$

where $\oplus$ is the notation of the dilation operator. The formula above is called 'A dilated by B'. The dilation operator can be implemented by shifting the SE to all points of the input image and then calculating their union. Figure 3 shows the cases of the two basic operators in the hexagonal grid structure (see supplemental materials for detailed calculation steps), where the input image can be regarded as two rasterized polygons I and J. The fully symmetrical SE B plays a role in exaggerating the polygons in all directions in the dilation operator, where the degree of exaggeration is determined by the size of the SE. Erosion can eliminate polygon J, which is smaller and narrower than SE B, and contract the larger polygon I. However, it destroys the integrity and shape of polygon I. The two basic operators with SE B1, whose shape is unidirectional linear, can be utilized for the displacement of polygons, where the direction and size of the displacement depend on the shape and size of the SE.
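As a concrete illustration of these two operators on the hexagonal grid structure, the sketch below represents a rasterized image as a set of cube-coordinate cells and an SE as a set of offsets relative to its origin. It is a minimal sketch for exposition (the names and the set-based representation are ours, not the authors' JavaScript implementation).

```python
# Erosion and dilation over a hexagonal grid stored as sets of cube coordinates.
from typing import Set, Tuple

Cell = Tuple[int, int, int]   # (x, y, z) with x + y + z == 0

def translate(se: Set[Cell], cell: Cell) -> Set[Cell]:
    """Translate a structuring element so that its origin sits on `cell` (B + x)."""
    x, y, z = cell
    return {(x + dx, y + dy, z + dz) for dx, dy, dz in se}

def erode(image: Set[Cell], se: Set[Cell]) -> Set[Cell]:
    """A ⊖ B: keep every cell whose translated SE is completely contained in the image."""
    return {c for c in image if translate(se, c) <= image}

def dilate(image: Set[Cell], se: Set[Cell]) -> Set[Cell]:
    """A ⊕ B: union of the SE translated to every cell of the image."""
    out: Set[Cell] = set()
    for c in image:
        out |= translate(se, c)
    return out

# First-order symmetric SE: the origin cell plus its six edge-connected neighbors.
SE_1: Set[Cell] = {(0, 0, 0),
                   (-1, +1, 0), (0, +1, -1), (+1, 0, -1),
                   (+1, -1, 0), (0, -1, +1), (-1, 0, +1)}

if __name__ == "__main__":
    # A single isolated cell (narrower than SE_1) is removed by erosion
    # and expanded equally in all six directions by dilation.
    polygon = {(0, 0, 0)}
    print(erode(polygon, SE_1))        # -> set(): the small polygon is eliminated
    print(len(dilate(polygon, SE_1)))  # -> 7: uniform expansion in every direction
```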

Extended morphological operators for elementary polygon generalization
Any other operators can be developed based on the two basic operators. Opening and closing are the simplest combined operators and play a significant role in polygon generalization. They are defined as

$A \circ B = (A \ominus B) \oplus B$

$A \bullet B = (A \oplus B) \ominus B$

where $\circ$ and $\bullet$ are the notations of the opening and closing operators, respectively. The opening operator is a two-step process, erosion followed by dilation, and is read as 'Opening of A by B'. It has two functions: (1) smoothing the outer boundary of the image and (2) eliminating small noise within the image. The closing operator is the dual of the opening operator and is defined as dilation followed by erosion; it is read as 'Closing of A by B'. It can smooth the inner boundary of the image, fill small holes inside objects and connect neighboring objects in the image. The opening and closing operators with a fully symmetrical SE can accomplish three map generalization operators: aggregation, elimination, and line simplification. As shown in Figure 4(a), regarding input image A2 as two rasterized polygon objects X and Y, the opening operator deletes polygon object Y, which is smaller or narrower than SE B. This is the same effect as the elimination operator. The closing operator merges polygons whose gaps are narrower than the size of SE B, which has the same effect as the aggregation operator. Line simplification can be accomplished by removing convex bends or filling concave bends narrower than the SE with the opening and closing operators, respectively. In addition, the opening operator can eliminate narrow polygons in a specified direction using an SE with a linear shape, as shown in Figure 4(b). The cases in Figure 4(c) indicate that the closing operator can also fill the holes inside a polygon and merge polygons in a specified direction without changing the shape of the boundaries by using an SE with a linear shape. In summary, a combination of opening and closing operators can be used to perform polygon generalization in a hexagonal grid structure. The shape of the SEs in the two extended operators determines the direction of generalization, which further influences the effect of generalization. The size of the SEs corresponds to the distance threshold in polygon generalization.
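Building on the erosion and dilation sketch above, opening and closing are simple compositions. The following self-contained snippet (again a sketch with names of our own, not the authors' implementation) shows how opening behaves like elimination and convex-bend removal, and closing like aggregation and concave-bend filling.

```python
# Opening and closing as compositions of erosion and dilation on hexagonal cell sets.
from typing import Set, Tuple

Cell = Tuple[int, int, int]

def translate(se, c):            # B + x
    return {(c[0] + dx, c[1] + dy, c[2] + dz) for dx, dy, dz in se}

def erode(a, b):                 # A ⊖ B
    return {c for c in a if translate(b, c) <= a}

def dilate(a, b):                # A ⊕ B
    return set().union(*(translate(b, c) for c in a)) if a else set()

def opening(a, b):               # A ∘ B: removes parts narrower than the SE (elimination)
    return dilate(erode(a, b), b)

def closing(a, b):               # A • B: fills gaps narrower than the SE (aggregation)
    return erode(dilate(a, b), b)

SE_1 = {(0, 0, 0),
        (-1, +1, 0), (0, +1, -1), (+1, 0, -1),
        (+1, -1, 0), (0, -1, +1), (-1, 0, +1)}

if __name__ == "__main__":
    # Two one-cell polygons separated by a one-cell gap along the x direction.
    polygons = {(0, 0, 0), (2, -2, 0)}
    print(opening(polygons, SE_1))   # -> set(): both tiny polygons are eliminated
    merged = closing(polygons, SE_1)
    print((1, -1, 0) in merged)      # -> True: the gap cell is filled, aggregating them
```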

Comparison of morphological operators between the square grid and the hexagonal grid
Theoretically, the principles of the above morphological operators developed in square and hexagonal grid structures are the same. However, the estimated results show differences owing to the different neighborhood operations. Figure 5 shows a case with two rasterized polygonal objects, where the areas of the square and hexagonal cells are the same, and the SEs B_s and B are set as the first-order neighborhoods of the squares and hexagons, respectively, to directly reflect the differences in neighborhood operations. Compared to the 6-neighborhood operation in the hexagonal grid structure, the 8-neighborhood operation leads to inconsistent distance measurement in the square grid structure, which makes it subject to imbalance in both filling and omitting grids, as shown by the operators for polygons I and J in Figure 5(a). Taking L3 in the orthogonal direction and L4 in the diagonal direction, whose distances are similar, as an example, the square-based closing operator fills the grids of the gap between polygons I and J only in the diagonal direction. Analogously, for L1 in the orthogonal direction and L2 in the diagonal direction with similar distances, the square-based opening operator omits the grids only in L2. The corresponding results for the hexagonal grid structure are uniform in each direction, as shown in Figure 5(b). These differences confirm that the closing and opening operators in the square grid structure, when applied to polygon generalization, result in excessive aggregation and elimination compared to those in the hexagonal grid structure.

Other morphological operators
Apart from the aforementioned morphological operators, other morphological operators, such as selection and collapse, are necessary as supplementary generalization operators. Here, we introduce two additional operators, namely, the hit-or-miss transform and the thinning algorithm, which are indispensable for the subsequent polygon generalization method. The hit-or-miss transform is an effective operator for detecting the characteristics of images and is applied to solve issues such as shape detection, object identification and thinning. The calculation requires a pair of SEs B = (E, F), with one probing the inside of the image and the other probing the outside of the image. It is defined as

$A \circledast B = (A \ominus E) \cap (A^{C} \ominus F)$

where $\circledast$ is the notation of the hit-or-miss transform, and $A^{C}$ is the complement of A.
The hit-or-miss transform performs strict template matching. As shown in Figure 6, this transform detects corners shaped like SE E.
The thinning algorithm is one of the methods used to implement skeletonization for image processing. It extracts the medial axis of the image by peeling the image from the outside to the inside, resulting in regular image shrinkage. When applied to map generalization, it is capable of deriving skeletons for polygonal objects. The thinning algorithm relies on the hit-or-miss operator and is defined as

$A \oslash B = A \setminus (A \circledast B)$

where $\oslash$ is the notation of the thinning algorithm, and B is a pair of SEs. Generally, thinning is performed with a sequence of SE pairs:

$A \oslash \{B\} = \big( \cdots \big( (A \oslash B_{1}) \oslash B_{2} \big) \cdots \big) \oslash B_{K}$

where $\{B_i\}$ is a sequence of SE pairs, and K is the number of pairs of SEs. The basic idea is to detect the boundary grids through the hit-or-miss transform by defining a series of SE pairs and to remove these grids iteratively until the output image is stable. Figure 7 shows a case of the thinning algorithm processed in the hexagonal grid structure, where six pairs of SEs, B1, B2, …, B6, are utilized to thin image A from all directions to derive the skeleton of polygon P. In the first iteration, cells with orange edges are detected and removed by SEs B1, B2, B3, B4 and B5. As shown in the output images of the second iteration, one cell in image A7 is detected by SE B1, and the other images in this iteration are the same as image A7. The output images in the third iteration are unchanged for all pairs of SEs and are identical to image A7. Therefore, the iterations end. A7 is the final thinned result with a width of one cell, which is exactly the skeleton of polygon P in image A.
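The iteration described above can be sketched compactly. The six SE pairs of Figure 7 are not reproduced in this excerpt, so the snippet below treats them as a caller-supplied parameter; the function names and the finite `domain` used to take the complement are our own simplifications, not the authors' code.

```python
# Hit-or-miss transform and iterative thinning on a hexagonal grid.
from typing import Set, Tuple, Sequence

Cell = Tuple[int, int, int]
SEPair = Tuple[Set[Cell], Set[Cell]]   # (E probes the foreground, F probes the background)

def translate(se: Set[Cell], c: Cell) -> Set[Cell]:
    return {(c[0] + dx, c[1] + dy, c[2] + dz) for dx, dy, dz in se}

def hit_or_miss(image: Set[Cell], domain: Set[Cell], pair: SEPair) -> Set[Cell]:
    """A ⊛ (E, F): cells where E fits inside the image and F fits inside its complement.
    `domain` is the finite set of cells over which the complement is taken."""
    e, f = pair
    complement = domain - image
    return {c for c in domain
            if translate(e, c) <= image and translate(f, c) <= complement}

def thin(image: Set[Cell], domain: Set[Cell],
         se_pairs: Sequence[SEPair], max_iter: int = 100) -> Set[Cell]:
    """Iteratively remove hit-or-miss matches for each SE pair until the image is stable."""
    current = set(image)
    for _ in range(max_iter):
        previous = set(current)
        for pair in se_pairs:
            current -= hit_or_miss(current, domain, pair)   # A ⊘ B_i = A \ (A ⊛ B_i)
        if current == previous:      # fixed point reached: the one-cell-wide skeleton
            break
    return current
```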

Preliminary operations for polygon generalization
As mentioned in Section 2.2.3, a combination of opening and closing operators with a fully symmetrical SE can be applied to simplify both areal features and exterior boundaries. Therefore, the preliminary operations for the generalization of polygons of general types can be defined as

$M_i = (A \bullet B_i) \circ B_i$

where M_i is the generalization result at the ith map scale, A is the rasterized input, and B_i is a symmetric SE with the origin at its center for the closing and opening operators. The size of B_i is determined by three parameters: the map scales S_source and S_target of the original and target data, respectively, and the minimum visible grid d at the source scale, expressed in terms of the number of cells below which two objects on the source map cannot be further separated; this reflects the minimum visible distance in cartographic rules for polygon generalization. The minimum visible grid corresponds to the minimum visible distance d_min, which is approximately 0.2 mm in terms of map distance. The horizontal grid width of the hexagonal cell is calculated from the parameters S_source, d and d_min as

$w_h = \frac{d_{min}}{S_{source}} \cdot \frac{1}{d}$

Figure 8 shows the corresponding results obtained by implementing the preliminary operations in the hexagonal grid structure. In this case, the value of the minimum visible grid d is 1, and the corresponding horizontal grid width of the hexagon is w_h = d_min / S_source, indicating that the minimum visible distance corresponds to one cell at the source scale. For instance, at the original scale S_source of 1:50,000, the horizontal grid width is 0.2 mm × 10⁻³ / (1/50,000) = 10 m per cell. The size of SE B_2 in Figure 8(b) is 2, indicating that the target scale S_target is 1:150,000 or 1:200,000. With these defined parameters, the closing operator in Figure 8(c) not only merges adjacent polygons, as shown in the orange boxes, but also changes the weak connection between different parts of the polygon to a strong connection, as shown in green box 2. Moreover, the boundary becomes smoother by filling hexagonal cells in concave bends, which is apparent in the region of green box 1. As shown in Figure 8(e), the subsequent opening operator makes the polygon smoother by removing the red hexagons of convex bends and omitting the green cells of small individual polygons. Nevertheless, some important parts of polygons and large individual polygons can be deleted because of their narrow shape. This causes unreasonable results for polygon generalization. For example, the deletion of connecting parts P and Q destroys the integrity of the polygon merged in the first operation. Although the individual polygon M is narrower than SE B_2, it is sufficiently long to be represented on the map.
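The parameterization above can be sketched as follows. The rule mapping scales to SE order is inferred from the experiment settings in Section 4 (first- to fifth-order SEs for target scales from 1:100,000 to 1:450,000/1:500,000 with a 1:50,000 source); it is an assumption for illustration, not the paper's exact formula, and the function names are ours.

```python
# Sketch of the preliminary generalization step: closing followed by opening with a
# symmetric SE whose order depends on the source and target scales.
import math
from typing import Set, Tuple

Cell = Tuple[int, int, int]
D_MIN_MM = 0.2  # minimum visible map distance, in mm

def cell_width_m(source_denominator: int, d: int = 1) -> float:
    """Horizontal grid width w_h = (d_min / S_source) * (1 / d), in metres."""
    return (D_MIN_MM * 1e-3) * source_denominator / d

def se_order(source_denominator: int, target_denominator: int, d: int = 1) -> int:
    """Assumed rule: the SE order grows with the target/source scale ratio."""
    return max(1, math.ceil(d * target_denominator / (2 * source_denominator)))

def symmetric_se(order: int) -> Set[Cell]:
    """n-th order hexagonal neighborhood: all cells within grid distance n of the origin."""
    return {(x, y, -x - y)
            for x in range(-order, order + 1)
            for y in range(max(-order, -x - order), min(order, -x + order) + 1)}

def generalize(image: Set[Cell], se: Set[Cell]) -> Set[Cell]:
    """Preliminary operation M_i = closing then opening (operators as defined in Section 2)."""
    def translate(b, c): return {(c[0]+dx, c[1]+dy, c[2]+dz) for dx, dy, dz in b}
    def erode(a, b): return {c for c in a if translate(b, c) <= a}
    def dilate(a, b): return set().union(*(translate(b, c) for c in a)) if a else set()
    closed = erode(dilate(image, se), se)        # closing: aggregation, fill concave bends
    return dilate(erode(closed, se), se)         # opening: elimination, remove convex bends

if __name__ == "__main__":
    print(cell_width_m(50_000))        # -> 10.0 m per cell at 1:50,000
    print(se_order(50_000, 200_000))   # -> 2, matching SE B_2 in Figure 8(b)
```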

Corrective operations for reduced narrow parts
To solve these unreasonable deletions, Su et al. (1997a) designed an operation called restoration to restore polygons that have not been completely removed by the erosion operator or to exaggerate the corresponding features by a dilation operator with the SE of the target scale. This ensures the integrity of polygonal objects. However, this operation fails to restore individual polygons with large areas and narrow shapes. The use of a dilation operator for the restored parts directly results in an excessive widening.
To improve this, corrective operations are divided into two steps: detection of unreasonably reduced parts according to the minimum visible distance of cartographic rules, and processing of these parts according to feature types. In our study, the unreasonably reduced parts are defined as two categories based on the minimum visible distance: the parts that destroy the integrity of a polygon and the narrow parts that are sufficiently long to be expressed on the maps. The selection of the processing operations for these parts depends on their feature types. For example, collapse can be applied to polygons of rivers or residential types (Su et al. 1998), whereas exaggeration is usually applied to polygons of other general types (Li et al. 1997).
Taking polygons of general types as an example, the detailed procedure of the corrective operations with exaggeration implemented in the hexagonal grid structure is as follows (a code sketch of the detection and classification step is given after the discussion of Figure 9 below).
(1) Detect unreasonably reduced parts, including individual polygons and convex bends that are narrower but larger than the SE, and connecting parts: (a) In the preliminary result of polygon generalization, regroup the eliminated hexagonal cells according to their connectivity as a series of images E = {E_1, E_2, ..., E_n}, and regroup the retained cells as a series of images R = {R_1, R_2, ..., R_n}. (b) For each image E_i, count the number of adjacent images in R. Based on the counted result, E_i is classified as an individual polygon, a convex bend, or a connecting part when the number is 0, 1, or 2, respectively. (c) Mark the images in E with the number 0 and more cells than the SE, with the number 1 and more cells than the SE, or with the number 2, and store them in P = {P_1, P_2, ..., P_n} as the images to be processed.
(2) Extract the skeleton of the unreasonably reduced parts: (a) For each image P_i, obtain its skeleton image S_i with the thinning algorithm using six pairs of SEs. (b) Combine all skeletons from step (a) into image S.
(3) Exaggerate skeleton S using the dilation operator with the SE of the target scale to obtain image D, and merge it into the image of the preliminary result.
Figure 9 shows the corrective operations for the unreasonably deleted parts in the case shown in Figure 8. According to the first preprocessing step, the images of the individual polygon M and the connecting parts P and Q are detected for processing. The widths of the extracted skeletons are one cell. Notably, the exaggerated results are wider than the SEs in all directions. Consequently, the merged result within the red boxes in Figure 9(e) indicates that these corrective operations enable the restoration of an unreasonable elimination. However, it also can exaggerate the corresponding
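The detection step (1) above amounts to a connected-component analysis of the eliminated cells followed by an adjacency count against the retained components. The sketch below is a simplified reading of that procedure with hypothetical names (e.g. the size test `len(e) > se_size`), not the authors' implementation.

```python
# Classification of unreasonably reduced parts (step 1 of the corrective operations).
from typing import Set, Tuple, List, Dict

Cell = Tuple[int, int, int]
OFFSETS = [(-1, +1, 0), (0, +1, -1), (+1, 0, -1),
           (+1, -1, 0), (0, -1, +1), (-1, 0, +1)]

def components(cells: Set[Cell]) -> List[Set[Cell]]:
    """Group cells into edge-connected components on the hexagonal grid."""
    remaining, groups = set(cells), []
    while remaining:
        stack, group = [remaining.pop()], set()
        while stack:
            c = stack.pop()
            group.add(c)
            for dx, dy, dz in OFFSETS:
                n = (c[0] + dx, c[1] + dy, c[2] + dz)
                if n in remaining:
                    remaining.remove(n)
                    stack.append(n)
        groups.append(group)
    return groups

def classify_reduced_parts(original: Set[Cell], preliminary: Set[Cell],
                           se_size: int) -> Dict[str, List[Set[Cell]]]:
    """Label each eliminated component as an individual polygon (0 adjacent retained
    components), a convex bend (1), or a connecting part (2 or more)."""
    eliminated = components(original - preliminary)   # images E
    retained = components(preliminary)                # images R
    labels = {"individual": [], "convex_bend": [], "connecting": []}
    for e in eliminated:
        boundary = {(c[0] + dx, c[1] + dy, c[2] + dz)
                    for c in e for dx, dy, dz in OFFSETS}
        touched = sum(1 for r in retained if boundary & r)
        if touched == 0 and len(e) > se_size:
            labels["individual"].append(e)    # large but narrow individual polygon
        elif touched == 1 and len(e) > se_size:
            labels["convex_bend"].append(e)   # long convex bend worth exaggerating
        elif touched >= 2:
            labels["connecting"].append(e)    # part whose removal breaks integrity
    return labels
```

The components marked here are then skeletonized with the thinning algorithm of Section 2.2.5 and exaggerated by dilation with the target-scale SE, as in steps (2) and (3).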

Study data and parameters
The original data are a general type of woodland data from land use data at a map scale of 1:50,000 for a town located in Hunan Province, China, with an east-west span of 6508 m and a north-south span of 4812 m. There are 50 area patches with three characteristics, as shown in Figure 10: (1) Some area patches are too small to be retained, which can be utilized to assess the effect of the hexagon-based generalization method in elimination.
(2) The distribution of some patches is concentrated, whereas some patches are dispersed. This can be applied to test for aggregation as well as for the effect of boundary simplification. (3) The shapes of some patches are irregular and fragmented, which is suitable for measuring the corrective operations of narrow parts.
In our experiments, equivalent operations implemented in the square raster with the same method were set up as comparative experiments to measure the performance of the hexagon in raster-based polygon generalization. First, we utilized ArcGIS software (Version 10.2, Esri, Redlands, CA, USA) to convert the original experimental data in vector form into GeoJSON form. Then, the data in GeoJSON form were rasterized into both hexagonal raster and square raster forms, which was implemented in Visual Studio (Microsoft, Redmond, WA, USA) with the JavaScript language. Subsequently, based on the grid structure in the processed raster data, the hexagon-based method proposed in Section 3 and the square-based method with equivalent operations were developed in Visual Studio with the JavaScript language. The aggregation, elimination, and line simplification methods in ArcGIS software (version 10.2) were used to perform contrasting experiments with vector data. Aggregation and elimination were run in the [Coverage] tool, and line simplification was handled in the [Simplify polygon] tool with both the point-remove algorithm (Douglas and Peucker 1973) and the bend-simplify algorithm (Wang and Müller 1998).
During rasterization, the same scale between the hexagonal raster and the square raster was defined as the same area of the grid cell to ensure a fair comparison in the subsequent experiments. The minimum visible grid d was set to 1. Therefore, the horizontal grid width of the hexagon can be calculated as w_h = d_min / S_source, and the corresponding horizontal grid width of the square was defined as

$w_s = w_h \cdot \sqrt{\frac{\sqrt{3}}{2}}$

Using these estimated parameters, the original vector data were rasterized to hexagonal and square rasters with horizontal grid widths of 10 m and 9.3 m, respectively, at a scale of 1:50,000. It is difficult to set exactly equivalent SEs for square and hexagonal rasters. The purpose of this study is to explore the performance difference between the isotropy of hexagons and the inconsistent neighborhood connectivity of squares in polygon generalization. Therefore, the SEs were set as the first-order to fifth-order neighborhoods of squares and hexagons for map scales of 1:100,000, 1:150,000 or 1:200,000, 1:250,000 or 1:300,000, 1:350,000 or 1:400,000, and 1:450,000 or 1:500,000 (numbered 1-5 in Figure 11), respectively, which can directly reflect the differences in the characteristics of the neighborhood operations. The threshold values for the vector data were the minimum visible distances at the corresponding target scales, which are 20, 40, 60, 80 and 100 m.

Figure 12 shows the generalization results of the raster-based and the vector-based methods. As seen from the raster data at the source scale, rasterization with hexagons or squares is itself a process of generalization in which topologically proximal polygons are combined and the boundary cells of adjacent polygons are connected. Therefore, the number of polygons in the original hexagonal and square rasters decreases to 38 and 39, respectively. There is no significant difference between the results of rasterization with hexagons and squares.
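The equal-area relation between the two grids can be checked directly: a regular hexagon whose flat-to-flat (horizontal) width is w_h has area (√3/2)·w_h², so the square of equal area has side w_s = w_h·√(√3/2) ≈ 0.93 w_h, which reproduces the 10 m and 9.3 m cell widths above. A minimal sketch of this bookkeeping (function names are ours):

```python
# Equal-area conversion between hexagonal and square grid cells, reproducing the
# 10 m / 9.3 m cell widths reported for the 1:50,000 source data.
import math

def hexagon_area(w_h: float) -> float:
    """Area of a regular hexagon whose flat-to-flat (horizontal) width is w_h."""
    return (math.sqrt(3) / 2.0) * w_h ** 2

def equal_area_square_width(w_h: float) -> float:
    """Side length of the square with the same area: w_s = w_h * sqrt(sqrt(3)/2)."""
    return w_h * math.sqrt(math.sqrt(3) / 2.0)

if __name__ == "__main__":
    w_h = 10.0                                        # hexagon width at 1:50,000 (d = 1)
    w_s = equal_area_square_width(w_h)
    print(round(w_s, 1))                              # -> 9.3
    print(math.isclose(hexagon_area(w_h), w_s ** 2))  # -> True
```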

Comparison of generalization results
The hexagon-based method can simplify both the areal features and the exterior boundary, as shown in Figure 12(a). The degree of generalization of the results increases as the scale decreases, which is mainly reflected in three aspects: (1) the decreasing number of areal objects, which results from the aggregation of polygons with proximity relationships and the elimination of small objects; (2) the increasing simplification of polygon outlines, which is mainly caused by the removal of cells in convex bends and the filling of cells in concave bends; and (3) the more regular shape of areal objects, which results from the exaggeration or elimination of narrow parts in addition to boundary simplification.
At the same map scale, the degree of areal object simplification in the hexagonal raster results is higher than that in the vector results, but lower than that in the square raster results. To display the differences clearly, regions of sample 1 with representative polygons within the red boxes in Figure 10 were chosen to enlarge the results of generalization at scales 3, 4, and 5, as shown in Figure 13(a). The polygons in green box 1 are merged in the raster-based results at scale 3, whereas they remain separated in vector-based results until scale 4. As seen in green box 2, the adjacent parts of the polygon are merged by the square-based method at scale 3, while they are merged in the hexagonal raster results from scale 4. A similar excessive aggregation of the square-based method also occurs in the polygons within green boxes 3 and 4. In addition, the square raster derives excessive generalization results in elimination compared to the hexagonal raster, as displayed in the red box. Although the polygon in the red box is also omitted by the point-remove algorithm of the vector data, it is caused by the line simplification operator rather than the elimination operator. The green boxes 2, 3, and 4 and the red box show that the overgeneralization produced by the square raster in simplifying the areal objects occurs in polygons in the diagonal direction, owing to its inconsistent distance measurement. This disadvantage can be improved by designing SEs close to the circle (Su et al. 1997b), but the inherent inconsistency of squares cannot be completely addressed. The isotropy of the hexagon alleviates this problem.
The performance of the hexagon-based method in simplifying the exterior boundary is similar to that in simplifying the areal features. Figure 13(b) shows the details of the generalization results for sample 2 from scale 3 to scale 5. The operator in the raster-based generalization method that acts in this region is primarily line simplification. The hexagon-based method generates more simplified boundaries than either of the two vector-based line simplification methods. In comparison to the results of the point-remove algorithm, the boundaries simplified by the hexagon-based method are smoother, such as the boundary within the red ellipse. In contrast to the bend-simplify algorithm, the hexagon-based method generates more regular boundaries: it not only identifies and fills or omits insignificant small bends, such as the boundary in the red box, but also processes elongated bends with elimination or exaggeration. Compared to the hexagon-based method, the square-based method has a greater degree of line simplification owing to overfilling of concave bends and excessive removal of convex bends. As displayed in the red box, the convex bends operated on in the hexagonal raster are widened at scale 4 owing to their narrow but long shape at scale 3. However, some parts of the bends are omitted in the square results at scale 4, which is influenced by the overfilling of the convex bend in the green box and the excessive removal of the concave bend in the red box. In addition, the shape of the feature boundaries becomes straighter and more rectangular when simplified with the square raster, as shown by the blue ellipses in Figure 13, especially in the results at smaller scales. In contrast, the results of the hexagonal raster are smoother, which is more in line with the original shape features of the boundary.
To quantify the performance of the hexagon-based method in terms of the degree of generalization, two geometric indices, the number change ratio and the area change ratio, are expressed as the rate of change in the number and area of polygons at the target scale relative to the vector polygons at the original scale, respectively. Figure 14 shows a comparison of the calculated values at all target scales using the different methods. As shown in the results of scale 0 in Figure 14, the number change ratio values of the hexagon-based and square-based methods are 22% and 24%, respectively, which far exceed the value of 0% of the two vector-based methods. However, the area change ratio values of the hexagon-based and square-based methods are not significantly different from those of the two vector-based methods, being 1.38% and 1.39%, respectively. This demonstrates that the results of rasterization at the original scale in both the square grid and hexagonal grid structures only merge all topologically proximal polygons and a few adjacent polygons whose boundary cells are connected, and they can capture the original shape features of the polygons in vector form.
The two indices in all the methods exhibit an increasing tendency as the scale decreases. The values of the vector-based method at any scale are lower than those of the two raster-based methods, which demonstrates the higher degree of polygon generalization achieved by the raster-based methods. The number change ratio is theoretically determined by the aggregation and elimination operators. A comparison of the values displayed in Figure 14(a) shows that the estimated ratio of the number change in the hexagon-based method is lower than that in the square-based method from scales 1 to 5. This difference increases with the decreasing scale, as the degree of inconsistent distance measurement increases as the number of square cells increases, resulting in a larger difference in the generalization degree between the diagonal and orthogonal directions. The estimated area change ratio value is much higher in the raster data than in the vector data for scale 2. This difference increases with decreasing scale, resulting from the exaggeration operation of narrow parts in raster-based methods, which is not included in vector-based operations. Moreover, the value of this ratio in the square result is almost twice that of the hexagonal result at each scale.
The other two indices, the coefficient of area correspondence (CAC; Taylor 1977, Stum et al. 2017) and the compactness index (CI; Frolov 1975), are applied to evaluate the changes in the shape of the generalized polygonal objects. Here, CAC(i) and CI(i) are the values of the coefficient of area correspondence and compactness for the generalization results at scale i, respectively; Area_m is the area of the regions contained in both the original polygons and the generalized polygons; Area_o is the area of the regions contained in the original polygons but not in the generalized polygons; Area_c is the area of the regions contained in the generalized polygons but not in the original polygons; Area_t is the area of the generalized polygons; and Peri_t is the perimeter of the generalized polygons. Theoretically, a higher CAC value indicates a better match between the generalization result and the objects at the original scale.
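The exact equations for these two indices are not reproduced in this excerpt. Formulations commonly used in the cited literature, and consistent with the variables defined above, are (these exact forms are an assumption on our part):

$CAC(i) = \dfrac{Area_m}{Area_m + Area_o + Area_c}, \qquad CI(i) = \dfrac{2\sqrt{\pi \cdot Area_t}}{Peri_t}$

Both behave as described in the text: CAC approaches 1 as the generalized polygons coincide with the originals, and CI approaches 1 as a polygonal object approaches a circle.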
Compactness was proposed to describe polygonal circular curvature, that is, the degree to which a polygonal object approximates a circle. It is an indicator of the degree of concavity of a polygonal shape (MacEachren 1985) and the overall narrowness of an areal target (Jia 2015). Therefore, a larger CI indicates a more regular and smoother polygonal object shape. Figure 15 compares the calculated values of CAC and CI based on the four generalization methods at all scales. Figure 15(a) shows that the CAC values in the hexagon-based results are greater than those in the point-remove algorithm results and the square-based results, but lower than those in the bend-simplify algorithm results. This implies that the hexagon-based method performs better in matching the original polygons than the square-based method and the vector-based method with the point-remove algorithm. For the estimated CI values shown in Figure 15(b), the compactness of both the square-based and hexagon-based results is greater than that of the vector data, and the difference increases as the scale decreases. This indicates that generalization based on raster data results in a more regular shape for polygonal objects because of the specially designed operation on narrow parts. Only at scales 2 and 3 does the square-based method produce slightly higher values than the hexagon-based method; at the remaining scales, the hexagon-based method has higher values with more significant differences. This is because the geometric shapes of the hexagon and the SEs composed of it are closer to a circle than those of the square, making the boundaries simplified by the hexagon-based method more regular and smoother.

Assessment of the applicability and running time
The other four land use types (orchard, dry land, paddy field and countryside) of polygonal data from the same data source, with the same experimental parameters as the experiments in Section 4.2, are used to measure the applicability and running time of the polygon generalization method proposed in this study. The characteristics and properties of these four types of polygonal data are quite diverse, as shown in Figure 16 and Table 1. The orchard-type polygons, with the largest CI values, have regular shapes and an aggregated distribution. The distribution of area patches of the dry land and countryside types is relatively discrete, and their shapes are long and narrow. The paddy field data have the largest area and the most complex shape. The objects in these data are in a clustered distribution; however, they are irregularly shaped, partly narrow, and partly fragmented. There is little difference in the standard deviations of the CI values of these four land use types of polygon data, which indicates that the degrees of shape variation among polygonal objects in the various land use types are approximately the same.
To display the generalization results of the four types of polygonal data, the objects of the orchard, dry land, paddy field and countryside types within the red boxes a, b, c, and d in Figure 16 are enlarged in Figure 17. In addition, the results of rasterization at the original scale and the generalization results at scales 3 and 5 are compared between the hexagonal and square rasters. In the results obtained using the hexagon-based method, the small objects are omitted, the adjacent objects are merged, and the polygonal boundaries are simplified. The polygon generalization method proposed in Section 3 thus has strong applicability to various general types of areal objects. When comparing the generalization results of the hexagonal raster and square raster, the conclusions are similar to those of the experiments with the woodland data. The square-based results have a higher degree of generalization in aggregation and elimination, as shown by the green boxes in Figure 17. They exhibit poorer performance in maintaining the smooth characteristics of the original features in terms of line simplification, as indicated by the red boxes in Figure 17.
Theoretically, the time complexities of the rasterization and generalization methods proposed in this study are O(n). Table 1 shows that original data with larger volumes require more time to be converted into the two types of raster data. The rasterization efficiency in the hexagonal grid structure is higher than that in the square grid structure. However, in the actual generalization process, the running time depends not only on the data volume but also on the size of the SE and the shape of the polygons. Figure 18 compares the running times of the various types of polygonal data using the hexagon-based and square-based methods. The running time of the two raster-based methods increases as the target scale decreases for any type of polygonal data, because more cells in the larger SE need to be computed. Comparing the running times of the woodland and countryside types, although the latter has a smaller data volume, its more complex shape, reflected by the lower CI value, results in a longer running time for the two raster-based methods. From the estimated running times of the orchard type, with the smallest area and a higher CI value, and the paddy field type, with the largest area and the lowest CI value, polygons with a smaller data volume and regular shapes are generalized by the two raster-based methods in a shorter running time, and the time gap between the two raster-based methods for these polygons is smaller. Notably, the running time of the hexagon-based method for any type of polygon is much shorter than that of the square-based method at any scale. The time gap between the two raster-based methods increases with the SE size. The largest gap appears in the generalization of the paddy field polygons at scale 5, where the running time is 3.3 min in the hexagonal raster and 57.3 min in the square raster. Thus, the hexagonal grid structure is more efficient for implementing raster-based polygon generalization algorithms, particularly for the generalization of objects with large areas and complex shapes at small scales.

Conclusion
This study proposed an innovative raster-based method for polygon generalization with hexagons utilizing morphological operators. With the exception of Raposo (2013), who proposed a line simplification algorithm that relies on resampling hexagonal tessellations of variable scale, all previous studies on map generalization with raster data were based on a square grid structure, ignoring that the inherently inconsistent distance measurement among the eight neighbors of a square leads to inconsistent generalization results between the diagonal and orthogonal directions. As an alternative grid geometry, hexagons are more suitable for generating a grid structure for raster data owing to their consistent connectivity, isotropic local neighborhoods, higher symmetry, compact structure and visual advantages. Utilizing the superiority of hexagons in neighborhood processing to improve polygon generalization, we applied mathematical morphology in a hexagonal grid structure to achieve preliminary operations, such as aggregation, elimination, and simplification of boundaries, and designed corrective operations for the unreasonable generalization of narrow parts.
During the experiments measuring the effectiveness of the proposed hexagon-based method, equivalent operations were implemented in the square raster, and vector data were utilized for the contrasting experiments.
Experiments with the two raster-based methods resulted in three findings. (1) The degree of generalization of the hexagon-based method is lower than that of the square-based method and is more balanced in each direction. The square-based method is prone to excessive aggregation and elimination in the diagonal directions, as well as excessive deletion of convex bends and filling of concave bends in line simplification.
(2) The shape of the simplified polygons obtained using the hexagon-based method matches better with the original polygons. In addition, the simplified boundaries are smoother and more regular in the results of the hexagon-based method, whereas they tend to be straighter and right-angled in the results of the square-based method.
(3) The hexagonal raster is more efficient than the square raster in running time, especially in the generalization of polygons with large areas and complex shapes at small scales.
Compared to the results of the vector-based method, the following conclusions can be drawn for the hexagon-based method. (1) The hexagon-based method is simpler than the vector-based method in terms of the implementation of the algorithm. Different algorithms in the vector-based method are required for areal object and exterior boundary simplifications. However, the hexagon-based method can accomplish both simplifications. (2) The hexagon-based method has a higher degree of generalization than the vector-based method. (3) The shape of the polygons simplified by the hexagon-based method matches the original polygons better than those simplified by the point-remove algorithm but worse than those simplified by the bend-simplify algorithm. The boundary simplified by the hexagon-based method is smoother than that simplified by the point-remove algorithm and more regular than that simplified by the bend-simplify algorithm.
Future research should focus on improvements and applications.
(1) This method should be applied to the generalization of areal objects of different types or shape features, such as buildings of artificial geographical elements and rivers with linear shapes.
(2) The proposed polygon generalization method must be improved for polygons of various types. It was designed only for polygonal data of a single general type. Therefore, constraints, both in semantics and topology, should be considered in the improved generalization method for polygons of various types. (3) The hexagonal grid structure provides an innovative framework for map generalization. Future studies should explore its applicability to other generalization operators such as displacement and collapse.
Yilang Shen is an Associate Professor at Sun Yat-Sen University. His research interests include computer vision, map generalization, and change detection of spatial data. His main contribution to the paper is experimental design.
Min Yang is an Associate Professor at Wuhan University. His research interests include map generalization, machine learning, and spatial big data analysis. His main contribution to the paper is experimental design.

Data and codes availability statement
The study data and codes that support the findings of this study are available at a public link: https://doi.org/10.6084/m9.figshare.14980725