Geo-indistinguishable masking: enhancing privacy protection in spatial point mapping

ABSTRACT Spatial point mapping is a useful practice in exploratory point pattern analysis, but it poses significant privacy risks as the identity of individuals may be revealed from the maps. Geomasking methods have been developed to mitigate the risks by displacing spatial points before mapping. However, many of these methods rely on a weak privacy notion called spatial k-anonymity, which is insufficient to withstand the growing amount of spatial data (e.g. land use) that adversaries can use as side information to infer the actual locations of individuals. We propose a method called geo-indistinguishable masking to address this issue by relying on a strong privacy notion called geo-indistinguishability. This notion ensures consistent levels of privacy protection regardless of any side information. The method consists of two steps. The first step involves creating a masking area for each spatial point to include a set of candidate locations to which the point can be relocated. In the second step, we formulate an optimization model to ensure the masked locations satisfy geo-indistinguishability while minimizing the distance displaced. Computational experiments on a synthetic dataset demonstrate that our proposed method is both efficient and effective in providing strong privacy protection while preserving the spatial point patterns.


Introduction
Spatial point data refers to information that is associated with a specific location, usually defined by geographic coordinates, such as latitude and longitude. This type of data often includes precise locations from mobile devices, global positioning system (GPS) trackers, or physical addresses. In the past decade, the growing use and demand for location-based applications and services, combined with the widespread adoption of GPS-enabled devices, smartphones, and other connected devices, have led to an enormous increase in the generation and collection of spatial point data (Lee & Kang, 2015). To extract meaningful insights from this data, a variety of methods have been developed, with mapping being one of the most commonly used and effective approaches (Baddeley et al., 2015). The practice of spatial point mapping enables gaining insights into geographic patterns and relationships in an intuitive and accessible manner and is essential for research and policymaking in fields such as health, crime, and demography (Bakillah et al., 2014; Chainey & Ratcliffe, 2013; Cromley & McLafferty, 2011).
Spatial point mapping can raise serious privacy concerns, especially when it involves personal information that can be used to identify an individual, such as name, phone number, or physical address (Armstrong & Ruggles, 2005; Boulos et al., 2009; Fisher & Dobson, 2003). The disclosure of such information could be exploited to reinforce social control and the exercise of power (Crampton, 2003; Dobson & Fisher, 2003; Lyon, 2010), as well as perpetuate existing inequalities and discrimination by subjecting certain population groups (e.g. HIV patients) to disproportionate surveillance (Crampton, 2007; Curry, 1997). To address these privacy concerns, spatial point data is typically anonymized before mapping, aiming to remove personal information so that individuals remain anonymous (El Emam & Arbuckle, 2013). However, even with anonymized data, spatial point mapping can still be used to reveal the identity of individuals through methods such as reverse geocoding and linkage to public auxiliary data (Armstrong & Ruggles, 1999; Brownstein et al., 2006; Curtis et al., 2006a, 2006b; Kounadi et al., 2013). Brownstein et al. (2005), for example, illustrate the use of common geographic information system (GIS) techniques, such as vectorization and buffering, to reverse geocode patient addresses from raster point disease maps.
Geographic masking, also known as geomasking, is a commonly used technique for enhancing the confidentiality and privacy of individuals when mapping anonymized spatial point data (Armstrong et al., 1999). This type of approach involves introducing a controlled level of displacement to the original data to prevent the release of fully identifiable raw data that can lead to the identification of specific individuals (Armstrong et al., 1999; Kwan et al., 2004; Leitner & Curtis, 2004, 2006; Zimmerman & Pavlik, 2008). Many geomasking methods incorporate or are evaluated using the privacy notion of spatial k-anonymity, which requires that an individual's location remains indistinguishable among a set of k spatial points (Allshouse et al., 2010; Charleux & Schofield, 2020; Hampton et al., 2010; Hasanzadeh et al., 2020; Kounadi & Leitner, 2016; Richter, 2018; Seidl et al., 2015; Swanlund et al., 2020; Zhang et al., 2017). One common method to achieve this is to replace the individual's location with a random point in a region containing at least k - 1 other spatial points to ensure that an adversary cannot pinpoint the individual with a probability greater than 1/k (Ghinita et al., 2010). However, the effectiveness of spatial k-anonymity depends on the assumption that all spatial points in the region appear equally likely to be the true location from the adversary's perspective (Andrés et al., 2013; Chatzikokolakis et al., 2015; Dong et al., 2018). With the increasing availability of public auxiliary spatial data, such as point-of-interest (POI), land use, and building footprint data, adversaries can now easily obtain side information that can be used to rule out some of the unrealistic locations in the region, thereby violating the spatial k-anonymity notion. Therefore, spatial k-anonymity may not be sufficient to ensure adequate location privacy protection in spatial point mapping.
The purpose of this paper is to develop an effective geomasking approach that enhances privacy protection in spatial point mapping. We address the shortcomings of existing methods by adopting a well-established location privacy notion known as geo-indistinguishability, which offers privacy protection against adversaries with access to any side information (Andrés et al., 2013). While geo-indistinguishability has been successfully applied in location-based service (LBS) systems (Andrés et al., 2013; Chen et al., 2022; Yan et al., 2022), its potential in the context of cartography and mapping remains largely unexplored. Considering that the prevailing masking methods used in mapping primarily rely on the concept of spatial k-anonymity, which may not offer a sufficiently strong privacy guarantee, this study introduces a novel approach called geo-indistinguishable masking. By integrating the principles of geo-indistinguishability with masking techniques specifically designed for mapping purposes, our approach aims to provide an enhanced privacy solution.
In the next section, we discuss the motivation for this research by highlighting the limitations of current geomasking methods and introducing the concept of geo-indistinguishability. We then present our method of geo-indistinguishable masking and demonstrate its effectiveness through a set of computational experiments. Finally, we discuss the limitations of our approach and future research directions.

Background
Spatial point mapping typically begins with collecting data on individuals, which involves geographic coordinates indicating their locations. This data is then anonymized by removing any personal information. Once the data has been anonymized, mapping can be performed to visually represent the spatial points using dot symbols based on geographic coordinates. However, it is still possible to perform de-anonymization to reverse engineer the raw spatial point data from these maps. As discussed earlier, standard GIS techniques can be used to deduce the physical addresses of individuals based on their mapped locations (Figure 1b). By linking the addresses to public auxiliary data such as voter registration and POI data, the raw spatial point data can be recovered, compromising the privacy of those whose locations have been mapped (Figure 1c). This can be particularly concerning for individuals whose location information is sensitive, such as patients and crime survivors.

Existing geomasking methods
Geomasking methods have been developed since the 1990s as a means to displace spatial point data in mapping and preserve privacy. These methods can be broadly classified into three main categories: point aggregation, affine transformation, and random perturbation, depending on whether they preserve the number of spatial points and whether they randomly relocate points (Wang et al., 2022). Point aggregation involves grouping the raw data into a smaller number of spatial points to conceal the true individual locations (Armstrong et al., 1999). Affine transformation utilizes a series of rotation, scaling, and translation operations to relocate spatial points for privacy protection while retaining their number (Armstrong et al., 1999). Random perturbation involves relocating spatial points by adding random noise to the geographic coordinates of each point. The noise can be added uniformly to relocate a spatial point to any location within a region around the original location, a practice known as naive random perturbation (Armstrong et al., 1999). Alternatively, it can be generated from a probability distribution to relocate spatial points in a more controlled manner, which is known as random perturbation with distribution functions (Zimmerman & Pavlik, 2008). In addition, Zhang et al. (2017) propose a novel random perturbation method called location swapping, which relocates a spatial point to a random location among others with physical addresses, which helps avoid unrealistic locations such as water bodies.
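To make naive random perturbation concrete, the following minimal Python sketch relocates a point uniformly at random within a disc around its original location. The function name, the disc-shaped region, and the use of projected coordinates in meters are assumptions made for illustration; they are not taken from Armstrong et al. (1999).

```python
import math
import random


def naive_random_perturbation(point, max_radius, rng=random):
    """Relocate a point uniformly at random within a disc of radius max_radius."""
    x, y = point
    # Taking the square root makes the draw uniform over the disc's area
    # instead of concentrating points near the center.
    r = max_radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return (x + r * math.cos(theta), y + r * math.sin(theta))


# Hypothetical point in projected coordinates (meters), masked within 250 m.
rng = random.Random(42)
masked = naive_random_perturbation((1000.0, 2000.0), 250.0, rng)
print(masked)
```

Because the draw ignores what lies at the destination, the masked point may fall on a water body or an empty field, which is the weakness that location swapping is meant to avoid.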
Many of these geomasking methods incorporate or are evaluated based on a privacy notion called k-anonymity (Sweeney, 2002a, 2002b). Initially developed for tabular data, k-anonymity protects individual privacy by ensuring that the information for each person in a dataset cannot be distinguished from the information of at least k - 1 other individuals. Spatial k-anonymity is a variation of this notion that is used for location privacy, which requires that each spatial point be indistinguishable from at least k - 1 other locations (Ghinita et al., 2010). In this case, the probability of an adversary discovering the actual location of an individual will be at most 1/k. One way to achieve spatial k-anonymity is to perform point aggregation by grouping every k spatial points into one (Kounadi & Leitner, 2016). Another approach is based on random perturbation, which involves adding random noise to relocate the original location of an individual within a region that contains k - 1 other individuals (Allshouse et al., 2010; Charleux & Schofield, 2020; Hampton et al., 2010; Hasanzadeh et al., 2020; Kounadi & Leitner, 2016).
Although spatial k-anonymity does not explicitly consider side information, its validity depends on the assumptions made about the adversary's knowledge. For example, the k - 1 candidate locations used in location swapping are only effective if they appear equally likely to be the actual location from the adversary's perspective. If any side information is available that can rule out certain locations as improbable, the notion of spatial k-anonymity would be violated (Andrés et al., 2013; Chatzikokolakis et al., 2015; Dong et al., 2018). To address this limitation, various methods have been developed to incorporate potential side information that adversaries may possess, such as land use (Zhang et al., 2017), real property parcels (Richter, 2018), and road networks (Swanlund et al., 2020), while utilizing spatial k-anonymity. However, these methods also have their own drawbacks. One of the main limitations is that they require auxiliary data, such as land use data, which may not be available in all areas. The availability of such data typically depends on the location and the data collection methods used by local governments and organizations. In underdeveloped regions, such as those in the Global South, the availability of such data can be limited, making the application of these methods challenging. Another limitation is that the adversary's actual side information may be inconsistent with the assumptions made by the methods. Even if the methods could consider all side information that is accessible today, they are not future-proof, as they cannot guarantee protection against side information that may arise later. Therefore, spatial k-anonymity may not be sufficiently robust or practical to ensure privacy protection given the growing amount of side information that adversaries may possess.

Geo-indistinguishability
Geo-indistinguishability is a location privacy notion that provides stronger protection compared to spatial k-anonymity by safeguarding against adversaries with any side information, regardless of whether it is currently accessible or may be obtained in the future (Andrés et al., 2013; Bordenabe et al., 2014). This privacy notion is a generalization of differential privacy, which is commonly used for protecting aggregated data (Dwork et al., 2006). Geo-indistinguishability achieves privacy protection by relocating the point locations of individuals. It ensures that a person enjoys $\ell$-privacy within radius $r$ if any two points at a distance of at most $r$ yield similar locations after relocation, where the level of similarity is dependent on $\ell$. The smaller $\ell$ is, the higher the level of similarity, and thus the higher the level of privacy. To ensure that spatial point data remains useful after relocation, $\ell$ should be proportional to the radius $r$ (i.e. $\ell = \epsilon r$), where $\epsilon$ is a user-defined privacy parameter. Here, $\epsilon$ represents the privacy level per unit of distance, and its value is influenced by the choice of distance unit. For example, if $\epsilon = 0.1$ and distances are measured in meters, the level of privacy for points one kilometer away is $\ell = 1000\epsilon$. Changing the unit to kilometers would require setting $\epsilon = 100$ to maintain consistency.
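The unit dependence described above can be checked with a short worked calculation. The 10 m case in the second line is an added illustration rather than a value used in the paper; it shows the factor $e^{\ell}$ that bounds how much the relocation probabilities of two points may differ.

$$\ell = \epsilon r = 0.1\,\text{m}^{-1} \times 1000\,\text{m} = 100 = 100\,\text{km}^{-1} \times 1\,\text{km}$$

$$r = 10\,\text{m}: \quad \ell = 0.1\,\text{m}^{-1} \times 10\,\text{m} = 1, \qquad e^{\ell} = e \approx 2.72$$

In other words, the product $\epsilon r$ is unit-free, so rescaling the distance unit by a factor of 1000 must be offset by rescaling $\epsilon$ by the same factor.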
A mechanism satisfies $\epsilon$-geo-indistinguishability if, for any radius $r \geq 0$, an individual enjoys $\epsilon r$-privacy within $r$. The privacy parameter $\epsilon$ determines the level of protection provided by geo-indistinguishability, with small values of $\epsilon$ providing a high level of privacy protection. Formally, $\epsilon$-geo-indistinguishability can be defined as follows. Let $X$ denote the set of all possible locations of a spatial point, and let $Y$ be the set of candidate locations to which the point can be relocated. A probabilistic mechanism is defined by a set of probabilities $p_{ij}$, each specifying the probability of relocating a spatial point from location $i \in X$ to location $j \in Y$. The mechanism satisfies $\epsilon$-geo-indistinguishability if, for any locations $i, i' \in X$ and any candidate location $j \in Y$, the following inequality holds:

$$p_{ij} \leq e^{\epsilon d_{ii'}}\, p_{i'j} \tag{1}$$

where $d_{ii'}$ is the Euclidean distance between locations $i$ and $i'$. The notion of geo-indistinguishability implies that, given a fixed $\epsilon$ value, the level of indistinguishability decreases with distance. This means that within a short radius, such as $r = 100$ m (represented by a dark gray circle in Figure 2), the difference between probabilities $p_{ij}$ and $p_{i'j}$ is relatively small, ensuring that an adversary cannot accurately discern the actual location of an individual within this small radius. On the other hand, at greater distances, such as $r = 10$ km (represented by a light gray circle in Figure 2), the difference between probabilities $p_{ij}$ and $p_{i'j}$ becomes more significant, allowing for the distinction between two vastly different locations, such as a city center and a remote farm.
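The definition can be verified numerically for a small mechanism. The sketch below is illustrative only: the function name, the nested-list layout of the probabilities, and the two-location example are assumptions rather than part of the proposed method.

```python
import math


def satisfies_geo_ind(probs, dist, epsilon, tol=1e-9):
    """Check p_ij <= exp(epsilon * d_ii') * p_i'j for all i, i' and j.

    probs[i][j]: probability of relocating from location i to candidate j.
    dist[i][i2]: distance between locations i and i2.
    """
    n = len(probs)
    for i in range(n):
        for i2 in range(n):
            bound = math.exp(epsilon * dist[i][i2])
            for p, q in zip(probs[i], probs[i2]):
                if p > bound * q + tol:
                    return False
    return True


# Two locations 100 m apart and two candidate locations.
dist = [[0.0, 100.0], [100.0, 0.0]]
mechanism = [[0.6, 0.4], [0.4, 0.6]]

print(satisfies_geo_ind(mechanism, dist, epsilon=0.01))   # True: 0.6 <= e^1 * 0.4
print(satisfies_geo_ind(mechanism, dist, epsilon=0.001))  # False: 0.6 > e^0.1 * 0.4
```

The same mechanism passes for the larger $\epsilon$ and fails for the smaller one, which reflects that a smaller $\epsilon$ demands more similar relocation probabilities from nearby points.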
It can be theoretically demonstrated that $\epsilon$-geo-indistinguishability provides effective protection for sensitive location information even when adversaries possess arbitrary side information. By utilizing a prior distribution $\pi$ on $X$ to represent the adversary's side information, we can analyze the information gain of the adversary's posterior knowledge $\sigma$ over $\pi$ after observing the obfuscated location $j \in Y$. The information gain is found to be bounded by $e^{\epsilon d_{\max}}$, specifically $\sigma / \pi \leq e^{\epsilon d_{\max}}$, where $d_{\max}$ corresponds to the maximum distance between any two locations in $X$. Importantly, this bound holds regardless of the specific prior distribution $\pi$. For a detailed proof, please refer to the Appendix. This result is a natural adaptation of a well-known interpretation of standard differential privacy, stating that the adversary's knowledge does not increase due to the observation, irrespective of their side information (Dwork et al., 2006).
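A brief sketch of why such a bound holds is given here for orientation; the subscripted notation $\pi_i$ and $\sigma_i$ (prior and posterior probability of location $i$) is introduced for illustration, and the Appendix proof may proceed differently. By Bayes' rule, after observing $j$,

$$\sigma_i = \frac{\pi_i\, p_{ij}}{\sum_{i' \in X} \pi_{i'}\, p_{i'j}}.$$

Geo-indistinguishability requires $p_{ij} \leq e^{\epsilon d_{ii'}}\, p_{i'j}$, so $p_{i'j} \geq e^{-\epsilon d_{ii'}}\, p_{ij} \geq e^{-\epsilon d_{\max}}\, p_{ij}$ for every $i' \in X$. Since the prior sums to one,

$$\sum_{i' \in X} \pi_{i'}\, p_{i'j} \;\geq\; e^{-\epsilon d_{\max}}\, p_{ij} \sum_{i' \in X} \pi_{i'} \;=\; e^{-\epsilon d_{\max}}\, p_{ij},$$

and therefore

$$\frac{\sigma_i}{\pi_i} = \frac{p_{ij}}{\sum_{i' \in X} \pi_{i'}\, p_{i'j}} \;\leq\; e^{\epsilon d_{\max}}.$$

The prior $\pi$ cancels out of the final bound, which is why the guarantee does not depend on what side information the adversary holds.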

Geo-indistinguishable masking
We propose a geomasking method, called geo-indistinguishable masking, which utilizes the notion of geo-indistinguishability to protect privacy in spatial point mapping. Figure 3 outlines the two primary steps of our method. In the first step, we generate a masking area around each spatial point to be protected. Each masking area contains a set of candidate locations with physical addresses to relocate the spatial point. In the second step, an optimization model is formulated to determine the probability of relocating each spatial point to each of the candidate locations in its masking area, while ensuring that geo-indistinguishability is satisfied. A masked location can be obtained by drawing a realization from the probability distribution obtained for the spatial point.

Defining masking areas
We use the k-nearest neighbors (k-NN) method (Cover & Hart, 1967) to generate a masking area for each spatial point that requires relocation. Each masking area contains the k candidate locations with physical addresses that are nearest to a spatial point. Such a masking area is necessary to improve the computational efficiency of the method, as considering all possible locations with physical addresses would significantly increase the computational burden required to satisfy geo-indistinguishability. Fortunately, given the definition of geo-indistinguishability, spatial points typically will not be relocated to remote locations, and therefore it is safe for us to consider only a subset of locations near the original points.
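A masking area of this kind can be sketched with a brute-force nearest-neighbor search. The function name and the tuple representation of addresses are assumptions for illustration, and the sketch excludes the point's own address from the candidates; a spatial index would be preferable for datasets of realistic size.

```python
import math


def masking_area(point, addresses, k):
    """Return the k address locations nearest to point, excluding point itself."""
    others = [a for a in addresses if a != point]
    # Brute-force search: sort all other addresses by Euclidean distance.
    return sorted(others, key=lambda a: math.dist(point, a))[:k]


# Hypothetical address points in projected coordinates (meters).
addresses = [(0, 0), (1, 0), (0, 2), (5, 5), (3, 0)]
print(masking_area((0, 0), addresses, k=2))  # [(1, 0), (0, 2)]
```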

Determining optimal masked locations
A probabilistic mechanism is used to determine the optimal relocation of each spatial point for privacy protection. Specifically, we define $p_{ij}$ as the transition probability of relocating a spatial point from its original location $i$ to a candidate location $j$ within the masking area. Using a probabilistic mechanism offers several advantages over a binary policy of whether or not to move a point to a specific location (Lin & Xiao, 2023a). For instance, it allows for a range of optimal relocations to be considered, enabling a thorough exploration of potential outcomes when relocating spatial points in geography. In addition, the use of probabilities introduces randomness into the relocation process, which helps to preserve privacy in spatial point mapping.
We formulate an optimization model to determine the optimal transition probabilities $p_{ij}$ for each spatial point that needs to be protected. This model ensures that the notion of geo-indistinguishability is satisfied. However, geo-indistinguishability requires the original spatial point to be displaced to a certain distance, which can compromise the usefulness of the data for subsequent mapping practices. Therefore, we aim to minimize the distance displaced while still satisfying geo-indistinguishability. The following are the input parameters and decision variables of the optimization model.
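Once transition probabilities are available for a point, drawing its masked location is a weighted random choice. The sketch below uses Python's standard `random.choices`; the function name and the example probabilities are hypothetical.

```python
import random


def draw_masked_location(candidates, probabilities, rng=random):
    """Draw one candidate location according to its transition probability."""
    return rng.choices(candidates, weights=probabilities, k=1)[0]


# Hypothetical masking area with three candidates and their probabilities.
candidates = [(10.0, 0.0), (0.0, 20.0), (30.0, 30.0)]
probabilities = [0.6, 0.3, 0.1]
rng = random.Random(1)
print(draw_masked_location(candidates, probabilities, rng))
```

Repeating the draw with different seeds yields different masked maps from the same probabilities, which is the source of randomness mentioned above.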
Input parameters:
$\epsilon$ = privacy parameter in geo-indistinguishability
$X$ = set of all possible locations of a spatial point in a masking area
$Y$ = set of candidate locations to which a point can be relocated in a masking area
$d_{ij}$ = distance between locations $i$ and $j$

Decision variables:
$p_{ij}$ = probability of relocating a spatial point from location $i$ to $j$

Specifically, we define $Y$ as the set of all possible locations in $X$, except for the actual location of the spatial point that needs to be protected. This ensures that the masked location will not be the same as the true original location of the point.
For each spatial point that requires relocation, an optimization problem is formulated as follows:

$$\min \sum_{i \in X} \sum_{j \in Y} d_{ij}\, p_{ij} \tag{2}$$

subject to

$$p_{ij} \leq e^{\epsilon d_{ii'}}\, p_{i'j} \quad \forall\, i, i' \in X,\ j \in Y \tag{3}$$

$$\sum_{j \in Y} p_{ij} = 1 \quad \forall\, i \in X \tag{4}$$

$$0 \leq p_{ij} \leq 1 \quad \forall\, i \in X,\ j \in Y \tag{5}$$

Objective 2 minimizes the expected distance displaced. Constraints 3 state that the transition probability $p_{ij}$ from location $i$ to $j$ should be less than or equal to $e^{\epsilon d_{ii'}} p_{i'j}$, where $i'$ is another location within $X$, $j$ is a candidate location within $Y$, and $\epsilon$ controls the level of privacy. These constraints ensure that the masked location satisfies geo-indistinguishability. Constraints 4 require that each spatial point be relocated to one and only one candidate location, by ensuring that the sum of transition probabilities to all candidate locations is one. Constraints 5 define the range of the decision variables by bounding the transition probabilities between zero and one. In summary, our optimization model balances the objective of minimizing the distance displaced while satisfying geo-indistinguishability. This optimization problem can be solved with linear programming solvers such as Gurobi, CPLEX, and COIN-OR.

Our masking approach shares similarities with the location swapping method introduced by Zhang et al. (2017). In both methods, each spatial point is moved to a new location with a physical address, instead of considering the universe of all possible points that include those in unlikely human habitation areas such as water bodies. In addition, both geo-indistinguishable masking and location swapping can be achieved by selecting a candidate location within a masking area delineated using k-NN. However, there are notable differences between the two methods. In location swapping, a candidate location is randomly chosen within the k-NN masking area to meet the requirements of spatial k-anonymity (Figure 4b). Conversely, geo-indistinguishable masking selects the optimal candidate location within the masking area to minimize the displacement distance while satisfying the privacy parameter defined by geo-indistinguishability (Figure 4a). This privacy notion is stronger than spatial k-anonymity and forms the foundation of our approach.
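The linear program described in this section can be sketched in Python for a single, very small masking area. This is not the implementation used in the experiments: it substitutes SciPy's `linprog` (HiGHS backend) for the solvers named above, assumes the expected displacement is summed with equal weight over every location in $X$, and uses a hypothetical function name and toy coordinates.

```python
import numpy as np
from scipy.optimize import linprog


def solve_masking_lp(coords, true_index, epsilon):
    """Solve the transition-probability LP for one masking area.

    coords: (x, y) locations in X; coords[true_index] is the point to protect.
    Returns (candidate indices Y, matrix P with P[i, c] = p_ij, distances).
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances between all locations in X.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Y = X without the true location, so the point never stays in place.
    cand = [j for j in range(n) if j != true_index]
    m = len(cand)

    # Objective: total expected displacement, variables ordered row by row.
    cost = dist[:, cand].ravel()

    # Geo-indistinguishability: p_ij - exp(eps * d_ii') * p_i'j <= 0.
    rows = []
    for i in range(n):
        for i2 in range(n):
            if i == i2:
                continue
            for c in range(m):
                row = np.zeros(n * m)
                row[i * m + c] = 1.0
                row[i2 * m + c] = -np.exp(epsilon * dist[i, i2])
                rows.append(row)
    a_ub = np.array(rows)
    b_ub = np.zeros(len(rows))

    # Each row of P must be a probability distribution over the candidates.
    a_eq = np.zeros((n, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0
    b_eq = np.ones(n)

    result = linprog(cost, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                     bounds=(0.0, 1.0), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return cand, result.x.reshape(n, m), dist


# Toy masking area: coordinates in meters, index 0 is the true location.
coords = [(0, 0), (30, 0), (0, 40), (60, 60)]
cand, P, dist = solve_masking_lp(coords, true_index=0, epsilon=0.01)
print("candidates:", cand)
print("p_0j:", np.round(P[0], 3))
```

The problem is always feasible because assigning equal probability to every candidate from every location satisfies all constraints; the solver then shifts probability toward nearer candidates as far as $\epsilon$ allows. Building the constraint matrix densely is only reasonable for small k; a sparse formulation would be needed at scale.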

Application
The American Community Survey (ACS) is a nationwide survey conducted by the U.S. Census Bureau that collects social, economic, housing, and demographic data from a sample of approximately 3.5 million households each year (U.S. Census Bureau, 2020). The survey provides valuable information for understanding the characteristics of communities and supporting data-driven decision-making. The ACS microdata includes individual records with comprehensive geographic information such as the physical address of each respondent's household, and the characteristics of each person and housing unit included in the survey. Due to confidentiality and privacy concerns, however, access to the microdata is restricted. Instead, the Census Bureau releases the ACS Public Use Microdata Sample (PUMS), a subset of the ACS microdata that includes about two-thirds of the total responses collected (U.S. Census Bureau, 2021). The PUMS replaces the most detailed geographic information from physical addresses with larger geographic areas known as Public Use Microdata Areas (PUMAs) to protect the privacy of the individuals in the dataset. Each PUMA typically contains a group of counties or census tracts and has a population of no less than 100,000 people. While the use of PUMAs offers privacy protection, it provides less precise geographic information, making it challenging to analyze data at a very local level. This can be a limitation for researchers or policymakers who require an understanding of the characteristics of small subpopulations or neighborhoods within PUMAs. Therefore, the purpose of this application is to address the question: is it possible to present individual-level data in PUMS at the address level while still ensuring privacy protection?

Data
To evaluate the effectiveness of the proposed method, we select three counties and county groups in Ohio as our study areas: Franklin, Athens-Meigs-Gallia, and Coshocton-Holmes-Guernsey. These selections are based on their diverse and representative population characteristics. Franklin County is the most populous county in Ohio, with relatively high population density and a mix of urban and suburban areas. The county group of Athens-Meigs-Gallia represents a medium-sized population and is a college area (home to Ohio University), which introduces a different urban/rural mix and a young population demographic. The county group of Coshocton-Holmes-Guernsey represents a small population size and low density, and is known for its rural character and significant Amish population. Selecting county groups instead of individual counties is intentional, as focusing on single counties might result in a population size too limited for testing the method effectively. In addition, the chosen county and county group boundaries align with the PUMAs and fit the purpose of this application.
We generate a synthetic address-level dataset for our subsequent experiments because physical addresses are not available in the PUMS data. To create this dataset, we first obtain all address points within our selected study areas from the National Address Database (NAD) maintained by the U.S. Department of Transportation (USDOT) (U.S. Department of Transportation, 2023). We then randomly sample 1% of the address points for each county or county group, following the established approach for creating the official PUMS sample (U.S. Census Bureau, 2019). These sampled address points are used to represent sensitive locations that require protection. Table 1 shows the total number of address points and sampled address points in the three study areas. Figure 5 illustrates the locations of all address points and sampled address points. Note that Coshocton has limited address points included in the NAD, primarily due to its predominantly rural or nonstandard address characteristics. This creates challenges for geographic masking in the corresponding county group, as points are more likely to be displaced into areas without physical addresses.
The use of synthetic data is a common practice in the location privacy literature when access to real sensitive locations is limited (Allshouse et al., 2010; Lin, 2023; Lin & Xiao, 2022, 2023b; Swanlund et al., 2020; Zhang et al., 2017). In our application, synthetic data is employed to emulate the PUMS data, but its applicability extends to other scenarios. By randomly sampling address points, we ensure that they conform to the population density distribution within the study areas. This enables the synthetic data to closely approximate various potential health events, including clusters of infectious diseases. We will examine the clustering characteristics of the synthetic data in the next subsection.

Research design
We use the proposed geo-indistinguishable masking method to protect the synthetic dataset. For each point, we apply k-NN with a parameter k set to 10, 20, and 30 to include the 10, 20, and 30 nearest neighbors in the masking area, respectively. The privacy parameter $\epsilon$ for geo-indistinguishability is set to 0.0001, 0.001, 0.01, and 0.1. We use Gurobi (Gurobi Optimization, LLC, 2021) as the optimization solver to determine the optimal transition probabilities for relocating each spatial point. The experiments are conducted on a computer with an AMD Ryzen 5600X 6-Core Processor (3.70 GHz) and 32 GB RAM. Table 2 shows the runtime taken to solve the optimization problem for each combination of k and $\epsilon$ values. The results indicate that the runtime generally increases as we increase the value of k and decrease the value of $\epsilon$. Nonetheless, in our test setting, all the optimization problems can be solved within 26 minutes, indicating that our method can be implemented in a relatively efficient manner. Figure 6 displays two examples of a subset of the relocated points, using the transition probabilities obtained given two different combinations of k and $\epsilon$ values.
We compare the results of geo-indistinguishable masking with two other methods: random perturbation (Armstrong et al., 1999) and location swapping (Zhang et al., 2017). For random perturbation, we randomly displace each original point to any location within the masking area, which need not be an existing address location. For location swapping, we displace the original point by randomly selecting an address location within the masking area. Neither random perturbation nor location swapping satisfies the geo-indistinguishability notion, and hence, we do not compare the level of privacy using this notion. Instead, our objective is to examine whether geo-indistinguishable masking can satisfy the strong privacy notion of geo-indistinguishability while also better preserving the spatial point patterns. We compare the impact of the three geomasking methods on the spatial point patterns using three measures: expected distance displaced (EDD), mean nearest neighbor distance (MNND), and cluster detection performance. These measures are commonly used in the location privacy literature to compare the data accuracy of different geomasking methods (Cassa et al., 2006; Clarke, 2016; Hampton et al., 2010; Kounadi & Leitner, 2016; Zhang et al., 2017).
EDD calculates the average expected distance between the original locations and their masked counterparts for the spatial points. A small EDD value indicates that the masked locations are close to the original locations, suggesting that the masking technique effectively preserves the spatial point patterns. MNND calculates the average distance between each spatial point and its nearest neighbor. Smaller MNND values generally indicate a more clustered point pattern, while larger values indicate a more dispersed pattern. In this study, we calculate the absolute difference in the MNND values before and after geomasking to determine the change in spatial pattern caused by the geomasking method.
Cluster detection performance is assessed using the density-based spatial clustering of applications with noise (DBSCAN) algorithm (Ester et al., 1996; Schubert et al., 2017). This widely used algorithm identifies areas of high and low clustering of spatial points within a study area. Specifically, DBSCAN requires two parameters: epsilon and minPoints. It starts by randomly selecting a spatial point and creating a circle of epsilon radius around it. If there are at least minPoints points within this circle, all these points are considered to be part of the same cluster. The process is repeated iteratively until all points have been assigned to clusters or labeled as noise that does not belong to any cluster. In this study, we set the epsilon value to 1000 m, and the minPoints value to 30. Figure 7 displays the 14 clusters detected by DBSCAN when applied to the original spatial points in our synthetic dataset.
The impact of geomasking on cluster detection performance is evaluated using three metrics: precision, recall, and F-score. These metrics are computed by comparing clusters of spatial points before and after geomasking. All clustered spatial points are grouped into one class, while noise is considered the other class. A point is considered a true clustered point if it is labeled as clustered before and after geomasking. Precision is calculated as the ratio of true clustered points to all detected clustered points after geomasking, while recall is calculated as the ratio of true clustered points to all clustered points before geomasking. F-score is the harmonic mean of precision and recall. The values of precision, recall, and F-score range between zero and one, with higher values indicating that the geomasking technique better preserves the cluster detection results and the spatial point patterns.
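The two distance-based measures can be sketched as follows. The function names and the list-based inputs are assumptions for illustration; the MNND computation is a brute-force search that would need a spatial index for large datasets.

```python
import math


def expected_distance_displaced(originals, candidates, probabilities):
    """Average over points of the expected distance to the masked location.

    originals[n]: original point; candidates[n]: its candidate locations;
    probabilities[n]: transition probabilities to those candidates.
    """
    total = 0.0
    for origin, cands, probs in zip(originals, candidates, probabilities):
        total += sum(p * math.dist(origin, c) for c, p in zip(cands, probs))
    return total / len(originals)


def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its nearest other point."""
    total = 0.0
    for idx, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != idx)
    return total / len(points)


# One point with two equally likely candidates at distances 5 and 1.
edd = expected_distance_displaced([(0, 0)], [[(3, 4), (0, 1)]], [[0.5, 0.5]])
print(edd)  # 3.0

before = mean_nearest_neighbor_distance([(0, 0), (3, 0), (3, 4)])
after = mean_nearest_neighbor_distance([(0, 1), (3, 0), (3, 5)])
print(abs(before - after))  # absolute difference in MNND
```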
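For readers unfamiliar with the algorithm, a compact pure-Python version of the standard DBSCAN procedure is sketched below. It follows the usual core-point expansion of Ester et al. (1996), in which a point's neighborhood includes the point itself, and it scans points in input order rather than randomly. It is not the implementation used in this study, and its quadratic neighbor search is only suitable for small examples. The toy parameters differ from the 1000 m and 30 points used in the experiments.

```python
import math

NOISE = -1


def dbscan(points, eps, min_points):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_points:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (10, 10), (11, 10), (10, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_points=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```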
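The three metrics can be computed directly from the cluster labels obtained before and after geomasking. In the sketch below, the function name and the convention that noise is labeled -1 are assumptions for illustration.

```python
def cluster_preservation_scores(labels_before, labels_after, noise=-1):
    """Precision, recall and F-score for clustered-versus-noise labels."""
    pairs = list(zip(labels_before, labels_after))
    # True clustered points: clustered both before and after masking.
    true_clustered = sum(b != noise and a != noise for b, a in pairs)
    detected = sum(a != noise for _, a in pairs)  # clustered after masking
    actual = sum(b != noise for b, _ in pairs)    # clustered before masking
    precision = true_clustered / detected if detected else 0.0
    recall = true_clustered / actual if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Five points: cluster ids before masking and after masking.
before = [0, 0, 1, -1, -1]
after = [0, -1, 0, 0, -1]
print(cluster_preservation_scores(before, after))
```

Cluster ids are deliberately ignored: a point that moves from one cluster to another still counts as clustered, consistent with grouping all clustered points into a single class.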

Evaluation results
Table 3 presents the results of the EDD. There are four main observations. First, across the three compared methods, the EDD value consistently increases as the k value increases. This is because including additional nearest neighbors in the masking areas allows the spatial points to be relocated to different locations, which can be distant from their original locations. Second, for geo-indistinguishable masking, the value of EDD increases as the ε value decreases. This aligns with the definition of geo-indistinguishability since a low ε value indicates a high level of privacy, which can result in large displacement of spatial points. Third, given a fixed k value, geo-indistinguishable masking yields smaller EDD values than location swapping, even for the smallest value of ε (ε = 0.0001). Fourth, it yields smaller EDD values than random perturbation when ε > 0.0001. This demonstrates the effectiveness of the geo-indistinguishable masking method in satisfying the geo-indistinguishability notion while also preserving the spatial point patterns.
Table 4 presents the absolute difference in MNND values before and after geomasking, which leads to three main findings. First, the absolute difference in MNND increases as the k value increases for the three compared methods. This suggests that adding nearest neighbors to the masking areas could result in a more pronounced impact on the spatial point patterns. Second, in the case of geo-indistinguishable masking, reducing the ε value generally leads to an increase in the absolute difference in MNND. This indicates that increasing the level of privacy under geo-indistinguishability may increase the alteration of the spatial point patterns. Third, for a given k value, using a relatively large ε value (e.g. ε = 0.01) can result in a much smaller absolute difference in MNND compared to location swapping and random perturbation, indicating the effectiveness of geo-indistinguishable masking in preserving spatial point patterns.
Table 5 shows the impact of the three geomasking methods on cluster detection performance. The results demonstrate that as the k value increases for the three methods, the precision, recall, and F-score values all decrease. In addition, reducing the ε value for geo-indistinguishable masking also leads to a decrease in these performance metrics. These findings are consistent with our previous results on EDD and MNND, demonstrating the effect of the parameters on the ability of the geomasking methods to preserve spatial point patterns. It is also observed that when the k value is fixed, the precision, recall, and F-score values achieved with geo-indistinguishable masking are higher than, or at least equal to, those obtained using location swapping. This suggests that geo-indistinguishable masking is effective in preserving the spatial point patterns while ensuring privacy protection.

Discussion and conclusions
Advancements in location-based technologies over the last decade have led to the production of large-scale geospatial datasets with address-level location information, which has provided unprecedented convenience to researchers, policymakers, and businesses. However, it has also raised concerns about the location privacy of data subjects. This paper proposes a novel geomasking method called geo-indistinguishable masking to address the critical question of whether we can safely present useful address-level spatial data while preserving the location privacy of individuals. Importantly, the proposed method offers two significant advantages. First, the method satisfies a strong and commonly used location privacy notion of geo-indistinguishability, which ensures privacy protection even in the presence of adversaries attempting to infer the actual locations of individuals using any available side information. This feature is increasingly important as a growing amount of spatial data becomes accessible and can be easily converted into side information for privacy violations. The second advantage of our method, as demonstrated in the application, is its ability to preserve spatial point patterns even compared to methods that only satisfy weak privacy notions such as spatial k-anonymity.
Ensuring that visual and analytical results of the masked data are similar to the original data is essential in supporting subsequent analysis and decision-making processes that rely on accurate and reliable spatial point data. Optimization modeling allows us to achieve this by explicitly minimizing the required distance displaced to satisfy geo-indistinguishability, enabling us to best preserve the original spatial point patterns.
One limitation of geo-indistinguishability is its potential inefficacy in preserving location privacy in traces that contain a set of temporally correlated spatial points (Andrés et al., 2013; Dong et al., 2018). Given a fixed ε, the level of privacy decreases as the number of correlated spatial points increases, and thus privacy degrades rapidly when traces become longer. Therefore, our proposed geo-indistinguishable masking method may be limited to static points, such as home or work locations. However, recent progress has been made in introducing notions of location privacy that consider temporal correlations. For example, methods have been developed to incorporate temporal correlations in differential privacy (Xiao & Xiong, 2015), while others rely on epidemic models (Shokri et al., 2013). These approaches show promise in extending our proposed geomasking method to complex spatiotemporal data.
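The degradation of privacy along a trace, noted above, can be made concrete with a short worked bound. The sketch below assumes that each of the n points in a trace is masked independently by the same mechanism; the notation K(x)(z) for the probability of reporting location z when the true location is x, and d for the distance between locations, is introduced here for illustration only.

```latex
% Assumed notation: K(x)(z) is the probability of reporting z given true location x;
% d(x, x') is the distance between locations x and x'.
\begin{align*}
% Geo-indistinguishability for a single point:
K(x)(z) &\le e^{\varepsilon\, d(x, x')}\, K(x')(z)
  \quad \text{for all } x, x', z. \\[4pt]
% Independent masking of a trace (x_1, ..., x_n) versus (x'_1, ..., x'_n):
\prod_{i=1}^{n} K(x_i)(z_i)
  &\le e^{\varepsilon \sum_{i=1}^{n} d(x_i, x'_i)} \prod_{i=1}^{n} K(x'_i)(z_i) \\
  &\le e^{\,n \varepsilon \max_{i} d(x_i, x'_i)} \prod_{i=1}^{n} K(x'_i)(z_i).
\end{align*}
```

Under these assumptions, two traces that differ by at most r at every point are protected only at level nεr rather than εr, so the guarantee weakens linearly with the length of the trace. The worst case arises when all n points refer to the same underlying location, such as repeated reports from a home address.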
The computational complexity of the optimization model is influenced by the problem size, which is determined by the number of points that need to be relocated, as well as the values of k and ε. Our results demonstrate that when relocating 5,112 points with k ranging from 10 to 30 and ε ranging from 0.0001 to 0.1, the model takes up to 26 minutes to solve within our computational setup. It can be argued, though, that computational efficiency is a challenge in widely applying the proposed geomasking method, particularly if we scale up the problem by increasing the number of points to be relocated, raising the value of k, or reducing the value of ε. Fortunately, recent advancements have been made to address this challenge and promote computational efficiency in achieving geo-indistinguishability. One such advancement is the spanner-based method proposed by Bordenabe et al. (2014) that can be used to reduce the number of constraints in the optimization model. This method holds the potential to be integrated into our model formulation to enhance efficiency.
The purpose of employing geo-indistinguishable masking is to prevent correct identification by adversaries, which involves the association of a spatial point with the correct individual or household. However, there is also a potential risk of false identification, where a spatial point is mistakenly linked to a different individual or household (Seidl et al., 2018). The consequences of false identification can be far-reaching: identifying an individual as an HIV patient, even erroneously, can subject him or her to discrimination and social stigmatization (Polzin & Kounadi, 2021). In this study, we employ geo-indistinguishable masking to relocate sensitive points to potential residences. This approach effectively eliminates the possibility of accurate identification and ensures that masked locations do not appear in unlikely areas. Nevertheless, it also carries an inherent risk of false identification, potentially misrepresenting certain individuals or households as sensitive parties. To address this issue, a potential solution is to generate dummy locations around the sensitive point being relocated and apply our method. By doing so, the masked location would not overlap with actual residential addresses, which can effectively avoid false identification. Recent advancements in location privacy literature have also presented potential directions to address the problem. For example, Seidl et al. (2015) propose a Voronoi masking method that creates Voronoi polygons around the original data points and displaces each point to the closest edge of its corresponding polygon. When applied to a dataset containing all residences within the study area, this ensures that no displaced point falls on an actual residence, thus preventing false identification of residences. In addition, Polzin and Kounadi (2021) extend this approach by considering the underlying population density and establishing areas of k-anonymity where Voronoi polygons are created. A combination of these research outcomes is promising to address the problem of false identification in geomasking and enhance the effectiveness of our approach.
Geomasking has gained popularity and has been applied in a growing number of practical applications over the past two decades. For example, the Census Bureau's OnTheMap (OTM), a web-based mapping and reporting application that shows where workers are employed and where they live, uses geomasking methods to displace the actual locations of workers for privacy protection (Machanavajjhala et al., 2008). Open source tools have also been developed to enable the public to experiment with different geomasking methods (McKenzie et al., 2022; Swanlund et al., 2020). This paper aims to promote the adoption of geomasking techniques in cartography as a means to balance the benefits of spatial point analysis with the protection of individual privacy rights. It contributes to the ongoing discussion regarding the need for innovation in geomasking methods and their implementation as a standard cartographic principle for mapping spatial point data (Kounadi & Leitner, 2014). This is essential for increasing awareness about location privacy and promoting ethical practices in geography and cartography.

Figure 2. The relationship between the level of privacy and the radius r in geo-indistinguishability.

Figure 3. A flowchart of geo-indistinguishable masking applied to spatial point data with one point. The masking area contains two candidate locations.

Figure 4. Comparison between geo-indistinguishable masking and location swapping. The true location of a point is depicted by the dark gray triangle. The light gray circles represent candidate locations within the masking area delineated by the hollow circle. Each candidate location is labeled with the corresponding probability of relocating the point to this location.

Figure 5. Locations of all address points (gray dots) and sampled address points (blue dots).

Figure 6. Examples of original locations (blue dots) and masked locations (red dots) for two combinations of k and ε values. The masked locations are obtained by sampling from the optimal transition probabilities. Note that only a subset of the synthetic data is displayed to provide a detailed view of the displacement.

Figure 7. Cluster detection using DBSCAN for the original sample address points. Each cluster is represented by a unique color, while noise is denoted by hollow dots.

Table 1. Address points for the three studied counties and county groups.

Table 3. Results of the expected distance displaced (m).

Table 4. Results of the absolute difference in mean nearest neighbor distance (m).

Table 5. Impacts on cluster detection performance.