Cluster-based RF fingerprint positioning using LTE and WLAN outdoor signals

In this paper we evaluate the user-equipment (UE) positioning performance of three cluster-based RF fingerprinting methods using LTE and WLAN signals. Real-life LTE and WLAN data were collected for the evaluation using a consumer mobile handset running the 'Nemo Handy' drive test software tool. Test results of the cluster-based methods were compared with those of conventional grid-based RF fingerprinting. Unlike the grid-based method, the cluster-based methods require neither a grid-cell layout nor the formation of training signatures. They use an LTE cell-ID search to reduce the search space for the clustering operation, so the UE position is estimated in a short time and at lower computational cost. Among the cluster-based methods, Agglomerative Hierarchical Clustering based RF fingerprinting provided the best positioning accuracy, using a single LTE and six WLAN signal strengths. This method improved the 68th and 95th percentiles of positioning error (PE) by 42.3% and 39.8%, respectively, over grid-based RF fingerprinting.


INTRODUCTION
Over the next decade the integration of location services into our day-to-day life will increase significantly as technologies mature and accuracy improves. Currently, as an accurate and reliable outdoor localization system, the Global Navigation Satellite System (GNSS) has revolutionized navigation-based applications running on automotive GNSS-enabled devices and smart phones. However, GNSS relies on special hardware support, has high complexity and high battery consumption, and access to GPS signals is limited in some environments, such as urban areas with many high buildings, mountainous terrain and indoor areas [1]. Received Signal Strength (RSS) based fingerprinting localization has been the most widely used technique for user positioning during the last few decades [2][3]. Researchers are studying how to perform radio signal positioning using signals from existing wireless infrastructure, such as cellular networks [2], WiMAX [3] and WiFi [4][5] networks.
The rapid expansion of Wi-Fi access points (APs) across urban and indoor environments has made it possible for researchers to envision alternatives to TOA-based systems. One success story for deployment in the urban environment is Skyhook Wireless [6]. Skyhook realized the potential of exploiting Wi-Fi signals emitted from residential homes and offices that are continuously in use. They have improved localization by building databases of Wi-Fi signatures tied to locations that could be integrated to aid in the localization process. A Wi-Fi based fingerprint positioning system was evaluated in the Sydney CBD area, and test results show that it works well for outdoor localization with errors in the tens of meters [7]. In [8] the authors carried out outdoor fingerprinting over WLAN and achieved good accuracy using 802.11-based positioning.
In this study we have evaluated cluster-based RF fingerprinting approaches that take into account four key challenges of fingerprint positioning [5]:
1) RF fingerprint generation: the factors affecting fingerprint generation are the placement and number of survey points and time samples. Most approaches have selected such parameters experimentally. To avoid such difficulties we have used real-life Minimization of Drive Tests (MDT) data, a feature introduced in 3GPP Release 10 which enables operators to use users' equipment to collect radio measurements and associated location information [9].
2) Preprocessing of recorded training data for reducing computational complexity: in [4] the authors proposed an offline clustering of locations aiming to reduce the search space to a single cluster. Chen et al. in [10] consider the similarity of signal values, as well as the covering APs, to generate a set of clusters using K-means to improve the power efficiency of mobile devices. Both of these clustering techniques are carried out offline on the training data. This hampers the operation of the system over time, since WLAN infrastructures are highly dynamic and APs can easily be moved or discarded, in contrast to their base-station counterparts in cellular systems, which generally remain intact for long periods of time [5]. Therefore, we have used a combination of LTE and WLAN signal strengths, referred to as generalized MDT (GMDT) data, which allows an LTE serving cell-ID based search to deliver user-equipment (UE) positioning in a short time and at lower computational cost.
3) Selection of APs for use in positioning: in a typical dense urban WLAN environment the number of available APs is much higher than three, and using all available APs increases the computational complexity of the positioning algorithm. In this research we have chosen seven LTE and WLAN signals for position estimation, a number that previous Wi-Fi positioning results have found to be effective [11].
4) User equipment (UE) position estimate based on a new RSS observation: in the simplest case, the Euclidean distance is used to find the distance between the new RSS observation and the center of the training RSS vectors at each survey point or grid-cell unit [12][13]. However, choosing an optimal grid-cell layout requires computational power and training time [14]. Hence, we have selected cluster-based RF fingerprinting (CRFFP) methods, namely K-Nearest Neighbor (KNN), Agglomerative Hierarchical Clustering (AHC) and Fuzzy C-Means (FCM), which do not create training signatures through a grid-cell layout during the training phase. To verify the effectiveness of the CRFFP methods, their UE positioning results were compared with those of conventional grid-cell based RF fingerprint positioning (GRFFP).
The rest of the paper is organized as follows: Section II contains a brief description of the recorded GMDT field measurements, followed by a description of the conventional GRFFP method. In Section III we explain the three CRFFP methods. Experimental test results of the GRFFP and CRFFP methods are shown in Section IV. Finally, we draw some concluding remarks in Section V.

A. Generalized MDT Measurements
Drive tests are the main source of measurement data from cellular networks, but they are costly and time consuming. Because drive tests need human effort and can only provide spot measurements, automated solutions have been developed that involve the UEs of end users. The feature for this evolution in the 3GPP standard is named MDT [14]. Here we were motivated to use GMDT data, an enhancement to the LTE Minimization of Drive Tests architecture that allows the collection of location-aware radio measurements from WLAN access networks as well. Grid-based RF fingerprinting test results show that GMDT data containing only the single strongest WLAN measurement in addition to the LTE RF fingerprint can improve the 67th percentile location accuracy from 88.2 m to 49.4 m [15]. The GMDT database was created with the help of a popular drive test software application, Nemo Handy, installed on a Samsung Galaxy S3 (LTE capable) [16]. This handheld drive test tool is well suited to performing measurements both outdoors and in crowded indoor spaces while being used simultaneously as a regular mobile phone. In our research we recorded reference signal received power (RSRP) values of LTE serving and neighboring base station (BS) signals and received signal strength indicator (RSSI) values of WLAN APs. About 150 kilometres of measurements were collected on foot, by bicycle and by car, covering an area of approximately 0.33 square kilometres in a residential urban area of Tampere, Finland, during September 2014, as shown in Fig. 1.
The GMDT samples used in this study were from LTE 1800 MHz measurements, in which 800 MHz inter-frequency measurements were also reported according to the measurement configuration provided by the network. Every route was repeated at least twice to ensure that enough measurement samples were collected for each grid unit. From the measurements we found that all the GMDT samples contain at least one serving LTE BS RSRP and that 98% of the samples comprise more than five WLAN RSSI values. The authors in [15] selected WLAN APs based on the largest signal strength values recorded at each location. Hence we have chosen seven signal strength values in total, including both LTE RSRPs and WLAN RSSIs. Both RSRP and RSSI values were sorted in descending order of signal strength. We were interested to see how different combinations of LTE and WLAN signals affect the UE positioning performance when the same fingerprinting method is used. Thus three different sets of GMDT samples were created by choosing different combinations of LTE and WLAN signals from the total database. A GMDT measurement set is defined by:
$$M_j = \{s_{j,1}, s_{j,2}, \ldots, s_{j,N}\}$$
where $j = 1, 2, 3$ refers to the different GMDT sets and $N$ is the total number of measurement samples in a particular set. The $n$th GMDT sample of a set is given by a row vector:
$$s_{j,n} = [\,LW_{ID},\; RSS_{LW},\; P_{XY}\,]$$
where $LW_{ID}$ denotes the LTE BS IDs and WLAN AP IDs, $RSS_{LW}$ comprises the corresponding RSRP and RSSI values, and $P_{XY}$ contains the x-y coordinates of the UE obtained from GNSS information.
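The sample structure described above can be illustrated with a minimal sketch. The class name `GmdtSample`, the helper names, and the example cell and AP identifiers are hypothetical, and the sketch assumes that the strongest LTE and WLAN readings are simply the largest dBm values; the exact selection used to build the three GMDT sets is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GmdtSample:
    """One row vector s_{j,n} = [LW_ID, RSS_LW, P_XY] (hypothetical layout)."""
    ids: Tuple[str, ...]      # LW_ID: LTE BS IDs followed by WLAN AP IDs
    rss: Tuple[float, ...]    # RSS_LW: matching RSRP / RSSI values in dBm
    xy: Tuple[float, float]   # P_XY: GNSS-derived x-y position


def strongest(readings: Dict[str, float], count: int) -> List[Tuple[str, float]]:
    """Return the `count` strongest readings, sorted in descending order."""
    return sorted(readings.items(), key=lambda kv: kv[1], reverse=True)[:count]


def build_sample(lte_rsrp: Dict[str, float], wlan_rssi: Dict[str, float],
                 xy: Tuple[float, float], n_lte: int, n_wlan: int) -> GmdtSample:
    """Combine n_lte LTE and n_wlan WLAN signals into one sample.

    Assumption: n_lte + n_wlan is chosen by the caller (seven in the study).
    """
    chosen = strongest(lte_rsrp, n_lte) + strongest(wlan_rssi, n_wlan)
    return GmdtSample(tuple(i for i, _ in chosen),
                      tuple(v for _, v in chosen), xy)


sample = build_sample({"cell_a": -95.0, "cell_b": -101.0},
                      {"ap1": -60.0, "ap2": -72.0, "ap3": -55.0},
                      (12.0, 34.0), n_lte=1, n_wlan=2)
```

Varying `n_lte` and `n_wlan` while keeping their sum fixed is one way to produce different LTE-WLAN combinations from the same raw measurements.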

B. Grid-cell based RF Fingerprint Positioning
A conventional single grid-cell layout based RF fingerprinting method was used, which segmented the whole geographical area of interest into 10 m by 10 m square grid-cell units (GCUs). Euclidean distance was used to measure the statistical difference between training fingerprints and test samples, as previous WLAN-based UE positioning research suggests it to be effective [17].
Training Phase: To reduce the time needed to search for the best-matching training signature for a test sample, and to reduce the related computational cost, a single training signature ($Train_{Sig}$) is created for each GCU.
Test Phase: To test a GMDT sample we first compare its LTE and WLAN IDs with all the available training signatures and select those signatures that meet a minimum matching threshold. The minimum matching threshold (MT) was set to three, so in this case all the training signatures that share at least three LTE and WLAN IDs with the test sample are chosen. The maximum MT number was set to six. A simplified Mahalanobis distance equation is used for the distance calculation, in which the inverse covariance matrix is replaced by an identity matrix:
$$d(u_{Te}, u_{Tr}) = \sqrt{(u_{Te} - u_{Tr})\, I\, (u_{Te} - u_{Tr})^{T}}$$
where $u_{Te}$ and $u_{Tr}$ denote the RSRP and RSSI values of the $Test_{Sam}$ and a selected $Train_{Sig}$ respectively, and $I$ is the identity matrix. With the identity matrix this reduces to the Euclidean distance. After the distances between a $Test_{Sam}$ and each of the selected training signatures have been calculated, the $Train_{Sig}$ with the smallest Euclidean distance is chosen for test UE positioning. The estimated position of that $Test_{Sam}$ is given by $P^{Ref}_{XY}$ of the chosen $Train_{Sig}$.
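The test phase can be sketched as follows, under stated assumptions: the training signatures are taken as given (their construction is not shown), each signature is represented as a hypothetical pair of an ID-to-RSS mapping and a reference position, and the distance is evaluated only over the IDs shared by the test sample and the signature. The last point is an assumption of this sketch, not a detail confirmed by the text.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (ID -> RSS in dBm, reference position P_Ref_XY)
Signature = Tuple[Dict[str, float], Tuple[float, float]]


def euclidean_on_shared(test: Dict[str, float], train: Dict[str, float]) -> float:
    """Euclidean distance over the IDs present in both fingerprints."""
    shared = test.keys() & train.keys()
    return math.sqrt(sum((test[i] - train[i]) ** 2 for i in shared))


def grid_position(test: Dict[str, float], signatures: List[Signature],
                  mt: int = 3) -> Optional[Tuple[float, float]]:
    """Return the reference position of the closest signature that shares
    at least `mt` IDs with the test sample, or None if none qualifies."""
    best_xy, best_d = None, math.inf
    for rss, ref_xy in signatures:
        if len(test.keys() & rss.keys()) < mt:
            continue  # fails the matching threshold
        d = euclidean_on_shared(test, rss)
        if d < best_d:
            best_d, best_xy = d, ref_xy
    return best_xy


signatures = [
    ({"a": -91.0, "b": -61.0, "c": -71.0}, (5.0, 5.0)),
    ({"a": -80.0, "b": -50.0, "c": -60.0}, (15.0, 5.0)),
    ({"a": -90.0, "b": -60.0}, (25.0, 5.0)),  # only two shared IDs
]
estimate = grid_position({"a": -90.0, "b": -60.0, "c": -70.0}, signatures, mt=3)
```

A test sample for which no signature meets the threshold yields no estimate, which is one plausible reason why the share of analyzed test samples can depend on MT.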

A. K-nearest Neighbors Cluster-Based Positioning
KNN is one of the basic algorithms used for UE positioning with RF fingerprints [18]. In this work we have chosen K to be 5, which gave good positioning results in the WLAN positioning performed in [19]. Here the only processing required during the data collection phase is to group the GMDT samples according to the LTE serving BS ID. During the positioning phase the first task is to choose the group of GMDT samples that corresponds to the LTE serving BS ID of the test GMDT sample. Then the training GMDT samples ($Train_{Sam}$) whose IDs match those of the test sample are selected, and the five closest are chosen by the Euclidean distance between their RSS vectors. The test UE position is estimated from the mean x-y coordinates of the five selected $Train_{Sam}$.
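The KNN-based procedure can be sketched as below. The field names, the `min_mt` default, the interpretation of "match" as sharing at least MT IDs, and the use of shared IDs only in the distance are all assumptions made for illustration; the lowering of MT from 7 until multiple training samples match follows the description accompanying Fig. 2.

```python
import math
from statistics import mean
from typing import Dict, List, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    serving_id: str            # LTE serving BS ID (hypothetical field name)
    rss: Dict[str, float]      # LTE/WLAN ID -> RSRP or RSSI in dBm
    xy: Tuple[float, float]    # GNSS-derived x-y position


def shared_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance over IDs present in both samples (assumption)."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in a.keys() & b.keys()))


def knn_position(test: Sample, training: List[Sample], k: int = 5,
                 max_mt: int = 7, min_mt: int = 1) -> Optional[Tuple[float, float]]:
    # Step 1: restrict the search to samples with the same LTE serving BS ID.
    group = [s for s in training if s.serving_id == test.serving_id]
    # Step 2: lower MT until multiple samples match or the lowest MT is reached.
    matched: List[Sample] = []
    for mt in range(max_mt, min_mt - 1, -1):
        matched = [s for s in group
                   if len(s.rss.keys() & test.rss.keys()) >= mt]
        if len(matched) > 1:
            break
    if not matched:
        return None
    # Step 3: average the positions of the k closest training samples.
    nearest = sorted(matched, key=lambda s: shared_distance(test.rss, s.rss))[:k]
    return (mean(s.xy[0] for s in nearest), mean(s.xy[1] for s in nearest))


train = [Sample("c1", {"c1": -90.0, "a": -60.0 - i, "b": -70.0}, (10.0 * i, 0.0))
         for i in range(6)]
train.append(Sample("c2", {"c1": -90.0, "a": -60.0, "b": -70.0}, (1000.0, 1000.0)))
query = Sample("c1", {"c1": -90.0, "a": -60.0, "b": -70.0}, (0.0, 0.0))
estimate = knn_position(query, train)
```

Grouping by serving cell first keeps the distance computation confined to a small subset of the database, which is the source of the reduced search cost claimed for the cluster-based methods.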

B. Agglomerative Hierarchical Cluster-Based Positioning
The AHC clustering method uses the Davies-Bouldin criterion to select the optimal cluster number [20]. This criterion is based on a ratio of within-cluster and between-cluster distances. The Davies-Bouldin index (DB) is defined by the following equation:
$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} D_{i,j}$$
where $k$ is the number of clusters and $D_{i,j}$ is the within-to-between cluster distance ratio for the $i$th and $j$th clusters. $D_{i,j}$ is given by:
$$D_{i,j} = \frac{\bar{d}_i + \bar{d}_j}{d_{i,j}} \qquad (7)$$
where $\bar{d}_i$ is the average distance between each point in the $i$th cluster and the centroid of the $i$th cluster, $\bar{d}_j$ is the average distance between each point in the $j$th cluster and the centroid of the $j$th cluster, and $d_{i,j}$ is the Euclidean distance between the centroids of the $i$th and $j$th clusters. During evaluation the optimal cluster number is chosen between 1 and 6 as the one giving the smallest Davies-Bouldin index value. When multiple clusters are formed, the following clustering criterion (CC) is applied: the cluster which contains the $Test_{Sam}$ must contain at least two GMDTs. The AHC-based positioning method is described in Fig. 2.
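As an illustration of the criterion, the following sketch computes the Davies-Bouldin index for a given partition and picks the partition with the smallest value. It assumes the candidate labelings are produced elsewhere (for example by an agglomerative clustering routine) and that every labeling has at least two clusters, since the index is undefined for a single cluster; how the single-cluster case is handled in the evaluation is not covered here. The function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def davies_bouldin(points: Sequence[Point], labels: Sequence[int]) -> float:
    """DB = (1/k) * sum_i max_{j != i} (d_i + d_j) / d_ij."""
    clusters: Dict[int, List[Point]] = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin index needs at least two clusters")
    cents = {lab: centroid(ps) for lab, ps in clusters.items()}
    # Average distance of each cluster's points to its own centroid.
    spread = {lab: sum(math.dist(p, cents[lab]) for p in ps) / len(ps)
              for lab, ps in clusters.items()}
    total = 0.0
    for i in clusters:
        total += max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
                     for j in clusters if j != i)
    return total / len(clusters)


def best_labeling(points: Sequence[Point],
                  candidates: List[Sequence[int]]) -> Sequence[int]:
    """Pick the candidate partition with the smallest DB index."""
    return min(candidates, key=lambda labs: davies_bouldin(points, labs))


pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = davies_bouldin(pts, [0, 0, 1, 1])   # compact, well-separated clusters
bad = davies_bouldin(pts, [0, 1, 0, 1])    # stretched, overlapping clusters
```

Compact and well-separated clusters give a small index, so the first labeling is preferred over the second in this toy example.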

C. Fuzzy C-Means Cluster-Based Positioning
FCM has been used effectively in WLAN indoor localization [21]. Here we have used it for outdoor positioning with GMDT data. Its implementation steps are similar to those of AHC-based fingerprint positioning. In step 3 of this method, as shown in Fig. 2, we have added another criterion: if the number of selected GMDT samples is more than six, then the initial number of clusters assigned to the FCM method is 6, otherwise 2. FCM starts with an initial guess for the cluster centers, which are intended to mark the mean location of each cluster, and it also assigns every data point a membership grade for each cluster. By iteratively updating the cluster centers and the membership grades for each data point, it moves the cluster centers to the right location. This iteration is based on minimizing the objective function for the partition of the selected GMDT data set [22]:
$$J_m = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{i,k}^{m}\, \lVert x_k - v_i \rVert^{2}$$
where $J_m$ is the objective function, $n$ is the number of samples in the data set, $c$ is the number of clusters (between 1 and $n$), $u_{i,k}$ is the element of the partition matrix $U$ of size $c \times n$ containing the membership function, and $v_i$ is the center of the $i$th cluster. In the standard FCM formulation, $x_k$ is the $k$th sample and $m$ is the fuzziness exponent.
The results in Table I were obtained from 10-fold cross-validation for testing all GMDTs. The first and second columns of Table I correspond to the different LTE-WLAN sets and the matching threshold numbers; the remaining columns give the 68th and 95th percentile values of UE positioning error for each of the methods. The percentage of analyzed test GMDTs is attached to the positioning error (PE) values for each of the methods. Table I shows that the GCL method is capable of analyzing almost 100% of the test GMDTs and that its PE values for any given data set remain the same for different MT values. Compared with the GCL method, KNN and FCM perform better for MT-6 and give similar results for MT-5; however, both KNN and FCM analyze a smaller percentage of test samples than GCL. Thus, for a better comparison between the methods, we have prepared Table II, where the PE results of all four methods are calculated for the commonly analyzed test GMDT samples, as given in the last column of Table II.
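The percentile figures reported in the tables can be computed along the following lines. This is a minimal sketch assuming positions are planar x-y coordinates in metres and that the nearest-rank definition of a percentile is acceptable; the percentile convention actually used for the tables is not stated, and the function names and numbers below are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

XY = Tuple[float, float]


def positioning_errors(estimated: Sequence[XY], true: Sequence[XY]) -> List[float]:
    """Euclidean distance between each estimated and true position."""
    return [math.dist(e, t) for e, t in zip(estimated, true)]


def percentile(values: Sequence[float], q: int) -> float:
    """Nearest-rank percentile: smallest value with at least q% of the
    data at or below it (assumed convention)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def improvement(baseline: float, new: float) -> float:
    """Relative reduction of `new` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


# Toy data: 20 estimates at 1 m, 2 m, ..., 20 m from the true position.
errors = positioning_errors([(float(i), 0.0) for i in range(1, 21)],
                            [(0.0, 0.0)] * 20)
pe68 = percentile(errors, 68)
pe95 = percentile(errors, 95)
```

When methods analyze different subsets of test samples, the errors passed to `percentile` should be restricted to the commonly analyzed samples before comparing, which is the purpose of Table II.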

Fig. 2: Block diagram of the AHC-based positioning method.

In the KNN-based method, to select the training GMDT samples ($Train_{Sam}$) we start with the highest MT number, 7, and select the $n$ $Train_{Sam}$ that match the $Test_{Sam}$ IDs. If no $Train_{Sam}$ corresponds to the chosen MT, then MT is lowered to the next integer and the $n$ $Train_{Sam}$ that match the $Test_{Sam}$ IDs are selected. This process continues until multiple matched $Train_{Sam}$ are found or the lowest MT is reached. The Euclidean distance is then used to choose the five closest GMDTs with the KNN algorithm:
$$d_E = \lVert Train_{RSS} - Test_{RSS} \rVert_2$$
where $Train_{RSS}$ and $Test_{RSS}$ are vectors of the LTE RSRP and WLAN RSSI values of $Train_{Sam}$ and $Test_{Sam}$ respectively.

TABLE I. RESULTS OF GRFFP AND CRFFP METHODS USING ALL GMDT TEST DATA

TABLE II. RESULTS OF GRFFP AND CRFFP METHODS USING COMMON GMDT TEST DATA