Solved – Finding the average GPS point

I need to write a program to find the average GPS point from a population of points.

In practice the following happens:

  • Each month a person records a GPS point of the same static asset.
  • Because of the nature of GPS, these points differ slightly each month.
  • Sometimes the person makes a mistake and records the wrong asset at a completely different location.
  • Each GPS point has a certainty weight (HDOP) that indicates how accurate the current GPS data is. Points with better HDOP values are preferred over points with worse ones.

How do I determine the following:

  • Deal with data that has 2 values per observation (latitude and longitude) rather than a single value like age (as in finding the average age in a population of people).
  • Determine the outliers. In the example below these would be [-28.252, 25.018] and [-28.632, 25.219]
  • After excluding the outliers, find the average GPS point; in this case it might be [-28.389, 25.245].
  • It would be a bonus if I can work in the "weight" provided by the HDOP value for each point.

[Image: example GPS points]

One of the problems with multivariate data is deciding on, and then interpreting, a suitable metric for calculating distances, hence clever but somewhat hard-to-explain concepts such as Mahalanobis distance. But in this case surely the choice is obvious – Euclidean distance. I'd suggest a simple heuristic algorithm something like:

  1. Calculate the (unweighted) centroid of the data points, i.e. the (unweighted) means of the 2 coordinates
  2. Calculate the Euclidean distance of all the readings from the centroid
  3. Exclude any readings that are further than a certain distance (to be determined based on your experience and knowledge of the technology, or failing that a bit of trial and error cross-validation – 100m, 1km, 10km??)
  4. Calculate the weighted average of both coords of the remaining points, weighting by the inverse of the HDOP score (or some monotonic function of it – I had a quick look at the Wikipedia page linked in the question and think maybe you don't need such a function, but I'd need to study it further to be sure)
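
The four steps above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions that are not in the original post: readings are (lat, lon, hdop) tuples in decimal degrees, distances are converted to metres with a local flat-Earth approximation using a mean Earth radius of 6371 km, and the names distance_m, average_gps_point and max_dist_m are made up for the example. The two outlier coordinates come from the question; the other readings are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius for the flat-Earth approximation


def distance_m(p, q):
    """Approximate Euclidean distance in metres between two (lat, lon) points in degrees."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    dy = math.radians(p[0] - q[0]) * EARTH_RADIUS_M
    dx = math.radians(p[1] - q[1]) * EARTH_RADIUS_M * math.cos(mean_lat)
    return math.hypot(dx, dy)


def average_gps_point(readings, max_dist_m):
    """readings: list of (lat, lon, hdop). Returns ((lat, lon), kept_readings)."""
    # Step 1: unweighted centroid of all readings.
    n = len(readings)
    centroid = (sum(r[0] for r in readings) / n,
                sum(r[1] for r in readings) / n)

    # Steps 2 and 3: keep only readings within max_dist_m of the centroid.
    kept = [r for r in readings if distance_m(r[:2], centroid) <= max_dist_m]
    if not kept:
        raise ValueError("no readings within max_dist_m of the centroid")

    # Step 4: average weighted by 1 / HDOP (lower HDOP means a larger weight).
    weights = [1.0 / r[2] for r in kept]
    total = sum(weights)
    lat = sum(w * r[0] for w, r in zip(weights, kept)) / total
    lon = sum(w * r[1] for w, r in zip(weights, kept)) / total
    return (lat, lon), kept


# Hypothetical readings around the asset, plus the two outliers from the question.
readings = [
    (-28.3890, 25.2450, 0.9),
    (-28.3892, 25.2448, 1.2),
    (-28.3888, 25.2452, 1.0),
    (-28.3891, 25.2451, 2.5),
    (-28.3889, 25.2449, 1.1),
    (-28.3890, 25.2450, 0.8),
    (-28.252, 25.018, 1.0),   # outlier from the question
    (-28.632, 25.219, 1.5),   # outlier from the question
]

point, kept = average_gps_point(readings, max_dist_m=10_000)
print(point, len(kept))
```

One thing to watch: the unweighted centroid in step 1 is itself pulled towards the outliers (by a few kilometres with this sample data), so the cut-off has to be generous enough that the good readings still fall inside it.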

There are clearly several ways to make this more sophisticated, such as down-weighting outliers or using M-estimators rather than simply excluding them, but I'm not sure whether such sophistication is really necessary here.
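
For what it's worth, the down-weighting idea might look something like the sketch below: a Huber-type M-estimate of location computed by iteratively reweighted means, with each reading's weight also divided by its HDOP. The function name huber_centre, the tuning distance k_m (500 m here) and the metres-per-degree constant are assumptions chosen for illustration, not values from the question.

```python
import math
from statistics import median

M_PER_DEG = 111_195.0  # approx. metres per degree of latitude (assumes a 6371 km sphere)


def huber_centre(readings, k_m=500.0, iterations=50):
    """readings: list of (lat, lon, hdop). Returns a robust (lat, lon) estimate."""
    # Start from the coordinate-wise median, which outliers barely move.
    lat = median(r[0] for r in readings)
    lon = median(r[1] for r in readings)

    for _ in range(iterations):
        cos_lat = math.cos(math.radians(lat))
        weights = []
        for r_lat, r_lon, hdop in readings:
            d = math.hypot((r_lat - lat) * M_PER_DEG,
                           (r_lon - lon) * M_PER_DEG * cos_lat)
            # Huber weight: full weight within k_m, shrinking as k_m / d beyond it.
            huber = 1.0 if d <= k_m else k_m / d
            weights.append(huber / hdop)
        total = sum(weights)
        lat = sum(w * r[0] for w, r in zip(weights, readings)) / total
        lon = sum(w * r[1] for w, r in zip(weights, readings)) / total

    return lat, lon
```

Unlike the hard cut-off, distant readings are never dropped entirely here; each one keeps a small, bounded pull on the result, so a gross error can still shift the estimate slightly.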
