I am doing a project in which my aim is to predict the likely locations of a set of latitude/longitude points based on a couple of variables. Since I've never done any ML on locational data, which models would you suggest to try out first?
I am anticipating that the relationships between input variables and locations will not be linear, so models like linear regression would not really perform well.
Also, since it's an inherently spatial problem, should I be concerned about spatial autocorrelation? The points tend to be clustered together when I plot them.
Obviously, at that point, it's just speculation but I would just like to ask you for some general advice and approach which I could adopt.
EDIT: this is how my lat/lon data looks like. I noticed that perhaps creating another feature based on k-means clustering could help (ex. cluster ID and size)? For the plot, alpha was set to 0.1, so some clusters are very dense given that they still appear as pitch black.
Here are my suggestions regarding your problem:
Yes, I bet your problem is non-linear but I recommend you to try linear models first. First, it will give you a baseline: Your final model should outperform the linear one. Second, running linear models and inspecting their weights sometimes provide intuition regarding the problem. Finally, there truly are cases that linear models outperform non-linear ones, depending on the number of data, the number of features, and the problem domain.
In other words, try simpler idea first. For example, concerning spatial autocorrelation will be a good idea, but you don't have to if your model works well without it.
By "prediction of latitude and longitude" I assume you are solving a regression problem where output is two real, bounded values.
- The first try would be the treating latitude and longitude separately: using two unrelated models to predict each of them.
- Solved – Spatial coordinates (latitude and longitude) are non significant
- Solved – Providing latitude and longitude to a house price model
- Solved – Analysis of spatial data over time and space
- Solved – Predicting Co-Ordinate Data
- Solved – How to check whether a sample is representative across two dimensions simultaneously