I'm looking to perform a kernel density estimation on a set of 40 locations. At first I used least-squares cross-validation (LSCV) to select the bandwidth, but it seems to have oversmoothed the data, creating high densities in areas where there shouldn't be any.
I've been looking around for information on which bandwidths are the best to use in which situations but am struggling. Does anyone know of a resource which explains this well?
Basically I'm hoping to find a bandwidth option which won't oversmooth the data as LSCV has, and then be able to justify my choice from the literature.
In general I think jbowman has it right. You either use a generic method for smoothing (minimizing the degree to which you insert your own bias as to how things should look) or you pick something that conveys the message you think it should convey. So if you reject the first, then picking the bandwidth yourself is just fine. If you want to be a little less assertive, there are plenty of articles that show a four-frame graphic with the same data plotted with 1/4, 1/2, 1, and 2 km bandwidths. This lets the reader see how the choices affect the interpretation (obviously you should pick the units and breaks that make sense for your problem).
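To make the four-frame idea concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the 40 points are randomly generated stand-ins for your locations, coordinates are taken to be in km, the kernel is an isotropic Gaussian whose standard deviation is the bandwidth, and `gaussian_kde_grid` is a hypothetical helper, not a function from any package.

```python
import numpy as np

def gaussian_kde_grid(points, bandwidth, grid_x, grid_y):
    """Fixed-bandwidth isotropic Gaussian KDE evaluated on a regular grid.

    points is an (n, 2) array of x, y coordinates; bandwidth is the kernel
    standard deviation in the same units. Returns an array of shape
    (len(grid_y), len(grid_x)) holding a probability density.
    """
    gx, gy = np.meshgrid(grid_x, grid_y)
    dx = gx[..., None] - points[:, 0]
    dy = gy[..., None] - points[:, 1]
    kernel = np.exp(-(dx**2 + dy**2) / (2 * bandwidth**2))
    kernel /= 2 * np.pi * bandwidth**2
    # Average the per-point kernels so the surface integrates to 1
    return kernel.mean(axis=-1)

# Hypothetical data: 40 random locations in a 10 km x 10 km square
rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(40, 2))

# Grid with 0.1 km spacing, padded so the widest kernel is not cut off
grid = np.linspace(-10, 20, 301)

# One density surface per candidate bandwidth (in km)
densities = {h: gaussian_kde_grid(points, h, grid, grid)
             for h in (0.25, 0.5, 1.0, 2.0)}
```

Each array in `densities` can then be drawn as one panel of the comparison figure with whatever plotting tool you already use.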
Alternatively, you may want to begin with the data. Why does it need smoothing? Were all the points on a city block assigned to the nearest intersection? Was the GPS only accurate to within +/- 10 meters? Are there irregularities in the data collection that show up as outliers? Smoothing is typically a way to mitigate these issues, so choosing a bandwidth consistent with a city block, 20 meters, or the units of observation may all be readily defensible choices without resorting to a specific literature.
Another line of thought follows from your statement that oversmoothing is producing high densities in places where they shouldn't occur. This suggests that the smoothing process is indicating connectivity between two proximate regions where connectivity should not exist. This often occurs when places are physically near but separated (as by a highway or river). You didn't specify the software you are using for your analysis, but I know several of the tools out there allow you to delineate areas where density calculations should not spill over (I am pretty sure the spatstat package in R permits this, for example).
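As a rough illustration of what "not spilling over" means, here is a minimal NumPy sketch. It is not how spatstat or any other package does it. It assumes a hypothetical straight barrier along x = 0 (say, a river) and simply drops each point's kernel contribution on the far side of that line. The helper `barrier_kde` and the three points are made up for the example, and the truncated kernels are not renormalized.

```python
import numpy as np

def barrier_kde(points, bandwidth, grid_x, grid_y, barrier_x=0.0):
    """Gaussian kernel intensity that does not cross the line x = barrier_x.

    Returns an array of shape (len(grid_y), len(grid_x)) in points per
    unit area; rows index y and columns index x.
    """
    gx, gy = np.meshgrid(grid_x, grid_y)
    dx = gx[..., None] - points[:, 0]
    dy = gy[..., None] - points[:, 1]
    kernel = np.exp(-(dx**2 + dy**2) / (2 * bandwidth**2))
    kernel /= 2 * np.pi * bandwidth**2
    # A point only contributes to grid cells on its own side of the barrier
    same_side = (gx[..., None] >= barrier_x) == (points[:, 0] >= barrier_x)
    return (kernel * same_side).sum(axis=-1)

# Hypothetical example: three points just east of a river at x = 0
points = np.array([[0.2, 0.0], [0.5, 0.3], [0.4, -0.2]])
grid = np.linspace(-2, 2, 41)
density = barrier_kde(points, 0.5, grid, grid)

west = density[:, grid < 0]    # cells across the river
east = density[:, grid >= 0]   # cells on the same side as the points
```

With an ordinary KDE the cells in `west` would pick up density from the nearby points; here they stay at zero, which is the behaviour you want when the two banks are not actually connected.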