Density Peak / Cluster Center Search in 2D Mesh / Point Process

I have a dataset of minute-by-minute GPS coordinates recorded by a person's mobile phone. That is, the dataset has 1440 rows with LON / LAT values. From these data I would like to get a point estimate (a lon / lat value) of where the participant's home is. Assume that home is the single place where they spend most of their time in a given 24-hour interval. Also, the GPS sensor has fairly high accuracy most of the time, but sometimes it goes completely haywire, which produces huge outliers.

I think the best way to do this is to treat it as a point process and use a 2D density estimate to find the peak. Is there a ready-made way to do this in R? I looked at kde2d (MASS), but it doesn't really seem to do the trick. kde2d creates a 25x25 grid over the data range with density values; however, in my data a person can easily cover 100 miles or more in a day, so those cells end up far too coarse. I could shrink them and use a much finer grid, but I'm sure there must be a better way to get a point estimate.
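For concreteness, this is roughly what I mean (a minimal sketch only; `lon` and `lat` stand for my coordinate columns, and the finer grid size n = 500 is just a guess):

    # Sketch of the kde2d approach with a finer grid than the default 25x25.
    # Assumes `lon` and `lat` hold the 1440 coordinate values; n = 500 and the
    # default bandwidth are arbitrary choices.
    library(MASS)

    dens <- kde2d(lon, lat, n = 500)
    peak <- arrayInd(which.max(dens$z), dim(dens$z))   # row/column of the density peak
    home <- c(lon = dens$x[peak[1]], lat = dens$y[peak[2]])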

3 answers

There are time-spent functions in the trip package (I am the author). You can create objects from track data that understand the underlying track process over time, and simply process the points assuming straight-line travel between fixes. If "home" is the pixel with the largest value, i.e. when you break all the segments up by the time spent and sum that into grid cells, then it is easy to find. A "time spent" grid from the tripGrid function is a SpatialGridDataFrame with standard sp package classes, and a trip object can be composed of one or more tracks.

Using rgdal you can easily transform the coordinates to an appropriate map projection if lon / lat is not suitable for your extent, but that is not required for calculating the grid of segments / time spent.
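As a rough sketch (the projection string is only a placeholder; pick one suited to your study area, and `d` here stands for whatever sp points object you built from your lon / lat columns):

    # Sketch: reproject lon/lat points to a local equal-area projection.
    # Assumes `d` is a SpatialPointsDataFrame in lon/lat; the +lat_0/+lon_0
    # values are placeholders for the centre of the study area.
    library(sp)
    library(rgdal)

    proj4string(d) <- CRS("+proj=longlat +datum=WGS84")
    d_proj <- spTransform(d, CRS("+proj=laea +lat_0=40 +lon_0=-105 +units=m"))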

There is a simple speedfilter to remove fixes that imply unrealistically fast movement, but it is very simplistic and may introduce new problems; in general, cleaning or filtering tracks for unlikely movement can be very tricky. (In my experience, the time-spent gridding gives you as good an estimate as many sophisticated models that just present new problems to deal with.) The filter works with Cartesian or lon / lat coordinates, using tools in sp to calculate distances (lon / lat is reliable, while a poor choice of map projection can introduce problems; at short distances, such as people moving about on land, it probably does not matter much).
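For example (a minimal sketch; the 150 km/h threshold is an arbitrary guess, and ?speedfilter documents the units expected for your coordinates):

    # Sketch: drop fixes that imply implausibly fast movement.
    # Assumes `tr` is a trip object in lon/lat and that max.speed is in km/h;
    # 150 km/h is an arbitrary threshold for this example.
    keep <- speedfilter(tr, max.speed = 150)   # logical vector, one entry per fix
    tr_filtered <- tr[keep, ]                  # keep only the plausible fixes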

(The tripGrid function computes the exact line-segment components using pixellate.psp, but that detail is hidden in the implementation.)

In terms of data preparation, trip is strict about a sensible sequence of times and will not let you create an object if the data contain duplicate times, are out of order, and so on. There is an example of reading data from a text file in ?trip, and here is a very simple example with (really) dummy data:

    library(trip)

    # Dummy track: 10 fixes with times and a single trip ID
    d <- data.frame(x = 1:10, y = rnorm(10),
                    tms = Sys.time() + 1:10, id = gl(1, 5))
    coordinates(d) <- ~x+y
    tr <- trip(d, c("tms", "id"))

    g <- tripGrid(tr)                          # time spent per grid cell
    pt <- coordinates(g)[which.max(g$z), ]     # centre of the cell with the most time

    image(g, col = c("transparent", heat.colors(16)))
    lines(tr, col = "black")
    points(pt[1], pt[2], pch = "+", cex = 2)

There are no overlapping regions in this dummy track, but it shows that finding the maximum point of the "time spent" is quite straightforward.


How about using the location that minimises the sum of squared distances to all the events? That might be close to the supremum of any kernel smoothing, if my brain is working right.

If your data consist of two clusters (home and work), then I think that location will end up in the bigger cluster rather than between them. It is not the same as the simple mean of the x and y coordinates.

For the uncertainty on that, jitter your data according to your positional uncertainty (it would be great if you had that value from the GPS, otherwise guess; 50 metres?) and recompute. Do that 100 times, kernel-smooth the resulting locations, and take the 95% contour.

This is not rigorous, and I would need to experiment with this minimum-distance / kernel-supremum idea...
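A minimal sketch of the above, assuming projected coordinates in metres in vectors `x` and `y` and a guessed 50 m positional noise (none of these names come from any package):

    # Sketch: location minimising the sum of squared distances, plus a
    # jitter-and-repeat loop for a rough uncertainty estimate.
    # Assumes `x` and `y` are projected coordinates in metres; the 50 m
    # jitter standard deviation is a guess at the GPS error.
    sumsq <- function(p) sum((x - p[1])^2 + (y - p[2])^2)
    home  <- optim(c(mean(x), mean(y)), sumsq)$par   # point estimate

    ests <- t(replicate(100, {
      xj <- x + rnorm(length(x), sd = 50)
      yj <- y + rnorm(length(y), sd = 50)
      optim(c(mean(xj), mean(yj)),
            function(p) sum((xj - p[1])^2 + (yj - p[2])^2))$par
    }))
    # `ests` now holds 100 jittered estimates; kernel-smooth them
    # (e.g. MASS::kde2d) and take the 95% contour as an uncertainty region.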


In reply to Spacedman: I'm pretty sure least squares won't work. Least squares is notorious for bending over backwards to accommodate outliers while paying little attention to the points that are "nearby". That is the opposite of what is wanted here.

A Tukey bisquare (biweight) estimate would, I think, work better, but I have never used one. I think it also requires some tuning.

It is more or less like a least-squares estimate out to a certain distance from zero, with constant weighting beyond that. So once a point becomes an outlier, its penalty stays constant. We do not want outliers to count more and more heavily the farther away they are; we would rather weight them constantly and let the optimisation concentrate on fitting the points near the cluster well.
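One way to sketch that idea in R, assuming projected coordinates in metres in `x` and `y` (the cutoff k = 200 m and the variable names are my own choices and would need tuning):

    # Sketch of a bounded ("bisquare"-style) loss: behaves like squared error
    # near zero and becomes constant beyond the cutoff k, so distant outliers
    # stop adding penalty. Assumes `x` and `y` are projected coordinates in
    # metres; k = 200 is an arbitrary tuning constant.
    rho <- function(r, k) ifelse(abs(r) <= k,
                                 (k^2 / 6) * (1 - (1 - (r / k)^2)^3),
                                 k^2 / 6)
    obj  <- function(p, k = 200) sum(rho(sqrt((x - p[1])^2 + (y - p[2])^2), k))
    home <- optim(c(median(x), median(y)), obj)$par   # robust location estimate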


Source: https://habr.com/ru/post/1416282/

