Permutation tests

I answered a question about spatial pattern testing on the R-sig-geo email list. Designing your own permutation test is (can be?) fun and easy, and I thought I’d save it here for future reference.

Why permutation tests? You can get at exactly what you want to know, which can be especially useful with complex spatial questions.

The question: which are closer on average to points in A, points in B or points in C?

Null hypothesis: mean distance from B to A is equal to mean distance from C to A.

I’ve adapted the example slightly from the original question on the mailing list.

library(spatstat)

set.seed(2019)

A <- rpoispp(100) ## Base set
B <- rpoispp(50) ## First set of points
C <- rpoispp(50) ## Second set of points

ABd <- crossdist(A, B)
ACd <- crossdist(A, C)

mean(ABd)
# 0.5168865
mean(ACd)
# 0.5070118

Are the two distances different?

One way to approach this with a permutation test is to keep the points in B and C where they are, but shuffle the assignment to B or C. That formulation might be appropriate if you're working with trees, where the trees are where they are but you're concerned with whether species B is closer to species A than another species is. (There are multiple ways to approach this question, but I like permutation tests.)

The basic procedure, then, is to reorder the point labels a bunch of times, and take the mean resampled distances. Then compare the true difference in distances to the permuted difference in distances to see where the true distance falls in the overall distribution.

The thing about using this method is that it assumes that the trees don't move, and so takes into account the intrinsic structure of the tree spatial pattern.

nperm <- 999
permout <- data.frame(ABd = rep(NA, nperm), ACd = rep(NA, nperm))

# create framework for a random assignment of B and C to the existing points

BC <- superimpose(B, C)
B.len <- npoints(B)
C.len <- npoints(C)
B.sampvect <- c(rep(TRUE, B.len), rep(FALSE, C.len))

set.seed(2019)
for(i in seq_len(nperm)) {
B.sampvect <- sample(B.sampvect)
B.perm <- BC[B.sampvect]
C.perm <- BC[!B.sampvect]

permout[i, ] <- c(mean(crossdist(A, B.perm)), mean(crossdist(A, C.perm)))
}

boxplot(permout$ABd - permout$ACd)
points(1, mean(ABd) - mean(ACd), col="red")

table(abs(mean(ABd) - mean(ACd)) >= abs(permout$ABd - permout$ACd))
# FALSE TRUE
# 573 426

sum(abs(mean(ABd) - mean(ACd)) >= abs(permout$ABd - permout$ACd)) / nperm
# 0.4264264

The difference between ACd and ABd is indistinguishable from that obtained by a random resampling of B and C.

Or, there is no apparent difference in the mean distance of B to A or C to A.

Which is reassuring, since A, B, and C were all randomly generated.