Kernels for Everyone! (2 August, 2016)

Background

During my dissertation, I spent a lot of time working on spatial kernel estimates. Where spatial kernel estimates are defined as a convolution of a spatial suppport $s,s_0 \in D$ , $$ (f*g)(s_0)=\int f(s)g(s_0-s)ds\text{.} $$ A simple example of this estimate is a Gaussian filter or blur in more common parlance. In the Guassian filter, $g$ is the normal density function $\phi$, with the location parameter $\mu=s_0$ and scale parameter $\sigma$ equal to the bandwidth $h$.

The kernelDensityEstimates package provides a means to apply these estimates to raster S4 objects from the raster R package. Each function preserves NA values, and uses OpenMP for acceleration. However, all processing is done in memory, so there is no support for extremely large images.

The rasterKernelEstimates R package is currently not on CRAN, but will be after the documentation is cleaned up. Furthermore, the implementation of the kernels is currently non-optimal, and a fair bit of work can be done to speed them up a bit more. For those interested in helping out, the code is on my github page.

A short example of each function is provided below with some basic performance metrics. All timings are done on a Late 2013 MacBook Pro 13" with a dual-core 2.4Ghz i5 running R 3.3.1 compiled under gcc 6.1.0.

rasterLocalSums
rasterLocalQuantiles
rasterLocalMoments
rasterLocalCategoricalModes

rasterLocalSums

The function rasterLocalSums calculates a weighted sum of pixel values in a spatial neighborhood defined by the matrix W, $$ f(s_0) = \sum_{s\in S_0} w(s) x(s) \text{.} $$ This is similar to the raster library function focal when fun is not specified. In fact, the results are identical when padding is used in the focal call.

library(raster)
library(rasterKernelEstimates)

set.seed(100)
n <- 500

# create a raster object
r <- raster::raster( matrix(rnorm(n^2),n,n))

# create a weight matrix
W <- raster::focalWeight(r,c(1,0.04),type='Gauss')

# apply the weight with rasterKernelEstimates
run.time <- proc.time()
rLocalKDE1 <- rasterLocalSums(r,W)
print(proc.time() - run.time)

##    user  system elapsed 
##   0.975   0.004   0.325

# apply the weight with the raster packages focal
run.time <- proc.time()
rLocalKDE2 <- raster::focal(r,W,pad=TRUE)
print(proc.time() - run.time)

##    user  system elapsed 
##   0.667   0.008   0.678

# plot original image
plot(r)

# plot the smoothed image
plot(rLocalKDE1)

# print out the max abs difference
print(
sprintf(
  "The maximum absolute difference = %f.",
    max(abs(values(rLocalKDE1) - values(rLocalKDE2)
    ),na.rm = T))
)

## [1] "The maximum absolute difference = 0.000000."

rasterLocalQuantiles

The function rasterLocalQuantiles calculates the $q^\text{th}$ quantile of pixel values in a spatial neighborhood defined by the matrix W; $$ f(s_0) = \left\{ argmax_{x(s): s \in S_0 } Pr\left(X(s) < x(s) \right) \le \frac{q}{100}, \right\} \text{,} $$ where $X(s)$ is a random variable on the support of $S_0$, a spatial neighborhood about point $s_0$.

This result should generally be reproducible using the raster library function focal when fun is specified as quantile. Because quantiles do not necessarily fall exactly on order statistics there is a need to map quantiles between a function of the order statistics. In the rasterLocalQuantiles function this mapping is done to the closest odd-valued-index order statistic. E.g. if the quantile falls between the first and second order statistic. This approach matches the inverse empirical cdf transform used by the type=1 transform in the stats::quantile function. Note that the quantile function applied to raster objects calculates the quantile from all pixels, not local quantiles.

As far as applications go, this function is particularly useful when working with data with a large number of outliers.

library(raster)
library(rasterKernelEstimates)


set.seed(100)
n <- 100

# create a raster object with extreme values
r <- raster::raster( matrix(rcauchy(n^2),n,n))

# create a weight matrix
W <- matrix(1,nrow=3,ncol=3)

# calculate local median
run.time <- proc.time()
rLocalKDE1 <- rasterLocalQuantiles(r,W,q=30)
print(proc.time() - run.time)

##    user  system elapsed
##   0.018   0.001   0.010

# calculate local median
run.time <- proc.time()
rLocalKDE2 <- raster::focal(
  r, W, pad=TRUE,
  fun=function(x) stats::quantile(x,probs=0.3,na.rm=T,type=1),
  padValue=NA)
print(proc.time() - run.time)

##    user  system elapsed
##   1.245   0.010   1.268

# plot boxplot of original image
boxplot(r)

# plot the smoothed image
plot(rLocalKDE1)

# print out the max abs difference
print(
sprintf(
  "The maximum absolute difference = %f.",
    max(abs(values(rLocalKDE1) - values(rLocalKDE2)
    ),na.rm = T))
)

## [1] "The maximum absolute difference = 0.000000."

**Figure 3:** Box plot of the Cauchy distributed values in the raster `r`.

rasterLocalMoments

The function rasterLocalMoments calculates up to the second moment over the weighted matricies Wmu and Wvar, $$ \hat{\mu}(s_0) = \frac{ \sum_{s\in S_0} w_\mu(s) x(s) }{ \sum_{s\in S_0} w_\mu(s) } $$ and $$ \hat{\sigma^2}(s_0) = \frac{ \sum_{s\in S_0} w_\sigma(s) \left( x(s) - \hat{\mu}(s_0) \right)^2 }{ \sum_{s\in S_0} w_\sigma(s) } \text{.} $$ The difference between the mean call here and rasterLocalSums is the normalization of the weights. This ensures the weights sum to one.

library(raster)
library(rasterKernelEstimates)

set.seed(100)
n <- 200

# create a raster object of two populations
r <- raster::raster(
  cbind( matrix(rnorm(n^2),n,n), matrix(rnorm(n^2,2,2),n,n)) )


# create a weight matrix
W <- raster::focalWeight(r,c(1,0.04),type='Gauss')

# calculate the weighted local mean and variance
run.time <- proc.time()
rLocalKDE1 <- rasterLocalMoments(r,W)
print(proc.time() - run.time)

##    user  system elapsed 
##   0.343   0.002   0.102

# plot original image
plot(r)

# plot the weighted mean
plot(rLocalKDE1$mu)

# plot the weighted variance
plot(rLocalKDE1$var)

**Figure 6:** Local weighted mean of the original image.

**Figure 7:** Local weighted variance of the original image.

rasterLocalCategorical Modes

The function rasterLocalCategoricalModes calculates a categorical mode from a raster image over the weighted spatial neighborhood devined by the matrix W, $$ f(s_0) = \left\{ c : max_{c \in C_0} \left\{ \sum_{ s \in S_0} w(s)\mathbb{I}_{x(s)=c} \right\} \right\} $$ where $C_0$ is the set of unique categories in r.

library(raster)
library(rasterKernelEstimates)

set.seed(100)
n <- 100

# create a categorical raster
r <- raster::raster(
  matrix( sample(-4:4,size=n^2,replace=T),n,n)
  )


# create a weight matrix
W <- matrix(1,9,9)

# calculate the weighted local mean and variance
run.time <- proc.time()
rLocalKDE1 <- rasterLocalCategoricalModes(r,W)
print(proc.time() - run.time)

##    user  system elapsed 
##   0.030   0.000   0.011

# plot original image
plot(r)

# plot the weighted mean
plot(rLocalKDE1)

**Figure 9:** Local weighted mode of the original image.