Kernels for Everyone! (2 August, 2016)
Background
During my dissertation, I spent a lot of time working on spatial kernel estimates. Where spatial kernel estimates are defined as a convolution of a spatial suppport \(s,s_0 \in D\) , $$ (f*g)(s_0)=\int f(s)g(s_0-s)ds\text{.} $$ A simple example of this estimate is a Gaussian filter or blur in more common parlance. In the Guassian filter, \(g\) is the normal density function \(\phi\), with the location parameter \(\mu=s_0\) and scale parameter \(\sigma\) equal to the bandwidth \(h\).
The kernelDensityEstimates
package provides a means to apply these estimates to raster
S4 objects from the NA
values, and uses OpenMP for acceleration.
However, all processing is done in memory, so there is no support for extremely large images.
The rasterKernelEstimates R package is currently not on CRAN, but will be after the documentation is cleaned up. Furthermore, the implementation of the kernels is currently non-optimal, and a fair bit of work can be done to speed them up a bit more. For those interested in helping out, the code is on my github page.
A short example of each function is provided below with some basic performance metrics. All timings are done on a Late 2013 MacBook Pro 13" with a dual-core 2.4Ghz i5 running R 3.3.1 compiled under gcc 6.1.0.
rasterLocalSums
The function rasterLocalSums
calculates a weighted sum of pixel values in a spatial neighborhood defined by the matrix W
,
$$
f(s_0) = \sum_{s\in S_0} w(s) x(s) \text{.}
$$
This is similar to the raster library function focal
when fun is not specified.
In fact, the results are identical when padding is used in the focal
call.
library(raster) library(rasterKernelEstimates) set.seed(100) n <- 500 # create a raster object r <- raster::raster( matrix(rnorm(n^2),n,n)) # create a weight matrix W <- raster::focalWeight(r,c(1,0.04),type='Gauss') # apply the weight with rasterKernelEstimates run.time <- proc.time() rLocalKDE1 <- rasterLocalSums(r,W) print(proc.time() - run.time)
## user system elapsed ## 0.975 0.004 0.325
# apply the weight with the raster packages focal run.time <- proc.time() rLocalKDE2 <- raster::focal(r,W,pad=TRUE) print(proc.time() - run.time)
## user system elapsed ## 0.667 0.008 0.678
# plot original image plot(r)
# plot the smoothed image plot(rLocalKDE1)
# print out the max abs difference print( sprintf( "The maximum absolute difference = %f.", max(abs(values(rLocalKDE1) - values(rLocalKDE2) ),na.rm = T)) )
## [1] "The maximum absolute difference = 0.000000."
rasterLocalQuantiles
The function rasterLocalQuantiles
calculates the \(q^\text{th}\) quantile of pixel values in a spatial neighborhood defined by the matrix W
;
$$
f(s_0) = \left\{ argmax_{x(s): s \in S_0 } Pr\left(X(s) < x(s) \right) \le \frac{q}{100}, \right\} \text{,}
$$
where \(X(s)\) is a random variable on the support of $S_0$, a spatial neighborhood about point \(s_0\).
This result should generally be reproducible using the raster library function focal
when fun is specified as quantile
.
Because quantiles do not necessarily fall exactly on order statistics there is a need to map quantiles between a function of the order statistics.
In the rasterLocalQuantiles
function this mapping is done to the closest odd-valued-index order statistic.
E.g. if the quantile falls between the first and second order statistic.
This approach matches the inverse empirical cdf transform used by the type=1
transform in the stats::quantile
function.
Note that the quantile
function applied to raster objects calculates the quantile from all pixels, not local quantiles.
As far as applications go, this function is particularly useful when working with data with a large number of outliers.
library(raster) library(rasterKernelEstimates) set.seed(100) n <- 100 # create a raster object with extreme values r <- raster::raster( matrix(rcauchy(n^2),n,n)) # create a weight matrix W <- matrix(1,nrow=3,ncol=3) # calculate local median run.time <- proc.time() rLocalKDE1 <- rasterLocalQuantiles(r,W,q=30) print(proc.time() - run.time)
## user system elapsed ## 0.018 0.001 0.010
# calculate local median run.time <- proc.time() rLocalKDE2 <- raster::focal( r, W, pad=TRUE, fun=function(x) stats::quantile(x,probs=0.3,na.rm=T,type=1), padValue=NA) print(proc.time() - run.time)
## user system elapsed ## 1.245 0.010 1.268
# plot boxplot of original image boxplot(r)
# plot the smoothed image plot(rLocalKDE1)
# print out the max abs difference print( sprintf( "The maximum absolute difference = %f.", max(abs(values(rLocalKDE1) - values(rLocalKDE2) ),na.rm = T)) )
## [1] "The maximum absolute difference = 0.000000."
rasterLocalMoments
The function rasterLocalMoments
calculates up to the second moment over the weighted matricies Wmu
and Wvar
,
$$
\hat{\mu}(s_0) = \frac{
\sum_{s\in S_0} w_\mu(s) x(s)
}{
\sum_{s\in S_0} w_\mu(s)
}
$$
and
$$
\hat{\sigma^2}(s_0) = \frac{
\sum_{s\in S_0} w_\sigma(s) \left( x(s) - \hat{\mu}(s_0) \right)^2
}{
\sum_{s\in S_0} w_\sigma(s)
}
\text{.}
$$
The difference between the mean call here and rasterLocalSums
is the normalization of the weights. This ensures the weights sum to one.
library(raster) library(rasterKernelEstimates) set.seed(100) n <- 200 # create a raster object of two populations r <- raster::raster( cbind( matrix(rnorm(n^2),n,n), matrix(rnorm(n^2,2,2),n,n)) ) # create a weight matrix W <- raster::focalWeight(r,c(1,0.04),type='Gauss') # calculate the weighted local mean and variance run.time <- proc.time() rLocalKDE1 <- rasterLocalMoments(r,W) print(proc.time() - run.time)
## user system elapsed ## 0.343 0.002 0.102
# plot original image plot(r)
# plot the weighted mean plot(rLocalKDE1$mu)
# plot the weighted variance plot(rLocalKDE1$var)
rasterLocalCategorical Modes
The function rasterLocalCategoricalModes
calculates a categorical mode from a raster image over the weighted spatial neighborhood devined by the matrix W
,
$$
f(s_0) = \left\{
c : max_{c \in C_0} \left\{ \sum_{ s \in S_0} w(s)\mathbb{I}_{x(s)=c} \right\}
\right\}
$$
where \(C_0\) is the set of unique categories in r
.
library(raster) library(rasterKernelEstimates) set.seed(100) n <- 100 # create a categorical raster r <- raster::raster( matrix( sample(-4:4,size=n^2,replace=T),n,n) ) # create a weight matrix W <- matrix(1,9,9) # calculate the weighted local mean and variance run.time <- proc.time() rLocalKDE1 <- rasterLocalCategoricalModes(r,W) print(proc.time() - run.time)
## user system elapsed ## 0.030 0.000 0.011
# plot original image plot(r)
# plot the weighted mean plot(rLocalKDE1)