Cleaning up SPAM with NLP
Hey, it's the holidays. Time to take some PTO/leave, hang out with family, relax, and work on some side projects. One of the many projects I finally have time for is removing 'SPAM' messages from one of the games I play from time to time. This lets me play with some NLP and solve some fun problems, like how to score a model quickly while using only a few MB of RAM in the scripting language Lua.
Grand Central Dispatch
About a decade late, I decided to give Grand Central Dispatch (GCD) a go in R. GCD is fairly similar to OpenMP in that it provides a simplified interface to pthreads. Since its release with Mac OS X 10.6 in 2009, GCD has been ported to a number of operating systems through libdispatch. As of this writing, libdispatch is primarily available on Linux and Apple's operating systems using the Clang compiler. GCC may work, but I have not had any luck with it.
Mysteriously Slow sample
Hi everyone, I'm at JSM 2018 right now, so feel free to stop by my session or catch me in the halls! Just send me a tweet!
Making Your .C Less NOTEworthy
If you are a package maintainer, you may have noticed the following new notes from your code checks:
Found no calls to: ‘R_registerRoutines’, ‘R_useDynamicSymbols’
If you are using Rcpp, you can easily fix this by refreshing the auto-generated function registration. However, if you have a lot of C code that uses the .C() interface, then you need to make a few changes. In this blog post, I'll use my updated meanShiftR package as an example of how to quickly fix your code.
The Statistician's Apprentice: An Introduction to the SWP Operator
The sweep operator, as defined by Dempster (1969) and commonly referred to as the SWP operator, is a useful tool for computational statisticians working with covariance matrices. In particular, the SWP operator allows a statistician to quickly regress all other variables against one specified variable, obtaining OLS estimates of the regression coefficients and residual variances in a single application. Subsequent applications of the SWP operator allow for regressing against additional variables.
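To make the SWP operator concrete, here is a minimal Python sketch using only the standard library. Sign conventions for the sweep operator vary between authors. This one follows a common textbook convention in which the pivot becomes -1/a_kk, and it may not match Dempster's original notation exactly. The names `sweep` and `covariance_matrix` are illustrative, not from any package.

Sweeping a sample covariance matrix on the index of x has two effects:
- Row k receives the OLS slope of every other variable on x.
- The remaining diagonal receives the residual variances.

```python
def sweep(a, k):
    """Sweep the symmetric matrix `a` (list of lists) on index `k`.

    Convention used here (one of several in the literature):
      a[k][k] -> -1 / a[k][k]
      a[i][k] -> a[i][k] / a[k][k],  a[k][j] -> a[k][j] / a[k][k]
      a[i][j] -> a[i][j] - a[i][k] * a[k][j] / a[k][k]   for i, j != k
    Returns a new matrix; the input is not modified.
    """
    n = len(a)
    d = a[k][k]
    if d == 0:
        raise ZeroDivisionError("cannot sweep on a zero pivot")
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                out[i][j] = -1.0 / d
            elif i == k:
                out[i][j] = a[k][j] / d
            elif j == k:
                out[i][j] = a[i][k] / d
            else:
                out[i][j] = a[i][j] - a[i][k] * a[k][j] / d
    return out


def covariance_matrix(columns):
    """Sample covariance matrix (divisor n - 1) of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    p = len(columns)
    return [[sum((columns[i][t] - means[i]) * (columns[j][t] - means[j])
                 for t in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
swept = sweep(covariance_matrix([x, y]), 0)
print(round(swept[0][1], 4))  # slope of y on x: 1.97
print(round(swept[1][1], 5))  # residual variance, SSE / (n - 1): 0.02275
```

Because the input is a covariance matrix, the intercept is handled by centering. The residual variance therefore uses the same n - 1 divisor as the covariances, not the unbiased n - 2. Under this sign convention, sweeping on every index in turn yields -A^{-1}.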
In this blog post, I will be introducing the meanShiftR package. meanShiftR is a rewrite of my original mean shift R package from 2013, based on the Fast Library for Approximate Nearest Neighbors (FLANN). The meanShiftR package is focused on providing R users with the most computationally efficient mean shift implementations available in the literature. This includes approximations to the mean shift algorithm through kernel truncation and approximate nearest-neighbor (ANN) approaches.
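For readers new to mean shift, the sketch below shows the basic algorithm in plain Python. Each point is repeatedly replaced by the Gaussian-kernel-weighted mean of the data until it stops moving, which carries it to a mode of the kernel density estimate. The optional `truncate` argument is a simple stand-in for kernel truncation.

This is an illustration of the idea, not the meanShiftR implementation or API:
- It is brute force, O(n²) per iteration.
- It uses no FLANN or approximate nearest-neighbor search.
- The function and parameter names are hypothetical.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-8, truncate=None):
    """Gaussian-kernel mean shift: move every point to a nearby density mode.

    points    -- list of equal-length tuples; used both as data and as starts
    bandwidth -- Gaussian kernel bandwidth h
    truncate  -- if set, ignore data farther than truncate * h from the
                 current location (a simple kernel truncation, brute force)
    Returns one converged location (mode estimate) per input point.
    """
    h2 = bandwidth ** 2
    cutoff2 = None if truncate is None else (truncate * bandwidth) ** 2
    modes = []
    for start in points:
        x = list(start)
        for _ in range(max_iter):
            num = [0.0] * len(x)
            den = 0.0
            for p in points:
                d2 = sum((pj - xj) ** 2 for pj, xj in zip(p, x))
                if cutoff2 is not None and d2 > cutoff2:
                    continue  # outside the truncated kernel
                w = math.exp(-0.5 * d2 / h2)  # unnormalized Gaussian weight
                den += w
                for j, pj in enumerate(p):
                    num[j] += w * pj
            if den == 0.0:
                break  # no data inside the truncation radius
            new_x = [n / den for n in num]  # kernel-weighted mean
            shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
            x = new_x
            if shift < tol:
                break
        modes.append(tuple(x))
    return modes


data = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2),
        (10.0, 10.0), (10.1, 9.8), (9.9, 10.2)]
for mode in mean_shift(data, bandwidth=1.0):
    print([round(v, 3) for v in mode])
```

Points from the same cluster converge to the same location, so grouping converged points gives a clustering. Truncation skips distant points that would contribute almost nothing. That saving is what makes nearest-neighbor structures like FLANN attractive at scale.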
Over the last few months, I have spent my nights taking care of my newborn second daughter. Keeping me company during the sleepless wee hours of the morning was the Reconcilable Differences podcast. In episode 17 of this podcast, "It's Devastating," John Siracusa posed an open question about how baby names change over time, and whether there were any sudden changes. In this blog post, part of my series investigating podcast themes, I will take a look at these two questions. I will also provide my source code in the spirit of reproducible research.
Kernels for everyone!
During my dissertation, I spent a lot of time working on spatial kernel estimates. A spatial kernel estimate is defined as a convolution of a kernel $K$ with a function $f$ over a spatial support $A$, $\hat{f}(s) = \int_A K(s - u)\, f(u)\, du$. A simple example of this estimate is a Gaussian filter, or "blur" in more common parlance. In the Gaussian filter, $K$ is the normal density function, with the location parameter equal to $0$ and the scale parameter equal to the bandwidth $h$.
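A Gaussian filter on a regular grid can be sketched as a discrete version of this convolution. This Python sketch makes three assumptions:
- The kernel is centered at zero, with scale equal to the bandwidth `h`.
- The kernel is truncated at a radius of ceil(3h) cells.
- The weights are renormalized over the cells that fall inside the grid, so the normal density's constant cancels and edge cells are not pulled toward zero.

The edge handling is one choice among several; padding with zeros or reflecting the edges are common alternatives. `gaussian_filter` is a hypothetical name, not a function from any particular package.

```python
import math


def gaussian_filter(grid, h):
    """Blur a 2-D grid (list of lists) with a Gaussian kernel of bandwidth h.

    Each output cell is a weighted average of the input cells within a square
    window of radius ceil(3h). Weights follow the normal density with location
    0 and scale h; they are renormalized over the in-grid cells, so the
    density's normalizing constant cancels.
    """
    rows, cols = len(grid), len(grid[0])
    r = max(1, math.ceil(3 * h))  # truncation radius in cells
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            weight_sum = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:  # stay in the support
                        w = math.exp(-0.5 * (di * di + dj * dj) / (h * h))
                        total += w * grid[ii][jj]
                        weight_sum += w
            out[i][j] = total / weight_sum
    return out


impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0  # a single bright cell
for row in gaussian_filter(impulse, 1.0):
    print(" ".join(f"{v:.3f}" for v in row))
```

Blurring a single bright cell spreads it into a symmetric bump, and a larger `h` spreads it further.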
On a finite time scale
It was rumored that updates to the MacBook Pro were coming at WWDC. These rumors did not pan out. Instead, it looks like the new MacBook Pro will be landing sometime later this year, possibly due to delays in the availability of high-end 45 W Skylake mobile parts. This seems plausible, given that Intel only released its Skylake quad-core NUC in mid-May. The magnitude of these delays has certainly made its way around the tech press, but are these delays really exceptional?
Predicting Agriculture, Poorly (Part I)
Crop rotation is an agricultural production practice used to increase yield, mitigate disease, and control pests. This practice involves growing crops in specific sequences to improve the quality of the soil for the following crop. A common example in the United States is growing soybeans before corn. In this example, soybeans fix nitrogen in the soil, leading to an increase in yield and a reduction in fertilizer use.
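One simple way to see rotation in data is to count year-to-year transitions between crops on the same field. The Python sketch below estimates P(next crop | current crop) from a few hypothetical field histories. It is illustrative only and not necessarily the approach taken in this post. A corn/soybean rotation shows up as high probabilities on the soybeans-to-corn and corn-to-soybeans transitions.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next crop | current crop) from per-field crop histories.

    sequences -- iterable of lists, one chronological list of crops per field
    Returns {current_crop: {next_crop: probability}}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each year's crop with the following year's crop.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        crop: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for crop, c in counts.items()
    }


# Hypothetical four-year histories for two fields (toy data, not CDL).
fields = [
    ["soybeans", "corn", "soybeans", "corn"],
    ["corn", "corn", "soybeans", "corn"],
]
probs = transition_probabilities(fields)
print(probs["soybeans"])  # {'corn': 1.0}
print(probs["corn"])      # soybeans with probability 2/3, corn with 1/3
```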
The cdlTools R package
This is a brief tutorial on the cdlTools package, developed by Lu Chen and me to download and perform some simple analysis on the USDA's Cropland Data Layer (CDL). This tutorial will cover downloading CDL data, obtaining some zonal statistics, and exploring land cover change. The package is not currently available on CRAN, but it is available via GitHub and can be installed through Hadley Wickham's devtools package.
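At their core, zonal statistics and land cover change both come down to counting raster cells. The sketch below is plain Python on tiny hypothetical grids, not the cdlTools API:
- `zonal_counts` tallies land cover classes within each zone.
- `change_matrix` cross-tabulates each cell's class in one year against its class in a later year.

Codes 1 and 5 mirror the CDL codes for corn and soybeans. The grids and zone labels are made up, and real CDL rasters would first need to be read and aligned with a raster library.

```python
from collections import Counter


def zonal_counts(classes, zones):
    """Count cells of each land cover class within each zone.

    classes, zones -- equal-shape 2-D grids (lists of lists); zones holds a
                      zone id for every cell (e.g. a rasterized county map)
    Returns {zone_id: Counter({class_code: cell_count})}.
    """
    result = {}
    for class_row, zone_row in zip(classes, zones):
        for cls, zone in zip(class_row, zone_row):
            result.setdefault(zone, Counter())[cls] += 1
    return result


def change_matrix(before, after):
    """Cross-tabulate cells by (class in earlier year, class in later year)."""
    return Counter(
        (b, a)
        for row_b, row_a in zip(before, after)
        for b, a in zip(row_b, row_a)
    )


# Tiny hypothetical grids; 1 and 5 mirror the CDL codes for corn and soybeans.
cdl_year1 = [[1, 1, 5], [5, 5, 1]]
cdl_year2 = [[5, 5, 1], [1, 5, 1]]
zones = [["A", "A", "B"], ["A", "B", "B"]]

print(zonal_counts(cdl_year1, zones))
print(change_matrix(cdl_year1, cdl_year2))
```

In the change matrix, off-diagonal pairs such as (1, 5) count cells that switched from corn to soybeans. Diagonal pairs count cells that kept the same class.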