Wednesday, 7 August 2013

Will RCpp speed up the evaluation of basic R functions?

Will RCpp speed up the evaluation of basic R functions?

Pardon me, but I do not know much about Rcpp, but I am trying to figure
out if it would be good to learn it in order to improve a package I am
writing.
I have written an R package that (is supposed to) efficiently and randomly
samples uniformly from a high-dimensional, constrained space with an MCMC
algorithm. It is (unfinished and) located at
https://github.com/davidkane9/kmatching.
The problem is when I run a statistical test called the Gelman-Rubin
diagnostic to see whether my MCMC algorithm has converged to a stationary
distribution, I should get R = 1 for the statistic, but I get very high
numbers, which basically tells me my sampling is bad and no one should use
it. The solution is to take more samples and skip more (taking 1 out of
ever 1000 instead of every 100). However this takes a lot of time. If you
want some code to run, here is an example:
##install the package first
data(lalonde)
matchvars = c("age", "educ", "black")
k = kmatch(x = lalonde, weight.var = "treat", match.var = matchvars, n =
1000, skiplength = 1000, chains = 2, verbose = TRUE)
Looking at an Rprof output of this I get that rnorm and %*% are taking
most of the time:
total.time total.pct self.time self.pct
"kmatch" 1453.14 100.00 0.00 0.00
"hitandrun" 1450.18 99.79 128.80 8.86
"%*%" 757.00 52.09 757.00 52.09
"cat" 343.18 23.62 329.82 22.70
"rnorm" 106.34 7.32 103.50 7.12
"mirror" 35.26 2.43 21.84 1.50
"paste" 14.02 0.96 14.02 0.96
"stdout" 13.36 0.92 13.36 0.92
"runif" 13.32 0.92 13.32 0.92
"/" 12.82 0.88 12.82 0.88
">" 7.42 0.51 7.42 0.51
"<" 6.22 0.43 6.22 0.43
"-" 5.78 0.40 5.78 0.40
"max" 5.18 0.36 5.18 0.36
"nchar" 5.12 0.35 5.12 0.35
"*" 4.84 0.33 4.84 0.33
"min" 3.94 0.27 3.94 0.27
"sum" 3.42 0.24 3.42 0.24
"gelman.diag" 2.90 0.20 0.00 0.00
"==" 2.86 0.20 2.86 0.20
"ncol" 2.84 0.20 2.84 0.20
"apply" 2.72 0.19 0.26 0.02
"+" 2.48 0.17 2.48 0.17
"FUN" 2.32 0.16 1.66 0.11
"^" 2.08 0.14 2.08 0.14
":" 1.24 0.09 1.24 0.09
"sqrt" 0.96 0.07 0.96 0.07
"%%" 0.90 0.06 0.90 0.06
"mean.default" 0.62 0.04 0.62 0.04
"lapply" 0.40 0.03 0.26 0.02
"(" 0.32 0.02 0.32 0.02
"unlist" 0.26 0.02 0.00 0.00
"array" 0.12 0.01 0.02 0.00
"sapply" 0.12 0.01 0.00 0.00
"matrix" 0.06 0.00 0.02 0.00
"Null" 0.04 0.00 0.04 0.00
"print" 0.04 0.00 0.00 0.00
"unique" 0.04 0.00 0.00 0.00
"abs" 0.02 0.00 0.02 0.00
"all" 0.02 0.00 0.02 0.00
"aperm.default" 0.02 0.00 0.02 0.00
"as.matrix.mcmc" 0.02 0.00 0.02 0.00
"file.exists" 0.02 0.00 0.02 0.00
"list" 0.02 0.00 0.02 0.00
"print.default" 0.02 0.00 0.02 0.00
"stopifnot" 0.02 0.00 0.02 0.00
"unique.default" 0.02 0.00 0.02 0.00
"which.min" 0.02 0.00 0.02 0.00
"<Anonymous>" 0.02 0.00 0.00 0.00
"aperm" 0.02 0.00 0.00 0.00
"as.mcmc.list" 0.02 0.00 0.00 0.00
"as.mcmc.list.default" 0.02 0.00 0.00 0.00
"data" 0.02 0.00 0.00 0.00
"mcmc.list" 0.02 0.00 0.00 0.00
"print.gelman.diag" 0.02 0.00 0.00 0.00
"quantile.default" 0.02 0.00 0.00 0.00
"sort" 0.02 0.00 0.00 0.00
"sort.default" 0.02 0.00 0.00 0.00
"sort.int" 0.02 0.00 0.00 0.00
"summary" 0.02 0.00 0.00 0.00
"summary.default" 0.02 0.00 0.00 0.00
If I set verbose = F, the cat goes away, but then %*% takes about 70% of
the time. I am wondering if it would be worthwhile to try to write my code
in C++ and then use RCpp, or if because the functions that are taking so
much time are basic functions (already written in C) it wouldn't be worth
it and I'll just have to live with it or find a better algorithm.

No comments:

Post a Comment