A reply to Testing via credible sets

Last week I posted a manuscript on arXiv entitled On decision-theoretic justifications for Bayesian hypothesis testing through credible sets. A few days later, a discussion of it appeared on Xi’an’s Og. I’ve read papers and books by Christian Robert with great interest and have followed his “Og” for quite some time, so I was honoured and excited when he chose to blog about my work. I posted a comment on his blog post, but for some reason it has not yet appeared on the site. I figured that I’d share my thoughts on his comments here on my own blog for the time being.

The main goal of the paper was to discuss decision-theoretic justifications for testing the point-null hypothesis Θ0={θ0} against the alternative Θ1={θ: θ≠θ0} using credible sets. In this test procedure, Θ0 is rejected if θ0 is not in the credible set. This is not the standard solution to the problem, but it is certainly not uncommon (I list several examples in the introduction to the paper). Tests of composite hypotheses are also discussed.

Judging from his blog post, Xi’an is not exactly in love with the manuscript. (Hmph! What does he know about Bayesian decision theory anyway? It’s not like he wrote the book on… oh, wait.) To some extent, however, I think that his criticism is due to a misunderstanding.

Before we get to the misunderstanding though: Xi’an starts out by saying that he doesn’t like point-null hypothesis testing, so the prior probability that he would like it was perhaps not that great. I’m not crazy about point-null hypotheses either, but the fact remains that they are used a lot in practice and that there are situations where they are very natural. Xi’an himself gives a few such examples in Section 5.2.4 of The Bayesian Choice, as do Berger and Delampady (1987).

What is not all that natural, however, is the standard Bayesian solution to point-null hypothesis testing. It requires a prior with a mass on θ0, which seems like a very artificial construct to me. Apart from leading to such complications as Lindley’s paradox, it leads to very partial priors. Casella and Berger (1987, Section 4) give an example where the seemingly impartial prior probabilities P(θ0)=1/2 and P(Θ1)=1/2 actually yield a test with strong bias towards the null hypothesis. One therefore has to be extremely careful when applying the standard tests of point-null hypotheses, and carefully think about what the point-mass really means and how it affects the conclusions.

Tests based on credible sets, on the other hand, allow us to use a nice continuous prior for θ. It can, unlike the prior used in the standard solution, be non-informative. As for informative priors, it is often easier to construct a continuous prior based on expert opinion than it is to construct a mixed prior.

Theorem 2 of my paper presents a weighted 0-1-type loss function that leads to the acceptance region being the central (symmetric) credible interval. The prior distribution is assumed to be continuous, with no point-mass in θ0. The loss is constructed using directional conclusions, meaning that when θ0 is rejected, it is rejected in favour of either {θ: θ<θ0} or {θ: θ>θ0}, instead of simply being rejected in favour of {θ: θ≠θ0}. Indeed, this is how credible and confidence intervals are used in practice: if θ0 is smaller than all values in the interval, then θ0 is rejected and we conclude that θ>θ0. The theorem shows that tests based on central intervals can be viewed as a solution to the directional three-decision problem – a solution that does not require a point-mass for the null hypothesis. I therefore do not agree with Xi’an’s comment that “[tests using credible sets] cannot bypass the introduction of a prior mass on Θ0”. While a test traditionally only has one way to reject the null hypothesis, allowing two different directions in which Θ0 can be rejected seems perfectly reasonable for the point-null problem.
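
As a rough illustration of the decision rule described above (not of the loss function in Theorem 2), the following R sketch takes draws from a continuous posterior, forms the central credible interval from the posterior quantiles, and returns one of three decisions. The function name directional_test, its arguments and the normal posterior in the example are assumptions made for this illustration; none of them come from the paper.

```r
# Three-decision test of theta0 based on a central credible interval.
# `draws` is a numeric vector of posterior draws of theta (hypothetical input).
directional_test <- function(draws, theta0, level = 0.95) {
  alpha <- 1 - level
  # Central (equal-tailed) interval: cut alpha/2 posterior mass from each tail.
  ci <- quantile(draws, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  if (theta0 < ci[1]) {
    # theta0 lies below every value in the interval.
    decision <- "reject: theta > theta0"
  } else if (theta0 > ci[2]) {
    # theta0 lies above every value in the interval.
    decision <- "reject: theta < theta0"
  } else {
    decision <- "accept theta0"
  }
  list(interval = ci, decision = decision)
}

# Example with an assumed N(1, 0.25^2) posterior for theta.
set.seed(1)
draws <- rnorm(10000, mean = 1, sd = 0.25)
directional_test(draws, theta0 = 0)$decision
directional_test(draws, theta0 = 1)$decision
```

Note that the prior behind such draws can be fully continuous; no mass on θ0 is needed to reach any of the three decisions.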

Regarding this test, Xi’an writes that it “essentially [is] a composition of two one-sided tests, […], so even at this face-value level, I do not find the result that convincing”. But any (?) two-sided test can be said to be a composition of two one-sided tests (and therefore implicitly includes a directional conclusion), so I’m not sure why he regards it as a reason to remain unconvinced about the validity of the result.

As for the misunderstanding, Theorem 3 of the paper deals with one-sided hypothesis tests. It was not meant as an attempt to solve the problem of testing point-null hypotheses, but rather to show how credible sets can be used to test composite hypotheses – as was Theorem 4. Xi’an’s main criticism of the paper seems to be that the tests in Theorems 3 and 4 fail for point-null hypotheses, but they were never meant to be used for such hypotheses in the first place. After reading his comments, I realized that this might not have been perfectly clear in the first draft of the paper. In particular, the abstract seemed to imply that the paper only dealt with point-null hypotheses, which is not the case. In the submitted version (not yet uploaded to arXiv), I’ve tried to make the fact that both point-null and composite hypotheses are studied clearer.

There are certainly reasons to question the use of credible sets for testing, chief among them being that the evidence against Θ0 is evaluated in a roundabout way. On the other hand, credible sets are reasonably easy to compute and tend to have favourable properties in frequentist analysis. It seems to me that a statistician who would like to use a method that is reasonable both in Bayesian and frequentist inference would want to consider tests based on credible sets.

Speeding up R computations

The past few days I’ve been going through some R code that I wrote last year, when I was preparing a massive simulation-based power study for some tests for multivariate normality that I’ve been working on. My goal was to reduce the time needed to run the simulation. I wasn’t expecting great improvement, since I’ve always believed that the most common R functions are properly vectorized and optimized for speed. Turns out I was wrong. Very wrong.

The first thing I did was to replace all parentheses ( ) with curly brackets { }. I was inspired to do so by this post (and this, via Xi’an’s Og) over at Radford Neal’s blog. As he pointed out, code that uses parentheses is actually slower than the same code with curly brackets:

> system.time( for(i in 1:1000000) { 1*(1+1) } )
   user  system elapsed 
  1.337   0.005   1.349 
> system.time( for(i in 1:1000000) { 1*{1+1} } )
   user  system elapsed 
  1.072   0.003   1.076 

Similarly, you can compare a*a and a^2:
> system.time( for(i in 1:10000000) 3^2 )
   user  system elapsed 
  5.048   0.028   5.088 
> system.time( for(i in 1:10000000) 3*3 )
   user  system elapsed 
  4.721   0.024   4.748 
So, a^2 is slower than a*a. This made me wonder, are there other built-in R functions that are slower than they ought to be?

One thing that I found very surprising, and frankly rather disturbing, is that mean(x) takes about nine times as long to calculate the mean value of the 50 real numbers in the vector x as the “manual” function sum(x)/50:

> x<-rnorm(50)
> system.time(for(i in 1:100000){mean(x)})
   user  system elapsed 
  1.522   0.000   1.523 
> system.time(for(i in 1:100000){sum(x)/length(x)})
   user  system elapsed 
  0.200   0.000   0.200 
> system.time(for(i in 1:100000){sum(x)/50})
   user  system elapsed 
  0.167   0.000   0.167 
> system.time(for(i in 1:100000){ overn<-rep(1/50,50); x%*%overn })
   user  system elapsed 
  0.678   0.000   0.677 
> overn<-rep(1/50,50); system.time(for(i in 1:100000){ x%*%overn })
   user  system elapsed 
  0.164   0.000   0.164 

I guess that the R development core team have been focusing on making R an easy-to-use high level programming language rather than optimizing all functions, but the poor performance of mean is just embarrassing.
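
Part of the gap is probably explained by what mean does beyond summing: it is a generic function, so each call goes through method dispatch and argument handling before any arithmetic happens, and for a vector of 50 numbers that overhead dominates. Here is a minimal sketch of a shortcut, assuming the input is a plain numeric vector with no missing values (the name fast_mean is made up for this example):

```r
# Hypothetical helper: mean of a plain numeric vector without NAs.
# It skips the dispatch and argument handling that mean() performs.
fast_mean <- function(x) sum(x) / length(x)

x <- rnorm(50)
# Check that the shortcut agrees with mean() up to floating-point tolerance.
isTRUE(all.equal(fast_mean(x), mean(x)))
```

Wrapping the expression in a function adds a little call overhead of its own, so inside a tight loop the inline form sum(x)/length(x) may still be preferable.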

Similarly, the var function can be greatly improved upon. Here are some of the many possibilities:

> x <- rnorm(50)
> system.time( for(i in 1:100000) { var(x) } )
   user  system elapsed 
  4.921   0.000   4.925 
> system.time( for(i in 1:100000) { sum((x-mean(x))^2)/{length(x)-1} } )
   user  system elapsed 
  2.322   0.000   2.325 
> system.time( for(i in 1:100000) { {sum(x*x)-sum(x)*sum(x)/length(x)}/{length(x)-1} } )
   user  system elapsed 
  0.736   0.000   0.737 
> system.time( for(i in 1:100000) { {sum(x*x)-sum(x)*sum(x)/50}/49 } )
   user  system elapsed 
  0.618   0.000   0.618 

> system.time( for(i in 1:100000) { sx<-sum(x); {sum(x*x)-sx*sx/50}/49 } )
   user  system elapsed 
  0.567   0.000   0.568

I changed all the uses of mean in my code to “sum/n” instead (and avoided using var entirely) and found that this sped things up quite a bit.
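
One caveat with the fastest variants above: the one-pass formula subtracts two sums that can be nearly equal, so it may lose precision when the mean is large relative to the spread. The sketch below compares a one-pass and a two-pass shortcut against var. The helper names var_onepass and var_twopass are invented for this example, and both assume a numeric vector with at least two elements and no missing values.

```r
# One-pass shortcut: fast, but subtracts two potentially large sums.
var_onepass <- function(x) {
  n <- length(x)
  sx <- sum(x)
  (sum(x * x) - sx * sx / n) / (n - 1)
}

# Two-pass shortcut: centre the data first, then sum the squared deviations.
var_twopass <- function(x) {
  n <- length(x)
  d <- x - sum(x) / n
  sum(d * d) / (n - 1)
}

x <- rnorm(50)
# On well-scaled data both shortcuts should agree with var().
all.equal(var_onepass(x), var(x))
all.equal(var_twopass(x), var(x))

# Shift the data far from zero and look at the discrepancy from var().
y <- x + 1e9
var_onepass(y) - var(y)
var_twopass(y) - var(y)
```

For simulated data centred near zero, as in the timings above, the one-pass version should be fine; for data with a large offset the two-pass version is the safer choice.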
Another trick to speed up your computations is to create the vectors that you wish to change within a loop with the right number of elements. While
a<-NA; for(j in 1:100) a[j]<-j
works just fine, it is actually quite a bit slower than
a<-rep(NA,100); for(j in 1:100) a[j]<-j
You could create a in other ways as well of course, for instance by a<-vector(length=100). Here are the numbers:

> system.time( for(i in 1:100000) { a<-NA; for(j in 1:100) a[j]<-j })
   user  system elapsed 
 37.383   0.092  37.482 
> system.time( for(i in 1:100000) { a<-rep(NA,100); for(j in 1:100) a[j]<-j })
   user  system elapsed 
 25.866   0.065  25.936 
> system.time( for(i in 1:100000) { a<-vector(length=100); for(j in 1:100) a[j]<-j })
   user  system elapsed 
 25.517   0.022  25.548

In my case, I’d been a bit sloppy with creating the vectors in my loops in the proper way, so I changed this in my code as well.
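
Here is a small sketch of the alternatives, assuming the goal is simply to fill a vector with the values 1 to n as in the timings above. The function names are made up for the example. The last variant drops the loop altogether, which is usually the better option when the computation allows it.

```r
# Grow the vector one element at a time: R has to extend `a` repeatedly.
fill_growing <- function(n) {
  a <- NA
  for (j in seq_len(n)) a[j] <- j
  a
}

# Allocate the full length first, then fill in place.
fill_preallocated <- function(n) {
  a <- integer(n)
  for (j in seq_len(n)) a[j] <- j
  a
}

# No loop at all.
fill_vectorized <- function(n) seq_len(n)

# All three produce the same vector; only the cost of getting there differs.
identical(fill_growing(100), fill_preallocated(100))
identical(fill_preallocated(100), fill_vectorized(100))
```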

In my simulation study, I simulate multivariate random variables, compute some test statistics and use these to estimate the powers of the normality tests against various alternatives. After doing the changes mentioned above, I compared the performance of my old code to that of the new code, for 1000 iterations of the procedure:

> system.time( source("oldCode.R") )
   user  system elapsed 
548.045   0.273 548.622 
> system.time( source("newCode.R") )
   user  system elapsed 
 93.138   0.002  93.194
The improved code is almost 6 times faster than the old code. When you do ten million or so iterations, that matters. A lot.

In conclusion, it’s definitely possible to speed up your code significantly if you know of the pitfalls of R. I suspect that I’ll be obsessed with finding more pitfalls in the next few weeks, so I’d be thankful for any hints about other weaknesses that R has.

It should probably be mentioned that R is really fast when things are properly vectorized. Last year, a coworker who uses Matlab challenged me to perform a number of matrix computations faster in R than in Matlab. To his great surprise, R won.

As a final remark, I’m now facing a bit of a dilemma. Should I write readable code, a^6, or fast code, a*a*a*a*a*a?

Update: looking to speed up your R computations even more? See my posts on compiling your code and parallelization.