Analyse Bioscreen growth curves using BAT 2.0

Some years ago, I developed an R script for analysing growth curves from Bioscreen C MBR machines. The script, called BAT (an acronym for Bioscreen Analysis Tool), has been used to analyse Bioscreen data in several labs across the globe – for instance in the work leading up to our paper on reversion of antibiotic resistance, published last year. It allows the user to quickly compute the growth rates and doubling times in hundreds of growth curves.

BAT 1.0 worked just fine most of the time, but required the user to learn a little bit of R. That threshold was too high for some users, so for a while now I’ve been thinking about creating a more user-friendly version of the script. I’ve finally gotten around to doing just that: using a wonderful tool called Shiny, I’ve created a version that can be run directly in any modern browser. Users can now use BAT to analyse Bioscreen growth curves in their web browsers, without installing or knowing how to use R. It’s free, and always will be. And so, without further ado, I present to you:

BAT 2.0

Still reading? Well, in that case, I’m happy to tell you a little bit more about what BAT 2.0 does and how to use it.

Some background
Bioscreen machines, manufactured by Oy Growth Curves Ab Ltd, can be used to measure the growth of all kinds of microorganisms – bacteria, amoebae, fungi, yeast, algae, bacteriophages and so on – in different media (BAT was originally developed to analyse bacterial growth, which may be evident in some of the terminology used below). The machine measures growth in terms of changes in optical density (OD). Growth curves are what we get when we plot the growth against time, as in this figure, showing the growth of E. coli in a rich medium (Müller-Hinton broth):

Notice that I’ve plotted the logarithm of the OD along the y-axis, which allows us to visually identify the log phase of the bacterial growth as a straight line (shown in red). By fitting a straight line to the log phase, we can find the doubling time of the growth. So how can we use BAT to plot curves like this and fit lines, and to find growth rates and doubling times?
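To make this concrete, here is a minimal R sketch of the idea, using made-up data (BAT does all of this for you, so you never need to run code like this yourself):

# Made-up log-phase measurements: time in hours, OD values
time <- seq(0, 2, by = 0.25)
od <- 0.05*exp(0.9*time)*exp(rnorm(length(time), sd = 0.01))

# Fit a straight line to log(OD) against time
fit <- lm(log(od) ~ time)
slope <- coef(fit)[2]  # The growth rate (per hour)

# During exponential growth, the OD doubles when slope*t = log(2), so:
doublingTime <- log(2)/slope

The key relation is that the doubling time equals log(2) divided by the slope of the line fitted to the log phase.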


Tab: Data
When you start BAT 2.0 you’ll see the following:

Notice that there are several tabs in the menu. When using BAT, we always start with the leftmost tab (Data) and gradually work our way further right (all the way to Results).

The first thing to do is to upload the csv file containing the results from your Bioscreen run.

You can think of csv files as a cross between text files and Excel spreadsheets. To view your data, you can open them in Excel, but LibreOffice Calc is much better at handling them.

Unfortunately, csv files from different machines don’t always look the same. First of all, the character encoding will differ depending on your computer (language settings, operating system, etc.). Second, the different columns of the csv file can be separated either by a comma, a semicolon or a tab. In order for BAT to read the file properly, you need to tell it the encoding of your file, and how it is separated. Once you’ve given BAT the right settings, the first few lines of your file will be displayed in the right panel.
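If you’re curious, what BAT does when reading your file is, in essence, something like the following R sketch (the file name is just a placeholder):

# A comma-separated file encoded with UTF-8 (like the example file below):
bioscreenData <- read.csv("yourfile.csv", sep = ",", fileEncoding = "UTF-8")

# A semicolon-separated file encoded with ISO-8859-1:
bioscreenData <- read.csv("yourfile.csv", sep = ";", fileEncoding = "ISO-8859-1")

head(bioscreenData)  # Display the first few lines

If the settings don’t match the file, read.csv either fails or returns garbled data – which is exactly why BAT asks you for the encoding and the separator.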

Here’s an example csv file that you can try: a comma-separated csv file encoded with UTF-8.

So, how do you find out what encoding your file uses? If you’re a Linux user, you can open a terminal and type

cd ~/path/to/your/file
file -i yourfile.csv

to find the encoding (replace yourfile.csv with the name of your file). If you’re using macOS, open a terminal and type

cd ~/path/to/your/file
file -I yourfile.csv

If you’re using Windows, have a look here instead.

Is your file in an encoding not included in the list in BAT? You can either try changing the encoding or send me an email, so that I can add it to the list.

Once your data is shown correctly to the right, as in the figure below, you’re ready to proceed to the next tab.

Tab: Blank wells
‘Blank’ wells, i.e. wells only containing the medium, can (and should!) be used to adjust for changes in OD not caused by microbial growth. Check the boxes corresponding to the blank wells on your plate. Here’s an example of what your blank wells can look like:

If you don’t have any blank wells, uncheck all boxes and move to the next tab, where you will then need to select No adjustment (expect to see error messages in red before you check the No adjustment box – that’s normal and nothing to worry about).

(If you are using the example csv file, wells 141-144 should be blank.)

Tab: Reference wells
The first thing to do on this tab is to select which method to use when adjusting the OD measurements of the remaining wells using the blank wells:

  • Method 1: the average OD of the blank wells during the entire run is subtracted from the OD measurements.
  • Method 2: the average OD of the blank wells at each time point is subtracted from the OD measurements at that same time point. This is the recommended method, as it captures the changes in OD not due to microbial growth at each time point (see the sketch below).
  • No adjustment: no adjustments are made. Use at your own risk.
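In R terms, the difference between the two methods is roughly the following (a sketch, assuming the OD measurements are stored in a matrix od with one row per time point and one column per well, and that blanks holds the column indices of the blank wells):

# Toy example: 100 time points, 10 wells, wells 9 and 10 blank
od <- matrix(runif(1000, 0.05, 0.15), nrow = 100, ncol = 10)
blanks <- c(9, 10)

# Method 1: subtract a single grand mean of the blank wells
odAdjusted1 <- od - mean(od[, blanks])

# Method 2: subtract the blank average separately at each time point
odAdjusted2 <- od - rowMeans(od[, blanks])

(R recycles a vector of length nrow(od) down the columns, so the second version subtracts the right blank average from every well at each time point.)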

After that, you should select the wells containing your reference strain. Relative growth rates will be computed with respect to this strain. If you don’t have a reference strain and are only interested in absolute growth rates (i.e. the doubling times), simply pick one of your strains to use as a reference.

Tab: Other wells
BAT is now almost ready to compute the growth rates in your wells. It will attempt to find the log phase automatically, but you’ll need to provide a little more information first – where to look for the log phase, how many replicates of each strain you have and what you consider to be an acceptable fit:

  • Masking interval: this sets the OD levels between which BAT will look for the log phase. In the plots to the right, this is the interval between the blue lines. Use the default settings or adjust them until the interval seems appropriate for most wells.
  • Number of replicates: if you have replicates of your strains, you can set the number of replicates here. Replicates are assumed to be located in consecutive wells (e.g. wells 105, 106 and 107). BAT will then compute the mean and standard deviation of the growth rate of the strain.
  • Minimum R value: the R value is the linear correlation between time and log(OD) for the fitted line. If the R value for the line fitted in the masking interval is less than the value you set here, BAT will conclude that it’s failed to find the log phase, and will ask you to manually adjust the settings for that particular well in the next tab.
    In theory, the R value should be 1 during the log phase, but in practice it is always lower because of small measurement errors. A lot of the time, you can expect R to be greater than 0.999 during the log phase, but some runs will be noisier, in which case you may need to set a lower minimum R value. (A sketch of the fitting procedure follows this list.)
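For the curious, the fitting that BAT performs for each well is, in essence, something like this R sketch (the variable names are mine, not BAT’s):

# Toy growth curve for a single well
time <- seq(0, 5, by = 0.25)
od <- 0.03 + 0.5/(1 + exp(-2*(time - 2.5)))

# Masking interval and minimum R value
odLow <- 0.1; odHigh <- 0.4; minR <- 0.999

mask <- od > odLow & od < odHigh  # Keep only points inside the masking interval
fit <- lm(log(od[mask]) ~ time[mask])
rValue <- cor(time[mask], log(od[mask]))  # R value of the fit

if(rValue >= minR) {
    doublingTime <- log(2)/coef(fit)[2]
} else {
    # BAT will ask you to adjust the interval manually in the next tab
    print("Failed to find the log phase!")
}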

Here’s an example of what you should be seeing in BAT:

In addition to this, you can also choose which wells to analyse. The default is to analyse all wells – simply uncheck the boxes for the wells that you do not wish to analyse.

Once satisfied with your settings, click Save results and proceed to the next tab.

Tab: Manual fitting
If there were some wells for which BAT failed to fit a line with an acceptable R value, you will be asked to change the interval in which BAT looks for the log phase. You can do this either by choosing a vertical masking interval (where you pick the range of log(OD) values within which to look for the log phase) or a horizontal masking interval (where you choose between which time points to look for the log phase). Your chosen interval is shown in the plot to the right. Once you’ve found an interval with a sufficiently high R value, click Save results.

Here’s an example showing a horizontal masking interval:

BAT will prompt you to repeat this procedure for all wells with too low an R value.

If you get stuck on a well and simply can’t find an interval where you get a sufficiently high R value, choose horizontal masking and put the two sliders very close, so that only two points are used for the estimate. Then R will always be 1 and you can continue (but don’t trust the growth rate estimate for that well!). Alternatively, you can go back to Other wells and uncheck the box for that well.

Tab: Results
To the right, you will see a table showing the results for your data. This includes the slope of the fitted lines (fittedValue), doubling times (doubTime) and relative growth rates (growthRate) for each well, as well as averages and standard deviations for each strain. The mean doubling time for the strain is called groupMeanDoubTime, the mean relative growth rate is called groupMeanGrowthRate and the standard deviation for the relative growth rate is called groupGrowthRateSD. Here’s an example of what the results may look like:

If you are satisfied with the results, you can also download this table as a csv file (comma-separated with UTF-8 encoding) and download the plots showing the growth curves and the fitted lines for all wells in a pdf file. If not, you can go back to previous tabs and change the settings.
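Once downloaded, the results file is easy to work with in R. For instance (the file name is just a placeholder – use whatever you saved the table as):

results <- read.csv("batResults.csv", fileEncoding = "UTF-8")

# Sort the wells by doubling time:
results[order(results$doubTime),]

# Plot the distribution of relative growth rates:
hist(results$growthRate)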


Citations
If you use BAT for your research, please use the following citation:

Thulin, M. (2018). BAT: an online tool for analysing growth curves. Retrieved from http://www.mansthulin.se/bat/


For frequent users
If you are planning on using BAT extensively, I highly recommend that you install it on your computer rather than running the online version. It will still look the same and you’ll still use it in exactly the same way. There are two reasons for installing it:

  • BAT will run much faster on your computer.
  • BAT is currently hosted on Shinyapps.io’s free server, meaning that it’s limited to 25 hours of use per month. If you use it a lot, you’ll block others from using it.

To install and run BAT on your computer, follow these five easy steps:

  1. Download RStudio (choose the free desktop version) and install it.
  2. Download the source code for BAT from GitHub: click Clone or download and then Download ZIP. Unzip the downloaded file in a folder of your choice.
  3. Open the file app.R in RStudio.
  4. Click Run App.
  5. Click Show in browser.

You’ll then be running BAT in your browser just as before, only now it’s your computer rather than the server that’s doing the computations.
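If you prefer typing commands to clicking buttons, steps 3–5 can also be carried out from the R console. A sketch, assuming that you unzipped BAT to ~/BAT:

install.packages("shiny")  # Only needed the first time
shiny::runApp("~/BAT", launch.browser = TRUE)  # The folder containing app.R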


Frequently asked questions
Alright, I’ll admit it: since BAT 2.0 is a brand new thing, nobody has actually asked me these questions yet. But you ought to, so I’ll put them here just in case.

Q: BAT’s not working for me – what should I do?
A: First of all, carefully follow the steps outlined above. If it’s still not working, please send me an email at mans@statistikkonsult.com. Attach your csv file and describe what the problem seems to be. I’ll see if I can help.

Q: I have growth curves measured by something other than a Bioscreen. Can I still use BAT to analyse them?
A: Absolutely. If you format your data in a csv file to look like the output from a Bioscreen, BAT can do the same type of analysis for your data.

Q: Can you add feature X or add option Y?
A: Maybe. Contact me at mans@statistikkonsult.com and we can talk about it!

Q: Can you make a custom BAT with features X and Y for my lab?
A: Possibly! Contact me at mans@statistikkonsult.com and we can talk about it.

Q: I know R and want to contribute feature X to BAT.
A: That’s a statement and not a question, but OK…! You can find the source code for BAT on GitHub. Feel free to play around with it, and if your feature X seems useful I’ll be happy to add it to the online version of BAT!

Q: I’ve run lots of Bioscreens and analysed them using BAT. I have so much data now! But I’m not sure how to proceed with further statistical analyses of it. What should I do?
A: Great to hear that you have so much data! As it happens, I’m a statistical consultant with a lot of experience with growth curve data. Why don’t you send me an email at mans@statistikkonsult.com, to see if I can help?

Q: So, I hear that you’re a statistical consultant… I have all these other data that I also need help analysing!
A: Also a statement rather than a question, but never mind… Email me at mans@statistikkonsult.com with a description of the data you have, how you have collected it and – most importantly – why you have collected it (what is the question that you are trying to answer?). If I’m able to help, I’ll get back to you with a quote.

Speeding up R computations Pt III: parallelization

In two previous posts, I have written about how you can speed up your R computations either by using strange notation and non-standard functions or by compiling your code. Last year my department bought a 64-core computational server, which allowed me to get some seriously improved performance by using parallelization. This post is about four insights that I’ve gained from my experiments with parallel programming in R. These are, I presume, pretty basic and well-known, but they are things that I didn’t know before and at least the last two of them came as something of a surprise to me.

I’ll start this post with a very brief introduction to parallelization using the doMC package in R, in which I introduce a toy example with matrix inversions. I will, however, not go into detail about how to use the doMC package. If you are already familiar with doMC, you’ll probably still want to glance at the matrix inversion functions before you skip ahead to the four points about what I’ve learned from my experiments, as these functions will occur again later in the text.

Parallel programming in R
Modern processors have several cores, which essentially are independent CPUs that can work simultaneously. When your code is parallelized, several computations are carried out in parallel on the different cores. This allows you to, for instance, run several steps of a for-loop in parallel. In R, it is simple to do this using e.g. the doMC package.

As an example, imagine that you wish to generate and invert several random 200 by 200 matrices and save the inverted matrices in a list. Using a standard for loop, you can do this with the following function:

# A standard function:
myFunc <- function(B, size)
{
    a <- vector("list", B)
    for(i in 1:B)
    {
        myMatrix <- matrix(runif(size^2), size, size)
        a[[i]] <- solve(myMatrix)
    }
    return(a)
}

Using the doMC package, we can write a parallelized version of this function as follows:

library(doMC)  # Use doMC package
registerDoMC() # Determine number of cores to use, default = #cores/2

# A parallelized function:
myParFunc <- function(B, size)
{
    foreach(i = 1:B) %dopar% 
    {
        myMatrix <- matrix(runif(size^2), size, size)
        solve(myMatrix)
    }
}

A parallelized foreach %dopar% loop always returns a list. If you are used to for-loops in which the output is manually stored in vectors or matrices, this may take some getting used to. The list structure is actually very useful, and by using unlist, data.frame and Reduce in a clever way you probably won’t have to make a lot of changes in your old code.
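For instance, here is a small sketch of how the list returned by foreach can be turned into more familiar structures:

library(doMC)
registerDoMC()

# foreach returns a list with one element per iteration...
squares <- foreach(i = 1:5) %dopar% { i^2 }

# ...which can be flattened into a vector:
unlist(squares)

# ...or combined elementwise using Reduce, e.g. summing matrices:
matrices <- foreach(i = 1:5) %dopar% { matrix(runif(4), 2, 2) }
Reduce("+", matrices)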

Comparing the two functions, we find that the parallelized function is substantially faster:

>  system.time(myFunc(1000,200))
   user  system elapsed 
 19.026   0.148  19.186 
>  system.time(myParFunc(1000,200))
   user  system elapsed 
 24.982   8.448   5.576

The improvement is even greater for larger matrices. Parallelization can save you a lot of time! Unfortunately, there are situations where it actually makes your code slower, as we will see next.

1. Parallelization is only useful when you have computation-heavy loop steps
Each parallelization has an overhead cost in that it takes some time to assign the different loop steps to the different cores. If each step of your loops is quickly executed, the overhead cost might actually be larger than the gain from the parallelization! As an example, if you wish to generate and invert several small matrices, say of size 25 by 25, the overhead cost is too large for parallelization to be of any use:

>  system.time(myFunc(1000,25))
   user  system elapsed 
  0.264   0.004   0.266 
>  system.time(myParFunc(1000,25))
   user  system elapsed 
  0.972   1.480   1.137

In this case, the parallelized function is much slower than the standard unparallelized version. Parallelization won’t magically make all your loops faster!

2. Only parallelize the outer loops
If you are using nested loops, you should probably resist the urge to replace all your for-loops by parallelized loops. If you have parallelized loops within parallelized loops, you tend to lose much of the gain from the outer parallelization, as overhead costs increase. This is a special case of point 1 above: it is probably the entire inner loop that makes each step of the outer loop computation-heavy, rather than a single step of the inner loop.

Returning to the matrix inversion example, let’s now assume that we wish to use an inner loop to generate a new type of random matrices by multiplying with other matrices with uniformly distributed elements. The unparallelized and parallelized versions of this are as follows:

# No parallelization:
myFunc <- function(B, size)
{
    a <- vector("list", B)
    for(i in 1:B)
    {
        myMatrix <- matrix(runif(size^2), size, size)
        for(j in 1:B)
        {
            myMatrix <- myMatrix %*% matrix(runif(size^2), size, size)
        }
        a[[i]] <- solve(myMatrix)
    }
    return(a)
}

# Outer loop parallelized:
myParFunc <- function(B, size)
{
    foreach(i = 1:B) %dopar% 
    {
        myMatrix <- matrix(runif(size^2), size, size)
        for(j in 1:B)
        {
            myMatrix <- myMatrix %*% matrix(runif(size^2), size, size)
        }
        solve(myMatrix)
    }
}

# Both loops parallelized:
myParFunc2 <- function(B, size)
{
    foreach(i = 1:B) %dopar% 
    {
        myMatrix <- matrix(runif(size^2), size, size)
        myMatrix <- myMatrix %*% Reduce("%*%", foreach(j = 1:B) %dopar%
        {
            matrix(runif(size^2), size, size)
        })
        solve(myMatrix)
    }
}

When the outer loop is parallelized, the performance is much better than when it isn’t:

>  system.time(myFunc(1000,25))
   user  system elapsed 
 89.305   0.012  89.381 
>  system.time(myParFunc(1000,25))
   user  system elapsed 
 46.767   0.780   5.239 

However, parallelizing the inner loop as well slows down the execution:

>   system.time(myParFunc2(1000,25))
   user  system elapsed 
970.809 428.371  51.302 

If you have several levels of nested loops, it may be beneficial to parallelize the two or three outermost loops. How far you should go depends on what is going on inside each loop. The best way to find out is simply to try different levels of parallelization.

3. More is not always merrier when it comes to cores
Using a larger number of cores means spending more time on assigning tasks to the different cores. When I tested the code for a large-scale simulation study using different numbers of cores on our 64-core number cruncher, it turned out that the code executed much faster when I used only 16 cores instead of 32 or 64. Using 16 cores, I might add, was definitely faster than using 8 cores. Then again, in another simulation, using 32 cores turned out to be faster than using 16. Less is not always merrier either.

An example where a larger number of cores slows down the execution is the following, which is a modified version of myParFunc2 above, in which the multiplication has been replaced by addition to avoid issues with singular matrices:

myParFunc3 <- function(B, size)
{
    foreach(i = 1:B) %dopar% 
    {
        myMatrix <- matrix(runif(size^2), size, size)
        myMatrix <- myMatrix %*% Reduce("+", foreach(j = 1:B) %dopar%
        {
            matrix(runif(size^2), size, size)
        })
        solve(myMatrix)
    }
}

> registerDoMC(cores=16)
> system.time(myParFunc3(1000,75))
    user   system  elapsed 
2095.571 1340.748  137.397

> registerDoMC(cores=32)
> system.time(myParFunc3(1000,75))
    user   system  elapsed 
2767.289 2336.450  104.773

> registerDoMC(cores=64)
> system.time(myParFunc3(1000,75))
    user   system  elapsed 
2873.555 3522.984  112.809

In this example, using 32 cores is faster than using either 16 or 64 cores. It pays off to run a small trial to investigate how many cores you should use for a particular parallelized function!
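Such a trial can be as simple as the following sketch, using myParFunc3 from above:

library(doMC)

# Time the same function using different numbers of cores:
for(cores in c(8, 16, 32, 64)) {
    registerDoMC(cores = cores)
    cat(cores, "cores:\n")
    print(system.time(myParFunc3(1000, 75)))
}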

4. Make fair comparisons to unparallelized code
The doMC package includes the option of running foreach-loops without parallelization, by simply typing %do% instead of %dopar%. This is nice, as it lets you compare the performance of parallelized and unparallelized loops without having to rewrite the code too much. It is, however, important to know that the %do% loop behaves differently from the standard for loop that you probably would use if you weren’t going to parallelize your code. An example is given by the matrix-inversion functions from point 1 above. The %do% version of the function is

# A standard (?) function:
myFunc2<-function(B,size)
{
    foreach(i = 1:B) %do% 
    {
        myMatrix<-matrix(runif(size^2),size,size)
        solve(myMatrix)
    }
}

But this function is noticeably slower than the standard function described above:

>  system.time(myFunc(1000,25))
   user  system elapsed 
  0.264   0.004   0.266
>  system.time(myFunc2(1000,25))
   user  system elapsed 
  0.952   0.044   0.994

When comparing parallelized and unparallelized loops, avoid %do% loops, as these make the parallelized loops seem better than they actually are.

I’m honestly not sure why there is such a large discrepancy between the two unparallelized for-loops. I posted a question on this matter on Stack Overflow last year, but to date it has not received any satisfactory answers. If you happen to know wherein the difference lies, I’d much appreciate if you would let me know.

That’s all for now. These are my four insights from playing around with parallelization in R. It is entirely possible that they contradict some well-known principles of parallel programming – I confess that I am almost completely ignorant about the literature on parallelization. Please let me know in the comments if you disagree with any of the points above or if you have counterexamples!

For more on parallelization in R, check out R bloggers!

doMC is only one of many packages for parallelization in R.

Speeding up R computations Pt II: compiling

A year ago I wrote a post on speeding up R computations. Some of the tips that I mentioned then have since been made redundant by a single package: compiler. Forget about worrying about curly brackets and whether to write 3*3 or 3^2 – compiler solves those problems for you.

compiler provides a byte code compiler for R, which offers the possibility of compiling your functions before executing them. Among other things, this speeds up those embarrassingly slow for loops that you’ve been using:

> myFunction<-function() { for(i in 1:10000000) { 1*(1+1) } }
> library(compiler)
> myCompiledFunction <- cmpfun(myFunction) # Compiled function

> system.time( myFunction() )
   user  system elapsed 
 10.002   0.014  10.021 
> system.time( myCompiledFunction() )
   user  system elapsed 
  0.692   0.008   0.700 

That’s 14 times faster!

Functions written in C (imported using Rcpp) are still much faster than the compiled byte code, but for those of us who

  • don’t know C,
  • know C but prefer to write code in R,
  • know C but don’t want to rewrite functions that we’ve already written in R,

compiler offers a great way of speeding up computations. It has been included in the recommended R packages since R 2.13 (meaning that it comes with your basic R installation), and since R 2.14 most standard functions come pre-compiled. If you are still running an older version of R, it’s definitely time to update.

Speeding up R computations

The past few days I’ve been going through some R code that I wrote last year, when I was preparing a massive simulation-based power study for some tests for multivariate normality that I’ve been working on. My goal was to reduce the time needed to run the simulation. I wasn’t expecting great improvement, since I’ve always believed that the most common R functions are properly vectorized and optimized for speed. Turns out I was wrong. Very wrong.

The first thing I did was to replace all parentheses ( ) by curly brackets { }. I was inspired to do so by this post (and this, via Xi’an’s Og) over at Radford Neal’s blog. As he pointed out, code that uses parentheses is actually slower than the same code with curly brackets:

> system.time( for(i in 1:1000000) { 1*(1+1) } )
   user  system elapsed 
  1.337   0.005   1.349 
> system.time( for(i in 1:1000000) { 1*{1+1} } )
   user  system elapsed 
  1.072   0.003   1.076 

Similarly, you can compare a*a and a^2:

> system.time( for(i in 1:10000000) 3^2 )
   user  system elapsed 
  5.048   0.028   5.088 
> system.time( for(i in 1:10000000) 3*3 )
   user  system elapsed 
  4.721   0.024   4.748 

So, a^2 is slower than a*a. This made me wonder: are there other built-in R functions that are slower than they ought to be?

One thing that I found very surprising, and frankly rather disturbing, is that mean(x) takes ten times as long to calculate the mean value of the 50 real numbers in the vector x as the “manual” function sum(x)/50:


> x<-rnorm(50)
> system.time(for(i in 1:100000){mean(x)})
   user  system elapsed 
  1.522   0.000   1.523 
> system.time(for(i in 1:100000){sum(x)/length(x)})
   user  system elapsed 
  0.200   0.000   0.200 
> system.time(for(i in 1:100000){sum(x)/50})
   user  system elapsed 
  0.167   0.000   0.167 
> system.time(for(i in 1:100000){ overn<-rep(1/50,50); x%*%overn })
   user  system elapsed 
  0.678   0.000   0.677 
> overn<-rep(1/50,50); system.time(for(i in 1:100000){ x%*%overn })
   user  system elapsed 
  0.164   0.000   0.164 

I guess that the R Core Team has been focusing on making R an easy-to-use high-level programming language rather than on optimizing all functions, but the poor performance of mean is just embarrassing.

Similarly, the var function can be greatly improved upon. Here are some of the many possibilities:


> x <- rnorm(50)
> system.time( for(i in 1:100000) { var(x) } )
   user  system elapsed 
  4.921   0.000   4.925 
> system.time( for(i in 1:100000) { sum((x-mean(x))^2)/{length(x)-1} } )
   user  system elapsed 
  2.322   0.000   2.325 
> system.time( for(i in 1:100000) { {sum(x*x)-sum(x)*sum(x)/length(x)}/{length(x)-1} } )
   user  system elapsed 
  0.736   0.000   0.737 
> system.time( for(i in 1:100000) { {sum(x*x)-sum(x)*sum(x)/50}/49 } )
   user  system elapsed 
  0.618   0.000   0.618 

> system.time( for(i in 1:100000) { sx<-sum(x); {sum(x*x)-sx*sx/50}/49 } )
   user  system elapsed 
  0.567   0.000   0.568

I changed all the uses of mean in my code to “sum/n” instead (and avoided using var entirely) and found that this sped things up quite a bit.

Another trick to speed up your computations is to create the vectors that you wish to change within a loop with the right number of elements. While

a<-NA
for(j in 1:100) a[j]<-j

works just fine, it is actually quite a bit slower than

a<-rep(NA,100)
for(j in 1:100) a[j]<-j

You could of course create a in other ways as well, for instance by a<-vector(length=100). Here are the numbers:

> system.time( for(i in 1:100000) { a<-NA; for(j in 1:100) a[j]<-j })
   user  system elapsed 
 37.383   0.092  37.482 
> system.time( for(i in 1:100000) { a<-rep(NA,100); for(j in 1:100) a[j]<-j })
   user  system elapsed 
 25.866   0.065  25.936 
> system.time( for(i in 1:100000) { a<-vector(length=100); for(j in 1:100) a[j]<-j })
   user  system elapsed 
 25.517   0.022  25.548

In my case, I’d been a bit sloppy with creating the vectors in my loops in the proper way, so I changed this in my code as well.

In my simulation study, I simulate multivariate random variables, compute some test statistics and use these to estimate the powers of the normality tests against various alternatives. After doing the changes mentioned above, I compared the performance of my old code to that of the new code, for 1000 iterations of the procedure:

> system.time( source("oldCode.R") )
   user  system elapsed 
548.045   0.273 548.622 
> system.time( source("newCode.R") )
   user  system elapsed 
 93.138   0.002  93.194

The improved code is almost 6 times faster than the old code. When you do ten million or so iterations, that matters. A lot.

In conclusion, it’s definitely possible to speed up your code significantly if you know of the pitfalls of R. I suspect that I’ll be obsessed with finding more pitfalls in the next few weeks, so I’d be thankful for any hints about other weaknesses that R has.

It should probably be mentioned that R is really fast when things are properly vectorized. Last year, a coworker who uses Matlab challenged me to perform a number of matrix computations faster in R than in Matlab. To his great surprise, R won.

As a final remark, I’m now facing a bit of a dilemma. Should I write readable code, a^6, or fast code, a*a*a*a*a*a?

Update: looking to speed up your R computations even more? See my posts on compiling your code and parallelization.