Friday, October 12, 2012

Installing a large number of packages in R



I have been learning R, and more specifically how to install and maintain R for users.  Recently, I had to install the latest version, 2.15.1, from source.  Once it was installed, the next step was to reinstall the packages from the older version (2.15.0) under the new version.  Although installing packages in R is as simple as calling the function install.packages(), it quickly becomes cumbersome when you have to install more than 400 packages.

Instead I resorted to a combination of R and Python to complete this process.

First, determine the list of all packages installed in the older version of R by running the following commands in that version:

packs <- installed.packages()
exc <- names(packs[,'Package'])

Here, exc is a character vector of package names.

Then, store this list in a text file, 'test.Rdata', so that it can be processed using Python.

write(exc,'test.Rdata')

To install more than one package, a command similar to the one below can be used.  This command installs three packages: BiocInstaller, coda and DEGseq.  However, the aim of this blog post is to describe a method for installing many more than these three packages.

install.packages(c("BiocInstaller", "coda", "DEGseq"),dependencies=TRUE) 

The Python script that generates this command is given below.  This program reads the package names, one per line, wraps each name in double quotes, and joins them with commas.


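# Read one package name per line, skipping blank lines
with open("test.Rdata", "r") as fp:
    names = [line.strip() for line in fp if line.strip()]
# Quote each name and join with commas; a trailing comma would make R's c() fail
s = 'install.packages(c(' + ','.join('"' + n + '"' for n in names) + '),dependencies=TRUE)'
print(s)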
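
The same string-building step can also be wrapped in a small function, so that it can be checked against the three-package example before it is run on the full list.  This is only a sketch: the function name build_install_command is made up for illustration, and it assumes that package names never contain a double quote.

```python
def build_install_command(names):
    """Return an R install.packages() call for the given package names."""
    # Strip whitespace and drop blank entries, e.g. stray empty lines in the file
    cleaned = [n.strip() for n in names if n.strip()]
    # Joining (rather than appending a comma after every name) leaves no
    # trailing comma, which R's c() would reject as an empty argument
    quoted = ", ".join('"{}"'.format(n) for n in cleaned)
    return "install.packages(c({}),dependencies=TRUE)".format(quoted)


# The three packages used as the example in this post
print(build_install_command(["BiocInstaller", "coda", "DEGseq\n"]))
```

With these three names, the printed line should match the install.packages() command shown earlier in this post.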


Finally, copy the output of the Python program into the R command line and wait for the installation to finish, which can take a few hours.
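
With more than 400 packages, the generated command can run to several thousand characters, which may be awkward to paste.  One possible alternative, sketched below, is to write the command to an R script file and load it in the new R session with source().  The function name write_install_script and the file name install_packages.R are placeholders chosen for this example, not part of the procedure above.

```python
import os
import tempfile


def write_install_script(names, path):
    """Write an R script holding one install.packages() call; return its text."""
    # One quoted package name per line keeps every line of the script short
    quoted = ",\n    ".join('"{}"'.format(n) for n in names)
    script = "install.packages(c(\n    {}\n), dependencies=TRUE)\n".format(quoted)
    with open(path, "w") as out:
        out.write(script)
    return script


# Demonstration with the three example packages; the location is arbitrary
demo_path = os.path.join(tempfile.gettempdir(), "install_packages.R")
print(write_install_script(["BiocInstaller", "coda", "DEGseq"], demo_path))
```

In R, the resulting file could then be run with source("install_packages.R"), with the path adjusted to wherever the file was written.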

I am assuming this is a common problem.  How do you handle it?  You can give your advice in the form of comments.

PS:  The concatenation can be performed in any other scripting language, such as Perl, PHP or Bash.