Clean and new

R 4.0.0 was just released! Binaries have magically appeared in the usual places.

But when you install a major upgrade like 3.6 to 4.0, you find yourself with a fresh, clean, empty package library. Yikes!

Fortunately, installing all your old CRAN packages is easy, because your old library hasn’t been deleted and you can reference it.

macOS stores the complete library under a versioned directory, so you need not just the path to your library but the path to the previous version's copy, because the 4.0 path will contain only the base packages.

# use .libPaths() to get the current library directory; for me on macOS:
# "/Library/Frameworks/R.framework/Versions/4.0/Resources/library"
# then replace 4.0 with the appropriate prior version
pkglist <- list.files("/Library/Frameworks/R.framework/Versions/3.6/Resources/library/")

# remove base packages first; there's no reason to even try to install these
basepkg <- c('base', 'compiler', 'datasets', 'graphics', 'grDevices', 'grid', 'methods', 'parallel', 'splines', 'stats', 'stats4', 'tcltk', 'tools', 'utils')
pkglist <- pkglist[!(pkglist %in% basepkg)]
install.packages(pkglist)
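
If you'd rather not type the old path by hand, you can derive it from the current one. A minimal sketch, assuming the standard framework layout above and a 3.6-to-4.0 upgrade (adjust the version strings for yours):

# swap the version component of the current library path
newlib <- .libPaths()[1]
oldlib <- sub("4.0", "3.6", newlib, fixed = TRUE)
pkglist <- list.files(oldlib)
# then drop the base packages and run install.packages(pkglist) as above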

Rich Shepard prompted me to look into the situation on Linux, because it is different. There, the library directory doesn't change between versions, so you simply need to list its contents. On my server, .libPaths() returns two results, so here's one way to get the package list from multiple locations.

# get list of packages from the library
pkglist <- unlist(lapply(.libPaths(), list.files))

# remove base packages first; there's no reason to even try to install these
basepkg <- c('base', 'compiler', 'datasets', 'graphics', 'grDevices', 'grid', 'methods', 'parallel', 'splines', 'stats', 'stats4', 'tcltk', 'tools', 'utils')
pkglist <- pkglist[!(pkglist %in% basepkg)]
install.packages(pkglist)
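
Alternatively, you can let R identify the base packages instead of maintaining that vector by hand; installed.packages() reports each package's priority, and priority = "base" selects exactly the packages that ship with R:

# a sketch: build basepkg from installed.packages() instead of hard-coding it
basepkg <- rownames(installed.packages(priority = "base"))
pkglist <- unlist(lapply(.libPaths(), list.files))
install.packages(setdiff(pkglist, basepkg))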

I don’t know how this works on Windows, but I would be happy to add that information if someone shares their experience.

Then go get coffee, because it will take a while.

This doesn’t address GitHub or other packages, and that’s where it gets complicated. I for one don’t remember where I got all that other stuff!

What to do? Well, on my system install.packages() returned three warnings:

  1. Packages that couldn’t be installed.
  2. Base packages that shouldn't be separately installed (these also appear in 1). [EDIT: I moved the check for base packages to the original pkglist specification above, because there's no reason to even try to install these.]
  3. Packages that are on CRAN but threw errors. This is often because some system component needs to be updated as well, and requires human intervention.

The warnings provide neat comma-separated lists of package names (except they use curly quotes WHY EVEN), so it’s fairly easy to create a list of everything in 1 that’s not part of 2.

# paste the comma-separated package names from the warning message here
notinstalled <- c( [list from warning message] )

basepkg <- c('base', 'compiler', 'datasets', 'graphics', 'grDevices', 'grid', 'methods', 'parallel', 'splines', 'stats', 'stats4', 'tcltk', 'tools', 'utils')

notinstalled <- notinstalled[!(notinstalled %in% basepkg)]
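
If copying the names out of the warning text (curly quotes and all) is too tedious, you can recompute the same list instead: compare the original pkglist against what actually made it into the new library. A sketch, assuming pkglist from above is still in your session:

# anything from the old library that isn't installed now still needs attention
notinstalled <- setdiff(pkglist, rownames(installed.packages()))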

Here’s where it gets messy. Some of those packages are on GitHub, but you need to know the repository to install them using devtools. I don’t know that! Some of them are on CRAN but obsolete and can’t be installed. And some of them came from who-knows-where.

We can at least try to automatically install them from GitHub using the githubinstall package from CRAN, which will try to look up the repository name as well. It will list all the repositories with packages of that name, and if it can’t find an exact match will list similar matches. Multiple repo options require human intervention, but if there’s only one we can try to install it.
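
For example, gh_suggest() returns candidate repositories as "username/package" strings (AnomalyDetection is the lookup used in the package's own documentation; your results depend on the current package index):

library(githubinstall)
gh_suggest("AnomalyDetection")
# [1] "twitter/AnomalyDetection"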

First see which packages have exactly one match on GitHub. Try to install those, and make a list of which ones failed.

# install.packages("githubinstall")
library("githubinstall")

# look up candidate GitHub repositories for each missing package
notinstalled.gh <- lapply(notinstalled, gh_suggest)

# keep the packages without exactly one match (zero or several repos) for manual follow-up
notinstalled <- notinstalled[sapply(notinstalled.gh, length) != 1]

# try to install the unique matches, and record which attempts failed
notinstalled.gh <- notinstalled.gh[sapply(notinstalled.gh, length) == 1]
gh.out <- lapply(notinstalled.gh, function(x) try(githubinstall(x, dependencies = TRUE, ask = FALSE)))
gh.failed <- notinstalled.gh[sapply(gh.out, function(x) inherits(x, "try-error"))]

And now, you have almost everything freshly installed on your R 4.0 system. There are three things left for you to deal with:

  1. The packages that install.packages() threw error messages for.
  2. The packages in notinstalled that weren’t cleanly matched on GitHub, and might have come from who-knows-where.
  3. The packages in gh.failed: on my system, there were three, and they all had dependencies that couldn't be installed for whatever reason (a sketch for saving lists 2 and 3 follows).
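
If you'd rather deal with 2 and 3 later, one option is to save those lists to a file so they don't get lost when you close the session. A minimal sketch (the file name is arbitrary):

# notinstalled holds the unmatched packages; gh.failed the failed GitHub installs
leftovers <- list(unmatched = notinstalled, gh_failed = unlist(gh.failed))
saveRDS(leftovers, "upgrade-leftovers.rds")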

If all you use is CRAN packages, the first step might be all you need. If you’re like me and accumulate packages from all over, you might have a bit more work to do. Or you might decide you don’t care, and will ignore those lost sheep until you need them…

This started as a Twitter thread, and rmflight suggested a couple of tools that may automate the process: reup and yamlpack. I haven't tried either; I wanted to see all the intermediate steps myself. The packages that need human intervention are going to need it regardless, but these tools might make the rest easier.

For what it’s worth, of the 1136 packages installed on my laptop, one is on CRAN but threw an error, three were found on GitHub but couldn’t be installed, and seventeen more need to be tracked down. (Edit: most of those seventeen have been removed from CRAN because they aren’t being maintained; the rest I don’t actually care about.)
