Stuck on rgdal

Update: Thanks to Colin Rundel figuring out the problem and Roger Bivand implementing a fix, the devel version of rgdal (revision 758) on R-Forge installs beautifully.


I updated all of my linux computers to Fedora 28, and now can’t install rgdal.

I can install sf with no problems, so it isn’t an issue with GDAL or with proj.

I checked with Roger Bivand, the package maintainer. He asked me to confirm that I could install sf (to make sure it wasn’t a dependency issue), post the logs online, and then ask on the R-sig-geo mailing list. I’m putting the full problem here, and the abbreviated version on the mailing list.

I’d appreciate any thoughts: I rely very heavily on the incredibly useful rgdal package.


Fedora packages installed (fc28):

gdal 2.2.4-2
proj 4.9.6-3
gcc 8.1.1-1


R information:


> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 28 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0


The problem appears to be with the C++ configuration options, but is beyond my ability to figure out.

The install message is:

* installing *source* package ‘rgdal’ ...
** package ‘rgdal’ successfully unpacked and MD5 sums checked
configure: CC: gcc -m64
configure: CXX: g++ -m64
configure: rgdal: 1.3-2
checking for /usr/bin/svnversion... no
configure: svn revision: 755
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... configure: error: in `/tmp/RtmpfMGBY5/R.INSTALL66c3b8deddb/rgdal':
configure: error: cannot run C++ compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
ERROR: configuration failed for package ‘rgdal’
* removing ‘/usr/lib64/R/library/rgdal’
* restoring previous ‘/usr/lib64/R/library/rgdal’


And here's config.log:

This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by rgdal configure 1.3-2, which was
generated by GNU Autoconf 2.69. Invocation command line was

$ ./configure

## --------- ##
## Platform. ##
## --------- ##

hostname = scgwork
uname -m = x86_64
uname -r = 4.16.12-300.fc28.x86_64
uname -s = Linux
uname -v = #1 SMP Fri May 25 21:13:28 UTC 2018

/usr/bin/uname -p = x86_64
/bin/uname -X = unknown

/bin/arch = x86_64
/usr/bin/arch -k = unknown
/usr/convex/getsysinfo = unknown
/usr/bin/hostinfo = unknown
/bin/machine = unknown
/usr/bin/oslevel = unknown
/bin/universe = unknown

PATH: /usr/lib64/qt-3.3/bin
PATH: /usr/share/Modules/bin
PATH: /usr/local/bin
PATH: /usr/local/sbin
PATH: /usr/bin
PATH: /usr/sbin
PATH: /home/sarahg/.bin

## ----------- ##
## Core tests. ##
## ----------- ##

configure:1773: CC: gcc -m64
configure:1775: CXX: g++ -m64
configure:1778: rgdal: 1.3-2
configure:1781: checking for /usr/bin/svnversion
configure:1794: result: yes
configure:1809: svn revision: 755
configure:1988: checking for C++ compiler version
configure:1997: g++ -m64 --version >&5
g++ (GCC) 8.1.1 20180502 (Red Hat 8.1.1-1)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

configure:2008: $? = 0
configure:1997: g++ -m64 -v >&5
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap
--enable-languages=c,c++,fortran,objc,obj-c++,ada,go,lto --prefix=/usr
--mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared
--enable-threads=posix --enable-checking=release --enable-multilib
--with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-gcc-major-version-only
--with-linker-hash-style=gnu --enable-plugin --enable-initfini-array
--with-isl --enable-libmpx --enable-offload-targets=nvptx-none
--without-cuda-driver --enable-gnu-indirect-function --enable-cet
--with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 8.1.1 20180502 (Red Hat 8.1.1-1) (GCC)
configure:2008: $? = 0
configure:1997: g++ -m64 -V >&5
g++: error: unrecognized command line option '-V'
g++: fatal error: no input files
compilation terminated.
configure:2008: $? = 1
configure:1997: g++ -m64 -qversion >&5
g++: error: unrecognized command line option '-qversion'; did you mean
'--version'?
g++: fatal error: no input files
compilation terminated.
configure:2008: $? = 1
configure:2028: checking whether the C++ compiler works
configure:2050: g++ -m64 -I/usr/local/include -Wl,-z,relro -Wl,-z,now
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld conftest.cpp >&5
configure:2054: $? = 0
configure:2102: result: yes
configure:2105: checking for C++ compiler default output file name
configure:2107: result: a.out
configure:2113: checking for suffix of executables
configure:2120: g++ -m64 -o conftest -I/usr/local/include
-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld
conftest.cpp >&5
configure:2124: $? = 0
configure:2146: result:
configure:2168: checking whether we are cross compiling
configure:2176: g++ -m64 -o conftest -I/usr/local/include
-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld
conftest.cpp >&5
/usr/bin/ld: /tmp/cc9pfZ1b.o: relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
configure:2180: $? = 1
configure:2187: ./conftest
./configure: line 2189: ./conftest: No such file or directory
configure:2191: $? = 127
configure:2198: error: in `/home/sarahg/Downloads/rgdal':
configure:2200: error: cannot run C++ compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details

## ---------------- ##
## Cache variables. ##
## ---------------- ##

ac_cv_env_CCC_set=
ac_cv_env_CCC_value=
ac_cv_env_CPPFLAGS_set=
ac_cv_env_CPPFLAGS_value=
ac_cv_env_CXXFLAGS_set=
ac_cv_env_CXXFLAGS_value=
ac_cv_env_CXX_set=
ac_cv_env_CXX_value=
ac_cv_env_LDFLAGS_set=
ac_cv_env_LDFLAGS_value=
ac_cv_env_LIBS_set=
ac_cv_env_LIBS_value=
ac_cv_env_build_alias_set=
ac_cv_env_build_alias_value=
ac_cv_env_host_alias_set=
ac_cv_env_host_alias_value=
ac_cv_env_target_alias_set=
ac_cv_env_target_alias_value=
ac_cv_file__usr_bin_svnversion=yes

## ----------------- ##
## Output variables. ##
## ----------------- ##

CPPFLAGS='-I/usr/local/include'
CXX='g++ -m64'
CXXFLAGS=''
DEFS=''
ECHO_C=''
ECHO_N='-n'
ECHO_T=''
EXEEXT=''
GDAL_CONFIG=''
HAVE_CXX11=''
LDFLAGS='-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld'
LIBOBJS=''
LIBS=''
LTLIBOBJS=''
OBJEXT=''
PACKAGE_BUGREPORT='Roger.Bivand@nhh.no'
PACKAGE_NAME='rgdal'
PACKAGE_STRING='rgdal 1.3-2'
PACKAGE_TARNAME='rgdal'
PACKAGE_URL=''
PACKAGE_VERSION='1.3-2'
PATH_SEPARATOR=':'
PKG_CPPFLAGS=''
PKG_LIBS=''
SHELL='/bin/sh'
ac_ct_CXX=''
bindir='${exec_prefix}/bin'
build_alias=''
datadir='${datarootdir}'
datarootdir='${prefix}/share'
docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
dvidir='${docdir}'
exec_prefix='NONE'
host_alias=''
htmldir='${docdir}'
includedir='${prefix}/include'
infodir='${datarootdir}/info'
libdir='${exec_prefix}/lib'
libexecdir='${exec_prefix}/libexec'
localedir='${datarootdir}/locale'
localstatedir='${prefix}/var'
mandir='${datarootdir}/man'
oldincludedir='/usr/include'
pdfdir='${docdir}'
prefix='NONE'
program_transform_name='s,x,x,'
psdir='${docdir}'
sbindir='${exec_prefix}/sbin'
sharedstatedir='${prefix}/com'
sysconfdir='${prefix}/etc'
target_alias=''

## ----------- ##
## confdefs.h. ##
## ----------- ##

/* confdefs.h */
#define PACKAGE_NAME "rgdal"
#define PACKAGE_TARNAME "rgdal"
#define PACKAGE_VERSION "1.3-2"
#define PACKAGE_STRING "rgdal 1.3-2"
#define PACKAGE_BUGREPORT "Roger.Bivand@nhh.no"
#define PACKAGE_URL ""

configure: exit 1
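For anyone staring at a similar config.log: most of it is boilerplate, and even the $? = 1 results from the -V and -qversion probes are expected noise. What matters is the check that was running when configure gave up. Here is a minimal Python sketch that pulls out just that part. The helper name failing_check is hypothetical, and it assumes the standard Autoconf layout, where configure’s own lines start with configure:<line>:.

```python
def failing_check(log_text):
    """Return the config.log lines from the last 'checking ...' step before
    configure's first fatal error, through that error line.

    Hypothetical helper; assumes configure's own lines start with 'configure:'.
    """
    lines = log_text.splitlines()
    # configure's own fatal errors look like "configure:2198: error: ...";
    # compiler messages such as "g++: error: ..." are deliberately skipped.
    err = next(
        (i for i, line in enumerate(lines)
         if line.startswith("configure:") and ": error:" in line),
        None,
    )
    if err is None:
        return []  # configure did not fail
    # Walk back to the check that was running when the error fired.
    start = max(
        (i for i in range(err)
         if lines[i].startswith("configure:") and ": checking " in lines[i]),
        default=0,
    )
    return lines[start:err + 1]


# Excerpt copied from the log above (wrapped lines rejoined).
excerpt = """configure:1997: g++ -m64 -V >&5
g++: error: unrecognized command line option '-V'
configure:2008: $? = 1
configure:2168: checking whether we are cross compiling
configure:2176: g++ -m64 -o conftest -I/usr/local/include -Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld conftest.cpp >&5
/usr/bin/ld: /tmp/cc9pfZ1b.o: relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
collect2: error: ld returned 1 exit status
configure:2180: $? = 1
configure:2187: ./conftest
./configure: line 2189: ./conftest: No such file or directory
configure:2191: $? = 127
configure:2198: error: in `/home/sarahg/Downloads/rgdal':"""

for line in failing_check(excerpt):
    print(line)
```

Run on the log above, it isolates the "checking whether we are cross compiling" step: the link of conftest fails with the relocation error, so ./conftest is never created and configure concludes that it cannot run C++ compiled programs.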


Ooops.

I learned a new thing about R!

That is not at all what I’d intended. I wanted to compare two related datasets, each a column in a separate data frame.

Here’s a reproducible example.

Files and memory

I’m still working with that 6TB of data, and likely will be for a long time. The data are divided up by time: full spatial extent for a few time periods. The research group would also like to have time series for a particular location, but I can’t load all the data into memory. My current approach is to load a dataset, then save it as smaller chunks (expect more on file formats and save/load options later). The chunks are small enough in spatial extent that I can open them all and assemble a time series.
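Here is a rough sketch of the chunking step, written in Python with NumPy rather than the R used for the real work (the post doesn’t show that code). It assumes each time period is a (time, lat, lon) array. The 10-cell tile size, the .npy files, and the names save_tiles and time_series are all hypothetical. Each period is written out as spatial tiles, and one location’s time series is rebuilt by loading only the tile that contains it from each period.

```python
import os
import tempfile

import numpy as np


def save_tiles(data, period, out_dir, tile=10):
    """Split one period's (time, lat, lon) array into spatial tiles on disk."""
    _, nlat, nlon = data.shape
    for r in range(0, nlat, tile):
        for c in range(0, nlon, tile):
            # File name records the period and the tile's top-left corner.
            name = f"{period}_r{r}_c{c}.npy"
            np.save(os.path.join(out_dir, name), data[:, r:r + tile, c:c + tile])


def time_series(out_dir, periods, row, col, tile=10):
    """Rebuild the full time series at one grid cell from the saved tiles."""
    r0, c0 = (row // tile) * tile, (col // tile) * tile  # tile holding the cell
    pieces = []
    for period in periods:  # periods must be listed in chronological order
        chunk = np.load(os.path.join(out_dir, f"{period}_r{r0}_c{c0}.npy"))
        pieces.append(chunk[:, row - r0, col - c0])
    return np.concatenate(pieces)


# Demo: two periods of 3 time steps each on a 25 x 30 grid.
rng = np.random.default_rng(0)
periods = ["p1", "p2"]
full = rng.random((6, 25, 30))
with tempfile.TemporaryDirectory() as d:
    save_tiles(full[:3], "p1", d)
    save_tiles(full[3:], "p2", d)
    ts = time_series(d, periods, row=17, col=22)
    edge = time_series(d, periods, row=24, col=29)  # partial tile at the grid edge

print(np.allclose(ts, full[:, 17, 22]))  # True
```

The tile size is the main knob: small enough that one tile from every period fits in memory at once, large enough that the number of files stays manageable.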

Looping over a large number of files in R, processing each one and writing the results out again, can lead to slow memory leaks, even if files are overwritten. Hadley Wickham discusses memory management in his book Advanced R. I spent some time poking around with the pryr package, just out of curiosity, but there’s an easier solution: put all the heavy lifting inside a function. As long as the function doesn’t return something that carries its environment along (a closure, for example), that memory can be reclaimed once the function exits.
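The caveat about returning the environment has a direct analogue in Python, sketched below. This is not the author’s R code, R’s memory details differ (see Advanced R), and the names summarise and make_getter are hypothetical. A function that returns only a small summary lets its large local object be freed on exit. One that returns a closure keeps the object alive through the closure’s captured variables. The standard-library tracemalloc module reports how much memory is still allocated after each call.

```python
import tracemalloc


def summarise(n):
    """Build a large local object and return only a small summary."""
    big = bytearray(n)        # stands in for a loaded dataset
    return len(big)           # an int: nothing refers to `big` afterwards


def make_getter(n):
    """Build the same object but return a closure that captures it."""
    big = bytearray(n)
    return lambda: len(big)   # the closure keeps `big` alive


SIZE = 50 * 1024 * 1024  # 50 MiB

tracemalloc.start()
base, _ = tracemalloc.get_traced_memory()

summarise(SIZE)
after_summary, _ = tracemalloc.get_traced_memory()

getter = make_getter(SIZE)
after_closure, _ = tracemalloc.get_traced_memory()

tracemalloc.stop()

print(f"still held after summarise():   {(after_summary - base) / 2**20:.1f} MiB")
print(f"still held after make_getter(): {(after_closure - base) / 2**20:.1f} MiB")
```

Deleting the closure (del getter) releases the 50 MiB again, which is the same reason the R version should return results rather than functions or other objects that carry environments.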

All the file handling (reading and writing) goes into the function.

Then the function is called for the full list of possible patterns.
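The post doesn’t include its R code, so here is a rough Python analogue of that structure. It assumes the chunks are .npy files named <period>_<chunk>.npy, which is a hypothetical naming scheme. build_series (also hypothetical) does all the reading and writing for one chunk and returns only the output path. That means the loaded arrays are released when it returns. The loop at the bottom calls it once per file-name pattern.

```python
import glob
import os
import tempfile

import numpy as np


def build_series(pattern, out_path):
    """Hypothetical worker: stack every file matching `pattern` in time order
    and write the result to `out_path`.

    The loaded arrays are local variables, so nothing large outlives the call.
    """
    files = sorted(glob.glob(pattern))  # assumes names sort chronologically
    if not files:
        raise FileNotFoundError(f"no files match {pattern}")
    series = np.concatenate([np.load(f) for f in files], axis=0)
    np.save(out_path, series)
    return out_path  # a short string, not the data


# Demo: three time periods x two spatial chunks, each a (time, lat, lon) array.
results = {}
with tempfile.TemporaryDirectory() as d:
    out_dir = os.path.join(d, "series")  # separate dir so outputs never match
    os.makedirs(out_dir)
    for period in ["2001", "2002", "2003"]:
        for chunk in ["c0", "c1"]:
            arr = np.full((4, 2, 2), int(period))
            np.save(os.path.join(d, f"{period}_{chunk}.npy"), arr)

    # One call per file-name pattern; each call cleans up after itself.
    for chunk in ["c0", "c1"]:
        pattern = os.path.join(d, f"*_{chunk}.npy")
        out = build_series(pattern, os.path.join(out_dir, f"{chunk}.npy"))
        results[chunk] = np.load(out)  # loaded here only to check the demo

print({k: v.shape for k, v in results.items()})
```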

No clean-up, no memory leaks. The operating system no longer kills my process every time I leave it running.

File compression

I have about 6TB of climate data to manage, and more on the way. Besides a decent array of hard drives and a clever backup strategy, what tools can I use to help maintain these data in a useful way? They’re in NetCDF files, which is a decent (if non-user-friendly) way to maintain multidimensional data (latitude, longitude, time).

We’re mostly interested in summaries of these data right now (CLIMDEX, BIOCLIM, and custom statistics), and once these are calculated the raw data themselves will not be accessed frequently. But infrequently is not the same as never, so I can’t just put them on a spare hard drive in a drawer.

What are the compression options available to me, and what is the trade-off between speed and size for the NetCDF files I’m working with?

There are three major file compression tools on Linux: the venerable gzip, bzip2, and the newer xz. I tried them out on a 285MB NetCDF file, one of the very many I’m working with. I tested the maximum-compression (-9) and fastest (-1) settings for each of the three tools, plus the default (-6) for gzip and xz. bzip2 has no separate default to test: its -9 and -1 are the same as --best and --fast, and --best is already its default.
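A comparison like this can be scripted. The sketch below uses Python’s standard-library gzip, bz2, and lzma modules rather than the command-line tools. These modules wrap the same zlib, libbzip2, and liblzma (xz) libraries, but absolute timings will differ from the CLI runs. The compare function and the synthetic sample data are hypothetical. To get meaningful numbers, read a real NetCDF file’s bytes instead (for example open(path, "rb").read()).

```python
import bz2
import gzip
import lzma
import time

# (label, compress, decompress); levels mirror the command-line options tested.
# xz -9 (preset=9) is left out because liblzma needs roughly 674 MiB to
# compress at that preset; add it back if the machine has the memory.
CODECS = [
    ("gzip -1", lambda b: gzip.compress(b, compresslevel=1), gzip.decompress),
    ("gzip -6", lambda b: gzip.compress(b, compresslevel=6), gzip.decompress),
    ("gzip -9", lambda b: gzip.compress(b, compresslevel=9), gzip.decompress),
    ("bzip2 --fast", lambda b: bz2.compress(b, compresslevel=1), bz2.decompress),
    ("bzip2 --best", lambda b: bz2.compress(b, compresslevel=9), bz2.decompress),
    ("xz -1", lambda b: lzma.compress(b, preset=1), lzma.decompress),
    ("xz -6", lambda b: lzma.compress(b, preset=6), lzma.decompress),
]


def compare(data):
    """Return {label: (percent_of_original, compress_s, decompress_s)}."""
    results = {}
    for label, comp, decomp in CODECS:
        t0 = time.perf_counter()
        packed = comp(data)
        t1 = time.perf_counter()
        restored = decomp(packed)
        t2 = time.perf_counter()
        if restored != data:  # every codec here is lossless
            raise ValueError(f"{label} did not round-trip")
        results[label] = (100 * len(packed) / len(data), t1 - t0, t2 - t1)
    return results


# Synthetic, highly repetitive stand-in for a NetCDF file.
sample = b"".join(f"{i % 1000:6d}".encode() for i in range(100_000))
report = compare(sample)
for label, (pct, c_s, d_s) in report.items():
    print(f"{label:13s} {pct:5.1f}%  compress {c_s:.3f}s  decompress {d_s:.3f}s")
```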

There wasn’t a huge difference in compression for this type of file: the best (bzip2 --best) compressed it to 2.4% of its original size, and the worst (gzip -1) to 7.9%.

Speed, though, varied widely: anywhere from 2.9 to 90.0 seconds to compress a single file. Decompression took about 1.6 seconds for gzip and xz regardless of option, and 3.2 to 3.6 seconds for bzip2.

Compression tool results

For this file format, xz was useless: slow, and not the most effective. bzip2 produced the smallest files, but not by a huge amount. gzip was fastest, but produced the largest files even at its best setting.

This matches what I expected, but the specifics are useful:

  • Using bzip2 --best I could get one set of files from 167GB to 4GB, but it would take 9.5 hours to do so.
  • Using gzip -1 I could get that set of files down to 13GB, and it would only take 24 minutes.
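The size side of that trade-off is simple arithmetic on the single-file ratios, assuming every file in the set compresses like the 285MB test file. A minimal sketch (the helper name compressed_gb is hypothetical) reproduces the 4GB and 13GB figures from the 2.4% and 7.9% ratios. The hours are the estimates quoted in the list, not recomputed.

```python
def compressed_gb(total_gb, percent_of_original):
    """Estimated compressed size, assuming every file compresses like the test file."""
    return total_gb * percent_of_original / 100


TOTAL_GB = 167
best = compressed_gb(TOTAL_GB, 2.4)   # bzip2 --best
fast = compressed_gb(TOTAL_GB, 7.9)   # gzip -1

hours_best, hours_fast = 9.5, 24 / 60  # time estimates quoted in the list

print(f"bzip2 --best: {best:.1f} GB in {hours_best:.1f} h")
print(f"gzip -1:      {fast:.1f} GB in {hours_fast:.1f} h")
print(f"gzip uses {fast - best:.1f} GB more but saves {hours_best - hours_fast:.1f} h")
```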

I think that’s a fair trade-off. Saving 9 hours matters more to me than saving 9GB, and decompressing a single file in about 1.6 seconds instead of about 3.5 also improves usability on the occasions when we need to access the raw data.