Installing R, R packages (e.g., tidyverse) and RStudio on Ubuntu Linux

Foreword

The Linux operating system is a great platform for computing. However, it takes some effort for users who migrate from other operating systems (e.g., Windows) to get started. For example, when I migrated to Linux, I spent quite some time trying to understand how to install the latest version of R on the system. I am writing this tutorial to help those of you who are in the same situation as I was. In this tutorial, I will not only show you how it’s done, but also explain why each step is necessary, so that you can get a better understanding.

This tutorial is based on Ubuntu, which is perhaps the most popular Linux distribution. Before diving in, here’s something you need to know: (1) most Linux distributions including Ubuntu include a program called bash that runs various kinds of commands such as those for software/package management, e.g., apt; (2) you should not confuse package installation using the apt install command in a bash session (which can be invoked by pressing CTRL+ALT+T) and that using the install.packages() function in an R session (which can be invoked by entering R in a bash session).

Update: I’ve added an appendix at the end of this post, which provides a simple step-by-step guide on how to compile base R from source.

Installing r-base

You can install the r-base package, which includes the essential components of R, using the apt install command. By default, the command will search for and install the components from a repository called Universe. However, the version of R included in this repository is typically not up-to-date. Alternatively, you can tell apt install to obtain the latest version from a CRAN repository. This can be achieved by first adding the following entry on a new line in your /etc/apt/sources.list file (you can run a text editor as root, e.g., sudo nano /etc/apt/sources.list, to add the entry):

deb https://<my.favorite.cran.mirror>/bin/linux/ubuntu <code-name-adjective>-cran40/

Note that you should replace <my.favorite.cran.mirror> with one of the URLs provided by this site, and replace <code-name-adjective> with the adjective of your Ubuntu release code name (see here for a full list). For example, to obtain the latest R 4.x packages, add an entry like

# for Ubuntu 22.04
deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/

or

# for Ubuntu 20.04
deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/

or

# for Ubuntu 18.04
deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/
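If you prefer not to type the entry by hand, it can be generated from the release code name. The following is only a sketch: the helper name cran_entry and the file name cran.list are my own choices for illustration, and it assumes that lsb_release -cs prints the code name adjective (e.g., jammy) on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CRAN repository entry for a given
# Ubuntu code name adjective (e.g. jammy, focal, bionic).
# The mirror defaults to cloud.r-project.org, as in the examples above.
cran_entry() {
  local codename="$1"
  local mirror="${2:-cloud.r-project.org}"
  printf 'deb https://%s/bin/linux/ubuntu %s-cran40/\n' "$mirror" "$codename"
}

# Print the entry for Ubuntu 22.04.
cran_entry jammy

# To write the entry for the running system into its own file (needs root):
#   cran_entry "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/cran.list
```

Keeping the entry in a separate file under /etc/apt/sources.list.d/ has the same effect as editing /etc/apt/sources.list, but it is easier to remove later.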

After adding the entry, you will also need to add a key to your system so that apt can perform signature checking of the Release File for the added repository to verify its authenticity. The CRAN repository for Ubuntu is signed with the key of Michael Rutter (see https://cran.r-project.org/bin/linux/ubuntu/README.html). To add the key, enter the following command in bash:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
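Note that apt-key is marked as deprecated in recent apt releases. As a hedged alternative, the key can be downloaded as a file and its fingerprint compared with the one used in the command above. The helper name fingerprint_ok is hypothetical, and the download URL and target file name in the comments are what I believe the CRAN README currently suggests, so please check them against the README before relying on them.

```shell
#!/usr/bin/env bash
# Fingerprint of the CRAN Ubuntu signing key, as used in the apt-key command above.
EXPECTED_FPR="E298A3A825C0D65DFD57CBB651716619E084DAB9"

# Hypothetical helper: succeed if the given fingerprint equals the expected
# one after removing whitespace and ignoring letter case.
fingerprint_ok() {
  local got
  got="$(printf '%s' "$1" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]')"
  [ "$got" = "$EXPECTED_FPR" ]
}

# gpg prints fingerprints in groups of four characters; this spelling should match too.
fingerprint_ok "E298 A3A8 25C0 D65D FD57  CBB6 5171 6619 E084 DAB9" && echo "fingerprint matches"

# Fetching and inspecting the key without apt-key (needs network access and root;
# URL and file name are assumptions, see the lead-in):
#   wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc \
#     | sudo tee /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
#   gpg --show-keys /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
```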

Now that the setup is done, you can install the latest version of R (remember to first update package index files from the repository):

sudo apt update
sudo apt install r-base

Now, you should be able to open R by entering R in bash.

Installing r-base-dev

After the installation of the core packages, you would typically want to install additional R packages using the install.packages() function in R. However, the function depends on the r-base-dev package to compile source code for some R packages. Therefore, prior to using the install.packages() function, you should first install the r-base-dev package. Like r-base, the r-base-dev package can be installed in a bash session:

sudo apt install r-base-dev

Now, you should be able to install most R packages using the install.packages() function in R.

Installing tidyverse (or other R packages with system dependencies)

You are perhaps aware that, when you install an R package that depends on other uninstalled R package(s), the install.packages() function will automatically install all the required packages at once, even if you don’t explicitly tell it to.

However, things are different when you install an R package that depends on other uninstalled non-R package(s). Under this circumstance, you will always need to manually install the required package(s) before you use the install.packages() function.

A good example would be an R package called tidyverse. In case you don’t know, the tidyverse is a set of R packages (e.g., ggplot2, dplyr, …) developed for data science (e.g., data visualization, data manipulation, …) under a tidy and elegant design philosophy. You can visit the home page for more information.

In short, the tidyverse package requires the following non-R packages: libcurl4-openssl-dev, libssl-dev, libxml2-dev. Like r-base and r-base-dev, you can install them in bash:

sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev
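To see which of these system packages are still missing before (or after) running the command above, a small check like the following may help. It is a sketch only: missing_pkgs is a hypothetical helper name, and it relies on dpkg-query reporting the status "install ok installed" for installed packages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of the given packages that are
# not currently installed, one per line.
missing_pkgs() {
  local pkg
  for pkg in "$@"; do
    # dpkg-query fails for unknown packages; an installed package
    # reports the status "install ok installed".
    if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q "install ok installed"; then
      printf '%s\n' "$pkg"
    fi
  done
}

# List the tidyverse system dependencies that are not installed yet.
missing_pkgs libcurl4-openssl-dev libssl-dev libxml2-dev
```

An empty output means that all three packages are already present.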

Now, you should be able to install tidyverse in R using the install.packages() function:

install.packages("tidyverse")

Installing RStudio and RStudio Server

If you are looking for an integrated development environment (IDE) for R, I recommend RStudio. Installing RStudio is simple. First, download an installer for your system from here. Second, double-click the installer (the installer for Ubuntu should have the file extension “.deb”) and follow the instructions. The whole process is straightforward, so I am not going into the details here.

If you want to use R on a remote Linux server, you’ll probably need RStudio Server as well. Briefly speaking, RStudio Server provides a browser based interface to R running on a remote Linux server. To install RStudio Server, you’ll need to follow the steps below on the remote server.

First of all, since RStudio Server by default does not allow system users (such as root) to authenticate, you need a normal user account with sudo privilege [1] on the server. The following code shows how to create one:

useradd -m <user_name> # create a user with a home directory
passwd <user_name> # set password
usermod -aG sudo <user_name> # add the user to the sudo group (-a appends, keeping any existing group memberships)

Second, install gdebi [2] and RStudio Server (you can find <deb_package_url> from here):

sudo apt install gdebi-core
wget <deb_package_url>
sudo gdebi <deb_package_name>

If you installed RStudio Server from a package manager binary (e.g., a Debian package or RPM), it is automatically registered as a daemon that starts along with the rest of the system. However, if you need to manually stop, start, or restart the server, here’s how:

sudo rstudio-server stop
sudo rstudio-server start
sudo rstudio-server restart

There are a number of administrative commands which allow you to see what sessions are active and request suspension of running sessions (note that session data is not lost during a suspend):

sudo rstudio-server active-sessions # list all currently active sessions
sudo rstudio-server suspend-session <pid> # suspend an individual session
sudo rstudio-server force-suspend-session <pid> # a "force" variation of the suspend command which will send an interrupt to the session to request the termination of any running R command

After the above steps, you should be able to access RStudio Server through the web browser of your local machine. To do so, just enter the IP address of your remote machine, followed by the port number, in the address bar (by default, RStudio Server listens on port 8787).
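As a small illustration of what that address looks like, the sketch below assembles it from a host and a port. The helper name rstudio_url is hypothetical, and 192.0.2.10 is a documentation-only placeholder that you should replace with the address of your own server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the address to type into the browser.
# The port defaults to 8787, the default port of RStudio Server.
rstudio_url() {
  local host="$1"
  local port="${2:-8787}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# 192.0.2.10 is a placeholder address, not a real server.
rstudio_url 192.0.2.10

# To check from the local machine that the server answers (needs network access):
#   curl -sI "$(rstudio_url 192.0.2.10)" | head -n 1
```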

Conclusions

In this post, I demonstrated and explained how to install R and related components on Ubuntu. Once you are familiar with the concepts, installing additional R packages should be much easier.

Appendix: Compile from source

TODO: figure out how to build R so that graphics (especially those with CJK characters) can be displayed on screen.

For various reasons [3], you may want to compile base R from source. A simple step-by-step guide is given below (tested on Ubuntu 22.04 with R 4.2.0; for more details, see the R Installation and Administration manual):

First, install system dependencies using the following commands [4]:

# TODO: categorize dependencies by `capabilities()`

# required by r-base-core
sudo apt install zip unzip libpaper-utils xdg-utils libbz2-1.0 libc6 libcairo2 libcurl4 libglib2.0-0 libgomp1 libicu70 libjpeg8 liblzma5 libpango-1.0-0 libpangocairo-1.0-0 libpcre2-8-0 libpng16-16 libreadline8 libtcl8.6 libtiff5 libtirpc3 libtk8.6 libx11-6 libxt6 zlib1g ucf ca-certificates libcurl4-gnutls-dev

# required by r-base-dev
sudo apt install build-essential gcc g++ gfortran libatlas-base-dev libncurses5-dev libreadline-dev libjpeg-dev libpcre2-dev libpcre3-dev libpng-dev zlib1g-dev libbz2-dev liblzma-dev libicu-dev xauth pkg-config

# required for viewing graphics on-screen (configure R with `--with-x=no` if it is not needed)
sudo apt install xorg-dev

# needed by r-base-dev to build PDF help pages (optional; reasons not to install: 1. they are quite heavy; 2. PDF help pages are redundant since html ones are already included; 3. may interfere with tinytex, which is the recommended tex distribution for RMarkdown/Quarto.)
# sudo apt install texlive-base texlive-latex-base texlive-plain-generic texlive-fonts-recommended texlive-fonts-extra texlive-extra-utils texlive-latex-recommended texlive-latex-extra texinfo

Second, download the source tarfile of the desired R version (e.g., R-4.2.0.tar.gz) from CRAN, extract the content to a directory and execute the following command inside that directory to prepare for build:

./configure \
  --enable-memory-profiling \
  --enable-prebuilt-html \
  --enable-R-shlib \
  --with-blas
  
# TODO: how to build with `--with-tcltk`, `--with-cairo`, `--with-tiff` and `--with-libxml`?
# TODO: build with `--with-x` or `--without-x`?

Last, build, check and install (by default, the installation will be located somewhere under the /usr/local/ directory, e.g., /usr/local/lib/R and /usr/local/bin/R; you can change it in the previous configure step using the --prefix option):

make
make check
sudo make install
sudo chmod -R a+rx /usr/local/lib/R/ /usr/local/bin/R # may need to execute this command to grant privileges
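As a variation on the steps above, the --prefix option can point to a directory in your home folder, in which case neither sudo nor the chmod step should be needed. This is a sketch under assumptions: the prefix $HOME/opt/R is just an example location, and the -j option of make (parallel build) is my addition rather than part of the tested procedure.

```shell
#!/usr/bin/env bash
# Hypothetical example prefix: install under the home directory.
PREFIX="${PREFIX:-$HOME/opt/R}"

# Number of parallel make jobs: the online CPU count if available, else 1.
JOBS="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"

# Build only when run inside an extracted R source directory.
if [ -x ./configure ]; then
  ./configure \
    --prefix="$PREFIX" \
    --enable-memory-profiling \
    --enable-prebuilt-html \
    --enable-R-shlib \
    --with-blas
  make -j"$JOBS"
  make check
  make install
else
  echo "no ./configure here; would install to $PREFIX using $JOBS job(s)"
fi
```

With this layout, R would be started as $PREFIX/bin/R unless you add $PREFIX/bin to your PATH.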

  1. The sudo privilege is needed because you need to install various packages.

  2. This package handles the installation of RStudio and its dependencies on Debian/Ubuntu.

  3. For example, I upgraded to Ubuntu 22.04 soon after its initial release and I wanted to install R 4.2, which was also released at about the same time. At the time, the official CRAN repo for the new Ubuntu release was not ready, so I had to compile R from source. If you don’t want to go through the hassle, you may try an unofficial Ubuntu PPA provided by Dirk Eddelbuettel, which is usually more up-to-date than the official one: sudo add-apt-repository ppa:edd/misc && sudo apt update && sudo apt install r-base r-base-dev

  4. The dependencies can be found using apt show r-base-core and apt show r-base-dev. Some additional notes: (1) ATLAS (libatlas-base-dev + libatlas3-base) is a better-performing alternative to BLAS (libblas3 + libblas-dev) + LAPACK (liblapack3 + liblapack-dev). (2) libatlas-base-dev depends on libatlas3-base, and the latter provides libblas.so.3 as well as liblapack.so.3 for r-base-core. (3) Although not mentioned in apt show, libcurl4-gnutls-dev is also required by r-base-core.
