Setting up Connections Between R and Databases
This post provides instructions on how to set up connections between R and various databases (e.g., PostgreSQL, Oracle) on Ubuntu 18.04.
Installing ODBC library and drivers
Run the following command to install the unixODBC library:
-get install unixodbc unixodbc-dev --install-suggests apt
With the unixODBC library installed, you then install the ODBC driver for the target database. For example, the following command installs the ODBC driver for PostgreSQL (see the appendix for instructions on how to install the driver for Oracle):
-get install odbc-postgresql apt
After installation, check that the driver (/usr/lib/x86_64-linux-gnu/odbc/psqlodbcw.so
) exists.
Configuring ini files
There are two ini files to configure before you can access a database using ODBC:
~/.odbcinst.ini
, which contains driver information, particularly the name of the driver library. Multiple drivers can be specified in the same file.~/.odbc.ini
, which contains connection (data source) information, particularly the username, password, database and host information. Multiple data sources can be specified in the same file.
Here is a template for the ~/.odbcinst.ini
file:
[PostgreSQL Driver]
Driver = /usr/lib/x86_64-linux-gnu/odbc/psqlodbcw.so
Here is a template for the ~/.odbc.ini
file (note that the Driver
line corresponds to the driver defined in odbcinst.ini
):
[data_source_name]
Driver = PostgreSQL Driver
Database = database_name
Servername = localhost
UserName = user_name
Password = password
Port = 5432
Making connections and queries
With the above configurations, you can now connect to a database through the DBI
and odbc
packages in R (note that data_source_name
(DNS) corresponds to the name defined in ~/.odbc.ini
):
<- DBI::dbConnect(odbc::odbc(), "data_source_name") con
There are three ways to query data:
- You can query your data with DBI by using the
DBI::dbGetQuery()
function. Simply paste your SQL code into the R function as a quoted string. Care should be used to escape your quotes as needed. - You can write your code in dplyr syntax, and dplyr will translate your code into SQL. This can be done by passing the target connection and table to the
dplyr::tbl()
function. - You can also make SQL queries in rmarkdown with a
{sql}
code chunk, and specify your connection with theconnection = con
code chunk option. If you want to send the query output to an R dataframe, useoutput.var = "mydataframe"
in the code chunk options.
The followings are the examples to demonstrate the above descriptions, and they all return the number of rows in the table (i.e., relation) tb
from connection con
:
# The DBI way
::dbGetQuery(con, "select count(1) from tb;") DBI
# The dplyr way
tbl(con, "tb") %>%
count()
# The rmarkdown way
```{sql, connection = con, output.var = "mydataframe"}
\select count(1) from tb;
\```
# You can then access the variable `mydataframe` in following chunks.
Appendix: connecting R to Oracle
The setup for Oracle databases is more complicated1, and here is how.
1 At the time of writing, there seems to be some issues that prevents the use of Oracle data source names defined in the ~/.odbcinst.ini
file. When doing this, the R session would just abort.
After the installation of the unixODBC library, go to https://www.oracle.com/technetwork/database/database-technologies/instant-client/downloads/index.html and download the following Oracle Instant Client zip packages: Basic, SDK, ODBC. Extract the contents in the zip files to the directory /opt/oracle/instantclient_<version>
.
Change the working directory to /opt/oracle/instantclient_<version>
and execute the following commands:
ln -s libclntsh.so.<version> libclntsh.so # This step is necessary for Oracle databases prior to version 18.3
ln -s libocci.so.<version> libocci.so # This step is necessary for Oracle databases prior to version 18.3
sudo apt install libaio1
./odbc_update_ini.sh /
To use the local naming method, put the tnsnames.ora
file under /opt/oracle/instantclient_<version>/network/admin/
. Note that the linebreak character of the file should be LF
instead of CRLF
, otherwise there may be an error when trying to create the connection.
Edit ~/.profile
and add the following lines:
export TNS_ADMIN=/opt/oracle/instantclient_<version>/network/admin/
export OCI_LIB=/scratch/instantclient_<version>/
export LD_LIBRARY_PATH=/scratch/instantclient_<version>:$LD_LIBRARY_PATH
After making sure the above environment variables are set, install the ROracle
package in R:
install.packages("ROracle")
To create a connection to an Oracle database, use the following command (note that dbname
can be a connect descriptor defined in the tnsnames.ora
file or a connect string):
library(ROracle)
<- DBI::dbConnect(DBI::dbDriver("Oracle"), username = "", password = "", dbname="")
con Sys.setenv(NLS_LANG = "SIMPLIFIED CHINESE_CHINA.AL32UTF8", TZ = "UTC", ORA_SDTZ = "UTC")
AL32UTF8 stores characters beyond U+FFFF as four bytes (exactly as Unicode defines UTF-8). Oracle’s “UTF8” stores these characters as a sequence of two UTF-16 surrogate characters encoded using UTF-8 (or six bytes per character)