The Usage of ANSI C Escape Sequences in Various Programming Languages

Introduction

According to Wikipedia:

An escape sequence is a sequence of characters that does not represent itself when used inside a character or string literal, but is translated into another character or a sequence of characters that may be difficult or impossible to represent directly.

Escape sequences are widely used in C and many other languages, such as R, (Postgre)SQL and Bash. Although all of these follow the C style, their usage still differs in ways that can confuse anyone who works with all of these programs. This post presents and summarises those differences in order to suggest a more consistent usage.

Commonly used escape sequences

The ANSI C standard specifies many escape sequences. The following is a list of the most frequently used ones:

  • \': single quote
  • \": double quote
  • \n: newline
  • \r: carriage return
  • \t: horizontal tab
  • \\: backslash itself

All of the listed sequences can be used in each of the programs introduced in the following sections, albeit with different syntaxes. The list is not exhaustive, however, and the programs differ in their support for other escape sequences.
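Each sequence in the list stands for exactly one character. You can check this with a short Bash sketch. It expands each sequence using the $'...' quoting described in the Bash section below, then prints the resulting byte value in decimal. The sketch assumes Bash plus the standard od and tr utilities. The helper name byte_value and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Each $'...' string contains exactly one escape sequence from the list;
# Bash replaces it with the corresponding single character.
sq=$'\''   # single quote
dq=$'\"'   # double quote
nl=$'\n'   # newline
cr=$'\r'   # carriage return
ht=$'\t'   # horizontal tab
bs=$'\\'   # backslash itself

# Hypothetical helper: print the decimal byte value(s) of its argument,
# with od's padding and trailing newline stripped by tr.
byte_value() {
  printf '%s' "$1" | od -An -tu1 | tr -d ' \n'
}

# Prints 39, 34, 10, 13, 9 and 92, one per line.
for c in "$sq" "$dq" "$nl" "$cr" "$ht" "$bs"; do
  byte_value "$c"
  echo
done
```

The printed values are the ASCII codes of the six characters.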

Usage of escape sequences in R

In R, a string can be delimited by either single or double quotes. Although the two are interchangeable, double quotes are preferred[1]. To insert a special character into a string, simply put the corresponding escape sequence where the character should appear. For example, the following command[2] prints a string with 'line one' on the first line and 'line two' on the second line:

cat("line one\nline two")

A single quote may also be embedded directly in a double-quote delimited string and vice versa.

Usage of escape sequences in PostgreSQL

The SQL standard requires a regular string constant to be enclosed in single quotes, not double quotes. PostgreSQL follows the standard but also implements an extension, the escape string constant, which recognizes ANSI C escape sequences. To use this extension, write the letter E (upper or lower case) immediately before the opening single quote. For example, the following query returns the same string as the R example:

SELECT E'line one\nline two';

Besides the sequence \' (e.g., E'I\'m Nick'), a single quote can also be included in a string constant by writing two adjacent single quotes (e.g., 'I''m Nick' or E'I''m Nick').

If you work with multiple versions of PostgreSQL, watch out for the configuration parameter standard_conforming_strings. Before v9.1, it defaulted to off, in which case PostgreSQL recognizes escape sequences in both regular and escape string constants. As of v9.1, it defaults to on, and escape sequences are recognized only in escape string constants. Therefore, write E'...' rather than '...' whenever a string contains escape sequences: the E form is interpreted the same way regardless of this setting, so it is portable across versions and configurations.

Usage of escape sequences in Bash

In Bash, a string containing escape sequences should be enclosed in single quotes preceded by a dollar sign ($'...'), a form the Bash manual calls ANSI-C quoting. Bash expands such a string, replacing each escape sequence with the corresponding special character. For example, to print the same string as in the previous examples:

echo $'line one\nline two'

String handling is more complicated in Bash because of features such as variable expansion, command substitution, word splitting and metacharacters. These topics are beyond the scope of this post; curious readers can find the gory details in the official documentation (Free Software Foundation 2018).
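A short Bash sketch shows how plain single quotes, double quotes and $'...' differ. It assumes Bash rather than a plain POSIX sh. It uses printf '%s\n' instead of echo, so the output does not depend on how a particular echo treats backslashes. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Single quotes: backslash and 'n' stay as two literal characters.
plain='line one\nline two'
# Double quotes: a backslash before 'n' is also kept literally.
double="line one\nline two"
# ANSI-C quoting: \n is replaced by a real newline character.
ansi=$'line one\nline two'

printf '%s\n' "$plain"    # prints: line one\nline two
printf '%s\n' "$double"   # prints: line one\nline two
printf '%s\n' "$ansi"     # prints "line one" and "line two" on separate lines
```

Only the $'...' form turns \n into a newline. Inside double quotes, Bash treats a backslash specially only before $, `, ", \ or a newline.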

Summary

Here are some take-away suggestions regarding the usage of escape sequences in various programs.

  • In R, just put the sequences inside double quotes (e.g., "\n").
  • In PostgreSQL, put the sequences inside single quotes following the letter E (e.g., E'\n').
  • In Bash, put the sequences inside single quotes following the dollar sign character (e.g., $'\n').
  • If a string contains a single or double quote, always write it with the corresponding escape sequence (\' or \") in the syntaxes above, rather than with the various language-specific alternatives (e.g., two adjacent single quotes in PostgreSQL).
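The last suggestion looks like this in Bash. The variable names are illustrative only. The alternative 'I'\''m Nick' closes the single-quoted string, adds an escaped quote and reopens the string. It is a common shell idiom, shown here only for comparison.

```shell
#!/usr/bin/env bash
# Recommended: escape quotes inside $'...' with \' and \".
name=$'I\'m Nick'
quote=$'She said \"hi\"'

# Shell-specific alternative: same result, but harder to read.
name_alt='I'\''m Nick'

printf '%s\n' "$name" "$quote"   # prints: I'm Nick / She said "hi"
```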

Additional notes on the usage of regular expressions (regex)

  • Do not confuse characters denoted by ANSI C escape sequences (e.g., \n, which represents a newline character) with regex metacharacters (e.g., \s, which represents a pattern that matches any white space character including newline), and do not use the former in a regex pattern.
  • Since such metacharacters start with a backslash, which must itself be escaped in a string literal before the pattern is passed to a regex parser, a regex metacharacter typically requires two backslashes in practice (e.g., specify "\\s", E'\\s' or $'\\s' instead of "\s", E'\s' or $'\s' as a regex pattern).
  • R, PostgreSQL and Bash use different regex implementations. To make the grep command used in Bash more compatible with the regex implementations in R and PostgreSQL, specify the -P option (i.e., grep -P), which GNU grep provides for Perl-compatible regular expressions.
  • Do not rely on features like non-greedy matching (e.g., .*?), look-ahead (e.g., (?=)) and look-behind (e.g., (?<=)), as not every program implements them or enables them by default.
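The two levels of escaping can be illustrated in Bash with grep. This sketch uses \. (a literal dot) rather than \s. \. is part of POSIX extended regular expressions and works with plain grep -E, whereas \s is an extension whose support depends on the grep implementation. The sample strings are made up.

```shell
#!/usr/bin/env bash
# Level 1: ANSI-C quoting turns \\ into a single backslash.
pattern=$'v1\\.0'
printf '%s\n' "$pattern"   # prints: v1\.0 (the string grep receives)

# Level 2: grep's regex parser reads \. as "a literal dot".
printf '%s\n' 'v1.0' 'v1x0' | grep -E "$pattern"   # prints only: v1.0

# Without the regex-level escape, the dot matches any character.
printf '%s\n' 'v1.0' 'v1x0' | grep -E 'v1.0'       # prints both lines
```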

References

Free Software Foundation. 2018. Bash Reference Manual. https://www.gnu.org/software/bash/manual/bash.html.

R Core Team. 2018. R Language Definition. https://cran.r-project.org/doc/manuals/r-release/R-lang.html.

The PostgreSQL Global Development Group. 2018. PostgreSQL 10.5 Documentation. https://www.postgresql.org/docs/10/static/index.html.


  1. This preference is intended to be consistent with languages like C and C++.

  2. Note that unlike cat(), the print() function does not convert escape sequences to the corresponding special characters when printing the result.
