Getting to Know a Database: Using SQL-Compliant Methods to Retrieve Various Information

hacking skills

Author

zenggyu

Published

2018-08-11

Abstract

Shows how to use the information schema to get useful information of various database objects.

Introduction

Recently I was introduced to a database which contains tens of thousands of tables and millions of columns. Since I don’t have much documentation at hand, I constantly have the feeling that I don’t know enough to use the database while I was exploring it.

For that reason, I did some research, and discovered some nice ways to gather various information about the database and its objects (tables, columns, functions, etc.). The purpose of this post, therefore, is to document the discoveries that I made and share the knowledge. Based on this knowledge, I created some custom functions to simplify the workflow. The introduction and source code of the functions can be found in my other post.

It should be mentioned that the database I encountered is managed by Greenplum, which is based on PostgreSQL. Therefore, the methods introduced in this post is PostgreSQL-centric. However, to make the methods more portable, I tried to conform to the SQL standard (reserved keywords, the information schema, etc.).

Version

A database management system (DBMS) in production is typically several versions behind the latest release. Many issues can occur if a wrong version is assumed. Therefore, one useful thing to know about when working with a database is its version. In PostgreSQL, this information can be queried with the version() function:

SELECT version();

Current user

The name of the current user may not be very important in interactive use; however, it can be very useful when writing scripts/functions that apply to specific users. This information can be queried with the following command:

SELECT current_user;

Note that current_user should be seen as a function. The reason it must be called without trailing parentheses is that it is a reserved keyword in the SQL standard and has some special syntactic status.

Current catalog

The name of the catalog¹ (i.e., database) that the current session connects to can be queried with the current_catalog function, which should also be called without trailing parentheses:

¹ “Catalog” is a term used by the SQL standard.

SELECT current_catalog;

Current schema

The name of the current schema can be queried with the current_schema function, which can be called without trailing parentheses:

SELECT current_schema;

Tables and views

The information_schema.tables view provides various information about all tables and views in the current database that the current user has access to. I found the following columns to be particularly useful:

table_schema
table_name
table_type (one of: BASE TABLE, VIEW, FOREIGN TABLE, LOCAL TEMPORARY)

More information about views can be queried from information_schema.views. In addition to table_schema and table_name, it also contains a column named view_definition which, as the name suggests, stores query expression defining each view. For example:

SELECT table_schema, table_name, table_type, view_definition
FROM information_schema.views;

Columns

The information_schema.columns view provides various information about all table/view columns in the current database that the current user has access to. I found the following columns to be particularly useful:

table_schema
table_name
column_name
ordinal_position
data_type

For example:

SELECT table_schema, table_name, column_name, ordinal_position, data_type
FROM information_schema.columns;

Routines

The information_schema.routines view provides various information about all functions in the current database that the current user has access to. I found the following columns to be particularly useful:

routine_schema
routine_name
specific_name
data_type (return data type of the function)
routine_definition (source code of the function)
external_language (SQL, PLPGSQL, C, INTERNAL, etc.)

For example:

SELECT routine_schema, routine_name, specific_name, data_type, routine_definition, external_language
FROM information_schema.routines;

Note that functions with different input parameters may share the same name (i.e., routine_name) in one schema (this feature is termed “function overloading”²). A function is uniquely identified by the combination of the name of the function, the ordinal position and the data type of the input parameters. The specific_name column in information_schema.routines stores a convenient unique identifier for each function.

² See the wiki page for a more in-depth introduction.

In addition to the information_schema.routines view, it is also worth noting the information_schema.parameters view. This view provides various information about the parameters of all functions in the current database that the current user has access to. I found the following columns to be particularly useful:

specific_catalog
specific_name
ordinal_position
parameter_mode (IN, OUT, INOUT, etc.)
parameter_name
data_type

For example:

SELECT specific_catalog, specific_name, ordinal_position, parameter_mode, parameter_name, data_type
FROM information_schema.parameters;

Constraints

There are several views that provide information about table constraints, including information_schema.table_constraints, information_schema.key_column_usage, information_schema.referential_constraints and information_schema.check_constraints.

For any table with constraints, information_schema.table_constraints lists the schema (constraint_schema), name (constraint_name) and type (constraint_type) of each constraint.

For any unique, primary key or foreign key constraint, information_schema.key_column_usage lists the involving tables (table_name) and columns (column_name), as well as their ordinal position (ordinal_position) in the constraint.

For any foreign constraint, information_schema.referential_constraints list the unique constraint (or primary key constraint) (unique_constraint_name) that it references.

For any check constraint, information_schema.check_constraints lists the corresponding check clause (check_clause).

It should be noted that there is not a “NOT NULL” constraint type in the information schema; such constraints are stored as check constraints with a check clause IS NOT NULL.

Conclusions

This post presents some basics of the sources that store various information about the database. To put such knowledge in practice, I have written some custom functions based on these sources. Check out my other post to learn more.