Getting to Know a Database: Using SQL-Compliant Methods to Retrieve Various Information
Introduction
Recently I was introduced to a database which contains tens of thousands of tables and millions of columns. Since I don’t have much documentation at hand, I constantly have the feeling that I don’t know enough to use the database while I was exploring it.
For that reason, I did some research, and discovered some nice ways to gather various information about the database and its objects (tables, columns, functions, etc.). The purpose of this post, therefore, is to document the discoveries that I made and share the knowledge. Based on this knowledge, I created some custom functions to simplify the workflow. The introduction and source code of the functions can be found in my other post.
It should be mentioned that the database I encountered is managed by Greenplum, which is based on PostgreSQL. Therefore, the methods introduced in this post is PostgreSQL-centric. However, to make the methods more portable, I tried to conform to the SQL standard (reserved keywords, the information schema, etc.).
Version
A database management system (DBMS) in production is typically several versions behind the latest release. Many issues can occur if a wrong version is assumed. Therefore, one useful thing to know about when working with a database is its version. In PostgreSQL, this information can be queried with the version()
function:
SELECT version();
Current user
The name of the current user may not be very important in interactive use; however, it can be very useful when writing scripts/functions that apply to specific users. This information can be queried with the following command:
SELECT current_user;
Note that current_user
should be seen as a function. The reason it must be called without trailing parentheses is that it is a reserved keyword in the SQL standard and has some special syntactic status.
Current catalog
The name of the catalog1 (i.e., database) that the current session connects to can be queried with the current_catalog
function, which should also be called without trailing parentheses:
1 “Catalog” is a term used by the SQL standard.
SELECT current_catalog;
Current schema
The name of the current schema can be queried with the current_schema
function, which can be called without trailing parentheses:
SELECT current_schema;
Tables and views
The information_schema.tables
view provides various information about all tables and views in the current database that the current user has access to. I found the following columns to be particularly useful:
table_schema
table_name
table_type
(one of:BASE TABLE
,VIEW
,FOREIGN TABLE
,LOCAL TEMPORARY
)
More information about views can be queried from information_schema.views
. In addition to table_schema
and table_name
, it also contains a column named view_definition
which, as the name suggests, stores query expression defining each view. For example:
SELECT table_schema, table_name, table_type, view_definition
FROM information_schema.views;
Columns
The information_schema.columns
view provides various information about all table/view columns in the current database that the current user has access to. I found the following columns to be particularly useful:
table_schema
table_name
column_name
ordinal_position
data_type
For example:
SELECT table_schema, table_name, column_name, ordinal_position, data_type
FROM information_schema.columns;
Routines
The information_schema.routines
view provides various information about all functions in the current database that the current user has access to. I found the following columns to be particularly useful:
routine_schema
routine_name
specific_name
data_type
(return data type of the function)routine_definition
(source code of the function)external_language
(SQL
,PLPGSQL
,C
,INTERNAL
, etc.)
For example:
SELECT routine_schema, routine_name, specific_name, data_type, routine_definition, external_language
FROM information_schema.routines;
Note that functions with different input parameters may share the same name (i.e., routine_name
) in one schema (this feature is termed “function overloading”2). A function is uniquely identified by the combination of the name of the function, the ordinal position and the data type of the input parameters. The specific_name
column in information_schema.routines
stores a convenient unique identifier for each function.
2 See the wiki page for a more in-depth introduction.
In addition to the information_schema.routines
view, it is also worth noting the information_schema.parameters
view. This view provides various information about the parameters of all functions in the current database that the current user has access to. I found the following columns to be particularly useful:
specific_catalog
specific_name
ordinal_position
parameter_mode
(IN
,OUT
,INOUT
, etc.)parameter_name
data_type
For example:
SELECT specific_catalog, specific_name, ordinal_position, parameter_mode, parameter_name, data_type
FROM information_schema.parameters;
Constraints
There are several views that provide information about table constraints, including information_schema.table_constraints
, information_schema.key_column_usage
, information_schema.referential_constraints
and information_schema.check_constraints
.
For any table with constraints, information_schema.table_constraints
lists the schema (constraint_schema
), name (constraint_name
) and type (constraint_type
) of each constraint.
For any unique, primary key or foreign key constraint, information_schema.key_column_usage
lists the involving tables (table_name
) and columns (column_name
), as well as their ordinal position (ordinal_position
) in the constraint.
For any foreign constraint, information_schema.referential_constraints
list the unique constraint (or primary key constraint) (unique_constraint_name
) that it references.
For any check constraint, information_schema.check_constraints
lists the corresponding check clause (check_clause
).
It should be noted that there is not a “NOT NULL” constraint type in the information schema; such constraints are stored as check constraints with a check clause IS NOT NULL
.
Conclusions
This post presents some basics of the sources that store various information about the database. To put such knowledge in practice, I have written some custom functions based on these sources. Check out my other post to learn more.