Recently I was introduced to a database which contains tens of thousands of tables and millions of columns. Since I don’t have much documentation at hand, I constantly have the feeling that I don’t know enough to use the database while I was exploring it.
For that reason, I did some research, and discovered some nice ways to gather various information about the database and its objects (tables, columns, functions, etc.). The purpose of this post, therefore, is to document the discoveries that I made and share the knowledge. Based on this knowledge, I created some custom functions to simplify the workflow. The introduction and source code of the functions can be found in my other post.
It should be mentioned that the database I encountered is managed by Greenplum, which is based on PostgreSQL. Therefore, the methods introduced in this post is PostgreSQL-centric. However, to make the methods more portable, I tried to conform to the SQL standard (reserved keywords, the information schema, etc.).
A database management system (DBMS) in production is typically several versions behind the latest release. Many issues can occur if a wrong version is assumed. Therefore, one useful thing to know about when working with a database is its version. In PostgreSQL, this information can be queried with the
The name of the current user may not be very important in interactive use; however, it can be very useful when writing scripts/functions that apply to specific users. This information can be queried with the following command:
current_user should be seen as a function. The reason it must be called without trailing parentheses is that it is a reserved keyword in the SQL standard and has some special syntactic status.
The name of the catalog1 (i.e., database) that the current session connects to can be queried with the
current_catalog function, which should also be called without trailing parentheses:
The name of the current schema can be queried with the
current_schema function, which can be called without trailing parentheses:
Tables and views
information_schema.tables view provides various information about all tables and views in the current database that the current user has access to. I found the following columns to be particularly useful:
More information about views can be queried from
information_schema.views. In addition to
table_name, it also contains a column named
view_definition which, as the name suggests, stores query expression defining each view. For example:
SELECT table_schema, table_name, table_type, view_definition FROM information_schema.views;
information_schema.columns view provides various information about all table/view columns in the current database that the current user has access to. I found the following columns to be particularly useful:
SELECT table_schema, table_name, column_name, ordinal_position, data_type FROM information_schema.columns;
information_schema.routines view provides various information about all functions in the current database that the current user has access to. I found the following columns to be particularly useful:
data_type(return data type of the function)
routine_definition(source code of the function)
SELECT routine_schema, routine_name, specific_name, data_type, routine_definition, external_language FROM information_schema.routines;
Note that functions with different input parameters may share the same name (i.e.,
routine_name) in one schema (this feature is termed “function overloading”2). A function is uniquely identified by the combination of the name of the function, the ordinal position and the data type of the input parameters. The
specific_name column in
information_schema.routines stores a convenient unique identifier for each function.
In addition to the
information_schema.routines view, it is also worth noting the
information_schema.parameters view. This view provides various information about the parameters of all functions in the current database that the current user has access to. I found the following columns to be particularly useful:
SELECT specific_catalog, specific_name, ordinal_position, parameter_mode, parameter_name, data_type FROM information_schema.parameters;
There are several views that provide information about table constraints, including
For any table with constraints,
information_schema.table_constraints lists the schema (
constraint_schema), name (
constraint_name) and type (
constraint_type) of each constraint.
For any unique, primary key or foreign key constraint,
information_schema.key_column_usage lists the involving tables (
table_name) and columns (
column_name), as well as their ordinal position (
ordinal_position) in the constraint.
For any foreign constraint,
information_schema.referential_constraints list the unique constraint (or primary key constraint) (
unique_constraint_name) that it references.
For any check constraint,
information_schema.check_constraints lists the corresponding check clause (
It should be noted that there is not a “NOT NULL” constraint type in the information schema; such constraints are stored as check constraints with a check clause
IS NOT NULL.
This post presents some basics of the sources that store various information about the database. To put such knowledge in practice, I have written some custom functions based on these sources. Check out my other post to learn more.