About Table Definition Basics

A table in a relational database is much like a table on paper: It consists of rows and columns. The number and order of the columns is fixed, and each column has a name. The number of rows is variable — it reflects how much data is stored at a given moment. SQL does not make any guarantees about the order of the rows in a table. When a table is read, the rows will appear in an unspecified order, unless sorting is explicitly requested. Furthermore, SQL does not assign unique identifiers to rows, so it is possible to have several completely identical rows in a table.

Each column has a data type. The data type constrains the set of possible values that can be assigned to a column and assigns semantics to the data stored in the column so that it can be used for computations. For instance, a column declared to be of a numerical type will not accept arbitrary text strings, and the data stored in such a column can be used for mathematical computations. By contrast, a column declared to be of a character string type will accept almost any kind of data but it does not lend itself to mathematical calculations, although other operations such as string concatenation are available.

Greenplum Database includes a sizable set of built-in data types that fit many applications. Users can also define their own data types. Most built-in data types have obvious names and semantics. Some of the frequently used data types are integer for whole numbers, numeric for possibly fractional numbers, text for character strings, date for dates, time for time-of-day values, and timestamp for values containing both date and time.

To create a table, you use the aptly named CREATE TABLE command. In this command, you specify at least a name for the new table, the names of the columns and the data type of each column. For example:

CREATE TABLE my_first_table (
    first_column text,
    second_column integer
);

This command creates a table named my_first_table with two columns. The first column is named first_column and has a data type of text; the second column has the name second_column and the type integer. The table and column names follow the identifier syntax explained in Identifiers and Key Words in the PostgreSQL documentation. The type names are usually also identifiers, but there are some exceptions. The column list is comma-separated and surrounded by parentheses.

Normally, you would specify names for your tables and columns that convey the kind of data that they store, for example:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric
);

(The numeric type can store fractional components, as would be typical of monetary amounts.)

Tip: When you create many interrelated tables, it is wise to choose a consistent naming pattern for the tables and columns. For instance, you may choose using singular or plural nouns for table names.

Greenplum Database limits the number of columns that a table can contain. Depending on the column types, the maximum is between 250 and 1600. Defining a table with anywhere near the maximum number of columns is highly unusual and often a questionable design.

If you no longer need a table, you can remove it using the DROP TABLE command. For example:

DROP TABLE my_first_table;
DROP TABLE products;

Attempting to drop a table that does not exist is an error. It is common in SQL script files to unconditionally try to drop each table before creating it, ignoring any error messages, so that the script works whether or not the table exists. You can use the DROP TABLE IF EXISTS variant to avoid the error messages, but this is not standard SQL.

Default Column Values

You can assign a column a default value. When a new row is created and no values are specified for some of the columns, Greenplum Database assigns those columns their respective default values. A data manipulation command can also request explicitly that a column be set to its default value, without having to know what that value is.

If you do not explicitly declare a default for a column, the default value is the null value, which can be considered to represent unknown data.

In a table definition, specify the default value after the column data type. For example:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric DEFAULT 9.99
);

The default value can be an expression, which Greenplum Database evaluates whenever the default value is inserted (not when the table is created). A common example is for a timestamp column to have a default of CURRENT_TIMESTAMP, so that it gets set to the time of row insertion. Another common example is generating a "serial number" for each row. In Greenplum Database, you can do this as follows:

CREATE TABLE products (
    product_no integer DEFAULT nextval('products_product_no_seq'),
    ...
);

where the nextval() function supplies successive values from a sequence object. This situation is sufficiently common that there's a special shorthand for it:

CREATE TABLE products (
    product_no SERIAL,
    ...
);

Generated Columns

A generated column is a special column that is always computed from other columns; it is for columns what a view is for tables. There are two kinds of generated columns: stored and virtual. A stored generated column is computed when it is written (inserted or updated) and occupies storage as if it were a normal column. A virtual generated column occupies no storage and is computed when it is read.

A virtual generated column is similar to a view and a stored generated column is similar to a materialized view (except that it is always updated automatically). Greenplum Database currently implements only stored generated columns.

To create a generated column, use the GENERATED ALWAYS AS clause in CREATE TABLE, for example:

CREATE TABLE people (
    ...,
    height_cm numeric,
    height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);

The keyword STORED must be specified to choose the stored kind of generated column. See CREATE TABLE for more details.

You cannot write directly to a generated column. While you may not specify a value for a generated column in an INSERT or UPDATE command, you can specify the keyword DEFAULT.

Note the following differences between a column with a default and a generated column:

  • The column default is evaluated once when the row is first inserted if no other value was provided; a generated column is updated whenever the row changes and cannot be overridden.
  • A column default may not refer to other columns of the table; a generation expression would normally do so.
  • A column default can use volatile functions, for example random() or functions referring to the current time; Greenplum Database does not permit this for generated columns.

Several restrictions apply to the definition of generated columns and tables involving generated columns:

  • The generation expression can use only immutable functions and cannot use subqueries or reference anything other than the current row in any way.

  • A generation expression cannot reference another generated column.

  • A generation expression cannot reference a system column, except tableoid.

  • A generated column cannot have a column default or an identity definition.

  • A generated column cannot be used as a distribution key.

  • A generated column cannot be part of a partition key.

  • You may specify a generated column in a root partitioned table but not in a child table.

  • For inheritance:

    • If a parent column is a generated column, a child column must also be a generated column using the same expression. Omit the GENERATED clause in the definition of the child column, it will be copied from the parent.
    • In the case of multiple inheritance, if one parent column is a generated column, then all parent columns must be generated columns and with the same expression.
    • If a parent column is not a generated column, a child column may be defined to be either a generated column or not.

Additional considerations apply to the use of generated columns:

  • Foreign tables can have generated columns. See CREATE FOREIGN TABLE for details.
  • Generated columns maintain access privileges separately from their underlying base columns. So, it is possible to grant privileges such that a particular role can read from a generated column but not from the underlying base columns.
  • Generated columns are, conceptually, updated after BEFORE triggers have run. Changes made to base columns in a BEFORE trigger will be reflected in generated columns. But conversely, Greenplum Database does not permit access to generated columns in BEFORE triggers.
check-circle-line exclamation-circle-line close-line
Scroll to top icon