SQL conventions¶
Preface¶
Database guidelines at eurodata¶
Be sure to read the database guidelines in the eurodata engineering handbook.
Whenever a Mizool Playbook rule reiterates, clarifies, extends, tweaks or changes eurodata guidelines, it references the respective rule like this: ed-DB-321
.
For non-eurodata developers
Unfortunately, the eurodata engineering handbook is not available to third parties. Therefore, when starting to work on a PR for an open source project that uses this Mizool Playbook, please ask the eurodata developers if there are any rules that you should know about.
Applicability to legacy code¶
Legacy code often violates many of the rules given below. Unless working on a modernization effort with an explicit strategy for dealing with (or intentionally breaking) backwards compatibility, the rules only apply to newly created elements.
In situations where old and new code meet (e.g. adding a new column to an old table), applying current conventions could lead to inconsistencies with neighboring elements. If you encounter such a case, discuss it with other developers, then choose an approach to use and make a corresponding note on the surrounding element.
Naming conventions¶
General naming¶
- Use lower case for all identifiers in databases, and use
snake_case
for multi-word identifiers (see Capitalization and format, ed-DB-115 and ed-DB-116). - Avoid abbreviations and specifically making up your own abbreviations. For details, see General rules and ed-DB-118.
-
In general, names use English. However, German terms should be used when they refer to any of the following:
- business domain terms with company-specific meaning
- terms that are hard to translate as they are specific to German laws and institutions
- objects in other databases that the application interacts with
- arguments/fields in internal protocols (e.g. REST calls, asynchronous messages) or official standards
Examples
- The company-specific terms "Beraternummer" and "Mandantennummer" are not translated.
- "bank_statement" (or "bankstatement") is okay as the domain concept ("Kontoauszug") is not at all company specific and translating it does not cause loss of meaning.
- However, it is not acceptable if it refers to a specific table in another database that uses the German term.
Info
This rule is similar to, but less strict than ed-DB-112.
Tables and views¶
- Table names use singular (
book
, notbooks
). See ed-DB-117. - Do not add a general prefix or suffix to the name of views to distinguish them from tables (a practice suggested by ed-DB-119).
- A side effect of this rule is that you cannot have a table
foo_status
accompanied by av_foo_status
view, which we think would be confusing. If a view has some specific purpose, include it in the name instead, e.g.foo_status_printable
. This way, it (mostly) becomes an implementation detail whether a name refers to a view or a table.
- A side effect of this rule is that you cannot have a table
Columns¶
-
Primary key column names do not repeat the name of the table (see ed-DB-119).
Example
A
book
table could have a primary key column calledid
orkey
, but notbook_id
orbook_key
. -
Do not add prefixes and suffixes to the name of a column based on its type or usage.
- Specifically, do not add
fk_
to the name of a column that is (part of) a foreign key constraint (a practice suggested by ed-DB-119).
- Specifically, do not add
- Names of columns that hold ID/key values generated by another system should always unambiguously identify that system.
- You may feel that it's obvious to everybody that
external_id
orsource_id
refers to the company's "SplineReticulator Directory" product. But even if that were true at the time of writing, that does not mean that the same is true for future you or somebody else in charge of deciphering the code in 5 years. - Also, following this rule makes it easier to connect additional systems (each with their own column), or migrate from one to the other.
- As an alternative, a pair of columns can be used if the source system varies between rows.
For example,
source_id
holds the ID itself andsource_system
specifies the system that ID is from.
- You may feel that it's obvious to everybody that
Keys, indexes and constraints¶
The following rules are conscious departures from ed-DB-119.
Names of keys, indexes and constraints follow the structure given below, with segments separated by double underscores (__
).
Element | Name segments | Examples |
---|---|---|
Primary key | table namepk |
my_cool_table__pk |
Foreign key constraint | owning tablefk each included column of the owning table |
book__fk__author_id complaint__fk__order_id__order_line_pos |
Unique index (full or partial) |
table nameu title / each included column |
document__u__id__account_id subscription__u__one_active_per_account_id |
Other index (full or partial) |
table namei title / each included column |
group__i__parent_id orders__i__unbilled_order_numbers |
Check constraint | table namec title |
account__c__name_length contact__c__email_or_phone_set |
Name length
PostgreSQL's maximum name length is 63 characters, which should be sufficient for most elements, even foreign key constraints with multiple columns.
In case a name constructed according to the rules above ends up being too long, abbreviate or truncate some segments. However, double check that it is still obvious which column each name segment refers to, and that there is no ambiguity.
Sequences and triggers¶
-
Sequence names consist of the corresponding table name and the suffix
__seq
(same as ed-DB-119 but with__
instead of_
).Identity columns
Before creating a sequence, double check whether an identity column could be used instead. See the respective rule in the Table and column declarations section below.
-
Trigger names consist of the table name, two underscores and a short but meaningful title. We intentionally do not add prefixes (like
trg_
from ed-DB-119) or suffixes (e.g.bu
for "before update" from ed-DB-119).Examples
book__insert
,book__update
andbook__delete
my_cool_table__track_history
payment__trigger_processing
Table and column declarations¶
- Define primary keys, foreign key constraints, check constraints and indexes (unique or otherwise) explicitly, not as part of a column definition.
- Always specify the name of keys, constraints and indexes. Letting the RDBMS generate names would prevent developers from documenting their intention and often makes code hard to comprehend.
- Foreign key constraints
- Each relation between tables must be covered by a foreign key constraint (see ed-DB-107).
- Only specify the clauses that deviate from the default, e.g. include
on delete cascade
but don't addon update no action
. - If the referenced column is the primary key, specify only the table name without a column name:
constraint book__fk__author_id foreign key (author_id) references author -- leave out `(id)` here on delete cascade
- Primary keys should always be a surrogate key, also known as a synthetic key (see ed-DB-108).
- Using natural/business keys (like email addresses, or an employee ID generated by another system) may feel more intuitive at first, but it often causes problems later on.
- Instead, include the natural/business key as a data column, possibly with an index. However, if the data originates from another system, be careful with unique indexes as you may not be able to trust that system to enforce uniqueness at all times.
- This rule applies even if (or especially if) you cannot think of a scenario where the "obvious" natural/business key value of a row would change.
-
In PostgreSQL, columns holding database-generated integer IDs should be declared
bigint not null generated by default as identity
. Compared to the explicit creation of a sequence object, such identity columns are less prone to errors, and better document the author's intention.Note
We considered using the
generated always
clause instead ofgenerated by default
. However, we decided to use the latter as the alternative would make it very cumbersome to keep existing ID values during special operations like database moves and migrations.Still, during regular operation, our applications and scripts should never set their own values for identity columns.
-
As columns are nullable by default, omit the
null
keyword in column definitions.
Datatypes¶
Textual data¶
-
In PostgreSQL, whenever we use the
varchar
datatype, we do not specify its maximum length. Such columns can store several hundreds of thousands of characters. Not having to define the length in advance makes our SQL code more maintainable.Implementing length restrictions
If you do need to enforce a length restriction on the database level, use a check constraint instead. This way, changing the length restriction later on will not cause the RDBMS to physically rewrite the entire table.
Non-standard behavior
In ANSI SQL, the maximum length of a
varchar
must be specified. Still, the PostgreSQL behavior is so useful that we accept this tradeoff in portability. -
Avoid using PostgreSQL's
text
datatype. While it offers the same advantage as an unboundedvarchar
(see above), it would prevent us from ever using that column as part of an index.
Numeric data¶
-
In PostgreSQL, our default choice for storing numbers of any kind is to use
decimal
without specifying a precision or scale. Such columns can store several thousands of digits both before and after the decimal point.1 Not having to define precision or scale in advance makes our SQL code more maintainable.Non-standard behavior
ANSI SQL specifies that leaving out the precision or scale is equivalent to
(0)
, i.e. integer values. Still, the PostgreSQL behavior is so useful that we accept this tradeoff in portability. -
Only when there is a good reason to have the database coerce or round input values to a certain precision or scale, use
decimal(p,s)
with appropriate values. One such reason, obviously, is that the respective project does not use PostgreSQL. -
ANSI SQL specifies that
numeric
anddecimal
are synonyms. To avoid any potential confusion, our code exclusively uses thedecimal
name for this datatype.
Boolean values¶
-
In PostgreSQL and other RDBMS that support it or an equivalent datatype, we use
boolean
for all boolean-valued columns.- Always specify
boolean
columns asnot null
unlessnull
is forced upon us by the source of the data (e.g. an upstream system). Allowingnull
makes it harder to use the column later on, as it ends up having three possible states instead of two, leaving it up to debate whethernull
meansfalse
ortrue
. -
If you do need to allow
null
, be sure to add a comment explaining the meaning ofnull
values.Examples
- "null means there was no value present when importing."
- "If the user has not overridden the default value inherited from their group config, this column is null."
- Always specify
-
If
boolean
is not supported, it can be emulated with thesmallint
datatype (or equivalent). To maintain integrity and document your intention, add a check constraint that limits the column values to0
and1
.
Syntax¶
- Do not quote identifiers unless it is required, for example because a column name is a reserved word (e.g.
timestamp
). - Write SQL keywords, names of built-in objects and other predefined words in lower case.
- Type casts
- Prefer ANSI SQL syntax for casts over RDBMs specific syntax:
cast(my_value as text) -- good! my_value::text -- bad, avoid PostgreSQL syntax where possible.
- Avoid scattering type casts all over the place, it makes the code harder to read (and may even lead to back-and-forth casting which is confusing).
Instead, try to centralize them using either of the following strategies:
- near the original columns (e.g. the lowest levels of a multi-level Join) when working with raw data not usable otherwise (e.g. when dates or numbers are imported and stored as strings)
- at the latest possible step (e.g. when required by a join)
- Prefer ANSI SQL syntax for casts over RDBMs specific syntax:
-
Avoid RDBMS specific syntax whenever possible and use ANSI SQL instead.
Examples
- Oracle: use
coalesce()
instead ofnvl()
. - PostgreSQL: to return a single row table, use
SELECT VALUES('A Test String')
instead ofSELECT 'A Test String'
.
- Oracle: use
-
See the paragraph on "unconstrained numeric" columns in the numeric datatypes chapter of the PostgreSQL docs. ↩