phorge-phorge/src/docs/developer/database.diviner

@title Database Schema
@group developer

This document describes key components of the database schema and should answer
questions like how to store new types of data.

= Database System =

Phabricator uses MySQL with InnoDB engine. The only exception is the
`search_documentfield` table which uses MyISAM because MySQL doesn't support
fulltext search in InnoDB.

Let us know if you need to use other database system: @{article:Give Feedback!
Get Support!}.

= PHP Drivers =

Phabricator supports [[ http://www.php.net/book.mysql | MySQL ]] and
[[ http://www.php.net/book.mysqli | MySQLi ]] PHP extensions. Most installations
use MySQL but MySQLi should work equally well.

= Databases =

Each Phabricator application has its own database. The names are prefixed by
`phabricator_`. This design has two advantages:

* Each database is easier to comprehend and to maintain.
* We don't do cross-database joins so each database can live on its own machine
  which is useful for load-balancing.

= Connections =

Phabricator specifies if it will use any opened connection just for reading or
also for writing. This allows opening write connections to master and read
connections to slave in master/slave replication. It is useful for
load-balancing.

= Tables =

Each table name is prefixed by its application. For example, Differential
revisions are stored in database `phabricator_differential` and table
`differential_revision`. This duplicity allows easy recognition of the table in
DarkConsole (see @{article:Using DarkConsole}) and other places.

The exception is tables which share the same schema over different databases
such as `edge`.

We use lower-case table names with words separated by underscores. The reason is
that MySQL can be configured (with `lower_case_table_names`) to lower-case the
table names anyway.

= Column Names =

Phabricator uses camelCase names for columns. The main advantage is that they
directly map to properties in PHP classes.

Don't use MySQL reserved words (such as `order`) for column names.

= Data Types =

Phabricator uses `int unsigned` columns for storing dates instead of `date` or
`datetime`. We don't need to care about time-zones in both MySQL and PHP because
of it. The other reason is that PHP internally uses numbers for storing dates.

Phabricator uses UTF-8 encoding for storing all text data. We use
`utf8_general_ci` collation for free-text and `utf8_bin` for identifiers.

We don't use the `enum` data type because each change to the list of possible
values requires altering the table (which is slow with big tables). We use
numbers (or short strings in some cases) mapped to PHP constants instead.

= JSON =

Some data don't require structured access - you don't need to filter or order by
them. We store these data as text fields in JSON format. This approach has
several advantages:

* If we decide to add another unstructured field then we don't need to alter the
  table (which is slow for big tables in MySQL).
* Table structure is not cluttered by fields which could be unused most of the
  time.

An example of such usage can be found in column
`differential_diffproperty.data`.

= Primary Keys =

Most tables have auto-increment column named `id`. However creating such column
is not required for tables which are not usually directly referenced (such as
tables expressing M:N relations). Example of such table is
`differential_relationship`.

= Indexes =

Create all indexes necessary for fast query execution in most cases. Don't
create indexes which are not used. You can analyze queries @{article:Using
DarkConsole}.

= Foreign Keys =

We don't use InnoDB's foreign keys because our application is so great that
no inconsistencies can arise. It will just slow us down.

= PHIDs =

Each globally referencable object in Phabricator has its associated PHID
(Phabricator ID) which serves as a global identifier. We use PHIDs for
referencing data in different databases.

We use both autoincrementing IDs and global PHIDs because each is useful in
different contexts. Autoincrementing IDs are chronologically ordered and allow
us to construct short, human-readable object names (like D2258) and URIs. Global
PHIDs allow us to represent relationships between different types of objects in
a homogenous way.

For example, the concept of "subscribers" is more powerfully done with PHIDs
because we could theoretically have users, projects, teams, and more all
as "subscribers" of other objects. Using an ID column we would need to add a
"type" column to avoid ID collision; using PHIDs does not require this
additional column.

= Transactions =

Transactional code should be written using transactions. Example of such code is
inserting multiple records where one doesn't make sense without the other or
selecting data later used for update. See chapter in @{class:LiskDAO}.

= Advanced Features =

We don't use MySQL advanced features such as triggers, stored procedures or
events because we like expressing the application logic in PHP more than in SQL.
Some of these features (especially triggers) can also cause big confusion.

Avoiding these advanced features is also good for supporting other database
systems (which we don't support anyway).

= Schema Denormalization =

Phabricator uses schema denormalization for performance reasons sparingly. Try
to avoid it if possible.

= Changing the Schema =

There are three simple steps to update the schema:

# Create a `.sql` file in `resources/sql/patches/`. This file should:
 - Contain the approprate MySQL commands to update the schema.
 - Use `${NAMESPACE}` rather than `phabricator` for database names.
 - Use `COLLATE utf8_bin` for any columns that are to be used as identifiers,
such as PHID columns. Otherwise, use `COLLATE utf8_general_ci`.
 - Name all indexes so it is possible to delete them later.
# Edit `src/infrastructure/storage/patch/PhabricatorBuiltinPatchList.php` and
add your patch to @{method@phabricator:PhabricatorBuiltinPatchList::getPatches}.
# Run `bin/storage/upgrade`.

It is also possible to create more complex patches in PHP for data migration
(due to schema changes or otherwise.) However, the schema changes themselves
should be done in separate `.sql` files. Order can be guaranteed by editing
`src/infrastructure/storage/patch/PhabricatorBuiltinPatchList.php`
appropriately.

See the
[[https://secure.phabricator.com/rPb39175342dc5bee0c2246b05fa277e76a7e96ed3
| commit adding policy storage for Paste ]] for a reasonable example of the code
changes.

= See Also =

* @{class:LiskDAO}
* @{class:PhabricatorPHID}
Document database schema Summary: Please review carefully, me not very well on English. I am just guessing what were the design decisions in most parts of this document. Feel free to correct me or add more information. Test Plan: `diviner .` /docs/ /docs/article/Database_Schema.html Copy the text to Word and proofread. Reviewers: epriestley, btrahan Reviewed By: epriestley CC: nh, jungejason, aran, Koolvin Differential Revision: https://secure.phabricator.com/D2258 2012-04-17 15:06:40 -07:00			`@title Database Schema`
			`@group developer`

			`This document describes key components of the database schema and should answer`
			`questions like how to store new types of data.`

			`= Database System =`

			`Phabricator uses MySQL with InnoDB engine. The only exception is the`
			`search_documentfield` table which uses MyISAM because MySQL doesn't support
			`fulltext search in InnoDB.`

			`Let us know if you need to use other database system: @{article:Give Feedback!`
			`Get Support!}.`

			`= PHP Drivers =`

			`Phabricator supports [[ http://www.php.net/book.mysql \| MySQL ]] and`
			`[[ http://www.php.net/book.mysqli \| MySQLi ]] PHP extensions. Most installations`
			`use MySQL but MySQLi should work equally well.`

			`= Databases =`

			`Each Phabricator application has its own database. The names are prefixed by`
			`phabricator_`. This design has two advantages:

			`* Each database is easier to comprehend and to maintain.`
			`* We don't do cross-database joins so each database can live on its own machine`
			`which is useful for load-balancing.`

			`= Connections =`

			`Phabricator specifies if it will use any opened connection just for reading or`
			`also for writing. This allows opening write connections to master and read`
			`connections to slave in master/slave replication. It is useful for`
			`load-balancing.`

			`= Tables =`

			`Each table name is prefixed by its application. For example, Differential`
			revisions are stored in database `phabricator_differential` and table
			`differential_revision`. This duplicity allows easy recognition of the table in
			`DarkConsole (see @{article:Using DarkConsole}) and other places.`

			`The exception is tables which share the same schema over different databases`
			such as `edge`.

Explain why we store table names in lower-case Test Plan: Read it. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Differential Revision: https://secure.phabricator.com/D4607 2013-01-23 16:59:14 -08:00			`We use lower-case table names with words separated by underscores. The reason is`
			that MySQL can be configured (with `lower_case_table_names`) to lower-case the
			`table names anyway.`
Document database schema Summary: Please review carefully, me not very well on English. I am just guessing what were the design decisions in most parts of this document. Feel free to correct me or add more information. Test Plan: `diviner .` /docs/ /docs/article/Database_Schema.html Copy the text to Word and proofread. Reviewers: epriestley, btrahan Reviewed By: epriestley CC: nh, jungejason, aran, Koolvin Differential Revision: https://secure.phabricator.com/D2258 2012-04-17 15:06:40 -07:00
			`= Column Names =`

			`Phabricator uses camelCase names for columns. The main advantage is that they`
			`directly map to properties in PHP classes.`

			Don't use MySQL reserved words (such as `order`) for column names.

			`= Data Types =`

			Phabricator uses `int unsigned` columns for storing dates instead of `date` or
			`datetime`. We don't need to care about time-zones in both MySQL and PHP because
			`of it. The other reason is that PHP internally uses numbers for storing dates.`

			`Phabricator uses UTF-8 encoding for storing all text data. We use`
			`utf8_general_ci` collation for free-text and `utf8_bin` for identifiers.

Document enum database type Test Plan: Read. Reviewers: epriestley Reviewed By: epriestley CC: aran, Koolvin Differential Revision: https://secure.phabricator.com/D2286 2012-04-19 15:13:52 -07:00			We don't use the `enum` data type because each change to the list of possible
			`values requires altering the table (which is slow with big tables). We use`
			`numbers (or short strings in some cases) mapped to PHP constants instead.`

Document database schema Summary: Please review carefully, me not very well on English. I am just guessing what were the design decisions in most parts of this document. Feel free to correct me or add more information. Test Plan: `diviner .` /docs/ /docs/article/Database_Schema.html Copy the text to Word and proofread. Reviewers: epriestley, btrahan Reviewed By: epriestley CC: nh, jungejason, aran, Koolvin Differential Revision: https://secure.phabricator.com/D2258 2012-04-17 15:06:40 -07:00			`= JSON =`

			`Some data don't require structured access - you don't need to filter or order by`
			`them. We store these data as text fields in JSON format. This approach has`
			`several advantages:`

			`* If we decide to add another unstructured field then we don't need to alter the`
			`table (which is slow for big tables in MySQL).`
			`* Table structure is not cluttered by fields which could be unused most of the`
			`time.`

			`An example of such usage can be found in column`
			`differential_diffproperty.data`.

			`= Primary Keys =`

			Most tables have auto-increment column named `id`. However creating such column
			`is not required for tables which are not usually directly referenced (such as`
			`tables expressing M:N relations). Example of such table is`
			`differential_relationship`.

			`= Indexes =`

			`Create all indexes necessary for fast query execution in most cases. Don't`
			`create indexes which are not used. You can analyze queries @{article:Using`
			`DarkConsole}.`

			`= Foreign Keys =`

			`We don't use InnoDB's foreign keys because our application is so great that`
			`no inconsistencies can arise. It will just slow us down.`

			`= PHIDs =`

			`Each globally referencable object in Phabricator has its associated PHID`
			`(Phabricator ID) which serves as a global identifier. We use PHIDs for`
Improve docs for changing schema Test Plan: $ diviner . Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Differential Revision: https://secure.phabricator.com/D3937 2012-11-08 20:44:45 -08:00			`referencing data in different databases.`
Document database schema Summary: Please review carefully, me not very well on English. I am just guessing what were the design decisions in most parts of this document. Feel free to correct me or add more information. Test Plan: `diviner .` /docs/ /docs/article/Database_Schema.html Copy the text to Word and proofread. Reviewers: epriestley, btrahan Reviewed By: epriestley CC: nh, jungejason, aran, Koolvin Differential Revision: https://secure.phabricator.com/D2258 2012-04-17 15:06:40 -07:00
			`We use both autoincrementing IDs and global PHIDs because each is useful in`
			`different contexts. Autoincrementing IDs are chronologically ordered and allow`
			`us to construct short, human-readable object names (like D2258) and URIs. Global`
			`PHIDs allow us to represent relationships between different types of objects in`
			`a homogenous way.`

Update database.schema doc to include how to upgrade the schema Summary: plus update the phid section a little Test Plan: read it, looked good. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T1287 Differential Revision: https://secure.phabricator.com/D3629 2012-10-04 15:26:16 -07:00			`For example, the concept of "subscribers" is more powerfully done with PHIDs`
			`because we could theoretically have users, projects, teams, and more all`
			`as "subscribers" of other objects. Using an ID column we would need to add a`
Improve docs for changing schema Test Plan: $ diviner . Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Differential Revision: https://secure.phabricator.com/D3937 2012-11-08 20:44:45 -08:00			`"type" column to avoid ID collision; using PHIDs does not require this`
Update database.schema doc to include how to upgrade the schema Summary: plus update the phid section a little Test Plan: read it, looked good. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T1287 Differential Revision: https://secure.phabricator.com/D3629 2012-10-04 15:26:16 -07:00			`additional column.`

Document transactions in docs Summary: Also add some formatting and links. Also fix test broken by D2393. Test Plan: diviner . Reviewers: epriestley Reviewed By: epriestley CC: aran, Koolvin Differential Revision: https://secure.phabricator.com/D2461 2012-05-11 15:05:15 -07:00			`= Transactions =`

			`Transactional code should be written using transactions. Example of such code is`
			`inserting multiple records where one doesn't make sense without the other or`
			`selecting data later used for update. See chapter in @{class:LiskDAO}.`

Document database schema Summary: Please review carefully, me not very well on English. I am just guessing what were the design decisions in most parts of this document. Feel free to correct me or add more information. Test Plan: `diviner .` /docs/ /docs/article/Database_Schema.html Copy the text to Word and proofread. Reviewers: epriestley, btrahan Reviewed By: epriestley CC: nh, jungejason, aran, Koolvin Differential Revision: https://secure.phabricator.com/D2258 2012-04-17 15:06:40 -07:00			`= Advanced Features =`

			`We don't use MySQL advanced features such as triggers, stored procedures or`
			`events because we like expressing the application logic in PHP more than in SQL.`
			`Some of these features (especially triggers) can also cause big confusion.`

			`Avoiding these advanced features is also good for supporting other database`
			`systems (which we don't support anyway).`

			`= Schema Denormalization =`

			`Phabricator uses schema denormalization for performance reasons sparingly. Try`
			`to avoid it if possible.`

Update database.schema doc to include how to upgrade the schema Summary: plus update the phid section a little Test Plan: read it, looked good. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T1287 Differential Revision: https://secure.phabricator.com/D3629 2012-10-04 15:26:16 -07:00			`= Changing the Schema =`

			`There are three simple steps to update the schema:`

Improve docs for changing schema Test Plan: $ diviner . Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Differential Revision: https://secure.phabricator.com/D3937 2012-11-08 20:44:45 -08:00			# Create a `.sql` file in `resources/sql/patches/`. This file should:
Update database.schema doc to include how to upgrade the schema Summary: plus update the phid section a little Test Plan: read it, looked good. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T1287 Differential Revision: https://secure.phabricator.com/D3629 2012-10-04 15:26:16 -07:00			`- Contain the approprate MySQL commands to update the schema.`
Improve docs for changing schema Test Plan: $ diviner . Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Differential Revision: https://secure.phabricator.com/D3937 2012-11-08 20:44:45 -08:00			- Use `${NAMESPACE}` rather than `phabricator` for database names.
			- Use `COLLATE utf8_bin` for any columns that are to be used as identifiers,
Update database.schema doc to include how to upgrade the schema Summary: plus update the phid section a little Test Plan: read it, looked good. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T1287 Differential Revision: https://secure.phabricator.com/D3629 2012-10-04 15:26:16 -07:00			such as PHID columns. Otherwise, use `COLLATE utf8_general_ci`.
Improve docs for changing schema Test Plan: $ diviner . Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Differential Revision: https://secure.phabricator.com/D3937 2012-11-08 20:44:45 -08:00			`- Name all indexes so it is possible to delete them later.`
			# Edit `src/infrastructure/storage/patch/PhabricatorBuiltinPatchList.php` and
Update database.schema doc to include how to upgrade the schema Summary: plus update the phid section a little Test Plan: read it, looked good. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T1287 Differential Revision: https://secure.phabricator.com/D3629 2012-10-04 15:26:16 -07:00			`add your patch to @{method@phabricator:PhabricatorBuiltinPatchList::getPatches}.`
Improve docs for changing schema Test Plan: $ diviner . Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Differential Revision: https://secure.phabricator.com/D3937 2012-11-08 20:44:45 -08:00			# Run `bin/storage/upgrade`.
Update database.schema doc to include how to upgrade the schema Summary: plus update the phid section a little Test Plan: read it, looked good. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T1287 Differential Revision: https://secure.phabricator.com/D3629 2012-10-04 15:26:16 -07:00
update documentation for storage changes w/ .php files Summary: no schema changes allowed in the .php files Test Plan: viewed doc -- looked good. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T2059 Differential Revision: https://secure.phabricator.com/D3966 2012-11-16 07:22:31 -08:00			`It is also possible to create more complex patches in PHP for data migration`
Explain why we store table names in lower-case Test Plan: Read it. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Differential Revision: https://secure.phabricator.com/D4607 2013-01-23 16:59:14 -08:00			`(due to schema changes or otherwise.) However, the schema changes themselves`
update documentation for storage changes w/ .php files Summary: no schema changes allowed in the .php files Test Plan: viewed doc -- looked good. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T2059 Differential Revision: https://secure.phabricator.com/D3966 2012-11-16 07:22:31 -08:00			should be done in separate `.sql` files. Order can be guaranteed by editing
			`src/infrastructure/storage/patch/PhabricatorBuiltinPatchList.php`
			`appropriately.`
Improve docs for changing schema Test Plan: $ diviner . Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Differential Revision: https://secure.phabricator.com/D3937 2012-11-08 20:44:45 -08:00
			`See the`
			`[[https://secure.phabricator.com/rPb39175342dc5bee0c2246b05fa277e76a7e96ed3`
Update database.schema doc to include how to upgrade the schema Summary: plus update the phid section a little Test Plan: read it, looked good. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T1287 Differential Revision: https://secure.phabricator.com/D3629 2012-10-04 15:26:16 -07:00			`\| commit adding policy storage for Paste ]] for a reasonable example of the code`
			`changes.`

Document database schema Summary: Please review carefully, me not very well on English. I am just guessing what were the design decisions in most parts of this document. Feel free to correct me or add more information. Test Plan: `diviner .` /docs/ /docs/article/Database_Schema.html Copy the text to Word and proofread. Reviewers: epriestley, btrahan Reviewed By: epriestley CC: nh, jungejason, aran, Koolvin Differential Revision: https://secure.phabricator.com/D2258 2012-04-17 15:06:40 -07:00			`= See Also =`

			`* @{class:LiskDAO}`
Update database.schema doc to include how to upgrade the schema Summary: plus update the phid section a little Test Plan: read it, looked good. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T1287 Differential Revision: https://secure.phabricator.com/D3629 2012-10-04 15:26:16 -07:00			`* @{class:PhabricatorPHID}`