mirror of https://we.phorge.it/source/phorge.git synced 2024-11-24 15:52:41 +01:00
phorge-phorge/src/docs/contributor/database.diviner
epriestley 18161d00a0 Update some storage documentation for new adjustment workflows
Summary: Ref T1191. General update of this document, which remains mostly accurate. Remove a warning.

Test Plan: Read document.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley

Maniphest Tasks: T1191

Differential Revision: https://secure.phabricator.com/D10760
2014-11-01 08:29:37 -07:00


@title Database Schema
@group developer

This document describes key components of the database schema and should answer
questions like how to store new types of data.

Database System
===============

Phabricator uses MySQL or another MySQL-compatible database (like MariaDB
or Amazon RDS).

Phabricator uses the InnoDB table engine. The only exception is the
`search_documentfield` table, which uses MyISAM because MySQL doesn't support
fulltext search in InnoDB (recent versions do, but we haven't added support
yet).

We are unlikely to ever support other incompatible databases like PostgreSQL or
SQLite.

PHP Drivers
===========

Phabricator supports both the [[ http://www.php.net/book.mysql | MySQL ]] and
[[ http://www.php.net/book.mysqli | MySQLi ]] PHP extensions.

Databases
=========

Each Phabricator application has its own database. Database names are prefixed
with `phabricator_` (the prefix is configurable). This design has two
advantages:

- Each database is easier to comprehend and to maintain.
- We don't do cross-database joins, so each database can live on its own
  machine. This gives us flexibility in sharding data later.
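
To make the naming scheme concrete, here is a minimal Python sketch. It assumes the database name is simply the configured namespace, an underscore, and the application name; `database_name()` is a hypothetical helper, not a Phabricator API.

```python
def database_name(application: str, namespace: str = "phabricator") -> str:
    """Return the per-application database name (hypothetical helper).

    `namespace` stands in for the configurable prefix; "phabricator" is the
    default described above.
    """
    return f"{namespace}_{application}"


print(database_name("differential"))          # phabricator_differential
print(database_name("maniphest", "staging"))  # staging_maniphest
```

Because each application's tables live in their own database and no query joins across databases, moving one of these databases to another host does not require changing any query.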

Connections
===========

When Phabricator opens a connection, it specifies whether the connection will be
used only for reading or also for writing. This allows write connections to go
to a primary and read connections to go to a replica in primary/replica setups
(which are not actually supported yet).
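
The routing that a declared read/write mode makes possible might look like the Python sketch below. The host names, the `Mode` enum, and `choose_host()` are all hypothetical and are not part of Phabricator; since replicas are not supported yet, this only illustrates the idea.

```python
from enum import Enum


class Mode(Enum):
    """The intent a caller declares when it asks for a connection."""
    READ = "r"
    WRITE = "w"


# Hypothetical hosts; a real deployment would configure these elsewhere.
PRIMARY = "db-primary.example.com"
REPLICA = "db-replica.example.com"


def choose_host(mode: Mode, replica_available: bool = True) -> str:
    """Route a connection by its declared mode.

    Writes always go to the primary. Reads go to the replica when one
    exists and fall back to the primary otherwise.
    """
    if mode is Mode.WRITE or not replica_available:
        return PRIMARY
    return REPLICA


print(choose_host(Mode.READ))                           # db-replica.example.com
print(choose_host(Mode.WRITE))                          # db-primary.example.com
print(choose_host(Mode.READ, replica_available=False))  # db-primary.example.com
```

The key design point is that the caller states its intent up front; without that, the connection layer could not safely send any query to a replica.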

Tables
======

Most table names are prefixed by their application names. For example,
Differential revisions are stored in database `phabricator_differential` and
table `differential_revision`. This generally makes queries easier to recognize
and understand.

The exceptions are a few tables, such as `edge`, which share the same schema
across several databases.

We use lower-case table names with words separated by underscores.
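
As a sketch of these conventions, the hypothetical Python check below accepts lower-case, underscore-separated names that start with the application name. `SHARED_TABLES` lists only `edge` because that is the one example named here; the real set of shared-schema tables may be larger.

```python
import re

# Lower-case words separated by underscores, e.g. "differential_revision".
TABLE_NAME = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")

# Tables sharing one schema across databases keep an unprefixed name.
# Only "edge" is taken from the text; this set is an assumption.
SHARED_TABLES = {"edge"}


def follows_table_convention(application: str, table: str) -> bool:
    """Return True if `table` matches the naming rules for `application`."""
    if not TABLE_NAME.match(table):
        return False
    return table in SHARED_TABLES or table.startswith(application + "_")


print(follows_table_convention("differential", "differential_revision"))  # True
print(follows_table_convention("differential", "edge"))                   # True
print(follows_table_convention("differential", "DifferentialRevision"))   # False
```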

Column Names
============

Phabricator uses `camelCase` names for columns. The main advantage is that they
map directly to properties in PHP classes.

Don't use MySQL reserved words (such as `order`) for column names.
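
Both rules can be checked mechanically. In the Python sketch below, `check_column()` is a hypothetical helper, and `RESERVED` is a small illustrative subset of MySQL's reserved words, not the full list. A column such as `dateCreated` passes and maps directly to a PHP property of the same name.

```python
import re

# A small, illustrative subset of MySQL reserved words (not the full list).
RESERVED = {"order", "group", "key", "select", "where", "index", "desc"}

# camelCase: starts lower-case, no underscores or other separators.
CAMEL_CASE = re.compile(r"^[a-z][a-zA-Z0-9]*$")


def check_column(name: str) -> list[str]:
    """Return a list of convention problems for a column name."""
    problems = []
    if not CAMEL_CASE.match(name):
        problems.append("not camelCase")
    if name.lower() in RESERVED:
        problems.append("MySQL reserved word")
    return problems


print(check_column("dateCreated"))   # []
print(check_column("date_created"))  # ['not camelCase']
print(check_column("order"))         # ['MySQL reserved word']
```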

Data Types
==========

Phabricator defines a set of abstract data types (like `uint32`, `epoch`, and
`phid`) which map to MySQL column types. The mapping depends on the MySQL
version.

Phabricator uses `utf8mb4` character sets where available (MySQL 5.5 or newer),
and `binary` character sets in most other cases. The primary motivation is to
allow 4-byte Unicode characters (such as emoji) to be stored: the `utf8`
character set, which is more widely available, stores at most 3 bytes per
character and cannot represent them. Where `utf8mb4` is available, we also
benefit from its improved collation rules.
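
The 3-byte limit is easy to see from Python, which encodes strings to UTF-8 directly. The sketch below uses a hypothetical `needs_utf8mb4()` helper: any code point above U+FFFF takes 4 bytes in UTF-8 and so cannot be stored in a MySQL `utf8` column.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if `text` contains characters outside the Basic Multilingual Plane.

    Such characters take 4 bytes in UTF-8, which MySQL's 3-byte `utf8`
    character set cannot store.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


print(len("é".encode("utf-8")))     # 2
print(len("😀".encode("utf-8")))    # 4
print(needs_utf8mb4("café"))        # False
print(needs_utf8mb4("ship it 🚀"))  # True
```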

Phabricator stores dates with an `epoch` abstract data type, which maps to
`int unsigned`. Although this makes dates less readable when browsing the
database, it makes date and time manipulation more consistent and
straightforward in the application.
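
A minimal Python sketch of why integer epochs are convenient: conversion happens only at the edges, and everything in between is integer arithmetic. `to_epoch()` and `from_epoch()` are hypothetical helpers, not Phabricator functions.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def to_epoch(dt: datetime) -> int:
    """Convert a timezone-aware datetime to the integer stored in an `epoch` column."""
    return int(dt.timestamp())


def from_epoch(value: int) -> datetime:
    """Convert an `epoch` column value back to a UTC datetime for display."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


created = to_epoch(datetime(2014, 11, 1, tzinfo=timezone.utc))
# Date arithmetic and comparisons are plain integer operations.
due = created + 7 * SECONDS_PER_DAY
print(from_epoch(due).isoformat())  # 2014-11-08T00:00:00+00:00
```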

We don't use the `enum` data type because each change to the list of possible
values requires altering the table (which is slow with big tables). Instead, we
use numbers (or, in some cases, short strings) mapped to PHP constants.
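
The pattern looks roughly like the Python sketch below, where a class with constants stands in for a PHP class. `TaskPriority` and its values are hypothetical. The column stores only the integer, so adding a value is a code change rather than an `ALTER TABLE`.

```python
class TaskPriority:
    """Hypothetical constants standing in for a PHP class's constants."""

    LOW = 25
    NORMAL = 50
    HIGH = 80

    # Human-readable names, kept in application code rather than in the schema.
    NAMES = {LOW: "Low", NORMAL: "Normal", HIGH: "High"}


row = {"id": 1, "priority": TaskPriority.HIGH}  # the column stores only 80
print(TaskPriority.NAMES[row["priority"]])      # High
```

Leaving gaps between the numbers is one way to let a new value sort between existing ones later without renumbering stored rows.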

JSON and Other Serialized Data
==============================

Some data don't require structured access: we never need to filter or order by
them. We store such data in text columns, serialized as JSON. This approach has
several advantages:

- If we decide to add another unstructured field, we don't need to alter the
  table (which is slow for big tables in MySQL).
- The table structure is not cluttered by fields which could be unused most of
  the time.

An example of such usage can be found in the column
`differential_diffproperty.data`.
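
A minimal Python sketch of the round trip. The row shape and the property keys are hypothetical and are not the real `differential_diffproperty` contents; the point is that new keys only change the serialized text, never the table definition.

```python
import json

# Hypothetical row for a table with a JSON-encoded `data` text column.
properties = {"lint": "okay", "commits": ["a1b2c3"]}
row = {"id": 1, "name": "local:status", "data": json.dumps(properties, sort_keys=True)}

# Adding a new key later needs no ALTER TABLE: only the stored text changes.
decoded = json.loads(row["data"])
decoded["unit"] = "pass"
row["data"] = json.dumps(decoded, sort_keys=True)

print(row["data"])  # {"commits": ["a1b2c3"], "lint": "okay", "unit": "pass"}
```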

Primary Keys
============

Most tables have an auto-increment column named `id`. Adding an ID column is
appropriate for most tables (even tables that have another natural unique key),
as it improves consistency and makes it easier to perform generic operations
on objects.

For example, @{class:LiskMigrationIterator} allows you to apply a migration to
a table using a constant amount of memory, provided the table has an `id`
column.
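
The technique an `id` column enables is keyset pagination: fetch a page, remember the last `id`, and resume after it. The Python sketch below is not @{class:LiskMigrationIterator}; it uses the standard `sqlite3` module as a stand-in for MySQL, `iterate_by_id()` is a hypothetical helper, and it assumes `id` is the first column.

```python
import sqlite3
from typing import Iterator


def iterate_by_id(conn: sqlite3.Connection, table: str, page_size: int = 100) -> Iterator[tuple]:
    """Yield every row of `table` in id order, holding at most one page in memory.

    `table` is interpolated into the SQL, so it must be a trusted identifier.
    Assumes `id` is the first column of each row.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]  # resume after the last id seen


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)")
conn.executemany("INSERT INTO task (title) VALUES (?)", [(f"task {i}",) for i in range(250)])
print(sum(1 for _ in iterate_by_id(conn, "task", page_size=100)))  # 250
```

Because each query starts from an indexed `id` rather than an `OFFSET`, later pages are as cheap as the first.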

Indexes
=======

Create the indexes needed to make common queries fast, but don't create indexes
which are not used. You can analyze queries with DarkConsole (see
@{article:Using DarkConsole}).

Older MySQL versions are not able to use indexes for tuple searches like
`(a, b) IN ((%s, %d), (%s, %d))`. Use `AND` and `OR` instead:
`((a = %s AND b = %d) OR (a = %s AND b = %d))`.
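
When the number of tuples varies, the `AND`/`OR` form can be generated. The hypothetical Python helper below only builds the placeholder pattern; the `%s`/`%d` placeholders follow the examples above, and the values would still be bound separately by the query layer.

```python
def tuple_match_clause(columns: list[str], placeholders: list[str], count: int) -> str:
    """Build '((a = %s AND b = %d) OR ...)' instead of '(a, b) IN (...)'."""
    group = " AND ".join(f"{c} = {p}" for c, p in zip(columns, placeholders))
    return "(" + " OR ".join(f"({group})" for _ in range(count)) + ")"


print(tuple_match_clause(["a", "b"], ["%s", "%d"], 2))
# ((a = %s AND b = %d) OR (a = %s AND b = %d))
```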

Foreign Keys
============

We don't use foreign keys because they're complicated and we haven't experienced
significant issues with data inconsistency that foreign keys could help prevent.
Empirically, we have witnessed firsthand `ON DELETE CASCADE` relationships
accidentally destroying huge amounts of data. We may pursue foreign keys
eventually, but there isn't a strong case for them at the present time.

PHIDs
=====

Each globally referenceable object in Phabricator has an associated PHID
("Phabricator ID") which serves as a global identifier, similar to a GUID.
We use PHIDs for referencing data in different databases.

We use both autoincrementing IDs and global PHIDs because each is useful in
different contexts. Autoincrementing IDs are meaningfully ordered and allow
us to construct short, human-readable object names (like `D2258`) and URIs.
Global PHIDs allow us to represent relationships between different types of
objects in a homogeneous way.

For example, infrastructure like "subscribers" can be implemented easily with
PHID relationships: different types of objects (users, projects, mailing lists)
are permitted to subscribe to different types of objects (revisions, tasks,
etc.). Without PHIDs, we would need to add a "type" column to avoid ID
collisions; using PHIDs makes implementing features like this simpler.
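
The subscriber example can be sketched in Python as a single set of `(objectPHID, subscriberPHID)` pairs. The `PHID-TYPE-random` format, the type codes, and `new_phid()` are illustrative assumptions here, not a description of Phabricator's actual generator.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits


def new_phid(type_code: str) -> str:
    """Build an illustrative PHID such as 'PHID-TASK-<20 random chars>'."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(20))
    return f"PHID-{type_code}-{suffix}"


# One relationship set covers every combination of object and subscriber type;
# because the type is part of the identifier, no separate "type" column is needed.
subscriptions: set[tuple[str, str]] = set()

revision, task = new_phid("DREV"), new_phid("TASK")
user, project = new_phid("USER"), new_phid("PROJ")

subscriptions.add((revision, user))
subscriptions.add((task, project))


def subscribers_of(object_phid: str) -> list[str]:
    return sorted(s for o, s in subscriptions if o == object_phid)


print([phid.split("-")[1] for phid in subscribers_of(task)])  # ['PROJ']
```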

Transactions
============

Code that makes several related changes should run them inside a transaction.
Examples include inserting multiple records where one doesn't make sense
without the others, and selecting data which will later be used in an update.
See the chapter on transactions in @{class:LiskDAO}.
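
The all-or-nothing behavior looks like this in a minimal Python sketch. It uses the standard `sqlite3` module as a stand-in for MySQL, and the table layout and `create_revision()` are hypothetical; in Phabricator itself this goes through @{class:LiskDAO}.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("CREATE TABLE revision_reviewer (revisionID INTEGER NOT NULL, reviewer TEXT NOT NULL)")


def create_revision(conn: sqlite3.Connection, title: str, reviewers: list) -> int:
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so a revision is never saved without all of its reviewers.
    with conn:
        cur = conn.execute("INSERT INTO revision (title) VALUES (?)", (title,))
        for reviewer in reviewers:
            conn.execute(
                "INSERT INTO revision_reviewer (revisionID, reviewer) VALUES (?, ?)",
                (cur.lastrowid, reviewer),
            )
        return cur.lastrowid


create_revision(conn, "Fix bug", ["alice", "bob"])
try:
    create_revision(conn, "Broken", ["carol", None])  # NULL violates NOT NULL
except sqlite3.IntegrityError:
    pass  # the "Broken" revision and carol's row were rolled back

print(conn.execute("SELECT COUNT(*) FROM revision").fetchone()[0])  # 1
```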

Advanced Features
=================

We don't use advanced MySQL features such as triggers, stored procedures or
events, because we prefer expressing application logic in PHP rather than in
SQL. Some of these features (especially triggers) can also cause a great deal
of confusion, and are generally more difficult to debug, profile, version
control, update, and understand than application code.

Schema Denormalization
======================

Phabricator uses schema denormalization sparingly. Avoid denormalization unless
there is a compelling reason (usually, performance) to denormalize.

Schema Changes and Migrations
=============================

To create a new schema change or migration:

**Create a database patch**. Database patches go in
`resources/sql/autopatches/`. To change a schema, use a `.sql` file and write
in SQL. To perform a migration, use a `.php` file and write in PHP. Name your
file `YYYYMMDD.patchname.ext`. For example, `20141225.christmas.sql`.
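
A quick way to sanity-check a patch filename before committing it is sketched below in Python. `parse_patch_name()` is a hypothetical helper, and allowing dots inside the patch name is an assumption; it is not the check `bin/storage` performs.

```python
import re
from datetime import datetime

# YYYYMMDD, a dot, a patch name, a dot, and either "sql" or "php".
PATCH_NAME = re.compile(r"^(\d{8})\.([A-Za-z0-9_.-]+)\.(sql|php)$")


def parse_patch_name(filename: str) -> tuple[str, str, str]:
    """Return (date, name, kind) for 'YYYYMMDD.patchname.ext', or raise ValueError."""
    match = PATCH_NAME.match(filename)
    if not match:
        raise ValueError(f"bad patch name: {filename}")
    day, name, ext = match.groups()
    datetime.strptime(day, "%Y%m%d")  # raises ValueError for impossible dates
    kind = "schema change (SQL)" if ext == "sql" else "migration (PHP)"
    return day, name, kind


print(parse_patch_name("20141225.christmas.sql"))
# ('20141225', 'christmas', 'schema change (SQL)')
```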

**Keep patches small**. Most schema change statements are not transactional. If
a patch contains several SQL statements and fails partway through, it normally
cannot be rolled back. When a user tries to apply the patch again later, the
first statement (which, for example, adds a column) may fail (because the column
already exists). This can be avoided by keeping patches small (generally, one
statement per patch).

**Use namespace and character set variables**. When defining a `.sql` patch,
use these variables instead of hard-coding namespaces or character set names:

| Variable | Meaning | Notes |
|---|---|---|
| {$NAMESPACE} | Storage Namespace | Defaults to `phabricator` |
| {$CHARSET} | Default Charset | Mostly used to specify table charset |
| {$COLLATE_TEXT} | Text Collation | For most text (case-sensitive) |
| {$COLLATE_SORT} | Sort Collation | For sortable text (case-insensitive) |
| {$CHARSET_FULLTEXT} | Fulltext Charset | Specify explicitly for fulltext |
| {$COLLATE_FULLTEXT} | Fulltext Collation | Specify explicitly for fulltext |
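
To show how a patch written with these variables becomes concrete SQL, here is a minimal Python sketch of the substitution. The variable values, the `example` database, and `expand_patch()` are assumptions for a server with `utf8mb4` support; `bin/storage` chooses the real values itself.

```python
import re

# Hypothetical values for a server with utf8mb4 support (a subset of the table).
PATCH_VARIABLES = {
    "NAMESPACE": "phabricator",
    "CHARSET": "utf8mb4",
    "COLLATE_TEXT": "utf8mb4_bin",
    "COLLATE_SORT": "utf8mb4_unicode_ci",
}

VARIABLE = re.compile(r"\{\$([A-Z_]+)\}")


def expand_patch(sql: str, variables: dict[str, str]) -> str:
    """Replace each {$NAME} placeholder; unknown names raise KeyError."""
    return VARIABLE.sub(lambda m: variables[m.group(1)], sql)


patch = (
    "CREATE TABLE {$NAMESPACE}_example.example_widget (\n"
    "  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n"
    "  name VARCHAR(255) NOT NULL COLLATE {$COLLATE_SORT}\n"
    ") ENGINE=InnoDB DEFAULT CHARSET={$CHARSET} COLLATE={$COLLATE_TEXT};"
)
print(expand_patch(patch, PATCH_VARIABLES))
```

Raising on an unknown variable, rather than leaving it in place, keeps a typo in a patch from reaching MySQL as literal text.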

**Test your patch**. Run `bin/storage upgrade` to test your patch.

See Also
========

- @{class:LiskDAO}