mirror of
https://we.phorge.it/source/phorge.git
synced 2024-11-24 15:52:41 +01:00
18161d00a0
Summary: Ref T1191. General update of this document, which remains mostly accurate. Remove a warning. Test Plan: Read document. Reviewers: btrahan Reviewed By: btrahan Subscribers: epriestley Maniphest Tasks: T1191 Differential Revision: https://secure.phabricator.com/D10760
212 lines
8 KiB
Text
212 lines
8 KiB
Text
@title Database Schema
|
|
@group developer
|
|
|
|
This document describes key components of the database schema and should answer
|
|
questions like how to store new types of data.
|
|
|
|
Database System
|
|
===============
|
|
|
|
Phabricator uses MySQL or another MySQL-compatible database (like MariaDB
|
|
or Amazon RDS).
|
|
|
|
Phabricator the InnoDB table engine. The only exception is the
|
|
`search_documentfield` table which uses MyISAM because MySQL doesn't support
|
|
fulltext search in InnoDB (recent versions do, but we haven't added support
|
|
yet).
|
|
|
|
We are unlikely to ever support other incompatible databases like PostgreSQL or
|
|
SQLite.
|
|
|
|
PHP Drivers
|
|
===========
|
|
|
|
Phabricator supports [[ http://www.php.net/book.mysql | MySQL ]] and
|
|
[[ http://www.php.net/book.mysqli | MySQLi ]] PHP extensions.
|
|
|
|
Databases
|
|
=========
|
|
|
|
Each Phabricator application has its own database. The names are prefixed by
|
|
`phabricator_` (this is configurable). This design has two advantages:
|
|
|
|
- Each database is easier to comprehend and to maintain.
|
|
- We don't do cross-database joins so each database can live on its own
|
|
machine. This gives us flexibility in sharding data later.
|
|
|
|
Connections
|
|
===========
|
|
|
|
Phabricator specifies if it will use any opened connection just for reading or
|
|
also for writing. This allows opening write connections to a primary and read
|
|
connections to a replica in primary/replica setups (which are not actually
|
|
supported yet).
|
|
|
|
Tables
|
|
======
|
|
|
|
Most table names are prefixed by their application names. For example,
|
|
Differential revisions are stored in database `phabricator_differential` and
|
|
table `differential_revision`. This generally makes queries easier to recognize
|
|
and understand.
|
|
|
|
The exception is a few tables which share the same schema over different
|
|
databases such as `edge`.
|
|
|
|
We use lower-case table names with words separated by underscores.
|
|
|
|
Column Names
|
|
============
|
|
|
|
Phabricator uses `camelCase` names for columns. The main advantage is that they
|
|
directly map to properties in PHP classes.
|
|
|
|
Don't use MySQL reserved words (such as `order`) for column names.
|
|
|
|
Data Types
|
|
==========
|
|
|
|
Phabricator defines a set of abstract data types (like `uint32`, `epoch`, and
|
|
`phid`) which map to MySQL column types. The mapping depends on the MySQL
|
|
version.
|
|
|
|
Phabricator uses `utf8mb4` character sets where available (MySQL 5.5 or newer),
|
|
and `binary` character sets in most other cases. The primary motivation is to
|
|
allow 4-byte unicode characters to be stored (the `utf8` character set, which
|
|
is more widely available, does not support them). On newer MySQL, we use
|
|
`utf8mb4` to take advantage of improved collation rules.
|
|
|
|
Phabricator stores dates with an `epoch` abstract data type, which maps to
|
|
`int unsigned`. Although this makes dates less readable when browsing the
|
|
database, it makes date and time manipulation more consistent and
|
|
straightforward in the application.
|
|
|
|
We don't use the `enum` data type because each change to the list of possible
|
|
values requires altering the table (which is slow with big tables). We use
|
|
numbers (or short strings in some cases) mapped to PHP constants instead.
|
|
|
|
JSON and Other Serialized Data
|
|
==============================
|
|
|
|
Some data don't require structured access -- we don't need to filter or order by
|
|
them. We store these data as text fields in JSON format. This approach has
|
|
several advantages:
|
|
|
|
- If we decide to add another unstructured field then we don't need to alter
|
|
the table (which is slow for big tables in MySQL).
|
|
- Table structure is not cluttered by fields which could be unused most of the
|
|
time.
|
|
|
|
An example of such usage can be found in column
|
|
`differential_diffproperty.data`.
|
|
|
|
Primary Keys
|
|
============
|
|
|
|
Most tables have auto-increment column named `id`. Adding an ID column is
|
|
appropriate for most tables (even tables that have another natural unique key),
|
|
as it improves consistency and makes it easier to perform generic operations
|
|
on objects.
|
|
|
|
For example, @{class:LiskMigrationIterator} allows you to very easily apply a
|
|
migration to a table using a constant amount of memory provided the table has
|
|
an `id` column.
|
|
|
|
Indexes
|
|
======
|
|
|
|
Create all indexes necessary for fast query execution in most cases. Don't
|
|
create indexes which are not used. You can analyze queries @{article:Using
|
|
DarkConsole}.
|
|
|
|
Older MySQL versions are not able to use indexes for tuple search:
|
|
`(a, b) IN ((%s, %d), (%s, %d))`. Use `AND` and `OR` instead:
|
|
`((a = %s AND b = %d) OR (a = %s AND b = %d))`.
|
|
|
|
Foreign Keys
|
|
============
|
|
|
|
We don't use foreign keys because they're complicated and we haven't experienced
|
|
significant issues with data inconsistency that foreign keys could help prevent.
|
|
Empirically, we have witnessed first hand as `ON DELETE CASCADE` relationships
|
|
accidentally destroy huge amounts of data. We may pursue foreign keys
|
|
eventually, but there isn't a strong case for them at the present time.
|
|
|
|
PHIDs
|
|
=====
|
|
|
|
Each globally referencable object in Phabricator has its associated PHID
|
|
("Phabricator ID") which serves as a global identifier, similar to a GUID.
|
|
We use PHIDs for referencing data in different databases.
|
|
|
|
We use both autoincrementing IDs and global PHIDs because each is useful in
|
|
different contexts. Autoincrementing IDs are meaningfully ordered and allow
|
|
us to construct short, human-readable object names (like `D2258`) and URIs.
|
|
Global PHIDs allow us to represent relationships between different types of
|
|
objects in a homogeneous way.
|
|
|
|
For example, infrastructure like "subscribers" can be implemented easily with
|
|
PHID relationships: different types of objects (users, projects, mailing lists)
|
|
are permitted to subscribe to different types of objects (revisions, tasks,
|
|
etc). Without PHIDs, we would need to add a "type" column to avoid ID collision;
|
|
using PHIDs makes implementing features like this simpler.
|
|
|
|
Transactions
|
|
============
|
|
|
|
Transactional code should be written using transactions. Example of such code is
|
|
inserting multiple records where one doesn't make sense without the other or
|
|
selecting data later used for update. See chapter in @{class:LiskDAO}.
|
|
|
|
Advanced Features
|
|
=================
|
|
|
|
We don't use MySQL advanced features such as triggers, stored procedures or
|
|
events because we like expressing the application logic in PHP more than in SQL.
|
|
Some of these features (especially triggers) can also cause a great deal of
|
|
confusion, and are generally more difficult to debug, profile, version control,
|
|
update, and understand than application code.
|
|
|
|
Schema Denormalization
|
|
======================
|
|
|
|
Phabricator uses schema denormalization sparingly. Avoid denormalization unless
|
|
there is a compelling reason (usually, performance) to denormalize.
|
|
|
|
Schema Changes and Migrations
|
|
=============================
|
|
|
|
To create a new schema change or migration:
|
|
|
|
**Create a database patch**. Database patches go in
|
|
`resources/sql/autopatches/`. To change a schema, use a `.sql` file and write
|
|
in SQL. To perform a migration, use a `.php` file and write in PHP. Name your
|
|
file `YYYYMMDD.patchname.ext`. For example, `20141225.christmas.sql`.
|
|
|
|
**Keep patches small**. Most schema change statements are not transactional. If
|
|
a patch contains several SQL statements and fails partway through, it normally
|
|
can not be rolled back. When a user tries to apply the patch again later, the
|
|
first statement (which, for example, adds a column) may fail (because the column
|
|
already exists). This can be avoided by keeping patches small (generally, one
|
|
statement per patch).
|
|
|
|
**Use namespace and character set variables**. When defining a `.sql` patch,
|
|
you should use these variables instead of hard-coding namespaces or character
|
|
set names:
|
|
|
|
| Variable | Meaning | Notes |
|
|
|---|---|---|
|
|
| {$NAMESPACE} | Storage Namespace | Defaults to `phabricator` |
|
|
| {$CHARSET} | Default Charset | Mostly used to specify table charset |
|
|
| {$COLLATE_TEXT} | Text Collation | For most text (case-sensitive) |
|
|
| {$COLLATE_SORT} | Sort Collation | For sortable text (case-insensitive) |
|
|
| {$CHARSET_FULLTEXT} | Fulltext Charset | Specify explicitly for fulltext |
|
|
| {$COLLATE_FULLTEXT} | Fulltext Collate | Specify explicitly for fulltext |
|
|
|
|
|
|
**Test your patch**. Run `bin/storage upgrade` to test your patch.
|
|
|
|
See Also
|
|
========
|
|
|
|
- @{class:LiskDAO}
|