mirror of
https://we.phorge.it/source/phorge.git
synced 2025-01-10 14:51:06 +01:00
Update some storage documentation for new adjustment workflows
Summary: Ref T1191. General update of this document, which remains mostly accurate. Remove a warning. Test Plan: Read document. Reviewers: btrahan Reviewed By: btrahan Subscribers: epriestley Maniphest Tasks: T1191 Differential Revision: https://secure.phabricator.com/D10760
This commit is contained in:
parent
f5c426639c
commit
18161d00a0
2 changed files with 135 additions and 104 deletions
|
@ -4,93 +4,116 @@
|
||||||
This document describes key components of the database schema and should answer
|
This document describes key components of the database schema and should answer
|
||||||
questions like how to store new types of data.
|
questions like how to store new types of data.
|
||||||
|
|
||||||
= Database System =
|
Database System
|
||||||
|
===============
|
||||||
|
|
||||||
Phabricator uses MySQL with InnoDB engine. The only exception is the
|
Phabricator uses MySQL or another MySQL-compatible database (like MariaDB
|
||||||
|
or Amazon RDS).
|
||||||
|
|
||||||
|
Phabricator the InnoDB table engine. The only exception is the
|
||||||
`search_documentfield` table which uses MyISAM because MySQL doesn't support
|
`search_documentfield` table which uses MyISAM because MySQL doesn't support
|
||||||
fulltext search in InnoDB.
|
fulltext search in InnoDB (recent versions do, but we haven't added support
|
||||||
|
yet).
|
||||||
|
|
||||||
Let us know if you need to use other database system: @{article:Give Feedback!
|
We are unlikely to ever support other incompatible databases like PostgreSQL or
|
||||||
Get Support!}.
|
SQLite.
|
||||||
|
|
||||||
= PHP Drivers =
|
PHP Drivers
|
||||||
|
===========
|
||||||
|
|
||||||
Phabricator supports [[ http://www.php.net/book.mysql | MySQL ]] and
|
Phabricator supports [[ http://www.php.net/book.mysql | MySQL ]] and
|
||||||
[[ http://www.php.net/book.mysqli | MySQLi ]] PHP extensions. Most installations
|
[[ http://www.php.net/book.mysqli | MySQLi ]] PHP extensions.
|
||||||
use MySQL but MySQLi should work equally well.
|
|
||||||
|
|
||||||
= Databases =
|
Databases
|
||||||
|
=========
|
||||||
|
|
||||||
Each Phabricator application has its own database. The names are prefixed by
|
Each Phabricator application has its own database. The names are prefixed by
|
||||||
`phabricator_`. This design has two advantages:
|
`phabricator_` (this is configurable). This design has two advantages:
|
||||||
|
|
||||||
* Each database is easier to comprehend and to maintain.
|
- Each database is easier to comprehend and to maintain.
|
||||||
* We don't do cross-database joins so each database can live on its own machine
|
- We don't do cross-database joins so each database can live on its own
|
||||||
which is useful for load-balancing.
|
machine. This gives us flexibility in sharding data later.
|
||||||
|
|
||||||
= Connections =
|
Connections
|
||||||
|
===========
|
||||||
|
|
||||||
Phabricator specifies if it will use any opened connection just for reading or
|
Phabricator specifies if it will use any opened connection just for reading or
|
||||||
also for writing. This allows opening write connections to master and read
|
also for writing. This allows opening write connections to a primary and read
|
||||||
connections to slave in master/slave replication. It is useful for
|
connections to a replica in primary/replica setups (which are not actually
|
||||||
load-balancing.
|
supported yet).
|
||||||
|
|
||||||
= Tables =
|
Tables
|
||||||
|
======
|
||||||
|
|
||||||
Each table name is prefixed by its application. For example, Differential
|
Most table names are prefixed by their application names. For example,
|
||||||
revisions are stored in database `phabricator_differential` and table
|
Differential revisions are stored in database `phabricator_differential` and
|
||||||
`differential_revision`. This duplicity allows easy recognition of the table in
|
table `differential_revision`. This generally makes queries easier to recognize
|
||||||
DarkConsole (see @{article:Using DarkConsole}) and other places.
|
and understand.
|
||||||
|
|
||||||
The exception is tables which share the same schema over different databases
|
The exception is a few tables which share the same schema over different
|
||||||
such as `edge`.
|
databases such as `edge`.
|
||||||
|
|
||||||
We use lower-case table names with words separated by underscores. The reason is
|
We use lower-case table names with words separated by underscores.
|
||||||
that MySQL can be configured (with `lower_case_table_names`) to lower-case the
|
|
||||||
table names anyway.
|
|
||||||
|
|
||||||
= Column Names =
|
Column Names
|
||||||
|
============
|
||||||
|
|
||||||
Phabricator uses camelCase names for columns. The main advantage is that they
|
Phabricator uses `camelCase` names for columns. The main advantage is that they
|
||||||
directly map to properties in PHP classes.
|
directly map to properties in PHP classes.
|
||||||
|
|
||||||
Don't use MySQL reserved words (such as `order`) for column names.
|
Don't use MySQL reserved words (such as `order`) for column names.
|
||||||
|
|
||||||
= Data Types =
|
Data Types
|
||||||
|
==========
|
||||||
|
|
||||||
Phabricator uses `int unsigned` columns for storing dates instead of `date` or
|
Phabricator defines a set of abstract data types (like `uint32`, `epoch`, and
|
||||||
`datetime`. We don't need to care about time-zones in both MySQL and PHP because
|
`phid`) which map to MySQL column types. The mapping depends on the MySQL
|
||||||
of it. The other reason is that PHP internally uses numbers for storing dates.
|
version.
|
||||||
|
|
||||||
Phabricator uses UTF-8 encoding for storing all text data. We use
|
Phabricator uses `utf8mb4` character sets where available (MySQL 5.5 or newer),
|
||||||
`utf8_general_ci` collation for free-text and `utf8_bin` for identifiers.
|
and `binary` character sets in most other cases. The primary motivation is to
|
||||||
|
allow 4-byte unicode characters to be stored (the `utf8` character set, which
|
||||||
|
is more widely available, does not support them). On newer MySQL, we use
|
||||||
|
`utf8mb4` to take advantage of improved collation rules.
|
||||||
|
|
||||||
|
Phabricator stores dates with an `epoch` abstract data type, which maps to
|
||||||
|
`int unsigned`. Although this makes dates less readable when browsing the
|
||||||
|
database, it makes date and time manipulation more consistent and
|
||||||
|
straightforward in the application.
|
||||||
|
|
||||||
We don't use the `enum` data type because each change to the list of possible
|
We don't use the `enum` data type because each change to the list of possible
|
||||||
values requires altering the table (which is slow with big tables). We use
|
values requires altering the table (which is slow with big tables). We use
|
||||||
numbers (or short strings in some cases) mapped to PHP constants instead.
|
numbers (or short strings in some cases) mapped to PHP constants instead.
|
||||||
|
|
||||||
= JSON =
|
JSON and Other Serialized Data
|
||||||
|
==============================
|
||||||
|
|
||||||
Some data don't require structured access - you don't need to filter or order by
|
Some data don't require structured access -- we don't need to filter or order by
|
||||||
them. We store these data as text fields in JSON format. This approach has
|
them. We store these data as text fields in JSON format. This approach has
|
||||||
several advantages:
|
several advantages:
|
||||||
|
|
||||||
* If we decide to add another unstructured field then we don't need to alter the
|
- If we decide to add another unstructured field then we don't need to alter
|
||||||
table (which is slow for big tables in MySQL).
|
the table (which is slow for big tables in MySQL).
|
||||||
* Table structure is not cluttered by fields which could be unused most of the
|
- Table structure is not cluttered by fields which could be unused most of the
|
||||||
time.
|
time.
|
||||||
|
|
||||||
An example of such usage can be found in column
|
An example of such usage can be found in column
|
||||||
`differential_diffproperty.data`.
|
`differential_diffproperty.data`.
|
||||||
|
|
||||||
= Primary Keys =
|
Primary Keys
|
||||||
|
============
|
||||||
|
|
||||||
Most tables have auto-increment column named `id`. However creating such column
|
Most tables have auto-increment column named `id`. Adding an ID column is
|
||||||
is not required for tables which are not usually directly referenced (such as
|
appropriate for most tables (even tables that have another natural unique key),
|
||||||
tables expressing M:N relations). Example of such table is
|
as it improves consistency and makes it easier to perform generic operations
|
||||||
`differential_relationship`.
|
on objects.
|
||||||
|
|
||||||
= Indexes =
|
For example, @{class:LiskMigrationIterator} allows you to very easily apply a
|
||||||
|
migration to a table using a constant amount of memory provided the table has
|
||||||
|
an `id` column.
|
||||||
|
|
||||||
|
Indexes
|
||||||
|
======
|
||||||
|
|
||||||
Create all indexes necessary for fast query execution in most cases. Don't
|
Create all indexes necessary for fast query execution in most cases. Don't
|
||||||
create indexes which are not used. You can analyze queries @{article:Using
|
create indexes which are not used. You can analyze queries @{article:Using
|
||||||
|
@ -100,77 +123,90 @@ Older MySQL versions are not able to use indexes for tuple search:
|
||||||
`(a, b) IN ((%s, %d), (%s, %d))`. Use `AND` and `OR` instead:
|
`(a, b) IN ((%s, %d), (%s, %d))`. Use `AND` and `OR` instead:
|
||||||
`((a = %s AND b = %d) OR (a = %s AND b = %d))`.
|
`((a = %s AND b = %d) OR (a = %s AND b = %d))`.
|
||||||
|
|
||||||
= Foreign Keys =
|
Foreign Keys
|
||||||
|
============
|
||||||
|
|
||||||
We don't use InnoDB's foreign keys because our application is so great that
|
We don't use foreign keys because they're complicated and we haven't experienced
|
||||||
no inconsistencies can arise. It will just slow us down.
|
significant issues with data inconsistency that foreign keys could help prevent.
|
||||||
|
Empirically, we have witnessed first hand as `ON DELETE CASCADE` relationships
|
||||||
|
accidentally destroy huge amounts of data. We may pursue foreign keys
|
||||||
|
eventually, but there isn't a strong case for them at the present time.
|
||||||
|
|
||||||
= PHIDs =
|
PHIDs
|
||||||
|
=====
|
||||||
|
|
||||||
Each globally referencable object in Phabricator has its associated PHID
|
Each globally referencable object in Phabricator has its associated PHID
|
||||||
(Phabricator ID) which serves as a global identifier. We use PHIDs for
|
("Phabricator ID") which serves as a global identifier, similar to a GUID.
|
||||||
referencing data in different databases.
|
We use PHIDs for referencing data in different databases.
|
||||||
|
|
||||||
We use both autoincrementing IDs and global PHIDs because each is useful in
|
We use both autoincrementing IDs and global PHIDs because each is useful in
|
||||||
different contexts. Autoincrementing IDs are chronologically ordered and allow
|
different contexts. Autoincrementing IDs are meaningfully ordered and allow
|
||||||
us to construct short, human-readable object names (like D2258) and URIs. Global
|
us to construct short, human-readable object names (like `D2258`) and URIs.
|
||||||
PHIDs allow us to represent relationships between different types of objects in
|
Global PHIDs allow us to represent relationships between different types of
|
||||||
a homogeneous way.
|
objects in a homogeneous way.
|
||||||
|
|
||||||
For example, the concept of "subscribers" is more powerfully done with PHIDs
|
For example, infrastructure like "subscribers" can be implemented easily with
|
||||||
because we could theoretically have users, projects, teams, and more all as
|
PHID relationships: different types of objects (users, projects, mailing lists)
|
||||||
"subscribers" of other objects. Using an ID column we would need to add a
|
are permitted to subscribe to different types of objects (revisions, tasks,
|
||||||
"type" column to avoid ID collision; using PHIDs does not require this
|
etc). Without PHIDs, we would need to add a "type" column to avoid ID collision;
|
||||||
additional column.
|
using PHIDs makes implementing features like this simpler.
|
||||||
|
|
||||||
= Transactions =
|
Transactions
|
||||||
|
============
|
||||||
|
|
||||||
Transactional code should be written using transactions. Example of such code is
|
Transactional code should be written using transactions. Example of such code is
|
||||||
inserting multiple records where one doesn't make sense without the other or
|
inserting multiple records where one doesn't make sense without the other or
|
||||||
selecting data later used for update. See chapter in @{class:LiskDAO}.
|
selecting data later used for update. See chapter in @{class:LiskDAO}.
|
||||||
|
|
||||||
= Advanced Features =
|
Advanced Features
|
||||||
|
=================
|
||||||
|
|
||||||
We don't use MySQL advanced features such as triggers, stored procedures or
|
We don't use MySQL advanced features such as triggers, stored procedures or
|
||||||
events because we like expressing the application logic in PHP more than in SQL.
|
events because we like expressing the application logic in PHP more than in SQL.
|
||||||
Some of these features (especially triggers) can also cause big confusion.
|
Some of these features (especially triggers) can also cause a great deal of
|
||||||
|
confusion, and are generally more difficult to debug, profile, version control,
|
||||||
|
update, and understand than application code.
|
||||||
|
|
||||||
Avoiding these advanced features is also good for supporting other database
|
Schema Denormalization
|
||||||
systems (which we don't support anyway).
|
======================
|
||||||
|
|
||||||
= Schema Denormalization =
|
Phabricator uses schema denormalization sparingly. Avoid denormalization unless
|
||||||
|
there is a compelling reason (usually, performance) to denormalize.
|
||||||
|
|
||||||
Phabricator uses schema denormalization for performance reasons sparingly. Try
|
Schema Changes and Migrations
|
||||||
to avoid it if possible.
|
=============================
|
||||||
|
|
||||||
= Changing the Schema =
|
To create a new schema change or migration:
|
||||||
|
|
||||||
There are three simple steps to update the schema:
|
**Create a database patch**. Database patches go in
|
||||||
|
`resources/sql/autopatches/`. To change a schema, use a `.sql` file and write
|
||||||
|
in SQL. To perform a migration, use a `.php` file and write in PHP. Name your
|
||||||
|
file `YYYYMMDD.patchname.ext`. For example, `20141225.christmas.sql`.
|
||||||
|
|
||||||
# Create a `.sql` file in `resources/sql/patches/`. This file should:
|
**Keep patches small**. Most schema change statements are not transactional. If
|
||||||
- Contain the appropriate MySQL commands to update the schema.
|
a patch contains several SQL statements and fails partway through, it normally
|
||||||
- Be named as `YYYYMMDD.patchname.ext`. For example, `20130217.example.sql`.
|
can not be rolled back. When a user tries to apply the patch again later, the
|
||||||
- Use `${NAMESPACE}` rather than `phabricator` for database names.
|
first statement (which, for example, adds a column) may fail (because the column
|
||||||
- Use `COLLATE utf8_bin` for any columns that are to be used as identifiers,
|
already exists). This can be avoided by keeping patches small (generally, one
|
||||||
such as PHID columns. Otherwise, use `COLLATE utf8_general_ci`.
|
statement per patch).
|
||||||
- Name all indexes so it is possible to delete them later.
|
|
||||||
# Edit `src/infrastructure/storage/patch/PhabricatorBuiltinPatchList.php` and
|
|
||||||
add your patch to
|
|
||||||
@{method@phabricator:PhabricatorBuiltinPatchList::getPatches}.
|
|
||||||
# Run `bin/storage upgrade`.
|
|
||||||
|
|
||||||
It is also possible to create more complex patches in PHP for data migration
|
**Use namespace and character set variables**. When defining a `.sql` patch,
|
||||||
(due to schema changes or otherwise.) However, the schema changes themselves
|
you should use these variables instead of hard-coding namespaces or character
|
||||||
should be done in separate `.sql` files. Order can be guaranteed by editing
|
set names:
|
||||||
`src/infrastructure/storage/patch/PhabricatorBuiltinPatchList.php`
|
|
||||||
appropriately.
|
|
||||||
|
|
||||||
See the
|
| Variable | Meaning | Notes |
|
||||||
[[https://secure.phabricator.com/rPb39175342dc5bee0c2246b05fa277e76a7e96ed3
|
|---|---|---|
|
||||||
| commit adding policy storage for Paste ]] for a reasonable example of the code
|
| {$NAMESPACE} | Storage Namespace | Defaults to `phabricator` |
|
||||||
changes.
|
| {$CHARSET} | Default Charset | Mostly used to specify table charset |
|
||||||
|
| {$COLLATE_TEXT} | Text Collation | For most text (case-sensitive) |
|
||||||
|
| {$COLLATE_SORT} | Sort Collation | For sortable text (case-insensitive) |
|
||||||
|
| {$CHARSET_FULLTEXT} | Fulltext Charset | Specify explicitly for fulltext |
|
||||||
|
| {$COLLATE_FULLTEXT} | Fulltext Collate | Specify explicitly for fulltext |
|
||||||
|
|
||||||
= See Also =
|
|
||||||
|
|
||||||
* @{class:LiskDAO}
|
**Test your patch**. Run `bin/storage upgrade` to test your patch.
|
||||||
* @{class:PhabricatorPHID}
|
|
||||||
|
See Also
|
||||||
|
========
|
||||||
|
|
||||||
|
- @{class:LiskDAO}
|
||||||
|
|
|
@ -148,16 +148,11 @@ final class PhabricatorStorageManagementAdjustWorkflow
|
||||||
pht(
|
pht(
|
||||||
"Found %s issues(s) with schemata, detailed above.\n\n".
|
"Found %s issues(s) with schemata, detailed above.\n\n".
|
||||||
"You can review issues in more detail from the web interface, ".
|
"You can review issues in more detail from the web interface, ".
|
||||||
"in Config > Database Status.\n\n".
|
"in Config > Database Status. To better understand the adjustment ".
|
||||||
|
"workflow, see \"Managing Storage Adjustments\" in the ".
|
||||||
|
"documentation.\n\n".
|
||||||
"MySQL needs to copy table data to make some adjustments, so these ".
|
"MySQL needs to copy table data to make some adjustments, so these ".
|
||||||
"migrations may take some time.".
|
"migrations may take some time.",
|
||||||
|
|
||||||
// TODO: Remove warning once this stabilizes.
|
|
||||||
"\n\n".
|
|
||||||
"WARNING: This workflow is new and unstable. If you continue, you ".
|
|
||||||
"may unrecoverably destory data. Make sure you have a backup before ".
|
|
||||||
"you proceed.",
|
|
||||||
|
|
||||||
new PhutilNumber(count($adjustments))));
|
new PhutilNumber(count($adjustments))));
|
||||||
|
|
||||||
$prompt = pht('Fix these schema issues?');
|
$prompt = pht('Fix these schema issues?');
|
||||||
|
|
Loading…
Reference in a new issue