mirror of
https://we.phorge.it/source/phorge.git
synced 2024-12-23 05:50:55 +01:00
Write "Why does Phabricator need so many databases?"
Summary: We will sell you as many new databases as you want, cheap! Just $1 per database!

Test Plan: (O).(O)

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D15249
parent 50b8815e44
commit f5c8a2fb18
2 changed files with 134 additions and 4 deletions
@@ -28,11 +28,10 @@ Databases
 =========
 
 Each Phabricator application has its own database. The names are prefixed by
-`phabricator_` (this is configurable). This design has two advantages:
+`phabricator_` (this is configurable).
 
-  - Each database is easier to comprehend and to maintain.
-  - We don't do cross-database joins so each database can live on its own
-    machine. This gives us flexibility in sharding data later.
+Phabricator uses a separate database for each application. To understand why,
+see @{article:Why does Phabricator need so many databases?}.
 
 Connections
 ===========
131
src/docs/flavor/so_many_databases.diviner
Normal file
@@ -0,0 +1,131 @@
@title Why does Phabricator need so many databases?
@group lore

Phabricator uses about 60 databases (and we may have added more by the time you
read this document). This sometimes comes as a surprise, since you might assume
it would only use one database.

The approach we use is designed to work at scale for huge installs with many
thousands of users. We care a lot about working well for large installs, and
about scaling up gracefully to meet the needs of growing organizations. We want
small startups to be able to install Phabricator and have it grow with them as
they expand to many thousands of employees.

A cost of this approach is that it makes Phabricator more difficult to install
on shared hosts which require a lot of work to create or authorize access to
each database. However, Phabricator does a lot of advanced or complex things
which are difficult to configure or manage on shared hosts, and we don't
recommend installing it on a shared host. The install documentation explicitly
discourages installing on shared hosts.

Broadly, in cases where we must choose between operating well at scale for
growing organizations and installing easily on shared hosts, we prioritize
operating at scale.


Listing Databases
=================

You can get a full list of the databases Phabricator needs with `bin/storage
databases`. It will look something like this:

```
$ /core/lib/phabricator/bin/storage databases
secure_audit
secure_calendar
secure_chatlog
secure_conduit
secure_countdown
secure_daemon
secure_differential
secure_draft
secure_drydock
secure_feed
...<dozens more databases>...
```

Roughly, each application has its own database, and then there are some
databases which support internal systems or shared infrastructure.
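
As a rough illustration of working with this list, the sketch below takes text
shaped like the sample output (one database name per line, which is an
assumption based on that sample) and strips the storage namespace to recover
the application part of each name. The `secure` namespace comes from the
sample; the function name `application_names` is hypothetical and is not part
of Phabricator.

```python
def application_names(listing, namespace):
    """Strip the namespace prefix from each database name in `listing`.

    Assumes one database name per line, as in the sample output.
    Lines that do not start with the prefix are skipped.
    """
    prefix = namespace + "_"
    names = []
    for line in listing.splitlines():
        name = line.strip()
        if name.startswith(prefix):
            names.append(name[len(prefix):])
    return names


sample = """secure_audit
secure_calendar
secure_chatlog
"""
print(application_names(sample, "secure"))
```

In practice you would feed the function the captured output of `bin/storage
databases` rather than a hard-coded string.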


Operating at Scale
==================

This storage design is aimed at large installs that may need more than one
physical database server to handle the load the install generates.

The primary reason we use a database per application is to allow large installs
to scale up by spreading database load across more hardware. A large
organization with many thousands of active users may find themselves limited by
the capacity of a single database backend.

If so, they can launch a second backend, move some applications over to it, and
continue piling on more users.

This can't continue forever, but provides a substantial amount of headroom for
large installs to spread the workload across more hardware and continue scaling
up.

To make this possible, we put each application in its own database and use
database boundaries to enforce the logical constraints that the application
must have in order for this to work. For example, we cannot perform joins
between separable tables, because they may not be on the same hardware.
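
To illustrate the no-joins constraint, here is a minimal Python sketch of
combining data from two databases in application code instead of in SQL. This
is not Phabricator's code (Phabricator is written in PHP and stores data in
MySQL); the two in-memory SQLite databases and every table and column name
are invented for the example.

```python
import sqlite3

# Two independent connections stand in for two application databases that
# may live on different hosts. All table and column names are hypothetical.
users_db = sqlite3.connect(":memory:")
tasks_db = sqlite3.connect(":memory:")

users_db.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
users_db.executemany("INSERT INTO user VALUES (?, ?)", [(1, "alice"), (2, "bob")])

tasks_db.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, owner_id INTEGER, title TEXT)"
)
tasks_db.executemany(
    "INSERT INTO task VALUES (?, ?, ?)",
    [(10, 1, "Fix login"), (11, 2, "Write docs"), (12, 1, "Ship release")],
)


def tasks_with_owner_names():
    """Combine rows from both databases in application code, not in SQL."""
    tasks = tasks_db.execute(
        "SELECT id, owner_id, title FROM task ORDER BY id"
    ).fetchall()
    owner_ids = sorted({owner_id for _, owner_id, _ in tasks})
    if not owner_ids:
        return []
    # Second query, against the other database, for just the rows we need.
    placeholders = ", ".join("?" for _ in owner_ids)
    rows = users_db.execute(
        "SELECT id, name FROM user WHERE id IN (%s)" % placeholders, owner_ids
    ).fetchall()
    names = dict(rows)
    return [(title, names.get(owner_id)) for _, owner_id, title in tasks]


print(tasks_with_owner_names())
```

Because neither query refers to the other database, either connection could
point at a different server without changing the logic.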

Establishing boundaries with application databases is a simple, straightforward
way to partition storage and make administrative operations like spreading load
realistic.
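
Conceptually, partitioning by application database means that placement can be
described as a simple map from database name to backend. The sketch below is
only a model of that idea, not how Phabricator is actually configured: the
host names, the `placement` map, and both helper functions are hypothetical.

```python
# Hypothetical placement map: application database name -> backend host.
# Host names are made up for illustration.
DEFAULT_HOST = "db001.example.com"
placement = {}


def host_for(database):
    """Return the backend host that stores `database`."""
    return placement.get(database, DEFAULT_HOST)


def move_to(host, databases):
    """Record that the given databases now live on `host`.

    This only updates the map; actually copying the data is out of scope.
    """
    for database in databases:
        placement[database] = host


# Everything starts on one backend; later, two busy applications move.
move_to("db002.example.com", ["phabricator_differential", "phabricator_feed"])
print(host_for("phabricator_differential"))
print(host_for("phabricator_audit"))
```

The map only stays this simple because no query spans two databases.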


Ease of Development
===================

This design is also easier for us to work with, and easier for users who
want to work with the raw database data to understand and interact with.

We have a large number of tables (more than 400) and we cannot reasonably
reduce the number of tables very much (each table generally represents some
meaningful type of object in some application). It's easier to develop with
tables which are organized into separate application databases, just like it's
easier to work with a large project if you organize source files into
directories.

If you aren't developing Phabricator and never look at the data in the
database, you probably don't benefit from this organization. However, if you
are a developer or want to extend Phabricator or look under the hood, it's
easier to find what you're looking for and work with the tables and data when
they're organized by application.


Databases Have No Cost
======================

In almost all cases, creating databases has zero cost, just like organizing
source code into directories has zero cost.

Even if we didn't derive enormous benefits from this approach at scale, there
is little reason //not// to organize storage like this.

There are a handful of administrative tasks which are very slightly more
complex to perform on multiple databases, but these are all either automated
with `bin/storage` or easy to build on top of the list of databases emitted by
`bin/storage databases`.

For example, you can dump all the databases with `bin/storage dump`, and you
can destroy all the databases with `bin/storage destroy`.
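
As an illustration of building on top of the database list, the sketch below
turns a list of database names into one MySQL `GRANT` statement per database.
It is an example under assumptions rather than a recommended procedure: the
account name `phab`, the `localhost` host, the choice of `ALL PRIVILEGES`, and
the function `grant_statements` are all placeholders.

```python
def grant_statements(databases, user, host="localhost"):
    """Build one MySQL GRANT statement per database name.

    `user` and `host` are placeholders; adjust privileges to your own policy.
    """
    statements = []
    for name in databases:
        statements.append(
            "GRANT ALL PRIVILEGES ON `%s`.* TO '%s'@'%s';" % (name, user, host)
        )
    return statements


for statement in grant_statements(["secure_audit", "secure_calendar"], "phab"):
    print(statement)
```

The input list would normally come from `bin/storage databases`; the statements
are only printed here, so you can review them before running anything.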

As mentioned above, an exception to this is that if you're installing on a
shared host and need to jump through hoops to individually authorize access to
each database, databases do cost something.

However, this cost is an artificial cost imposed by the selected environment,
and this is only the first of many issues you'll run into trying to install and
run Phabricator on a shared host. These issues are why we strongly discourage
using shared hosts, and recommend against them in the install guide.


Next Steps
==========

Continue by:

  - learning more about databases in @{article:Database Schema}.