2011-01-16 22:51:39 +01:00
<?php
Add an associations-like "Edges" framework
Summary:
We have a lot of cases where we store object relationships, but it's all kind of messy and custom. Some particular problems:
- We go to great lengths to enforce order stability in Differential revisions, but the implementation is complex and inelegant.
- Some relationships are stored on-object, so we can't pull the inverses easily. For example, Maniphest shows child tasks but not parent tasks.
- I want to add more of these and don't want to continue building custom stuff.
- UIs like the "attach stuff to other stuff" UI need custom branches for each object type.
- Stuff like "allow commits to close tasks" is nontrivial because of nonstandard metadata storage.
Provide an association-like "edge" framework to fix these problems. This is nearly identical to associations, with a few differences:
- I put edge metadata in a separate table and don't load it by default, to keep edge rows small and allow large metadata if necessary. The on-edge metadata seemed to get abused a lot at Facebook.
- I put a 'seq' column on the edges to ensure they have an explicit, stable ordering within a source and type.
This isn't actually used anywhere yet, but my first target is attaching commits to tasks for T904.
Test Plan: Made a mock page that used Editor and Query. Verified adding and removing edges, overwriting edges, writing and loading edge data, sequence number generation.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran, 20after4
Differential Revision: https://secure.phabricator.com/D2088
2012-04-05 00:30:21 +02:00
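The two storage choices described in the commit message above (a per-source, per-type `seq` value on each edge, and metadata kept apart from the edge rows) can be illustrated with a small in-memory sketch. This is not the framework's code: the class `EdgeSketch`, its method names, and the rule that re-adding an existing edge keeps its position are all assumptions made for illustration, and a real implementation would use database tables.

```php
<?php

// Hypothetical illustration only: names and storage layout are assumptions.
final class EdgeSketch {
  private $edges = array();   // "src|type" => array(dst => seq)
  private $data = array();    // "src|type|dst" => metadata, stored separately
  private $nextSeq = array(); // "src|type" => next sequence number

  public function addEdge($src, $type, $dst, array $data = array()) {
    $key = $src.'|'.$type;
    if (!isset($this->edges[$key][$dst])) {
      // New edge: assign the next sequence number for this source and type.
      $seq = isset($this->nextSeq[$key]) ? $this->nextSeq[$key] : 1;
      $this->nextSeq[$key] = $seq + 1;
      $this->edges[$key][$dst] = $seq;
    }
    // Overwriting an edge keeps its seq in this sketch and replaces its data.
    if ($data) {
      $this->data[$key.'|'.$dst] = $data;
    } else {
      unset($this->data[$key.'|'.$dst]);
    }
  }

  public function removeEdge($src, $type, $dst) {
    $key = $src.'|'.$type;
    unset($this->edges[$key][$dst]);
    unset($this->data[$key.'|'.$dst]);
  }

  // Destinations in stable seq order; metadata is never touched here.
  public function loadDestinations($src, $type) {
    $key = $src.'|'.$type;
    $edges = isset($this->edges[$key]) ? $this->edges[$key] : array();
    asort($edges);
    return array_keys($edges);
  }

  // Metadata is loaded only when a caller explicitly asks for it.
  public function loadEdgeData($src, $type, $dst) {
    $key = $src.'|'.$type.'|'.$dst;
    return isset($this->data[$key]) ? $this->data[$key] : array();
  }
}

$store = new EdgeSketch();
$store->addEdge('T1', 'task.commit', 'C9');
$store->addEdge('T1', 'task.commit', 'C3', array('note' => 'fixes'));
$store->addEdge('T1', 'task.commit', 'C9'); // overwrite: position unchanged

$dsts = $store->loadDestinations('T1', 'task.commit'); // array('C9', 'C3')
$data = $store->loadEdgeData('T1', 'task.commit', 'C3');
```

Because ordering comes from `seq` rather than from the destination identifier, the list stays in insertion order even though `C3` sorts before `C9`.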

/**
 * @task config Configuring Storage
 */
abstract class PhabricatorLiskDAO extends LiskDAO {

  private static $namespaceStack = array();

  const ATTACHABLE = '<attachable>';

/* -( Configuring Storage )------------------------------------------------ */

Make SQL patch management DAG-based and provide namespace support
Summary:
This addresses three issues with the current patch management system:
# Two people developing at the same time often pick the same SQL patch number, and then have to go rename it. The system catches this, but it's silly.
# Second/third-party developers can't use the same system to manage auxiliary storage they may want to add.
# There's no way to build mock databases for unit tests that need to do reads.
To resolve these things, you can now name your patches whatever you want and conflicts are just merge conflicts, which are less of a pain to fix than filename conflicts.
Dependencies are now a DAG, with implicit dependencies created on the prior patch if no dependencies are specified. Developers can add new concrete subclasses of `PhabricatorSQLPatchList` to add storage management, and define the dependency branchpoint of their patches so they apply in the correct order (although, generally, they should not depend on the mainline patches, presumably).
The commands `storage upgrade --namespace test1234` and `storage destroy --namespace test1234` will allow unit tests to build and destroy MySQL storage.
A "quickstart" mode allows an upgrade from scratch in ~1200ms. Destruction takes about 200ms. These seem like fairly reasonable costs to actually use in tests. Building from scratch patch-by-patch takes about 6000ms.
Test Plan:
- Created new databases from scratch with and without quickstart in a separate test namespace. Pointed the webapp at the test namespaces, browsed around, everything looked good.
- Compared quickstart and no-quickstart dump states, they're identical except for mysqldump timestamps and a few similar things.
- Upgraded a legacy database to the new storage format.
- Destroyed / dumped storage.
Reviewers: edward, vrana, btrahan, jungejason
Reviewed By: btrahan
CC: aran, nh
Maniphest Tasks: T140, T345
Differential Revision: https://secure.phabricator.com/D2323
2012-04-30 16:54:00 +02:00
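The dependency rule described in the commit message above (a patch with no declared dependencies implicitly depends on the prior patch, and the result is a DAG) can be sketched as a small topological sort. The function `order_patches` and the patch names are hypothetical, and the real `PhabricatorSQLPatchList` handling may differ; this only illustrates the ordering idea.

```php
<?php

// Hypothetical sketch: $patches maps patch name => list of dependency names,
// or null when no dependencies were specified for that patch.
function order_patches(array $patches) {
  // Fill in the implicit dependency on the previously listed patch.
  $deps = array();
  $prev = null;
  foreach ($patches as $name => $after) {
    if ($after === null) {
      $after = ($prev === null) ? array() : array($prev);
    }
    $deps[$name] = $after;
    $prev = $name;
  }

  // Depth-first topological sort with cycle detection.
  $order = array();
  $state = array(); // name => 'visiting' or 'done'
  $visit = function ($name) use (&$visit, &$order, &$state, $deps) {
    if (!isset($deps[$name])) {
      throw new Exception("Unknown patch: {$name}");
    }
    if (isset($state[$name])) {
      if ($state[$name] === 'visiting') {
        throw new Exception("Dependency cycle at: {$name}");
      }
      return;
    }
    $state[$name] = 'visiting';
    foreach ($deps[$name] as $dep) {
      $visit($dep);
    }
    $state[$name] = 'done';
    $order[] = $name;
  };
  foreach (array_keys($deps) as $name) {
    $visit($name);
  }
  return $order;
}

$order = order_patches(array(
  'db.init' => null,
  'users.sql' => null,                  // implicitly after db.init
  'thirdparty.sql' => array('db.init'), // explicit branchpoint
  'edges.sql' => null,                  // implicitly after thirdparty.sql
));
```

With names instead of numbers, two developers adding patches at the same time collide only in the list itself, which is an ordinary merge conflict.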
  /**
   * @task config
   */
  public static function pushStorageNamespace($namespace) {
    self::$namespaceStack[] = $namespace;
  }

  /**
   * @task config
   */
  public static function popStorageNamespace() {
    array_pop(self::$namespaceStack);
  }

  /**
   * @task config
   */
  public static function getDefaultStorageNamespace() {
    return PhabricatorEnv::getEnvConfig('storage.default-namespace');
  }

  /**
   * @task config
   */
  public static function getStorageNamespace() {
    $namespace = end(self::$namespaceStack);
    if (!strlen($namespace)) {
      $namespace = self::getDefaultStorageNamespace();
    }
    if (!strlen($namespace)) {
      throw new Exception('No storage namespace configured!');
    }
    return $namespace;
  }

  /**
   * @task config
   */
  public function establishLiveConnection($mode) {
    $namespace = self::getStorageNamespace();

    $conf = PhabricatorEnv::newObjectFromConfig(
      'mysql.configuration-provider',
      array($this, $mode, $namespace));

    return PhabricatorEnv::newObjectFromConfig(
      'mysql.implementation',
      array(
        array(
          'user' => $conf->getUser(),
          'pass' => $conf->getPassword(),
          'host' => $conf->getHost(),
          'port' => $conf->getPort(),
          'database' => $conf->getDatabase(),
          'retries' => 3,
        ),
      ));
  }

  /**
   * @task config
   */
  public function getTableName() {
    $str = 'phabricator';
    $len = strlen($str);

    $class = strtolower(get_class($this));
    if (!strncmp($class, $str, $len)) {
      $class = substr($class, $len);
    }
    $app = $this->getApplicationName();
    if (!strncmp($class, $app, strlen($app))) {
      $class = substr($class, strlen($app));
    }

    if (strlen($class)) {
      return $app.'_'.$class;
    } else {
      return $app;
    }
  }

  /**
   * @task config
   */
  abstract public function getApplicationName();

  protected function getConnectionNamespace() {
    return self::getStorageNamespace().'_'.$this->getApplicationName();
  }

Implement a more compact, general database-backed key-value cache
Summary:
See discussion in D4204. Facebook currently has a 314MB remarkup cache with a 55MB index, which is slow to access. Under the theory that this is an index size/quality problem (the current index is on a potentially-384-byte field, with many keys sharing prefixes), provide a more general index with fancy new features:
- It implements PhutilKeyValueCache, so it can be a component in cache stacks and supports TTL.
- It has a 12-byte hash-based key.
- It automatically compresses large blocks of data (most of what we store is highly-compressible HTML).
Test Plan:
- Basics:
  - Loaded /paste/, saw caches generate and save.
  - Reloaded /paste/, saw the page hit cache.
- GC:
  - Ran GC daemon, saw nothing.
  - Set maximum lifetime to 1 second, ran GC daemon, saw it collect the entire cache.
- Deflate:
  - Selected row formats from the database, saw a mixture of 'raw' and 'deflate' storage.
  - Used profiler to verify that 'deflate' is fast (12 calls @ 220us on my paste list).
- Ran unit tests
Reviewers: vrana, btrahan
Reviewed By: vrana
CC: aran
Differential Revision: https://secure.phabricator.com/D4259
2012-12-21 23:17:56 +01:00
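Two of the features listed in the commit message above, the short fixed-width hash key and the automatic compression of large values, can be sketched as follows. The helper names, the use of a truncated SHA-1 for the key, and the 512-byte threshold are assumptions for illustration and are not taken from the cache implementation; only the 'raw' and 'deflate' format names come from the test plan. The sketch assumes PHP's zlib extension for `gzdeflate()` and `gzinflate()`, and falls back to raw storage when it is missing.

```php
<?php

// Hypothetical helpers; key derivation and size threshold are assumptions.
function sketch_cache_key($key) {
  // Fixed-width 12-character key, regardless of the original key's length.
  return substr(sha1($key), 0, 12);
}

function sketch_cache_encode($value, $min_compress = 512) {
  if (strlen($value) >= $min_compress && function_exists('gzdeflate')) {
    $deflated = gzdeflate($value);
    // Only keep the compressed form when it actually saves space.
    if ($deflated !== false && strlen($deflated) < strlen($value)) {
      return array('deflate', $deflated);
    }
  }
  return array('raw', $value);
}

function sketch_cache_decode($format, $stored) {
  switch ($format) {
    case 'raw':
      return $stored;
    case 'deflate':
      return gzinflate($stored);
    default:
      throw new Exception("Unknown cache format: {$format}");
  }
}

$html = str_repeat('<div class="remarkup-code">example</div>', 100);
list($format, $stored) = sketch_cache_encode($html);
$restored = sketch_cache_decode($format, $stored);
```

Storing the format next to the value is what lets a single table hold a mixture of 'raw' and 'deflate' rows.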

  /**
   * Break a list of escaped SQL statement fragments (e.g., VALUES lists for
   * INSERT, previously built with @{function:qsprintf}) into chunks which will
   * fit under the MySQL 'max_allowed_packet' limit.
   *
   * Chunks are glued together with `$glue`, by default ", ".
   *
   * If a statement is too large to fit within the limit, it is broken into
   * its own chunk (but might fail when the query executes).
   */
  public static function chunkSQL(
    array $fragments,
    $glue = ', ',
    $limit = null) {

    if ($limit === null) {
      // NOTE: Hard-code this at 1MB for now, minus a 10% safety buffer.
      // Eventually we could query MySQL or let the user configure it.
      $limit = (int)((1024 * 1024) * 0.90);
    }

    $result = array();

    $chunk = array();
    $len = 0;
    $glue_len = strlen($glue);
    foreach ($fragments as $fragment) {
      $this_len = strlen($fragment);

      if ($chunk) {
        // Chunks after the first also imply glue.
        $this_len += $glue_len;
      }

      if ($len + $this_len <= $limit) {
        $len += $this_len;
        $chunk[] = $fragment;
      } else {
        if ($chunk) {
          $result[] = $chunk;
        }
        $len = strlen($fragment);
        $chunk = array($fragment);
      }
    }

    if ($chunk) {
      $result[] = $chunk;
    }

    foreach ($result as $key => $fragment_list) {
      $result[$key] = implode($glue, $fragment_list);
    }

    return $result;
  }

  protected function assertAttached($property) {
    if ($property === self::ATTACHABLE) {
      throw new PhabricatorDataNotAttachedException($this);
    }
    return $property;
  }

  protected function assertAttachedKey($value, $key) {
    $this->assertAttached($value);
    if (!array_key_exists($key, $value)) {
      throw new PhabricatorDataNotAttachedException($this);
    }
    return $value[$key];
  }

  protected function detectEncodingForStorage($string) {
    return phutil_is_utf8($string) ? 'utf8' : null;
  }

  protected function getUTF8StringFromStorage($string, $encoding) {
    if ($encoding == 'utf8') {
      return $string;
    }

    if (function_exists('mb_detect_encoding')) {
      if (strlen($encoding)) {
        $try_encodings = array(
          $encoding,
        );
      } else {
        // TODO: This is pretty much a guess, and probably needs to be
        // configurable in the long run.
        $try_encodings = array(
          'JIS',
          'EUC-JP',
          'SJIS',
          'ISO-8859-1',
        );
      }

      $guess = mb_detect_encoding($string, $try_encodings);
      if ($guess) {
        return mb_convert_encoding($string, 'UTF-8', $guess);
      }
    }

    return phutil_utf8ize($string);
  }

  public function delete() {

    // TODO: We should make some reasonable effort to destroy related
    // infrastructure objects here, like edges, transactions, custom field
    // storage, flags, Phrequent tracking, tokens, etc. This doesn't need to
    // be exhaustive, but we can get a lot of it pretty easily.

    return parent::delete();
  }

}
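
The greedy packing done by `chunkSQL()` is easiest to see with a tiny limit. The block below is a standalone copy of the same logic so that it runs without Phabricator loaded; the function name `sketch_chunk_sql` is hypothetical, and `$limit` is a required argument here rather than defaulting to roughly 1MB.

```php
<?php

// Standalone sketch of the greedy chunking in chunkSQL(); the function name
// is hypothetical and exists only for this example.
function sketch_chunk_sql(array $fragments, $glue, $limit) {
  $result = array();
  $chunk = array();
  $len = 0;
  foreach ($fragments as $fragment) {
    // Every fragment after the first in a chunk also costs one glue string.
    $cost = strlen($fragment) + ($chunk ? strlen($glue) : 0);
    if ($len + $cost <= $limit) {
      $len += $cost;
      $chunk[] = $fragment;
    } else {
      if ($chunk) {
        $result[] = implode($glue, $chunk);
      }
      // Start a new chunk; an oversized fragment ends up alone in its chunk.
      $len = strlen($fragment);
      $chunk = array($fragment);
    }
  }
  if ($chunk) {
    $result[] = implode($glue, $chunk);
  }
  return $result;
}

$chunks = sketch_chunk_sql(array('(1)', '(2)', '(3)'), ', ', 8);
// $chunks is array('(1), (2)', '(3)'): the first two fragments plus glue are
// exactly 8 bytes, so the third fragment starts a new chunk.
```

A caller would typically issue one INSERT per returned chunk, so each statement stays under the packet limit unless a single fragment is itself too large.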