1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-12-03 04:02:43 +01:00
phorge-phorge/src/infrastructure/storage/lisk/PhabricatorLiskDAO.php

223 lines
5.3 KiB
PHP
Raw Normal View History

<?php
Add an assocations-like "Edges" framework Summary: We have a lot of cases where we store object relationships, but it's all kind of messy and custom. Some particular problems: - We go to great lengths to enforce order stability in Differential revisions, but the implementation is complex and inelegant. - Some relationships are stored on-object, so we can't pull the inverses easily. For example, Maniphest shows child tasks but not parent tasks. - I want to add more of these and don't want to continue building custom stuff. - UIs like the "attach stuff to other stuff" UI need custom branches for each object type. - Stuff like "allow commits to close tasks" is notrivial because of nonstandard metadata storage. Provide an association-like "edge" framework to fix these problems. This is nearly identical to associations, with a few differences: - I put edge metadata in a separate table and don't load it by default, to keep edge rows small and allow large metadata if necessary. The on-edge metadata seemed to get abused a lot at Facebook. - I put a 'seq' column on the edges to ensure they have an explicit, stable ordering within a source and type. This isn't actually used anywhere yet, but my first target is attaching commits to tasks for T904. Test Plan: Made a mock page that used Editor and Query. Verified adding and removing edges, overwriting edges, writing and loading edge data, sequence number generation. Reviewers: btrahan Reviewed By: btrahan CC: aran, 20after4 Differential Revision: https://secure.phabricator.com/D2088
2012-04-05 00:30:21 +02:00
/**
* @task config Configuring Storage
*/
2011-01-23 02:48:55 +01:00
abstract class PhabricatorLiskDAO extends LiskDAO {
private static $namespaceStack = array();
Add an assocations-like "Edges" framework Summary: We have a lot of cases where we store object relationships, but it's all kind of messy and custom. Some particular problems: - We go to great lengths to enforce order stability in Differential revisions, but the implementation is complex and inelegant. - Some relationships are stored on-object, so we can't pull the inverses easily. For example, Maniphest shows child tasks but not parent tasks. - I want to add more of these and don't want to continue building custom stuff. - UIs like the "attach stuff to other stuff" UI need custom branches for each object type. - Stuff like "allow commits to close tasks" is notrivial because of nonstandard metadata storage. Provide an association-like "edge" framework to fix these problems. This is nearly identical to associations, with a few differences: - I put edge metadata in a separate table and don't load it by default, to keep edge rows small and allow large metadata if necessary. The on-edge metadata seemed to get abused a lot at Facebook. - I put a 'seq' column on the edges to ensure they have an explicit, stable ordering within a source and type. This isn't actually used anywhere yet, but my first target is attaching commits to tasks for T904. Test Plan: Made a mock page that used Editor and Query. Verified adding and removing edges, overwriting edges, writing and loading edge data, sequence number generation. Reviewers: btrahan Reviewed By: btrahan CC: aran, 20after4 Differential Revision: https://secure.phabricator.com/D2088
2012-04-05 00:30:21 +02:00
const ATTACHABLE = '<attachable>';
Add an assocations-like "Edges" framework Summary: We have a lot of cases where we store object relationships, but it's all kind of messy and custom. Some particular problems: - We go to great lengths to enforce order stability in Differential revisions, but the implementation is complex and inelegant. - Some relationships are stored on-object, so we can't pull the inverses easily. For example, Maniphest shows child tasks but not parent tasks. - I want to add more of these and don't want to continue building custom stuff. - UIs like the "attach stuff to other stuff" UI need custom branches for each object type. - Stuff like "allow commits to close tasks" is notrivial because of nonstandard metadata storage. Provide an association-like "edge" framework to fix these problems. This is nearly identical to associations, with a few differences: - I put edge metadata in a separate table and don't load it by default, to keep edge rows small and allow large metadata if necessary. The on-edge metadata seemed to get abused a lot at Facebook. - I put a 'seq' column on the edges to ensure they have an explicit, stable ordering within a source and type. This isn't actually used anywhere yet, but my first target is attaching commits to tasks for T904. Test Plan: Made a mock page that used Editor and Query. Verified adding and removing edges, overwriting edges, writing and loading edge data, sequence number generation. Reviewers: btrahan Reviewed By: btrahan CC: aran, 20after4 Differential Revision: https://secure.phabricator.com/D2088
2012-04-05 00:30:21 +02:00
/* -( Configuring Storage )------------------------------------------------ */
Make SQL patch management DAG-based and provide namespace support Summary: This addresses three issues with the current patch management system: # Two people developing at the same time often pick the same SQL patch number, and then have to go rename it. The system catches this, but it's silly. # Second/third-party developers can't use the same system to manage auxiliary storage they may want to add. # There's no way to build mock databases for unit tests that need to do reads. To resolve these things, you can now name your patches whatever you want and conflicts are just merge conflicts, which are less of a pain to fix than filename conflicts. Dependencies are now a DAG, with implicit dependencies created on the prior patch if no dependencies are specified. Developers can add new concrete subclasses of `PhabricatorSQLPatchList` to add storage management, and define the dependency branchpoint of their patches so they apply in the correct order (although, generally, they should not depend on the mainline patches, presumably). The commands `storage upgrade --namespace test1234` and `storage destroy --namespace test1234` will allow unit tests to build and destroy MySQL storage. A "quickstart" mode allows an upgrade from scratch in ~1200ms. Destruction takes about 200ms. These seem like fairily reasonable costs to actually use in tests. Building from scratch patch-by-patch takes about 6000ms. Test Plan: - Created new databases from scratch with and without quickstart in a separate test namespace. Pointed the webapp at the test namespaces, browsed around, everything looked good. - Compared quickstart and no-quickstart dump states, they're identical except for mysqldump timestamps and a few similar things. - Upgraded a legacy database to the new storage format. - Destroyed / dumped storage. Reviewers: edward, vrana, btrahan, jungejason Reviewed By: btrahan CC: aran, nh Maniphest Tasks: T140, T345 Differential Revision: https://secure.phabricator.com/D2323
2012-04-30 16:54:00 +02:00
/**
* @task config
*/
public static function pushStorageNamespace($namespace) {
self::$namespaceStack[] = $namespace;
Make SQL patch management DAG-based and provide namespace support Summary: This addresses three issues with the current patch management system: # Two people developing at the same time often pick the same SQL patch number, and then have to go rename it. The system catches this, but it's silly. # Second/third-party developers can't use the same system to manage auxiliary storage they may want to add. # There's no way to build mock databases for unit tests that need to do reads. To resolve these things, you can now name your patches whatever you want and conflicts are just merge conflicts, which are less of a pain to fix than filename conflicts. Dependencies are now a DAG, with implicit dependencies created on the prior patch if no dependencies are specified. Developers can add new concrete subclasses of `PhabricatorSQLPatchList` to add storage management, and define the dependency branchpoint of their patches so they apply in the correct order (although, generally, they should not depend on the mainline patches, presumably). The commands `storage upgrade --namespace test1234` and `storage destroy --namespace test1234` will allow unit tests to build and destroy MySQL storage. A "quickstart" mode allows an upgrade from scratch in ~1200ms. Destruction takes about 200ms. These seem like fairily reasonable costs to actually use in tests. Building from scratch patch-by-patch takes about 6000ms. Test Plan: - Created new databases from scratch with and without quickstart in a separate test namespace. Pointed the webapp at the test namespaces, browsed around, everything looked good. - Compared quickstart and no-quickstart dump states, they're identical except for mysqldump timestamps and a few similar things. - Upgraded a legacy database to the new storage format. - Destroyed / dumped storage. Reviewers: edward, vrana, btrahan, jungejason Reviewed By: btrahan CC: aran, nh Maniphest Tasks: T140, T345 Differential Revision: https://secure.phabricator.com/D2323
2012-04-30 16:54:00 +02:00
}
/**
* @task config
*/
public static function popStorageNamespace() {
array_pop(self::$namespaceStack);
}
/**
* @task config
*/
public static function getDefaultStorageNamespace() {
return PhabricatorEnv::getEnvConfig('storage.default-namespace');
}
Add an assocations-like "Edges" framework Summary: We have a lot of cases where we store object relationships, but it's all kind of messy and custom. Some particular problems: - We go to great lengths to enforce order stability in Differential revisions, but the implementation is complex and inelegant. - Some relationships are stored on-object, so we can't pull the inverses easily. For example, Maniphest shows child tasks but not parent tasks. - I want to add more of these and don't want to continue building custom stuff. - UIs like the "attach stuff to other stuff" UI need custom branches for each object type. - Stuff like "allow commits to close tasks" is notrivial because of nonstandard metadata storage. Provide an association-like "edge" framework to fix these problems. This is nearly identical to associations, with a few differences: - I put edge metadata in a separate table and don't load it by default, to keep edge rows small and allow large metadata if necessary. The on-edge metadata seemed to get abused a lot at Facebook. - I put a 'seq' column on the edges to ensure they have an explicit, stable ordering within a source and type. This isn't actually used anywhere yet, but my first target is attaching commits to tasks for T904. Test Plan: Made a mock page that used Editor and Query. Verified adding and removing edges, overwriting edges, writing and loading edge data, sequence number generation. Reviewers: btrahan Reviewed By: btrahan CC: aran, 20after4 Differential Revision: https://secure.phabricator.com/D2088
2012-04-05 00:30:21 +02:00
/**
* @task config
*/
public static function getStorageNamespace() {
$namespace = end(self::$namespaceStack);
if (!strlen($namespace)) {
$namespace = self::getDefaultStorageNamespace();
}
if (!strlen($namespace)) {
throw new Exception('No storage namespace configured!');
}
return $namespace;
}
/**
* @task config
*/
public function establishLiveConnection($mode) {
$namespace = self::getStorageNamespace();
$conf = PhabricatorEnv::newObjectFromConfig(
'mysql.configuration-provider',
array($this, $mode, $namespace));
return PhabricatorEnv::newObjectFromConfig(
'mysql.implementation',
array(
array(
'user' => $conf->getUser(),
'pass' => $conf->getPassword(),
'host' => $conf->getHost(),
'port' => $conf->getPort(),
'database' => $conf->getDatabase(),
'retries' => 3,
),
));
}
Add an assocations-like "Edges" framework Summary: We have a lot of cases where we store object relationships, but it's all kind of messy and custom. Some particular problems: - We go to great lengths to enforce order stability in Differential revisions, but the implementation is complex and inelegant. - Some relationships are stored on-object, so we can't pull the inverses easily. For example, Maniphest shows child tasks but not parent tasks. - I want to add more of these and don't want to continue building custom stuff. - UIs like the "attach stuff to other stuff" UI need custom branches for each object type. - Stuff like "allow commits to close tasks" is notrivial because of nonstandard metadata storage. Provide an association-like "edge" framework to fix these problems. This is nearly identical to associations, with a few differences: - I put edge metadata in a separate table and don't load it by default, to keep edge rows small and allow large metadata if necessary. The on-edge metadata seemed to get abused a lot at Facebook. - I put a 'seq' column on the edges to ensure they have an explicit, stable ordering within a source and type. This isn't actually used anywhere yet, but my first target is attaching commits to tasks for T904. Test Plan: Made a mock page that used Editor and Query. Verified adding and removing edges, overwriting edges, writing and loading edge data, sequence number generation. Reviewers: btrahan Reviewed By: btrahan CC: aran, 20after4 Differential Revision: https://secure.phabricator.com/D2088
2012-04-05 00:30:21 +02:00
/**
* @task config
*/
public function getTableName() {
2011-01-23 02:48:55 +01:00
$str = 'phabricator';
$len = strlen($str);
$class = strtolower(get_class($this));
2011-01-23 02:48:55 +01:00
if (!strncmp($class, $str, $len)) {
$class = substr($class, $len);
}
$app = $this->getApplicationName();
if (!strncmp($class, $app, strlen($app))) {
$class = substr($class, strlen($app));
}
2011-01-23 06:09:13 +01:00
if (strlen($class)) {
return $app.'_'.$class;
} else {
return $app;
}
}
Add an assocations-like "Edges" framework Summary: We have a lot of cases where we store object relationships, but it's all kind of messy and custom. Some particular problems: - We go to great lengths to enforce order stability in Differential revisions, but the implementation is complex and inelegant. - Some relationships are stored on-object, so we can't pull the inverses easily. For example, Maniphest shows child tasks but not parent tasks. - I want to add more of these and don't want to continue building custom stuff. - UIs like the "attach stuff to other stuff" UI need custom branches for each object type. - Stuff like "allow commits to close tasks" is notrivial because of nonstandard metadata storage. Provide an association-like "edge" framework to fix these problems. This is nearly identical to associations, with a few differences: - I put edge metadata in a separate table and don't load it by default, to keep edge rows small and allow large metadata if necessary. The on-edge metadata seemed to get abused a lot at Facebook. - I put a 'seq' column on the edges to ensure they have an explicit, stable ordering within a source and type. This isn't actually used anywhere yet, but my first target is attaching commits to tasks for T904. Test Plan: Made a mock page that used Editor and Query. Verified adding and removing edges, overwriting edges, writing and loading edge data, sequence number generation. Reviewers: btrahan Reviewed By: btrahan CC: aran, 20after4 Differential Revision: https://secure.phabricator.com/D2088
2012-04-05 00:30:21 +02:00
/**
* @task config
*/
abstract public function getApplicationName();
protected function getConnectionNamespace() {
return self::getStorageNamespace().'_'.$this->getApplicationName();
}
/**
* Break a list of escaped SQL statement fragments (e.g., VALUES lists for
* INSERT, previously built with @{function:qsprintf}) into chunks which will
* fit under the MySQL 'max_allowed_packet' limit.
*
* Chunks are glued together with `$glue`, by default ", ".
*
* If a statement is too large to fit within the limit, it is broken into
* its own chunk (but might fail when the query executes).
*/
public static function chunkSQL(
array $fragments,
$glue = ', ',
$limit = null) {
if ($limit === null) {
// NOTE: Hard-code this at 1MB for now, minus a 10% safety buffer.
// Eventually we could query MySQL or let the user configure it.
$limit = (int)((1024 * 1024) * 0.90);
}
$result = array();
$chunk = array();
$len = 0;
$glue_len = strlen($glue);
foreach ($fragments as $fragment) {
$this_len = strlen($fragment);
if ($chunk) {
// Chunks after the first also imply glue.
$this_len += $glue_len;
}
if ($len + $this_len <= $limit) {
$len += $this_len;
$chunk[] = $fragment;
} else {
if ($chunk) {
$result[] = $chunk;
}
$len = strlen($fragment);
$chunk = array($fragment);
}
}
if ($chunk) {
$result[] = $chunk;
}
foreach ($result as $key => $fragment_list) {
$result[$key] = implode($glue, $fragment_list);
}
return $result;
}
protected function assertAttached($property) {
if ($property === self::ATTACHABLE) {
throw new PhabricatorDataNotAttachedException($this);
}
return $property;
}
protected function assertAttachedKey($value, $key) {
$this->assertAttached($value);
if (!array_key_exists($key, $value)) {
throw new PhabricatorDataNotAttachedException($this);
}
return $value[$key];
}
Introduce ref cursors for repository parsing Summary: Ref T4327. I want to make change parsing testable; one thing which is blocking this is that the Git discovery process is still part of `PullLocal` daemon instead of being part of `DiscoveryEngine`. The unit test stuff which I want to use for change parsing relies on `DiscoveryEngine` to discover repositories during unit tests. The major reason git discovery isn't part of `DiscoveryEngine` is that it relies on the messy "autoclose" logic, which we never implemented for Mercurial. Generally, I don't like how autoclose was implemented: it's complicated and gross and too hard to figure out and extend. Instead, I want to do something more similar to what we do for pushes, which is cleaner overall. Basically this means remembering the old branch heads from the last time we parsed a repository, and figuring out what's new by comparing the old and new branch heads. This should give us several advantages: - It should be simpler to understand than the autoclose stuff, which is pretty mind-numbing, at least for me. - It will let us satisfy branch and tag queries cheaply (from the database) instead of having to go to the repository. We could also satisfy some ref-resolve queries from the database. - It should be easier to extend to Mercurial. This implements the basics -- pretty much a table to store the cursors, which we update only for Git for now. Test Plan: - Ran migration. - Ran `bin/repository discover X --trace --verbose` on various repositories with branches and tags, before and after modifying pushes. - Pushed commits to a git repo. - Looked at database tables. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T4327 Differential Revision: https://secure.phabricator.com/D7982
2014-01-16 21:05:30 +01:00
protected function detectEncodingForStorage($string) {
return phutil_is_utf8($string) ? 'utf8' : null;
}
protected function getUTF8StringFromStorage($string, $encoding) {
if ($encoding == 'utf8') {
return $string;
}
if (function_exists('mb_detect_encoding')) {
if (strlen($encoding)) {
$try_encodings = array(
$encoding,
);
} else {
// TODO: This is pretty much a guess, and probably needs to be
// configurable in the long run.
$try_encodings = array(
'JIS',
'EUC-JP',
'SJIS',
'ISO-8859-1',
);
}
$guess = mb_detect_encoding($string, $try_encodings);
if ($guess) {
return mb_convert_encoding($string, 'UTF-8', $guess);
}
}
Introduce ref cursors for repository parsing Summary: Ref T4327. I want to make change parsing testable; one thing which is blocking this is that the Git discovery process is still part of `PullLocal` daemon instead of being part of `DiscoveryEngine`. The unit test stuff which I want to use for change parsing relies on `DiscoveryEngine` to discover repositories during unit tests. The major reason git discovery isn't part of `DiscoveryEngine` is that it relies on the messy "autoclose" logic, which we never implemented for Mercurial. Generally, I don't like how autoclose was implemented: it's complicated and gross and too hard to figure out and extend. Instead, I want to do something more similar to what we do for pushes, which is cleaner overall. Basically this means remembering the old branch heads from the last time we parsed a repository, and figuring out what's new by comparing the old and new branch heads. This should give us several advantages: - It should be simpler to understand than the autoclose stuff, which is pretty mind-numbing, at least for me. - It will let us satisfy branch and tag queries cheaply (from the database) instead of having to go to the repository. We could also satisfy some ref-resolve queries from the database. - It should be easier to extend to Mercurial. This implements the basics -- pretty much a table to store the cursors, which we update only for Git for now. Test Plan: - Ran migration. - Ran `bin/repository discover X --trace --verbose` on various repositories with branches and tags, before and after modifying pushes. - Pushed commits to a git repo. - Looked at database tables. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T4327 Differential Revision: https://secure.phabricator.com/D7982
2014-01-16 21:05:30 +01:00
return phutil_utf8ize($string);
}
public function delete() {
// TODO: We should make some reasonable effort to destroy related
// infrastructure objects here, like edges, transactions, custom field
// storage, flags, Phrequent tracking, tokens, etc. This doesn't need to
// be exhaustive, but we can get a lot of it pretty easily.
return parent::delete();
}
}