1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-09-19 16:58:48 +02:00

Add a basic "fact" application

Summary:
Basic "Fact" application with some storage, part of a daemon, and a control binary.

= Goals =

The general idea is that we have various statistics we'd like to compute, like the frequency of image macros, reviewer responsiveness, task close rates, etc. Computing these on page load is expensive and messy. By building an ETL pipeline and running it in a daemon, we can precompute statistics and just pull them out of "stats" tables.

One way to do this is just to completely hard-code everything, e.g. have a daemon that runs every hour which issues a big-ass query and dumps results into a table per-fact or per fact-group. But this has a bunch of drawbacks: adding new stuff to the pipeline is a pain, various fact aggregators can't share much code, updates are slow and expensive, we can never build generic graphs on top of it, etc.

I'm hoping to build an ETL pipeline which is generic enough that we can use it for most things we're interested in without needing schema changes, and so that installs can use it also without needing schema changes, while still being specific enough that it's fast and we can build useful stuff on top of it. I'm not sure if this will actually work, but it would be cool if it does so I'm starting pretty generally and we'll see how far I get. I haven't built this exact sort of thing before so I might be way off.

I'm basing the whole thing on analyzing entire objects, not analyzing changes to objects. So each part of the pipeline is handed an object and told "analyze this", not handed a change. It pretty much deletes all the old data about that thing and then writes new data. I think this is simpler to implement and understand, and it protects us from all sorts of weird issues where we end up with some kind of garbage in the DB and have to wipe the whole thing.

= Facts =

The general idea is that we extract "facts" out of objects, and then the various view interfaces just report those facts. This change has on type of fact, a "raw fact", which is directly derived from an object. These facts are concerete and relate specifically to the object they are derived from. Some examples of such facts might be:

  D123 has 9 comments.
  D123 uses macro "psyduck" 15 times.
  D123 adds 35 lines.
  D123 has 5 files.
  D123 has 1 object.
  D123 has 1 object of type "DREV".
  D123 was created at epoch timestamp 89812351235.
  D123 was accepted by @alincoln at epoch timestamp 8397981839.

The fact storage looks like this:

  <factType, objectPHID, objectA, valueX, valueY, epoch>

Currently, we supprot one optional secondary key (like a user PHID or macro PHID), two optional integer values, and an optional timestamp. We might add more later. Each fact type can use these fields if it wants. Some facts use them, others don't. For instance, this diff adds a "N:*" fact, which is just the count of total objects in the system. These facts just look like:

  <"N:*", "PHID-xxxx-yyyy", ...>

...where all other fields are ignored. But some of the more complex facts might look like:

  <"DREV:accept", "PHID-DREV-xxxx", "PHID-USER-yyyy", ..., ..., nnnn> # User 'yyyy' accepted at epoch 'nnnn'.
  <"FILE:macro", "PHID-DREV-xxxx", "PHID-MACR-yyyy", 17, ..., ...> # Object 'xxxx' uses macro 'yyyy' 17 times.

Facts have no uniqueness constraints. For @vrana's reviewer responsiveness stuff, we can insert multiple rows for each reviewer, e.g.

  <"DREV:reviewed", "PHID-DREV-xxxx", "PHID-USER-yyyy", nnnn, ..., mmmm> # User 'yyyy' reviewed revision 'xxxx' after 'nnnn' seconds at 'mmmm'.

The second value (valueY) is mostly because we need it if we sample anything (valueX = observed value, valueY = sample rate) but there might be other uses. We might need to add "objectB" at some point too -- currently we can't represent a fact like "User X used macro Y on revision Z", so it would be impossible to compute macro use rates //for a specific user// based on this schema. I think we can start here though and see how far we get.

= Aggregated Facts =

These aren't implemented yet, but the idea is that we can then take the "raw facts" and compute derived/aggregated/rollup facts based on the raw fact table. For example, the "count" fact can be aggregated to arrive at a count of all objects in the system. This stuff will live in a separate table which does have uniqueness constraints, and come in the next diff.

We might need some kind of time series facts too, not sure about that. I think most of our use cases today are covered by raw facts + aggregated facts.

Test Plan: Ran `bin/fact` commands and verified they seemed to do reasonable things.

Reviewers: vrana, btrahan

Reviewed By: vrana

CC: aran, majak

Maniphest Tasks: T1562

Differential Revision: https://secure.phabricator.com/D3078
This commit is contained in:
epriestley 2012-07-27 13:34:21 -07:00
parent 14e3e5ccfd
commit 7c934e4176
15 changed files with 571 additions and 3 deletions

1
bin/fact Symbolic link
View file

@ -0,0 +1 @@
../scripts/fact/manage_facts.php

42
scripts/fact/manage_facts.php Executable file
View file

@ -0,0 +1,42 @@
#!/usr/bin/env php
<?php
/*
* Copyright 2012 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
$root = dirname(dirname(dirname(__FILE__)));
require_once $root.'/scripts/__init_script__.php';
$args = new PhutilArgumentParser($argv);
$args->setTagline('manage fact configuration');
$args->setSynopsis(<<<EOSYNOPSIS
**fact** __command__ [__options__]
Manage and debug Phabricator data extraction, storage and
configuration used to compute statistics.
EOSYNOPSIS
);
$args->parseStandardArguments();
$workflows = array(
new PhabricatorFactManagementDestroyWorkflow(),
new PhabricatorFactManagementAnalyzeWorkflow(),
new PhabricatorFactManagementStatusWorkflow(),
new PhabricatorFactManagementListWorkflow(),
new PhutilHelpArgumentWorkflow(),
);
$args->parseWorkflows($workflows);

View file

@ -624,7 +624,17 @@ phutil_register_library_map(array(
'PhabricatorEventEngine' => 'infrastructure/events/PhabricatorEventEngine.php', 'PhabricatorEventEngine' => 'infrastructure/events/PhabricatorEventEngine.php',
'PhabricatorEventType' => 'infrastructure/events/constant/PhabricatorEventType.php', 'PhabricatorEventType' => 'infrastructure/events/constant/PhabricatorEventType.php',
'PhabricatorExampleEventListener' => 'infrastructure/events/PhabricatorExampleEventListener.php', 'PhabricatorExampleEventListener' => 'infrastructure/events/PhabricatorExampleEventListener.php',
'PhabricatorFactsUpdateIterator' => 'applications/facts/extract/PhabricatorFactsUpdateIterator.php', 'PhabricatorFactCountEngine' => 'applications/fact/engine/PhabricatorFactCountEngine.php',
'PhabricatorFactDAO' => 'applications/fact/storage/PhabricatorFactDAO.php',
'PhabricatorFactDaemon' => 'applications/fact/daemon/PhabricatorFactDaemon.php',
'PhabricatorFactEngine' => 'applications/fact/engine/PhabricatorFactEngine.php',
'PhabricatorFactManagementAnalyzeWorkflow' => 'applications/fact/management/PhabricatorFactManagementAnalyzeWorkflow.php',
'PhabricatorFactManagementDestroyWorkflow' => 'applications/fact/management/PhabricatorFactManagementDestroyWorkflow.php',
'PhabricatorFactManagementListWorkflow' => 'applications/fact/management/PhabricatorFactManagementListWorkflow.php',
'PhabricatorFactManagementStatusWorkflow' => 'applications/fact/management/PhabricatorFactManagementStatusWorkflow.php',
'PhabricatorFactManagementWorkflow' => 'applications/fact/management/PhabricatorFactManagementWorkflow.php',
'PhabricatorFactRaw' => 'applications/fact/storage/PhabricatorFactRaw.php',
'PhabricatorFactUpdateIterator' => 'applications/fact/extract/PhabricatorFactUpdateIterator.php',
'PhabricatorFeedBuilder' => 'applications/feed/builder/PhabricatorFeedBuilder.php', 'PhabricatorFeedBuilder' => 'applications/feed/builder/PhabricatorFeedBuilder.php',
'PhabricatorFeedConstants' => 'applications/feed/constants/PhabricatorFeedConstants.php', 'PhabricatorFeedConstants' => 'applications/feed/constants/PhabricatorFeedConstants.php',
'PhabricatorFeedController' => 'applications/feed/controller/PhabricatorFeedController.php', 'PhabricatorFeedController' => 'applications/feed/controller/PhabricatorFeedController.php',
@ -1650,7 +1660,16 @@ phutil_register_library_map(array(
'PhabricatorEvent' => 'PhutilEvent', 'PhabricatorEvent' => 'PhutilEvent',
'PhabricatorEventType' => 'PhutilEventType', 'PhabricatorEventType' => 'PhutilEventType',
'PhabricatorExampleEventListener' => 'PhutilEventListener', 'PhabricatorExampleEventListener' => 'PhutilEventListener',
'PhabricatorFactsUpdateIterator' => 'PhutilBufferedIterator', 'PhabricatorFactCountEngine' => 'PhabricatorFactEngine',
'PhabricatorFactDAO' => 'PhabricatorLiskDAO',
'PhabricatorFactDaemon' => 'PhabricatorDaemon',
'PhabricatorFactManagementAnalyzeWorkflow' => 'PhabricatorFactManagementWorkflow',
'PhabricatorFactManagementDestroyWorkflow' => 'PhabricatorFactManagementWorkflow',
'PhabricatorFactManagementListWorkflow' => 'PhabricatorFactManagementWorkflow',
'PhabricatorFactManagementStatusWorkflow' => 'PhabricatorFactManagementWorkflow',
'PhabricatorFactManagementWorkflow' => 'PhutilArgumentWorkflow',
'PhabricatorFactRaw' => 'PhabricatorFactDAO',
'PhabricatorFactUpdateIterator' => 'PhutilBufferedIterator',
'PhabricatorFeedController' => 'PhabricatorController', 'PhabricatorFeedController' => 'PhabricatorController',
'PhabricatorFeedDAO' => 'PhabricatorLiskDAO', 'PhabricatorFeedDAO' => 'PhabricatorLiskDAO',
'PhabricatorFeedPublicStreamController' => 'PhabricatorFeedController', 'PhabricatorFeedPublicStreamController' => 'PhabricatorFeedController',

View file

@ -0,0 +1,121 @@
<?php
/*
* Copyright 2012 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
final class PhabricatorFactDaemon extends PhabricatorDaemon {
private $engines;
const RAW_FACT_BUFFER_LIMIT = 128;
public function run() {
throw new Exception("This daemon doesn't do anything yet!");
}
public function setEngines(array $engines) {
assert_instances_of($engines, 'PhabricatorFactEngine');
$this->engines = $engines;
return $this;
}
public function processIterator($iterator) {
$result = null;
$raw_facts = array();
foreach ($iterator as $key => $object) {
$raw_facts[$object->getPHID()] = $this->computeRawFacts($object);
if (count($raw_facts) > self::RAW_FACT_BUFFER_LIMIT) {
$this->updateRawFacts($raw_facts);
$raw_facts = array();
}
$result = $key;
}
if ($raw_facts) {
$this->updateRawFacts($raw_facts);
$raw_facts = array();
}
return $result;
}
private function computeRawFacts(PhabricatorLiskDAO $object) {
$facts = array();
foreach ($this->engines as $engine) {
if (!$engine->shouldComputeRawFactsForObject($object)) {
continue;
}
$facts[] = $engine->computeRawFactsForObject($object);
}
return array_mergev($facts);
}
private function updateRawFacts(array $map) {
foreach ($map as $phid => $facts) {
assert_instances_of($facts, 'PhabricatorFactRaw');
}
$phids = array_keys($map);
if (!$phids) {
return;
}
$table = new PhabricatorFactRaw();
$conn = $table->establishConnection('w');
$table_name = $table->getTableName();
$sql = array();
foreach ($map as $phid => $facts) {
foreach ($facts as $fact) {
$sql[] = qsprintf(
$conn,
'(%s, %s, %s, %d, %d, %d)',
$fact->getFactType(),
$fact->getObjectPHID(),
$fact->getObjectA(),
$fact->getValueX(),
$fact->getValueY(),
$fact->getEpoch());
}
}
$table->openTransaction();
queryfx(
$conn,
'DELETE FROM %T WHERE objectPHID IN (%Ls)',
$table_name,
$phids);
if ($sql) {
foreach (array_chunk($sql, 256) as $chunk) {
queryfx(
$conn,
'INSERT INTO %T
(factType, objectPHID, objectA, valueX, valueY, epoch)
VALUES %Q',
$table_name,
implode(', ', $chunk));
}
}
$table->saveTransaction();
}
}

View file

@ -0,0 +1,44 @@
<?php
/*
* Copyright 2012 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Simple fact engine which counts objects.
*/
final class PhabricatorFactCountEngine extends PhabricatorFactEngine {
public function shouldComputeRawFactsForObject(PhabricatorLiskDAO $object) {
return true;
}
public function computeRawFactsForObject(PhabricatorLiskDAO $object) {
$facts = array();
$phid = $object->getPHID();
$type = phid_get_type($phid);
foreach (array('N:*', 'N:'.$type) as $fact_type) {
$facts[] = id(new PhabricatorFactRaw())
->setFactType($fact_type)
->setObjectPHID($phid)
->setEpoch($object->getDateCreated());
}
return $facts;
}
}

View file

@ -0,0 +1,43 @@
<?php
/*
* Copyright 2012 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
abstract class PhabricatorFactEngine {
final public static function loadAllEngines() {
$classes = id(new PhutilSymbolLoader())
->setAncestorClass(__CLASS__)
->setConcreteOnly(true)
->selectAndLoadSymbols();
$objects = array();
foreach ($classes as $class) {
$objects[] = newv($class['name'], array());
}
return $objects;
}
public function shouldComputeRawFactsForObject(PhabricatorLiskDAO $object) {
return false;
}
public function computeRawFactsForObject(PhabricatorLiskDAO $object) {
return array();
}
}

View file

@ -21,7 +21,7 @@
* for "normal" Lisk objects: objects with an autoincrement ID and a * for "normal" Lisk objects: objects with an autoincrement ID and a
* dateModified column. * dateModified column.
*/ */
final class PhabricatorFactsUpdateIterator extends PhutilBufferedIterator { final class PhabricatorFactUpdateIterator extends PhutilBufferedIterator {
private $cursor; private $cursor;
private $object; private $object;

View file

@ -0,0 +1,47 @@
<?php
/*
* Copyright 2012 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
final class PhabricatorFactManagementAnalyzeWorkflow
extends PhabricatorFactManagementWorkflow {
public function didConstruct() {
$this
->setName('analyze')
->setSynopsis(pht('Manually invoke fact analyzers.'))
->setArguments(array());
}
public function execute(PhutilArgumentParser $args) {
$console = PhutilConsole::getConsole();
$daemon = new PhabricatorFactDaemon(array());
$daemon->setVerbose(true);
$daemon->setEngines(PhabricatorFactEngine::loadAllEngines());
$iterators = array(
new PhabricatorFactUpdateIterator(new DifferentialRevision()),
);
foreach ($iterators as $iterator) {
$daemon->processIterator($iterator);
}
return 0;
}
}

View file

@ -0,0 +1,58 @@
<?php
/*
* Copyright 2012 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
final class PhabricatorFactManagementDestroyWorkflow
extends PhabricatorFactManagementWorkflow {
public function didConstruct() {
$this
->setName('destroy')
->setSynopsis(pht('Destroy all facts.'))
->setArguments(array());
}
public function execute(PhutilArgumentParser $args) {
$console = PhutilConsole::getConsole();
$question = pht(
'Really destroy all facts? They will need to be rebuilt through '.
'analysis, which may take some time.');
$ok = $console->confirm($question, $default = false);
if (!$ok) {
return 1;
}
$tables = array();
$tables[] = new PhabricatorFactRaw();
foreach ($tables as $table) {
$conn = $table->establishConnection('w');
$name = $table->getTableName();
$console->writeOut("%s\n", pht("Destroying table '%s'...", $name));
queryfx(
$conn,
'TRUNCATE TABLE %T',
$name);
}
$console->writeOut("%s\n", pht('Done.'));
}
}

View file

@ -0,0 +1,40 @@
<?php
/*
* Copyright 2012 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
final class PhabricatorFactManagementListWorkflow
extends PhabricatorFactManagementWorkflow {
public function didConstruct() {
$this
->setName('list')
->setSynopsis(pht('Show a list of fact engines.'))
->setArguments(array());
}
public function execute(PhutilArgumentParser $args) {
$console = PhutilConsole::getConsole();
$engines = PhabricatorFactEngine::loadAllEngines();
foreach ($engines as $engine) {
$console->writeOut("%s\n", get_class($engine));
}
return 0;
}
}

View file

@ -0,0 +1,59 @@
<?php
/*
* Copyright 2012 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
final class PhabricatorFactManagementStatusWorkflow
extends PhabricatorFactManagementWorkflow {
public function didConstruct() {
$this
->setName('status')
->setSynopsis(pht('Show status of fact data.'))
->setArguments(array());
}
public function execute(PhutilArgumentParser $args) {
$console = PhutilConsole::getConsole();
$map = array(
'raw' => new PhabricatorFactRaw(),
);
foreach ($map as $type => $table) {
$conn = $table->establishConnection('r');
$name = $table->getTableName();
$row = queryfx_one(
$conn,
'SELECT COUNT(*) N FROM %T',
$name);
$n = $row['N'];
switch ($type) {
case 'raw':
$desc = pht('There are %d raw fact(s) in storage.', $n);
break;
}
$console->writeOut("%s\n", $desc);
}
return 0;
}
}

View file

@ -0,0 +1,26 @@
<?php
/*
* Copyright 2012 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
abstract class PhabricatorFactManagementWorkflow
extends PhutilArgumentWorkflow {
public function isExecutable() {
return true;
}
}

View file

@ -0,0 +1,31 @@
<?php
/*
* Copyright 2012 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
abstract class PhabricatorFactDAO extends PhabricatorLiskDAO {
public function getApplicationName() {
return 'fact';
}
public function getConfiguration() {
return array(
self::CONFIG_TIMESTAMPS => false,
) + parent::getConfiguration();
}
}

View file

@ -0,0 +1,32 @@
<?php
/*
* Copyright 2012 Facebook, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Raw fact about an object.
*/
final class PhabricatorFactRaw extends PhabricatorFactDAO {
protected $id;
protected $factType;
protected $objectPHID;
protected $objectA;
protected $valueX;
protected $valueY;
protected $epoch;
}

View file

@ -121,6 +121,11 @@ abstract class PhabricatorBaseEnglishTranslation
'changed %d revision(s), added %d: %s; removed %d: %s' => 'changed %d revision(s), added %d: %s; removed %d: %s' =>
'changed revisions, added %3$s; removed %5$s', 'changed revisions, added %3$s; removed %5$s',
'There are %d raw fact(s) in storage.' => array(
'There is %d raw fact in storage.',
'There are %d raw facts in storage.',
),
); );
} }