phorge-phorge

mirror of https://we.phorge.it/source/phorge.git synced 2024-11-29 10:12:41 +01:00

Author	SHA1	Message	Date
Joshua Spence	d6b882a804	Fix visiblity of `LiskDAO::getConfiguration()` Summary: Ref T6822. Test Plan: `grep` Reviewers: epriestley, #blessed_reviewers Reviewed By: epriestley, #blessed_reviewers Subscribers: hach-que, Korvin, epriestley Maniphest Tasks: T6822 Differential Revision: https://secure.phabricator.com/D11370	2015-01-14 06:54:13 +11:00
epriestley	8fa8415c07	Automatically build all Lisk schemata Summary: Ref T1191. Now that the whole database is covered, we don't need to do as much work to build expected schemata. Doing them database-by-database was helpful in converting, but is just reudndant work now. Instead of requiring every application to build its Lisk objects, just build all Lisk objects. I removed `harbormaster.lisk_counter` because it is unused. It would be nice to autogenerate edge schemata, too, but that's a little trickier. Test Plan: Database setup issues are all green. Reviewers: btrahan Reviewed By: btrahan Subscribers: epriestley, hach-que Maniphest Tasks: T1191 Differential Revision: https://secure.phabricator.com/D10620	2014-10-02 09:51:20 -07:00
epriestley	300172e799	Support AUTO_INCREMENT in `bin/storage adjust` Summary: Ref T1191. When changing the column type of an AUTO_INCREMENT column, we currently may lose the autoincrement attribute. Instead, support it. This is a bit messy because AUTO_INCREMENT columns interact with PRIMARY KEY columns (tables may only have one AUTO_INCREMENT column, and it must be a primary key). We need to migrate in more phases to avoid this issue. Introduce new `auto` and `auto64` types to represent autoincrement IDs. Test Plan: - Saw autoincrement show up correctly in web UI. - Fixed an autoincrement issue on the XHProf storage table with `bin/storage adjust` safely. Reviewers: btrahan Reviewed By: btrahan Subscribers: epriestley Maniphest Tasks: T1191 Differential Revision: https://secure.phabricator.com/D10607	2014-10-01 08:24:51 -07:00
epriestley	0d7489da79	Provide `bin/storage quickstart` to automate generation of `quickstart.sql` Summary: Ref T1191. Currently, the `quickstart.sql` gets generated in a pretty manual fashion. This is a pain, and will become more of a pain in the world of utf8mb4. Provide a workflow which does upgrade + adjust + dump + destroy, then massages the output to produce a workable `quickstart.sql`. Test Plan: Inspected output; I'll test this more throughly before actually generating a new quickstart, but that's some ways away. Reviewers: btrahan Reviewed By: btrahan Subscribers: epriestley Maniphest Tasks: T1191 Differential Revision: https://secure.phabricator.com/D10603	2014-10-01 08:22:37 -07:00
epriestley	dc8b2ae6d2	Generate expected schemata for Fact, Owners, Herald and Diviner Summary: Ref T1191. Notable: - `HeraldApplyTranscript` is not actually a DAO and has no table (it is serialized into HeraldTranscript). Test Plan: Down to fewer than 300 issues. Reviewers: btrahan Reviewed By: btrahan Subscribers: epriestley Maniphest Tasks: T1191 Differential Revision: https://secure.phabricator.com/D10588	2014-10-01 07:53:12 -07:00
vrana	ef85f49adc	Delete license headers from files Summary: This commit doesn't change license of any file. It just makes the license implicit (inherited from LICENSE file in the root directory). We are removing the headers for these reasons: - It wastes space in editors, less code is visible in editor upon opening a file. - It brings noise to diff of the first change of any file every year. - It confuses Git file copy detection when creating small files. - We don't have an explicit license header in other files (JS, CSS, images, documentation). - Using license header in every file is not obligatory: http://www.apache.org/dev/apply-license.html#new. This change is approved by Alma Chao (Lead Open Source and IP Counsel at Facebook). Test Plan: Verified that the license survived only in LICENSE file and that it didn't modify externals. Reviewers: epriestley, davidrecordon Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T2035 Differential Revision: https://secure.phabricator.com/D3886	2012-11-05 11:16:51 -08:00
epriestley	f0af273165	Add FactCursors and application fact datasources Summary: - Add PhabricatorApplication. This is a general class that I have grand designs for, but used here to allow applications to provide objects for analysis by the facts appliction. - Add FactCursors, to keep track of where iterators are. - Make the daemon do something sort of useful. - Add `bin/fact cursors` for showing and managing objects and cursors. - Add some options to `bin/fact analyze`. Test Plan: - `bin/fact cursors`, `bin/fact cursors --reset DifferentialRevision`, `bin/fact cursors --reset X` - `bin/fact analyze`, `bin/fact analyze --all`, `bin/fact analyze --iterator DifferentialRevision --skip-aggregates` - `bin/phd debug fact` Reviewers: vrana, btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1562 Differential Revision: https://secure.phabricator.com/D3098	2012-07-30 10:43:49 -07:00
epriestley	486f7c1e8e	Add aggregated facts to the Facts application Summary: Some facts are aggregations of other facts. For example, we may compute how many times each macro is used in each object as a "raw fact": Dnnn uses macro "psyduck" 6 times. But we want to present this data in aggregate form, e.g. "order macros by popularity". We can do this at runtime and it probably won't be too awful a query, but we can also aggregate it cheaply: Macro "psyduck" is used 3920 times across all objects. ...and then do a query like "select macros ordered by usage". "Aggregate" facts support facts like this. The aggregate facts I've implemented are: - Count of all objects. - Count of objects of type X. - Last time facts were updated. These clearly fit the "aggregate" facts template well. I'm not 100% sure macros do. We can use this table to answer a question like "What are the most popular macros, ordered by use?" We can also use it to answer a question like "What are the most popular macros in the last 6 months?", if we build a specific fact for that. But we can't use it to answer a question like "What are the most popular macros between times X and Y?". Maybe that's important; maybe not. This seems like a good fit for at least some types of facts. I'll de-magic the keys a bit in the next diff. Test Plan: Ran the engines and got some aggregated facts about other facts. Reviewers: vrana, btrahan Reviewed By: vrana CC: aran Maniphest Tasks: T1562 Differential Revision: https://secure.phabricator.com/D3089	2012-07-27 13:46:01 -07:00
epriestley	7c934e4176	Add a basic "fact" application Summary: Basic "Fact" application with some storage, part of a daemon, and a control binary. = Goals = The general idea is that we have various statistics we'd like to compute, like the frequency of image macros, reviewer responsiveness, task close rates, etc. Computing these on page load is expensive and messy. By building an ETL pipeline and running it in a daemon, we can precompute statistics and just pull them out of "stats" tables. One way to do this is just to completely hard-code everything, e.g. have a daemon that runs every hour which issues a big-ass query and dumps results into a table per-fact or per fact-group. But this has a bunch of drawbacks: adding new stuff to the pipeline is a pain, various fact aggregators can't share much code, updates are slow and expensive, we can never build generic graphs on top of it, etc. I'm hoping to build an ETL pipeline which is generic enough that we can use it for most things we're interested in without needing schema changes, and so that installs can use it also without needing schema changes, while still being specific enough that it's fast and we can build useful stuff on top of it. I'm not sure if this will actually work, but it would be cool if it does so I'm starting pretty generally and we'll see how far I get. I haven't built this exact sort of thing before so I might be way off. I'm basing the whole thing on analyzing entire objects, not analyzing changes to objects. So each part of the pipeline is handed an object and told "analyze this", not handed a change. It pretty much deletes all the old data about that thing and then writes new data. I think this is simpler to implement and understand, and it protects us from all sorts of weird issues where we end up with some kind of garbage in the DB and have to wipe the whole thing. = Facts = The general idea is that we extract "facts" out of objects, and then the various view interfaces just report those facts. This change has on type of fact, a "raw fact", which is directly derived from an object. These facts are concerete and relate specifically to the object they are derived from. Some examples of such facts might be: D123 has 9 comments. D123 uses macro "psyduck" 15 times. D123 adds 35 lines. D123 has 5 files. D123 has 1 object. D123 has 1 object of type "DREV". D123 was created at epoch timestamp 89812351235. D123 was accepted by @alincoln at epoch timestamp 8397981839. The fact storage looks like this: <factType, objectPHID, objectA, valueX, valueY, epoch> Currently, we supprot one optional secondary key (like a user PHID or macro PHID), two optional integer values, and an optional timestamp. We might add more later. Each fact type can use these fields if it wants. Some facts use them, others don't. For instance, this diff adds a "N:" fact, which is just the count of total objects in the system. These facts just look like: <"N:", "PHID-xxxx-yyyy", ...> ...where all other fields are ignored. But some of the more complex facts might look like: <"DREV:accept", "PHID-DREV-xxxx", "PHID-USER-yyyy", ..., ..., nnnn> # User 'yyyy' accepted at epoch 'nnnn'. <"FILE:macro", "PHID-DREV-xxxx", "PHID-MACR-yyyy", 17, ..., ...> # Object 'xxxx' uses macro 'yyyy' 17 times. Facts have no uniqueness constraints. For @vrana's reviewer responsiveness stuff, we can insert multiple rows for each reviewer, e.g. <"DREV:reviewed", "PHID-DREV-xxxx", "PHID-USER-yyyy", nnnn, ..., mmmm> # User 'yyyy' reviewed revision 'xxxx' after 'nnnn' seconds at 'mmmm'. The second value (valueY) is mostly because we need it if we sample anything (valueX = observed value, valueY = sample rate) but there might be other uses. We might need to add "objectB" at some point too -- currently we can't represent a fact like "User X used macro Y on revision Z", so it would be impossible to compute macro use rates //for a specific user// based on this schema. I think we can start here though and see how far we get. = Aggregated Facts = These aren't implemented yet, but the idea is that we can then take the "raw facts" and compute derived/aggregated/rollup facts based on the raw fact table. For example, the "count" fact can be aggregated to arrive at a count of all objects in the system. This stuff will live in a separate table which does have uniqueness constraints, and come in the next diff. We might need some kind of time series facts too, not sure about that. I think most of our use cases today are covered by raw facts + aggregated facts. Test Plan: Ran `bin/fact` commands and verified they seemed to do reasonable things. Reviewers: vrana, btrahan Reviewed By: vrana CC: aran, majak Maniphest Tasks: T1562 Differential Revision: https://secure.phabricator.com/D3078	2012-07-27 13:34:21 -07:00

9 commits