Summary:
Ref PHI174. This reverts most of these changes:
- 37843127e9 / D18481
- 94cad30ac3 / D18474
- 12ae08b6b1 / D18473
- 0a01334172 / D18462
- ac91ab1ef9 / D18452
These changes made the Diffusion blame view very similar to GitHub's blame view. See D18452 for a before/after of the bulk of these changes; the other revisions are bugfixes.
I think this was generally a step backward, and not motivated by solving a specific problem. I've found the new UI less usable than the old one, and at least one install (see PHI174) also has.
In particular, the revision/commit titles are very bulky and not terribly useful; the date column also isn't terribly useful; the "age" color actually IS pretty useful and was heavily de-emphasized.
I've kept one bugfix here (missing `'a'` tag type) and kept the upgraded icon for "Skip Past This Commit".
I'm going to follow this up with some additional changes:
- Show a small author profile icon, similar to GitHub, to address PHI174 more directly.
- Try a zebra-stripe on blocks of rows to make it more clear where the lines affected by a particular commit begin and end.
- Try a hue shift, not just a brightness/saturation shift, to make the "age" color more distinct.
- Try computing colors as even steps, not based purely on age. Currently, if a file has one long-distant commit and several recent commits, all the recent ones show up as very bright green. I think this would probably be more useful if they were distributed more evenly across the available color bands.
Test Plan:
Viewed blame views in Diffusion, saw a more compact UI similar to the old UI.
{F5251019}
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18746
Summary: Ref T2543. After D18731, Herald build rules run more often, but now incorrectly try to run builds when Diffusion closes a revision because a commit landed.
Test Plan: Made some mundane updates locally; this is tricky to test comprehensively locally so I'm mostly planning to just push it to `secure`.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T2543
Differential Revision: https://secure.phabricator.com/D18745
Summary:
Ref T2543. Instead of autosubmitting revisions to "Needs Review" when builds finish, allow them to be held in "Draft" indefinitely.
There's currently no UI for this. I plan to just expose it as `arc diff --draft` for now, in a followup change.
Test Plan:
- Created a revision (via Conduit) with "hold as draft", saw it hold as draft after builds finished.
- Created a revision (normally), saw it autosubmit after builds finished.
- Requested review of a "hold as draft" revision to kick it out of draft state.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T2543
Differential Revision: https://secure.phabricator.com/D18737
Summary:
Depends on D18740. Prepares `arc` to receive a `--draft` flag by letting us switch to "differential.revision.edit" instead of "differential.createrevision".
To "differential.revision.edit", we need a transaction list, but we can't automatically construct this list from a field map. Return the transaction list alongside the field map.
The next change uses this list (if available) to switch us to the modern API method.
Test Plan: Ran `arc diff` on the experimental branch with followup changes, got a new revision.
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18741
Summary: See PHI90. For now, this only provides a limited amount of information, but should satisfy the use case in PHI90 and build toward a more complete version in the future.
Test Plan: Used new Conduit method to retrieve information about diffs.
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18744
Summary:
Ref T2543. Fixes T10109.
Currently, Herald only runs in Differential when a change updates the diff. This is partly for historical reasons, and partly because we don't want to restart builds every time someone makes a comment. However, this behavior is inconsistent with other applications (which always trigger on any change), and occasionally confusing to users (in T10109, for example) or otherwise undesirable.
A similar issue is that T2543 has introduced a "Draft" state, where revisions don't send normal mail until builds finish. This interacts poorly with "Send me an email" rules (which shouldn't do anything here) and particularly with "Send me an email + only run these actions the first time the rule matches", since that might have an effect like "do nothing when the revision is created, then never do anything again since you already did nothing once".
To navigate both of these issues, let objects tell Herald that certain actions (like mail or builds) are currently forbidden. If a rule uses a field or action which is currently forbidden, the whole rule automatically fails before it executes, but doesn't count toward "only the first time" as far as Herald's tracking of rule execution is concerned.
Then, forbid mail for draft revisions, and forbid builds for revisions which didn't just get updated. Forbidding mail fixes the issues with "Send me an email" that were created by the introduction of the draft state.
Finally, make Herald run on every revision update, not just substantive updates to the diff. This resolves T10109.
Test Plan:
Created revisions via the draft -> submit workflow, saw different transcripts. Here's a mail action being forbidden for a draft:
{F5237324}
Here's a build action being forbidden for a "mundane" update:
{F5237326}
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T10109, T2543
Differential Revision: https://secure.phabricator.com/D18731
Summary: Depends on D18734. See PHI176. We run this query on the main Drydock lease web UI, among other places. There is currently no `status` key which can satisfy it.
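The fix is presumably a new key on `status`; a rough sketch of the schema change (the exact definition is an assumption, though the key name matches the `key_status` key visible in the plans below):

```
ALTER TABLE `drydock_lease` ADD KEY `key_status` (`status`);
```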
Test Plan:
Viewed Drydock lease page to get the query.
Ran ##explain SELECT * FROM `drydock_lease` WHERE (status IN ('pending', 'acquired', 'active')) ORDER BY `id` DESC LIMIT 101;## before and after the change.
I don't have a ton of leases locally, so the unkeyed EXPLAIN isn't //that// bad, but the comparison still shows that we're getting a better key. Before:
```
mysql> explain SELECT * FROM `drydock_lease` WHERE (status IN ('pending', 'acquired', 'active')) ORDER BY `id` DESC LIMIT 101;
+----+-------------+---------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | drydock_lease | index | NULL | PRIMARY | 4 | NULL | 101 | Using where |
+----+-------------+---------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
```
After:
```
mysql> explain SELECT * FROM `drydock_lease` WHERE (status IN ('pending', 'acquired', 'active')) ORDER BY `id` DESC LIMIT 101;
+----+-------------+---------------+-------+---------------+------------+---------+------+------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+------------+---------+------+------+---------------------------------------+
| 1 | SIMPLE | drydock_lease | range | key_status | key_status | 130 | NULL | 5 | Using index condition; Using filesort |
+----+-------------+---------------+-------+---------------+------------+---------+------+------+---------------------------------------+
1 row in set (0.00 sec)
```
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Differential Revision: https://secure.phabricator.com/D18735
Summary: See PHI176. Depends on D18733. We issue a query when deleting files that currently doesn't hit any keys.
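Presumably the change adds a key covering both columns in the WHERE clause; a sketch of the schema change (the exact definition is an assumption, though the `key_engine` name matches the plan below):

```
ALTER TABLE `file` ADD KEY `key_engine` (`storageEngine`, `storageHandle`);
```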
Test Plan:
Ran `./bin/remove destroy --force --trace F56376` to get the query.
Ran ##SELECT * FROM `file` WHERE storageEngine = 'blob' AND storageHandle = '23366' LIMIT 1## before and after the change.
Before:
```
mysql> explain SELECT * FROM `file` WHERE storageEngine = 'blob' AND storageHandle = '23366' LIMIT 1;
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | file | ALL | NULL | NULL | NULL | NULL | 33866 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.01 sec)
```
After:
```
mysql> explain SELECT * FROM `file` WHERE storageEngine = 'blob' AND storageHandle = '23366' LIMIT 1;
+----+-------------+-------+------+---------------+------------+---------+-------------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------------+---------+-------------+------+------------------------------------+
| 1 | SIMPLE | file | ref | key_engine | key_engine | 388 | const,const | 190 | Using index condition; Using where |
+----+-------------+-------+------+---------------+------------+---------+-------------+------+------------------------------------+
1 row in set (0.00 sec)
```
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18734
Summary: See PHI176. This is issued periodically by the garbage collector. Normally this table is relatively small-ish so this missing key isn't hugely noticeable.
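A sketch of the kind of key this adds (exact definition is an assumption; the `key_epoch` name matches the plan below):

```
ALTER TABLE `daemon_logevent` ADD KEY `key_epoch` (`epoch`);
```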
Test Plan:
Ran `./bin/garbage collect --collector daemon.processes --trace` to get the query the GC runs.
Ran ##DELETE FROM `daemon_logevent` WHERE epoch < 1508443504 LIMIT 100## before and after adding the key, and saw a much better query plan afterward:
Before:
```
mysql> explain DELETE FROM `daemon_logevent` WHERE epoch < 1508443504 LIMIT 100;
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | daemon_logevent | ALL | NULL | NULL | NULL | NULL | 19325 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.00 sec)
```
After:
```
mysql> explain DELETE FROM `daemon_logevent` WHERE epoch < 1508443504 LIMIT 100;
+----+-------------+-----------------+-------+---------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+---------------+-----------+---------+-------+------+-------------+
| 1 | SIMPLE | daemon_logevent | range | key_epoch | key_epoch | 4 | const | 1 | Using where |
+----+-------------+-----------------+-------+---------------+-----------+---------+-------+------+-------------+
1 row in set (0.00 sec)
```
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18733
Summary: See PHI176. We issue a query with only `artifactIndex` from `BuildTarget`, but don't have an applicable key.
Test Plan: This isn't on the normal Harbormaster execution path so I'm not 100% sure I have a local repro, but will confirm with customer.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Differential Revision: https://secure.phabricator.com/D18732
Summary: See PHI177. Ref T12974. PonderQuestion was overlooked during the Ferret engine conversions.
Test Plan:
Ran migrations, searched for questions, got results:
{F5241185}
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T12974
Differential Revision: https://secure.phabricator.com/D18736
Summary:
Ref T2543. When Harbormaster finishes builds and promotes a draft revision to review, we currently publish "Harbormaster requested review of...".
Instead, attribute this action to the author, since that's more natural and more useful.
Test Plan: Promoted a diff locally, saw it attributed to me rather than Harbormaster.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T2543
Differential Revision: https://secure.phabricator.com/D18730
Summary:
See PHI165. See D18715. These objects (projects, blogs) also need implementations now.
(I thought about making this method `abstract` or doing try/catch to maybe make this more robust, but I think this should be the end of it, and those changes have mild complexity/compatibility/risk issues.)
Test Plan: Changed `bin/search index` to index only one document of each type, ran `bin/search index --all --force`, saw no more comment-related errors.
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18729
Summary:
Ref T12974. Ferret object queries SELECT a virtual "_ft_rank" column for relevance ordering.
Currently, they always SELECT this column. That's fine and doesn't hurt anything, but makes developing and debugging things kind of a pain since every query has this `, blah blah _ft_rank` junk.
Instead, construct this column only if we're actually going to use it.
Mostly, this cleans up DarkConsole / query logs a bit.
Test Plan:
Viewed normal query results on various pages, viewed global search results, ordered Maniphest tasks by normal stuff and by "Relevance".
Viewed DarkConsole, saw no more "_ft_rank" junk on normal pages.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T12974
Differential Revision: https://secure.phabricator.com/D18728
Summary: Ref T12680. Use the slightly sleeker construction from D18722 in Differential.
Test Plan: Viewed revision list, reordered by date modified.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T12680
Differential Revision: https://secure.phabricator.com/D18727
Summary:
Ref T12680. See PHI167. See that task for discussion.
Rewrite `DiffusionCommitQuery` to work more like `DifferentialRevisionQuery`, and use a UNION to find "all revisions you need to audit OR respond to".
I tried to get this working a little more cleanly than RevisionQuery does, and can probably simplify that now.
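Conceptually, this replaces a single query with an OR across differently-indexed conditions with a UNION of two simpler queries, each of which can use its own key. A rough sketch of the shape only; the table and column names here are illustrative, not the actual schema:

```
(SELECT commit.* FROM repository_commit commit
   JOIN audit_request req ON req.commitPHID = commit.phid
  WHERE req.auditorPHID IN ('...'))
UNION
(SELECT commit.* FROM repository_commit commit
  WHERE commit.authorPHID = '...'
    AND commit.auditStatus IN ('...'))
ORDER BY id DESC LIMIT 101;
```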
Test Plan: Poked at the UI locally without hitting any apparent issues, but my local data is pretty garbage at this point. I'll take a look at how the query plans work on `secure`.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T12680
Differential Revision: https://secure.phabricator.com/D18722
Summary:
Ref T2543. When a revision is created, we check if any builds are waiting/failed, and submit it for review immediately if we aren't waiting for anything.
In doing this, we ignore builds with only autotargets, since these are client-side and failures from local `arc lint` / `arc unit` should not count (the user has already chosen to ignore/skip them).
The way we do this has some issues:
- Herald may have started builds, but they may still be PENDING and not have any targets yet. In this case, we'll see "no non-autotargets" and ignore the build, which is wrong.
- We have to load targets but don't really care about them, which is more work than we really need to do.
- And it's kind of complex, too.
Instead, just let `BuildQuery` filter out "autobuilds" (builds generated from autoplans) with a JOIN.
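Roughly, the query gains a shape like the following (table and column names are approximate, not the exact schema); the idea is to join builds to their plans and exclude plans that are autoplans:

```
SELECT build.* FROM harbormaster_build build
  JOIN harbormaster_buildplan plan
    ON plan.phid = build.buildPlanPHID
 WHERE plan.planAutoKey IS NULL        -- autoplans have a non-null auto key
   AND build.buildablePHID IN ('...');
```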
Test Plan: Ran `arc diff` with builds configured, got a clean "Draft" state instead of an incorrect promotion directly to "Needs Review".
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T2543
Differential Revision: https://secure.phabricator.com/D18721
Summary:
See PHI158. In the RefEngine, we test if any old branch positions have been removed from the repository. This is uncommon (but not impossible) in Mercurial, and corresponds to users deleting branches in Git.
Currently, we end up running `hg log` for each position, in parallel. Because of Python's large startup overhead, this can be resource intensive for repositories with a large number of branches.
We have to do this in the general case because the caller may be asking us to resolve `tip`, `newfeature`, `tip~3`, `9`, etc. However, in the specific case where the refs are 40-digit hashes, we can bulk resolve them if they exist, like this:
```
hg log ... --rev (abcd or def0 or ab12 or ...)
```
In the general case, we could probably do less of this than we currently do (instead of testing all old heads, we could prune the list by removing commits which we know are still pointed to by current heads) but that's a slightly more involved change and the effect here is already dramatic.
Test Plan:
Verified that CPU usage drops from ~110s -> ~0.9s:
Before:
```
epriestley@orbital ~/dev/phabricator $ time ./bin/repository refs nss
Updating refs in "nss"...
Done.
real 0m14.676s
user 1m24.714s
sys 0m21.645s
```
After:
```
epriestley@orbital ~/dev/phabricator $ time ./bin/repository refs nss
Updating refs in "nss"...
Done.
real 0m0.861s
user 0m0.882s
sys 0m0.213s
```
- Manually resolved `blue`, `tip`, `9`, etc., got expected results.
- Tried to resolve invalid hashes, got expected result (no resolution).
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18717
Summary: Most of this document is no longer relevant, since we're happy to work on prototypes if you're paying us, and we no longer have any meaningful free support.
Test Plan: Read document.
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18719
Summary: See rPcd14194a329788d5fff6365bcade278fd18f3612 for a similar change. Implement `getApplicationTransactionCommentObject()` to return `null` explicitly.
Test Plan: Ran `bin/search index --type ownerspackage`, got indexing after change.
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18715
Summary: Ref T2543. When revisions are in the draft state, tell the user what we're waiting for or why they aren't moving forward.
Test Plan: {F5228840}
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T2543
Differential Revision: https://secure.phabricator.com/D18714
Summary: Ref T2543. This is a less ambitious version of the rule in D18628, which I backed off from, since I think this probably still has a fair number of loose ends to tie up.
Test Plan: Created a revision locally.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T2543
Differential Revision: https://secure.phabricator.com/D18713
Summary:
Ref T13000. Since other changes have generally made the ngrams table manageable, I'm not planning to enable common ngrams by default at this time.
Instead, make the threshold configurable with "--threshold" so we can guide installs through tuning this if they want (e.g. PHI110), and tune hosted instances.
(This might eventually become automatic, but just smoothing this bit off for now feels reasonable to me.)
Test Plan: Ran with `--reset`, and with various invalid and valid `--threshold` arguments.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13000
Differential Revision: https://secure.phabricator.com/D18710
Summary:
Ref T13008. We haven't hit any issues with this, but I can imagine we might in the future.
When one host makes an intracluster request to another host, the `$viewer` ends up as the omnipotent viewer. This viewer isn't logged in, so they'll currently accumulate rate limit points at a high rate.
Instead, don't give them any points. These requests are always legitimate, and if they originated from a user request, that request should be the one getting rate limited.
Test Plan: Browsed around.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13008
Differential Revision: https://secure.phabricator.com/D18708
Summary:
Ref T13000. The new approach for dumping database-by-database means that we don't get CREATE DATABASE or USE statements, which makes importing the dump again inconvenient.
Manually stitch these into the dump.
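Concretely, each per-database section of the dump now starts with statements along these lines (the database name and the `IF NOT EXISTS` modifier here are illustrative):

```
CREATE DATABASE IF NOT EXISTS `phabricator_maniphest`;
USE `phabricator_maniphest`;
```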
Test Plan:
- Used `bin/storage dump --namespace ...` to dump a smaller local instance.
- Used `bin/storage destroy --namespace ...` to destroy the namespace, then imported the dump cleanly.
- Verified that each CREATE DATABASE statement appears only once.
- Verified that `bin/storage renamespace --live` can correctly process this file.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13000
Differential Revision: https://secure.phabricator.com/D18707
Summary: See PHI147.
Test Plan: Called the method from the web UI, got sensible results.
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18706
Summary:
Depends on D18702. Ref T13008. This replaces the old hard-coded single rate limit with multiple flexible limits, and defines two types of limits:
- Rate: reject requests if a client has completed too many requests recently.
- Connection: reject requests if a client has too many more connections than disconnections recently.
The connection limit adds +1 to the score for each connection, then adds -1 for each disconnection. So the overall number is how many open connections they have, at least approximately.
Supporting multiple limits will let us do limiting by Hostname and by remote address (e.g., a specific IP can't exceed a low limit, and all requests to a hostname can't exceed a higher limit).
Configuring the new limits looks something like this:
```
PhabricatorStartup::addRateLimit(new PhabricatorClientRateLimit())
->setLimitKey('rate')
->setClientKey($_SERVER['REMOTE_ADDR'])
->setLimit(5);
PhabricatorStartup::addRateLimit(new PhabricatorClientConnectionLimit())
->setLimitKey('conn')
->setClientKey($_SERVER['REMOTE_ADDR'])
->setLimit(2);
```
Test Plan:
- Configured limits as above.
- Made a lot of requests, got cut off by the rate limit.
- Used `curl --limit-rate -F 'data=@the_letter_m.txt' ...` to upload files really slowly. Got cut off by the connection limit. With `enable_post_data_reading` off, this correctly killed the connections //before// the uploads finished.
- I'll send this stuff to `secure` before production to give it more of a chance.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13008
Differential Revision: https://secure.phabricator.com/D18703
Summary:
Ref T13008. Depends on D18701. The overall goal here is to make turning `enable_post_data_reading` off not break things, so we can run rate limiting checks before we read file uploads.
The biggest blocker for this is that turning it off stops `$_FILES` from coming into existence.
This //appears// to mostly work. Specifically:
- Skip the `max_post_size` check when POST is off, since it's meaningless.
- Don't read or scrub $_POST at startup when POST is off.
- When we rebuild REQUEST and POST before processing requests, do multipart parsing if we need to and rebuild FILES.
- Skip the `is_uploaded_file()` check if we built FILES ourselves.
This probably breaks a couple of small things, like maybe `__profile__` and other DarkConsole triggers over POST, and probably some other weird stuff. The parsers may also need more work than they've received so far.
I also need to verify that this actually works (i.e., lets us run code without reading the request body) but I'll include that in the change where I update the actual rate limiting.
Test Plan:
- Disabled `enable_post_data_reading`.
- Uploaded a file with a vanilla upload form (project profile image).
- Uploaded a file with drag and drop.
- Used DarkConsole.
- Submitted comments.
- Created a task.
- Browsed around.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13008
Differential Revision: https://secure.phabricator.com/D18702
Summary:
Ref T13000. Garbage collecting common ngrams is slow because MySQL isn't all that great at deleting rows quickly. See PHI96, where it looks like it's going to take a week to GC ngrams for a ~million objects at a relatively conservative 0.15 threshold.
In the event of a restore, we can reduce the impact by persisting this table so the ngrams just don't get built when the reindex happens.
Test Plan: Viewed schema in Config, saw common ngrams tables marked as "Data" instead of "Index".
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13000
Differential Revision: https://secure.phabricator.com/D18696
Summary: Noticed a couple of typos in the docs, and then things got out of hand.
Test Plan:
- Stared at the words until my eyes watered and the letters began to swim on the screen.
- Consulted a dictionary.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Differential Revision: https://secure.phabricator.com/D18693
Summary:
Ref PHI109. Ref T11786. We currently test elapsed time every 64 iterations (since iterations are normally very fast), but at least one install is seeing the page timeout after 30 seconds.
One reason could be that cache fills may occur, and these are likely to be much slower than normal iterations. In an extreme case, we could do 64 cache fills before checking the time. Tweak things so that we always check the time after doing a cache fill, regardless of how many iterations have elapsed since the last check.
Additionally, this API method currently accepts an arbitrary number of paths, but implicitly limits each cache query to 500ms. If more than 60 paths are passed, this may exceed 30s. Only let the cache churn for a maximum of 10s across all paths.
If this is more the latter issue than the former, this might replace the GraphCache timeouts with `git` timeouts, but at least our understanding of what's going on here will improve.
Test Plan: This is difficult to test convincingly locally, since I can't reproduce the original issue. It still works after these changes, but it worked fine before these changes too.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T11786
Differential Revision: https://secure.phabricator.com/D18692
Summary:
Ref T13000. Currently, queries can only be executed from the web UI, which requires logging in as a user. I really want to avoid doing that wherever we can, but being able to execute queries on an instance (and, particularly, see the ngrams and timings on the underlying lookups) would have been helpful in several cases.
Improve tooling a bit in advance of the "common ngrams" stuff going out since it seems likely that it will be useful if issues arise.
Test Plan: Ran `bin/search query --query ...`, got useful minimal output. Ran with `--trace` to get internals.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13000
Differential Revision: https://secure.phabricator.com/D18690
Summary:
Ref T13000. After an ngram is marked as "common", we can delete it from the storage table.
Currently, the only way to get ngrams marked as "common" is to manually run `bin/search ngrams`, so this has no impact on normal installs.
Test Plan: Ran `bin/garbage collect`, saw it start chewing through my local Maniphest ngrams table and removing common ngrams.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13000
Differential Revision: https://secure.phabricator.com/D18687
Summary:
Ref T13000. This marks each table as either "data" (normal data), "cache" (automatically rebuilt, no need to ever dump) or "index" (can be manually rebuilt).
By default, `bin/storage dump` dumps data and index tables, but not cache tables.
With `--no-indexes`, it dumps only data tables. Indexes can be rebuilt after a restore with `bin/search index --all ...`.
Test Plan:
- Ran `--no-indexes` and normal dumps with `--trace`, verified that cache and index (former case) or cache only (latter case) tables were dumped with `--no-data`.
- Verified dump has the same number of `CREATE TABLE` statements as before the changes.
- Reviewed persistence tags in the web UI (note Ferret engine tables are "Index"):
{F5210886}
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13000
Differential Revision: https://secure.phabricator.com/D18682
Summary:
Ref T13000. This allows us to be more selective about which tables we dump data for, to reduce the size of backups and exports. The immediate goal is to make large `ngrams` tables more manageable in the cluster, but this generally makes all backups and exports faster and easier.
Here, tables are dumped one at a time. A followup change will sometimes add the `--no-data` flag, to skip dumping readthrough caches and (optionally) rebuildable indexes.
Test Plan: Compared a dump from `master` and from this branch, found them to be essentially identical. The new dump has a little more header information in each section. Verified each contains the same number of `CREATE TABLE` statements.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13000
Differential Revision: https://secure.phabricator.com/D18679
Summary: See PHI118. Enables hovercards to support peeking at tags and other details if you, e.g., create numerous identical subtasks of each task.
Test Plan: {F5210816}
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18681
Summary:
If `account.editable` is set to false, we try to add a `null` button and fatal:
> Argument 1 passed to PHUIHeaderView::addActionLink() must be an instance of PHUIButtonView, null given, called in /srv/phabricator/phabricator/src/applications/settings/panel/PhabricatorSettingsPanel.php on line 290
Instead, don't try to render `null` as a button.
Test Plan:
- Configured `account.editable` false.
- Viewed email address settings.
- Before: fatal.
- After: page works, no button is provided.
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18677
Summary:
Depends on D18672. Ref T13000. This does an on-demand build of the common ngrams table.
Plan here is:
- Push to `secure`.
- Build the common ngrams table here.
- See if stuff breaks?
If it looks okay on this dataset, we can build out the GC support and try it in production.
Test Plan:
- Locally, my dataset has a bunch of `bin/lipsum` tasks with similar, common words.
- Verified that ipsum terms now skip ngrams. For "lorem ipsum", search performance actually IMPROVED by skipping the ngrams table (12s to 9s).
- Queried for normal terms, got very fast results using the ngram table, as normal.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13000
Differential Revision: https://secure.phabricator.com/D18673
Summary:
Ref T13000. This adds support for tracking "common" ngrams, which occur in too many documents to be useful as part of the ngram index.
If an ngram is listed in the "common" table, it won't be written when indexing documents, or queried for when searching for them.
In this change, nothing actually writes to the "common" table. I'll start writing to the table in a followup change.
Specifically, I plan to do this:
- A new GC process updates the "common" table periodically, by writing ngrams which appear in more than X% of documents to it, for some value of X, if there are at least a minimum number of documents (maybe like 4,000).
- A new GC process deletes ngrams that have been added to the common table from the existing indexes.
Hopefully, this will pare down the ngrams index to something reasonable over time without requiring any manual tuning.
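As a rough SQL sketch of the first step (the table and column names here are hypothetical, not the real schema; 15% stands in for the X% threshold):

```
-- Find ngrams which occur in more than 15% of documents, assuming at least
-- ~4,000 documents have been indexed; these get written to the "common" table.
SELECT ngram
  FROM task_ngrams
 GROUP BY ngram
HAVING COUNT(DISTINCT documentID) >
       0.15 * (SELECT COUNT(DISTINCT documentID) FROM task_ngrams);
```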
Test Plan:
- Ran some queries and indexes.
- Manually inserted ngrams `xxx` and `yyy` into the ngrams table, searched and indexed, saw them ignored as viable ngrams for search/index.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13000
Differential Revision: https://secure.phabricator.com/D18672
Summary:
See PHI112. The install presumably wants to generate links to Diffusion commits from an external tool, but only knows the short name of the repository.
Provide a `/source/phabricator/commit/abcdef908273` URI which redirects to the canonical URI for the commit.
Test Plan:
- Visited `/source/` URI for a commit, got a redirect.
- Visited normal URI for a commit, got a commit page.
- Visited `/branches/` and `/tags/` for a `/source/` repository, got proper pages.
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18676