mirror of https://we.phorge.it/source/phorge.git synced 2024-09-21 01:38:48 +02:00

15903 commits

Author SHA1 Message Date
epriestley
43cf4edfb1 When waiting for long-running Harbormaster futures to resolve, close idle database connections
Summary:
Ref T13216. See PHI916. Harbormaster builds may be long-running, particularly if they effectively wrap `ssh ... ./run-huge-build.sh`. If we spend more than a few seconds waiting for futures to resolve, close idle database connections.

The general goal here is to reduce the held connection load for installs with a very large number of test runners.

Test Plan: Added debugging code to `phlog()` closures, saw connections closed while running builds.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19824
2018-11-21 07:53:40 -08:00
epriestley
a0d4b6da4b Support (but do not actually enable) a maximum file size limit for Git repositories
Summary:
Depends on D19816. Ref T13216. See PHI908. See PHI750. In a few cases, users have pushed multi-gigabyte files full of various things that probably shouldn't be version controlled. This tends to create various headaches.

Add support for limiting the maximum size of any object. Specifically, we:

  - list all the objects each commit touches;
  - check their size after the commit applies;
  - if it's over the limit, reject the commit.

This change doesn't actually hook the limit up (the limit is always "0", i.e. unlimited), and doesn't have Mercurial or SVN support. The actual parser bit would probably be better in some other `Query/Parser` class eventually, too. But it at least roughly works.
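The check described above can be sketched roughly as follows. This is an illustration in Python, not the actual hook code: `find_oversized` and its arguments are hypothetical names, and the only behavior taken from the description is that a limit of `0` means "unlimited" and that sizes are checked after the commit applies.

```python
from typing import Dict, List, Tuple

UNLIMITED = 0  # per the description above, a limit of 0 means "unlimited"


def find_oversized(sizes_after_commit: Dict[str, int],
                   limit: int) -> List[Tuple[str, int]]:
    """Return (path, size) pairs over the limit, sorted by path.

    `sizes_after_commit` maps each path a commit touches to its size in
    bytes after the commit applies (hypothetical input shape).
    """
    if limit == UNLIMITED:
        return []
    return sorted(
        (path, size)
        for path, size in sizes_after_commit.items()
        if size > limit
    )


def check_commit(sizes_after_commit: Dict[str, int], limit: int) -> None:
    """Reject the commit (by raising) if any touched file is over the limit."""
    oversized = find_oversized(sizes_after_commit, limit)
    if oversized:
        path, size = oversized[0]
        raise ValueError(
            f'file "{path}" is {size} bytes, over the limit of {limit} bytes'
        )
```

With a limit of `1`, a 317-byte file is rejected; with the default limit of `0`, nothing is.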

Test Plan:
Changed the hard-coded limit to other values, tried to push stuff, got sensible results:

```
$ echo pew >> magic_missile.txt && git commit -am pew && git push
[master 98d07af] pew
 1 file changed, 1 insertion(+)
# Push received by "local.phacility.net", forwarding to cluster host.
# Acquiring write lock for repository "spellbook"...
# Acquired write lock immediately.
# Acquiring read lock for repository "spellbook" on device "local.phacility.net"...
# Acquired read lock immediately.
# Device "local.phacility.net" is already a cluster leader and does not need to be synchronized.
# Ready to receive on cluster host "local.phacility.net".
Counting objects: 49, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (48/48), done.
Writing objects: 100% (49/49), 3.44 KiB | 1.72 MiB/s, done.
Total 49 (delta 30), reused 0 (delta 0)
remote: +---------------------------------------------------------------+
remote: |      * * * PUSH REJECTED BY EVIL DRAGON BUREAUCRATS * * *     |
remote: +---------------------------------------------------------------+
remote:              \
remote:               \                    ^    /^
remote:                \                  / \  // \
remote:                 \   |\___/|      /   \//  .\
remote:                  \  /V  V  \__  /    //  | \ \           *----*
remote:                    /     /  \/_/    //   |  \  \          \   |
remote:                    @___@`    \/_   //    |   \   \         \/\ \
remote:                   0/0/|       \/_ //     |    \    \         \  \
remote:               0/0/0/0/|        \///      |     \     \       |  |
remote:            0/0/0/0/0/_|_ /   (  //       |      \     _\     |  /
remote:         0/0/0/0/0/0/`/,_ _ _/  ) ; -.    |    _ _\.-~       /   /
remote:                     ,-}        _      *-.|.-~-.           .~    ~
remote:   *     \__/         `/\      /                 ~-. _ .-~      /
remote:    \____(Oo)            *.   }            {                   /
remote:    (    (..)           .----~-.\        \-`                 .~
remote:    //___\\  \ DENIED!  ///.----..<        \             _ -~
remote:   //     \\                ///-._ _ _ _ _ _ _{^ - - - - ~
remote:
remote:
remote: OVERSIZED FILE
remote: This repository ("spellbook") is configured with a maximum individual file size limit, but you are pushing a change ("98d07af863e799509e7c3a639404d216f9fc79c7") which causes the size of a file ("magic_missile.txt") to exceed the limit. The commit makes the file 317 bytes long, but the limit for this repository is 1 bytes.
remote:
# Released cluster write lock.
To ssh://local.phacility.com/source/spellbook.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'ssh://epriestley@local.phacility.com/source/spellbook.git'
```

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: joshuaspence

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19817
2018-11-20 08:04:17 -08:00
epriestley
ab14f49ef8 On the Diffusion cluster status page, improve device sort order
Summary:
Ref T13216. See PHI943. When you have a large number of cluster bindings for a repository, the UI sorting can be a bit hard to manage.

One install that regularly cycles repository cluster devices had a couple dozen older disabled bindings, with the enabled bindings intermingled.

Sort the UI:

  - enabled devices come first;
  - in each group, sort by name.
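
A minimal sketch of this two-level sort, in Python rather than the actual UI code. `Binding` and its fields are hypothetical stand-ins for the real binding objects:

```python
from typing import List, NamedTuple


class Binding(NamedTuple):
    # Hypothetical stand-in for a repository cluster binding.
    device_name: str
    is_disabled: bool


def sort_bindings(bindings: List[Binding]) -> List[Binding]:
    """Enabled devices first; within each group, sort by device name."""
    # False sorts before True, so enabled (is_disabled=False) comes first.
    return sorted(bindings, key=lambda b: (b.is_disabled, b.device_name))
```

Sorting on a `(is_disabled, device_name)` tuple keeps both rules in a single stable pass.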

Test Plan: Mixed disabled/enabled bindings, loaded {nav Diffusion > Repository > Storage} page with clustering configured. Before: relatively unhelpful sort order. After: more intuitive sort order.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19813
2018-11-20 08:03:31 -08:00
epriestley
4967cd6ab9 Fix some "%Q" behavior in PhortuneMerchantQuery
Summary: Ref T13217. This older query does some manual joins; update it for more modern joins.

Test Plan: Ran `instances/` unit tests and got a clean result, browsed Phortune merchants.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13217

Differential Revision: https://secure.phabricator.com/D19820
2018-11-20 07:59:57 -08:00
epriestley
9481b9eff1 Allow "Can Configure Application" permissions to be configured
Summary:
Ref T13216. See PHI980. Currently, each application in {nav Applications > X > Configure} has a "Can Configure Application" permission which is hard-coded to "Administrators".

There's no technical reason for this; there just hasn't been a great use case for unlocking it. I think when I originally wrote it our protections against locking yourself out of things weren't that great (i.e., it was easier to set the policy to something that prevented you from editing it after the new policy took effect). Our protections are better now.

The major goal here is to let installs open up Custom Forms for given applications (mostly Maniphest) to more users, but the other options mostly go hand-in-hand with that.

Also, in developer mode, include stack traces for policy exceptions. This makes debugging weird stuff (like the indirect Config application errors here) easier.

Test Plan:
  - Granted "Can Configure Application" for Maniphest to all users.
  - Edited custom forms as a non-administrator.
  - Configured Maniphest as a non-administrator.
  - Installed/uninstalled Maniphest as a non-administrator.
  - Tried to lock myself out (got an error message).

{F6015721}

Reviewers: amckinley, joshuaspence

Reviewed By: joshuaspence

Subscribers: joshuaspence

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19822
2018-11-19 07:25:41 -08:00
epriestley
cb033673b6 Unify intracluster sync and Drydock working copy construction timeouts as a repository "copy time limit"
Summary:
Depends on D19814. Ref T13216. See PHI885. For various eldritch reasons, `git fetch` can hang. Although we'd probably like to fix this with `git fetch --require-sustained-network-transfer-rate=512KB/5s` or similar, that flag doesn't exist and we don't have a reasonable way to build it.

Short of that, move toward formalizing a repository "copy time limit": the longest amount of time anything may spend trying to make a copy of this repository.

This grows out of the existing intracluster sync limit, which is effectively the same thing. Here, apply it to `git clone` and `git fetch` in Drydock working copy construction, too. A future change may make it configurable.
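
The general idea of a hard time limit on a copy operation can be sketched like this. This is a Python illustration using the standard library's subprocess timeout, not how Drydock or the sync code actually applies the limit; `run_with_copy_time_limit` is a hypothetical name.

```python
import subprocess
from typing import List


def run_with_copy_time_limit(argv: List[str],
                             limit_seconds: float) -> subprocess.CompletedProcess:
    """Run a command, killing it if it runs longer than the limit.

    Raises subprocess.TimeoutExpired when the limit is exceeded and
    subprocess.CalledProcessError when the command exits nonzero.
    """
    return subprocess.run(argv, check=True, timeout=limit_seconds)
```

A caller would pass something like `["git", "fetch"]` together with the repository's limit, and treat `TimeoutExpired` as a failed copy.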

Test Plan:
  - Set the limit to 0.001.
  - Tried to build and lease working copies, got sensible timeout errors (see D19815).

```
<Activation Failed> Lease activation failed: [CommandException] Command killed by timeout after running for more than 0.001 seconds.
COMMAND
ssh '-o' 'LogLevel=quiet' '-o' 'StrictHostKeyChecking=no' '-o' 'UserKnownHostsFile=/dev/null' '-o' 'BatchMode=yes' -l '********' -p '2222' -i '********' '127.0.0.1' -- '(cd '\''/var/drydock/workingcopy-163/repo/spellbook/'\'' && git clean -d --force && git fetch && git reset --hard)'
```

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19816
2018-11-16 13:08:12 -08:00
epriestley
933462b487 Continue cleaning up queries in the wake of changes to "%Q"
Summary: Depends on D19810. Ref T13217. Ref T13216. I mostly used `grep implode | grep OR` and `grep implode | grep AND` to find these -- not totally exhaustive but should be a big chunk of the callsites that are missing `%LO` / `%LA`.

Test Plan:
These are tricky to test exhaustively, but I made an attempt to hit most of them:

- Browsed Almanac interfaces.
- Created/browsed Calendar events.
- Enabled/disabled/showed the lock log.
- Browsed repositories.
- Loaded Facts UI.
- Poked at Multimeter.
- Used typeahead for users and projects.
- Browsed Phriction.
- Ran various fulltext searches.

Not sure these are reachable:

- All the lint stuff might be dead/unreachable/nonfunctional?

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: yelirekim

Maniphest Tasks: T13217, T13216

Differential Revision: https://secure.phabricator.com/D19814
2018-11-16 12:49:44 -08:00
epriestley
49483bdb48 Use "%P" to protect session key hashes in SessionEngine queries from DarkConsole
Summary:
Ref T6960. Ref T13217. Ref T13216. Depends on D19811. Use the recently-introduced "%P" conversion ("Password/Secret") to load sessions in SessionEngine.

This secret isn't critical to protect (it's the //hash// of the actual secret and not useful to attackers on its own) but it shows up on every page in DarkConsole and is an obvious case where `%P` is a more appropriate conversion.

Test Plan:
Note "*********" in the middle of the output here, instead of a session key hash:

{F6012805}

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13217, T13216, T6960

Differential Revision: https://secure.phabricator.com/D19812
2018-11-16 12:36:35 -08:00
epriestley
b2e91d2205 Move the "container updated" message for Buildables that build Diffs outside of the transaction
Summary:
Ref T13216. See PHI970. Ref T13054. See some discussion in T13216.

When a Harbormaster Buildable object is first created for a Diff, it has no `containerPHID` since the revision has not yet been created.

We later (after creating a revision) send the Buildable a message telling it that we've added a container and it should re-link the container object.

Currently, we send this message in `applyExternalEffects()`, which runs inside the Differential transaction. If Harbormaster races quickly enough, it can read the `Diff` object before the transaction commits, and not see the container update.

Add a `didCommitTransaction()` callback after the transactions commit, then move the message code there instead.

Test Plan:
  - See T13216 for substantial evidence that this change is on the right track.
  - Before change: added `sleep(15)`, reproduced the issue reliably.
  - After change: unable to reproduce issue even with `sleep(15)` (the `containerPHID` always populates correctly).

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216, T13054

Differential Revision: https://secure.phabricator.com/D19807
2018-11-16 12:34:06 -08:00
epriestley
44c32839a6 When you "Request Review" of a draft revision, change the button text from "Submit Quietly" to "Publish Revision"
Summary:
See PHI975. Ref T13216. Ref T2543. Previously, see D19204 and PHI433.

When you're acting on a draft revision, we change the button text to "Submit Quietly" as a hint that your actions don't generate notifications yet.

However, this isn't accurate when one of your actions is "Request Review", which causes the revision to publish.

Allow actions to override the submit button text, and make the "Request Review" action change the button text to "Publish Revision".

The alternative change I considered was to remove the word "Quietly" in all cases.

I'm not //thrilled// about how complex this change is to adjust one word, but the various pieces are all fairly clean individually. I'm not sure we'll ever be able to use it for anything else, but I do suspect that the word "Quietly" was the change in D19204 with the largest effect by far (see T10000).

Test Plan:
  - Created a draft revision. Saw "Submit Quietly" text.
  - Added a "Request Review" action, saw it change to "Publish Revision".
  - Reloaded page, saw stack saved and "Publish Revision".
  - Removed action, saw "Submit Quietly".
  - Repeated on a non-draft revision, button stayed put as "Submit".
  - Submitted the various actions, saw them have the desired effects.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216, T2543

Differential Revision: https://secure.phabricator.com/D19810
2018-11-15 20:50:21 -08:00
epriestley
ec452e548a Improve text overflow behavior for hovercards with (for example) long package names
Summary: See PHI977. Ref T13216. Some text, like long package names, may overflow hovercards. Add overflow CSS behaviors to remedy this.

Test Plan:
Before:

{F6012699}

After:

{F6012700}

(You can use `/search/hovercard/` to render hovercards in a handy standalone way.)

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19809
2018-11-15 20:43:10 -08:00
epriestley
533e4e13b3 Add a bin/herald test ... for doing test runs via the CLI
Summary: Ref T13216. See D19666. It's currently tricky to profile Herald test runs since you have to submit a form and repeating them is a bit of a mess. Provide a simple CLI wrapper so we can use `--xprofile`. This is also maybe nice-to-have if we're ever debugging anything here.

Test Plan: Ran `bin/herald test --object ... --type ...` and got a sensible looking transcript in the UI.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19806
2018-11-15 15:48:52 -08:00
epriestley
8a8123c9db Replace the primary "Disabled/Enabled" Herald Rule filter with "Active/Inactive", considering author status
Summary:
Ref T13216. See PHI947. In Herald, Personal rules do not run if their author's account is disabled.

This isn't communicated very clearly in the UI, and the way the SearchEngine/Query are set up isn't great.

Define "active" as "rule will actually run", which specifically means "rule is enabled, and has a valid (non-disabled) author if it needs one".

Change the meaning of the "Active" default filter from "rule is enabled" to "rule is enabled, and has a valid author if it needs one".
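
As a rough sketch, the "active" definition amounts to a predicate like the one below. This is Python for illustration only. The `Rule` fields are hypothetical, and the sketch assumes personal rules are the only rules that need an author:

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical fields; the real HeraldRule object differs.
    is_enabled: bool
    is_personal: bool
    author_is_disabled: bool


def is_active(rule: Rule) -> bool:
    """A rule is active if it will actually run."""
    if not rule.is_enabled:
        return False
    # Assumption: only personal rules depend on their author's account.
    if rule.is_personal and rule.author_is_disabled:
        return False
    return True
```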

Refine the status badge on the view controller to show this "invalid author" state.

Tweak the language for "Disable/Enable" to be more consistent -- we currently call it "disabled" in some cases and "archived" in others.

Test Plan:
  - Disabled a user account and saw their personal rules behave properly with the new filters/options/view controller.
  - Disabled/enabled a rule, saw consistent text.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19805
2018-11-15 15:47:35 -08:00
epriestley
bbd292b9b3 Modernize the Herald rule search engine
Summary: Ref T13216. Update the Herald Rule SearchEngine and Query to use a more modern style.

Test Plan: Ran various rule queries in the UI, got sensible results.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19803
2018-11-15 15:37:00 -08:00
epriestley
ea6d2afa86 Fix flickering tooltips in Chrome when the tip container overlaps the triggering element
Summary:
Fixes T8440. See that task for discussion.

Ref T13216. See PHI976.

Test Plan:
In Chrome, hovered a timestamp and moved the mouse up to the "overlap" area (see T8440). Before: flickered like crazy. After: no flickering.

(I couldn't reproduce the original issue in modern Firefox or Safari.)

Reviewers: amckinley, avivey

Reviewed By: avivey

Maniphest Tasks: T8440, T13216

Differential Revision: https://secure.phabricator.com/D19808
2018-11-15 10:43:55 -08:00
epriestley
e57bfbf421 Pull some debugging code back out of "master"
See D19778. This is a workaround for T13179 that landed by mistake.
2018-11-15 08:19:29 -08:00
epriestley
96f9b0917e Improve performance of two recent commit migrations
Summary:
Ref T13216. See PHI959. These two recent migrations can be expressed more efficiently:

  - When updating commit audit statuses, the field isn't JSON encoded or anything so we can just issue several bulk UPDATEs.
  - When inserting mail keys, we can batch them in groups of 100.
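
The batching idea in the second bullet, sketched in Python (the migrations themselves are not written this way; `chunks` is a hypothetical helper):

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunks(items: Sequence[T], size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each yielded group would then become one multi-row INSERT instead of one query per row.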

Test Plan: Used `bin/storage upgrade -f --apply phabricator:...` to reapply patches. Saw equivalent behavior and faster runtimes.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19802
2018-11-15 03:52:06 -08:00
epriestley
86fd204148 Fix all query warnings in "arc unit --everything"
Summary:
Ref T13216. Ref T13217. Depends on D19800. This fixes all of the remaining query warnings that pop up when you run "arc unit --everything".

There's likely still quite a bit of stuff lurking around, but hopefully this covers a big set of the most common queries.

Test Plan: Ran `arc unit --everything`. Before change: lots of query warnings. After change: no query warnings.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13217, T13216

Differential Revision: https://secure.phabricator.com/D19801
2018-11-15 03:51:25 -08:00
epriestley
2f10d4adeb Continue making application fixes to Phabricator for changes to %Q semantics
Summary: Depends on D19789. Ref T13217. Continue updating things to use the new %Q-flavored conversions instead of smushing a bunch of strings together.

Test Plan: Browsed around, far fewer errors. These changes are largely mechanical in nature.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13217

Differential Revision: https://secure.phabricator.com/D19790
2018-11-15 03:50:02 -08:00
epriestley
98690ee326 Update many Phabricator queries for new %Q query semantics
Summary: Depends on D19785. Ref T13217. This converts many of the most common clause construction pathways to the new %Q / %LQ / %LO / %LA / %LJ semantics.

Test Plan: Browsed around a bunch, saw fewer warnings and no obvious behavioral errors. The transformations here are generally mechanical (although I did them by hand).

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: hach-que

Maniphest Tasks: T13217

Differential Revision: https://secure.phabricator.com/D19789
2018-11-15 03:48:10 -08:00
epriestley
64b52b9952 Make SELECT construction in PolicyAwareQuery safer
Summary: Depends on D19784. Ref T13217. Reduce uses of unsafe `%Q` in SELECT construction.

Test Plan: This reduces the number of safety warnings when loading Phabricator home from ~900 to ~800.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13217

Differential Revision: https://secure.phabricator.com/D19785
2018-11-14 15:32:09 -08:00
epriestley
e26c4bddab Replace magical "branch" behavior in "diffusion.branchquery" with an explicit "patterns"
Summary:
See PHI958. Ref T13210. Previously, see PHI720.

The use case for the magic in PHI720 involves multiple patterns, and no parameter can be passed to `branch` that will result in multiple patterns being passed to `git`.

Replace the implicit magic with an explicit `patterns` parameter.

This whole thing is a bit shaky but probably isn't hurting anything.

Test Plan:
  - Ran query with no `patterns`.
  - Ran query with invalid `patterns`, got readable error.
  - Ran query with various valid `patterns` (plain branch name, globs with "?" and "*"), got sensible results.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13210

Differential Revision: https://secure.phabricator.com/D19771
2018-11-14 14:50:53 -08:00
epriestley
da40f80741 Update PhabricatorLiskDAO::chunkSQL() for new %Q semantics
Summary:
Ref T13217. This method is slightly tricky:

  - We can't safely return a string: return an array instead.
  - It no longer makes sense to accept glue. All callers use `', '` as glue anyway, so hard-code that.

Then convert all callsites.

Test Plan: Browsed around, saw fewer "unsafe" errors in error log.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13217

Differential Revision: https://secure.phabricator.com/D19784
2018-11-13 08:59:18 -08:00
epriestley
315d857a8a Add a basic web UI for intracluster sync logs
Summary: Depends on D19798. Ref T13216. This puts at least a basic UI on top of sync logs.

Test Plan:
Viewed logs from the web UI and exported data. Note that these syncs are somewhat simulated since my local cluster is somewhat-faked (i.e., not actually multiple machines).

{F5995899}

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19799
2018-11-10 04:58:36 -08:00
epriestley
1d7c960531 Put push log "hookWait" to data export and add all wait values to UI
Summary:
Depends on D19797. Ref T13216.

  - Put the new `hookWait` in the export and the UI.
  - Put the existing waits in the UI, not just the export.
  - Make order consistent: host, write, read, hook (this is the order the timers start in).

Test Plan: Pushed some stuff, viewed web UI and saw sensible numbers, exported data and got the same values.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19798
2018-11-10 04:47:38 -08:00
epriestley
2a7ac8e388 Make "bin/repository thaw" workflow more clear when devices are disabled
Summary:
Ref T13216. See PHI943. If autoscale lightning strikes all your servers at once and destroys them, the path to recovery can be unclear. You're "supposed" to:

  - demote all the devices;
  - disable the bindings;
  - bind the new servers;
  - put whatever working copies you can scrape up back on disk;
  - promote one of the new servers.

However, the documentation is a bit misleading (it was sort of written with "you lost one or two devices" in mind, not "you lost every device") and demote-before-disable is unnecessary and slightly risky if servers come back online. There's also a missing guardrail before the promote step which lets you accidentally skip the demotion step and end up in a confusing state. Instead:

  - Add a guard rail: when you try to promote a new server, warn if inactive devices still have versions and tell the user to demote them.
  - Allow demotion of inactive devices: the order "disable, demote" is safer and more intuitive than "demote, disable" and there's no reason to require the unintuitive order.
  - Make the "cluster already has leaders" message more clear.
  - Make the documentation more clear.

Test Plan:
  - Bound a repository to two devices.
  - Wrote to A to make it a leader, then disabled it (simulating a lightning strike).
  - Tried to promote B. Got a new, useful error ("demote A first").
  - Demoted A (before: error about demoting inactive devices; now: works fine).
  - Promoted B. This worked.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19793
2018-11-10 04:46:46 -08:00
epriestley
c32fa06266 Use phutil_microseconds_since(...) to simplify some timing arithmetic
Summary: Depends on D19796. Simplify some timing code by using phutil_microseconds_since() instead of duplicate casting and arithmetic.
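
For readers unfamiliar with the helper, the arithmetic it replaces looks roughly like this Python equivalent. This is an assumption about its behavior based on the description above, not a copy of the libphutil implementation:

```python
import time


def microseconds_since(start_time: float) -> int:
    """Whole microseconds elapsed since `start_time`.

    `start_time` is a float timestamp in seconds, as returned by time.time().
    """
    return int(1000000 * (time.time() - start_time))
```

Typical use is `t_start = time.time()`, then the work being measured, then `elapsed = microseconds_since(t_start)`.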

Test Plan: Grepped for `1000000` to find these. Pulled, pushed, made a conduit call. This isn't exhaustive but it should be hard for these to break in a bad way since they're all just diagnostic.

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D19797
2018-11-08 16:46:32 -08:00
epriestley
b12e92e6e2 Add timing information for commit hooks to push logs
Summary:
Depends on D19779. Ref T13216. The push logs currently record the "hostWait", which is roughly "locking + subprocess cost". We also record locking separately, so we can figure out "subprocess cost" alone by subtracting the lock costs.

However, the subprocess (normally `git receive-pack`) runs hooks, and we don't have an easy way to figure out how much time was spent doing actual `git` stuff vs spent doing commit hook processing. This would have been useful in diagnosing at least one recent issue.

Track at least a rough hook cost and record it in the push logs.
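
The subtraction described above can be written out as a small sketch. This is Python for illustration; the argument names are hypothetical, all values are assumed to share one time unit, and hook time is assumed to be part of the subprocess cost because the subprocess runs the hooks:

```python
from typing import Dict


def split_push_costs(host_wait: int, write_wait: int,
                     read_wait: int, hook_wait: int) -> Dict[str, int]:
    """Break a push's total wait into lock, subprocess, hook, and git costs."""
    lock_cost = write_wait + read_wait
    # hostWait is roughly "locking + subprocess cost".
    subprocess_cost = host_wait - lock_cost
    # Assumption: the subprocess runs the hooks, so hook time is inside it.
    git_cost = subprocess_cost - hook_wait
    return {
        "locks": lock_cost,
        "subprocess": subprocess_cost,
        "hooks": hook_wait,
        "git": git_cost,
    }
```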

Test Plan: Pushed to a repository, saw a reasonable hook cost appear in the database table.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19780
2018-11-08 06:00:26 -08:00
epriestley
966db4d38e Add an intracluster synchronization log for cluster repositories
Summary:
Depends on D19778. Ref T13216. See PHI943, PHI889, et al.

We currently have a push log and a pull log, but do not separately log intracluster synchronization events. We've encountered several specific cases where having this kind of log would be helpful:

  - In PHI943, an install was accidentally aborting locks early. Having timing information in the sync log would let us identify this more quickly.
  - In PHI889, an install hit an issue with `MaxStartups` configuration in `sshd`. A log would let us identify when this is an issue.
  - In PHI889, I floated a "push the Linux kernel + fetch timeout" theory. A sync log would let us see sync/fetch timeouts to confirm whether this is a problem in practice.
  - A sync log will help us assess, develop, test, and monitor intracluster routing sync changes (likely those in T13211) in the future.

Some of these events are present in the pull log already, but only if they make it as far as running a `git upload-pack` subprocess (not the case with `MaxStartups` problems) -- and they can't record end-to-end timing.

No UI yet, I'll add that in a future change.

Test Plan:
  - Forced all operations to synchronize by adding `|| true` to the version check.
  - Pulled, got a sync log in the database.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19779
2018-11-07 18:24:20 -08:00
epriestley
e09d29fb1a Clean up the workflow for some post-push logging code
Summary:
Ref T13216. When a repository is clustered, we run this cleanup code (to tell the repository to update, and log some timing information) on both nodes. Currently, we do slightly too much work, which is unnecessary and can be a bit confusing to human readers.

The double update message doesn't hurt anything, but there's no reason to write it twice.

Likewise, the second timing information update query doesn't do anything: there's no PushEvent object with the right identifier, so it just updates nothing. We don't need to run it, and it's confusing that we do.

Instead, only do these writes if we're actually the final node with the repository on it.

Test Plan: Added some logging, saw double writes/updates before the change and no doubles afterwards, with no other behavioral changes.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19778
2018-11-07 17:46:50 -08:00
epriestley
b645af981b Correct a missing parameter in "Outbound Mail" documentation
Summary: See <https://discourse.phabricator-community.org/t/documentation-error/2079>. This command is missing the configuration key.

Test Plan: Ran `bin/config set --stdin` with no other arguments, got an error about the missing key. Ran with `... cluster.mailers`, and the value was read from stdin.

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D19791
2018-11-07 17:43:01 -08:00
epriestley
8a4bf38655 Use 160-bit TOTP keys rather than 80-bit TOTP keys
Summary:
See <https://hackerone.com/reports/435648>. We currently use 80-bit TOTP keys. The RFC suggests 128 as a minimum and recommends 160.

The math suggests that doing the hashing for an 80-bit key is hard (slightly beyond the reach of a highly motivated state actor, today) but there's no reason not to use 160 bits instead to put this completely out of reach.

See some additional discussion on the HackerOne report about enormous key sizes, number of required observations, etc.
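
For a sense of scale, a 160-bit key is 20 random bytes, which base32-encodes to 32 characters (an 80-bit key encodes to 16). A minimal Python sketch of generating such a key, with `new_totp_key` as a hypothetical name rather than the function this change touches:

```python
import base64
import secrets


def new_totp_key(bits: int = 160) -> str:
    """Return a random TOTP shared secret, base32-encoded."""
    if bits % 8 != 0:
        raise ValueError("bits must be a multiple of 8")
    raw = secrets.token_bytes(bits // 8)  # cryptographically strong bytes
    return base64.b32encode(raw).decode("ascii")
```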

Test Plan: Added a new 160-bit TOTP factor to Google Authenticator without issue.

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D19792
2018-11-07 15:44:02 -08:00
epriestley
1f6a4cfffe Prevent users from selecting excessively bad passwords based on their username or email address
Summary:
Ref T13216. We occasionally receive HackerOne reports concerned that you can select your username as a password. I suspect very few users actually do this and that this is mostly a compliance/checklist sort of issue, not a real security issue.

On this install, we have about 41,000 user accounts. Of these, 100 have their username as a password (account or VCS). A substantial subset of these are either explicitly intentional ("demo", "bugmenot") or obvious test accounts ("test" in name, or name is a nonsensical string of gibberish, or looks like "tryphab" or similar) or just a bunch of numbers (?), or clearly a "researcher" doing this on purpose (e.g., name includes "pentest" or "xss" or similar).

So I'm not sure real users are actually very inclined to do this, and we can't really ever stop them from picking awful passwords anyway. But we //can// stop researchers from reporting that this is an issue.

Don't allow users to select passwords which contain words in a blocklist: their username, real name, email addresses, or the install's domain name. These words also aren't allowed to contain the password (that is, neither your password nor your username may be a substring of the other one). We also do a little normalization to try to split apart email addresses, domains, and real names, so I can't have "evan1234" as my password.
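
A simplified sketch of the two-way substring rule, in Python. The normalization here (lowercasing, splitting on non-alphanumeric characters, ignoring fragments shorter than four characters) is an assumption for illustration and is not necessarily what the real blocklist does:

```python
import re
from typing import Iterable, Set

MIN_TERM_LENGTH = 4  # hypothetical cutoff so fragments like "com" are ignored


def normalize_terms(values: Iterable[str]) -> Set[str]:
    """Split usernames, emails, domains, and names into lowercase terms."""
    terms = set()
    for value in values:
        for part in re.split(r"[^a-z0-9]+", value.lower()):
            if len(part) >= MIN_TERM_LENGTH:
                terms.add(part)
    return terms


def is_password_allowed(password: str, blocklist_values: Iterable[str]) -> bool:
    """Reject a password that contains, or is contained in, a blocked term."""
    candidate = password.lower()
    for term in normalize_terms(blocklist_values):
        if term in candidate or candidate in term:
            return False
    return True
```

Under these assumptions, `evan1234` is rejected for a user whose email address is `evan@example.com`, because `evan` is a substring of the password.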

Test Plan:
  - Added unit tests and made them pass.
  - Tried to set my password to a bunch of variations of my username / email / domain name / real name / etc, got rejected.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19776
2018-11-06 12:44:07 -08:00
epriestley
c206b066df When {meme ...} embed has no text, just use the raw file data unmodified
Summary:
Ref T13216. See PHI948. When you use the remarkup hint button to embed a meme with no text, you get `{meme src=X}`.

If the source is a GIF, we currently split the source apart into frame-by-frame images, process them, and stitch them back together. The end result is the same image we started with, but this process can be slow/expensive, and may time out for sufficiently large GIFs.

Instead: when there's no text, just return the original image data.

Test Plan:
  - Used `{meme src=X}` with no text, got an image faster.
  - Used `{meme src=X, above=...}` to add text, got an attempt to add text (which didn't get very far locally since I don't have GD configured).

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19777
2018-11-06 09:40:22 -08:00
epriestley
bbfc860c63 Improve aesthetics of commit hook rejection message
Summary: See PHI939. Ref T13216. Make the dragon's companion animal more clearly cow-like.

Test Plan:
Before:

```
\     \__/
 \____(Oo)
 (    (--)
 //__\\
//    \\
```

After:

```
*     \__/
 \____(Oo)
 (    (..)
 //___\\
//     \\
```

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19775
2018-11-06 09:39:48 -08:00
epriestley
d38e768ed8 Prevent users from voting for invalid Slowvote options
Summary:
Depends on D19773. See <https://hackerone.com/reports/434116>. You can currently vote for invalid options by submitting, e.g., `vote[]=12345`.

By doing this, you can see the responses, which is sort of theoretically a security problem? This is definitely a bug, regardless.

Instead, only allow users to vote for options which are actually part of the poll.
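
Roughly, the check looks like this (a Python sketch, not the controller code; `validate_votes` and its parameters are made-up names):

```python
def validate_votes(submitted, option_ids, is_plurality):
    """Return the set of chosen option IDs, or raise ValueError."""
    chosen = set(submitted)
    unknown = chosen - set(option_ids)
    if unknown:
        # Something like vote[]=12345 for an option this poll does not have.
        raise ValueError(f"not options in this poll: {sorted(unknown)}")
    if is_plurality and len(chosen) > 1:
        # Radio buttons edited into checkboxes.
        raise ValueError("plurality polls accept at most one option")
    return chosen
```

Both rejections happen before any vote is recorded, so a bad submission never gets far enough to reveal the responses.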

Test Plan:
  - Tried to vote for invalid options by editing the form to `vote[]=12345` (got error).
  - Tried to vote for invalid options by editing the radio buttons on a plurality poll into checkboxes, checking multiple boxes, and submitting (got error).
  - Voted in approval and plurality polls the right way, from the main web UI and from the embed (`{V...}`) UI.

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D19774
2018-11-06 09:21:18 -08:00
epriestley
5e1d94f336 Remove nonfunctional AJAX embed behavior for Slowvote
Summary:
See <https://hackerone.com/reports/434116>. Slowvote has a piece of Javascript that attempts to let you vote on `{V123}` polls inline.

It does not work: nothing ever triggers it (nothing renders a control with a `slowvote-option` sigil).

At least for now, just remove it. It has a completely separate pathway in the controller and both pathways are buggy, so this makes fixing them easier.

Test Plan: Voted in plurality and approval polls via Slowvote and the embedded widget.

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D19773
2018-11-06 09:20:07 -08:00
epriestley
798a391e5a Add test coverage for "%R" in qsprintf() and convert LiskDAO to support it
Summary:
Ref T13210. Ref T11908. Add some basic test coverage for the new "%R" introduced in D19764, then convert LiskDAO to implement the "Database + Table Ref" interface.

To move forward, we need to convert all of these (where `%T` is not a table alias):

```counterexample
qsprintf($conn, '... %T ...', $thing->getTableName());
```

...to these:

```
qsprintf($conn, '... %R ...', $thing);
```

The new code is a little simpler (no `->getTableName()` call) which is sort of nice. But we also have a //lot// of `%T` so this is probably going to take a while.

(I'll hold this until after the release cut.)

Test Plan:
  - Ran unit tests.
  - Browsed around and edited some objects without issues. This change causes a reasonably large percentage of our total queries to use the new code since the LiskDAO builtin queries are some of the most commonly-constructed queries, although there are still ~700 callsites which need to be examined for possible conversion.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13210, T11908

Differential Revision: https://secure.phabricator.com/D19765
2018-11-05 10:59:50 -08:00
epriestley
d2316e8025 Fix an errant "switch ... continue"
Summary:
See <https://discourse.phabricator-community.org/t/unhandled-exception-on-create-task/2062>.

This construction has the same behavior as "switch ... break" but is unconventional. PHP 7.3 started warning about it because it's likely a mistake.

Test Plan: Created a task, edited a task owner. The new code is functionally identical to the old code.

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D19772
2018-11-05 10:26:27 -08:00
epriestley
24a061f844 Correct an ambiguous regexp in DiffusionRequest
Summary:
See <https://discourse.phabricator-community.org/t/diffusionrequest-regex-error/2057/>.

The intent of `[\d-,]` is "digits, hyphen, and comma" but `[x-y]` means "character range x-y".

Specify `[\d,-]` instead to disambiguate the hyphen as "literal hyphen", not a character range marker.
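
For illustration only, here is the unambiguous form in Python's `re` module, which is not the PCRE engine PHP uses. Engines differ in how they treat the ambiguous spelling, so the helper below just reports whether a pattern compiles instead of assuming a result. `pattern_compiles` is a made-up helper name:

```python
import re

# Hyphen last in the class: always a literal hyphen, never a range marker.
LINE_SPEC = re.compile(r"^[\d,-]+$")


def pattern_compiles(pattern):
    """Report whether this engine accepts the pattern at all."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True
```

`LINE_SPEC` matches strings such as "10-20,35" and rejects "10;20". Running `pattern_compiles(r"[\d-,]")` shows how a given engine handles the original spelling.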

Test Plan: I can't reproduce the original error as reported, but browsed around Diffusion for a bit.

Reviewers: amckinley, avivey

Reviewed By: avivey

Differential Revision: https://secure.phabricator.com/D19770
2018-11-01 20:01:39 -07:00
Tim Hirsh
9bea00c159 Add harbormaster.buildplan.search api method
Summary: This revision adds a Conduit search method for build plans. Other API methods (e.g., `harbormaster.build.search`) support build plan PHIDs as a constraint, but the PHIDs weren't exposed anywhere, so this provides a way to fetch them.

Test Plan:
Used the API console to run some searches. Output:
```
{
  "data": [
    {
      "id": 1,
      "type": "HMCP",
      "phid": "PHID-HMCP-q2c25wvegzdkxs7gzor6",
      "fields": {
        "name": "my build plan",
        "status": {
          "value": "active"
        },
        "dateCreated": 1538085249,
        "dateModified": 1538085249,
        "policy": {
          "view": "users",
          "edit": "admin"
        }
      },
      "attachments": {}
    },
    ...
  ],
  "maps": {},
  "query": {
    "queryKey": null
  },
  "cursor": {
    "limit": 100,
    "after": null,
    "before": null,
    "order": null
  }
}
```

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin, yelirekim

Differential Revision: https://secure.phabricator.com/D19769
2018-11-02 02:57:38 +00:00
epriestley
5d4970d6b2 Fix a bug where "View as Query" could replace a saved query row by ID, causing workboard 404s
Summary:
Fixes T13208. See that task for details.

The `clone $query` line is safe if `$query` is a builtin query (like "open").

However, if it's a saved query we clone not only the query parameters but the ID, too. Then when we `save()` the query later, we overwrite the original query.

So this would happen in the database. First, you run a query and save it as the workboard default (query key "abc123"):

| 123 | abc123 | {"...xxx..."} |

Then we `clone` it and change the parameters, and `save()` it. But that causes an `UPDATE ... WHERE id = 123` and the table now looks like this:

| 123 | def456 | {"...yyy..."} |

What we want is to create a new query instead, with an `INSERT ...`:

| 123 | abc123 | {"...xxx..."} |
| 124 | def456 | {"...yyy..."} |
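
The same failure and fix, modeled in Python with an in-memory table. This is a sketch under assumptions: `SavedQuery`, `QueryTable`, and `new_copy()` are stand-ins written for this example, not the real classes:

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class SavedQuery:
    query_key: str
    parameters: dict
    id: Optional[int] = None  # None means the row has not been stored yet

    def new_copy(self):
        """Copy the parameters but deliberately drop the row ID."""
        return SavedQuery(self.query_key, copy.deepcopy(self.parameters))


class QueryTable:
    """In-memory stand-in for the saved query table."""

    def __init__(self, first_id=123):
        self.rows = {}
        self.next_id = first_id

    def save(self, query):
        if query.id is None:
            # No ID yet: behaves like INSERT.
            query.id = self.next_id
            self.next_id += 1
        # Existing ID: behaves like UPDATE ... WHERE id = <id>.
        self.rows[query.id] = (query.query_key, copy.deepcopy(query.parameters))
        return query.id
```

A plain `copy.copy()` of a stored query keeps `id = 123`, so saving it overwrites row 123. A `new_copy()` has no ID, so saving it creates row 124 and leaves row 123 alone.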

Test Plan:
  - Followed reproduction steps from above.
    - With just the new `save()` guard, hit the guard error.
    - With the `newCopy()`, got a new copy of the query and "View as Query" remained functional without overwriting the original query row.
  - Ran migration, saw an affected board get fixed.

Reviewers: amckinley, joshuaspence

Reviewed By: joshuaspence

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13208

Differential Revision: https://secure.phabricator.com/D19768
2018-11-01 05:44:49 -07:00
epriestley
b950f877c5 Allow Drydock Blueprints to control "supplemental allocation" behavior so all hosts in an Almanac pool get used
Summary:
Fixes T12145. Ref T13210. See PHI570. See PHI536.

Currently, when you give Drydock an Almanac host pool with more than one host, it never voluntarily builds a second host resource: there is no way to say "maximum X working copies per host" (only "maximum X global working copies") to make the first host overflow, and the allocator tries to pack resources as tightly as possible.

If you can force it to allocate the 2nd..Nth host, things will work reasonably well from there (it will spread working copies across the hosts randomly), but tricking it is very hard, especially before D19761.

To deal with this, give blueprints a new behavior around "supplemental allocations". The idea here is that a blueprint may decide that it would prefer to allocate a fresh new resource instead of allowing an otherwise valid acquisition to occur.

These supplemental allocations follow all the normal allocation rules (they can't exceed limits or actually replace existing resources), so they can only happen if there's free space in the resource pool. But a blueprint can elect for a supplemental allocation to provide a "grow the pool" hint.

The only useful policies here are probably "true" (immediately use all resources, like Almanac) or "false" (pack resources as efficiently as possible) but some other policies //might// be useful (perhaps "start growing the pool when we're getting a bit full even if we aren't at the limit yet, since our workload is bursty").

Then, give Almanac host resources a "true" policy (always allocate supplemental resources) so they use all hosts once a similar number of concurrent jobs arrive.

One aspect of this approach is that we only do supplemental resources if the normal allocation algorithm already decided that the best resource to acquire was part of the same blueprint. I started with an approach like "look at all the blueprints and see if any of them want to be greedy", but then a not-very-desirable blueprint would end up filling up its whole pool before we skipped the supplemental allocation part and ended up picking a different resource. That felt a bit silly and this feels a little cleaner and more focused.
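
A toy model of the decision, in Python. It is heavily simplified and every name in it is invented for the example: the real allocator has limits, leases, and multiple blueprints that this sketch ignores.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Resource:
    host: str
    leases: int = 0


@dataclass
class Blueprint:
    hosts: list                    # every host in the pool
    wants_supplemental: bool       # the "grow the pool" hint
    resources: list = field(default_factory=list)

    def unused_hosts(self):
        used = {resource.host for resource in self.resources}
        return [host for host in self.hosts if host not in used]

    def allocate(self):
        resource = Resource(self.unused_hosts()[0])
        self.resources.append(resource)
        return resource


def acquire(blueprint, rng=random):
    """Lease one working copy and return the host it landed on."""
    if not blueprint.resources:
        chosen = blueprint.allocate()
    else:
        # The natural choice: reuse a resource that already exists.
        chosen = rng.choice(blueprint.resources)
        # Supplemental allocation: only when the natural resource is
        # already in use and the pool still has room to grow.
        if (blueprint.wants_supplemental and chosen.leases > 0
                and blueprint.unused_hosts()):
            chosen = blueprint.allocate()
    chosen.leases += 1
    return chosen.host
```

With `wants_supplemental=False` and hosts A and B, repeated calls keep returning A. With `True`, the first two calls return A then B, and later calls pick between them at random.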

Test Plan:
  - Without changing the Almanac blueprint policy, allocated hosts. Got A, A, A, A, ... (second host never used).
  - Changed the Almanac policy.
  - Allocated hosts, got A, B, random mix of A and B.
  - Destroyed B. Destroyed all leases on A. Allocated. Got A. This tests the "don't build a supplemental resource if there are no leases on the natural resource" rule.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13210, T12145

Differential Revision: https://secure.phabricator.com/D19762
2018-10-31 18:06:47 -07:00
epriestley
57b4b59819 When a Drydock host based on an Almanac blueprint has its binding disabled, stop handing out leases
Summary:
Ref T13210. Ref T12145. The "Almanac Host" blueprint currently hands out new leases on a given host even if the binding has been disabled.

Although there are some more complicated cases here (e.g., involving cleanup of the existing resource and existing leases), this one seems clear cut: if the binding has been disabled, we should stop handing out new leases on it.

Test Plan:
  - Created a service with two hosts.
  - Requested a lease, got host A.
  - Requested more leases, always got host A (we never build a new host when we don't have to, and we currently never have to).
  - Disabled the binding to host A.
  - Requested a lease.
    - Before patch: got host A.
    - After patch: got host B.
  - Also disabled the other binding to host B, requested a lease, got an indefinite wait for resources (which is expected and reasonable).

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13210, T12145

Differential Revision: https://secure.phabricator.com/D19761
2018-10-27 07:20:30 -07:00
epriestley
65e953658a Expose Audit actions for "transaction.search" in a basic way
Summary: Ref T13210. See PHI841. This mirrors D19509 for Differential.

Test Plan: Called `transaction.search` on a commit with a bunch of audit activity, got appropriate labels in the results.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13210

Differential Revision: https://secure.phabricator.com/D19760
2018-10-27 07:19:50 -07:00
epriestley
61ec434208 Remove unicode marks for "Accept/Raise Concern" in Audit
Summary:
Ref T13210. The comment action dropdown for audits has a heavy checkmark next to "Accept" and a heavy "X" next to "Raise Concern".

We previously removed similar marks in Differential in D19405 and that seems to have gone fine. For consistency, remove these too.

Test Plan: Viewed the comment action dropdown, no longer saw checkmark and X-mark.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13210

Differential Revision: https://secure.phabricator.com/D19759
2018-10-27 07:19:18 -07:00
epriestley
a7c008708d Correct a mangled translation string in "bin/phd log --id X"
Summary:
Ref T13210. See PHI930. This translation is wrong: the parameter is a comma-separated list as a string, but the USEnglish translation provides alternatives. We can't select among alternatives based on a random string (it isn't a plurality value to let us select "chair" vs "chairs", and isn't a gender value to let us select "his profile" vs "her profile") so we get an error.

But the string itself is also misleading, since "bin/phd log --id A --id B --id C" will say "none of these are valid" if //any// of them are invalid.

Instead, just tell the user explicitly about the first problem.

Test Plan:
  - Ran `bin/phd log --id` with good (got logs) and bad IDs (got sensible error).
  - Ran `bin/phd log` with any logs (got logs) and (simulated) without any logs (got error).

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13210

Differential Revision: https://secure.phabricator.com/D19755
2018-10-26 06:13:18 -07:00
Mike Riley
5f3a7cb41b Expose Drydock leases via Conduit
Summary:
See T13212 for some context and discussion on this being revived.
See T11694 for original context.

Add a query constraint for lease owners and implement the conduit search method for Drydock leases.

Ref T11694. Fixes T13212.

Test Plan:
- Called the API method from conduit and browsed lease queries from the UI.
- Used the new "ownerPHIDs" constraint via API console.

{F5963044}

Reviewers: yelirekim, amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam, epriestley

Maniphest Tasks: T11694, T13212

Differential Revision: https://secure.phabricator.com/D16594
2018-10-26 06:12:38 -07:00
epriestley
f6122547d7 When a lease triggers a resource allocation for a resource which must activate, awaken the lease task after the resource activates
Summary:
Depends on D19753. Ref T13210. This is a small optimization that saves us from waiting up to 15 seconds for a yield.

When there are no Working Copy resources and a new lease comes in, we'll allocate one and yield until it activates.

If activating it (SSH'ing and running `git clone`) takes less than 15 seconds, the resource will activate (say, at T+4) but the lease won't update again for a while (say, until T+15). This leaves us with a pointless wait (in this example, we're sitting around for 11 seconds when we could move forward).

To improve this a little bit, let resources wake up the lease update tasks that triggered allocation after they activate. In the best case, that task runs ~15 seconds sooner. In the worst case, the awaken is just a no-op.

With a more-full queue, this has a smaller effect (it's likely something else will run and be able to use the resource in those 11 seconds).

With already-activated resources, this has no effect (when resources are already activated, we can lease immediately).
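
The timing can be worked through with a small Python sketch. It assumes the lease has already yielded and that updates land on fixed 15-second boundaries, which is a simplification; `lease_resumes_at` is a name made up for the example:

```python
import math


def lease_resumes_at(activated_at, yield_seconds=15, awaken=False):
    """Seconds after the lease yielded at which it can use the resource."""
    if awaken:
        # The resource wakes the lease task as soon as it activates.
        return activated_at
    # Otherwise the lease only notices on its next scheduled update.
    updates_needed = max(1, math.ceil(activated_at / yield_seconds))
    return updates_needed * yield_seconds
```

For a resource that activates at T+4, the lease resumes at T+15 without the awaken and at T+4 with it. The difference is the idle time the awaken removes.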

Test Plan:
  - Cleaned up all working copy resources.
  - Requested a new "A" working copy.
  - Before patch: got a working copy after 17-18 seconds, most of which was spent yielded.
  - After patch: got a working copy after 3-4 seconds.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13210

Differential Revision: https://secure.phabricator.com/D19754
2018-10-26 06:11:43 -07:00
epriestley
78ab675bd8 After a Drydock lease triggers a resource to be reclaimed, stop it from triggering another reclaim until the first one completes
Summary:
Depends on D19752. Ref T13210. If resources take a long time to reclaim/destroy (normally, more than 15 seconds) a single new lease may update several times during the reclaim/destroy process and end up reclaiming multiple resources.

Instead: after a lease triggers a reclaim, prevent it from triggering another reclaim as long as the resource it is reclaiming hasn't finished its reclaim/destroy cycle. Basically, each lease only gets to destroy one resource at a time.
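
The guard amounts to something like the following Python sketch. `Lease`, `Resource`, and `try_reclaim` are simplified stand-ins invented for illustration, not the Drydock classes:

```python
class Resource:
    def __init__(self, name):
        self.name = name
        self.reclaiming = False  # reclaim/destroy has started
        self.destroyed = False   # reclaim/destroy has finished


class Lease:
    def __init__(self, name):
        self.name = name
        self.pending_reclaim = None  # resource this lease asked to reclaim


def try_reclaim(lease, candidates):
    """Run on each lease update; return the newly reclaimed resource or None."""
    pending = lease.pending_reclaim
    if pending is not None and not pending.destroyed:
        # A reclaim this lease triggered is still in flight: do nothing.
        return None
    lease.pending_reclaim = None
    for resource in candidates:
        if not resource.reclaiming and not resource.destroyed:
            resource.reclaiming = True
            lease.pending_reclaim = resource
            return resource
    return None
```

Calling `try_reclaim()` on every update of a "B" lease marks only the first "A" resource. Later updates return `None` until that resource reports `destroyed`.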

Test Plan:
  - Added a `sleep(120)` to `destroyResource()` to simulate a long reclaim/destroy cycle.
  - Allocated A, A, A working copies. Leased a B working copy.
  - Before patch: saw "B" lease destroy all three "A" working copies after ~0, ~15, and ~30 seconds, then build a new "B" resource after ~120 seconds (when the first reclaim/destroy finished).
  - After patch: saw "B" lease destroy one "A" working copy after ~0 seconds, then wait patiently until it finished up, then build a new "B" resource.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13210

Differential Revision: https://secure.phabricator.com/D19753
2018-10-26 06:11:05 -07:00