Summary:
Ref T10751. We currently have a placeholder Almanac document, and a fairly-bad-advice section in Daemons.
Pull these into the modern cluster documentation.
Test Plan: 17 phabricator PHDs
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10751
Differential Revision: https://secure.phabricator.com/D15689
Summary: Fixes T10789. If we aren't configured with a device, we never grabbed a lock in the first place, and should not expect one to be held.
Test Plan: Pushed non-cluster-configured Git SSH repository.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10789
Differential Revision: https://secure.phabricator.com/D15692
Summary:
Ref T4292. This mostly implements the locking/versioning logic for multi-master repositories. It is only active on Git SSH pathways, and doesn't actually do anything useful yet: it just does bookkeeping so far.
When we read (e.g., `git fetch`) the logic goes like this:
- Get the read lock (unique to device + repository).
- Read all the versions of the repository on every other device.
- If any node has a newer version:
- Fetch the newer version.
- Increment our version to be the same as the version we fetched.
- Release the read lock.
- Actually do the fetch.
This makes sure that any time you do a read, you always read the most recently acknowledged write. You may have to wait for an internal fetch to happen (this isn't actually implemented yet) but the operation will always work like you expect it to.
When we write (e.g., `git push`) the logic goes like this:
- Get the write lock (unique to the repository).
- Do all the read steps so we're up to date.
- Mark a write pending.
- Do the actual write.
- Bump our version and mark our write finished.
- Release the write lock.
This allows you to write to any replica. Again, you might have to wait for a fetch first, but everything will work like you expect.
There's one notable failure mode here: if the network connection between the repository node and the database fails during the write, the write lock might be released even though a write is ongoing.
The "isWriting" column protects against that, by staying locked if we lose our connection to the database. This will currently "freeze" the repository (prevent any new writes) until an administrator can sort things out, since it'd dangerous to continue doing writes (we may lose data).
(Since we won't actually acknowledge the write, I think, we could probably smooth this out a bit and make it self-healing //most// of the time: basically, have the broken node rewind itself by updating from another good node. But that's a little more complex.)
Test Plan:
- Pushed changes to a cluster-mode repository.
- Viewed web interface, saw "writing" flag and version changes.
- Pulled changes.
- Faked various failures, got sensible states.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15688
Summary:
Ref T4292. This adds some very basic cluster/device data to the new management view. Nothing interesting yet.
Also deal with disabled bindings a little more cleanly.
Test Plan: {F1214619}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15685
Summary:
Ref T4292. This puts a very rough skeleton in place for the new "Manage Repository" UI, somewhat similar to the "Settings" UI.
Right now, it has one panel with no content, and is not reachable from the UI.
Test Plan: {F1214525}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15683
Summary:
Ref T10756. When repositories are properly configured for the cluster (which is hard to set up today), be smart about which repositories are expected to exist on the current host, and only pull them.
This generally allows daemons to pretty much do the right thing no matter how many copies are running, although there may still be some lock contention issues that need to be sorted out.
Test Plan: {F1214483}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10756
Differential Revision: https://secure.phabricator.com/D15682
Summary: Ref T10702
Test Plan: Open a user profile, attempt to award an archived or previously awarded badge, badges dialog should provide a typeahead, and the suggestions should offer details about whether a badge is archived or already awarded.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin
Maniphest Tasks: T10702
Differential Revision: https://secure.phabricator.com/D15665
Summary:
Ref T4571. Write more of the missing documentation sections and clarify a few things.
Since the "replicating master" check needs a special permission, imposes a performance penalty, is probably very difficult to misconfigure, and likely not a big deal anyway, just drop the idea of trying to automatically detect + prevent it. We still show if it's an issue on the status page, provided we have permission to check.
When you don't have any cluster databases configured, never stop trying to connect to the default master database. We might want to do this eventually as load reduction, but just don't muddy the waters too much for now while things stabilize.
Test Plan:
- Tested functionality in cluster, non-cluster, and degraded-cluster modes.
- Used status console to monitor a health check cycle.
- Read docs.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15679
Summary:
Ref T4571. When a database goes down briefly, we fall back to replicas.
However, this fallback is slow (not good for users) and keeps sending a lot of traffic to the master (might be bad if the root cause is load-related).
Keep track of recent connections and fully degrade into "severed" mode if we see a sequence of failures over a reasonable period of time. In this mode, we send much less traffic to the master (faster for users; less load for the database).
We do send a little bit of traffic still, and if the master recovers we'll recover back into normal mode seeing several connections in a row succeed.
This is similar to what most load balancers do when pulling web servers in and out of pools.
For now, the specific numbers are:
- We do at most one health check every 3 seconds.
- If 5 checks in a row fail or succeed, we sever or un-sever the database (so it takes about 15 seconds to switch modes).
- If the database is currently marked unhealthy, we reduce timeouts and retries when connecting to it.
Test Plan:
- Configured a bad `master`.
- Browsed around for a bit, initially saw "unrechable master" errors.
- After about 15 seconds, saw "major interruption" errors instead.
- Fixed the config for `master`.
- Browsed around for a while longer.
- After about 15 seconds, things recovered.
- Used "Cluster Databases" console to keep an eye on health checks: it now shows how many recent health checks were good:
{F1213397}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15677
Summary:
Ref T4571. If we fail to connect to the master, automatically try to degrade into a temporary read-only mode ("UNREACHABLE") for the remainder of the request, if possible.
If the request was something like "load the homepage", that'll work fine. If it was something like "submit a comment", there's nothing we can do and we just have to fail.
Detecting this condition imposes a performance penalty: every request checks the connection and gives the database a long time to respond, since we don't want to drop writes unless we have to. So the degraded mode works, but it's really slow, and may perpetuate the problem if the root issue is load-related.
This lays the groundwork for improving this case by degrading futher into a "SEVERED" mode which will persist across requests. In the future, if several requests in a short period of time fail, we'll sever the database host and refuse to try to connect to it for a little while, connecting directly to replicas instead (basically, we're "health checking" the master, like a load balancer would health check a web application server). This will give us a better (much faster) degraded mode in a major service disruption, and reduce load on the master if the root cause is load-related, giving it a better chance of recovering on its own.
Test Plan:
- Disabled master in config by changing the host/username, got degraded automatically to UNREACAHBLE mode immediately.
- Faked full SEVERED mode, requests hit replicas and put me in the mode properly.
- Made stuff work, hit some good pages.
- Hit some non-cluster pages.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15674
Summary: Ref T4571. If `cluster.databases` is configured but only has replicas, implicitly drop to read-only mode and send writes to a replica.
Test Plan:
- Disabled the `master`, saw Phabricator automatically degrade into read-only mode against replicas.
- (Also tested: explicit read-only mode, non-cluster mode, properly configured cluster mode).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15672
Summary:
Ref T4571. Allows users to click the "read-only mode" notification to get more information about why an install is in read-only mode.
Installs can be in this mode for several reasons (explicit administrative action, no masters defined, no masters reachable), and it's useful to be able to tell the difference.
Test Plan: {F1212930}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15671
Summary:
Ref T4571. Ref T10759. Ref T10758. This isn't complete, but gets most of the job done:
- When `cluster.databases` is set up, most things ignore `mysql.host` now.
- You can `bin/storage upgrade` and stuff works.
- You can browse around in the web UI and stuff works.
There's still a lot of weird tricky stuff to navigate, and this has real no advantages over configuring a single server yet (no automatic failover, etc).
Test Plan:
- Configured `cluster.databases` to point at my `t1.micro` hosts in EC2 (master + replica).
- Ran `bin/storage upgrade`, got a new install setup on them properly.
- Survived setup warnings, browsed around.
- Switched back to local config, ran `bin/storage upgrade`, browsed around, went through setup checks.
- Intentionally broke config (bad hosts, no masters) and things seemed to react reasonably well.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571, T10758, T10759
Differential Revision: https://secure.phabricator.com/D15668
Summary: Ref T4571. The configuration option still doesn't do anything, but add a status panel for basic setup monitoring.
Test Plan:
Here's what a good version looks like:
{F1212291}
Also faked most of the errors it can detect and got helpful diagnostic messages like this:
{F1212292}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15667
Summary:
Ref T4571. This adds a new option which allows you to upgrade your one-host configuration to a multi-host configuration by configuring it.
Doing this currently does nothing. I wrote a lot of words about what it is //supposed// to do in the future, though.
Test Plan:
- Tried to configure the option in all the possible bad ways, got errors.
- Read documentation.
Reviewers: chad
Reviewed By: chad
Subscribers: eadler
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15663
Summary:
Ref T4571. There will be a very long path beyond this, but add a basic read-only mode. You can explicitly enable this to put Phabricator in a sort of "maintenance" mode today if you're swapping databases or something.
In the long term, we'll automatically degrade into this mode if the master database is down.
Test Plan:
- Enabled read-only mode.
- Browsed around.
- Didn't immediately see anything that was totally 100% broken.
Most stuff is 80-90% broken right now. For example:
- Stuff like submitting comments doesn't work, and gives you a confusing, unhelpful error.
- None of the UI really knows that it's read-only. EditEngine stuff should all hide itself and say "you can't add new comments while an install is in read-only mode", for example, but currently does not.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15662
Summary: Testing out a new 'nav' layout in Settings / Config. Spent a few days here and couldn't find much better overall.
Test Plan: View each page in Settings and in Config. Save some config options. Test mobile, desktop, tablet.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D15659
Summary: Fixes T5813, while I'm in here...
Test Plan: Sorted stuff by end date.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T5813
Differential Revision: https://secure.phabricator.com/D15657
Summary: Fixes T10684. Fixes T10520. This primarily implements a date/epoch field, and then does a bunch of standard plumbing.
Test Plan:
- Created countdowns.
- Edited countdowns.
- Used HTTP prefilling.
- Created a countdown ending on "Christmas Morning", etc.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10520, T10684
Differential Revision: https://secure.phabricator.com/D15655
Summary: Closes T10690
Test Plan: Open Badges application, go to Advanced Search, search for a badge by its name and see result.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: Korvin
Maniphest Tasks: T10690
Differential Revision: https://secure.phabricator.com/D15656
Summary:
Fixes T8613. This was pretty straightforward, I just never dug into it originally.
`rawResultLimit = 0` just means "no limit", so the fix is to only apply a limit if it is set to some nonzero value.
Also modernize a few pieces of code.
Test Plan: I'm actually not sure this can actually be hit normally? I faked `setGenerateGhosts(true)` into an unrelated query, hit the fatal, then fixed it.
Reviewers: lpriestley, chad
Reviewed By: chad
Maniphest Tasks: T8613
Differential Revision: https://secure.phabricator.com/D15653
Summary: Found another bouncing around.
Test Plan: Review in diff
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D15651
Summary: Ref T10262. Instead of dumping an unhelpful 403 "ACCESS DENIED" page on users, explain the most likely cause of the issue and give them a link to return to the file detail page to learn more or get an up-to-date link.
Test Plan: Hit both errors, had a lovely experience with the helpful dialog text.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10262
Differential Revision: https://secure.phabricator.com/D15650
Summary: Ref T6027. We got a not-very-user-friendly default string before.
Test Plan: Selected "Move", didn't change the dropdown, hit submit. Now, got a nice human-readable description of the issue.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T6027
Differential Revision: https://secure.phabricator.com/D15649
Summary: Bumps to 14px, fixes some on Differential
Test Plan: view various headers in Differential
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D15647
Summary:
Ref T10262. Currently, we always render a tag like this when you `{F123}` an image in remarkup:
```
<img src="/xform/preview/abcdef/" />
```
This either generates the preview or redirects to an existing preview. This is a good behavior in general, because the preview may take a while to generate and we don't want to wait for it to generate on the server side.
However, this flickers a lot in Safari. We might be able to cache this, but we really shouldn't, since the preview URI isn't a legitimately stable/permanent one.
Instead, do a (cheap) server-side check to see if the preview already exists. If it does, return a direct URI. This gives us a stable thumbnail in Safari.
Test Plan:
- Dragged a dog picture into comment box.
- Typed text.
- Thing didn't flicker like crazy all the time in Safari.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10262
Differential Revision: https://secure.phabricator.com/D15646
Summary: Going to render these all normal case instead of all caps, and bump up the font size. Should be more consistent. Yellow if you green anything orange.
Test Plan: grep, lint
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D15645
Summary:
Ref T10262. This removes one-time tokens and makes file data responses always-cacheable (for 30 days).
The URI will stop working once any attached object changes its view policy, or the file view policy itself changes.
Files with `canCDN` (totally public data like profile images, CSS, JS, etc) use "cache-control: public" so they can be CDN'd.
Files without `canCDN` use "cache-control: private" so they won't be cached by the CDN. They could still be cached by a misbehaving local cache, but if you don't want your users seeing one anothers' secret files you should configure your local network properly.
Our "Cache-Control" headers were also from 1999 or something, update them to be more modern/sane. I can't find any evidence that any browser has done the wrong thing with this simpler ruleset in the last ~10 years.
Test Plan:
- Configured alternate file domain.
- Viewed site: stuff worked.
- Accessed a file on primary domain, got redirected to alternate domain.
- Verified proper cache headers for `canCDN` (public) and non-`canCDN` (private) files.
- Uploaded a file to a task, edited task policy, verified it scrambled the old URI.
- Reloaded task, new URI generated transparently.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10262
Differential Revision: https://secure.phabricator.com/D15642
Summary:
Ref T10262. Files have an internal secret key which is partially used to control access to them, and determines part of the URL you need to access them. Scramble (regenerate) the secret when:
- the view policy for the file itself changes (and the new policy is not "public" or "all users"); or
- the view policy or space for an object the file is attached to changes (and the file policy is not "public" or "all users").
This basically means that when you change the visibility of a task, any old URLs for attached files stop working and new ones are implicitly generated.
Test Plan:
- Attached a file to a task, used `SELECT * FROM file WHERE id = ...` to inspect the secret.
- Set view policy to public, same secret.
- Set view policy to me, new secret.
- Changed task view policy, new secret.
- Changed task space, new secret.
- Changed task title, same old secret.
- Added and ran unit tests which cover this behavior.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10262
Differential Revision: https://secure.phabricator.com/D15641
Summary:
This reverts commit 3f50ba90f1.
Fixes T10412. Everyone seems to hate this and I don't feel strongly about it. It's definitely a little weird.
Test Plan: Straight revert.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10412
Differential Revision: https://secure.phabricator.com/D15644
Summary: Fixes T10737. I agree that this is a better behavior than always making them blue (boring).
Test Plan:
what
{F1208333}
wut
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T10737
Differential Revision: https://secure.phabricator.com/D15643
Summary:
Ref T6027. Fixes T10734.
- If one of the projects a task is tagged with isn't visible to the user or doesn't have a board, it won't have columns.
- Don't show options for projects with disabled boards.
Test Plan:
- Viewed task with project with no columns; no fatal.
- Viewed task with project with disabled board; no options to move on that board.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10734, T6027
Differential Revision: https://secure.phabricator.com/D15640
Summary:
Ref T6027. Normally, actions use the same order as the form, but in some cases (like moving stuff on workboards) it makes sense to reorder them explicitly.
Pin "Move on board" near the bottom, and "projects/subscribers" at the bottom. I think these are generally reasonable rules in all cases.
Test Plan: Opened menu, saw slightly better action order.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T6027
Differential Revision: https://secure.phabricator.com/D15639
Summary:
Ref T6027. Try this out and see how it feels? Clear issues:
- This definitely shouldn't be at the top.
- You should probably be able to select it multiple times?
- Some of the "which columns show up" rules might need adjustment?
- Diamond marker maybe not great?
Not sure I love this but it doesn't feel //terrible//...
Test Plan: {F1207891}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T6027
Differential Revision: https://secure.phabricator.com/D15638
Summary: Ref T5214. Fixes T10486. Ref T6027. This exposes the `TYPE_COLUMNS` transaction in a usable way via API, and fixes the interactions via prefilling.
Test Plan:
- Created tasks directly into columns via API.
- Moved tasks between columns via API.
- Used `?column=...` to try to create a template task with valid and bogus column PHIDs.
Reviewers: chad
Reviewed By: chad
Subscribers: AmyLewis
Maniphest Tasks: T5214, T6027, T10486
Differential Revision: https://secure.phabricator.com/D15636
Summary: Ref T6027. This adds human-readable rendering for the new `TYPE_COLUMNS` core transactions.
Test Plan: {F1207784}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T6027
Differential Revision: https://secure.phabricator.com/D15635
Summary:
Ref T6027. We currently have two different transaction types:
- `TYPE_PROJECT_COLUMNS` does most of the work, but has a sort of weird structure and isn't really suitable for API use.
- `TYPE_COLUMN` is this weird, junk transaction which mostly just creates the other transaction.
Merge them into a single higher-level `TYPE_COLUMNS` transaction which works properly and has a sensible structure and comprehensive error checking.
Remaining work here:
- I've removed the old rendering logic, but not yet added new logic. I need to migrate the old transaction types and add new rendering logic.
- Although the internal representation is now //suitable// for use in the API, it isn't properly exposed yet.
Test Plan:
- Created tasks into a column.
- Ran unit tests.
- Moved tasks between columns.
- Will perform additional testing in followups.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T6027
Differential Revision: https://secure.phabricator.com/D15634
Summary: Ref T10671
Test Plan: Open Conduit application, open `badges.edit` or `badges.search`, create, edit, or query for a badge.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin
Maniphest Tasks: T10671
Differential Revision: https://secure.phabricator.com/D15622
Summary: Modernize Files a bit, use newPage
Test Plan: New file, drag and drop file, view file, edit file
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D15631
Summary: Ref T7673. This is really just so I can force admin.phacility.com logout when you log out of an instance, but there are a few other things we could move here eventually, like the WILLREGISTERUSER event.
Test Plan: Logged out of an instance, got logged out of parent (see next change).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T7673
Differential Revision: https://secure.phabricator.com/D15629
Summary:
Ref T7303. Currently, our handling of "scope" is fairly rigid and adheres to the spec, but some of these behaviors don't make much sense in practice.
Soften some behaviors and make them more flexible:
**Soft Failure on Unknown Permissions**: If a client asks for a permission we don't know about, just warn that we don't recognize it instead of fataling. In particular, I plan to make `offline_access` and `whoami` implicit. Older clients that request these permissions will still work fine as long as we don't hard-fatal.
**Move `user.whoami` to ALWAYS scope**: Make `whoami` a default permission. We've already done this, in effect; this just formalizes it.
**Tokens no longer expire**: Make `offline_access` (infinite-duration tokens) a default permission. I think the OAuth model doesn't map well to reality. It is common for other providers to issue "temporary" tokens with a duration of multiple years, and the refesh workflow is sort of silly. We can add a "temporary" scope later if we need temporary tokens.
This flow was potentially extra silly with the "log out of Phacility" use case, where we might need to have you log in again before we could log you out, which is bizarre and senseless. Avoid this nonsense.
**Move away from granular permissions**: Users currently get to pick-and-choose which permissions they grant, but this likely rarely/never works in practice and is fairly hostile since applications can't communicate which permissions they need. Applications which can actually operate with only some subset of permissions can make separate requests (e.g., when you activate "cool feature X", it asks for X permission). I think applications that do this are rare; pretty much everything just asks for tons of permissions and everyone grants them.
Making this all-or-nothing is better for well-behaved applications and better for users. It's also slightly better for overzealous applications that ask for more than they need, but whatever. Users can make an informed decision, hopefully, and I plan to let administrators force applications to a subset of permissions once we introduce meaningful scopes.
Test Plan:
- Generated tokens.
- Used tokens.
- Authorized an instance.
- Faked some bogus scopes, got clean authorization.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T7303
Differential Revision: https://secure.phabricator.com/D15621
Summary: Converts over to `newPage`
Test Plan: Pull up Settings panel, test a few.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D15625
Summary: Ref T7303. This interaction is very oldschool; modernize it to enable/disable instead of "nuke from orbit".
Test Plan:
- Enabled applications.
- Disabled applications.
- Viewed applications in list view.
- Generated new tokens.
- Tried to use a token from a disabled application (got rebuffed).
- Tried to use a token from an enabled application (worked fine).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T7303
Differential Revision: https://secure.phabricator.com/D15620
Summary: Runs through Releeph to move to new UI and `newPage`
Test Plan: Ran through product, release, branch, everything seems to work.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D15623