Summary:
Fixes T11938.
Note that there's a subcase here: if you `hg clone` or `svn checkout` a short `/source/` URI that ends in `.git`, we miss the lookup and don't get this far, so you still get a generic error message.
Hopefully it is clear enough on its own that `proto://.../blah.git` is, in fact, a Git repository, since it says ".git" at the end.
If that doesn't prove to be true, we can be more surgical about this.
Test Plan:
```
$ git clone ssh://local@localvault.phacility.com/source/quack.notgit/
Cloning into 'quack.notgit'...
phabricator-ssh-exec: This repository ("quack.notgit") is not a Git repository. Use "hg" to interact with this repository.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
```
```
$ hg clone ssh://local@localvault.phacility.com/source/phabx
remote: phabricator-ssh-exec: This repository ("phabx") is not a Mercurial repository. Use "git" to interact with this repository.
abort: no suitable response from remote hg!
```
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11938
Differential Revision: https://secure.phabricator.com/D16976
Summary: Ref T11801. In some cases, this could lead to us failing to generate the first recurrence in a series.
Test Plan: Imported `weekly.ics` (from D16974) and saw an event correctly occur on Aug 18, with my local timezone set to "America/Los_Angeles".
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11801
Differential Revision: https://secure.phabricator.com/D16975
Summary:
Fixes T11936. After editing a repository URI, we were not correctly updating the URI index.
Any other edit to the repository //would// update the index, and this index is only really used by `arc` to figure out which repository a working copy belongs to, so that's how this evaded detection for this long. In particular, creating a repository would usually have an edit after any URI edits, to activate it, which would build the index correctly.
Test Plan:
- Added a new URI to a repository.
- Verified it was immediately reflected in the `repository_uriindex` table.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11936
Differential Revision: https://secure.phabricator.com/D16972
Summary:
Ref T11922. When we deploy on Saturday I need to rebuild all the cluster indexes, but some instances won't have anything indexed so they won't actually trigger the activity.
Add a `--force` flag that just clears an activity even if the activity is not required.
Test Plan: Ran `bin/config done reindex --force` several times.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11922
Differential Revision: https://secure.phabricator.com/D16970
Summary: Fixes T11791. We do this in durable column, but not in regular Conpherence. I think this is the right place? Not sure how this will feel with high lag.
Test Plan: Submit lots of text in a Conpherence.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T11791
Differential Revision: https://secure.phabricator.com/D16969
Summary:
Currently, custom Sites must match `.*` or similar to handle 404's, since the fallback is always generic.
This locks them out of the "redirect to canonicalize to `path/` code", so they currently have a choice between a custom 404 page or automatic correction of `/`.
Instead, allow the 404 controller to be constructed explicitly. Sites can now customize 404 by implementing this method and not matching everything.
(Sites can still match everything with a catchall rule if they don't want this behavior for some reason, so this should be strictly more powerful than the old behavior.)
See next diff for CORGI.
Test Plan:
- Visited real 404 (like "/asdfafewfq"), missing-slash-404 (like "/maniphest") and real page (like "/maniphest/") URIs on blog, main, and CORGI sites.
- Got 404 behavior, redirects, and real pages, respectively.
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D16966
Summary:
Ref T11922. After updating to HEAD of `master`, you need to manually rebuild the index. We don't do this during `bin/storage upgrade` because it can take a very long time (`secure.phabricator.com` took roughly an hour) and can happen while Phabricator is running.
However, if we don't warn users about this they'll just get a broken index unless they go read the changelog (or file an issue, then we tell them to go read the changelog).
This adds a very simple table for notes to administrators so we can write a "you need to go rebuild the index" note, then adds one.
Administrators clear the note by completing the activity and running `bin/config done reindex`. This isn't automatic because there are various strategies you can use to approach the issue, which I'll discuss in greater detail in the linked documentation.
Also, fix an issue where `bin/storage upgrade --apply <patch>` could try to re-mark an already-applied patch as applied.
Test Plan:
- Ran storage ugrades.
- Got instructions to rebuild search index.
- Cleared instructions with `bin/config done reindex`.
Reviewers: chad
Reviewed By: chad
Subscribers: avivey
Maniphest Tasks: T11922
Differential Revision: https://secure.phabricator.com/D16965
Summary: This is still reasonably functional and useful to people, and we don't have better mechanics to offset the change.
Test Plan: New Workboard, set Workboard color, test mobile, desktop.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D16964
Summary: Ref T3612. Mobilizes the new lightbox, changes large buttons to circle icons like Conpherence.
Test Plan: Click each new button on desktop, mobile.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T3612
Differential Revision: https://secure.phabricator.com/D16961
Summary:
Fixes T11929. When running with a query, we no longer enforce an order on the subquery join to produce results more quickly when searching for common strings.
However, this means that empty queries (like those issued by "Close as Duplicate") don't order subquery results.
Restore a `dateCreated` order if there is no query text.
Test Plan: Artificially set limit to 10, still saw 10 most recent tasks.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11929
Differential Revision: https://secure.phabricator.com/D16960
Summary:
Via HackerOne. A researcher correctly reports that our install scripts use `HTTP`, not `HTTPS`, to fetch resources and execute them as `root`, which is a potentially significant vulnerability.
Instead, use `HTTPS`.
Test Plan: Verified that these URIs function correctly over `HTTPS`.
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D16958
Summary: Ref T3612. Moves the listener to the frame of the image.
Test Plan: Click on image, no close, click on grey frame, closes image. Test image and document, clicking on arrows.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T3612
Differential Revision: https://secure.phabricator.com/D16959
Summary: Ref T3612. Passes in file size and file icon for non-images.
Test Plan: Review a PDF and PSD in a lightbox.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T3612
Differential Revision: https://secure.phabricator.com/D16957
Summary:
Fixes T11909. Ref T11816. Instead of offering a dropdown with choices between "Edit/Cancel/Reinstate This Event" and "Edit/Cancel/Reinstate Future Events", make the choice more explicit.
This dialog ends up pretty wordy but this edit is rare, so I think that's alright.
Test Plan: {F2046863}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11816, T11909
Differential Revision: https://secure.phabricator.com/D16956
Summary:
Ref T11816. Currently, if someone in California creates an event and then someone in New York edits it, we generate a no-op "<user> changed the start time from 3PM to 3PM." transaction.
This is because the internal timezone of the event is changing, but the actual absolute time is not.
Instead, when an edit wouldn't reschedule an event and would only change the internal timezone, ignore the edit.
Test Plan:
- Edited non-all-day events in PST / EST with out making changes (ignored).
- Edited non-all-day events in PST / EST with changes (changes worked).
- Performed the same edits with all-day events, which also were ignored and worked, respectively.
- Pulled events in and out of all-day mode in different timezones, behavior seemeed reasonable.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11816
Differential Revision: https://secure.phabricator.com/D16955
Summary:
In D16936, I changed logged-out viewers so they use global settings.
This can lead to a `SELECT` from an isolated unit test. Instead, give the test fixtures and use standard `generateNewUser()` stuff.
Test Plan: Ran `arc unit --everything`.
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D16952
Summary:
Ref T11816. I don't really know what happened here, maybe I rewrote and broke this at the last second?
In most cases, we directly respect the `isAllDay` flag on the event, so the internal date state doesn't matter too much.
However, in the case of mail notifications, the raw internal state is relevant. This should fix mail notifications for all-day events.
(I might still turn them off since I'm not sure they're too useful, but it's good to have them working.)
Test Plan:
- Created a new all-day event, verified database values wrote correctly.
- Ran `bin/calendar notify --trace`, verified it picked up an all-day event tomorrow with a large enough `--minutes` value.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11816
Differential Revision: https://secure.phabricator.com/D16954
Summary: Spruce up the file embeds a little more, hover state, icons, file size.
Test Plan:
Add a psd and pdf, see new icons. Check differential, still see icons there too. Test mobile, desktop.
{F2042539}
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D16950
Summary: Found these in the `secure` error logs: one bad call, one bad column.
Test Plan: Searched for empty string. Double-checked method name.
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D16948
Summary:
Ref T6740. When we index a document, also save a copy of the stemmed version.
When querying, search the combined corpus for the terms.
(We may need to tune this a bit later since it's possible for literal, quoted terms to match in the stemmed section, but I think this wil rarely cause issues in practice.)
A downside here is that search sort of breaks if you upgrade into this and don't reindex. I wasn't able to find a way to issue the query that remained compatible with older indexes and didn't have awful performance, so my plan is:
- Put this on `secure`.
- Rebuild the index.
- If things look good after a couple of days, add a way that we can tell people they need to rebuild the search index with a setup warning.
We might get some reports between now and then, but if this is super awful we should know by the end of the weekend.
Test Plan:
WOW AMAZING
{F2021466}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T6740
Differential Revision: https://secure.phabricator.com/D16947
Summary: Ref T11741. I'll wait until the release cut to land this; it just adds a test for InnoDB FULLTEXT being available instead of always returning `false`.
Test Plan:
- Ran with InnoDB fulltext locally for a day and a half without issues.
- Ran `bin/storage upgrade`, saw it detect InnoDB fulltext.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11741
Differential Revision: https://secure.phabricator.com/D16946
Summary:
Ref T6740. Currently, we issue fulltext queries with an "ORDER BY <score>" on the entire result set.
For very large result sets, this can require MySQL to do a lot of work. However, this work is generally useless: if you search for some common word like "diff" or "internet" or whatever and match 4,000 documents, the chance that we can score whatever thing you were thinking of at the top of the result set is nearly nothing. It's more useful to return quickly, and let the user see that they need to narrow their query to get useful results.
Instead of doing all that work, let MySQL find up to 1,000 results, then pick the best ones out of those.
This actual change is a little flimsy, since our index isn't really big enough to suffer indexing issues. However, searching for common terms on my local install (where I have some large repositories imported and indexed) drops from ~40ms to ~10ms.
My hope is to improve downstream performance for queries like "translatewiki" here, particularly:
<https://phabricator.wikimedia.org/T143863>
That query matches about 300 trillion documents but there's a ~0% chance that the one the user wants is at the top. It takes a couple of seconds to execute, for me. Better to return quickly and let the user refine their results.
I think this will also make some other changes related to stemming easier.
This also removes the "list users first" ordering on the query, which made performance more complicated and seems irrelevant now that we have the typeahead.
Test Plan:
- Searched for some common terms like "code" locally, saw similar results with better performance.
- Searched for useful queries (e.g., small result set), got identical results.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T6740
Differential Revision: https://secure.phabricator.com/D16944
Summary:
Ref T11741. This makes everything work if we switch to InnoDB, but never actually switches yet.
Since the default minimum word length (3) and stopword list (36 common English words) in InnoDB are generally pretty reasonable, I just didn't add any setup advice for them. I figure we're better off with simpler setup until we identify some real problem that the builtin stopwords create.
Test Plan: Swapped the `false` to `true`, ran `storage adjust`, got InnoDB fulltext indexes, searched for stuff, got default "AND" behavior.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11741
Differential Revision: https://secure.phabricator.com/D16942
Summary:
Ref T11741. On recent-enough versions of MySQL, we would prefer to use InnoDB for fulltext indexes instead of MyISAM.
Allow `bin/storage adjust` to read actual and expected table engines, and apply adjustments as necessary.
We have one existing bad table that uses the wrong engine, `metamta_applicationemail`. This change corrects that table.
Test Plan:
- Ran `bin/storage upgrade`.
- Saw the adjustment phase apply this change properly:
```
>>>[463] <query> ALTER TABLE `local_metamta`.`metamta_applicationemail` COLLATE = 'utf8mb4_bin', ENGINE = 'InnoDB'
```
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11741
Differential Revision: https://secure.phabricator.com/D16941
Summary:
Ref T11741. InnoDB uses a stopwords table instead of a stopwords file.
During `storage upgrade`, synchronize the table from the stopwords file on disk.
Test Plan:
- Ran `storage upgrade`.
- Ran `select * from stopwords`, saw stopwords.
- Added some garbage to the table.
- Ran `storage upgrade`, saw it remove it.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11741
Differential Revision: https://secure.phabricator.com/D16940
Summary:
Ref T11741. Fixes T10642. Parse and compile user queries with a consistent ruleset, then submit queries to the backend using whatever ruleset MySQL is configured with.
This means that `ft_boolean_syntax` no longer needs to be configured (we'll just do the right thing in all cases).
This should improve behavior with RDS immediately (T10642), and allow us to improve behavior with InnoDB in the future (T11741).
Test Plan:
- Ran various queries in the UI, saw the expected results.
- Ran bad queries, got useful errors.
- Searched threads in Conpherence.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10642, T11741
Differential Revision: https://secure.phabricator.com/D16939
Summary:
Fixes T11894. Currently, if you aren't attending any events for a while, we can cache that you are free for the next 72 hours, even if you have an event in a few hours.
Instead, only cache "user is free" until the next event, if one exists.
Test Plan: Dumped cache TTLs, saw 52 minutes instead of ~4300 minutes with a near-upcoming event.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11894
Differential Revision: https://secure.phabricator.com/D16937
Summary: Fixes T11917. Give logged-out / omnipotent users the global settings, not the default settings.
Test Plan: Changed applications and language, logged out, saw changes as a public user.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11917
Differential Revision: https://secure.phabricator.com/D16936
Summary:
Currently, when a payment method is invalid we still render the full name and let you save the form without making changes. This can be confusing.
Instead:
- Render "<Deleted Payment Method>", literally.
- Render an error immediately.
- Prevent the form from being saved without changing the method.
Test Plan: {F1955487}
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D16935
Summary:
Ref T11044. This is still catching the older exceptions, which are now more general.
If you loaded the web UI without MySQL running, this meant you got a less-helpful error.
Test Plan: Stopped MySQL, loaded web UI, got a more-helpful error.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11044
Differential Revision: https://secure.phabricator.com/D16930
Summary:
Ref T11044.
- Use shorter lock names. Fixes T11916.
- These granular exceptions now always raise as a more generic "Cluster" exception, even for a single host, because there's less special code around running just one database.
Test Plan:
- Configured bad `mysql.port`, ran `bin/storage upgrade`, got a more helpful error message.
- Ran `bin/storage upgrade --trace`, saw shorter lock names.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11044, T11916
Differential Revision: https://secure.phabricator.com/D16924
Summary: Fixes T11845. Users can still embed a text panel on the home page to give it some ambiance.
Test Plan: Wrote an autoplay video as a comment, saw it in feed. Before change: autoplay. After change: no auto play. On task: still autoplay.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11845
Differential Revision: https://secure.phabricator.com/D16920
Summary: Fixes T11910. I spent a couple of minutes looking for the root cause without much luck, but this will all be obsoleted by an eventual upgrade to `EditEngine` anyway.
Test Plan: Set and unset "Wait for Message", which now worked.
Reviewers: chad, avivey
Reviewed By: avivey
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T11910
Differential Revision: https://secure.phabricator.com/D16919
Summary: Removes the viewable restriction on embedded files. Builds a basic lightbox UI for commenting.
Test Plan:
Add psd, pdf to Maniphest task, clicked on download, comment, left comment. Closed box.
{F1943726}
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T3612
Differential Revision: https://secure.phabricator.com/D16917
Summary:
Ref T11044. Few issues here:
- The `PhutilProxyException` is missing an argument (hit this while in read-only mode).
- The `$ref_key` is unused.
- When you add a new master to an existing cluster, we can incorrectly apply `.php` patches which we should not reapply. Instead, mark them as already-applied.
Test Plan:
- Poked this locally, but will initialize `secure004` as an empty master to be sure.
Reviewers: chad, avivey
Reviewed By: avivey
Maniphest Tasks: T11044
Differential Revision: https://secure.phabricator.com/D16916
Summary: Ref T11044. Fixes T11672. In T11672, persistent connections seem to work fine, but they can require `max_connections` and other settings to be raised. Since most users don't need them, make them an advanced option.
Test Plan: Configured persistent connections, loaded some pages, observed persistent connections get used.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11044, T11672
Differential Revision: https://secure.phabricator.com/D16913
Test Plan: `arc unit`, see test name in list.
Reviewers: chad, epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D16915
Summary:
Ref T11044. Sometimes we have a sequence of patches like this:
- `01.newtable.sql`: Adds a new table to Files.
- `02.movedata.php`: Moves some data that used to live in Tokens to the new table.
This is fairly rare, but not unheard of. More commonly, we can have this sequence:
- `03.newtable.sql`: Add a new column to Phame.
- `04.setvalue.php`: Copy data into that new column.
In both cases, when applying database-by-database, we can get into trouble.
- In the first case, if Files is on a different master, we'll try to move data into a new table before creating the table.
- In the second case, if Phame is on a different master, the PHP patch will connect to it before we add the new column.
In either case, we try to interact with tables or columns which don't exist yet.
Instead, apply each patch in order, to all databases which need it. So we'll apply `01.newtable.sql` EVERYWHERE first, then move on.
In the case of PHP patches, we also now only apply them once, since they never make schema changes. It should normally be OK to apply them more than once safely, but this is a little faster and a little safer if we ever make a mistake.
Test Plan:
- Ran `bin/storage upgrade` on single-host and clustered setups.
- Initialized new storage on single-host and clustered setups.
- Upgraded again after initialization.
- Ran with `--apply`.
- Ran with `--dry-run`.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11044
Differential Revision: https://secure.phabricator.com/D16912
Summary:
Ref T11044. This was old Facebook cruft for reading configuration from SMC (and maybe doing some other questionable things). See D183.
(See also D175 for discussion of this from 2011.)
In modern Phabricator, you can subclass `SiteConfig` to provide dynamic configuration, and we do so in the Phacility cluster. This lets you change any config, and change in response to requests (e.g., for instancing) and is generally more powerful than this mechanism was.
This configuration provider theoretically let you roll your own replication or partitioning, but in practice I believe no one ever did, and no one ever could have anyway without more support in the upstream (for migrations, read-after-write, etc).
Test Plan:
- Grepped for removed option.
- Browsed around with clustering off.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11044
Differential Revision: https://secure.phabricator.com/D16911
Summary:
Ref T11044. One popular tool in a modern operations environment is Puppet. The primary purpose of this tool is to randomly revert hosts to older or different configurations.
Introducing an element of chaotic unpredictability into operations trains staff to be on high alert at all times, rather than lulled into complacency by predictability or consistency.
When Puppet reverts a Phabricator host's configuration to an older version, we might start writing data to a lot of crazy places where it shouldn't go. This will create a big sticky mess that is virtually impossible to undo, mostly because we'll get two files with ID 123 or two tasks with ID 456 or whatever else and good luck with that.
Instead, after changing the partition layout, require `bin/storage partition` to be run. This writes a copy of the config everywhere.
Then, when we start serving web requests, make sure every database has the exact same config. This will foil Puppet by refusing to run requests on hosts it has reverted.
Test Plan:
- Changed partition configuration.
- Ran Phabricator.
- FOILED!
- Ran `bin/storage partition` to sync config.
- Things worked again.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11044
Differential Revision: https://secure.phabricator.com/D16910
Summary:
Ref T11044. Fixes T10931. This option has essentially never been useful for anything, and we've picked the best implementation for a long time (MySQLi if available, MySQL if not).
I am not aware of any reason to ever set this manually. If someone comes up with some bizarre but legitimate use case that I haven't thought of, we can modularize it.
Test Plan: Browsed around. Grepped for `mysql.implementation`.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10931, T11044
Differential Revision: https://secure.phabricator.com/D16909
Summary: Adds a comment box, you can put text into it, hit enter, and see it come back.
Test Plan: Put text into box, see it come back.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T3612
Differential Revision: https://secure.phabricator.com/D16907
Summary:
I frequently run into a situation where I want to kill tasks that have accumulated a lot of failures regardless of what class they are. Or I'll want to kill every worker of a certain class but only if it has failed at least once. This change allows me to run `./bin/worker cancel --class <MYCLASS> --min-failure-count 5` to only kill tasks with at least 5 failed attempts.
The `--min-failure-count N` argument can be used by itself as well as with `--class CLASSNAME`. I don't think it makes sense for it to work with `--id ID`, but I'm not dead set on that or anything.
Test Plan: I ran the worker management workflow with and without the `--min-failure-count` argument and it worked as expected.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: Korvin, epriestley, yelirekim
Differential Revision: https://secure.phabricator.com/D16906
Summary:
Fixes T10759. Fixes T11817. This runs all the general sanity/configuration checks on all the active servers.
None of these warnings are very important, and this doesn't change any logical stuff.
Depends on D16904.
Test Plan: Painstakingly triggered each warning, verified that they rendered correctly and that messages told me which host was affected.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10759, T11817
Differential Revision: https://secure.phabricator.com/D16905
Summary:
Ref T10759. Check master/replica status during startup.
After D16903, this also means that we check this status after a database comes back online after being unreachable.
If a master is replicating, fatal (since this can do a million kinds of bad things).
If a replica is not replicating, warn (this just means the replica is behind so some data is at risk).
Also: if your masters were actually configured properly (mine weren't until this change detected it), we would throw away patches as we applied them, so they would only apply to the //first// master. Instead, properly apply all migration patches to all masters.
Test Plan:
- Started Phabricator with a replicating master, got a fatal.
- Stopped replication on a replica, got a warning.
- With two non-replicating masters, upgraded storage.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10759
Differential Revision: https://secure.phabricator.com/D16904
Summary:
Ref T10759. We may "discover" the presence of a fatal setup error later, after starting Phabricator.
This can happen in a few ways, but most are unlikely. The one I'm immediately concerned about is:
- Phabricator starts up during a disaster with some databases unreachable.
- We start with warnings (unreachable databases are generally not fatal, since it's OK for some subset of hosts to be down in replicated/partitioned setups).
- The unreachable databases later recover and become accessible again.
- When we run checks against them, we discover that they are misconfigured.
Currently, "fatal" setup issues are not truly fatal if we're "in flight" -- we've survived setup checks at least once in the past. This is bad in the scenario above.
Especially with partitioning, it could lead to mangled data in a disaster scenario where operations staff makes a small configuration mistake while trying to get things running again.
Instead, if we "discover" a fatal error while already "in flight", reset the whole setup process as though the webserver had just restarted. Don't serve requests again until we can make it through setup without hitting fatals.
Test Plan:
- Started Phabricator with multiple masters, one of which was down and broken.
- Got a warning about the bad master.
- Revived the master.
- Before: Phabricator detects the fatal, but keeps serving requests.
- After: Phabricator detects the fatal, resets the webserver, and stops serving requests until the fatal is resolved.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10759
Differential Revision: https://secure.phabricator.com/D16903