mirror of https://we.phorge.it/source/phorge.git synced 2024-12-29 17:00:59 +01:00
Commit graph

413 commits

Author SHA1 Message Date
epriestley
ea182b6df9 When we failover to a replica, log the exception we hit
Summary: See PHI1180. Currently, when we failover to a replica, we may not log the failure. Failovers are serious business and bad news, so emit a log even if we are able to connect to the replica.
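
The behavior described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the project's PHP; `connect_with_failover`, `ConnectionFailed`, `ClusterStranded`, and the `connect` callable are hypothetical names, not the actual Phabricator API.

```python
import logging

logger = logging.getLogger("storage")


class ConnectionFailed(Exception):
    """Raised by a connector when a host cannot be reached (hypothetical)."""


class ClusterStranded(Exception):
    """Raised when neither the master nor any replica is reachable."""


def connect_with_failover(connect, master, replicas, log=logger.error):
    """Try the master first; on failure, log the error and try replicas.

    `connect` is an assumed callable that takes a host name and either
    returns a connection or raises ConnectionFailed.
    Returns a (connection, read_only) pair.
    """
    try:
        return connect(master), False
    except ConnectionFailed as master_error:
        # Log even if a replica later succeeds: a failover is always notable.
        log("Failed to connect to master %r, failing over into read-only "
            "mode: %s" % (master, master_error))
        for replica in replicas:
            try:
                return connect(replica), True
            except ConnectionFailed:
                continue
        # Nothing is reachable; keep the master error as the cause.
        raise ClusterStranded(
            "Unable to establish a connection to any database host."
        ) from master_error
```

The point of the sketch is only the ordering: the log call happens before any replica is tried, so a successful failover still leaves a trace.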

Test Plan:
Configured a bogus master and a good replica:

```
$ ./bin/mail list-outbound
[2019-03-29 16:26:09] PHLOG: 'Retrying (attempt 1) after connection failure ("AphrontConnectionQueryException", #2002): Attempt to connect to root@127.0.0.2 failed with error #2002: Operation timed out.' at [/Users/epriestley/dev/core/lib/libphutil/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:124]
[2019-03-29 16:26:19] PHLOG: 'Retrying (attempt 2) after connection failure ("AphrontConnectionQueryException", #2002): Attempt to connect to root@127.0.0.2 failed with error #2002: Operation timed out.' at [/Users/epriestley/dev/core/lib/libphutil/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:124]
[2019-03-29 16:26:29] EXCEPTION: (PhutilProxyException) Failed to connect to master database ("local_config"), failing over into read-only mode. {>} (AphrontConnectionQueryException) Attempt to connect to root@127.0.0.2 failed with error #2002: Operation timed out. at [<phutil>/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:362]
<...snip backtrace...>
3945   Voided email rP04f9e72cbd10: Don't subscribe bots implicitly when they act on objects, or when they are…
3946   Voided email rPdf53d72e794c: Allow "Move Tasks to Column..." to prompt for MFA
3947   Voided email rP492b03628f19: Fix a typo in Drydock "Land" operations
3948   Voided email rPb469a5134ddd: Allow "SMTP" and "Sendmail" mailers to have "Message-ID" behavior configured in…
3949   Voided email rPa6fd8f04792d: When performing complex edits, pause sub-editors before they publish to…
...
```

Configured a bogus master and a bogus replica:

```
$ ./bin/mail list-outbound
[2019-03-29 16:26:57] PHLOG: 'Retrying (attempt 1) after connection failure ("AphrontConnectionQueryException", #2002): Attempt to connect to root@127.0.0.2 failed with error #2002: Operation timed out.' at [/Users/epriestley/dev/core/lib/libphutil/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:124]
[2019-03-29 16:27:07] PHLOG: 'Retrying (attempt 2) after connection failure ("AphrontConnectionQueryException", #2002): Attempt to connect to root@127.0.0.2 failed with error #2002: Operation timed out.' at [/Users/epriestley/dev/core/lib/libphutil/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:124]
[2019-03-29 16:27:27] PHLOG: 'Retrying (attempt 1) after connection failure ("AphrontConnectionQueryException", #2002): Attempt to connect to root@127.0.0.3 failed with error #2002: Operation timed out.' at [/Users/epriestley/dev/core/lib/libphutil/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:124]
[2019-03-29 16:27:37] PHLOG: 'Retrying (attempt 2) after connection failure ("AphrontConnectionQueryException", #2002): Attempt to connect to root@127.0.0.3 failed with error #2002: Operation timed out.' at [/Users/epriestley/dev/core/lib/libphutil/src/aphront/storage/connection/mysql/AphrontBaseMySQLDatabaseConnection.php:124]
[2019-03-29 16:27:47] EXCEPTION: (PhabricatorClusterStrandedException) Unable to establish a connection to any database host (while trying "local_config"). All masters and replicas are completely unreachable.

AphrontConnectionQueryException: Attempt to connect to root@127.0.0.2 failed with error #2002: Operation timed out. at [<phabricator>/src/infrastructure/storage/lisk/PhabricatorLiskDAO.php:177]
<...snip backtrace...>
```

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D20351
2019-04-01 14:52:45 -07:00
epriestley
18b444e427 When queries overheat, raise an exception
Summary:
Ref T13259. Currently, queries set a flag and return a partial result set when they overheat. This is mostly okay:

  - It's very unusual for queries to overheat if they don't have a real viewer.
  - Overheating is rare in general.
  - In most cases where queries can overheat, the context is a SearchEngine UI, which handles this properly.

In T13259, we hit a case where a query with an omnipotent viewer can overheat: if you have more than 1,000 consecutive commits in the database with invalid `repositoryID` values, we'll overheat and bail out. This is pretty bad, since we don't process everything.

Change this behavior:

  - Throw by default, so this stuff doesn't slip through the cracks.
  - Handle the SearchEngine case explicitly ("it's okay to overheat, we'll handle it").
  - Make `QueryIterator` disable overheating behavior: if we're iterating over all objects, we want to hit the whole table even if most of it is garbage.

There are some cases where this might cause new exception behavior that we don't necessarily want. For example, in Owners, each package shows "recent commits in this package". If you can't see the first 1,000 recent commits, you'd previously get a slow page with no results. Now you'll probably get an exception.

If these crop up, I think the best approach for now is to deal with them on a case-by-case basis and see how far we get. In the "Owners" case, it might be good to query by repositories you can see first, then query by commits in the package in those repositories. That should give us a better outcome than any generic behavior we could implement.
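
A rough sketch of the three behaviors, in Python rather than PHP. The function and parameter names (`load_visible`, `allow_overheat`, `overheat_limit`) are illustrative assumptions and do not mirror the real query classes; the 1,000-row threshold is taken from the description above.

```python
class QueryOverheated(Exception):
    """Raised when a query examines too many rows without filling a page."""


def load_visible(rows, can_see, limit, overheat_limit=1000,
                 allow_overheat=False):
    """Return (results, overheated) for up to `limit` visible rows.

    - Default: raise once `overheat_limit` rows have been examined.
    - allow_overheat=True: return the partial result and a flag instead
      (the "SearchEngine" style of handling).
    - overheat_limit=None: never overheat (the "iterate everything" case).
    """
    results = []
    examined = 0
    for row in rows:
        if len(results) >= limit:
            break
        if overheat_limit is not None and examined >= overheat_limit:
            if allow_overheat:
                return results, True
            raise QueryOverheated(
                "Examined %d rows but found only %d of %d results."
                % (examined, len(results), limit))
        examined += 1
        if can_see(row):
            results.append(row)
    return results, False
```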

Test Plan:
  - Added 100000 to all repositoryID values for commits on my local install.
  - Before making changes, ran `bin/repository rebuild-identities --all --trace`. Saw the script process 1,000 rows and exit silently.
  - Applied the first part ("throw by default") and ran `bin/repository rebuild-identities`. Saw the script process 1,000 rows, then raise an exception.
  - Applied the second part ("disable for queryiterator") and ran the script again. Saw the script process all 15,000 rows without issues (although none are valid and none actually load).
  - Viewed Diffusion, saw appropriate NUX / "overheated" UIs.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13259

Differential Revision: https://secure.phabricator.com/D20294
2019-03-19 13:02:59 -07:00
epriestley
889eca1af9 Allow a DAO object storage namespace to be forced to a particular value
Summary:
Ref T6703. When we import external data from a third-party install to a Phacility instance, we must link instance accounts to central accounts: either existing central accounts, or newly created central accounts that we send invites for.

During this import, or when users register and claim those new accounts, we do a write from `admin.phacility.com` directly into the instance database to link the accounts.

This is pretty sketchy, and should almost certainly just be an internal API instead, particularly now that it's relatively stable.

However, it's what we use for now. The process has had some issues since the introduction of `%R` (combined database name and table reference in queries), and now needs to be updated for the new `providerConfigPHID` column in `ExternalAccount`.

The problem is that `%R` isn't doing the right thing. We have code like this:

```
$conn = new_connection_to_instance('turtle');
queryf($conn, 'INSERT INTO %R ...', $table);
```

However, `$table` resolves `%R` using information about the currently executing process, not anything specific to `$conn`, so it prints `admin_user.user_externalaccount` (the name of the table on `admin.phacility.com`, where the code is running).

We want it to print `turtle_user.user_externalaccount` instead: the name of the table on `turtle.phacility.com`, where we're actually writing.

To force this to happen, let callers override the namespace part of the database name.
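
As a minimal sketch of the idea (Python for illustration; `TableRef` and `force_namespace` are hypothetical names, not the real DAO methods), the namespace normally comes from the running process, and the override replaces only that part of the database name:

```python
class TableRef:
    """Builds a "namespace_application.table" name for use in queries."""

    def __init__(self, application, table, process_namespace):
        self.application = application
        self.table = table
        # Namespace of the currently executing process, e.g. "admin".
        self.process_namespace = process_namespace
        self.forced_namespace = None

    def force_namespace(self, namespace):
        """Override the namespace part of the database name."""
        self.forced_namespace = namespace
        return self

    def qualified_name(self):
        namespace = self.forced_namespace or self.process_namespace
        return "%s_%s.%s" % (namespace, self.application, self.table)


ref = TableRef("user", "user_externalaccount", process_namespace="admin")
default_name = ref.qualified_name()
forced_name = ref.force_namespace("turtle").qualified_name()
```

Here `default_name` is the table as seen from `admin`, while `forced_name` targets the `turtle` instance.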

Long term: I'd plan to rip this out and replace it with an API call. This "connect directly to the database" stuff is nice for iterating on (only `admin` needs hotfixes) but very very sketchy for maintaining.

Test Plan: See next diff.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T6703

Differential Revision: https://secure.phabricator.com/D20167
2019-02-14 12:02:08 -08:00
epriestley
a46c25d2ba Make two ancient migrations fatal if they affect data
Summary:
Depends on D20106. Ref T6703. Since I plan to change the `ExternalAccount` table, these migrations (which rely on `save()`) will stop working.

They could be rewritten to use raw queries, but I suspect few or no installs are affected. At least for now, just make them safe: if they would affect data, fatal and tell the user to perform a more gradual upgrade.

Also remove an `ALTER IGNORE TABLE` (this syntax was removed at some point) and fix a `%Q` when adjusting certain types of primary keys.

Test Plan: Ran `bin/storage upgrade --no-quickstart --force --namespace test1234` to get a complete migration since the beginning of time.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T6703

Differential Revision: https://secure.phabricator.com/D20107
2019-02-06 17:08:34 -08:00
epriestley
9aa5a52fbd Completely remove "LiskDAOSet" and "loadRelatives/loadOneRelative"
Summary: Fixes T13218. We have no more callers to any of this and can get rid of it forever.

Test Plan: Grepped for all four API methods, `LiskDAOSet`, and `inSet`.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13218

Differential Revision: https://secure.phabricator.com/D19879
2018-12-12 16:41:51 -08:00
epriestley
02933acbd5 Remove all application callers to "putInSet()"
Summary: Ref T13218. This is the last public-facing API call for `loadRelatives/loadOneRelative`. This just "primed" objects to make the other calls work and had no direct effects.

Test Plan:
- Ran `bin/fact analyze`.
- Used `bin/storage upgrade -f --apply` to apply `20181031.board.01.queryreset.php`, which uses `LiskMigrationIterator`.
- Browsed user list.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: yelirekim

Maniphest Tasks: T13218

Differential Revision: https://secure.phabricator.com/D19878
2018-12-12 16:41:12 -08:00
epriestley
43cf4edfb1 When waiting for long-running Harbormaster futures to resolve, close idle database connections
Summary:
Ref T13216. See PHI916. Harbormaster builds may be long-running, particularly if they effectively wrap `ssh ... ./run-huge-build.sh`. If we spend more than a few seconds waiting for futures to resolve, close idle database connections.

The general goal here is to reduce the held connection load for installs with a very large number of test runners.

Test Plan: Added debugging code to `phlog()` closures, saw connections closed while running builds.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19824
2018-11-21 07:53:40 -08:00
epriestley
86fd204148 Fix all query warnings in "arc unit --everything"
Summary:
Ref T13216. Ref T13217. Depends on D19800. This fixes all of the remaining query warnings that pop up when you run "arc unit --everything".

There's likely still quite a bit of stuff lurking around, but hopefully this covers a big set of the most common queries.

Test Plan: Ran `arc unit --everything`. Before change: lots of query warnings. After change: no query warnings.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13217, T13216

Differential Revision: https://secure.phabricator.com/D19801
2018-11-15 03:51:25 -08:00
epriestley
2f10d4adeb Continue making application fixes to Phabricator for changes to %Q semantics
Summary: Depends on D19789. Ref T13217. Continue updating things to use the new %Q-flavored conversions instead of smushing a bunch of strings together.

Test Plan: Browsed around, far fewer errors. These changes are largely mechanical in nature.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13217

Differential Revision: https://secure.phabricator.com/D19790
2018-11-15 03:50:02 -08:00
epriestley
98690ee326 Update many Phabricator queries for new %Q query semantics
Summary: Depends on D19785. Ref T13217. This converts many of the most common clause construction pathways to the new %Q / %LQ / %LO / %LA / %LJ semantics.

Test Plan: Browsed around a bunch, saw fewer warnings and no obvious behavioral errors. The transformations here are generally mechanical (although I did them by hand).

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: hach-que

Maniphest Tasks: T13217

Differential Revision: https://secure.phabricator.com/D19789
2018-11-15 03:48:10 -08:00
epriestley
da40f80741 Update PhabricatorLiskDAO::chunkSQL() for new %Q semantics
Summary:
Ref T13217. This method is slightly tricky:

  - We can't safely return a string: return an array instead.
  - It no longer makes sense to accept glue. All callers use `', '` as glue anyway, so hard-code that.

Then convert all callsites.
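
A sketch of the new shape, in Python for illustration. The size limit, its default, and the exact return structure are assumptions; the only details taken from the description are that the result is a list rather than a string and that the glue is fixed at `', '`.

```python
def chunk_sql(fragments, limit=1024 * 1024):
    """Group SQL fragments into chunks that stay within `limit` characters.

    Returns a list of chunks, each a list of fragments. Sizes account for
    the fixed ", " glue that would join the fragments in a query. A single
    fragment larger than `limit` still gets a chunk of its own.
    """
    glue_size = len(", ")
    chunks = []
    current = []
    current_size = 0
    for fragment in fragments:
        added = len(fragment) + (glue_size if current else 0)
        if current and current_size + added > limit:
            # Start a new chunk; the first fragment carries no glue.
            chunks.append(current)
            current = []
            current_size = 0
            added = len(fragment)
        current.append(fragment)
        current_size += added
    if current:
        chunks.append(current)
    return chunks
```

Returning lists leaves the final joining to the query formatter, so no caller has to build a raw SQL string by hand.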

Test Plan: Browsed around, saw fewer "unsafe" errors in error log.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13217

Differential Revision: https://secure.phabricator.com/D19784
2018-11-13 08:59:18 -08:00
epriestley
798a391e5a Add test coverage for "%R" in qsprintf() and convert LiskDAO to support it
Summary:
Ref T13210. Ref T11908. Add some basic test coverage for the new "%R" introduced in D19764, then convert LiskDAO to implement the "Database + Table Ref" interface.

To move forward, we need to convert all of these (where `%T` is not a table alias):

```counterexample
qsprintf($conn, '... %T ...', $thing->getTableName());
```

...to these:

```
qsprintf($conn, '... %R ...', $thing);
```

The new code is a little simpler (no `->getTableName()` call) which is sort of nice. But we also have a //lot// of `%T` so this is probably going to take a while.
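
To make the difference concrete, here is a toy formatter in Python that supports only these two conversions. It is an illustration under assumptions, not `qsprintf()` itself: `format_query`, `quote_identifier`, and the `TableRef` attributes are hypothetical.

```python
import re


class TableRef:
    """Stand-in for an object that knows its database and table name."""

    def __init__(self, database, table):
        self.database = database
        self.table = table


def quote_identifier(name):
    # Backtick-quote an identifier, doubling any embedded backticks.
    return "`" + name.replace("`", "``") + "`"


def format_query(pattern, *args):
    """Expand %T (table name string) and %R (database + table ref)."""
    remaining = list(args)

    def substitute(match):
        value = remaining.pop(0)
        if match.group(1) == "T":
            return quote_identifier(value)
        # %R: the object supplies both parts, so callers pass it directly.
        return (quote_identifier(value.database) + "." +
                quote_identifier(value.table))

    result = re.sub(r"%([TR])", substitute, pattern)
    if remaining:
        raise ValueError("Too many arguments for query pattern.")
    return result
```

With `%T` the caller extracts a name and the database is implied by the connection; with `%R` the reference object determines both.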

(I'll hold this until after the release cut.)

Test Plan:
  - Ran unit tests.
  - Browsed around and edited some objects without issues. This change causes a reasonably large percentage of our total queries to use the new code since the LiskDAO builtin queries are some of the most commonly-constructed queries, although there are still ~700 callsites which need to be examined for possible conversion.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13210, T11908

Differential Revision: https://secure.phabricator.com/D19765
2018-11-05 10:59:50 -08:00
epriestley
201f29fbf4 Fix truncation in "bin/storage probe" of tables larger than 100GB
Summary:
Ref T13164. PHI805 incidentally includes some `bin/storage probe` output for 100GB+ tables which renders wrong.

We have the tools to render it properly, so stop doing this manually and let ConsoleTable figure out the alignment.

Test Plan:
Faked very large table sizes, ran `bin/storage probe`:

{F5785946}

(Then, un-faked the very large table sizes and ran it again, got sensible output.)

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13164

Differential Revision: https://secure.phabricator.com/D19567
2018-08-08 09:50:59 -07:00
Austin McKinley
ee7879e626 Change bin/storage destroy to be less scary when removing test data
Summary: I've pulled up this code probably three different times to make sure that the big scary warning does, in fact, still get printed even when passing `--unittest-fixtures` to `bin/storage destroy`. Make the warning message less scary if only removing test data.

Test Plan: Ran with and without `--unittest-fixtures` and saw expected warnings. After agreeing to warnings, test data was deleted as expected. Did not test `bin/storage destroy` without `--unittest-fixtures`.

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D19535
2018-07-25 12:14:41 -07:00
epriestley
13dd3014a7 Fix loop in QueryIterator when row count is an exact multiple of page size
Summary: Ref T13152. The pager does a bit of magic here and doesn't populate `nextPageID` when it knows it got an exact final page. The logic misfired in this case and sent us back to the start.
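
A simplified Python model of the failure mode. `fetch_page` is a hypothetical pager, not the real one; what matters is that it returns no next-page cursor when it knows the page it returned was the last.

```python
def fetch_page(rows, after_id, page_size):
    """Return (page, next_page_id) for sorted integer IDs in `rows`.

    `next_page_id` is None when the pager knows there are no more rows,
    which includes the case where the final page is exactly full.
    """
    remaining = [r for r in rows if after_id is None or r > after_id]
    page = remaining[:page_size]
    has_more = len(remaining) > page_size
    next_page_id = page[-1] if (page and has_more) else None
    return page, next_page_id


def iterate_all(rows, page_size):
    """Yield every row exactly once, one page at a time."""
    after_id = None
    while True:
        page, next_page_id = fetch_page(rows, after_id, page_size)
        yield from page
        if next_page_id is None:
            # Stop here. Feeding None back into the pager as a cursor
            # would mean "start from the first page" and loop forever.
            return
        after_id = next_page_id
```

Setting `page_size` to 1, as in the test plan below, makes every row count an exact multiple of the page size.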

Test Plan:
  - Set page size to 1 to guarantee rows were an exact multiple of page size.
  - Ran `rebuild-identities` (I no-op'd the actual logic to make it faster).
  - Before: looped forever.
  - After: clean exit after processing everything.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13152

Differential Revision: https://secure.phabricator.com/D19479
2018-06-07 13:17:34 -07:00
epriestley
3c5668b4a5 When database connection exceptions occur, raise them to the setup layer
Summary:
Ref T13141. Currently, during first-time setup we don't surface all the details about connection exceptions that we could: the underlying exception is discarded inside cluster connection management.

This isn't a huge issue since the reason for connection problems is usually fairly obvious, but in at least one case (see T13141) we hit a less-than-obvious exception.

Instead, store the original exception and propagate the message up the stack so users have more information about the problem.

Test Plan:
  - Configured an intentionally bad MySQL username.
  - Restarted Apache and loaded Phabricator.
  - Got a more helpful exception with a specific authentication error message.

{F5622361}

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13141

Differential Revision: https://secure.phabricator.com/D19454
2018-05-18 09:02:36 -07:00
epriestley
7aa12f192a Add PhabricatorQueryIterator, for buffered iteration over a CursorPagedPolicyAwareQuery
Summary:
See D19446. This should make it easier to process larger, more complex result sets in constant memory.

Today, `LiskMigrationIterator` takes constant memory but can't apply `needX()` requests or `withY(...)` constraints.

Using a raw `Query` can handle this stuff, but requires memory proportional to the size of the result set.

Offer the best of both worlds: constant memory and full access to the power of `Query` classes.

Test Plan:
Used this script to iterate over every commit, saw sensible behavior:

```name=list-commits.php
<?php

require_once 'scripts/init/init-script.php';

$viewer = PhabricatorUser::getOmnipotentUser();

$query = id(new DiffusionCommitQuery())
  ->setViewer($viewer);

$iterator = new PhabricatorQueryIterator($query);
foreach ($iterator as $commit) {
  echo $commit->getID()."\n";
}
```

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D19450
2018-05-14 12:13:51 -07:00
epriestley
44f0664d2c Add a "lock log" for debugging where locks are being held
Summary: Depends on D19173. Ref T13096. Adds an optional, disabled-by-default lock log to make it easier to figure out what is acquiring and holding locks.

Test Plan: Ran `bin/lock log --enable`, `--disable`, `--name`, etc. Saw sensible-looking output with log enabled and daemons restarted. Saw no additional output with log disabled and daemons restarted.

Maniphest Tasks: T13096

Differential Revision: https://secure.phabricator.com/D19174
2018-03-05 17:55:34 -08:00
epriestley
8dccf05c4c Manually set "max_allowed_packet" to 1GB for "mysqldump"
Summary:
We have one production instance with failing database backups since they recently uploaded a 52MB hunk. The production configuration specifies a 64MB "max_allowed_packet" in `[mysqld]`, but this doesn't apply to `mysqldump` (we'd need to specify it in a separate `[mysqldump]` section) and `mysqldump` runs with an effective limit of the default (16MB).

We could change our production config to specify a value in `[mysqldump]`, but just change it unconditionally at execution time since there's no reason for any user to ever want this command to fail because they have too much data.

Test Plan: Dumped locally, will verify production backup goes through cleanly.

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D18834
2017-12-20 10:29:02 -08:00
epriestley
c5e8de9450 Make bin/storage dump insert CREATE DATABASE and USE statements
Summary:
Ref T13000. The new approach for dumping database-by-database means that we don't get CREATE DATABASE or USE statements, which makes importing the dump again inconvenient.

Manually stitch these into the dump.
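
Roughly, the stitching could look like this Python sketch. The input shape, a sequence of `(database, table_sql)` pairs in dump order, and the exact statement text are assumptions made for illustration.

```python
def stitch_dump(table_dumps):
    """Interleave CREATE DATABASE / USE statements with per-table dumps.

    Emits CREATE DATABASE once per database and USE whenever the
    database changes between consecutive tables.
    """
    created = set()
    current = None
    lines = []
    for database, table_sql in table_dumps:
        if database not in created:
            lines.append("CREATE DATABASE IF NOT EXISTS `%s`;" % database)
            created.add(database)
        if database != current:
            lines.append("USE `%s`;" % database)
            current = database
        lines.append(table_sql)
    return lines
```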

Test Plan:
  - Used `bin/storage dump --namespace ...` to dump a smaller local instance.
  - Used `bin/storage destroy --namespace ...` to destroy the namespace, then imported the dump cleanly.
  - Verified that each CREATE DATABASE statement appears only once.
  - Verified that `bin/storage renamespace --live` can correctly process this file.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13000

Differential Revision: https://secure.phabricator.com/D18707
2017-10-13 14:35:18 -07:00
Dmitri Iouchtchenko
9bd6a37055 Fix spelling
Summary: Noticed a couple of typos in the docs, and then things got out of hand.

Test Plan:
  - Stared at the words until my eyes watered and the letters began to swim on the screen.
  - Consulted a dictionary.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam

Differential Revision: https://secure.phabricator.com/D18693
2017-10-09 10:48:04 -07:00
epriestley
4fd9d2d4bb Fix "bin/storage dump" with no "--output"
Ref T13004. (I distinctly remember testing this, but must have tweaked things afterward.)
2017-10-07 13:23:18 -07:00
epriestley
1ee7b3ab8c Correct "bin/storage dump" command construction with passwords
Fixes T13004. This should mirror the other branch.
2017-10-07 04:59:29 -07:00
epriestley
c767c971ca Add "persistence" types (data, cache, or index) to tables, and tweak what "storage dump" dumps
Summary:
Ref T13000. This marks each table as either "data" (normal data), "cache" (automatically rebuilt, no need to ever dump) or "index" (can be manually rebuilt).

By default, `bin/storage dump` dumps data and index tables, but not cache tables.

With `--no-indexes`, it dumps only data tables. Indexes can be rebuilt after a restore with `bin/search index --all ...`.
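
The selection rule can be summarized in a few lines of Python. `dump_flags` and its parameters are hypothetical; `--no-data` is the `mysqldump` flag that dumps a table's schema without its rows.

```python
def dump_flags(persistence, with_indexes=True):
    """Return extra mysqldump flags for a table of the given persistence.

    persistence is one of "data", "cache", or "index".
    """
    if persistence == "cache":
        # Caches are rebuilt automatically: keep the schema, skip the rows.
        return ["--no-data"]
    if persistence == "index" and not with_indexes:
        # Indexes can be rebuilt by hand after a restore.
        return ["--no-data"]
    return []
```

Every table still appears in the dump, so the number of `CREATE TABLE` statements does not change; only the row data is skipped.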

Test Plan:
  - Ran `--no-indexes` and normal dumps with `--trace`, verified that cache and index (former case) or cache only (latter case) tables were dumped with `--no-data`.
  - Verified dump has the same number of `CREATE TABLE` statements as before the changes.
  - Reviewed persistence tags in the web UI (note Ferret engine tables are "Index"):

{F5210886}

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13000

Differential Revision: https://secure.phabricator.com/D18682
2017-10-04 12:09:33 -07:00
epriestley
02e1440ef2 Dump tables one at a time, rather than all at once
Summary:
Ref T13000. This allows us to be more selective about which tables we dump data for, to reduce the size of backups and exports. The immediate goal is to make large `ngrams` tables more manageable in the cluster, but this generally makes all backups and exports faster and easier.

Here, tables are dumped one at a time. A followup change will sometimes add the `--no-data` flag, to skip dumping readthrough caches and (optionally) rebuildable indexes.

Test Plan: Compared a dump from `master` and from this branch, found them to be essentially identical. The new dump has a little more header information in each section. Verified each contains the same number of `CREATE TABLE` statements.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13000

Differential Revision: https://secure.phabricator.com/D18679
2017-10-04 12:08:52 -07:00
epriestley
4cae4a3b76 Correct bin/storage analyze internal API for cluster environments
Summary:
Ref T12819. This worked right in a non-cluster environment, but `bin/storage upgrade` iterates over each master in a partitioned cluster environment.

Tweak the API so `bin/storage analyze` targets a single host but `bin/storage upgrade` can hit all the masters.

Test Plan: Will run `bin/storage upgrade` in production again. Ran `upgrade` and `analyze` locally, still work fine.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18576
2017-09-07 16:35:26 -07:00
epriestley
8e9f049626 Provide "bin/storage analyze" and make "bin/storage upgrade" run analysis automatically
Summary:
Ref T12819. Normally "ANALYZE TABLE" is like sprinkling magic pixie dust on the database and hoping it will make "good vibes" that cause it to go faster, but in at least some concrete cases with the ngrams tables there really was a key cardinality issue which ANALYZE TABLE corrected, fixing bogus query plans.

Add `bin/storage analyze` to analyze all tables, and make `bin/storage upgrade` run it after adjustment if `--no-adjust` is not specified, and make `bin/storage adjust` run it always.

This runs in a couple seconds and should never hurt anything, so it should be fine to sprinkle lots of pixie dust into the `bin/storage` workflow.

Test Plan: Ran `bin/storage analyze`. Ran `bin/storage upgrade`, saw analyze run. Totally felt great vibes and really aligned chakras on the database.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18573
2017-09-07 14:44:34 -07:00
epriestley
bcd87e0e3f Don't apply patches or mark patches applied with bin/storage upgrade --dryrun
Summary: Fixes T12682.

Test Plan: Ran `bin/storage upgrade --dryrun` repeatedly with un-applied patches, saw it not apply them and not mark them applied.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12682

Differential Revision: https://secure.phabricator.com/D17837
2017-05-05 19:57:18 -07:00
epriestley
85ff1d5c2d Reduce the impact of bin/storage dump
Summary:
Ref T12646.

  - Use "wb1" instead of "wb" to use level 1 gzip compression (faster, less compressy). Locally, this went about 2x faster and the output only grew 4% larger.
  - LinesOfALargeExecFuture does a lot of unnecessary string operations, and can boil down to a busy wait. The process is pretty saturated by I/O so this isn't the end of the world, but just use raw ExecFuture with FutureIterator so that we wait in `select()`.
  - Also, nice the process to +19 so we try to give other things CPU.
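
The compression-level tradeoff is easy to try outside PHP. This Python sketch uses the standard `gzip` module, where `compresslevel=1` plays the role of the `1` in the `"wb1"` mode string; the sample data is made up and the numbers will differ from a real dump.

```python
import gzip


def compress(data, level):
    """Compress bytes with gzip at the given level (1 = fastest, 9 = smallest)."""
    return gzip.compress(data, compresslevel=level)


# Made-up, highly repetitive sample standing in for dump output.
sample = b"INSERT INTO t VALUES (1, 'example');\n" * 2000

fast = compress(sample, 1)
small = compress(sample, 9)

print("level 1: %d bytes, level 9: %d bytes" % (len(fast), len(small)))
```

Both outputs decompress to the same bytes; the level only changes how much CPU time is spent looking for a smaller encoding.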

Test Plan:
  - Ran `bin/storage dump --compress --output ...`.
  - Saw CPU time for my local database drop from ~240s to ~90s, with a 4% larger output. Most of this was adding the `1`, but the ExecFuture thing helped a little, too.
  - I'm not sure what a great way to test `nice` in a local environment is and it's system dependent anyway, but nothing got worse / blew up.
  - Used `gzcat | head` and `gzcat | tail` on the result to sanity-check that everything was preserved.

Reviewers: chad, amckinley

Reviewed By: chad

Maniphest Tasks: T12646

Differential Revision: https://secure.phabricator.com/D17795
2017-04-26 12:08:59 -07:00
Austin McKinley
febd68039f Add initial infrastructure for adding ModularTransaction support to Application config changes
Summary: Part of the groundwork for T11476.

Test Plan: ran `./bin/storage upgrade` and observed expected DB tables

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Maniphest Tasks: T11476

Differential Revision: https://secure.phabricator.com/D17736
2017-04-19 15:44:57 -07:00
epriestley
d1421bc3a1 Add "bin/storage optimize" to run OPTIMIZE TABLE on everything
Summary:
Even with `innodb_file_per_table` enabled, individual table files on disk don't normally shrink.

For most tables, like `maniphest_task`, this is fine, since the data in the table normally never shrinks, or only shrinks a tiny amount.

However, some tables (like the "worker" and "daemon" tables) grow very large during a huge import but most of the data is later deleted by garbage collection. In these cases, this lost space can be reclaimed by running `OPTIMIZE TABLE` on the tables.

Add a script to `OPTIMIZE TABLE` every table.

My primary goal here is just to reduce storage pressure on `db001` since there are a couple of "import the linux kernel" installs on that host wasting a bunch of space. We're not in any trouble, but this should buy us a good chunk of headroom.

Test Plan: Ran `bin/storage optimize` locally and manually ran `OPTIMIZE TABLE` in production, saw tables get optimized.

Reviewers: chad

Reviewed By: chad

Subscribers: cspeckmim

Differential Revision: https://secure.phabricator.com/D17640
2017-04-08 15:15:49 -07:00
Jakub Vrana
9f3cde4db7 Fix errors found by PHPStan
Test Plan: None.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley

Differential Revision: https://secure.phabricator.com/D17377
2017-02-18 09:24:56 +00:00
Jakub Vrana
a778151f28 Fix errors found by PHPStan
Test Plan: Ran `phpstan analyze -a autoload.php phabricator/src`.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: Korvin, hach-que

Differential Revision: https://secure.phabricator.com/D17371
2017-02-17 10:10:15 +00:00
epriestley
237f94b830 Fix flaky subscribers policy rule unit test
Summary:
I'm about 90% sure this fixes the intermittent test failure on `testObjectSubscribersPolicyRule()` or whatever.

We use `spl_object_hash()` to identify objects when passing hints about policy changes to policy rules. This is hacky, and I think it's the source of the unit test issue.

Specifically, `spl_object_hash()` is approximately just returning the memory address of the object, and two objects can occasionally use the same memory address (one gets garbage collected; another uses the same memory).

If I replace `spl_object_hash()` with a static value like "zebra", the test failure reproduces.

Instead, sneak an object ID onto a runtime property. This is at least as hacky but shouldn't suffer from the same intermittent failure.
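
Python's built-in `id()` has the same property (an address-like value that can be reused once an object is freed), so the workaround can be sketched there. `get_runtime_id` and the `_runtime_id` attribute are hypothetical names chosen for this illustration.

```python
import itertools

# Monotonic counter: values are never reused within a process.
_next_runtime_id = itertools.count(1)


def get_runtime_id(obj):
    """Return a process-unique ID for `obj`, assigning one on first use.

    The ID is stored on the object itself, so it stays stable for the
    object's lifetime and is never shared with a later object that
    happens to occupy the same memory.
    """
    try:
        return obj._runtime_id
    except AttributeError:
        obj._runtime_id = next(_next_runtime_id)
        return obj._runtime_id
```

This only works for objects that accept new attributes, which is part of why the approach is described as hacky.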

Test Plan: Ran `arc unit --everything`, but I never got a reliable repro of the issue in the first place, so who knows.

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D17029
2016-12-11 12:27:57 -08:00
epriestley
29a3cd5121 Add "Manual Activities", to tell administrators to rebuild the search index
Summary:
Ref T11922. After updating to HEAD of `master`, you need to manually rebuild the index. We don't do this during `bin/storage upgrade` because it can take a very long time (`secure.phabricator.com` took roughly an hour) and can happen while Phabricator is running.

However, if we don't warn users about this they'll just get a broken index unless they go read the changelog (or file an issue, then we tell them to go read the changelog).

This adds a very simple table for notes to administrators so we can write a "you need to go rebuild the index" note, then adds one.

Administrators clear the note by completing the activity and running `bin/config done reindex`. This isn't automatic because there are various strategies you can use to approach the issue, which I'll discuss in greater detail in the linked documentation.

Also, fix an issue where `bin/storage upgrade --apply <patch>` could try to re-mark an already-applied patch as applied.

Test Plan:
  - Ran storage upgrades.
  - Got instructions to rebuild search index.
  - Cleared instructions with `bin/config done reindex`.

Reviewers: chad

Reviewed By: chad

Subscribers: avivey

Maniphest Tasks: T11922

Differential Revision: https://secure.phabricator.com/D16965
2016-11-30 11:23:54 -08:00
epriestley
9d0752063e Allow bin/storage adjust to adjust table engines
Summary:
Ref T11741. On recent-enough versions of MySQL, we would prefer to use InnoDB for fulltext indexes instead of MyISAM.

Allow `bin/storage adjust` to read actual and expected table engines, and apply adjustments as necessary.

We have one existing bad table that uses the wrong engine, `metamta_applicationemail`. This change corrects that table.
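As a rough illustration of the "read actual and expected engines, then adjust" idea, here is a minimal Python sketch. It is not the `bin/storage adjust` code: the function name, the `(database, table)` dictionary keys, and the exact statement text are assumptions made up for this example.

```python
def plan_engine_adjustments(actual, expected):
    """Return ALTER statements for tables whose engine differs from the expected one.

    Both arguments are hypothetical dicts mapping (database, table) to an engine name.
    Tables that do not exist yet (missing from `actual`) are skipped.
    """
    statements = []
    for key in sorted(expected):
        want = expected[key]
        have = actual.get(key)
        if have is not None and have != want:
            database, table = key
            statements.append(
                "ALTER TABLE `%s`.`%s` ENGINE = '%s'" % (database, table, want)
            )
    return statements


actual = {("local_metamta", "metamta_applicationemail"): "MyISAM"}
expected = {("local_metamta", "metamta_applicationemail"): "InnoDB"}
plan = plan_engine_adjustments(actual, expected)
```

A table that already uses the expected engine produces no statement, so running the plan twice is a no-op the second time.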

Test Plan:
  - Ran `bin/storage upgrade`.
  - Saw the adjustment phase apply this change properly:

```
>>>[463] <query> ALTER TABLE `local_metamta`.`metamta_applicationemail` COLLATE = 'utf8mb4_bin', ENGINE = 'InnoDB'
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11741

Differential Revision: https://secure.phabricator.com/D16941
2016-11-25 15:13:40 -08:00
epriestley
ff3333548f Create and populate a stopwords table for InnoDB fulltext indexes to use in the future
Summary:
Ref T11741. InnoDB uses a stopwords table instead of a stopwords file.

During `storage upgrade`, synchronize the table from the stopwords file on disk.
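The synchronization step can be pictured as a set difference between the file and the table. This is a hedged Python sketch, not the actual upgrade code; `plan_stopword_sync` and its arguments are hypothetical names.

```python
def plan_stopword_sync(file_words, table_words):
    """Compute which words to insert into and delete from the stopwords table.

    `file_words` is the list read from the stopwords file on disk and
    `table_words` is the current table contents (both hypothetical inputs).
    """
    wanted = set(file_words)
    present = set(table_words)
    to_insert = sorted(wanted - present)
    to_delete = sorted(present - wanted)
    return to_insert, to_delete


to_insert, to_delete = plan_stopword_sync(["a", "the"], ["a", "garbage"])
```

Here `to_delete` contains the garbage row, which matches the behavior exercised in the test plan below.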

Test Plan:
  - Ran `storage upgrade`.
  - Ran `select * from stopwords`, saw stopwords.
  - Added some garbage to the table.
  - Ran `storage upgrade`, saw it remove it.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11741

Differential Revision: https://secure.phabricator.com/D16940
2016-11-25 15:13:08 -08:00
epriestley
ce18a8e208 Fix two setup issues arising from partitioning support
Summary:
Ref T11044.

  - Use shorter lock names. Fixes T11916.
  - These granular exceptions now always raise as a more generic "Cluster" exception, even for a single host, because there's less special code around running just one database.

Test Plan:
  - Configured bad `mysql.port`, ran `bin/storage upgrade`, got a more helpful error message.
  - Ran `bin/storage upgrade --trace`, saw shorter lock names.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044, T11916

Differential Revision: https://secure.phabricator.com/D16924
2016-11-23 07:20:29 -08:00
epriestley
0ed767b967 Fix a couple of partition migration bugs
Summary:
Ref T11044. A few issues here:

  - The `PhutilProxyException` is missing an argument (hit this while in read-only mode).
  - The `$ref_key` is unused.
  - When you add a new master to an existing cluster, we can incorrectly apply `.php` patches which we should not reapply. Instead, mark them as already-applied.

Test Plan:
  - Poked this locally, but will initialize `secure004` as an empty master to be sure.

Reviewers: chad, avivey

Reviewed By: avivey

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16916
2016-11-22 10:57:24 -08:00
epriestley
8c89fc38fc Allow persistent connections to be configured per database host
Summary: Ref T11044. Fixes T11672. In T11672, persistent connections seem to work fine, but they can require `max_connections` and other settings to be raised. Since most users don't need them, make them an advanced option.

Test Plan: Configured persistent connections, loaded some pages, observed persistent connections get used.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044, T11672

Differential Revision: https://secure.phabricator.com/D16913
2016-11-22 10:55:45 -08:00
epriestley
f89f708692 Apply storage patches patch-by-patch, not database-by-database
Summary:
Ref T11044. Sometimes we have a sequence of patches like this:

  - `01.newtable.sql`: Adds a new table to Files.
  - `02.movedata.php`: Moves some data that used to live in Tokens to the new table.

This is fairly rare, but not unheard of. More commonly, we can have this sequence:

  - `03.newtable.sql`: Adds a new column to Phame.
  - `04.setvalue.php`: Copies data into that new column.

In both cases, when applying database-by-database, we can get into trouble.

  - In the first case, if Files is on a different master, we'll try to move data into a new table before creating the table.
  - In the second case, if Phame is on a different master, the PHP patch will connect to it before we add the new column.

In either case, we try to interact with tables or columns which don't exist yet.

Instead, apply each patch in order, to all databases which need it. So we'll apply `01.newtable.sql` EVERYWHERE first, then move on.

In the case of PHP patches, we also now only apply them once, since they never make schema changes. It should normally be safe to apply them more than once, but this is a little faster and a little safer if we ever make a mistake.
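To make the ordering difference concrete, here is a small Python sketch of the two strategies. It is an illustration only: the function names are hypothetical, every host is assumed to need every SQL patch, and a PHP patch is represented as a single step with no host.

```python
def plan_database_by_database(patches, hosts):
    """Old ordering: finish every patch on one host before moving to the next host."""
    return [(host, patch) for host in hosts for patch in patches]


def plan_patch_by_patch(patches, hosts):
    """New ordering: apply each patch everywhere before moving to the next patch.

    SQL patches run on every host (an assumption; really only hosts that need them).
    PHP patches run once, shown here as a step with host None.
    """
    steps = []
    for patch in patches:
        if patch.endswith(".php"):
            steps.append((None, patch))
        else:
            steps.extend((host, patch) for host in hosts)
    return steps


patches = ["01.newtable.sql", "02.movedata.php"]
hosts = ["db001", "db002"]
old_plan = plan_database_by_database(patches, hosts)
new_plan = plan_patch_by_patch(patches, hosts)
```

In `old_plan`, `02.movedata.php` runs on `db001` before `01.newtable.sql` has reached `db002`, which is the failure described above. In `new_plan`, the table exists everywhere before any data moves.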

Test Plan:
  - Ran `bin/storage upgrade` on single-host and clustered setups.
  - Initialized new storage on single-host and clustered setups.
  - Upgraded again after initialization.
  - Ran with `--apply`.
  - Ran with `--dry-run`.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16912
2016-11-22 09:24:58 -08:00
epriestley
e6bfa1bd23 Remove "mysql.configuration-provider" configuration option
Summary:
Ref T11044. This was old Facebook cruft for reading configuration from SMC (and maybe doing some other questionable things). See D183.

(See also D175 for discussion of this from 2011.)

In modern Phabricator, you can subclass `SiteConfig` to provide dynamic configuration, and we do so in the Phacility cluster. This lets you change any config, and change in response to requests (e.g., for instancing) and is generally more powerful than this mechanism was.

This configuration provider theoretically let you roll your own replication or partitioning, but in practice I believe no one ever did, and no one ever could have anyway without more support in the upstream (for migrations, read-after-write, etc).

Test Plan:
  - Grepped for removed option.
  - Browsed around with clustering off.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16911
2016-11-22 09:24:46 -08:00
epriestley
4da74166fe When storage is partitioned, refuse to serve requests unless web and databases agree on partitioning
Summary:
Ref T11044. One popular tool in a modern operations environment is Puppet. The primary purpose of this tool is to randomly revert hosts to older or different configurations.

Introducing an element of chaotic unpredictability into operations trains staff to be on high alert at all times, rather than lulled into complacency by predictability or consistency.

When Puppet reverts a Phabricator host's configuration to an older version, we might start writing data to a lot of crazy places where it shouldn't go. This will create a big sticky mess that is virtually impossible to undo, mostly because we'll get two files with ID 123 or two tasks with ID 456 or whatever else and good luck with that.

Instead, after changing the partition layout, require `bin/storage partition` to be run. This writes a copy of the config everywhere.

Then, when we start serving web requests, make sure every database has the exact same config. This will foil Puppet by refusing to run requests on hosts it has reverted.
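A minimal Python sketch of the agreement check, assuming the partition layout can be serialized to canonical JSON and that each database stores one copy. The function names and data shapes are hypothetical and are not taken from the actual change.

```python
import json


def canonical(config):
    """Serialize a partition layout deterministically so copies can be compared."""
    return json.dumps(config, sort_keys=True)


def find_disagreeing_hosts(web_config, database_copies):
    """Return the hosts whose stored copy differs from the web tier's config.

    `database_copies` is a hypothetical dict of host name to the config copy
    written there by the last `bin/storage partition` run.
    """
    expected = canonical(web_config)
    return sorted(
        host
        for host, copy in database_copies.items()
        if canonical(copy) != expected
    )


web = {"db001": ["default"], "db002": ["maniphest"]}
copies = {
    "db001": {"db002": ["maniphest"], "db001": ["default"]},
    "db002": {"db001": ["default", "maniphest"]},
}
stale = find_disagreeing_hosts(web, copies)
```

A web host would refuse to serve requests whenever this list is non-empty.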

Test Plan:
  - Changed partition configuration.
  - Ran Phabricator.
  - FOILED!
  - Ran `bin/storage partition` to sync config.
  - Things worked again.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16910
2016-11-22 04:15:46 -08:00
epriestley
bac27fb403 Remove "mysql.implementation" configuration
Summary:
Ref T11044. Fixes T10931. This option has essentially never been useful for anything, and we've picked the best implementation for a long time (MySQLi if available, MySQL if not).

I am not aware of any reason to ever set this manually. If someone comes up with some bizarre but legitimate use case that I haven't thought of, we can modularize it.

Test Plan: Browsed around. Grepped for `mysql.implementation`.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10931, T11044

Differential Revision: https://secure.phabricator.com/D16909
2016-11-22 04:15:34 -08:00
epriestley
326d5bf800 Detect replicating masters and fatal (also, warn on nonreplicating replicas)
Summary:
Ref T10759. Check master/replica status during startup.

After D16903, this also means that we check this status after a database comes back online after being unreachable.

If a master is replicating, fatal (since this can do a million kinds of bad things).

If a replica is not replicating, warn (this just means the replica is behind so some data is at risk).

Also: if your masters were actually configured properly (mine weren't until this change detected it), we would throw away patches as we applied them, so they would only apply to the //first// master. Instead, properly apply all migration patches to all masters.
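The master/replica rule above can be summarized in a few lines. This Python sketch is only an illustration; `check_replication` and its return values are hypothetical, and how replication status is actually read from MySQL is left out.

```python
def check_replication(role, is_replicating):
    """Classify a host by its configured role and whether it is replicating.

    Returns a (severity, message) pair; severity is "fatal", "warning", or "ok".
    """
    if role == "master" and is_replicating:
        return ("fatal", "configured as a master, but it is replicating")
    if role == "replica" and not is_replicating:
        return ("warning", "configured as a replica, but it is not replicating")
    return ("ok", "")


results = [
    check_replication("master", True)[0],
    check_replication("replica", False)[0],
    check_replication("master", False)[0],
    check_replication("replica", True)[0],
]
```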

Test Plan:
  - Started Phabricator with a replicating master, got a fatal.
  - Stopped replication on a replica, got a warning.
  - With two non-replicating masters, upgraded storage.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10759

Differential Revision: https://secure.phabricator.com/D16904
2016-11-21 15:55:22 -08:00
epriestley
78040e0ff5 Run "DatabaseSetup" checks against all configured hosts
Summary:
Ref T10759. Currently, these checks run only against configured masters. Instead, check every host.

These checks also sort of cheat their way through a restart during a recovery, when some hosts will be unreachable: they test for "disaster" by seeing if no masters are reachable, and just skip all the checks in that case.

This is bad for at least two reasons:

  - After recent changes, it is possible that //some// masters are dead but it's still OK to start. For example, "slowvote" may have no master, but everything else is reachable. We can safely run without slowvote.
  - It's possible to start during a disaster and miss important setup checks completely, since we skip them, get a clean bill of health, and never re-test them.

Instead:

  - Test each host individually.
  - Fundamental problems (lack of InnoDB, bad schema) are fatal on any host.
  - If we can't connect, raise it as a //warning// to make sure we check it later. If you start during a disaster, we still want to make sure that schemata are up to date if you later recover a host.

In particular, I'm going to add these checks soon:

  - Fatal if a "master" is replicating.
  - Fatal if a "replica" is not replicating.
  - Fatal if a database partition config differs from web partition config.
  - When we let a database off with a warning because it's down, and later upgrade it to a fatal because we discover it is broken after it comes up again, fatal everything. Currently, we keep running if we "discover" the presence of new fatals after surviving setup checks for the first time.
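The per-host policy described under "Instead:" above might look roughly like this. It is a hedged Python sketch with hypothetical names and data shapes, not the `DatabaseSetup` check itself.

```python
def classify_host(reachable, problems):
    """Decide how one database host affects startup.

    An unreachable host is a warning so it gets re-checked later.
    A reachable host with fundamental problems (for example, no InnoDB) is fatal.
    """
    if not reachable:
        return "warning"
    if problems:
        return "fatal"
    return "ok"


def run_setup_checks(hosts):
    """Check every configured host individually instead of only the masters."""
    return {
        name: classify_host(info["reachable"], info["problems"])
        for name, info in hosts.items()
    }


report = run_setup_checks({
    "db001": {"reachable": True, "problems": []},
    "db002": {"reachable": False, "problems": []},
    "db003": {"reachable": True, "problems": ["bad schema"]},
})
```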

Test Plan:
  - Configured with multiple masters, intentionally broke one (simulating a disaster where one master is lost), saw Phabricator still start up.
  - Tested individual setup checks by intentionally breaking them.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10759

Differential Revision: https://secure.phabricator.com/D16902
2016-11-21 15:49:07 -08:00
epriestley
55e21565b5 Support application partitioning across multiple masters
Summary:
Ref T11044. I'm going to hold this until after the release cut, but I think it's good to go.

This allows installs to configure multiple masters in `cluster.databases` and partition applications across them (for example, put Maniphest on a dedicated database).

When we make a Maniphest connection we go look up which master we should be hitting first, then connect to it.

This has at least approximately been planned for many years, so the actual change is largely just making sure that your config makes sense.
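A sketch of the lookup, in Python for illustration. The `host`, `role`, and `partition` keys and the `"default"` partition name are assumptions about the shape of `cluster.databases`, and `master_for_application` is a hypothetical helper.

```python
def master_for_application(databases, application):
    """Pick the master host that owns an application's data.

    Prefer a master that lists the application in its partition, and fall
    back to the master that holds the "default" partition.
    """
    default_host = None
    for ref in databases:
        if ref["role"] != "master":
            continue
        partition = ref.get("partition", [])
        if application in partition:
            return ref["host"]
        if "default" in partition:
            default_host = ref["host"]
    if default_host is None:
        raise LookupError("no master configured for %r" % application)
    return default_host


databases = [
    {"host": "db001.epriestley.com", "role": "master", "partition": ["default"]},
    {"host": "db002.epriestley.com", "role": "master", "partition": ["maniphest"]},
]
maniphest_host = master_for_application(databases, "maniphest")
files_host = master_for_application(databases, "files")
```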

Test Plan:
  - Configured `db001.epriestley.com` and `db002.epriestley.com` as master/master.
  - Partitioned applications between them.
  - Interacted with various applications, saw writes go to the correct host.
  - Viewed "Database Servers" and saw partitioning information.
  - Ran schema upgrades.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16876
2016-11-19 14:14:39 -08:00
epriestley
558d194302 Update bin/storage workflows to accommodate multiple masters
Summary: Depends on D16847. Ref T11044. This updates the remaining storage-related workflows from the CLI to accommodate multiple masters.

Test Plan:
  - Configured multiple masters.
  - Ran all `bin/storage` workflows.
  - Ran `arc unit --everything`.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16848
2016-11-12 16:37:47 -08:00
epriestley
ecc598f18d Support multiple database masters and convert easy callers
Summary:
Ref T11044. This moves toward partitioned application databases:

  - You can define multiple masters.
  - Convert all the easily-convertible code to become multi-master aware.

This doesn't convert most of `bin/storage` or "Config > Database (Stuff)" yet, as both are quite involved. They still work for now, but only operate on the first master instead of all masters.

Test Plan: Configured multiple masters, browsed around, ran `bin/storage` commands, ran `bin/storage --host ...`.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16115
2016-11-12 16:30:20 -08:00
epriestley
49448a87c1 Rough in most of Calendar exports
Summary:
Ref T10747. Rough flow is:

  - Run a query.
  - Select a new "Export Events..." action.
  - This lets you define an "Export", which has a unique URL you can paste into Google Calendar or Calendar.app or whatever.

Most of this does nothing yet but here's the boilerplate.

Test Plan: Doesn't do anything yet.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10747

Differential Revision: https://secure.phabricator.com/D16675
2016-10-06 04:06:35 -07:00