1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-11-28 09:42:41 +01:00
Commit graph

386 commits

Author SHA1 Message Date
epriestley
bcd87e0e3f Don't apply patches or mark patches applied with bin/storage upgrade --dryrun
Summary: Fixes T12682.

Test Plan: Ran `bin/storage upgrade --dryrun` repeatedly with un-applied patches, saw it not apply them and not mark them applied.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12682

Differential Revision: https://secure.phabricator.com/D17837
2017-05-05 19:57:18 -07:00
epriestley
85ff1d5c2d Reduce the impact of bin/storage dump
Summary:
Ref T12646.

  - Use "wb1" instead of "wb" to use level 1 gzip compression (faster, less compressy). Locally, this went about 2x faster and the output only grew 4% larger.
  - LinesOfALargeExecFuture does a lot of unnecessary string operations, and can boil down to a busy wait. The process is pretty saturated by I/O so this isn't the end of the world, but just use raw ExecFuture with FutureIterator so that we wait in `select()`.
  - Also, nice the process to +19 so we try to give other things CPU.

Test Plan:
  - Ran `bin/storage dump --compress --output ...`.
  - Saw CPU time for my local database drop from ~240s to ~90s, with a 4% larger output. Most of this was adding the `1`, but the ExecFuture thing helped a little, too.
  - I'm not sure what a great way to test `nice` in a local environment is and it's system dependent anyway, but nothing got worse / blew up.
  - Used `gzcat | head` and `gzcat | tail` on the result to sanity-check that everything was preserved.

Reviewers: chad, amckinley

Reviewed By: chad

Maniphest Tasks: T12646

Differential Revision: https://secure.phabricator.com/D17795
2017-04-26 12:08:59 -07:00
Austin McKinley
febd68039f Add initial infrastructure for adding ModularTransaction support to Application config changes
Summary: Part of the groundwork for T11476.

Test Plan: ran `./bin/storage upgrade` and observed expected DB tables

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Maniphest Tasks: T11476

Differential Revision: https://secure.phabricator.com/D17736
2017-04-19 15:44:57 -07:00
epriestley
d1421bc3a1 Add "bin/storage optimize" to run OPTIMIZE TABLE on everything
Summary:
Even with `innodb_file_per_table` enabled, individual table files on disk don't normally shrink.

For most tables, like `maniphest_task`, this is fine, since the data in the table normally never shrinks, or only shinks a tiny amount.

However, some tables (like the "worker" and "daemon" tables) grow very large during a huge import but most of the data is later deleted by garbage collection. In these cases, this lost space can be reclaimed by running `OPTIMIZE TABLE` on the tables.

Add a script to `OPTIMIZE TABLE` every table.

My primary goal here is just to reduce storage pressure on `db001` since there are a couple of "import the linux kernel" installs on that host wasting a bunch of space. We're not in any trouble, but this should buy us a good chunk of headroom.

Test Plan: Ran `bin/storage optimize` locally and manually ran `OPTIMIZE TABLE` in production, saw tables get optimized.

Reviewers: chad

Reviewed By: chad

Subscribers: cspeckmim

Differential Revision: https://secure.phabricator.com/D17640
2017-04-08 15:15:49 -07:00
Jakub Vrana
9f3cde4db7 Fix errors found by PHPStan
Test Plan: None.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley

Differential Revision: https://secure.phabricator.com/D17377
2017-02-18 09:24:56 +00:00
Jakub Vrana
a778151f28 Fix errors found by PHPStan
Test Plan: Ran `phpstan analyze -a autoload.php phabricator/src`.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: Korvin, hach-que

Differential Revision: https://secure.phabricator.com/D17371
2017-02-17 10:10:15 +00:00
epriestley
237f94b830 Fix flaky subscribers policy rule unit test
Summary:
I'm about 90% sure this fixes the intermittent test failure on `testObjectSubscribersPolicyRule()` or whatever.

We use `spl_object_hash()` to identify objects when passing hints about policy changes to policy rules. This is hacky, and I think it's the source of the unit test issue.

Specifically, `spl_object_hash()` is approximately just returning the memory address of the object, and two objects can occasionally use the same memory address (one gets garbage collected; another uses the same memory).

If I replace `spl_object_hash()` with a static value like "zebra", the test failure reproduces.

Instead, sneak an object ID onto a runtime property. This is at least as hacky but shouldn't suffer from the same intermittent failure.

Test Plan: Ran `arc unit --everything`, but I never got a reliable repro of the issue in the first place, so who knows.

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D17029
2016-12-11 12:27:57 -08:00
epriestley
29a3cd5121 Add "Manual Activities", to tell administrators to rebuild the search index
Summary:
Ref T11922. After updating to HEAD of `master`, you need to manually rebuild the index. We don't do this during `bin/storage upgrade` because it can take a very long time (`secure.phabricator.com` took roughly an hour) and can happen while Phabricator is running.

However, if we don't warn users about this they'll just get a broken index unless they go read the changelog (or file an issue, then we tell them to go read the changelog).

This adds a very simple table for notes to administrators so we can write a "you need to go rebuild the index" note, then adds one.

Administrators clear the note by completing the activity and running `bin/config done reindex`. This isn't automatic because there are various strategies you can use to approach the issue, which I'll discuss in greater detail in the linked documentation.

Also, fix an issue where `bin/storage upgrade --apply <patch>` could try to re-mark an already-applied patch as applied.

Test Plan:
  - Ran storage ugrades.
  - Got instructions to rebuild search index.
  - Cleared instructions with `bin/config done reindex`.

Reviewers: chad

Reviewed By: chad

Subscribers: avivey

Maniphest Tasks: T11922

Differential Revision: https://secure.phabricator.com/D16965
2016-11-30 11:23:54 -08:00
epriestley
9d0752063e Allow bin/storage adjust to adjust table engines
Summary:
Ref T11741. On recent-enough versions of MySQL, we would prefer to use InnoDB for fulltext indexes instead of MyISAM.

Allow `bin/storage adjust` to read actual and expected table engines, and apply adjustments as necessary.

We have one existing bad table that uses the wrong engine, `metamta_applicationemail`. This change corrects that table.

Test Plan:
  - Ran `bin/storage upgrade`.
  - Saw the adjustment phase apply this change properly:

```
>>>[463] <query> ALTER TABLE `local_metamta`.`metamta_applicationemail` COLLATE = 'utf8mb4_bin', ENGINE = 'InnoDB'
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11741

Differential Revision: https://secure.phabricator.com/D16941
2016-11-25 15:13:40 -08:00
epriestley
ff3333548f Create and populate a stopwords table for InnoDB fulltext indexes to use in the future
Summary:
Ref T11741. InnoDB uses a stopwords table instead of a stopwords file.

During `storage upgrade`, synchronize the table from the stopwords file on disk.

Test Plan:
  - Ran `storage upgrade`.
  - Ran `select * from stopwords`, saw stopwords.
  - Added some garbage to the table.
  - Ran `storage upgrade`, saw it remove it.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11741

Differential Revision: https://secure.phabricator.com/D16940
2016-11-25 15:13:08 -08:00
epriestley
ce18a8e208 Fix two setup issues arising from partitioning support
Summary:
Ref T11044.

  - Use shorter lock names. Fixes T11916.
  - These granular exceptions now always raise as a more generic "Cluster" exception, even for a single host, because there's less special code around running just one database.

Test Plan:
  - Configured bad `mysql.port`, ran `bin/storage upgrade`, got a more helpful error message.
  - Ran `bin/storage upgrade --trace`, saw shorter lock names.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044, T11916

Differential Revision: https://secure.phabricator.com/D16924
2016-11-23 07:20:29 -08:00
epriestley
0ed767b967 Fix a couple of partition migration bugs
Summary:
Ref T11044. Few issues here:

  - The `PhutilProxyException` is missing an argument (hit this while in read-only mode).
  - The `$ref_key` is unused.
  - When you add a new master to an existing cluster, we can incorrectly apply `.php` patches which we should not reapply. Instead, mark them as already-applied.

Test Plan:
  - Poked this locally, but will initialize `secure004` as an empty master to be sure.

Reviewers: chad, avivey

Reviewed By: avivey

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16916
2016-11-22 10:57:24 -08:00
epriestley
8c89fc38fc Allow persistent connections to be configured per database host
Summary: Ref T11044. Fixes T11672. In T11672, persistent connections seem to work fine, but they can require `max_connections` and other settings to be raised. Since most users don't need them, make them an advanced option.

Test Plan: Configured persistent connections, loaded some pages, observed persistent connections get used.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044, T11672

Differential Revision: https://secure.phabricator.com/D16913
2016-11-22 10:55:45 -08:00
epriestley
f89f708692 Apply storage patches patch-by-patch, not database-by-database
Summary:
Ref T11044. Sometimes we have a sequence of patches like this:

  - `01.newtable.sql`: Adds a new table to Files.
  - `02.movedata.php`: Moves some data that used to live in Tokens to the new table.

This is fairly rare, but not unheard of. More commonly, we can have this sequence:

  - `03.newtable.sql`: Add a new column to Phame.
  - `04.setvalue.php`: Copy data into that new column.

In both cases, when applying database-by-database, we can get into trouble.

  - In the first case, if Files is on a different master, we'll try to move data into a new table before creating the table.
  - In the second case, if Phame is on a different master, the PHP patch will connect to it before we add the new column.

In either case, we try to interact with tables or columns which don't exist yet.

Instead, apply each patch in order, to all databases which need it. So we'll apply `01.newtable.sql` EVERYWHERE first, then move on.

In the case of PHP patches, we also now only apply them once, since they never make schema changes. It should normally be OK to apply them more than once safely, but this is a little faster and a little safer if we ever make a mistake.

Test Plan:
  - Ran `bin/storage upgrade` on single-host and clustered setups.
  - Initialized new storage on single-host and clustered setups.
  - Upgraded again after initialization.
  - Ran with `--apply`.
  - Ran with `--dry-run`.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16912
2016-11-22 09:24:58 -08:00
epriestley
e6bfa1bd23 Remove "mysql.configuration-provider" configuration option
Summary:
Ref T11044. This was old Facebook cruft for reading configuration from SMC (and maybe doing some other questionable things). See D183.

(See also D175 for discussion of this from 2011.)

In modern Phabricator, you can subclass `SiteConfig` to provide dynamic configuration, and we do so in the Phacility cluster. This lets you change any config, and change in response to requests (e.g., for instancing) and is generally more powerful than this mechanism was.

This configuration provider theoretically let you roll your own replication or partitioning, but in practice I believe no one ever did, and no one ever could have anyway without more support in the upstream (for migrations, read-after-write, etc).

Test Plan:
  - Grepped for removed option.
  - Browsed around with clustering off.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16911
2016-11-22 09:24:46 -08:00
epriestley
4da74166fe When storage is partitioned, refuse to serve requests unless web and databases agree on partitioning
Summary:
Ref T11044. One popular tool in a modern operations environment is Puppet. The primary purpose of this tool is to randomly revert hosts to older or different configurations.

Introducing an element of chaotic unpredictability into operations trains staff to be on high alert at all times, rather than lulled into complacency by predictability or consistency.

When Puppet reverts a Phabricator host's configuration to an older version, we might start writing data to a lot of crazy places where it shouldn't go. This will create a big sticky mess that is virtually impossible to undo, mostly because we'll get two files with ID 123 or two tasks with ID 456 or whatever else and good luck with that.

Instead, after changing the partition layout, require `bin/storage partition` to be run. This writes a copy of the config everywhere.

Then, when we start serving web requests, make sure every database has the exact same config. This will foil Puppet by refusing to run requests on hosts it has reverted.

Test Plan:
  - Changed partition configuration.
  - Ran Phabricator.
  - FOILED!
  - Ran `bin/storage partition` to sync config.
  - Things worked again.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16910
2016-11-22 04:15:46 -08:00
epriestley
bac27fb403 Remove "mysql.implementation" configuration
Summary:
Ref T11044. Fixes T10931. This option has essentially never been useful for anything, and we've picked the best implementation for a long time (MySQLi if available, MySQL if not).

I am not aware of any reason to ever set this manually. If someone comes up with some bizarre but legitimate use case that I haven't thought of, we can modularize it.

Test Plan: Browsed around. Grepped for `mysql.implementation`.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10931, T11044

Differential Revision: https://secure.phabricator.com/D16909
2016-11-22 04:15:34 -08:00
epriestley
326d5bf800 Detect replicating masters and fatal (also, warn on nonreplicating replicas)
Summary:
Ref T10759. Check master/replica status during startup.

After D16903, this also means that we check this status after a database comes back online after being unreachable.

If a master is replicating, fatal (since this can do a million kinds of bad things).

If a replica is not replicating, warn (this just means the replica is behind so some data is at risk).

Also: if your masters were actually configured properly (mine weren't until this change detected it), we would throw away patches as we applied them, so they would only apply to the //first// master. Instead, properly apply all migration patches to all masters.

Test Plan:
  - Started Phabricator with a replicating master, got a fatal.
  - Stopped replication on a replica, got a warning.
  - With two non-replicating masters, upgraded storage.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10759

Differential Revision: https://secure.phabricator.com/D16904
2016-11-21 15:55:22 -08:00
epriestley
78040e0ff5 Run "DatabaseSetup" checks against all configured hosts
Summary:
Ref T10759. Currently, these checks run only against configured masters. Instead, check every host.

These checks also sort of cheat through restart during a recovery, when some hosts will be unreachable: they test for "disaster" by seeing if no masters are reachable, and just skip all the checks in that case.

This is bad for at least two reasons:

  - After recent changes, it is possible that //some// masters are dead but it's still OK to start. For example, "slowvote" may have no master, but everything else is reachable. We can safely run without slowvote.
  - It's possible to start during a disaster and miss important setup checks completely, since we skip them, get a clean bill of health, and never re-test them.

Instead:

  - Test each host individually.
  - Fundamental problems (lack of InnoDB, bad schema) are fatal on any host.
  - If we can't connect, raise it as a //warning// to make sure we check it later. If you start during a disaster, we still want to make sure that schemata are up to date if you later recover a host.

In particular, I'm going to add these checks soon:

  - Fatal if a "master" is replicating.
  - Fatal if a "replica" is not replicating.
  - Fatal if a database partition config differs from web partition config.
  - When we let a database off with a warning because it's down, and later upgrade it to a fatal because we discover it is broken after it comes up again, fatal everything. Currently, we keep running if we "discover" the presence of new fatals after surviving setup checks for the first time.

Test Plan:
  - Configured with multiple masters, intentionally broke one (simulating a disaster where one master is lost), saw Phabricator still startup.
  - Tested individual setup checks by intentionally breaking them.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10759

Differential Revision: https://secure.phabricator.com/D16902
2016-11-21 15:49:07 -08:00
epriestley
55e21565b5 Support application partitioning across multiple masters
Summary:
Ref T11044. I'm going to hold this until after the release cut, but I think it's good to go.

This allows installs to configure multiple masters in `cluster.databases` and partition applications across them (for example, put Maniphest on a dedicated database).

When we make a Maniphest connection we go look up which master we should be hitting first, then connect to it.

This has at least approximately been planned for many years, so the actual change is largely just making sure that your config makes sense.

Test Plan:
  - Configured `db001.epriestley.com` and `db002.epriestley.com` as master/master.
  - Partitioned applications between them.
  - Interacted with various applications, saw writes go to the correct host.
  - Viewed "Database Servers" and saw partitioning information.
  - Ran schema upgrades.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16876
2016-11-19 14:14:39 -08:00
epriestley
558d194302 Update bin/storage workflows to accommodate multiple masters
Summary: Depends on D16847. Ref T11044. This updates the remaining storage-related workflows from the CLI to accommodate multiple masters.

Test Plan:
  - Configured multiple masters.
  - Ran all `bin/storage` workflows.
  - Ran `arc unit --everything`.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16848
2016-11-12 16:37:47 -08:00
epriestley
ecc598f18d Support multiple database masters and convert easy callers
Summary:
Ref T11044. This moves toward partitioned application databases:

  - You can define multiple masters.
  - Convert all the easily-convertible code to become multi-master aware.

This doesn't convert most of `bin/storage` or "Config > Database (Stuff)" yet, as both are quite involved. They still work for now, but only operate on the first master instead of all masters.

Test Plan: Configured multiple masters, browsed around, ran `bin/storage` commands, ran `bin/storage --host ...`.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16115
2016-11-12 16:30:20 -08:00
epriestley
49448a87c1 Rough in most of Calendar exports
Summary:
Ref T10747. Rough flow is:

  - Run a query.
  - Select a new "Export Events..." action.
  - This lets you define an "Export", which has a unique URL you can paste into Google Calendar or Calendar.app or whatever.

Most of this does nothing yet but here's the boilerplate.

Test Plan: Doesn't do anything yet.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10747

Differential Revision: https://secure.phabricator.com/D16675
2016-10-06 04:06:35 -07:00
epriestley
38b10f05a2 For now, disable persistent connections and the "max_connections" setup warning
Summary:
Ref T11672. At low loads, this causes us to use more connections, which is pushing some installs over the default limits.

Rather than trying to walk users through changing `max_connections`, `open_files_limit`, `fs.file-max`, `ulimit`, etc., just put things back for now. After T11044 we should have headroom to use persistent connections within the default limits on all reasonable systems..

Test Plan: Loaded Phabricator, poked around.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11672

Differential Revision: https://secure.phabricator.com/D16591
2016-09-23 12:42:26 -07:00
epriestley
f8c2225268 Use persistent database connections from web contexts
Summary:
Ref T11672. Depends on D16577. When establishing a connection from a webserver context, try to use persistent connections.

The hope is that this will fix outbound port exhaustion issues experienced on repository hosts handling large queue volumes.

Test Plan:
Added this to a page:

```lang=php
    $tables = array(
      new PhabricatorUser(),
      new ManiphestTask(),
      new DifferentialRevision(),
      new PhabricatorRepository(),
      new PhabricatorPaste(),
    );

    $ids = array();
    foreach ($tables as $table) {
      $conn = $table->establishConnection('r');

      $cid = queryfx_one(
        $conn,
        'SELECT CONNECTION_ID() cid');

      $ids[get_class($table)] = $cid['cid'];
    }

    var_dump($ids);
```

Reloaded the page a bunch of times and saw no reissued connections (the pool seems to keep a particular connection bound to a particular database), but did see connection reuse across requests.

That is, across reloads the same connection IDs appeared, but the same connection ID never appeared twice in the same request. This is what we want.

Also googled for issues with persistent connections, but everything I found was unconcerning and obscure (local variables and other very complex state that we don't use), and a bunch of the docs are reassuring (transactions, etc., get reset properly).

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11672

Differential Revision: https://secure.phabricator.com/D16578
2016-09-21 14:46:28 -07:00
epriestley
b7e51877c3 Make storage adjustment a little nicer, especially the first time
Summary:
Fixes T11583.

  - When users run `bin/storage upgrade` for the first time on a new install, we currently give them a prompt which feels rough and which they can only reasonably ever answer "yes" to.
  - We generally use cautionary language ("found issues with schema") in this workflow. Adjustments are now routine, so use more neutral and progress-oriented language ("found adjustments to apply").

Test Plan:
  - Ran `bin/storage upgrade --namesapce kappa123`, got an adjustment using neutral language without prompting.
  - Dropped a key, ran `bin/storage upgrade`, got normal workflow (but with more neutral language).

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11583

Differential Revision: https://secure.phabricator.com/D16509
2016-09-07 09:09:05 -07:00
epriestley
7eee5c5f6f Fix two "proably" typos
Summary: Caught one of these while reviewing docs, grepped for the other one.

Test Plan: `grep`, reading

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D16498
2016-09-06 08:59:07 -07:00
epriestley
7e365fd3f1 Allow bin/storage renamespace to work with underscores
If the namespace is something like "test_example" we currently fail to
renamespace the dump.

(Cowboy committing this since this is currently blocking a data export.)

Test Plan:

  - Renamespaced a local dump, examined the output, saw 60 create / 60 use, reimported it.
  - Will export in production.

Auditors: chad
2016-09-06 07:03:45 -07:00
epriestley
2c5b1dc20a Provide "--output" flags for "bin/storage renamespace"
Summary:
Ref T6996. Depends on D16407. This does the same stuff as D16407, but for `bin/storage renamespace`. In particular:

  - Support writing directly to a file (so we can get good errors on failure).
  - Support in-process compression.

Also add support for reading out of a `storage dump` subprocess, so we don't have to do a dump-to-disk + renamespace + compress dance and can just stream out of MySQL directly to a compressed file on disk.

This is used in the second stage of instance exports (see T7148).

It would be nice to share more code with `bin/storage dump`, and possibly to just make this a flag for it, although we still do need to do the file-based version when importing (vs exporting). I figured that was better left for another time.

Test Plan:
Ran `bin/storage renamespace --live --output x --from A --to B --compress --overwrite` and similar commands.

Verified that a compressed, renamespaced dump came out of the other end.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T6996

Differential Revision: https://secure.phabricator.com/D16410
2016-08-17 09:04:14 -07:00
epriestley
37b8ec5bb7 Provide an "--output" mode for "bin/storage dump" with better error handling
Summary:
Ref T6996. If you do this kind of thing in the shell, you don't get a good error by default if the `dump` command fails:

```
$ bin/storage dump | gzip > output.sql.gz
```

This can be worked around with some elaborate bash tricks, but they're really clunky and uninintuitive.

We also need to do this in several places (while writing backups; while performing exports), and I don't want to copy clunky bash tricks all over the codebase.

Instead, provide `--output` and `--compress` flags which just do this processing inside `bin/storage dump`. It will fail appropriately if any of the underlying operations fail. This also makes the write a little safer (refuses to overwrite) and the code more reusable.

Test Plan:
  - Did three dumps, with no flags, `--output`, and `--output --compress`.
  - Verified all three took similar amounts of time and were identical except for "Date Exported" timestamps in comments (except that the compressed one was compressed).
  - Used `gunzip` to examine the compressed one, verified it was really compressed.
  - Faked a write error, saw properly command behavior (clean up file + exit with error).

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T6996

Differential Revision: https://secure.phabricator.com/D16407
2016-08-17 09:03:45 -07:00
epriestley
5e3efca08a In taskmaster daemons, only close connections which were not used recently
Summary:
Ref T11458. Depends on D16388. Currently, we're very aggressive about closing connections in the taskmaster daemons.

This can end up taking up a lot of resources. In particular, because the outgoing port for outbound connections normally can not be reused for 60 seconds after a connection closes, we may exhaust outbound ports on the host if there's a big queue full of stuff that's being processed very quickly.

At a minimum, we //always// are holding open a `worker` connection, which we always need again right away. So even in the best case we end up opening/closing this about once per second and each daemon takes up about ~60 outbound ports when it should take up ~1.

So, make two adjustments:

  - First, only close connections which we haven't issued a query on in the last 60 seconds. This should prevent us from closing connections that we'll need again immediately in most cases. In the worst case, we shouldn't be eating up any extra ports under default TCP behavior.
  - Second, explicitly close connections. We were relying on implicit/GC behavior (maybe as a holdover from very long ago, before we got connection wrappers in place?), which probably did about the same thing but isn't as predictable and can't be profiled or instrumented.

Test Plan:
This is somewhat difficult to test completely convincingly in isolation since the problem behavior depends on production scales and the workload, and to some degree on configuration.

I tested that this stuff baiscally works by adding logging to connect/close and running the daemons, verifying that they churned connections a lot before this change (e.g., ~1/s even at no load) and churn rarely afterward (e.g., almost never at no load).

I ran some workload through them to make sure I didn't completely break anything.

The best real test is just seeing how production responds. Current inbound/outbound connections on `secure001` are 1,200:

```
secure001 $ netstat -t | grep :mysql | wc -l
1164
```

Current outbound from `repo001` are 18,600:

```
repo001 $ netstat -t | grep :mysql | wc -l
18663
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11458

Differential Revision: https://secure.phabricator.com/D16389
2016-08-11 12:03:56 -07:00
epriestley
3b45608c78 Fix misleading error message when only cluster database masters are configured
Summary:
Fixes T11446. We can raise the misleading error:

> No valid databases are configured!

...when a valid master is configured but unreachable.

Instead, more carefully raise either "nothing is configured" or "nothing is reachable".

Test Plan: Configured only a master, artificially severed it, got "nothing is reachable" instead of "nothing is configured".

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11446

Differential Revision: https://secure.phabricator.com/D16386
2016-08-10 13:06:12 -07:00
epriestley
e8083ad63a Increase the storage size for commit summaries
Summary:
Fixes T11453. Currently, commit message summaries are limited to 80 bytes. This may only be 20-40 characters for CJK languages or langauges with Cyrillic script.

Increase storage size to 255, then truncate to the shorter of 255 bytes or 80 glyphs. This preserves the same behavior for latin languages, but is less tight for Russian, etc.

Some minor additional changes:

  - Provide a way to ask "how much data fits in this column?" so we don't have to duplicate column lengths across summary checks or UI errors like "title too long".
  - Remove the `text80` datatype, since no other columns use it and we have no use cases (or likely use cases) for it.

Test Plan:
  - Made a commit with a Cyrillic title, saw reasonable summarization in UI:

{F1757522}

  - Added and ran unit tests.
  - Grepped for removed `SUMMARY_MAX_LENGTH` constant.
  - Grepped for removed `text80` data type.

Reviewers: avivey, chad

Reviewed By: avivey

Subscribers: avivey

Maniphest Tasks: T11453

Differential Revision: https://secure.phabricator.com/D16385
2016-08-10 11:12:45 -07:00
Eitan Adler
f032146591 Convert bin/storage workflow to presume utf8mb4 as the default encoding
Summary: see comments on rP68904d941c54

Test Plan: run ,/bin/storage upgrade

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D16370
2016-08-10 05:56:27 -07:00
epriestley
9160da1afb Add a Packages application and PackagePublisher
Summary:
Ref T8116. Partially scavenged from D14152. This roughs in a new Packages application for Arcanist extensions and third-party applications, and adds a "Publisher" object.

A "Publisher" represents an individual or entity who is publishing a package, like "Phacility". It's explicitly //not// necessarily the original author -- just the primary entity vouching for the safety of the code.

A publisher just has a name and a unique key for now. For example, Phacility might have "Phacility" and "phacility", respectively.

Unique keys are immutable, e.g., the package "phacility/arcanist" will always be exactly the same package by exactly the same publisher.

Test Plan: {F1731621}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T8116

Differential Revision: https://secure.phabricator.com/D16314
2016-07-27 12:21:57 -07:00
Aviv Eyal
68904d941c bin/storage shell: force TCP
Summary:
`mysql` has the magic feature of ignoring port arguments and using the socket when connecting to localhost.

This flag makes it not do that.

Test Plan: `./bin/storage shell`, execute `status`, see `Connection: localhost via TCP/IP`.

Reviewers: joshuaspence, #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D16317
2016-07-21 23:42:27 +00:00
epriestley
9827cc1622 Provide a missing timeout on the non-cluster connection pathway
Summary:
Ref T11232. The cluster connection pathway specifies a timeout when connecting, but this connection pathway does not. (I'm not sure if we just never did or if it got lost at some point.)

Soon, T11044 will obsolete this and unify the database connection pathways, but that's a more complicated change.

I'm not sure if this will fix T11232, but it can't hurt.

Test Plan: Put a `throw` on timeout specifications. Before the change: did not hit it in non-cluster configurations. After the change: hit it.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11232

Differential Revision: https://secure.phabricator.com/D16194
2016-06-29 11:20:01 -07:00
epriestley
b352cae97d Don't stop on read-only mode for read-only storage workflows
Summary:
Fixes T11042. Currently, all `bin/storage` workflows stop if `cluster.read-only` is set:

```
$ ./bin/storage adjust
Usage Exception: Phabricator is currently in read-only mode. Use --force to override this mode.
```

However, some of them (`status`, `dump`, `databases`, etc) are read-only anyway and safe to run. Don't prompt in these cases.

Test Plan:
  - Set `cluster.read-only` to `true`.
  - Ran `bin/storage dump`, `bin/storage status`, etc. No longer received messages.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11042

Differential Revision: https://secure.phabricator.com/D15987
2016-05-30 09:30:43 -07:00
epriestley
0630fef9fc Prevent web queries from running for more than 30 seconds
Summary:
Ref T10849. This enforces a global 30-second per-query time limit for anything not coming from the CLI.

If we run into another issue with MySQL hanging in the future, this should prevent it from being nearly as bad as it was.

Test Plan:
  - Set value to 0, verified the UI threw an exception immediately.
  - Set value back to 30, browsed around a bunch of pages.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10849

Differential Revision: https://secure.phabricator.com/D15799
2016-04-26 07:59:09 -07:00
epriestley
595f203816 Correct RepositoryURI schema and propagate adjust exit code correctly
Summary:
Fixes T10830.

  - The return code from `storage adjust` did not propagate correct.
  - There was one column issue which I missed the first time around because I had a bunch of unrelated stuff locally.

Test Plan:
  - Ran `bin/storage upgrade -f` with failures, used `echo $?` to make sure it exited nonzero.
  - Got fully clean `bin/storage adjust` by dropping all my extra local tables.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10830

Differential Revision: https://secure.phabricator.com/D15746
2016-04-18 08:11:22 -07:00
epriestley
7852ec1619 Use --master-data, not --dump-slave, in bin/storage dump
Summary: These flags do slightly different things, I actually want --master-data here. My test databases are setup half-weird and work with either statement, which is why I missed this.

Test Plan: Ran a dump against master, got the right CHANGE MASTER statement with no warnings.

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D15716
2016-04-14 14:56:21 -07:00
epriestley
bbb321395a Support Aphlict clustering
Summary:
Ref T6915. This allows multiple notification servers to talk to each other:

  - Every server has a list of every other server, including itself.
  - Every server generates a unique fingerprint at startup, like "XjeHuPKPBKHUmXkB".
  - Every time a server gets a message, it marks it with its personal fingerprint, then sends it to every other server.
  - Servers do not retransmit messages that they've already seen (already marked with their fingerprint).
  - Servers learn other servers' fingerprints after they send them a message, and stop sending them messages they've already seen.

This is pretty crude, and the first message to a cluster will transmit N^2 times, but N is going to be like 3 or 4 in even the most extreme cases for a very long time.

The fingerprinting stops cycles, and stops servers from sending themselves copies of messages.

We don't need to do anything more sophisticated than this because it's fine if some notifications get lost when a server dies. Clients will reconnect after a short period of time and life will continue.

Test Plan:
  - Wrote two server configs.
  - Started two servers.
  - Told Phabricator about all four services.
  - Loaded Chrome and Safari.
  - Saw them connect to different servers.
  - Sent messages in one, got notifications in the other (magic!).
  - Saw the fingerprinting stuff work on the console, no infinite retransmission of messages, etc.

(This pretty much just worked when I ran it the first time so I probably missed something?)

{F1218835}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T6915

Differential Revision: https://secure.phabricator.com/D15711
2016-04-14 13:26:30 -07:00
epriestley
5a0b7398ca Give bin/storage some replica-aware options
Summary:
Fixes T10758.

  - Adds a "--host" flag. If you specify this, we read your cluster config. This lets you dump from a replica.
  - Adds a "--for-replica" flag to `storage dump`. This makes `mysqldump` include a `CHANGE MASTER ...` statement in the output, which is useful when setting up a replica for the first time.

Test Plan:
  - Dumped master and replica cluster databases.
  - Dumped non-cluster databases.
  - Ran various other commands (help, status, etc).

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10758

Differential Revision: https://secure.phabricator.com/D15714
2016-04-14 13:23:35 -07:00
epriestley
146fb646f9 Automatically degrade to read-only mode when unable to connect to the master
Summary:
Ref T4571. If we fail to connect to the master, automatically try to degrade into a temporary read-only mode ("UNREACHABLE") for the remainder of the request, if possible.

If the request was something like "load the homepage", that'll work fine. If it was something like "submit a comment", there's nothing we can do and we just have to fail.

Detecting this condition imposes a performance penalty: every request checks the connection and gives the database a long time to respond, since we don't want to drop writes unless we have to. So the degraded mode works, but it's really slow, and may perpetuate the problem if the root issue is load-related.

This lays the groundwork for improving this case by degrading futher into a "SEVERED" mode which will persist across requests. In the future, if several requests in a short period of time fail, we'll sever the database host and refuse to try to connect to it for a little while, connecting directly to replicas instead (basically, we're "health checking" the master, like a load balancer would health check a web application server). This will give us a better (much faster) degraded mode in a major service disruption, and reduce load on the master if the root cause is load-related, giving it a better chance of recovering on its own.

Test Plan:
  - Disabled master in config by changing the host/username, got degraded automatically to UNREACAHBLE mode immediately.
  - Faked full SEVERED mode, requests hit replicas and put me in the mode properly.
  - Made stuff work, hit some good pages.
  - Hit some non-cluster pages.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571

Differential Revision: https://secure.phabricator.com/D15674
2016-04-10 12:20:13 -07:00
epriestley
e0a8cac703 When no master database is configured, automatically degrade to read-only mode
Summary: Ref T4571. If `cluster.databases` is configured but only has replicas, implicitly drop to read-only mode and send writes to a replica.

Test Plan:
  - Disabled the `master`, saw Phabricator automatically degrade into read-only mode against replicas.
  - (Also tested: explicit read-only mode, non-cluster mode, properly configured cluster mode).

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571

Differential Revision: https://secure.phabricator.com/D15672
2016-04-10 12:19:55 -07:00
epriestley
071741c61d When Phabricator is in read-only mode, explain why
Summary:
Ref T4571. Allows users to click the "read-only mode" notification to get more information about why an install is in read-only mode.

Installs can be in this mode for several reasons (explicit administrative action, no masters defined, no masters reachable), and it's useful to be able to tell the difference.

Test Plan: {F1212930}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571

Differential Revision: https://secure.phabricator.com/D15671
2016-04-10 12:19:18 -07:00
epriestley
6a4a9bb2d2 When cluster.databases is configured, read the master connection from it
Summary:
Ref T4571. Ref T10759. Ref T10758. This isn't complete, but gets most of the job done:

  - When `cluster.databases` is set up, most things ignore `mysql.host` now.
  - You can `bin/storage upgrade` and stuff works.
  - You can browse around in the web UI and stuff works.

There's still a lot of weird tricky stuff to navigate, and this has real no advantages over configuring a single server yet (no automatic failover, etc).

Test Plan:
  - Configured `cluster.databases` to point at my `t1.micro` hosts in EC2 (master + replica).
  - Ran `bin/storage upgrade`, got a new install setup on them properly.
  - Survived setup warnings, browsed around.
  - Switched back to local config, ran `bin/storage upgrade`, browsed around, went through setup checks.
  - Intentionally broke config (bad hosts, no masters) and things seemed to react reasonably well.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571, T10758, T10759

Differential Revision: https://secure.phabricator.com/D15668
2016-04-10 12:18:42 -07:00
epriestley
49d93dcf98 Add a cluster.read-only option
Summary:
Ref T4571. There will be a very long path beyond this, but add a basic read-only mode. You can explicitly enable this to put Phabricator in a sort of "maintenance" mode today if you're swapping databases or something.

In the long term, we'll automatically degrade into this mode if the master database is down.

Test Plan:
  - Enabled read-only mode.
  - Browsed around.
  - Didn't immediately see anything that was totally 100% broken.

Most stuff is 80-90% broken right now. For example:

  - Stuff like submitting comments doesn't work, and gives you a confusing, unhelpful error.
  - None of the UI really knows that it's read-only. EditEngine stuff should all hide itself and say "you can't add new comments while an install is in read-only mode", for example, but currently does not.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571

Differential Revision: https://secure.phabricator.com/D15662
2016-04-09 13:40:47 -07:00
epriestley
5f170847ca Throw a more tailored error when a storage upgrade patch can't access a database
Summary: Fixes T8762.

Test Plan: Ran `bin/storage upgrade --namespace ... --user limited`, saw a more specific error.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T8762

Differential Revision: https://secure.phabricator.com/D15080
2016-01-21 14:54:56 -08:00
epriestley
d37153f003 Make bin/storage upgrade and bin/storage adjust emit detailed messages if the user has no access to databases
Summary:
Ref T10195. Distinguish between "database does not exist" and "database exists, you just don't have permission to access it".

We can't easily get this information out of INFORMATION_SCHEMA but can just `SHOW TABLES IN ...` every database that looks like it's missing and then look at the error code.

Test Plan:
  - Created a user `limited` with limited access.
  - Ran `bin/storage adjust`.
  - Got hopefully more helpful messages about access problems, instead of "Missing" errors.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10195

Differential Revision: https://secure.phabricator.com/D15079
2016-01-21 13:06:00 -08:00