1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-11-28 01:32:42 +01:00
phorge-phorge/src/applications/drydock
epriestley ac7edf54af Fix bad counting in SQL when enforcing Drydock allocator soft limits
Summary:
Ref T9252. This fixes a bug from D14236. D14272 discusses the observable effects of the bug, primarily that the window for racing is widened from ~a few milliseconds to several minutes under our configuration.

This SQL query is missing a `GROUP BY` clause, so all of the resources get counted as having the same status (specifically, the alphabetically earliest status any resource had, I think). For test cases this often gets the right result since the number of resources may be small and they may all have the same status, but in production this isn't true. In particular, the allocator would sometimes see "35 destroyed resources" (or whatever), when the real counts were "32 destroyed resources + 3 pending resources".

Since this allocator behavior is soft/advisory this didn't cause any actual problems, per se (we do expect races here occasionally), it just made the race very very easy to hit. For example, Drydock in production currently has three pending working copy resources. Although we do expect this to be //possible//, getting 4 resources when the configured limit is 1 should be hard (not lightning strike / cosmic radiaion hard, but "happens once a year" hard).

Also exclude destroyed resources since we never care about them.

Test Plan:
Followed the plan from D14272 and restarted two Harbormaster workers at the same time.

After this patch was applied, they no longer created two different resources (we expect it to be possible for this to happen, just very hard).

We should still be able to force this race by putting something like `sleep(10)` right before the query, then `sleep(10)` right after it. That would prevent the allocators from seeing one another (so they would both think there were no other resources) and push us down the pathway where we exceed the soft limit.

Reviewers: chad, hach-que

Reviewed By: hach-que

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14274
2015-10-14 06:18:10 -07:00
..
application Rough cut of DrydockRepositoryOperation 2015-10-13 15:46:12 -07:00
blueprint Fix bad counting in SQL when enforcing Drydock allocator soft limits 2015-10-14 06:18:10 -07:00
capability Give Drydock resources a proper expiry mechanism 2015-09-28 09:35:14 -07:00
constants Modernize Drydock SearchEngine implementations 2015-09-24 09:56:49 -07:00
controller Rough cut of DrydockRepositoryOperation 2015-10-13 15:46:12 -07:00
customfield Rough cut of "Blueprint Authorizations" 2015-10-10 07:15:25 -07:00
editor Allow Drydock blueprints to be disabled 2015-09-24 10:18:17 -07:00
exception Add more Drydock log types and some additional logging 2015-10-01 08:11:42 -07:00
garbagecollector Provide bin/garbage for interacting with garbage collection 2015-10-02 09:17:24 -07:00
interface Allow AlmanacHost blueprints to build a meaningful CommandInterface 2015-09-21 04:46:02 -07:00
logtype Use Drydock authorizations when acquiring leases 2015-10-12 17:02:35 -07:00
management Use Drydock authorizations when acquiring leases 2015-10-12 17:02:35 -07:00
operation If the stars align, make "Land Revision" kind of work 2015-10-13 15:46:30 -07:00
phid Rough cut of DrydockRepositoryOperation 2015-10-13 15:46:12 -07:00
query If the stars align, make "Land Revision" kind of work 2015-10-13 15:46:30 -07:00
storage If the stars align, make "Land Revision" kind of work 2015-10-13 15:46:30 -07:00
typeahead Improve Drydock log search engine 2015-08-24 21:13:20 +10:00
view Allow "Repository Automation" to be configured for repositories 2015-10-13 15:45:59 -07:00
worker Fix an issue where newly created Drydock resources could be improperly acquired 2015-10-14 06:16:21 -07:00