Summary: Ensures that newly-made `File` objects get indexed into the new ngrams index. Fixes T8788.
Test Plan:
- uploaded a file with daemons stopped; confirmed no new rows in ngrams table
- started daemons; confirmed indexing of previously-uploaded files happened
- uploaded a new file with daemons running; confirmed it got added to the index
Not sure how to test the changes to `PhabricatorFileUploadSource->writeChunkedFile()` and `PhabricatorChunkedFileStorageEngine->allocateChunks()`. I spent a few minutes trying to find their callers, but the first looks like it requires a Diffusion repo and the 2nd is only accessible via Conduit. I can test that stuff if necessary, but it's such a small change that I'm not worried about it.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T8788
Differential Revision: https://secure.phabricator.com/D17718
Summary: Ref T12509. This encourages code to move away from HMAC+SHA1 by making the method name more obviously undesirable.
Test Plan: `grep`, browsed around.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12509
Differential Revision: https://secure.phabricator.com/D17632
Summary:
Ref T12509. This adds support for HMAC+SHA256 (instead of HMAC+SHA1). Although HMAC+SHA1 is not currently broken in any sense, SHA1 has a well-known collision and it's good to look at moving away from HMAC+SHA1.
The new mechanism also automatically generates and stores HMAC keys.
Currently, HMAC keys largely use a per-install constant defined in `security.hmac-key`. In theory this can be changed, but in practice essentially no install changes it.
We generally (in fact, always, I think?) don't use HMAC digests in a way where it matters that this key is well-known, but it's slightly better if this key is unique per class of use cases. Principally, if use cases have unique HMAC keys they are generally less vulnerable to precomputation attacks where an attacker might generate a large number of HMAC hashes of well-known values and use them in a nefarious way. The actual threat here is probably close to nonexistent, but we can harden against it without much extra effort.
Beyond that, this isn't something users should really have to think about or bother configuring.
Test Plan:
- Added unit tests.
- Used `bin/files integrity` to verify, strip, and recompute hashes.
- Tampered with a generated HMAC key, verified it invalidated hashes.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12509
Differential Revision: https://secure.phabricator.com/D17630
Summary:
Ref T12470. Provides an "integrity" utility which runs in these modes:
- Verify: check that hashes match.
- Compute: backfill missing hashes.
- Strip: remove hashes. Useful for upgrading across a hash change.
- Corrupt: intentionally corrupt hashes. Useful for debugging.
- Overwrite: force hash recomputation.
Users normally shouldn't need to run any of this stuff, but this provides a reasonable toolkit for managing integrity hashes.
I'll recommend existing installs use `bin/files integrity --compute all` in the upgrade guidance to backfill hashes for existing files.
Test Plan:
- Ran the script in many modes against various files, saw expected operation, including:
- Verified a file, corrupted it, saw it fail.
- Verified a file, stripped it, saw it have no hash.
- Stripped a file, computed it, got a clean verify.
- Stripped a file, overwrote it, got a clean verify.
- Corrupted a file, overwrote it, got a clean verify.
- Overwrote a file, overwrote again, got a no-op.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12470
Differential Revision: https://secure.phabricator.com/D17629
Summary:
Ref T12470. This helps defuse attacks where an adversary can directly take control of whatever storage engine files are being stored in and change data there. These attacks would require a significant level of access.
Such attackers could potentially attack ranges of AES-256-CBC encrypted files by using Phabricator as a decryption oracle if they were also able to compromise a Phabricator account with read access to the files.
By storing a hash of the data (and, in the case of AES-256-CBC files, the IV) when we write files, and verifying it before we decrypt or read them, we can detect and prevent this kind of tampering.
This also helps detect mundane corruption and integrity issues.
Test Plan:
- Added unit tests.
- Uploaded new files, saw them get integrity hashes.
- Manually corrupted file data, saw it fail. Used `bin/files cat --salvage` to read it anyway.
- Tampered with IVs, saw integrity failures.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12470
Differential Revision: https://secure.phabricator.com/D17625
Summary:
Fixes T12079. Currently, when a file is encrypted and a request has "Content-Range", we apply the range first, //then// decrypt the result. This doesn't work since you can't start decrypting something from somewhere in the middle (at least, not with our cipher selection).
Instead: decrypt the result, //then// apply the range.
Test Plan: Added failing unit tests, made them pass
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12079
Differential Revision: https://secure.phabricator.com/D17623
Summary:
Ref T11140. When reading and writing files, we optionally apply a "storage format" to them.
The default format is "raw", which means we just store the raw data.
This change modularizes formats and adds a "rot13" format, which proves formatting works and is testable. In the future, I'll add real encryption formats.
Test Plan:
- Added unit tests.
- Viewed files in web UI.
- Changed a file's format to rot13, saw the data get rotated on display.
- Set default format to rot13:
- Uploaded a small file, verified data was stored as rot13.
- Uploaded a large file, verified metadata was stored as "raw" (just a type, no actual data) and blob data was stored as rot13.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11140
Differential Revision: https://secure.phabricator.com/D16122
Summary:
Ref T5155. Swaps Phabricator over to the new first-party S3 client using the v4 authentication API so it works in all regions.
The API requires an explicit region, so the new `amazon-s3.region` is now required. I'll write guidance about this.
Test Plan:
- Uploaded files to S3.
- Migrated ~1GB of files to S3.
- Loaded a bunch of files off S3.
- Browsed around the S3 bucket.
- Deleted a file, verified the data on S3 was destroyed.
- Hit new setup warning.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T5155
Differential Revision: https://secure.phabricator.com/D14982
Summary: Use `PhutilClassMapQuery` where appropriate.
Test Plan: Browsed around the UI to verify things seemed somewhat working.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: epriestley, Korvin
Differential Revision: https://secure.phabricator.com/D13429
Summary: All classes should extend from some other class. See D13275 for some explanation.
Test Plan: `arc unit`
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: epriestley, Korvin
Differential Revision: https://secure.phabricator.com/D13283
Summary: This signature changed at some point after I tested things and I didn't catch it.
Test Plan: Destroyed a chunked large file with `bin/remove`.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D12152
Summary:
Fixes T7621. The engine selection code started out making sense, but didn't make as much sense by the time I was done with it.
Specifically, from the vanilla file upload, we may incorrectly try to write directly to the chunk storage engine. This is incorrect, and produces a confusing/bad error.
Make chunk storage engines explicit and don't try to do single-file one-shot writes to them.
Test Plan:
- Tried to upload a large file with vanilla uploader, got better error message.
- Uploaded small and large files with drag and drop.
- Viewed {nav Files > Help/Options}.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7621
Differential Revision: https://secure.phabricator.com/D12110
Summary: Ref T7149. This works now, so enable it.
Test Plan:
- Uploaded large and small files in Firefox, Safari and Chrome.
- Uploaded large files with `arc upload`.
- Stopped/resumed large files with all clients.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D12079
Summary: Ref T7149. Return a real iterator from the Chunk engine, which processes chunks sequentially.
Test Plan:
This is a bit hard to read, but shows the underlying chunks being accessed one at a time and only some being accessed when requesting a range of a file:
```
$ ./bin/files cat F878 --trace --begin 100 --end 256
...
>>> [10] <query> SELECT * FROM `file_storageblob` WHERE `id` = 85
<<< [10] <query> 240 us
better software.
Phabricat>>> [11] <query> SELECT * FROM `file_storageblob` WHERE `id` = 84
<<< [11] <query> 205 us
or includes applications for:
>>> [12] <query> SELECT * FROM `file_storageblob` WHERE `id` = 83
<<< [12] <query> 226 us
- reviewing and auditing source>>> [13] <query> SELECT * FROM `file_storageblob` WHERE `id` = 82
<<< [13] <query> 203 us
code;
- hosting and browsing >>> [14] <query> SELECT * FROM `file_storageblob` WHERE `id` = 81
<<< [14] <query> 231 us
repositories;
- tracking bugs;
```
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: joshuaspence, epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D12073
Summary: Ref T7149. We can't compute hashes of large files efficiently, but we can resume uploads by the same author, with the same name and file size, which are only partially completed. This seems like a reasonable heuristic that is unlikely to ever misfire, even if it's a little magical.
Test Plan:
- Forced chunking on.
- Started uploading a chunked file.
- Closed the browser window.
- Dropped it into a new window.
- Upload resumed //(!!!)//
- Did this again.
- Downloaded the final file, which successfully reconstructed the original file.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: joshuaspence, chad, epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D12070
Summary:
Ref T7149. This adds chunking support to drag-and-drop uploads. It never activates right now unless you hack things up, since the chunk engine is still hard-coded as disabled.
The overall approach is the same as `arc upload` in D12061, with some slight changes to the API return values to avoid a few extra HTTP calls.
Test Plan:
- Enabled chunk engine.
- Uploaded some READMEs in a bunch of tiny 32 byte chunks.
- Worked out of the box in Safari, Chrome, Firefox.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D12066
Summary:
Ref T7149. This isn't complete and isn't active yet, but does basically work. I'll shore it up in the next few diffs.
The new workflow goes like this:
> Client, file.allocate(): I'd like to upload a file with length L, metadata M, and hash H.
Then the server returns `upload` (a boolean) and `filePHID` (a PHID). These mean:
| upload | filePHID | means |
|---|---|---|
| false | false | Server can't accept file.
| false | true | File data already known, file created from hash.
| true | false | Just upload normally.
| true | true | Query chunks to start or resume a chunked upload.
All but the last case are uninteresting and work like exising uploads with `file.uploadhash` (which we can eventually deprecate).
In the last case:
> Client, file.querychunks(): Give me a list of chunks that I should upload.
This returns all the chunks for the file. Chunks have a start byte, an end byte, and a "complete" flag to indicate that the server already has the data.
Then, the client fills in chunks by sending them:
> Client, file.uploadchunk(): Here is the data for one chunk.
This stuff doesn't work yet or has some caveats:
- I haven't tested resume much.
- Files need an "isPartial()" flag for partial uploads, and the UI needs to respect it.
- The JS client needs to become chunk-aware.
- Chunk size is set crazy low to make testing easier.
- Some debugging flags that I'll remove soon-ish.
- Downloading works, but still streams the whole file into memory.
- This storage engine is disabled by default (hardcoded as a unit test engine) because it's still sketchy.
- Need some code to remove the "isParital" flag when the last chunk is uploaded.
- Maybe do checksumming on chunks.
Test Plan:
- Hacked up `arc upload` (see next diff) to be chunk-aware and uploaded a readme in 18 32-byte chunks. Then downloaded it. Got the same file back that I uploaded.
- File UI now shows some basic chunk info for chunked files:
{F336434}
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: joshuaspence, epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D12060
Summary:
Fixes T5843. File storage engines use a very old "selector" mechanism which makes them difficult to extend.
This mechanism predates widespread use of `PhutilSymbolLoader` to discover available implementations at runtime. Runtime discovery has generally proven more flexible and easier to use than explicit selection (although it sometimes needs more UI to support it in cases where order or enabled/disabled flags can not be directly determined).
Use a modern runtime discovery mechanism instead of an explicit selector. This might break any installs which subclassed the `Selector`, but I believe almost no such installs exist, and they'll receive a meaningful exception upon upgrading (any custom engines will no longer implement all of the required methods).
Looking forward, this modernizes infrastructure to prepare for new "virtual" chunked-storage engines, with the eventual goal of supporting very large file uploads and data import into the Phacility cluster.
This uses D12051 to add UI to make it easier to understand the state of storage engines.
Test Plan:
Used new UI panel to assess storage engines:
{F336270}
- Uploaded a small file, saw it go to MySQL engine.
- Uploaded a larger file, saw it go to S3 engine.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T5843
Differential Revision: https://secure.phabricator.com/D12053
Summary: Fixes T5798. We basically weren't using the caching mechanism. Also adds service calls for S3 stuff, and support for seeing a little info like you can for conduit.
Test Plan: uploaded a paste, looked at paste list - no s3 service calls. edited the paste, looked at paste list - no s3 service calls and edited content properly shown
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: epriestley, Korvin
Maniphest Tasks: T5798
Differential Revision: https://secure.phabricator.com/D10294
Summary: I'm pretty sure that `@group` annotations are useless now... see D9855. Also fixed various other minor issues.
Test Plan: Eye-ball it.
Reviewers: #blessed_reviewers, epriestley, chad
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin, hach-que
Differential Revision: https://secure.phabricator.com/D9859
Summary:
Via HackerOne. In regular expressions, "$" matches "end of input, or before terminating newline". This means that the expression `/^A$/` matches two strings: `"A"`, and `"A\n"`.
When we care about this, use `\z` instead, which matches "end of input" only.
This allowed registration of `"username\n"` and similar.
Test Plan:
- Grepped codebase for all calls to `preg_match()` / `preg_match_all()`.
- Fixed the ones where this seemed like it could have an impact.
- Added and executed unit tests.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: aran, epriestley
Differential Revision: https://secure.phabricator.com/D8516
Summary:
Ref T988. Not sure about this, feel free to push back or tweak it or whatever, but I want to reduce the amount of meta-text in the method documentation. Primarily this:
- Shortens "From parent implementation in ClassName:" to "ClassName".
- Tries to tweak the styles a bit so that it's relatively obvious what that means (hopefully?).
- Fixes an issue with tasks where some methods could be ignored.
Test Plan: {F57565}
Reviewers: chad
Reviewed By: chad
CC: aran
Maniphest Tasks: T988
Differential Revision: https://secure.phabricator.com/D6911
Summary: Storage is shared between files in a smart way. When uploading files, if other file have the same contentHash, then share storage. On delete, storage is permanently deleted only if no other files are sharing it
Test Plan: Upload multiple copies of the same file, while tracking database. Delete copies of files and check to see that the storage is only deleted if no other files are using it
Reviewers: epriestley
CC: aran, Korvin
Maniphest Tasks: T2454
Differential Revision: https://secure.phabricator.com/D4775
Summary: Fixes T2474. Adds a storage dummy storage engine for unit tests, and adds a couple of simple tests for basic file storage.
Test Plan: Ran `arc unit` to execute unit tests.
Reviewers: kwadwon
Reviewed By: kwadwon
CC: aran
Maniphest Tasks: T2474
Differential Revision: https://secure.phabricator.com/D4777
Summary:
- Fixes T2257. We wrote a 0-length file (erroneously?) and currently throw when retrieving it. This also happens if you intentionally upload an empty file. I'm not sure what happened with the image: we check for errors during the write, so its existence implies S3 told us the write was successful and then lost the data. Since this is a one-off, I'm not too worried about it. The indistinguishable case of an actually empty file is fixed, at least.
- Writes to a directory like "phabricator/ab/cd/efgh" instead of "phabricator/abcdefgh". When I had to go look for the file on S3 it took a few minutes of scrolling since the web interface isn't very fast. Make it so a file can be located by navigating through pieces of the hash.
Test Plan: Viewed an empty file, no fatal. Viewed the file from T2257 locally, no fatal (no data either, but it's gone). Uploaded a file, saw a nice path.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2257
Differential Revision: https://secure.phabricator.com/D4303
Summary:
Allows to use file storage in different Amazon S3 regions as well as it
should support different file storage services with S3 compliant API
(eg.: Bashos' Riak CS).
Test Plan:
1. Create S3 bucket in non-default AWS Region (e.g. EU/Ireland).
2. Set appropriate bucket policy to allow upload & download objects for the specified access credentials.
3. Set `amazon-s3.endpoint` in your configuration file to an appropriate value (e.g. `s3-eu-west-1.amazonaws.com` for EU region).
Reviewers: epriestley
Reviewed By: epriestley
CC: aran, Korvin
Differential Revision: https://secure.phabricator.com/D3965
Summary:
This commit doesn't change license of any file. It just makes the license implicit (inherited from LICENSE file in the root directory).
We are removing the headers for these reasons:
- It wastes space in editors, less code is visible in editor upon opening a file.
- It brings noise to diff of the first change of any file every year.
- It confuses Git file copy detection when creating small files.
- We don't have an explicit license header in other files (JS, CSS, images, documentation).
- Using license header in every file is not obligatory: http://www.apache.org/dev/apply-license.html#new.
This change is approved by Alma Chao (Lead Open Source and IP Counsel at Facebook).
Test Plan: Verified that the license survived only in LICENSE file and that it didn't modify externals.
Reviewers: epriestley, davidrecordon
Reviewed By: epriestley
CC: aran, Korvin
Maniphest Tasks: T2035
Differential Revision: https://secure.phabricator.com/D3886
Summary:
- `kill_init.php` said "Moving 1000 files" - I hope that this is not some limit in `FileFinder`.
- [src/infrastructure/celerity] `git mv utils.php map.php; git mv api/utils.php api.php`
- Comment `phutil_libraries` in `.arcconfig` and run `arc liberate`.
NOTE: `arc diff` timed out so I'm pushing it without review.
Test Plan:
/D1234
Browsed around, especially in `applications/repository/worker/commitchangeparser` and `applications/` in general.
Auditors: epriestley
Maniphest Tasks: T1103
Summary: See T1021. Raise configuration or implementation exceptions immediately. When all engines fail, raise an aggregate exception with details.
Test Plan: Forced all engines to fail, received an aggregate exception. Forced an engine to fail with a config exception, recevied it immediately.
Reviewers: btrahan, vrana, jungejason
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1021
Differential Revision: https://secure.phabricator.com/D2157
Filesystem::readRandomCharacters()
Summary: See T547. To improve auditability of use of crypto-sensitive hash
functions, use Filesystem::readRandomCharacters() in place of
sha1(Filesystem::readRandomBytes()) when we're just generating random ASCII
strings.
Test Plan:
- Generated a new PHID.
- Logged out and logged back in (to test sessions).
- Regenerated Conduit certificate.
- Created a new task, verified mail key generated sensibly.
- Created a new revision, verified mail key generated sensibly.
- Ran "arc list", got blocked, installed new certificate, ran "arc list"
again.
Reviewers: jungejason, nh, tuomaspelkonen, aran, benmathews
Reviewed By: jungejason
CC: aran, epriestley, jungejason
Differential Revision: 1000
Summary:
Provide a catchall mechanism to find unprotected writes.
- Depends on D758.
- Similar to WriteOnHTTPGet stuff from Facebook's stack.
- Since we have a small number of storage mechanisms and highly structured
read/write pathways, we can explicitly answer the question "is this page
performing a write?".
- Never allow writes without CSRF checks.
- This will probably break some things. That's fine: they're CSRF
vulnerabilities or weird edge cases that we can fix. But don't push to Facebook
for a few days unless you're prepared to deal with this.
- **>>> MEGADERP: All Conduit write APIs are currently vulnerable to CSRF!
<<<**
Test Plan:
- Ran some scripts that perform writes (scripts/search indexers), no issues.
- Performed normal CSRF submits.
- Added writes to an un-CSRF'd page, got an exception.
- Executed conduit methods.
- Did login/logout (this works because the logged-out user validates the
logged-out csrf "token").
- Did OAuth login.
- Did OAuth registration.
Reviewers: pedram, andrewjcg, erling, jungejason, tuomaspelkonen, aran,
codeblock
Commenters: pedram
CC: aran, epriestley, pedram
Differential Revision: 777
Summary: Implements an S3 storage engine option for Phabricator.
Test Plan:
- Uploaded files to S3.
- Looked at them.
- Verified they appeared in S3 using the S3 file browser.
Reviewed By: jungejason
Reviewers: jungejason, tuomaspelkonen, aran
CC: aran, jungejason
Differential Revision: 752
Summary:
See T344. Currently, there's a hard-coded 12MB filesize limit and some awkward
interactions with MySQL's max_allowed_packet. Make this system generally more
robust:
- Move the upload limit to configuration.
- Add setup steps which reconcile max_allowed_packet vs MySQL file storage
limits.
- Add a layer of indirection between uploading files and storage engines.
- Allow the definition of new storage engines.
- Define a local disk storage engine.
- Add a "storage engine selector" class which manages choosing which storage
engines to put files in.
- Document storage engines.
- Document file storage classes.
Test Plan:
Setup mode:
- Disabled MySQL storage engine, misconfigured it, configured it correctly.
- Disabled file storage engine, set it to something invalid, set it to
something valid.
- Verified max_allowed_packet is read correctly.
Application mode:
- Configured local file storage.
- Uploaded large and small files.
- Verified larger files were written to local storage.
- Verified smaller files were written to MySQL blob storage.
Documentation:
- Read documentation.
Reviewed By: jungejason
Reviewers: jungejason, tuomaspelkonen, aran
CC: aran, epriestley, jungejason
Differential Revision: 695