Summary: Ensures that newly-made `File` objects get indexed into the new ngrams index. Fixes T8788.
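As a sketch of the mechanism (assuming the standard ngrams interface; the exact classes this change touches may differ), an object opts into indexing roughly like this:

```
final class PhabricatorFile extends PhabricatorFileDAO
  implements PhabricatorNgramsInterface {

  public function newNgrams() {
    // The daemons call this during indexing and write the resulting
    // ngrams rows for the file's name.
    return array(
      id(new PhabricatorFileNameNgrams())
        ->setValue($this->getName()),
    );
  }

}
```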
Test Plan:
- Uploaded a file with the daemons stopped; confirmed no new rows appeared in the ngrams table.
- Started the daemons; confirmed previously-uploaded files were indexed.
- Uploaded a new file with the daemons running; confirmed it was added to the index.
Not sure how to test the changes to `PhabricatorFileUploadSource->writeChunkedFile()` and `PhabricatorChunkedFileStorageEngine->allocateChunks()`. I spent a few minutes trying to find their callers, but the first looks like it requires a Diffusion repo and the second is only accessible via Conduit. I can test that stuff if necessary, but it's such a small change that I'm not worried about it.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T8788
Differential Revision: https://secure.phabricator.com/D17718
Summary: Ref T12509. This encourages code to move away from HMAC+SHA1 by making the method name more obviously undesirable.
Test Plan: `grep`, browsed around.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12509
Differential Revision: https://secure.phabricator.com/D17632
Summary:
Fixes T12079. Currently, when a file is encrypted and a request has "Content-Range", we apply the range first, //then// decrypt the result. This doesn't work since you can't start decrypting something from somewhere in the middle (at least, not with our cipher selection).
Instead: decrypt the result, //then// apply the range.
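A minimal sketch of the corrected ordering (identifiers here are illustrative, not the actual API):

```
// Decrypt the whole stream first: the cipher has to start from the
// beginning of the ciphertext, not from an arbitrary offset.
$data = $format->decryptData($raw_data);

// Only then slice out the bytes the "Content-Range" header asked for.
if ($range !== null) {
  list($begin, $end) = $range;
  $data = substr($data, $begin, $end - $begin);
}
```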
Test Plan: Added failing unit tests, made them pass
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12079
Differential Revision: https://secure.phabricator.com/D17623
Summary:
Ref T11140. When reading and writing files, we optionally apply a "storage format" to them.
The default format is "raw", which means we just store the raw data.
This change modularizes formats and adds a "rot13" format, which proves formatting works and is testable. In the future, I'll add real encryption formats.
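For flavor, here's roughly what a format could look like under this scheme (class and method names are illustrative; the real base class may expose a different encode/decode API):

```
final class ROT13FileStorageFormat extends FileStorageFormat {

  public function getStorageFormatKey() {
    return 'rot13';
  }

  public function encodeData($data) {
    // Applied on write, before the data reaches the storage engine.
    return str_rot13($data);
  }

  public function decodeData($data) {
    // ROT13 is its own inverse, so reading applies the same rotation.
    return str_rot13($data);
  }

}
```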
Test Plan:
- Added unit tests.
- Viewed files in web UI.
- Changed a file's format to rot13, saw the data get rotated on display.
- Set default format to rot13:
- Uploaded a small file, verified data was stored as rot13.
- Uploaded a large file, verified metadata was stored as "raw" (just a type, no actual data) and blob data was stored as rot13.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11140
Differential Revision: https://secure.phabricator.com/D16122
Summary: This signature changed at some point after I tested things and I didn't catch it.
Test Plan: Destroyed a chunked large file with `bin/remove`.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D12152
Summary: Ref T7149. This works now, so enable it.
Test Plan:
- Uploaded large and small files in Firefox, Safari and Chrome.
- Uploaded large files with `arc upload`.
- Stopped/resumed large files with all clients.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D12079
Summary: Ref T7149. Return a real iterator from the Chunk engine, which processes chunks sequentially.
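Consuming code can now stream a file roughly like this (a sketch; the method name is illustrative):

```
// Each iteration fetches exactly one chunk's blob from storage, so
// memory use stays bounded by the chunk size; chunks entirely outside
// [$begin, $end) are never loaded at all.
$iterator = $engine->getFileDataIterator($file, $begin, $end);
foreach ($iterator as $chunk_data) {
  echo $chunk_data;
}
```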
Test Plan:
This is a bit hard to read, but it shows the underlying chunks being accessed one at a time, with only some chunks accessed when requesting a range of the file:
```
$ ./bin/files cat F878 --trace --begin 100 --end 256
...
>>> [10] <query> SELECT * FROM `file_storageblob` WHERE `id` = 85
<<< [10] <query> 240 us
better software.
Phabricat>>> [11] <query> SELECT * FROM `file_storageblob` WHERE `id` = 84
<<< [11] <query> 205 us
or includes applications for:
>>> [12] <query> SELECT * FROM `file_storageblob` WHERE `id` = 83
<<< [12] <query> 226 us
- reviewing and auditing source>>> [13] <query> SELECT * FROM `file_storageblob` WHERE `id` = 82
<<< [13] <query> 203 us
code;
- hosting and browsing >>> [14] <query> SELECT * FROM `file_storageblob` WHERE `id` = 81
<<< [14] <query> 231 us
repositories;
- tracking bugs;
```
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: joshuaspence, epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D12073
Summary: Ref T7149. We can't compute hashes of large files efficiently, but we can resume uploads by the same author, with the same name and file size, which are only partially completed. This seems like a reasonable heuristic that is unlikely to ever misfire, even if it's a little magical.
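The lookup is roughly this shape (a sketch; treat the specific query methods as assumptions):

```
// Look for a partial upload by the same author with the same name
// and total length; if one exists, resume it instead of allocating
// a new file.
$file = id(new PhabricatorFileQuery())
  ->setViewer($viewer)
  ->withAuthorPHIDs(array($viewer->getPHID()))
  ->withNames(array($name))
  ->withLengthBetween($length, $length)
  ->withIsPartial(true)
  ->setLimit(1)
  ->executeOne();
```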
Test Plan:
- Forced chunking on.
- Started uploading a chunked file.
- Closed the browser window.
- Dropped it into a new window.
- Upload resumed //(!!!)//
- Did this again.
- Downloaded the final file, which successfully reconstructed the original file.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: joshuaspence, chad, epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D12070
Summary:
Ref T7149. This adds chunking support to drag-and-drop uploads. It doesn't activate right now unless you hack things up, since the chunk engine is still hard-coded as disabled.
The overall approach is the same as `arc upload` in D12061, with some slight changes to the API return values to avoid a few extra HTTP calls.
Test Plan:
- Enabled chunk engine.
- Uploaded some READMEs in a bunch of tiny 32-byte chunks.
- Worked out of the box in Safari, Chrome, Firefox.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D12066
Summary:
Ref T7149. This isn't complete and isn't active yet, but does basically work. I'll shore it up in the next few diffs.
The new workflow goes like this:
> Client, file.allocate(): I'd like to upload a file with length L, metadata M, and hash H.
Then the server returns `upload` (a boolean) and `filePHID` (a PHID). These mean:
| upload | filePHID | means |
|---|---|---|
| false | false | Server can't accept file. |
| false | true | File data already known, file created from hash. |
| true | false | Just upload normally. |
| true | true | Query chunks to start or resume a chunked upload. |
All but the last case are uninteresting and work like existing uploads with `file.uploadhash` (which we can eventually deprecate).
In the last case:
> Client, file.querychunks(): Give me a list of chunks that I should upload.
This returns all the chunks for the file. Chunks have a start byte, an end byte, and a "complete" flag to indicate that the server already has the data.
Then, the client fills in chunks by sending them:
> Client, file.uploadchunk(): Here is the data for one chunk.
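Put together, a client drives the exchange roughly like this (a sketch; parameter and key names are illustrative):

```
// Assume $conduit is an authenticated ConduitClient and $data holds
// the full file body.
$result = $conduit->callMethodSynchronous('file.allocate', array(
  'name'          => $name,
  'contentLength' => strlen($data),
  'contentHash'   => $hash,
));

if ($result['upload'] && $result['filePHID']) {
  // Chunked upload: ask which chunks the server wants, then fill in
  // only the missing ones (this is also what makes resume work).
  $chunks = $conduit->callMethodSynchronous('file.querychunks', array(
    'filePHID' => $result['filePHID'],
  ));
  foreach ($chunks as $chunk) {
    if ($chunk['complete']) {
      continue;
    }
    $conduit->callMethodSynchronous('file.uploadchunk', array(
      'filePHID'  => $result['filePHID'],
      'byteStart' => $chunk['byteStart'],
      'data'      => substr(
        $data,
        $chunk['byteStart'],
        $chunk['byteEnd'] - $chunk['byteStart']),
    ));
  }
}
```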
Some of this doesn't work yet, or comes with caveats:
- I haven't tested resume much.
- Files need an "isPartial()" flag for partial uploads, and the UI needs to respect it.
- The JS client needs to become chunk-aware.
- Chunk size is set crazy low to make testing easier.
- There are some debugging flags that I'll remove soon-ish.
- Downloading works, but still streams the whole file into memory.
- This storage engine is disabled by default (hardcoded as a unit test engine) because it's still sketchy.
- Need some code to remove the "isPartial" flag when the last chunk is uploaded.
- Maybe do checksumming on chunks.
Test Plan:
- Hacked up `arc upload` (see the next diff) to be chunk-aware and uploaded a README in 18 32-byte chunks. Then downloaded it. Got the same file back that I uploaded.
- File UI now shows some basic chunk info for chunked files:
{F336434}
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: joshuaspence, epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D12060