1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-12-02 11:42:42 +01:00
Commit graph

191 commits

Author SHA1 Message Date
epriestley
c25a8fabfc Remove all "ObjectHasFile" edge reads and writes
Summary: Ref T13603. Migrate all code which interacts with the "ObjectHasFile" edge to use the "Attachments" table instead.

Test Plan:
  - Edited a paste's view policy, confirmed associated file secret was scrambled.
  - Verified I could still view paste content as a user who could not naturally view the underlying file.

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13603

Differential Revision: https://secure.phabricator.com/D21819
2022-05-19 13:21:04 -07:00
epriestley
d017f3f210 Double-write file attachment to old "edge" storage and new "attachment" storage
Summary: Ref T13603. This adds a second write to new "attachment" storage to all writers except one in Paste, which creates the file inline.

Test Plan:
  - Updated a macro image, confirmed a write to "attachment" storage (transaction pathway).
  - Updated a blog profile image, confirmed a write to "attachment" storage (legacy pathway).

Maniphest Tasks: T13603

Differential Revision: https://secure.phabricator.com/D21816
2022-05-19 13:21:03 -07:00
epriestley
7fcc0f9ebd Remove "PhabricatorFile->detachFromObject()"
Summary:
Ref T13603. Currently, files are sometimes detached from objects. For example, when you change the image for a Macro, the old image is detached.

This is wrong: the image should remain attached so users who can view the macro can view the complete "alice change the image from X to Y" transaction. To be able to understand the change that was applied, you need to be able to view both files.

All workflows which currently detach files aren't conistent with the modern way applications behave, except maybe one callsite in a unit test, and that one's kind of moot.

Get rid of this stuff and just use PHID extraction to perform file attachment in all cases.

Test Plan: Created and edited macros, verified files were properly attached and remained attached across edits.

Maniphest Tasks: T13603

Differential Revision: https://secure.phabricator.com/D21815
2022-05-19 13:21:03 -07:00
epriestley
cfa42c5e65 Add database storage for a dedicated file attachment table
Summary: Ref T13603. Prepare to move this relationship out of edge storage into dedicated storage so it is easier to formalize better in the UI.

Test Plan: Ran `bin/storage upgrade`.

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13603

Differential Revision: https://secure.phabricator.com/D21813
2022-05-19 13:21:02 -07:00
epriestley
bfe7cdc5a2 Provide default image alt text in more contexts and support custom alt text
Summary:
Ref T13629.

  - Allow files to have custom alt text.
  - If a file doesn't have alt text, try to generate a plausible default alt text with the information we have.

Test Plan:
  - Viewed image files in DocumentEngine diffs, files, `{Fxxx}` embeds, and lightboxes.
  - Saw default alt text in all cases, or custom alt text if provided.
  - Set, modified, and removed file alt text. Viewed timeline and feed.
  - Pulled alt text with "conduit.search".

Maniphest Tasks: T13629

Differential Revision: https://secure.phabricator.com/D21596
2021-03-04 16:51:23 -08:00
epriestley
eab561bb87 Add "uri" to the API results for File objects
Summary: Ref T13528. This supports "arc upload --browse ...".

Test Plan: Called "file.search", saw URIs in results.

Maniphest Tasks: T13528

Differential Revision: https://secure.phabricator.com/D21204
2020-05-01 09:11:59 -07:00
epriestley
11cf8f05b1 Remove "getApplicationTransactionObject()" from ApplicationTransactionInterface
Summary:
Depends on D19919. Ref T11351. This method appeared in D8802 (note that "get...Object" was renamed to "get...Transaction" there, so this method was actually "new" even though a method of the same name had existed before).

The goal at the time was to let Harbormaster post build results to Diffs and have them end up on Revisions, but this eventually got a better implementation (see below) where the Harbormaster-specific code can just specify a "publishable object" where build results should go.

The new `get...Object` semantics ultimately broke some stuff, and the actual implementation in Differential was removed in D10911, so this method hasn't really served a purpose since December 2014. I think that broke the Harbormaster thing by accident and we just lived with it for a bit, then Harbormaster got some more work and D17139 introduced "publishable" objects which was a better approach. This was later refined by D19281.

So: the original problem (sending build results to the right place) has a good solution now, this method hasn't done anything for 4 years, and it was probably a bad idea in the first place since it's pretty weird/surprising/fragile.

Note that `Comment` objects still have an unrelated method with the same name. In that case, the method ties the `Comment` storage object to the related `Transaction` storage object.

Test Plan: Grepped for `getApplicationTransactionObject`, verified that all remaining callsites are related to `Comment` objects.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T11351

Differential Revision: https://secure.phabricator.com/D19920
2018-12-20 15:16:19 -08:00
epriestley
937e88c399 Remove obsolete, no-op implementations of "willRenderTimeline()"
Summary:
Depends on D19918. Ref T11351. In D19918, I removed all calls to this method. Now, remove all implementations.

All of these implementations just `return $timeline`, only the three sites in D19918 did anything interesting.

Test Plan: Used `grep willRenderTimeline` to find callsites, found none.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T11351

Differential Revision: https://secure.phabricator.com/D19919
2018-12-20 15:04:49 -08:00
epriestley
91abc0f027 Stop indexing the chunk data objects for large Files stored in multiple chunks
Summary:
Ref T13164. See PHI766. Currently, when file data is stored in small chunks, we submit each chunk to the indexing engine.

However, chunks are never surfaced directly and can never be found via any search/query, so this work is pointless. Just skip it.

(It would be nice to do this a little more formally on `IndexableInterface` or similar as `isThisAnIndexableObject()`, but we'd have to add like a million empty "yes, index this always" methods to do that, and it seems unlikely that we'll end up with too many other objects like these.)

Test Plan:
  - Ran `bin/harbormaster rebuild-log --id ... --force` before and after change, saw about 200 fewer queries after the change.
  - Uploaded a uniquely named file and searched for it to make sure I didn't break that.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13164

Differential Revision: https://secure.phabricator.com/D19563
2018-08-03 14:36:12 -07:00
Kenneth Endfinger
6bdd74584e
Fix file encoding migration
Summary:
See [[ https://discourse.phabricator-community.org/t/file-encryption-corruption-when-trying-to-encode-existing-files/1605 | Discourse ]]

When migrating to aes-256-cbc, integrity hashes were not updated, so data was not properly

Test Plan:
I ran [[ https://gist.github.com/kaendfinger/3e0d78350af0ebe4e74b2c8a79707bae | this test script ]] to ensure it worked.
I created some files with lipsum, ensured that after encoding them with aes-256-cbc, they were not able to be cat'd.
After applying this patch and rerunning the script, it worked successfully.

Reviewers: epriestley, amckinley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin

Tags: #files, #storage

Differential Revision: https://secure.phabricator.com/D19533
2018-07-23 17:41:10 -05:00
epriestley
70056a9072 When creating a file by downloading a URI, truncate the length of the default name
Summary:
See <https://discourse.phabricator-community.org/t/embedding-external-images-url-show-error-for-long-urls/1339>.

When we download a file from a URI, we provide a default name based on the URI. However, if the URI is something like `http://example.com/very-very-very-....-long.jpg` with more than 255 characters, we may suggest a name which won't fit into the `name` column of `PhabricatorFile`.

Instead, suggest a default name no longer than 64 bytes.

Test Plan:
  - Used the `{image ...}` example from the Discourse report locally; got an image with a truncated name.
  - Used a normal `{image ...}`, got an image file with a normal name.

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D19353
2018-04-12 13:29:53 -07:00
epriestley
fb4ce851c4 Add a PDF document "rendering" engine
Summary:
Depends on D19251. Ref T13105. This adds rendering engine support for PDFs.

It doesn't actually render them, it just renders a link which you can click to view them in a new window. This is much easier than actually rendering them inline and at least 95% as good most of the time (and probably more-than-100%-as-good some of the time).

This makes PDF a viewable MIME type by default and adds a narrow CSP exception for it. See also T13112.

Test Plan:
  - Viewed PDFs in Files, got a link to view them in a new tab.
  - Clicked the link in Safari, Chrome, and Firefox; got inline PDFs.
  - Verified primary CSP is still `object-src 'none'` with `curl ...`.
  - Interacted with the vanilla lightbox element to check that it still works.

Maniphest Tasks: T13105

Differential Revision: https://secure.phabricator.com/D19252
2018-03-23 07:14:17 -07:00
epriestley
3c4f31e4b9 Dynamically composite favicons from customizable sources
Summary: Ref T13103. Make favicons customizable, and perform dynamic compositing to add marker to indicate things like "unread messages".

Test Plan: Viewed favicons in Safari, Firefox and Chrome. With unread messages, saw pink dot composited into icon.

Maniphest Tasks: T13103

Differential Revision: https://secure.phabricator.com/D19209
2018-03-12 15:28:41 -07:00
epriestley
ab579f2511 Never generate file download forms which point to the CDN domain, tighten "form-action" CSP
Summary:
Depends on D19155. Ref T13094. Ref T4340.

We can't currently implement a strict `form-action 'self'` content security policy because some file downloads rely on a `<form />` which sometimes POSTs to the CDN domain.

Broadly, stop generating these forms. We just redirect instead, and show an interstitial confirm dialog if no CDN domain is configured. This makes the UX for installs with no CDN domain a little worse and the UX for everyone else better.

Then, implement the stricter Content-Security-Policy.

This also removes extra confirm dialogs for downloading Harbormaster build logs and data exports.

Test Plan:
  - Went through the plain data export, data export with bulk jobs, ssh key generation, calendar ICS download, Diffusion data, Paste data, Harbormaster log data, and normal file data download workflows with a CDN domain.
  - Went through all those workflows again without a CDN domain.
  - Grepped for affected symbols (`getCDNURI()`, `getDownloadURI()`).
  - Added an evil form to a page, tried to submit it, was rejected.
  - Went through the ReCaptcha and Stripe flows again to see if they're submitting any forms.

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13094, T4340

Differential Revision: https://secure.phabricator.com/D19156
2018-02-28 17:20:12 -08:00
epriestley
8b8a3142b3 Support export of data in files larger than 8MB
Summary:
Depends on D18952. Ref T13049. For files larger than 8MB, we need to engage the chunk storage engine. `PhabricatorFile::newFromFileData()` always writes a single chunk, and can't handle files larger than the mandatory chunk threshold (8MB).

Use `IteratorUploadSource`, which can, and "stream" the data into it. This should raise the limit from 8MB to 2GB (maximum size of a string in PHP).

If we need to go above 2GB we could stream CSV and text pretty easily, and JSON without too much trouble, but Excel might be trickier. Hopefully no one is trying to export 2GB+ datafiles, though.

Test Plan:
  - Changed the JSON exporter to just export 8MB of the letter "q": `return str_repeat('q', 1024 * 1024 * 9);`.
  - Before change: fatal, "no storage engine can store this file".
  - After change: export works cleanly.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13049

Differential Revision: https://secure.phabricator.com/D18953
2018-01-29 15:58:34 -08:00
epriestley
e1d6bad864 Stop trying to assess the image dimensions of large files and file chunks
Summary:
Depends on D18828. Ref T7789. See <https://discourse.phabricator-community.org/t/git-lfs-fails-with-large-images/584>.

Currently, when you upload a large (>4MB) image, we may try to assess the dimensions for the image and for each individual chunk.

At best, this is slow and not useful. At worst, it fatals or consumes a ton of memory and I/O we don't need to be using.

Instead:

  - Don't try to assess dimensions for chunked files.
  - Don't try to assess dimensions for the chunks themselves.
  - Squelch errors for bad data, etc., that `gd` can't actually read, since we recover sensibly.

Test Plan:
  - Created a 2048x2048 PNG in Photoshop using the "Random Noise" filter which weighs 8.5MB.
  - Uploaded it.
  - Before patch: got complaints in log about `imagecreatefromstring()` failing, although the actual upload went OK in my environment.
  - After patch: clean log, no attempt to detect the size of a big image.
  - Also uploaded a small image, got dimensions detected properly still.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T7789

Differential Revision: https://secure.phabricator.com/D18830
2017-12-18 09:17:32 -08:00
epriestley
b04d9fdc0b Add a missing key to PhabricatorFile for destroying Files
Summary: See PHI176. Depends on D18733. We issue a query when deleting files that currently doesn't hit any keys.

Test Plan:
Ran `./bin/remove destroy --force --trace F56376` to get the query.

Ran ##SELECT * FROM `file` WHERE storageEngine = 'blob' AND storageHandle = '23366' LIMIT 1## before and after the change.

Before:

```
mysql> explain SELECT * FROM `file` WHERE storageEngine = 'blob' AND storageHandle = '23366' LIMIT 1;
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | file  | ALL  | NULL          | NULL | NULL    | NULL | 33866 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.01 sec)
```

After:

```
mysql> explain SELECT * FROM `file` WHERE storageEngine = 'blob' AND storageHandle = '23366' LIMIT 1;
+----+-------------+-------+------+---------------+------------+---------+-------------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key        | key_len | ref         | rows | Extra                              |
+----+-------------+-------+------+---------------+------------+---------+-------------+------+------------------------------------+
|  1 | SIMPLE      | file  | ref  | key_engine    | key_engine | 388     | const,const |  190 | Using index condition; Using where |
+----+-------------+-------+------+---------------+------------+---------+-------------+------+------------------------------------+
1 row in set (0.00 sec)
```

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D18734
2017-10-26 18:20:12 -07:00
epriestley
9777c66576 Allow Phabricator to run with "enable_post_data_reading" disabled
Summary:
Ref T13008. Depends on D18701. The overall goal here is to make turning `enable_post_data_reading` off not break things, so we can run rate limiting checks before we read file uploads.

The biggest blocker for this is that turning it off stops `$_FILES` from coming into existence.

This //appears// to mostly work. Specifically:

  - Skip the `max_post_size` check when POST is off, since it's meaningless.
  - Don't read or scrub $_POST at startup when POST is off.
  - When we rebuild REQUEST and POST before processing requests, do multipart parsing if we need to and rebuild FILES.
  - Skip the `is_uploaded_file()` check if we built FILES ourselves.

This probably breaks a couple of small things, like maybe `__profile__` and other DarkConsole triggers over POST, and probably some other weird stuff. The parsers may also need more work than they've received so far.

I also need to verify that this actually works (i.e., lets us run code without reading the request body) but I'll include that in the change where I update the actual rate limiting.

Test Plan:
  - Disabled `enable_post_data_reading`.
  - Uploaded a file with a vanilla upload form (project profile image).
  - Uploaded a file with drag and drop.
  - Used DarkConsole.
  - Submitted comments.
  - Created a task.
  - Browsed around.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13008

Differential Revision: https://secure.phabricator.com/D18702
2017-10-13 13:11:11 -07:00
epriestley
17fc447503 Don't compute MIME type of noninitial chunks from diffusion.filecontentquery
Summary:
Ref T12857. This is generally fairly fuzzy for now, but here's something concrete: when we build a large file with `diffusion.filecontentquery`, we compute the MIME type of all chunks, not just the initial chunk.

Instead, pass a dummy MIME type to non-initial chunks so we don't try to compute them. This mirrors logic elsewhere, in `file.uploadchunk`. This should perhaps be centralized at some point, but it's a bit tricky since the file doesn't know that it's a chunk until later.

Also, clean up the `TempFile` immediately -- this shouldn't actually affect anything, but we don't need it to live any longer than this.

Test Plan:
  - Made `hashFileContent()` return `null` to skip the chunk cache.
  - Added `phlog()` to the MIME type computation.
  - Loaded a 12MB file in Diffusion.
  - Before patch: Saw 3x MIME type computations, one for each 4MB chunk.
  - After patch: Saw 1x MIME type computation, for initial chunk only.

Reviewers: chad, amckinley

Reviewed By: chad

Maniphest Tasks: T12857

Differential Revision: https://secure.phabricator.com/D18138
2017-06-20 05:45:20 -07:00
Chad Little
d6a620be45 Update Maniphest for modular transactions
Summary: Ref T12671. This modernized Maniphest transactions to modular transactions.

Test Plan:
- Create Task
- Edit Task
- Raise Priority
- Change Status
- Merge as a duplicate
- Create Subtask
- Claim Task
- Assign Project
- Move on Workboard
- Set a cover image
- Assign story points
- Change story points
- Generate lots via lipsum
- Bulk edit tasks
- Leave comments
- Award Token

I'm sure I'm missing something.

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: hazelyang, Korvin

Maniphest Tasks: T12671

Differential Revision: https://secure.phabricator.com/D17844
2017-05-15 10:29:06 -07:00
Austin McKinley
ece9579d25 Switch File deletion to use ModularTransactions
Summary: Fixes T12587. Adds a new `PhabricatorFileDeleteTransaction` that enqueues `File` delete tasks.

Test Plan:
  - hack `PhabricatorFileQuery` to ignore isDeleted state
  - stop daemons
  - upload a file, delete it from the UI
  - check that the DB has updated isDeleted = 1
  - check timeline rendering in `File` detail view
  - start daemons
  - confirm rows are deleted from DB

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin, thoughtpolice

Maniphest Tasks: T12587

Differential Revision: https://secure.phabricator.com/D17723
2017-04-18 13:01:51 -07:00
Austin McKinley
be00264ae7 Make daemons perform file deletion
Summary:
Deletion is a possibly time-intensive process, especially with large
files that are backed by high-latency, chunked storage (such as
S3). Even ~200mb objects take minutes to delete, which makes for an
unhappy experience. Fixes T10828.

Test Plan:
Delete a large file, and stare in awe of the swiftness with
which I am redirected to the main file application.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: thoughtpolice, Korvin

Maniphest Tasks: T10828

Differential Revision: https://secure.phabricator.com/D15743
2017-04-18 11:09:41 -07:00
Austin McKinley
b54adc6161 Kick off indexing for File objects on creation
Summary: Ensures that newly-made `File` objects get indexed into the new ngrams index. Fixes T8788.

Test Plan:
  - uploaded a file with daemons stopped; confirmed no new rows in ngrams table
  - started daemons; confirmed indexing of previously-uploaded files happened
  - uploaded a new file with daemons running; confirmed it got added to the index

Not sure how to test the changes to `PhabricatorFileUploadSource->writeChunkedFile()` and `PhabricatorChunkedFileStorageEngine->allocateChunks()`. I spent a few minutes trying to find their callers, but the first looks like it requires a Diffusion repo and the 2nd is only accessible via Conduit. I can test that stuff if necessary, but it's such a small change that I'm not worried about it.

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Maniphest Tasks: T8788

Differential Revision: https://secure.phabricator.com/D17718
2017-04-18 08:38:34 -07:00
Austin McKinley
976fbee877 Implement ngram search for File objects
Summary: Follows the outline in D15656 for implementing ngram search for names of File objects. Also created FileFullTextEngine, because without implementing `PhabricatorFulltextInterface`, `./bin/search` complains that `File` is not an indexable type.

Test Plan:
  - ran `./bin/storage upgrade` to apply the schema change
  - confirmed the presence of a new `file_filename_ngrams` table
  - added a couple file objects
  - ran `bin/search index --type file --force`
  - confirmed the presence of rows in `file_filename_ngrams`
  - did a few keyword searches and saw expected results

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Maniphest Tasks: T8788

Differential Revision: https://secure.phabricator.com/D17702
2017-04-17 17:37:20 -07:00
epriestley
a7a068f84c Correct two parameter strictness issues with file uploads
Summary:
Fixes T12531. Strictness fallout from adding typechecking in D17616.

  - `chunkedHash` is not a real parameter, so the new typechecking was unhappy about it.
  - `mime-type` no longer allows `null`.

Test Plan:
  - Ran `arc upload --conduit-uri ... 12MB.zero` on a 12MB file full of zeroes.
  - Before patch: badness, failure, fallback to one-shot uploads.
  - After patch: success and glory.

Reviewers: chad

Subscribers: joshuaspence

Maniphest Tasks: T12531

Differential Revision: https://secure.phabricator.com/D17651
2017-04-10 16:01:15 -07:00
epriestley
08a4225437 Provide "bin/files integrity" for debugging, maintaining and backfilling integrity hashes
Summary:
Ref T12470. Provides an "integrity" utility which runs in these modes:

  - Verify: check that hashes match.
  - Compute: backfill missing hashes.
  - Strip: remove hashes. Useful for upgrading across a hash change.
  - Corrupt: intentionally corrupt hashes. Useful for debugging.
  - Overwrite: force hash recomputation.

Users normally shouldn't need to run any of this stuff, but this provides a reasonable toolkit for managing integrity hashes.

I'll recommend existing installs use `bin/files integrity --compute all` in the upgrade guidance to backfill hashes for existing files.

Test Plan:
  - Ran the script in many modes against various files, saw expected operation, including:
  - Verified a file, corrupted it, saw it fail.
  - Verified a file, stripped it, saw it have no hash.
  - Stripped a file, computed it, got a clean verify.
  - Stripped a file, overwrote it, got a clean verify.
  - Corrupted a file, overwrote it, got a clean verify.
  - Overwrote a file, overwrote again, got a no-op.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12470

Differential Revision: https://secure.phabricator.com/D17629
2017-04-06 15:42:43 -07:00
epriestley
63828f5806 Store and verify content integrity checksums for files
Summary:
Ref T12470. This helps defuse attacks where an adversary can directly take control of whatever storage engine files are being stored in and change data there. These attacks would require a significant level of access.

Such attackers could potentially attack ranges of AES-256-CBC encrypted files by using Phabricator as a decryption oracle if they were also able to compromise a Phabricator account with read access to the files.

By storing a hash of the data (and, in the case of AES-256-CBC files, the IV) when we write files, and verifying it before we decrypt or read them, we can detect and prevent this kind of tampering.

This also helps detect mundane corruption and integrity issues.

Test Plan:
  - Added unit tests.
  - Uploaded new files, saw them get integrity hashes.
  - Manually corrupted file data, saw it fail. Used `bin/files cat --salvage` to read it anyway.
  - Tampered with IVs, saw integrity failures.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12470

Differential Revision: https://secure.phabricator.com/D17625
2017-04-05 11:12:31 -07:00
epriestley
45fc4f6f64 Iterate over ranges correctly for encryped files
Summary:
Fixes T12079. Currently, when a file is encrypted and a request has "Content-Range", we apply the range first, //then// decrypt the result. This doesn't work since you can't start decrypting something from somewhere in the middle (at least, not with our cipher selection).

Instead: decrypt the result, //then// apply the range.

Test Plan: Added failing unit tests, made them pass

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12079

Differential Revision: https://secure.phabricator.com/D17623
2017-04-05 09:56:30 -07:00
epriestley
58011a4e8e Upgrade File content hashing to SHA256
Summary:
Ref T12464. This defuses any possible SHA1-collision attacks by using SHA256, for which there is no known collision.

(SHA256 hashes are larger -- 256 bits -- so expand the storage column to 64 bytes to hold them.)

Test Plan:
  - Uploaded the same file twice, saw the two files generate the same SHA256 content hash and use the same underlying data.
  - Tried with a fake hash algorihtm ("quackxyz") to make sure the failure mode worked/degraded correctly if we don't have SHA256 for some reason. Got two valid files with two copies of the same data, as expected.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12464

Differential Revision: https://secure.phabricator.com/D17620
2017-04-04 16:23:08 -07:00
epriestley
440ef5b7a7 Remove SHA1 file content hashing and make Files work without any hashing
Summary:
Ref T12464. We currently use SHA1 to detect when two files have the same content so we don't have to store two copies of the data.

Now that a SHA1 collision is known, this is theoretically dangerous. T12464 describes the shape of a possible attack.

Before replacing this with something more robust, shore things up so things work correctly if we don't hash at all. This mechanism is entirely optional; it only helps us store less data if some files are duplicates.

(This mechanism is also less important now than it once was, before we added temporary files.)

Test Plan: Uploaded multiple identical files, saw the uploads work and the files store separate copies of the same data.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12464

Differential Revision: https://secure.phabricator.com/D17619
2017-04-04 16:22:10 -07:00
epriestley
873b39be82 Remove PhabricatorFile::buildFromFileDataOrHash()
Summary:
Ref T12464. This is a very old method which can return an existing file instead of creating a new one, if there's some existing file with the same content.

In the best case this is a bad idea. This being somewhat reasonable predates policies, temporary files, etc. Modern methods like `newFromFileData()` do this right: they share underlying data in storage, but not the actual `File` records.

Specifically, this is the case where we get into trouble:

  - I upload a private file with content "X".
  - You somehow generate a file with the same content by, say, viewing a raw diff in Differential.
  - If the diff had the same content, you get my file, but you don't have permission to see it or whatever so everything breaks and is terrible.

Just get rid of this.

Test Plan:
  - Generated an SSH key.
  - Viewed a raw diff in Differential.
  - (Did not test Phragment.)

Reviewers: chad

Reviewed By: chad

Subscribers: hach-que

Maniphest Tasks: T12464

Differential Revision: https://secure.phabricator.com/D17617
2017-04-04 16:18:00 -07:00
epriestley
45b386596e Make the Files "TTL" API more structured
Summary:
Ref T11357. When creating a file, callers can currently specify a `ttl`. However, it isn't unambiguous what you're supposed to pass, and some callers get it wrong.

For example, to mean "this file expires in 60 minutes", you might pass either of these:

  - `time() + phutil_units('60 minutes in seconds')`
  - `phutil_units('60 minutes in seconds')`

The former means "60 minutes from now". The latter means "1 AM, January 1, 1970". In practice, because the GC normally runs only once every four hours (at least, until recently), and all the bad TTLs are cases where files are normally accessed immediately, these 1970 TTLs didn't cause any real problems.

Split `ttl` into `ttl.relative` and `ttl.absolute`, and make sure the values are sane. Then correct all callers, and simplify out the `time()` calls where possible to make switching to `PhabricatorTime` easier.

Test Plan:
- Generated an SSH keypair.
- Viewed a changeset.
- Viewed a raw diff.
- Viewed a commit's file data.
- Viewed a temporary file's details, saw expiration date and relative time.
- Ran unit tests.
- (Didn't really test Phragment.)

Reviewers: chad

Reviewed By: chad

Subscribers: hach-que

Maniphest Tasks: T11357

Differential Revision: https://secure.phabricator.com/D17616
2017-04-04 16:16:28 -07:00
epriestley
2369fa38e1 Provide a modern ("v3") API for querying files ("file.search")
Summary: Ref T11357. Implements a modern `file.search` for files, and freezes `file.info`.

Test Plan: Ran `file.search` from the Conduit console.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11357

Differential Revision: https://secure.phabricator.com/D17612
2017-04-04 16:15:36 -07:00
epriestley
260a08a128 Move Files editing and commenting to EditEngine
Summary:
Ref T11357. This moves editing and commenting (but not creation) to EditEngine.

Since only the name is really editable, this is pretty straightforward.

Test Plan: Renamed files; commented on files.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11357

Differential Revision: https://secure.phabricator.com/D17611
2017-04-04 16:15:11 -07:00
epriestley
8500f78e45 Move Files to ModularTransactions
Summary: Ref T11357. A lot of file creation doesn't go through transactions, so we only actually have one real transaction type: editing a file name.

Test Plan:
Created and edited files.

{F4559287}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11357

Differential Revision: https://secure.phabricator.com/D17610
2017-04-04 10:25:05 -07:00
epriestley
706c21375e Remove empty implementations of describeAutomaticCapabilities()
Summary:
This has been replaced by `PolicyCodex` after D16830. Also:

  - Rebuild Celerity map to fix grumpy unit test.
  - Fix one issue on the policy exception workflow to accommodate the new code.

Test Plan:
  - `arc unit --everything`
  - Viewed policy explanations.
  - Viewed policy errors.

Reviewers: chad

Reviewed By: chad

Subscribers: hach-que, PHID-OPKG-gm6ozazyms6q6i22gyam

Differential Revision: https://secure.phabricator.com/D16831
2016-11-09 15:24:22 -08:00
Josh Cox
eea540c5e4 Endpoint+controller for a remarkup image proxy
Summary:
Ref T4190. Currently only have the endpoint and controller working. I added caching so subsequent attempts to proxy the same image should result in the same redirect URL. Still need to:

- Write a remarkup rule that uses the endpoint

Test Plan: Hit /file/imageproxy/?uri=http://i.imgur.com/nTvVrYN.jpg and are served the picture

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin, epriestley, yelirekim

Maniphest Tasks: T4190

Differential Revision: https://secure.phabricator.com/D16581
2016-09-23 10:28:24 -04:00
epriestley
af5769a6be Add a "--copy" flag to "bin/files migrate"
Summary:
Ref T11596. When exporting data from the Phacility cluster, we `bin/files migrate` data from S3 into a database dump on the `aux` tier.

With current semantics, this //moves// the data and destroys it in S3.

Add a `--copy` flag to //copy// the data instead. This leaves the old copy around, which is what we want for exports.

Test Plan:
  - Ran `bin/files migrate` to go from `blob` to `disk` with `--copy`. Verified a copy was left in the database.
  - Copied it back, verified a copy was left on disk (total: 2 database copies, 1 disk copy).
  - Moved it back without copy, verified database was destroyed and disk was created (total: 1 database copy, 2 disk copies).
  - Moved it back without copy, verified local disk was destroyed and blob was created (total: 2 datbabase copies, 1 disk copy).

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11596

Differential Revision: https://secure.phabricator.com/D16497
2016-09-06 13:53:59 -07:00
epriestley
56bd762dd3 Allow file comments to be edited
Summary:
Fixes T10750. Files have some outdated cache/key code which prevents recording an edit history on file comments.

Remove this ancient cruft.

(Users must `bin/storage adjust` after upgrading to this patch to reap the benefits.)

Test Plan:
  - Ran `bin/storage adjust`.
  - Edited a comment in Files.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10750

Differential Revision: https://secure.phabricator.com/D16312
2016-07-18 16:17:43 -07:00
epriestley
8ad61d0150 Simplify "builtin file" management and recover from races
Summary:
Fixes T11307. Fixes T8124. Currently, builtin files are tracked by using a special transform with an invalid source ID.

Just use a dedicated column instead. The transform thing is too clever/weird/hacky and exposes us to issues with the "file" and "transform" tables getting out of sync (possibly the issue in T11307?) and with race conditions.

Test Plan:
  - Loaded profile "edit picture" page, saw builtins.
  - Deleted all builtin files, put 3 second sleep in the storage engine write, loaded profile page in two windows.
    - Before patch: one of them failed with a race.
    - After patch: both of them loaded.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T8124, T11307

Differential Revision: https://secure.phabricator.com/D16271
2016-07-11 09:25:34 -07:00
epriestley
01862b8f23 Detect the MIME type of large files by examining the first chunk
Summary:
Fixes T11242. See that task for detailed discussion.

Previously, it didn't particularly matter that we don't MIME detect chunked files since they were all just big blobs of junk (PSDs, zips/tarballs, whatever) that we handled uniformly.

However, videos are large and the MIME type also matters.

  - Detect the overall mime type by detecitng the MIME type of the first chunk. This appears to work properly, at least for video.
  - Skip mime type detection on other chunks, which we were performing and ignoring. This makes uploading chunked files a little faster since we don't need to write stuff to disk.

Test Plan:
Uploaded a 50MB video locally, saw it as chunks with a "video/mp4" mime type, played it in the browser in Phabricator as an embedded HTML 5 video.

{F1706837}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11242

Differential Revision: https://secure.phabricator.com/D16204
2016-06-30 13:57:39 -07:00
epriestley
67084a6953 Support AES256 at-rest encryption in Files
Summary:
Ref T11140. This makes encryption actually work:

  - Provide a new configuation option, `keyring`, for specifying encryption keys.
  - One key may be marked as `default`. This activates AES256 encryption for Files.
  - Add `bin/files generate-key`. This is helps when generating valid encryption keys.
  - Add `bin/files encode`. This changes the storage encoding of a file, and helps test encodings and migrate existing data.
  - Add `bin/files cycle`. This re-encodes the block key with a new master key, if your master key leaks or you're just paraonid.
  - Document all these options and behaviors.

Test Plan:
  - Configured a bad `keyring`, hit a bunch of different errors.
  - Used `bin/files generate-key` to try to generate bad keys, got appropriate errors ("raw doesn't support keys", etc).
  - Used `bin/files generate-key` to generate an AES256 key.
  - Put the new AES256 key into the `keyring`, without `default`.
  - Uploaded a new file, verified it still uploaded as raw data (no `default` key yet).
  - Used `bin/files encode` to change a file to ROT13 and back to raw. Verified old data got deleted and new data got stored properly.
  - Used `bin/files encode --key ...` to explicitly convert a file to AES256 with my non-default key.
  - Forced a re-encode of an AES256 file, verified the old data was deleted and a new key and IV were generated.
  - Used `bin/files cycle` to try to cycle raw/rot13 files, got errors.
  - Used `bin/files cycle` to cycle AES256 files. Verified metadata changed but file data did not. Verified file data was still decryptable with metadata.
  - Ran `bin/files cycle --all`.
  - Ran `encode` and `cycle` on chunked files, saw commands fail properly. These commands operate on the underlying data blocks, not the chunk metadata.
  - Set key to `default`, uploaded a file, saw it stored as AES256.
  - Read documentation.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11140

Differential Revision: https://secure.phabricator.com/D16127
2016-06-16 08:08:56 -07:00
epriestley
39afc0f97c Add an AES256 storage format for at-rest encryption
Summary:
Ref T11140. This doesn't do anything yet since there's no way to enable it and no way to store master keys.

Those are slightly tougher problems and I'm not totally satisfied that I have an approach I really like for either problem, so I may wait for a bit before tackling them. Once they're solved, this does the mechanical encrypt/decrypt stuff, though.

This design is substantially similar to the AWS S3 server-side encryption design, and intended as an analog for it. The decisions AWS has made in design generally seem reasonable to me.

Each block of file data is encrypted with a unique key and a unique IV, and then that key and IV are encrypted with the master key (and a distinct, unique IV). This is better than just encrypting with the master key directly because:

  - You can rotate the master key later and only need to re-encrypt a small amount of key data (about 48 bytes per file chunk), instead of re-encrypting all of the actual file data (up to 4MB per file chunk).
  - Instead of putting the master key on every server, you can put it on some dedicated keyserver which accepts encrypted keys, decrypts them, and returns plaintext keys, and can send it 32-byte keys for decryption instead of 4MB blocks of file data.
  - You have to compromise the master key, the database, AND the file store to get the file data. This is probably not much of a barrier realistically, but it does make attacks very slightly harder.

The "KeyRing" thing may change once I figure out how I want users to store master keys, but it was the simplest approach to get the unit tests working.

Test Plan:
  - Ran unit tests.
  - Dumped raw data, saw encrypted blob.
  - No way to actually use this in the real application yet so it can't be tested too extensively.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11140

Differential Revision: https://secure.phabricator.com/D16124
2016-06-16 08:05:57 -07:00
epriestley
1049feb0ed Add support to Files for file storage formats, to support encryption-at-rest
Summary:
Ref T11140. When reading and writing files, we optionally apply a "storage format" to them.

The default format is "raw", which means we just store the raw data.

This change modularizes formats and adds a "rot13" format, which proves formatting works and is testable. In the future, I'll add real encryption formats.

Test Plan:
  - Added unit tests.
  - Viewed files in web UI.
  - Changed a file's format to rot13, saw the data get rotated on display.
  - Set default format to rot13:
    - Uploaded a small file, verified data was stored as rot13.
    - Uploaded a large file, verified metadata was stored as "raw" (just a type, no actual data) and blob data was stored as rot13.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11140

Differential Revision: https://secure.phabricator.com/D16122
2016-06-15 11:17:53 -07:00
epriestley
411cf13457 Add Videos to Remarkup
Summary: Ref T6916. Added video to remarkup using D7156 as reference.

Test Plan:
  - Viewed video files (MP4, Ogg) in Safari, Chrome, Firefox (some don't work, e.g., OGG in Safari, but nothing we can really do about that).
  - Used `alt`.
  - Used `autoplay`.
  - Used `loop`.
  - Used `media=audio`.
  - Viewed file detail page.

Reviewers: nateguchi2, chad, #blessed_reviewers

Reviewed By: chad, #blessed_reviewers

Subscribers: asherkin, ivo, joshuaspence, Korvin, epriestley

Tags: #remarkup

Maniphest Tasks: T6916

Differential Revision: https://secure.phabricator.com/D11297
2016-06-07 13:20:25 -07:00
epriestley
f2c36a934e Provide an <input type="file"> control in Remarkup for mobile and users with esoteric windowing systems
Summary:
Ref T5187. This definitely feels a bit flimsy and I'm going to hold it until I cut the release since it changes a couple of things about Workflow in general, but it seems to work OK and most of it is fine.

The intent is described in T5187#176236.

In practice, most of that works like I describe, then the `phui-file-upload` behavior gets some weird glue to figure out if the input is part of the form. Not the most elegant system, but I think it'll hold until we come up with many reasons to write a lot more Javascript.

Test Plan:
Used both drag-and-drop and the upload dialog to upload files in Safari, Firefox and Chrome.

{F1653716}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T5187

Differential Revision: https://secure.phabricator.com/D15953
2016-05-20 16:24:22 -07:00
epriestley
5664c838fb Reduce thumbnail flickering in comment previews
Summary:
Ref T10262. Currently, we always render a tag like this when you `{F123}` an image in remarkup:

```
<img src="/xform/preview/abcdef/" />
```

This either generates the preview or redirects to an existing preview. This is a good behavior in general, because the preview may take a while to generate and we don't want to wait for it to generate on the server side.

However, this flickers a lot in Safari. We might be able to cache this, but we really shouldn't, since the preview URI isn't a legitimately stable/permanent one.

Instead, do a (cheap) server-side check to see if the preview already exists. If it does, return a direct URI. This gives us a stable thumbnail in Safari.

Test Plan:
  - Dragged a dog picture into comment box.
  - Typed text.
  - Thing didn't flicker like crazy all the time in Safari.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10262

Differential Revision: https://secure.phabricator.com/D15646
2016-04-06 15:52:52 -07:00
epriestley
439821c7b2 Don't require one-time tokens to view file resources
Summary:
Ref T10262. This removes one-time tokens and makes file data responses always-cacheable (for 30 days).

The URI will stop working once any attached object changes its view policy, or the file view policy itself changes.

Files with `canCDN` (totally public data like profile images, CSS, JS, etc) use "cache-control: public" so they can be CDN'd.

Files without `canCDN` use "cache-control: private" so they won't be cached by the CDN. They could still be cached by a misbehaving local cache, but if you don't want your users seeing one anothers' secret files you should configure your local network properly.

Our "Cache-Control" headers were also from 1999 or something, update them to be more modern/sane. I can't find any evidence that any browser has done the wrong thing with this simpler ruleset in the last ~10 years.

Test Plan:
  - Configured alternate file domain.
  - Viewed site: stuff worked.
  - Accessed a file on primary domain, got redirected to alternate domain.
  - Verified proper cache headers for `canCDN` (public) and non-`canCDN` (private) files.
  - Uploaded a file to a task, edited task policy, verified it scrambled the old URI.
  - Reloaded task, new URI generated transparently.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10262

Differential Revision: https://secure.phabricator.com/D15642
2016-04-06 14:14:36 -07:00
epriestley
f9836cb646 Scramble file secrets when related objects change policies
Summary:
Ref T10262. Files have an internal secret key which is partially used to control access to them, and determines part of the URL you need to access them. Scramble (regenerate) the secret when:

  - the view policy for the file itself changes (and the new policy is not "public" or "all users"); or
  - the view policy or space for an object the file is attached to changes (and the file policy is not "public" or "all users").

This basically means that when you change the visibility of a task, any old URLs for attached files stop working and new ones are implicitly generated.

Test Plan:
  - Attached a file to a task, used `SELECT * FROM file WHERE id = ...` to inspect the secret.
  - Set view policy to public, same secret.
  - Set view policy to me, new secret.
  - Changed task view policy, new secret.
  - Changed task space, new secret.
  - Changed task title, same old secret.
  - Added and ran unit tests which cover this behavior.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10262

Differential Revision: https://secure.phabricator.com/D15641
2016-04-06 14:14:16 -07:00
epriestley
772c658aac Convert one-time file access tokens to modular token types
Summary: Fixes T10603. This is the last of the ad-hoc temporary tokens.

Test Plan:
  - Used a file token.
  - Viewed type in {nav Config > Temporary Tokens}.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10603

Differential Revision: https://secure.phabricator.com/D15481
2016-03-16 09:34:52 -07:00