Summary:
Depends on D18952. Ref T13049. For files larger than 8MB, we need to engage the chunk storage engine. `PhabricatorFile::newFromFileData()` always writes a single chunk, and can't handle files larger than the mandatory chunk threshold (8MB).
Use `IteratorUploadSource`, which can handle them, and "stream" the data into it. This should raise the limit from 8MB to 2GB (the maximum size of a string in PHP).
If we need to go above 2GB we could stream CSV and text pretty easily, and JSON without too much trouble, but Excel might be trickier. Hopefully no one is trying to export 2GB+ datafiles, though.
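As a rough sketch of the new flow (the upload-source class and setter names here are assumptions; only the streaming behavior described above is from this change), the exporter hands the upload source an iterator of string chunks instead of building one giant string:

```
// Sketch only: assumes an iterator-backed upload source with the usual
// FileUploadSource-style setters; exact class and method names may differ.
$chunks = new ArrayIterator(array($header_chunk, $row_chunk_1, $row_chunk_2));

$source = id(new PhabricatorIteratorFileUploadSource())
  ->setName('export.json')
  ->setViewer($viewer)
  ->setIterator($chunks);

// The source consumes the iterator and engages chunked storage once the
// data crosses the mandatory 8MB threshold, so no single huge string is
// ever required.
$file = $source->uploadFile();
```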
Test Plan:
- Changed the JSON exporter to just export 9MB of the letter "q": `return str_repeat('q', 1024 * 1024 * 9);`.
- Before change: fatal, "no storage engine can store this file".
- After change: export works cleanly.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13049
Differential Revision: https://secure.phabricator.com/D18953
Summary:
See <https://discourse.phabricator-community.org/t/files-created-from-repository-contents-slightly-over-one-chunk-in-size-are-truncated-to-exactly-one-chunk-in-size/988/1>. Three issues here:
- When we finish reading `git cat-file ...` or whatever, we can end up with more than one chunk worth of bytes left in the internal buffer if the read is fast. Use `while` instead of `if` to make sure we write the whole buffer (sketched below).
- Limiting output with `setStdoutSizeLimit()` isn't really a reliable way to limit the size if we're also reading from the buffer. It's also pretty indirect and confusing. Instead, just let the `FileUploadSource` explicitly implement a byte limit in a straightforward way.
- We weren't setting the time limit correctly on the main path.
Overall, this could cause >4MB files to "write" as 4MB files, with the rest of the file left in the UploadSource buffer. Since these files were technically under the limit, they could return as valid. This was intermittent.
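A minimal sketch of the buffer fix, with illustrative names (the real logic lives inside the upload source):

```
// Illustrative names only; not the actual upload-source internals.
function drain_buffer(&$buffer, $chunk_size, $byte_limit, &$bytes_written, $write_chunk) {
  // Before: a single `if (strlen($buffer) >= $chunk_size)` could leave one
  // or more full chunks behind when the subprocess exits quickly.
  // After: keep draining until less than one full chunk remains.
  while (strlen($buffer) >= $chunk_size) {
    $chunk = substr($buffer, 0, $chunk_size);
    $buffer = substr($buffer, $chunk_size);

    // Enforce the byte limit explicitly, by counting what we write, rather
    // than indirectly via setStdoutSizeLimit() on the subprocess.
    $bytes_written += strlen($chunk);
    if ($byte_limit && ($bytes_written > $byte_limit)) {
      throw new Exception('File is too large to store.');
    }

    $write_chunk($chunk);
  }
}
```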
Test Plan:
- Pushed a ~4.2MB file.
- Reloaded Diffusion a bunch; sometimes saw the `while/if` buffer race produce a 4MB file with a prompt to download it. (Other times, the buffer worked right and the page just said "this file is too big, sorry".)
- Applied patches.
- Reloaded Diffusion a bunch, no longer saw bad behavior or truncated files.
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D18885
Summary:
Depends on D18828. Ref T7789. See <https://discourse.phabricator-community.org/t/git-lfs-fails-with-large-images/584>.
Currently, when you upload a large (>4MB) image, we may try to assess the dimensions for the image and for each individual chunk.
At best, this is slow and not useful. At worst, it fatals or consumes a ton of memory and I/O we don't need to be using.
Instead:
- Don't try to assess dimensions for chunked files.
- Don't try to assess dimensions for the chunks themselves.
- Squelch errors for bad data, etc., that `gd` can't actually read, since we recover sensibly (see the sketch after this list).
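A hedged sketch of the two guards (the function and flags are illustrative, not the actual transform code; only the `gd` calls are real):

```
// Illustrative sketch, not the actual Phabricator method.
function detect_image_dimensions($data, $is_chunk, $is_chunked_file) {
  // Don't measure chunked files or their individual chunks: at 4MB+ apiece
  // this is slow at best, and can fatal or exhaust memory at worst.
  if ($is_chunk || $is_chunked_file) {
    return null;
  }

  // gd raises a warning on data it can't actually read; since callers
  // recover sensibly from a null result, squelch it instead of logging it.
  $image = @imagecreatefromstring($data);
  if ($image === false) {
    return null;
  }

  return array(imagesx($image), imagesy($image));
}
```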
Test Plan:
- Created a 2048x2048, 8.5MB PNG in Photoshop using the "Random Noise" filter.
- Uploaded it.
- Before patch: got complaints in log about `imagecreatefromstring()` failing, although the actual upload went OK in my environment.
- After patch: clean log, no attempt to detect the size of a big image.
- Also uploaded a small image, got dimensions detected properly still.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T7789
Differential Revision: https://secure.phabricator.com/D18830
Summary:
Ref T12857. This is generally fairly fuzzy for now, but here's something concrete: when we build a large file with `diffusion.filecontentquery`, we compute the MIME type of all chunks, not just the initial chunk.
Instead, pass a dummy MIME type to non-initial chunks so we don't try to compute them. This mirrors logic elsewhere, in `file.uploadchunk`. This should perhaps be centralized at some point, but it's a bit tricky since the file doesn't know that it's a chunk until later.
Also, clean up the `TempFile` immediately -- this shouldn't actually affect anything, but we don't need it to live any longer than this.
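Roughly, the chunk writes now look like this (names are illustrative; only the "real MIME type for the initial chunk only" behavior is from this change):

```
// Sketch: give non-initial chunks a throwaway MIME type so that
// newFromFileData() never tries to detect one for them. Only the initial
// chunk's type matters for the parent file.
$params = array(
  'name' => $chunk_name,
);

if (!$is_initial_chunk) {
  $params['mime-type'] = 'application/octet-stream';
}

$chunk_file = PhabricatorFile::newFromFileData($chunk_data, $params);
```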
Test Plan:
- Made `hashFileContent()` return `null` to skip the chunk cache.
- Added `phlog()` to the MIME type computation.
- Loaded a 12MB file in Diffusion.
- Before patch: Saw 3x MIME type computations, one for each 4MB chunk.
- After patch: Saw 1x MIME type computation, for initial chunk only.
Reviewers: chad, amckinley
Reviewed By: chad
Maniphest Tasks: T12857
Differential Revision: https://secure.phabricator.com/D18138
Summary: Fixes T12828. This API was tightened up in D17616, but I missed this callsite.
Test Plan: I've just got a good feeling in my bones.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12828
Differential Revision: https://secure.phabricator.com/D18119
Summary: Ensures that newly-made `File` objects get indexed into the new ngrams index. Fixes T8788.
Test Plan:
- uploaded a file with daemons stopped; confirmed no new rows in ngrams table
- started daemons; confirmed indexing of previously-uploaded files happened
- uploaded a new file with daemons running; confirmed it got added to the index
Not sure how to test the changes to `PhabricatorFileUploadSource->writeChunkedFile()` and `PhabricatorChunkedFileStorageEngine->allocateChunks()`. I spent a few minutes trying to find their callers, but the first looks like it requires a Diffusion repo and the 2nd is only accessible via Conduit. I can test that stuff if necessary, but it's such a small change that I'm not worried about it.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T8788
Differential Revision: https://secure.phabricator.com/D17718
Summary: Fixes T12536. Nothing reads this parameter; `PhabricatorFile::newChunkedFile` sets the `isPartial` flag automatically.
Test Plan: Grepped for `isPartial`.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12536
Differential Revision: https://secure.phabricator.com/D17654
Summary:
Ref T11357. When creating a file, callers can currently specify a `ttl`. However, it's ambiguous what you're supposed to pass, and some callers get it wrong.
For example, to mean "this file expires in 60 minutes", you might pass either of these:
- `time() + phutil_units('60 minutes in seconds')`
- `phutil_units('60 minutes in seconds')`
The former means "60 minutes from now". The latter means "1 AM, January 1, 1970". In practice, because the GC normally runs only once every four hours (at least, until recently), and all the bad TTLs are cases where files are normally accessed immediately, these 1970 TTLs didn't cause any real problems.
Split `ttl` into `ttl.relative` and `ttl.absolute`, and make sure the values are sane. Then correct all callers, and simplify out the `time()` calls where possible to make switching to `PhabricatorTime` easier.
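For example, the 60-minute case reads unambiguously with the new parameters (the `ttl.relative` and `ttl.absolute` keys are from this change; the surrounding call is illustrative):

```
// "This file expires 60 minutes from now."
$file = PhabricatorFile::newFromFileData(
  $data,
  array(
    'name' => 'example.txt',
    'ttl.relative' => phutil_units('60 minutes in seconds'),
  ));

// "This file expires at this specific epoch timestamp."
$file = PhabricatorFile::newFromFileData(
  $data,
  array(
    'name' => 'example.txt',
    'ttl.absolute' => PhabricatorTime::getNow() + phutil_units('60 minutes in seconds'),
  ));
```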
Test Plan:
- Generated an SSH keypair.
- Viewed a changeset.
- Viewed a raw diff.
- Viewed a commit's file data.
- Viewed a temporary file's details, saw expiration date and relative time.
- Ran unit tests.
- (Didn't really test Phragment.)
Reviewers: chad
Reviewed By: chad
Subscribers: hach-que
Maniphest Tasks: T11357
Differential Revision: https://secure.phabricator.com/D17616
Summary:
Ref T7789. Ref T10604. This implements the `upload` action, which streams file data into Files.
This makes Git LFS actually work, at least roughly.
Test Plan:
- Tracked files in an LFS repository.
- Pushed LFS data (`git lfs track '*.png'; git add something.png; git commit -m ...; git push`).
- Pulled LFS data (`git checkout master^; rm -rf .git/lfs; git checkout master; open something.png`).
- Verified LFS refs show up in the gitlfsref table.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T7789, T10604
Differential Revision: https://secure.phabricator.com/D15492
Summary: Fixes T10273. The threshold is `null` if no chunk engines are available, but the code didn't handle this properly.
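A minimal sketch of the missing guard (shape and names are assumptions; only the null-threshold behavior is from this change):

```
// With no chunk engines configured, the threshold is null, which means
// "chunking is unavailable", not "chunk everything".
if (($threshold !== null) && ($length > $threshold)) {
  // Too large for any single engine: write through chunked storage.
  $file = $upload_source->uploadFile();
} else {
  // Small enough, or chunking isn't available: write a single file.
  $file = PhabricatorFile::newFromFileData($data, $params);
}
```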
Test Plan: Disabled all chunk engines, reloaded, hit issue described in task. Applied patch, got clean file content.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10273
Differential Revision: https://secure.phabricator.com/D15179
Summary:
Fixes T10186. After D14970, `diffusion.filecontentquery` puts the content in a file and returns the file PHID.
However, it does this in a way that doesn't go through the chunking engine, so it will fail for files larger than the chunk threshold (generally, 8MB).
Instead, stream the file from the underlying command directly into chunked storage.
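Sketched roughly (the class and setter names here are assumptions rather than the exact API), the command's output is wired into a file upload source instead of being buffered into one string:

```
// Assumed names throughout; the real change wires the repository command's
// future into an upload source that writes chunks as the data arrives.
$future = $repository->getLocalCommandFuture(
  'cat-file blob %s',
  $blob_identifier);

$source = id(new PhabricatorExecFutureFileUploadSource())
  ->setName($file_name)
  ->setViewer($viewer)
  ->setExecFuture($future);

$file = $source->uploadFile();
```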
Test Plan:
- Made a commit including a really big file: 4dcd4c492b
- Used `diffusion.filecontentquery` to load file content.
- Parsed/imported commit locally.
- Used `diffusion.filecontentquery` to load content for smaller files (README, etc).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10186
Differential Revision: https://secure.phabricator.com/D15072