phorge-phorge/src/applications/files/engine/PhabricatorChunkedFileStorageEngine.php


Add a chunking storage engine for files

Summary: Ref T7149. This isn't complete and isn't active yet, but does basically work. I'll shore it up in the next few diffs.

The new workflow goes like this:

> Client, file.allocate(): I'd like to upload a file with length L, metadata M, and hash H.

Then the server returns `upload` (a boolean) and `filePHID` (a PHID). These mean:

| upload | filePHID | means |
|---|---|---|
| false | false | Server can't accept file. |
| false | true | File data already known, file created from hash. |
| true | false | Just upload normally. |
| true | true | Query chunks to start or resume a chunked upload. |

All but the last case are uninteresting and work like existing uploads with `file.uploadhash` (which we can eventually deprecate). In the last case:

> Client, file.querychunks(): Give me a list of chunks that I should upload.

This returns all the chunks for the file. Chunks have a start byte, an end byte, and a "complete" flag to indicate that the server already has the data.

Then, the client fills in chunks by sending them:

> Client, file.uploadchunk(): Here is the data for one chunk.

This stuff doesn't work yet or has some caveats:

- I haven't tested resume much.
- Files need an "isPartial()" flag for partial uploads, and the UI needs to respect it.
- The JS client needs to become chunk-aware.
- Chunk size is set crazy low to make testing easier.
- Some debugging flags that I'll remove soon-ish.
- Downloading works, but still streams the whole file into memory.
- This storage engine is disabled by default (hardcoded as a unit test engine) because it's still sketchy.
- Need some code to remove the "isPartial" flag when the last chunk is uploaded.
- Maybe do checksumming on chunks.

Test Plan:

- Hacked up `arc upload` (see next diff) to be chunk-aware and uploaded a readme in 18 32-byte chunks. Then downloaded it. Got the same file back that I uploaded.
- File UI now shows some basic chunk info for chunked files: {F336434}

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: joshuaspence, epriestley

Maniphest Tasks: T7149

Differential Revision: https://secure.phabricator.com/D12060
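As a hedged sketch, the client side of this flow might look like the following. Method names come from the summary above; the parameter and field names (`contentLength`, `byteStart`, `byteEnd`, `complete`) are assumptions based on that summary, not a finalized API.

```php
// Hedged sketch of a chunk-aware upload client using ConduitClient.
// (Authentication setup omitted for brevity.)
$conduit = new ConduitClient('https://phabricator.example.com/api/');

$result = $conduit->callMethodSynchronous('file.allocate', array(
  'name'          => 'README',
  'contentLength' => strlen($data),
  'contentHash'   => sha1($data),
));

if ($result['upload'] && $result['filePHID']) {
  // Chunked upload: ask the server which chunks it still needs.
  $chunks = $conduit->callMethodSynchronous('file.querychunks', array(
    'filePHID' => $result['filePHID'],
  ));

  foreach ($chunks as $chunk) {
    if ($chunk['complete']) {
      // Resume support: the server already has this chunk's data.
      continue;
    }
    $conduit->callMethodSynchronous('file.uploadchunk', array(
      'filePHID'  => $result['filePHID'],
      'byteStart' => $chunk['byteStart'],
      'data'      => substr(
        $data,
        $chunk['byteStart'],
        $chunk['byteEnd'] - $chunk['byteStart']),
    ));
  }
}
```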
<?php
final class PhabricatorChunkedFileStorageEngine
extends PhabricatorFileStorageEngine {
public function getEngineIdentifier() {
return 'chunks';
}
public function getEnginePriority() {
return 60000;
}
/**
* We can write chunks if we have at least one valid storage engine
* underneath us.
*/
public function canWriteFiles() {
return (bool)$this->getWritableEngine();
}
public function hasFilesizeLimit() {
return false;
}
public function isChunkEngine() {
return true;
}
public function writeFile($data, array $params) {
// The chunk engine does not support direct writes.
throw new PhutilMethodNotImplementedException();
}
public function readFile($handle) {
// This is inefficient (it buffers the entire file in memory), but it
// makes the API work as expected. See getFileDataIterator() below for
// reads that stream a byte range instead.
$chunks = $this->loadAllChunks($handle, true);
$buffer = '';
foreach ($chunks as $chunk) {
$data_file = $chunk->getDataFile();
if (!$data_file) {
throw new Exception(pht('This file data is incomplete!'));
}
$buffer .= $data_file->loadFileData();
}
return $buffer;
}
public function deleteFile($handle) {
$engine = new PhabricatorDestructionEngine();
$chunks = $this->loadAllChunks($handle, true);
foreach ($chunks as $chunk) {
$engine->destroyObject($chunk);
}
}
private function loadAllChunks($handle, $need_files) {
$chunks = id(new PhabricatorFileChunkQuery())
->setViewer(PhabricatorUser::getOmnipotentUser())
->withChunkHandles(array($handle))
->needDataFiles($need_files)
->execute();
$chunks = msort($chunks, 'getByteStart');
return $chunks;
}
/**
* Compute a chunked file hash for the viewer.
*
* We cannot currently compute a real hash for chunked file uploads, because
* no single process sees all of the file data.
*
* We also cannot trust the hash that the user claims to have computed. If
* we trust the user, they can upload some `evil.exe` and claim it has the
* same file hash as `good.exe`. When another user later uploads the real
* `good.exe`, we'll just create a reference to the existing `evil.exe`. Users
* who download `good.exe` will then receive `evil.exe`.
*
* Instead, we rehash the user's claimed hash with account secrets. This
* allows users to resume file uploads, but not collide with other users.
*
* Ideally, we'd like to be able to verify hashes, but this is complicated,
* time-consuming, and gives us a fairly small benefit.
*
* @param PhabricatorUser Viewing user.
* @param string Claimed file hash.
* @return string Rehashed file hash.
*/
public static function getChunkedHash(PhabricatorUser $viewer, $hash) {
if (!$viewer->getPHID()) {
throw new Exception(
pht('Unable to compute chunked hash without real viewer!'));
}
$input = $viewer->getAccountSecret().':'.$hash.':'.$viewer->getPHID();
return self::getChunkedHashForInput($input);
}
public static function getChunkedHashForInput($input) {
$rehash = PhabricatorHash::digest($input);
// Add a suffix to identify this as a chunk hash.
$rehash = substr($rehash, 0, -2).'-C';
return $rehash;
}
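// Hedged worked example (all values hypothetical): for a viewer with
// account secret "s3cret" and PHID "PHID-USER-aaaa" who claims the
// content hash "f00dfeed...", the server computes:
//
//   $input  = 's3cret:f00dfeed...:PHID-USER-aaaa';
//   $rehash = PhabricatorHash::digest($input); // keyed (HMAC-style) digest
//   $rehash = substr($rehash, 0, -2).'-C';     // e.g. "1c0b...-C"
//
// Because the rehash mixes in per-account secrets, two users claiming the
// same content hash get different chunked hashes, so one user cannot
// collide with (and poison) another user's upload.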
public function allocateChunks($length, array $properties) {
$file = PhabricatorFile::newChunkedFile($this, $length, $properties);
$chunk_size = $this->getChunkSize();
$handle = $file->getStorageHandle();
$chunks = array();
for ($ii = 0; $ii < $length; $ii += $chunk_size) {
$chunks[] = PhabricatorFileChunk::initializeNewChunk(
$handle,
$ii,
min($ii + $chunk_size, $length));
}
$file->openTransaction();
foreach ($chunks as $chunk) {
$chunk->save();
}
$file->save();
$file->saveTransaction();
return $file;
}
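// Worked example: with the 4 MiB chunk size from getChunkSize() below,
// allocating a 10 MiB (10485760 byte) file creates three chunk records
// covering these half-open byte ranges:
//
//   [0, 4194304)  [4194304, 8388608)  [8388608, 10485760)
//
// The final chunk is clamped to the file length by min() above.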
/**
* Find a storage engine which is suitable for storing chunks.
*
* This engine must be a writable engine, must not be a chunk engine itself,
* and must either have no filesize limit or a limit larger than the chunk
* size.
*/
private function getWritableEngine() {
// NOTE: We can't just load writable engines here: deciding whether an
// engine is writable consults canWriteFiles(), which calls us, so we'd
// recurse forever.
$engines = PhabricatorFileStorageEngine::loadAllEngines();
foreach ($engines as $engine) {
if ($engine->isChunkEngine()) {
continue;
}
if ($engine->isTestEngine()) {
continue;
}
if (!$engine->canWriteFiles()) {
continue;
}
if ($engine->hasFilesizeLimit()) {
if ($engine->getFilesizeLimit() < $this->getChunkSize()) {
continue;
}
}
return true;
}
return false;
}
public function getChunkSize() {
return (4 * 1024 * 1024); // 4 MiB
}
public function getFileDataIterator(PhabricatorFile $file, $begin, $end) {
$chunks = id(new PhabricatorFileChunkQuery())
->setViewer(PhabricatorUser::getOmnipotentUser())
->withChunkHandles(array($file->getStorageHandle()))
->withByteRange($begin, $end)
->needDataFiles(true)
->execute();
return new PhabricatorFileChunkIterator($chunks, $begin, $end);
}
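// Hedged usage sketch ($engine and $file are assumed to be in hand):
// stream bytes [100, 200) of a chunked file without buffering the whole
// file in memory, unlike readFile() above:
//
//   $iterator = $engine->getFileDataIterator($file, 100, 200);
//   foreach ($iterator as $data) {
//     echo $data; // each iteration yields the in-range part of a chunk
//   }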
}