1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2025-01-23 21:18:19 +01:00
phorge-phorge/src/applications/files/query/PhabricatorFileChunkQuery.php

135 lines
3 KiB
PHP
Raw Normal View History

Add a chunking storage engine for files Summary: Ref T7149. This isn't complete and isn't active yet, but does basically work. I'll shore it up in the next few diffs. The new workflow goes like this: > Client, file.allocate(): I'd like to upload a file with length L, metadata M, and hash H. Then the server returns `upload` (a boolean) and `filePHID` (a PHID). These mean: | upload | filePHID | means | |---|---|---| | false | false | Server can't accept file. | false | true | File data already known, file created from hash. | true | false | Just upload normally. | true | true | Query chunks to start or resume a chunked upload. All but the last case are uninteresting and work like exising uploads with `file.uploadhash` (which we can eventually deprecate). In the last case: > Client, file.querychunks(): Give me a list of chunks that I should upload. This returns all the chunks for the file. Chunks have a start byte, an end byte, and a "complete" flag to indicate that the server already has the data. Then, the client fills in chunks by sending them: > Client, file.uploadchunk(): Here is the data for one chunk. This stuff doesn't work yet or has some caveats: - I haven't tested resume much. - Files need an "isPartial()" flag for partial uploads, and the UI needs to respect it. - The JS client needs to become chunk-aware. - Chunk size is set crazy low to make testing easier. - Some debugging flags that I'll remove soon-ish. - Downloading works, but still streams the whole file into memory. - This storage engine is disabled by default (hardcoded as a unit test engine) because it's still sketchy. - Need some code to remove the "isParital" flag when the last chunk is uploaded. - Maybe do checksumming on chunks. Test Plan: - Hacked up `arc upload` (see next diff) to be chunk-aware and uploaded a readme in 18 32-byte chunks. Then downloaded it. Got the same file back that I uploaded. - File UI now shows some basic chunk info for chunked files: {F336434} Reviewers: btrahan Reviewed By: btrahan Subscribers: joshuaspence, epriestley Maniphest Tasks: T7149 Differential Revision: https://secure.phabricator.com/D12060
2015-03-13 11:30:02 -07:00
<?php
final class PhabricatorFileChunkQuery
extends PhabricatorCursorPagedPolicyAwareQuery {
private $chunkHandles;
private $rangeStart;
private $rangeEnd;
private $isComplete;
Add a chunking storage engine for files Summary: Ref T7149. This isn't complete and isn't active yet, but does basically work. I'll shore it up in the next few diffs. The new workflow goes like this: > Client, file.allocate(): I'd like to upload a file with length L, metadata M, and hash H. Then the server returns `upload` (a boolean) and `filePHID` (a PHID). These mean: | upload | filePHID | means | |---|---|---| | false | false | Server can't accept file. | false | true | File data already known, file created from hash. | true | false | Just upload normally. | true | true | Query chunks to start or resume a chunked upload. All but the last case are uninteresting and work like exising uploads with `file.uploadhash` (which we can eventually deprecate). In the last case: > Client, file.querychunks(): Give me a list of chunks that I should upload. This returns all the chunks for the file. Chunks have a start byte, an end byte, and a "complete" flag to indicate that the server already has the data. Then, the client fills in chunks by sending them: > Client, file.uploadchunk(): Here is the data for one chunk. This stuff doesn't work yet or has some caveats: - I haven't tested resume much. - Files need an "isPartial()" flag for partial uploads, and the UI needs to respect it. - The JS client needs to become chunk-aware. - Chunk size is set crazy low to make testing easier. - Some debugging flags that I'll remove soon-ish. - Downloading works, but still streams the whole file into memory. - This storage engine is disabled by default (hardcoded as a unit test engine) because it's still sketchy. - Need some code to remove the "isParital" flag when the last chunk is uploaded. - Maybe do checksumming on chunks. Test Plan: - Hacked up `arc upload` (see next diff) to be chunk-aware and uploaded a readme in 18 32-byte chunks. Then downloaded it. Got the same file back that I uploaded. - File UI now shows some basic chunk info for chunked files: {F336434} Reviewers: btrahan Reviewed By: btrahan Subscribers: joshuaspence, epriestley Maniphest Tasks: T7149 Differential Revision: https://secure.phabricator.com/D12060
2015-03-13 11:30:02 -07:00
private $needDataFiles;
public function withChunkHandles(array $handles) {
$this->chunkHandles = $handles;
return $this;
}
public function withByteRange($start, $end) {
$this->rangeStart = $start;
$this->rangeEnd = $end;
return $this;
}
public function withIsComplete($complete) {
$this->isComplete = $complete;
return $this;
}
Add a chunking storage engine for files Summary: Ref T7149. This isn't complete and isn't active yet, but does basically work. I'll shore it up in the next few diffs. The new workflow goes like this: > Client, file.allocate(): I'd like to upload a file with length L, metadata M, and hash H. Then the server returns `upload` (a boolean) and `filePHID` (a PHID). These mean: | upload | filePHID | means | |---|---|---| | false | false | Server can't accept file. | false | true | File data already known, file created from hash. | true | false | Just upload normally. | true | true | Query chunks to start or resume a chunked upload. All but the last case are uninteresting and work like exising uploads with `file.uploadhash` (which we can eventually deprecate). In the last case: > Client, file.querychunks(): Give me a list of chunks that I should upload. This returns all the chunks for the file. Chunks have a start byte, an end byte, and a "complete" flag to indicate that the server already has the data. Then, the client fills in chunks by sending them: > Client, file.uploadchunk(): Here is the data for one chunk. This stuff doesn't work yet or has some caveats: - I haven't tested resume much. - Files need an "isPartial()" flag for partial uploads, and the UI needs to respect it. - The JS client needs to become chunk-aware. - Chunk size is set crazy low to make testing easier. - Some debugging flags that I'll remove soon-ish. - Downloading works, but still streams the whole file into memory. - This storage engine is disabled by default (hardcoded as a unit test engine) because it's still sketchy. - Need some code to remove the "isParital" flag when the last chunk is uploaded. - Maybe do checksumming on chunks. Test Plan: - Hacked up `arc upload` (see next diff) to be chunk-aware and uploaded a readme in 18 32-byte chunks. Then downloaded it. Got the same file back that I uploaded. - File UI now shows some basic chunk info for chunked files: {F336434} Reviewers: btrahan Reviewed By: btrahan Subscribers: joshuaspence, epriestley Maniphest Tasks: T7149 Differential Revision: https://secure.phabricator.com/D12060
2015-03-13 11:30:02 -07:00
public function needDataFiles($need) {
$this->needDataFiles = $need;
return $this;
}
protected function loadPage() {
$table = new PhabricatorFileChunk();
$conn_r = $table->establishConnection('r');
$data = queryfx_all(
$conn_r,
'SELECT * FROM %T %Q %Q %Q',
$table->getTableName(),
$this->buildWhereClause($conn_r),
$this->buildOrderClause($conn_r),
$this->buildLimitClause($conn_r));
return $table->loadAllFromArray($data);
}
protected function willFilterPage(array $chunks) {
if ($this->needDataFiles) {
$file_phids = mpull($chunks, 'getDataFilePHID');
$file_phids = array_filter($file_phids);
if ($file_phids) {
$files = id(new PhabricatorFileQuery())
->setViewer($this->getViewer())
->setParentQuery($this)
->withPHIDs($file_phids)
->execute();
$files = mpull($files, null, 'getPHID');
} else {
$files = array();
}
foreach ($chunks as $key => $chunk) {
$data_phid = $chunk->getDataFilePHID();
if (!$data_phid) {
$chunk->attachDataFile(null);
continue;
}
$file = idx($files, $data_phid);
if (!$file) {
unset($chunks[$key]);
$this->didRejectResult($chunk);
continue;
}
$chunk->attachDataFile($file);
}
if (!$chunks) {
return $chunks;
}
}
return $chunks;
}
protected function buildWhereClause(AphrontDatabaseConnection $conn_r) {
Add a chunking storage engine for files Summary: Ref T7149. This isn't complete and isn't active yet, but does basically work. I'll shore it up in the next few diffs. The new workflow goes like this: > Client, file.allocate(): I'd like to upload a file with length L, metadata M, and hash H. Then the server returns `upload` (a boolean) and `filePHID` (a PHID). These mean: | upload | filePHID | means | |---|---|---| | false | false | Server can't accept file. | false | true | File data already known, file created from hash. | true | false | Just upload normally. | true | true | Query chunks to start or resume a chunked upload. All but the last case are uninteresting and work like exising uploads with `file.uploadhash` (which we can eventually deprecate). In the last case: > Client, file.querychunks(): Give me a list of chunks that I should upload. This returns all the chunks for the file. Chunks have a start byte, an end byte, and a "complete" flag to indicate that the server already has the data. Then, the client fills in chunks by sending them: > Client, file.uploadchunk(): Here is the data for one chunk. This stuff doesn't work yet or has some caveats: - I haven't tested resume much. - Files need an "isPartial()" flag for partial uploads, and the UI needs to respect it. - The JS client needs to become chunk-aware. - Chunk size is set crazy low to make testing easier. - Some debugging flags that I'll remove soon-ish. - Downloading works, but still streams the whole file into memory. - This storage engine is disabled by default (hardcoded as a unit test engine) because it's still sketchy. - Need some code to remove the "isParital" flag when the last chunk is uploaded. - Maybe do checksumming on chunks. Test Plan: - Hacked up `arc upload` (see next diff) to be chunk-aware and uploaded a readme in 18 32-byte chunks. Then downloaded it. Got the same file back that I uploaded. - File UI now shows some basic chunk info for chunked files: {F336434} Reviewers: btrahan Reviewed By: btrahan Subscribers: joshuaspence, epriestley Maniphest Tasks: T7149 Differential Revision: https://secure.phabricator.com/D12060
2015-03-13 11:30:02 -07:00
$where = array();
if ($this->chunkHandles !== null) {
$where[] = qsprintf(
$conn_r,
'chunkHandle IN (%Ls)',
$this->chunkHandles);
}
if ($this->rangeStart !== null) {
$where[] = qsprintf(
$conn_r,
'byteEnd > %d',
$this->rangeStart);
}
if ($this->rangeEnd !== null) {
$where[] = qsprintf(
$conn_r,
'byteStart < %d',
$this->rangeEnd);
}
if ($this->isComplete !== null) {
if ($this->isComplete) {
$where[] = qsprintf(
$conn_r,
'dataFilePHID IS NOT NULL');
} else {
$where[] = qsprintf(
$conn_r,
'dataFilePHID IS NULL');
}
}
Add a chunking storage engine for files Summary: Ref T7149. This isn't complete and isn't active yet, but does basically work. I'll shore it up in the next few diffs. The new workflow goes like this: > Client, file.allocate(): I'd like to upload a file with length L, metadata M, and hash H. Then the server returns `upload` (a boolean) and `filePHID` (a PHID). These mean: | upload | filePHID | means | |---|---|---| | false | false | Server can't accept file. | false | true | File data already known, file created from hash. | true | false | Just upload normally. | true | true | Query chunks to start or resume a chunked upload. All but the last case are uninteresting and work like exising uploads with `file.uploadhash` (which we can eventually deprecate). In the last case: > Client, file.querychunks(): Give me a list of chunks that I should upload. This returns all the chunks for the file. Chunks have a start byte, an end byte, and a "complete" flag to indicate that the server already has the data. Then, the client fills in chunks by sending them: > Client, file.uploadchunk(): Here is the data for one chunk. This stuff doesn't work yet or has some caveats: - I haven't tested resume much. - Files need an "isPartial()" flag for partial uploads, and the UI needs to respect it. - The JS client needs to become chunk-aware. - Chunk size is set crazy low to make testing easier. - Some debugging flags that I'll remove soon-ish. - Downloading works, but still streams the whole file into memory. - This storage engine is disabled by default (hardcoded as a unit test engine) because it's still sketchy. - Need some code to remove the "isParital" flag when the last chunk is uploaded. - Maybe do checksumming on chunks. Test Plan: - Hacked up `arc upload` (see next diff) to be chunk-aware and uploaded a readme in 18 32-byte chunks. Then downloaded it. Got the same file back that I uploaded. - File UI now shows some basic chunk info for chunked files: {F336434} Reviewers: btrahan Reviewed By: btrahan Subscribers: joshuaspence, epriestley Maniphest Tasks: T7149 Differential Revision: https://secure.phabricator.com/D12060
2015-03-13 11:30:02 -07:00
$where[] = $this->buildPagingClause($conn_r);
return $this->formatWhereClause($where);
}
public function getQueryApplicationClass() {
return 'PhabricatorFilesApplication';
}
}