Add a chunking storage engine for files
Summary:
Ref T7149. This isn't complete and isn't active yet, but does basically work. I'll shore it up in the next few diffs.
The new workflow goes like this:
> Client, file.allocate(): I'd like to upload a file with length L, metadata M, and hash H.
Then the server returns `upload` (a boolean) and `filePHID` (a PHID). These mean:
| upload | filePHID | means |
|---|---|---|
| false | false | Server can't accept file. |
| false | true | File data already known, file created from hash. |
| true | false | Just upload normally. |
| true | true | Query chunks to start or resume a chunked upload. |
All but the last case are uninteresting and work like existing uploads with `file.uploadhash` (which we can eventually deprecate).
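Here is a minimal sketch of how a client might branch on the `file.allocate()` response. The function name `choose_upload_action()` and its return strings are hypothetical. It also assumes that a "false" `filePHID` in the table means the server returned no PHID, which the sketch models as `null`.

```php
<?php

// Hypothetical client-side helper: map the (upload, filePHID) pair returned
// by file.allocate() to the next step described in the table above.
// Assumption: a missing filePHID is represented as null.
function choose_upload_action($upload, $file_phid) {
  $has_phid = ($file_phid !== null);

  if (!$upload && !$has_phid) {
    return 'reject';  // Server can't accept the file.
  }
  if (!$upload && $has_phid) {
    return 'done';    // Data already known; file was created from the hash.
  }
  if ($upload && !$has_phid) {
    return 'upload';  // Upload the whole file normally.
  }
  return 'chunks';    // Query chunks to start or resume a chunked upload.
}
```

Only the last branch leads into the chunked workflow. The other three resolve in a single round trip.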
In the last case:
> Client, file.querychunks(): Give me a list of chunks that I should upload.
This returns all the chunks for the file. Chunks have a start byte, an end byte, and a "complete" flag to indicate that the server already has the data.
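To make the chunk list concrete, the following sketch assumes the server splits a file into contiguous, fixed-size chunks with half-open byte ranges `[byteStart, byteEnd)`. The helper name and the array keys are hypothetical. Only the start byte, end byte, and "complete" flag come from the description.

```php
<?php

// Sketch: split a file of $length bytes into fixed-size chunks, each
// described by a half-open byte range [byteStart, byteEnd) plus a
// "complete" flag. Assumes $chunk_size is a positive integer.
function compute_chunk_ranges($length, $chunk_size) {
  $ranges = array();
  for ($start = 0; $start < $length; $start += $chunk_size) {
    $ranges[] = array(
      'byteStart' => $start,
      // The final chunk may be shorter than $chunk_size.
      'byteEnd' => min($start + $chunk_size, $length),
      'complete' => false,
    );
  }
  return $ranges;
}
```

With a 32-byte chunk size, any file from 545 to 576 bytes long produces 18 chunks. For a hypothetical 560-byte file, the last chunk covers `[544, 560)`.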
Then, the client fills in chunks by sending them:
> Client, file.uploadchunk(): Here is the data for one chunk.
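The client-side fill-in step might look roughly like this. It assumes each chunk is an array with hypothetical `byteStart`, `byteEnd`, and `complete` keys. `$send` is a placeholder callable that stands in for a real `file.uploadchunk()` call, because the method's actual parameters aren't spelled out here.

```php
<?php

// Sketch of the resume loop: send every chunk the server doesn't have yet.
// $chunks is the list from file.querychunks() (array keys are assumptions),
// $data is the full file contents, and $send is a hypothetical stand-in
// for file.uploadchunk() that receives a byte offset and the chunk bytes.
function upload_missing_chunks(array $chunks, $data, callable $send) {
  $sent = 0;
  foreach ($chunks as $chunk) {
    if ($chunk['complete']) {
      continue;  // The server already has this byte range.
    }
    $start = $chunk['byteStart'];
    $length = $chunk['byteEnd'] - $start;
    $send($start, substr($data, $start, $length));
    $sent++;
  }
  return $sent;
}
```

Skipping complete chunks is what makes resuming possible. If the upload is interrupted, re-querying and rerunning the loop sends only the ranges the server still lacks.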
This stuff doesn't work yet or has some caveats:
- I haven't tested resume much.
- Files need an "isPartial()" flag for partial uploads, and the UI needs to respect it.
- The JS client needs to become chunk-aware.
- Chunk size is set crazy low to make testing easier.
- There are some debugging flags that I'll remove soon-ish.
- Downloading works, but still streams the whole file into memory.
- This storage engine is disabled by default (hardcoded as a unit test engine) because it's still sketchy.
- Need some code to remove the "isPartial" flag when the last chunk is uploaded.
- Maybe do checksumming on chunks.
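For the "isPartial" item, one way to detect that the last chunk has arrived is to check that no incomplete chunks remain. This sketch mirrors the `withIsComplete()` filter in `PhabricatorFileChunkQuery`, where a chunk counts as complete once its `dataFilePHID` is set. Chunks are modeled as plain arrays, and the function name is hypothetical.

```php
<?php

// Sketch: a chunked file can drop its "isPartial" flag once every chunk
// has a data file. Mirrors the "dataFilePHID IS NOT NULL" condition used
// for complete chunks. Note that an empty chunk list counts as complete.
function all_chunks_complete(array $chunks) {
  foreach ($chunks as $chunk) {
    if ($chunk['dataFilePHID'] === null) {
      return false;
    }
  }
  return true;
}
```

Inside the engine, the equivalent check could be a `PhabricatorFileChunkQuery` using `withChunkHandles()` and `withIsComplete(false)` that returns no rows.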
Test Plan:
- Hacked up `arc upload` (see next diff) to be chunk-aware and uploaded a readme in 18 chunks with a 32-byte chunk size. Then downloaded it and got back the same file I uploaded.
- File UI now shows some basic chunk info for chunked files:
{F336434}
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: joshuaspence, epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D12060
2015-03-13 11:30:02 -07:00
<?php

final class PhabricatorFileChunkQuery
  extends PhabricatorCursorPagedPolicyAwareQuery {

  private $chunkHandles;
  private $rangeStart;
  private $rangeEnd;
2015-03-13 11:30:24 -07:00
  private $isComplete;
  private $needDataFiles;

  public function withChunkHandles(array $handles) {
    $this->chunkHandles = $handles;
    return $this;
  }

  public function withByteRange($start, $end) {
    $this->rangeStart = $start;
    $this->rangeEnd = $end;
    return $this;
  }

  public function withIsComplete($complete) {
    $this->isComplete = $complete;
    return $this;
  }

  public function needDataFiles($need) {
    $this->needDataFiles = $need;
    return $this;
  }

  protected function loadPage() {
    $table = new PhabricatorFileChunk();
    $conn_r = $table->establishConnection('r');

    $data = queryfx_all(
      $conn_r,
      'SELECT * FROM %T %Q %Q %Q',
      $table->getTableName(),
      $this->buildWhereClause($conn_r),
      $this->buildOrderClause($conn_r),
      $this->buildLimitClause($conn_r));

    return $table->loadAllFromArray($data);
  }

  protected function willFilterPage(array $chunks) {

    if ($this->needDataFiles) {
      $file_phids = mpull($chunks, 'getDataFilePHID');
      $file_phids = array_filter($file_phids);
      if ($file_phids) {
        $files = id(new PhabricatorFileQuery())
          ->setViewer($this->getViewer())
          ->setParentQuery($this)
          ->withPHIDs($file_phids)
          ->execute();
        $files = mpull($files, null, 'getPHID');
      } else {
        $files = array();
      }

      foreach ($chunks as $key => $chunk) {
        $data_phid = $chunk->getDataFilePHID();
        if (!$data_phid) {
          // This chunk hasn't been uploaded yet, so it has no data file.
          $chunk->attachDataFile(null);
          continue;
        }

        $file = idx($files, $data_phid);
        if (!$file) {
          // The data file couldn't be loaded for this viewer, so drop the
          // chunk.
          unset($chunks[$key]);
          $this->didRejectResult($chunk);
          continue;
        }

        $chunk->attachDataFile($file);
      }

      if (!$chunks) {
        return $chunks;
      }
    }

    return $chunks;
  }

  private function buildWhereClause(AphrontDatabaseConnection $conn_r) {
    $where = array();

    if ($this->chunkHandles !== null) {
      $where[] = qsprintf(
        $conn_r,
        'chunkHandle IN (%Ls)',
        $this->chunkHandles);
    }

    // Together, these two constraints select chunks that overlap the
    // requested byte range.
    if ($this->rangeStart !== null) {
      $where[] = qsprintf(
        $conn_r,
        'byteEnd > %d',
        $this->rangeStart);
    }

    if ($this->rangeEnd !== null) {
      $where[] = qsprintf(
        $conn_r,
        'byteStart < %d',
        $this->rangeEnd);
    }

    // A chunk is complete once its data has been stored in a data file.
    if ($this->isComplete !== null) {
      if ($this->isComplete) {
        $where[] = qsprintf(
          $conn_r,
          'dataFilePHID IS NOT NULL');
      } else {
        $where[] = qsprintf(
          $conn_r,
          'dataFilePHID IS NULL');
      }
    }

    $where[] = $this->buildPagingClause($conn_r);

    return $this->formatWhereClause($where);
  }

  public function getQueryApplicationClass() {
    return 'PhabricatorFilesApplication';
  }

}
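The byte-range constraints in `buildWhereClause()` (`byteEnd > rangeStart` and `byteStart < rangeEnd`) amount to an overlap test between half-open intervals, with the requested end treated as exclusive. The sketch below evaluates the same predicate in PHP over plain arrays so that the boundary behavior is easy to check. The function names are hypothetical, and the sketch assumes both ends of the range are set, as `withByteRange()` does.

```php
<?php

// Same predicate as the SQL: a chunk [byteStart, byteEnd) overlaps the
// requested range [start, end) when it ends after start and begins
// before end.
function chunk_overlaps_range(array $chunk, $start, $end) {
  return ($chunk['byteEnd'] > $start) && ($chunk['byteStart'] < $end);
}

// Keep only the chunks that overlap [start, end), preserving order.
function filter_chunks_by_range(array $chunks, $start, $end) {
  $result = array();
  foreach ($chunks as $chunk) {
    if (chunk_overlaps_range($chunk, $start, $end)) {
      $result[] = $chunk;
    }
  }
  return $result;
}
```

With 32-byte chunks, requesting `[32, 64)` returns only the chunk that starts at byte 32. The chunk ending exactly at byte 32 does not overlap that range, so a reader serving a download range never loads more chunks than it needs.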