Add a chunking storage engine for files
Summary:
Ref T7149. This isn't complete and isn't active yet, but does basically work. I'll shore it up in the next few diffs.
The new workflow goes like this:
> Client, file.allocate(): I'd like to upload a file with length L, metadata M, and hash H.
Then the server returns `upload` (a boolean) and `filePHID` (a PHID). These mean:
| upload | filePHID | means |
|---|---|---|
| false | false | Server can't accept file. |
| false | true | File data already known, file created from hash. |
| true | false | Just upload normally. |
| true | true | Query chunks to start or resume a chunked upload. |
All but the last case are uninteresting and work like existing uploads with `file.uploadhash` (which we can eventually deprecate).
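As a sketch, a chunk-aware client would interpret the `file.allocate` response like this (the Conduit call style and the exact parameter names `name`, `contentLength`, and `contentHash` are assumptions for illustration, not part of this diff):

```php
// Hypothetical client-side handling of the file.allocate response.
$result = $conduit->callMethodSynchronous(
  'file.allocate',
  array(
    'name' => $name,
    'contentLength' => $length,
    'contentHash' => $hash,
  ));

if (!$result['upload'] && !$result['filePHID']) {
  // Server can't accept the file.
} else if (!$result['upload'] && $result['filePHID']) {
  // File data already known; the file was created from the hash.
} else if ($result['upload'] && !$result['filePHID']) {
  // Just upload normally, in a single request.
} else {
  // Query chunks to start or resume a chunked upload (see below).
}
```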
In the last case:
> Client, file.querychunks(): Give me a list of chunks that I should upload.
This returns all the chunks for the file. Chunks have a start byte, an end byte, and a "complete" flag to indicate that the server already has the data.
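For example, the result might look something like this (the exact key names are an assumption based on the description above):

```php
// Hypothetical file.querychunks result: one entry per chunk.
$chunks = array(
  array('byteStart' => 0,  'byteEnd' => 32, 'complete' => true),
  array('byteStart' => 32, 'byteEnd' => 64, 'complete' => false),
  // ...
);
```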
Then, the client fills in chunks by sending them:
> Client, file.uploadchunk(): Here is the data for one chunk.
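Putting the two calls together, a minimal upload loop might look like this (again a sketch; `$conduit`, `$file_phid`, and `$file_data` are hypothetical, and the `file.uploadchunk` parameters are assumed from the description):

```php
// Hypothetical client loop: upload only the chunks the server lacks.
foreach ($chunks as $chunk) {
  if ($chunk['complete']) {
    // The server already has this range; skipping it is what makes
    // resuming an interrupted upload cheap.
    continue;
  }

  $length = $chunk['byteEnd'] - $chunk['byteStart'];
  $data = substr($file_data, $chunk['byteStart'], $length);

  $conduit->callMethodSynchronous(
    'file.uploadchunk',
    array(
      'filePHID' => $file_phid,
      'byteStart' => $chunk['byteStart'],
      'data' => base64_encode($data),
      'dataEncoding' => 'base64',
    ));
}
```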
This stuff doesn't work yet or has some caveats:
- I haven't tested resume much.
- Files need an "isPartial()" flag for partial uploads, and the UI needs to respect it.
- The JS client needs to become chunk-aware.
- Chunk size is set crazy low to make testing easier.
- There are some debugging flags that I'll remove soon-ish.
- Downloading works, but still streams the whole file into memory.
- This storage engine is disabled by default (hardcoded as a unit test engine) because it's still sketchy.
- Need some code to remove the "isPartial" flag when the last chunk is uploaded.
- Maybe do checksumming on chunks.
Test Plan:
- Hacked up `arc upload` (see next diff) to be chunk-aware and uploaded a readme in 18 32-byte chunks. Then downloaded it. Got the same file back that I uploaded.
- File UI now shows some basic chunk info for chunked files:
{F336434}
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: joshuaspence, epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D12060
<?php

final class PhabricatorFileChunkQuery
  extends PhabricatorCursorPagedPolicyAwareQuery {

  private $chunkHandles;
  private $rangeStart;
  private $rangeEnd;
  private $isComplete;
  private $needDataFiles;

  public function withChunkHandles(array $handles) {
    $this->chunkHandles = $handles;
    return $this;
  }

  public function withByteRange($start, $end) {
    $this->rangeStart = $start;
    $this->rangeEnd = $end;
    return $this;
  }

  public function withIsComplete($complete) {
    $this->isComplete = $complete;
    return $this;
  }
  public function needDataFiles($need) {
    $this->needDataFiles = $need;
    return $this;
  }

  protected function loadPage() {
    $table = new PhabricatorFileChunk();
    $conn_r = $table->establishConnection('r');

    $data = queryfx_all(
      $conn_r,
      'SELECT * FROM %T %Q %Q %Q',
      $table->getTableName(),
      $this->buildWhereClause($conn_r),
      $this->buildOrderClause($conn_r),
      $this->buildLimitClause($conn_r));

    return $table->loadAllFromArray($data);
  }

  protected function willFilterPage(array $chunks) {
    if ($this->needDataFiles) {
      // Bulk-load the data files for chunks which have them, then attach
      // them to the chunks. Chunks whose data files are not visible to
      // the viewer are filtered out of the result set.
      $file_phids = mpull($chunks, 'getDataFilePHID');
      $file_phids = array_filter($file_phids);
      if ($file_phids) {
        $files = id(new PhabricatorFileQuery())
          ->setViewer($this->getViewer())
          ->setParentQuery($this)
          ->withPHIDs($file_phids)
          ->execute();
        $files = mpull($files, null, 'getPHID');
      } else {
        $files = array();
      }

      foreach ($chunks as $key => $chunk) {
        $data_phid = $chunk->getDataFilePHID();
        if (!$data_phid) {
          // This chunk has not been uploaded yet, so it has no data file.
          $chunk->attachDataFile(null);
          continue;
        }

        $file = idx($files, $data_phid);
        if (!$file) {
          unset($chunks[$key]);
          $this->didRejectResult($chunk);
          continue;
        }

        $chunk->attachDataFile($file);
      }

      if (!$chunks) {
        return $chunks;
      }
    }

    return $chunks;
  }

  protected function buildWhereClause(AphrontDatabaseConnection $conn_r) {
    $where = array();

    if ($this->chunkHandles !== null) {
      $where[] = qsprintf(
        $conn_r,
        'chunkHandle IN (%Ls)',
        $this->chunkHandles);
    }

    // A chunk overlaps the query range if it ends after the range starts
    // and starts before the range ends.
    if ($this->rangeStart !== null) {
      $where[] = qsprintf(
        $conn_r,
        'byteEnd > %d',
        $this->rangeStart);
    }

    if ($this->rangeEnd !== null) {
      $where[] = qsprintf(
        $conn_r,
        'byteStart < %d',
        $this->rangeEnd);
    }

    // A chunk is "complete" once its data has been written to a data file.
    if ($this->isComplete !== null) {
      if ($this->isComplete) {
        $where[] = qsprintf(
          $conn_r,
          'dataFilePHID IS NOT NULL');
      } else {
        $where[] = qsprintf(
          $conn_r,
          'dataFilePHID IS NULL');
      }
    }

    $where[] = $this->buildPagingClause($conn_r);

    return $this->formatWhereClause($where);
  }

  public function getQueryApplicationClass() {
    return 'PhabricatorFilesApplication';
  }

}