Add a chunking storage engine for files
Summary:
Ref T7149. This isn't complete and isn't active yet, but does basically work. I'll shore it up in the next few diffs.
The new workflow goes like this:
> Client, file.allocate(): I'd like to upload a file with length L, metadata M, and hash H.
Then the server returns `upload` (a boolean) and `filePHID` (a PHID). These mean:
| upload | filePHID | means |
|---|---|---|
| false | false | Server can't accept file. |
| false | true | File data already known, file created from hash. |
| true | false | Just upload normally. |
| true | true | Query chunks to start or resume a chunked upload. |
All but the last case are uninteresting and work like existing uploads with `file.uploadhash` (which we can eventually deprecate).
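As a sketch, a chunk-aware client would interpret the `file.allocate` response like this (the Conduit call style and the exact parameter names `name`, `contentLength`, and `contentHash` are assumptions for illustration, not part of this diff):

```php
// Hypothetical client-side handling of the file.allocate response.
$result = $conduit->callMethodSynchronous(
  'file.allocate',
  array(
    'name' => $name,
    'contentLength' => $length,
    'contentHash' => $hash,
  ));

if (!$result['upload'] && !$result['filePHID']) {
  // Server can't accept the file.
} else if (!$result['upload'] && $result['filePHID']) {
  // File data already known; the file was created from the hash.
} else if ($result['upload'] && !$result['filePHID']) {
  // Just upload normally, in a single request.
} else {
  // Query chunks to start or resume a chunked upload (see below).
}
```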
In the last case:
> Client, file.querychunks(): Give me a list of chunks that I should upload.
This returns all the chunks for the file. Chunks have a start byte, an end byte, and a "complete" flag to indicate that the server already has the data.
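For example, the result might look something like this (the exact key names are an assumption based on the description above):

```php
// Hypothetical file.querychunks result: one entry per chunk.
$chunks = array(
  array('byteStart' => 0,  'byteEnd' => 32, 'complete' => true),
  array('byteStart' => 32, 'byteEnd' => 64, 'complete' => false),
  // ...
);
```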
Then, the client fills in chunks by sending them:
> Client, file.uploadchunk(): Here is the data for one chunk.
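Putting the two calls together, a minimal upload loop might look like this (again a sketch; `$conduit`, `$file_phid`, and `$file_data` are hypothetical, and the `file.uploadchunk` parameters are assumed from the description):

```php
// Hypothetical client loop: upload only the chunks the server lacks.
foreach ($chunks as $chunk) {
  if ($chunk['complete']) {
    // The server already has this range; skipping it is what makes
    // resuming an interrupted upload cheap.
    continue;
  }

  $length = $chunk['byteEnd'] - $chunk['byteStart'];
  $data = substr($file_data, $chunk['byteStart'], $length);

  $conduit->callMethodSynchronous(
    'file.uploadchunk',
    array(
      'filePHID' => $file_phid,
      'byteStart' => $chunk['byteStart'],
      'data' => base64_encode($data),
      'dataEncoding' => 'base64',
    ));
}
```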
This stuff doesn't work yet or has some caveats:
- I haven't tested resume much.
- Files need an "isPartial()" flag for partial uploads, and the UI needs to respect it.
- The JS client needs to become chunk-aware.
- Chunk size is set crazy low to make testing easier.
- There are some debugging flags that I'll remove soon-ish.
- Downloading works, but still streams the whole file into memory.
- This storage engine is disabled by default (hardcoded as a unit test engine) because it's still sketchy.
- Need some code to remove the "isPartial" flag when the last chunk is uploaded.
- Maybe do checksumming on chunks.
Test Plan:
- Hacked up `arc upload` (see next diff) to be chunk-aware and uploaded a readme in 18 32-byte chunks. Then downloaded it. Got the same file back that I uploaded.
- File UI now shows some basic chunk info for chunked files:
{F336434}
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: joshuaspence, epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D12060
<?php

final class PhabricatorFileChunkQuery
  extends PhabricatorCursorPagedPolicyAwareQuery {

  private $chunkHandles;
  private $rangeStart;
  private $rangeEnd;
  private $isComplete;
  private $needDataFiles;

  public function withChunkHandles(array $handles) {
    $this->chunkHandles = $handles;
    return $this;
  }

  public function withByteRange($start, $end) {
    $this->rangeStart = $start;
    $this->rangeEnd = $end;
    return $this;
  }

  public function withIsComplete($complete) {
    $this->isComplete = $complete;
    return $this;
  }
  public function needDataFiles($need) {
    $this->needDataFiles = $need;
    return $this;
  }

  protected function loadPage() {
    $table = new PhabricatorFileChunk();
    $conn_r = $table->establishConnection('r');

    $data = queryfx_all(
      $conn_r,
      'SELECT * FROM %T %Q %Q %Q',
      $table->getTableName(),
      $this->buildWhereClause($conn_r),
      $this->buildOrderClause($conn_r),
      $this->buildLimitClause($conn_r));

    return $table->loadAllFromArray($data);
  }

  protected function willFilterPage(array $chunks) {
    if ($this->needDataFiles) {
      // Bulk-load the data files for chunks which have them, then attach
      // them to the chunks. Chunks whose data files are not visible to
      // the viewer are filtered out of the result set.
      $file_phids = mpull($chunks, 'getDataFilePHID');
      $file_phids = array_filter($file_phids);
      if ($file_phids) {
        $files = id(new PhabricatorFileQuery())
          ->setViewer($this->getViewer())
          ->setParentQuery($this)
          ->withPHIDs($file_phids)
          ->execute();
        $files = mpull($files, null, 'getPHID');
      } else {
        $files = array();
      }

      foreach ($chunks as $key => $chunk) {
        $data_phid = $chunk->getDataFilePHID();
        if (!$data_phid) {
          // This chunk has not been uploaded yet, so it has no data file.
          $chunk->attachDataFile(null);
          continue;
        }

        $file = idx($files, $data_phid);
        if (!$file) {
          unset($chunks[$key]);
          $this->didRejectResult($chunk);
          continue;
        }

        $chunk->attachDataFile($file);
      }

      if (!$chunks) {
        return $chunks;
      }
    }

    return $chunks;
  }

  protected function buildWhereClause(AphrontDatabaseConnection $conn_r) {
    $where = array();

    if ($this->chunkHandles !== null) {
      $where[] = qsprintf(
        $conn_r,
        'chunkHandle IN (%Ls)',
        $this->chunkHandles);
    }

    // A chunk overlaps the query range if it ends after the range starts
    // and starts before the range ends.
    if ($this->rangeStart !== null) {
      $where[] = qsprintf(
        $conn_r,
        'byteEnd > %d',
        $this->rangeStart);
    }

    if ($this->rangeEnd !== null) {
      $where[] = qsprintf(
        $conn_r,
        'byteStart < %d',
        $this->rangeEnd);
    }

    // A chunk is "complete" once its data has been written to a data file.
    if ($this->isComplete !== null) {
      if ($this->isComplete) {
        $where[] = qsprintf(
          $conn_r,
          'dataFilePHID IS NOT NULL');
      } else {
        $where[] = qsprintf(
          $conn_r,
          'dataFilePHID IS NULL');
      }
    }

    $where[] = $this->buildPagingClause($conn_r);

    return $this->formatWhereClause($where);
  }

  public function getQueryApplicationClass() {
    return 'PhabricatorFilesApplication';
  }

}