Remove docs that are too outdated to be updated

(rewrite will be better).
2024-04-04 12:36:23 +02:00 · 2009-05-01 11:28:52 +03:00 · 2009-05-01 11:28:52 +03:00 · be06858d5c
commit be06858d5c
parent 0255401e57
5 changed files with 0 additions and 956 deletions
--- a/doc/liblzma-advanced.txt
+++ b/doc/liblzma-advanced.txt
@@ -1,324 +0,0 @@
 Advanced features of liblzma
 ----------------------------
 0. Introduction
    Most developers need only the basic features of liblzma. These
    features allow single-threaded encoding and decoding of .lzma files
    in streamed mode.
    In some cases developers want more. The .lzma file format is
    designed to allow multi-threaded encoding and decoding and limited
    random-access reading. These features are possible in non-streamed
    mode and, to a limited extent, also in streamed mode.
    To take advantage of these features, the application needs a custom
    .lzma file format handler. liblzma provides a set of tools to ease
    this task, but it's still quite a bit of work to get a good custom
    .lzma handler done.
 1. Where to begin
    Start by reading the .lzma file format specification. Understanding
    the basics of the .lzma file structure is required to implement a
    custom .lzma file handler and to understand the rest of this document.
 2. The basic components
 2.1. Stream Header and tail
    Stream Header begins the .lzma Stream and Stream tail ends it. Stream
    Header is defined in the file format specification, but Stream tail
    isn't (thus I write "tail" with a lower-case letter). Stream tail is
    simply the Stream Flags and the Footer Magic Bytes fields together.
    It was done this way in liblzma, because the Block coders take care
    of the rest of the stuff in the Stream Footer.
    For now, the size of Stream Header is fixed to 11 bytes. The header
    <lzma/stream_flags.h> defines LZMA_STREAM_HEADER_SIZE, which you
    should use instead of a hardcoded number. Similarly, Stream tail
    is fixed to 3 bytes, and there is a constant LZMA_STREAM_TAIL_SIZE.
    It is possible that a future version of the .lzma format will have
    variable-sized Stream Header and tail. As of writing, this seems so
    unlikely that it was considered simplest to just use constants
    instead of providing functions to get and store the sizes of the
    Stream Header and tail.
 2.x. Stream tail
    For now, the size of Stream tail is fixed to 3 bytes. The header
    <lzma/stream_flags.h> defines LZMA_STREAM_TAIL_SIZE, which you
    should use instead of a hardcoded number.
 3. Keeping track of size information
    The lzma_info_* functions found in <lzma/info.h> should ease the
    task of keeping track of sizes of the Blocks and also the Stream
    as a whole. Using these functions is strongly recommended, because
    there are surprisingly many situations where an error can occur,
    and these functions check for possible errors every time some new
    information becomes available.
    If you find lzma_info_* functions lacking something that you would
    find useful, please contact the author.
 3.1. Start offset of the Stream
    If you are storing the .lzma Stream inside another file format, or
    for some other reason are placing the .lzma Stream somewhere other
    than the beginning of the file, you should tell the starting
    offset of the Stream using lzma_info_start_offset_set().
    The start offset of the Stream is used for two distinct purposes.
    First, knowing the start offset of the Stream allows
    lzma_info_alignment_get() to correctly calculate the alignment of
    every Block. This information is given to the Block encoder, which
    will calculate the size of Header Padding so that Compressed Data
    is aligned at an optimal offset.
    Another use for start offset of the Stream is in random-access
    reading. If you set the start offset of the Stream, lzma_info_locate()
    will be able to calculate the offset relative to the beginning of the
    file containing the Stream (instead of offset relative to the
    beginning of the Stream).
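    The alignment arithmetic itself is simple and can be sketched in a
    few lines of C. This is only an illustration: header_padding_for()
    is a made-up helper, not a liblzma function, and it assumes that the
    sum of the two offsets fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of liblzma): return the number of
 * padding bytes needed so that data located at file offset
 * stream_start + offset_in_stream begins at a multiple of `alignment`.
 * Overflow of the sum is not checked in this sketch.
 */
uint64_t
header_padding_for(uint64_t stream_start, uint64_t offset_in_stream,
		uint64_t alignment)
{
	assert(alignment != 0);

	/* Absolute position in the file that contains the Stream. */
	const uint64_t pos = stream_start + offset_in_stream;
	const uint64_t misalignment = pos % alignment;

	return misalignment == 0 ? 0 : alignment - misalignment;
}
```

    With a Stream starting at file offset 3, data at Stream offset 13
    is already four-byte aligned in the file, although it would need
    three bytes of padding if the Stream started at offset 0.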
 3.2. Size of Stream Header
    While the size of Stream Header is constant (11 bytes) in the current
    version of the .lzma file format, this may change in the future.
 3.3. Size of Header Metadata Block
    This information is needed when doing random-access reading, and
    to verify the value of this field stored in Footer Metadata Block.
 3.4. Total Size of the Data Blocks
 3.5. Uncompressed Size of Data Blocks
 3.6. Index
 x. Alignment
    There are a few slightly different types of alignment issues when
    working with .lzma files.
    The .lzma format doesn't strictly require any kind of alignment.
    However, if the encoder carefully optimizes the alignment in all
    situations, it can improve compression ratio, speed of the encoder
    and decoder, and slightly help if the files get damaged and need
    recovery.
    Alignment has the most significant effect on compression ratio. FIXME
 x.1. Compression ratio
    Some filters take advantage of the alignment of the input data.
    To get the best compression ratio, make sure that you feed these
    filters correctly aligned data.
    Some filters (e.g. LZMA) don't necessarily mind too much if the
    input doesn't match the preferred alignment. With these filters
    the penalty in compression ratio depends on the specific type of
    data being compressed.
    Other filters (e.g. PowerPC executable filter) won't work at all
    with data that is improperly aligned. While the data can still
    be de-filtered back to its original form, the benefit of the
    filtering (better compression ratio) is completely lost, because
    these filters expect certain patterns at properly aligned offsets.
    The compression ratio may even be worse with incorrectly aligned input
    than without the filter.
 x.1.1. Inter-filter alignment
    When there are multiple filters chained, checking the alignment can
    be useful not only with the input of the first filter and output of
    the last filter, but also between the filters.
    Inter-filter alignment is especially important with the Subblock filter.
 x.1.2. Further compression with external tools
    This is a relatively rare situation in practice, but still worth
    understanding.
    Let's say that there are several SPARC executables, which are each
    filtered to separate .lzma files using only the SPARC filter. If
    Uncompressed Size is written to the Block Header, the size of Block
    Header may vary between the .lzma files. If no Padding is used in
    the Block Header to correct the alignment, the starting offset of
    the Compressed Data field will be differently aligned in different
    .lzma files.
    All these .lzma files are archived into a single .tar archive. Due
    to the nature of the .tar format, every file is aligned inside the
    archive to an offset that is a multiple of 512 bytes.
    The .tar archive is compressed into a new .lzma file using the LZMA
    filter with options that prefer input alignment of four bytes. Now
    if the independent .lzma files don't have the same alignment of
    the Compressed Data fields, the LZMA filter will be unable to take
    advantage of the input alignment between the files in the .tar
    archive, which reduces compression ratio.
    Thus, even if you have only a single Block per file, it can be good
    for compression ratio to align the Compressed Data to an optimal
    offset.
 x.2. Speed
    Most modern computers are faster when multi-byte data is located
    at aligned offsets in RAM. Proper alignment of the Compressed Data
    fields can slightly increase the speed of some filters.
 x.3. Recovery
    Aligning every Block Header to start at an offset with big enough
    alignment may ease or at least speed up recovery of broken files.
 y. Typical usage cases
 y.x. Parsing the Stream backwards
    You may need to parse the Stream backwards if you need to get
    information such as the sizes of the Stream, Index, or Extra.
    The basic procedure to do this follows.
    Locate the end of the Stream. If the Stream is stored as is in a
    standalone .lzma file, simply seek to the end of the file and start
    reading backwards using an appropriate buffer size. The file format
    specification allows an arbitrary amount of Footer Padding (zero or
    more NUL bytes), which you skip before trying to decode the Stream tail.
    Once you have located the end of the Stream (a non-NUL byte), make
    sure you have at least the last LZMA_STREAM_TAIL_SIZE bytes of the
    Stream in a buffer. If there aren't enough bytes left in the file,
    the file is too small to contain a valid Stream. Decode the Stream
    tail using lzma_stream_tail_decoder(). Store the offset of the first
    byte of the Stream tail; you will need it later.
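    The padding-skipping step can be sketched as follows. To keep the
    example short it assumes that the whole file is already in memory;
    a real handler would read backwards in chunks. find_stream_tail()
    and STREAM_TAIL_SIZE are names made up for this sketch (the real
    constant is LZMA_STREAM_TAIL_SIZE), and decoding the tail itself is
    not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the Stream tail as given in this document (3 bytes). */
#define STREAM_TAIL_SIZE 3

/*
 * Hypothetical helper: skip Footer Padding (NUL bytes) from the end of
 * buf[0..size) and return the offset of the first byte of the Stream
 * tail. Returns SIZE_MAX if too few bytes remain for a Stream tail.
 */
size_t
find_stream_tail(const uint8_t *buf, size_t size)
{
	assert(buf != NULL || size == 0);

	/* Walk backwards over the Footer Padding. */
	size_t end = size;
	while (end > 0 && buf[end - 1] == 0x00)
		--end;

	/* The file is too small to contain a valid Stream. */
	if (end < STREAM_TAIL_SIZE)
		return SIZE_MAX;

	return end - STREAM_TAIL_SIZE;
}
```

    The returned offset is the one to store for later use; the bytes
    from that offset up to the end of the non-padding data are what
    would be passed to the Stream tail decoder.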
    You may now want to do some internal checks, e.g. whether the Check
    type is supported by the liblzma build you are using.
    Decode the Backward Size field with lzma_vli_reverse_decode(). The
    field is at most LZMA_VLI_BYTES_MAX bytes long. Check that
    Backward Size is not zero. Store the offset of the first byte of
    the Backward Size; you will need it later.
    Now you know the Total Size of the last Block of the Stream. It's the
    value of Backward Size plus the size of the Backward Size field. Note
    that you cannot use lzma_vli_size() to calculate the size since there
    might be padding; you need to use the real observed size of the
    Backward Size field.
    At this point, the operation continues differently for Single-Block
    and Multi-Block Streams.
 y.x.1. Single-Block Stream
    There might be an Uncompressed Size field present in the Stream Footer.
    You cannot know it for sure unless you have already parsed the Block
    Header earlier. For security reasons, you probably want to try to
    decode the Uncompressed Size field, but you must not indicate any
    error if decoding fails. Later you can give the decoded Uncompressed
    Size to the Block decoder if Uncompressed Size isn't otherwise known;
    this prevents it from producing too much output in case of a (possibly
    intentionally) corrupt file.
    Calculate the start offset of the Stream:
        backward_offset - backward_size - LZMA_STREAM_HEADER_SIZE
    backward_offset is the offset of the first byte of the Backward Size
    field. Remember to check for integer overflows, which can occur with
    invalid input files.
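    A minimal sketch of this calculation with the overflow checks
    spelled out is below. With unsigned arithmetic the danger is
    wrap-around in the subtractions, so each one is guarded. The
    function name and the STREAM_HEADER_SIZE macro are made up for the
    example; real code would use LZMA_STREAM_HEADER_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the Stream Header as given in this document (11 bytes). */
#define STREAM_HEADER_SIZE 11

/*
 * Hypothetical helper: compute
 *     backward_offset - backward_size - STREAM_HEADER_SIZE
 * and store it to *stream_start. Returns false instead of wrapping
 * around if the values cannot come from a valid file.
 */
bool
single_block_stream_start(uint64_t backward_offset, uint64_t backward_size,
		uint64_t *stream_start)
{
	assert(stream_start != NULL);

	/* Backward Size must not be zero. */
	if (backward_size == 0)
		return false;

	/* The Block cannot begin before the beginning of the file. */
	if (backward_size > backward_offset)
		return false;

	const uint64_t block_start = backward_offset - backward_size;

	/* There must be room for the Stream Header before the Block. */
	if (block_start < STREAM_HEADER_SIZE)
		return false;

	*stream_start = block_start - STREAM_HEADER_SIZE;
	return true;
}
```

    If the function returns false, treat the file as corrupt and don't
    seek anywhere.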
    Seek to the beginning of the Stream. Decode the Stream Header using
    lzma_stream_header_decoder(). Verify that the decoded Stream Flags
    match the values found in the Stream tail. You can use the
    lzma_stream_flags_is_equal() macro for this.
    Decode the Block Header. Verify that it isn't a Metadata Block, since
    Single-Block Streams cannot have Metadata. If Uncompressed Size is
    present in the Block Header, the value you tried to decode from the
    Stream Footer must be ignored, since Uncompressed Size wasn't actually
    present there. If Block Header doesn't have Uncompressed Size, and
    decoding the Uncompressed Size field from the Stream Footer failed,
    the file is corrupt.
    If you were only looking for the Uncompressed Size of the Stream,
    you now have that information, and you can stop processing the Stream.
    To decode the Block, the same instructions apply as described in
    FIXME. However, because you have some extra known information decoded
    from the Stream Footer, you should give this information to the Block
    decoder so that it can verify it while decoding:
      - If Uncompressed Size is not present in the Block Header, set
        lzma_options_block.uncompressed_size to the value you decoded
        from the Stream Footer.
      - Always set lzma_options_block.total_size to backward_size +
        size_of_backward_size (you calculated this sum earlier already).
 y.x.2. Multi-Block Stream
    Calculate the start offset of the Footer Metadata Block:
        backward_offset - backward_size
    backward_offset is the offset of the first byte of the Backward Size
    field. Remember to check for integer overflows, which can occur with
    broken input files.
    Decode the Block Header. Verify that it is a Metadata Block. Set
    lzma_options_block.total_size to backward_size + size_of_backward_size
    (you calculated this sum earlier already). Then decode the Footer
    Metadata Block.
    Store the decoded Footer Metadata to lzma_info structure using
    lzma_info_set_metadata(). Set also the offset of the Backward Size
    field using lzma_info_size_set(). Then you can get the start offset
    of the Stream using lzma_info_size_get(). Note that any of these steps
    may fail so don't omit error checking.
    Seek to the beginning of the Stream. Decode the Stream Header using
    lzma_stream_header_decoder(). Verify that the decoded Stream Flags
    match the values found in the Stream tail. You can use the
    lzma_stream_flags_is_equal() macro for this.
    If you were only looking for the Uncompressed Size of the Stream,
    it's possible that you already have it now. If Uncompressed Size (or
    whatever information you were looking for) isn't available yet,
    continue by decoding also the Header Metadata Block. (If some
    information is missing, the Header Metadata Block has to be present.)
    Decoding the Data Blocks goes the same way as described in FIXME.
 y.x.3. Variations
    If you know the offset of the beginning of the Stream, you may want
    to parse the Stream Header before parsing the Stream tail.
--- a/doc/liblzma-hacking.txt
+++ b/doc/liblzma-hacking.txt
@@ -1,112 +0,0 @@
 Hacking liblzma
 ---------------
 0. Preface
    This document gives some overall information about the internals of
    liblzma, which should make it easier to start reading and modifying
    the code.
 1. Programming language
    liblzma was written in C99. If you use GCC, this means that you need
    at least GCC 3.x.x. GCC 2 isn't and won't be supported.
    Some GCC-specific extensions are used *conditionally*. They aren't
    required to build a full-featured library. Don't make the code rely
    on any non-standard compiler extensions or even C99 features that
    aren't portable between almost-C99 compatible compilers (for example
    non-static inlines).
    The public API headers are in C89. This is to avoid frustrating those
    who maintain programs that are strictly in C89 or C++.
    An assumption about sizeof(size_t) is made. If this assumption is
    wrong, some porting is probably needed:
        sizeof(uint32_t) <= sizeof(size_t) <= sizeof(uint64_t)
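    One way to make the build fail early when this assumption doesn't
    hold is a compile-time check. The sketch below uses the old
    negative-array-size trick so that it works with a plain C99
    compiler; the typedef name is made up, and I'm not claiming that
    liblzma itself contains such a check.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * If the assumption about sizeof(size_t) is wrong, the array size
 * below becomes -1, which is a compile error. Otherwise the typedef
 * is a harmless one-element char array type.
 */
typedef char size_t_assumption_holds[
		(sizeof(uint32_t) <= sizeof(size_t)
			&& sizeof(size_t) <= sizeof(uint64_t)) ? 1 : -1];
```

    With a C11 compiler the same condition could be given to
    _Static_assert instead.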
 2. Internal vs. external API
        Input                         Output
          v     Application             ^
          |     liblzma public API      |
          |     Stream coder            |
          |     Block coder             |
          |     Filter coder            |
          |     ...                     |
          v     Filter coder            ^
        Application
          `-- liblzma public API
                `-- Stream coder
                      |-- Stream info handler
                      |-- Stream Header coder
                      |-- Block Header coder
                      |     `-- Filter Flags coder
                      |-- Metadata coder
                      |     `-- Block coder
                      |           `-- Filter 0
                      |                 `-- Filter 1
                      |                     ...
                      |-- Data Block coder
                      |     `-- Filter 0
                      |           `-- Filter 1
                      |               ...
                      `-- Stream tail coder
 x. Designing new filters
    All filters must be designed so that the decoder cannot consume
    an arbitrary amount of input without producing any decoded output. Failing
    to follow this rule makes liblzma vulnerable to DoS attacks if
    untrusted files are decoded (usually they are untrusted).
    An example should clarify the reason behind this requirement: There
    are two filters in the chain. The decoder of the first filter produces
    a huge amount of output (many gigabytes or more) with a few bytes of
    input, which gets passed to the decoder of the second filter. If the
    data passed to the second filter is interpreted as something that
    produces no output (e.g. padding), the filter chain as a whole
    produces no output and consumes no input for a long period of time.
    The above problem was present in the first versions of the Subblock
    filter. A tiny .lzma file could have taken several years to decode
    while it wouldn't produce any output at all. The problem was fixed
    by adding limits on the number of consecutive Padding bytes, and requiring
    that some decoded output must be produced between Set Subfilter and
    Unset Subfilter.
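    The idea of limiting consecutive Padding can be illustrated with a
    toy decoder. Everything here is made up for the example: the "format"
    (a zero byte is padding, any other byte is copied to the output),
    the function name, and the limit of 31 bytes. It is not the Subblock
    filter and the real limit isn't stated in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative limit only; not the value used by any real filter. */
#define MAX_CONSECUTIVE_PADDING 31

/*
 * Toy decoder: 0x00 is a padding byte that produces no output, any
 * other byte is copied as is. Returns false if the input has more than
 * MAX_CONSECUTIVE_PADDING padding bytes in a row or if the output
 * buffer is too small. On success the output size is stored to
 * *out_size.
 */
bool
toy_decode(const uint8_t *in, size_t in_size,
		uint8_t *out, size_t out_capacity, size_t *out_size)
{
	assert(out_size != NULL);

	size_t padding_run = 0;
	size_t written = 0;

	for (size_t i = 0; i < in_size; ++i) {
		if (in[i] == 0x00) {
			/* Too much input consumed with no output: reject. */
			if (++padding_run > MAX_CONSECUTIVE_PADDING)
				return false;

			continue;
		}

		/* Producing output resets the padding counter. */
		padding_run = 0;

		if (written == out_capacity)
			return false;

		out[written++] = in[i];
	}

	*out_size = written;
	return true;
}
```

    The point is that the decoder cannot consume more than a bounded
    amount of input between two output bytes, no matter what the input
    contains.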
 x. Implementing new filters
    If the filter supports embedding End of Payload Marker, make sure that
    when your filter detects End of Payload Marker,
      - the usage of End of Payload Marker is actually allowed (i.e. End
        of Input isn't used); and
      - it also checks that there is no more input coming from the next
        filter in the chain.
    The second requirement is slightly tricky. It's possible that the next
    filter hasn't returned LZMA_STREAM_END yet. It may even need a few
    bytes more input before it will do so. You need to give it as much
    input as it needs, and verify that it doesn't produce any output.
    Don't call the next filter in the chain after it has returned
    LZMA_STREAM_END (except in encoder if action == LZMA_SYNC_FLUSH).
    It will result in undefined behavior.
    Be pedantic. If the input data isn't exactly valid, reject it.
    At the moment, liblzma isn't modular. You will need to edit several
    files in src/liblzma/common to include support for a new filter. grep
    for LZMA_FILTER_LZMA to locate the files needing changes.
--- a/doc/liblzma-intro.txt
+++ b/doc/liblzma-intro.txt
@@ -1,194 +0,0 @@
 Introduction to liblzma
 -----------------------
 Writing applications to work with liblzma
    liblzma API is split in several subheaders to improve readability and
    maintainability. The subheaders must not be #included directly. lzma.h
    requires that certain integer types and macros are available when
    the header is #included. On systems that have inttypes.h that conforms
    to C99, the following will work:
        #include <sys/types.h>
        #include <inttypes.h>
        #include <lzma.h>
    Those who have used zlib should find liblzma's API easy to use.
    To developers who haven't used zlib before, I recommend learning
    zlib first, because zlib has excellent documentation.
    While the API is similar to that of zlib, there are some major
    differences, which are summarized below.
    For basic stream encoding, zlib has three functions (deflateInit(),
    deflate(), and deflateEnd()). Similarly, there are three functions
    for stream decoding (inflateInit(), inflate(), and inflateEnd()).
    liblzma has only a single coding function and a single ending
    function. Thus, to encode one may use, for example,
    lzma_stream_encoder_single(), lzma_code(), and lzma_end(). Similarly
    for decoding, one may use lzma_auto_decoder(), lzma_code(), and
    lzma_end().
    zlib has deflateReset() and inflateReset() to reset the stream
    structure without reallocating all the memory. In liblzma, all
    coder initialization functions are like zlib's reset functions:
    the first-time initializations are done with the same functions
    as the reinitializations (resetting).
    To make all this work, liblzma needs to know when lzma_stream
    doesn't already point to an allocated and initialized coder.
    This is achieved by initializing lzma_stream structure with
    LZMA_STREAM_INIT (static initialization) or LZMA_STREAM_INIT_VAR
    (for example when a new lzma_stream has been allocated with malloc()).
    This initialization should be done exactly once per lzma_stream
    structure to avoid leaking memory. Calling lzma_end() will leave
    lzma_stream in a state comparable to the state achieved with
    LZMA_STREAM_INIT and LZMA_STREAM_INIT_VAR.
    An example probably clarifies a lot. With zlib, compression goes
    roughly like this:
        z_stream strm;
        deflateInit(&strm, level);
        deflate(&strm, Z_NO_FLUSH);
        deflate(&strm, Z_NO_FLUSH);
        ...
        deflate(&strm, Z_FINISH);
        deflateEnd(&strm) or deflateReset(&strm)
    With liblzma, it's slightly different:
        lzma_stream strm = LZMA_STREAM_INIT;
        lzma_stream_encoder_single(&strm, &options);
        lzma_code(&strm, LZMA_RUN);
        lzma_code(&strm, LZMA_RUN);
        ...
        lzma_code(&strm, LZMA_FINISH);
        lzma_end(&strm) or reinitialize for new coding work
    Reinitialization in the last step can be any function that can
    initialize lzma_stream; it doesn't need to be the same function
    that was used for the previous initialization. If it is the same
    function, liblzma will usually be able to re-use most of the
    existing memory allocations (depends on how much the initialization
    options change). If you reinitialize with a different function,
    liblzma will automatically free the memory of the previous coder.
 File formats
    liblzma supports multiple container formats for the compressed data.
    Different initialization functions initialize the lzma_stream to
    process different container formats. See the details from the public
    header files.
    The following functions are the most commonly used:
      - lzma_stream_encoder_single(): Encodes a Single-Block Stream; this
        is the recommended format for most purposes.
      - lzma_alone_encoder(): Useful if you need to encode into the
        legacy LZMA_Alone format.
      - lzma_auto_decoder(): Decoder that automatically detects the
        file format; recommended when you decode compressed files on
        disk, because this way compatibility with the legacy LZMA_Alone
        format is transparent.
      - lzma_stream_decoder(): Decoder for Single- and Multi-Block
        Streams; this is good if you want to accept only .lzma Streams.
 Filters
    liblzma supports multiple filters (algorithm implementations). The new
    .lzma format supports filter chains of up to seven filters. In the
    filter chain, the output of one filter is the input of the next filter in
    the chain. The legacy LZMA_Alone format supports only one filter, and
    that must always be LZMA.
        General-purpose compression:
            LZMA        The main algorithm of liblzma (surprise!)
        Branch/Call/Jump filters for executables:
            x86         This filter is known as BCJ in 7-Zip
            IA64        IA-64 (Itanium)
            PowerPC     Big endian PowerPC
            ARM
            ARM-Thumb
            SPARC
        Other filters:
            Copy        Dummy filter that simply copies all the data
                        from input to output.
            Subblock    Multi-purpose filter that can
                          - embed End of Payload Marker if the previous
                            filter in the chain doesn't support it; and
                          - apply Subfilters, which filter only part
                            of the same compressed Block in the Stream.
    Branch/Call/Jump filters never change the size of the data. They
    should usually be used as a pre-filter for some compression filter
    like LZMA.
 Integrity checks
    The .lzma Stream format uses CRC32 as the integrity check for
    different file format headers. It is possible to omit CRC32 from
    the Block Headers, but not from Stream Header. This is the reason
    why CRC32 code cannot be disabled when building liblzma (in addition,
    the LZMA encoder uses CRC32 for hashing, so that's another reason).
    The integrity check of the actual data is calculated from the
    uncompressed data. This check can be CRC32, CRC64, or SHA256.
    It can also be omitted completely, although that usually is not
    a good thing to do. There are free IDs left, so support for new
    check algorithms can be added later.
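    For readers who haven't seen one, here is a small bitwise CRC32 in
    C. I'm assuming the usual IEEE 802.3 polynomial (the same CRC32 that
    zlib calculates), which this text doesn't spell out. The function
    name is made up, and this is not how liblzma implements it; a real
    implementation would use lookup tables for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC32 with the reflected IEEE 802.3 polynomial 0xEDB88320.
 * Pass 0 as `crc` for the first call, and the previous return value
 * when continuing the calculation over more data.
 */
uint32_t
crc32_bitwise(const uint8_t *buf, size_t size, uint32_t crc)
{
	assert(buf != NULL || size == 0);

	crc = ~crc;

	for (size_t i = 0; i < size; ++i) {
		crc ^= buf[i];

		/* Process one bit at a time, least significant bit first. */
		for (int bit = 0; bit < 8; ++bit)
			crc = (crc & 1)
					? (crc >> 1) ^ UINT32_C(0xEDB88320)
					: crc >> 1;
	}

	return ~crc;
}
```

    Since the check is calculated from the uncompressed data, a decoder
    would feed each chunk of its output to a function like this and
    compare the final value against the stored one.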
 API and ABI stability
    The API and ABI of liblzma aren't stable yet, although no huge
    changes should happen. One potential place for change is the
    lzma_options_subblock structure.
    In the 4.42.0alpha phase, the shared library version number won't
    be updated even if ABI breaks. I don't want to track the ABI changes
    yet. Just rebuild everything when you upgrade liblzma until we get
    to the beta stage.
 Size of the library
    While liblzma isn't huge, it is quite far from the smallest possible
    LZMA implementation: full liblzma binary (with support for all
    filters and other features) is way over 100 KiB, but the plain raw
    LZMA decoder is only 5-10 KiB.
    To decrease the size of the library, you can omit parts of the library
    by passing certain options to the `configure' script. Disabling
    everything but the decoders of the required filters will usually give
    you a small enough library, but if you need a decoder for example
    embedded in the operating system kernel, the code from liblzma probably
    isn't suitable as is.
    If you need a minimal implementation supporting .lzma Streams, you
    may need to do a partial rewrite. liblzma uses a stateful API like zlib.
    That increases the size of the library. Using a callback API or an even
    simpler buffer-to-buffer API would allow a smaller implementation.
    The LZMA SDK contains a smaller LZMA decoder, written in ANSI C, than
    liblzma does, so you may want to take a look at that code. However,
    it doesn't (at least not yet) support the new .lzma Stream format.
 Documentation
    There's no other documentation than the public headers and this
    text yet. Real docs will be written some day, I hope.
--- a/doc/liblzma-security.txt
+++ b/doc/liblzma-security.txt
@@ -1,219 +0,0 @@
 Using liblzma securely
 ----------------------
 0. Introduction
    This document discusses how to use liblzma securely. There are issues
    that don't apply to zlib or libbzip2, so reading this document is
    strongly recommended even for those who are very familiar with zlib
    or libbzip2.
    While making liblzma itself as secure as possible is essential, it's
    out of scope of this document.
 1. Memory usage
    The memory usage of liblzma varies a lot.
 1.1. Problem sources
 1.1.1. Block coder
    The memory requirements of Block encoder depend on the used filters
    and their settings. The memory requirements of the Block decoder
    depend on which filters and which filter settings the Block was
    encoded with. Usually the memory requirements of a decoder are equal
    to or less than the requirements of the encoder with the same settings.
    While the typical memory requirements to decode a Block are from a few
    hundred kilobytes to tens of megabytes, a maliciously constructed
    file can require a lot more RAM to decode. With the current filters,
    the maximum amount is about 7 GiB. If you use multi-threaded decoding,
    every Block can require this amount of RAM, thus a four-threaded
    decoder could suddenly try to allocate 28 GiB of RAM.
    If you don't limit the maximum memory usage in any way, and there are
    no resource limits set on the operating system side, one malicious
    input file can run the system out of memory, or at least make it swap
    badly for a long time. This is exceptionally bad on servers, e.g.
    an email server doing virus scanning on incoming messages.
 1.1.2. Metadata decoder
    Multi-Block .lzma files contain at least one Metadata Block.
    Externally the Metadata Blocks are similar to Data Blocks, so all
    the issues mentioned about memory usage of Data Blocks apply to
    Metadata Blocks too.
    The uncompressed content of Metadata Blocks contains information about
    the Stream as a whole, and optionally some Extra Records. The
    information about the Stream is kept in liblzma's internal data
    structures in RAM. Extra Records can contain arbitrary data. They are
    not interpreted by liblzma, but liblzma will provide them to the
    application in uninterpreted form if the application wishes so.
    Usually the Uncompressed Size of a Metadata Block is small. Even in
    extreme cases, it shouldn't be much bigger than a few megabytes. Once
    the Metadata has been parsed into native data structures in liblzma,
    it usually takes a little more memory than in the encoded form. For
    all normal files, this is no problem, since the resulting memory usage
    won't be too much.
    The problem is that a maliciously constructed Metadata Block can
    contain a huge amount of "information", which liblzma will try to store
    in its internal data structures. This may cause liblzma to allocate
    all the available RAM unless some kind of resource usage limits are
    applied.
    Note that the Extra Records in Metadata are always parsed, but
    memory is allocated for them only if the application has requested
    liblzma to provide the Extra Records to the application.
 1.2. Solutions
    If you need to decode files from untrusted sources (most people do),
    you must limit the memory usage to avoid denial of service (DoS)
    conditions caused by malicious input files.
    The first step is to find out how much memory you are allowed to
    consume at most. This may be a hardcoded constant or derived from the
    available RAM; whatever is appropriate in the application.
    The simplest solution is to use setrlimit() if the kernel supports
    RLIMIT_AS, which limits the memory usage of the whole process.
    For more portable and fine-grained limiting, you can use
    memory limiter functions found in <lzma/memlimit.h>.
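    The setrlimit() approach could look roughly like this on a POSIX
    system that provides RLIMIT_AS. Both function names are made up for
    this sketch. Note that the limit applies to the address space of
    the whole process, not only to liblzma.

```c
#include <sys/resource.h>

/* Never ask for more than the hard limit allows. */
rlim_t
clamp_to_hard_limit(rlim_t requested, rlim_t hard)
{
	if (hard != RLIM_INFINITY && requested > hard)
		return hard;

	return requested;
}

/*
 * Set the soft RLIMIT_AS limit of the calling process to at most
 * max_bytes. Returns 0 on success and -1 on error, in which case
 * errno has been set by getrlimit() or setrlimit().
 */
int
limit_address_space(rlim_t max_bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_AS, &rl) != 0)
		return -1;

	/* Only the soft limit is changed; the hard limit is kept. */
	rl.rlim_cur = clamp_to_hard_limit(max_bytes, rl.rlim_max);

	return setrlimit(RLIMIT_AS, &rl);
}
```

    Call limit_address_space() once before starting to decode
    untrusted input. After that, allocations beyond the limit fail, so
    the decoder reports a memory error instead of exhausting the
    system.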
 1.2.1. Encoder
    lzma_memory_usage() will give you a rough estimate about the memory
    usage of the given filter chain. To dramatically simplify the internal
    implementation, this function doesn't take into account all the small
    helper data structures needed in various places; only the structures
    with significant memory usage are taken into account. Still, the
    accuracy of this function should be well within a mebibyte.
    The Subblock filter is a special case. If a Subfilter has been
    specified, it isn't taken into account when lzma_memory_usage()
    calculates the memory usage. You need to calculate the memory usage
    of the Subfilter separately.
    Keeping track of Blocks in a Multi-Block Stream takes a few dozen
    bytes of RAM per Block (size of the lzma_index structure plus overhead
    of malloc()). It isn't a good idea to put tens of thousands of Blocks
    into a Stream unless you have a very good reason to do so (compressed
    dictionary could be an example of such situation).
    Also keep the number and sizes of Extra Records sane. If you produce
    the list of Extra Records automatically from some untrusted source,
    you should not only validate the content of these Records, but also
    their memory usage.
 1.2.2. Decoder
    A single-threaded decoder should simply use a memory limiter and
    indicate an error if it runs out of memory.
    Memory-limiting with multi-threaded decoding is tricky. The simple
    solution is to divide the maximum allowed memory usage by the
    maximum number of threads, and give each Block decoder its own
    independent lzma_memory_limiter. The drawback is that if one Block
    needs notably more RAM than any other Block, the decoder will run out
    of memory when in reality there would be plenty of free RAM.
    An attractive alternative would be using a shared lzma_memory_limiter.
    Depending on the application and the expected type of input, this may
    either be the best solution or a source of hard-to-repeat problems.
    Consider the following requirements:
      - You use a maximum of n threads.
      - x(i) is the decoder memory requirement of Block number i
        in an expected input Stream.
      - The memory limiter is set to a higher value than the sum of
        the n highest values of x(i).
    (If you are better at explaining the above conditions, please
    contribute your improved version.)
    If the above conditions aren't met, it is possible that the decoding
    will fail unpredictably. That is, on the same machine using the same
    settings, the decoding may sometimes succeed and sometimes fail. This
    is because the threads may sometimes be scheduled so that the Blocks
    with the highest memory usage are decoded at the same time.
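    The conditions above can be sketched as a small check. This is only
    an illustration: shared_limit_is_safe() is a hypothetical helper, not
    a liblzma function, and it assumes the application already knows the
    per-Block decoder memory requirements x(i) of an expected input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: orders uint64_t values from highest to lowest. */
static int cmp_desc(const void *a, const void *b)
{
    const uint64_t x = *(const uint64_t *)a;
    const uint64_t y = *(const uint64_t *)b;
    return (x < y) - (x > y);
}

/*
 * Hypothetical helper (not part of liblzma).
 *
 * usage[i] is x(i), the decoder memory requirement of Block i.
 * Returns true if `limit` is higher than the sum of the `threads`
 * highest values in usage[], that is, if a shared limit cannot be
 * exceeded no matter which Blocks happen to be decoded together.
 */
static bool shared_limit_is_safe(const uint64_t *usage, size_t blocks,
        size_t threads, uint64_t limit)
{
    assert(usage != NULL || blocks == 0);

    if (blocks == 0)
        return true;

    /* Sort a copy so that the caller's array is left untouched. */
    uint64_t *sorted = malloc(blocks * sizeof(*sorted));
    if (sorted == NULL)
        return false; /* Be conservative if we cannot check. */

    memcpy(sorted, usage, blocks * sizeof(*sorted));
    qsort(sorted, blocks, sizeof(*sorted), cmp_desc);

    /* At most `blocks` Blocks can be decoded at the same time. */
    const size_t n = threads < blocks ? threads : blocks;

    uint64_t sum = 0;
    bool safe = true;
    for (size_t i = 0; i < n; ++i) {
        if (sorted[i] > UINT64_MAX - sum) {
            safe = false; /* The sum would overflow. */
            break;
        }
        sum += sorted[i];
    }

    if (safe)
        safe = limit > sum;

    free(sorted);
    return safe;
}
```

    For example, with x = { 10, 40, 20, 30 } and two threads, the two
    highest values sum to 70, so a shared limit of 71 is safe while a
    limit of 70 is not.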
    Most .lzma files have all the Blocks encoded with identical settings,
    or at least the memory usage won't vary dramatically. That's why most
    multi-threaded decoders probably want to use the simple "separate
    lzma_memory_limiter for each thread" solution, possibly falling back
    to single-threaded mode in case the per-thread memory limits aren't
    enough in multi-threaded mode.
 FIXME: Memory usage of Stream info.
 [
 ]
 2. Huge uncompressed output
 2.1. Data Blocks
    Decoding a tiny .lzma file can produce a huge amount of uncompressed
    output. There is an example file of 45 bytes, which decodes to 64 PiB
    (that's 2^56 bytes). Uncompressing such a file to disk is likely to
    fill even a large disk array. If the data is written to a pipe, it
    may not fill the disk, but would still take a very long time to finish.
    To avoid denial of service conditions caused by a huge amount of
    uncompressed output, applications using liblzma should use some method
    to limit the amount of output produced. The exact method depends on
    the application.
    All valid .lzma Streams make it possible to find out the uncompressed
    size of the Stream without actually uncompressing the data. This
    information is available in at least one of the Metadata Blocks.
    Once the uncompressed size is parsed, the decoder can verify that
    it doesn't exceed certain limits (e.g. available disk space).
    When the uncompressed size is known, the decoder can actively keep
    track of the amount of output produced so far, and check that it
    doesn't exceed the known uncompressed size. If it does, the file is
    known to be corrupt and an error should be indicated without
    continuing to decode the rest of the file.
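    The two checks described above could look roughly like this. The
    output_tracker type and its functions are made up for this sketch and
    are not part of liblzma; the application is assumed to have parsed
    the uncompressed size already and to pick its own limit (e.g. the
    available disk space).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping structure (not part of liblzma). */
typedef struct {
    uint64_t expected; /* Uncompressed size parsed from the Stream */
    uint64_t produced; /* Output produced so far */
} output_tracker;

/*
 * Starts tracking. Returns false if the announced uncompressed size
 * already exceeds the application's own limit, in which case decoding
 * shouldn't even be started.
 */
static bool output_tracker_init(output_tracker *t, uint64_t expected,
        uint64_t limit)
{
    assert(t != NULL);

    if (expected > limit)
        return false;

    t->expected = expected;
    t->produced = 0;
    return true;
}

/*
 * Records `amount` bytes of new output. Returns false if the total
 * would exceed the known uncompressed size, which means that the file
 * is corrupt and decoding should be stopped.
 */
static bool output_tracker_add(output_tracker *t, uint64_t amount)
{
    assert(t != NULL);
    assert(t->produced <= t->expected);

    /* Written this way to avoid overflow in produced + amount. */
    if (amount > t->expected - t->produced)
        return false;

    t->produced += amount;
    return true;
}
```

    The application would call output_tracker_add() every time the
    decoder has produced more output, before writing that output
    anywhere.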
    Unfortunately, finding the uncompressed size beforehand is often
    possible only in non-streamed mode, because the needed information
    could be in the Footer Metadata Block, which (obviously) is at the
    end of the Stream. In purely streamed mode decoding, one may need to
    use some rough arbitrary limits to prevent the problems described in
    the beginning of this section.
 2.2. Metadata
    Metadata is stored in Metadata Blocks, which are very similar to
    Data Blocks. Thus, the uncompressed size can be huge just like with
    Data Blocks. The difference is that the contents of Metadata Blocks
    aren't given to the application as is, but parsed by liblzma. Still,
    reading through a huge Metadata Block can take a very long time,
    effectively creating a denial of service, just as piping a decoded
    Data Block to another process would.
    At first it would seem that using a memory limiter would prevent
    this issue as a side effect. But it does so only if the application
    requests liblzma to allocate the Extra Records and provide them to
    the application. If Extra Records aren't requested, they aren't
    allocated either. Still, the Extra Records are being read through
    to validate that the Metadata is in proper format.
    The solution is to limit the Uncompressed Size of a Metadata Block
    to some relatively large value. This will make liblzma return an
    error when the given limit is reached.
--- a/doc/lzma-intro.txt
+++ b/doc/lzma-intro.txt
@ -1,107 +0,0 @@
 Introduction to the lzma command line tool
 ------------------------------------------
 Overview
    The lzma command line tool is similar to gzip and bzip2, but for
    compressing and uncompressing .lzma files.
 Supported file formats
    By default, the tool creates files in the new .lzma format. This can
    be overridden with the --format=FMT command line option. Use
    --format=alone to create files in the old LZMA_Alone format.
    By default, the tool uncompresses both the new .lzma format and
    LZMA_Alone format. This is to make it transparent to switch from
    the old LZMA_Alone format to the new .lzma format. Since both
    formats use the same filename suffix, the average user should never
    notice which format was used.
 Differences to gzip and bzip2
  Standard input and output
    Both gzip and bzip2 refuse to write compressed data to a terminal and
    read compressed data from a terminal. With gzip (but not with bzip2),
    this can be overridden with the `--force' option. lzma follows the
    behavior of gzip here.
  Usage of LZMA_OPT environment variable
    gzip and bzip2 read the GZIP and BZIP2 environment variables at
    startup. These variables may contain extra command line options.
    gzip and bzip2 allow passing not only options, but also the
    end-of-options indicator (`--') and filenames via the environment
    variable. No quoting is supported with the filenames.
    Here are examples with gzip. bzip2 behaves identically.
        bash$ echo asdf > 'foo bar'
        bash$ GZIP='"foo bar"' gzip
        gzip: "foo: No such file or directory
        gzip: bar": No such file or directory
        bash$ GZIP=-- gzip --help
        gzip: --help: No such file or directory
    lzma silently ignores all non-option arguments given via the
    environment variable LZMA_OPT. Like on the command line, everything
    after `--' is taken as non-options, and thus ignored in LZMA_OPT.
        bash$ LZMA_OPT='--help' lzma --version     # Displays help
        bash$ LZMA_OPT='-- --help' lzma --version  # Displays version
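    The filtering described above can be sketched as follows. This isn't
    the code used by the lzma tool; collect_env_options() is a made-up
    name, and the sketch assumes that the value is split on spaces and
    tabs without any quoting. It also drops option arguments that are
    given as separate words, which a real implementation would have to
    handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from the lzma tool).
 *
 * Splits `env` in place on spaces and tabs and stores pointers to the
 * words that look like options into opts[]. Non-option words are
 * silently skipped, and `--' ends the parsing so that everything
 * after it is ignored. Returns the number of options stored.
 */
static size_t collect_env_options(char *env, char **opts, size_t max_opts)
{
    assert(env != NULL);
    assert(opts != NULL);

    size_t count = 0;
    char *p = env;

    while (*p != '\0') {
        /* Skip the whitespace before the next word. */
        p += strspn(p, " \t");
        if (*p == '\0')
            break;

        /* Terminate the word in place. */
        char *word = p;
        p += strcspn(p, " \t");
        if (*p != '\0')
            *p++ = '\0';

        /* Everything after `--' would be non-options. */
        if (strcmp(word, "--") == 0)
            break;

        /* Keep only words like -9 or --help; a lone "-" isn't one. */
        if (word[0] == '-' && word[1] != '\0' && count < max_opts)
            opts[count++] = word;
    }

    return count;
}
```

    With the string "--help" this stores one option, and with
    "-- --help" it stores none, which matches the two examples above.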
 Filter chain presets
    Like gzip and bzip2, lzma supports numbered presets from 1 to 9
    where 1 is the fastest and 9 the best compression. 1 and 2 are for
    fast compressing with small memory usage, 3 to 6 for good compression
    ratio with medium memory usage, and 7 to 9 for excellent compression
    ratio with higher memory requirements. The default is 7 if the memory
    usage limit allows.
    In the future, there will probably be an option like --preset=NAME,
    which will provide more specialized presets for specific file types.
    It's also possible that there will be some heuristics to select good
    filters. For example, the tool could detect when a .tar archive is
    being compressed, and enable the x86 filter only for those files in the
    .tar archive that are ELF or PE executables for x86.
 Specifying custom filter chains
    Custom filter chains are specified by using long options with the names
    of the filters in the correct order. For example, to pass the input data to
    the x86 filter and the output of that to the LZMA filter, the following
    command will do:
        lzma --x86 --lzma filename
    Some filters accept options, which are specified as a comma-separated
    list of key=value pairs:
        lzma --delta=distance=4 --lzma=dict=4Mi,lc=8,lp=2 filename
 Memory usage control
    By default, the command line tool limits memory usage to 1/3 of the
    available physical RAM. If no preset or custom filter chain has been
    given, the default preset will be used. If the memory limit is too
    low for the default preset, the tool will silently switch to a lower
    preset.
    When a preset or a custom filter chain has been specified and the
    memory limit is too low, an error message is displayed and no files
    are processed.
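    The encoder-side rules above can be summarized with a small sketch.
    choose_preset() and the usage[] table are hypothetical: this document
    doesn't list the memory usage of each preset, so the caller has to
    supply it. Returning an error when even preset 1 doesn't fit is also
    an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    PRESET_MIN = 1,
    PRESET_MAX = 9,
    PRESET_DEFAULT = 7
};

/*
 * Hypothetical helper (not taken from the lzma tool).
 *
 * usage[p] is the encoder memory usage of preset p; index 0 is unused.
 * `requested` is the preset given by the user, or 0 if none was given.
 *
 * Returns the preset to use, or 0 if an error message should be
 * displayed instead of processing any files.
 */
static int choose_preset(const uint64_t usage[PRESET_MAX + 1],
        int requested, uint64_t limit)
{
    assert(usage != NULL);
    assert(requested == 0
            || (requested >= PRESET_MIN && requested <= PRESET_MAX));

    /* An explicit preset is never changed silently. */
    if (requested != 0)
        return usage[requested] <= limit ? requested : 0;

    /* No preset given: start from the default and go lower. */
    for (int p = PRESET_DEFAULT; p >= PRESET_MIN; --p)
        if (usage[p] <= limit)
            return p;

    /* Assumption: nothing fits, so treat it as an error. */
    return 0;
}
```

    A custom filter chain would be handled like an explicit preset:
    its memory usage is compared against the limit and an error is
    displayed if it doesn't fit.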
    If the decoder hits the memory usage limit, an error is displayed and
    no more files are processed.