'\" t
.\"
.\" Author: Lasse Collin
.\"
.\" This file has been put into the public domain.
.\" You can do whatever you want with this file.
.\"
.TH XZ 1 "2020-11-01" "Tukaani" "XZ Utils"
.
.SH NAME
xz, unxz, xzcat, lzma, unlzma, lzcat \- Compress or decompress .xz and .lzma files
.
.SH SYNOPSIS
.B xz
.RI [ option... ]
.RI [ file... ]
.
.SH COMMAND ALIASES
.B unxz
is equivalent to
.BR "xz \-\-decompress" .
.br
.B xzcat
is equivalent to
.BR "xz \-\-decompress \-\-stdout" .
.br
.B lzma
is equivalent to
.BR "xz \-\-format=lzma" .
.br
.B unlzma
is equivalent to
.BR "xz \-\-format=lzma \-\-decompress" .
.br
.B lzcat
is equivalent to
.BR "xz \-\-format=lzma \-\-decompress \-\-stdout" .
.PP
When writing scripts that need to decompress files,
it is recommended to always use the name
.B xz
with appropriate arguments
.RB ( "xz \-d"
or
.BR "xz \-dc" )
instead of the names
.B unxz
and
.BR xzcat .
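.PP
As a minimal sketch of that recommendation,
a script might decompress to a pipe or decompress while keeping the source file.
The filenames are placeholders chosen for illustration:
.RS
.PP
.nf
.ft CW
xz \-dc archive.tar.xz | tar xf \-
xz \-dk data.xz
.ft R
.fi
.RE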
.
.SH DESCRIPTION
.B xz
is a general-purpose data compression tool with
command line syntax similar to
.BR gzip (1)
and
.BR bzip2 (1).
The native file format is the
.B .xz
format, but the legacy
.B .lzma
format used by LZMA Utils and
raw compressed streams with no container format headers
are also supported.
.PP
.B xz
compresses or decompresses each
.I file
according to the selected operation mode.
If no
.I files
are given or
.I file
is
.BR \- ,
.B xz
reads from standard input and writes the processed data
to standard output.
.B xz
will refuse (display an error and skip the
.IR file )
to write compressed data to standard output if it is a terminal.
Similarly,
.B xz
will refuse to read compressed data
from standard input if it is a terminal.
.PP
Unless
.B \-\-stdout
is specified,
.I files
other than
.B \-
are written to a new file whose name is derived from the source
.I file
name:
.IP \(bu 3
When compressing, the suffix of the target file format
.RB ( .xz
or
.BR .lzma )
is appended to the source filename to get the target filename.
.IP \(bu 3
When decompressing, the
.B .xz
or
.B .lzma
suffix is removed from the filename to get the target filename.
.B xz
also recognizes the suffixes
.B .txz
and
.BR .tlz ,
and replaces them with the
.B .tar
suffix.
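.PP
For illustration, assuming files named
.I foo
and
.I bar.txz
exist in the current directory,
the naming rules above would work out as follows:
.RS
.PP
.nf
.ft CW
xz foo            # creates foo.xz and removes foo
xz \-d bar.txz     # creates bar.tar and removes bar.txz
.ft R
.fi
.RE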
.PP
If the target file already exists, an error is displayed and the
.I file
is skipped.
.PP
Unless writing to standard output,
.B xz
will display a warning and skip the
.I file
if any of the following applies:
.IP \(bu 3
.I File
is not a regular file.
Symbolic links are not followed,
and thus they are not considered to be regular files.
.IP \(bu 3
.I File
has more than one hard link.
.IP \(bu 3
.I File
has the setuid, setgid, or sticky bit set.
.IP \(bu 3
The operation mode is set to compress and the
.I file
already has a suffix of the target file format
.RB ( .xz
or
.B .txz
when compressing to the
.B .xz
format, and
.B .lzma
or
.B .tlz
when compressing to the
.B .lzma
format).
.IP \(bu 3
The operation mode is set to decompress and the
.I file
doesn't have a suffix of any of the supported file formats
.RB ( .xz ,
.BR .txz ,
.BR .lzma ,
or
.BR .tlz ).
.PP
After successfully compressing or decompressing the
.IR file ,
.B xz
copies the owner, group, permissions, access time,
and modification time from the source
.I file
to the target file.
If copying the group fails, the permissions are modified
so that the target file doesn't become accessible to users
who didn't have permission to access the source
.IR file .
.B xz
doesn't support copying other metadata like access control lists
or extended attributes yet.
.PP
Once the target file has been successfully closed, the source
.I file
is removed unless
.B \-\-keep
was specified.
The source
.I file
is never removed if the output is written to standard output.
.PP
Sending
.B SIGINFO
or
.B SIGUSR1
to the
.B xz
process makes it print progress information to standard error.
This has only limited use since when standard error
is a terminal, using
.B \-\-verbose
will display an automatically updating progress indicator.
.
.SS "Memory usage"
The memory usage of
.B xz
varies from a few hundred kilobytes to several gigabytes
depending on the compression settings.
The settings used when compressing a file determine
the memory requirements of the decompressor.
Typically the decompressor needs 5\ % to 20\ % of
the amount of memory that the compressor needed when
creating the file.
For example, decompressing a file created with
.B xz \-9
currently requires 65\ MiB of memory.
Still, it is possible to have
.B .xz
files that require several gigabytes of memory to decompress.
.PP
Users of older systems in particular may find
the possibility of very large memory usage annoying.
To prevent uncomfortable surprises,
.B xz
has a built-in memory usage limiter, which is disabled by default.
While some operating systems provide ways to limit
the memory usage of processes, relying on them
wasn't deemed to be flexible enough (for example, using
.BR ulimit (1)
to limit virtual memory tends to cripple
.BR mmap (2)).
.PP
The memory usage limiter can be enabled with
the command line option \fB\-\-memlimit=\fIlimit\fR.
Often it is more convenient to enable the limiter
by default by setting the environment variable
.BR XZ_DEFAULTS ,
for example,
.BR XZ_DEFAULTS=\-\-memlimit=150MiB .
It is possible to set the limits separately
for compression and decompression
by using \fB\-\-memlimit\-compress=\fIlimit\fR and
\fB\-\-memlimit\-decompress=\fIlimit\fR.
Using these two options outside
.B XZ_DEFAULTS
is rarely useful because a single run of
.B xz
cannot do both compression and decompression and
.BI \-\-memlimit= limit
(or \fB\-M\fR \fIlimit\fR)
is shorter to type on the command line.
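.PP
A sketch of how separate limits might be set in a POSIX shell
initialization file.
The limit values are arbitrary examples, not recommendations:
.RS
.PP
.nf
.ft CW
XZ_DEFAULTS="\-\-memlimit\-compress=70% \-\-memlimit\-decompress=150MiB"
export XZ_DEFAULTS
.ft R
.fi
.RE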
.PP
If the specified memory usage limit is exceeded when decompressing,
.B xz
will display an error and decompressing the file will fail.
If the limit is exceeded when compressing,
.B xz
will try to scale the settings down so that the limit
is no longer exceeded (except when using \fB\-\-format=raw\fR
or \fB\-\-no\-adjust\fR).
This way the operation won't fail unless the limit is very small.
The scaling of the settings is done in steps that don't
match the compression level presets.
For example, if the limit is
only slightly less than the amount required for
.BR "xz \-9" ,
the settings will be scaled down only a little,
not all the way down to
.BR "xz \-8" .
.
.SS "Concatenation and padding with .xz files"
It is possible to concatenate
.B .xz
files as is.
.B xz
will decompress such files as if they were a single
.B .xz
file.
.PP
It is possible to insert padding between the concatenated parts
or after the last part.
The padding must consist of null bytes and the size
of the padding must be a multiple of four bytes.
This can be useful, for example, if the
.B .xz
file is stored on a medium that measures file sizes
in 512-byte blocks.
.PP
Concatenation and padding are not allowed with
.B .lzma
files or raw streams.
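.PP
A small sketch of concatenation, using the hypothetical input files
.I a.txt
and
.IR b.txt :
.RS
.PP
.nf
.ft CW
xz \-c a.txt > both.xz
xz \-c b.txt >> both.xz
xz \-dc both.xz
.ft R
.fi
.RE
.PP
The last command should write the contents of
.I a.txt
followed by those of
.I b.txt
to standard output.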
.
.SH OPTIONS
.
.SS "Integer suffixes and special values"
In most places where an integer argument is expected,
an optional suffix is supported to easily indicate large integers.
There must be no space between the integer and the suffix.
.TP
.B KiB
Multiply the integer by 1,024 (2^10).
.BR Ki ,
.BR k ,
.BR kB ,
.BR K ,
and
.B KB
are accepted as synonyms for
.BR KiB .
.TP
.B MiB
Multiply the integer by 1,048,576 (2^20).
.BR Mi ,
.BR m ,
.BR M ,
and
.B MB
are accepted as synonyms for
.BR MiB .
.TP
.B GiB
Multiply the integer by 1,073,741,824 (2^30).
.BR Gi ,
.BR g ,
.BR G ,
and
.B GB
are accepted as synonyms for
.BR GiB .
.PP
The special value
.B max
can be used to indicate the maximum integer value
supported by the option.
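.PP
As an illustration of the suffixes above,
since 150 \(mu 1,048,576 = 157,286,400,
the following two options should be equivalent:
.RS
.PP
.nf
.ft CW
\-\-memlimit=150MiB
\-\-memlimit=157286400
.ft R
.fi
.RE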
.
.SS "Operation mode"
If multiple operation mode options are given,
the last one takes effect.
.TP
.BR \-z ", " \-\-compress
Compress.
This is the default operation mode when no operation mode option
is specified and no other operation mode is implied by
the command name (for example,
.B unxz
implies
.BR \-\-decompress ).
.TP
.BR \-d ", " \-\-decompress ", " \-\-uncompress
Decompress.
.TP
.BR \-t ", " \-\-test
Test the integrity of compressed
.IR files .
This option is equivalent to
.B "\-\-decompress \-\-stdout"
except that the decompressed data is discarded instead of being
written to standard output.
No files are created or removed.
.TP
.BR \-l ", " \-\-list
Print information about compressed
.IR files .
No uncompressed output is produced,
and no files are created or removed.
In list mode, the program cannot read
the compressed data from standard
input or from other unseekable sources.
.IP ""
The default listing shows basic information about
.IR files ,
one file per line.
To get more detailed information, also use the
.B \-\-verbose
option.
For even more information, use
.B \-\-verbose
twice, but note that this may be slow, because getting all the extra
information requires many seeks.
The width of verbose output exceeds
80 characters, so piping the output to, for example,
.B "less\ \-S"
may be convenient if the terminal isn't wide enough.
.IP ""
The exact output may vary between
.B xz
versions and different locales.
For machine-readable output,
.B \-\-robot \-\-list
should be used.
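.IP ""
For instance, with a placeholder file named
.IR foo.xz ,
the listing modes described above could be invoked like this:
.RS
.PP
.nf
.ft CW
xz \-l foo.xz
xz \-lvv foo.xz | less \-S
xz \-\-robot \-\-list foo.xz
.ft R
.fi
.RE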
.
.SS "Operation modifiers"
.TP
.BR \-k ", " \-\-keep
Don't delete the input files.
.TP
.BR \-f ", " \-\-force
This option has several effects:
.RS
.IP \(bu 3
If the target file already exists,
delete it before compressing or decompressing.
.IP \(bu 3
Compress or decompress even if the input is
a symbolic link to a regular file,
has more than one hard link,
or has the setuid, setgid, or sticky bit set.
The setuid, setgid, and sticky bits are not copied
to the target file.
.IP \(bu 3
When used with
.B \-\-decompress
.B \-\-stdout
and when
.B xz
cannot recognize the type of the source file,
copy the source file as is to standard output.
This allows
.B xzcat
.B \-\-force
to be used like
.BR cat (1)
for files that have not been compressed with
.BR xz .
Note that in the future,
.B xz
might support new compressed file formats, which may make
.B xz
decompress more types of files instead of copying them as is to
standard output.
.BI \-\-format= format
can be used to restrict
.B xz
to decompress only a single file format.
.RE
.TP
.BR \-c ", " \-\-stdout ", " \-\-to\-stdout
Write the compressed or decompressed data to
standard output instead of a file.
This implies
.BR \-\-keep .
.TP
.B \-\-single\-stream
Decompress only the first
.B .xz
stream, and
silently ignore possible remaining input data following the stream.
Normally such trailing garbage makes
.B xz
display an error.
.IP ""
.B xz
never decompresses more than one stream from
.B .lzma
files or raw streams, but this option still makes
.B xz
ignore the possible trailing data after the
.B .lzma
file or raw stream.
.IP ""
This option has no effect if the operation mode is not
.B \-\-decompress
or
.BR \-\-test .
.TP
.B \-\-no\-sparse
Disable creation of sparse files.
By default, if decompressing into a regular file,
.B xz
tries to make the file sparse if the decompressed data contains
long sequences of binary zeros.
This also works when writing to standard output
as long as standard output is connected to a regular file
and certain additional conditions are met to make it safe.
Creating sparse files may save disk space and speed up
the decompression by reducing the amount of disk I/O.
.TP
\fB\-S\fR \fI.suf\fR, \fB\-\-suffix=\fI.suf
When compressing, use
.I .suf
as the suffix for the target file instead of
.B .xz
or
.BR .lzma .
If not writing to standard output and
the source file already has the suffix
.IR .suf ,
a warning is displayed and the file is skipped.
.IP ""
When decompressing, recognize files with the suffix
.I .suf
in addition to files with the
.BR .xz ,
.BR .txz ,
.BR .lzma ,
or
.B .tlz
suffix.
If the source file has the suffix
.IR .suf ,
the suffix is removed to get the target filename.
.IP ""
When compressing or decompressing raw streams
.RB ( \-\-format=raw ),
the suffix must always be specified unless
writing to standard output,
because there is no default suffix for raw streams.
.TP
\fB\-\-files\fR[\fB=\fIfile\fR]
Read the filenames to process from
.IR file ;
if
.I file
is omitted, filenames are read from standard input.
Filenames must be terminated with the newline character.
A dash
.RB ( \- )
is taken as a regular filename; it doesn't mean standard input.
If filenames are also given as command line arguments, they are
processed before the filenames read from
.IR file .
.TP
\fB\-\-files0\fR[\fB=\fIfile\fR]
This is identical to \fB\-\-files\fR[\fB=\fIfile\fR] except
that each filename must be terminated with the null character.
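.IP ""
As a sketch, assuming a
.BR find (1)
that supports
.BR \-print0 ,
filenames containing newlines or other unusual characters
can be passed via standard input:
.RS
.PP
.nf
.ft CW
find . \-type f \-name \(aq*.log\(aq \-print0 | xz \-\-files0
.ft R
.fi
.RE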
.
.SS "Basic file format and compression options"
.TP
\fB\-F\fR \fIformat\fR, \fB\-\-format=\fIformat
Specify the file
.I format
to compress or decompress:
.RS
.TP
.B auto
This is the default.
When compressing,
.B auto
is equivalent to
.BR xz .
When decompressing,
the format of the input file is automatically detected.
Note that raw streams (created with
.BR \-\-format=raw )
cannot be auto-detected.
.TP
.B xz
Compress to the
.B .xz
file format, or accept only
.B .xz
files when decompressing.
.TP
.BR lzma ", " alone
Compress to the legacy
.B .lzma
file format, or accept only
.B .lzma
files when decompressing.
The alternative name
.B alone
is provided for backwards compatibility with LZMA Utils.
.TP
.B raw
Compress or decompress a raw stream (no headers).
This is meant for advanced users only.
To decode raw streams, you need to use
.B \-\-format=raw
and explicitly specify the filter chain,
which normally would have been stored in the container headers.
.RE
.TP
\fB\-C\fR \fIcheck\fR, \fB\-\-check=\fIcheck
Specify the type of the integrity check.
The check is calculated from the uncompressed data and
stored in the
.B .xz
file.
This option has an effect only when compressing into the
.B .xz
format; the
.B .lzma
format doesn't support integrity checks.
The integrity check (if any) is verified when the
.B .xz
file is decompressed.
.IP ""
Supported
.I check
types:
.RS
.TP
.B none
Don't calculate an integrity check at all.
This is usually a bad idea,
but it can be useful when the integrity of the data is verified
by other means anyway.
.TP
.B crc32
Calculate CRC32 using the polynomial from IEEE-802.3 (Ethernet).
.TP
.B crc64
Calculate CRC64 using the polynomial from ECMA-182.
This is the default, since it is slightly better than CRC32
at detecting damaged files and the speed difference is negligible.
.TP
.B sha256
Calculate SHA-256.
This is somewhat slower than CRC32 and CRC64.
.RE
.IP ""
Integrity of the
.B .xz
headers is always verified with CRC32.
It is not possible to change or disable it.
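.IP ""
For example, to store a SHA-256 check instead of the default CRC64
when compressing a placeholder file named
.IR foo :
.RS
.PP
.nf
.ft CW
xz \-\-check=sha256 foo
.ft R
.fi
.RE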
.TP
.B \-\-ignore\-check
Don't verify the integrity check of the compressed data when decompressing.
The CRC32 values in the
.B .xz
headers will still be verified normally.
.IP ""
.B "Do not use this option unless you know what you are doing."
Possible reasons to use this option:
.RS
.IP \(bu 3
Trying to recover data from a corrupt .xz file.
.IP \(bu 3
Speeding up decompression.
This matters mostly with SHA-256 or
with files that have compressed extremely well.
It's recommended not to use this option for this purpose
unless the file integrity is verified externally in some other way.
.RE
.TP
.BR \-0 " ... " \-9
Select a compression preset level.
The default is
.BR \-6 .
If multiple preset levels are specified,
the last one takes effect.
If a custom filter chain was already specified, setting
a compression preset level clears the custom filter chain.
.IP ""
The differences between the presets are more significant than with
.BR gzip (1)
and
.BR bzip2 (1).
The selected compression settings determine
the memory requirements of the decompressor,
so using too high a preset level might make it painful
to decompress the file on an old system with little RAM.
Specifically,
.B "it's not a good idea to blindly use \-9 for everything"
like it often is with
.BR gzip (1)
and
.BR bzip2 (1).
.RS
.TP
.BR "\-0" " ... " "\-3"
These are somewhat fast presets.
.B \-0
is sometimes faster than
.B "gzip \-9"
while compressing much better.
The higher ones often have speed comparable to
.BR bzip2 (1)
with comparable or better compression ratio,
although the results
depend a lot on the type of data being compressed.
.TP
.BR "\-4" " ... " "\-6"
Good to very good compression while keeping
decompressor memory usage reasonable even for old systems.
.B \-6
is the default, which is usually a good choice
for distributing files that need to be decompressible
even on systems with only 16\ MiB RAM.
.RB ( \-5e
or
.B \-6e
may be worth considering too.
See
.BR \-\-extreme .)
.TP
.BR "\-7" " ... " "\-9"
These are like
.B \-6
but with higher compressor and decompressor memory requirements.
These are useful only when compressing files bigger than
8\ MiB, 16\ MiB, and 32\ MiB, respectively.
.RE
.IP ""
On the same hardware, the decompression speed is approximately
a constant number of bytes of compressed data per second.
In other words, the better the compression,
the faster the decompression will usually be.
This also means that the amount of uncompressed output
produced per second can vary a lot.
.IP ""
The following table summarises the features of the presets:
.RS
.RS
.PP
.TS
tab(;);
c c c c c
n n n n n.
Preset;DictSize;CompCPU;CompMem;DecMem
\-0;256 KiB;0;3 MiB;1 MiB
\-1;1 MiB;1;9 MiB;2 MiB
\-2;2 MiB;2;17 MiB;3 MiB
\-3;4 MiB;3;32 MiB;5 MiB
\-4;4 MiB;4;48 MiB;5 MiB
\-5;8 MiB;5;94 MiB;9 MiB
\-6;8 MiB;6;94 MiB;9 MiB
\-7;16 MiB;6;186 MiB;17 MiB
\-8;32 MiB;6;370 MiB;33 MiB
\-9;64 MiB;6;674 MiB;65 MiB
.TE
.RE
.RE
.IP ""
Column descriptions:
.RS
.IP \(bu 3
DictSize is the LZMA2 dictionary size.
It is a waste of memory to use a dictionary bigger than
the size of the uncompressed file.
This is why it is good to avoid using the presets
.BR \-7 " ... " \-9
when there's no real need for them.
At
.B \-6
and lower, the amount of memory wasted is
usually low enough to not matter.
.IP \(bu 3
CompCPU is a simplified representation of the LZMA2 settings
that affect compression speed.
The dictionary size affects speed too,
so while CompCPU is the same for levels
.BR \-6 " ... " \-9 ,
higher levels still tend to be a little slower.
To get even slower and thus possibly better compression, see
.BR \-\-extreme .
.IP \(bu 3
CompMem contains the compressor memory requirements
in the single-threaded mode.
It may vary slightly between
.B xz
versions.
Memory requirements of some of the future multithreaded modes may
be dramatically higher than those of the single-threaded mode.
.IP \(bu 3
DecMem contains the decompressor memory requirements.
That is, the compression settings determine
the memory requirements of the decompressor.
The exact decompressor memory usage is slightly more than
the LZMA2 dictionary size, but the values in the table
have been rounded up to the next full MiB.
.RE
.TP
.BR \-e ", " \-\-extreme
Use a slower variant of the selected compression preset level
.RB ( \-0 " ... " \-9 )
to hopefully get a little bit better compression ratio,
but with bad luck this can also make it worse.
Decompressor memory usage is not affected,
but compressor memory usage increases a little at preset levels
.BR \-0 " ... " \-3 .
.IP ""
Since there are two presets each with dictionary sizes of
4\ MiB and 8\ MiB, the presets
.B \-3e
and
.B \-5e
use slightly faster settings (lower CompCPU) than
.B \-4e
and
.BR \-6e ,
respectively.
That way no two presets are identical.
.RS
.RS
.PP
.TS
tab(;);
c c c c c
n n n n n.
Preset;DictSize;CompCPU;CompMem;DecMem
\-0e;256 KiB;8;4 MiB;1 MiB
\-1e;1 MiB;8;13 MiB;2 MiB
\-2e;2 MiB;8;25 MiB;3 MiB
\-3e;4 MiB;7;48 MiB;5 MiB
\-4e;4 MiB;8;48 MiB;5 MiB
\-5e;8 MiB;7;94 MiB;9 MiB
\-6e;8 MiB;8;94 MiB;9 MiB
\-7e;16 MiB;8;186 MiB;17 MiB
\-8e;32 MiB;8;370 MiB;33 MiB
\-9e;64 MiB;8;674 MiB;65 MiB
.TE
.RE
.RE
.IP ""
For example, there are a total of four presets that use
an 8\ MiB dictionary, whose order from the fastest to the slowest is
.BR \-5 ,
.BR \-6 ,
.BR \-5e ,
and
.BR \-6e .
.TP
.B \-\-fast
.PD 0
.TP
.B \-\-best
.PD
These are somewhat misleading aliases for
.B \-0
and
.BR \-9 ,
respectively.
These are provided only for backwards compatibility
with LZMA Utils.
Avoid using these options.
.TP
.BI \-\-block\-size= size
When compressing to the
.B .xz
format, split the input data into blocks of
.I size
bytes.
The blocks are compressed independently from each other,
which helps with multi-threading and
makes limited random-access decompression possible.
This option is typically used to override the default
block size in multi-threaded mode,
but this option can be used in single-threaded mode too.
.IP ""
In multi-threaded mode about three times
.I size
bytes will be allocated in each thread for buffering input and output.
The default
.I size
is three times the LZMA2 dictionary size or 1\ MiB,
whichever is more.
Typically a good value is 2\(en4 times
the size of the LZMA2 dictionary or at least 1\ MiB.
Using a
.I size
less than the LZMA2 dictionary size is a waste of RAM
because then the LZMA2 dictionary buffer will never get fully used.
The sizes of the blocks are stored in the block headers,
which a future version of
.B xz
will use for multi-threaded decompression.
.IP ""
In single-threaded mode no block splitting is done by default.
Setting this option doesn't affect memory usage.
No size information is stored in block headers,
thus files created in single-threaded mode
won't be identical to files created in multi-threaded mode.
The lack of size information also means that a future version of
.B xz
won't be able to decompress the files in multi-threaded mode.
.TP
.BI \-\-block\-list= sizes
When compressing to the
.B .xz
format, start a new block after
the given intervals of uncompressed data.
.IP ""
The uncompressed
.I sizes
of the blocks are specified as a comma-separated list.
Omitting a size (two or more consecutive commas) is a shorthand
for using the size of the previous block.
.IP ""
If the input file is bigger than the sum of
.IR sizes ,
the last value in
.I sizes
is repeated until the end of the file.
A special value of
.B 0
may be used as the last value to indicate that
the rest of the file should be encoded as a single block.
.IP ""
If one specifies
.I sizes
that exceed the encoder's block size
(either the default value in threaded mode or
the value specified with \fB\-\-block\-size=\fIsize\fR),
the encoder will create additional blocks while
keeping the boundaries specified in
.IR sizes .
For example, if one specifies
.B \-\-block\-size=10MiB
.B \-\-block\-list=5MiB,10MiB,8MiB,12MiB,24MiB
and the input file is 80\ MiB,
one will get 11 blocks:
5, 10, 8, 10, 2, 10, 10, 4, 10, 10, and 1 MiB.
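.IP ""
The arithmetic behind that example can be spelled out as follows.
Each requested size is split into blocks of at most 10 MiB,
and the last value (24 MiB) is repeated for the 21 MiB of input
that remain after the listed sizes, which add up to 59 MiB:
.RS
.PP
.nf
.ft CW
5 MiB   \->  5
10 MiB  \->  10
8 MiB   \->  8
12 MiB  \->  10 + 2
24 MiB  \->  10 + 10 + 4
24 MiB  \->  10 + 10 + 1   (input ends at 80 MiB)
.ft R
.fi
.RE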
.IP ""
In multi-threaded mode the sizes of the blocks
are stored in the block headers.
This isn't done in single-threaded mode,
so the encoded output won't be
identical to that of the multi-threaded mode.
.TP
.BI \-\-flush\-timeout= timeout
When compressing, if more than
.I timeout
milliseconds (a positive integer) have passed since the previous flush and
reading more input would block,
all the pending input data is flushed from the encoder and
made available in the output stream.
This can be useful if
.B xz
is used to compress data that is streamed over a network.
Small
.I timeout
values make the data available at the receiving end
with a small delay, but large
.I timeout
values give a better compression ratio.
.IP ""
This feature is disabled by default.
If this option is specified more than once, the last one takes effect.
The special
.I timeout
value of
.B 0
can be used to explicitly disable this feature.
.IP ""
This feature is not available on non-POSIX systems.
.IP ""
.\" FIXME
.B "This feature is still experimental."
Currently
.B xz
is unsuitable for decompressing the stream in real time due to how
.B xz
does buffering.
.TP
.BI \-\-memlimit\-compress= limit
Set a memory usage limit for compression.
If this option is specified multiple times,
the last one takes effect.
.IP ""
If the compression settings exceed the
.IR limit ,
.B xz
will adjust the settings downwards so that
the limit is no longer exceeded and display a notice that
automatic adjustment was done.
Such adjustments are not made when compressing with
.B \-\-format=raw
or if
.B \-\-no\-adjust
has been specified.
In those cases, an error is displayed and
.B xz
will exit with exit status 1.
.IP ""
The
.I limit
can be specified in multiple ways:
.RS
.IP \(bu 3
The
.I limit
can be an absolute value in bytes.
Using an integer suffix like
.B MiB
can be useful.
Example:
.B "\-\-memlimit\-compress=80MiB"
.IP \(bu 3
The
.I limit
can be specified as a percentage of total physical memory (RAM).
This can be useful especially when setting the
.B XZ_DEFAULTS
environment variable in a shell initialization script
that is shared between different computers.
That way the limit is automatically bigger
on systems with more memory.
Example:
.B "\-\-memlimit\-compress=70%"
.IP \(bu 3
The
.I limit
can be reset back to its default value by setting it to
.BR 0 .
This is currently equivalent to setting the
.I limit
to
.B max
(no memory usage limit).
Once multithreading support has been implemented,
there may be a difference between
.B 0
and
.B max
for the multithreaded case, so it is recommended to use
.B 0
instead of
.B max
until the details have been decided.
.RE
.IP ""
For 32-bit
.B xz
there is a special case: if the
.I limit
would be over
.BR "4020\ MiB" ,
the
.I limit
is set to
.BR "4020\ MiB" .
(The values
.B 0
and
.B max
aren't affected by this.
A similar feature doesn't exist for decompression.)
This can be helpful when a 32-bit executable has access
to a 4\ GiB address space while hopefully doing no harm in other situations.
.IP ""
See also the section
.BR "Memory usage" .
.TP
.BI \-\-memlimit\-decompress= limit
Set a memory usage limit for decompression.
This also affects the
.B \-\-list
mode.
If the operation is not possible without exceeding the
.IR limit ,
.B xz
will display an error and decompressing the file will fail.
See
.BI \-\-memlimit\-compress= limit
for possible ways to specify the
.IR limit .
.TP
\fB\-M\fR \fIlimit\fR, \fB\-\-memlimit=\fIlimit\fR, \fB\-\-memory=\fIlimit
This is equivalent to specifying \fB\-\-memlimit\-compress=\fIlimit
\fB\-\-memlimit\-decompress=\fIlimit\fR.
.TP
.B \-\-no\-adjust
Display an error and exit if the compression settings exceed
the memory usage limit.
The default is to adjust the settings downwards so
that the memory usage limit is not exceeded.
Automatic adjusting is always disabled when creating raw streams
.RB ( \-\-format=raw ).
.TP
\fB\-T\fR \fIthreads\fR, \fB\-\-threads=\fIthreads
Specify the number of worker threads to use.
Setting
.I threads
to the special value
.B 0
makes
.B xz
use as many threads as there are CPU cores on the system.
The actual number of threads can be less than
.I threads
if the input file is not big enough
for threading with the given settings or
if using more threads would exceed the memory usage limit.
.IP ""
Currently the only threading method is to split the input into
blocks and compress them independently from each other.
The default block size depends on the compression level and
can be overridden with the
.BI \-\-block\-size= size
option.
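.IP ""
A sketch of multi-threaded compression of a placeholder file named
.IR big.tar ,
either with the default block size or with
an arbitrarily chosen one:
.RS
.PP
.nf
.ft CW
xz \-T0 big.tar
xz \-T0 \-\-block\-size=64MiB big.tar
.ft R
.fi
.RE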
|
|
.IP ""
|
|
Threaded decompression hasn't been implemented yet.
|
|
It will only work on files that contain multiple blocks
|
|
with size information in block headers.
|
|
All files compressed in multi-threaded mode meet this condition,
|
|
but files compressed in single-threaded mode don't even if
|
|
.BI \-\-block\-size= size
|
|
is used.
|
|
.
|
|
.SS "Custom compressor filter chains"
|
|
A custom filter chain allows specifying
|
|
the compression settings in detail instead of relying on
|
|
the settings associated with the presets.
|
|
When a custom filter chain is specified,
|
|
preset options (\fB\-0\fR \&...\& \fB\-9\fR and \fB\-\-extreme\fR)
|
|
earlier on the command line are forgotten.
|
|
If a preset option is specified
|
|
after one or more custom filter chain options,
|
|
the new preset takes effect and
|
|
the custom filter chain options specified earlier are forgotten.
|
|
.PP
|
|
A filter chain is comparable to piping on the command line.
|
|
When compressing, the uncompressed input goes to the first filter,
|
|
whose output goes to the next filter (if any).
|
|
The output of the last filter gets written to the compressed file.
|
|
The maximum number of filters in the chain is four,
|
|
but typically a filter chain has only one or two filters.
|
|
.PP
|
|
Many filters have limitations on where they can be
|
|
in the filter chain:
|
|
some filters can work only as the last filter in the chain,
|
|
some only as a non-last filter, and some work in any position
|
|
in the chain.
|
|
Depending on the filter, this limitation is either inherent to
|
|
the filter design or exists to prevent security issues.
|
|
.PP
|
|
A custom filter chain is specified by using one or more
|
|
filter options in the order they are wanted in the filter chain.
|
|
That is, the order of filter options is significant!
|
|
When decoding raw streams
|
|
.RB ( \-\-format=raw ),
|
|
the filter chain is specified in the same order as
|
|
it was specified when compressing.
|
|
.PP
|
|
Filters take filter-specific
|
|
.I options
|
|
as a comma-separated list.
|
|
Extra commas in
|
|
.I options
|
|
are ignored.
|
|
Every option has a default value, so you need to
|
|
specify only those you want to change.
|
|
.PP
|
|
To see the whole filter chain and
|
|
.IR options ,
|
|
use
|
|
.B "xz \-vv"
|
|
(that is, use
|
|
.B \-\-verbose
|
|
twice).
|
|
This also works for viewing the filter chain options used by presets.
|
|
.TP
|
|
\fB\-\-lzma1\fR[\fB=\fIoptions\fR]
|
|
.PD 0
|
|
.TP
|
|
\fB\-\-lzma2\fR[\fB=\fIoptions\fR]
|
|
.PD
|
|
Add the LZMA1 or LZMA2 filter to the filter chain.
|
|
These filters can be used only as the last filter in the chain.
|
|
.IP ""
|
|
LZMA1 is a legacy filter,
|
|
which is supported almost solely due to the legacy
|
|
.B .lzma
|
|
file format, which supports only LZMA1.
|
|
LZMA2 is an updated
|
|
version of LZMA1 to fix some practical issues of LZMA1.
|
|
The
|
|
.B .xz
|
|
format uses LZMA2 and doesn't support LZMA1 at all.
|
|
Compression speed and ratios of LZMA1 and LZMA2
|
|
are practically the same.
|
|
.IP ""
|
|
LZMA1 and LZMA2 share the same set of
|
|
.IR options :
|
|
.RS
|
|
.TP
|
|
.BI preset= preset
|
|
Reset all LZMA1 or LZMA2
|
|
.I options
|
|
to
|
|
.IR preset .
|
|
.I Preset
|
|
consists of an integer, which may be followed by single-letter
|
|
preset modifiers.
|
|
The integer can be from
|
|
.B 0
|
|
to
|
|
.BR 9 ,
|
|
matching the command line options \fB\-0\fR \&...\& \fB\-9\fR.
|
|
The only supported modifier is currently
|
|
.BR e ,
|
|
which matches
|
|
.BR \-\-extreme .
|
|
If no
|
|
.B preset
|
|
is specified, the default values of LZMA1 or LZMA2
|
|
.I options
|
|
are taken from the preset
|
|
.BR 6 .
|
|
.TP
|
|
.BI dict= size
|
|
Dictionary (history buffer)
|
|
.I size
|
|
indicates how many bytes of the recently processed
|
|
uncompressed data are kept in memory.
|
|
The algorithm tries to find repeating byte sequences (matches) in
|
|
the uncompressed data, and replace them with references
|
|
to the data currently in the dictionary.
|
|
The bigger the dictionary, the higher the chance
of finding a match.
|
|
Thus, increasing dictionary
|
|
.I size
|
|
usually improves compression ratio, but
|
|
a dictionary bigger than the uncompressed file is a waste of memory.
|
|
.IP ""
|
|
A typical dictionary
|
|
.I size
|
|
is from 64\ KiB to 64\ MiB.
|
|
The minimum is 4\ KiB.
|
|
The maximum for compression is currently 1.5\ GiB (1536\ MiB).
|
|
The decompressor already supports dictionaries up to
|
|
one byte less than 4\ GiB, which is the maximum for
|
|
the LZMA1 and LZMA2 stream formats.
|
|
.IP ""
|
|
Dictionary
|
|
.I size
|
|
and match finder
|
|
.RI ( mf )
|
|
together determine the memory usage of the LZMA1 or LZMA2 encoder.
|
|
Decompressing requires a dictionary at least as big as
the one used when compressing,
|
|
thus the memory usage of the decoder is determined
|
|
by the dictionary size used when compressing.
|
|
The
|
|
.B .xz
|
|
headers store the dictionary
|
|
.I size
|
|
either as
|
|
.RI "2^" n
|
|
or
|
|
.RI "2^" n " + 2^(" n "\-1),"
|
|
so these
|
|
.I sizes
|
|
are somewhat preferred for compression.
|
|
Other
|
|
.I sizes
|
|
will get rounded up when stored in the
|
|
.B .xz
|
|
headers.
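.IP ""
As a rough illustration of the rounding described above, the following
.BR sh (1)
sketch rounds a byte count up to the next value of the form
.RI "2^" n
or
.RI "2^" n " + 2^(" n "\-1)."
The function name
.B round_dict
is hypothetical and not part of
.BR xz ;
the rounding done by the encoder may differ in details.
.RS
.RS
.PP
.nf
.ft CW
```shell
# Round a size in bytes up to 2^n or 2^n + 2^(n\-1), starting at 4 KiB.
round_dict() {
    want=$1
    size=4096
    while :; do
        if [ "$size" \-ge "$want" ]; then echo "$size"; return; fi
        mid=$((size + size / 2))
        if [ "$mid" \-ge "$want" ]; then echo "$mid"; return; fi
        size=$((size * 2))
    done
}
round_dict 5000000
```
.ft R
.fi
.RE
.RE
.IP ""
With the sample input of 5000000 bytes, the sketch prints 6291456 (6\ MiB).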
|
|
.TP
|
|
.BI lc= lc
|
|
Specify the number of literal context bits.
|
|
The minimum is 0 and the maximum is 4; the default is 3.
|
|
In addition, the sum of
|
|
.I lc
|
|
and
|
|
.I lp
|
|
must not exceed 4.
|
|
.IP ""
|
|
All bytes that cannot be encoded as matches
|
|
are encoded as literals.
|
|
That is, literals are simply 8-bit bytes
|
|
that are encoded one at a time.
|
|
.IP ""
|
|
The literal coding makes an assumption that the highest
|
|
.I lc
|
|
bits of the previous uncompressed byte correlate
|
|
with the next byte.
|
|
For example, in typical English text, an upper-case letter is
|
|
often followed by a lower-case letter, and a lower-case
|
|
letter is usually followed by another lower-case letter.
|
|
In the US-ASCII character set, the highest three bits are 010
|
|
for upper-case letters and 011 for lower-case letters.
|
|
When
|
|
.I lc
|
|
is at least 3, the literal coding can take advantage of
|
|
this property in the uncompressed data.
|
|
.IP ""
|
|
The default value (3) is usually good.
|
|
If you want maximum compression, test
|
|
.BR lc=4 .
|
|
Sometimes it helps a little, and
|
|
sometimes it makes compression worse.
|
|
If it makes it worse, test
|
|
.B lc=2
|
|
too.
|
|
.TP
|
|
.BI lp= lp
|
|
Specify the number of literal position bits.
|
|
The minimum is 0 and the maximum is 4; the default is 0.
|
|
.IP ""
|
|
.I Lp
|
|
affects what kind of alignment in the uncompressed data is
|
|
assumed when encoding literals.
|
|
See
|
|
.I pb
|
|
below for more information about alignment.
|
|
.TP
|
|
.BI pb= pb
|
|
Specify the number of position bits.
|
|
The minimum is 0 and the maximum is 4; the default is 2.
|
|
.IP ""
|
|
.I Pb
|
|
affects what kind of alignment in the uncompressed data is
|
|
assumed in general.
|
|
The default means four-byte alignment
|
|
.RI (2^ pb =2^2=4),
|
|
which is often a good choice when there's no better guess.
|
|
.IP ""
|
|
When the alignment is known, setting
|
|
.I pb
|
|
accordingly may reduce the file size a little.
|
|
For example, with text files having one-byte
|
|
alignment (US-ASCII, ISO-8859-*, UTF-8), setting
|
|
.B pb=0
|
|
can improve compression slightly.
|
|
For UTF-16 text,
|
|
.B pb=1
|
|
is a good choice.
|
|
If the alignment is an odd number like 3 bytes,
|
|
.B pb=0
|
|
might be the best choice.
|
|
.IP ""
|
|
Even though the assumed alignment can be adjusted with
|
|
.I pb
|
|
and
|
|
.IR lp ,
|
|
LZMA1 and LZMA2 still slightly favor 16-byte alignment.
|
|
It might be worth taking into account when designing file formats
|
|
that are likely to be often compressed with LZMA1 or LZMA2.
|
|
.TP
|
|
.BI mf= mf
|
|
Match finder has a major effect on encoder speed,
|
|
memory usage, and compression ratio.
|
|
Usually Hash Chain match finders are faster than Binary Tree
|
|
match finders.
|
|
The default depends on the
|
|
.IR preset :
|
|
0 uses
|
|
.BR hc3 ,
|
|
1\(en3
|
|
use
|
|
.BR hc4 ,
|
|
and the rest use
|
|
.BR bt4 .
|
|
.IP ""
|
|
The following match finders are supported.
|
|
The memory usage formulas below are rough approximations,
|
|
which are closest to the reality when
|
|
.I dict
|
|
is a power of two.
|
|
.RS
|
|
.TP
|
|
.B hc3
|
|
Hash Chain with 2- and 3-byte hashing
|
|
.br
|
|
Minimum value for
|
|
.IR nice :
|
|
3
|
|
.br
|
|
Memory usage:
|
|
.br
|
|
.I dict
|
|
* 7.5 (if
|
|
.I dict
|
|
<= 16 MiB);
|
|
.br
|
|
.I dict
|
|
* 5.5 + 64 MiB (if
|
|
.I dict
|
|
> 16 MiB)
|
|
.TP
|
|
.B hc4
|
|
Hash Chain with 2-, 3-, and 4-byte hashing
|
|
.br
|
|
Minimum value for
|
|
.IR nice :
|
|
4
|
|
.br
|
|
Memory usage:
|
|
.br
|
|
.I dict
|
|
* 7.5 (if
|
|
.I dict
|
|
<= 32 MiB);
|
|
.br
|
|
.I dict
|
|
* 6.5 (if
|
|
.I dict
|
|
> 32 MiB)
|
|
.TP
|
|
.B bt2
|
|
Binary Tree with 2-byte hashing
|
|
.br
|
|
Minimum value for
|
|
.IR nice :
|
|
2
|
|
.br
|
|
Memory usage:
|
|
.I dict
|
|
* 9.5
|
|
.TP
|
|
.B bt3
|
|
Binary Tree with 2- and 3-byte hashing
|
|
.br
|
|
Minimum value for
|
|
.IR nice :
|
|
3
|
|
.br
|
|
Memory usage:
|
|
.br
|
|
.I dict
|
|
* 11.5 (if
|
|
.I dict
|
|
<= 16 MiB);
|
|
.br
|
|
.I dict
|
|
* 9.5 + 64 MiB (if
|
|
.I dict
|
|
> 16 MiB)
|
|
.TP
|
|
.B bt4
|
|
Binary Tree with 2-, 3-, and 4-byte hashing
|
|
.br
|
|
Minimum value for
|
|
.IR nice :
|
|
4
|
|
.br
|
|
Memory usage:
|
|
.br
|
|
.I dict
|
|
* 11.5 (if
|
|
.I dict
|
|
<= 32 MiB);
|
|
.br
|
|
.I dict
|
|
* 10.5 (if
|
|
.I dict
|
|
> 32 MiB)
|
|
.RE
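.IP ""
As a rough sketch of how the
.B bt4
formula above can be applied, the following
.BR sh (1)
function estimates the encoder memory usage in MiB
from a dictionary size given in MiB.
The function name
.B bt4_mem
is hypothetical, and the result is only as accurate as
the approximate formula itself.
.RS
.RS
.PP
.nf
.ft CW
```shell
# Approximate bt4 encoder memory usage in MiB; $1 is dict in MiB.
bt4_mem() {
    dict=$1
    if [ "$dict" \-le 32 ]; then
        echo $((dict * 23 / 2))    # dict * 11.5
    else
        echo $((dict * 21 / 2))    # dict * 10.5
    fi
}
bt4_mem 8
bt4_mem 64
```
.ft R
.fi
.RE
.RE
.IP ""
For 8\ MiB and 64\ MiB dictionaries this prints 92 and 672,
which is close to the compressor memory usage of 94\ MiB and 674\ MiB
that this page lists for the presets
.B \-6
and
.BR \-9 .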
|
|
.TP
|
|
.BI mode= mode
|
|
Compression
|
|
.I mode
|
|
specifies the method to analyze
|
|
the data produced by the match finder.
|
|
Supported
|
|
.I modes
|
|
are
|
|
.B fast
|
|
and
|
|
.BR normal .
|
|
The default is
|
|
.B fast
|
|
for
|
|
.I presets
|
|
0\(en3 and
|
|
.B normal
|
|
for
|
|
.I presets
|
|
4\(en9.
|
|
.IP ""
|
|
Usually
|
|
.B fast
|
|
is used with Hash Chain match finders and
|
|
.B normal
|
|
with Binary Tree match finders.
|
|
This is also what the
|
|
.I presets
|
|
do.
|
|
.TP
|
|
.BI nice= nice
|
|
Specify what is considered to be a nice length for a match.
|
|
Once a match of at least
|
|
.I nice
|
|
bytes is found, the algorithm stops
|
|
looking for possibly better matches.
|
|
.IP ""
|
|
.I Nice
|
|
can be 2\(en273 bytes.
|
|
Higher values tend to give better compression ratio
|
|
at the expense of speed.
|
|
The default depends on the
|
|
.IR preset .
|
|
.TP
|
|
.BI depth= depth
|
|
Specify the maximum search depth in the match finder.
|
|
The default is the special value of 0,
|
|
which makes the compressor determine a reasonable
|
|
.I depth
|
|
from
|
|
.I mf
|
|
and
|
|
.IR nice .
|
|
.IP ""
|
|
A reasonable
.I depth
is 4\(en100 for Hash Chains and 16\(en1000 for Binary Trees.
|
|
Using very high values for
|
|
.I depth
|
|
can make the encoder extremely slow with some files.
|
|
Avoid setting the
|
|
.I depth
|
|
over 1000 unless you are prepared to interrupt
|
|
the compression in case it is taking far too long.
|
|
.RE
|
|
.IP ""
|
|
When decoding raw streams
|
|
.RB ( \-\-format=raw ),
|
|
LZMA2 needs only the dictionary
|
|
.IR size .
|
|
LZMA1 also needs
|
|
.IR lc ,
|
|
.IR lp ,
|
|
and
|
|
.IR pb .
|
|
.TP
|
|
\fB\-\-x86\fR[\fB=\fIoptions\fR]
|
|
.PD 0
|
|
.TP
|
|
\fB\-\-powerpc\fR[\fB=\fIoptions\fR]
|
|
.TP
|
|
\fB\-\-ia64\fR[\fB=\fIoptions\fR]
|
|
.TP
|
|
\fB\-\-arm\fR[\fB=\fIoptions\fR]
|
|
.TP
|
|
\fB\-\-armthumb\fR[\fB=\fIoptions\fR]
|
|
.TP
|
|
\fB\-\-sparc\fR[\fB=\fIoptions\fR]
|
|
.PD
|
|
Add a branch/call/jump (BCJ) filter to the filter chain.
|
|
These filters can be used only as a non-last filter
|
|
in the filter chain.
|
|
.IP ""
|
|
A BCJ filter converts relative addresses in
|
|
the machine code to their absolute counterparts.
|
|
This doesn't change the size of the data,
|
|
but it increases redundancy,
|
|
which can help LZMA2 to produce a 0\(en15\ % smaller
|
|
.B .xz
|
|
file.
|
|
The BCJ filters are always reversible,
|
|
so using a BCJ filter for the wrong type of data
|
|
doesn't cause any data loss, although it may make
|
|
the compression ratio slightly worse.
|
|
.IP ""
|
|
It is fine to apply a BCJ filter on a whole executable;
|
|
there's no need to apply it only on the executable section.
|
|
Applying a BCJ filter on an archive that contains both executable
|
|
and non-executable files may or may not give good results,
|
|
so it generally isn't good to blindly apply a BCJ filter when
|
|
compressing binary packages for distribution.
|
|
.IP ""
|
|
These BCJ filters are very fast and
|
|
use an insignificant amount of memory.
|
|
If a BCJ filter improves compression ratio of a file,
|
|
it can improve decompression speed at the same time.
|
|
This is because, on the same hardware,
|
|
the decompression speed of LZMA2 is roughly
|
|
a fixed number of bytes of compressed data per second.
|
|
.IP ""
|
|
These BCJ filters have known problems related to
|
|
the compression ratio:
|
|
.RS
|
|
.IP \(bu 3
|
|
Some types of files containing executable code
|
|
(for example, object files, static libraries, and Linux kernel modules)
|
|
have the addresses in the instructions filled with filler values.
|
|
These BCJ filters will still do the address conversion,
|
|
which will make the compression worse with these files.
|
|
.IP \(bu 3
|
|
Applying a BCJ filter on an archive containing multiple similar
|
|
executables can make the compression ratio worse than not using
|
|
a BCJ filter.
|
|
This is because the BCJ filter doesn't detect the boundaries
|
|
of the executable files, and doesn't reset
|
|
the address conversion counter for each executable.
|
|
.RE
|
|
.IP ""
|
|
Both of the above problems will be fixed
|
|
in the future in a new filter.
|
|
The old BCJ filters will still be useful in embedded systems,
|
|
because the decoder of the new filter will be bigger
|
|
and use more memory.
|
|
.IP ""
|
|
Different instruction sets have different alignment:
|
|
.RS
|
|
.RS
|
|
.PP
|
|
.TS
|
|
tab(;);
|
|
l n l
|
|
l n l.
|
|
Filter;Alignment;Notes
|
|
x86;1;32-bit or 64-bit x86
|
|
PowerPC;4;Big endian only
|
|
ARM;4;Little endian only
|
|
ARM-Thumb;2;Little endian only
|
|
IA-64;16;Big or little endian
|
|
SPARC;4;Big or little endian
|
|
.TE
|
|
.RE
|
|
.RE
|
|
.IP ""
|
|
Since the BCJ-filtered data is usually compressed with LZMA2,
|
|
the compression ratio may be improved slightly if
|
|
the LZMA2 options are set to match the
|
|
alignment of the selected BCJ filter.
|
|
For example, with the IA-64 filter, it's good to set
|
|
.B pb=4
|
|
with LZMA2 (2^4=16).
|
|
The x86 filter is an exception;
|
|
it's usually good to stick to LZMA2's default
|
|
four-byte alignment when compressing x86 executables.
|
|
.IP ""
|
|
All BCJ filters support the same
|
|
.IR options :
|
|
.RS
|
|
.TP
|
|
.BI start= offset
|
|
Specify the start
|
|
.I offset
|
|
that is used when converting between relative
|
|
and absolute addresses.
|
|
The
|
|
.I offset
|
|
must be a multiple of the alignment of the filter
|
|
(see the table above).
|
|
The default is zero.
|
|
In practice, the default is good; specifying a custom
|
|
.I offset
|
|
is almost never useful.
|
|
.RE
|
|
.TP
|
|
\fB\-\-delta\fR[\fB=\fIoptions\fR]
|
|
Add the Delta filter to the filter chain.
|
|
The Delta filter can be used only as a non-last filter
|
|
in the filter chain.
|
|
.IP ""
|
|
Currently only simple byte-wise delta calculation is supported.
|
|
It can be useful when compressing, for example, uncompressed bitmap images
|
|
or uncompressed PCM audio.
|
|
However, special purpose algorithms may give significantly better
|
|
results than Delta + LZMA2.
|
|
This is true especially with audio,
|
|
which compresses faster and better, for example, with
|
|
.BR flac (1).
|
|
.IP ""
|
|
Supported
|
|
.IR options :
|
|
.RS
|
|
.TP
|
|
.BI dist= distance
|
|
Specify the
|
|
.I distance
|
|
of the delta calculation in bytes.
|
|
.I distance
|
|
must be 1\(en256.
|
|
The default is 1.
|
|
.IP ""
|
|
For example, with
|
|
.B dist=2
|
|
and eight-byte input A1 B1 A2 B3 A3 B5 A4 B7, the output will be
|
|
A1 B1 01 02 01 02 01 02.
|
|
.RE
|
|
.
|
|
.SS "Other options"
|
|
.TP
|
|
.BR \-q ", " \-\-quiet
|
|
Suppress warnings and notices.
|
|
Specify this twice to suppress errors too.
|
|
This option has no effect on the exit status.
|
|
That is, even if a warning was suppressed,
|
|
the exit status to indicate a warning is still used.
|
|
.TP
|
|
.BR \-v ", " \-\-verbose
|
|
Be verbose.
|
|
If standard error is connected to a terminal,
|
|
.B xz
|
|
will display a progress indicator.
|
|
Specifying
|
|
.B \-\-verbose
|
|
twice will give even more verbose output.
|
|
.IP ""
|
|
The progress indicator shows the following information:
|
|
.RS
|
|
.IP \(bu 3
|
|
Completion percentage is shown
|
|
if the size of the input file is known.
|
|
That is, the percentage cannot be shown in pipes.
|
|
.IP \(bu 3
|
|
Amount of compressed data produced (compressing)
|
|
or consumed (decompressing).
|
|
.IP \(bu 3
|
|
Amount of uncompressed data consumed (compressing)
|
|
or produced (decompressing).
|
|
.IP \(bu 3
|
|
Compression ratio, which is calculated by dividing
|
|
the amount of compressed data processed so far by
|
|
the amount of uncompressed data processed so far.
|
|
.IP \(bu 3
|
|
Compression or decompression speed.
|
|
This is measured as the amount of uncompressed data consumed
|
|
(compression) or produced (decompression) per second.
|
|
It is shown after a few seconds have passed since
|
|
.B xz
|
|
started processing the file.
|
|
.IP \(bu 3
|
|
Elapsed time in the format M:SS or H:MM:SS.
|
|
.IP \(bu 3
|
|
Estimated remaining time is shown
|
|
only when the size of the input file is
|
|
known and a couple of seconds have already passed since
|
|
.B xz
|
|
started processing the file.
|
|
The time is shown in a less precise format which
|
|
never has any colons, for example, 2 min 30 s.
|
|
.RE
|
|
.IP ""
|
|
When standard error is not a terminal,
|
|
.B \-\-verbose
|
|
will make
|
|
.B xz
|
|
print the filename, compressed size, uncompressed size,
|
|
compression ratio, and possibly also the speed and elapsed time
|
|
on a single line to standard error after compressing or
|
|
decompressing the file.
|
|
The speed and elapsed time are included only when
|
|
the operation took at least a few seconds.
|
|
If the operation didn't finish, for example, due to user interruption,
|
|
the completion percentage is also printed
|
|
if the size of the input file is known.
|
|
.TP
|
|
.BR \-Q ", " \-\-no\-warn
|
|
Don't set the exit status to 2
|
|
even if a condition worth a warning was detected.
|
|
This option doesn't affect the verbosity level, thus both
|
|
.B \-\-quiet
|
|
and
|
|
.B \-\-no\-warn
|
|
have to be used to not display warnings and
|
|
to not alter the exit status.
|
|
.TP
|
|
.B \-\-robot
|
|
Print messages in a machine-parsable format.
|
|
This is intended to ease writing frontends that want to use
|
|
.B xz
|
|
instead of liblzma, which may be the case with various scripts.
|
|
The output with this option enabled is meant to be stable across
|
|
.B xz
|
|
releases.
|
|
See the section
|
|
.B "ROBOT MODE"
|
|
for details.
|
|
.TP
|
|
.B \-\-info\-memory
|
|
Display, in human-readable format, how much physical memory (RAM)
|
|
.B xz
|
|
thinks the system has and the memory usage limits for compression
|
|
and decompression, and exit successfully.
|
|
.TP
|
|
.BR \-h ", " \-\-help
|
|
Display a help message describing the most commonly used options,
|
|
and exit successfully.
|
|
.TP
|
|
.BR \-H ", " \-\-long\-help
|
|
Display a help message describing all features of
|
|
.BR xz ,
|
|
and exit successfully.
|
|
.TP
|
|
.BR \-V ", " \-\-version
|
|
Display the version number of
|
|
.B xz
|
|
and liblzma in human-readable format.
|
|
To get machine-parsable output, specify
|
|
.B \-\-robot
|
|
before
|
|
.BR \-\-version .
|
|
.
|
|
.SH "ROBOT MODE"
|
|
The robot mode is activated with the
|
|
.B \-\-robot
|
|
option.
|
|
It makes the output of
|
|
.B xz
|
|
easier to parse by other programs.
|
|
Currently
|
|
.B \-\-robot
|
|
is supported only together with
|
|
.BR \-\-version ,
|
|
.BR \-\-info\-memory ,
|
|
and
|
|
.BR \-\-list .
|
|
It will be supported for compression and
|
|
decompression in the future.
|
|
.
|
|
.SS Version
|
|
.B "xz \-\-robot \-\-version"
|
|
will print the version number of
|
|
.B xz
|
|
and liblzma in the following format:
|
|
.PP
|
|
.BI XZ_VERSION= XYYYZZZS
|
|
.br
|
|
.BI LIBLZMA_VERSION= XYYYZZZS
|
|
.TP
|
|
.I X
|
|
Major version.
|
|
.TP
|
|
.I YYY
|
|
Minor version.
|
|
Even numbers are stable.
|
|
Odd numbers are alpha or beta versions.
|
|
.TP
|
|
.I ZZZ
|
|
Patch level for stable releases or
|
|
just a counter for development releases.
|
|
.TP
|
|
.I S
|
|
Stability.
|
|
0 is alpha, 1 is beta, and 2 is stable.
|
|
.I S
|
|
should always be 2 when
|
|
.I YYY
|
|
is even.
|
|
.PP
|
|
.I XYYYZZZS
|
|
are the same on both lines if
|
|
.B xz
|
|
and liblzma are from the same XZ Utils release.
|
|
.PP
|
|
Examples: 4.999.9beta is
|
|
.B 49990091
|
|
and
|
|
5.0.0 is
|
|
.BR 50000002 .
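.PP
As a minimal sketch, the fields can be separated with
.BR sh (1)
arithmetic, assuming the value has already been extracted into
the hypothetical variable
.BR v :
.RS
.RS
.PP
.nf
.ft CW
```shell
v=50000002                      # sample XYYYZZZS value
major=$((v / 10000000))         # X
minor=$((v / 10000 % 1000))     # YYY
patch=$((v / 10 % 1000))        # ZZZ
stability=$((v % 10))           # S: 0 alpha, 1 beta, 2 stable
echo "$major.$minor.$patch stability=$stability"
```
.ft R
.fi
.RE
.RE
.PP
With the sample value, this prints 5.0.0 stability=2.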
|
|
.
|
|
.SS "Memory limit information"
|
|
.B "xz \-\-robot \-\-info\-memory"
|
|
prints a single line with three tab-separated columns:
|
|
.IP 1. 4
|
|
Total amount of physical memory (RAM) in bytes
|
|
.IP 2. 4
|
|
Memory usage limit for compression in bytes.
|
|
A special value of zero indicates the default setting,
|
|
which for single-threaded mode is the same as no limit.
|
|
.IP 3. 4
|
|
Memory usage limit for decompression in bytes.
|
|
A special value of zero indicates the default setting,
|
|
which for single-threaded mode is the same as no limit.
|
|
.PP
|
|
In the future, the output of
|
|
.B "xz \-\-robot \-\-info\-memory"
|
|
may have more columns, but never more than a single line.
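.PP
A hedged sketch of reading the three columns in an
.BR sh (1)
script follows; the function and variable names are hypothetical.
Any columns added in the future are collected into
.B rest
and ignored.
.RS
.RS
.PP
.nf
.ft CW
```shell
# Reads one line of xz \-\-robot \-\-info\-memory output from standard input.
parse_info_memory() {
    read total climit dlimit rest
    echo "ram=$total compress=$climit decompress=$dlimit"
}
# Usage: xz \-\-robot \-\-info\-memory | parse_info_memory
```
.ft R
.fi
.RE
.RE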
|
|
.
|
|
.SS "List mode"
|
|
.B "xz \-\-robot \-\-list"
|
|
uses tab-separated output.
|
|
The first column of every line has a string
|
|
that indicates the type of the information found on that line:
|
|
.TP
|
|
.B name
|
|
This is always the first line when starting to list a file.
|
|
The second column on the line is the filename.
|
|
.TP
|
|
.B file
|
|
This line contains overall information about the
|
|
.B .xz
|
|
file.
|
|
This line is always printed after the
|
|
.B name
|
|
line.
|
|
.TP
|
|
.B stream
|
|
This line type is used only when
|
|
.B \-\-verbose
|
|
was specified.
|
|
There are as many
|
|
.B stream
|
|
lines as there are streams in the
|
|
.B .xz
|
|
file.
|
|
.TP
|
|
.B block
|
|
This line type is used only when
|
|
.B \-\-verbose
|
|
was specified.
|
|
There are as many
|
|
.B block
|
|
lines as there are blocks in the
|
|
.B .xz
|
|
file.
|
|
The
|
|
.B block
|
|
lines are shown after all the
|
|
.B stream
|
|
lines; different line types are not interleaved.
|
|
.TP
|
|
.B summary
|
|
This line type is used only when
|
|
.B \-\-verbose
|
|
was specified twice.
|
|
This line is printed after all
|
|
.B block
|
|
lines.
|
|
Like the
|
|
.B file
|
|
line, the
|
|
.B summary
|
|
line contains overall information about the
|
|
.B .xz
|
|
file.
|
|
.TP
|
|
.B totals
|
|
This line is always the very last line of the list output.
|
|
It shows the total counts and sizes.
|
|
.PP
|
|
The columns of the
|
|
.B file
|
|
lines:
|
|
.PD 0
|
|
.RS
|
|
.IP 2. 4
|
|
Number of streams in the file
|
|
.IP 3. 4
|
|
Total number of blocks in the stream(s)
|
|
.IP 4. 4
|
|
Compressed size of the file
|
|
.IP 5. 4
|
|
Uncompressed size of the file
|
|
.IP 6. 4
|
|
Compression ratio, for example,
|
|
.BR 0.123 .
|
|
If the ratio is over 9.999, three dashes
|
|
.RB ( \-\-\- )
|
|
are displayed instead of the ratio.
|
|
.IP 7. 4
|
|
Comma-separated list of integrity check names.
|
|
The following strings are used for the known check types:
|
|
.BR None ,
|
|
.BR CRC32 ,
|
|
.BR CRC64 ,
|
|
and
|
|
.BR SHA\-256 .
|
|
For unknown check types,
|
|
.BI Unknown\- N
|
|
is used, where
|
|
.I N
|
|
is the Check ID as a decimal number (one or two digits).
|
|
.IP 8. 4
|
|
Total size of stream padding in the file
|
|
.RE
|
|
.PD
|
|
.PP
|
|
The columns of the
|
|
.B stream
|
|
lines:
|
|
.PD 0
|
|
.RS
|
|
.IP 2. 4
|
|
Stream number (the first stream is 1)
|
|
.IP 3. 4
|
|
Number of blocks in the stream
|
|
.IP 4. 4
|
|
Compressed start offset
|
|
.IP 5. 4
|
|
Uncompressed start offset
|
|
.IP 6. 4
|
|
Compressed size (does not include stream padding)
|
|
.IP 7. 4
|
|
Uncompressed size
|
|
.IP 8. 4
|
|
Compression ratio
|
|
.IP 9. 4
|
|
Name of the integrity check
|
|
.IP 10. 4
|
|
Size of stream padding
|
|
.RE
|
|
.PD
|
|
.PP
|
|
The columns of the
|
|
.B block
|
|
lines:
|
|
.PD 0
|
|
.RS
|
|
.IP 2. 4
|
|
Number of the stream containing this block
|
|
.IP 3. 4
|
|
Block number relative to the beginning of the stream
|
|
(the first block is 1)
|
|
.IP 4. 4
|
|
Block number relative to the beginning of the file
|
|
.IP 5. 4
|
|
Compressed start offset relative to the beginning of the file
|
|
.IP 6. 4
|
|
Uncompressed start offset relative to the beginning of the file
|
|
.IP 7. 4
|
|
Total compressed size of the block (includes headers)
|
|
.IP 8. 4
|
|
Uncompressed size
|
|
.IP 9. 4
|
|
Compression ratio
|
|
.IP 10. 4
|
|
Name of the integrity check
|
|
.RE
|
|
.PD
|
|
.PP
|
|
If
|
|
.B \-\-verbose
|
|
was specified twice, additional columns are included on the
|
|
.B block
|
|
lines.
|
|
These are not displayed with a single
|
|
.BR \-\-verbose ,
|
|
because getting this information requires many seeks
|
|
and can thus be slow:
|
|
.PD 0
|
|
.RS
|
|
.IP 11. 4
|
|
Value of the integrity check in hexadecimal
|
|
.IP 12. 4
|
|
Block header size
|
|
.IP 13. 4
|
|
Block flags:
|
|
.B c
|
|
indicates that compressed size is present, and
|
|
.B u
|
|
indicates that uncompressed size is present.
|
|
If the flag is not set, a dash
|
|
.RB ( \- )
|
|
is shown instead to keep the string length fixed.
|
|
New flags may be added to the end of the string in the future.
|
|
.IP 14. 4
|
|
Size of the actual compressed data in the block (this excludes
|
|
the block header, block padding, and check fields)
|
|
.IP 15. 4
|
|
Amount of memory (in bytes) required to decompress
|
|
this block with this
|
|
.B xz
|
|
version
|
|
.IP 16. 4
|
|
Filter chain.
|
|
Note that most of the options used at compression time
|
|
cannot be known, because only the options
|
|
that are needed for decompression are stored in the
|
|
.B .xz
|
|
headers.
|
|
.RE
|
|
.PD
|
|
.PP
|
|
The columns of the
|
|
.B summary
|
|
lines:
|
|
.PD 0
|
|
.RS
|
|
.IP 2. 4
|
|
Amount of memory (in bytes) required to decompress
|
|
this file with this
|
|
.B xz
|
|
version
|
|
.IP 3. 4
|
|
.B yes
|
|
or
|
|
.B no
|
|
indicating whether all block headers have both compressed size and
|
|
uncompressed size stored in them
|
|
.PP
|
|
.I Since
|
|
.B xz
|
|
.I 5.1.2alpha:
|
|
.IP 4. 4
|
|
Minimum
|
|
.B xz
|
|
version required to decompress the file
|
|
.RE
|
|
.PD
|
|
.PP
|
|
The columns of the
|
|
.B totals
|
|
line:
|
|
.PD 0
|
|
.RS
|
|
.IP 2. 4
|
|
Number of streams
|
|
.IP 3. 4
|
|
Number of blocks
|
|
.IP 4. 4
|
|
Compressed size
|
|
.IP 5. 4
|
|
Uncompressed size
|
|
.IP 6. 4
|
|
Average compression ratio
|
|
.IP 7. 4
|
|
Comma-separated list of integrity check names
|
|
that were present in the files
|
|
.IP 8. 4
|
|
Stream padding size
|
|
.IP 9. 4
|
|
Number of files.
|
|
This is here to
|
|
keep the order of the earlier columns the same as on
|
|
.B file
|
|
lines.
|
|
.PD
|
|
.RE
|
|
.PP
|
|
If
|
|
.B \-\-verbose
|
|
was specified twice, additional columns are included on the
|
|
.B totals
|
|
line:
|
|
.PD 0
|
|
.RS
|
|
.IP 10. 4
|
|
Maximum amount of memory (in bytes) required to decompress
|
|
the files with this
|
|
.B xz
|
|
version
|
|
.IP 11. 4
|
|
.B yes
|
|
or
|
|
.B no
|
|
indicating whether all block headers have both compressed size and
|
|
uncompressed size stored in them
|
|
.PP
|
|
.I Since
|
|
.B xz
|
|
.I 5.1.2alpha:
|
|
.IP 12. 4
|
|
Minimum
|
|
.B xz
|
|
version required to decompress the file
|
|
.RE
|
|
.PD
|
|
.PP
|
|
Future versions may add new line types and
|
|
new columns can be added to the existing line types,
|
|
but the existing columns won't be changed.
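.PP
As an illustrative sketch, a script can pick the line types it understands
by the first column and skip everything else,
which keeps it working if new line types or columns appear.
The function name
.B list_sizes
and its variable names are hypothetical.
.RS
.RS
.PP
.nf
.ft CW
```shell
# Prints the compressed and uncompressed size from each file line
# of xz \-\-robot \-\-list output read from standard input.
list_sizes() {
    while read \-r kind streams blocks csize usize rest; do
        if [ "$kind" = file ]; then
            echo "$csize $usize"
        fi
    done
}
# Usage: xz \-\-robot \-\-list foo.xz | list_sizes
```
.ft R
.fi
.RE
.RE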
|
|
.
|
|
.SH "EXIT STATUS"
|
|
.TP
|
|
.B 0
|
|
All is good.
|
|
.TP
|
|
.B 1
|
|
An error occurred.
|
|
.TP
|
|
.B 2
|
|
Something worth a warning occurred,
|
|
but no actual errors occurred.
|
|
.PP
|
|
Notices (not warnings or errors) printed on standard error
|
|
don't affect the exit status.
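.PP
A minimal sketch of acting on these values in an
.BR sh (1)
script follows; the function name
.B describe_status
is hypothetical.
.RS
.RS
.PP
.nf
.ft CW
```shell
# Maps an xz exit status to a short description.
describe_status() {
    case $1 in
        0) echo ok ;;
        1) echo error ;;
        2) echo warning ;;
        *) echo unknown ;;
    esac
}
# Usage: xz foo; describe_status $?
```
.ft R
.fi
.RE
.RE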
|
|
.
|
|
.SH ENVIRONMENT
|
|
.B xz
|
|
parses space-separated lists of options
|
|
from the environment variables
|
|
.B XZ_DEFAULTS
|
|
and
|
|
.BR XZ_OPT ,
|
|
in this order, before parsing the options from the command line.
|
|
Note that only options are parsed from the environment variables;
|
|
all non-options are silently ignored.
|
|
Parsing is done with
|
|
.BR getopt_long (3),
which is also used for the command line arguments.
|
|
.TP
|
|
.B XZ_DEFAULTS
|
|
User-specific or system-wide default options.
|
|
Typically this is set in a shell initialization script to enable
|
|
.BR xz 's
|
|
memory usage limiter by default.
|
|
Excluding shell initialization scripts
|
|
and similar special cases, scripts must never set or unset
|
|
.BR XZ_DEFAULTS .
|
|
.TP
|
|
.B XZ_OPT
|
|
This is for passing options to
|
|
.B xz
|
|
when it is not possible to set the options directly on the
|
|
.B xz
|
|
command line.
|
|
This is the case when
|
|
.B xz
|
|
is run by a script or tool, for example, GNU
|
|
.BR tar (1):
|
|
.RS
|
|
.RS
|
|
.PP
|
|
.nf
|
|
.ft CW
|
|
XZ_OPT=\-2v tar caf foo.tar.xz foo
|
|
.ft R
|
|
.fi
|
|
.RE
|
|
.RE
|
|
.IP ""
|
|
Scripts may use
|
|
.BR XZ_OPT ,
|
|
for example, to set script-specific default compression options.
|
|
It is still recommended to allow users to override
|
|
.B XZ_OPT
|
|
if that is reasonable.
|
|
For example, in
|
|
.BR sh (1)
|
|
scripts one may use something like this:
|
|
.RS
|
|
.RS
|
|
.PP
|
|
.nf
|
|
.ft CW
|
|
XZ_OPT=${XZ_OPT\-"\-7e"}
|
|
export XZ_OPT
|
|
.ft R
|
|
.fi
|
|
.RE
|
|
.RE
|
|
.
|
|
.SH "LZMA UTILS COMPATIBILITY"
|
|
The command line syntax of
|
|
.B xz
|
|
is practically a superset of
|
|
.BR lzma ,
|
|
.BR unlzma ,
|
|
and
|
|
.B lzcat
|
|
as found in LZMA Utils 4.32.x.
|
|
In most cases, it is possible to replace
|
|
LZMA Utils with XZ Utils without breaking existing scripts.
|
|
There are some incompatibilities though,
|
|
which may sometimes cause problems.
|
|
.
|
|
.SS "Compression preset levels"
|
|
The numbering of the compression level presets is not identical in
|
|
.B xz
|
|
and LZMA Utils.
|
|
The most important difference is how dictionary sizes
|
|
are mapped to different presets.
|
|
Dictionary size is roughly equal to the decompressor memory usage.
|
|
.RS
|
|
.PP
|
|
.TS
|
|
tab(;);
|
|
c c c
|
|
c n n.
|
|
Level;xz;LZMA Utils
|
|
\-0;256 KiB;N/A
|
|
\-1;1 MiB;64 KiB
|
|
\-2;2 MiB;1 MiB
|
|
\-3;4 MiB;512 KiB
|
|
\-4;4 MiB;1 MiB
|
|
\-5;8 MiB;2 MiB
|
|
\-6;8 MiB;4 MiB
|
|
\-7;16 MiB;8 MiB
|
|
\-8;32 MiB;16 MiB
|
|
\-9;64 MiB;32 MiB
|
|
.TE
|
|
.RE
|
|
.PP
|
|
The dictionary size differences affect
|
|
the compressor memory usage too,
|
|
but there are some other differences between
|
|
LZMA Utils and XZ Utils, which
|
|
make the difference even bigger:
|
|
.RS
|
|
.PP
|
|
.TS
|
|
tab(;);
|
|
c c c
|
|
c n n.
|
|
Level;xz;LZMA Utils 4.32.x
|
|
\-0;3 MiB;N/A
|
|
\-1;9 MiB;2 MiB
|
|
\-2;17 MiB;12 MiB
|
|
\-3;32 MiB;12 MiB
|
|
\-4;48 MiB;16 MiB
|
|
\-5;94 MiB;26 MiB
|
|
\-6;94 MiB;45 MiB
|
|
\-7;186 MiB;83 MiB
|
|
\-8;370 MiB;159 MiB
|
|
\-9;674 MiB;311 MiB
|
|
.TE
|
|
.RE
|
|
.PP
|
|
The default preset level in LZMA Utils is
|
|
.B \-7
|
|
while in XZ Utils it is
|
|
.BR \-6 ,
|
|
so both use an 8 MiB dictionary by default.
|
|
.
|
|
.SS "Streamed vs. non-streamed .lzma files"
|
|
The uncompressed size of the file can be stored in the
|
|
.B .lzma
|
|
header.
|
|
LZMA Utils does that when compressing regular files.
|
|
The alternative is to mark the uncompressed size as unknown
and use an end-of-payload marker to indicate
|
|
where the decompressor should stop.
|
|
LZMA Utils uses this method when the uncompressed size isn't known,
|
|
which is the case, for example, in pipes.
|
|
.PP
|
|
.B xz
|
|
supports decompressing
|
|
.B .lzma
|
|
files with or without an end-of-payload marker, but all
|
|
.B .lzma
|
|
files created by
|
|
.B xz
|
|
will use an end-of-payload marker and have the uncompressed size
|
|
marked as unknown in the
|
|
.B .lzma
|
|
header.
|
|
This may be a problem in some uncommon situations.
|
|
For example, a
|
|
.B .lzma
|
|
decompressor in an embedded device might work
|
|
only with files that have known uncompressed size.
|
|
If you hit this problem, you need to use LZMA Utils
|
|
or LZMA SDK to create
|
|
.B .lzma
|
|
files with known uncompressed size.
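.PP
A minimal sketch for checking which kind of header a file has,
assuming a POSIX
.BR od (1)
and a placeholder file name
.IR foo.lzma :
the 13-byte
.B .lzma
header stores the uncompressed size as a 64-bit little-endian
integer at byte offsets 5 to 12, and all bits set means
that the size is unknown.
.RS
.PP
.nf
.ft CW
# Dump the eight size bytes; "ff" eight times means unknown size.
od \-An \-tx1 \-j5 \-N8 foo.lzma
.ft R
.fi
.RE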
.
.SS "Unsupported .lzma files"
The
.B .lzma
format allows
.I lc
values up to 8, and
.I lp
values up to 4.
LZMA Utils can decompress files with any
.I lc
and
.IR lp ,
but always creates files with
.B lc=3
and
.BR lp=0 .
Creating files with other
.I lc
and
.I lp
is possible with
.B xz
and with LZMA SDK.
.PP
The implementation of the LZMA1 filter in liblzma
requires that the sum of
.I lc
and
.I lp
not exceed 4.
Thus,
.B .lzma
files that exceed this limit cannot be decompressed with
.BR xz .
.PP
LZMA Utils creates only
.B .lzma
files which have a dictionary size of
.RI "2^" n
(a power of 2) but accepts files with any dictionary size.
liblzma accepts only
.B .lzma
files which have a dictionary size of
.RI "2^" n
or
.RI "2^" n " + 2^(" n "\-1)."
This is to decrease false positives when detecting
.B .lzma
files.
.PP
These limitations shouldn't be a problem in practice,
since practically all
.B .lzma
files have been compressed with settings that liblzma will accept.
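.PP
To check whether a particular file is affected,
the values can be read from the first byte of the
.B .lzma
header, which encodes them as
.RI ( pb " * 5 + " lp ") * 9 + " lc .
The following is only a sketch; it assumes a POSIX
.BR sh (1)
and
.BR od (1),
and
.I foo.lzma
is a placeholder name:
.RS
.PP
.nf
.ft CW
# Read the properties byte as an unsigned decimal number.
props=$(( $(od \-An \-tu1 \-N1 foo.lzma) ))
lc=$((props % 9))
lp=$((props / 9 % 5))
pb=$((props / 45))
echo "lc=$lc lp=$lp pb=$pb"
.ft R
.fi
.RE
.PP
A properties byte of 93, for example, decodes to lc=3, lp=0, and pb=2.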
.
.SS "Trailing garbage"
When decompressing,
LZMA Utils silently ignores everything after the first
.B .lzma
stream.
In most situations, this is a bug.
This also means that LZMA Utils
doesn't support decompressing concatenated
.B .lzma
files.
.PP
If there is data left after the first
.B .lzma
stream,
.B xz
considers the file to be corrupt unless
.B \-\-single\-stream
was used.
This may break obscure scripts which have
assumed that trailing garbage is ignored.
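.PP
Such a script could be adapted roughly as follows;
.I foo.lzma
is a placeholder name:
.RS
.PP
.nf
.ft CW
xz \-dc \-\-single\-stream foo.lzma > foo
.ft R
.fi
.RE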
.
.SH NOTES
.
.SS "Compressed output may vary"
The exact compressed output produced from
the same uncompressed input file
may vary between XZ Utils versions even if
compression options are identical.
This is because the encoder can be improved
(faster or better compression)
without affecting the file format.
The output can vary even between different
builds of the same XZ Utils version,
if different build options are used.
.PP
The above means that once
.B \-\-rsyncable
has been implemented,
the resulting files won't necessarily be rsyncable
unless both old and new files have been compressed
with the same xz version.
This problem can be fixed if a part of the encoder
implementation is frozen to keep rsyncable output
stable across xz versions.
.
.SS "Embedded .xz decompressors"
Embedded
.B .xz
decompressor implementations like XZ Embedded don't necessarily
support files created with integrity
.I check
types other than
.B none
and
.BR crc32 .
Since the default is
.BR \-\-check=crc64 ,
you must use
.B \-\-check=none
or
.B \-\-check=crc32
when creating files for embedded systems.
.PP
Outside embedded systems, all
.B .xz
format decompressors support all the
.I check
types, or at least are able to decompress
the file without verifying the
integrity check if the particular
.I check
is not supported.
.PP
XZ Embedded supports BCJ filters,
but only with the default start offset.
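.PP
For example, the following sketch creates a file with a CRC32 check
and then lists it so that the check type can be verified
from the Check column of the list output;
.I foo
is a placeholder name:
.RS
.PP
.nf
.ft CW
xz \-\-check=crc32 foo
xz \-\-list foo.xz
.ft R
.fi
.RE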
.
.SH EXAMPLES
.
.SS Basics
Compress the file
.I foo
into
.I foo.xz
using the default compression level
.RB ( \-6 ),
and remove
.I foo
if compression is successful:
.RS
.PP
.nf
.ft CW
xz foo
.ft R
.fi
.RE
.PP
Decompress
.I bar.xz
into
.I bar
and don't remove
.I bar.xz
even if decompression is successful:
.RS
.PP
.nf
.ft CW
xz \-dk bar.xz
.ft R
.fi
.RE
.PP
Create
.I baz.tar.xz
with the preset
.B \-4e
.RB ( "\-4 \-\-extreme" ),
which is slower than the default
.BR \-6 ,
but needs less memory for compression and decompression (48\ MiB
and 5\ MiB, respectively):
.RS
.PP
.nf
.ft CW
tar cf \- baz | xz \-4e > baz.tar.xz
.ft R
.fi
.RE
.PP
A mix of compressed and uncompressed files can be decompressed
to standard output with a single command:
.RS
.PP
.nf
.ft CW
xz \-dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt
.ft R
.fi
.RE
.
.SS "Parallel compression of many files"
On GNU and *BSD,
.BR find (1)
and
.BR xargs (1)
can be used to parallelize compression of many files:
.RS
.PP
.nf
.ft CW
find . \-type f \e! \-name '*.xz' \-print0 \e
    | xargs \-0r \-P4 \-n16 xz \-T1
.ft R
.fi
.RE
.PP
The
.B \-P
option to
.BR xargs (1)
sets the number of parallel
.B xz
processes.
The best value for the
.B \-n
option depends on how many files there are to be compressed.
If there are only a couple of files,
the value should probably be 1;
with tens of thousands of files,
100 or even more may be appropriate to reduce the number of
.B xz
processes that
.BR xargs (1)
will eventually create.
.PP
The option
.B \-T1
for
.B xz
is there to force it to single-threaded mode, because
.BR xargs (1)
is used to control the amount of parallelization.
.
.SS "Robot mode"
Calculate how many bytes have been saved in total
after compressing multiple files:
.RS
.PP
.nf
.ft CW
xz \-\-robot \-\-list *.xz | awk '/^totals/{print $5\-$4}'
.ft R
.fi
.RE
.PP
A script may want to know that it is using a new enough
.BR xz .
The following
.BR sh (1)
script checks that the version number of the
.B xz
tool is at least 5.0.0.
This method is compatible with old beta versions,
which didn't support the
.B \-\-robot
option:
.RS
.PP
.nf
.ft CW
if ! eval "$(xz \-\-robot \-\-version 2> /dev/null)" ||
        [ "$XZ_VERSION" \-lt 50000002 ]; then
    echo "Your xz is too old."
fi
unset XZ_VERSION LIBLZMA_VERSION
.ft R
.fi
.RE
.PP
Set a memory usage limit for decompression using
.BR XZ_OPT ,
but if a limit has already been set, don't increase it:
.RS
.PP
.nf
.ft CW
NEWLIM=$((123 << 20)) # 123 MiB
OLDLIM=$(xz \-\-robot \-\-info\-memory | cut \-f3)
if [ $OLDLIM \-eq 0 \-o $OLDLIM \-gt $NEWLIM ]; then
    XZ_OPT="$XZ_OPT \-\-memlimit\-decompress=$NEWLIM"
    export XZ_OPT
fi
.ft R
.fi
.RE
.
.SS "Custom compressor filter chains"
The simplest use for custom filter chains is
customizing an LZMA2 preset.
This can be useful,
because the presets cover only a subset of the
potentially useful combinations of compression settings.
.PP
The CompCPU columns of the tables
from the descriptions of the options
.BR "\-0" " ... " "\-9"
and
.B \-\-extreme
are useful when customizing LZMA2 presets.
Here are the relevant parts collected from those two tables:
.RS
.PP
.TS
tab(;);
c c
n n.
Preset;CompCPU
\-0;0
\-1;1
\-2;2
\-3;3
\-4;4
\-5;5
\-6;6
\-5e;7
\-6e;8
.TE
.RE
.PP
If you know that a file requires
a somewhat big dictionary (for example, 32\ MiB) to compress well,
but you want to compress it quicker than
.B "xz \-8"
would do, a preset with a low CompCPU value (for example, 1)
can be modified to use a bigger dictionary:
.RS
.PP
.nf
.ft CW
xz \-\-lzma2=preset=1,dict=32MiB foo.tar
.ft R
.fi
.RE
.PP
With certain files, the above command may be faster than
.B "xz \-6"
while compressing significantly better.
However, it must be emphasized that only some files benefit from
a big dictionary while keeping the CompCPU value low.
The most obvious situation
where a big dictionary can help a lot
is an archive containing very similar files
of at least a few megabytes each.
The dictionary size has to be significantly bigger
than any individual file to allow LZMA2 to take
full advantage of the similarities between consecutive files.
.PP
If very high compressor and decompressor memory usage is fine,
and the file being compressed is
at least several hundred megabytes, it may be useful
to use an even bigger dictionary than the 64 MiB that
.B "xz \-9"
would use:
.RS
.PP
.nf
.ft CW
xz \-vv \-\-lzma2=dict=192MiB big_foo.tar
.ft R
.fi
.RE
.PP
Using
.B \-vv
.RB ( "\-\-verbose \-\-verbose" )
like in the above example can be useful
to see the memory requirements
of the compressor and decompressor.
Remember that using a dictionary bigger than
the size of the uncompressed file is a waste of memory,
so the above command isn't useful for small files.
.PP
Sometimes the compression time doesn't matter,
but the decompressor memory usage has to be kept low, for example,
to make it possible to decompress the file on an embedded system.
The following command uses
.B \-6e
.RB ( "\-6 \-\-extreme" )
as a base and sets the dictionary to only 64\ KiB.
The resulting file can be decompressed with XZ Embedded
(that's why there is
.BR \-\-check=crc32 )
using about 100\ KiB of memory.
.RS
.PP
.nf
.ft CW
xz \-\-check=crc32 \-\-lzma2=preset=6e,dict=64KiB foo
.ft R
.fi
.RE
.PP
If you want to squeeze out as many bytes as possible,
adjusting the number of literal context bits
.RI ( lc )
and number of position bits
.RI ( pb )
can sometimes help.
Adjusting the number of literal position bits
.RI ( lp )
might help too, but usually
.I lc
and
.I pb
are more important.
For example, a source code archive contains mostly US-ASCII text,
so something like the following might give
a slightly (like 0.1\ %) smaller file than
.B "xz \-6e"
(try also without
.BR lc=4 ):
.RS
.PP
.nf
.ft CW
xz \-\-lzma2=preset=6e,pb=0,lc=4 source_code.tar
.ft R
.fi
.RE
.PP
Using another filter together with LZMA2 can improve
compression with certain file types.
For example, to compress an x86-32 or x86-64 shared library
using the x86 BCJ filter:
.RS
.PP
.nf
.ft CW
xz \-\-x86 \-\-lzma2 libfoo.so
.ft R
.fi
.RE
.PP
Note that the order of the filter options is significant.
If
.B \-\-x86
is specified after
.BR \-\-lzma2 ,
.B xz
will give an error,
because there cannot be any filter after LZMA2,
and also because the x86 BCJ filter cannot be used
as the last filter in the chain.
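.PP
For contrast, the following reversed order is the invalid form
described above and is expected to fail with an error
instead of compressing
.IR libfoo.so :
.RS
.PP
.nf
.ft CW
xz \-\-lzma2 \-\-x86 libfoo.so
.ft R
.fi
.RE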
.PP
The Delta filter together with LZMA2
can give good results with bitmap images.
It should usually beat PNG,
which has a few more advanced filters than simple
delta but uses Deflate for the actual compression.
.PP
The image has to be saved in uncompressed format,
for example, as uncompressed TIFF.
The distance parameter of the Delta filter is set
to match the number of bytes per pixel in the image.
For example, a 24-bit RGB bitmap needs
.BR dist=3 ,
and it is also good to pass
.B pb=0
to LZMA2 to accommodate the three-byte alignment:
.RS
.PP
.nf
.ft CW
xz \-\-delta=dist=3 \-\-lzma2=pb=0 foo.tiff
.ft R
.fi
.RE
.PP
If multiple images have been put into a single archive (for example,
.BR .tar ),
the Delta filter will work on that too as long as all images
have the same number of bytes per pixel.
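.PP
A sketch of that case, assuming a directory named
.I images
(a placeholder) that contains only uncompressed 24-bit RGB images:
.RS
.PP
.nf
.ft CW
tar cf \- images | xz \-\-delta=dist=3 \-\-lzma2=pb=0 > images.tar.xz
.ft R
.fi
.RE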
.
.SH "SEE ALSO"
.BR xzdec (1),
.BR xzdiff (1),
.BR xzgrep (1),
.BR xzless (1),
.BR xzmore (1),
.BR gzip (1),
.BR bzip2 (1),
.BR 7z (1)
.PP
XZ Utils: <https://tukaani.org/xz/>
.br
XZ Embedded: <https://tukaani.org/xz/embedded.html>
.br
LZMA SDK: <http://7-zip.org/sdk.html>