Instead of having Doxyfile.in configured by Autoconf, the Doxyfile
can have the tags that need to be configured piped into the doxygen
command through stdin with the overrides after Doxyfile's contents.
Going forward, the documentation should be generated in two different
modes: liblzma or internal.
liblzma is useful for most users. It is the documentation for just
the liblzma API header files. This is the default.
internal is for people who want to understand how xz and liblzma work.
It might be useful for people who want to contribute to the project.
Converts the existing lzma_index tests into tuktests and covers every
API function from index.h except for lzma_file_info_decoder, which can
be tested in the future.
(This commit combines related commits from the master branch.)
If Capsicum support is missing from the kernel or xz is being run
in an emulator that lacks Capsicum suport, the syscalls will fail
and set errno to ENOSYS. Previously xz would display and error and
exit, making xz unusable. Now it will check for ENOSYS and run
without sandbox support. Other tools like ssh behave similarly.
Displaying a warning for missing Capsicum support was considered
but such extra output would quickly become annoying. It would also
break test_scripts.sh in "make check".
Also move cap_enter() to be the first step instead of the last one.
This matches the example in the cap_rights_limit(2) man page. With
the current code it shouldn't make any practical difference though.
Thanks to Xin Li for the bug report, suggesting a fix, and testing:
https://github.com/tukaani-project/xz/pull/43
Thanks to Jia Tan for most of the original commits.
Now, the LZMA_VERSION_MAJOR, LZMA_VERSION_MINOR, and LZMA_VERSION_PATCH
macros do not need to be on consecutive lines in version.h. They can be
separated by more whitespace, comments, or even other content, as long
as they appear in the proper order (major, minor, patch).
lzma_lzma_preset() does not guarentee that the lzma_options_lzma are
usable in an encoder even if it returns false (success). If liblzma
is built with default configurations, then the options will always be
usable. However if the match finders hc3, hc4, or bt4 are disabled, then
the options may not be usable depending on the preset level requested.
The documentation was updated to reflect this complexity, since this
behavior was unclear before.
The static global variables can be disabled if encoders and decoders
are not built. If they are not disabled and -Werror is used, it will
cause an usused warning as an error.
All functions now explicitly specify parameter and return values.
The notes and code annotations were moved before the parameter and
return value descriptions for consistency.
Also, the description above lzma_filter_encoder_is_supported() about
not being able to list available filters was removed since
lzma_str_list_filters() will do this.
In the C99 and C17 standards, section 6.5.6 paragraph 8 means that
adding 0 to a null pointer is undefined behavior. As of writing,
"clang -fsanitize=undefined" (Clang 15) diagnoses this. However,
I'm not aware of any compiler that would take advantage of this
when optimizing (Clang 15 included). It's good to avoid this anyway
since compilers might some day infer that pointer arithmetic implies
that the pointer is not NULL. That is, the following foo() would then
unconditionally return 0, even for foo(NULL, 0):
void bar(char *a, char *b);
int foo(char *a, size_t n)
{
bar(a, a + n);
return a == NULL;
}
In contrast to C, C++ explicitly allows null pointer + 0. So if
the above is compiled as C++ then there is no undefined behavior
in the foo(NULL, 0) call.
To me it seems that changing the C standard would be the sane
thing to do (just add one sentence) as it would ensure that a huge
amount of old code won't break in the future. Based on web searches
it seems that a large number of codebases (where null pointer + 0
occurs) are being fixed instead to be future-proof in case compilers
will some day optimize based on it (like making the above foo(NULL, 0)
return 0) which in the worst case will cause security bugs.
Some projects don't plan to change it. For example, gnulib and thus
many GNU tools currently require that null pointer + 0 is defined:
https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.htmlhttps://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html
In XZ Utils null pointer + 0 issue should be fixed after this
commit. This adds a few if-statements and thus branches to avoid
null pointer + 0. These check for size > 0 instead of ptr != NULL
because this way bugs where size > 0 && ptr == NULL will likely
get caught quickly. None of them are in hot spots so it shouldn't
matter for performance.
A little less readable version would be replacing
ptr + offset
with
offset != 0 ? ptr + offset : ptr
or creating a macro for it:
#define my_ptr_add(ptr, offset) \
((offset) != 0 ? ((ptr) + (offset)) : (ptr))
Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1,
Clang >= 7, and Clang-based ICX to optimize it to the very same code
as ptr + offset. That is, it won't create a branch. So for hot code
this could be a good solution to avoid null pointer + 0. Unfortunately
other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a
branch from my_ptr_add().
Thanks to Marcin Kowalczyk for reporting the problem:
https://github.com/tukaani-project/xz/issues/36
Standardizing each function to always specify parameters and return
values. Also moved the parameters and return values to the end of each
function description.
On MicroBlaze, GCC 12 is broken in sense that
__has_attribute(__symver__) returns true but it still doesn't
support the __symver__ attribute even though the platform is ELF
and symbol versioning is supported if using the traditional
__asm__(".symver ...") method. Avoiding the traditional method is
good because it breaks LTO (-flto) builds with GCC.
See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101766
For now the only extra symbols in liblzma_linux.map are the
compatibility symbols with the patch that spread from RHEL/CentOS 7.
These require the use of __symver__ attribute or __asm__(".symver ...")
in the C code. Compatibility with the patch from CentOS 7 doesn't
seem valuable on MicroBlaze so use liblzma_generic.map on MicroBlaze
instead. It doesn't require anything special in the C code and thus
no LTO issues either.
An alternative would be to detect support for __symver__
attribute in configure.ac and CMakeLists.txt and fall back
to __asm__(".symver ...") but then LTO would be silently broken
on MicroBlaze. It sounds likely that MicroBlaze is a special
case so let's treat it as a such because that is simpler. If
a similar issue exists on some other platform too then hopefully
someone will report it and this can be reconsidered.
(This doesn't do the same fix in CMakeLists.txt. Perhaps it should
but perhaps CMake build of liblzma doesn't matter much on MicroBlaze.
The problem breaks the build so it's easy to notice and can be fixed
later.)
Thanks to Vincent Fazio for reporting the problem and proposing
a patch (in the end that solution wasn't used):
https://github.com/tukaani-project/xz/pull/32
Use "member" to refer to struct members as that's the term used
by the C standard.
Use lzma_options_delta.dist and such in docs so that in Doxygen's
HTML output they will link to the doc of the struct member.
Clean up a few trailing white spaces too.
It gives C4146 here since unary minus with unsigned integer
is still unsigned (which is the intention here). Doing it
with substraction makes it clearer and avoids the warning.
Thanks to Nathan Moinvaziri for reporting this.
Standardizing each function to always specify parameters and return
values. Also moved the parameters and return values to the end of each
function description.
A few small things were reworded and long sentences broken up.
All functions now explicitly specify parameter and return values.
Also moved the note about SHA-256 functions not being exported to the
top of the file.
The bug is only a problem in applications that do not properly terminate
the filters[] array with LZMA_VLI_UNKNOWN or have more than
LZMA_FILTERS_MAX filters. This bug does not affect xz.
Added a few sentences to the description for lzma_block_encoder() and
lzma_block_decoder() to highlight that the Block Header must be coded
before calling these functions.
Standardizing each function to always specify params and return values.
Output pointer parameters are also marked with doxygen style [out] to
make it clear. Any note sections were also moved above the parameter and
return sections for consistency.
The flag description for LZMA_STR_NO_VALIDATION was previously confusing
about the treatment for filters than cannot be used with .xz format
(lzma1) without using LZMA_STR_ALL_FILTERS. Now, it is clear that
LZMA_STR_NO_VALIDATION is not a super set of LZMA_STR_ALL_FILTERS.