1
0
Fork 0
mirror of https://git.tukaani.org/xz.git synced 2024-04-04 12:36:23 +02:00
xz-archive/src/liblzma/common/common.h
Lasse Collin 31d80c6b26 liblzma: Vaccinate against an ill patch from RHEL/CentOS 7.
RHEL/CentOS 7 shipped with 5.1.2alpha, including the threaded
encoder that is behind #ifdef LZMA_UNSTABLE in the API headers.
In 5.1.2alpha these symbols are under XZ_5.1.2alpha in liblzma.map.
API/ABI compatibility tracking isn't done between development
releases so newer releases didn't have XZ_5.1.2alpha anymore.

Later RHEL/CentOS 7 updated xz to 5.2.2 but they wanted to keep
the exported symbols compatible with 5.1.2alpha. After checking
the ABI changes it turned out that >= 5.2.0 ABI is backward
compatible with the threaded encoder functions from 5.1.2alpha
(but not vice versa as fixes and extensions to these functions
were made between 5.1.2alpha and 5.2.0).

In RHEL/CentOS 7, XZ Utils 5.2.2 was patched with
xz-5.2.2-compat-libs.patch to modify liblzma.map:

  - XZ_5.1.2alpha was added with lzma_stream_encoder_mt and
    lzma_stream_encoder_mt_memusage. This matched XZ Utils 5.1.2alpha.

  - XZ_5.2 was replaced with XZ_5.2.2. It is clear that this was
    an error; the intention was to keep using XZ_5.2 (XZ_5.2.2
    has never been used in XZ Utils). So XZ_5.2.2 lists all
    symbols that were listed under XZ_5.2 before the patch.
    lzma_stream_encoder_mt and _mt_memusage are included too so
    they are listed both here and under XZ_5.1.2alpha.

The patch didn't add any __asm__(".symver ...") lines to the .c
files. Thus the resulting liblzma.so exports the threaded encoder
functions under XZ_5.1.2alpha only. Listing the two functions
also under XZ_5.2.2 in liblzma.map has no effect without
matching .symver lines.

The lack of XZ_5.2 in RHEL/CentOS 7 means that binaries linked
against unpatched XZ Utils 5.2.x won't run on RHEL/CentOS 7.
This is unfortunate but this alone isn't too bad as the problem
is contained within RHEL/CentOS 7 and doesn't affect users
of other distributions. It could also be fixed internally in
RHEL/CentOS 7.

The second problem is more serious: In XZ Utils 5.2.2 the API
headers don't have #ifdef LZMA_UNSTABLE for obvious reasons.
This is true in RHEL/CentOS 7 version too. Thus now programs
using new APIs can be compiled without an extra #define. However,
the programs end up depending on symbol version XZ_5.1.2alpha
(and possibly also XZ_5.2.2) instead of XZ_5.2 as they would
with an unpatched XZ Utils 5.2.2. This means that such binaries
won't run on other distributions shipping XZ Utils >= 5.2.0 as
they don't provide XZ_5.1.2alpha or XZ_5.2.2; they only provide
XZ_5.2 (and XZ_5.0). (This includes RHEL/CentOS 8 as the patch
luckily isn't included there anymore with XZ Utils 5.2.4.)

Binaries built by RHEL/CentOS 7 users get distributed and then
people wonder why they don't run on some other distribution.
Seems that people have found out about the patch and been copying
it to some build scripts, seemingly curing the symptoms but
actually spreading the illness further and outside RHEL/CentOS 7.

The ill patch seems to be from late 2016 (RHEL 7.3) and in 2017 it
had spread at least to EasyBuild. I heard about the events only
recently. :-(

This commit splits liblzma.map into two versions: one for
GNU/Linux and another for other OSes that can use symbol versioning
(FreeBSD, Solaris, maybe others). The Linux-specific file and the
matching additions to .c files add full compatibility with binaries
that have been built against a RHEL/CentOS-patched liblzma. Builds
for OSes other than GNU/Linux won't get the vaccine as they should
be immune to the problem (I really hope that no build script uses
the RHEL/CentOS 7 patch outside GNU/Linux).

The RHEL/CentOS compatibility symbols XZ_5.1.2alpha and XZ_5.2.2
are intentionally put *after* XZ_5.2 in liblzma_linux.map. This way
if one forgets to #define HAVE_SYMBOL_VERSIONS_LINUX when building,
the resulting liblzma.so.5 will have lzma_stream_encoder_mt@@XZ_5.2
since XZ_5.2 {...} is the first one that lists that function.
Without HAVE_SYMBOL_VERSIONS_LINUX @XZ_5.1.2alpha and @XZ_5.2.2
will be missing but that's still a minor problem compared to
only having lzma_stream_encoder_mt@@XZ_5.1.2alpha!

The "local: *;" line was moved to XZ_5.0 so that it doesn't need
to be moved around. It doesn't matter where it is put.

Having two similar liblzma_*.map files is a bit silly as it is,
at least for now, easily possible to generate the generic one
from the Linux-specific file. But that adds extra steps and
increases the risk of mistakes when supporting more than one
build system. So I rather maintain two files in parallel and let
validate_map.sh check that they are in sync when "make mydist"
is run.

This adds .symver lines for lzma_stream_encoder_mt@XZ_5.2.2 and
lzma_stream_encoder_mt_memusage@XZ_5.2.2 even though these
weren't exported by RHEL/CentOS 7 (only @@XZ_5.1.2alpha was
for these two). I added these anyway because someone might
misunderstand the RHEL/CentOS 7 patch and think that @XZ_5.2.2
(@@XZ_5.2.2) versions were exported too.

At glance one could suggest using __typeof__ to copy the function
prototypes when making aliases. However, this doesn't work trivially
because __typeof__ won't copy attributes (lzma_nothrow, lzma_pure)
and it won't change symbol visibility from hidden to default (done
by LZMA_API()). Attributes could be copied with __copy__ attribute
but that needs GCC 9 and a fallback method would be needed anyway.

This uses __symver__ attribute with GCC >= 10 and
__asm__(".symver ...") with everything else. The attribute method
is required for LTO (-flto) support with GCC. Using -flto with
GCC older than 10 is now broken on GNU/Linux and will not be fixed
(can silently result in a broken liblzma build that has dangerously
incorrect symbol versions). LTO builds with Clang seem to work
with the traditional __asm__(".symver ...") method.

Thanks to Boud Roukema for reporting the problem and discussing
the details and testing the fix.
2022-09-16 19:30:05 +03:00

342 lines
12 KiB
C

///////////////////////////////////////////////////////////////////////////////
//
/// \file common.h
/// \brief Definitions common to the whole liblzma library
//
// Author: Lasse Collin
//
// This file has been put into the public domain.
// You can do whatever you want with this file.
//
///////////////////////////////////////////////////////////////////////////////
#ifndef LZMA_COMMON_H
#define LZMA_COMMON_H
#include "sysdefs.h"
#include "mythread.h"
#include "tuklib_integer.h"
#if defined(_WIN32) || defined(__CYGWIN__)
# ifdef DLL_EXPORT
# define LZMA_API_EXPORT __declspec(dllexport)
# else
# define LZMA_API_EXPORT
# endif
// Don't use ifdef or defined() below.
#elif HAVE_VISIBILITY
# define LZMA_API_EXPORT __attribute__((__visibility__("default")))
#else
# define LZMA_API_EXPORT
#endif
#define LZMA_API(type) LZMA_API_EXPORT type LZMA_API_CALL
#include "lzma.h"
#ifdef HAVE_SYMBOL_VERSIONS_LINUX
// To keep link-time optimization (LTO, -flto) working with GCC,
// the __symver__ attribute must be used instead of __asm__(".symver ...").
// Otherwise the symbol versions may be lost, resulting in broken liblzma
// that has wrong default versions in the exported symbol list!
// The attribute was added in GCC 10; LTO with older GCC is not supported.
//
// To keep -Wmissing-prototypes happy, use LZMA_SYMVER_API only with function
// declarations (including those with __alias__ attribute) and LZMA_API with
// the function definitions. This means a little bit of silly copy-and-paste
// between declarations and definitions though.
//
// As of GCC 12.2, the __symver__ attribute supports only @ and @@ but the
// very convenient @@@ isn't supported (it's supported by GNU assembler
// since 2000). When using @@ instead of @@@, the internal name must not be
// the same as the external name to avoid problems in some situations. This
// is why "#define foo_52 foo" is needed for the default symbol versions.
# if TUKLIB_GNUC_REQ(10, 0)
# define LZMA_SYMVER_API(extnamever, type, intname) \
extern __attribute__((__symver__(extnamever))) \
LZMA_API(type) intname
# else
# define LZMA_SYMVER_API(extnamever, type, intname) \
__asm__(".symver " #intname "," extnamever); \
extern LZMA_API(type) intname
# endif
#endif
// These allow helping the compiler in some often-executed branches, whose
// result is almost always the same.
#ifdef __GNUC__
# define likely(expr) __builtin_expect(expr, true)
# define unlikely(expr) __builtin_expect(expr, false)
#else
# define likely(expr) (expr)
# define unlikely(expr) (expr)
#endif
/// Size of temporary buffers needed in some filters
#define LZMA_BUFFER_SIZE 4096
/// Maximum number of worker threads within one multithreaded component.
/// The limit exists solely to make it simpler to prevent integer overflows
/// when allocating structures etc. This should be big enough for now...
/// the code won't scale anywhere close to this number anyway.
#define LZMA_THREADS_MAX 16384
/// Starting value for memory usage estimates. Instead of calculating size
/// of _every_ structure and taking into account malloc() overhead etc., we
/// add a base size to all memory usage estimates. It's not very accurate
/// but should be easily good enough.
#define LZMA_MEMUSAGE_BASE (UINT64_C(1) << 15)
/// Start of internal Filter ID space. These IDs must never be used
/// in Streams.
#define LZMA_FILTER_RESERVED_START (LZMA_VLI_C(1) << 62)
/// Supported flags that can be passed to lzma_stream_decoder()
/// or lzma_auto_decoder().
#define LZMA_SUPPORTED_FLAGS \
( LZMA_TELL_NO_CHECK \
| LZMA_TELL_UNSUPPORTED_CHECK \
| LZMA_TELL_ANY_CHECK \
| LZMA_IGNORE_CHECK \
| LZMA_CONCATENATED )
/// Largest valid lzma_action value as unsigned integer.
#define LZMA_ACTION_MAX ((unsigned int)(LZMA_FULL_BARRIER))
/// Special return value (lzma_ret) to indicate that a timeout was reached
/// and lzma_code() must not return LZMA_BUF_ERROR. This is converted to
/// LZMA_OK in lzma_code(). This is not in the lzma_ret enumeration because
/// there's no need to have it in the public API.
#define LZMA_TIMED_OUT 32
typedef struct lzma_next_coder_s lzma_next_coder;
typedef struct lzma_filter_info_s lzma_filter_info;
/// Type of a function used to initialize a filter encoder or decoder
typedef lzma_ret (*lzma_init_function)(
lzma_next_coder *next, const lzma_allocator *allocator,
const lzma_filter_info *filters);
/// Type of a function to do some kind of coding work (filters, Stream,
/// Block encoders/decoders etc.). Some special coders use don't use both
/// input and output buffers, but for simplicity they still use this same
/// function prototype.
typedef lzma_ret (*lzma_code_function)(
void *coder, const lzma_allocator *allocator,
const uint8_t *restrict in, size_t *restrict in_pos,
size_t in_size, uint8_t *restrict out,
size_t *restrict out_pos, size_t out_size,
lzma_action action);
/// Type of a function to free the memory allocated for the coder
typedef void (*lzma_end_function)(
void *coder, const lzma_allocator *allocator);
/// Raw coder validates and converts an array of lzma_filter structures to
/// an array of lzma_filter_info structures. This array is used with
/// lzma_next_filter_init to initialize the filter chain.
struct lzma_filter_info_s {
/// Filter ID. This is used only by the encoder
/// with lzma_filters_update().
lzma_vli id;
/// Pointer to function used to initialize the filter.
/// This is NULL to indicate end of array.
lzma_init_function init;
/// Pointer to filter's options structure
void *options;
};
/// Hold data and function pointers of the next filter in the chain.
struct lzma_next_coder_s {
/// Pointer to coder-specific data
void *coder;
/// Filter ID. This is LZMA_VLI_UNKNOWN when this structure doesn't
/// point to a filter coder.
lzma_vli id;
/// "Pointer" to init function. This is never called here.
/// We need only to detect if we are initializing a coder
/// that was allocated earlier. See lzma_next_coder_init and
/// lzma_next_strm_init macros in this file.
uintptr_t init;
/// Pointer to function to do the actual coding
lzma_code_function code;
/// Pointer to function to free lzma_next_coder.coder. This can
/// be NULL; in that case, lzma_free is called to free
/// lzma_next_coder.coder.
lzma_end_function end;
/// Pointer to a function to get progress information. If this is NULL,
/// lzma_stream.total_in and .total_out are used instead.
void (*get_progress)(void *coder,
uint64_t *progress_in, uint64_t *progress_out);
/// Pointer to function to return the type of the integrity check.
/// Most coders won't support this.
lzma_check (*get_check)(const void *coder);
/// Pointer to function to get and/or change the memory usage limit.
/// If new_memlimit == 0, the limit is not changed.
lzma_ret (*memconfig)(void *coder, uint64_t *memusage,
uint64_t *old_memlimit, uint64_t new_memlimit);
/// Update the filter-specific options or the whole filter chain
/// in the encoder.
lzma_ret (*update)(void *coder, const lzma_allocator *allocator,
const lzma_filter *filters,
const lzma_filter *reversed_filters);
};
/// Macro to initialize lzma_next_coder structure
#define LZMA_NEXT_CODER_INIT \
(lzma_next_coder){ \
.coder = NULL, \
.init = (uintptr_t)(NULL), \
.id = LZMA_VLI_UNKNOWN, \
.code = NULL, \
.end = NULL, \
.get_progress = NULL, \
.get_check = NULL, \
.memconfig = NULL, \
.update = NULL, \
}
/// Internal data for lzma_strm_init, lzma_code, and lzma_end. A pointer to
/// this is stored in lzma_stream.
struct lzma_internal_s {
/// The actual coder that should do something useful
lzma_next_coder next;
/// Track the state of the coder. This is used to validate arguments
/// so that the actual coders can rely on e.g. that LZMA_SYNC_FLUSH
/// is used on every call to lzma_code until next.code has returned
/// LZMA_STREAM_END.
enum {
ISEQ_RUN,
ISEQ_SYNC_FLUSH,
ISEQ_FULL_FLUSH,
ISEQ_FINISH,
ISEQ_FULL_BARRIER,
ISEQ_END,
ISEQ_ERROR,
} sequence;
/// A copy of lzma_stream avail_in. This is used to verify that the
/// amount of input doesn't change once e.g. LZMA_FINISH has been
/// used.
size_t avail_in;
/// Indicates which lzma_action values are allowed by next.code.
bool supported_actions[LZMA_ACTION_MAX + 1];
/// If true, lzma_code will return LZMA_BUF_ERROR if no progress was
/// made (no input consumed and no output produced by next.code).
bool allow_buf_error;
};
/// Allocates memory
extern void *lzma_alloc(size_t size, const lzma_allocator *allocator)
lzma_attribute((__malloc__)) lzma_attr_alloc_size(1);
/// Allocates memory and zeroes it (like calloc()). This can be faster
/// than lzma_alloc() + memzero() while being backward compatible with
/// custom allocators.
extern void * lzma_attribute((__malloc__)) lzma_attr_alloc_size(1)
lzma_alloc_zero(size_t size, const lzma_allocator *allocator);
/// Frees memory
extern void lzma_free(void *ptr, const lzma_allocator *allocator);
/// Allocates strm->internal if it is NULL, and initializes *strm and
/// strm->internal. This function is only called via lzma_next_strm_init macro.
extern lzma_ret lzma_strm_init(lzma_stream *strm);
/// Initializes the next filter in the chain, if any. This takes care of
/// freeing the memory of previously initialized filter if it is different
/// than the filter being initialized now. This way the actual filter
/// initialization functions don't need to use lzma_next_coder_init macro.
extern lzma_ret lzma_next_filter_init(lzma_next_coder *next,
const lzma_allocator *allocator,
const lzma_filter_info *filters);
/// Update the next filter in the chain, if any. This checks that
/// the application is not trying to change the Filter IDs.
extern lzma_ret lzma_next_filter_update(
lzma_next_coder *next, const lzma_allocator *allocator,
const lzma_filter *reversed_filters);
/// Frees the memory allocated for next->coder either using next->end or,
/// if next->end is NULL, using lzma_free.
extern void lzma_next_end(lzma_next_coder *next,
const lzma_allocator *allocator);
/// Copy as much data as possible from in[] to out[] and update *in_pos
/// and *out_pos accordingly. Returns the number of bytes copied.
extern size_t lzma_bufcpy(const uint8_t *restrict in, size_t *restrict in_pos,
size_t in_size, uint8_t *restrict out,
size_t *restrict out_pos, size_t out_size);
/// \brief Return if expression doesn't evaluate to LZMA_OK
///
/// There are several situations where we want to return immediately
/// with the value of expr if it isn't LZMA_OK. This macro shortens
/// the code a little.
#define return_if_error(expr) \
do { \
const lzma_ret ret_ = (expr); \
if (ret_ != LZMA_OK) \
return ret_; \
} while (0)
/// If next isn't already initialized, free the previous coder. Then mark
/// that next is _possibly_ initialized for the coder using this macro.
/// "Possibly" means that if e.g. allocation of next->coder fails, the
/// structure isn't actually initialized for this coder, but leaving
/// next->init to func is still OK.
#define lzma_next_coder_init(func, next, allocator) \
do { \
if ((uintptr_t)(func) != (next)->init) \
lzma_next_end(next, allocator); \
(next)->init = (uintptr_t)(func); \
} while (0)
/// Initializes lzma_strm and calls func() to initialize strm->internal->next.
/// (The function being called will use lzma_next_coder_init()). If
/// initialization fails, memory that wasn't freed by func() is freed
/// along strm->internal.
#define lzma_next_strm_init(func, strm, ...) \
do { \
return_if_error(lzma_strm_init(strm)); \
const lzma_ret ret_ = func(&(strm)->internal->next, \
(strm)->allocator, __VA_ARGS__); \
if (ret_ != LZMA_OK) { \
lzma_end(strm); \
return ret_; \
} \
} while (0)
#endif