__SSE2__ is the correct macro for SSE2 support with GCC, Clang,
and ICC. __SSE2_MATH__ means doing floating point math with SSE2
instead of 387. Often the latter macro is defined if the first
one is but it was still a bug.
This commit just adds the function. Its uses will be in
separate commits.
This hasn't been tested much yet and it's perhaps a bit early
to commit it but if there are bugs they should get found quite
quickly.
Thanks to Jun I Jin from Intel for help and for pointing out
that string comparison needs to be optimized in liblzma.